Change Playbook Data (Safe)
This playbook must be used for any schema/data changes that impact production.
1) Impact Analysis
Before writing migration, answer this:
- Which tables/columns have changed?
- Which endpoint/service is reading/writing that column?
- Are the changes breaking for the running application?
- Estimated data volume for backfill?
- Risk of locking, downtime, or data race?
Concise template:
Perubahan:
Entitas terdampak:
API/modul terdampak:
Potensi breaking:
Strategi mitigasi:
Rollback strategy:
Owner verifikasi:
2) Expand/Contract Migration Strategy
Use this pattern for secure changes without significant downtime.
Expand phase (safe, backward compatible)
- Add new columns/tables.
- Keep old columns working.
- Update the code so that it can read from the old + new path (dual-read if necessary).
- Start writing to the new structure (dual-write if needed).
Backfill Phase
- Fill old data into the new structure gradually (batch/chunk).
- Monitor errors and performance.
- Don't run large backfills during peak hours.
Contract Phase
- After complete verification, stop reading/writing the old path.
- Drop old columns/tables in separate release.
3) Backfill Strategy
Choose a strategy according to volume:
- Small (<100k rows): simple batch + checkpoint.
- Medium (100k-5M): chunk + progress log + retry.
- Large (>5M): scheduled workers + throttle + strict observability.
Backfill principle:
- Idempotent (safe to rerun).
- Can pause/resume.
- Save last checkpoint offset/id.
- Record metrics: processed, success, failed.
4) Rollback Plan
Rollback must be written before deploy:
- Rollback code: safe commit/tag version.
- Rollback scheme: is migration
downsafe? If not, write a manual rollback. - Data rollback: snapshot/backup used + restore procedure.
- Decision gate: when rollback is decided (error rate, data mismatch, latency).
If the schema rollback is high risk, mark requires verification and requires senior approval.
5) Post-Deploy Validation Checklist
- Primary endpoint success (2xx) and no 5xx spikes.
- Error logs related to migration/backfill do not increase significantly.
- Number of old vs new records is consistent (sampling + aggregate).
- New data write path running.
- Dashboard/monitoring does not show severe degradation.
- Business stakeholders confirm critical flow remains normal.
Example verification query (customize table name):
-- 1) Cek data null yang seharusnya sudah terisi setelah backfill
SELECT COUNT(*) AS kosong
FROM target_table
WHERE new_column IS NULL;
-- 2) Cek konsistensi jumlah berdasarkan status
SELECT status, COUNT(*)
FROM target_table
GROUP BY status
ORDER BY status;
-- 3) Sampling silang kolom lama vs baru
SELECT id, old_column, new_column
FROM target_table
WHERE old_column IS NOT NULL
ORDER BY id DESC
LIMIT 50;
6) When to stop deployment
Stop/rollback immediately if:
- error rate increases sharply,
- critical queries timeout many times,
- significant data mismatch,
- major business transactions fail.
7) Note uncertainty
If there are assumptions that have not been validated by the production environment, write clearly: requires verification.