How do you use AI to write data migration and format-conversion scripts?

Question

Accepted Answer

AI is well-suited to **format conversions** (CSV ↔ JSON ↔ SQL) and migration scripts because the transformation rules are mechanical. The trick is giving it enough context to be correct and safe: schemas, a sample row, and explicit safety requirements.

## How to prompt it

- **Give the source and target schema**: column names, types, nullability, and any primary-key or unique constraints.
- **Include a real sample row** so it sees the actual shape and edge cases (quotes, nulls, dates).
- **Require idempotency and validation**: re-running the script shouldn't duplicate or corrupt data, and bad rows should be skipped or reported, not silently inserted.
- **Always dry-run on a copy** of the data before touching production, and inspect the output before running anything for real.
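
As a rough illustration of the checklist above, the prompt can be assembled from those pieces so that none of them gets forgotten. This is only a sketch: `build_migration_prompt` is a hypothetical helper, and the requirement wording and sample values are assumptions, not a required template.

```python
def build_migration_prompt(source_schema: str, target_schema: str, sample_row: str) -> str:
    """Assemble a prompt carrying both schemas, a sample row, and safety requirements."""
    requirements = [
        "Make the script idempotent: re-running it must not duplicate or corrupt data.",
        "Validate every row and skip or report bad rows instead of inserting them.",
        "Include a dry-run mode that prints what would change without writing anything.",
    ]
    parts = [
        "Write a script that migrates data from the source format to the target table.",
        f"Source schema:\n{source_schema}",
        f"Target schema:\n{target_schema}",
        f"Sample row (real data shape, including edge cases):\n{sample_row}",
        "Requirements:\n" + "\n".join(f"- {r}" for r in requirements),
    ]
    return "\n\n".join(parts)


# Illustrative values only; substitute your own schemas and a real row.
prompt = build_migration_prompt(
    source_schema="CSV columns: id,email,signup_date (signup_date may be blank)",
    target_schema="users(id INT PRIMARY KEY, email TEXT NOT NULL, signup_date DATE NULL)",
    sample_row='42,"o\'brien@example.com",',
)
print(prompt)
```

Picking a sample row that contains a quote and a blank date, as here, tends to surface escaping and NULL handling in the generated script early.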

## Example: CSV → SQL inserts

```python
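import csv
from datetime import date

# Source CSV: id,email,signup_date  (signup_date may be blank)
# Target table: users(id INT PRIMARY KEY, email TEXT NOT NULL, signup_date DATE NULL)
# ON CONFLICT is PostgreSQL/SQLite syntax and needs a unique constraint on id.
def csv_to_sql(path: str) -> list[str]:
    statements = []
    with open(path, newline="", encoding="utf-8") as f:
        for row in csv.DictReader(f):
            # DictReader fills missing trailing fields with None, hence the `or ""`
            email = (row["email"] or "").strip()
            raw_date = (row["signup_date"] or "").strip()
            try:
                # validate: int() and fromisoformat() reject malformed ids and dates
                row_id = int(row["id"])
                signup = date.fromisoformat(raw_date) if raw_date else None
            except (TypeError, ValueError):
                continue
            if not email:                      # validate: skip invalid rows, don't insert garbage
                continue
            email_sql = email.replace("'", "''")   # escape quotes to avoid broken SQL / injection
            # the date is re-serialized from a parsed value, so raw CSV text never reaches the SQL
            date_sql = f"'{signup.isoformat()}'" if signup else "NULL"
            # ON CONFLICT makes it idempotent: re-running won't create duplicates
            statements.append(
                f"INSERT INTO users (id, email, signup_date) "
                f"VALUES ({row_id}, '{email_sql}', {date_sql}) "
                f"ON CONFLICT (id) DO NOTHING;"
            )
    return statements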
```

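The comments mark the parts that matter: **validation** (skip rows that fail checks, such as empty emails), **escaping** (quotes), and **idempotency** (`ON CONFLICT (id) DO NOTHING`, which is PostgreSQL and SQLite syntax and only works when `id` has a primary-key or unique constraint). Ask the AI to include all three, since they're the things a naive script forgets.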
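
When the script can talk to the database directly instead of emitting SQL text, bound parameters are usually a safer alternative to hand-escaping. The sketch below assumes SQLite (3.24 or newer for `ON CONFLICT ... DO NOTHING`) via Python's standard `sqlite3` module; `load_users` and the sample rows are hypothetical, and it is a swapped-in technique rather than the string-building approach shown above.

```python
import sqlite3


def load_users(conn: sqlite3.Connection, rows: list[tuple]) -> int:
    """Insert (id, email, signup_date) rows; return the table's row count afterwards."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS users ("
        "id INTEGER PRIMARY KEY, email TEXT NOT NULL, signup_date TEXT)"
    )
    # "?" placeholders let the driver handle quoting, so no manual escaping is needed.
    conn.executemany(
        "INSERT INTO users (id, email, signup_date) VALUES (?, ?, ?) "
        "ON CONFLICT (id) DO NOTHING",
        rows,
    )
    conn.commit()
    return conn.execute("SELECT COUNT(*) FROM users").fetchone()[0]


conn = sqlite3.connect(":memory:")  # throwaway database for the demonstration
rows = [(1, "o'brien@example.com", "2024-01-15"), (2, "b@example.com", None)]
first = load_users(conn, rows)
second = load_users(conn, rows)     # re-running changes nothing
print(first, second)
```

Running the load twice and comparing the counts is a cheap idempotency check to ask the AI to include alongside the script.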

## Why it matters

Data migrations are high-stakes and often one-shot: a script that double-inserts or drops rows can be expensive to undo. AI accelerates writing the conversion, but the safety properties — idempotency, validation, and a dry-run on a copy — are non-negotiable. Treat the generated script as a draft you must read and test, never as something to run blind against real data.
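
One way to make the dry-run concrete, sketched here under the assumption that the target is a SQLite database with a `users` table: copy it into memory with `Connection.backup`, apply the generated statements to the copy, and compare row counts. The `dry_run` helper and the sample statements are hypothetical.

```python
import sqlite3


def dry_run(source: sqlite3.Connection, statements: list[str]) -> tuple[int, int]:
    """Apply statements to an in-memory copy; return (rows_before, rows_after)."""
    scratch = sqlite3.connect(":memory:")
    source.backup(scratch)  # copies the whole database; the source is not modified
    before = scratch.execute("SELECT COUNT(*) FROM users").fetchone()[0]
    for stmt in statements:
        scratch.execute(stmt)
    after = scratch.execute("SELECT COUNT(*) FROM users").fetchone()[0]
    scratch.close()         # the copy, and everything done to it, is discarded
    return before, after


prod = sqlite3.connect(":memory:")  # stand-in for the real database
prod.execute(
    "CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT NOT NULL, signup_date TEXT)"
)
prod.execute("INSERT INTO users VALUES (1, 'a@example.com', NULL)")
prod.commit()

statements = [
    "INSERT INTO users (id, email, signup_date) "
    "VALUES (1, 'a@example.com', NULL) ON CONFLICT (id) DO NOTHING;",
    "INSERT INTO users (id, email, signup_date) "
    "VALUES (2, 'b@example.com', '2024-01-15') ON CONFLICT (id) DO NOTHING;",
]
before, after = dry_run(prod, statements)
print(before, after)
```

If the change in row count does not match what you expect from the input file, that is the signal to stop and read the script again before it goes anywhere near production.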