Tips On Writing Data Migrations in Django Application
Introduction
In a Django application, when schema changes Django automatically generates a migration file for the schema changes. We can write additional migrations to change data.
In this article, we will learn some tips on writing data migrations in Django applications.
Use Management Commands
Applications can register custom actions with manage.py
by creating a file in management/commands
directory of the application. This makes it easy to (re)run and test data migrations.
Here is a management command which migrates the status
column of a Task
model.
from django.core.management.base import BaseCommand from library.tasks import Task class Command(BaseCommand): def handle(self, *args, **options): status_map = { 'valid': 'ACTIVE', 'invalid': 'ERROR', 'unknown': 'UKNOWN', } tasks = Task.objects.all() for tasks in tasks: task.status = status_map[task.status] task.save()
If the migration is included in Django migration files directly, we have to rollback and re-apply the entire migration which becomes cubersome.
Link Data Migrations & Schema Migrations
If a data migration needs to happen before/after a specific schema migration, include the migration command using RunPython in the same schema migration or create seperate schema migration file and add schema migration as a dependency.
def run_migrate_task_status(apps, schema_editor): from library.core.management.commands import migrate_task_status cmd = migrate_task_status.Command() cmd.handle() class Migration(migrations.Migration): dependencies = [ ] operations = [ migrations.RunPython(run_migrate_task_status, RunSQL.noop), ]
Watch Out For DB Queries
When working on a major feature that involves a series of migrations, we have to be careful with data migrations(which use ORM) coming in between schema migrations.
For example, if we write a data migration script and then make schema changes to the same table in one go, then the migration script fails as Django ORM will be in invalid state for that data migration.
To overcome this, we can explicitly select only required fields and process them while ignoring all other fields.
# instead of User.objects.all() # use User.objects.only('id', 'is_active')
As an alternative, we can use raw SQL queries for data migrations.
Conclusion
In this article, we have seen some of the problems which occur during data migrations in Django applications and tips to alleviate them.
Need further help with this? I am available for hire.