Tutorial: Using Django's Multiple Database Support



In a recent Django class one of my students posed the problem she was learning Django to tackle - she would be responsible for writing a web-based administrative interface to a database whose structure she wasn't allowed to modify. Can Django do that?

Absolutely - and Django even comes with a management command to bootstrap working with a legacy database. Let's create a brand new project, use a popular sample database as our target, and using Django's multi-db support to store Django's built-in model data in a separate database.

 Installation and setup

For this project I'm using Django 1.3 since that's the version we targeted in the class. All code referenced in the tutorial will be checked into the github project and can be reviewed there. I'm using virtualenv and pip to manage installation of Django, my Python environment, and eventually my third party applications.

$ mkdir dualdb-project
$ cd dualdb-project
$ mkvirtualenv django1.3
(django1.3)$ pip install django==1.3
...
Successfully installed django
Cleaning up...
(django1.3)$ django-admin.py startproject dualdb
(django1.3)$ ls dualdb
(django1.3)$ ls -l dualdb/
total 16
-rw-rw-r-- 1 simeon simeon    0 Aug  1 15:20 __init__.py
-rw-rw-r-- 1 simeon simeon  503 Aug  1 15:20 manage.py
-rw-rw-r-- 1 simeon simeon 5031 Aug  1 15:20 settings.py
-rw-rw-r-- 1 simeon simeon  565 Aug  1 15:20 urls.py

Now that I have a new Django project, I need a sample database to look at. The Chinook project provides a sample reference database in various database formats - think a new version of the old Northwind reference database. I downloaded the Sqlite version for our project from the Chinook project website and dropped it in my new dualdb folder. I'm also going to create a "chinook" app to work in.

(django1.3)$ cd dualdb/
(django1.3)$ chmod u+x manage.py 
(django1.3)$ ./manage.py startapp chinook
(django1.3)$ ls -l chinook
total 12
-rw-rw-r-- 1 simeon simeon   0 Aug  1 15:45 __init__.py
-rw-rw-r-- 1 simeon simeon  57 Aug  1 15:45 models.py
-rw-rw-r-- 1 simeon simeon 383 Aug  1 15:45 tests.py
-rw-rw-r-- 1 simeon simeon  26 Aug  1 15:45 views.py

Multi-database setup

We need two databases: one for django.contrib.auth models and any other custom models we may have and one which is a legacy database whose creation/alteration we won't be managing with Django.

Django has builtin support for multiple databases. Basically we need to configure our settings to know about both our databases and then optionally build a database router that determines how data in particular apps and models ends up in a particular database.

Let's start out editing our settings. I always add a new settings.py variable to calculate the absolute path of my project directory.

import os
DIR = os.path.abspath(os.path.dirname(__file__))

With my DIR variable in place I can edit my settings and add two database configurations to my settings.py.

DATABASES = {
    'default': {
        'ENGINE': 'django.db.backends.sqlite3', 
        'NAME': os.path.join(DIR, 'django.sqlite3'),
        'USER': '',
        'PASSWORD': '',
        'HOST': '',
        'PORT': '',
    },
    'chinookdb': { # this is our sample db, already created
        'ENGINE': 'django.db.backends.sqlite3',
        'NAME': os.path.join(DIR, 'Chinook_Sqlite.sqlite'),
        'USER': '',
        'PASSWORD': '',
        'HOST': '',
        'PORT': '',
    }
}

Using two database configurations will allow us to keep all the django.contrib.auth models in default and we're pointing the chinookdb configuration at the Chinook database we downloaded. Let's see what's in the Chinook database.

Generating models with inspectdb

Django comes with a management command to automatically generate models from from existing databases. The inspectdb command can look at a database and spit out Python code representing the models that would generate the current state of the database. This isn't a silver bullet - inspectdb only works with three database backends and has varying support for particular field types and for foreign keys. If you can use it, however, it will at least save you some typing even if the resulting code needs some hand-editing to really work.

(django1.3)$ ./manage.py inspectdb --database=chinookdb > chinook/models.py
(django1.3)$ head chinook/models.py
# The following is created by running:
# $ python manage.py inspectdb --database=chinookdb > chinook/models.py

# This is an auto-generated Django model module.
# You'll have to do the following manually to clean this up:
#     * Rearrange models' order
#     * Make sure each model has one field with primary_key=True
# Feel free to rename the models, but don't rename db_table values or field names.
#
# Also note: You'll have to insert the output of 'django-admin.py sqlcustom [appname]'
# into your database.

from django.db import models

class Album(models.Model):
    albumid = models.IntegerField(primary_key=True, db_column=u'AlbumId') # Field name made lowercase.
    title = models.TextField(db_column=u'Title') # Field name made lowercase. This field type is a guess.
    artistid = models.IntegerField(db_column=u'ArtistId') # Field name made lowercase.
    class Meta:
        db_table = u'Album'

class Artist(models.Model):
    artistid = models.IntegerField(primary_key=True, db_column=u'ArtistId') # Field name made lowercase.
    ...

The full generated code is available on github - but we can already see that while the output is useful it isn't perfect. The inspectdb command didn't figure out that the Album.artistid field should be a foreignkey relationship to the Artist table. The Album.title field has a comment "This field type is a guess" - and looking at the underlying database reveals an nvchar(160) column. This should properly be

    title = models.CharField(max_length=160) 

it will take a little work to improve the accuracy of the mapping between our generated models and the database tables. Let's go ahead and explore how we can control which database our models communicate with.

Database routers

Django provides control over multi-db setups in several ways. Many management commands like dbshell take a --database flag that allows you to supply the database configuration name. Queryset operations allow you to chain using onto a query to specify a database other than default and other queryset methods like save and delete also allow you to specify the database to operate on. The more general way, however, is to define a custom database router and add it to your settings.py.

As described in the django docs database routers are simply Pythnon classes that may implement four methods. The db_for_read/db_for_write methods are passed a model and should return a database configuration name to read or write from for that particular model. The allow_relation method takes two models and allows you to determine if a relation is allowed. Django does not support relationships across databases so if our router is customizing which databases a model lives in, it must also provide the logic to disallow relationships between models that live in different databases. The allow_syncdb method allows us to restrict the operation of the syncdb management command. I created the following router to meet our requirements:

class ChinookRouter(object): 
    def db_for_read(self, model, **hints):
        "Point all operations on chinook models to 'chinookdb'"
        if model._meta.app_label == 'chinook':
            return 'chinookdb'
        return 'default'

    def db_for_write(self, model, **hints):
        "Point all operations on chinook models to 'chinookdb'"
        if model._meta.app_label == 'chinook':
            return 'chinookdb'
        return 'default'
    
    def allow_relation(self, obj1, obj2, **hints):
        "Allow any relation if a both models in chinook app"
        if obj1._meta.app_label == 'chinook' and obj2._meta.app_label == 'chinook':
            return True
        # Allow if neither is chinook app
        elif 'chinook' not in [obj1._meta.app_label, obj2._meta.app_label]: 
            return True
        return False
    
    def allow_syncdb(self, db, model):
        if db == 'chinookdb' or model._meta.app_label == "chinook":
            return False # we're not using syncdb on our legacy database
        else: # but all other models/databases are fine
            return True

The comments explain each the logic of each of the methods: the read/write methods make sure the auto-generated chinook read and write from the chinookdb database while all the other models read and write to the default database. The allow_relation method prevents relationships across databases and the allow_syncdb method disallows the syncdb management command on models that live in our chinookdb database, ensuring that syncdb never affects the structure of our legacy database.

To use our router we need to add it to our settings.py file.

DATABASE_ROUTERS = ['chinook.router.ChinookRouter']

Adding an admin and exploring the models

Now that we have our models created and our router is used to connect models to a particular database, let's use the contrib admin app to browse our legacy database. We won't customize our admin yet so I'll just loop over the generated models and register each one with the admin model. I've already added the admin app to my installed apps in settings and uncommented the admin url and admin.autodiscover() lines in my urls.py. I added an admin.py file in my chinook application:

from chinook import models as chinook_models
from django.contrib import admin
from django.db.models.base import ModelBase

# Very hacky!
for name, var in chinook_models.__dict__.items():
    if type(var) is ModelBase:
        admin.site.register(var)

and can now login to the admin and see each of my registered models.

Admin Screenshot

Admin Screenshot

Clicking on one of the Chinook models shows some limitations in the models - individual models don't have a __unicode__ method and we haven't customized the ModelAdmin class for each model so the listing screen is uninformative

Admin Screenshot

Admin Screenshot

and the individual admin screens have multiple errors: the primary key is editable, the ForeignKey wasn't recognized so it shows up as an IntegerField in the admin, and some field types that should be limited in size show up as textareas

Admin Screenshot

Admin Screenshot

We can fix most of these problems by editing our models. Fixing the auto-generated models is mostly a matter of making the same changes to each model - changing IntegerField to ForeignKey, changing TextField to CharField(max_length=160) adding __unicode__ methods to each model, etc. To make my models better I'll need to be able to interrogate my database about the structure of the tables like:

(django1.3)$ ./manage.py dbshell --database=chinookdb
SQLite version 3.7.9 2011-11-01 00:52:41
Enter ".help" for instructions
Enter SQL statements terminated with a ";"
sqlite> .schema Album
CREATE TABLE [Album]
(
    [AlbumId] INTEGER  NOT NULL,
    [Title] NVARCHAR(160)  NOT NULL,
    [ArtistId] INTEGER  NOT NULL,
    CONSTRAINT [PK_Album] PRIMARY KEY  ([AlbumId]),
    FOREIGN KEY ([ArtistId]) REFERENCES [Artist] ([ArtistId]) 
                ON DELETE NO ACTION ON UPDATE NO ACTION
);
CREATE INDEX [IFK_AlbumArtistId] ON [Album] ([ArtistId]);
CREATE UNIQUE INDEX [IPK_Album] ON [Album]([AlbumId]);
sqlite> 

By looking at the database directly I can see that ArtistId is supposed to be a ForeignKey and that Title is supposed to have a maximum of 160 characters. I'll go ahead and modify my models.py to more accurately reflect the database.

Editing the Models

I spent about 30 minutes tweaking my models and you can see the diff on github - mostly I converted TextFields (looking up the field length in the database) to CharFields and cleaned up some field and class names to meet Django's conventions. I added __unicode__ methods to the model classes to give them a string representation in the console and in the Django admin. I also changed the IntegerField to ForeignKey where the database had relationships between tables - and it was here that I ran into a problem.

Our database has a table - PlaylistTrack that is merely a many-to-many relationship table. Complete records in PlaylistTrack consist of a foreignkey pointing to Playlist and a foreignkey pointing to Track. In Django models we can represent this with a ManyToMany field. Unlike most field types ManyToMany fields don't actually correspond to a database column, they tell Django to build an intermediary table to capture the connections between two models.

class PlaylistTrack(models.Model):
    playlist = models.ForeignKey("Playlist", db_column=u'PlaylistId')
    track = models.ForeignKey("Track", db_column=u'TrackId')
    class Meta:
        db_table = u'PlaylistTrack'
        unique_together = ('playlist', 'track')

    def __unicode__(self):
        return u"Playlist %s: %s" % (self.playlist, self.track)


class Track(models.Model):
    id = models.AutoField(primary_key=True, db_column=u'TrackId')
    playlist = models.ManyToManyField(Playlist, through=PlaylistTrack)
    ... snip ...

This almost works - but the Chinook database sensibly created a relationship table that has a multi-column primary key. The PlaylistTrack table has no single id column that serves as the primary key. However the Django ORM does not support multi-column keys and without a single primary key on the PlaylistTrack table the ORM has difficulty properly following the relationships between models.

I used sql in the sqlite shell to create a new table PlaylistTrack2 table just like PlayListTrack but with an additional id field that serves as a primary key. I copied all the records from the the original table to the new one, dropped the old table, and renamed PlaylistTrack2 to PlaylistTrack.

Obviously we've failed at our original mission - we've been forced to modify the structure of our database. This is the only change we've had to make, however, so we might be able to request this simple change to the database structure.

Let's take a look at our models. The image below (larger version) was created with the graph_models management command from the django-extensions app.

Chinook Models

Chinook Models

The graph_models command also requires the pygraphviz library to generate the images. YMMV but the following worked for me on Ubuntu:

sudo apt-get install libgraphviz-dev
pip install pygraphviz
python manage.py graph_models -a -g -o models.png

The image allows us to scan the graphical representation of our models and confirm that we've got the relationships set up right. We haven't done anything complicated yet with our admin (we should probably make liberal use of inlines) but at this point have a Django project that left a legacy database (nearly) unchanged and used Django's multiple-database support to keep our administrative data out of our legacy database. As it exists, the admin is a useful tool to edit our data and a few hours of tinkering would allow us to quickly develop a fairly powerful CRUD interface to our database.

Published August 14, 2012