Monday 30 August 2010

Multilingual database design approaches

A while ago I needed to create a database that would support storing data in multiple languages e.g. translated product descriptions for customer from different countries. When I started wondering how to reflect this in database schema it came out the answer is not straightforward. While searching Web for the one-and-only best practice i stumbled upon many opinions and approaches. Here are some most popular ones:

  1. Additional columns
    This is the simplest one, it's basically about creating an additional column for each text that needs to be translated e.g.
    CREATE TABLE app_product (
    Id Int IDENTITY NOT NULL,
    Description_en Text,
    Description_pl Text,
    PRIMARY KEY (Id)
    );


    Advantages:
    + simplicity
    + easy querying (no joins required)

    Disadvantages:
    - adding new language support requires schema changes for each table with multilingual content
    - if not all translations are required (e.g. at some places default language should always be used) it may cause redundant data or empty db fields.
    - hard to maintain

  2. Single translations table

    Approach with single translation table seems to be the cleanest one from database structure perspective. You store all texts that need to be translated in a single translation table:
    CREATE TABLE ref_language (
    Code Char(2)NOT NULL,
    Name Varchar(20) NOT NULL,
    PRIMARY KEY (Code)
    );

    CREATE TABLE app_translation (
    Id Int IDENTITY NOT NULL,
    PRIMARY KEY (Id)
    );

    CREATE TABLE app_translation_entry (
    TranslationId Int NOT NULL,
    LanguageCode Char(2) NOT NULL,
    Text Text NOT NULL,
    FOREIGN KEY (TranslationId) REFERENCES app_translation(Id),
    FOREIGN KEY (LanguageCode) REFERENCES ref_language(Code)
    );

    CREATE TABLE app_product (
    Id Int IDENTITY NOT NULL,
    Description Int NOT NULL,
    PRIMARY KEY (Id),
    FOREIGN KEY (Description) REFERENCES app_translation(Id)
    );


    Advantages:
    + adding new languages doesn't require schema changes
    + seems like clean, relational approach
    + all translations in one place (some may say it's a disadvantage because less readable/maintainable)

    Disadvantages:
    - complex querying (multiple joins required to retrieve correct product description)
    - overcomplicated

  3. Additional translation table for each table with multilingual content

    For each table that stores information that may need to be translated an additional table is created. The original table stores only language insensitive data and the new one all translated info:
    CREATE TABLE ref_language (
    Code Char(2)NOT NULL,
    Name Varchar(20) NOT NULL,
    PRIMARY KEY (Code)
    );

    CREATE TABLE app_product (
    Id Int IDENTITY NOT NULL,
    PRIMARY KEY (Id)
    );

    CREATE TABLE app_product_translation (
    ProductId Int NOT NULL,
    LanguageCode Char(2) NOT NULL,
    Description Text NOT NULL,
    FOREIGN KEY (ProductId) REFERENCES app_product(Id),
    FOREIGN KEY (LanguageCode) REFERENCES ref_language(Code)
    );


    Advantages:
    + adding new languages doesn't require schema changes
    + relatively simple querying (1 join required)

    Disadvantages:
    - may double the amount of tables

The 3 examples presented above give us an idea how different approaches may be used here. These are of course not all possible options, just the most popular ones. You can always modify them e.g. by introducing some additional views that would save you writing complex joins direct from your code.
The solution you choose depends mostly on your project requirements. If you need simplicity and are sure that the number of supported languages is small and fixed you could go with option 1. If you require bit more flexibility and can afford a simple join when querying for multilingual data option 3 would be a possible solution.

1 comment:

elgachon said...

Thanks Filip for the post. It has been really helpful.