Build multi-lingual database
- Tram Ho
In this globalization era, companies – including software developers – are always interested in expanding into new markets. This usually means localizing their products for different areas. In this article, I will introduce a few ways to handle this problem. Specifically with content management in multiple languages.
What is localization?
Localization is a target of many companies in commerce, service, … or even it is called with the phrase “preserving identity” on television channels. The most specific example you can see is chinatown, but these are not in the realm of your writing and knowledge. Should be shortened to localization in IT services as follows:
- Localization is the process of adapting a product to different markets. That is an outstanding factor to achieve maximum market share in product sales. When localization is done correctly, users will feel that the product is produced for their language, culture and needs.
- Where English is not a popular language, surveys have proven that the local language is always a priority for a software product.
Ways to build databases serving Localization.
- To do this article I need 2 things that are the idea of implementing and a demo about database to express it in the most effective way.
- Suppose we are asked to design a data model for a multilingual e-commerce application. We need to store fields like
product_name
,product
tabledescription
andproduct
descriptions in many different languages. We also need to store the related fields of thecustomer
table in all languages. A product can have multiple authors and vice versa. - In order to better understand the ways to build below the end of the ideas I will make their advantages and disadvantages so that we can have a more general view and the choice to use in any case is the most reasonable.
Idea 1: Add a column of each language that needs support for the fields to change
This is the simplest approach in terms of development. It can be done by adding a language column for each field.
Advantages
- It is very easy to implement
- There is no complexity in writing SQL to get basic data in any language. Suppose I want to write a query to get detailed information about products and customers for a specific order in Vietnamese.
1 2 3 4 | Select p.product_name_VI, p.description_VI, p.price, c.name_VI, c.address_VI, c.contact_name from customer_product cp, product p, customer c Where cp.product_id = p.id and cp.customer_id = c.id And id = <order number>; |
Defect
- Poor scalability: every time a new language is added, dozens of additional columns need to be added between the tables to easily cause database bulge.
- It takes time to support too many languages.
- Do you realize that with each language you have a different query ??
- So I wrote a store to perform the most common for this processing. Its content is as follows:
1 2 3 4 5 6 7 8 9 10 11 | SELECT CASE @in_language WHEN 'VI' THEN p.product_name_VI DEFAULT THEN p.product_name_EN, p.price, CASE @in_language WHEN 'VI' THEN c.name_VI DEFAULT THEN c.name_EN, c.contact_name FROM customer_product cp, product p, customer c WHERE cp.product_id = p.id AND cp.customer_id = c.id AND id = <order number>; |
- Do you realize that in building your database there is a very important thing but can’t support many countries? That is the face value.
- We cannot sell in every country but only insist that I want you to pay in USD.
- But please ignore this problem in another article, I will fly more about how to create payment database between countries later.
Idea 2: Create a separate table for translated text
- This method is based on creating a translation text store for all languages. Since then, the database model has not changed too much compared to the original structure.
- In addition to saving the translated text directly, we can save the translated documents.
Advantages
- When your data model is complete and running on production, this is very reasonable and it is very accessible.
- Just like
Ý tưởng 1
wants to add a language you need to add fields to the table but this time just add 1 unique field to the**translationb**
table. - When the original text is the same on the tables, there is no excess translation text.
For example, assume the customer name and product name are identical. In this case, only one record will be inserted into the translation table and the same record is referenced in both the customer
and product
tables.
Defect
- It still requires you to change the database model.
- 1 row will appear lots of
null
fields. For example, if you design the above, each1 row
will have at most2
null
fields. so if you have1000
records for supported languages, then you can simply have2000
null fields in that table. - Increased complexity and performance time will also be proportional to each other.
The query sentence to use it is as follows:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 | SELECT CASE @in_language WHEN 'VI' THEN tp.text_VI WHEN 'JP' THEN tp.text_JP WHEN 'KR' THEN tp.text_KR DEFAULT THEN p.product_name_EN, p.price, CASE @in_language WHEN 'VI' THEN tp.text_VI WHEN 'JP' THEN tp.text_JP WHEN 'KR' THEN tp.text_KR DEFAULT THEN c.name_EN, c.contact_name FROM customer_product cp, product p, customer c, translation tp, translation tc WHERE cp.product_id = p.id AND cp.customer_id = c.id AND p.name_translation_id = tp.id AND c.name_translation_id = tc.id AND id = <order number>; |
Add 1 more observant you will immediately recognize our translation
table that always increases n row
in one insert in which n is the number of tables to translate. so I redesigned a bit so that the query would be faster and the translation
table would be better.
As for the query, please refer to the above sentence and write it yourself.
Idea 3: Translation by row
- Similar to the Idea 2 approach, but this idea will solve the idea of two big problems1, All languages refer to 1 table2, 1 more language will change the structure of the database
So we just need to add 1 language table to include the fields {id, name}
are okay.
Advantages
- This deployment allows developers to write SQL and access data quite easily.
- It’s easy to use ORM for this model
- There is no need to change the data model when you add new languages; Just insert the logs for the new language
- There is no need to worry about excess memory when there are not many
null
and a collection of other language fields as before - And the query is really cute.
1 2 3 4 5 6 7 8 9 10 11 12 13 | SELECT tp.text, p.price, tc.text, c.contact_name FROM customer_product cp, product p, customer c, translation tp, translation tc, language l WHERE cp.product_id = p.id AND cp.customer_id = c.id AND p.name_translation_id = tp.id AND c.name_translation_id = tc.id AND tp.language_id = l.id AND tc.language_id = l.id AND l.name = @in_language AND id = <order number>; |
Defect
- Do you know about
ddos
attacks? implementation like this will result in all user requests focusing on a single table. - And similarly when you have only 1
translation
table, the thousands of articles translated with different languages recorded at the same time can cause bottlenecks and server crashes. - However nothing is irreparable. Combine with Idea 2b and we’ll have one more design later.
As for the query, you can write it yourself.
Idea 4: Turn the fields to be translated into a separate table
In this solution, entity tables contain one or more translated fields divided into two classes: one for the translated fields and one for the untranslated fields. In this way, we can create separate classes for each table.
Advantages
- You don’t need to join
translation
tables if you only need fields that don’t translate. Therefore, it will be more effective when taking non-translation fields. - Easy to use for ORM.
- Access is easier.
Query
1 2 3 4 5 6 7 | SELECT pt.product_name, pt.description, p.price FROM customer_product cp, product p, product_translation pt, language l WHERE cp.product_id = p.id AND AND p.id = pt.product_non_trans_id AND pt.language_id = l.id AND l.name = @in_language AND id = <order number>; |
Finally I have 2 ideas but very rarely used but you can refer to them.
Idea 5:
- This idea comes from the fact that in large projects we often split back end and front end into 2 parts now BE only write API for FE to use. so from there, we can convert the storage type of the data into a string with a stored json format.
For example:
1 2 3 4 | { EN: "Hello", VI: "Hello" } |
Advantages:
- Easy to take and use.
- Do not break the database structure.
- It is very easy to use with ORM.
Defect
- Not necessarily a downside, but storing multiple text into a field like this will often cause it to fill up the field. But if you have used ckeditor then you probably no longer think like this.
Idea 6: Multiple database
- If you use the config file to establish connect, it is very easy to write connect for using and separating DB for each country.
- You can save
multi tenant
.
Conclude
It’s too long to write, so no more conclusions. Although I know there are very important things or use in multiple languages like DateTime. But it’s too long.
The article is referenced from here .
Thanks for reading
Choose the type of database that is suitable for your project – easy to think!