9 most common database design errors

Tram Ho

Perhaps you made some mistakes when you designed your database at the time you started your career. Maybe you are still creating them or you will create some of them in the future. We can’t go back in time and undo our mistakes, but we can avoid future “headaches” (or present problems).

This article can save you many hours spent on troubleshooting design and code problems, so stay focused. Let’s divide the list of errors into two main groups: non-technical issues and technical issues. Both groups are an important part of database design.

Obviously, if you do not have the technical skills, you will not know what to do. It is not surprising to see these errors in the list. non-technical skills, people may forget them, but these skills are also a very important part of the design process, they add value to your code and relate to them. real technological problems that you need to solve.

So first start with non-technical issues, then move on to technical issues.

Error of non-technical database design

1. Poor planning

This is definitely a non-technical issue, but it is a major and common one. We all get excited when a new project begins and, going into it, everything is great. In the beginning, the project was still a blank page and you and your clients were happy to start working on something that would create a better future for both. All this is great now, and a great future is likely to be the end result. But we need to focus. This is part of a project where we can make serious mistakes.

Before you sit down to draw a data model, you need to make sure that:

  • You are fully aware of what your customers do (i.e. their business plans related to this project and their overall picture) and what they want the project to achieve right now. now and in the future.
  • You understand the business process and when needed, you are ready to make suggestions to simplify and improve it (for example, to increase efficiency and income, reduce costs and working hours, etc.).
  • You understand the data flow in a customer’s company. Ideally, you know all the details: who works with the data, who makes the changes, what reports are needed, when and why all of this happens.
  • You can use the language / term your customers use. Although you may or may not be an expert in their field, your customers certainly are. Ask them to explain what you do not understand. And when you explain the specifications to customers, use the language and terminology they understand.
  • You know what technologies you will use, from database tools and programming languages ​​to other tools. What you decide to use is closely related to the problem you are going to solve, but it is important to include customer options and their current IT infrastructure.

During the planning phase, you will get answers to the following questions:

  • Which tables will be the center table in your model? You may have some of them, while others will be some common tables (eg user_account, role). Do not forget the dictionary and the relationship between tables.
  • What names will be used for the tables in the model? Remember to keep the term similar to whatever the customer is currently using.
  • What rules will apply when naming tables and other objects?
  • How long will the whole project take? This is very important, both on your schedule and the customer’s timeline.
  • Only when you have all these answers are you ready to share an original solution to the problem. That solution doesn’t need to be a complete application – be it a short document or even a few sentences in the customer’s business language.

Good plans are not specific to the data model; It is applicable to almost every IT (and not IT) project. Skipping is only an option if 1) you have a really small project; 2) clear missions and goals and 3) you’re in a hurry. A historical example is the Sputnik 1 boot engineers giving verbal instructions to the technicians assembling it. The project was in a hurry to believe that the United States is planning to launch its own satellite soon – but I guess you won’t be in such a hurry.

2. Communication between customer and development team is not good

When you start the database design process, you will probably understand most of the main requirements. Some requirements are very common regardless of the project, such as user role and status. On the other hand, some of the tables in your model will be quite specific. For example, if you are building a model for a taxi company, you will have tables for vehicles, drivers, customers, etc.

However, not everything will be clear when starting a project. You may misunderstand some requirements, customers can add some new functions, you will see that something can be done differently, the process may change, and so on. This causes changes in the model. Most changes require the addition of a new table, but sometimes you will delete or modify the table. If you have started writing code that uses these tables, you will also need to rewrite that code.

To reduce the time spent on unexpected changes, you should:

  • Talk to the development team and customers and don’t be afraid to ask important business questions. When you think you’re ready to start, ask yourself if situation X is covered in our database? Customer is currently following this way Y; Do we expect a change in the near future? When we are confident that our model is capable of storing everything we need in the right way, we can start coding.
  • If you face a major design change and you already have a lot of code written, you shouldn’t try hot fix. Do it as it was done, regardless of the current situation. A hot fix may save some time now and will probably work well for a while, but it can turn into a real nightmare later.
  • If you think something is fine right now but might become a problem later, don’t overlook it. Analyze that area and make changes if they will improve the quality and performance of the system. It will take some time, but you will provide a much better product and sleep better.
  • If you try to avoid making changes in your data model when you encounter a potential problem – or if you choose a quick fix instead of doing it right – you will pay for it sooner or later. late.

Also, keep in touch with customers and the development team throughout the project. Always check and see if any changes have been made since the last discussion.

3. Poor documentation

For most of us, documentation will be finalized at the end of the project. If it is well organized, maybe we have recorded everything during the project implementation and we will just need to repack everything. But to be honest, that’s often not what we usually do. Document writing is done right before the project is over – and right after we finish the work with that data model!

The cost of a poorly documented project can be quite high, several times higher than the price we pay to properly record everything. Imagine you found an error a few months after you closed the project. Because you don’t have the right documentation, you don’t know where to start.

When you’re working, don’t forget to write a comment. Explain everything that needs more explanation, and basically write down everything you think will be helpful one day. You never know when you’ll need that extra information.

Database design error technical

4. Do not use naming conventions

You never know for sure how long a project will last, and if you have more than one person working on a data model, naming objects in a database is a common convention. It is really necessary that it will help the project’s people understand the things that you and others created when designing the database. You can try to start building common sense by answering the following questions:

  • Is the table name singular or plural?
  • Will we group the tables using names? (For example, all client-related tables contain “client_”, all task-related tables contain “task_”, etc.)
  • Are we going to use uppercase and lowercase letters, or just lowercase letters?
  • What name will we use for ID columns? (Most likely, it will be of type “id”.)
  • How do we name the foreign key? (Most likely the id “id_” + name of the referenced table.)

Imagine the mess we would create if our model contained hundreds of tables. Perhaps we can work with such a model (if we create it ourselves) but we will make someone feel unlucky if they have to work with it after us.

To avoid future problems with names, don’t use reserved SQL words, special characters or spaces in them.

So before you start creating any names, create a simple document (maybe just a few pages long) that describes the naming convention you used. This will increase the reading comprehension ability of the entire model and simplify future work

5. Standardization problem

Standardization is an essential part of database design. Each database must be standardized to at least 3NF (defined primary keys, columns are atomic and there are no repeating groups, partial or bridged dependencies). This reduces data duplication and ensures referential integrity.

You can read more about normalization in THIS ARTICLE . In short, whenever we talk about relational database model, we will talk about standardized database. If a database is not standardized, we will encounter a series of data integrity issues.

In some cases, we may want to not standardize our database. If you do this, you can read more about database non-standardization HERE .

6. Using Entity-Attribute-Value Model (EAV)

EAV stands for Entity-Attribute-Value. This structure can be used to store additional data about anything in our model. Let’s look at an example. Assume that we want to store some additional customer properties. The “customer” table is our entity, the “attribute” table obviously contains our properties, and the “attribute_value” table contains the value of the attribute for certain customers.

First, we will add a dictionary with a list of all possible properties we can assign to a customer. This is the “attribute” table. It may contain attributes such as “customer value”, “contact details”, “additional info” etc. The “customer_attribute” table contains a list of all attributes with values ​​for each customer. For each customer, we will only have records for the attributes they have and we will store “attribute_value” for that attribute. This looks really great. It will allow us to add new properties easily (because we add them as values ​​in the “customer_attribute” table. Therefore, we will avoid making changes in the database, it’s good, while the model will store the data we need, working with that data is much more complex, and covers almost everything, from writing simple SELECT queries to getting all Customer-related values ​​come in to insert, update or delete values.In short, we should avoid the EAV structure.If you have to use it, only use it when you are 100% sure that it is real. the need.

7. Use the GUID / UUID as the Primary Key

GUID Globally Unique Identifier) ​​is a 128 bit number created according to the rules defined in RFC 4122. Sometimes they are also called UUID (Universally Unique Identifiers). The main advantage of GUID is that it is unique; The chance you achieve the GUID twice is very unlikely. Therefore, GUID seems to be a great candidate for primary key column. But not quite so.

A general rule for primary keys is that we use an integer column with the autoincrement property set to yes. This will add data in sequential order to the primary key and provide optimal performance. Without sequential keys or timestamps, there’s no way to know which data was inserted first. This problem also arises when we use real-world UNIQUE values ​​(e.g., VAT ID).

8. Indexing is incomplete

Indexes are a very important part when working with databases, but to discuss them carefully is beyond the scope of this article. You can watch it to find out more:

9. Excess data

Excess data should generally be avoided in any model. It not only takes up more disk space but also significantly increases the likelihood of data integrity problems. If something is redundant, we should note that the original data and the copy are always in a consistent state. In fact, there are some situations where redundant data is desirable:

  • In some cases, we must assign a priority to a certain action – and to do this, we must perform complex calculations. These calculations can use multiple tables and consume a lot of resources. In such cases, it is wiser to make these calculations during the break (thus avoiding performance problems during working hours). If we do it this way, we can store that calculated value and use it later without having to recalculate. Of course, the value is redundant; however, what we achieve in performance is significantly higher than what we lose (some hard drive capacity).
  • We can also store a small subset of report data within the database. For example, at the end of the day, we will store the number of calls we make that day, number of successful sales, etc. Report data should only be stored this way if we need to use Regular use. Again, we will lose some hard drive space, but we will avoid recalculating the data or connecting to the reporting database (if we have one).

In most cases, we should not use redundant data because:

  • Storing the same data multiple times in a database can affect data integrity. If you store customer names in two different places, you must make any changes (insert / update / delete) for both locations at the same time. This also complicates the code you need, even for the simplest operations.
  • Although we can store some aggregate numbers in our operational database, we should only do this when we really need it. An operational database is not meant to store report data and to mix these two data is generally not a good idea. Anyone who creates the report will have to use the same resources as the user who works in operational tasks; Reporting queries are often more complex and can affect performance. Therefore, you should separate your active database and reporting database.

Original article: https://www.vertabelo.com/blog/the-9-most-common-database-design-errors/

Share the news now

Source : Viblo