Implementing Bulk Inserts with UNIQUE Constraints Using Node.js and PostgreSQL

Tram Ho

I had the opportunity to research how to do a bulk insert of multiple records in a Node.js + PostgreSQL environment, even when there is a UNIQUE constraint on the target table, so I’m going to summarize it in a blog post.

Using PostgreSQL with Node.js

First, to use PostgreSQL from Node.js, the most commonly used npm client library is probably pg (node-postgres).


With pg, you first create a connection pool, and then use the pool to execute SQL statements. For example, you can run an "INSERT" and read back the result of the execution.
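A sketch of what this looks like in practice; the connection settings and the table1(name, num) schema are assumptions for illustration, not from the article:

```javascript
// Sketch of pg usage: create a connection pool, run one INSERT, read the result.
// Connection settings and the table1(name, num) schema are assumed.
const poolConfig = {
  host: 'localhost',
  database: 'testdb',
  user: 'postgres',
  password: 'secret',
};

// Parameterized single-row insert; $1/$2 are bound from the values array.
const insertSql = 'INSERT INTO table1 (name, num) VALUES ($1, $2) RETURNING *';

async function insertOne(name, num) {
  const { Pool } = require('pg'); // npm install pg
  const pool = new Pool(poolConfig);
  try {
    const result = await pool.query(insertSql, [name, num]);
    return result.rows[0]; // the row just inserted, via RETURNING *
  } finally {
    await pool.end();
  }
}
```

In a real application the pool would normally be created once and shared, rather than opened and closed per call as in this sketch.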

In this post, I will introduce the technique using pg.

Implement bulk insert with Node.js and PostgreSQL

In the example above, we showed how to insert one record with one SQL execution. You could simply repeat this to insert multiple records, but when the number of records is large it is more efficient to insert them in one go. This process of "creating multiple records at once" is called a "bulk insert".

There are several ways to implement a bulk insert with Node.js and PostgreSQL, but this time we will introduce an example using the npm library pg-format (node-pg-format).


To use it, prepare the bulk insert data in an array and implement it as follows:
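A sketch of the pg-format approach; the table/column names and the data rows are assumptions for illustration (only the yamaha/100 pair appears later in the article):

```javascript
// Bulk insert with pg-format: the rows array is expanded into the VALUES
// list of a single INSERT statement. Table/column names and data are assumed.
const records = [
  ['yamaha', 100],
  ['suzuki', 200],
  ['kawasaki', 300],
];

async function bulkInsert(pool) {
  const format = require('pg-format'); // npm install pg-format
  // %L quotes each value as an SQL literal and renders each inner array
  // as a parenthesized value list.
  const sql = format('INSERT INTO table1 (name, num) VALUES %L', records);
  // The values are already embedded in the SQL text, so no parameter
  // array is passed to query().
  return pool.query(sql);
}
```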

We build the entire SQL statement with pg-format, embedding the array data as the multiple records to be inserted (so the parameter array for the conn.query() call stays empty). This allows us to insert all the records in the records array variable with a single SQL execution.

Bulk inserting with Node.js and PostgreSQL under a UNIQUE constraint

Now we move on to the main topic of this blog entry. We have introduced a method of bulk insert, but the problem is that it does not always work. Specifically, it fails when the target table has a UNIQUE constraint and some of the records being bulk inserted violate that constraint.

Let's take a concrete example. Suppose we are bulk inserting multiple records into a table called table1, and suppose table1 was created with a specification like the following:
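A plausible reconstruction of the two-line definition the article describes (column types are assumed; only the column names name and num and the UNIQUE combination come from the text):

```sql
create table table1 (name text, num integer);
alter table table1 add constraint table1_unique unique (name, num);
```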

The first line ("create table…") is fine; the problem is the second line. This constraint specifies that the combination of name and num must be unique. name values may repeat, and num values may repeat, but the same (name, num) combination must not appear twice, and data that violates this condition cannot be inserted.

With this UNIQUE constraint in place, let's look again at the records that the bulk insert example tries to insert.
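The data being bulk inserted was an array along these lines (only the first row, ['yamaha', 100], is stated in the article; the other rows are placeholders):

```javascript
// Each inner array is one row to insert: [name, num].
const records = [
  ['yamaha', 100],   // this pair is the one discussed in the article
  ['suzuki', 200],   // placeholder row
  ['kawasaki', 300], // placeholder row
];
```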

In the records array, the first element of each row is name and the second is num. Looking at this data alone, the (name, num) combinations are all different, so it seems the bulk insert should go through without any problems. … But what if the row {name: "yamaha", num: 100} had already been inserted before the bulk insert was executed? The first record of the bulk insert would then violate the UNIQUE constraint, which means the execution ends in an error.

There is a further complication. When a bulk insert fails because of a UNIQUE constraint violation, should the records that did not violate the constraint still be inserted? The answer is probably case by case, depending on the data and the application. If the records only have meaning when inserted together, then it is correct for the bulk insert to fail and for none of the records to be inserted. If, however, the records that pass the constraint should still be stored, then only the violating part of the bulk insert should fail and the rest should be inserted.

The bulk insert implementation introduced above behaves like the former. In other words, the bulk insert succeeds only if none of the records violate the constraint; if even one record violates it, none of the records are inserted and an error occurs (strictly speaking, this all-or-nothing behavior is what "bulk insert" refers to).

Bulk inserting with Node.js and PostgreSQL while ignoring UNIQUE constraint violations

After this long introduction, now that we understand the background: how can we implement a bulk insert so that records that violate the UNIQUE constraint are skipped, while records that do not violate it are inserted correctly? This is the key part of this blog entry.

If you are using PostgreSQL 9.5 or higher and Node.js + node-postgres, the following method can be used to achieve this, although it may vary depending on the RDB, library, and programming language you are actually using.

In this example, we generate and execute in code an SQL statement of the following form:
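A sketch of generating that statement. In the article the value formatting is done by pg-format; here a simplified literal formatter stands in for pg-format's %L so the generated SQL is visible and the snippet is self-contained (it only handles strings and numbers and is NOT injection-safe for production use):

```javascript
// Generate a multi-row INSERT ... ON CONFLICT DO NOTHING statement.
// Requires PostgreSQL 9.5+ for the ON CONFLICT clause.
const records = [
  ['yamaha', 100],
  ['suzuki', 200],
];

function literal(v) {
  // Numbers pass through; strings are single-quoted with quotes doubled.
  return typeof v === 'number' ? String(v) : `'${String(v).replace(/'/g, "''")}'`;
}

function buildInsertIgnoreSql(table, columns, rows) {
  const values = rows.map(row => `(${row.map(literal).join(', ')})`).join(', ');
  return `INSERT INTO ${table} (${columns.join(', ')}) VALUES ${values} ON CONFLICT DO NOTHING`;
}

const sql = buildInsertIgnoreSql('table1', ['name', 'num'], records);
console.log(sql);
// INSERT INTO table1 (name, num) VALUES ('yamaha', 100), ('suzuki', 200) ON CONFLICT DO NOTHING
```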

Strictly speaking, this is no longer a plain bulk insert: in one SQL execution, multiple records are inserted, and whenever a "conflict" (a UNIQUE constraint violation) occurs for a record, that record is "ignored". As a result, only the records without UNIQUE constraint violations are inserted, all at once.
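A hypothetical end-to-end run against a live database might look like the following (all connection details and names are assumptions; requires the pg and pg-format packages and PostgreSQL 9.5+):

```javascript
// Hypothetical end-to-end run; not executable without a live PostgreSQL.
async function run() {
  const { Pool } = require('pg');      // npm install pg
  const format = require('pg-format'); // npm install pg-format
  const pool = new Pool(); // connection settings via PG* environment variables
  const records = [['yamaha', 100], ['suzuki', 200]];
  const sql = format(
    'INSERT INTO table1 (name, num) VALUES %L ON CONFLICT DO NOTHING',
    records
  );
  const res = await pool.query(sql);
  // rowCount counts only the rows actually inserted; conflicting rows
  // are skipped silently instead of aborting the whole statement.
  console.log(`inserted ${res.rowCount} of ${records.length} rows`);
  await pool.end();
}
```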

In my own tests, the bulk insert worked as expected even when the number of records inserted at once became very large (cases where the SQL statement itself becomes huge). I researched this while building an app that needs this kind of processing, so I hope it helps someone else.


Article source: Viblo