ID generator Snowflake, Sonyflake and NanoId

Tram Ho

Introduce

In this article, I would like to introduce everyone to the algorithms and libraries to build the ID generator (ID generator) that I often use to solve these problems.

ID generator (ID generator)

First of all, you must have understood what an ID generator is, according to ChatGPT baby

An ID generator is a tool or software that creates unique identification numbers for entities such as individuals, products, or transactions. These IDs are used to efficiently identify and track items in databases and systems. Some common types of ID generators include sequential ID generators, random ID generators, and hash-based ID generators.

Vietnamese sub

An ID generator is a tool or software that generates unique identifiers for objects such as individuals, products, or transactions. These IDs are used to efficiently identify and track entries in databases and systems. Some common types of ID generators include sequential ID generators, random ID generators, and hash-based ID generators.

There are 3 most popular types of ID generators today

  • Sequential ID generator : the most common one you often see is the ascending ID in the Databases (Mysql, PosgreSql…), using this method the advantage is that the id looks pretty, always ensuring uniqueness. However, the reverse is that DBs often use sequence to count incrementally and this counting needs to be processed sequentially when encountering a large number of requests, which will be a weakness that cannot be “swallowed”. This method is suitable for small/medium systems where the inserted data is not too large.
  • Random ID generator : this method will randomize the characters/numbers based on the input input. However, to ensure that the ID does not match, the random result length usually needs to be large enough. The most famous library that every dev has heard of is probably UUID . In this article, I will introduce you to another library that is lighter, faster and better.
  • ID generator based on hash(Hash) : this method uses the band algorithm by bringing in data from many sources, be it email, id server, random string…. The method combined with other algorithms has can generate IDs with extremely low duplication, commonly used in distributed applications. In this article we will learn about Twitter’s Snowflake algorithm.

Random ID generator

In this article, I use Go to make a demo

According to the developer, NanoId is faster than UUID

“Nano ID is quite comparable to UUID v4 (random-based). It has a similar number of random bits in the ID (126 in Nano ID and 122 in UUID), so it has a similar collision probability — for there to be a one in a billion chance of duplication, 103 trillion version 4 IDs must be generated”

According to NanoID, if you use Standard mode with a length of 21 characters, every 1,000,000 (1 million) reuquests per paper will take you more than 40,000 years to have a 1% duplicate rate.

Below is a picture of the calculation of duplicate probability of another tool of NanoID. virtual ha =))). Screenshot 2023-02-06 at 16.47.00.png

Usually if you just want to use zero, the possibility of duplicates will increase quite a lot, but if you need to generate IDs for 1 million requests per second, forget about NanoID, and if you have a few tens of thousands, then NanoID is still great.

Golang for example.

And this is the result

Screenshot 2023-02-07 at 09.35.45.png

I tried adding genera to 10 million codes with 21 characters long and it took about 31 seconds. With sequential processing, I find the speed to be very good.

OK! Now do a high-intensity test to see if there is any infection.

Case 1: create a week from 10 million codes Screenshot 2023-02-07 at 09.41.19.png

Screenshot 2023-02-07 at 10.22.11.png

Case 2: simulate 100 requests together, each request generates 1000 codes and woww, no duplicates detected.

Screenshot 2023-02-07 at 10.55.57.png

Case 3: create 1 million channels, each channel 100 requests.
ah, the code above you guys edit a little and then test it, my computer can’t run it.

In some cases where the programmer needs an ID with increasing ability, in this case NanoID will probably no longer be preferred, especially with distributed systems that will make the problem really difficult, However. consider the case of random code NanoID is still worth considering.

Distributed ID generator

As I mentioned in the ID generator section above, there will be a lot of inadequacies if using the DB incrementer, especially with distributed systems that will make the problem really difficult.

Twitter’s snowflake algorithm is the typical solution in this context image.png

The code generated by the Snowflale algorithm is 64-bit and is divided into four parts:

  • Do not use the first bit because this bit is the sign bit.
  • timestamp: uses 41 bits to represent the timestamp when the request is received, in milliseconds.
  • datacenter_id: 5 digits to represent the id of the datacenter.
  • worker_id: 5 digits to represent the id of the server.
  • sequence_id: the last is the 12-bit loop increment id (increments from 0 to 111111111111 and then returns to 0).

With this mechanism snowfllake can generate 2 ^ 12 = 4096 per millisecond or about 4096 million per second per server.
timestamp stores 41 bits, so by calculation it will only work until 2039/9/7 23:47:35

timestamp , datacenter_id , worker_id and sequence_id are four fields, timestamp and sequence_id separately generated by the program at runtime. The datacenter_id and worker_id need to be obtained during the deployment phase and once the program has been started, it cannot be changed.

Let’s go to the demo, let’s go to Golang again today

lib: https://github.com/bwmarrin/snowflake

Screenshot 2023-02-07 at 11.41.45.png

IDs are generated sequentially, isn’t it awesome.

Next, I simulate there are 4 nodes, each node has 10 requests.

You can see the picture to check the results or copy the code to try it

Screenshot 2023-02-07 at 11.46.12.png

and of course, in addition to using it, you can also customize a few things to suit such as Epoch, NodeBits, StepBits…

But is 2039 too short for you? if it’s too short, then Sonyflake is an alternative, Sonyflake has a very similar design to Snowflake, but different in the way bits are distributed. image.png The time here uses only 39 bits, but the time unit becomes 10ms. Theoretically, it is 41 bits (174 years) longer than the standard snowflake time.

Sonyflake’s Sequence ID is similar to Snowflake’s and Machine ID is Node’s id. The difference between sonyflake is the configuration parameters during startup (see more at: https://github.com/sony/sonyflake )

Reference

https://developer.twitter.com/en/docs/twitter-ids
https://github.com/matoous/go-nanoid
https://github.com/bwmarrin/snowflake
https://github.com/sony/sonyflake
https://zalopay-oss.github.io/

Share the news now

Source : Viblo