Deploying TimescaleDB on Kubernetes

Tram Ho

1. What is TimescaleDB?

TimescaleDB is a PostgreSQL extension that improves on the following three key points:

  • Better performance when scaled at scale
  • Lower hosting costs
  • Using Hypertables to process time-series data

Besides these advantages, TimescaleDB is almost like regular PostgreSQL. That’s because TimescaleDB is an extension, not a fork. This means, you can still install other PostgreSQL extensions, take advantage of the full system availability, and benefit from the diverse PostgreSQL ecosystem.

2. High Availability in TimescaleDB

If you are working with databases, you already know that High Availability (HA) is a requirement for any reliable system. Setting the right HA eliminates the problem of single point-of-failure by providing multiple regions to retrieve data from and automatically selecting the appropriate region to read and write data from. PostgreSQL typically achieves HA by replicating data on the primary database into one or more read-only replicas.

Streaming replication is the primary replication method supported by TimescaleDB. It works by having the main database server communicate write-ahead logs (WALs) to its database replicas. Each replica then re-communicates these WALs to the primary database to reach a consistent state with the primary database.

Unfortunately, such Streaming replication alone does not guarantee that users will not experience downtime (“High Availability”) when the server is down. If the primary node fails or becomes inaccessible, then a method is required to promote one of the read replicas to be the new Leader node. And the faster this process happens, the smaller the free time the user has to wait and is called Automatic Failover. Automatic Failover refers to a mechanism by which the system detects that when a primary database fails, it completely removes it from rotation, moves a read-only replica to the primary state immediately, and ensures remaining aware of the new state of the system.

There are a number of third-party solutions that provide HA and Automatic Failover for PostgreSQL, many of which work well with TimescaleDB because they take advantage of streaming replication. In this article, I choose Patroni because of its simple architecture and implementation of load balancing in the direction of modularization. Patroni performs health checks on PostgreSQL nodes and allows you to use Kubernetes, AWS ELB, HA Proxy or another proxy solution to handle load balancing.

3. Learn about Patroni

Patroni’s core architecture is as follows:

Each PostgreSQL node has a Patroni bot deployed on it. The bots are capable of both managing the PostgreSQL database and updating the distributed consensus system (etcd, Zookeeper, Consul, and the Kubernetes API supported by etcd, are perfectly good options). etcd must be implemented under a Raft-based consensus algorithm. This requires a minimum deployment of 3 etcd nodes.

Leader node election is done by generating expiring key in etcd. The first PostgreSQL node that generates the etcd key through its Patroni bot becomes primary. All other nodes will see that a primary node has been selected and Patroni bots will establish PostgreSQL instances as replicas.

primary key has a short TTL attached to it. The Patroni primary bot must constantly check its PostgreSQL instance health and update the primary key. Cases that can happen:

  • (a) disconnection button
  • (b) database failure
  • (c) bot dies, key will expire

This will cause the primary key to be completely removed from the cluster. When the replicas find that there is no primary key, each replica tries to become the primary key through information exchange between Patroni bots. The bots will then choose a clear candidate for promotion or both will race to get promoted to Primary.

You can see an illustrative example here: image.png

4. Deploy Patroni TimescaleDB on Kubernetes

image.png

Before we start deploying, we need to prepare the Git repository:

Create the values dev-values.yaml file with the following content:

Use helm to install chart postgres-operator with the created values ​​file

Run the command kubectl get pod to check if operator pod is running or not image.png

Next, to create TimescaleDB pods you need to run the command kubectl get sc to check if Kubernetes already has a storage provisioner image.png

Create the values dev-manifest.yaml file with the following content:

Run the command kubectl get pod to check if the TimescaleDB pod is running or not image.png

Access the TimescaleDB pod with the command:

image.png

Check if TimescaleDB cluster has a Leader or not using patronictl patronictl list image.png

Get the TimescaleDB password with the command:

You can use some web-based tool to decode password Here image.png

Access the database and create a testdb: psql -h 127.0.0.1 -U postgres image.png

5. Conclusion

That’s all folks! In this article, I have shared with you about the advantages of TimescaleDB – A relational database based on modern cloud-native architecture, optimized for processing time-series data. I also presented detailed instructions on how to deploy the Zalando PostgreSQL Operator on Kubernetes. I hope you found this article useful.

Share the news now

Source : Viblo