1. What is TimescaleDB?
TimescaleDB is a PostgreSQL extension that improves on the following three key points:
- Better performance at scale
- Lower hosting costs
- Hypertables for efficient time-series data processing
Besides these advantages, TimescaleDB behaves almost exactly like regular PostgreSQL. That’s because TimescaleDB is an extension, not a fork. This means you can still install other PostgreSQL extensions, use the full range of PostgreSQL features, and benefit from the diverse PostgreSQL ecosystem.
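As a quick illustration of the Hypertable feature mentioned above, here is a minimal SQL sketch (the `conditions` table name and its columns are made-up examples, not anything from a real deployment):

```sql
-- Enable the extension, create a regular table, then convert it
-- into a hypertable partitioned on the time column.
CREATE EXTENSION IF NOT EXISTS timescaledb;

CREATE TABLE conditions (
  time        TIMESTAMPTZ NOT NULL,
  device_id   TEXT,
  temperature DOUBLE PRECISION
);

SELECT create_hypertable('conditions', 'time');
```

Once converted, you keep querying `conditions` with plain SQL while TimescaleDB transparently splits it into time-based chunks behind the scenes.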
2. High Availability in TimescaleDB
If you work with databases, you already know that High Availability (HA) is a requirement for any reliable system. A proper HA setup eliminates single points of failure by providing multiple replicas to serve data and automatically selecting the appropriate node to read from and write to. PostgreSQL typically achieves HA by replicating data from the primary database to one or more read-only replicas.
Streaming replication is the primary replication method supported by TimescaleDB. It works by having the main database server stream its write-ahead log (WAL) to the database replicas. Each replica then replays the WAL records to reach a state consistent with the primary database.
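On the replica side, vanilla PostgreSQL (12+) needs little more than a connection string pointing at the primary plus an empty `standby.signal` file in the data directory. A minimal sketch, assuming a primary at 10.0.0.1 and a replication user named `replicator` (both assumptions for illustration):

```ini
# postgresql.auto.conf on the replica (sketch; host, user and password
# are placeholder values, not from a real cluster)
primary_conninfo = 'host=10.0.0.1 port=5432 user=replicator password=secret'
```

The presence of `standby.signal` tells PostgreSQL to start in standby mode and keep replaying WAL streamed from `primary_conninfo`. Patroni generates and manages this configuration for you, which is part of what makes it attractive.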
Unfortunately, streaming replication alone does not guarantee that users avoid downtime (“High Availability”) when a server fails. If the primary node fails or becomes inaccessible, a mechanism is needed to promote one of the read replicas to be the new leader node, and the faster this happens, the less downtime users have to endure. This mechanism is called Automatic Failover: the system detects that the primary database has failed, removes it from rotation entirely, promotes a read-only replica to primary immediately, and keeps every component aware of the new state of the system.
There are a number of third-party solutions that provide HA and Automatic Failover for PostgreSQL, many of which work well with TimescaleDB because they build on streaming replication. In this article, I chose Patroni because of its simple architecture and its modular approach to load balancing. Patroni performs health checks on PostgreSQL nodes and lets you use Kubernetes, AWS ELB, HAProxy, or another proxy solution to handle load balancing.
3. Learn about Patroni
Patroni’s core architecture is as follows:
Each PostgreSQL node has a Patroni bot deployed on it. The bots are capable of both managing the PostgreSQL database and updating the distributed consensus system (etcd, ZooKeeper, Consul, and the Kubernetes API, which is itself backed by etcd, are all perfectly good options). etcd implements a Raft-based consensus algorithm, which requires a minimum deployment of 3 etcd nodes.
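To make the bot-per-node setup concrete, a heavily trimmed patroni.yml might look like the sketch below (the cluster name, node name, and all addresses are assumptions for illustration; see the Patroni documentation for the full option set):

```yaml
scope: timescale-cluster          # cluster name shared by all bots
name: node1                       # unique name of this node
restapi:
  listen: 0.0.0.0:8008
  connect_address: 10.0.0.1:8008
etcd3:
  hosts: 10.0.0.10:2379,10.0.0.11:2379,10.0.0.12:2379  # 3-node etcd
bootstrap:
  dcs:
    ttl: 30                       # leader key TTL in seconds
    loop_wait: 10                 # how often each bot runs its HA loop
postgresql:
  data_dir: /var/lib/postgresql/data
  listen: 0.0.0.0:5432
  connect_address: 10.0.0.1:5432
```

On Kubernetes, the operator we deploy in section 4 renders this configuration for you, so you rarely write it by hand.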
Leader node election is done by acquiring an expiring key (the leader key) in etcd. The first PostgreSQL node whose Patroni bot acquires the leader key becomes the primary. All other nodes see that a primary has been elected, and their Patroni bots configure their PostgreSQL instances as replicas.
The leader key has a short TTL attached to it. The Patroni bot on the primary must constantly check the health of its PostgreSQL instance and refresh the leader key. Any of the following cases can happen:
- (a) the node loses its network connection
- (b) the database fails
- (c) the bot dies
In each case the key stops being refreshed and expires.
This causes the leader key to be removed from the cluster. When the replicas find that there is no leader key, each one tries to become the new primary through an exchange of information between the Patroni bots. The bots then either settle on a clear candidate for promotion or race to acquire the leader key, and the winner is promoted to primary.
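The race for the leader key can be sketched with a toy shell script. This is an illustration only: real Patroni bots coordinate through etcd with TTL-based leases, while the “key” below is just a local temp file, and the node names are invented.

```shell
# Toy simulation of TTL-based leader election (illustration only;
# real Patroni bots talk to etcd, not a local file).
LEADER_FILE=$(mktemp)
rm -f "$LEADER_FILE"                  # start with no leader key

acquire_leader() {                    # create the key only if it is absent
  if [ ! -e "$LEADER_FILE" ]; then
    echo "$1" > "$LEADER_FILE"
  fi
}

acquire_leader node-a                 # node-a wins the first race
acquire_leader node-b                 # node-b sees a leader, stays a replica
echo "leader: $(cat "$LEADER_FILE")"  # prints: leader: node-a

rm -f "$LEADER_FILE"                  # primary's bot dies -> key expires
acquire_leader node-b                 # node-b promotes itself
echo "leader: $(cat "$LEADER_FILE")"  # prints: leader: node-b
```

The "create only if absent" check is the crux: etcd makes that test-and-set atomic across nodes, which is why a single unambiguous winner always emerges.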
4. Deploy Patroni TimescaleDB on Kubernetes
Before we start deploying, we need to prepare the Git repository:
git clone https://github.com/zalando/postgres-operator.git
Create the values file dev-values.yaml with the following content:
Use helm to install the postgres-operator chart with the created values file:
helm install postgres-operator ./charts/postgres-operator -f dev-values.yaml
Run kubectl get pod to check whether the operator pod is running.
Next, before creating the TimescaleDB pods, run kubectl get sc to check whether Kubernetes already has a storage provisioner.
Create the manifest file dev-manifest.yaml with the following content:
enableConnectionPooler: false # enable/disable connection pooler deployment
patroni:
  pg_hba: # client authentication rules (expert section)
  - local all all trust
  - hostssl all +zalandos 127.0.0.1/32 pam
  - host all all 0.0.0.0/0 md5
  - hostssl all +zalandos ::1/128 pam
  - host all all ::1/128 md5
  - local replication standby trust
  - hostssl replication standby all md5
  - hostssl all +zalandos all pam
  - hostssl all all all md5
Apply the manifest with kubectl apply -f dev-manifest.yaml, then run kubectl get pod to check whether the TimescaleDB pod is running.
Access the TimescaleDB pod with the command:
kubectl exec -it ovng-timescaledb-0 -- /bin/bash
Check whether the TimescaleDB cluster has elected a Leader using patronictl, e.g. patronictl list.
Get the TimescaleDB password with the command:
kubectl get secret/postgres.dev-timescaledb.credentials.postgresql.acid.zalan.do -o yaml
The password stored in the secret is base64-encoded, so decode it before use.
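Decoding needs nothing more than the standard base64 utility. The encoded value below is a made-up example, not a real password:

```shell
# In practice you can extract and decode in one go, e.g.:
#   kubectl get secret/postgres.dev-timescaledb.credentials.postgresql.acid.zalan.do \
#     -o jsonpath='{.data.password}' | base64 -d
# The decoding step itself is plain base64:
encoded="c3VwZXJzZWNyZXQ="            # made-up example value
echo "$encoded" | base64 -d           # prints: supersecret
```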
Access the database and create a testdb:
psql -h 127.0.0.1 -U postgres
At the psql prompt, run CREATE DATABASE testdb; to create the test database.
That’s all folks! In this article, I shared the advantages of TimescaleDB – a relational database built on PostgreSQL and optimized for processing time-series data – and presented detailed instructions on deploying it with the Zalando PostgreSQL Operator on Kubernetes. I hope you found this article useful.