1. What is TimescaleDB?
TimescaleDB is a PostgreSQL extension that improves on the following three key points:
- Better performance when scaled at scale
- Lower hosting costs
- Using Hypertables to process time-series data
Besides these advantages, TimescaleDB is almost like regular PostgreSQL. That’s because TimescaleDB is an extension, not a fork. This means, you can still install other PostgreSQL extensions, take advantage of the full system availability, and benefit from the diverse PostgreSQL ecosystem.
2. High Availability in TimescaleDB
If you are working with databases, you already know that High Availability (HA) is a requirement for any reliable system. Setting the right HA eliminates the problem of single point-of-failure by providing multiple regions to retrieve data from and automatically selecting the appropriate region to read and write data from. PostgreSQL typically achieves HA by replicating data on the primary database into one or more read-only replicas.
Streaming replication is the primary replication method supported by TimescaleDB. It works by having the main database server communicate write-ahead logs (WALs) to its database replicas. Each replica then re-communicates these WALs to the primary database to reach a consistent state with the primary database.
Unfortunately, such Streaming replication alone does not guarantee that users will not experience downtime (“High Availability”) when the server is down. If the primary node fails or becomes inaccessible, then a method is required to promote one of the read replicas to be the new Leader node. And the faster this process happens, the smaller the free time the user has to wait and is called Automatic Failover. Automatic Failover refers to a mechanism by which the system detects that when a primary database fails, it completely removes it from rotation, moves a read-only replica to the primary state immediately, and ensures remaining aware of the new state of the system.
There are a number of third-party solutions that provide HA and Automatic Failover for PostgreSQL, many of which work well with TimescaleDB because they take advantage of streaming replication. In this article, I choose Patroni because of its simple architecture and implementation of load balancing in the direction of modularization. Patroni performs health checks on PostgreSQL nodes and allows you to use Kubernetes, AWS ELB, HA Proxy or another proxy solution to handle load balancing.
3. Learn about Patroni
Patroni’s core architecture is as follows:
Each PostgreSQL node has a Patroni bot deployed on it. The bots are capable of both managing the PostgreSQL database and updating the distributed consensus system (etcd, Zookeeper, Consul, and the Kubernetes API supported by etcd, are perfectly good options). etcd must be implemented under a Raft-based consensus algorithm. This requires a minimum deployment of 3 etcd nodes.
Leader node election is done by generating expiring key in etcd. The first PostgreSQL node that generates the etcd key through its Patroni bot becomes primary. All other nodes will see that a primary node has been selected and Patroni bots will establish PostgreSQL instances as replicas.
primary key has a short TTL attached to it. The Patroni primary bot must constantly check its PostgreSQL instance health and update the primary key. Cases that can happen:
- (a) disconnection button
- (b) database failure
- (c) bot dies, key will expire
This will cause the primary key to be completely removed from the cluster. When the replicas find that there is no primary key, each replica tries to become the primary key through information exchange between Patroni bots. The bots will then choose a clear candidate for promotion or both will race to get promoted to Primary.
You can see an illustrative example here:
4. Deploy Patroni TimescaleDB on Kubernetes
Before we start deploying, we need to prepare the Git repository:
1 2 3 | git clone https://github.com/zalando/postgres-operator.git cd postgres-operator |
Create the values dev-values.yaml
file with the following content:
1 2 3 4 5 6 | configGeneral: docker_image: registry.opensource.zalan.do/acid/spilo-14:2.1-p3 configKubernetes: enable_pod_antiaffinity: true watched_namespace: "default" |
Use helm to install chart postgres-operator
with the created values file
1 2 | helm install postgres-operator ./charts/postgres-operator -f dev-values.yaml |
Run the command kubectl get pod
to check if operator pod is running or not
Next, to create TimescaleDB pods you need to run the command kubectl get sc
to check if Kubernetes already has a storage provisioner
Create the values dev-manifest.yaml
file with the following content:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 | apiVersion: "acid.zalan.do/v1" kind: postgresql metadata: name: dev-timescaledb namespace: default spec: dockerImage: registry.opensource.zalan.do/acid/spilo-14:2.1-p3 teamId: "dev" volume: size: 10Gi storageClass: local-path numberOfInstances: 1 enableConnectionPooler: false # enable/disable connection pooler deployment postgresql: version: "14" parameters: # Expert section max_connections: "200" timescaledb.max_background_workers: "16" timescaledb.license: "timescale" max_parallel_workers: "16" max_worker_processes: "32" patroni: pg_hba: - local all all trust - hostssl all +zalandos 127.0.0.1/32 pam - host all all 0.0.0.0/0 md5 - hostssl all +zalandos ::1/128 pam - host all all ::1/128 md5 - local replication standby trust - hostssl replication standby all md5 - hostssl all +zalandos all pam - hostssl all all all md5 |
Run the command kubectl get pod
to check if the TimescaleDB pod is running or not
Access the TimescaleDB pod with the command:
1 2 | kubectl exec -it ovng-timescaledb-0 -- /bin/bash |
Check if TimescaleDB cluster has a Leader or not using patronictl patronictl list
Get the TimescaleDB password with the command:
1 2 | kubectl get secret/postgres.dev-timescaledb.credentials.postgresql.acid.zalan.do -o yaml |
You can use some web-based tool to decode password Here
Access the database and create a testdb: psql -h 127.0.0.1 -U postgres
5. Conclusion
That’s all folks! In this article, I have shared with you about the advantages of TimescaleDB – A relational database based on modern cloud-native architecture, optimized for processing time-series data. I also presented detailed instructions on how to deploy the Zalando PostgreSQL Operator on Kubernetes. I hope you found this article useful.