Redis Cluster Learning

Introduction

Let’s review the key concepts of Redis clusters:

Configuration
- A Redis database system has one master database and multiple slave databases
- The master database doesn’t require any special configuration; a slave database only needs slaveof ip port in its configuration file (or --slaveof ip port on the command line)
- Slave databases are read-only by default. They can be made writable, but this has little practical use: changes made on a slave are not propagated to any other database and are overwritten once the master updates the same data
- Besides the configuration file, you can also run the slaveof command at runtime to change which master a database replicates from
Persistence Review
- RDB: Redis forks its process; the child process writes the in-memory data to a file, which then replaces the previous RDB snapshot
- AOF: This method appends every write command to a file. Because some of the appended commands become redundant, the file is rewritten once it grows too large. Writes to the file first land in the operating system’s cache, and when they are synced to disk is configurable. If syncing is left to the operating system (appendfsync no), data is typically written to disk about every 30s. You can instead sync after every command (appendfsync always) or once per second (appendfsync everysec, the default).
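
The three sync choices described for AOF can be illustrated with a toy append-only log. This is a minimal sketch, not Redis's implementation: the class name ToyAppendOnlyFile and its fields are hypothetical, commands are stored as plain text lines rather than in the Redis protocol format, and the once-per-second check runs inline on each append instead of in a background thread.

```python
import os
import time


class ToyAppendOnlyFile:
    """Toy append-only log; policies mirror appendfsync always/everysec/no."""

    def __init__(self, path, policy="everysec"):
        if policy not in ("always", "everysec", "no"):
            raise ValueError(f"unknown policy: {policy}")
        self.policy = policy
        self.file = open(path, "ab")
        self.last_sync = time.monotonic()
        self.sync_count = 0  # how many times data was forced to disk

    def append(self, command):
        # Every write command is appended to the end of the file.
        self.file.write(command.encode() + b"\n")
        now = time.monotonic()
        if self.policy == "always":
            self._sync(now)
        elif self.policy == "everysec" and now - self.last_sync >= 1.0:
            self._sync(now)
        # policy "no": leave flushing to the operating system

    def _sync(self, now):
        self.file.flush()             # Python buffer -> OS cache
        os.fsync(self.file.fileno())  # OS cache -> disk
        self.last_sync = now
        self.sync_count += 1

    def close(self):
        self.file.close()
```

With the "always" policy every append pays for an fsync, which is the safest and slowest choice; with "no" the cost is lowest but everything still in the OS cache is lost on a crash.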

Principles

Initialization Replication
1. When a slave database starts, it sends a SYNC command to the master database
2. The master database saves a snapshot in the background, caches the commands it receives while the snapshot is being taken, then sends both the snapshot and the cached commands to the slave database
3. The slave database stores the received snapshot in a temporary file, then uses it to replace the RDB snapshot file specified in its configuration. From there, loading follows the same process as RDB persistence recovery
4. If the connection drops, Redis 2.6 and earlier repeat initialization replication from scratch. Redis 2.8 and later can transmit only the commands issued during the disconnection
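
The snapshot-plus-cached-commands handshake above can be modeled in a few lines. This is a simplified in-memory sketch: ToyMaster, ToyReplica and full_sync are hypothetical names, the "snapshot" is a copied dict rather than an RDB file, and every write is a plain key/value assignment.

```python
import copy


class ToyMaster:
    def __init__(self):
        self.data = {}
        self.snapshotting = False
        self.buffered = []  # writes that arrive while a snapshot is in progress

    def write(self, key, value):
        self.data[key] = value
        if self.snapshotting:
            self.buffered.append((key, value))

    def begin_sync(self):
        """Handle SYNC: take a snapshot and start caching new writes."""
        self.snapshotting = True
        self.buffered = []
        return copy.deepcopy(self.data)

    def finish_sync(self):
        """Snapshot was sent: hand over the writes cached in the meantime."""
        self.snapshotting = False
        buffered, self.buffered = self.buffered, []
        return buffered


class ToyReplica:
    def __init__(self):
        self.data = {}


def full_sync(master, replica, writes_during_transfer=()):
    snapshot = master.begin_sync()
    # Clients keep writing to the master while the snapshot is in flight.
    for key, value in writes_during_transfer:
        master.write(key, value)
    replica.data = snapshot  # the replica loads the snapshot first
    for key, value in master.finish_sync():
        replica.data[key] = value  # then replays the cached commands
```

After full_sync returns, the replica holds the same data as the master even though writes arrived during the transfer.
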
Replication Synchronization
- After initialization, the master database forwards every command that changes data to the slave database
- Redis uses an optimistic replication strategy, so there is a window during which master and slave may be inconsistent. The master can be configured to accept writes only while at least a given number of slaves are connected (min-slaves-to-write), together with the maximum time a slave may go without contact before it no longer counts (min-slaves-max-lag). This reduces data inconsistency to some extent.
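
The write-acceptance rule described above amounts to a simple count. The function below is an illustrative sketch only; its name and parameters are hypothetical, and it assumes the master knows, for each connected slave, how many seconds have passed since that slave last reported in.

```python
def master_accepts_writes(replica_lags, min_replicas, max_lag):
    """Return True if enough slaves have been heard from recently.

    replica_lags: seconds since the last contact, one entry per connected slave
    min_replicas: minimum number of healthy slaves required to accept writes
    max_lag:      a slave silent for longer than this is not counted
    """
    healthy = sum(1 for lag in replica_lags if lag <= max_lag)
    return healthy >= min_replicas
```

For example, with two required slaves and a 10-second limit, lags of [1, 2, 30] still allow writes, while [1, 30, 40] do not.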

Read/Write Separation for Performance Improvement

Time-consuming read operations can be offloaded to multiple slave nodes created through replication, which leaves the master free to handle writes.

Slave Database Persistence

When slave databases are in use, persistence on the master database can be disabled and left to a slave. If the master then crashes, its content is recovered by swapping roles: run “slaveof no one” on the slave to promote it to master, then use the slaveof command on the old master to make it a slave of the promoted database.

In this setup, a crashed master must not be restarted immediately. Because it has no persistence, it would come back empty, and the slave would synchronize with it and lose its own content.

Diskless Replication

Master-slave replication uses an RDB snapshot for initialization synchronization. Even when RDB snapshots are disabled on the master database, initialization replication still generates one, and a master that restarts afterward will recover from it. Because the time of the last initialization synchronization is unpredictable and may be long past, the master could recover to an arbitrary point in time.

Additionally, creating an RDB snapshot during initialization synchronization requires disk operations, which may affect synchronization efficiency if the disk is too slow.

Starting from Redis 2.8.18, Redis introduced diskless replication. When this option is enabled, initialization replication will no longer create an RDB snapshot but will send the RDB snapshot content directly over the network.

Incremental Replication

If the master and slave databases disconnect, repeating initialization replication on reconnection would be quite expensive. Incremental replication avoids this as follows:

Each database generates a unique run ID every time it starts.
During the replication synchronization phase, the master database appends each command to a replication backlog and records the current command’s offset.
The slave database also records the command offset as it receives commands.
When the master-slave connection is ready, the slave database sends PSYNC with the master’s run ID and its own offset to tell the master database it can use incremental replication.
The master database checks whether the run ID sent by the slave database matches its own. If not, the master database may have restarted, and initialization replication is performed instead.
It then checks whether the slave database’s command offset is still within the backlog. If so, it sends the commands from that offset onward to the slave database; if not, it falls back to initialization replication.
The backlog size can be configured, with a default of 1MB. A larger backlog tolerates longer disconnections before a reconnection requires initialization replication.
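
The master-side decision described above can be sketched as a fixed-size buffer plus two checks. This is a toy model under stated assumptions: the class and method names are hypothetical, the offset passed in means "bytes the slave has already applied" (the real protocol's offset convention differs slightly), and the return values only imitate the continue/full-resync outcomes.

```python
class ReplicationBacklog:
    """Fixed-size buffer holding the most recent bytes of the command stream."""

    def __init__(self, run_id, size):
        self.run_id = run_id
        self.size = size
        self.buffer = bytearray()
        self.offset = 0  # total bytes ever written to the stream

    def feed(self, data):
        """Record propagated command bytes, discarding the oldest when full."""
        self.buffer += data
        self.offset += len(data)
        if len(self.buffer) > self.size:
            del self.buffer[: len(self.buffer) - self.size]

    def psync(self, replica_run_id, replica_offset):
        """Return ("CONTINUE", missing_bytes) or ("FULLRESYNC", None)."""
        if replica_run_id != self.run_id:
            return "FULLRESYNC", None  # the master restarted or changed
        oldest = self.offset - len(self.buffer)
        if not (oldest <= replica_offset <= self.offset):
            return "FULLRESYNC", None  # the missing data already fell out
        start = replica_offset - oldest
        return "CONTINUE", bytes(self.buffer[start:])
```

A slave that reconnects quickly gets only the bytes it missed; one that stayed away until its offset was overwritten has to go through initialization replication again, which is why a larger backlog helps.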

Sentinel

The sentinel system was created to ease the maintenance of master-slave databases. It addresses two problems: a master database failure could not be detected automatically, and switching master and slave roles after a failure had to be done by hand.

When a sentinel process starts, it reads the configuration file. The configuration file includes master-name, ip, and port. When the master node fails, the sentinel will automatically convert one of the slave nodes to become the master node. When the previous master node recovers, it automatically becomes a slave node.

After the sentinel starts, it will periodically perform the following three operations:

Send INFO to master and slave every 10 seconds
Send its own information to the __sentinel__:hello channel of master and slave every 2 seconds
Send PING to master database, slave database, and other sentinels every 1 second

The INFO reply gives the sentinel the relevant details of the current database; sending INFO to the master also reveals its slave databases. By publishing its own information to the channel, each sentinel lets all other sentinels monitoring the same database discover it. Sentinels then establish connections with each other to exchange PING messages.

When a sentinel sends PING and receives no valid reply within the time set by down-after-milliseconds, it considers the target subjectively down. If the target is a master database, the sentinel asks the other sentinels whether they also consider it subjectively down. Once the number that agree reaches the quorum set in the sentinel’s startup parameters, the sentinel considers the master objectively down.
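
The two levels of failure detection can be written down as two small predicates. This is an illustrative sketch with hypothetical function and parameter names; it assumes timestamps in seconds and that the asking sentinel's own opinion is included in the list of opinions.

```python
def is_subjectively_down(last_valid_reply_at, now, down_after):
    """One sentinel's own view: no valid PING reply for longer than down_after."""
    return now - last_valid_reply_at > down_after


def is_objectively_down(opinions, quorum):
    """Combined view: enough sentinels report the master as subjectively down.

    opinions: one bool per sentinel, True meaning "I consider it down"
    """
    return sum(1 for down in opinions if down) >= quorum
```

For example, with a quorum of 2, opinions of [True, False, True] mark the master objectively down, while [True, False, False] do not.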

RAFT Election:

Sentinel A, which discovered the master database to be objectively down, sends a command to other sentinels requesting to be elected as the leader sentinel.
If the requested sentinel hasn’t voted for anyone else, it will agree.
If the number of agreements A receives exceeds half of the sentinel count and is no less than the quorum parameter, A becomes the leader sentinel.
Otherwise, A waits a random time and reinitiates the request for the next round of elections.
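
The voting rule above can be sketched as follows. This is a toy model, not Sentinel's implementation: ToySentinel and run_election are hypothetical names, each round is identified by an integer epoch, the candidate is assumed to be in the list of sentinels (so it votes for itself), and network delays and the random wait are left out.

```python
class ToySentinel:
    def __init__(self, name):
        self.name = name
        self.voted_for = {}  # epoch -> name of the candidate that got the vote

    def request_vote(self, epoch, candidate_name):
        """Agree only if no vote was cast for anyone else in this round."""
        self.voted_for.setdefault(epoch, candidate_name)
        return self.voted_for[epoch] == candidate_name


def run_election(candidate_name, sentinels, quorum, epoch):
    """Return True if the candidate wins the round identified by epoch."""
    votes = sum(1 for s in sentinels if s.request_vote(epoch, candidate_name))
    majority = len(sentinels) // 2 + 1
    return votes >= max(majority, quorum)
```

Because each sentinel grants at most one vote per round, two candidates cannot both collect a majority in the same round; the loser has to try again in a later one.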

The leader sentinel will perform the following operations:

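Check the priority of slave databases; the one with the highest priority is chosen (a lower slave-priority value means a higher priority)
If priorities are equal, check the replication command offset; the slave with the larger offset, and therefore the more complete data, is chosen
If still tied, choose the slave database with the smaller run ID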
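
The three selection rules form a tie-breaking chain, which maps naturally onto a sort key. The sketch below is illustrative: the dict fields and the function name are hypothetical, and it assumes Redis's convention that a lower slave-priority number means a more preferred slave and that a priority of 0 excludes a slave from promotion.

```python
def pick_new_master(replicas):
    """Pick the slave to promote, or None if no slave is eligible.

    Each replica is a dict with "run_id", "priority" and "offset" keys.
    """
    candidates = [r for r in replicas if r["priority"] != 0]
    if not candidates:
        return None
    # Lower priority number first, then larger offset, then smaller run ID.
    return min(
        candidates,
        key=lambda r: (r["priority"], -r["offset"], r["run_id"]),
    )
```

Negating the offset lets a single min() call prefer larger offsets while still preferring smaller priority numbers and run IDs.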

The selected database receives a “slaveof no one” command from the leader sentinel and becomes the master database, and the other slave databases receive slaveof commands pointing them at the new master. Finally, the sentinel updates its internal record to mark the failed database as a slave of the new master, so that when it resumes service it comes back as a slave database.

Deployment

Sentinels are typically deployed on each node. If the Redis cluster is very large, the number of connections between all sentinels and a slave database might be too many, affecting Redis’s response to all sentinels. Therefore, the number of deployed sentinels should not be too high.

Cluster

For horizontal scaling, I became familiar with Codis during my internship at Tencent. Codis consists of Redis proxies and backend Redis nodes. The main principle is to hash each key and place it on a specific Redis node. A script is likewise routed to one specific node when it executes, so keys it uses that live on other nodes may not be found.

Besides Codis, there’s also Redis Cluster.

When starting a Redis instance, set cluster-enabled to yes in the configuration to enable cluster mode.

Clusters can be created conveniently with scripts. A new node is added with the cluster meet command, given its IP and port. Once added, the node’s information is spread to all other nodes through the gossip protocol.

Slot allocation: After a new node joins the cluster, it has two options: use the cluster replicate command to replicate a master database and run as a slave database, or have slots allocated to it and run as a master database.

Keys are hashed with CRC16, and the result modulo 16384 selects one of the 16384 slots. If the key contains a non-empty {...} section (a hash tag), only the part between the braces is hashed, so keys sharing a tag land in the same slot. The cluster builds its slot mapping table when it is created, and each key’s data is stored on the Redis node that owns the corresponding slot.
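
The key-to-slot mapping can be reproduced in a few lines of Python. This is a sketch under one assumption worth stating: it relies on binascii.crc_hqx with an initial value of 0 computing the same CRC16 variant that Redis Cluster uses. The function names hash_tag and key_slot are illustrative.

```python
import binascii

SLOT_COUNT = 16384


def hash_tag(key):
    """Return the part of the key (bytes) that is actually hashed.

    If the key contains "{", a later "}", and at least one character between
    them, only that enclosed part is used; otherwise the whole key is used.
    """
    start = key.find(b"{")
    if start != -1:
        end = key.find(b"}", start + 1)
        if end != -1 and end != start + 1:
            return key[start + 1:end]
    return key


def key_slot(key):
    """Map a string key to one of the 16384 slots."""
    return binascii.crc_hqx(hash_tag(key.encode()), 0) % SLOT_COUNT
```

Keys such as {user1000}.following and {user1000}.followers share the tag user1000 and so map to the same slot, which is what makes multi-key operations on them possible in a cluster.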