How to fix mysql replication error

Tram Ho

When running a MySQL replication system, a failing slave means the data on the Slave drifts out of sync with the Master, and the Slave drops out of the system. The operator's job is to fix the slave and bring it back to normal operation.

There are many ways to fix a broken slave depending on the specific case. One essential requirement for resolving a master-slave (MS) failure quickly is that you understand your data: what each table does, what the data stored in it is used for, which feature it serves, and how accurate it needs to be. Only then can you pick the right solution.

There are basically 2 main methods:

  • Stop the slave and rebuild it from scratch -> the easiest way and requires no understanding of the data, but it costs a lot of time if the data set is large.
  • Analyze the binlog and fix the data during the sync process -> this recovers the slave very quickly on systems with big data, but requires understanding your own data.

1. Rebuild new slave with mysqldump

Rebuilding a slave is similar to setting up a MySQL replication system from scratch, except that the system is already online, so any action that affects the Master server directly affects users. With this method, we use mysqldump to dump the data into an SQL file and then synchronize it; I summarize it in the following steps:

On the Master server:
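The original commands are missing from this copy of the article; a minimal sketch of the master-side step might look like the following (the dump path is illustrative):

```shell
# One-shot consistent dump: --single-transaction takes an InnoDB snapshot
# without holding the global read lock for the whole dump, and
# --master-data=2 records the binlog file/position as a comment in the dump
mysqldump --all-databases --single-transaction --master-data=2 \
    --routines --triggers > /tmp/master_dump.sql

# The recorded coordinates appear near the top of the dump file
head -n 30 /tmp/master_dump.sql | grep "CHANGE MASTER"
```

With `--single-transaction` the lock impact on users is reduced for InnoDB tables, but a plain `--master-data` dump of mixed engines still needs the table lock the article warns about.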

On the Slave server:
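The slave-side commands are also missing here; a sketch, assuming the dump from the master and placeholder connection details (host, user, password, and coordinates must be replaced with your own, the coordinates coming from the `CHANGE MASTER` comment in the dump):

```shell
# Copy the dump from the master (host and path are illustrative)
scp master:/tmp/master_dump.sql /tmp/

# Load it into the slave
mysql < /tmp/master_dump.sql

# Point the slave at the master and start replicating
mysql -e "CHANGE MASTER TO
  MASTER_HOST='<master-ip>',
  MASTER_USER='repl',
  MASTER_PASSWORD='<password>',
  MASTER_LOG_FILE='<binlog-file-from-dump>',
  MASTER_LOG_POS=<position-from-dump>;
START SLAVE;"

# Verify both replication threads are running
mysql -e "SHOW SLAVE STATUS\G" | grep -E "Slave_(IO|SQL)_Running"
```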

The only caveat, as mentioned above, is that locking tables on the master directly affects users, so this method should only be used during system maintenance; and the larger the data, the longer the dump takes and the longer the maintenance window has to be.

2. Rebuild new slave with xtrabackup

xtrabackup is a tool from Percona, written in C, commonly used for MySQL hot backups. Its advantages/disadvantages compared to mysqldump are:

  • No system maintenance window, no database lock, and no effect on users.
  • The backup process is much faster than mysqldump.
  • The downside is that the backup directory is the same size as the raw data, while an SQL dump is much smaller.

For systems with large databases, xtrabackup is the perfect solution for backing up and rebuilding a slave server: it doesn't affect the running system and is faster than the traditional method. Specifically, it involves 2 steps:

  • Backup and restore data with xtrabackup.
  • Point the slave at the master's binlog file and position, then sync the data.

To backup and restore data with xtrabackup there are 3 basic steps:

Back up all data on the MySQL master with the command:

=> Here I have configured the user/pass of the root user in the ~/.my.cnf file, so I don't need to pass this information when running xtrabackup; you can also run with the --compact option to reduce the size of the backup.
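The backup command itself is missing from this copy; with credentials already in ~/.my.cnf as described, it would be along these lines (the target directory is illustrative):

```shell
# Full hot backup of the running master into a timestamped directory;
# user/password are picked up from ~/.my.cnf, so none are passed here
xtrabackup --backup --target-dir=/data/backup/$(date +%F)

# Optionally add --compact (xtrabackup 2.x) to skip secondary index
# pages and shrink the backup, at the cost of rebuilding indexes later
```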

Sync the backup over to the slave and run the prepare step:

=> During the backup there may be uncommitted transactions, or committed transactions still in the redo log that need to be replayed -> the prepare step applies them to ensure data consistency. You can also use the --rebuild-threads=16 option to spawn multiple worker threads and rebuild indexes in parallel.
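A sketch of this step (paths are illustrative; rsync is just one way to move the backup):

```shell
# Copy the backup directory to the slave
rsync -av /data/backup/2020-01-01/ slave:/data/backup/2020-01-01/

# On the slave: apply the redo log and roll back uncommitted
# transactions so the data files are consistent
xtrabackup --prepare --target-dir=/data/backup/2020-01-01

# If the backup was taken with --compact, rebuild the secondary
# indexes in parallel during prepare:
# xtrabackup --prepare --rebuild-indexes --rebuild-threads=16 \
#     --target-dir=/data/backup/2020-01-01
```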

Restore the data and start MySQL

This is the process of copying the backup into the datadir. Note that the datadir must be completely empty before running the copy-back.
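A sketch of the restore, assuming the default datadir /var/lib/mysql and a systemd-managed service:

```shell
# MySQL must be stopped and the datadir completely empty
systemctl stop mysql
rm -rf /var/lib/mysql/*

# Copy the prepared backup into the datadir
xtrabackup --copy-back --target-dir=/data/backup/2020-01-01

# xtrabackup runs as root, so fix ownership before starting MySQL
chown -R mysql:mysql /var/lib/mysql
systemctl start mysql
```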

After the 3 steps above, the backup and restore process is complete; what remains is to point the slave at the master and start the sync.

After the copy-back completes, /var/lib/mysql will contain an xtrabackup_binlog_info file; this file holds the binlog file name and position to use for the sync.

Then configure the sync from the master as in the mysqldump method.
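Concretely, that might look like this (connection details are placeholders; the coordinates come from xtrabackup_binlog_info):

```shell
# xtrabackup_binlog_info contains: <binlog file> <position>
cat /var/lib/mysql/xtrabackup_binlog_info

# Point the slave at the master using those coordinates
mysql -e "CHANGE MASTER TO
  MASTER_HOST='<master-ip>',
  MASTER_USER='repl',
  MASTER_PASSWORD='<password>',
  MASTER_LOG_FILE='<file-from-xtrabackup_binlog_info>',
  MASTER_LOG_POS=<position-from-xtrabackup_binlog_info>;
START SLAVE;"
```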

Note that xtrabackup is written in C, while innobackupex is a Perl script symlinked to xtrabackup. Percona currently encourages switching to xtrabackup because innobackupex is no longer being developed, but in some cases you may still need it.

2.1 Some tips with xtrabackup

If you dig into the xtrabackup documentation you will find quite a lot of useful things for many different situations. I have run into some of those situations myself, and only then saw the real power of xtrabackup.

Backup and stream over the network

One situation: the master's hard drive is 1TB and the data in MySQL is >500GB, which means there isn't enough free space to back up onto the local drive, and attaching a new disk takes time on a bare-metal system. We can simply stream the backup directly to the slave server as follows.

New slave

This will open port 9999 and write whatever arrives to an archive file.
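The listener command is missing from this copy; it would be something like the following (note that the `nc` listen syntax varies between netcat variants, e.g. `nc -l -p 9999` on the traditional one):

```shell
# Listen on TCP port 9999 and save the incoming stream to a tar archive
nc -l 9999 > /data/backup.tar
```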

Master

Back up and send the stream directly to the slave.
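A sketch of the master-side command (host name is illustrative; --target-dir is only used for temporary files when streaming):

```shell
# Stream the backup as a tar archive straight to the slave over TCP
xtrabackup --backup --stream=tar --target-dir=/tmp | nc slave_host 9999
```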

As we can see, after the backup completes we have an archive file on the slave, and what's left is to unarchive and restore the data. But we can also extract it during the stream.

New Slave

Master
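The two commands for extracting on the fly might look like this (host name is illustrative; Percona's docs note that tar streams produced by xtrabackup must be extracted with tar's `-i` flag):

```shell
# New slave: pipe the socket straight into tar instead of saving a file
nc -l 9999 | tar -xivf - -C /data/backup

# Master: same streaming backup as before
xtrabackup --backup --stream=tar --target-dir=/tmp | nc newslave 9999
```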

Create new newslave from old slave

Suppose we have a Master server and a slave server, but for some reason the slave needs maintenance and we need to build a new slave to replace it. If we don't want to touch the Master server, we can easily create a new slave from the slave being serviced with xtrabackup as follows:

Old slave

Or stream
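A sketch of both variants (paths and host name are illustrative); the key addition is --slave-info:

```shell
# On the old slave: --slave-info records the master's binlog coordinates,
# as seen by this slave, into xtrabackup_slave_info
xtrabackup --backup --slave-info --target-dir=/data/backup

# Or stream it straight to the new slave:
xtrabackup --backup --slave-info --stream=tar --target-dir=/tmp | nc newslave 9999
```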

The only difference is that the binlog file and position are saved in xtrabackup_slave_info instead of xtrabackup_binlog_info; we only need to prepare the data, restore it, and point the new slave at the master using the information in xtrabackup_slave_info.

2.2 Some practical information

  • Data 500GB: streaming with xtrabackup over the network takes 93 minutes; unarchiving the 420GB tar file takes 39 minutes; --prepare takes about 60 minutes.
  • Data 500GB: streaming with xtrabackup and unarchiving on the fly, the slave is fully rebuilt in ~2 hours.
  • Data 500GB: dumping with mysqldump to disk takes 183 minutes; the SQL file weighs 182GB uncompressed.

3. Analyze binlog and fix on data sync process

If you search the Internet, you will find a number of blog posts instructing you to fix replication errors between master and slave as follows:
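The snippet those posts usually give (missing from this copy of the article) is the classic skip-counter:

```sql
STOP SLAVE;
SET GLOBAL SQL_SLAVE_SKIP_COUNTER = 1;
START SLAVE;
```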

However, this is a pretty bad idea and it is best to avoid this method. With the above statement, the Slave simply ignores the error and moves on to apply the next transaction from the binlog => loss of data consistency.

For example:

  • An inventory management application where read queries are routed to the Slave.
  • A product has 1 item in stock; a user buys quantity 1, and the stock is decremented to 0 on the master.
  • A binlog event is generated and synced to the slave, but applying it fails.
  • We skip that transaction -> the product's stock is still 1 on the slave -> and as long as nothing else touches that record, the slave keeps working and syncing normally.

=> When another user views the product, a read query against the slave reports the product has quantity 1. The user adds it to the cart (a write -> goes to the master) => the add fails because the master sees the stock is 0.

In practice, use skipping only when we really understand how our data will be affected. For example, many applications write request logs or push_notification_log to MySQL; we can SKIP if we know for sure the failing event applies only to those log tables.

In fact, I have run into a few odd situations, but by decoding the binlog I solved many cases without rebuilding the slave, saving a lot of time and effort. Usually these are:

  • The column data type or charset of a table differs between slave and master.
  • A record's data on the slave differs from the master, or an index already exists on the slave. => These errors are usually caused by humans, but a quick fix is still needed.

Example 1:

As above, the error message points at binlog file mysql-bin.000569, position 619131881, so we need to decode the binlog.
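A sketch of decoding the binlog around the failing position with mysqlbinlog (the binlog path is illustrative; the file name and position come from the replication error):

```shell
# Decode the events starting at the failing position; for row-based
# binlogs, decode-rows + verbose prints the row changes as pseudo-SQL
mysqlbinlog --base64-output=decode-rows --verbose \
    --start-position=619131881 \
    /var/lib/mysql/mysql-bin.000569 | head -n 100
```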

The essence is that somehow the users_id_photo_idx index already exists on the slave, so when we create the index on the master, the statement is replicated to the slave, applied, and fails with an "already exists" error.

The simple fix is to delete the users_id_photo_idx index on the slave, and replication resumes.
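On the slave, that would look roughly like this (the table name is hypothetical, since it is not shown in this copy of the article):

```sql
-- Drop the conflicting index on the slave, then let replication
-- re-apply the master's CREATE INDEX from the relay log
STOP SLAVE;
ALTER TABLE photos DROP INDEX users_id_photo_idx;
START SLAVE;
```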

Example 2:

On table t of the slave a record with id 5 already exists, so when the INSERT statement is replicated to the slave it hits a duplicate-key error. Simply delete the conflicting record with DELETE FROM t WHERE id = 5 AND pid = 2 and start the slave again.
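The full sequence on the slave:

```sql
-- Remove the conflicting row, then resume replication; the INSERT
-- from the master's binlog will now apply cleanly
STOP SLAVE;
DELETE FROM t WHERE id = 5 AND pid = 2;
START SLAVE;
```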

In a nutshell, with this approach fixing the slave is extremely fast: it takes only seconds or minutes for the slave to return to operation, and you don't have to spend time rebuilding the slave and waiting for data to sync.

Reference: https://techzones.me


Source : Viblo