Skip to main content

MySQL Bidirectional Synchronization

Overview

This topic mainly introduces how CloudCanal does MySQL bidirectional synchronization and anti-loop, and the solution features include:

  • GTID is not dependent
  • Independent of the order of transactions, can be parallelized
  • Fewer peer-to-peer operations
  • Universal support for cloud databases (MySQL).
  • Supports library table column pruning, mapping, and custom data processing

Key Points

Anti-loop Marker

We used the binlog event flag to prevent incremental change loop synchronization.

DML in the Binlog normal event sequence is QueryEvent (TxBegin), TableMapEvent, WriteRowEvent (IUD), QueryEvent (TxEnd), if you need to mark the synchronized data, unless the WriteRowEvent is marked, there is no place to start. But to achieve this goal, as we know, unless the engine code is changed.

We then focused on RowsQueryLogEvent, an event in the MySQL binlog that describes the statement of a changed line, which only appears when the MySQL binlog_rows_query_log_events parameter is opened, and this event can be annotated to execute SQL.

To this end, we designed CloudCanal to automatically bring the /ccw/ tag when writing to the peer, so that the tag will also appear in the SQL statement of the RowsQueryLogEvent, and in bidirectional synchronization, if this flag is encountered, filter.

The new DML event sequence became QueryEvent (TxBegin), TableMapEvent, RowsQueryLogEvent, WriteRowEvent (IUD), QueryEvent (TxEnd).

Example

Install CloudCanal

Add DataSource

  • We use Alibaba Cloud RDS for MySQL for example, 2 instances are located in the Hangzhou region.
  • Log in to the RDS console, Instance Details > Parameters,set binlog_rows_query_log_events to on.
  • Log in to the CloudCanal console ,DataSource > Add DataSource.
  • Modify the description of the DataSource to prevent the wrong database from being identified when you configure positive and negative DataJobs. loop_sync_1

Create Forward DataJob

  • DataJob > Create DataJob

  • In bidirectional synchronization, the forward task generally refers to the DataJob where the source end has data and the target end has no data, which involves the initialization of data at the peer end.

  • In the first page, select the source and target DataSources , and click Next Step. loop_sync_2

  • In the second page

    • Select Incremental, then check Full Data option.
    • Check Synchronize DDL.
    • Graying out starts automatically to set parameters after the DataJob is created.
    • Click Next Step.

    loop_sync_3

  • In the third and fourth page,select tables and columns, and click Next Step.

  • In the fifth page,click Create DataJob.

  • Details > Functions > Modify Parameters

    • Check target tab, set deCycle to true.
    • Save and start the DataJob.

    loop_sync_4

Create Reverse DataJob

  • DataJob > Create DataJob

  • In the first page, select the source and target DataSources(reverse selection of Forward DataJob), and click Next Step. loop_sync_5

  • In the second page

    • Select Incremental, and DO NOT check Full Data option.
    • Check Synchronize DDL.
    • Graying out starts automatically to set parameters after the DataJob is created.
    • Click Next Step.

    loop_sync_6

  • In the third and fourth page,select tables and columns, and click Next Step.

  • In the fifth page,click Create DataJob.

  • Details > Functions > Modify Configurations

    • Check target tab, set deCycle to true.
    • Save and start the DataJob.

    loop_sync_7

DataJob Running

  • Forward and reverse DataJobs running well. loop_sync_8

Test

  • The source database do some DMLs.

  • There are changes in forward DataJob monitoring but reverse DataJob has no. loop_sync_9 loop_sync_10 loop_sync_11

  • The target database do some DMLs.

  • There are changes in reverse DataJob monitoring but forward DataJob has no. loop_sync_12 loop_sync_13 loop_sync_14

  • Validate the source and target data and the results are consistent. loop_sync_15

FAQ

What Are The Disadvantages?

You need to change the MySQL global variable binlog_rows_query_log_events to on, this parameter is turned off by default, which is a disadvantage compared to GTID is generally turned on.

In addition, binlog growth is relatively fast, which may bring disk growth troubles, and the cleanup binlog cycle will be shorter.

Finally, for CloudCanal, the memory footprint of the statement text is increased, resulting in increased resource consumption.

However, we believe that the losses are worth the benefits of the new solution – the performance and stability gains – are vastly improved.

What Other Connection Are Supported?

Until now, only support MySQL to MySQL bidirectional data synchronization.

In the near future,we will support bidirectional synchronization between Redis and Redis, between PostgreSQL and PostgreSQL, and between cross-DataSource kinds.

Summary

This topic mainly introduces how CloudCanal does MySQL bidirectional synchronization and anti-loop,help people achieve database remote multi-activity and disaster recovery goals.