Data Verification and Correction

Overview

This article describes how to use CloudCanal to verify that data is consistent between DataSources and to correct any discrepancies.

Key Points

Compare Field by Field

CloudCanal scans data from the source DataSource, fetches the corresponding rows from the peer DataSource in batches, and compares them field by field. Rows missing from the peer (loss data) and rows whose field values differ (diff data) are recorded in the log.

This approach addresses problems such as scanning efficiency, field type compatibility, and matching type precision between DataSources.
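To make the comparison concrete, here is a minimal sketch of batched, field-by-field verification. It is not CloudCanal's implementation: in-memory dicts stand in for the DataSources, and the names `verify`, `fetch_peer`, `normalize`, and `VerifyResult` are hypothetical. The `normalize` step is an assumed example of type compatibility handling (a peer that returns floats for a source `DECIMAL` column).

```python
# Hypothetical sketch: not CloudCanal's implementation.
# In-memory dicts stand in for the source and peer DataSources.
from dataclasses import dataclass, field
from decimal import Decimal
from typing import Any, Callable, Dict, Iterable, Iterator, List, Tuple

Row = Dict[str, Any]


@dataclass
class VerifyResult:
    loss: List[Any] = field(default_factory=list)  # keys on the source but missing on the peer
    diff: List[Tuple[Any, List[str]]] = field(default_factory=list)  # (key, differing columns)


def batched(rows: Iterable[Row], size: int) -> Iterator[List[Row]]:
    """Group the source scan into fixed-size batches."""
    batch: List[Row] = []
    for row in rows:
        batch.append(row)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:
        yield batch


def normalize(value: Any) -> Any:
    """Assumed normalization: compare floats as decimals so 1.1 matches Decimal('1.10')."""
    if isinstance(value, float):
        return Decimal(repr(value))
    return value


def verify(source_rows: Iterable[Row],
           fetch_peer: Callable[[List[Any]], Dict[Any, Row]],
           key: str, batch_size: int = 1000) -> VerifyResult:
    result = VerifyResult()
    for batch in batched(source_rows, batch_size):
        # One peer lookup per batch instead of one per row.
        peer_rows = fetch_peer([row[key] for row in batch])
        for row in batch:
            peer_row = peer_rows.get(row[key])
            if peer_row is None:
                result.loss.append(row[key])
                continue
            changed = [col for col, val in row.items()
                       if normalize(val) != normalize(peer_row.get(col))]
            if changed:
                result.diff.append((row[key], changed))
    return result


if __name__ == "__main__":
    source = [
        {"id": 1, "name": "a", "price": Decimal("1.10")},
        {"id": 2, "name": "b", "price": Decimal("2.00")},
        {"id": 3, "name": "c", "price": Decimal("3.00")},
    ]
    peer = {
        1: {"id": 1, "name": "a", "price": 1.1},
        3: {"id": 3, "name": "x", "price": 3.0},
    }
    res = verify(source, lambda keys: {k: peer[k] for k in keys if k in peer},
                 key="id", batch_size=2)
    print(res.loss)  # [2]
    print(res.diff)  # [(3, ['name'])]
```

Fetching the peer by primary key in batches keeps the number of round trips proportional to the number of batches rather than the number of rows, which is one way to tackle scanning efficiency.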

Overriding Correction

Correction works by overwriting data on the peer, so it can be enabled only when the peer DataSource supports REPLACE.

Only discrepant data is corrected, that is, the loss and diff data reported by the Verification DataTask.
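The sketch below shows why REPLACE suits overriding correction: a single REPLACE statement inserts a row that is missing on the peer and overwrites a row whose values differ. It uses Python's built-in `sqlite3` module as a stand-in peer (SQLite accepts `REPLACE INTO` as an alias for `INSERT OR REPLACE`). The function `correct_by_replace` and the `orders` table are hypothetical; this illustrates the idea and is not how CloudCanal issues its writes.

```python
# Hypothetical sketch: SQLite stands in for a peer DataSource that supports REPLACE.
import sqlite3
from typing import Any, Dict, Iterable, List


def correct_by_replace(conn: sqlite3.Connection, table: str, columns: List[str],
                       source_rows: Iterable[Dict[str, Any]]) -> int:
    """Overwrite peer rows with the source rows for the discrepant keys only."""
    # Table and column names are assumed to be trusted identifiers.
    placeholders = ", ".join("?" for _ in columns)
    sql = f"REPLACE INTO {table} ({', '.join(columns)}) VALUES ({placeholders})"
    params = [tuple(row[c] for c in columns) for row in source_rows]
    with conn:  # commit on success, roll back on error
        conn.executemany(sql, params)
    return len(params)


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, name TEXT)")
    conn.executemany("INSERT INTO orders VALUES (?, ?)", [(1, "a"), (3, "x")])
    # Source rows for the discrepancies: id 2 is loss data, id 3 is diff data.
    fixes = [{"id": 2, "name": "b"}, {"id": 3, "name": "c"}]
    correct_by_replace(conn, "orders", ["id", "name"], fixes)
    print(conn.execute("SELECT id, name FROM orders ORDER BY id").fetchall())
    # [(1, 'a'), (2, 'b'), (3, 'c')]
```

Because REPLACE handles both the missing-row and the changed-row case, the correction step needs no separate insert and update paths, and it relies on a primary key to decide which row to overwrite.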

All In One

Verification and Correction run as two steps of a single DataJob, much like the Full Data and Incremental steps of a data synchronization DataJob.

The Correction step can be omitted when creating the DataJob and added after Verification completes.

Replay

The Verification and Correction DataJob supports replay. After the Correction step has run, click the Replay button to run the DataJob again and check whether the discrepancies have been resolved.

Scheduled Execution

The Verification and Correction DataJob can be started on a schedule. Each run automatically records the results of the steps it completes and cleans up the associated logs.

Example

Install CloudCanal

Create Incremental DataJob

  • Go to DataJob > Create DataJob.

  • Select the source and target DataSources, and click Next Step.

  • Select Incremental, check the Full Data option, and click Next Step.

  • Select the tables and columns, and click Next Step.

  • Click the Create DataJob button.

  • The DataJob runs through Schema Migration, Full Data, and Incremental.

Create Discrepancy Data

  • In the peer DataSource, delete and modify some data.

Create Verification And Correction DataJob

  • Go to DataJob Details > Functions > Create Similar DataJob.

  • In the second step, select Verification and Correction and check the Revise After Check option. The other steps do not need to be changed.

  • The Verification and Correction DataJob finishes and displays its status.

  • Replay the Verification and Correction DataJob; the data is now consistent.

FAQ

What Are The Remaining Problems?

  • Extra data that exists only on the peer cannot be detected. To find it, configure a reverse verification DataJob that verifies in the opposite direction, from the peer to the source.
  • Tables without a primary key and timestamp-type tables are skipped by default. The former give no way to locate a row on the peer; in the latter, rows are hard to locate because time precision differs between DataSources.
  • If the peer DataSource does not support REPLACE, the Correction DataTask cannot be created.
  • If the source table does not have a primary key, the Correction DataTask cannot be created.
  • Custom code is not supported in Verification and Correction DataJobs.
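The first limitation follows from the direction of the scan: verification only visits keys found on the source, so a row that exists only on the peer is never looked at. The toy sketch below, in which dicts keyed by primary key stand in for tables and `missing_on_other_side` is a hypothetical helper, shows that swapping the scan direction, as a reverse verification DataJob does, surfaces that row.

```python
# Hypothetical illustration: dicts keyed by primary key stand in for tables.
from typing import Any, Dict, Set


def missing_on_other_side(scanned: Dict[Any, dict], other: Dict[Any, dict]) -> Set[Any]:
    """Keys visited while scanning `scanned` that have no matching row in `other`."""
    return {key for key in scanned if key not in other}


if __name__ == "__main__":
    source = {1: {"name": "a"}, 2: {"name": "b"}}
    peer = {1: {"name": "a"}, 2: {"name": "b"}, 9: {"name": "extra"}}
    # Forward verification scans the source, so it never visits key 9.
    print(missing_on_other_side(source, peer))  # set()
    # Reverse verification scans the peer and finds the extra row.
    print(missing_on_other_side(peer, source))  # {9}
```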

Summary

This article described how to use CloudCanal to verify and correct data. CloudCanal combines Verification and Correction in one DataJob and supports one-off and scheduled execution, among other features.