Data Verification and Correction
Overview
This article describes how to use CloudCanal to verify and correct data.
Key Points
Compare Field by Field
CloudCanal scans data from the source DataSource, fetches the corresponding rows from the peer in batches, compares them field by field, and records missing and differing rows in the log.
This approach addresses scanning efficiency, field type compatibility, and type precision matching.
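The batch comparison described above can be sketched as follows. This is a minimal illustration, not CloudCanal's implementation: the `verify` function, the in-memory dictionaries standing in for the source and peer DataSources, and the `batch_size` parameter are all hypothetical.

```python
from typing import Any, Dict, List, Tuple

Row = Dict[str, Any]


def verify(
    source_rows: Dict[Any, Row],
    target_rows: Dict[Any, Row],
    batch_size: int = 2,
) -> Tuple[List[Any], List[Tuple[Any, List[str]]]]:
    """Scan the source by primary key and compare with the peer field by field.

    Both arguments are hypothetical in-memory stand-ins keyed by primary key.
    Returns (keys missing on the peer, [(key, names of differing columns)]).
    """
    missing: List[Any] = []
    different: List[Tuple[Any, List[str]]] = []
    keys = sorted(source_rows)
    for start in range(0, len(keys), batch_size):
        batch = keys[start:start + batch_size]
        # One lookup per batch on the peer side instead of one per row.
        fetched = {k: target_rows[k] for k in batch if k in target_rows}
        for key in batch:
            src = source_rows[key]
            tgt = fetched.get(key)
            if tgt is None:
                missing.append(key)
                continue
            changed = [col for col in src if src[col] != tgt.get(col)]
            if changed:
                different.append((key, changed))
    return missing, different


source = {
    1: {"id": 1, "name": "a"},
    2: {"id": 2, "name": "b"},
    3: {"id": 3, "name": "c"},
}
target = {
    1: {"id": 1, "name": "a"},
    2: {"id": 2, "name": "B"},  # modified on the peer
}                               # row 3 deleted on the peer
missing, different = verify(source, target)
print(missing, different)
```

A real comparison would also need to normalize values before comparing them (for example numeric scale or time precision), which is where the type compatibility concerns come in.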
Overriding Correction
Correction works by overwriting data on the peer. It can be enabled for any peer DataSource that supports REPLACE.
The data that gets corrected is the differential data, that is, the result produced by the verification DataTask.
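As a rough sketch of overriding correction, the example below writes a set of differing rows to a peer table with REPLACE. It uses an in-memory SQLite database only because SQLite accepts `REPLACE INTO`. The `correct` helper, the `users` table, and the sample rows are hypothetical and do not reflect how CloudCanal issues its statements.

```python
import sqlite3
from typing import Any, Dict, List


def correct(conn: sqlite3.Connection, table: str, columns: List[str],
            rows: List[Dict[str, Any]]) -> None:
    """Overwrite peer rows with the source values using REPLACE.

    Table and column names are trusted here; a real tool would quote them.
    """
    column_list = ", ".join(columns)
    placeholders = ", ".join("?" for _ in columns)
    sql = f"REPLACE INTO {table} ({column_list}) VALUES ({placeholders})"
    conn.executemany(sql, [tuple(row[c] for c in columns) for row in rows])
    conn.commit()


# Hypothetical peer table: row 2 was modified, row 3 is missing.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("INSERT INTO users VALUES (2, 'B')")

# Differential rows as a verification step might report them (source values).
diff_rows = [{"id": 2, "name": "b"}, {"id": 3, "name": "c"}]
correct(conn, "users", ["id", "name"], diff_rows)

result = conn.execute("SELECT id, name FROM users ORDER BY id").fetchall()
print(result)
```

Because REPLACE inserts a row when the key is absent and overwrites it when the key exists, the same statement handles both missing and differing rows. It also relies on a primary key or unique index to decide which row to overwrite.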
All In One
Verification and Correction run as two steps of a single DataJob, similar to how Full Data and Incremental relate in a data synchronization DataJob.
The Correction step can be skipped when creating the DataJob and added after Verification completes.
Replay
The Verification and Correction DataJob supports replay. After the Correction step has run, click the Replay button to run the DataJob again and check the result.
Scheduled Execution
The Verification and Correction DataJob supports scheduled starts. It automatically records results as each step completes and cleans up the associated logs.
Example
Install CloudCanal
- Download, install, and activate CloudCanal.
Create Incremental DataJob
DataJob > Create DataJob
Select the source and target DataSources, click Next Step.
Select Incremental, check the Full Data option, and click Next Step.
Select tables and columns, click Next Step.
Click the Create DataJob button.
The schema migration, full data, and incremental steps then run.
Create Discrepancy Data
- In the peer DataSource, delete and modify some data.
Create Verification And Correction DataJob
DataJob Details > Functions > Create Similar DataJob.
In the second step, select Verification and Correction and check the Revise After Check option. The other steps do not need to be changed.
The Verification and Correction DataJob finishes and displays its status.
Replay the Verification and Correction DataJob to confirm that the data is now consistent.
FAQ
What Are The Remaining Problems?
- Extra data that exists only on the peer cannot be detected. To cover it, configure a reverse verification DataJob.
- By default, verification ignores tables without a primary key and timestamp-type data. Rows in the former cannot be located, and the latter is hard to match because of differences in time precision.
- If the peer DataSource does not have the REPLACE capability, the Correction DataTask cannot be created.
- If the source table does not have a primary key, the Correction DataTask cannot be created.
- Verification and Correction DataJobs do not support custom code.
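The first limitation follows from the scan direction: verification starts from the source rows, so a row that exists only on the peer is never looked up. A minimal sketch with hypothetical key lists and a hypothetical `missing_on_peer` helper shows why a second pass in the opposite direction is needed:

```python
from typing import Any, Iterable, List


def missing_on_peer(scanned_keys: Iterable[Any], peer_keys: Iterable[Any]) -> List[Any]:
    """Return keys found on the scanned side that are absent on the other side."""
    peer = set(peer_keys)
    return sorted(k for k in scanned_keys if k not in peer)


source_keys = [1, 2, 3]
target_keys = [1, 2, 3, 4]  # key 4 exists only on the peer

# Forward verification scans the source, so the extra peer row goes unnoticed.
forward = missing_on_peer(source_keys, target_keys)
# Reverse verification scans the peer and looks each row up on the source.
reverse = missing_on_peer(target_keys, source_keys)
print(forward, reverse)
```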
Summary
This article showed how to use CloudCanal for data verification and correction, covering the combined Verification and Correction DataJob, replay, and one-off or scheduled execution.