Custom Code Processing
Overview
Custom Code lets users write data processing logic in Java and upload the resulting jar package to CloudCanal. CloudCanal then calls this code automatically during the Full Data and Incremental stages to perform data transformations.
Custom Code is invoked in the middle of CloudCanal's overall task processing chain, as shown in the following diagram:
Scenarios
Custom Code is mainly used in Data Migration or Synchronization scenarios that CloudCanal cannot yet cover with standard features. These scenarios tend to be flexible, carry specific business semantics, and involve some complexity.
Some scenarios are listed below for reference:
- Data transformation
  - Data masking, optionally combined with business encryption and decryption algorithms
  - Time zone conversion
- Data cleansing
  - Outlier and null handling
  - Missing value completion
  - Data normalization
- Real-time wide table construction
  - Joining a fact table with dimension tables
- Data aggregation
  - Aggregating data shards
  - Cross-region dataset centralization
- Business logic processing
  - Complex data transformation resulting from business architecture upgrades
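To make two of the scenarios above more concrete, here is a minimal sketch of data masking and time zone conversion in plain Java. The class name TransformExamples and both method names are hypothetical and are not part of any CloudCanal API; the masking rule (keep the first 3 and last 4 characters) is only an assumption chosen for illustration.

```java
import java.time.LocalDateTime;
import java.time.ZoneId;
import java.time.ZonedDateTime;

/** Illustrative helpers only; hypothetical names, not part of any CloudCanal API. */
final class TransformExamples {

    private TransformExamples() {
    }

    /** Masks everything except the first 3 and last 4 characters (assumed rule). */
    static String maskMiddle(String value) {
        if (value == null || value.length() <= 7) {
            return value; // nothing to mask between prefix and suffix; return unchanged
        }
        StringBuilder masked = new StringBuilder(value.length());
        masked.append(value, 0, 3);
        for (int i = 3; i < value.length() - 4; i++) {
            masked.append('*');
        }
        masked.append(value, value.length() - 4, value.length());
        return masked.toString();
    }

    /** Interprets a zone-less timestamp in zone {@code from} and returns the wall-clock time in {@code to}. */
    static LocalDateTime convertZone(LocalDateTime local, ZoneId from, ZoneId to) {
        ZonedDateTime source = local.atZone(from);
        return source.withZoneSameInstant(to).toLocalDateTime();
    }
}
```

For example, TransformExamples.maskMiddle("13812345678") returns "138****5678", and converting 2024-01-01T00:00 from UTC to Asia/Shanghai yields 2024-01-01T08:00. In a real Custom Code class, helpers like these would be applied to column values inside the processing method.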
Steps
Development
- We recommend developing in a Java IDE such as IntelliJ IDEA or Eclipse.
- CloudCanal provides a base project, cloudcanal-data-process.
- The cloudcanal-data-process project includes several examples that you can adapt to your needs.
- Custom Code classes must implement the required interfaces so that CloudCanal can call them.
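The general shape of such a class can be sketched as follows. The RowProcessor interface here is a hypothetical stand-in defined only for this example; the actual interface, its method signature, and its data types come from the cloudcanal-data-process project and will differ. The column names "status" and "country" are likewise assumptions.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

/** Hypothetical stand-in for the interface supplied by the base project. */
interface RowProcessor {
    List<Map<String, Object>> process(List<Map<String, Object>> rows);
}

/** Example processor: completes a missing "status" value and normalizes "country". */
final class DefaultStatusProcessor implements RowProcessor {

    @Override
    public List<Map<String, Object>> process(List<Map<String, Object>> rows) {
        List<Map<String, Object>> result = new ArrayList<>(rows.size());
        for (Map<String, Object> row : rows) {
            // Work on a copy so the input rows are left untouched.
            Map<String, Object> copy = new LinkedHashMap<>(row);
            if (copy.get("status") == null) {
                copy.put("status", "UNKNOWN"); // missing value completion
            }
            Object country = copy.get("country");
            if (country instanceof String) {
                copy.put("country", ((String) country).toUpperCase(Locale.ROOT)); // normalization
            }
            result.add(copy);
        }
        return result;
    }
}
```

The point of the sketch is the contract: the platform hands a batch of rows to one method, and the class returns the transformed batch. Use the examples in the base project as the starting point for the real signature.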
Packaging
Modify the packaging meta information.
Go to the project directory and run the following command to build the package.
% pwd
/Users/zylicfc/source/product/cloudcanal/cloudcanal-data-process
% mvn -Dtest -DfailIfNoTests=false -Dmaven.javadoc.skip=true -Dmaven.compile.fork=true clean package
After executing the command, you can find the jar package in the corresponding directory.
DataJob creation
Other steps omitted...
On the Columns page, click the Upload Custom Code button in the top right corner.
The DataJob then runs automatically.
Debugging
Remote Debug
- Refer to the Custom Code Debugging documentation for detailed steps.
- Once the DataJob is running, execution stops at the breakpoints set in your IDE.
Print The Log
CloudCanal provides a fixed log file (custom_processor.log) for logs printed from Custom Code.
For specific steps, refer to the Print Log in Custom Code document.
After it takes effect, you can view the log content in the console (Details > Log > custom_process.log).
You can also go to the DataTask log directory to view the full log file.
Code Update
Details > Custom Code [Management] > [View]
Upload, Activate, and Restart
FAQ
The code package is not found
- Restart the DataJob from the console, because jar package distribution is triggered by the Start operation.
- If that does not help, check whether the jar package exists in the sidecar container or in the node directory /home/clougence/cloudcanal/datahandle.
- If necessary, upload the code package manually (rename the package as indicated in the error log).
The log is not printed
- Check that the logger name is correct.
Custom Code doesn't work in Verification and Correction
- Verification and Correction DataJobs do not currently support Custom Code.
Summary
This article briefly introduces CloudCanal Custom Code, covering code development, task creation and updates, and troubleshooting.