Terminology
This article introduces the terminology used in CloudCanal.
DataSource
A DataSource is a connection configuration for a relational database (MySQL, PostgreSQL, Oracle, etc.), message middleware (Kafka, RocketMQ, etc.), a cache (Redis, etc.), a real-time data warehouse (Greenplum, Doris, StarRocks, etc.), a big data product (Hive, Kudu, etc.), or the corresponding cloud-hosted product. It generally contains attributes such as the connection URL and login authentication information.
A DataSource is typically represented by an ID similar to my-59bi20aqxxxxx96.
DataJob
A DataJob is the configuration for one piece of data migration and synchronization work. It may include a set of Schema Migration, Full Data, Incremental, and Verification and Correction processes (DataTasks) that run in sequence or concurrently.
The maximum scope of tables for a DataJob is one or more schemas for relational databases, and multiple topics for message middleware.
A DataJob is typically represented by an instance ID similar to canal7yr4y7xxxx3.
DataTask
A DataJob consists of multiple DataTasks, such as Schema Migration, Full Data, Incremental, Verification, and Correction.
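The relationship between DataSource, DataJob, and DataTask can be pictured with a small data model. This is only an illustrative sketch: the class names mirror the terms in this article, but every field name (source_id, connection_url, username, job_id) and the build_job helper are assumptions made for the example, not CloudCanal's actual schema or API.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List


class TaskType(Enum):
    SCHEMA_MIGRATION = "Schema Migration"
    FULL_DATA = "Full Data"
    INCREMENTAL = "Incremental"
    VERIFICATION_AND_CORRECTION = "Verification and Correction"


@dataclass
class DataSource:
    source_id: str       # an ID similar to "my-59bi20aqxxxxx96"
    connection_url: str  # hypothetical field name
    username: str        # hypothetical field name


@dataclass
class DataTask:
    task_type: TaskType


@dataclass
class DataJob:
    job_id: str          # an ID similar to "canal7yr4y7xxxx3"
    source: DataSource
    target: DataSource
    tasks: List[DataTask] = field(default_factory=list)


def build_job(job_id, source, target, task_types):
    """Hypothetical helper: create a DataJob with one DataTask per task type."""
    return DataJob(job_id, source, target, [DataTask(t) for t in task_types])
```

A job that migrates the schema, copies existing data, and then keeps the target in sync would be built with the three matching task types, in that order.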
Schema Migration
Copies the schema from the source DataSource to the peer DataSource.
Full Data
A one-time or scheduled migration of existing data.
Incremental
Continuously replays source-side incremental operations on the peer DataSource.
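The idea of replaying incremental operations can be sketched as follows. The event format (an operation name, a primary key, and a row) and the in-memory dictionary standing in for the peer table are assumptions for illustration; they do not describe how CloudCanal represents changes internally.

```python
def replay(target, events):
    """Apply change events, in order, to a target table held as {primary_key: row}."""
    for op, key, row in events:
        if op in ("INSERT", "UPDATE"):
            # Writing by key (upsert) gives the same result if an event is applied twice.
            target[key] = dict(row)
        elif op == "DELETE":
            # Deleting a key that is already absent is treated as a no-op.
            target.pop(key, None)
        else:
            raise ValueError(f"unknown operation: {op}")
    return target
```

Applying events strictly in source order matters: swapping an UPDATE and a DELETE on the same key would leave the target in a different final state.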
Verification and Correction
A one-time or scheduled data verification, followed by correction of any differences found.
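A minimal sketch of verify-then-correct, assuming both sides can be read as {primary_key: row} dictionaries. The function names and the three difference categories are hypothetical, and removing rows that exist only on the target is an assumption of this sketch rather than documented CloudCanal behavior.

```python
def verify(source, target):
    """Compare two tables keyed by primary key and report the differences."""
    missing = sorted(k for k in source if k not in target)
    extra = sorted(k for k in target if k not in source)
    changed = sorted(k for k in source if k in target and source[k] != target[k])
    return {"missing": missing, "extra": extra, "changed": changed}


def correct(source, target, diff):
    """Bring the target in line with the source using a diff from verify()."""
    for key in diff["missing"] + diff["changed"]:
        target[key] = dict(source[key])
    for key in diff["extra"]:
        # Assumption: rows present only on the target are removed.
        del target[key]
    return target
```

Running verify again after correct is a cheap way to confirm that no differences remain.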
Custom Code
During Full Data and Incremental tasks, CloudCanal allows users to upload business code (Java code, uploaded as jar packages) to transform, filter, and supplement data.
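The three kinds of processing mentioned above can be illustrated conceptually. Real custom code for CloudCanal is written in Java and uploaded as a jar; the Python sketch below only models the idea, and the row fields (status, email, user_id, region) and the process function are invented for the example.

```python
def process(rows, region_by_user):
    """Conceptual model of custom code: filter, transform, and supplement rows."""
    result = []
    for row in rows:
        # Filter: drop rows that should not reach the target.
        if row.get("status") == "deleted":
            continue
        new_row = dict(row)
        # Transform: normalize an existing column.
        new_row["email"] = new_row["email"].lower()
        # Supplement: add a column looked up from another data set.
        new_row["region"] = region_by_user.get(new_row["user_id"], "unknown")
        result.append(new_row)
    return result
```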
Cluster
A Cluster is the basic unit of DataTask scheduling across machines: a DataTask is only scheduled within a single cluster. A cluster can span racks, data centers, availability zones, and even regions.
A cluster is generally represented by a cluster name similar to clusterl79txxxxku.
Worker
A Worker is a machine that runs DataTasks. It can be a self-built virtual machine (VM), a physical machine, a cloud-hosted virtual machine (ECS, EC2, etc.), or a development machine (Mac, etc.).
A Worker belongs to only one Cluster.
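The two constraints described here (a DataTask is scheduled only within one Cluster, and a Worker belongs to exactly one Cluster) can be sketched as below. The Worker fields, the schedule function, and the least-loaded selection rule are assumptions chosen for illustration; the article does not say how CloudCanal picks a Worker.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Worker:
    name: str
    cluster: str  # a single cluster name: a Worker belongs to only one Cluster
    tasks: List[str] = field(default_factory=list)


def schedule(task_id, cluster, workers):
    """Place a task on a Worker of the given cluster, never on another cluster's Worker."""
    candidates = [w for w in workers if w.cluster == cluster]
    if not candidates:
        raise RuntimeError(f"no worker available in cluster {cluster}")
    # Assumption: choose the Worker currently running the fewest tasks.
    chosen = min(candidates, key=lambda w: len(w.tasks))
    chosen.tasks.append(task_id)
    return chosen
```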
ConsoleJob
A ConsoleJob is the basic component of CloudCanal governance. It targets business logic that involves long-running processes, required retries, and waiting on state.
A ConsoleJob generally consists of 1 to n steps, each of which completes a specific piece of work. When a step fails, the ConsoleJob stops running until the problem is resolved, after which it can be retried or canceled.
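The step-by-step behavior can be modeled as a small state machine. The class below is a sketch under stated assumptions: the state names (PENDING, RUNNING, FAILED, DONE, CANCELED) and the run, retry, and cancel methods are hypothetical, and resuming from the failed step instead of restarting from the first step is a design choice of this example.

```python
class ConsoleJob:
    """Runs steps in order, stops at the first failure, and waits for retry or cancel."""

    def __init__(self, steps):
        self.steps = steps   # list of (name, callable) pairs
        self.next_index = 0  # index of the next step to execute
        self.state = "PENDING"

    def run(self):
        if self.state in ("DONE", "CANCELED"):
            return self.state
        self.state = "RUNNING"
        while self.next_index < len(self.steps):
            _name, action = self.steps[self.next_index]
            try:
                action()
            except Exception:
                # Stop here; completed steps are not repeated on retry.
                self.state = "FAILED"
                return self.state
            self.next_index += 1
        self.state = "DONE"
        return self.state

    def retry(self):
        if self.state != "FAILED":
            raise RuntimeError("only a failed job can be retried")
        return self.run()

    def cancel(self):
        if self.state != "DONE":
            self.state = "CANCELED"
        return self.state
```

Keeping next_index on the job means a retry picks up at the step that failed, which suits long processes where repeating earlier steps would be wasteful.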