Terminology

This article introduces the terminology used in CloudCanal.

DataSource

A DataSource is a configured connection to a relational database (MySQL, PostgreSQL, Oracle, etc.), message middleware (Kafka, RocketMQ, etc.), a cache (Redis, etc.), a real-time data warehouse (Greenplum, Doris, StarRocks, etc.), a big data product (Hive, Kudu, etc.), or the corresponding cloud-hosted product. It generally contains attributes such as the connection URL and login authentication information.

A DataSource is typically represented by an ID similar to my-59bi20aqxxxxx96.

DataJob

A DataJob is the configuration for completing one piece of data migration and synchronization work. It may include a set of Schema Migration, Full Data, Incremental, and Verification and Correction processes (DataTasks) that run successively or simultaneously.

The maximum scope of data tables for a DataJob is a single schema or multiple schemas for relational databases, and multiple topics for message middleware.

A DataJob is typically represented by an instance ID similar to canal7yr4y7xxxx3.

DataTask

A DataJob consists of multiple DataTasks, such as Schema Migration, Full Data, Incremental, and Verification and Correction.

Schema Migration

Copies the schema of the source DataSource to the peer DataSource.

Full Data

One-time or scheduled data migration.

Incremental

Continuously replays source-side incremental operations on the peer DataSource.

Verification and Correction

One-time or scheduled data verification, followed by correction of any differences found.
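The relationship between a DataJob and its DataTasks can be pictured with a small data model. This is only an illustrative sketch, not CloudCanal's internal representation: the class names, field names, and the target DataSource ID are assumptions made for this example.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List


class TaskType(Enum):
    """The DataTask types named in this article."""
    SCHEMA_MIGRATION = "Schema Migration"
    FULL_DATA = "Full Data"
    INCREMENTAL = "Incremental"
    VERIFICATION_AND_CORRECTION = "Verification and Correction"


@dataclass
class DataTask:
    task_type: TaskType


@dataclass
class DataJob:
    job_id: str      # instance ID, similar to "canal7yr4y7xxxx3"
    source_id: str   # DataSource ID of the source side
    target_id: str   # DataSource ID of the peer side (hypothetical value below)
    tasks: List[DataTask] = field(default_factory=list)


job = DataJob(
    job_id="canal7yr4y7xxxx3",
    source_id="my-59bi20aqxxxxx96",
    target_id="my-hypothetical-target",
    tasks=[
        DataTask(TaskType.SCHEMA_MIGRATION),
        DataTask(TaskType.FULL_DATA),
        DataTask(TaskType.INCREMENTAL),
    ],
)
print([t.task_type.value for t in job.tasks])
```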

Custom Code

During Full Data and Incremental, CloudCanal allows users to upload business code (Java code, uploaded as jar packages) to transform, filter, and supplement data.

Cluster

A Cluster is the basic unit of DataTask scheduling between machines: a DataTask is only scheduled within a single Cluster. A Cluster can span racks, data centers, availability zones, and even regions.

A cluster is generally represented by a cluster name similar to clusterl79txxxxku.

Worker

A Worker is a machine that runs DataTasks. It can be a self-built virtual machine (VM), a physical machine, a cloud-hosted virtual machine (ECS, EC2, etc.), or a development machine (a Mac, etc.).

A Worker belongs to only one Cluster.
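The two rules above (a Worker belongs to exactly one Cluster, and a DataTask is only scheduled within a single Cluster) can be sketched as a simple filter. The `Worker` class, the `eligible_workers` helper, and the worker and second cluster names are hypothetical and exist only for this illustration; how CloudCanal actually picks a Worker is not described here.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Worker:
    name: str
    cluster: str  # a Worker belongs to exactly one Cluster


def eligible_workers(task_cluster: str, workers: List[Worker]) -> List[Worker]:
    """Return the Workers that a DataTask bound to task_cluster may run on."""
    return [w for w in workers if w.cluster == task_cluster]


workers = [
    Worker("vm-1", "clusterl79txxxxku"),
    Worker("ec2-1", "clusterl79txxxxku"),
    Worker("mac-1", "cluster-hypothetical-b"),
]
print([w.name for w in eligible_workers("clusterl79txxxxku", workers)])
```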

ConsoleJob

A ConsoleJob is the basic component of CloudCanal governance. It targets business logic that involves long-running processes, required retries, and waiting on state.

A ConsoleJob generally consists of 1 to n steps, each of which completes a specific piece of work. When a step fails, the ConsoleJob stops running until the problem is resolved, after which it can be retried or canceled.
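The stop-on-failure behavior can be sketched as a small step runner. This is a minimal illustration under stated assumptions, not CloudCanal's implementation: the `ConsoleJobSketch` class, its state names, and the step names are all hypothetical.

```python
from typing import Callable, List, Tuple

Step = Tuple[str, Callable[[], None]]  # (step name, action)


class ConsoleJobSketch:
    def __init__(self, steps: List[Step]) -> None:
        self.steps = steps
        self.next_index = 0     # index of the next step to run
        self.state = "PENDING"  # PENDING, FAILED, CANCELED, or DONE

    def run(self) -> str:
        """Run steps in order and stop at the first step that raises."""
        if self.state in ("CANCELED", "DONE"):
            return self.state
        while self.next_index < len(self.steps):
            _, action = self.steps[self.next_index]
            try:
                action()
            except Exception:
                self.state = "FAILED"  # stay on the failed step
                return self.state
            self.next_index += 1
        self.state = "DONE"
        return self.state

    def retry(self) -> str:
        """Resume from the step that failed."""
        return self.run()

    def cancel(self) -> str:
        """Give up on a job that has not finished."""
        if self.state != "DONE":
            self.state = "CANCELED"
        return self.state


attempts = {"count": 0}


def flaky() -> None:
    # Fails on the first call only, to simulate a problem that gets fixed.
    attempts["count"] += 1
    if attempts["count"] == 1:
        raise RuntimeError("simulated failure")


job_run = ConsoleJobSketch(
    [("prepare", lambda: None), ("flaky step", flaky), ("finish", lambda: None)]
)
first = job_run.run()     # stops at the flaky step
second = job_run.retry()  # resumes from the failed step
print(first, second)
```

Keeping the index of the failed step means a retry does not repeat steps that already succeeded, which matters when earlier steps are long-running.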