About Change Data Capture
- Tram Ho
Currently, there are many ways to save change data when performing Insert, Update, Delete such as: triggers , compare source and destination tables, cdc, change tracking, Row version, time stamps, etc. Each solution has pros and cons, and developers need to choose to suit each system This document will introduce cdc
- Mechanism of action
When enabling cdc, we will have 2 jobs. 1 capture job does change logging and 1 clean job clears changes log, to avoid system overload
The first is Job capture: This job will execute exec sp sys.sp_MScdc_capture_job to define the following parameter:
Continuous (default 1): 1 if the capture job is configured to run continuously, 0 if it is only run once
Maxtrans (default 500): maximum number of transactions in each log scan
Maxscans (default 10): number of log scans in each scan phase before pausing (if continuous = 1) or stopped completely (if continuous = 0)
Pollinginterval (default 5): pausing time (seconds) between each scan phase
After defining the params, exec sp sys.sp_cdc_scan. This sp does two things: exec sp sys.sp_replcmds to read the log, and insert the log into the cdc tables. Each time the above two tasks are done, it is a maxtrans scan, also called a log scan session
Since the Capture Process (reads the log and writes the log to the cdc table) reads data from the transaction log, there will be a delay between when the transaction occurs and when the Capture Process records the change as well as when the log is recorded on the cdc .table_CT. We will talk more about delay optimization in the cdc job configuration section below
With Cleanup Job: perform a clean change table and default is to run periodically at 2am so that change data can only be saved up to 3 days. This job will be configured by 2 parameters
Retention: the time (minutes) the log was saved on the cdc . table
Threshold: the maximum number of log records on the cdc table that are deleted at each cleanup run. Each time the clean job runs, there will be multiple cleanups to ensure the Retention value
- Configuration cdc
To enable cdc, we execute enable cdc in the database and on the table that we want to capture change data USE MyDBGO
EXEC sys.sp_cdc_enable_db
GO
USE MyDB
GO
EXEC sys.sp_cdc_enable_table
@source_schema = N’dbo’,
@source_name = N’MyTable’,
@role_name = N’MyRole’,
@filegroup_name = N’MyDB_CT’,
@supports_net_changes = 1
GO
To check the job configuration parameters, select the table msdb.dbo.cdc_jobs
To change the above parameters, execute the following command:
EXEC sp_cdc_change_job @job_type=’capture’, @maxtrans = 500, @maxscans = 10, @continuous = 1, @pollinginterval = 5
EXEC sp_cdc_change_job @job_type=’cleanup’, @retention = 4320, @threshold = 5000
Note: after resetting the above parameters, need to stop and start the capture job again
To disable cdc on board or above database USE MyDB
GO
EXEC sys.sp_cdc_disable_db
GO
USE MyDB
GO
EXEC sys.sp_cdc_disable_table
@source_schema = N’dbo’,
@source_name = N’MyTable’,
@capture_instance = N’dbo_MyTable’
GO
3. Conclude
Thus, we have understood the mechanism of operation and how to configure cdc. In the next article, we will learn how to optimize the parameters in cdc