Replication Guide and Reference

The Replication Tools

IBM Replication is a set of easy-to-use, automated copy tools that replicate relational data from sources to targets. Data can be copied:

Among DB2 for MVS, DB2 for VSE and VM, DB2 Universal Database, and DB2 for OS/400 database servers
Across platforms for MVS, VM, VSE, OS/400, AIX, HP-UX, OS/2, Windows NT, Windows 95, and the Solaris operating environment
On non-DB2 sources and targets through DataJoiner; such as Oracle, Sybase, Informix, Microsoft SQLServer.

You can use IBM Replication to define, synchronize, automate, and manage copy operations from a single control point for data across your enterprise, as shown in Figure 2.

Figure 2. IBM Replication Tools. The IBM Replication tools copy data across your enterprise.

You can tailor or enhance data as it is copied and deliver detailed, subset, summarized, or derived data when and where it is needed. IBM Replication consists of the replication administration features of the Control Center and two tools, the Capture and Apply programs. The Control Center provides administration support for the replication environment with objects and actions that define and manage source and target table definitions. The Capture and Apply programs are responsible for capturing the updates to the source tables and applying the changes to the target tables.

The following sections describe how the IBM Replication tools work interact with each other. Replication concepts related to each tool are also explained.

The Control Center

The Control Center is the database administration tool for the DB2 Universal Database and includes replication administration. The Control Center automates many initialization functions, such as creating target tables and control tables when you create your replication subscriptions. A replication administrator performs the replication administration tasks.

Administering Replication: Overview

You work with three GUI objects, Tables, Replication Sources, and Replication Subscriptions to manage replication sources and targets.

Specific Control Center administration tasks include:

Defining tables as sources, called replication sources
Removing replication sources that are no longer sources for copies.
Defining target table definitions, called replication subscriptions.
Cloning replication subscriptions to other servers.
Removing replication subscriptions no longer in use.

How the Control Center Manages the Replication Environment

The Control Center helps you perform two major tasks: defining replication sources and defining replication targets, or replication subscriptions. Replication sources are DB2 tables, or views, used as sources for copying data to a target table. Replication subscriptions are the specifications for one or more target tables and their location, structure, and timing schedule, as well as any necessary SQL enhancements. Other replication tasks include maintaining the sources and targets, and cloning them to other servers.

When you use the Control Center to perform replication administration tasks, it creates and maintains the replication control tables used by the Capture and Apply programs. These tables are created and stored at the source, control, and target servers. Thus, the workstation where the Control Center is located must have connectivity to all the databases where source tables exist and the Capture program runs (the source server), where the target tables will be created and the Apply program usually runs (the target server), and where the subscription control tables are stored (the control server). The exception is the "occasionally connected," or mobile environment, in which you can create and save as SQL a replication subscription definition, transfer it to another machine, and run it from there.

The Control Center is not used to operate the Capture and Apply programs.

Replication Sources and Targets

You can access your replication source and target definitions through the Control Center. There are three containers that organize the objects that you use to set up and maintain your replication environment:

Tables folder: Contains user-defined DB2 relational tables, as well as system catalog tables. You select the tables in the folder to define replication sources.
Replication Sources folder: Contains tables that have been defined as replication sources. These can be DB2 tables, system tables, views, or target tables redefined as sources for replication.
Replication Subscription folder: Contains replication subscription definitions for copying updated source data to target tables.

Figure 3 shows the Control Center window.

Figure 3. The Control Center and the Replication Folders. The Control Center and the replication folders provide administration tools for replication.

See "Capacity Requirements Analysis" and "Authorization Requirements for Administration" for information about administration planning. See Part 3. "Administering Your Replication System" for information about administration tasks. See DB2 Getting Started for your platform for more information about the Control Center.

The Replication Control Tables

IBM Replication tools use control tables to communicate with each other and to manage replication requests such as defining and managing sources and targets, capturing changes, and replicating changes. These control tables are located at the source, control, and target servers, which are defined in "Terminology".

The replication control tables are created in one of the following ways:

By customizing and running the DPCNTL.* files.
The first time you define a replication object at a database server.

Most customers need to customize the tables for their site requirements or DB2 platform. To customize the control tables, you must edit the DPCNTL.* files and run then before performing any replication requests, such as defining replication sources, at the source or control server. See "Working with Customized Replication Control Tables" to learn how to customize and run the DPCNTL.* files.

SQL Generated for Replication Requests

Whenever you submit a replication request from the Control Center, such as defining a replication source, the processing information is generated as SQL statements. You can choose to run the statements immediately, or you can save the SQL statements to a plain ASCII file, which you can edit and run at a later time from a replication folder object in the Control Center.

Deferred SQL files allow you to customize the replication tasks for your shop or application and give you flexibility as to when and how you run the SQL files. You can batch the definitions together, defer the processing of the replication action until a specified time, and create libraries of SQL files for backup, site-specific customizing, or to run stand-alone at distributed sites, as in a mobile environment. You can rerun the definitions as necessary. See "Customizing and Running Replication SQL Files" to learn how to defer and customize SQL files.

The Capture Program

The Capture program is the replication tool that captures the changed data and makes the changed data available for replication. It runs at the source server database.

Capturing Changed Data: Overview

The Capture program captures changes made to data in replication source tables by reading the database log or journal. It places the captured changes into change data tables. There is only one Capture program at each domain of the database directory, that is, one Capture program in each of the following:

DB2 Universal Database database
DB2 for MVS subsystem in a non-data-sharing environment
DB2 for MVS data-sharing group
DB2 for VM or DB2 for VSE database

The Capture program usually runs continuously, but you can bring it down while running utilities or modifying replication sources. The Capture program runs independently of the Control Center, but uses control information created by the Control Center.

Tasks for the Capture program include:

Starting
Stopping
Suspending temporarily
Reinitializing
Specifying tuning parameters
Pruning

See the Capture and Apply chapter for your Capture program platform for instructions about performing these tasks.

How the Capture Program Captures Changes

DB2 records every transaction in a log file for diagnostic and recovery purposes. The Capture program monitors the DB2 log to detect change records from source tables that both have the DATA CAPTURE CHANGES attribute and are defined as replication sources. The Capture program retrieves change and commit information from the active and archive logs on DB2 for MVS 4.1 or higher and DB2 Universal Database. (1) These records contain a before-image and after-image of the table row. The Capture program captures these changes in the CD tables. The Capture program also maintains information about committed units of work in the unit-of-work (UOW) table. This table is joined with the CD table to identify and replicate committed updates. The Apply program can then read the CD table and copy the changes to the target site and apply them to copies of the source table. Figure 4 shows the relationship between the Capture program, the DB2 log, the source table, and the control tables.

Figure 4. The Capture Program at the Source Server. The Capture program at the source server reads the DB2 log and captures changed data to the CD table.

Updates made to source tables are written to the log when the DB2 page is full. (A DB2 log page is usually 4 KB long.) The Capture program captures changes from this log.

Control Tables Used by the Capture program

The Capture program uses control tables to manage capturing changes from multiple source tables. These tables are created while setting up administration for the Control Center. The control tables used by the Capture program are located at the source server. The following list describes the source server control tables:

Register: Contains the name of the source table and the names of the change data table associated with the source table.
Critical section: Contains the Apply qualifier and is used for concurrency control between the Capture and Apply programs.
Tuning parameters: Contains performance tuning information specific to the Capture program.
Trace: Contains trace information for the Capture program.
Unit-of-work (UOW): Contains committed unit-of-work information used by the Apply program.
Pruning control: Contains information about how far through the CD table the Apply program has progressed in replicating changes to the target tables. The Capture program uses this information to prune the CD table.
Warm start: Contains information about the units-of-work in progress and starting sequence information necessary for restarting the Capture program without performing a full refresh.
Change data (CD): Contains the changed data read by the Capture program from the database log or journal, and is a source for update copies. There is only one CD table for each replication source table.

These tables are described in more detail in Chapter 20. "Table Structures".

How the Change Data and UOW Tables are Pruned

The Capture program begins capturing changes from the DB2 log when requested by the Apply program. The Capture program puts the changes into the CD table; there is one CD table for each defined replication source that has DATA CAPTURE CHANGES enabled. Additionally, the Capture program records the unit of work information in the UOW table.

These two tables have the potential for unlimited growth; therefore, it is important to manage the number of rows to be kept by discarding rows that are no longer needed. This process is called pruning.

Replication Logical Partitioning Key Support

By default, the Capture program captures an update to the source table as an UPDATE statement. In cases a source column change updates a target primary key column, a column in the predicate for a target table, or to a table space partitioning key column, you can have the Capture program capture the updates as DELETE and INSERT statements. This is known as replication logical partitioning key support. You must explicitly set this option while defining replication sources.

Without replication logical partitioning key support, when the primary keys of either the source or target tables are being updated, the Capture program captures the changed row for the update. The Apply program then attempts to make an update to a row on the target table with the new key value. This new key value is not found in the target table, so the Apply program converts this update to an insert. The new row is fine, but the old row with the old key value remains in the table and is unnecessary. When you enable replication logical partitioning key support, the Capture program captures the change as separate DELETE and INSERT statements in the CD table.

For example, the primary key value for a table is the employee last name "Wood" and the employee gets married and changes her last name to "Smith". Without replication logical partitioning key support, when the Apply program updates the employee's last name with a new primary key value, it does not find the value "Smith" and inserts a new row. The row with the primary key value "Wood" is not deleted.

With replication logical partitioning key support, the row with the primary key value "Wood" is first deleted, and a new row with the primary key value "Smith" is inserted.

See "Replication Logical Partitioning Key Considerations" to learn when to use replication logical partitioning key support. See "Defining a Custom Replication Source" to learn how to specify that updates are captured as DELETE and INSERT statements.

Capture for MVS in a Data-Sharing Environment

The Capture program requires exclusive use of the IBM Replication control tables in a subsystem. In a data-sharing environment, tables are shared among multiple subsystems. Consequently, if you are using DB2 for MVS/ESA Version 4.1 or higher, only one instance of the Capture program can run for all subsystems that are members of a data-sharing group.

The Apply Program

The Apply program is the replication tool that replicates copies of the source table data to the target table. You can run it at any server but it is generally run at the target server database.

Applying Changed Data to Target Tables: Overview

The Apply program reads the changed data previously captured and stored in a CD table and applies the changes to target tables. The Apply program also reads data directly from source tables when copying the entire source table data for a full refresh to the target table.

The Apply program generally runs at the target server, but it can run at any server in your network as long as it can connect the source, control, and target servers. Several Apply program instances can run on the same or different servers.

The Apply program runs independently of the Control Center, but uses control information that the Control Center creates. The control information that the Apply program uses is stored in tables at the control server.

See "Types of Copies", and Chapter 20. "Table Structures", for more information about target table types.

Tasks for the Apply program include:

Starting
Scheduling
Stopping

See the Capture and Apply chapter in this book for your Apply program platform for instructions about performing these tasks.

How the Apply Program Replicates Data

When the Apply program reads the changed data stored in CD tables, it applies it to target tables at either local or remote servers. It can also apply column functions, such as SUM and AVG, to the source table or CD table and append the result to the target tables. The Apply program can run at any server that can connect through the SQL CONNECT statement to each database server where source and target tables reside. Figure 5 shows the Apply program's relationship with the source server control tables, the subscription definition control tables, and the target table. The control tables used by the Apply program are described below.

Figure 5. The Apply Program. The Apply program reads the source server tables, manages the subscription control tables, and updates the target tables.

The Apply Qualifier

When the Apply program is started, it is supplied with an arbitrary job qualifier independent from a logon user ID. Known as the Apply qualifier, it is associated with individual replication subscriptions and is responsible for the replication of only those replication subscriptions.

The Apply qualifier allows a user ID to run more than one instance of the Apply program, using different Apply qualifiers.

Control Tables Used by the Apply Program

The Apply program relies on the following control tables to control the replication of source table changes to the target tables:

Source server control tables

The control tables used by the Capture program.

Control server control tables

Apply trail: Records a history of refreshes and updates performed against target tables.
Subscription set: Defines the characteristics of the replication subscription, such as the replication subscription name, the owner of the replication subscription, the Apply qualifier that is associated with the replication subscription, and the timing of the replication subscription.
Subscription statements: Contains the SQL statements or stored procedures to be run before or after the replication subscription is processed.
Subscription events: Contains the timing information for copying the replication subscription based on event triggering. This table is maintained by user application.
Subscription targets members: Identifies each source-to-target pair in the replication subscription.
Subscription columns: Contains supplemental information for the replication subscription control table, such as renamed columns or computed columns.

These tables are described in more detail in Chapter 20. "Table Structures".

The control tables used by the Apply program reside in the source server and the control server.

The Control Server

The control server is the logical server that contains the subscription control tables. Each Apply program is associated with a control server, which is specified during invocation of the Apply program. Multiple Apply program can share a control server.

The control server can be located at the source server, the target server, or any database server that the Apply program can connect to. For better performance, the control server should be located at the server where the Apply program runs because the Apply program frequently reads the tables in the control server. However, locating the control server at the source server, if it is a mainframe database, can provide improved security.

Each control server is associated with one or more Apply qualifiers. The qualifier associates a control server with an Apply program and a replication subscription. This association allows a replication subscription to be serviced by one Apply program and identifies where the control tables for that subscription are located. You cannot easily change the Apply qualifier, so it is important to plan carefully.

Types of Copies

You can define a replication subscription with the Control Center to create the following types of copy tables, known as target table types:

User copy: A complete, condensed copy of the replication source table that must have a primary key.
Point-in-time: A complete, condensed copy of the replication source table at a certain point in time that must have a primary key. This table contains timestamp columns to indicate when a transaction occurred.
Base aggregate: A history table in which new rows are appended for each subscription cycle using the result of an SQL column function calculation against the replication source table data.
Change aggregate: A history table in which a new row is appended for each changed row in the replication source table using the result of an SQL column function calculation against only recently changed data.
Condensed, noncomplete consistent change data (CCD): A small staging table that, when defined locally to the source, is useful for netting out "hot spot" updates before replication to other sites.
Condensed, complete CCD: A full-sized staging table useful for efficient remote staging, which allows for remote copies to be both initialized and maintained without needing to access the original source each time it is updated.
Noncondensed, noncomplete CCD: A table that is initially empty and is appended by each insert, update, and delete action; useful for auditing purposes.
Noncondensed, complete CCD: A table that starts out as a copy of the full-sized source table and is appended by each insert, update, and delete action. It retains all information about the source table and supports "as of" historical queries. For example, it can return an answer to a query as if you had run the query against the replication source table last Tuesday, or a month ago, or yesterday.
Replica: A target table that can be updated. Changes to this table are replicated back to the replication source table. This table is used in update-anywhere scenarios.

The Apply Processing Cycle

In the more common "pull" mode, the Apply program runs at the target server and connects to the source and control servers to pick up changed data and to read the control tables. Table 1 describes how the Apply program completes a replication cycle from the target server.

Table 1. The Apply Program Cycle: A high-level overview of how a replication subscription is processed.

Step Server
1. Look for work; check the subscription control tables. Control server
2. Pick up recent change data to be applied to the target table. Source server
3. Write the answer set into a local "spill" file (possibly an in-memory file). Target server
4. Apply the change data in the spill file to the target table. Target server
5. Update subscription status. Control server
6. Report subscription progress in the pruning control table. Source server

Step	Server
1. Look for work; check the subscription control tables.	Control server
2. Pick up recent change data to be applied to the target table.	Source server
3. Write the answer set into a local "spill" file (possibly an in-memory file).	Target server
4. Apply the change data in the spill file to the target table.	Target server
5. Update subscription status.	Control server
6. Report subscription progress in the pruning control table.	Source server

Full-Refresh and Update Replication

The Apply program copies data from the source to the target either by refresh or update copying. In refresh copying (also known as full refresh), the Apply program copies the entire source table and copies it to the target table. The Capture program does not capture changes, and there are no CD or CCD tables involved. You can specify refresh copying while defining the replication source table. If you have large tables, you might want to use a fastload program to simulate an initial full refresh copy. See the Capture and Apply chapter for your platform in this book for more information about fastload programs such as ASNLOAD.

In update copying, the Apply program copies only the changed data from the CD table to the target table. The first time the Apply program copies data to the target table or after a cold start of the Capture program, the Apply program uses refresh copying to populate the target table (See the Capture and Apply chapter for your platform in this book for more information about cold start). After the target table is populated, update copying is used. This type of copying is the default and is used unless you specify otherwise while defining the replication source. In order to capture changes for a replication source, the DATA CAPTURE CHANGES clause of the CREATE or ALTER TABLE SQL statement must be enabled. The Control Center automatically alters a replication source table for this option, unless you specify refresh copying or the table is ineligible for update copying.

Subsetted Target Tables

IBM Replication supports both vertical and horizontal subsetting of the source table. This means that you can specify that only a subset of the replication source table columns and rows are replicated to the target table, rather than all of the columns and rows.

Vertical Subsetting

In some replication scenarios, you might not want to replicate all columns into the CD table. With vertical subsetting, you can define the CD table as having fewer columns than your base table. This is appropriate if the column data types are not supported.
Attention: Vertical subsetting is not available for replica tables.

You can define vertical subsetting in two ways:

While defining a replication source table, using update copying: Deselect the columns that you do not want available for replication to a target table. The CD table does not contain the columns that you deselected.
Note: CD tables must contain sufficient key data for point-in-time copies, which are maintained with key-qualified predicates.
While defining a replication subscription: Use the advanced subscription options to deselect source table columns that should not be subscribed to for the target table.

Horizontal Subsetting

You can also subset the target table horizontally by specifying a row predicate while defining the replication subscription for the target table. Use the advanced subscription options to define a WHERE clause.

Target table primary keys are assumed to be invariable. If the primary key of the source table is updated, the Capture program places an UPDATE row in the change data table for the source table. When Apply reads this UPDATE row, it first attempts to perform a searched UPDATE, which will likely fail with a "row not found" condition. Apply then automatically reworks the failed update into an INSERT. The target table then can have both the original row with the original key and the updated row with the new key. The original key no longer exists in the source table.
Attention: If one or more of the columns defined in the predicate are updated, you must use the Capture program function that defines updates as DELETE and INSERT statements. See "Defining a Custom Replication Source" to learn how to use this option.

Defining New Columns in the Target Table

One of the advantages of IBM Replication is that it allows you to create new columns in the target tables with the exact information from existing columns in the source table. You can also create a new column in the target table based on:

The computation of one or more existing columns in the source table
Other SQL functions that summarize or average all the rows from a source table (or CD table in the case of a change aggregate table) into a single summary row

A target table containing any aggregate columns with other SQL functions such as AVG and SUM is an aggregate table.

Footnotes:

(1) Capture for MVS can only retrieve the active log on DB2 for MVS 3.1. Capture for VSE and VM 5.1 can read only the active log on DB2 for VSE and VM.

[ Top of Page | Previous Page | Next Page | Table of Contents | Index ]

[ DB2 List of Books | Search the DB2 Books ]