Anticipating the basic networking requirements for replication is easy: connect all of the sources to the targets, and connect the control point to all sources and targets for which it will perform administrative tasks.
Deciding among the various possible connectivity scenarios, estimating how much capacity will be required, and determining what level of data currency will be possible given the current available bandwidth can be very difficult chores, however. This section describes connectivity possibilities, bandwidth impact analysis, and throughput capacity factors that you must consider.
There can be definite trade-offs among storage required, CPU consumed, network bandwidth consumed, and replication throughput achieved. In the planning stages, it is a good idea to consider these aspects and to understand both the available capacity thresholds and the relative priority of each aspect.
The Capture program is always self-contained to a database, subsystem, or data sharing group, and must be able to connect to the source server database. The Control Center workstation must be able to connect to source and target server databases to perform its tasks, and the Apply program component must communicate with both the source and target server databases, when these are different.
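The connectivity requirements above can be written down as a small checklist. The following is a minimal sketch, not part of IBM Replication: the function name `required_connections` and the database names are hypothetical, and it only enumerates which component must reach which database according to the preceding paragraph.

```python
from __future__ import annotations


def required_connections(source_db: str, target_db: str) -> set[tuple[str, str]]:
    """Return the (component, database) pairs that must be reachable.

    Hypothetical helper for planning purposes only. It encodes the rules
    stated in the text: Capture connects to the source; the Control Center
    and Apply connect to both the source and the target.
    """
    # A set removes duplicate pairs when source and target are the same database.
    return {
        ("Capture", source_db),
        ("Control Center", source_db),
        ("Control Center", target_db),
        ("Apply", source_db),
        ("Apply", target_db),
    }


if __name__ == "__main__":
    # Database names here are placeholders, not real systems.
    for component, database in sorted(required_connections("SRCDB", "TGTDB")):
        print(f"{component} -> {database}")
```

When the source and target are the same database, the list collapses to three connections, which matches the statement that Apply needs network connectivity only when the two servers differ.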
Where communications are used, the connectivity is always through DRDA or the DB2 Universal Database equivalent. The actual communications software that can be used to support the DRDA connectivity varies according to the platforms being connected. Between DB2 Universal Database databases, the choices are TCP/IP, SNA, NetBIOS, and IPX/SPX. DB2 Personal Connect Edition (PCE) is required for connections between DB2 Universal Database databases and DB2 for MVS, DB2 for VSE, or DB2 for VM. TCP/IP or SNA can be used with DB2 for MVS 5.1 and PCE 5.1. All other connections use SNA only.
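The protocol rules in the preceding paragraph can be summarized as a simple lookup. This is an illustrative sketch only: the function `protocol_choices`, the platform strings, and the `mvs_and_pce_at_5_1` flag are assumptions introduced here, and the logic is one reading of the rules as stated, not a product API.

```python
UDB = "DB2 Universal Database"
MVS = "DB2 for MVS"


def protocol_choices(platform_a, platform_b, mvs_and_pce_at_5_1=False):
    """Return the communication protocols the text lists for a platform pair.

    Hypothetical planning helper. `mvs_and_pce_at_5_1` should be True only
    when the connection involves DB2 for MVS 5.1 reached through PCE 5.1.
    """
    pair = {platform_a, platform_b}
    if pair == {UDB}:
        # Between DB2 Universal Database databases.
        return ["TCP/IP", "SNA", "NetBIOS", "IPX/SPX"]
    if pair == {UDB, MVS} and mvs_and_pce_at_5_1:
        # DB2 for MVS 5.1 with PCE 5.1.
        return ["TCP/IP", "SNA"]
    # All other connections use SNA only.
    return ["SNA"]


if __name__ == "__main__":
    print(protocol_choices(UDB, UDB))
    print(protocol_choices(UDB, MVS, mvs_and_pce_at_5_1=True))
    print(protocol_choices(UDB, "DB2 for VM"))
```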
The more layers of emulation used, LAN bridges added, or router linkups required, the more restricted the replication performance will be. Planning for both current and future needs is essential.
Communications resources can be a major factor in a replication design that involves staging data at a server different from the source database. For example, in a mobile replication scenario between DB2 for MVS 5.1 and DB2 for OS/2, the best connectivity scenario might be to run TCP/IP over a modem link between the remote OS/2 and an AIX staging platform running Apply for AIX. The AIX database would be connected to DB2 for MVS 5.1 through PCE and TCP/IP.
IBM Replication is designed to allow for low impact to the network. For example, it allows for the replication of changed data only, supports a data staging arrangement, provides for summarization at the source, can be scheduled to run at off-peak times, and uses DRDA, which enables high-speed, secure data delivery. However, replicating data is not free, and one of the key costs is in bandwidth. So, what is IBM Replication going to do to your network?
The Control Center requires only a small amount of network capacity, and its impact is limited to the setup and maintenance of the replication objects.
The Apply program requires network capacity if the target server and the source server are not the same database or subsystem. In general, the capacity required depends on the volume of data to be applied, the time window available in which to apply the data, the desired currency of the target data, and the bandwidth installed or to be installed. For example, if a batch program generates many megabytes of change data and the data must be applied to the target system within 30 minutes, the bandwidth requirements will be higher than if the target can be up to 24 hours out of date. In the latter case, the Apply program could be scheduled to use surplus capacity during periods when network traffic is lighter. For more efficient use of the network capacity, the Apply program is usually installed at the target server so that it can pull the data from the source server. For a more detailed discussion of the differences in pull versus push design, see "Pull Versus Push Apply Design".
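The relationship between change volume, apply window, and bandwidth is simple arithmetic, and a rough calculation can help frame the planning discussion. The sketch below is illustrative only: the function name, the 100 MB change volume, and the 50% link efficiency factor are assumptions chosen for the example, not measurements or IBM Replication parameters.

```python
def required_bandwidth_kbps(change_mb, window_minutes, efficiency=0.5):
    """Estimate the sustained link rate (kilobits per second) needed to move
    `change_mb` megabytes of change data within `window_minutes`.

    `efficiency` is an assumed fraction of nominal bandwidth actually
    available to Apply after protocol overhead and competing traffic.
    """
    if window_minutes <= 0:
        raise ValueError("window_minutes must be positive")
    if not 0 < efficiency <= 1:
        raise ValueError("efficiency must be in (0, 1]")
    bits = change_mb * 1_000_000 * 8      # megabytes -> bits (decimal MB)
    seconds = window_minutes * 60
    return bits / seconds / efficiency / 1000


if __name__ == "__main__":
    # Hypothetical workload: 100 MB of change data per batch run.
    tight = required_bandwidth_kbps(100, 30)         # 30-minute window
    relaxed = required_bandwidth_kbps(100, 24 * 60)  # 24-hour window
    print(f"30-minute window: {tight:.1f} kbps")
    print(f"24-hour window:   {relaxed:.1f} kbps")
```

Under these assumptions, the 30-minute window needs about 889 kbps of nominal bandwidth and the 24-hour window about 18.5 kbps, a factor of 48. Real requirements also depend on the factors discussed in the rest of this section, so treat the result as a starting point for a prototype rather than a prediction.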
Remember that the Apply program is an SQL application and is therefore subject to all of the influences with which any SQL application must contend. Given these factors, the best indicators of likely performance are often found outside the IBM Replication area, in general distributed relational database studies. (4)
Many individual factors influence the throughput possible with IBM Replication. The most important factors include:
Given the complex set of variables involved, you cannot accurately predict the throughput achievable in a given system. Even so, a feasibility study normally needs to include some estimate of the potential throughput.
One way of looking at throughput estimation is to break it down into two parts (assumes a remote pull configuration):
A feasibility study normally includes some estimate of the potential throughput, and developing a prototype is recommended to verify that throughput in an environment that reflects production conditions.
(4) See the following sources for detailed performance measurements: