The LOAD utility is intended for the initial load or an append of a table where large amounts of data are moved. There are no restrictions on the data types used by the LOAD utility including large objects (LOBs) and user-defined types (UDTs). The LOAD utility speeds up the task of loading large amounts of data into a database. LOAD is faster than IMPORT because LOAD writes formatted pages directly into the database while IMPORT does SQL INSERTs. The data being loaded must be local to the server (unlike IMPORT and EXPORT where data can be passed from the client).
In addition to the overview to the LOAD utility above, here are some details about the LOAD utility that may be of interest to you. The LOAD utility also almost completely eliminates the logging associated with the loading of data. In place of logging, you have the option of making a copy of the loaded portion of the table. LOAD does not fire triggers; and does not perform referential, and table, constraint checking (other than validating the uniqueness of the unique indexes). Tables with such options defined may be populated faster, or more simply, using IMPORT. If you have a recoverable database, you can do one of the following:
The LOAD utility can take advantage of a hardware configuration where multiple processors and/or multiple storage devices are used such as in a symmetric multiprocessor (SMP) environment. There are several ways in which parallel processing of large amounts of data can take place using the LOAD utility. One way is through the use of multiple storage devices which allows for I/O parallelism during the LOAD process. Another way involves the use of the multiple processors in an SMP environment which allows for intra-partition parallelism. And both can be used together to provide even faster loading of the data.
The following topics provide more information:
There are multiple phases to the LOAD process: Load, where the data is written into the table; Build, where the indexes are created; and Delete, where the rows that caused a unique key violation are removed from the table. You must run the SET CONSTRAINTS SQL statement after the load completes if there are tables left in the check pending state to validate the table for referential integrity and check constraints. The LOAD utility generates messages about the progress of each phase. If a failure occurs during the LOAD process, these messages will assist you in deciding how to recover.
During the Load phase, data is loaded into the table; index keys and table statistics are collected if necessary. Save points, or points of consistency, are established at intervals specified by you in the SAVECOUNT parameter on the LOAD command. These points of consistency are not established exactly on the number of rows specified with the SAVECOUNT parameter; rather the number of rows are converted to a page count, and rounded up to intervals of the extent size. Messages let you know how many input rows were successfully loaded at the time of the save point. If a failure occurs, you should use the number of input rows at the last successful consistency point with the RESTARTCOUNT parameter during a restart. If the failure occurs near the beginning of the LOAD process and you were doing a REPLACE, you might consider restarting the load using the REPLACE option.
During the Build phase, indexes are created based on the index keys collected in the Load phase. The index keys are sorted during the Load phase and index statistics are collected. The statistics collected are similar to those collected during RUNSTATS. If a failure occurs, the Build phase restarts from the beginning.
Unique key violations are placed into the exception table, if one was specified, and messages on rejected rows are put into the message file. Following the completion of the LOAD process: review these messages, correct any problems, and insert the corrected rows into the table.
Note: | The recording of warnings has a detrimental effect on the performance of the LOAD. If performance is important, and you anticipate a large number of warnings, you should consider using the NOROWWARNINGS filetype modifier. If this filetype modifier was specified, these warnings are suppressed. |
During the Delete phase, all rows causing a unique key violation are deleted. If a failure occurs, this phase should be restarted by you from the beginning. Information on the rows containing the invalid keys is stored in a temporary file. After you request a restart to begin at the Delete phase, the violating rows are deleted based on the information in a temporary file. You must not modify any data in any temporary files. Also, you must restart the LOAD command with the same parameters, otherwise the Delete phase will fail. If the temporary file has been modified, or does not exist, you should restart the LOAD command at the Build phase. Once the index is re-built, any invalid keys are placed in the exception table if it exists, and duplicate keys are deleted.
Since regular logging is not performed, LOAD uses pending states to preserve consistency of the database. The Load and Build phases of the LOAD process place any associated table spaces into a load pending state. The Delete phase of the LOAD process places any associated table spaces into a delete pending state. If you complete the LOAD process but you do not have either logretain or userexit "on"; and, you have not specified the COPY YES option nor the NONRECOVERABLE option, then any associated table spaces are placed in a backup pending state. These states can be checked by using the LIST TABLESPACES command. (For more information on LIST TABLESPACES, see the Command Reference manual.) One last possible state associated with the LOAD process is concerned with referential and check constraints. Dependent tables may be placed in a check pending state following the completion of the LOAD process.
LOAD can also be used with a non-recoverable option. This allows you to perform a non-recoverable LOAD without affecting the recoverability of all other tables in the database. When this type of LOAD is run, there is no requirement for either using the COPY YES option or having a backup taken.
If a LOAD fails, the table space(s) involved could be in an inconsistent state because there is no logging. For this reason, the table spaces are left in a load pending state. To remove the load pending state, you will have to restart the LOAD, perform a LOAD REPLACE on the same table on which the LOAD failed, or recover the table space(s) using a RESTORE with the most recent backup (either table space or database backup) and then carry out further recovery actions. (You could also drop the table space and then re-create it.)
For more information on how to recover, see Chapter 6. "Recovering a Database".
The LOAD utility inserts data into a table from an input file, from a device, or using a named pipe, any of which must reside on the node where the database resides. The table must exist. (Indexes on the table may or may not exist. LOAD only builds indexes that are already defined on the table.) If the table receiving the new data already contains data, you can replace or append to the existing data.
The table spaces in which the loaded table resides are quiesced in exclusive mode. For more information on QUIESCE, see the Command Reference manual.
Note: | In the following command line processor example, and in the other in the other examples in this chapter, relative path names are used. Please be aware that relative path names are only allowed on calls from a client on the same node as the database. The use of fully-qualified path names is recommended. |
The following is an example of the command line processor syntax for the LOAD command:
db2 load from stafftab.ixf of ixf messages staff.msgs insert into userid.staff copy yes use adsm data buffer 20
This example assumes no indexes are involved, any warning or error messages are placed in staff.msgs, a copy of the changes made is stored in ADSTAR Distributed Storage Manager (ADSM), and 20 pages of buffer space are to be used during the load. See "ADSTAR Distributed Storage Manager" for more information on using ADSM.
If indexes are involved, the following is an example of the command line processor syntax for the LOAD command:
db2 load from stafftab.ixf of ixf messages staff.msgs remote file stafftmp replace into staff using . sort buffer 200
This example is similar to the previous one except that: stafftmp is used as a base file name for temporary files such as "stafftmp.msg", "stafftmp.log", and "stafftmp.rid"; the current directory of the user is used as the working directory; and 200 pages of buffer space are used to sort index keys. For more information on the LOAD command, see the Command Reference manual.
Since you will typically be loading large amounts of data using the LOAD command, a LOAD QUERY command can be used to check the progress of the LOAD process if the REMOTE FILE option has been specified. You require a connection to the same database and a separate CLP session to enter this command. This command can be used by local and remote users. For more information on the LOAD QUERY command, see the Command Reference manual.
The LOAD utility can also be invoked by the application programming interfaces (APIs) sqluload and sqlgload. For more information on the requirements when loading data to a table using these APIs, see the API Reference manual.
You should also review the following points. Each represents a task that you may need to perform and each is carried out as part of the LOAD command. For more information on any of the following tasks, see the API Reference and/or the Command Reference manuals.
For more information on how to recover, see Chapter 6. "Recovering a Database".
The remote file resides on the server machine and is accessed by the DB2 instance exclusively. Therefore, any filename qualification given to this parameter must reflect the directory structure of the server, not the client, and the DB2 instance owner must have read and write permission on the file. In addition, the user must ensure that two loads are not issued having the same fully-qualified remote filename.
There are several ways in which the remote filename may be selected and qualified when the user has just given a partially qualified name, or no name at all.
Notes:
You should use a remote filename if you wish to use the LOAD QUERY tool. It is difficult to determine the remote filename if you do not provide the one to be used. For more information on LOAD and the importance of the filename, see "Restarting the LOAD".
To load data into a table, you must have either SYSADM or DBADM authority.
There are restrictions and limitations with the LOAD utility:
Note: | Options only available with this release of DB2 cannot be used with a server from the previous release. |
The LOAD utility optionally updates table and index statistics as part of the load process if run in REPLACE mode. If data is appended into a table, statistics are not gathered for the table. Run RUNSTATS following the completion of the load process to collect up-to-date statistics for the table. If gathering statistics on a table with a unique index, and duplicate keys are deleted during the Delete phase, then the statistics are not updated to account for the deleted records. If you think you will have duplicate records, you should not collect statistics during the LOAD but run RUNSTATS after the LOAD process.
To ensure the loaded data doesn't cause referential integrity or check constraint violations, any loaded table that is a parent table will cause all dependent tables to be placed in a check pending state. To remove the table from the restricted state, you must run the SET CONSTRAINTS statement.
With LOAD, there is a possibility of unequal code page situations involving possible expansion or contraction of the character data. Such situations could occur with Japanese or Traditional-Chinese Extended UNIX Code (EUC) and double-byte character sets (DBCS) which may have different length encodings for the same character. An option, NOCHECKLENGTHS, is used to toggle between two situations:
The performance of LOAD depends on the nature and size of the data, the number of indexes, the options used, and whether the SET CONSTRAINTS statement is required. Use of SET CONSTRAINTS lengthens the total time needed to load the table and make it usable again. (For more information on the SET CONSTRAINTS statement, see the SQL Reference manual.)
The LOAD utility performs almost equally well in either INSERT mode or REPLACE mode.
Index creation will reduce the performance of the load process, especially when data is added to a table already containing data. If there are many indexes on a table which already has a large amount of data and only a small percentage of data to be loaded, you should consider using the IMPORT utility instead of the LOAD utility. Unique indexes also reduce the performance of the load process if duplicates are encountered. In most cases it is still more efficient to create the index during the LOAD than to complete the LOAD and then use the CREATE INDEX command for each of the indexes.
The LOAD utility automatically attempts to provide the best performance possible by determining the best values for DISK_PARALLELISM, CPU_PARALLELISM, DATA_BUFFER, and SORT_BUFFER if these parameters have not be specified by the user at the time the utility is run. Optimization of these values is done based on the size and the free space available in the utility heap. Consider allowing the LOAD utility to choose the values for these parameters and then attempt to tune the parameters for your particular needs.
Performance of the LOAD can be improved by:
Note: | The additional performance improvements would be to the create index time, but may hurt in the query performance time if the original data was in clustered index sequence. |
Note: | When tables include either LOB or LONG VARCHAR data, CPU_PARALLELISM is set to one. Parallelism is not supported in this case. |
The COPY YES/NO option specifies whether to create a copy of the input data during LOAD or not. If "YES" is chosen, performance is reduced because all the data being loaded is copied at the same time. This choice is faster than accepting a backup pending state and having to do a backup later before accessing the table. If "NO" is chosen, and forward recovery in enabled, then the table is placed in a backup pending state.
When creating indexes during the LOAD, you require at least as much disk space as the sum of the index sizes and possibly twice as much. The space used is temporary; that is, it is located outside the database in the directories specified for the USING parameter or in the tmp directory defined by the DB2INSTPROF environment variable.
A LOAD can be restarted following a failure. A copy image of the loaded data can be created for use when recovering a database.
The following discuss these considerations in more detail:
If a failure occurs while loading data, you can restart the load from the last save point or point of consistency; or reload the entire table by using the REPLACE option.
The remote file specified in the LOAD restart operation should be the one that was specified in the LOAD command that failed.
There are a number of options available should you decide to restart the load.
If you decide to restart using the RESTARTCOUNT number option, then you must use the number of rows at the last successful consistency point. To determine that value, use the LOAD QUERY command with either the name specified with the REMOTE FILE option or the default name db2utmp. By choosing RESTARTCOUNT number, the LOAD restarts from the row following the row identified by the number and attempts to finish the load.
Note: | The RESTARTCOUNT number can only be used with the last successfully completed consistency point. If the last consistency point started but did not complete (that is, SQL3519W is not followed by SQL3520W), then you must carry out the action as described in the help for message SQL3519W. |
If you do not want to continue loading rows, or if the failure was during the Build phase, you can use the RESTARTCOUNT B option. The LOAD process brings the table to the state of the last save point or point of consistency and then restarts the Build phase. By choosing this option the LOAD restarts, does not attempt to load additional rows, and builds the indexes for the rows already loaded.
If the message file states that the Build phase completed and all temporary files are unmodified as left by the LOAD, you can use the RESTARTCOUNT D option. The information on the rows containing duplicate keys stored in the temporary files is used to delete those rows.
The restarted LOAD command should continue until the completion of the LOAD process.
Note: | For minor errors such as nonexistent data files or an invalid dcoldata, the LOAD will clean up and take the table out of the load pending state. You must do the LOAD again in either REPLACE or INSERT mode with correct parameters. |
If the table being loaded is part of a recoverable database, then logging is in effect. Since LOAD does not log the changes made to the table, you have the option of specifying COPY YES to create a copy of the data being loaded. This copy is used during roll-forward recovery to re-create the changes to the database done by LOAD.
When using this option you should also consider using multiple devices or directories to allow for the best possible I/O exploitation.
For more information on the load copy location file, see "Using the Load Copy Location File".
If forward recovery is enabled (logretain or userexit is "on") and the COPY option was not used, all table spaces in which the loaded table resides are left in a backup pending state. A backup of the database or the table space(s) is required to remove this pending status. The backup is done before any other units of work against the database or table space can be started.
For more information on how to recover, see Chapter 6. "Recovering a Database".
During forward recovery, if the load copy is not available, then the table spaces (of the loaded table) which are being rolled forward will be set to the restore pending state. These table spaces must be restored from a backup image taken after the table load.
An error is returned if you specify NONRECOVERABLE and COPY YES. There is no need for a copy in such a case since it would not be needed during recovery.
The exception table is a user-created table which mimics the definition of the table being loaded. It is specified by the FOR EXCEPTION option on the LOAD command. The table is used to store copies of rows that violate unique index rules.
Note: | Any rows rejected before the building of the index on the loaded table because of invalid data are not inserted into the exception table. |
Rows are added to the existing information in the exception table. The existing information may include rows listing check constraint or foreign key violations; or invalid rows from a previous LOAD. If you want only the invalid rows from this LOAD, you will need to remove the existing rows before invoking LOAD.
The exception table used with the LOAD utility is identical to the exception table(s) used by the SET CONSTRAINTS statement. An exception table should be created to perform a load which has a unique index and may have duplicate records. If an exception table is not provided for the LOAD, and duplicate records are found, then the LOAD will continue. However, only a warning message is issued about the deleted duplicate records and the deleted duplicate records are not placed anywhere.
After completing the load, information in the exception table can be used any way you wish. You may want to use the information to correct any data that was in error and insert the rows into the original table.
The exception table message column has the following structure:
Table 22. Exception Table Message Column Structure for LOAD
Field number | Contents | Size | Comments | ||
---|---|---|---|---|---|
1 | Number of violations | 5 characters | Right justified padded with '0' | ||
2 | Type of first violation. Only "I" is used by LOAD | 1 character | 'I' - Unique Index violation | ||
3 | Length of constraint/index token | 5 characters | Right justified padded with '0' | ||
4 | Constraint/index token | length from the previous field |
| ||
|
The loaded table may be in the check pending state if it has table check constraints or referential integrity constraints defined on it. The STATUS flag of the SYSCAT.TABLES in the row corresponding to the loaded table indicates the check pending state of the table. For the loaded table to be usable, the STATUS must have a value of "N" indicating the normal state of the table.
To remove a table from the check pending state, use the SET CONSTRAINT statement. For more information on the SET CONSTRAINTS statement, see the SQL Reference manual. One or more tables may be submitted to be checked in a single invocation. For a dependent table to be checked, the parent table must not be in the check pending state. In the case of a referential integrity cycle, all the tables involved in the cycle must be included in a single invocation of the SET CONSTRAINTS statement.
To manage the loading of several tables, consider the position of each within referential relationships along with the table size and time windows available to carry out the load. It may be convenient, for example, to check the parent table for check constraint violations while the dependent table is loaded. This can only occur if the two tables are not in the same table space.
Exception tables are convenient for a consolidated report of all the rows that have constraints violated. If the exception table option is not used, only the first violation is reported. This may be a cause for frustration when dealing with large tables having more than one constraint violation. The same exception table used for the LOAD utility may be used for checking constraint violations. As with the LOAD utility, there is no checking done when running the SET CONSTRAINTS statement to ensure that the exception table is empty. The extra timestamp column in the exception table may be used to distinguish newly-inserted rows from the old ones, if necessary.
The SET CONSTRAINTS statement does not activate any DELETE triggers as a result of deleting rows that violate constraints. It must be noted, however, that once the table is removed from the check pending state, triggers are active. This implies that, if we correct data and INSERT rows from the exception table into the loaded table, any INSERT trigger defined on the table will be activated. The implications of this on the data should be considered and, if necessary, suitable action should be taken. One option is to drop the INSERT trigger, INSERT rows from the exception table, and then recreate the INSERT trigger.