Administration Guide

Using the AutoLoader Utility

In a partitioned database, partitioning keys are used to determine the database partition where the data resides. Therefore, data must pass through a splitting phase before it can be loaded at the correct database partition.

The entire "split and load" process is accomplished by the AutoLoader utility which uses the hashing algorithm to partition the data into as many output pipes as there are database partitions in the nodegroup in which the table was defined. It then loads these output pipes concurrently across the set of database partitions in the nodegroup. A key feature of the AutoLoader utility, is that it uses pipes for all data transfer required in the split and load process. It also uses multiple database partitions for the splitting phase, thereby improving the performance significantly.

The AutoLoader utility may be run in one of the following modes:

SPLIT_AND_LOAD: In this mode, data is split and then loaded on the correct database partitions. Pipes are used for temporary storage and transfer of data.
SPLIT_ONLY: With this choice, the data is only split. A set of split data files are generated for the requested output database partitions. You must have sufficient storage for the original input source and for each of the split data files. The output from the split function writes the files in the location pointed to by the parameter SPLIT_FILE_LOCATION or in the AutoLoader current working directory. Data is split into separate files that are named using the convention filename.xxx where xxx is the node number to which the split file belongs and filename is no longer than eight characters.
LOAD_ONLY: Data is expected to be already split into separate files that are named using the following convention filename.xxx where xxx is the node number to which the split file belongs and filename is no longer than eight characters. AutoLoader expects to find these files in the SPLIT_FILE_LOCATION or in the current AutoLoader working directory. These split files are loaded concurrently on their corresponding nodes.
ANALYZE: This option generates a customized optimal partitioning map for a nodegroup. It is recommended that a data file with a large number of records be specified as input. The output from the ANALYZE mode can be used with the MAP_FILE_INPUT parameter. The larger the number of records used, the better the representation to the actual data that can be analyzed, and the better the resulting new partitioning map. The map will produce a more even distribution of data across each of the database partitions in the nodegroup.

See Appendix O. "Supplemental AutoLoader Information" for more information on database partitions and for platform-specific usage notes. Before using the AutoLoader utility, you should be familiar with the LOAD utility. To learn more about the LOAD process, see "Using the LOAD Utility".

Planning to Use the AutoLoader Utility

The AutoLoader utility requires the main autoloader driver db2autold and other related executable files each located in the misc sub-directory under the sqllib sub-directory. A sample configuration file, autoloader.cfg, is found under the autoloader sub-directory of the samples sub-directory of the sqllib sub-directory.

Before using the AutoLoader utility, you should:

Read Appendix O. "Supplemental AutoLoader Information" which shows specific details for using the AutoLoader utility.
Create a temporary directory and move the autoloader.cfg file into it. This directory must be accessible from all the participating split and load database partitions.
Modify the autoloader.cfg file according to the directions included in the file.
You should test the AutoLoader with small amounts of data first to get familiar with the utility.

Running the AutoLoader Utility

The AutoLoader utility is executed by typing the following command:

    db2autold [options]

with one or more of the following options:

-c: Uses the config_file specified as the configuration file for the AutoLoader utility. The default is 'autoloader.cfg'.
-d: Cleans up the temporary resources allocated by the AutoLoader utility. In case the AutoLoader utility exits abnormally, it is necessary to run this option to clean up all associated temporary directories, files and processes.
-i: This makes the cleanup interactive. By default, cleanup is done without a prompt. This must only be used with the -d option.

The AutoLoader utility creates a file called autoload.log to keep messages returned from the main autoloader script. You can check the contents of this file to track the progress of the AutoLoader utility. As well, AutoLoader creates files called load_log.XXX and splt_log.XXX which contain messages from the load and split processes respectively.

Some Considerations with AutoLoader

There are some items you should consider before using the AutoLoader utility:

If you wish to maintain the order of the input data, then only one database partition should be used for splitting. Parallel splitting cannot guarantee that the data is loaded in the order it was received.
If you are using the LOBSINFILE option of the LOAD utility, then all directories containing the LOB files have to be accessible from all the database partitions being loaded. See Appendix O. "Supplemental AutoLoader Information" for information on how this is done.
For optimal performance, the AutoLoader utility should be invoked on a database partition that is not participating in either the splitting or loading operations. As well, if the splitting nodes are different from the loading database partitions, there is less contention for CPU cycles on any one database partition. On a SMP system, you can improve performance if you ensure at least one CPU for every split to be done.
If you are using multiple database partitions to split the data and the AutoLoader utility is used with a savecount of greater than zero then an error is returned. This ensures that you can restart the AutoLoader utility since the arrival order of the records cannot be determined when you use multiple splitting database partitions.

Sample AutoLoader Configuration File

 ######################################################################
 # Release level of this configuration file.
 # Please do not delete or modify this line.
 Release=V5.1.0

 # LOAD Command
 # - Specify a complete LOAD command including the file_name, file_type and
 #   the table_name.
 # - It may be necessary that the load command is double-quoted if it includes
 #   special shell characters, like round brackets ( or ).
 #
 # - Refer to the Command Reference for the complete syntax.

 # DEL data file
 # -------------
 db2 load from your_data of del replace into your_schema.your_table

 # ASC data file
 # -------------
 # db2 load from your_data of asc modified by reclen=19\
 # method L "(1 16, 17 18)" replace into your_schema.your_table

 ######################################################################
 #
 # Miscellaneous AutoLoader Parameters
 
 # DATABASE ... Name of the database being loaded into.
 DATABASE=your_db
 
 # HOSTNAME ... Name of the machine to FTP the data file from. This may be
 #              an MVS host or another workstation. Make sure that the .netrc
 #              file is set up accordingly. Please comment out with a # or
 #              leave blank if file is local.
 #HOSTNAME=

 # SPLIT_FILE_LOCATION ... The complete path name of the location
 #               - to place the split files for SPLIT_ONLY mode
 #               - to look for split files if in LOAD_ONLY mode
 #               If not specified, and in SPLIT_ONLY mode, the split
 #               files are placed in the current working directory.
 #               If not specified, and in LOAD_ONLY mode, the
 #               AutoLoader utility looks for the split files in the
 #               current working directory.
 SPLIT_FILE_LOCATION=/u/user/

 # OUTPUT_NODES ... Database partitions on which load is to be performed.
 #                  The supplied node numbers must be a subset of database
 #                  partitions on which the table is defined and must also
 #                  exist in the db2nodes.cfg file. If left blank, all
 #                  database partitions that the table is defined on will
 #                  have data loaded into them.
 OUTPUT_NODES=(0,1)

 # SPLIT_NODES ... The list of database partitions participating in the
 #                 splitting process. Splitting database partitions may be
 #                 the same or different from the database partitions being
 #                 loaded into. These database partitions must also exist in
 #                 the db2nodes.cfg file. If left blank, all database
 #                 partitions that the table is defined on will be used for
 #                 splitting.
 SPLIT_NODES=(0)

 # RUN_STAT_NODE ... If "statistics yes" is specified in the LOAD command,
 #               then statistics will be collected only on one database
 #               partition. This parameter specifies the database partition
 #               you wish to collect statistics on. If left blank or -1,
 #               the default is the first database partition in output node
 #               list.
 RUN_STAT_NODE=-1

 ######################################################################
 #
 # Optional AutoLoader parameters ... These may or may not be specified.
 
 # MODE ... Specify the mode to run AutoLoader in.
 #          SPLIT_AND_LOAD is the default.
 #
 #          Other valid values are:
 #          SPLIT_ONLY ... Load process is not performed. Output from the
 #                         splitting database partitions is written to files
 #                         in the SPLIT_FILE_LOCATION or in the current
 #                         AutoLoader working directory.
 #
 #          LOAD_ONLY  ... Data must be pre-split. The split files are sent to
 #                         correct database partition for loading.  The split
 #                         filenames must follow the convention filename.xxx
 #                         where filename was provided in the LOAD command and
 #                         xxx is the nodenumber. Also, it is assumed that
 #                         filename.xxx is in the SPLIT_FILE_LOCATION or
 #                         in the current AutoLoader working directory.
 #
 #          ANALYZE    ... This option is used to generate an optimal
 #                         partition map for a nodegroup.
 MODE=SPLIT_AND_LOAD

 # LOGFILE ... This name is used as a base name to create the following files.
 #                autoload.LOG  ... used to track progress of AutoLoader.
 #                splt_LOG.XXX  ... Holds output from splitting operation on
 #                                  database partition xx.
 #                load_LOG.XXX  ... Holds output from load operation on
 #                                  database partition xx.
 LOGFILE=LOG
 
 # NOTNFS_DIR ... The path name to the not-nfs space on each database
 #                partition. If not specified, it is "notnfs" with
 #                respect to the root directory.
NOTNFS_DIR=/notnfs

 ######################################################################
 #
 # Optional Splitter parameters ... These may or may not be specified.
 
 # CHECK_LEVEL ... Can be either CHECK or NOCHECK
 #                 - CHECK: Program will check for truncation of record at
 #                   Input/Output (default).
 #                 - NOCHECK: Program will not check for truncation of record
 #                   at Input/Output.
 CHECK_LEVEL=CHECK

 # MAP_FILE_INPUT ... Input filename for the partitioning map. If the
 #                    partitioning map is customized rather than a
 #                    default one, this parameter must be specified.
 #                    It points to the file containing the customized
 #                    partitioning map. You can get a customized
 #                    partitioning map by either using the db2gpmap program
 #                    to extract the map from the database system catalog
 #                    table; or you can run the ANALYZE mode db2autold
 #                    to generate an optimal map. The map generated by the
 #                    ANALYZE mode must be moved to each database partition
 #                    in your database before actual loading can proceed.
 #
 MAP_FILE_INPUT=filename_of_your_customized_partitioning map

 # MAP_FILE_OUTPUT ... Output file name for partitioning map. This parameter
 #                     should be used with the db2autold program executed
 #                     in ANALYZE mode. An optimal partitioning map with even
 #                     distribution across all database partitions is
 #                     generated. If it is not specified and the running mode
 #                     is ANALYZE, a default filename "OutMap" is used.
 MAP_FILE_OUTPUT=filename_for_your_optimal_partitioning_map

 # TRACE ... Tracing hashing values. Dump of all the data conversion process
 #           and output of hashing values. Argument is the number of records
 #           to trace.
 TRACE=100

 # NEWLINE ... Only meaningful if the input data file is an ASC file with each
 #             record delimited by a new line character, and the RecLen
 #             parameter in the load command is specified.
 #             If YES, AutoLoader always checks if the record is terminated
 #             by a new line character or not. It also checks if the record
 #             length is the same as the expected RecLen or not. The default
 #             for this parameter is NO.
 NEWLINE=YES

Loading into Multiple Database Partitions

If you are loading data into a table in a multiple database partition nodegroup, the LOAD utility requires that the files that are to be loaded were split and contain the correct header information. The LOAD utility verifies the header information that the split operation of AutoLoader writes to each data file to ensure that the data goes to the correct location. (The header information is described in Appendix N. "Splitting Data with db2split".)

If you are loading data into a table in a single database partition nodegroup, the files do not have to be split, even if the table is defined to have a partitioning key. In this situation, you would specify the NOHEADER option of the LOAD utility.

The LOAD utility checks that the partitioning map used by the split operation of AutoLoader is the same one specified when the table is being loaded. If not, an error is returned. It also checks that the file partition is loaded at the correct database partition, and that the data types of the partitioning key columns specified during splitting match the current definition in the catalog. The nodegroup to which the table is loaded cannot be redistributed between the time that the data file is partitioned and the time that the parts are loaded into the corresponding database table. If redistribution has been done, the utility cannot load the partitioned data.

LOAD supports the following flat file formats:

Non-delimited ASCII (ASC)
Delimited ASCII (DEL)
PC/IXF Format (IXF)

However, AutoLoader can only be used to split DEL and ASC files.
Note: IXF files cannot be split, but can be loaded into a single-node nodegroup using the 'NOHEADER' option in the LOAD command.

[ Top of Page | Previous Page | Next Page | Table of Contents | Index ]

[ DB2 List of Books | Search the DB2 Books ]