IBM Books

Troubleshooting Guide


Troubleshooting Tips

This section describes how to deal with some frequently encountered problems faced by users working with a DB2 server. It addresses the following topics:

For additional troubleshooting information on this topic, see:

Important: This section represents a small sampling of the information available from DB2 Customer Service. For a complete and up-to-date source of DB2 information, use the DB2 Product and Service Technical Library at http://www.software.ibm.com/data/db2/library/.

Migrating to Version 5

If you experience problems migrating to Version 5, ensure that:

[  ]
You followed the instructions in the Quick Beginnings guide.

[  ]
You checked the list of incompatibilities from previous versions. (See the Administration Guide.)

[  ]
The database is consistent.

To verify that databases are consistent so that they can be migrated, use the db2ckmig command. (See the Quick Beginnings book for details.)

You must correct all errors that are reported for databases that are in any of the following states:

Additionally, this tool identifies databases that contain objects that use SYSCAT, SYSSTAT, or SYSFUN as schema names. These objects must be dropped and re-created using a different schema name.

[  ]
The disk space on the machine is adequate. (The amount of disk space required varies with each database.)

[  ]
You use appropriate values for the number of primary and secondary log files (logprimary and logsecond in database configuration), and for the size of the log files (logfilsiz in database configuration). See "Server Fails" for details.

[  ]
If migrating from a Version 1 DB2 for OS/2 database, you use the db2resdb command.

Server Fails

Symptom
The DB2 server does not function correctly.

Action
Ensure that:

[  ]
You check the db2diag.log file on the server to diagnose a cause. For information, see "Interpreting the db2diag.log File".

[  ]
DB2COMM is set to the protocols required by connecting clients. (If not, the message SQL5043N is usually received.)

Consult the db2diag.log file to see which communication protocols were not started successfully, and why. For information, see "Using the db2diag.log file to diagnose server communication problems".

[  ]
There is enough log space. (If not, the SQL0984C message is typically received.)

To increase log space:

  • Increase the number of primary and secondary log files (logprimary and logsecond in database configuration)

  • Increase the size of the log files ( logfilsiz in database configuration)

To initiate your changes, you must disconnect all applications from the database and/or deactivate the database.

You must ensure that the file system or disk space that contains the logs is big enough to hold all of the logs. To calculate the minimum required space for initializing the logs, use the following formula:

(logprimary + logsecondary) * logfilsiz * 4096

[  ]
You are using the latest version of DB2, with appropriate fix packs installed. For information, see "Updating DB2 Products".

[  ]
The index files are valid.

The db2diag.log file indicates if indexes need to be re-created. Use db2dart /MI to mark the indexes as inconsistent. See "Miscellaneous Troubleshooting Tools" for details on db2dart /MI.

[  ]
The data is consistent.

Look for the following symptoms:

  • There is a DIA3700C error in the db2diag.log file.

  • The server goes down when particular data is accessed. This data resource may be damaged.

  • The server fails repeatedly.

Run db2dart to verify that the data is consistent. If data is not consistent, contact DB2 Customer Service.

For Extended Enterprise Edition, see also "Database Manager Won't Start or Hangs".

Can't Administer Databases

Symptom
You have problems installing or administering a database server.

Action
Ensure that:

[  ]
You are using a valid user name and password that have administrator authority.

  • For OS/2 Warp Connect and later versions of OS/2: The default DB2 administrator user name and password are the same as those specified when the operating system was installed.

  • For versions of OS/2 before OS/2 Warp Connect: The default administrator user name and password depend on the national language installed on a machine, and on whether UPM was installed with a particular user name. In most cases, the defaults are USERID and PASSWORD in English countries. For more information, see the Quick Beginnings for your platform.

  • For Windows NT: Your membership in a group controls which actions you are permitted to perform. For DB2 for Windows NT, you must belong to the "Administrators" or "Domain Admins." groups to perform installation and some administration tasks.

    You can find out what your group is by using the User Manager tool to display user names and group memberships. To invoke this tool, select Start, then Programs, then Administrative Tools, and then User Manager.

    To install DB2, you must have administrator authority on the local machine with a user name that is eight characters or less (note that you cannot use the user name "Administrator"). To be added to the administrator group, you must be added by an existing member of that group.

  • For UNIX-based environments: You must belong to a SYSADM group. The name of the group must be eight characters or less in length.

  • For DB2 for Solaris: If a DB2 instance is started by a user who does not belong to the same group as the DB2 instance owner, the SQL1042C message is received for many commands. In this situation, the executing db2sysc process inherits the user's group name and does not have the correct read permission to open a file in the /proc directory.

    You must start the instance as the instance owner, and the id/group of the db2sysc process attempting to read a file must be the same as the id/group of the DB2 instance.

[  ]
For UNIX-based environments other than AIX, you update and rebuild the kernel configuration parameters. (If not, the SQL1016N or SQL1018N messages will typically occur when you try to create an instance or use the command line processor.) See the Quick Beginnings guide for your platform for details.

Backing Up and Restoring Data

It is important to have a good backup and recovery strategy, and a plan to implement this strategy. This plan should address questions such as the following:

Backup plans are heavily influenced by the importance of having the data in the database available for use. If your business depends heavily on the data, then you must develop a backup and recovery plan that will minimize downtime and ensure that the data is available when needed at a cost that you are willing to accept.

For help on backup and recovery see the Administration Guide and use the Backup Database SmartGuide from the Control Center.

SQL0902C received when backing up a database or accessing DB2 data

Symptom
The SQL0902C message is received when a database is backed up.

Possible Cause
The database is not architecturally consistent.

Action
Use the db2dart command to verify the architectural consistency of the database. See "Miscellaneous Troubleshooting Tools" for information on this command.

Usually you can perform a restore with a roll forward to the end of the logs to correct the problem. If the problem persists, contact DB2 Customer Service.

Can't restore a database or table space backup

Symptom
Data cannot be restored.

Action
Ensure that:

[  ]
You have enough disk space to restore the data.

Use redirected restore to specify a modified list of containers for the table spaces being restored. See the Administration Guide.

[  ]
You specify the correct path to the backup image and logs. (It is possible that they were moved.)

[  ]
If an online backup was taken, all logs from the beginning to the end of the backup are required to restore and roll forward the database.

(The rollforward must proceed until at least the point where the online backup ended. Otherwise, the database is not accessible.)

Loading Data

Symptom
Data will not load on a database server.

Action
Ensure that:

[  ]
You used the RESTART or REPLACE parameters with the LOAD command. (If not, the SQL3805N message is typically received.)

For more information on the LOAD command, see the Command Reference.

[  ]
The size of a database object in an SMS table space has not reached an operating system limit.

For example, OS/2, Windows NT, and Version 3.2 of AIX have a 2 GB file size limit.

(Refer to "Problems with Splitting and Loading Data" for more information on loading data in Extended Enterprise Edition.)

Importing Data

Symptom
You are having problems importing data.

Action

[  ]
Ensure that there is enough log space available. (See "Server Fails".)

[  ]
Use the COMMITCOUNT n option to commit data after every n records are imported. This option protects the committed data from loss in case of a failure. It also reduces the log requirements for an import operation, allowing a number of smaller transactions rather than one long transaction.

Can't Use Commands, Utilities, or the Command Line Processor

Symptom
You cannot use commands, utilities, or the Command Line Processor.

Action
Ensure that:

[  ]
You bound the utilities and your applications to the database after you installed a fix or newer executable. (The SQL0818N or SQL0805N messages indicate that you need to rebind.)

When you precompile an SQL application, a compilable file is produced, and optionally a bind file. Both of these files have a new timestamp. The default behavior of the precompile operation is to automatically create a package so that you do not need to bind anything. However, if you create a bind file but not a package, then you must bind the new bind file to the database.

For information on binding, see your Quick Beginnings guide.

[  ]
You used the correct syntax (especially for UNIX-based systems).

It is recommended that you either:

  • Use double quotation marks (" ") around a command line processor request.

  • Precede any special characters such as asterisks (*), brackets, or question marks (?) with a back slash (\) character to ensure that the command line processor interprets them correctly.

Example

For:

db2 SELECT * FROM SYSCAT.TABLES

Use either:

db2 "SELECT * FROM SYSCAT.TABLES"
db2 SELECT \* FROM SYSCAT.TABLES

Tables or Rows Locked

The database manager provides concurrency control and prevents uncontrolled access by means of locks. Because of the basic principles of DB2 locking, you do not need to take action to control locks in most cases. For information on how locking works, see the Administration Guide.

Diagnosing locking problems

Use the following methods to investigate whether or not applications are creating deadlocks or holding locks:

For information, see "The Database System Monitor" and the System Monitor Guide and Reference.

Applications slow or appear to hang

Symptom
Applications accessing a DB2 database are slow or appear to hang.

Possible Cause
There is lock contention or lock escalation.

Action
Use the Control Center or the LIST APPLICATIONS FOR DATABASE database-alias SHOW DETAIL command to find out if applications are waiting on a lock and who is holding the lock that the applications are waiting for. Then use the database monitor to determine if lock escalation is occurring.

If escalation is occurring, ensure that:

[  ]
Applications use an appropriate frequency of commits.

Take a monitor snapshot for locks to determine which applications are causing other applications to wait for locks. Also, monitor the db2.lock_waits performance variable.

[  ]
The values of the database configuration parameters maxlocks and locklist, which determine the number of locks allowed, are appropriate.

[  ]
The locktimeout database configuration parameter is set appropriately. When it is turned on, this parameter can help avoid hangs due to lock contention, and can tell you if lock contention is your problem.

[  ]
The dlchktime database configuration parameter is set appropriately. This parameter defines the frequency at which the database manager checks for deadlocks among all the applications connected to a database. If it is set high, CPU time is saved but deadlocks may not be detected soon enough.

[  ]
Other users are not preparing and binding applications.

The procedure of preparing and binding applications requires that locks be obtained on system catalog tables. This procedure should be performed in off-peak hours.

[  ]
Lock waits or deadlocks are not caused by next key locking.

Next key locking guarantees a cursor stability (CS) isolation level by automatically locking the next key for all INSERT and DELETE statements and the next higher key value above the result set for SELECT statements. It is required to guarantee ANSII and SQL 92 standard CS, and is the DB2 default.

Examine explain information for the application. If the problem appears to be with next key locking, you can change the isolation level to Read Stability (RS) by setting the DB2_RR_TO_RS flag on.

The DB2_RR_TO_RS flag stops all next key locking on user tables (catalog tables are not affected). Any packages bound with CS are automatically downgraded to RS, because DB2 can no longer guarantee CS. Do not use this flag if you require ANSII and SQL standard CS,

For information on locking, configuration parameters, and explain information, see the Administration Guide.

SQL0911N received

Symptom
The SQL0911N message is received.

Possible Cause
There are deadlocks or timeouts.

Action
Check the reason code in the SQLCA to determine if deadlocks or timeouts are causing the problem (see "Interpreting an SQLCA Structure").

In the case of deadlocks, create an event monitor for deadlocks to gather more information. Possibly increase the size of the lock list to avoid lock escalation which can often result in a deadlock scenario.

Note that timeouts are not captured using the deadlock event monitor.

User Authentication with DB2 for Windows NT

User authentication can cause problems for Windows NT users because of the way the operating system authenticates. This section describes some considerations for user authentication under DB2 for Windows NT:

User name and group name limitations

The DB2 for Windows NT security service

In DB2 Universal Database we have integrated the authentication of user names and passwords into the DB2 System Controller. The Security Service is only required when connecting a client to a server that is configured for authentication CLIENT.

Installing DB2 on a backup domain controller

In a Windows NT LAN environment a user can be authenticated at either a primary or backup domain controller. This feature is very important in large distributed LANs with one central primary domain controller and one or more backup domain controllers at each site. Users can then be authenticated on the backup domain controller at their site instead of requiring a call to the primary domain controller for authentication.

The advantage of having a backup domain controller (BDC), in this case, is that users are authenticated faster and the LAN is not as congested as it would have been had there been no BDC.

For Version 5, authentication can occur at a backup domain controller under the following conditions:

If the DB2DMNBCKCTLR profile variable is not set or is set to blank, DB2 for Windows NT performs authentication at the primary domain controller.

If the DB2DMNBCKCTLR profile variable is set to a question mark (DB2DMNBCKCTLR=?) then DB2 for Windows NT will perform its authentication on the backup domain controller under the following conditions:

Under normal circumstances the setting DB2DMNBCKCTLR=? will work; however, it will not work in all environments. The information supplied about the servers on the domain is dynamic, and Computer Browser must be running to keep this information accurate and current. Large LANs may not be running Computer Browser and therefore Server Manager's information may not be current. In this case, there is a second method to tell DB2 for Windows NT to authenticate at the backup domain controller: set DB2DMNBCKCTLR=xxx where xxx is the machine name of the backup domain controller and DB2 for Windows NT server. With this setting, authentication will occur on the backup domain controller under the following conditions:

Authentication with groups and domain security

DB2 for Windows NT supports the following types of groups:

For DB2 for Windows NT to work with domain security, you must grant authority and privileges to a local group. User names within the local and global groups MUST be defined on the same domain as the local or global group in order to be authenticated correctly.

If the db2_grp_lookup profile variable is set to local, then DB2 tries to find a user on the local machine only. If the user is not found on the local machine, or is not defined as a member of a local or global group, then authentication fails. DB2 does not try to find the user on another machine in the domain or on the domain controllers.

If the db2_grp_lookup profile variable is not set then:

  1. DB2 first tries to find the user on the same machine.

  2. If the user name is defined locally, the user is authenticated locally.

  3. If the user is not found locally, DB2 attempts to find the user name on its domain, and then on trusted domains.

The following examples illustrate how DB2 for Windows NT can support domain security. In the first example, the connection works because the user name and local group are on the same domain. In the second example, the connection does not work because the user name and local or global group are on different domains.

Example of a Successful Connection

The connection works in the following scenario because the user name and local or global group are on the same domain.

Note that the user name and local or global group do not need to be defined on the domain where the database server is running, but they must be on the same domain as each other.
Domain1 Domain2
A trust relationship exists with Domain2.

  • A trust relationship exists with Domain1.

  • The local or global group grp2 is defined.

  • The user name id2 is defined.

  • The user name id2 is part of grp2.

The DB2 server runs in this domain. The following DB2 commands are issued from it:
REVOKE CONNECT ON db FROM public
GRANT CONNECT ON db TO GROUP grp2
CONNECT TO db USER id2

The local or global domain is scanned but id2 is not found. Domain security is scanned.

The user name id2 is found on this domain. DB2 gets additional information about this user name (that is, it is part of the group grp2).
The connection works because the user name and local or global group are on the same domain.

Example of an Unsuccessful Connection

The connection does not work in the following scenario because the user name is defined on a different domain than where the local or global group is defined.
Domain1 Domain2
A trust relationship exists with Domain2.

  • A trust relationship exists with Domain1.

  • The local or global group grp2 is defined.

  • The global group grp1 is defined.

  • The user name id1 is defined.

  • The user name id1 is part of grp1.



Domain1\grp1 is part of grp2.
The DB2 server runs on this domain. The following DB2 commands are issued from it:
REVOKE CONNECT ON db FROM public
GRANT CONNECT ON db TO GROUP grp2
CONNECT TO db USER id1

The local or global domain is scanned and id1 is found. DB2 gets information for this user name (that is, the user name id1 is part of grp1 and the group grp1 is part of Domain2\grp2).

The group grp2 exists on this domain.
The connection does not work because the local or global group is on Domain2 and the actual user name is defined on Domain1.

The connection would work if the following command was issued instead: GRANT CONNECT ON db TO GROUP grp1


Improving Database Performance

Database performance is a complicated subject beyond the scope of this guide. As a starting point, be aware of the following:

For more information on performance:


[ Top of Page | Previous Page | Next Page | Table of Contents | Index ]

[ DB2 List of Books | Search the DB2 Books ]