Trap Dump Reference
20 Dec 2007
--------------------

Steven H. Levine
steve53@earthlink.net

0. Introduction.
   -------------

This note explains how to set up and record trap dumps and how to do
basic dump analysis.

1. Setup.
   ------

There are several ways to enable and configure the trap dump facility.
Pick the one that best suits your needs.

To permanently enable trap dumps, add the following line to config.sys

  TRAPDUMP=R0,x:

where x is the partition where the dumps will be stored. This must be a
FAT formatted partition with the volume name SADUMP.

WARNING - the dump will erase the partition. Don't store anything you
don't want to lose on the partition.

The partition must be at least as large as the installed memory.

On LVM aware systems, volume letters may or may not match drive letters
assigned by the BIOS and used by the dump routine. The kernel is
supposed to map the volume letter to the correct drive letter when
invoking the dump routines. However, there are lots of different
partition types and setups, so it may take some experimentation to
determine the correct volume letter for some setups. It is suggested
that, if possible, the same drive letter be assigned to the SADUMP
volume as will be assigned by the standard BIOS drive scan.

After editing CONFIG.SYS, reboot to activate the feature.

For more information on this command, type

  view cmdref trapdump

from the command line. This information is accurate but incomplete.
Recent FixPaks have added many new features. See
\os2\install\readme.dbg for the details.

One new command is PDUMPSYS, which can control the level of detail
included in the system dump. These settings do not apply to kernel trap
dumps, but can be used to control the detail of ring 0 process dumps.
Also added is the capability to set up the dump configuration from the
command line using the TRAPDUMP command.
For more information, see \os2\system\ras\procdump.doc

If a ring 0 system dump seems to be missing information needed to
analyze your problem, try

  pdumpsys paddr(all)

If needed, this command and others can be invoked from config.sys. For
example

  run=z:\os2\system\ras\pdumpsys.exe paddr(all)

where z: is your boot drive.

2. Recording Trap Dumps.
   ---------------------

This is automatic. The trap dump files will be created as the traps
occur and will be written to the drive you chose. Any existing data on
the drive will be erased, including any prior dump file.

Most often the trap will occur at the same cs:eip repeatedly. If not,
you may need to save the trap dump files to a temp directory so that you
can compare them and look for common factors.

Those using non-US keyboards might have trouble responding to the dump
prompt with the Y key. If so, use F1 instead.

To turn off the feature, just REM out the line(s) in config.sys and
reboot.

3. Preparing to Use the PM Dump Facility.
   --------------------------------------

The PM Dump Facility (PMDF) is a generic tool which needs to be
configured to understand the dump files generated by a specific kernel
version. PMDF is configured with a set of files known as Dump Symbols,
which are a combination of programs and data files.

The programs (DF_RET.EXE and DF_DEB.EXE) are invoked by PMDF to retrieve
data from the dump file. These programs understand the layout of the
kernel data structures and are typically specific to a range of kernel
revisions.

The data files include System Definition files and Symbol files. The
System Definition files (.sdf) configure df_ret and df_deb and are
typically specific to a single kernel revision. The Symbol files (.sym)
contain data used to translate binary addresses within an executable to
symbolic labels. These files are typically specific to a single version
of the executable.

This means that a set of Dump Symbols is typically kernel version
specific, FixPak specific and application version specific.
Assuming a standard install of the Dump Facility, the Dump Symbols sets
are stored in subdirectories of \os2\pdpsi\pmdf on your boot drive and
identified by the index file \os2\pdpsi\pmdf\pmdfvers.lst.

Depending on what components you installed when you installed eCS or
Warp on the box where the trap occurred, you might already have the
files needed to examine the dump installed. If not, the following
examples describe how to get the files you need and how to install the
files so that PMDF can use them.

The examples that follow assume

  FixPak 15
  Kernel version 14.062
  Boot drive f:

Be sure to replace the values shown in the examples with values that
match your specific system. This applies to kernel revision numbers,
boot drive letters and other values that are specific to your system.
Add pathname prefixes as needed to match where you have stored the
files.

If you are not sure of what kernel revision you are running, open the
kernel file (f:\os2krnl) in your favorite hex editor and search for the
string "Internal revision" without the quotes. The string that follows
is the value PMDF uses to match the kernel to the symbols. You can find
the same information with

  bldlevel f:\os2krnl

but the revision number reported in the build level string is usually
not in exactly the format that PMDF is looking for.

Go to

  ftp.software.ibm.com/ps/products/os2/fixes/debug

and download m015dmp.zip. This is the symbols zip file for FP15 and
kernel version 14.062.

If you are using a testcase kernel, the dump symbols will be at

  testcase.boulder.ibm.com/ps/fromibm/os2

and will be named dfyyyymmdd.zip where yyyymmdd matches the testcase
kernel date. When you download a testcase kernel, be sure to download
the symbols file at the same time. Otherwise, it might be gone when you
need it.

There are a few sites that archive copies of the testcase kernels. One
such site is

  http://www.os2site.com/sw/upgrades/kernel/

If you are running eComStation 1.2 or newer, the dump symbols may
already be installed.
If not, they are available on your installation CD in the
\os2image\debug and the \os2image\fi\sysmgmt directories. They are also
available from your account at the eComStation website
(www.ecomstation.com).

Create the subdirectory

  f:\os2\pdpsi\pmdf\warp4.15

Unzip the contents of m015dmp.zip into this subdirectory with

  unzip -j m015dmp.zip -d f:\os2\pdpsi\pmdf\warp4.15

This will put all the files in m015dmp.zip into a single directory. This
is important. If you don't use the -j option or its equivalent, the Dump
Facility will not be able to find the symbols.

The zip files are structured for use with either the Kernel Debugger or
PMDF. The Kernel Debugger requires that the .sym files be in the same
directory as the associated executable. The zip files contain
subdirectory information so that the files will be placed in the correct
directory when unzipped. PMDF has different requirements. For PMDF, the
.sym file sets must be in a subdirectory named in pmdfvers.lst.

The Kernel Debugger runs on the Machine Under Test (MUT). The only set
of symbols it needs, or can use, are the ones for the installed kernel
revision. PMDF is often run on a system other than the MUT which
generated the dump. Using pmdfvers.lst to locate the correct symbols for
the dump allows PMDF to be used to analyze dumps even when the systems
have different kernel revisions installed. Pmdfvers.lst will have a line
defining where to find the symbols for each specific kernel revision to
be analyzed.

Add the following line to pmdfvers.lst

  warp4.15:14.062:OS/2 Warp 4 FP15

warp4.15 is the directory into which you unzipped the symbols. 14.062 is
the internal version string for the kernel. The version string is case
sensitive and must match exactly. The rest is a comment to help you
remember what the other two values mean.

This will allow PMDF to automatically find the symbols for the 14.062
kernel in the warp4.15 subdirectory.

If your application came with .sym files, copy them into this directory.
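
The pmdfvers.lst entry format described above can be checked with a
small script before starting PMDF. The following Python sketch is an
illustration only, not part of the Dump Facility. It assumes entries are
colon separated as directory:revision:comment, which is inferred from
the single example line in this note, and it also flags blank lines,
which this note warns can make PMDF trap on startup. The function names
check_pmdfvers and find_symbol_dir are hypothetical.

```python
def check_pmdfvers(text):
    """Return (line_number, problem) pairs for the text of a pmdfvers.lst.

    Assumption: each entry looks like directory:revision:comment, as in
    the example line in this note. The real parser in PMDF may differ.
    """
    problems = []
    for number, line in enumerate(text.splitlines(), start=1):
        if not line.strip():
            # Blank lines are reported to make PMDF trap on startup.
            problems.append((number, "blank line"))
            continue
        fields = line.split(":", 2)
        if len(fields) < 2 or not fields[0] or not fields[1]:
            problems.append((number, "expected directory:revision:comment"))
    return problems


def find_symbol_dir(text, revision):
    """Return the directory listed for an exact, case sensitive revision."""
    for line in text.splitlines():
        fields = line.split(":", 2)
        if len(fields) >= 2 and fields[1] == revision:
            return fields[0]
    return None


if __name__ == "__main__":
    sample = "warp4.15:14.062:OS/2 Warp 4 FP15\n"
    print(check_pmdfvers(sample))
    print(find_symbol_dir(sample, "14.062"))
```

To check a real index file, read it with open() and pass the text to
check_pmdfvers. An empty result only means that no problem of the kinds
listed above was found.
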
If your application came with .map files, use the mapsym utility to
create the .sym files. Mapsym is available with most compilers. If you
need usage help, just type

  mapsym

from the command line.

If you don't have either .map or .sym files for your application, it may
be a bit more difficult to analyze the dump file, but PMDF will still
work.

When editing pmdfvers.lst, make sure you don't leave any blank lines;
otherwise PMDF will trap on startup. Thanks go to Lars Erdmann for this
tip.

4. Analyzing Trap Dumps.
   ---------------------

This is the bare bones.

Start PMDF. For Warp4, you should have an object in the Problem
Determination Tools folder. For eCS, the object should be in the
Utilities folder. If you don't have an object, you can start PMDF from
the command line. If you didn't install PMDF, you will need to run
Selective Install, install the Problem Determination Tools and then
reapply the last FixPak you installed (for example, FP15).

Open the dump file from the PMDF File menu. PMDF should find the
matching .sym files, using the data in pmdfvers.lst.

If for some reason PMDF cannot match up the dump file and the .sym
files, it will prompt you to select a .sym file set from the sets
defined in pmdfvers.lst. Since you configured PMDF above, this should
not occur. You can try selecting one of the available .sym file sets,
but the results will be unpredictable. PMDF may misinterpret the dump
file content or it may even trap.

Make the following selections from the Analyze menu, in this order:

  Synopsis -> Trap Screen Info
  Thread -> Call Gate
  Thread -> Ring 0 Stack Trace
  Thread -> Ring 2 Stack Trace
  Thread -> Ring 3 Stack Trace
  Process -> Open Files
  Synopsis -> Process Synopsis
  Synopsis -> System Synopsis
  Process -> Module Table
Enter the commands

  r
  ln
  u eip-20 eip
  k
  dw bp

in the PMDF command line at the bottom of the window. Press the Enter
key after each command.

If the r command does not report the same cs:eip as shown on the Trap
Screen, repeat the above commands substituting the numeric cs:eip value
from the Trap Screen for eip and the numeric ss:ebp value from the Trap
Screen for ebp.

Select Save Output from the File menu and save the window contents to a
file.

If you don't understand what you are seeing, you will have to find
someone to help you interpret the content of the dump file. Ask
questions and let your helper guide you.

Be prepared to spend some time working with your helper to understand
the cause of the trap. The bare bones information you generated is just
a starting point. It may or may not be sufficient to identify the source
of the trap. Unless you are lucky, your helper will request additional
output and may ask you to generate another dump file using different
settings. Often, your helper will want you to send a copy of the dump
file and the debug symbols.

It is a good idea to keep notes describing how each dump file was
generated and what you were doing when the dump file was generated. This
is especially important for intermittent failures where one is looking
for a pattern. Save the dump file, the debug symbols and your notes
until analysis is complete.

5. Interpreting Trap Dumps.
   ------------------------

This too is just the bare bones.

The first goal is to decide what the code was trying to do when it
trapped.

Start with the Ring 0 Trap Screen Dump.

If CSLIM is all F's, the trap is in the kernel. Experienced developers
can sometimes derive a module name from the cs:eip value, but this is
beyond the scope of this note.

If the CSLIM is not all F's, the trap is either in a device driver or in
16-bit code within the kernel. Scan the device driver list. Look for a
matching CS value in the strategy entry point column or a matching DS
value in the Device Header column.
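
The CSLIM test and the driver list scan described above can be written
down as a small Python sketch. This is an illustration of the rule of
thumb only, not something PMDF provides. The function name
classify_trap, the shape of the drivers list and the sample values are
hypothetical; in practice you would copy the hex values by hand from
the Trap Screen and the device driver list.

```python
def classify_trap(cslim, cs, ds, drivers):
    """Apply the CSLIM rule of thumb to values copied from a trap screen.

    cslim, cs, ds: hex strings as shown on the Trap Screen.
    drivers: list of (name, strategy_cs, header_ds) hex-string tuples
             copied from the device driver list (hypothetical layout).
    """
    # CSLIM made up entirely of F's points at the kernel.
    if cslim and set(cslim.lower()) == {"f"}:
        return "kernel"
    # Otherwise look for a driver whose strategy entry point CS or
    # device header DS matches the trap screen values.
    matches = [
        name
        for name, strategy_cs, header_ds in drivers
        if int(strategy_cs, 16) == int(cs, 16)
        or int(header_ds, 16) == int(ds, 16)
    ]
    if matches:
        return "device driver: " + ", ".join(matches)
    return "device driver or 16-bit kernel code (no match in driver list)"


if __name__ == "__main__":
    # Made-up values for illustration only.
    drivers = [("EXAMPLE$", "0b18", "0b20")]
    print(classify_trap("0000a3ff", "0B18", "0000", drivers))
```

A match only suggests where to look next. Confirm it against the stack
traces before drawing conclusions.
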
If SS is E8 or 15E8, the trap is within an interrupt handler.

6. E-mailing Trap Dumps.
   ---------------------

In general, don't e-mail a dump file to someone without asking them if
they want you to send it.

If you do need to e-mail the file, zip it up first. This will save
transmit time and protect the dump file from corruption.

It's always a good idea to give the zip file a useful name. Something
like DavesTrapE_20040501.zip will help everyone remember what the zip
file contains.

It's also a good idea to include a note in the zip file describing how
and why the trap occurred along with the bldlevel output. The zip file
may get separated from the e-mail message.

You should include the output of procdump query in your note. This will
describe the type of data recorded in the dump file.

If the zip file is over 5MB or so, check with your helper before sending
the e-mail. You may need to use a file splitter and send each chunk in a
separate e-mail, giving your helper a chance to delete each e-mail from
the server before you send the next chunk. Most ISPs limit e-mail
inboxes to 10MB and, unless you send the zip file in chunks, it will
never get to your helper.

If you can arrange to FTP the zip file to your helper, this is often a
better solution.

Be careful to send the e-mail containing the dump file only to the
intended addressee. Sending a large, unexpected e-mail to all the
members of a mailing list, some of whom may still be on dial-up, is sure
to upset someone.

7. Known Restrictions
   ------------------

The dump partition must be visible to the BIOS. This usually means it
must be on the first or second drive and must be below the 1024 cylinder
boundary unless the BIOS supports the Int13 extensions.

In a mixed IDE/SCSI system, the driver for the boot drive must load
first. This applies even when booting from IDE and when the SCSI drive
contains no bootable devices. Otherwise, OS2DUMP will hang.

The standard trap dump facility is limited by the 2GB FAT partition
limit.
If you have more than 2GB of RAM, you will need to install and configure
the DUMPFS IFS.

Good luck.

Steven