Embedded SQL Programming Guide

Developing Japanese or Traditional-Chinese EUC Applications

When developing EUC applications, you need to consider the following items:

"Graphic Data Handling"
"Developing for Mixed Code Set Environments"

For additional considerations for stored procedures, refer to "Considerations for Stored Procedures". Additional language-specific application development issues are discussed in:

"Japanese or Traditional-Chinese EUC Considerations" (for C and C++).
"Japanese or Traditional-Chinese EUC Considerations" (for COBOL).
"Japanese or Traditional-Chinese EUC Considerations" (for FORTRAN).
"Japanese or Traditional-Chinese EUC Considerations" (for REXX).

Graphic Data Handling

This section discusses EUC application development considerations in order to handle graphic data. This includes handling graphic constants, and handling graphic data in UDFs, stored procedures, DBCLOB files, as well as collation.

Graphic Constants

Graphic constants, or literals, are actually classified as mixed character data as they are part of an SQL statement. Any graphic constants in an SQL statement from a Japanese or Traditional-Chinese EUC client are implicitly converted to the graphic encoding by the database server. You can use graphic literals that are composed of EUC encoded characters in your SQL applications. An EUC database server will convert these literals to the graphic database code set which will be UCS-2. Graphic constants from EUC clients should never contain single-width characters such as CS0 7-bit ASCII characters or Japanese EUC CS2 (Katakana) characters.

For additional information on graphic constants see the section on Graphic Strings in the SQL Reference.

Considerations for UDFs

UDFs are invoked at the database server and are meant to deal with data encoded in the same code set as the database. In the case of databases running under the Japanese or Traditional-Chinese code set, mixed character data is encoded using the EUC code set under which the database is created. Graphic data is encoded using UCS-2. This means that UDFs need to recognize and handle graphic data which will be encoded with UCS-2.

For example, you create a UDF called VARCHAR which converts a graphic string to a mixed character string. The VARCHAR function has to convert a graphic string encoded as UCS-2 to an EUC representation if the database is created under the EUC code sets.

Considerations for Stored Procedures

A stored procedure, running under either a Japanese or Traditional-Chinese EUC code set, must be prepared to recognize and handle graphic data encoded using UCS-2. When running these code sets, graphic data received or returned through the stored procedure's input/output SQLDA is encoded using UCS-2.

Considerations for DBCLOB Files

There are two important considerations for DBCLOB files:

The DBCLOB file data is assumed to be in the EUC code page of the application. For EUC DBCLOB files, data is converted to UCS-2 at the client on read, and from UCS-2 at the client on write.
The number of bytes read or written at the server, is returned in the data length field of the file reference variable based on the number of UCS-2 encoded characters read from or written to the file. The number of bytes actually read from or written to the file may be larger.

Collation

Graphic data is sorted in binary sequence. Mixed data is sorted in the collating sequence of the database applied on each byte. For a discussion on sorting sequences, see Assignments and Comparisons in the SQL Reference. Due to the possible difference in the ordering of characters in an EUC code set and a DBCS code set for the same country, different results may be obtained when the same data is sorted in an EUC database and in a DBCS database.

Developing for Mixed Code Set Environments

This section deals with the following considerations related to the increase or decrease in the length of data under certain circumstances, when developing applications in a mixed EUC and DBCS environment:

"Unequal Code Page Situations"
"Client-Based Parameter Validation"
"Using the DESCRIBE Statement"
"Using Fixed or Variable Length Data Types"
"Character Conversion String Length Overflow"
"Rules for String Conversions"
"Character Conversions Past Data Type Limits"
"Character Conversions in Stored Procedures"

Unequal Code Page Situations

Depending on the character encoding schemes used by the application code page and the database code page, there may or may not be a change in the length of a string as it is converted from the source code page to the target code page. A change in length is usually associated with conversions between multi-byte code pages with different encoding schemes, for example DBCS and EUC.

A possible increase in length is usually more serious than a possible decrease in length since an over-allocation of memory is less problematic than an under-allocation. Application considerations for sending or retrieving data depending on where the possible expansion may occur need to be dealt with separately. It is also important to note the differences between a best-case and worst-case situation when an expansion or contraction in length is indicated. Positive values, indicating a possible expansion, will give the worst-case multiplying factor. For example, a value of 2 for the SQLERRD(1) or SQLERRD(2) field means that a maximum of twice the string length of storage will be required to handle the data after conversion. This is a worst-case indicator. In this example best-case would be that after conversion, the length remains the same.

Negative values for SQLERRD(1) or SQLERRD(2), indicating a possible contraction, also provide the worst-case expansion factor. For example, a value of -1 means that the maximum storage required is equal to the string length prior to conversion. It is indeed possible that less storage may be required, but practically this is of little use unless the receiving application knows in advance how the source data is structured.

To ensure that you always have sufficient storage allocated to cover the maximum possible expansion after character conversion, you should allocate storage equal to the value max_target_length obtained from the following calculation:

Determine the expansion factor for the data.
For data transfer from the application to the database:
```
    expansion_factor = ABS[SQLERRD(1)] 
    if expansion_factor = 0 
       expansion_factor = 1
```
For data transfer from the database to the application:
```
    expansion_factor = ABS[SQLERRD(2)] 
    if expansion_factor = 0 
       expansion_factor = 1 
```
In the above calculations, ABS refers to the absolute value.
The check for expansion_factor = 0 is necessary because some DB2 Universal Database products return 0 in SQLERRD(1) and SQLERRD(2). These servers do not support code page conversions that result in the expansion or shrinkage of data; this is represented by an expansion factor of 1.

Intermediate length calculation.

    temp_target_length = actual_source_length * expansion_factor

Determine the maximum length for target data type.

Target data type
Maximum length of type (type_maximum_length)
CHAR
254
VARCHAR
4 000
LONG VARCHAR
32 700
CLOB
2 147 483 647
Determine the maximum target length.
```
 
    if temp_target_length < actual_source_length        (1) 
       max_target_length = type_maximum_length 
    else 
       if temp_target_length > type_maximum_length      (2) 
          max_target_length = type_maximum_length 
       else                                             (3) 
          max_target_length = temp_target_length 
```
All the above checks are required to allow for overflow which may occur during the length calculation. The specific checks are:

(1)
Numeric overflow occurs during the calculation of temp_target_length in step 2.
If the result of multiplying two positive values together is greater than the maximum value for the data type, the result wraps around and is returned as a value less than the larger of the two values.
For example, the maximum value of a 2-byte signed integer (which is used for the length of non-CLOB data types) is 32 767. If the actual_source_length is 25 000 and the expansion factor is 2, then temp_target_length is theoretically 50 000. This value is too large for the 2-byte signed integer so it gets wrapped around and is returned as -15 536.
For the CLOB data type, a 4-byte signed integer is used for the length. The maximum value of a 4-byte signed integer is 2 147 483 647.
(2)
temp_target_length is too large for the data type.
The length of a data type cannot exceed the values listed in step 3.
If the conversion requires more space than is available in the data type, it may be possible to use a larger data type to hold the result. For example, if a CHAR(250) value requires 500 bytes to hold the converted string, it will not fit into a CHAR value because the maximum length is 254 bytes. However, it may be possible to use a VARCHAR(500) to hold the result after conversion. See "Character Conversions Past Data Type Limits" for more information.
(3)
temp_target_length is the correct length for the result.

Using the SQLERRD(1) and SQLERRD(2) values returned when connecting to the database and the above calculations, you can determine whether the length of a string will possibly increase or decrease as a result of character conversion. In general, a value of 0 or 1 indicates no expansion; a value greater than 1 indicates a possible expansion in length; a negative value indicates a possible contraction. (Note that values of '0' will only come from down-level DB2 Universal Database products. Also, these values are undefined for other database server products. Table 14 lists values to expect for various application code page and database code page combinations when using DB2 Universal Database.

Table 14. SQLCA.SQLERRD Settings on CONNECT

Application Code Page Database Code Page SQLERRD(1) SQLERRD(2)
SBCS SBCS +1 +1
DBCS DBCS +1 +1
eucJP eucJP +1 +1
eucJP DBCS -1 +2
DBCS eucJP +2 -1
eucTW eucTW +1 +1
eucTW DBCS -1 +2
DBCS eucTW +2 -1
eucKO eucKO +1 +1
eucKR DBCS +1 +1
DBCS eucKR +1 +1
eucCN eucCN +1 +1
eucCN DBCS +1 +1
DBCS eucCN +1 +1

Application Code Page	Database Code Page	SQLERRD(1)	SQLERRD(2)
SBCS	SBCS	+1	+1
DBCS	DBCS	+1	+1
eucJP	eucJP	+1	+1
eucJP	DBCS	-1	+2
DBCS	eucJP	+2	-1
eucTW	eucTW	+1	+1
eucTW	DBCS	-1	+2
DBCS	eucTW	+2	-1
eucKO	eucKO	+1	+1
eucKR	DBCS	+1	+1
DBCS	eucKR	+1	+1
eucCN	eucCN	+1	+1
eucCN	DBCS	+1	+1
DBCS	eucCN	+1	+1

Expansion at the Database Server

If the SQLERRD(1) entry indicates an expansion at the database server, your application must consider the possibility that length-dependent character data which is valid at the client will not be valid at the database server once it is converted. For example, DB2 products require that column names be no more than eighteen bytes in length. It is possible that a character string which is eighteen bytes in length encoded under a DBCS code page expands past the eighteen byte limit when it is converted to an EUC code page. This means that there may be activities which are valid when the application code page and the database code page are equal, which are invalid when they are different. Exercise caution when you design EUC and DBCS databases for unequal code page situations.

Expansion at the Application

If the SQLERRD(2) entry indicates an expansion at the client application, your application must consider the possibility that length-dependent character data will expand in length after being converted. For example, a row with a CHAR(18) column is retrieved. Under circumstances where the database and application code pages are equal, the length of the data returned is eighteen bytes. However, In an unequal code page situation eighteen bytes of data encoded under a DBCS code page may expand past eighteen bytes when converted to an EUC code page. Thus, additional storage may have to allocated in order to retrieve the complete string.

Client-Based Parameter Validation

An important side effect of potential character data expansion or contraction between the client and server involves the validation of data passed between the client application and the database server. In an unequal code page situation, it is possible that data determined to be valid at the client is actually invalid at the database server after character conversion. Conversely, data that is invalid at the client, may be valid at the database server after conversion.

Any end-user application or API library has the potential of not being able to handle all possibilities in an unequal code page situation. In addition, while some parameter validation such as string length is performed at the client for commands and APIs, the tokens within SQL statements are not verified until they have been converted to the database's code page. This can lead to situations where it is possible to use an SQL statement in an unequal code page environment to access a database object, such as a table, but it will not be possible to access the same object using a particular command or API.

Consider an application that returns data contained in table provided by an end-user, and checks that the table name is not greater than eighteen bytes long. Now consider the following scenarios for this application:

A DBCS database is created. From a DBCS client, a table (t1) is created with a table name which is eighteen bytes long. The table name includes several characters which would be greater than two bytes in length if the string is converted to EUC, resulting in the EUC representation of the table name being a total of twenty bytes in length.
An EUC client connects to the DBCS database. It creates a table (t2) with a table name which is fourteen bytes long when encoded as EUC and ten bytes long when converted to DBCS. The table name in the DBCS database is ten bytes. The CREATE TABLE is successful.
The EUC client creates a table (t3) with a table name that is nine EUC characters in length (twenty bytes). When this name is converted to DBCS its length shrinks to the eighteen byte limit. The CREATE TABLE is successful.
The EUC client invokes the application against the each of the tables (t1, t2, and t3) in the DBCS database, which results in:

Table
Result
t1
Displays correct results
t2
Displays correct results
t3
The application considers the table name invalid because it is twenty bytes long.
The EUC client is used to query the DBCS database from the CLP. Although the table name is twenty bytes long on the client, the queries are successful because the table name is eighteen bytes long at the server.

Using the DESCRIBE Statement

A DESCRIBE performed against an EUC database will return information about mixed character and GRAPHIC columns based on the definition of these columns in the database. This information is based on code page of the server, before it is converted to the client's code page.

When you perform a DESCRIBE against a select list item which is resolved in the application context (for example VALUES SUBSTR(?,1,2)); then for any character or graphic data involved, you should evaluate the returned SQLLEN value along with the returned code page. If the returned code page is the same as the application code page, there is no expansion. If the returned code page is the same as the database code page, expansion is possible. Select list items which are FOR BIT DATA (code page 0), or in the application code page are not converted when returned to the application, therefore there is no expansion or contraction of the reported length.

EUC Application with DBCS Database

If your application's code page is an EUC code page, and it issues a DESCRIBE against a database with a DBCS code page, the information returned for CHAR and GRAPHIC columns is returned in the database context. For example, a CHAR(5) column returned as part of a DESCRIBE has a value of five for the SQLLEN field. In the case of non-EUC data, you allocate five bytes of storage when you fetch the data from this column. With EUC data, this may not be the case. When the code page conversion from DBCS to EUC takes place, there may be an increase in the length of the data due to the different encoding used for characters for CHAR columns. For example, with the Traditional-Chinese character set, the maximum increase is double. That is, the maximum character length in the DBCS encoding is two bytes which may increase to a maximum character length of four bytes in EUC. For the Japanese code set, the maximum increase is also double. Note, however, that while the maximum character length in Japanese DBCS is two bytes, it may increase to a maximum character length in Japanese EUC of three bytes. Although this increase appears to be only by a factor of 1.5, the single-byte Katakana characters in Japanese DBCS are only one byte in length, while they are two bytes in length in Japanese EUC. See "Character Conversion Expansion Factor" for more information on determining the maximum size.

Possible changes in data length as a result of character conversions apply only to mixed character data. Graphic character data encoding is always the same length, two bytes, regardless of the encoding scheme. To avoid losing the data, you need to evaluate whether an unequal code page situation exists, and whether or not it is between a EUC application and a DBCS database. You can determine the database code page and the application code page from tokens in the SQLCA returned from a CONNECT statement. For more information, refer to "Deriving Code Page Values", or refer to the SQL Reference. If such a situation exists, your application needs to allocate additional storage for mixed character data, based on the maximum expansion factor for that encoding scheme.

DBCS Application with EUC Database

If your application code page is a DBCS code page and issues a DESCRIBE against an EUC database, a situation similar to that in "EUC Application with DBCS Database" occurs. However, in this case, your application may require less storage than indicated by the value of the SQLLEN field. The worst case in this situation is that all of the data is single-byte or double-byte under EUC, meaning that exactly SQLLEN bytes are required under the DBCS encoding scheme. In any other situation, less than SQLLEN bytes are required because a maximum of two bytes are required to store any EUC character.

Using Fixed or Variable Length Data Types

Due to the possible change in length of strings when conversions occur between DBCS and EUC code pages, you should consider not using fixed length data types. Depending on whether you require blank padding, you should consider changing the SQLTYPE from a fixed length character string, to a varying length character string after performing the DESCRIBE. For example, if an EUC to DBCS connection is informed of a maximum expansion factor of two, the application should allocate ten bytes (based on the CHAR(5) example in "EUC Application with DBCS Database").

If the SQLTYPE is fixed-length, the EUC application will receive the column as an EUC data stream converted from the DBCS data (which itself may have up to five bytes of trailing blank pads) with further blank padding if the code page conversion does not cause the data element to grow to its maximum size. If the SQLTYPE is varying-length, the original meaning of the content of the CHAR(5) column is preserved, however, the source five bytes may have a target of between five and ten bytes. Similarly, in the case of possible data shrinkage (DBCS application and EUC database), you should consider working with varying-length data types.
Note: It may also be necessary to promote the type. For example, if data originally selectable into a VARCHAR(3000) is expected to grow to a maximum of double its current size, a LONG VARCHAR will need to be used instead since VARCHARs can be defined up to a maximum of only 4 000 bytes in length.

An alternative to either allocating extra space or promoting the data type is to select the data in fragments. For example, to select the same VARCHAR(3000) which may be up to 6000 bytes in length after the conversion you could perform two selects, of SUBSTR(VC3000, 1, LENGTH(VC3000)/2) and SUBSTR(VC3000, (LENGTH(VC3000)/2)+1) separately into 2 VARCHAR(3000) application areas. This method is the only possible solution when the data type is no longer promotable. For example, a CLOB encoded in the Japanese DBCS code page with the maximum length of 2 gigabytes is possibly up to twice that size when encoded in the Japanese EUC code page. This means that the data will have to be broken up into fragments since there is no support for a data type in excess of 2 gigabytes in length.

Character Conversion String Length Overflow

In EUC and DBCS unequal code page environments, situations may occur after conversion takes place, when there is not enough space allocated in a column to accommodate the entire string. In this case, the maximum expansion will be twice the length of the string in bytes. In cases where expansion does exceed the capacity of the column, SQLCODE -334 (SQLSTATE 22524) is returned.

This leads to situations that may not be immediately obvious or previously considered as follows:

An SQL statement may be no longer than 32 765 bytes in length. If the statement is complex enough or uses enough constants or database object names that may be subject to expansion upon conversion, this limit may be reached earlier than expected.
SQL identifiers are allowed to expand on conversion up to their maximum lengths which is eight bytes for short identifiers and eighteen bytes for long identifiers.
Host language identifiers are allowed to expand on conversion up to their maximum length which is 255 bytes.
When the character fields in the SQLCA structure are converted, they are allowed to expand to no more than their maximum defined lengths.

Rules for String Conversions

If you are designing applications for mixed code page environments, familiarize yourself with the concepts in the Rules for String Conversions section of the SQL Reference for any of the following situations:

Corresponding string columns in full selects with set operations (UNION, INTERSECT and EXCEPT)
Operands of concatenation
Operands of predicates (with the exception of LIKE)
Result expressions of a CASE statement
Arguments of the scalar function COALESCE (and VALUE)
Expression values of the IN list of an IN predicate
Corresponding expressions of a multiple row VALUES clause.

In these situations, conversions may take place to the application code page instead of the database code page.

Character Conversions Past Data Type Limits

In EUC and DBCS unequal code page environments, situations may occur after conversion takes place, when the length of the mixed character or graphic string exceeds the maximum length allowed for that data type. If the length of the string, after expansion, exceeds the limit of the data type, then type promotion does not occur. Instead, an error message is returned indicating that the maximum allowed expansion length has been exceeded. This situation is more likely to occur while evaluating predicates than with inserts. With inserts, the column width is more readily known by the application, and the maximum expansion factor can be readily taken into account. In many cases, this side effect of character conversion can be avoided by casting the value to an associated data type with a longer maximum length. For example, the maximum length of a CHAR value is 254 bytes while the maximum length of a VARCHAR is 4000 bytes. In cases where expansion does exceed the maximum length of the data type, an SQLCODE -334 (SQLSTATE 22524) is returned.

Character Conversions in Stored Procedures

Mixed character or graphic data specified in host variables and SQLDAs in sqleproc() or SQL CALL invocations are converted in situations where the application and database code pages are different. In cases where string length expansion occurs as a result of conversion, you receive an SQLCODE -334 (SQLSTATE 22524) if there is not enough space allocated to handle the expansion. Thus you must be sure to provide enough space for potentially expanding strings when developing stored procedures. You should use varying length data types with enough space allocated to allow for expansion.

[ Top of Page | Previous Page | Next Page | Table of Contents | Index ]

[ DB2 List of Books | Search the DB2 Books ]