IBM Books

Embedded SQL Programming Guide


User-Defined Functions (UDF)

A user-defined function is a mechanism with which you can write your own extensions to SQL. The built-in functions supplied with DB2 are a useful set of functions, but they may not satisfy all of your requirements. Thus, you may need to extend SQL for the following reasons:

Why Use UDFs?

In writing DB2 applications, you have a choice when implementing desired actions or operations:

Although it may seem easier to implement new operations as subroutines or functions in your application, there are good reasons why you should consider using UDFs:

UDF Concepts

The following is a discussion of the important concepts you need to know prior to coding UDFs:

The concept of function path, the SET CURRENT FUNCTION PATH statement, and the function selection algorithm are discussed in detail in the SQL Reference. The FUNCPATH precompile and bind options are discussed in detail in the Command Reference.

Implementing UDFs

The process of implementing a UDF requires the following steps:

  1. Writing the UDF
  2. Compiling the UDF
  3. Linking the UDF
  4. Debugging the UDF
  5. Registering the UDF with DB2.

After these steps are successfully completed, your UDF is ready for use in data manipulation language (DML) or data definition language (DDL) statements such as CREATE VIEW. The steps of writing and defining UDFs are discussed in the following sections, followed by a discussion on using UDFs. For information on compiling and linking UDFs, see the DB2 SDK Building Applications book for your operating system. For information on debugging your UDF, see "Debugging your UDF".

Writing UDFs

You can find the details of how you write UDFs in Chapter 7. "Writing User-Defined Functions (UDFs)". This includes the details on the interface between DB2 and UDFs, UDF coding considerations, UDF coding examples, and UDF debugging information. For information on the related tasks of compiling and linking your UDFs, see the DB2 SDK Building Applications book for your operating system.

Registering UDFs

You should register the UDF to DB2 after you have written and completely tested the actual code. Note that it is possible to define the UDF prior to actually writing it. However, to avoid any problems with running your UDF, you are encouraged to write and test it extensively before registering it. For information on testing your UDF, see "Debugging your UDF".

Use the CREATE FUNCTION statement to define (or register) your UDF to DB2. You can find detailed explanations for this statement and its options in the SQL Reference.

Examples of Registering UDFs

The examples which follow illustrate a variety of typical situations where UDFs can be registered.

Note that in these examples:

Example: Exponentiation

Suppose you have written an external UDF to perform exponentiation of floating point values, and wish to register it in the MATH schema. Assume that you have DBADM authority. As you have tested the function extensively, and know that it does not represent any integrity exposure, you define it as NOT FENCED. By virtue of having DBADM authority, you possess the database authority, CREATE_NOT_FENCED, which is required to define the function as NOT FENCED.

     CREATE FUNCTION MATH.EXPON (DOUBLE, DOUBLE) 
       RETURNS DOUBLE 
       EXTERNAL NAME '/common/math/exponent' 
       LANGUAGE C 
       PARAMETER STYLE DB2SQL 
       NO SQL 
       DETERMINISTIC
       NO EXTERNAL ACTION 
       NOT FENCED

In this example, the system uses the NOT NULL CALL default value. This is desirable since you want the result to be NULL if either argument is NULL. Since you do not require a scratchpad and no final call is necessary, the NO SCRATCHPAD and NO FINAL CALL default values are used. As there is no reason why EXPON cannot be parallel, the ALLOW PARALLELISM default value is used.

Example: String Search

Your associate, Willie, has written a UDF to look for the existence of a given short string, passed as an argument, within a given CLOB value, which is also passed as an argument. The UDF returns the position of the string within the CLOB if it finds the string, or zero if it does not. Because you are concerned with database integrity for this function as you suspect the UDF is not fully tested, you define the function as FENCED.

Additionally, Willie has written the function to return a FLOAT result. Suppose you know that when it is used in SQL, it should always return an INTEGER. You can create the following function:

     CREATE FUNCTION FINDSTRING (CLOB(500K), VARCHAR(200)) 
       RETURNS INTEGER 
       CAST FROM FLOAT 
       SPECIFIC "willie_find_feb95" 
       EXTERNAL NAME '/u/willie/testfunc/testmod!findstr' 
       LANGUAGE C 
       PARAMETER STYLE DB2SQL 
       NO SQL 
       DETERMINISTIC
       NO EXTERNAL ACTION 
       FENCED

Note that a CAST FROM clause is used to specify that the UDF body really returns a FLOAT value but you want to cast this to INTEGER before returning the value to the statement which used the UDF. As discussed in the SQL Reference, the INTEGER built-in function can perform this cast for you. Also, you wish to provide your own specific name for the function and later reference it in DDL (see "Example: String Search over UDT"). Because the UDF was not written to handle NULL values, you use the NOT NULL CALL default value. And because there is no scratchpad, you use the NO SCRATCHPAD and NO FINAL CALL default values. As there is no reason why FINDSTRING cannot be parallel, the ALLOW PARALLELISM default value is used.

Example: BLOB String Search

Because you want this function to work on BLOBs as well as on CLOBs, you define another FINDSTRING taking BLOB as the first parameter:

     CREATE FUNCTION FINDSTRING (BLOB(500K), VARCHAR(200)) 
       RETURNS INTEGER 
       CAST FROM FLOAT 
       SPECIFIC "willie_fblob_feb95" 
       EXTERNAL NAME '/u/willie/testfunc/testmod!findstr' 
       LANGUAGE C 
       PARAMETER STYLE DB2SQL 
       NO SQL 
       DETERMINISTIC 
       NO EXTERNAL ACTION 
       FENCED

This example illustrates overloading of the UDF name, and shows that multiple UDFs can share the same body. Note that although a BLOB cannot be assigned to a CLOB, the same source code can be used. There is no programming problem in the above example as the programming interface for BLOB and CLOB between DB2 and UDF is the same; length followed by data. DB2 does not check if the UDF using a particular function body is in any way consistent with any other UDF using the same body.

Example: String Search over UDT

This example is a continuation of the previous example. Say you are satisfied with the FINDSTRING functions from "Example: BLOB String Search", but now you have defined a distinct type BOAT with source type BLOB. You also want FINDSTRING to operate on values having data type BOAT, so you create another FINDSTRING function. This function is sourced on the FINDSTRING which operates on BLOB values in "Example: BLOB String Search". Note the further overloading of FINDSTRING in this example:

     CREATE FUNCTION FINDSTRING (BOAT, VARCHAR(200)) 
       RETURNS INT 
       SPECIFIC "slick_fboat_mar95" 
       SOURCE SPECIFIC "willie_fblob_feb95"

Note that this FINDSTRING function has a different signature from the FINDSTRING functions in "Example: BLOB String Search", so there is no problem overloading the name. You wish to provide our own specific name for possible later reference in DDL. Because you are using the SOURCE clause, you cannot use the EXTERNAL NAME clause or any of the related keywords specifying function attributes. These attributes are taken from the source function. Finally, observe that in identifying the source function you are using the specific function name explicitly provided in "Example: BLOB String Search". Because this is an unqualified reference, the schema in which this source function resides must be in the function path, or the reference will not be resolved.

Example: External Function with UDT Parameter

You have written another UDF to take a BOAT and examine its design attributes and generate a cost for the boat in Canadian dollars. Even though internally, the labor cost may be priced in German marks, or Japanese yen, or US dollars, this function needs to generate the cost to build the boat in the required currency, Canadian dollars. This means it has to get current exchange rate information from the exchange_rate file, managed outside of DB2, and the answer depends on what it finds in this file. This makes the function NOT DETERMINISTIC (or VARIANT).

     CREATE FUNCTION BOAT_COST (BOAT) 
       RETURNS INTEGER 
       EXTERNAL NAME '/u/marine/funcdir/costs!boatcost' 
       LANGUAGE C 
       PARAMETER STYLE DB2SQL 
       NO SQL 
       NOT DETERMINISTIC
       NO EXTERNAL ACTION 
       FENCED

Observe that CAST FROM and SPECIFIC are not specified, but that NOT DETERMINISTIC is specified. Here again, FENCED is chosen for safety reasons.

Example: AVG over a UDT

This example implements the AVG column function over the CANADIAN_DOLLAR distinct type. See "Example: Money" for the definition of CANADIAN_DOLLAR. Strong typing prevents you from using the built-in AVG function on a distinct type. It turns out that the source type for CANADIAN_DOLLAR was DECIMAL, and so you implement the AVG by sourcing it on the AVG(DECIMAL) built-in function. The ability to do this depends on being able to cast from DECIMAL to CANADIAN_DOLLAR and vice versa, but since DECIMAL is the source type for CANADIAN_DOLLAR you know these casts will work.

     CREATE FUNCTION AVG (CANADIAN_DOLLAR) 
       RETURNS CANADIAN_DOLLAR 
       SOURCE "SYSIBM".AVG(DECIMAL(9,2))

Note that in the SOURCE clause you have qualified the function name, just in case there might be some other AVG function lurking in your function path.

Example: Counting

Your simple counting function returns a 1 the first time and increments the result by one each time it is called. This function takes no SQL arguments, and by definition it is a NOT DETERMINISTIC function since its answer varies from call to call. It uses the scratchpad to save the last value returned, and each time it is invoked it increments this value and returns it. You have rigorously tested this function, and possess DBADM authority on the database, so you will define it as NOT FENCED. (DBADM implies CREATE_NOT_FENCED.)

     CREATE FUNCTION COUNTER () 
       RETURNS INT 
       EXTERNAL NAME '/u/roberto/myfuncs/util!ctr' 
       LANGUAGE C 
       PARAMETER STYLE DB2SQL 
       NO SQL 
       NOT DETERMINISTIC
       NOT FENCED 
       SCRATCHPAD
       DISALLOW PARALLEL

Note that no parameter definitions are provided, just empty parentheses. The above function specifies SCRATCHPAD, and uses the default specification of NO FINAL CALL. In this case, as the size of the 100-byte scratchpad is sufficient, no storage has to be freed by means of a final call, and so NO FINAL CALL is specified. Since the COUNTER function requires that a single scratchpad be used to operate properly, DISALLOW PARALLEL is added to prevent DB2 from operating it in parallel. To see an implementation of this COUNTER function, refer to "Example: Counter".

EXAMPLE: Counting with an OLE automation object

This example implements the previous counting example as an OLE (Object Linking and Embedding) automation object, counter, with an instance variable, nbrOfInvoke, to keep track of the number of invocations. Every time the UDF gets invoked, the increment method of the object increments the nbrOfInvoke instance variable and returns its current state. The automation object is registered in the Windows registry with the OLE programmatic identifier (progID) bert.bcounter.

     CREATE FUNCTION bcounter ()
       RETURNS integer
       EXTERNAL NAME 'bert.bcounter!increment'
       LANGUAGE OLE 
       PARAMETER STYLE DB2SQL
       SCRATCHPAD 
       NOT DETERMINISTIC 
       FENCED
       NULL CALL 
       NO SQL
       NO EXTERNAL ACTION
       DISALLOW PARALLEL;

The implementation of the class counter is shown in "Example: Counter OLE Automation UDF in BASIC" and in "Example: Counter OLE Automation UDF in C++". For details of OLE support with DB2, see "Writing OLE Automation UDFs".

EXAMPLE: Table Function Returning Document IDs

You have written a table function which returns a row consisting of a single document identifier column for each known document in your text management system which matches a given subject area (the first parameter) and contains the given string (second parameter). This UDF uses the functions of the text management system to quickly identify the documents:

     CREATE FUNCTION DOCMATCH (VARCHAR(30), VARCHAR(255))
       RETURNS TABLE (DOC_ID CHAR(16))
       EXTERNAL NAME '/common/docfuncs/rajiv/udfmatch'
       LANGUAGE C
       PARAMETER STYLE DB2SQL
       NO SQL
       DETERMINISTIC
       NO EXTERNAL ACTION
       NOT FENCED
       SCRATCHPAD
       FINAL CALL
       DISALLOW PARALLEL
       CARDINALITY 20

Within the context of a single session it will always return the same table, and therefore it is defined as DETERMINISTIC. Note the RETURNS clause which defines the output from DOCMATCH, including the column name DOC_ID, and that FINAL CALL must be specified for each table function. In addition, the DISALLOW PARALLEL keyword is added as table functions cannot operate in parallel. Although the size of the output from DOCMATCH is highly variable, CARDINALITY 20 is a representative value, and is specified to help the DB2 optimizer to make good decisions.

Typically this table function would be used in a join with the table containing the document text, as follows:

  SELECT T.AUTHOR, T.DOCTEXT                                           
  FROM DOCS as T, TABLE(DOCMATCH('MATHEMATICS', 'ZORN''S LEMMA')) as F 
  WHERE T.DOCID = F.DOC_ID                                             

Note the special syntax (TABLE keyword) for specifying a table function in a FROM clause. In this invocation, the docmatch() table function returns a row containing the single column DOC_ID for each mathematics document referencing Zorn's Lemma. These DOC_ID values are joined to the master document table, retrieving the author's name and document text.

Using UDFs

Scalar and column UDFs can be invoked within an SQL statement wherever an expression is valid (there are additional rules for all column functions that limit validity). Table UDFs can only be referenced in the FROM clause of a SELECT. The SQL Reference discusses all these contexts in detail. The discussion and examples used in this section focus on relatively simple SELECT statement contexts, but note that their use is not restricted to these contexts.

Refer to "UDF Concepts" for a summary of the use and importance of the function path and the function selection algorithm. You can find the details for both of these concepts in the SQL Reference. The resolution of any Data Manipulation Language (DML) reference to a function uses the function selection algorithm, so it is important to understand how it works.

Referring to Functions

Each reference to a function, whether it is a UDF, or a built-in function, contains the following syntax:

>--function_name--(--.----------------.--)---------->
                     | <-,----------< |              
                     '---expression---'              

In the above, function_name can be either an unqualified or a qualified function name, and the arguments can number from 0 to 90, and are expressions which may contain:

The position of the arguments is important and must conform to the function definition for the semantics to be correct. Both the position of the arguments and the function definition must conform to the function body itself. DB2 does not attempt to shuffle arguments to better match a function definition, and DB2 does not understand the semantics of the individual function parameters.

Use of column names in UDF argument expressions requires that the table references which contain the column have proper scope. For table functions referenced in a join, this means that for any argument which involves columns from another table or table function, that other table or table function must appear before the table function containing the reference, in the FROM clause. For a complete discussion of the rules for using columns in the arguments of table functions, see the chapter on Queries in the SQL Reference.

Examples of Function Invocations

Some valid examples of function invocations are:

      AVG(FLOAT_COLUMN) 
      BLOOP(COLUMN1) 
      BLOOP(FLOAT_COLUMN + CAST(? AS INTEGER)) 
      BLOOP(:hostvar :indicvar) 
      BRIAN.PARSE(CHAR_COLUMN CONCAT USER, 1, 0, 0, 1) 
      CTR() 
      FLOOR(FLOAT_COLUMN) 
      PABLO.BLOOP(A+B) 
      PABLO.BLOOP(:hostvar) 
      "search_schema"(CURRENT FUNCTION PATH, 'GENE')
      SUBSTR(COLUMN2,8,3) 
      SYSFUN.FLOOR(AVG(EMP.SALARY)) 
      SYSFUN.AVG(SYSFUN.FLOOR(EMP.SALARY)) 
      SYSIBM.SUBSTR(COLUMN2,11,LENGTH(COLUMN3)) 
      SQRT(SELECT SUM(length*length) 
           FROM triangles 
           WHERE id= 'J522' 
           AND legtype <> 'HYP') 

Note that if any of the above functions are table functions, the syntax to reference them is slightly different than presented above. For example, if PABLO.BLOOP is a table function, to properly reference it, use:

     TABLE(PABLO.BLOOP(A+B)) AS Q

Using Parameter Markers in Functions

An important restriction involves parameter markers; you cannot simply code the following:

     BLOOP(?)

As the function selection logic does not know what data type the argument may turn out to be, it cannot resolve the reference. You can use the CAST specification to provide a type for the parameter marker, for example INTEGER, and then the function selection logic can proceed:

     BLOOP(CAST(? AS INTEGER))

Using Qualified Function Reference

If you use a qualified function reference, you restrict DB2's search for a matching function to that schema. For example, you have the following statement:

     SELECT PABLO.BLOOP(COLUMN1) FROM T 

Only the BLOOP functions in schema PABLO are considered. It does not matter that user SERGE has defined a BLOOP function, or whether or not there is a built-in BLOOP function. Now suppose that user PABLO has defined two BLOOP functions in his schema:

     CREATE FUNCTION BLOOP (INTEGER) RETURNS ... 
     CREATE FUNCTION BLOOP (DOUBLE) RETURNS ...

BLOOP is thus overloaded within the PABLO schema, and the function selection algorithm would choose the best BLOOP, depending on the data type of the argument, column1. In this case, both of the PABLO.BLOOPs take numeric arguments, and if column1 is not one of the numeric types, the statement will fail. On the other hand if column1 is either SMALLINT or INTEGER, function selection will resolve to the first BLOOP, while if column1 is DECIMAL or DOUBLE, the second BLOOP will be chosen.

Several points about this example:

  1. It illustrates argument promotion. The first BLOOP is defined with an INTEGER parameter, yet you can pass it a SMALLINT argument. The function selection algorithm supports promotions among the built-in data types (for details, see the SQL Reference) and DB2 performs the appropriate data value conversions.

  2. If for some reason you want to invoke the second BLOOP with a SMALLINT or INTEGER argument, you have to take an explicit action in your statement as follows:
         SELECT PABLO.BLOOP(DOUBLE(COLUMN1)) FROM T
    

  3. Alternatively, if you want to invoke the first BLOOP with a DECIMAL or DOUBLE argument, you have your choice of explicit actions, depending on your exact intent:
         SELECT PABLO.BLOOP(INTEGER(COLUMN1)) FROM T
         SELECT PABLO.BLOOP(FLOOR(COLUMN1)) FROM T 
         SELECT PABLO.BLOOP(CEILING(COLUMN1)) FROM T 
         SELECT PABLO.BLOOP(INTEGER(ROUND(COLUMN1,0))) FROM T
    

    You should investigate these other functions in the SQL Reference. The INTEGER function is a built-in function in the SYSIBM schema. The FLOOR, CEILING, and ROUND functions are UDFs shipped with DB2, which you can find in the SYSFUN schema along with many other useful functions.

Using Unqualified Function Reference

If, instead of a qualified function reference, you use an unqualified function reference, DB2's search for a matching function normally uses the function path to qualify the reference. In the case of the DROP FUNCTION or COMMENT ON FUNCTION functions, the reference is qualified using the current authorization ID, if they are unqualified. Thus, it is important that you know what your function path is, and what, if any, conflicting functions exist in the schemas of your current function path. For example, suppose you are PABLO and your static SQL statement is as follows, where COLUMN1 is data type INTEGER:

     SELECT BLOOP(COLUMN1) FROM T

You have created the two BLOOP functions cited in "Using Qualified Function Reference", and you want and expect one of them to be chosen. If the following default function path is used, the first BLOOP is chosen (since column1 is INTEGER), if there is no conflicting BLOOP in SYSIBM or SYSFUN:

     "SYSIBM","SYSFUN","PABLO"

However, suppose you have forgotten that you are using a script for precompiling and binding which you previously wrote for another purpose. In this script, you explicitly coded your FUNCPATH parameter to specify the following function path for another reason that does not apply to your current work:

     "KATHY","SYSIBM","SYSFUN","PABLO"

If Kathy has written a BLOOP function for her own purposes, the function selection could very well resolve to Kathy's function, and your statement would execute without error. You are not notified because DB2 assumes that you know what you are doing. It becomes your responsibility to identify the incorrect output from your statement and make the required correction.

Summary of Function References

For both qualified and unqualified function references, the function selection algorithm looks at all the applicable functions, both built-in and user-defined, that have:

(Applicable functions means functions in the named schema for a qualified reference, or functions in the schemas of the function path for an unqualified reference.) The algorithm looks for an exact match, or failing that, a best match among these functions. The current function path is used, in the case of an unqualified reference only, as the deciding factor if two identically good matches are found in different schemas. The details of the algorithm can be found in the SQL Reference.

An interesting feature, illustrated by the examples at the end of "Using Qualified Function Reference", is the fact that function references can be nested, even references to the same function. This is generally true for built-in functions as well as UDFs; however, there are some limitations when column functions are involved.

Refining an earlier example:

     CREATE FUNCTION BLOOP (INTEGER) RETURNS INTEGER ... 
     CREATE FUNCTION BLOOP (DOUBLE) RETURNS INTEGER ...

Now consider the following DML statement:

     SELECT BLOOP( BLOOP(COLUMN1)) FROM T

If column1 is a DECIMAL or DOUBLE column, the inner BLOOP reference resolves to the second BLOOP defined above. Because this BLOOP returns an INTEGER, the outer BLOOP resolves to the first BLOOP.

Alternatively, if column1 is a SMALLINT or INTEGER column, the inner bloop reference resolves to the first BLOOP defined above. Because this BLOOP returns an INTEGER, the outer BLOOP also resolves to the first BLOOP. In this case, you are seeing nested references to the same function.

A few additional points important for function references are:


[ Top of Page | Previous Page | Next Page | Table of Contents | Index ]

[ DB2 List of Books | Search the DB2 Books ]