Installation:

Requirements:

Bug Reports:

Using CodeAnalyzer:

Capabilities

Known Limitations:

Operation:

Output:

Function Reference Table

Subroutine Maps

Algorithm:

Rationale

Major Variables:

Created in ReadRawSource( ).

Created in MapNMaskCommentsNLiterals( ).

Created in MakeLogicalLines( ).

Created in FindLabels( ).

Created in FunctionAnalysis( ).

Extending the Program

Design considerations

Reformatting

Desired or Future Developments

Variable Analysis

Subroutine Discussions

GenericError

Change History

CodeAnalyzer is a generic engine for analyzing and formatting REXX code. This is a tool for the REXX programmer who works with large, complex programs. I release the program in the hope others might find it useful and in the hope others might contribute to its development.

Remember, this program is a work in progress. I place the code in the public domain insofar as I have any right to do so.

Doug Rickman

MSFC/NASA

Doug.Rickman@msfc.nasa.gov

August 28, 2003

Good luck! Remember the problem with knowing what you are doing is that you have deluded yourself. 2/17/94

Installation:

After checking the Requirements section, and assuming all is well, place the program CodeAnalyzer.cmd in the directory of your choice. How to use it is covered in the section "Operation".

Requirements:

This version of the code has been checked out under OS/2 with the IBM Object REXX interpreter. It makes calls only to the built-in functions and to the REXXUTIL library provided by IBM. Patrick McPhee has made a version of the library available for other operating systems, though I don't know if the required functions are present. Look into CodeAnalyzer.AnalOCode.txt to find which calls are made and where they occur in CodeAnalyzer.cmd.

You must have a REXX interpreter which provides the COUNTSTR( ) function. The "Classic" REXX under OS/2 does not have this function. The newer interpreter available under Object REXX does have this function. OS/2 users may switch between the Classic and Object REXX versions by using BootDrive:\os2\switchrc.cmd. A reboot is necessary for the replacement DLLs to be loaded. Those who do not have the COUNTSTR( ) function can easily replace this call with a subroutine of their own creation.
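For interpreters without COUNTSTR( ), a stand-in can be quite small. The following is a minimal sketch, not code taken from CodeAnalyzer. It assumes the same argument order as the built-in, COUNTSTR(needle, haystack), and counts non-overlapping occurrences. Because REXX searches internal labels before built-in functions, a routine with this name placed in the program is used in preference to a built-in of the same name.

```rexx
/* Minimal stand-in for COUNTSTR(needle, haystack). A sketch only. */
say CountStr('(', 'rc = f(a) + g(b)')      /* two opening parentheses */
say CountStr('aa', 'aaaa')                 /* non-overlapping matches */
exit

CountStr: procedure
  parse arg needle, haystack
  if needle == '' then return 0            /* nothing to look for */
  count = 0
  start = 1
  do forever
    p = pos(needle, haystack, start)
    if p = 0 then leave
    count = count + 1
    start = p + length(needle)             /* skip past this match */
  end
  return count
```

Both calls display 2.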

Files:

CodeAnalyzer.cmd - program

CodeAnalyzer.ico - pretty icon

CodeAnalyzer.AnalOCode.txt - Example output from CodeAnalyzer when run on itself.

Documentation.html - This document in .html format.

Documentation.lwp - This document.

Documentation.ps - This document in .ps format. Prettier than the .html.

NormalRun.log - Example of output to screen when running CodeAnalyzer.

Bug Reports:

Well, first of all, I did not release this code so that I could have more work. But I will look at problems on a time-available basis. If you do have a problem and want me to look at it, run the program with STDOUT redirected to a file, i.e.

CodeAnalyzer YourProgram > errorlog.txt

Do this because the output of GenericError( ) will usually exceed the 23-line limit of a simple shell, so a screen capture loses a lot. Obviously you will also need to send a copy of the offending code.

Using CodeAnalyzer:

Capabilities

Known Limitations:

Operation:

To execute from the command line:

CodeAnalyzer.cmd PROGRAM

where "PROGRAM" is the REXX code to be analyzed. Of course, this assumes CodeAnalyzer.cmd is either in the current directory or on the path. CodeAnalyzer will post progress information to the screen and create a file with the extension ".AnalOCode.txt" in the directory of PROGRAM. The distribution archive provides examples of both outputs. The screen output is provided in the file NormalRun.log. An example of the ".AnalOCode.txt" output is provided in the file CodeAnalyzer.AnalOCode.txt. Both are from a run where "PROGRAM" was CodeAnalyzer.cmd. For a discussion of the output file's contents, see the section "Output".

Output:

One might hope that most of the output is fairly self-evident. Please note that there is much more output possible than what the delivered code produces. To access the other outputs you will need to look into the code. For example, in the subroutine "Main" you can dump the reformatted lines.

Function Reference Table

The table of function references does need a bit of explanation. I will use the following fragment for this discussion.

All Recognized Function and Subroutine References

Line   Ref_Type  Beg  End  NAME             SOURCE             BegA  EndA |
00034  FUNCTION    4   14  RXFUNCQUERY      "BI Library"         15    33 |
00035  CALL        6   14  RXFUNCADD        "BI Library"         16    62 |
00036  CALL        6   20  REXXLIBREGISTER  "REXXLIB Library"    22    22 |
00038  FUNCTION    4   14  RXFUNCQUERY      "BI Library"         15    30 |

The "Line" column gives the line number in the raw source code.

The "Ref_Type" column denotes the nature of the reference used. A FUNCTION reference uses the syntax

rc=function( )

A CALL reference uses the CALL instruction syntax. There are several possible reference types besides these. For example, there is a "Var_CALL", where the CALL instruction has the form

CALL (Variable)

The "Beg", "End", "BegA", and "EndA" values are character positions in the record. These are positions AFTER leading spaces have been removed! "Beg" and "End" are the positions of the first and last characters of the name of the referenced function. "BegA" and "EndA" are the positions holding the arguments passed to the function.
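To make the position convention concrete, here is a small sketch using a made-up source line. The line and the four numbers are hand-counted for illustration and are not output from CodeAnalyzer. For a FUNCTION reference the sketch assumes that "BegA" and "EndA" are the positions of the opening and closing parentheses, as the FRef descriptions under "Major Variables" suggest.

```rexx
/* Hypothetical source line; the positions were counted by hand. */
raw  = "      if RxFuncQuery('SysLoadFuncs') then"
line = strip(raw, 'L')        /* positions are counted after this step */

nameBeg = 4;  nameEnd = 14    /* would appear as "Beg" and "End"   */
argBeg  = 15; argEnd  = 30    /* would appear as "BegA" and "EndA" */

say substr(line, nameBeg, nameEnd - nameBeg + 1)   /* RxFuncQuery      */
say substr(line, argBeg,  argEnd  - argBeg  + 1)   /* ('SysLoadFuncs') */
```

Both ends of each range are inclusive, so the length passed to SUBSTR( ) is end minus begin plus one.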

"Name" gives the name of the function and "Source" gives what library provides this function. "BI Library" means the built-in functions provided by the interpreter.

If the analyzed source code has more than one reference on a single line this table repeats itself horizontally. Thus it can get rather wide.

Subroutine Maps

The subroutine maps tell which subroutines a given routine calls and which routines call a given subroutine. This is done using two tables. The following examples are portions taken from an analysis of an earlier version of CodeAnalyzer.

Subroutine Reference Map 1:

ROUTINE                   - CALLED
_________________________   _________________________
ProgramBegan              - COMMANDLINEEXECUTION
                          - QUOTE
.........................   .........................
COMMANDLINEEXECUTION      - MAIN
                          - QUOTE
.........................   .........................
FINDCALLINSTRUCTIONS      - FINDPARENTHESES
                          - MATCHSUBROUTINE
                          - SIGNALANALYSIS
.........................   .........................
FINDCALLS2SUBROUTINES     - FINDCALLINSTRUCTIONS
                          - FUNCTIONANALYSIS
                          - SIGNALANALYSIS

This table shows that the program began and called CommandLineExecution and Quote. The opening code in any program is given the name "ProgramBegan" and this "label" is used until the first label in the program is found. One can see that CommandLineExecution then called "Main" and called "Quote" again.

To learn what routines are calling a subroutine see the second map table. You can see "FindCallInstructions" was called only by "FindCalls2Subroutines" which was in turn only called by "Main". Looking back at the first reference map table we can see that "FindCallInstructions" calls three other routines.

Subroutine Reference Map 2:

ROUTINE                   - WAS CALLED BY
_________________________   _________________________
CLEARCOMMENTBLOCK         - MAPNMASKCOMMENTSNLITERALS
.........................   .........................
CLEARLITERALSTRINGS       - MAPNMASKCOMMENTSNLITERALS
.........................   .........................
COMMANDLINEEXECUTION      - ProgramBegan
.........................   .........................
FINDCALLINSTRUCTIONS      - FINDCALLS2SUBROUTINES
.........................   .........................
FINDCALLS2SUBROUTINES     - MAIN
.........................   .........................
FINDDIRECTIVES            - FINDLABELSNDIRECTIVES
.........................   .........................
FINDLABELS                - FINDLABELSNDIRECTIVES
.........................   .........................
FINDLABELSNDIRECTIVES     - MAIN
.........................   .........................
FINDLITERALS              - MAPNMASKCOMMENTSNLITERALS
.........................   .........................
FINDPARENTHESES           - FINDCALLINSTRUCTIONS
                          - FUNCTIONANALYSIS
.........................   .........................
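The second map is the inverse of the first. As a rough illustration of that relationship (not the logic of SubroutineAnalyzer itself), the sketch below inverts a few hand-entered caller/callee pairs taken from Reference Map 1. The stem names "pair." and "CalledBy." are invented for this example.

```rexx
/* Hand-entered "caller callee" pairs copied from Reference Map 1. */
pair.1 = 'ProgramBegan COMMANDLINEEXECUTION'
pair.2 = 'ProgramBegan QUOTE'
pair.3 = 'COMMANDLINEEXECUTION MAIN'
pair.4 = 'COMMANDLINEEXECUTION QUOTE'
pair.5 = 'FINDCALLS2SUBROUTINES FINDCALLINSTRUCTIONS'
pair.6 = 'FINDCALLINSTRUCTIONS FINDPARENTHESES'
pair.0 = 6

CalledBy. = ''        /* CalledBy.name = blank-delimited list of callers */
callees   = ''        /* every routine that is called at least once      */
do p = 1 to pair.0
  parse var pair.p caller callee
  if wordpos(callee, callees) = 0 then callees = callees callee
  if wordpos(caller, CalledBy.callee) = 0 then
    CalledBy.callee = strip(CalledBy.callee caller)
end

/* Print one "was called by" row per routine. */
do w = 1 to words(callees)
  name = word(callees, w)
  say left(name, 25) '-' CalledBy.name
end
```

In this sample QUOTE ends up with two callers, ProgramBegan and COMMANDLINEEXECUTION, which is the kind of row the second map reports.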

Algorithm:

The principal actions of the program are initiated in the subroutine MAIN. The steps are:

  1. Read the raw source file into memory. [rc = ReadRawSource(in)]

In concept there appears to be an ambiguity in REXX interpreters about nesting comments and quotes inside of each other. I have chosen to assume that bounding comments are to be found first.

Recognizing comments and quotes is the single problem restricting CodeAnalyzer to working code. I currently assume comments and quotes are balanced. In code that does not work, this is not necessarily true. To recognize and handle unbalanced situations would require changes in MapNMaskCommentsNLiterals( ). Other than this, there is no real necessity for the code being analyzed to already work.

  2. Find the comments in the raw source. [rc = MapNMaskCommentsNLiterals('RAW_C',)]
  3. Find the literal strings in the raw source. [rc = MapNMaskCommentsNLiterals('RAW_L',)]

It is now possible to reformat the raw code into a consistent pattern.

  4. Make the logical lines. [rc = MakeLogicalLines( )]
  5. Find the comments in the logical lines. [rc = MapNMaskCommentsNLiterals('LOGICAL_C','MAP')]
  6. Find the literal strings in the logical lines. [rc = MapNMaskCommentsNLiterals('LOGICAL_L','MAP')]

There is a copy of the ith new, clean line in the variable LogicalLineI.i. LogicalLine1.i holds that logical line with comments removed. LogicalLine2.i holds that logical line with comments and quotes removed. SourceIndex.i is the line number in the raw source for the ith logical line. The position and contents of the comments and literal strings for the line are in the compound variables "Comment." and "Literal.".

  7. Find the labels and directives. [rc = FindLabelsNDirectives( )]
  8. Load the list of known functions, i.e. the DLL libraries. [rc = LoadKnownFunctions( )]
  9. Load the list of default conditions. [rc = LoadDefaultConditions( )]

This list (ANY, ERROR, FAILURE, HALT, SYNTAX, etc.) is not used by the existing code. It is expected to be used in the subroutine SignalAnalysis.

  10. Find all references to functions and subroutines. [rc = FindCalls2Subroutines( )]

This is the heart of the logic mapping operation. Information is stored in the compound variable "FRef.".

  11. Write the contents of the reference tables. [rc = WriteFRefTable( )]
  12. Map the relationship between subroutines. [rc = SubroutineAnalyzer(in)]
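The idea behind the masking steps can be sketched on a single line. The routine below is hypothetical and is not MapNMaskCommentsNLiterals( ). It looks for comments first, as described above, assumes the delimiters are balanced, handles nested comments, and replaces each comment character with a blank so that every remaining character keeps its position.

```rexx
/* Sketch only: blank out (possibly nested) comments in one line.    */
/* MaskComments is a hypothetical name, not a CodeAnalyzer routine.  */
opener = '/' || '*'       /* built by concatenation so this file     */
closer = '*' || '/'       /* never holds a delimiter inside a string */
line   = 'x = 1' opener 'outer' opener 'inner' closer 'more' closer '+ 2'
masked = MaskComments(line)
say line
say masked
exit

MaskComments: procedure
  parse arg text
  opener = '/' || '*'
  closer = '*' || '/'
  out   = ''
  depth = 0                                   /* comment nesting level */
  i     = 1
  do while i <= length(text)
    two = substr(text, i, 2)
    select
      when two == opener then do              /* entering a comment */
        depth = depth + 1
        out = out || '  '
        i = i + 2
      end
      when two == closer & depth > 0 then do  /* leaving a comment */
        depth = depth - 1
        out = out || '  '
        i = i + 2
      end
      when depth > 0 then do                  /* inside a comment */
        out = out || ' '
        i = i + 1
      end
      otherwise                               /* ordinary code */
        out = out || substr(text, i, 1)
        i = i + 1
    end
  end
  return out
```

The masked line has the same length as the original, which is what allows positions found in the masked text to be used against the unmasked text.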

Rationale

Much of the existing code reflects my desire to extend the analytical part of the program. For example, I would like to be able to find all variables used in a specific subroutine and compare that to the map of subroutines and their exposed variable lists. I have also tried to consider the future needs that might arise as the program is extended.

Major Variables:

Created in ReadRawSource( ).

 data. - Original source code.

Created in MapNMaskCommentsNLiterals( ).

 dataEdited1. - Source after replacing all comments with blanks.

 dataEdited2. - Source after blanking comments and literal strings.

Created in MakeLogicalLines( ).

 LogicalLineI. = Original source code.

 LogicalLine1. = Comments are blanked out.

 LogicalLine2. = Comments and literal strings are blanked out.

 SourceIndex.j = First line in original source of logical line j.

 Comment.i.0 = Number of comments in line i.

Comment.i._Str.k = Character position for start of comment k in line i.

Comment.i._End.k = Character position for end of comment k in line i.

Comment.i._Txt.k = Text of comment k in line i.  

 Literal.i.0 = Number of literals in line i.

Literal.i._Str.j = Character position for start of literal j in line i.

Literal.i._End.j = Character position for end of literal j in line i.

Literal.i._Typ.j = Type of literal j in line i (S|D - single or double).

 Literal.i._Txt.j = Text of literal j in line i.

Notes -

Logical lines are the lines that remain after continuations, semicolons and blank lines have been edited out.
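As a rough illustration of what "editing out" means, the sketch below joins continued lines, splits on semicolons and drops blank lines. It is not MakeLogicalLines( ). The stems "raw.", "Logical." and "FirstRaw." are invented names, and the sketch assumes its input has already had comments and literal strings blanked, so that a comma or semicolon inside a string cannot be mistaken for syntax.

```rexx
/* Sketch only: build logical lines from already-masked raw lines. */
raw.1 = 'a = 1; b = 2'
raw.2 = ''
raw.3 = 'call Foo a,'
raw.4 = '         b'
raw.5 = 'say a b'
raw.0 = 5

n = 0                    /* number of logical lines so far          */
pending = ''             /* text carried over from a continued line */
first = 0                /* raw line where the pending text began   */
do i = 1 to raw.0
  text = strip(raw.i)
  if text == '' & pending == '' then iterate      /* blank line */
  if pending == '' then first = i
  if right(text, 1) == ',' then do                /* continuation */
    pending = pending || left(text, length(text) - 1) || ' '
    iterate
  end
  stmt = pending || text
  pending = ''
  do while stmt \== ''                            /* split on ";" */
    parse var stmt clause ';' stmt
    clause = strip(clause)
    if clause == '' then iterate
    n = n + 1
    Logical.n  = clause
    FirstRaw.n = first    /* plays the role of SourceIndex.j */
  end
end
Logical.0 = n

do j = 1 to Logical.0
  say right(FirstRaw.j, 3) ':' Logical.j
end
```

The five raw lines become four logical lines. The continued CALL is rejoined as "call Foo a b" and is attributed to raw line 3, the first line it came from.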

Created in FindLabels( ).

Label.i = Line# || Type ("STRING"|"SYMBOL") || FunctionName

Notes - Since most programs may not name the initial routine I have chosen to refer to the initial routine by the label "ProgramBegan".

Created in FunctionAnalysis( ).

FRef.i.0 = Number of functions referenced in line i.

FRef.i._Str.k = First char of name of kth function referenced in line i.

FRef.i._End.k = Last char of name of kth function referenced in line i.

FRef.i._Txt.k = Text string (name) of kth function referenced in line i.

 FRef.i._Typ.k = Type of function, kth function referenced in line i.

 FRef.i._Open.k = Position of "(" for kth function referenced in line i.

 FRef.i._Close.k = Position of ")" for kth function referenced in line i.

 FRef.i._Knd.k = Nature of reference, subroutine call or function.

Notes -

1. Positions are relative to the first non-blank character in the line.

 2. FRef.line._Open.k = FRef.line._Close.k when the reference is done using the CALL instruction and there are no arguments passed.

 3. For CALL instructions FRef.i._Open.k and FRef.i._Close.k give positions of first and last characters of argument string. If there are no arguments FRef.i._Close.k = FRef.i._Open.k.

 4. For a CALL (variable) FRef.i._Str.k and FRef.i._End.k are the same as FRef.i._Open.k and FRef.i._Close.k.

 5. FRef.i._Knd.k = "CALL" | "FUNCTION"

 6. For CALL instructions FRef.i._Open.k and FRef.i._Close.k are computed after all comment blocks have been deleted.

Extending the Program

Design considerations

Reformatting

If your interest is in reformatting REXX code, look into the subroutine MAIN. By the line "say 'Finished finding literals in logical lines.'" all comments, line continuations and quoted strings have been identified and a consistent, though unformatted, line of code created. There is a copy of the new, clean line in the variable LogicalLineI. LogicalLine1. holds the logical line with comments removed. LogicalLine2. holds the logical line with comments and quotes removed. The contents of comments and quotes are available. To see how, look at the code following the comment "Debug aid and illustration ..." that immediately follows the above indicated line.

Logically, a line reformat operation would be inserted in this location. Remember, if the reformat operation modifies the line numbers the "Comment." and "Literal." indices will have to be updated.

Desired or Future Developments

There are several things I would like to implement or need to enhance.

The program currently does not handle a comment between the function's name and the opening parenthesis. To do so, reference Comment.i._Str. and Comment.i._End. and see if one of the comments fills the space.

Handling unbalanced comment and quote blocks would remove the restriction that CodeAnalyzer be run only on working programs. See the note following step 1 in the Algorithm section of this document.

I would like to handle SIGNAL instructions more completely. This is why the list of default "CONDITIONS" is provided.

Variable Analysis

I would like to be able to find all variables passed or used within a specific subroutine and compare that to the map of subroutines and their exposed variable lists. Detecting the variables is the real problem.

It might (???) be done using SysDumpVariables( ), which is part of REXXUTIL. How to implement this is a question! SysDumpVariables( ) gives a list of all variables that are within the current scope and have already been set. Thus

avar = 'b'
rc = SysDumpVariables('afile')
bvar = 'b'

knows about "avar" but is unaware of "bvar".

Output of SysDumpVariables( ) is either to a file or to standard out. Presumably a named pipe could also be used but there is no standard, OS independent library providing named pipe handling. Assume we will output to a temp file.

First we must find the range of each subroutine in the source being analyzed. Conceptually, do this by looking through the code from end to beginning, looking for a label. For each label found, check to see if the line above it is either a RETURN or EXIT instruction. If not, this is an alternate entry point into a subroutine and can be ignored. If either RETURN or EXIT is found, the lines from the current line number to the label define a subroutine. In practice it is not necessary to actually read through all the data. Just use "Label."; the first word is the line number holding the label.
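A minimal sketch of that backward walk follows. It is an illustration, not code from CodeAnalyzer. It assumes that the fields of Label.i are blank-delimited words and that a count is kept in Label.0; neither is stated above. The stem "src." is an invented stand-in for the lines being analyzed.

```rexx
/* Sketch only: hand-built stand-ins for the analyzer's data. */
src.1 = 'call Main'
src.2 = 'exit'
src.3 = 'Main:'
src.4 = 'call Helper'
src.5 = 'return'
src.6 = 'Helper:'
src.7 = 'AltEntry:'
src.8 = 'say "hi"'
src.9 = 'return'
src.0 = 9

Label.1 = '3 SYMBOL MAIN'        /* line number, type, name */
Label.2 = '6 SYMBOL HELPER'
Label.3 = '7 SYMBOL ALTENTRY'
Label.0 = 3                      /* assumed count of labels */

last = src.0                     /* end of the routine being closed off */
do i = Label.0 to 1 by -1
  parse var Label.i lineNo type name
  prev = lineNo - 1
  if prev >= 1 then verb = translate(word(src.prev, 1))
  else verb = ''
  if verb = 'RETURN' | verb = 'EXIT' then do
    say name 'spans lines' lineNo 'to' last
    last = prev
  end
  else say name 'at line' lineNo 'is treated as an alternate entry point'
end
say 'ProgramBegan spans lines 1 to' last
```

With this sample data HELPER covers lines 6 to 9, MAIN covers lines 3 to 5, ALTENTRY is reported as an alternate entry point, and the opening code covers lines 1 to 2.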

Having defined the limits of a subroutine in the source, it must then be "executed" in order for SysDumpVariables( ) to recognize the variables. How to do this without endless syntax errors I have no idea.

An alternative to using SysDumpVariables( ) is to expand the logic used in GenericError( ) in the section where "if GenericErrorQUIET = 'DECODE' then do ...". The current logic is rudimentary at best.

Subroutine Discussions

GenericError

CodeAnalyzer incorporates the subroutine GenericError. This routine provides the programmer information in the event of failures. In addition to the line number of the failure, CONDITION: SYNTAX:, INSTRUCTION:, SIGNAL:, DESCRIPTION: and STATUS:, it gives the current value of strings and the subroutine history to the point of failure. I find these last two details very helpful in debugging code.

Operation of the subroutine is controlled by the variable "GenericErrorQuiet". This variable is set on entry to CodeAnalyzer. For a discussion of other options see the comments in the subroutine.

In order for the subroutine history to be available, the variable SUBROUTINEHISTORY is created. On entry to each subroutine the name of the subroutine is prepended to the variable. On leaving, the leading word of the variable is removed. When PROCEDURE instructions are used the SUBROUTINEHISTORY variable must be exposed.
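The bookkeeping is easy to picture with a small sketch. The routine names Outer and Inner are invented for illustration; the "Name()" form of each entry simply follows the sample GenericError output in this section.

```rexx
/* Sketch only: maintaining a subroutine history by hand. */
SubroutineHistory = 'Begin_Program()'
call Outer
say 'Back in main:' SubroutineHistory
exit

Outer: procedure expose SubroutineHistory
  SubroutineHistory = 'Outer()' SubroutineHistory        /* on entry */
  call Inner
  SubroutineHistory = subword(SubroutineHistory, 2)      /* on exit  */
  return

Inner: procedure expose SubroutineHistory
  SubroutineHistory = 'Inner()' SubroutineHistory        /* on entry */
  say 'History, most recent first:' SubroutineHistory
  SubroutineHistory = subword(SubroutineHistory, 2)      /* on exit  */
  return
```

Inside Inner the history reads "Inner() Outer() Begin_Program()". Once both routines have returned it is back to "Begin_Program()". Without the EXPOSE on each PROCEDURE instruction, each routine would see its own uninitialized copy of the variable.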

A simple illustration of a GenericError output follows. The text in red is the information provided by GenericError. The cause of the failure in this case was leaving a VisProREXX call in the program when it was not being run under VisProREXX.

Read 3462 lines from source file, D:\source\VisProSource\CodeAnalyzer\CodeAnalyzer.cmd.

Finished finding comments in source.

Finished finding literals in source.

Finished making 2957 logical lines.

Finished finding comments in logical lines.

Finished finding literals in logical lines.

Finished finding labels and directives.

Finished loading table of known functions.

Finished loading table of default conditions.

Finished finding calls to subroutines.

Finished writing function references table.

Finished with the subroutine analyzer.

Analysis of the code: call _VPAppExit /* This will force an exit without opening a panel. */

LIT "_VPAppExit" = _VPAPPEXIT

LIT "This" = THIS

LIT "will" = WILL

LIT "force" = FORCE

LIT "an" = AN

LIT "exit" = EXIT

LIT "without" = WITHOUT

LIT "opening" = OPENING

LIT "a" = A

VAR "panel." = PANEL.

LIT "panel" = PANEL

A serious REXX ERROR has occurred! I do not know what.

Other information for a programmer's use:

The line that generated this error is: 227

" call _VPAppExit /* This will force an exit without opening a panel. */ "

Subroutine history to point of failure, most recent first:

CommandLineExit()

Begin_Program()

Condition: SYNTAX

Instruction: SIGNAL

Description:

Status: OFF

RC: 43 SYS0043: Drive %1 cannot locate a specific area or track on the disk.

Change History

August 26, 2003

August 22, 2003