Installation:
Requirements:
Bug Reports:
Using CodeAnalyzer:
Capabilities
Known Limitations:
Operation:
Output:
Function Reference Table
Subroutine Maps
Algorithm:
Rationale
Major Variables:
Created in ReadRawSource( ).
Created in MapNMaskCommentsNLiterals( ).
Created in MakeLogicalLines( ).
Created in FindLabels( ).
Created in FunctionAnalysis( ).
Extending the Program
Design considerations
Reformatting
Desired or Future Developments
Variable Analysis
Subroutine Discussions
GenericError
Change History
CodeAnalyzer is a generic engine for analyzing and formatting REXX code. It is a tool for REXX programmers who work with large, complex programs. I am releasing the program in the hope that others will find it useful and will contribute to its development.
Remember, this program is a work in progress. I place the code in the public domain insofar as I have any right to do so.
Doug Rickman
MSFC/NASA
Doug.Rickman@msfc.nasa.gov
August 28, 2003
Good luck! Remember, the problem with knowing what you are doing is that you have deluded yourself. 2/17/94
After checking the Requirements section, and assuming all is well, place the program CodeAnalyzer.cmd in the directory of your choice. How to use it is covered in the section "Operation".
This version of the code has been tested under OS/2 with the IBM Object REXX interpreter. It calls only the built-in functions and the REXXUTIL library provided by IBM. Patrick McPhee has made a version of that library available for other operating systems, though I don't know whether the required functions are present. Again, look in CodeAnalyzer.AnalOCode.txt to see which calls are made and where they occur in CodeAnalyzer.cmd.
You must have a REXX interpreter that provides the COUNTSTR( ) function. The "Classic" REXX under OS/2 does not have this function; the newer Object REXX interpreter does. OS/2 users may switch between the Classic and Object REXX versions by running BootDrive:\os2\switchrc.cmd. A reboot is necessary for the replacement DLLs to be loaded. If your interpreter lacks COUNTSTR( ), you can easily replace the call with a subroutine of your own.
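A COUNTSTR( ) replacement only has to count non-overlapping occurrences of one string in another. The following is a minimal sketch of that logic, written in Python purely for illustration (it is not REXX, and the name count_str is hypothetical). A REXX subroutine would follow the same loop, using POS( ) with a start position to find each occurrence.

```python
def count_str(needle: str, haystack: str) -> int:
    """Count non-overlapping occurrences of needle in haystack, scanning
    left to right (the counting rule COUNTSTR( ) uses)."""
    if needle == "":
        return 0  # this sketch treats an empty needle as matching nothing
    count = 0
    start = haystack.find(needle)
    while start != -1:
        count += 1
        # resume just past this match so that matches never overlap
        start = haystack.find(needle, start + len(needle))
    return count
```

For example, count_str("aa", "aaaa") is 2, not 3, because a match is never allowed to reuse characters of the previous one.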
CodeAnalyzer.cmd - program
CodeAnalyzer.ico - pretty icon
CodeAnalyzer.AnalOCode.txt - Example output from CodeAnalyzer when run on itself.
Documentation.html - This document in .html format.
Documentation.lwp - This document.
Documentation.ps - This document in .ps format. Prettier than the .html.
NormalRun.log - Example of output to screen when running CodeAnalyzer.
Well, first of all, I did not release this code so that I could have more work. But I will look at problems on a time-available basis. If you have a problem and want me to look at it, run the program with STDOUT redirected to a file, like this:
CodeAnalyzer YourProgram > errorlog.txt
Do this because the output of GenericError( ) will usually exceed the 23-line limit of a simple shell window, so a screen capture loses a lot. Obviously, you will also need to send a copy of the offending code.
The following function libraries are recognized, in whole or in part:
Currently the program does not produce cleanly formatted code as a separate product. Reformatted lines are created and used internally by the program, but this version does not write them to disk. If this topic interests you, please see the section "Extending the Program".
To run CodeAnalyzer from the command line, enter:
CodeAnalyzer.cmd PROGRAM
where "PROGRAM" is the REXX code to be analyzed. Of course, this assumes CodeAnalyzer.cmd is either in the current directory or on the PATH. CodeAnalyzer posts progress information to the screen and creates a file with the extension ".AnalOCode.txt" in the directory of PROGRAM. The distribution archive provides examples of both outputs: the screen output is in the file NormalRun.log, and an example of the ".AnalOCode.txt" output is in the file CodeAnalyzer.AnalOCode.txt. Both come from a run where "PROGRAM" was CodeAnalyzer.cmd. For a discussion of the output file's contents, see the section "Output".
One might hope that most of the output is fairly self-evident. Note, however, that the program can produce much more output than the delivered code writes. To access the other outputs you will need to look into the code. For example, in the subroutine "Main" you can dump the reformatted lines.
The table of function references does need a bit of explanation. I will use the following fragment for this discussion.
All Recognized Function and Subroutine References
Line   Ref_Type  Beg  End  NAME             SOURCE             BegA  EndA
00034  FUNCTION    4   14  RXFUNCQUERY      "BI Library"         15    33
00035  CALL        6   14  RXFUNCADD        "BI Library"         16    62
00036  CALL        6   20  REXXLIBREGISTER  "REXXLIB Library"    22    22
00038  FUNCTION    4   14  RXFUNCQUERY      "BI Library"         15    30
The "Line" column gives the line number in the raw source code.
The "Ref_Type" column denotes the kind of reference. A FUNCTION reference uses the syntax
rc=function( )
A CALL reference uses the CALL instruction. There are several other reference types besides these. For example, there is a "Var_CALL", where the CALL instruction has the form
CALL (Variable)
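The "Beg", "End", "BegA", and "EndA" values are character positions in the record. These positions are counted AFTER leading spaces have been removed! "Beg" and "End" are the positions of the first and last characters of the referenced function's name. "BegA" and "EndA" bracket the arguments passed to the function: for a FUNCTION reference they are the positions of the opening and closing parentheses, and for a CALL they are the positions of the first and last characters of the argument string.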
"Name" gives the name of the function and "Source" gives what library provides this function. "BI Library" means the built-in functions provided by the interpreter.
If the analyzed source code has more than one reference on a single line this table repeats itself horizontally. Thus it can get rather wide.
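To make the position columns concrete, here is a Python sketch (not CodeAnalyzer's REXX code) that computes Beg, End, BegA and EndA for a FUNCTION-style reference. The function name function_ref_columns and the sample source line are hypothetical. Unlike CodeAnalyzer, the sketch does not blank out comments and literal strings first, so a parenthesis inside a quoted string would confuse it.

```python
import re


def function_ref_columns(line: str, name: str):
    """Return 1-based (Beg, End, BegA, EndA) for the first FUNCTION-style
    reference to `name` in `line`, or None if there is none.

    Columns are counted after leading blanks are removed. Beg/End span the
    function name; BegA/EndA are the "(" and its matching ")".
    """
    text = line.lstrip()
    # the name must be followed directly by "(" (with a blank in between,
    # REXX would treat it as concatenation); the lookbehind stops a match
    # in the middle of a longer symbol
    pattern = r"(?<![\w.!?])" + re.escape(name) + r"\("
    m = re.search(pattern, text, re.IGNORECASE)
    if m is None:
        return None
    beg = m.start() + 1          # convert the 0-based index to a 1-based column
    end = beg + len(name) - 1
    open_col = m.end()           # "(" is the last matched character
    depth = 0
    for i in range(open_col - 1, len(text)):
        if text[i] == "(":
            depth += 1
        elif text[i] == ")":
            depth -= 1
            if depth == 0:
                return beg, end, open_col, i + 1
    return None                  # unbalanced parentheses
```

For the made-up line "   if RxFuncQuery('SysLoadFuncs') then", function_ref_columns(line, "RXFUNCQUERY") returns (4, 14, 15, 30), the same numbers as the last row of the table above, although the actual source line behind that row is not shown here.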
The subroutine maps show which subroutines a given routine calls and which routines call a given subroutine. This is done with two tables. The following examples are portions of an analysis of an earlier version of CodeAnalyzer.
Subroutine Reference Map 1:
ROUTINE                   - CALLED
_________________________   _________________________
ProgramBegan              - COMMANDLINEEXECUTION
                          - QUOTE
.........................   .........................
COMMANDLINEEXECUTION      - MAIN
                          - QUOTE
.........................   .........................
FINDCALLINSTRUCTIONS      - FINDPARENTHESES
                          - MATCHSUBROUTINE
                          - SIGNALANALYSIS
.........................   .........................
FINDCALLS2SUBROUTINES     - FINDCALLINSTRUCTIONS
                          - FUNCTIONANALYSIS
                          - SIGNALANALYSIS
This table shows that the program's opening code called CommandLineExecution and Quote. The opening code of any program is given the name "ProgramBegan", and this "label" is used until the first label in the program is found. One can then see that CommandLineExecution called "Main" and, again, "Quote".
To learn which routines call a given subroutine, see the second map table. It shows that "FindCallInstructions" was called only by "FindCalls2Subroutines", which in turn was called only by "Main". Looking back at the first map table, we can see that "FindCallInstructions" calls three other routines.
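The bookkeeping behind Map 1 can be sketched as follows: each subroutine reference is credited to the most recent label at or above its line, and references before the first label are credited to "ProgramBegan". This Python sketch assumes labels arrive as (line, name) pairs and references as (line, called_name) pairs; those input shapes and the name build_call_map are hypothetical simplifications of the Label. and FRef. variables, and the sketch treats every label as starting a new routine.

```python
import bisect


def build_call_map(labels, refs):
    """labels: list of (line, name) pairs, one per label in the program.
    refs:   list of (line, called_name) pairs, one per subroutine reference.
    Returns {ROUTINE: sorted list of distinct routines it references}."""
    labels = sorted(labels)
    label_lines = [line for line, _ in labels]
    calls = {}
    for line, called in refs:
        # index of the last label at or above this reference's line
        i = bisect.bisect_right(label_lines, line) - 1
        routine = labels[i][1].upper() if i >= 0 else "ProgramBegan"
        calls.setdefault(routine, set()).add(called.upper())
    return {routine: sorted(callees) for routine, callees in calls.items()}
```

A reference on the same line as a label (for example "Foo: call Bar") is credited to that label, because bisect_right places the label's own line at or before the reference.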
Subroutine Reference Map 2:
ROUTINE                   - WAS CALLED BY
_________________________   _________________________
CLEARCOMMENTBLOCK         - MAPNMASKCOMMENTSNLITERALS
.........................   .........................
CLEARLITERALSTRINGS       - MAPNMASKCOMMENTSNLITERALS
.........................   .........................
COMMANDLINEEXECUTION      - ProgramBegan
.........................   .........................
FINDCALLINSTRUCTIONS      - FINDCALLS2SUBROUTINES
.........................   .........................
FINDCALLS2SUBROUTINES     - MAIN
.........................   .........................
FINDDIRECTIVES            - FINDLABELSNDIRECTIVES
.........................   .........................
FINDLABELS                - FINDLABELSNDIRECTIVES
.........................   .........................
FINDLABELSNDIRECTIVES     - MAIN
.........................   .........................
FINDLITERALS              - MAPNMASKCOMMENTSNLITERALS
.........................   .........................
FINDPARENTHESES           - FINDCALLINSTRUCTIONS
                          - FUNCTIONANALYSIS
.........................   .........................
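Map 2 can be derived from Map 1 by turning it inside out. A Python sketch of that inversion, using the Map 1 fragment shown earlier as input (the name invert_call_map is hypothetical):

```python
def invert_call_map(calls):
    """calls: {routine: [routines it calls]}        (Map 1 form).
    Returns {routine: sorted list of its callers}   (Map 2 form)."""
    called_by = {}
    for caller, callees in calls.items():
        for callee in callees:
            called_by.setdefault(callee, set()).add(caller)
    # sort both levels so the table reads alphabetically, as Map 2 does
    return {callee: sorted(callers) for callee, callers in sorted(called_by.items())}


map1 = {
    "ProgramBegan": ["COMMANDLINEEXECUTION", "QUOTE"],
    "COMMANDLINEEXECUTION": ["MAIN", "QUOTE"],
    "FINDCALLINSTRUCTIONS": ["FINDPARENTHESES", "MATCHSUBROUTINE", "SIGNALANALYSIS"],
    "FINDCALLS2SUBROUTINES": ["FINDCALLINSTRUCTIONS", "FUNCTIONANALYSIS", "SIGNALANALYSIS"],
}
map2 = invert_call_map(map1)
```

Applied to this fragment, map2 lists COMMANDLINEEXECUTION as called by ProgramBegan and FINDCALLINSTRUCTIONS as called by FINDCALLS2SUBROUTINES, matching Map 2. FINDPARENTHESES shows only FINDCALLINSTRUCTIONS here because the fragment omits the entry for FUNCTIONANALYSIS.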
The principal actions of the program are initiated in the subroutine MAIN. The steps are
Recognizing comments and quotes is the only problem that restricts CodeAnalyzer to working code. I currently assume that comments and quotes are balanced, which is not necessarily true in code that does not work. Recognizing and handling unbalanced cases would require changes to MapNMaskCommentsNLiterals( ). Other than this, there is no real need for the code being analyzed to already work.
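The balance assumption can be seen in a Python sketch of masking; this is not the code of MapNMaskCommentsNLiterals( ), and the name mask_comments_and_literals is hypothetical. Comments and literal strings are replaced by blanks so every other character keeps its column (the idea behind dataEdited2.), and the scanner reports where it found an imbalance instead of assuming there is none. The sketch assumes these REXX rules: comments may nest and span lines, a quote inside a literal is written doubled, quotes inside comments are ignored, and a literal must end on the line where it starts.

```python
def mask_comments_and_literals(lines):
    """Blank out comments and literal strings without changing line lengths.

    Returns (masked_lines, problem): problem is None when every comment and
    literal is balanced, otherwise a description of the first imbalance.
    """
    masked, depth, problem = [], 0, None     # depth = comment nesting level
    for n, line in enumerate(lines, start=1):
        out, i, quote = [], 0, None          # quote = char that opened a literal
        while i < len(line):
            pair = line[i:i + 2]
            if quote is None and pair == "/*":
                depth += 1                   # REXX comments may nest
                out.append("  ")
                i += 2
            elif quote is None and depth > 0 and pair == "*/":
                depth -= 1
                out.append("  ")
                i += 2
            elif depth > 0:
                out.append(" ")              # inside a comment: quotes ignored
                i += 1
            elif quote is None and line[i] in "'\"":
                quote = line[i]              # a literal string starts
                out.append(" ")
                i += 1
            elif quote is not None and pair == quote * 2:
                out.append("  ")             # doubled quote stays in the literal
                i += 2
            elif quote is not None and line[i] == quote:
                quote = None                 # the literal string ends
                out.append(" ")
                i += 1
            elif quote is not None:
                out.append(" ")              # inside a literal
                i += 1
            else:
                out.append(line[i])          # ordinary code is kept
                i += 1
        if quote is not None and problem is None:
            problem = f"unterminated literal on line {n}"
        masked.append("".join(out))
    if depth > 0 and problem is None:
        problem = "unterminated comment at end of source"
    return masked, problem
```

Reporting the imbalance rather than silently mis-masking is the kind of change that would let analysis proceed on code that does not yet work.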
Much of the existing code reflects my desire to extend the analytical part of the program. For example, I would like to be able to find all variables used in a specific subroutine and compare that to the map of subroutines and their exposed variable lists. I have also tried to consider the future needs that might arise as the program is extended.
data. - Original source code.
dataEdited1. - Source after replacing all comments with blanks.
dataEdited2. - Source after blanking comments and literal strings.
LogicalLineI. = Original source code.
LogicalLine1. = Comments are blanked out.
LogicalLine2. = Comments and literal strings are blanked out.
SourceIndex.j = First line in original source of logical line j.
Comment.i.0 = Number of comments in line i.
Comment.i._Str.k = Character position for start of comment k in line i.
Comment.i._End.k = Character position for end of comment k in line i.
Comment.i._Txt.k = Text of comment k in line i.
Literal.i.0 = Number of literals in line i.
Literal.i._Str.k = Character position for start of literal k in line i.
Literal.i._End.k = Character position for end of literal k in line i.
Literal.i._Typ.k = Type of literal k in line i (S|D - single or double).
Literal.i._Txt.k = Text of literal k in line i.
Notes -
Logical lines are the source lines after continuations, semicolons and blank lines have been edited out.
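A Python sketch of how logical lines and SourceIndex. could be built from the raw lines plus a masked copy in which comments and literals are blanked, so that commas and semicolons inside them are ignored. It is not the code of MakeLogicalLines( ); the name make_logical_lines is hypothetical, and it assumes each masked line has the same length as its raw line, treats a trailing comma as a continuation, splits clauses at semicolons, and drops clauses that are empty.

```python
def make_logical_lines(raw, masked):
    """raw, masked: parallel lists of physical lines with equal lengths.
    Returns (logical, source_index): logical lines (raw text) and, for each,
    the 1-based physical line on which it starts."""
    logical, source_index = [], []
    buf_raw = buf_mask = ""
    start = None

    def flush():
        # split the joined text at semicolons that survived masking
        pos = 0
        for i, ch in enumerate(buf_mask + ";"):
            if ch == ";":
                piece = buf_raw[pos:i].strip()
                if piece:                    # empty clauses are dropped
                    logical.append(piece)
                    source_index.append(start)
                pos = i + 1

    for n, (r, m) in enumerate(zip(raw, masked), start=1):
        if start is None:
            start = n
        body = m.rstrip()
        if body.endswith(","):               # continuation: comma becomes a blank
            cut = len(body) - 1
            buf_raw += r[:cut] + " "
            buf_mask += m[:cut] + " "
            continue
        buf_raw += r
        buf_mask += m
        flush()
        buf_raw = buf_mask = ""
        start = None
    if buf_raw:                              # source ended on a continuation
        flush()
    return logical, source_index
```

Doing the split on the masked copy is what keeps "say 'x;y'" from being cut at the semicolon inside the literal.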
Label.i = Line# || Type ("STRING"|"SYMBOL") || FunctionName
Notes - Since most programs do not name the initial routine, I have chosen to refer to it by the label "ProgramBegan".
FRef.i.0 = Number of functions referenced in line i.
FRef.i._Str.k = Position of the first character in the name of the kth function referenced in line i.
FRef.i._End.k = Position of the last character in the name of the kth function referenced in line i.
FRef.i._Txt.k = Text string (name) of the kth function referenced in line i.
FRef.i._Typ.k = Type of the kth function referenced in line i.
FRef.i._Open.k = Position of "(" for the kth function referenced in line i.
FRef.i._Close.k = Position of ")" for the kth function referenced in line i.
FRef.i._Knd.k = Nature of the reference: subroutine call or function.
Notes -
1. Positions are relative to the first non-blank character in the line.
2. FRef.i._Open.k = FRef.i._Close.k when the reference is made with the CALL instruction and no arguments are passed.
3. For CALL instructions FRef.i._Open.k and FRef.i._Close.k give positions of first and last characters of argument string. If there are no arguments FRef.i._Close.k = FRef.i._Open.k.
4. For a CALL (variable) FRef.i._Str.k and FRef.i._End.k are the same as FRef.i._Open.k and FRef.i._Close.k.
5. FRef.i._Knd.k = "CALL" | "FUNCTION"
6. For CALL instructions FRef.i._Open.k and FRef.i._Close.k are computed after all comment blocks have been deleted.
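Notes 1 through 3 can be illustrated with a Python sketch that derives the columns for a plain "CALL name args" clause, assuming comments have already been removed (note 6). The name call_ref_columns is hypothetical; the sketch does not handle the CALL (variable) form and does not treat CALL ON/OFF specially. When there are no arguments it sets both Open and Close to End + 2, which reproduces the REXXLIBREGISTER row of the function reference table (22 = 20 + 2); whether CodeAnalyzer always uses that value is an assumption.

```python
import re


def call_ref_columns(clause: str):
    """Return (name, Str, End, Open, Close) as 1-based columns for a clause of
    the form "CALL name [args]", counted after leading blanks are removed,
    or None if the clause is not of that form."""
    text = clause.lstrip()
    m = re.match(r"call\s+([A-Za-z_!?][\w.!?]*)", text, re.IGNORECASE)
    if m is None:
        return None
    name = m.group(1)
    str_col = m.start(1) + 1
    end_col = m.end(1)        # 0-based end index equals the 1-based last column
    args = text[m.end(1):]
    if args.strip() == "":
        # no arguments: Open = Close (placed at End + 2 by assumption)
        return name, str_col, end_col, end_col + 2, end_col + 2
    open_col = m.end(1) + (len(args) - len(args.lstrip())) + 1
    close_col = len(text.rstrip())
    return name, str_col, end_col, open_col, close_col
```

For "   call RexxLibRegister" the sketch returns ("RexxLibRegister", 6, 20, 22, 22).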
procedure expose (DefaultExposeList)
return 1
If your interest is in reformatting REXX code, look into the subroutine MAIN. By the line "say 'Finished finding literals in logical lines.'", all comments, line continuations and quoted strings have been identified and a consistent, though unformatted, line of code has been created. A copy of the new, clean line is in the variable LogicalLineI.; LogicalLine1. holds the logical line with comments removed, and LogicalLine2. holds it with comments and quotes removed. The contents of comments and quotes are also available. To see how to access them, look at the code following the comment "Debug aid and illustration ..." that immediately follows the line indicated above.
A line-reformatting operation would logically be inserted at this point. Remember that if the reformatting changes line numbers, the "Comment." and "Literal." indices must be updated.
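The index update amounts to renumbering every entry at or after the point where lines were added or removed. A Python sketch, assuming the per-line data (for example the Comment. or Literal. entries of each line) is held in a dict keyed by line number; the name shift_line_keys is hypothetical.

```python
def shift_line_keys(by_line: dict, at: int, delta: int) -> dict:
    """Renumber a {line_number: data} mapping after `delta` lines are inserted
    before line `at` (delta > 0) or lines at .. at - delta - 1 are deleted
    (delta < 0). Entries for deleted lines are dropped."""
    shifted = {}
    for n, data in by_line.items():
        if n < at:
            shifted[n] = data            # lines before the edit keep their numbers
        elif delta < 0 and n < at - delta:
            continue                     # this line was deleted
        else:
            shifted[n + delta] = data    # later lines move by delta
    return shifted
```

For example, if a reformat splits one line into three at line 5, shift_line_keys(comments, 6, 2) renumbers everything from line 6 on, while the entries for line 5 itself still need to be redistributed by hand.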
There are several things I would like to implement or need to enhance.
The program currently does not handle a comment between a function's name and the opening parenthesis. To do so, consult Comment.i._Str. and Comment.i._End. and check whether one of the comments fills the gap.
Handling unbalanced comment and quote blocks would remove the restriction that CodeAnalyzer be run only on working programs. See the note following step 1 in the Algorithm section of this document.
I would like to handle SIGNAL instructions more completely. This is why the list of default "CONDITIONS" is provided.
I would like to be able to find all variables passed or used within a specific subroutine and compare that to the map of subroutines and their exposed variable lists. Detecting the variables is the real problem.
It might (???) be done using SysDumpVariables( ), which is part of REXXUTIL. How to implement this is an open question! SysDumpVariables( ) gives a list of all variables that are within the current scope and have already been set. For example:
avar = 'b'
call SysDumpVariables 'afile'
bvar = 'b'
Here the call to SysDumpVariables( ) knows about "avar" but is unaware of "bvar".
Output of SysDumpVariables( ) goes either to a file or to standard out. Presumably a named pipe could also be used, but there is no standard, OS-independent library providing named pipe handling. Assume we will output to a temp file.
First we must find the range of each subroutine in the source being analyzed. Conceptually, do this by scanning the code from end to beginning looking for labels. For each label found, check whether the line above it is a RETURN or EXIT instruction. If not, the label is an alternate entry point into a subroutine and can be ignored. If it is, the lines from the label to the current line define a subroutine. In practice it is not necessary to read through all the data; just use "Label.", whose first word is the line number holding the label.
Once the limits of a subroutine in the source are known, it must be "executed" for SysDumpVariables( ) to recognize its variables. I have no idea how to do this without endless syntax errors.
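The range-finding procedure just described can be sketched in Python as follows. It assumes the Label. entries are space-separated strings of the form "line# type name" and that their line numbers index the list of program lines; the name subroutine_ranges is hypothetical, and the RETURN/EXIT test is only a rough check of the first word of the line above.

```python
import re


def subroutine_ranges(labels, lines):
    """labels: strings shaped like Label.i, "line# type name".
    lines:  the program's lines; label line numbers are 1-based into it.
    Returns [(name, first_line, last_line), ...] in source order."""
    ends_routine = re.compile(r"(RETURN|EXIT)\b", re.IGNORECASE)
    ranges = []
    last = len(lines)                  # end of the routine being delimited
    entries = sorted(labels, key=lambda s: int(s.split()[0]), reverse=True)
    for entry in entries:              # scan labels from the end of the source
        number, _kind, name = entry.split(maxsplit=2)
        first = int(number)
        above = lines[first - 2].strip() if first >= 2 else ""
        if ends_routine.match(above):
            ranges.append((name, first, last))
            last = first - 1
        # otherwise the label is an alternate entry point and is ignored
    ranges.append(("ProgramBegan", 1, last))
    return ranges[::-1]
```

In a program whose lines 4 to 7 are "Foo:", "x = 1", "Bar:", "return", the label Bar is preceded by "x = 1", so it is treated as an alternate entry point and folded into Foo's range.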
An alternative to using SysDumpVariables( ) is to expand the logic used in GenericError( ) in the section where "if GenericErrorQUIET = 'DECODE' then do ...". The current logic is rudimentary at best.
CodeAnalyzer incorporates the subroutine GenericError, which provides the programmer with information in the event of a failure. In addition to the line number of the failure and the CONDITION:, SYNTAX:, INSTRUCTION:, SIGNAL:, DESCRIPTION: and STATUS: information, it gives the current values of strings and the subroutine history up to the point of failure. I find these last two details very helpful in debugging code.
Operation of the subroutine is controlled by the variable "GenericErrorQuiet". This variable is set on entry to CodeAnalyzer. For a discussion of other options see the comments in the subroutine.
For the subroutine history to be available, the variable SUBROUTINEHISTORY is created. On entry to each subroutine, the name of the subroutine is prepended to the variable; on leaving, the leading word of the variable is removed. When PROCEDURE instructions are used, the SUBROUTINEHISTORY variable must be exposed.
A simple illustration of GenericError output follows. The information provided by GenericError begins at the line "Analysis of the code:" (it was shown in red in the formatted versions of this document). The cause of the failure in this case was leaving a VisProREXX call in the program when it was not being run under VisProREXX.
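The SUBROUTINEHISTORY bookkeeping can be mimicked in Python with a decorator, shown only to illustrate the idea; CodeAnalyzer does this with a REXX string variable, and the names tracked and history are hypothetical. The key point is that a routine's name is removed only on a normal return, so after a failure the history still lists the whole chain of calls, most recent first.

```python
import functools

history = []   # most recent routine first, like SUBROUTINEHISTORY


def tracked(func):
    """Record a routine's name on entry and drop it on normal return."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        history.insert(0, func.__name__)   # prepend, as on subroutine entry
        result = func(*args, **kwargs)
        history.pop(0)                      # skipped if func raised an error
        return result
    return wrapper


@tracked
def CommandLineExit():
    raise RuntimeError("simulated failure")


@tracked
def Begin_Program():
    CommandLineExit()


try:
    Begin_Program()
except RuntimeError:
    # an error handler can now report the call chain, as GenericError does
    print("Subroutine history to point of failure, most recent first:")
    for name in history:
        print(f"  {name}()")
```

The routine names are borrowed from the sample output below; running the sketch prints CommandLineExit() followed by Begin_Program().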
Read 3462 lines from source file, D:\source\VisProSource\CodeAnalyzer\CodeAnalyzer.cmd.
Finished finding comments in source.
Finished finding literals in source.
Finished making 2957 logical lines.
Finished finding comments in logical lines.
Finished finding literals in logical lines.
Finished finding labels and directives.
Finished loading table of known functions.
Finished loading table of default conditions.
Finished finding calls to subroutines.
Finished writing function references table.
Finished with the subroutine analyzer.
Analysis of the code: call _VPAppExit /* This will force an exit without opening a panel. */
LIT "_VPAppExit" = _VPAPPEXIT
LIT "This" = THIS
LIT "will" = WILL
LIT "force" = FORCE
LIT "an" = AN
LIT "exit" = EXIT
LIT "without" = WITHOUT
LIT "opening" = OPENING
LIT "a" = A
VAR "panel." = PANEL.
LIT "panel" = PANEL
A serious REXX ERROR has occurred! I do not know what.
Other information for a programmer's use:
The line that generated this error is: 227
" call _VPAppExit /* This will force an exit without opening a panel. */ "
Subroutine history to point of failure, most recent first:
CommandLineExit()
Begin_Program()
Condition: SYNTAX
Instruction: SIGNAL
Description:
Status: OFF
RC: 43 SYS0043: Drive %1 cannot locate a specific area or track on the disk.
August 26, 2003