Manpage For Tcl Regexp Command




NAME

       regexp - Match a regular expression against a string


SYNOPSIS

       regexp ?switches? exp string ?matchVar? ?subMatchVar
       subMatchVar ...?
_________________________________________________________________



DESCRIPTION

       Determines whether the regular expression exp matches part
       or  all  of  string  and  returns  1  if  it does, 0 if it
       doesn't.

       If additional arguments are specified  after  string  then
       they  are  treated  as  the names of variables in which to
       return information about which part(s) of  string  matched
       exp.   MatchVar  will  be  set to the range of string that
       matched all of exp.  The first  subMatchVar  will  contain
       the  characters in string that matched the leftmost paren-
       thesized subexpression within exp,  the  next  subMatchVar
       will  contain  the characters that matched the next paren-
       thesized subexpression to the right in exp, and so on.
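
       As a minimal sketch of how the variable arguments are filled
       in (the input string and the variable names whole, name, and
       val are made up for illustration):

```tcl
# Hypothetical input; whole, name and val are illustrative names.
set line "color=blue"

# The pattern is braced so that Tcl does not try to substitute
# the square brackets before regexp sees them.
set found [regexp {([a-z]+)=([a-z]+)} $line whole name val]

# found is 1; whole holds the text matched by the entire
# expression, and name and val hold the text matched by the
# first and second parenthesized subexpressions.
puts "$found: $whole -> $name, $val"
```

       The subMatchVars are assigned in the order in which the
       parenthesized subexpressions appear in exp, left to right.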

       If the initial arguments to regexp start with - then  they
       are  treated as switches.  The following switches are cur-
       rently supported:

       -nocase
                 Causes  upper-case  characters  in  string to be
                 treated as lower case during the  matching  pro-
                 cess.

       -indices
                 Changes what  is  stored  in  the  subMatchVars.
                 Instead  of storing the matching characters from
                 string, each variable will contain a list of two
                 decimal  strings giving the indices in string of
                 the first and last characters  in  the  matching
                 range of characters.

       --
                 Marks the end of switches.  The argument follow-
                 ing  this  one will be treated as exp even if it
                 starts with a -.
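
       The three switches might be exercised as in the following
       sketch.  The input strings and variable names are invented
       for illustration, and the values noted in the comments are
       what a current tclsh is expected to report:

```tcl
# -nocase: the pattern is written in lower case, because the
# upper-case characters of the string are treated as lower case.
set r1 [regexp -nocase {hello} "Say HELLO" m1]

# -indices: store first/last character indices rather than text.
# In "hello" the l's sit at indices 2 and 3 and the o at index 4.
set r2 [regexp -indices {(l+)o} "hello" span sub]

# --: the pattern itself begins with "-", so mark the end of the
# switches explicitly before giving it.
set r3 [regexp -- {-v} "tclsh -v" m3]

puts "$r1 $r2 $r3"     ;# 1 1 1
puts "$span | $sub"    ;# 2 4 | 2 3
puts $m3               ;# -v
```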

       If there are more subMatchVars than parenthesized  subex-
       pressions  within exp, or if a particular subexpression in
       exp doesn't match the string (e.g. because  it  was  in  a
       portion  of  the expression that wasn't matched), then the
       corresponding subMatchVar will be  set  to  ``-1  -1''  if
       -indices has been specified or to an empty string otherwise.
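
       A short sketch of the ``-1 -1'' case, assuming a pattern in
       which one of two alternatives takes no part in the match
       (the variable names all, first, and second are illustra-
       tive only):

```tcl
# Matched against "b", the (a) alternative is never used, so the
# first subMatchVar has nothing to report.
regexp -indices {(a)|(b)} "b" all first second

puts $first     ;# -1 -1
puts $second    ;# 0 0
```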



REGULAR EXPRESSIONS

       Regular expressions are implemented using Henry  Spencer's
       package  (thanks,  Henry!), and much of the description of
       regular expressions below is copied verbatim from his man-
       ual entry.

       A  regular  expression is zero or more branches, separated
       by ``|''.  It matches anything that  matches  one  of  the
       branches.

       A branch is zero or more pieces, concatenated.  It matches
       a match for the first, followed by a match for the second,
       etc.

       A  piece  is an atom possibly followed by ``*'', ``+'', or
       ``?''.  An atom followed by ``*'' matches a sequence of  0
       or  more  matches  of the atom.  An atom followed by ``+''
       matches a sequence of 1 or more matches of the  atom.   An
       atom followed by ``?'' matches a match of the atom, or the
       null string.
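
       The branch and piece rules can be tried out with a few
       one-line matches.  This is only a sketch; the patterns and
       strings are invented for illustration:

```tcl
# Branches: the expression matches if either branch matches.
set b1 [regexp {cat|dog} "hotdog"]     ;# 1

# "*": zero or more a's are allowed between x and y.
set p1 [regexp {xa*y} "xy"]            ;# 1

# "+": at least one a is required, so "xy" does not match.
set p2 [regexp {xa+y} "xy"]            ;# 0

# "?": the u may be present or absent.
set p3 [regexp {colou?r} "color"]      ;# 1
set p4 [regexp {colou?r} "colour"]     ;# 1

puts "$b1 $p1 $p2 $p3 $p4"             ;# 1 1 0 1 1
```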

       An atom is a regular expression in parentheses (matching a
       match  for  the  regular expression), a range (see below),
       ``.''  (matching any single  character),  ``^''  (matching
       the  null  string  at  the beginning of the input string),
       ``$'' (matching the null string at the end  of  the  input
       string),  a ``\'' followed by a single character (matching
       that character), or a single character with no other  sig-
       nificance (matching that character).
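
       The following sketch exercises each kind of atom on
       made-up strings.  The patterns are enclosed in braces so
       that the Tcl parser passes ``$'' and ``\'' through to
       regexp unchanged:

```tcl
# "." matches any one character; "^" and "$" tie the match to
# the two ends of the input string.
set a1 [regexp {^a.c$} "abc"]      ;# 1
set a2 [regexp {^a.c$} "xabc"]     ;# 0: the string does not start with a

# A backslash makes the following character literal, so \.
# matches only a period.
set a3 [regexp {a\.c} "abc"]       ;# 0
set a4 [regexp {a\.c} "a.c"]       ;# 1

# A parenthesized expression is itself an atom and can be
# followed by "+".
set a5 [regexp {(ab)+} "ababab" m]

puts "$a1 $a2 $a3 $a4 $a5 $m"      ;# 1 0 0 1 1 ababab
```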

       A  range  is  a sequence of characters enclosed in ``[]''.
       It  normally  matches  any  single  character   from   the
       sequence.   If  the sequence begins with ``^'', it matches
       any single character not from the rest  of  the  sequence.
       If  two characters in the sequence are separated by ``-'',
       this is shorthand for the full list  of  ASCII  characters
       between  them  (e.g. ``[0-9]'' matches any decimal digit).
       To include a literal ``]'' in the sequence,  make  it  the
       first  character (following a possible ``^'').  To include
       a literal ``-'', make it the first or last character.
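
       A few ranges in use, as a sketch with invented strings and
       variable names:

```tcl
# A digit range, repeated and anchored at both ends.
set d1 [regexp {^[0-9]+$} "2024"]      ;# 1

# A leading "^" inverts the range: no digits allowed here.
set d2 [regexp {^[^0-9]+$} "2024"]     ;# 0

# A literal "]" must come first in the sequence.
set s1 "a\]b"
set d3 [regexp {[]x]} $s1 m1]

# A literal "-" must come first or last in the sequence.
set d4 [regexp {[a-]} "x-y" m2]

puts "$d1 $d2 $d3 $d4"                 ;# 1 0 1 1
puts "$m1 $m2"                         ;# ] -
```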



CHOOSING AMONG ALTERNATIVE MATCHES

       In general there may be more than one way to match a regu-
       lar  expression to an input string.  For example, consider
       the command

              regexp  (a*)b*  aabaaabb  x  y

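       Considering only the rules given so far, x and y could end
       up with the values aabb and aa, aaab and aaa, ab and a, or
       any of several other combinations.  To resolve this poten-
       tial ambiguity regexp chooses among alternatives using the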
       rule ``first then longest''.  In other words, it considers
       the  possible  matches in order working from left to right
       across the input string and the pattern, and  it  attempts
       to  match longer pieces of the input string before shorter
       ones.  More specifically, the  following  rules  apply  in
       decreasing order of priority:

       [1]    If  a  regular expression could match two different
              parts of an input string then it will match the one
              that begins earliest.

       [2]    If  a  regular expression contains | operators then
              the leftmost matching sub-expression is chosen.

       [3]    In *, +, and ? constructs, longer matches are  cho-
              sen in preference to shorter ones.

       [4]    In  sequences  of  expression components the compo-
              nents are considered from left to right.

       In the example from above, (a*)b* matches aab:   the  (a*)
       portion  of  the  pattern is matched first and it consumes
       the leading aa; then the b* portion of  the  pattern  con-
       sumes the next b.  Or, consider the following example:

              regexp  (ab|a)(b*)c  abc  x  y  z

       After this command x will be abc, y will be ab, and z will
       be an empty string.  Rule 4  specifies  that  (ab|a)  gets
       first  shot  at the input string and Rule 2 specifies that
       the  ab  sub-expression  is  checked  before  the  a  sub-
       expression.   Thus  the  b has already been claimed before
       the (b*) component is checked and (b*) must match an empty
       string.
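
       Both examples from this section can be checked directly.
       The sketch below also adds one match, not taken from the
       text above, that illustrates Rule 1; the variable names
       other than x, y, and z are invented.  The patterns are
       braced here as a precaution against Tcl substitution:

```tcl
# First example: (a*) consumes the leading aa, then b* takes
# the single b that follows.
regexp {(a*)b*} aabaaabb x y
puts "$x $y"                           ;# aab aa

# Second example: (ab|a) claims ab, leaving nothing for (b*).
regexp {(ab|a)(b*)c} abc x2 y2 z2
puts "$x2 $y2 [string length $z2]"     ;# abc ab 0

# Rule 1: the match that begins earliest wins, even though a
# longer run of b's occurs later in the string.
regexp {b+} abbcbbb m
puts $m                                ;# bb
```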



KEYWORDS

       match, regular expression, string

