JFlex - Frequently Asked Questions
-
Can I use my old JLex specifications with JFlex?
Yes, you can usually use them unchanged. See section [porting
from JLex] of the manual for more information on that topic.
-
Where can I get the latest version of JFlex?
-
What platforms does JFlex support?
JFlex is written with Sun's JDK 1.1 and produces JDK 1.1 compatible code.
It should run on any platform that supports JRE/JDK 1.1 or above.
The JFlex website lists the platforms on which JFlex has been tested successfully.
-
My program crashes when I use a large scanner (about 250 states)
No problem. Just use the %pack directive to compress the transition table
further. See section [code generation] of the manual for more information on this topic.
-
I get a java.lang.VerifyError when I start my program with java -verify
This is in fact the same problem as described above. You can use the
%pack directive to avoid it. See section [code generation] of the manual
for more information on this topic.
-
My scanner is slower with a JIT compiler than without JIT
This usually only occurs with very large scanners.
As Angelo Schneider pointed out to me, this is because the generated scanners are so good.
What? Yes. A scanner spends only a certain percentage X of its time recognizing
symbols, executing actions and the like. The rest of the time is spent on IO.
An optimal scanner has a very small X for the actual scanning and spends almost all of its
time on IO. A bad scanner has a high value of X. A JIT compiler can only improve
the X part of the scanner and can improve almost nothing of the IO time.
So if the JFlex scanner has a low X value and is large (so the JIT takes a long time to
compile it) and the input is small (so the improvement to the small X part can't
pay off), the scanner gets slower with JIT than without. Because it's so good.
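The argument can be made concrete with a small timing model. This is only a sketch: the class name, the formula and all numbers are made up for illustration and are not measurements of any real scanner or JIT.

```java
// Hypothetical timing model; every number here is invented for illustration.
class JitPayoffSketch {

    /**
     * Total run time in milliseconds. The JIT only speeds up the scanning
     * part (X), leaves the IO time untouched and adds its own compile time.
     */
    static double totalMs(double scanMs, double ioMs, double jitCompileMs, double jitSpeedup) {
        return scanMs / jitSpeedup + ioMs + jitCompileMs;
    }

    public static void main(String[] args) {
        double scanMs = 10.0; // X = 10% of a 100 ms run on a small input
        double ioMs = 90.0;   // the remaining 90% is IO

        // No JIT: no compile cost, no speedup.
        double withoutJit = totalMs(scanMs, ioMs, 0.0, 1.0);
        // JIT: assumed 50 ms to compile a large scanner, assumed 5x faster scanning.
        double withJit = totalMs(scanMs, ioMs, 50.0, 5.0);

        System.out.println("without JIT: " + withoutJit + " ms");
        System.out.println("with JIT:    " + withJit + " ms");
    }
}
```

With these assumed values the JIT saves 8 ms of scanning but costs 50 ms of compilation, so the run with JIT is slower. On a much larger input the scanning part grows while the compile cost stays fixed, and the balance tips the other way.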
-
Can I use CUP and JFlex together?
-
Can I use the generated code of my JFlex specification commercially?
You can use your generated code without restriction. See the file copyright for more
information.
-
I want my scanner to read from a network byte stream or from interactive stdin. Can I do this with JFlex?
This depends on the syntax of the input you want to read. The problem is that for some
expressions the scanner needs one character of lookahead to decide which action to execute.
For interactive use and network streams this is very inconvenient: the stream
doesn't send an EOF (or any other data) when the user stops typing, so the scanner just waits
for the next character and doesn't return a symbol. Since version 1.1.1 of JFlex this problem
can be avoided thanks to some additional analysis at generation time. Take a look at these three
rules:
1. a
2. a*
3. ";"
When the scanner has read one a, an additional input character is needed to decide
whether this matches rule 1 (just one a) or rule 2 (when the next character is another
a). With input aaa the scanner also has to read one additional character,
because it is supposed to return the longest match (so if another a follows, the match
is aaaa and not just aaa). But when the scanner reads a ";",
it does not need an additional character and can immediately execute the action for rule
3. This is the case for all rules that are not prefixes of any other rule in the
specification and that have a fixed-length end (so (a* b) is ok but (a b*) is not).
For your application this means: if all commands (or whatever units of input you have) are
terminated by some delimiter (for instance ";" or LF or "end"), reading from a network byte stream or an interactive stream works fine with JFlex.
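The lookahead behavior for the three rules can be imitated by hand. The following is a minimal sketch and not code generated by JFlex: the class LookaheadSketch, the method charsNeeded and the constant WOULD_BLOCK are hypothetical names, and the logic is hard-wired to the rules a, a* and ";".

```java
// Hand-written illustration of the rules  a, a*, ";"  (not JFlex output).
class LookaheadSketch {

    /** Returned when the scanner would have to wait for more input. */
    static final int WOULD_BLOCK = -1;

    /**
     * Returns how many characters of the currently available input must be
     * read before the first token can be returned, or WOULD_BLOCK if the
     * available input ends before the match can be decided.
     */
    static int charsNeeded(String available) {
        if (available.isEmpty()) {
            return WOULD_BLOCK;
        }
        if (available.charAt(0) == ';') {
            // Rule 3 has a fixed length and is no prefix of another rule:
            // the action can run without any lookahead.
            return 1;
        }
        // Rules 1 and 2: consume the run of a's (possibly empty) ...
        int i = 0;
        while (i < available.length() && available.charAt(i) == 'a') {
            i++;
        }
        // ... and then one more character is needed to know the run has ended.
        return i < available.length() ? i + 1 : WOULD_BLOCK;
    }

    public static void main(String[] args) {
        System.out.println(charsNeeded(";"));    // decided after the ";" itself
        System.out.println(charsNeeded("aaa"));  // still waiting for the next character
        System.out.println(charsNeeded("aaa;")); // the ";" ends the run of a's
    }
}
```

The case "aaa" is the one that hurts on an interactive stream: the user has stopped typing, but the scanner cannot return the token until something else arrives.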
-
My scanner doesn't return the value for EOF but just keeps matching some rule in
my specification at the end of file.
This usually occurs when a rule in the specification matches the empty string. JFlex scanners
return the end of file value (specified for instance with the %eofval directive or the
%cup switch) only when they reach the end of file and no rule can be matched.
When a rule such as [0-9]* matches the empty string, it can always be matched
and the EOF value will never be returned. To avoid this, all such rules can be transformed into
ones that match at least one character ([0-9]+ instead of [0-9]*).
Please note that this is only necessary when a complete rule matches the empty string - so
in this specification:
%%
digits = [0-9]*
%%
"number :" {digits} { .. some action .. } (1)
{digits} { .. some other action .. } (2)
rule (1) is perfectly ok while rule (2) is not.
This behavior has been changed in version 1.2 of JFlex!
To follow "the rule of least surprise", the EOF rule is now treated as a match of
length one character and therefore takes priority over an empty match, so the problems mentioned above should not occur with JFlex 1.2 or higher.
Even though EOF is considered to have length 1 in this special case,
it is still not a "normal" character and can only be used in the special EOF rules.
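Whether a complete rule can match the empty string is easy to check for simple expressions. The sketch below uses java.util.regex as a stand-in, which is an assumption: its syntax and semantics are not identical to JFlex regular expressions, but they agree for the simple patterns used in this answer. The class and method names are hypothetical.

```java
import java.util.regex.Pattern;

// Checks empty-string matches with java.util.regex as a stand-in for JFlex rules.
class EmptyMatchSketch {

    /** True if the whole expression matches the empty string. */
    static boolean matchesEmpty(String regex) {
        return Pattern.matches(regex, "");
    }

    public static void main(String[] args) {
        // Rule (2) from the specification above: {digits} alone.
        System.out.println(matchesEmpty("[0-9]*"));
        // The suggested replacement that needs at least one digit.
        System.out.println(matchesEmpty("[0-9]+"));
        // Rule (1): the fixed prefix keeps the complete rule from matching nothing.
        System.out.println(matchesEmpty("number :[0-9]*"));
    }
}
```

Only the first expression matches the empty string, which is why rule (2) is the problematic one in scanners generated by versions before 1.2.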
-
How can I let my scanner read its input from a string?
// Scanner is assumed here to be the name of your generated scanner class
String myString = "some input";
Scanner myScanner = new Scanner( new java.io.StringReader(myString) );
-
Why do standalone scanners have a different standard return type (int instead of Yytoken)?
That's because int is a predefined type in Java and Yytoken is not. If a scanner
generated with the %standalone option had return type Yytoken, you would have
to provide this class for every standalone scanner you write. In most cases you don't want to do that,
because the scanner wouldn't really be standalone then.
The standard Yytoken for non-standalone cases stems from JLex and is only kept for
compatibility (it's rarely used anyway).
If you still really want Yytoken as the return type in a standalone scanner, you can always
specify it explicitly with %type Yytoken. If you just want to test your scanner and see what
it does without a parser attached, use %debug instead of %standalone.
-
The expression .*$ used to work, but now JFlex complains about trailing context
This is because .*$ is now matched with the more general lookahead operator / and is
equivalent to .* / \r|\n|\r\n. The problem with this expression is that the character class
. contains the \r character and therefore matches the beginning of the trailing context.
Earlier versions of JFlex used a special implementation for $ and had no lookahead operator.
A simple workaround is not to use the . class, but to use the expression [^\r\n]
instead (so the full expression is now [^\r\n]*$ and should work without error).
-
I use %8bit and get an Exception, but I know my platform only uses 8 bit. Is %8bit broken?
Short answer: not broken, use %unicode. Long answer:
Most probably this is an encoding problem. Java uses Unicode internally and converts
the bytes it reads from files (or somewhere else) to Unicode first. An 8 bit value
on your platform may not be 8 bit any more when converted to Unicode. On many Windows systems,
for instance, Cp1252 (Windows-Latin-1) is used as the standard encoding, and there the
character "single right quotation mark" has code \x92, but after conversion to
Unicode it's \u2019, which is not 8 bit any more. See also the section
on Encodings, platforms, and Unicode of the JFlex manual for more information.
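The conversion is easy to observe directly. This is a minimal sketch, assuming your Java runtime ships the windows-1252 charset (it is not one of the charsets every Java platform is required to support); the class and method names are hypothetical.

```java
import java.nio.charset.Charset;

// Decodes a single Cp1252 byte the way Java does when reading such a file.
class Cp1252Sketch {

    /** Returns the Unicode character that one Cp1252 byte is converted to. */
    static char decodeCp1252(int byteValue) {
        byte[] raw = { (byte) byteValue };
        return new String(raw, Charset.forName("windows-1252")).charAt(0);
    }

    public static void main(String[] args) {
        char c = decodeCp1252(0x92);
        // Print the code point in hex and whether it still fits into 8 bit.
        System.out.println(Integer.toHexString(c) + " fits in 8 bit: " + (c <= 0xFF));
    }
}
```

The byte \x92 arrives at the scanner as \u2019, a value that lies outside the 8 bit range.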
-
My scanner needs to read a file that is not in my platform's standard encoding, but in encoding XYZ. How?
Since the scanner reads Java Unicode characters, it is independent
of the actual character encoding a file or a string uses. For files, the
transformation from byte stream to Java characters usually happens in the
java.io.InputStreamReader object connected with the input stream. Class
java.io.FileReader uses the platform's default encoding automatically.
If you would like to specify another encoding explicitly, for instance UTF-8, you
could do something like
Reader r = new InputStreamReader(new FileInputStream(file), "UTF-8");
Now you have a Reader r that can be passed to the
scanner's constructor in the usual way.
For more information on encodings see also Sun's JDK documentation,
especially the item Internationalization under Guide to Features - Java Platform,
and there the FAQ and Supported Encodings.
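A self-contained way to try out reading with an explicit encoding is to write a small UTF-8 file and read it back. This is a sketch only: the class EncodingReaderSketch and its methods are hypothetical helpers, and it uses java.nio.charset.StandardCharsets (Java 7 or later) and java.io.UncheckedIOException (Java 8 or later), which are newer than the JDK versions discussed elsewhere in this FAQ.

```java
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.nio.charset.StandardCharsets;

// Hypothetical helper showing a Reader with an explicitly chosen encoding.
class EncodingReaderSketch {

    /** Opens a file as a character stream, decoding its bytes as UTF-8. */
    static Reader openUtf8(File file) throws IOException {
        return new InputStreamReader(new FileInputStream(file), StandardCharsets.UTF_8);
    }

    /** Writes text to a temporary UTF-8 file and reads it back via openUtf8. */
    static String roundTrip(String text) {
        try {
            File file = File.createTempFile("jflex-faq", ".txt");
            try {
                try (Writer w = new OutputStreamWriter(
                        new FileOutputStream(file), StandardCharsets.UTF_8)) {
                    w.write(text);
                }
                StringBuilder sb = new StringBuilder();
                try (Reader r = openUtf8(file)) {
                    int c;
                    while ((c = r.read()) != -1) {
                        sb.append((char) c);
                    }
                }
                return sb.toString();
            } finally {
                file.delete(); // clean up the temporary file
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        String text = "caf\u00e9 \u2019quoted\u2019";
        System.out.println(text.equals(roundTrip(text)));
    }
}
```

In a real program the Reader returned by openUtf8 would be handed to the constructor of your generated scanner class instead of being read character by character.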
-
JFlex generates with 0 errors, 0 warnings, but the generated code does not compile
If you are sure that the problem is because of JFlex and not because of user code
in the specification, try to run JFlex without the JIT compiler. Some versions of
Sun's JIT compiler sunwjit in JDK 1.2.2 on Solaris 7/Intel seem to have a bug that is triggered by
JFlex (if your scanner compiles ok, there should be no problem).
Please do report this as a bug to lsf@jflex.de in any case.
If there is a problem with a specific platform/JDK/JIT combination it may be important
for other people to know. Also, I might be able to work around the JIT bug from the JFlex side.