Almost all of TeX's mathematics is supported with the exception of a
few obscure symbols that are absent from the fonts normally available
to browsers. Support includes, for example, in-line equations with
subscripts and superscripts, display equations with built-up
fractions, over accents, large delimiters, operators with limits;
matrix, pmatrix, cases, [but not bordermatrix]; over/underbrace [but
using a rule, not a brace].
Font styles: it, bf, sl, uppercase, accented characters written like
\"o or \'{e}. Guess the intent of font definitions
[optionally, remove construct].
Macro definitions: def, edef, xdef, but all definitions are
global. Definitions with delimited arguments [footnote 1].
Input of files [but not from the implicit texinputs path, see 5.4].
Newcount, number, advance and counter setting.
Centerline, beginsection, item, itemitem, obeylines; hang, hangindent,
narrower [for entire paragraphs, hangafter ignored].
Headline is made
into a title, footnote{}{}. Comments: removed.
Tables: halign [uses border style if the template contains
vrule]. Settabs, \+.
These cover most of the vital LaTeX constructs. Internal hypertext
cross-references are automatically generated (e.g. by ref and
tableofcontents) provided LaTeX has previously been run on the
document and the appropriate command-line switch is used.
When TtH encounters TeX constructs that it cannot handle either
because there is no HTML equivalent, or because it is not clever
enough, it tries to remove the mess they would otherwise cause in the
HTML code, generally giving a warning of the action if it is not sure
what it is doing. The following are
examples of constructs not translated.
The source for TtH is flex code which is processed to produce a C program
tth.c which comprises the distribution. This file is compiled by
The executable should then be copied to whatever directory you want
(preferably on your path of course). That's all!
Command line (switches can appear anywhere on the line):
TtH is extremely fast in default mode on any reasonable hardware.
Conversion of even large TeX files should be a matter of a second or
two.
This makes it possible to use TtH in a CGI script to output HTML
directly from TeX source if desired (stderr may then need to be redirected).
Equations are translated internally into HTML3.2 as much as it
allows. TtH uses HTML3.2 tables for layout of built-up fractions in
display equations. It also uses the extension HTML tag <font
face="symbol">, supported by Netscape and other major browsers (and
part of the HTML4.0 specification) to render Greek and large
delimiters etc. Untranslatable TeX math tokens are inserted verbatim.
The internal approach to equation translation is a major area where
TtH departs from the philosophy of LaTeX2HTML and its derivatives. TtH
does not use any images to try to represent hard-to-translate
constructs like equations. Instead it uses the native ability of HTML
to the fullest in providing a semantically correct rendering of the
equation. The aesthetic qualities obtained are in practice no worse on
average than LaTeX2HTML's inlined images, which are generally slightly
misaligned and of uncertain scaling relative to the text. Some
limitations in the HTML code are inevitable, of course, but one ends
up with a compact representation that can be rendered directly by the
browser without the visitor having to download any additional helper
code (e.g. Java equation renderer).
TtH offers an option [-i] which makes italic the default
font within equations, and thus the style more TeX-like. Because of
browser bugs mentioned below, this is implemented in a rather verbose
HTML style, making the output of typical documents slightly longer.
Browser italic font appearance is not as satisfactory as TeX's math
italic, so for many documents roman looks better.
Spacing in equations is handled slightly differently by TtH than
by TeX. The reason is that most browsers use fonts that will crowd the
characters horizontally too close for comfort in many cases (for
example: M||/2). Also, built-up HTML equations are more
spread out vertically than in TeX. Therefore TtH equations look better
if spaces are added between some characters. So TtH
does not remove spaces in the original TeX file between characters in
equations. The author is thus able to control this detail of layout in
the HTML without messing up their TeX file - since TeX will ignore
any spaces inserted. Legacy TeX code that contains a lot of spurious
whitespace (ignored by TeX) may, as a result, occasionally become too
spread-out when translated.
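As a small illustration of the point above, the two displays below are identical as far as TeX is concerned, because TeX ignores spaces in math mode. Going by the description in this section, TtH would keep the extra spaces in the second one, so it should look less crowded in the browser. The formula itself is made up for the example.

```latex
% Plain TeX. TeX ignores spaces in math mode, so both displays typeset identically.
$$ x_i/2+y_j/3 $$
% Same formula with spaces added for the benefit of the HTML output.
$$ x_i / 2 + y_j / 3 $$
\bye
```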
Another major difference between TtH and LaTeX2HTML is that TtH does
not call the latex or tex programs at all, and is not specifically
dependent upon these, or indeed any other (e.g. PERL), programs being
installed on the translating system. Its portability is therefore
virtually universal.
Forward references in LaTeX are handled by multiple passes that write
auxiliary files. TtH does only a single pass through the source. If
you want TtH to use LaTeX constructs (e.g. tableofcontents,
bibliographic commands, etc.) that depend on auxiliary files, then
you do need to run LaTeX on the code so that these files are
generated. You must also tell TtH, using the switch -Lfilename, the
base file name of these auxiliary files (which is the name of the
original file omitting the extension). If TtH cannot find the relevant
auxiliary file because you didn't run LaTeX and generate the files or
didn't include the switch, then it will omit the construct and warn
you.
Forward references via ref will not work if the .aux file is
unavailable, but backward references will.
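A minimal sketch of a file that depends on auxiliary files may make this concrete. The section titles and label names are invented for the example, and the workflow in the trailing comments assumes the file is saved as file.tex.

```latex
\documentclass{article}
\begin{document}
\tableofcontents                            % relies on an auxiliary file from a LaTeX run
\section{Introduction}\label{sec:intro}
See Section \ref{sec:method} for details.   % forward reference: needs file.aux
\section{Method}\label{sec:method}
As noted in Section \ref{sec:intro}, ...    % backward reference
\end{document}
% Assumed workflow:
%   latex file                         (generates the auxiliary files)
%   tth -Lfile <file.tex >file.html    (tells TtH their base name)
```

Without the LaTeX run or the -L switch, the description above suggests that the table of contents is omitted with a warning and that only the backward reference resolves.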
If you routinely use LaTeX in a way that requires auxiliary files and grow
tired of always having to specify the -L switch to TtH, a simple ``l2h''
script may be useful. For example under Linux (Un*x) a script such as
When processing a LaTeX file that contains the \makeindex
command in its preamble, TtH will construct an appropriately
cross-hyperlinked index that will be input when the command
\printindex is encountered, which must be after all the index
references \index{ ... } in the document. TtH does this
independently of LaTeX, but not of the subsidiary program
makeindex that is normally used with LaTeX to produce the final
index ``file.ind''. TtH creates its index entries in a file with
extension .tid (Tth InDex). Because of makeindex program
limitations, TtH currently removes all the page-number formatting that
is present in the index entry (but not the reference formatting). It
also uses the section reference style ``1-2'', separated by a dash,
because this is the standard form that makeindex expects and it
is cumbersome to change it. The actual reference is less important to
the reader than the fact that TtH inserts a link to the actual place
in the document indexed. When the \printindex command is
encountered, TtH closes this file and runs makeindex on it, which
creates by default a file with extension .ind, and then TtH reads the
.ind file in as its index. Because the .ind file that TtH produces
may well be incompatible with running LaTeX itself on the file, TtH
deletes the .ind file that it creates (and hence one is left without
such a file at the end). If, instead of creating an index file during
TtH processing, one wants to use with TtH an index file already
created, all that is needed is to remove the \makeindex
command from the top of the LaTeX source. Any existing .ind file will
still be input by \printindex, but no indexing files will be written or
deleted.
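The arrangement described above might look like the following sketch. The section names and index entries are invented. The line \usepackage{makeidx} is there because LaTeX itself needs it for \printindex; how TtH treats it is not stated in this manual, so that part is an assumption.

```latex
\documentclass{article}
\usepackage{makeidx}       % needed by LaTeX for \printindex; TtH's handling is assumed
\makeindex                 % in the preamble: TtH then builds the index itself
\begin{document}
\section{Fractions}
Built-up fractions\index{fractions} are laid out as tables.
\section{Delimiters}
Large delimiters\index{delimiters} use the symbol font.
\printindex                % must come after every \index{...} entry
\end{document}
```

With \makeindex present, TtH would write its entries to the .tid file and call makeindex when it reaches \printindex. Removing the \makeindex line would instead make it read an existing .ind file without writing or deleting anything.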
If you run TtH on a file that contains both a \makeindex
command and a \printindex it will overwrite your file.ind. In
the unusual situation where your index, file.ind, was created
laboriously by hand and cannot be recreated, you should be very
careful to remove the
\makeindex command from the .tex file. For safety, of course,
also save a copy of your file.ind somewhere else.
The \makeindex command, if present, will also cause TtH to add
a linked entry called ``Index''
to the end of any table of contents. This entry is a highly desirable
feature for an HTML file, but if there is no \printindex
command at the end of the document, the index will not exist, so the
reference will be non-existent.
On some operating systems with file name length restrictions, the
makeindex program is called makeindx. Therefore a TtH switch is
provided: -xcommandline, which substitutes commandline
for the default call makeindex. Therefore, -xmakeindx
will switch to the correct program name on one of these limited
operating systems. This switch also allows additional parameters or
switches to be passed to makeindex via e.g.
-x"makeindex -s style.sty".
If you don't have the makeindex program, you can't create indexes with
TtH or LaTeX, except by hand.
All of the index file processing naturally requires that TtH have
write permission for the directory in which the original LaTeX file
(specified by the -L switch) resides.
Optionally TtH can use a more appropriate graphics format, possibly
using a user-supplied (script or) program called ps2gif to convert the
postscript file to a gif file, ``file.gif''. When the switch -e1 or
-e2 is specified, if ``file.gif'' or ``file.jpg'' already exists in
the same directory as implied by the reference to ``file.ps'' then no
translation is done and the file found is used instead. That gif (or
jpg) is then automatically either linked (-e1) or inlined (-e2) in the
document. If no gif or jpg is found, ps2gif is called. A linux (un*x)
ps2gif script using Ghostscript and the pbmplus utilities for this
purpose is included with the distribution. A comparable batch program
can be constructed to work under other operating systems or else the
translation can be done by hand. Naturally you need these utility
programs or their equivalent on your system to do the conversion. The
calling command-line for whatever ps2gif is supplied must be of the
form: ps2gif inputfile.ext outputfile.ext. The program must
have permission to write the outputfile (file.gif) in the directory in
which the file.ps resides.
The LaTeX command \includegraphics{...} does exactly the
same thing. Its optional arguments are
ignored. \[e]psfig{file=...} is also treated the same.
If the extension of the file initially specified is not .ps or .eps,
no conversion is done but the file is referenced or in-lined as an
image. In effect, then, TtH supports postscript, encapsulated
postscript, gif, and jpeg, plus any future formats that become
supported by common browsers. However, LaTeX does not support these
other formats, so it will give an error message if it can't find a
postscript file, unless you specify the bounding box, thus preventing
LaTeX interrogating the file.
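A hedged sketch of the two cases, assuming the graphicx package and a DVI/PostScript workflow; the file names and bounding-box numbers are hypothetical.

```latex
\documentclass{article}
\usepackage{graphicx}
\begin{document}
% PostScript figure: TtH links to it by default, or converts it with -e1 / -e2.
\includegraphics{figure1.eps}
% Non-PostScript image: TtH references or inlines it without conversion.
% The explicit bounding box saves LaTeX from having to read the file;
% TtH ignores the optional argument.
\includegraphics[bb=0 0 200 150]{photo.jpg}
\end{document}
```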
Some TeX capabilities are extremely difficult or impossible to
translate into HTML because of browser limitations and are best
avoided if possible. Arrays or matrices in in-line equations cannot be
supported because tables cannot be placed in-line in HTML. TtH output
will be strangely disjointed. Likewise built-up fractions, most over-
and under-accents, and indeed anything that requires specific
placement on the page other than simple subscript or superscript and
underline, cannot be rendered
in-line in HTML, although TtH
will render them well in
displaystyle. These latter constructs
are nevertheless commonly used in in-line TeX. TtH adopts the policy
of indicating in an unambiguous and relatively intuitive way what
construct is being used (as opposed to simply omitting it). For
example $\hat{a}$ is rendered [^a]. The result is rarely
elegant. Therefore in authoring TeX that is known to be destined for
HTML translation, one should bear in mind these limitations, and use
an appropriate style.
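For instance, one way of following this advice is to keep accents and fractions out of running text where possible and move them into displays. The sketch below uses invented content; the [^a] rendering is the one quoted above.

```latex
% Plain TeX. In-line, the accent cannot be placed over the letter in HTML;
% as noted above, TtH renders \hat{a} as [^a].
The estimate $\hat{a}$ is the ratio of $b$ to $c$:
% In display style TtH can build both the accent and the fraction.
$$ \hat{a} = {b \over c} $$
\bye
```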
Although TtH supports a remarkably complete subset of LaTeX, it is
not as tolerant as TeX itself of ambiguous or confusing style. One
example is $Z_\alpha{+1} = Z_\mbox{+1}$. TeX is clever enough
to decide that \alpha is the subscript on the left, but
{\mbox{+1}} is the subscript on the right. TtH isn't that
clever but will warn you. Put the \mbox in braces to help it.
In part, the sensitivity of TtH arises from its incomplete support of
the complicated primitive details of TeX. For
example, practically any TeX that redefines character codes
will break because TtH knows nothing about the concept of character
codes. (If you don't know much about it either, join the vast majority
of TeX users!) Another example is that TtH expects only letters
or @ in user-defined command names, not punctuation characters etc.
Delimited parameter definitions are supported but the matching of
parameter templates is not 100% compatible with TeX in respect of
compression of space around commands, which is particularly obscure in
TeX. Generally, removal of redundant whitespace in definitions (and in
their usage, of course) will improve compatibility. The recognition of
these definitions can be disabled using the -d switch, in which case
the definitions are simply discarded.
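A small sketch of a delimited-parameter definition written in the tight style recommended above. The macro \pt is hypothetical; its name uses only letters, and the parentheses and comma form the parameter template.

```latex
% Plain TeX. \pt is a hypothetical macro with delimited parameters:
% the parentheses and the comma are part of the parameter template.
\def\pt(#1,#2){x_{#1}, y_{#2}}
% Usage written tightly, with no spaces around the command or delimiters.
The point is $\pt(1,2)$.
\bye
```

TeX accepts looser spacing in some of these positions, but since template matching in TtH is not fully TeX-compatible with respect to whitespace, the tight form is the safer one to translate.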
Although global macro definitions are supported by TtH, if they
contain conditional or other unsupported constructs,
(e.g. expandafter) it is often better to leave the definitions in a
file on the texinputs path. The file will then not be found by TtH. That
provides a mechanism to include the definitions when ``TeX''ing the
file, but not when ``TtH''ing it. If the definition is required in
TtH, the full path should be specified relative to the directory from
which TtH is run, e.g. ``\input
/home/myhome/mytexdir/mymacro.tex''.
Unrecognized or undefined commands of the form
\dothis{one}{two}{three}, are treated by discarding
all the following adjacent brace groups. A space between the close and
open braces will terminate the discarded arguments and cause the
following brace group(s) to be scanned as if just the text. This
makes it possible to use formatting to make TeX code come out right in
both TeX and HTML. For example if TtH encounters a command written
``\boxthis{width} {boxed material}'' which might be
designed in TeX to provide a width to a defined command, written with
a space after the first argument, it will ignore the width and scan
the boxed material into the text.
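The behaviour just described can be sketched as follows. The macro \boxthis is hypothetical. Its definition is wrapped in \tthdump so that TeX sees it and, on the assumption that TtH omits the \tthdump group as documented, TtH does not; \gdef is used because for TeX the braces form an ordinary group.

```latex
\def\tthdump{}   % for TeX: do nothing (TtH treats \tthdump as built in)
% Seen by TeX only; assumed to be skipped by TtH, leaving \boxthis unknown to it.
\tthdump{\gdef\boxthis#1#2{\hbox to #1{#2\hfil}}}
% Adjacent brace groups: TtH discards both arguments.
\boxthis{3cm}{dropped by TtH}
% Space before the second group: TtH discards the width and keeps the text.
% TeX skips the space, so both calls typeset the same way on paper.
\boxthis{3cm} {kept by TtH}
\bye
```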
Since TtH supports command definition using \def or
\newcommand, it will accommodate many personal
macros. Currently the mechanism for interpreting defined macros does
not permit a TeX construct that TtH considers to be built-in to be
split between a definition and following text. For example, one might
unnecessarily define a command for making text italic by putting
Then legal LaTeX usage thereafter would be to put
\unnecessary{italic} instead of \textit{italic}.
This will not work with TtH because TtH regards textit and its
argument as a compound construct and it will fail to recognize it
split partly into a definition. Instead, the following definition,
which is much better TeX style anyway, will work:
In general, if renaming a command, make its arguments explicit in the
definition.
Another important factor is that in TtH (unlike TeX) built-in
commands can not normally be redefined; any redefinition will
simply be ignored (except inside edef and a few other places). This
prevents TtH from safely allowing use of major packages that redefine
standard TeX commands. For example amsTeX redefines footnote to have
just one argument, which will cause problems. This particular example
is potentially a problem with LaTeX too, which also redefines
footnote. TtH handles this by keeping track of whether the file is
LaTeX or TeX; therefore you should not mix the two dialects in a
single file even though there is no need to tell TtH explicitly which
type the file is. (Besides, a mixed file will play havoc with TeX
itself.)
Sometimes TeX files use special macro packages designed
for a specific layout of journal or conference. If such a macro
package in its original form uses conditionals or other unsupported
constructs, it may be inadvisable to use it. (For example, amsppt.sty
which is full of unsupported constructs will certainly cause errors
with TtH.) A different, simpler, version of the macro package,
designed with the capabilities of TtH in mind, may readily be
substituted when using TtH for translation. One way to do this is to
leave the original macro or style file on the texinputs path so it is
not accessed by TtH, but to prefix the alternative package for
TtH. This can be done without alteration of the original TeX files by
using, for example, the following command line:
In summary, you might want to tell people viewing your documents to
set their browsers to View/Encoding/MacRoman, and
Edit/Preferences/Fonts/Use-document-fonts (NS 4.0).
Symbol fonts are not normally enabled for Netscape running under X,
because of the way Netscape groups its fonts. A fix for this is to
install some aliases in the fonts directories or else to add a line to
your .Xdefaults file. See
http://venus.pfc.mit.edu/tth/Xfonts.notes. You might want to put these
notes on your site for people viewing your documents.
In Netscape 3.0 under X, for example, the printing fonts are hard
coded into the browser and the font-changing commands are ignored when
printing. For that reason, visitors viewing TtH documents will often
not be able to print readable versions of documents with lots of
mathematics. This problem could, and should, be fixed in the
browsers. However, if you want your readers to be able to print a
high-quality paper copy of the file, then you probably want to make
available to them either the TeX source or a common page-description
format such as Postscript or PDF. Since HTML documents download and
display so much faster and better than these other formats on the
screen, TtH's translation provides the natural medium for people to
browse, but not necessarily the best medium for paper
production.
Under Wind*ws, both Netscape (3.0) and Internet Explorer (3.02)
incorrectly size or space vertically the symbol glyphs so that small
gaps appear between the parts of large symbols and delimiters. This
occurs only at certain font sizes (different between the two
browsers!) but causes a slightly annoying degradation of the
appearance.
Both Netscape and IE fail (although somewhat differently) to carry font
changing commands from cell to cell of HTML3.2 tables. This means that
for example boldface in equations will be lost after the first fraction
or built-up construct. The -i switch mostly circumvents this bug
at the cost of verbose HTML, but the browsers ought to fix it.
IE can become confused about its vertical alignment in tables, with
the result that symbols float above or below the horizontal line in
built-up equations. This sometimes fixes itself if you simply refresh
the page!
Known limitations are significant but mostly covered above. I would
be interested to hear about bugs but only if reports are accompanied
by the brief section of TeX code that causes the problem. Ungraceful
failures to parse straightforward TeX code are of most
interest. Aesthetic critiques (with TeX code) will be considered for
future improvements. Failures on complicated macros are to be
expected. It is now very difficult to crash TtH by exceeding array
bounds. I would like to hear about it if you succeed in crashing TtH
on code that gives no error with TeX. I would be glad to receive
LaTeX2e files (emailed to hutch@pfc.mit.edu) that illustrate LaTeX
bugs. But please don't send LaTeX2.09 files or files that do not
conform to the latest (1994) LaTeX users' guide.
The code has been compiled and run on Linux 2.0, MSDOS, and Open VMS.
TtH is copyright © Ian Hutchinson, 1997-8 (hutch@pfc.mit.edu).
You may freely use this software for non-commercial purposes. It may
not be used for commercial purposes without an additional license. If
you distribute any copies, you must include this file and these
conditions must apply to the recipient. No warranty of fitness for
any purpose whatever is given, intended, or implied. You use this
software entirely at your own risk. If you choose to use TtH, by your
actions you acknowledge that any direct or consequential damage
whatever is your responsibility, not mine.
Many thanks for useful discussions and input to
Robert Curtis, Ken Yap, Paul Gomme, Bruce Lipschultz, Mike Fridberg,
Michael Sanders, Michael Patra, Bryan Anderson, Wolfram Gloger,
Ray Mines, John Murdie, David Johnson, Jonathan Barron, Michael
Hirsch, Jon Nimmo, Alan Flavell, Ron Kumon, Magne Rudshaug.
The following macro definitions, although not needed for TtH, will
enable a TeX file that uses the non-standard TtH commands to be
correctly parsed by Plain TeX.
1 Delimited definitions are not 100% TeX compatible
2 See appendix for TeX macros supporting these commands
1.1.1 Mathematics
1.1.2 Formatting and Macro Support
1.2 LaTeX
LaTeX support includes essentially all mathematics plus the following
1.2.1 Environments:
em, verbatim, center, flushright, verse, quotation, quote, itemize,
enumerate, description, list [treated as if description], figure,
table, tabular[*,x], equation, displaymath, eqnarray, math, array [not
in-line], thebibliography, [raw]html, index [as description].
1.2.2 LaTeX Commands:
[re]newcommand, newenvironment [optional arg not permitted], chapter,
section, subsection, subsubsection, caption, label, ref, pageref [no
number], emph, textit, texttt, textbf, centering, raggedleft,
includegraphics, [e]psfig, title, author, date [maketitle ignored:
title etc inserted when defined], lefteqn, frac, tableofcontents,
input, include [as input], textcolor, color [8 standard colors],
footnote [ignoring optional arg], cite, bibitem, bibliography, tiny
... normalsize ... Huge, newcounter [no ``within'' support],
setcounter, addtocounter, value [inside set or addto counter], arabic,
the, stepcounter, newline, verb[*], bfseries, itshape, ttfamily,
textsc, ensuremath, listoftables, listoffigures, newtheorem [no
optional arguments permitted], today, printindex, boldmath,
unboldmath, newfont, thanks, makeindex, index.
1.3 Special TeX usage for TtH
A few non-standard TeX commands are supported as follows [footnote 2].
See also 4.4.
\epsfbox{file.[e]ps} Puts in an anchor called "Figure" linked to
file.[e]ps (default), or alternatively calls user-supplied script
to convert the [e]ps file to a gif image and optionally inline it.
\special{html:"tags"} inserts ``tags'' into the HTML e.g. for images etc.
\href{reference}{anchor} highlights ``anchor'' with href=``reference''.
\begin{[raw]html} ... \end{[raw]html} environment passed direct to output.
\tthdump{...} The group is omitted by TtH. Define \tthdump as a nop for TeX.
%%tth:... The rest of the comment line is passed to TtH (not TeX) for parsing.
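As an illustration of the last two entries, the sketch below keeps a paper-only spacing command away from TtH and passes an HTML-only rule to TtH alone. The particular content (\vskip, an hr tag) is chosen only for the example, and the \special form follows the one used in the appendix macros.

```latex
% For TeX only: \tthdump does nothing, so the following group is processed normally.
\def\tthdump{}
% Layout detail meant for paper; TtH omits the whole group.
\tthdump{\vskip 1cm}
% An ordinary comment to TeX; TtH parses the rest of the line.
%%tth:\special{html:<hr>}
Text common to both versions.
\bye
```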
1.4 Unsupported Commands
\magnification \magstep etc : Removes the whole construct.
Any dimension and glue commands: removed.
All conditionals (if else fi etc) except ifmmode: removed/ignored.
Boxes: usually ignored (but this can sometimes cause problems).
2. Installation
gcc -o tth tth.c
or whatever C compiler you are using. Compilation takes a couple of
minutes on a fast 486. Alternatively, you may be able to obtain a
precompiled executable from wherever you accessed this file.
3. Usage
tth [switches -c -d ... ] <file.tex [>file.html] [2>err]
-c prefix header "Content-type: text/HTML" (for direct web serving).
-d disable delimited definitions.
-e? epsfbox handling: -e1 convert figure to gif using user-supplied ps2gif.
-e2 convert and include inline. -e0 (default) no conversion, just ref.
 -f? sets the depth of grouping to which fractions are constructed built-up:
     -f3 (default) allows three levels built-up, -f0 none, -f9 lots.
-g don't guess an HTML equivalent for font definitions, just remove.
-h print usage. -H or -? print more help.
-i use italic as default math font.
-Lfile tells tth the base file (no extension) for LaTeX auxiliary input,
enables LaTeX commands (e.g. \frac) without a \documentclass line.
-n number eqnarray environments just once (default number each line).
-pdirectory designate an additional directory to search for input files.
-r output raw HTML (no preamble or postlude) for inclusion in other HTML.
-xmakeindx specify a non-standard makeindex command line.
-v give verbose commentary.
The program is a filter, i.e. it reads from stdin and writes to stdout.
In addition, diagnostic messages concerning its detection of unknown
or untranslated constructs are sent to stderr.
4. More Specific Details
4.1 Equations
4.2 Independence of [La]TeX installation and the -L switch.
#!/bin/sh
tth <"$1.tex" >"$1.html" -L"$1"
or the equivalent batch file under DOS/Wind*ws, may save some typing.
4.3 Indexing
Indexing an HTML document is different from indexing a printed
document, because a printed index refers to page numbers, which have
no meaning in HTML because there are no page breaks. TtH indexes LaTeX
documents by section number rather than by page; assuming, of
course, that they have been prepared with index entries in the
standard LaTeX fashion.
4.4 Graphics Inclusion: epsfbox/includegraphics
The standard way in plain TeX to include a graphic is using the epsf
macros. The work is done by \epsfbox{file.[e]ps} which
TtH can parse. By default TtH produces a simple link to such a
postscript file, or indeed any format file.
4.5 Various TtH limitations
4.6 Delimited-parameter macros
5. Command Handling Limitations
5.1 Macro- and Style-file inclusion
5.2 Layout to include arguments of unknown commands
5.3 Restrictions on renaming of internal commands
\def\unnecessary{\textit}
\def\unnecessary#1{\textit{#1}}
5.4 Alternate macro inclusion when translating
cat alternate.macros file.tex | tth >output.html
Since it is impossible to anticipate all style file incompatibilities,
it must be the responsibility of the user (or the journal) to decide
how to translate the concepts implemented in the original complicated
macro package into simpler, TtH-compatible, TeX macros.
6. Browser Problems
6.1 Macintosh browser font problems
The characters with codes higher than 127 in the Mac fonts are in a
different order from the standard ISO-8859-1 (sometimes called ISO
Latin-1). If Netscape or IE on Macs have their document encoding set
to the standard, then in versions 3 onwards they are programmed to
access the glyph where they think the corresponding accented Latin
character will be in the Mac font. This is fine if one really wants an
accented Latin character. However, for mathematics, using the symbol
font (which is ordered the SAME on the Mac as on other platforms) the
result is that one gets the wrong symbol glyph. This is a particular
problem with large delimiters. The fix is that the Mac browser must be
set to use the Options/Document-encoding ``MacRoman''. This tells the
browser not to do the permutation to access the accented Latin
characters in the Mac places; hence, for eight-bit characters, it
accesses the symbol font correctly. This would break the Latin
accented characters except for the fact that (most current versions
of) the browsers still access characters in the Mac order if they are
specified numerically using the HTML syntax ``&#???;''. So TtH
documents will in most cases display both accented characters and
symbols correctly on Macs if the document-encoding is set to MacRoman.
In addition, NS4.0 has under Edit Preferences Fonts a choice between
``use document fonts'' and ``use my fonts overriding document''. You
need to set ``use document fonts''.
6.2 Netscape Composer
Netscape Composer (in Netscape Communicator 4.0 onward) is
too clever for its own good. If you run an HTML document produced by TtH
through Netscape Composer, all sorts of internal translations are
performed that are detrimental to its eventual display. For example,
if you subsequently save the document with the usual encoding set
(Western), the eight-bit codes that work with Macs are replaced with
HTML4.0 entities such as &ograve; or &pound;. This effectively
breaks the document for viewing on Macs because it undoes everything
just explained. Even if you use User-Defined encoding, which prevents
this particular substitution, Composer will rearrange the document in
various ways that it thinks are better, but that make the display of
the document worse. The moral is, don't run TtH documents through
Netscape Composer.
You therefore cannot use the ``publish'' facility
of Composer. Transferring the document to the server with plain old ftp
will keep it away from Composer's clutches.
6.3 X font problems
6.4 Printing
6.5 Other Browser Bugs
7. Code Critique
8. License
9. Acknowledgements
A. Appendix: Non-Standard TeX Macros
\def\title#1{\bgroup\leftskip 0 pt plus1fill \rightskip 0 pt plus1fill
\pretolerance=100000 \lefthyphenmin=20 \righthyphenmin=20
\noindent
#1
\par\egroup}% Centers a possibly multi-line title.
\let\author=\title % Actually smaller font than title in LaTeX.
\input epsf % PD package defines \epsfbox for figure inclusion
\def\href#1#2{\special{html:<a href="#1">}{#2}\special{html:</a>}}
% Macro for http reference inclusion, per hypertex.
\def\tthdump{} % Do nothing.
Index (showing section)
Footnotes:
File translated from TeX by TtH, version 1.1.