% engine=luatex language=uk
% $Id$

% TODO: fix layout of function legend descriptions
% check numbers
% check \luatex command

%\nopdfcompression
%\loggingall
\environment luatexref-env
\logo[DFONT]   {dfont}
\logo[CFF]     {cff}
\logo[CMAP]    {CMap}
\logo[PATGEN]  {patgen}
\logo[MP]      {MetaPost}
\logo[METAPOST]{MetaPost}
\logo[MPLIB]   {MPlib}
\logo[COCO]    {coco}
\logo[SUNOS]   {SunOS}
\logo[BSD]     {bsd}
\logo[SYSV]    {sysv}
\logo[DPI]     {dpi}

\setvariables
  [document]
  [beta=0.70.1]

\starttext

\dontcomplain \nonknuthmode

\setups[titlepage]

\title{Contents}

\placecontent[criterium=text,level=subsection]

\chapter{Introduction}

\startframedtext[framecolor=red,foregroundcolor=red,width=\hsize,style=\tfa]

This book will eventually become the reference manual of \LUATEX.
At the moment, it simply reports the behavior of the executable
matching the snapshot or beta release date on the title page.

\blank

Features may come and go. The current version of \LUATEX\ is not
meant for production and users cannot depend on stability, nor on
functionality staying the same.

\blank

Nothing is considered stable just yet. This manual therefore
simply reflects the current state of the executable. {\bs
Absolutely nothing\/} on the following pages is set in stone. When
the need arises, anything can (and will) be changed.

\blank

{\bf If you are not willing to deal with this situation, you should
wait for the stable version. Currently we expect the 1.0 release to
happen in spring 2012. Full stabilization will not happen soon; the
TODO list is still large.}

\stopframedtext

\blank[2*line]

\LUATEX\ consists of a number of interrelated but (still)
distinguishable parts:

\startitemize[packed]
\item \PDFTEX\ version 1.40.9, converted to C (with patches from later releases).
\item The direction model and some other bits from \ALEPH\ RC4 converted to C.
\item \LUA\ 5.1.4 ($+$ coco 1.1.5 $+$ portable bytecode)
\item dedicated \LUA\ libraries
\item various \TEX\ extensions
\item parts of \FONTFORGE\ 2008.11.17
\item the \METAPOST\ library
\item newly written compiled source code to glue it all together
\stopitemize

Neither \ALEPH's I/O translation processes, nor tcx files, nor
\ENCTEX\ can be used; these encoding|-|related functions are
superseded by a \LUA|-|based solution (reader callbacks). Some
experimental \PDFTEX\ features have also been removed. These can be
implemented in \LUA\ instead.

\chapter{Basic \TEX\ enhancements}

\section{Introduction}

From day one, \LUATEX\ has offered extra functionality when compared
to the superset of \PDFTEX\ and \ALEPH. This is not limited to
the possibility of executing \LUA\ code via \type{\directlua}:
\LUATEX\ also adds functionality via new \TEX-side primitives.

However, starting with beta \type{0.39.0}, most of that functionality
is hidden by default. When \LUATEX\ 0.40.0 starts up in
\quote{iniluatex} mode (\type{luatex -ini}), it defines only the
primitive commands known by \TEX82 and the one extra command
\type{\directlua}.

As is fitting, a \LUA\ function has to be called to add the extra
primitives to the user environment. The simplest way to get access
to all of the new primitive commands is to add this line to the
format generation file:

\starttyping
\directlua { tex.enableprimitives('',tex.extraprimitives()) }
\stoptyping

But be aware that the curly braces may not have the proper \type{\catcode}
assigned to them at this early time (giving a \quote{Missing number} error),
so you may need to put these assignments

\starttyping
\catcode `\{=1
\catcode `\}=2
\stoptyping

before the above line.
Finer-grained control over the primitives is possible; you can look up the details in
\in{section}[luaprimitives]. For simplicity's sake, this manual assumes
that you have executed the \type{\directlua} command as given above.
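
As a minimal sketch of such finer-grained control, the call below
enables only two primitives and gives them a prefix. The prefix
\type{luatex} and the choice of primitives are illustrative assumptions,
not a recommendation:

\starttyping
\catcode `\{=1
\catcode `\}=2
% Makes \luatexattribute and \luatexattributedef available,
% leaving all other extra primitives hidden.
\directlua { tex.enableprimitives('luatex', {'attribute', 'attributedef'}) }
\stoptyping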

The startup behavior documented above is considered stable in the sense
that there will not be backward-incompatible changes any more.

\section{Version information}

There are three new primitives to test the version of \LUATEX:

\starttabulate[|l|p|]
\NC \bf primitive         \NC \bf explanation \NC\NR
\NC \tex{luatexversion}   \NC a combination of major and minor number, as in \PDFTEX;
                              the current value is {\bf\the\luatexversion} \NC\NR
\NC \tex{luatexrevision}  \NC the revision number, as in \PDFTEX;
                              the current value is {\bf\luatexrevision} \NC\NR
\NC \tex{luatexdatestamp} \NC a combination of the local date and hour when
                              the current executable was compiled,
                              the syntax is identical to \tex{luatexrevision};
                              the value for the executable that generated this
                              document is {\bf\luatexdatestamp}. \NC\NR
\stoptabulate

The official \LUATEX\ version is defined as follows:

\startitemize
\item The major version is the integer result of \tex{luatexversion} divided by 100.
 The primitive is an \quote{internal variable}, so you may need to prefix its
  use with \type{\the} depending on the context.
\item The minor version is the two-digit result of \tex{luatexversion} modulo 100.
\item The revision is given by \tex{luatexrevision}. This primitive expands to a
  positive integer.
\item The full version number consists of the major version,
  minor version and revision, separated by dots.
\stopitemize


Note that the \tex{luatexdatestamp} depends on both the compilation
time and compilation place of the current executable; it is defined in
terms of the local time. The purpose of this primitive is solely to be
an aid in the development process. Do not use it for anything besides
debugging.
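
The rules above can be sketched as follows. This is only an
illustration: it assumes that the extra primitives have been enabled
without a prefix, and the variable names are made up for the example.

\starttyping
\directlua{
  local v = \the\luatexversion
  local major = math.floor(v / 100)
  local minor = v - 100 * major
  local pad = (minor < 10) and "0" or ""
  tex.print(major .. "." .. pad .. minor .. "." .. "\luatexrevision")
}
\stoptyping

The zero padding keeps the minor version at two digits, so a
\tex{luatexversion} of 105 would be reported as major version 1 and
minor version 05.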

\section{\UNICODE\ text support}

Text input and output is now considered to be \UNICODE\ text, so
input characters can use the full range of \UNICODE\ ($2^{20}+2^{16}-1
= \hbox{0x10FFFF}$).

Later chapters will talk of characters and glyphs. Although these
are not interchangeable, they are closely related. During
typesetting, a character is always converted to a suitable graphic
representation of that character in a specific font. However,
while processing a list of to|-|be|-|typeset nodes, its contents
may still be seen as a character. Inside \LUATEX\ there is not yet
a clear separation between the two concepts. Until this is
implemented, please do not be too harsh on us if we make errors in
the usage of the terms.

A few primitives are affected by this, all in a similar fashion: each
of them has to accommodate a larger range of acceptable numbers.
For instance, \tex{char} now accepts values between~0 and
$1{,}114{,}111$. This should not be a problem for well|-|behaved input
files, but it could create incompatibilities for input that would have
generated an error when processed by older \TEX|-|based engines. The
affected commands with an altered initial (left of the equals sign) or
secondary (right of the equals sign) value are: \tex{char},
\tex{lccode}, \tex{uccode}, \tex{catcode}, \tex{sfcode}, \tex{efcode},
\tex{lpcode}, \tex{rpcode}, \tex{chardef}.

As far as the core engine is concerned, all input and output to
text files is \UTF-8 encoded. Input files can be pre|-|processed
using the \luatex{reader} callback. This will be explained in a
later chapter.

Output in byte|-|sized chunks can be achieved by using characters
just outside of the valid \UNICODE\ range, starting at the value
$1{,}114{,}112$ (0x110000). When the time comes to print a character
$c\geq 1{,}114{,}112$, \LUATEX\ will actually print the single byte
corresponding to $c$ minus $1{,}114{,}112$. For example,
$c = 1{,}114{,}345$ is printed as the single byte 233 (0xE9).

Output to the terminal uses \type{^^} notation for the lower
control range ($c<32$), with the exception of \type{^^I},
\type{^^J} and \type{^^M}. These are considered \quote{safe} and
therefore printed as-is.

Normalization of the \UNICODE\ input can be handled by a macro package
during callback processing (this will be explained in \in{section}[iocallback]).

\section{Extended tables}

The register numbers of all traditional \TEX\ and \ETEX\ registers can
be 16-bit numbers, as in \ALEPH. The affected commands are:

\startcolumns[n=4]
\starttyping
\count
\dimen
\skip
\muskip
\marks
\toks
\countdef
\dimendef
\skipdef
\muskipdef
\toksdef
\box
\unhbox
\unvbox
\copy
\unhcopy
\unvcopy
\wd
\ht
\dp
\setbox
\vsplit
\stoptyping
\stopcolumns

The glyph properties (like \type {\efcode}) introduced in \PDFTEX\
that deal with font expansion (hz) and character protruding are
also 16-bit. Because font memory management has been rewritten,
these character properties are no longer shared among font
instances that originate from the same metric file.

The behavior documented in the above section is considered stable
in the sense that there will not be backward-incompatible changes any
more.

\section{Attribute registers}

Attributes are a completely new concept in \LUATEX. Syntactically,
they behave a lot like counters: attributes obey \TEX's nesting stack
and can be used after \tex{the} etc.\ just like the normal
\tex{count} registers.

\startsyntax
\attribute <16-bit number> <optional equals> <32-bit number>!crlf
\attributedef <csname> <optional equals> <16-bit number>
\stopsyntax

Conceptually, an attribute is either \quote{set} or
\quote{unset}. Unset attributes have a special negative value to
indicate that they are unset; that value is the lowest legal value:
\type{-"7FFFFFFF} in hexadecimal, a.k.a.\ $-2147483647$ in decimal.
It follows that the value \type{-"7FFFFFFF} cannot be used as
a legal attribute value, but you {\it can\/} assign \type{-"7FFFFFFF} to
\quote{unset} an attribute. All attributes start out in this
\quote{unset} state in \INITEX\ (prior to 0.37, there could not be
valid negative attribute values, and the \quote{unset} value was $-1$).

Attributes can be used as extra counter values, but their usefulness
comes mostly from the fact that the numbers and values of all \quote{set}
attributes are attached to all nodes created in their scope. These can
then be queried from any \LUA\ code that deals with node
processing. Further information about how to use attributes for node
list processing from \LUA\ is given in~\in{chapter}[nodes].
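
A minimal sketch of the \TEX-side usage follows. The name
\type{\myattr} and the register number 100 are hypothetical choices
made for this example only:

\starttyping
\attributedef\myattr=100   % \myattr now names attribute register 100
\myattr=3                  % the attribute is now set
{\myattr=7 \the\myattr}    % inside the group the value is 7
\the\myattr                % after the group the value is 3 again
\myattr=-"7FFFFFFF         % return the attribute to the unset state
\stoptyping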

The behavior documented in the above section is considered stable
in the sense that there will not be backward-incompatible changes any
more.


\subsection{Box attributes}

Nodes typically receive the list of attributes that is in effect when
they are created. This moment can be quite asynchronous. For example: in
paragraph building, the individual line boxes are created after the
\tex{par} command has been processed, so they will receive the list of
attributes that is in effect then, not the attributes that were in
effect in, say, the first or third line of the paragraph.

Similar situations happen in \LUATEX\ regularly. A few of the more
obvious problematic cases are dealt with: the attributes for nodes
that are created during hyphenation, kerning and ligaturing borrow their
attributes from their surrounding glyphs, and it is possible to
influence box attributes directly.

When you assemble a box in a register, the attributes of the nodes
contained in the box are unchanged when such a box is placed,
unboxed, or copied. In this respect attributes act the same as
characters that have been converted to references to glyphs in
fonts. For instance, when you use attributes to implement color
support, each node carries information about its eventual color. In that
case, unless you implement mechanisms that deal with it, applying
a color to already boxed material will have no effect. Keep in
mind that this incompatibility is mostly due to the fact that separate
specials and literals are a more unnatural approach to colors than
attributes.

It is possible to fine-tune the list of attributes that are applied
to a \type{hbox}, \type{vbox} or \type{vtop} by the use of the
keyword \type{attr}. An example:

\starttyping
\attribute2=5
\setbox0=\hbox {Hello}
\setbox2=\hbox attr1=12 attr2=-"7FFFFFFF{Hello}
\stoptyping

This will set the attribute list of box~2 to $1=12$, and the
attributes of box~0 will be $2=5$. As you can see, assigning
the maximum negative value causes an attribute to be ignored.

The \type{attr} keyword(s) should come before a \type{to} or
\type{spread}, if that is also specified.
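
For instance, a box specification that combines both kinds of
keywords could look like this (the box numbers, attribute values and
dimensions are arbitrary, chosen only to show the keyword order):

\starttyping
\setbox4=\hbox attr1=12 to 5cm{Hello\hfil}
\setbox6=\vbox attr1=12 attr3=7 spread 10pt{Hello}
\stoptyping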

\section{\LUA\ related primitives}

In order to merge \LUA\ code with \TEX\ input, a few new primitives are
needed.


\subsection{\tex{directlua}}

The primitive \tex{directlua} is used to execute \LUA\ code immediately.
The syntax is

\startsyntax
\directlua <general text>!crlf
\directlua name <general text> <general text>!crlf
\directlua <16-bit number> <general text>
\stopsyntax

The last \syntax{<general text>} is expanded fully, and then fed
into the \LUA\ interpreter. After reading and expansion have been applied to the
\syntax{<general text>}, the resulting token list is converted to a
string as if it were displayed using \type{\the\toks}. On the \LUA\
side, each \type{\directlua} block is treated as a separate chunk. In
such a chunk you can use the \type {local} directive to keep your variables
from interfering with those used by the macro package.

The conversion to and from a token list means that you normally
cannot use \LUA\ line comments (starting with \type{--}) within the
argument. As there typically will be only one \quote{line}, the first
line comment will run on until the end of the input. You will either need to
use \TEX-style line comments (starting with \%), or change the \TEX\
category codes locally. Another possibility is to say:

\starttyping
\begingroup
\endlinechar=10
\directlua ...
\endgroup
\stoptyping

Then \LUA\ line comments can be used, since \TEX\ does not replace
line endings with spaces.
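
A fuller sketch of this approach is given below. It assumes that the
newline character (code 10) has category code 12, as it does in
\INITEX; under that assumption a line end outside the \LUA\ code would
be typeset as a character, which is why the lines after the \LUA\
code end with a percent sign. The variable name is made up for the
example.

\starttyping
\begingroup
\endlinechar=10
\directlua{
  -- a Lua line comment that ends at the end of this input line
  local greeting = "hello from Lua"
  texio.write_nl(greeting)
}% the percent sign hides the end-of-line character from TeX
\endgroup%
\stoptyping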

The \syntax{name <general text>} specifies the name of the \LUA\ chunk,
mainly shown in the stack backtrace of error messages created by \LUA\
code. The \syntax{<general text>} is expanded fully, so macros can
be used to generate the chunk name, for example

\starttyping
\directlua name{\jobname:\the\inputlineno} ...
\stoptyping

to include the name of the input file as well as the input line into
the chunk name.

Likewise, the \syntax{<16-bit number>} designates a name of a \LUA\
chunk, but in this case the name will be taken from the
\type{lua.name} array (see the documentation of the \type{lua} table
further in this manual). This syntax is new in version 0.36.0.

The chunk name should not start with a \type{@}, or it will be displayed
as a file name (this is a quirk in the current \LUA\ implementation).
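
As an illustration of the numbered form, the following sketch first
stores a chunk name in slot~1 of \type{lua.name} and then runs a chunk
under that name. The name \type{mychunk} is an arbitrary choice:

\starttyping
\directlua{ lua.name[1] = "mychunk" }
\directlua1{ texio.write_nl("this chunk is reported as mychunk") }
\stoptyping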

\startbuffer
$\pi = \directlua{tex.print(math.pi)}$
\stopbuffer

The \tex{directlua} command is expandable: the results of the \LUA\
code become effective immediately. As an example, the following
input:

\typebuffer

will result in

\getbuffer

Because the \syntax{<general text>} is a chunk, the normal \LUA\ error
handling is triggered if there is a problem in the included code. The
\LUA\ error messages should be clear enough, but the contextual
information is still pretty bad. Often, you will only see the line
number of the right brace at the end of the code.

While on the subject of errors: some of the things you can do inside
\LUA\ code can break \LUATEX\ pretty badly. If you are not careful
while working with the node list interface, you may even end up with
assertion errors from within the \TEX\ portion of the executable.

The behavior documented in the above subsection is considered stable
in the sense that there will not be backward-incompatible changes any
more.

\subsection{\tex{latelua}}

\tex{latelua} stores \LUA\ code in a whatsit that will be processed
at the time of shipping out. Its intended use is a cross between
\tex{pdfliteral} and \tex{write}.
Within the \LUA\ code you can print \PDF\
statements directly to the \PDF\ file via \type{pdf.print},
or you can write to other output streams via \type{texio.write}
or simply using \LUA's I/O routines.

\startsyntax
\latelua <general text>!crlf
\latelua name <general text> <general text>!crlf
\latelua <16-bit number> <general text>
\stopsyntax

Expansion of macros etcetera in the final \type{<general text>} is delayed
until just before the whatsit is executed (like in \tex{write}). With
regard to the \PDF\ output stream, \tex{latelua} behaves like \tex{pdfliteral page}.

The \syntax{name <general text>} and \syntax{<16-bit number>} behave
in the same way as they do for \type{\directlua}.
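
A small sketch of the \tex{write}-like use: the line below reports the
value of \type{\count0} at the moment the page that contains the
whatsit is shipped out. Treating \type{\count0} as the page number is
an assumption about the macro package in use.

\starttyping
\latelua{ texio.write_nl("shipping out page " .. tex.count[0]) }
\stoptyping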

\subsection{\tex{luaescapestring}}

This primitive converts a \TEX\ token sequence so that it can be
safely used as the contents of a \LUA\ string: embedded backslashes,
double and single quotes, and newlines and carriage returns are
escaped. This is done by prepending an extra token consisting of a
backslash with category code~12, and for the line endings,
converting them to \type{n} and \type{r} respectively. The token
sequence is fully expanded.

\startsyntax
\luaescapestring <general text>
\stopsyntax
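
As a short illustration, the macro \type{\quoted} below (a name made
up for this example) contains double quotes that would otherwise end
the \LUA\ string prematurely:

\starttyping
\def\quoted{a "quoted" word}
\directlua{ texio.write_nl("\luaescapestring{\quoted}") }
\stoptyping

After expansion, the \LUA\ interpreter sees the string literal
\type{"a \"quoted\" word"}.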

Most often, this command is not actually the best way to deal with the
differences between \TEX\ and \LUA. In very short bits of \LUA\
code it is often not needed, and for longer stretches of \LUA\ code it
is easier to keep the code in a separate file and load it using \LUA's
\type{dofile}:

\starttyping
\directlua { dofile('mysetups.lua')}
\stoptyping

\section{New \ETEX\ primitives}

\subsection{\tex{clearmarks}}

This primitive clears a mark class completely, resetting all three
connected mark texts to empty.

\startsyntax
\clearmarks <16-bit number>
\stopsyntax

\subsection{\tex{noligs} and \tex{nokerns}}

These primitives prohibit ligature and kerning insertion at the time
when the initial node list is built by \LUATEX's main control loop.
They are part of a temporary trick and will be removed in the near
future. For now, you need to enable these primitives when you want to
do node list processing of \quote{characters}, where \TEX's normal
processing would get in the way.

\startsyntax
\noligs <integer>!crlf
\nokerns <integer>
\stopsyntax

These primitives can now be implemented by overloading the ligature
building and kerning functions, i.e.\ by assigning dummy functions
to their associated callbacks.

\subsection{\tex{formatname}}

\tex{formatname}'s syntax is identical to \tex{jobname}.

In \INITEX, the expansion is empty. Otherwise, the expansion is the
value that \tex{jobname} had during the \INITEX\ run that dumped the
currently loaded format.

\subsection{\tex{scantextokens}}

The syntax of \tex{scantextokens} is identical to \tex{scantokens}.
This primitive is a slightly adapted version of \ETEX's \tex{scantokens}. The
differences are:

\startitemize
\item The last (and usually only) line does not have a
      \tex{endlinechar} appended
\item \tex{scantextokens} never raises an EOF error,
      and it does not execute \tex{everyeof} tokens.
\item The \quote{\unknown\ while end of file \unknown} error tests are not executed, allowing
      the expansion to end on a different grouping level or while a
      conditional is still incomplete.
\stopitemize

\subsection {Verbose versions of single-character alignment commands (0.45)}

\LUATEX\ defines two new primitives that have the same function as
\type{#} and \type{&} in alignments:

\starttabulate[|l|l|l|l|]
\NC \bf primitive         \NC \bf explanation                           \NC\NR
\NC \tex{alignmark}       \NC Duplicates the functionality of \char`\#~%
                              inside alignment preambles\NC\NR
\NC \tex{aligntab}        \NC Duplicates the functionality of \char`\&~%
                              inside alignments (and preambles)\NC\NR
\stoptabulate
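
A minimal sketch of an alignment written with the verbose primitives
follows; it is meant to be equivalent to the same \type{\halign} with
\type{#} and \type{&} in their usual places:

\starttyping
\halign{\alignmark\hfil\aligntab\quad\alignmark\cr
  a\aligntab b\cr
  cc\aligntab dd\cr}
\stoptyping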


\subsection{Catcode tables}

Catcode tables are a new feature that allows you to switch to a
predefined catcode regime in a single statement. You can have a
practically unlimited number of different tables.

The subsystem is backward compatible: if you never use the following
commands, your document will not notice any difference in behavior
compared to traditional \TEX.

The contents of each catcode table are independent of those of any
other catcode table, and the contents are stored in and retrieved from
the format file.

\subsubsection{\tex{catcodetable}}

\startsyntax
\catcodetable <16-bit number>
\stopsyntax

The primitive \tex{catcodetable} switches to a different catcode table.
Such a table has to be previously created using one of the two
primitives below, or it has to be zero. Table zero is initialized by
\INITEX.

\subsubsection{\tex{initcatcodetable}}

\startsyntax
\initcatcodetable <16-bit number>
\stopsyntax

The primitive \tex{initcatcodetable} creates a new table with catcodes
identical to those defined by \INITEX:

\starttabulate[|l|l|l|l|l|]
\NC~0\NC \tt\letterbackslash       \NC         \NC \tt escape        \NC\NR
\NC~5\NC \tt\letterhat\letterhat M \NC return  \NC \tt car{\_}ret      \NC (this name may change)     \NC\NR
\NC~9\NC \tt\letterhat\letterhat @ \NC null    \NC \tt ignore        \NC\NR
\NC10\NC \tt <space>               \NC space   \NC \tt spacer        \NC\NR
\NC11\NC {\tt a} -- {\tt z}        \NC         \NC \tt letter        \NC\NR
\NC11\NC {\tt A} -- {\tt Z}        \NC         \NC \tt letter        \NC\NR
\NC12\NC everything else           \NC         \NC \tt other         \NC\NR
\NC14\NC \tt\letterpercent         \NC         \NC \tt comment       \NC\NR
\NC15\NC \tt\letterhat\letterhat ? \NC delete  \NC \tt invalid{\_}char \NC\NR
\stoptabulate

The new catcode table is allocated globally: it will not go away after
the current group has ended. If the supplied number is identical to
the currently active table, an error is raised.

\subsubsection{\tex{savecatcodetable}}

\startsyntax
\savecatcodetable <16-bit number>
\stopsyntax

\tex{savecatcodetable} copies the current set of catcodes to a
new table with the requested number. The definitions in this new table
are all treated as if they were made in the outermost level.

The new table is allocated globally: it will not go away after the
current group has ended. If the supplied number is the currently
active table, an error is raised.
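
The three primitives can be combined as in the following sketch. The
table numbers 10 and 11 are arbitrary, and the example assumes that
neither of them is the currently active table:

\starttyping
\initcatcodetable 10   % create table 10 with the IniTeX catcodes
\savecatcodetable 11   % copy the currently active catcodes to table 11
\catcodetable 10       % from here on, { and } are 'other' characters
Here {braces} are ordinary characters, not group delimiters.
\catcodetable 11       % switch to the regime saved above
\stoptyping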

\subsection{\tex{suppressfontnotfounderror} (0.11)}

\startsyntax
\suppressfontnotfounderror = 1
\stopsyntax

If this new integer parameter is non|-|zero, then \LUATEX\ will not
complain about font metrics that are not found. Instead it will
silently skip the font assignment, making the requested csname for the
font \tex{ifx} equal to \tex{nullfont}, so that it can be tested
against that without bothering the user.
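
A sketch of such a test is shown below. The font name
\type{nosuchfont} and the fallback to \type{cmr10} are assumptions
made for the example:

\starttyping
\suppressfontnotfounderror=1
\font\testfont=nosuchfont
\ifx\testfont\nullfont
  \message{font not found, using a fallback}
  \font\testfont=cmr10
\fi
\stoptyping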

\subsection{\tex{suppresslongerror} (0.36)}

\startsyntax
\suppresslongerror = 1
\stopsyntax

If this new integer parameter is non|-|zero, then \LUATEX\ will not
complain about \type{\par} commands encountered in contexts where
that is normally prohibited (most prominently in the arguments
of non-long macros).

\subsection{\tex{suppressifcsnameerror} (0.36)}

\startsyntax
\suppressifcsnameerror = 1
\stopsyntax

If this new integer parameter is non|-|zero, then \LUATEX\ will not
complain about non-expandable commands appearing in the middle of a
\type{\ifcsname} expansion. Instead, it will keep getting expanded
tokens from the input until it encounters an \type{\endcsname}
command. Use with care! This command is experimental: if the input
expansion is unbalanced with respect to \type{\csname} \ldots \type{\endcsname}
pairs, the \LUATEX\ process may hang indefinitely.


\subsection{\tex{suppressoutererror} (0.36)}

\startsyntax
\suppressoutererror = 1
\stopsyntax

If this new integer parameter is non|-|zero, then \LUATEX\ will not
complain about \type{\outer} commands encountered in contexts where
that is normally prohibited.

The addition of this command coincides with a change in the
\LUATEX\ engine: ever since the snapshot of 20060915, \type{\outer}
had simply been ignored. That behavior has now been reverted to be
\TEX82-compatible by default.


\subsection{\tex{outputbox} (0.37)}

\startsyntax
\outputbox = 65535
\stopsyntax

This new integer parameter allows you to alter the number of the box
that will be used to store the page sent to the output routine. Its default
value is 255, and the acceptable range is from 0 to 65535.
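
As a bare sketch, an output routine that takes the page from a
different box could look like this. The box number 254 is arbitrary,
and the routine deliberately ignores inserts, marks and headers:

\starttyping
\outputbox=254
\output={\shipout\box\outputbox}
\stoptyping

A macro package whose output routine refers to box 255 explicitly
would have to be adapted before this parameter can be changed safely.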


\subsection{Font syntax}

\LUATEX\ will accept a braced argument as a font name:

\starttyping
\font\myfont = {cmr10}
\stoptyping

This allows for embedded spaces, without the need for double quotes.
Macro expansion takes place inside the argument.

\subsection{File syntax (0.45)}

\LUATEX\ will accept a braced argument as a file name:

\starttyping
\input {plain}
\openin 0 {plain}
\stoptyping

This allows for embedded spaces, without the need for double quotes.
Macro expansion takes place inside the argument.

\subsection{Images and Forms}

\LUATEX\ accepts optional dimension parameters for \type{\pdfrefximage}
and \type{\pdfrefxform} in the same format as for \type{\pdfximage}.
With images, these dimensions are then used
instead of the ones given to \type{\pdfximage};
but the original dimensions are not overwritten,
so that a \type{\pdfrefximage} without dimensions still provides
the image with dimensions defined by \type{\pdfximage}.
These optional parameters are not implemented for \type{\pdfxform}.

\starttyping
\pdfrefximage width 20mm height 10mm depth 5mm \pdflastximage
\pdfrefxform  width 20mm height 10mm depth 5mm \pdflastxform
\stoptyping

\section{Debugging}

If \tex{tracingonline} is larger than~2, the node list display will
also print the node number of the nodes.

\section{Global leaders}

There is a new experimental primitive: \type{\gleaders} (a \LUATEX\
extension, added in 0.43). Leaders of this type are anchored to the
origin of the box to be shipped out. They are like normal
\type{\leaders} in that they align nicely, except that the alignment
is based on the {\it largest\/} enclosing box instead of the
{\it smallest\/}.
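
Assuming that \type{\gleaders} takes the same arguments as
\type{\leaders}, a dotted fill could be sketched as:

\starttyping
\hbox to \hsize{A\gleaders\hbox to 1em{\hss.\hss}\hfil B}
\stoptyping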

\chapter {\LUA\ general}

\section[init]{Initialization}

\subsection{\LUATEX\ as a \LUA\ interpreter}

There are some situations that make \LUATEX\ behave like a standalone \LUA\
interpreter:

\startitemize[packed]
\item if a \type{--luaonly} option is given on the commandline, or
\item if the executable is named \type{texlua} (or \type{luatexlua}), or
\item if the only non|-|option argument (file) on the commandline has the extension
        \type{lua} or \type{luc}.
\stopitemize

In this mode, it will set \LUA's \type{arg[0]} to the found script
name, pushing preceding options in negative values and the rest of the
commandline in the positive values, just like the \LUA\
interpreter.

\LUATEX\ will exit immediately after executing the specified \LUA\
script and is, in effect, a somewhat bulky standalone \LUA\
interpreter with a bunch of extra preloaded libraries.
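
As an illustration, a hypothetical script \type{hello.lua} that lists
its own arguments could look like this:

\starttyping
-- hello.lua, run for example as: texlua hello.lua one two
print("script: " .. arg[0])
for i = 1, #arg do
  print(i, arg[i])
end
\stoptyping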

\subsection{\LUATEX\ as a \LUA\ byte compiler}

There are two situations that make \LUATEX\ behave like the \LUA\
byte compiler:

\startitemize[packed]
\item if a \type{--luaconly} option is given on the commandline, or
\item if the executable is named \type{texluac}
\stopitemize

In this mode, \LUATEX\ is exactly like \type{luac} from the standalone
\LUA\ distribution, except that it does not have the \type{-l} switch,
and that it accepts (but ignores) the \type{--luaconly} switch.
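
A possible session, reusing the hypothetical file name
\type{hello.lua} and the \type{-o} switch of the standalone
\type{luac}, is:

\starttyping
texluac -o hello.luc hello.lua
texlua hello.luc
\stoptyping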

\subsection{Other commandline processing}

When the \LUATEX\ executable starts, it looks for the \type{--lua}
commandline option. If there is no \type{--lua} option, the
commandline is interpreted in a similar fashion as in traditional
\PDFTEX\ and \ALEPH.

The following command-line switches are understood.

\starttabulate[|lT|p|]
\NC --fmt=FORMAT \NC load the format file FORMAT             \NC\NR
\NC --lua=FILE \NC load and execute a \LUA\ initialization script\NC\NR
\NC --safer    \NC disable easily exploitable \LUA\ commands \NC\NR
\NC --nosocket \NC disable the \LUA\ socket library          \NC\NR
\NC --help     \NC display help and exit                     \NC\NR
\NC --ini      \NC be iniluatex, for dumping formats         \NC\NR
\NC --interaction=STRING      \NC  set interaction mode (STRING=batchmode/nonstopmode/\crlf
                                   scrollmode/errorstopmode) \NC \NR
\NC --halt-on-error           \NC  stop processing at the first error\NC \NR
\NC --kpathsea-debug=NUMBER   \NC set path searching debugging flags according to
                                  the bits of NUMBER  \NC \NR
\NC --progname=STRING         \NC set the program name to STRING \NC \NR
\NC --version                 \NC display version and exit                  \NC\NR
\NC --credits                 \NC display credits and exit                  \NC\NR
\NC --recorder                \NC enable filename recorder \NC \NR
\NC --etex                    \NC ignored\NC \NR
\NC --output-comment=STRING   \NC use STRING for DVI file comment instead of date
                                (no effect for PDF)\NC \NR
\NC --output-directory=DIR    \NC use DIR as the directory to write files to \NC \NR
\NC --draftmode               \NC  switch on draft mode (generates no output PDF)\NC \NR
\NC --output-format=FORMAT     \NC use FORMAT for job output; FORMAT is 'dvi' or 'pdf' \NC \NR
\NC --[no-]shell-escape       \NC disable/enable \type{\write18{SHELL COMMAND}} \NC \NR
\NC --enable-write18          \NC enable \type{\write18{SHELL COMMAND}} \NC \NR
\NC --disable-write18         \NC disable \type{\write18{SHELL COMMAND}} \NC \NR
\NC --shell-restricted        \NC restrict \type{\write18} to a list of commands
                                  given in texmf.cnf \NC \NR
\NC --debug-format            \NC enable format debugging \NC \NR
\NC --[no-]file-line-error       \NC disable/enable file:line:error style messages  \NC \NR
\NC --[no-]file-line-error-style \NC  aliases of --[no-]file-line-error \NC \NR
\NC --jobname=STRING          \NC set the job name to STRING \NC \NR
\NC --[no-]parse-first-line   \NC disable/enable parsing of the first line of the
                                  input file \NC \NR
\NC --translate-file=         \NC ignored \NC \NR
\NC --default-translate-file= \NC ignored \NC \NR
\NC --8bit                    \NC ignored \NC \NR
\NC --[no-]mktex=FMT         \NC  disable/enable mktexFMT generation (FMT=tex/tfm)\NC \NR
\NC --synctex=NUMBER          \NC enable synctex \NC \NR
\stoptabulate

A note on the creation of the various temporary files and the \type{\jobname}.
The value to use for \type{\jobname} is decided as follows:

\startitemize
\item If \type{--jobname} is given on the command line, its argument
  will be the value for \tex{jobname}, without any changes. The
  argument will not be used for actual input so it need not exist.
  The \type{--jobname} switch only controls the \tex{jobname} setting.
\item Otherwise, \tex{jobname} will be the name of the first file that
  is read from the file system, with any path components and the last
  extension (the part following the last \type{.}) stripped off.
\item An exception to the previous point: if the command
  line goes into interactive mode (by starting with a command) and
  there are no files input via \type{\everyjob} either, then the
  \tex{jobname} is set to \type{texput} as a last resort.
\stopitemize

The file names for output files that are generated automatically are
created by attaching the proper extension (\type{.log}, \type{.pdf},
etc.) to the found \tex{jobname}. These files are created in the
directory pointed to by \type{--output-directory}, or in the current
directory, if that switch is not present.
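
For example, with a hypothetical input file \type{chapter1.tex} and
an existing directory \type{build}, the command line

\starttyping
luatex --jobname=report --output-directory=build chapter1.tex
\stoptyping

sets \tex{jobname} to \type{report}, so the log file is written to
\type{build/report.log} instead of \type{chapter1.log}.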

\blank

Without the \type{--lua} option, command line processing works like it does in
any other web2c-based typesetting engine, except that \LUATEX\ has a few extra
switches.


If the \type{--lua} option is present, \LUATEX\ enters a mode of
commandline processing that differs from that of the standard web2c
programs.

In this mode, a small series of actions is taken in order. First,
it will parse the commandline as usual, but it will only interpret
a small subset of the options immediately: \type{--safer}, \type{--nosocket},
\type{--[no-]shell-escape}, \type{--enable-write18}, \type{--disable-write18},
\type{--shell-restricted}, \type{--help}, \type{--version}, and \type{--credits}.

Now it searches for the requested \LUA\ initialization script. If it
cannot be found using the actual name given on the commandline, a
second attempt is made by prepending the value of the environment
variable \type{LUATEXDIR}, if that variable is defined in the environment.

Then it checks the various safety switches. You can use those to disable
some \LUA\ commands that can easily be abused by a malicious document. At
the moment, \type{--safer} \type{nil}s the following functions:

\starttabulate[|l|l|]
\NC \bf library \NC \bf functions                         \NC \NR
\NC \tt os      \NC \tt execute exec setenv rename remove tmpdir \NC \NR
\NC \tt io      \NC \tt popen output tmpfile              \NC \NR
\NC \tt lfs     \NC \tt rmdir mkdir chdir lock touch      \NC \NR
\stoptabulate

Furthermore, it disables loading of compiled \LUA\ libraries (support
for these was added in 0.46.0), and it makes \lua{io.open()} fail on
files that are opened for anything besides reading.

\type{--nosocket} makes the socket library unavailable, so that
\LUA\ cannot use networking.

The switches \type{--[no-]shell-escape}, \type{--[enable|disable]-write18}, and
\type{--shell-restricted} have the same
effects as in \PDFTEX, and additionally make
\type{io.popen()}, \type{os.execute}, \type{os.exec} and \type{os.spawn}
adhere to the requested option.

Next the initialization script is loaded and executed. From within the
script, the entire commandline is available in the \LUA\ table
\lua{arg}, beginning with \lua {arg[0]}, containing the name of the executable.

Commandline processing happens very early on. So early, in fact, that
none of \TEX's initializations have taken place yet. For that reason,
the tables that deal with typesetting, like \luatex{tex}, \luatex{token},
\luatex{node} and \luatex{pdf}, are off|-|limits during the execution
of the startup file (they are nilled). Special care is taken that \luatex{texio.write} and
\luatex{texio.write_nl} function properly, so that you can at least
report your actions to the log file when (and if) it is eventually
opened (note that \TEX\ does not even know its \tex{jobname}
yet at this point). See \in{chapter}[libraries] for more information
about the \LUATEX-specific \LUA\ extension tables.


Everything you do in the \LUA\ initialization script will remain
visible during the rest of the run, with the exception of the
aforementioned \luatex{tex}, \luatex{token}, \luatex{node} and
\luatex{pdf} tables: those will be initialized
to their documented state after the execution of the script. You
should not store anything in variables or within tables with these
four global names, as they will be overwritten completely.

We recommend you use the startup file only for your own
\TEX|-|independent initializations (if you need any), to parse the
commandline, set values in the \luatex{texconfig} table, and register
the callbacks you need.
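
A startup file along these lines might look as follows. It is only a
sketch: the file name \type{startup.lua} is arbitrary, and the fallback
to \type{print} is an assumption that merely lets the fragment load in a
stand-alone \LUA\ interpreter.

\starttyping
-- startup.lua, to be loaded with: luatex --lua=startup.lua ...
-- texio.write_nl is usable this early; print is only a fallback for
-- trying the fragment in a stand-alone Lua interpreter.
local report = texio and texio.write_nl or print
local args = arg or {}

-- TeX-independent initialization: look at the commandline.
report("startup: executable is " .. tostring(args[0]))
for i = 1, #args do
    report("startup: argument " .. i .. " is " .. args[i])
end

-- tex, token, node and pdf are still nil at this point, so only
-- register callbacks here and leave those tables alone.
if callback then
    callback.register("process_input_buffer", function(buffer)
        return nil -- nil: keep every input line as it is
    end)
end
\stoptyping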

\LUATEX\ allows some of the commandline options to be overridden
by reading values from the \luatex{texconfig} table at the end of
script execution (see the description of the \luatex{texconfig} table
later on in this document for more details on which ones exactly).

Unless the \luatex{texconfig} table tells \LUATEX\ not to initialize
\KPATHSEA\ at all (set \luatex{texconfig.kpse_init} to \type{false} for that),
\LUATEX\ acts on some more commandline options after the
initialization script is finished:
in order to initialize the built|-|in \KPATHSEA\ library properly,
\LUATEX\ needs to know the correct program name to use, and for that it
needs to check \type{--progname}, or \type{--ini} and \type{--fmt}, if
\type{--progname} is missing.


\section{\LUA\ changes}

The C coroutine (\COCO) patches from LuaJIT are applied to the \LUA\
core; the version used is 1.1.5.  See \hyphenatedurl{http://coco.luajit.org/}
for details. This functionality currently (0.45) does not work on
non-intel OpenBSD platforms nor on powerpc Linux-es. Additional note:
\type{coroutine.wrap()} under Windows does not inherit the state
of the random generator; it always has an implicit
\type{math.randomseed(1)} that is added by the Windows kernel.

Starting from version 0.45, \LUATEX\ is able to use the kpathsea
library to find \type{require()}d modules. For this purpose,
\type{package.loaders[2]} is replaced by a different loader function,
that decides at runtime whether to use kpathsea or the built-in core
lua function.  It uses \KPATHSEA\ when that is already initialized at
that point in time, otherwise it reverts to using the normal
\type{package.path} loader.

Initialization of \KPATHSEA\ can happen either implicitly (when
\LUATEX\ starts up and the startup script has not set
\type{texconfig.kpse_init} to false), or explicitly by calling the
\LUA\ function \type{kpse.set_program_name()}.


Starting from version 0.46.0 (as an {\bf experimental} feature!) \LUATEX\ is
also able to use dynamically loadable \LUA\ libraries, unless
\type{--safer} was given as an option on the command line.

For this purpose, \type{package.loaders[3]} is replaced by a different
loader function, that decides at runtime whether to use kpathsea or
the built-in core lua function.  As in the previous paragraph, it uses
\KPATHSEA\ when that is already initialized at that point in time,
otherwise it reverts to using the normal \type{package.cpath} loader.

This functionality required an extension to kpathsea:

\startnarrower
There is a new kpathsea file format: \type{kpse_clua_format} that
searches for files with extension \type{.dll} and \type{.so}.  The
\type{texmf.cnf} setting for this variable is \type{CLUAINPUTS}, and
by default it has this value:

\starttyping
CLUAINPUTS=.:$SELFAUTOLOC/lib/{$progname,$engine,}/lua//
\stoptyping %$

This path is imperfect (it requires a TDS subtree below the binaries
directory), but the architecture has to be in the path somewhere, and
the currently simplest way to do that is to search below the binaries
directory only.

One level up (a \type{lib} directory parallel to \type{bin}) would
have been nicer, but that is not doable because \TEXLIVE\ uses a
\type{bin/<arch>} structure.
\stopnarrower

In keeping with the other \TEX-like programs in \TEXLIVE, the two
\LUA\ functions
\type{os.execute} and \type{io.popen} (as well as the two new functions \type{os.exec}
and \type{os.spawn} that are explained below) take the value of \type{shell_escape}
and/or \type{shell_escape_commands} into account. Whenever \LUATEX\ is run with the
assumed intention to typeset a document (and by that I mean that it is called as
\type{luatex}, as opposed to \type{texlua}, and that the commandline option
\type{--luaonly} was not given), it will only run the four functions above if the
matching texmf.cnf variable(s) or their \type{texconfig} (see~\in{section}[texconfig])
counterparts allow execution of the requested system command. In \quote{script
interpreter} runs of \LUATEX, these settings have no effect, and all four functions
work as normal. This change is new in 0.37.0.


The \lua{read("*line")} function from the io library has been
adjusted so that it is line|-|ending neutral: any of \type{LF}, \type
{CR} or \type{CR+LF} are acceptable line endings.

The \lua{tostring()} printer for numbers has been changed so that it
returns~\type{0.00000000000001} instead of~\hbox{\type{1e-14}} (which
confused \TEX\ enormously). Values with an even smaller exponent
simply print as~\type{0}.

\lua{luafilesystem} has been extended: there are two extra boolean functions
(\luatex{lfs.isdir(filename)} and \luatex{lfs.isfile(filename)}) and
one extra string field in its attributes table
(\type{permissions}). There is an additional function (added in 0.51)
\type{lfs.shortname()} which takes a file name and returns its short
name on WIN32 platforms. On other platforms, it just returns the given
argument. The file name is not tested for existence. Finally, for
non-WIN32 platforms only, there is the new function
\type{lfs.readlink()} (added in 0.51) that takes an existing symbolic
link as argument and returns its content. It returns an error on
WIN32.

The \lua{string} library has an extra function:
\luatex{string.explode(s[,m])}. This function returns an array containing
the string argument \type{s} split into sub-strings based on the value
of the string argument \type{m}. The second argument is a string that
is either empty (this splits the string into characters), a single
character (this splits on each occurrence of that character, possibly
introducing empty strings), or a single character followed by the plus
sign \type{+} (this special version does not create empty
sub-strings). The default value for \type{m} is \quote{\type{ +}} (multiple
spaces).
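
The four forms of the separator argument can be tried with a fragment
like the one below. It is a usage sketch only: the sample strings are
arbitrary, and the guard on \type{string.explode} is there just so that
the fragment also loads in a stock \LUA\ interpreter, where the function
does not exist.

\starttyping
if string.explode then
    -- single character: splits on every comma, keeps the empty field
    local fields = string.explode("a,b,,c", ",")
    -- character followed by "+": no empty sub-strings are created
    local packed = string.explode("a,b,,c", ",+")
    -- no second argument: the default " +" (multiple spaces) is used
    local words  = string.explode("one  two three")
    -- empty string: splits into separate characters
    local chars  = string.explode("abc", "")

    for i, field in ipairs(fields) do
        texio.write_nl(i .. ": [" .. field .. "]")
    end
end
\stoptyping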

Note: \type{m} is not hidden by surrounding braces (as it would be if
this function was written in \TEX\ macros).

The \lua{string} library also has six extra iterators that return strings
piecemeal:

\startitemize
\item \luatex{string.utfvalues(s)} (returns an integer value in the
\UNICODE\ range)
\item \luatex{string.utfcharacters(s)} (returns a string with a single
\UTF-8 token in it)
\item \luatex{string.characters(s)} (a string containing one byte)
\item \luatex{string.characterpairs(s)} (two strings each containing one byte); this
produces an empty second string if the string length was odd.
\item \luatex{string.bytes(s)} (a single byte value)
\item \luatex{string.bytepairs(s)} (two byte values); this produces \type{nil} instead of a
number as its second return value if the string length was odd.
\stopitemize

The \luatex{string.characterpairs()} and \luatex{string.bytepairs()}
are useful especially in the conversion of UTF-16 encoded data into UTF-8.
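
A minimal sketch of such a conversion, built on
\type{string.bytepairs} and \type{unicode.utf8.char} from the
\type{slnunicode} library, is given below. The function name
\type{utf16be_to_utf8} is made up for this example, and the code assumes
big-endian input without a byte order mark and without surrogate pairs.

\starttyping
-- Assumptions: big-endian UTF-16, no byte order mark, no surrogate
-- pairs. Only usable inside LuaTeX, where string.bytepairs and
-- unicode.utf8.char are available.
local function utf16be_to_utf8(s)
    local result = {}
    for high, low in string.bytepairs(s) do
        -- low is nil if the string has an odd number of bytes
        result[#result + 1] = unicode.utf8.char(high * 256 + (low or 0))
    end
    return table.concat(result)
end
\stoptyping

A little-endian variant would simply swap the roles of the two byte
values.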

Note: The \lua{string} library functions \luatex{len}, \luatex{lower},
\luatex{sub} etc. are not \UNICODE|-|aware.  For strings in the UTF-8
encoding, i.e., strings containing characters above code point 127, the
corresponding functions from the \lua{slnunicode} library can be used,
e.g., \luatex{unicode.utf8.len}, \luatex{unicode.utf8.lower} etc.  The
exceptions are \luatex{unicode.utf8.find}, that always returns byte
positions in a string, and \luatex{unicode.utf8.match} and
\luatex{unicode.utf8.gmatch}.  While the latter two functions in general
{\it are} \UNICODE|-|aware, they fall back to non|-|\UNICODE|-|aware
behavior when using the empty capture \lua{()} (other captures work as
expected).  For the interpretation of character classes in
\luatex{unicode.utf8} functions refer to the library sources at
\hyphenatedurl{http://luaforge.net/projects/sln}.  The \lua{slnunicode}
library will be replaced by an internal \UNICODE\ library in a future
\LUATEX\ version.
\blank

The \lua{os} library has a few extra functions and variables:

\startitemize
\item \luatex{os.selfdir} is a variable that holds the directory path
of the actual executable.  For example: {\tt \directlua{tex.sprint(os.selfdir)}}
(present since 0.27.0).

\item \luatex{os.exec(commandline)} is a variation on \lua{os.execute}.

  The \type{commandline} can be either a single string or a single table.

  If the argument is a table: \LUATEX\ first checks if there is a value at
  integer index zero. If there is, this is the command to be executed. Otherwise,
  it will use the value at integer index one. (If neither is present, nothing
  at all happens.)

  The set of consecutive values starting at integer 1 in the table are
  the arguments that are passed on to the command (the value at index 1
  becomes \type{arg[0]}).  The command is searched for in the execution path,
  so there is normally no need to pass on a fully qualified pathname.

  If the argument is a string, then it is automatically converted into
  a table by splitting on whitespace. In this case, it is impossible
  for the command and first argument to differ from each other.

  In the string argument format, whitespace can be protected by putting (part
  of) an argument inside single or double quotes.  One layer of quotes is
  interpreted by \LUATEX, and all occurrences of \tex{"}, \tex{'} or
  \type{\\} within the quoted text are un-escaped.  In the table format, there
  is no string handling taking place.

  This function normally does not return control back to the \LUA\ script: the
  command will replace the current process. However, it will return the two values
  \type{nil} and \type {'error'} if there was a problem while attempting to execute the command.

  On Windows, the current process is actually kept in memory until after the
  execution of the command has finished. This prevents crashes in situations
  where \TEXLUA\ scripts are run inside integrated \TEX\ environments.

  The original reason for this command is that it cleans out the current
  process before starting the new one, making it especially useful for
  use in \TEXLUA.

\item \luatex{os.spawn(commandline)} is a returning version of \lua{os.exec},
  with otherwise identical calling conventions.

  If the command ran ok, then the return value is the exit status of the
  command. Otherwise, it will return the two values \type{nil} and \type {'error'}.

\item \luatex{os.setenv('key','value')}
  This sets a variable in the environment. Passing \lua{nil} instead of a
  value string will remove the variable.

\item \luatex{os.env}
  This is a hash table containing a dump of the variables and values
  in the process environment at the start of the run. It is writeable,
  but the actual environment is {\em not\/} updated automatically.

\item \luatex{os.gettimeofday()}
Returns the current \quote {\UNIX\ time}, but as a float. This function is
not available on the \SUNOS\ platforms, so do not use this function
for portable documents.

\item \luatex{os.times()}
Returns the current process times according to the \UNIX\ C library function
\quote {times}. This function is not available on the \MSWINDOWS\
and \SUNOS\ platforms, so do not use this function for portable
documents.

\item \luatex{os.tmpdir()} This will create a directory in the \quote {current
directory} with the name \type{luatex.XXXXXX} where the \type {X}-es are
replaced by a unique string. The function also returns this string,
so you can \type{lfs.chdir()} into it, or \type{nil} if it failed to
create the directory.  The user is responsible for cleaning up at
the end of the run; this does not happen automatically.

\item \luatex{os.type}
This is a string that gives a global indication of the class of operating
system. The possible values are currently \type{windows}, \type{unix}, and
\type{msdos} (you are unlikely to find this value \quote {in the wild}).

\item \luatex{os.name}
This is a string that gives a more precise indication of the operating
system. The possible values are not yet fixed, and for \type{os.type} values
\type{windows} and \type{msdos}, the \type{os.name} values are simply
\type{windows} and \type{msdos}.

The list for the type \type{unix} is more precise: \type{linux},
\type{freebsd}, \type{kfreebsd} (since 0.51), \type{cygwin} (since
0.53), \type{openbsd}, \type{solaris}, \type{sunos} (pre-solaris),
\type{hpux}, \type{irix}, \type{macosx}, \type{gnu} (hurd), \type{bsd} (unknown, but \BSD|-|like),
\type{sysv} (unknown, but \SYSV|-|like), \type{generic} (unknown).

(\type{os.version} is planned as a future extension)

\item \luatex{os.uname()}
This function returns a table with specific operating system
information acquired at runtime. The keys in the returned table are
all string valued, and their names are: \type{sysname}, \type{machine},
\type{release}, \type{version}, and \type{nodename}.


\stopitemize
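
As an illustration of the table calling convention of \type{os.spawn}
and of the \type{os.type} variable, consider the following sketch. The
command \type{ls -l} is just an example; in a typesetting run the call
is subject to the shell escape settings described earlier, whereas in
\TEXLUA\ it runs unconditionally.

\starttyping
-- os.type and os.spawn are LuaTeX extensions; in a stock Lua
-- interpreter the condition is false and nothing happens.
if os.type == "unix" and os.spawn then
    -- table form: the value at index 1 is the command and also
    -- becomes its arg[0]; "-l" is the first real argument
    local status, message = os.spawn({ "ls", "-l" })
    if status == nil then
        print("could not run ls: " .. tostring(message))
    else
        print("ls finished with exit status " .. status)
    end
end
\stoptyping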

In stock \LUA, many things depend on the current locale. In \LUATEX, we can't do
that, because it makes documents unportable.  While \LUATEX\ is running, it
forces the following locale settings:

\starttyping
LC_CTYPE=C
LC_COLLATE=C
LC_NUMERIC=C
\stoptyping

\section {\LUA\ modules}

Some modules that are normally external to \LUA\ are statically linked
in with \LUATEX, because they offer useful functionality:

\startitemize
\item \lua{slnunicode}, from the \type {Selene} libraries, \hyphenatedurl{http://luaforge.net/projects/sln}. (version 1.1)

This library has been slightly extended so that the \type{unicode.utf8.*}
functions also accept the first 256 values of plane~18. This is the range \LUATEX\
uses for raw binary output, as explained above.

\item \lua{luazip}, from the kepler project, \hyphenatedurl{http://www.keplerproject.org/luazip/}.
  (version 1.2.1, but patched for compilation with \LUA\ 5.1)
\item \lua{luafilesystem}, also from the kepler project, \hyphenatedurl{http://www.keplerproject.org/luafilesystem/}.
  (version 1.5.0)
\item \lua{lpeg}, by Roberto Ierusalimschy, \hyphenatedurl{http://www.inf.puc-rio.br/~roberto/lpeg/lpeg.html}. (version 0.9.0)

Note: \lua{lpeg} is not \UNICODE|-|aware, but interprets strings on a
byte|-|per|-|byte basis. This mainly means that \luatex{lpeg.S} cannot be
used with characters above code point 127, since those characters are
encoded using two bytes, and thus \luatex{lpeg.S} will look for one
of those two bytes when matching, not the combination of the two.

The same is true for \luatex{lpeg.R}, although the latter will display
an error message if used with characters above code point 127: I.\,e.\
\luatex{lpeg.R('aä')} results in the message \type{bad argument #1 to
'R' (range must have two characters)}, since to \lua{lpeg}, \type{ä}
is two 'characters' (bytes), so \type{aä} totals three.

\item \lua{lzlib}, by Tiago Dionizio, \hyphenatedurl{http://luaforge.net/projects/lzlib/}. (version 0.2)
\item \lua{md5}, by Roberto Ierusalimschy, \hyphenatedurl{http://www.inf.puc-rio.br/~roberto/md5/md5-5/md5.html}.

\item \lua{luasocket}, by Diego Nehab
\hyphenatedurl{http://w3.impa.br/~diego/software/luasocket/}
(version 2.0.2).

Note: the \type{.lua} support modules from \type{luasocket} are also
preloaded inside the executable, so there are no external file dependencies.
\stopitemize
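
The note on \type{lpeg} above suggests a simple workaround for
characters above code point 127: use an ordered choice of complete
strings instead of \type{lpeg.S}. The sketch below assumes that the
source file is encoded in UTF-8; the helper \type{starts_with_umlaut} is
a made|-|up name, and the fallback to \type{require} is only there for
stand|-|alone use.

\starttyping
-- In LuaTeX lpeg is preloaded; require() is only a fallback.
local lpeg = lpeg or require("lpeg")

-- lpeg.S("äöü") would match any single byte of these characters;
-- a choice of complete strings matches whole characters instead.
local umlaut = lpeg.P("ä") + lpeg.P("ö") + lpeg.P("ü")

local function starts_with_umlaut(s)
    return lpeg.match(umlaut, s) ~= nil
end
\stoptyping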


\chapter[libraries]{\LUATEX\ \LUA\ Libraries}

The interfacing between \TEX\ and \LUA\ is facilitated by a set of
library modules. The \LUA\ libraries in this chapter are all defined and
initialized by the \LUATEX\ executable. Together, they allow \LUA\
scripts to query and change a number of \TEX's internal variables, run
various internal \TEX\ functions, and set up \LUATEX's hooks to execute
\LUA\ code.

The following sections are in alphabetical order.

\section{The \luatex{callback} library}

This library has functions that register, find and list callbacks.

A quick note on what callbacks are (thanks, Paul!):

Callbacks are entry points to \LUATEX's internal operations, which can be
interspersed with additional \LUA\ code, and even replaced altogether.
In the first case, \TEX\ is simply augmented with new operations
(for instance, a manipulation of the nodes resulting from the paragraph
builder); in the second case, its hard-coded behavior (for instance, the
paragraph builder itself) is ignored and processing relies on user code only.

More precisely, the code to be inserted at a given callback is a function
(an anonymous function or the name of a function variable); % Is this line useful?
it will receive the arguments associated with the callback, if any, and must
frequently return some other arguments for \TEX\ to resume its operations.

The first task is registering a callback:

\startfunctioncall
id, error = callback.register (<string> callback_name, <function> func)
id, error = callback.register (<string> callback_name, nil)
id, error = callback.register (<string> callback_name, false)
\stopfunctioncall

where the \syntax{callback_name} is a predefined callback name, see
below. The function returns the internal \type{id} of the callback
or \type{nil}, if the callback could not be registered. In the latter
case, \type{error} contains an error message, otherwise it is
\type{nil}.

\LUATEX\ internalizes the callback function in such a way that
it does not matter if you redefine a function accidentally.

Callback assignments are always global. You can use the special value
\type {nil} instead of a function for clearing the callback.

For some minor speed gain, you can assign the boolean \type{false} to
the non-file related callbacks. Doing so will prevent \LUATEX\ from
executing whatever it would execute by default (when no callback
function is registered at all). Be warned: this may cause all sorts of
grief unless you know {\it exactly} what you are doing! This functionality
has been present since version 0.38.

Currently, callbacks are not dumped into the format file.

\startfunctioncall
<table> info = callback.list()
\stopfunctioncall

The keys in the table are the known callback names, the value is a
boolean where \type{true} means that the callback is currently set
(active).

\startfunctioncall
<function> f = callback.find (callback_name)
\stopfunctioncall

If the callback is not set, \luatex{callback.find} returns \type{nil}.
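
The three functions of this library can be combined as in the following
sketch. The callback name \type{process_input_buffer} is just one of the
predefined names, and \type{keep_line} is a made|-|up function that
leaves the input untouched; the guard on \type{callback} only lets the
fragment load outside \LUATEX\ as well.

\starttyping
local function keep_line(buffer)
    return nil -- nil: LuaTeX carries on with the unchanged line
end

if callback then
    -- register, and check the two return values
    local id, message = callback.register("process_input_buffer", keep_line)
    if not id then
        texio.write_nl("not registered: " .. tostring(message))
    end

    -- list all callbacks that are currently set
    for name, active in pairs(callback.list()) do
        if active then
            texio.write_nl("active callback: " .. name)
        end
    end

    -- clear the callback again and verify that it is gone
    callback.register("process_input_buffer", nil)
    if callback.find("process_input_buffer") == nil then
        texio.write_nl("process_input_buffer is not set")
    end
end
\stoptyping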

\subsection{File discovery callbacks}

The behavior documented in this subsection is considered stable in the
sense that there will not be backward-incompatible changes any more.

\subsubsection{\luatex{find_read_file} and \luatex{find_write_file}}

Your callback function should have the following conventions:

\startfunctioncall
<string> actual_name = function (<number> id_number, <string> asked_name)
\stopfunctioncall

Arguments:

\startitemize

\sym{id_number}

This number is zero for the log or \tex {input} files. For \TEX's \tex{read} or
\tex{write} the number is incremented by one, so \tex{read0} becomes~1.

\sym{asked_name}

This is the user|-|supplied filename, as found by \tex{input}, \tex{openin}
or \tex{openout}.

\stopitemize

Return value:

\startitemize

\sym{actual_name}

This is the filename used. For the very first file that is read in by
\TEX, you have to make sure you return an \type{actual_name} that has
an extension and that is suitable for use as \type{jobname}. If you
don't, you will have to manually fix the name of the log file and
output file after \LUATEX\ is finished, and the name of any format
file will become mangled. That is because these file names depend
on the jobname.

You have to return \type{nil} if the file cannot be found.

\stopitemize
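
A possible \type{find_read_file} callback is sketched below. The
\type{overrides} table and the path in it are hypothetical; for all
other names the function falls back to \type{kpse.find_file()}, which
assumes that \KPATHSEA\ has been initialized.

\starttyping
-- Hypothetical lookup table: requested name -> file to use instead.
local overrides = {
    ["local.cfg"] = "/tmp/my-local.cfg",
}

local function find_read_file(id_number, asked_name)
    local name = overrides[asked_name]
    if name then
        return name
    end
    -- normal kpathsea search; nil means that the file was not found
    return kpse.find_file(asked_name)
end

if callback then
    callback.register("find_read_file", find_read_file)
end
\stoptyping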

\subsubsection{\luatex{find_font_file}}

Your callback function should have the following conventions:

\startfunctioncall
<string> actual_name = function (<string> asked_name)
\stopfunctioncall

The \type{asked_name} is an \OTF\ or \TFM\ font metrics file.

Return \type{nil} if the file cannot be found.

\subsubsection{\luatex{find_output_file}}

Your callback function should have the following conventions:

\startfunctioncall
<string> actual_name = function (<string> asked_name)
\stopfunctioncall

The \type{asked_name} is the \PDF\ or \DVI\ file for writing.

\subsubsection{\luatex{find_format_file}}

Your callback function should have the following conventions:

\startfunctioncall
<string> actual_name = function (<string> asked_name)
\stopfunctioncall

The \type{asked_name} is a format file for reading (the format file
for writing is always opened in the current directory).

\subsubsection{\luatex{find_vf_file}}

Like \luatex{find_font_file}, but for virtual fonts. This applies to
both \ALEPH's \OVF\ files and traditional Knuthian \VF\ files.

\subsubsection{\luatex{find_map_file}}

Like \luatex{find_font_file}, but for map files.

\subsubsection{\luatex{find_enc_file}}

Like \luatex{find_font_file}, but for enc files.

\subsubsection{\luatex{find_sfd_file}}

Like \luatex{find_font_file}, but for subfont definition files.

\subsubsection{\luatex{find_pk_file}}

Like \luatex{find_font_file}, but for pk bitmap files. The argument
\type{asked_name} is a bit special in this case. Its form is

\starttyping
<base res>dpi/<fontname>.<actual res>pk
\stoptyping

So you may be asked for \type{600dpi/manfnt.720pk}.  It is up to you
to find a \quote{reasonable} bitmap file to go with that specification.
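
Taking such a specification apart is a matter of one \LUA\ pattern. The
helper \type{parse_pk_name} below is a made|-|up name; what to do with
the three parts is left to the callback.

\starttyping
-- Split "<base res>dpi/<fontname>.<actual res>pk" into its parts.
local function parse_pk_name(asked_name)
    local base, font, actual =
        asked_name:match("^(%d+)dpi/(.-)%.(%d+)pk$")
    if not base then
        return nil -- not in the expected form
    end
    return tonumber(base), font, tonumber(actual)
end
\stoptyping

For \type{600dpi/manfnt.720pk} this returns the number 600, the string
\type{manfnt}, and the number 720.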

\subsubsection{\luatex{find_data_file}}

Like \luatex{find_font_file}, but for embedded files (\tex{pdfobj file '...'}).

\subsubsection{\luatex{find_opentype_file}}

Like \luatex{find_font_file}, but for \OPENTYPE\ font files.

\subsubsection{\luatex{find_truetype_file} and \luatex{find_type1_file}}

Your callback function should have the following conventions:

\startfunctioncall
<string> actual_name = function (<string> asked_name)
\stopfunctioncall

The \type{asked_name} is a font file. This callback is called while
\LUATEX\ is building its internal list of needed font files, so the
actual timing may surprise you. Your return value is later fed back
into the matching \luatex{read_file} callback.

Strangely enough, \luatex{find_type1_file} is also used for \OPENTYPE\
(\OTF) fonts.

\subsubsection{\luatex{find_image_file}}

Your callback function should have the following conventions:

\startfunctioncall
<string> actual_name = function (<string> asked_name)
\stopfunctioncall

The \type{asked_name} is an image file. Your return value is used to
open a file from the hard disk, so make sure you return something that
is considered the name of a valid file by your operating system.

\subsection[iocallback]{File reading callbacks}

The behavior documented in this subsection is considered stable in the
sense that there will not be backward-incompatible changes any more.

\subsubsection{\luatex{open_read_file}}

Your callback function should have the following conventions:

\startfunctioncall
<table> env = function (<string> file_name)
\stopfunctioncall

Argument:

\startitemize

\sym{file_name}

The filename returned by a previous \luatex{find_read_file} or the return
value of \luatex{kpse.find_file()} if there was no such callback defined.

\stopitemize

Return value:

\startitemize

\sym{env}

This is a table containing at least one required and one optional
callback function for this file. The required field is
\luatex{reader} and the associated function will be called once
for each new line to be read, the optional one is \luatex{close}
that will be called once when \LUATEX\ is done with the file.

\LUATEX\ never looks at the rest of the table, so you can use it to
store your private per|-|file data. Both the callback functions will
receive the table as their only argument.

\stopitemize

\subsubsubsection{\luatex{reader}}

\LUATEX\ will run this function whenever it needs a new input line
from the file.

\startfunctioncall
function(<table> env)
    return <string> line
end
\stopfunctioncall

Your function should return either a string or \type{nil}. The value \type{nil}
signals that the end of file has occurred, and will make \TEX\ call
the optional \luatex{close} function next.

\subsubsubsection{\luatex{close}}

\LUATEX\ will run this optional function when it decides to close the file.

\startfunctioncall
function(<table> env)
end
\stopfunctioncall

Your function should not return any value.
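
Putting the pieces together, an \type{open_read_file} callback that
simply reads from disk could be sketched as follows. The fields
\type{handle} and \type{lines} are private data of this example, and
returning \type{nil} when the file cannot be opened is an assumption,
since that case is not specified above.

\starttyping
local function open_read_file(file_name)
    local handle = io.open(file_name, "r")
    if not handle then
        return nil -- assumption: failure case is not specified
    end
    return {
        handle = handle, -- private data, ignored by LuaTeX
        lines  = 0,      -- private data, ignored by LuaTeX
        reader = function(env)
            local line = env.handle:read("*line")
            if line then
                env.lines = env.lines + 1
            end
            return line -- nil signals the end of the file
        end,
        close = function(env)
            env.handle:close()
        end,
    }
end

if callback then
    callback.register("open_read_file", open_read_file)
end
\stoptyping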

\subsubsection{General file readers}

There is a set of callbacks for the loading of binary data
files. These all use the same interface:

\startfunctioncall
function(<string> name)
    return <boolean> success, <string> data, <number> data_size
end
\stopfunctioncall

The \type{name} will normally be a full path name as it is returned by
either one of the file discovery callbacks or the internal version of
\luatex{kpse.find_file()}.

\startitemize

\sym{success}

Return \type{false} when a fatal error occurred (e.\,g.\ when the file cannot be
found, after all).

\sym{data}

The bytes comprising the file.

\sym{data_size}

The length of the \type{data}, in bytes.

\stopitemize

Return an empty string and zero if the file was found but there was a
reading problem.

The list of functions is as follows:

\starttabulate[|l|p|]
\NC \luatex{read_font_file}     \NC ofm or tfm files \NC\NR
\NC \luatex{read_vf_file}       \NC virtual fonts \NC\NR
\NC \luatex{read_map_file}      \NC map files \NC\NR
\NC \luatex{read_enc_file}      \NC encoding files \NC\NR
\NC \luatex{read_sfd_file}      \NC subfont definition files \NC\NR
\NC \luatex{read_pk_file}       \NC pk bitmap files \NC\NR
\NC \luatex{read_data_file}     \NC embedded files (\tex{pdfobj file ...}) \NC\NR
\NC \luatex{read_truetype_file} \NC \TRUETYPE\ font files \NC\NR
\NC \luatex{read_type1_file}    \NC \TYPEONE\ font files \NC\NR
\NC \luatex{read_opentype_file} \NC \OPENTYPE\ font files \NC\NR
\stoptabulate
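
A reader that follows this interface might look like the sketch below.
The name \type{read_binary} is made up, and the choice of
\type{read_data_file} for the registration is arbitrary. Returning
\type{true} together with the empty string and zero after a reading
problem is an assumption, as the text above does not state the value of
\type{success} for that case.

\starttyping
local function read_binary(name)
    local handle = io.open(name, "rb")
    if not handle then
        return false -- fatal: the file cannot be opened after all
    end
    local data = handle:read("*all")
    handle:close()
    if not data then
        return true, "", 0 -- found, but there was a reading problem
    end
    return true, data, #data
end

if callback then
    callback.register("read_data_file", read_binary)
end
\stoptyping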

\subsection{Data processing callbacks}

\subsubsection{\luatex{process_input_buffer}}


This callback allows you to change the contents of the line input
buffer just before \LUATEX\ actually starts looking at it.

\startfunctioncall
function(<string> buffer)
    return <string> adjusted_buffer
end
\stopfunctioncall

If you return \type{nil}, \LUATEX\ will act as if your callback
never happened. You can gain a small amount of processing time from
that.

This callback does not replace any internal code.
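
As a small example, the following sketch removes a UTF-8 byte order mark
from the start of an input line and returns \type{nil} for all other
lines. The function name \type{strip_bom} is made up for the occasion.

\starttyping
local bom = "\239\187\191" -- the bytes EF BB BF

local function strip_bom(buffer)
    if buffer:sub(1, 3) == bom then
        return buffer:sub(4)
    end
    return nil -- unchanged: let LuaTeX use the line as it is
end

if callback then
    callback.register("process_input_buffer", strip_bom)
end
\stoptyping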

\subsubsection{\luatex{process_output_buffer} (0.43)}

This callback allows you to change the contents of the line output
buffer just before \LUATEX\ actually starts writing it to a file as the
result of a \tex{write} command. It is only called for output to an
actual file (that is, excluding the log, the terminal, and \tex{write18}
calls).

\startfunctioncall
function(<string> buffer)
    return <string> adjusted_buffer
end
\stopfunctioncall

If you return \type{nil}, \LUATEX\ will act as if your callback
never happened. You can gain a small amount of processing time from
that.

This callback does not replace any internal code.

\subsubsection{\luatex{token_filter}}

This callback allows you to replace the way \LUATEX\ fetches
lexical tokens.

\startfunctioncall
function()
    return <table> token
end
\stopfunctioncall

The calling convention for this callback is a bit more complicated than
for most other callbacks.  The function should either return a \LUA\
table representing a valid to|-|be|-|processed token or tokenlist, or
something else like \type{nil} or an empty table.

If your \LUA\ function does not return a table representing a valid
token, it will be immediately called again, until it eventually does
return a useful token or tokenlist (or until you reset the callback
value to nil). See the description of \luatex{token} for some
handy functions to be used in conjunction with this callback.

If your function returns a single usable token, then that token will
be processed by \LUATEX\ immediately. If the function returns a token
list (a table consisting of a list of consecutive token tables), then
that list will be pushed to the input stack at a completely new token
list level, with its token type set to \quote{inserted}. In either case,
the returned token(s) will not be fed back into the callback function.

Setting this callback to \type{false} has no effect (because otherwise
nothing would happen, forever).
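
The simplest possible filter passes every token through unchanged. The
sketch below assumes the function \type{token.get_next()} from the
\type{token} library; because that table is not available during the
startup script, the code is meant to be run later, for instance from
\type{\directlua}.

\starttyping
-- Assumes token.get_next(); the guards make the fragment a no-op
-- wherever the callback or token tables are missing.
if callback and token and token.get_next then
    callback.register("token_filter", function()
        return token.get_next() -- hand the next token to LuaTeX as is
    end)
end
\stoptyping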

\subsection{Node list processing callbacks}

The description of nodes and node lists is in~\in{chapter}[nodes].

\subsubsection{\luatex{buildpage_filter}}

This callback is called whenever \LUATEX\ is ready to move stuff to
the main vertical list. You can use this callback to do specialized
manipulation of the page building stage like imposition or column
balancing.

\startfunctioncall
function(<string> extrainfo)
end
\stopfunctioncall

The string \type{extrainfo} gives some additional information about
what \TEX's state is with respect to the \quote{current page}. The possible
values are:

\starttabulate[|lT|p|]
\NC \ssbf value     \NC \bf explanation                        \NC\NR
\NC alignment       \NC a (partial) alignment is being added   \NC\NR
\NC after_output    \NC an output routine has just finished    \NC\NR
\NC box             \NC a typeset box is being added           \NC\NR
%\NC pre_box         \NC interline material is being added      \NC\NR
%\NC adjust          \NC \tex{vadjust} material is being added  \NC\NR
\NC new_graf        \NC the beginning of a new paragraph       \NC\NR
\NC vmode_par       \NC \tex{par} was found in vertical mode   \NC\NR
\NC hmode_par       \NC \tex{par} was found in horizontal mode \NC\NR
\NC insert          \NC an insert is added                     \NC\NR
\NC penalty         \NC a penalty (in vertical mode)           \NC\NR
\NC before_display  \NC immediately before a display starts    \NC\NR
\NC after_display   \NC a display is finished                  \NC\NR
\NC end             \NC \LUATEX\ is terminating (it's all over)\NC\NR
\stoptabulate

This callback does not replace any internal code.
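
As a minimal sketch (not a complete page builder hook), the following
registers a function that only records each \type{extrainfo} value in the
log file. The message text is an arbitrary choice made for this example.

\starttyping
-- Log every reason for which the page builder is about to be fed.
callback.register("buildpage_filter", function(extrainfo)
  texio.write_nl("log", "buildpage_filter: " .. tostring(extrainfo))
end)
\stoptyping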


\subsubsection{\luatex{pre_linebreak_filter}}

This callback is called just before \LUATEX\ starts converting a list
of nodes into a stack of \tex{hbox}es, after the addition of
\type{\parfillskip}.

\startfunctioncall
function(<node> head, <string> groupcode)
    return true | false | <node> newhead
end
\stopfunctioncall

The string called \type {groupcode} identifies the nodelist's context
within \TEX's processing. The range of possibilities is given in the
table below, but not all of those can actually appear in
\luatex {pre_linebreak_filter}; some only occur in the
\luatex {hpack_filter} and \luatex {vpack_filter} callbacks that
are explained later in this section.

\starttabulate[|lT|p|]
\NC \ssbf value        \NC \bf explanation                     \NC\NR
\NC <empty>      \NC main vertical list                        \NC\NR
\NC hbox         \NC \tex{hbox} in horizontal mode             \NC\NR
\NC adjusted_hbox\NC \tex{hbox} in vertical mode               \NC\NR
\NC vbox         \NC \tex{vbox}                                \NC\NR
\NC vtop         \NC \tex{vtop}                                \NC\NR
\NC align        \NC \tex{halign} or \tex{valign}              \NC\NR
\NC disc         \NC discretionaries                           \NC\NR
\NC insert       \NC packaging an insert                       \NC\NR
\NC vcenter      \NC \tex{vcenter}                             \NC\NR
\NC local_box    \NC \tex{localleftbox} or \tex{localrightbox} \NC\NR
\NC split_off    \NC top of a \tex{vsplit}                     \NC\NR
\NC split_keep   \NC remainder of a \tex{vsplit}               \NC\NR
\NC align_set    \NC alignment cell                            \NC\NR
\NC fin_row      \NC alignment row                             \NC\NR
\stoptabulate

As for all the callbacks that deal with nodes, the return value can be one of three things:

\startitemize
\item boolean \type{true} signals successful processing
\item \type{<node>} signals that the \quote{head} node should be replaced by the returned node
\item boolean \type{false} signals that the \quote{head} node list should be ignored and
flushed from memory
\stopitemize


This callback does not replace any internal code.
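
The following sketch shows the simplest of the three return conventions:
the function inspects the list and returns \type{true} to leave it in
place. It counts only the top-level glyph nodes of the paragraph; the log
message is made up for this example.

\starttyping
local glyph_id = node.id("glyph")

callback.register("pre_linebreak_filter", function(head, groupcode)
  local count = 0
  -- visit only the glyph nodes at the top level of the list
  for _ in node.traverse_id(glyph_id, head) do
    count = count + 1
  end
  texio.write_nl("log", "pre_linebreak_filter: " .. count .. " glyph nodes")
  return true  -- the list was not replaced
end)
\stoptyping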


\subsubsection{\luatex{linebreak_filter}}

This callback replaces \LUATEX's line breaking algorithm.

\startfunctioncall
function(<node> head, <boolean> is_display)
    return <node> newhead
end
\stopfunctioncall

The returned node is the head of the list that will be added to the
main vertical list. The boolean argument is \type{true} if this paragraph is
interrupted by a following math display.

If you return something that is not a \type{<node>}, \LUATEX\ will
apply the internal linebreak algorithm on the list that starts at
\type{<head>}. Otherwise, the \type{<node>} you return is supposed
to be the head of a list of nodes that are all allowed in vertical
mode, and at least one of those has to represent a hbox. Failure to do
so will result in a fatal error.

Setting this  callback to \type{false} is possible, but dangerous,
because it is possible you will end up in an unfixable
\quote{deadcycles loop}.

\subsubsection{\luatex{post_linebreak_filter}}

This callback is called just after \LUATEX\ has converted a list
of nodes into a stack of \tex{hbox}es.

\startfunctioncall
function(<node> head, <string> groupcode)
    return true | false | <node> newhead
end
\stopfunctioncall

This callback does not replace any internal code.

\subsubsection{\luatex{hpack_filter}}

This callback is called when \TEX\ is ready to start boxing some
horizontal mode material. Math items and line boxes are ignored
at the moment.

\startfunctioncall
function(<node> head, <string> groupcode, <number> size,
         <string> packtype [, <string> direction])
    return true | false | <node> newhead
end
\stopfunctioncall

The \type{packtype} is either \type{additional} or \type{exactly}. If
\type{additional}, then the \type{size} is a \tex{hbox spread ...}
argument. If \type{exactly}, then the \type{size} is a \tex{hbox to ...}.
In both cases, the number is in scaled points.

The \type{direction} is either one of the three-letter direction specifier
strings, or \type{nil} (added in 0.45).


This callback does not replace any internal code.
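
A small sketch of how the arguments can be read: the function below
reports the packaging request in the log, converting \type{size} from
scaled points to points (65536 scaled points per point), and leaves the
list untouched. The message format is an assumption of this example.

\starttyping
callback.register("hpack_filter",
  function(head, groupcode, size, packtype, direction)
    texio.write_nl("log", string.format(
      "hpack_filter: group=%q packtype=%s size=%.3fpt direction=%s",
      groupcode, packtype, size / 65536, tostring(direction)))
    return true  -- keep the list as it is
  end)
\stoptyping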

\subsubsection{\luatex{vpack_filter}}

This callback is called when \TEX\ is ready to start boxing some
vertical mode material. Math displays are ignored at the moment.

This function is very similar to the \luatex{hpack_filter}. Besides
the fact that it is called at different moments, there is an extra
variable that matches \TEX's \tex{maxdepth} setting.

\startfunctioncall
function(<node> head, <string> groupcode, <number> size,
         <string> packtype, <number> maxdepth [, <string> direction])
    return true | false | <node> newhead
end
\stopfunctioncall

This callback does not replace any internal code.

\subsubsection{\luatex{pre_output_filter}}

This callback is called when \TEX\ is ready to start boxing
box 255 for \tex{output}.

\startfunctioncall
function(<node> head, <string> groupcode, <number> size, <string> packtype,
        <number> maxdepth [, <string> direction])
    return true | false | <node> newhead
end
\stopfunctioncall

This callback does not replace any internal code.

\subsubsection{\luatex{hyphenate}}

\startfunctioncall
function(<node> head, <node> tail)
end
\stopfunctioncall

No return values. This callback has to insert discretionary nodes in
the node list it receives.

Setting this  callback to \type{false} will prevent the internal
discretionary insertion pass.

\subsubsection{\luatex{ligaturing}}

\startfunctioncall
function(<node> head, <node> tail)
end
\stopfunctioncall

No return values. This callback has to apply ligaturing to the node
list it receives.

You don't have to worry about return values because the \type{head}
node that is passed on to the callback is guaranteed not to be a
glyph_node (if need be, a temporary node will be prepended), and
therefore it cannot be affected by the mutations that take place.
After the callback, the internal value of the \quote {tail of the list}
will be recalculated.

The \type{next} of \type{head} is guaranteed to be non-nil.

The \type{next} of \type{tail} is guaranteed to be nil, and therefore the
second callback argument can often be ignored. It is provided for
orthogonality, and because it can sometimes be handy when special
processing has to take place.

Setting this  callback to \type{false} will prevent the internal
ligature creation pass.

\subsubsection{\luatex{kerning}}

\startfunctioncall
function(<node> head, <node> tail)
end
\stopfunctioncall

No return values. This callback has to apply kerning between the nodes
in the node list it receives. See \type{ligaturing} for calling
conventions.

Setting this  callback to \type{false} will prevent the internal
kern insertion pass.
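
If you only want to hook into these passes without reimplementing them,
one possible approach is to delegate to the \type{node.ligaturing} and
\type{node.kerning} helpers from the \type{node} library. The sketch below
does nothing more than that; any extra processing you need would go before
or after the delegated call.

\starttyping
callback.register("ligaturing", function(head, tail)
  -- run the built-in ligature builder on the same list
  node.ligaturing(head, tail)
end)

callback.register("kerning", function(head, tail)
  -- run the built-in kern insertion on the same list
  node.kerning(head, tail)
end)
\stoptyping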

\subsubsection{\luatex{mlist_to_hlist}}

This callback replaces \LUATEX's math list to node list conversion algorithm.

\startfunctioncall
function(<node> head, <string> display_type, <boolean> need_penalties)
    return <node> newhead
end
\stopfunctioncall

The returned node is the head of the list that will be added to the vertical or
horizontal list. The string argument is either \quote{text} or \quote{display},
depending on the current math mode. The boolean argument is \type{true} if penalties
have to be inserted in this list, \type{false} otherwise.

Setting this callback to \type{false} is bad: it will almost
certainly result in an endless loop.
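
A safe starting point for experiments is a function that hands the list to
the \type{node.mlist_to_hlist} helper, which performs the built-in
conversion. This is only a sketch; your own manipulations would be applied
to \type{head} before the call or to the result after it.

\starttyping
callback.register("mlist_to_hlist",
  function(head, display_type, need_penalties)
    -- delegate to the internal math list conversion
    return node.mlist_to_hlist(head, display_type, need_penalties)
  end)
\stoptyping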

\subsection{Information reporting callbacks}

\subsubsection{\luatex{pre_dump} (0.61)}

\startfunctioncall
function()
end
\stopfunctioncall

This function is called just before dumping to a format file starts.
It does not replace any code and there are neither arguments nor return values.

\subsubsection{\luatex{start_run}}

\startfunctioncall
function()
end
\stopfunctioncall

This callback replaces the code that prints \LUATEX's banner. Note that for
successful use, this callback has to be set in the \LUA\ initialization script,
otherwise it will be seen only after the run has already started.
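
A sketch of such a replacement, meant to be placed in the initialization
script, could look like this. The banner text is of course made up.

\starttyping
callback.register("start_run", function()
  -- printed instead of the normal banner
  texio.write("term and log", "This is my private typesetting run")
end)
\stoptyping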

\subsubsection{\luatex{stop_run}}

\startfunctioncall
function()
end
\stopfunctioncall

This callback replaces the code that prints \LUATEX's statistics and \quote{output written
to} messages.

\subsubsection{\luatex{start_page_number}}

\startfunctioncall
function()
end
\stopfunctioncall

Replaces the code that prints the \type{[} and the page number at the
beginning of \tex{shipout}. This callback will also override the
printing of box information that normally takes place when
\tex{tracingoutput} is positive.

\subsubsection{\luatex{stop_page_number}}

\startfunctioncall
function()
end
\stopfunctioncall

Replaces the code that prints the \type{]} at the end of \tex{shipout}.

\subsubsection{\luatex{show_error_hook}}

\startfunctioncall
function()
end
\stopfunctioncall

This callback is run from inside the \TEX\ error function, and the idea
is to allow you to do some extra reporting on top of what \TEX\ already
does (none of the normal actions are removed). You may find some of
the values in the \luatex{status} table useful.

This callback does not replace any internal code.
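
The sketch below adds one extra line to the error report. It assumes that
the \luatex{status} table offers the items \type{filename},
\type{linenumber} and \type{lasterrorstring}; check the description of
that table before relying on them. All values are passed through
\type{tostring} so that a missing item does not cause a new error.

\starttyping
callback.register("show_error_hook", function()
  texio.write_nl("term and log", string.format(
    "[error near line %s of %s: %s]",
    tostring(status.linenumber),       -- assumed status item
    tostring(status.filename),         -- assumed status item
    tostring(status.lasterrorstring))) -- assumed status item
end)
\stoptyping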

\iffalse % this has been retracted for the moment
\startitemize

\sym{message}

is the formal error message \TEX\ has given to the user.
(the line after the '!').

\sym{indicator}

is either a filename (when it is a string) or a location indicator (a
number) that can mean lots of different things like a token list id
or a \tex{read} number.

\sym{lineno}

is the current line number.
\stopitemize

This is an investigative item for 'testing the water' only.
The final goal is the total replacement of \TEX's error handling
routines, but that needs lots of adjustments in the web source because
\TEX\ deals with errors in a somewhat haphazard fashion. This is why the
exact definition of \type{indicator} is not given here.
\fi

\subsection{PDF-related callbacks}

\subsubsection{\luatex{finish_pdffile}}

\startfunctioncall
function()
end
\stopfunctioncall

This callback is called when all document pages are already written to the \PDF\
file and \LUATEX\ is about to finalize the output document structure. Its intended
use is final update of \PDF\ dictionaries such as \type{/Catalog} or
\type{/Info}. The callback does not replace any code. There are neither
arguments nor return values.

\subsection{Font-related callbacks}

\subsubsection{\luatex{define_font}}

\startfunctioncall
function(<string> name, <number> size, <number> id)
    return <table> font
end
\stopfunctioncall

The string \type{name} is the filename part of the font
specification, as given by the user.

The number \type{size} is a bit special:

\startitemize[packed]
\item if it is positive, it specifies an \quote{at size} in scaled points.
\item if it is negative, its absolute value represents a \quote{scaled}
setting relative to the designsize of the font.
\stopitemize

The \type{id} is the internal number assigned to the font.

The internal structure of the \type{font} table that is to be
returned is explained in \in{chapter}[fonts]. That table is saved
internally, so you can put extra fields in the table for your
later \LUA\ code to use.

Setting this callback to \type{false} is pointless as it will prevent
font loading completely but will nevertheless generate errors.
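
As a minimal sketch, the function below handles \TFM\ based fonts only, by
passing \type{name} and \type{size} on to \type{font.read_tfm}, which
interprets a negative size in the same way. The extra field
\type{loaded_by} is a hypothetical name, used here just to show that
additional fields can be stored in the table.

\starttyping
callback.register("define_font", function(name, size, id)
  local f = font.read_tfm(name, size)
  if f then
    -- extra field for later use by our own Lua code (hypothetical name)
    f.loaded_by = "my define_font callback"
  end
  return f
end)
\stoptyping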

\section{The \luatex{epdf} library}

The \type{epdf} library provides Lua bindings to many \PDF\ access functions
that are defined by the poppler pdf viewer library (written in C$+{}+$
by Kristian H\o gsberg, based on xpdf by Derek Noonburg).
Within \LUATEX\ (and \PDFTEX),
xpdf functionality has long been used to embed \PDF\ files.
The \type{epdf} library makes it possible to inspect an external \PDF\ file.
It gives access to the document structure of that file,
e.\,g., catalog, cross-reference table, individual pages, objects,
annotations, info, and metadata.

The \type{epdf} library is still in alpha state:
\PDF\ access is currently read|-|only
(it's not yet possible to alter a \PDF\ file or to assemble it from scratch),
and many function bindings are still missing.

To start,
open a \PDF\ file by passing its file name to \type{epdf.open()}, e.\,g.:

\starttyping
doc = epdf.open("foo.pdf")
\stoptyping

This normally returns a \type{PDFDoc} userdata variable.
If the file cannot be opened successfully,
no fatal error is raised; the value \type{nil} is returned instead.

All Lua functions in the \type{epdf} library are named after the
poppler functions listed in the poppler header files for the various classes,
e.\,g., files \type{PDFDoc.h}, \type{Dict.h}, and \type{Array.h}.
These files can be found in the poppler subdirectory within the \LUATEX\ sources.
Which functions are already implemented in the \type{epdf} library
can be found in the \LUATEX\ source file \type{lepdflib.cc}.
For using the \type{epdf} library,
knowledge of the \PDF\ file architecture is indispensable.

There are many different userdata types defined
by the \type{epdf} library, currently these are
\type{Annot},
\type{AnnotBorder},
\type{AnnotBorderStyle},
\type{Annots},
\type{Array},
\type{Catalog},
\type{EmbFile},
\type{Dict},
\type{GString},
\type{LinkDest},
\type{Object},
\type{ObjectStream},
\type{Page},
\type{PDFDoc},
\type{PDFRectangle},
\type{Ref},
\type{Stream},
\type{XRef}, and
\type{XRefEntry}.

All these userdata names and the Lua access functions closely follow
the class naming in the poppler header files,
including the choice of mixed upper and lower case letters.
The Lua function calls use object-oriented syntax, e.\,g.,
the following calls return the \type{Page} object for page~1:

\starttyping
pageref = doc:getCatalog():getPageRef(1)
pageobj = doc:getXRef():fetch(pageref.num, pageref.gen)
\stoptyping

Writing such chained calls is risky, however,
as an intermediate function may return \type{nil} on error;
the result of each call should therefore be checked in Lua
(e.\,g., against \type{nil}) before the next call is made.
If a non-object item is requested
(e.\,g., a \type{Dict} item by calling \type{page:getPieceInfo()},
cf.~\type{Page.h}) but is not available,
the Lua functions return \type{nil} (without error).
If a function should return an \type{Object} that does not exist,
a \type{Null} object is returned instead
(also without error; this is in|-|line with poppler behavior).
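
A more defensive variant of the chained calls, written as a sketch, checks
each intermediate result before it is used. The function name
\type{first_page_object} and the error strings are inventions of this
example; only the \type{epdf} calls themselves come from the library.

\starttyping
local function first_page_object(filename)
  local doc = epdf.open(filename)
  if not doc then
    return nil, "cannot open " .. filename
  end
  local catalog = doc:getCatalog()
  if not catalog then
    return nil, "no catalog"
  end
  local pageref = catalog:getPageRef(1)
  if not pageref then
    return nil, "no reference for page 1"
  end
  local xref = doc:getXRef()
  if not xref then
    return nil, "no cross-reference table"
  end
  return xref:fetch(pageref.num, pageref.gen)
end

local pageobj, err = first_page_object("foo.pdf")
\stoptyping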

All library objects have a \type{__gc} metamethod for garbage collection.
The \type{__tostring} metamethod gives the type name for each object.

All object constructors:

\startfunctioncall
<PDFDoc>       = epdf.open(<string> PDF filename)
<Annot>        = epdf.Annot(<XRef>, <Dict>, <Catalog>, <Ref>)
<Annots>       = epdf.Annots(<XRef>, <Catalog>, <Object>)
<Array>        = epdf.Array(<XRef>)
<Dict>         = epdf.Dict(<XRef>)
<Object>       = epdf.Object()
<PDFRectangle> = epdf.PDFRectangle()
\stopfunctioncall

\type{Annot} methods:

\startfunctioncall
<boolean>     = <Annot>:isOk()
<Object>      = <Annot>:getAppearance()
<AnnotBorder> = <Annot>:getBorder()
<boolean>     = <Annot>:match(<Ref>)
\stopfunctioncall

\type{AnnotBorderStyle} methods:

\startfunctioncall
<number> = <AnnotBorderStyle>:getWidth()
\stopfunctioncall

\type{Annots} methods:

\startfunctioncall
<integer> = <Annots>:getNumAnnots()
<Annot>   = <Annots>:getAnnot(<integer>)
\stopfunctioncall

\type{Array} methods:

\startfunctioncall
            <Array>:incRef()
            <Array>:decRef()
<integer> = <Array>:getLength()
            <Array>:add(<Object>)
<Object>  = <Array>:get(<integer>)
<Object>  = <Array>:getNF(<integer>)
<string>  = <Array>:getString(<integer>)
\stopfunctioncall

\type{Catalog} methods:

\startfunctioncall
<boolean>  = <Catalog>:isOk()
<integer>  = <Catalog>:getNumPages()
<Page>     = <Catalog>:getPage(<integer>)
<Ref>      = <Catalog>:getPageRef(<integer>)
<string>   = <Catalog>:getBaseURI()
<string>   = <Catalog>:readMetadata()
<Object>   = <Catalog>:getStructTreeRoot()
<integer>  = <Catalog>:findPage(<integer> object number, <integer> object generation)
<LinkDest> = <Catalog>:findDest(<string> name)
<Object>   = <Catalog>:getDests()
<integer>  = <Catalog>:numEmbeddedFiles()
<EmbFile>  = <Catalog>:embeddedFile(<integer>)
<integer>  = <Catalog>:numJS()
<string>   = <Catalog>:getJS(<integer>)
<Object>   = <Catalog>:getOutline()
<Object>   = <Catalog>:getAcroForm()
\stopfunctioncall

\type{EmbFile} methods:

\startfunctioncall
<string>   = <EmbFile>:name()
<string>   = <EmbFile>:description()
<integer>  = <EmbFile>:size()
<string>   = <EmbFile>:modDate()
<string>   = <EmbFile>:createDate()
<string>   = <EmbFile>:checksum()
<string>   = <EmbFile>:mimeType()
<Object>   = <EmbFile>:streamObject()
<boolean>  = <EmbFile>:isOk()
\stopfunctioncall

\type{Dict} methods:

\startfunctioncall
            <Dict>:incRef()
            <Dict>:decRef()
<integer> = <Dict>:getLength()
            <Dict>:add(<string>, <Object>)
            <Dict>:set(<string>, <Object>)
            <Dict>:remove(<string>)
<boolean> = <Dict>:is(<string>)
<Object>  = <Dict>:lookup(<string>)
<Object>  = <Dict>:lookupNF(<string>)
<integer> = <Dict>:lookupInt(<string>, <string>)
<string>  = <Dict>:getKey(<integer>)
<Object>  = <Dict>:getVal(<integer>)
<Object>  = <Dict>:getValNF(<integer>)
\stopfunctioncall

\type{LinkDest} methods:

\startfunctioncall
<boolean>  = <LinkDest>:isOk()
<integer>  = <LinkDest>:getKind()
<string>   = <LinkDest>:getKindName()
<boolean>  = <LinkDest>:isPageRef()
<integer>  = <LinkDest>:getPageNum()
<Ref>      = <LinkDest>:getPageRef()
<number>   = <LinkDest>:getLeft()
<number>   = <LinkDest>:getBottom()
<number>   = <LinkDest>:getRight()
<number>   = <LinkDest>:getTop()
<number>   = <LinkDest>:getZoom()
<boolean>  = <LinkDest>:getChangeLeft()
<boolean>  = <LinkDest>:getChangeTop()
<boolean>  = <LinkDest>:getChangeZoom()
\stopfunctioncall

\type{Object} methods:

\startfunctioncall
            <Object>:initBool(<boolean>)
            <Object>:initInt(<integer>)
            <Object>:initReal(<number>)
            <Object>:initString(<string>)
            <Object>:initName(<string>)
            <Object>:initNull()
            <Object>:initArray(<XRef>)
            <Object>:initDict(<XRef>)
            <Object>:initStream(<Stream>)
            <Object>:initRef(<integer> object number, <integer> object generation)
            <Object>:initCmd(<string>)
            <Object>:initError()
            <Object>:initEOF()
<Object>  = <Object>:fetch(<XRef>)
<integer> = <Object>:getType()
<string>  = <Object>:getTypeName()
<boolean> = <Object>:isBool()
<boolean> = <Object>:isInt()
<boolean> = <Object>:isReal()
<boolean> = <Object>:isNum()
<boolean> = <Object>:isString()
<boolean> = <Object>:isName()
<boolean> = <Object>:isNull()
<boolean> = <Object>:isArray()
<boolean> = <Object>:isDict()
<boolean> = <Object>:isStream()
<boolean> = <Object>:isRef()
<boolean> = <Object>:isCmd()
<boolean> = <Object>:isError()
<boolean> = <Object>:isEOF()
<boolean> = <Object>:isNone()
<boolean> = <Object>:getBool()
<integer> = <Object>:getInt()
<number>  = <Object>:getReal()
<number>  = <Object>:getNum()
<string>  = <Object>:getString()
<string>  = <Object>:getName()
<Array>   = <Object>:getArray()
<Dict>    = <Object>:getDict()
<Stream>  = <Object>:getStream()
<Ref>     = <Object>:getRef()
<integer> = <Object>:getRefNum()
<integer> = <Object>:getRefGen()
<string>  = <Object>:getCmd()
<integer> = <Object>:arrayGetLength()
          = <Object>:arrayAdd(<Object>)
<Object>  = <Object>:arrayGet(<integer>)
<Object>  = <Object>:arrayGetNF(<integer>)
<integer> = <Object>:dictGetLength(<integer>)
          = <Object>:dictAdd(<string>, <Object>)
          = <Object>:dictSet(<string>, <Object>)
<Object>  = <Object>:dictLookup(<string>)
<Object>  = <Object>:dictLookupNF(<string>)
<string>  = <Object>:dictgetKey(<integer>)
<Object>  = <Object>:dictgetVal(<integer>)
<Object>  = <Object>:dictgetValNF(<integer>)
<boolean> = <Object>:streamIs()
          = <Object>:streamReset()
<integer> = <Object>:streamGetChar()
<integer> = <Object>:streamLookChar()
<integer> = <Object>:streamGetPos()
          = <Object>:streamSetPos(<integer>)
<Dict>    = <Object>:streamGetDict()
\stopfunctioncall

\type{Page} methods:

\startfunctioncall
<boolean>      = <Page>:isOk()
<integer>      = <Page>:getNum()
<PDFRectangle> = <Page>:getMediaBox()
<PDFRectangle> = <Page>:getCropBox()
<boolean>      = <Page>:isCropped()
<number>       = <Page>:getMediaWidth()
<number>       = <Page>:getMediaHeight()
<number>       = <Page>:getCropWidth()
<number>       = <Page>:getCropHeight()
<PDFRectangle> = <Page>:getBleedBox()
<PDFRectangle> = <Page>:getTrimBox()
<PDFRectangle> = <Page>:getArtBox()
<integer>      = <Page>:getRotate()
<string>       = <Page>:getLastModified()
<Dict>         = <Page>:getBoxColorInfo()
<Dict>         = <Page>:getGroup()
<Stream>       = <Page>:getMetadata()
<Dict>         = <Page>:getPieceInfo()
<Dict>         = <Page>:getSeparationInfo()
<Dict>         = <Page>:getResourceDict()
<Object>       = <Page>:getAnnots()
<Links>        = <Page>:getLinks(<Catalog>)
<Object>       = <Page>:getContents()
\stopfunctioncall

\type{PDFDoc} methods:

\startfunctioncall
<boolean>  = <PDFDoc>:isOk()
<integer>  = <PDFDoc>:getErrorCode()
<string>   = <PDFDoc>:getErrorCodeName()
<string>   = <PDFDoc>:getFileName()
<XRef>     = <PDFDoc>:getXRef()
<Catalog>  = <PDFDoc>:getCatalog()
<number>   = <PDFDoc>:getPageMediaWidth()
<number>   = <PDFDoc>:getPageMediaHeight()
<number>   = <PDFDoc>:getPageCropWidth()
<number>   = <PDFDoc>:getPageCropHeight()
<integer>  = <PDFDoc>:getNumPages()
<string>   = <PDFDoc>:readMetadata()
<Object>   = <PDFDoc>:getStructTreeRoot()
<integer>  = <PDFDoc>:findPage(<integer> object number, <integer> object generation)
<Links>    = <PDFDoc>:getLinks(<integer>)
<LinkDest> = <PDFDoc>:findDest(<string>)
<boolean>  = <PDFDoc>:isEncrypted()
<boolean>  = <PDFDoc>:okToPrint()
<boolean>  = <PDFDoc>:okToChange()
<boolean>  = <PDFDoc>:okToCopy()
<boolean>  = <PDFDoc>:okToAddNotes()
<boolean>  = <PDFDoc>:isLinearized()
<Object>   = <PDFDoc>:getDocInfo()
<Object>   = <PDFDoc>:getDocInfoNF()
<integer>  = <PDFDoc>:getPDFMajorVersion()
<integer>  = <PDFDoc>:getPDFMinorVersion()
\stopfunctioncall

\type{PDFRectangle} methods:

\startfunctioncall
<boolean>  = <PDFRectangle>:isValid()
\stopfunctioncall

%\type{Ref} methods:
%
%\startfunctioncall
%\stopfunctioncall

\type{Stream} methods:

\startfunctioncall
<integer>  = <Stream>:getKind()
<string>   = <Stream>:getKindName()
           = <Stream>:reset()
           = <Stream>:close()
<integer>  = <Stream>:getChar()
<integer>  = <Stream>:lookChar()
<integer>  = <Stream>:getRawChar()
<integer>  = <Stream>:getUnfilteredChar()
           = <Stream>:unfilteredReset()
<integer>  = <Stream>:getPos()
<boolean>  = <Stream>:isBinary()
<Stream>   = <Stream>:getUndecodedStream()
<Dict>     = <Stream>:getDict()
\stopfunctioncall

\type{XRef} methods:

\startfunctioncall
<boolean>  = <XRef>:isOk()
<integer>  = <XRef>:getErrorCode()
<boolean>  = <XRef>:isEncrypted()
<boolean>  = <XRef>:okToPrint()
<boolean>  = <XRef>:okToPrintHighRes()
<boolean>  = <XRef>:okToChange()
<boolean>  = <XRef>:okToCopy()
<boolean>  = <XRef>:okToAddNotes()
<boolean>  = <XRef>:okToFillForm()
<boolean>  = <XRef>:okToAccessibility()
<boolean>  = <XRef>:okToAssemble()
<Object>   = <XRef>:getCatalog()
<Object>   = <XRef>:fetch(<integer> object number, <integer> object generation)
<Object>   = <XRef>:getDocInfo()
<Object>   = <XRef>:getDocInfoNF()
<integer>  = <XRef>:getNumObjects()
<integer>  = <XRef>:getRootNum()
<integer>  = <XRef>:getRootGen()
<integer>  = <XRef>:getSize()
<Object>   = <XRef>:getTrailerDict()
\stopfunctioncall

%***********************************************************************

\section{The \luatex{font} library}

The font library provides the interface into the internals of the font
system, and it also contains helper functions to load traditional
\TEX\ font metrics formats. Other font loading functionality is
provided by the \luatex{fontloader} library that will be discussed in
the next section.

\subsection{Loading a \TFM\ file}

The behavior documented in this subsection is considered stable in the
sense that there will not be backward-incompatible changes any more.

\startfunctioncall
<table> fnt = font.read_tfm(<string> name, <number> s)
\stopfunctioncall

The number is a bit special:

\startitemize
\item if it is positive, it specifies an \quote{at size} in scaled points.
\item if it is negative, its absolute value represents a \quote{scaled}
setting relative to the designsize of the font.
\stopitemize

The internal structure of the metrics font table that is returned is
explained in \in{chapter}[fonts].

\subsection{Loading a \VF\ file}

The behavior documented in this subsection is considered stable in the
sense that there will not be backward-incompatible changes any more.

\startfunctioncall
<table> vf_fnt = font.read_vf(<string> name, <number> s)
\stopfunctioncall

The meaning of the number \type{s} and the format of the returned
table are similar to the ones in the \luatex{read_tfm()} function.

\subsection{The fonts array}

The whole table of \TEX\ fonts is accessible from \LUA\ using a virtual array.

\starttyping
font.fonts[n] = { ... }
<table> f = font.fonts[n]
\stoptyping

See \in{chapter}[fonts] for the structure of the tables. Because this
is a virtual array, you cannot call \type{pairs} on it, but see below
for the \type{font.each} iterator.

The two metatable functions implementing the virtual array are:

\startfunctioncall
<table> f = font.getfont(<number> n)
font.setfont(<number> n, <table> f)
\stopfunctioncall

Note that at the moment, each access to \type{font.fonts} and each call
to \type{font.getfont} creates a \LUA\ table for the whole font. This
process can be quite slow. In a later version of \LUATEX, this
interface will change (it will start using userdata objects instead of
actual tables).

Also note the following: assignments can only be made to fonts that
have already been defined in \TEX, but have not been accessed {\it at
all\/} since that definition. This limits the usability of the write
access to \type{font.fonts} quite a lot; a less stringent ruleset will
likely be implemented later.

\subsection{Checking a font's status}

You can test for the status of a font by calling this function:

\startfunctioncall
<boolean> f = font.frozen(<number> n)
\stopfunctioncall

The return value is one of \type{true} (unassignable), \type{false} (can be changed)
or \type{nil} (not a valid font at all).
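
Because assignments are only allowed for fonts that are not frozen, it can
be useful to test the status first. The helper below is a sketch; its name
\type{try_setfont} and its error strings are made up for this example.

\starttyping
local function try_setfont(n, f)
  local frozen = font.frozen(n)
  if frozen == nil then
    return false, "not a valid font"
  elseif frozen then
    return false, "font can no longer be assigned to"
  end
  font.setfont(n, f)
  return true
end
\stoptyping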

\subsection{Defining a font directly}

You can define your own font into \luatex{font.fonts} by calling this function:

\startfunctioncall
<number> i = font.define(<table> f)
\stopfunctioncall

The return value is the internal id number of the defined font (the
index into \luatex{font.fonts}). If the font creation fails, an error is
raised. The table is a font structure, as explained in
\in{chapter}[fonts].

\subsection{Projected next font id}

\startfunctioncall
<number> i = font.nextid()
\stopfunctioncall

This returns the font id number that would be returned by a
\type{font.define} call if it were executed at this spot in the code
flow. This is useful for virtual fonts that need to reference
themselves.
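
The relation between \type{font.nextid} and \type{font.define} can be
illustrated with a short sketch. It assumes that a metrics file for
\type{cmr10} can be found; any other \TFM\ font would do as well.

\starttyping
local f = font.read_tfm("cmr10", 10 * 65536)  -- cmr10 at 10pt, in scaled points
local predicted = font.nextid()               -- id that the next definition gets
local id = font.define(f)
texio.write_nl("log", string.format(
  "defined font %d (predicted %d)", id, predicted))
\stoptyping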

\subsection{Font id (0.47)}

\startfunctioncall
<number> i = font.id(<string> csname)
\stopfunctioncall

This returns the font id associated with the \type{csname} string, or $-1$
if \type{csname} is not defined; new in 0.47.

\subsection{Currently active font}

\startfunctioncall
<number> i = font.current()
font.current(<number> i)
\stopfunctioncall

This gets or sets the currently used font number.

\subsection{Maximum font id}

\startfunctioncall
<number> i = font.max()
\stopfunctioncall

This is the largest used index in \type{font.fonts}.

\subsection{Iterating over all fonts}

\startfunctioncall
for i,v in font.each() do
  ...
end
\stopfunctioncall

This is an iterator over each of the defined \TEX\ fonts. The first
returned value is the index in \type{font.fonts}, the second the font
itself, as a \LUA\ table. The indices are listed incrementally, but they
do not always form an array of consecutive numbers: in some cases
there can be holes in the sequence.
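
For example, the following sketch writes the index and the \type{name}
field of every defined font to the log. Keep in mind that a complete \LUA\
table is built for each font, so this loop is not cheap.

\starttyping
for i, f in font.each() do
  texio.write_nl("log", string.format("font %d: %s", i, tostring(f.name)))
end
\stoptyping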

\section{The \luatex{fontloader} library (0.36)}

\subsection{Getting quick information on a font}

\startfunctioncall
<table> info = fontloader.info(<string> filename)
\stopfunctioncall

This function returns either \type{nil}, or a \type{table}, or an
array of small tables (in the case of a TrueType collection). The
returned table(s) will contain six fairly interesting information
items from the font(s) defined by the file:

\starttabulate[|lT|l|p|]
\NC \ssbf key                      \NC \bf type \NC \bf explanation \NC\NR
\NC fontname                     \NC string   \NC the \POSTSCRIPT\ name of the font\NC\NR
\NC fullname                     \NC string   \NC the formal name of the font\NC\NR
\NC familyname                   \NC string   \NC the family name this font belongs to\NC\NR
\NC weight                       \NC string   \NC a string indicating the color value of the font\NC\NR
\NC version                      \NC string   \NC the internal font version\NC\NR
\NC italicangle                  \NC float    \NC the slant angle\NC\NR
\stoptabulate

Getting information through this function is (sometimes much) more
efficient than loading the font properly, and is therefore handy when
you want to create a dictionary of available fonts based on the
contents of a directory.
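
Since the return value has three possible shapes, a small wrapper that
always returns an array can simplify the calling code. The sketch below
assumes that a single information table can be recognized by the presence
of its \type{fontname} key; the function name \type{font_infos} and the
file name are hypothetical.

\starttyping
local function font_infos(filename)
  local info = fontloader.info(filename)
  if info == nil then
    return { }          -- file could not be read
  elseif info.fontname then
    return { info }     -- a single font
  else
    return info         -- a collection: already an array of tables
  end
end

for _, entry in ipairs(font_infos("myfonts.ttc")) do
  texio.write_nl("log", tostring(entry.fontname))
end
\stoptyping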

\subsection{Loading an \OPENTYPE\ or \TRUETYPE\ file}

If you want to use an \OPENTYPE\ font, you have to get the metric
information from somewhere. Using the \type{fontloader} library, the
simplest way to get that information is thus:

\starttyping
function load_font (filename)
  local metrics = nil
  local font = fontloader.open(filename)
  if font then
     metrics = fontloader.to_table(font)
     fontloader.close(font)
  end
  return metrics
end

myfont = load_font('/opt/tex/texmf/fonts/data/arial.ttf')
\stoptyping

The main function call is

\startfunctioncall
<userdata> f, <table> w = fontloader.open(<string> filename)
<userdata> f, <table> w = fontloader.open(<string> filename, <string> fontname)
\stopfunctioncall

The first return value is a userdata representation of the font. The
second return value is a table containing any warnings and errors
reported by fontloader while opening the font. In normal typesetting,
you would probably ignore the second return value, but it can be useful
for debugging purposes.

For \TRUETYPE\ collections (when filename ends in 'ttc') and \DFONT\
collections, you have to use a second string argument to specify which
font you want from the collection.  Use the \type{fontname}
strings that are returned by \type{fontloader.info} for that.
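
A sketch of opening one font from a collection while keeping an eye on the
reported messages is given below. The helper name
\type{open_from_collection} is made up, and the messages are passed
through \type{tostring} because their exact form is not assumed here.

\starttyping
local function open_from_collection(filename, fontname)
  local f, warnings = fontloader.open(filename, fontname)
  if warnings then
    for _, msg in ipairs(warnings) do
      texio.write_nl("log", "fontloader: " .. tostring(msg))
    end
  end
  return f  -- nil when the font could not be opened
end
\stoptyping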

To turn the font into a table, \type{fontloader.to_table} is used on
the font returned by \type{fontloader.open}.

\startfunctioncall
<table> f = fontloader.to_table(<userdata> font)
\stopfunctioncall

This table cannot be used directly by \LUATEX\ and should be turned
into another one as described in~\in{chapter}[fonts].
Do not forget to store the \type{fontname} value in the \type{psname}
field of the metrics table to be returned to \LUATEX, otherwise the
font inclusion backend will not be able to find the correct font in
the collection.

See \in{section}[fontloadertables] for details on the userdata object
returned by \type{fontloader.open()} and the layout of the
\type{metrics} table returned by \type{fontloader.to_table()}.

The font file is parsed and partially interpreted by the font
loading routines from \FONTFORGE. The file format can be \OPENTYPE,
\TRUETYPE, \TRUETYPE\ Collection, \CFF, or \TYPEONE.

There are a few advantages to this approach compared to reading the
actual font file ourselves:

\startitemize

\item The font is automatically re|-|encoded, so that the \type{metrics}
   table for \TRUETYPE\ and \OPENTYPE\ fonts is using \UNICODE\ for
   the character indices.

\item Many features are pre|-|processed into a format that is easier to handle
   than just the bare tables would be.

\item \POSTSCRIPT|-|based \OPENTYPE\ fonts do not store the character height and
  depth in the font file, so the character boundingbox has to be
  calculated in some way.

\item In the future, it may be interesting to allow \LUA\ scripts access to
  the font program itself, perhaps even creating or changing the font.

\stopitemize

A loaded font is discarded with:

\startfunctioncall
fontloader.close(<userdata> font)
\stopfunctioncall

\subsection{Applying a \quote{feature file}}

You can apply a \quote{feature file} to a loaded font:

\startfunctioncall
<table> errors = fontloader.apply_featurefile(<userdata> font, <string> filename)
\stopfunctioncall

A \quote{feature file} is a textual representation of the features in an
\OPENTYPE\ font.  See\crlf
\hyphenatedurl {http://www.adobe.com/devnet/opentype/afdko/topic_feature_file_syntax.html}\crlf
and\crlf
\hyphenatedurl {http://fontforge.sourceforge.net/featurefile.html}\crlf
for a more detailed description of feature files.

If the function fails, the return value is a table containing any
errors reported by fontloader while applying the feature file. On
success, \type{nil} is returned. (The return value is new in 0.65.)
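
The return value can be checked along the following lines. This is
only a sketch: it assumes that the errors table is a linear array of
message strings, and the helper name \type{format_errors} is
hypothetical.

\starttyping
-- Turn the return value of fontloader.apply_featurefile() into one
-- message string. Returns nil when the call succeeded (errors == nil).
-- Assumption: 'errors' is a linear array of strings.
local function format_errors (errors, filename)
  if not errors then
    return nil
  end
  local lines = { }
  for i, message in ipairs(errors) do
    lines[i] = filename .. ': ' .. tostring(message)
  end
  return table.concat(lines, '\n')
end
\stoptyping

A caller would pass the result of
\type{fontloader.apply_featurefile(font, filename)} as the first
argument and report the string only when it is not \type{nil}.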


\subsection{Applying an \quote{\AFM\ file}}

You can apply an \quote{\AFM\ file} to a loaded font:

\startfunctioncall
<table> errors = fontloader.apply_afmfile(<userdata> font, <string> filename)
\stopfunctioncall

An \AFM\ file is a textual representation of (some of) the meta information
in a \TYPEONE\ font. See \hyphenatedurl{ftp://ftp.math.utah.edu/u/ma/hohn/linux/postscript/5004.AFM_Spec.pdf}
for more information about \AFM\ files.

Note: If you \type{fontloader.open()} a \TYPEONE\ file named \type{font.pfb},
the library will automatically search for and apply \type{font.afm}
if it exists in the same directory as the file \type{font.pfb}. In that case,
there is no need for an explicit call to \type{apply_afmfile()}.

If the function fails, the return value is a table containing any
errors reported by fontloader while applying the \AFM\ file. On
success, \type{nil} is returned. (The return value is new in 0.65.)

\subsection[fontloadertables]{Fontloader font tables}

As mentioned earlier, the return value of \type{fontloader.open()} is
a userdata object. In \LUATEX\ versions before 0.63, the only way to
have access to the actual metrics was to call
\type{fontloader.to_table()} on this object, returning the table
structure that is explained in the following subsections.

However, it turns out that the result from
\type{fontloader.to_table()} sometimes needs very large amounts of memory
(depending on the font's complexity and size) so starting with \LUATEX\ 0.63,
it is possible to access the userdata object directly.

In \LUATEX\ 0.63.0, the following is implemented:

\startitemize
\item all top-level keys that would be returned by \type{to_table()}
  can also be accessed directly.
\item the top-level key \quote{glyphs} returns a {\it virtual\/} array that
  allows indices from \type{0} to ($\type{f.glyphmax}-1$).
\item the items in that virtual array (the actual glyphs) are themselves also
  userdata objects, and each has accessors for all of the keys
  explained in the section \quote{Glyph items} below.
\item the top-level key \quote{subfonts} returns an {\it actual\/} array of
  userdata objects, one for each of the subfonts (or \type{nil}, if there are no subfonts).
\stopitemize


A short example may be helpful. This code generates a printout of all
the glyph names in the font \type{PunkNova.kern.otf}:

\starttyping
local f = fontloader.open('PunkNova.kern.otf')
print (f.fontname)
local i = 0
while (i < f.glyphmax) do
   local g = f.glyphs[i]
   if g then
      print(g.name)
   end
   i = i + 1
end
fontloader.close(f)
\stoptyping

In this case, the \LUATEX\ memory requirement stays below 100MB on the
test computer, while the internal structure generated by
\type{to_table()} needs more than 2GB of memory (the font itself is
6.9MB in disk size).

In \LUATEX\ 0.63 only the top-level font, the subfont table entries,
and the glyphs are virtual objects; everything else still produces
normal \LUA\ values and tables. In future versions, more return values
may be replaced by userdata objects (as much as needed to keep the
memory requirements in check).

If you want to know the valid fields in a font or glyph
structure, call the \type{fields} function on an object of a
particular type (either glyph or font for now, more will be
implemented later):

\startfunctioncall
<table> fields = fontloader.fields(<userdata> font)
<table> fields = fontloader.fields(<userdata> font_glyph)
\stopfunctioncall

For instance:

\startfunctioncall
local fields = fontloader.fields(f)
local fields = fontloader.fields(f.glyphs[0])
\stopfunctioncall
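
Since many keys are only present when they carry a value, it can be
handy to filter the list of possible fields down to the ones that are
actually set. The sketch below assumes that \type{fontloader.fields}
returns a linear array of key names. The helper name
\type{present_fields} is hypothetical.

\starttyping
-- object : a font or glyph userdata (a plain table works as well)
-- fields : the list of key names, e.g. from fontloader.fields(object)
-- Returns the sorted names of all keys that currently hold a value.
local function present_fields (object, fields)
  local present = { }
  for _, name in ipairs(fields) do
    if object[name] ~= nil then
      present[#present + 1] = name
    end
  end
  table.sort(present)
  return present
end
\stoptyping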


\subsubsection{Table types}

\subsubsubsection{Top-level}

The top|-|level keys in the returned table are (the explanations in
this part of the documentation are not yet finished):

\starttabulate[|lT|l|p|]
\NC \ssbf key                      \NC \bf type \NC \bf explanation \NC\NR
\NC table_version                \NC number   \NC indicates the metrics version (currently~0.3)\NC\NR
\NC fontname                     \NC string   \NC \POSTSCRIPT\ font name\NC\NR
\NC fullname                     \NC string   \NC official (human-oriented) font name\NC\NR
\NC familyname                   \NC string   \NC family name\NC\NR
\NC weight                       \NC string   \NC weight indicator\NC\NR
\NC copyright                    \NC string   \NC copyright information\NC\NR
\NC filename                     \NC string   \NC the file name\NC\NR
\NC version                      \NC string   \NC font version\NC\NR
\NC italicangle                  \NC float    \NC slant angle\NC\NR
\NC units_per_em                 \NC number   \NC 1000 for \POSTSCRIPT-based fonts, usually 2048 for \TRUETYPE\NC\NR
\NC ascent                       \NC number   \NC height of ascender in \type{units_per_em}\NC\NR
\NC descent                      \NC number   \NC depth of descender in \type{units_per_em}\NC\NR
\NC upos                         \NC float    \NC \NC\NR
\NC uwidth                       \NC float    \NC \NC\NR
\NC uniqueid                     \NC number   \NC \NC\NR
\NC glyphcnt                     \NC number   \NC number of included glyphs\NC\NR
\NC glyphs                       \NC array    \NC \NC\NR
\NC glyphmax                     \NC number   \NC maximum used index in the glyphs array\NC\NR
\NC hasvmetrics                  \NC number   \NC \NC\NR
\NC onlybitmaps                  \NC number   \NC \NC\NR
\NC serifcheck                   \NC number   \NC \NC\NR
\NC isserif                      \NC number   \NC \NC\NR
\NC issans                       \NC number   \NC \NC\NR
\NC encodingchanged              \NC number   \NC \NC\NR
\NC strokedfont                  \NC number   \NC \NC\NR
\NC use_typo_metrics             \NC number   \NC \NC\NR
\NC weight_width_slope_only      \NC number   \NC \NC\NR
\NC head_optimized_for_cleartype \NC number   \NC \NC\NR
\NC uni_interp                   \NC enum     \NC \type {unset}, \type {none}, \type {adobe},
                                                    \type {greek}, \type {japanese}, \type {trad_chinese},
                                                    \type {simp_chinese}, \type {korean}, \type {ams}\NC\NR
\NC origname                     \NC string   \NC the file name, as supplied by the user\NC\NR
\NC map                          \NC table    \NC \NC\NR
\NC private                      \NC table    \NC \NC\NR
\NC xuid                         \NC string   \NC \NC\NR
\NC pfminfo                      \NC table    \NC \NC\NR
\NC names                        \NC table    \NC \NC\NR
\NC cidinfo                      \NC table    \NC \NC\NR
\NC subfonts                     \NC array    \NC \NC\NR
\NC comments                     \NC string   \NC \NC\NR
\NC fontlog                      \NC string   \NC \NC\NR
\NC cvt_names                    \NC string   \NC \NC\NR
\NC anchor_classes               \NC table    \NC \NC\NR
\NC ttf_tables                   \NC table    \NC \NC\NR
\NC ttf_tab_saved                \NC table    \NC \NC\NR
\NC kerns                        \NC table    \NC \NC\NR
\NC vkerns                       \NC table    \NC \NC\NR
\NC texdata                      \NC table    \NC \NC\NR
\NC lookups                      \NC table    \NC \NC\NR
\NC gpos                         \NC table    \NC \NC\NR
\NC gsub                         \NC table    \NC \NC\NR
\NC sm                           \NC table    \NC \NC\NR
\NC features                     \NC table    \NC \NC\NR
\NC mm                           \NC table    \NC \NC\NR
\NC chosenname                   \NC string   \NC \NC\NR
\NC macstyle                     \NC number   \NC \NC\NR
\NC fondname                     \NC string   \NC \NC\NR
\NC design_size                  \NC number   \NC \NC\NR
\NC fontstyle_id                 \NC number   \NC \NC\NR
\NC fontstyle_name               \NC table    \NC \NC\NR
\NC design_range_bottom          \NC number   \NC \NC\NR
\NC design_range_top             \NC number   \NC \NC\NR
\NC strokewidth                  \NC float    \NC \NC\NR
\NC mark_classes                 \NC table    \NC \NC\NR
\NC creationtime                 \NC number   \NC \NC\NR
\NC modificationtime             \NC number   \NC \NC\NR
\NC os2_version                  \NC number   \NC \NC\NR
\NC sfd_version                  \NC number   \NC \NC\NR
\NC math                         \NC table    \NC \NC\NR
\NC validation_state             \NC table    \NC \NC\NR
\NC horiz_base                   \NC table    \NC \NC\NR
\NC vert_base                    \NC table    \NC \NC\NR
\NC extrema_bound                \NC number   \NC \NC\NR
\stoptabulate
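
Values such as \type{ascent} and \type{descent} are expressed in font
units, so they have to be multiplied by the requested size and divided
by \type{units_per_em} before they can be used as dimensions. A
minimal sketch of that arithmetic follows; the function name
\type{scale_units} is an assumption and not part of the library.

\starttyping
-- Convert a value in font units to the unit in which 'size' is given.
-- value        : e.g. the 'ascent' of the font, in font units
-- units_per_em : the 'units_per_em' of the same font
-- size         : the requested font size (any unit)
local function scale_units (value, units_per_em, size)
  return value * size / units_per_em
end
\stoptyping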

\subsubsubsection{Glyph items}

The \type{glyphs} table is an array containing the per|-|character
information (quite a few of these keys are only present if nonzero).

\starttabulate[|lT|l|p|]
\NC \ssbf key                      \NC \bf type \NC \bf explanation \NC\NR
\NC name                         \NC string   \NC the glyph name\NC\NR
\NC unicode                      \NC number   \NC unicode code point, or -1\NC\NR
\NC boundingbox                  \NC array    \NC array of four numbers, see note below\NC\NR
\NC width                        \NC number   \NC only for horizontal fonts\NC\NR
\NC vwidth                       \NC number   \NC only for vertical fonts\NC\NR
\NC lsidebearing                 \NC number   \NC only if nonzero and not equal to boundingbox[1]\NC\NR
\NC class                        \NC string   \NC one of "none", "base", "ligature", "mark", "component"
                                                  (if not present, the glyph class is \quote{automatic})\NC\NR
\NC kerns                        \NC array    \NC only for horizontal fonts, if set\NC\NR
\NC vkerns                       \NC array    \NC only for vertical fonts, if set\NC\NR
\NC dependents                   \NC array    \NC linear array of glyph name strings, only if nonempty\NC\NR
\NC lookups                      \NC table    \NC only if nonempty\NC\NR
\NC ligatures                    \NC table    \NC only if nonempty\NC\NR
\NC anchors                      \NC table    \NC only if set\NC\NR
\NC comment                      \NC string   \NC only if set\NC\NR
\NC tex_height                   \NC number   \NC only if set\NC\NR
\NC tex_depth                    \NC number   \NC only if set\NC\NR
\NC italic_correction            \NC number   \NC only if set\NC\NR
\NC top_accent                   \NC number   \NC only if set\NC\NR
\NC is_extended_shape            \NC number   \NC only if this character is part of a math extension list\NC\NR
\NC altuni                       \NC table    \NC alternate \UNICODE\ items \NC\NR
\NC vert_variants                \NC table    \NC \NC \NR
\NC horiz_variants               \NC table    \NC \NC \NR
\NC mathkern                     \NC table    \NC \NC \NR
\stoptabulate

On \type{boundingbox}: The boundingbox information for \TRUETYPE\ fonts and \TRUETYPE-based \OTF\ fonts is read
directly from the font file. \POSTSCRIPT-based fonts do not have this information, so the boundingbox of
traditional \POSTSCRIPT\ fonts is generated by interpreting the actual bezier curves to find the exact
boundingbox. This can be a slow process, so starting from \LUATEX\ 0.45, the boundingboxes of \POSTSCRIPT-based
\OTF\ fonts (and raw \CFF\ fonts) are calculated using an approximation of the glyph shape based on the actual
glyph points only, instead of taking the whole curve into account. This means that glyphs that have missing
points at extrema will have a too-tight boundingbox, but the processing is so much faster that in our opinion
the tradeoff is worth it.
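
A height and depth can be derived from the boundingbox as sketched
below. The code assumes that the four numbers are stored in the order
minimum x, minimum y, maximum x, maximum y; the helper name
\type{box_dimensions} is hypothetical.

\starttyping
-- Derive height and depth (both non-negative, in font units) from a
-- glyph's boundingbox.
-- Assumption: boundingbox = { xmin, ymin, xmax, ymax }.
local function box_dimensions (glyph)
  local bb = glyph.boundingbox
  if not bb then
    return 0, 0
  end
  local height = math.max(bb[4], 0)    -- part above the baseline
  local depth  = math.max(-bb[2], 0)   -- part below the baseline
  return height, depth
end
\stoptyping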


The \type{kerns} and \type{vkerns} are linear arrays of small hashes:

\starttabulate[|lT|l|p|]
\NC \ssbf key                      \NC \bf type \NC \bf explanation \NC\NR
\NC char                         \NC string   \NC \NC\NR
\NC off                          \NC number   \NC \NC\NR
\NC lookup                       \NC string   \NC \NC\NR
\stoptabulate
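
For quick lookups it can help to index such an array by glyph name.
The following is a sketch only, assuming that \type{char} holds the
name of the second glyph of the pair and \type{off} the kern amount.
The helper name \type{kerns_by_name} is hypothetical.

\starttyping
-- Build a hash from the 'kerns' array of a glyph: second glyph name
-- -> kern offset. If a name occurs more than once (for instance in
-- several lookups), the last entry wins.
local function kerns_by_name (glyph)
  local result = { }
  for _, kern in ipairs(glyph.kerns or { }) do
    result[kern.char] = kern.off
  end
  return result
end
\stoptyping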

The \type{lookups} table is a hash keyed by lookup subtable name; the
value of each key is a linear array of small hashes:

% TODO: fix this description
\starttabulate[|lT|l|p|]
\NC \ssbf key                      \NC \bf type \NC \bf explanation \NC\NR
\NC type                         \NC enum     \NC \type {position}, \type {pair},  \type {substitution}, \type {alternate},
                                                  \type {multiple}, \type {ligature}, \type {lcaret},  \type {kerning}, \type {vkerning}, \type {anchors},
                                                  \type {contextpos}, \type {contextsub}, \type {chainpos}, \type {chainsub},
                                                  \type {reversesub}, \type {max}, \type {kernback}, \type {vkernback} \NC\NR
\NC specification                 \NC table   \NC extra data \NC\NR
\stoptabulate

For the first seven values of \type{type}, there can be additional
sub|-|information, stored in the sub-table \type{specification}:

\starttabulate[|lT|l|p|]
\NC \ssbf value    \NC \bf type \NC \bf explanation \NC\NR
\NC position     \NC table    \NC a table of the \type {offset_specs} type\NC\NR
\NC pair         \NC table    \NC one string: \type {paired}, and an array of one or
                                  two \type {offset_specs} tables:  \type{offsets}\NC\NR
\NC substitution \NC table    \NC one string: \type {variant}\NC\NR
\NC alternate    \NC table    \NC one string: \type {components}\NC\NR
\NC multiple     \NC table    \NC one string: \type {components}\NC\NR
\NC ligature     \NC table    \NC two strings: \type {components}, \type {char}\NC\NR
\NC lcaret       \NC array    \NC linear array of numbers\NC\NR
\stoptabulate

Tables for \type{offset_specs} contain up to four number|-|valued
fields: \type{x} (a horizontal offset), \type{y} (a vertical offset),
\type{h} (an advance width correction) and \type{v} (an advance height
correction).
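
The nesting of \type{lookups}, \type{type} and \type{specification}
described above can be walked as in the following sketch, which
collects the single substitutions of one glyph. The helper name
\type{single_substitutions} is hypothetical.

\starttyping
-- Collect the single substitutions of a glyph as a hash:
-- lookup subtable name -> name of the replacement glyph.
-- If one subtable holds several substitutions, the last one wins.
local function single_substitutions (glyph)
  local found = { }
  for subtable, items in pairs(glyph.lookups or { }) do
    for _, item in ipairs(items) do
      if item.type == 'substitution' and item.specification then
        found[subtable] = item.specification.variant
      end
    end
  end
  return found
end
\stoptyping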

The \type{ligatures} is a linear array of small hashes:

\starttabulate[|lT|l|p|]
\NC \ssbf key            \NC \bf type \NC \bf explanation \NC\NR
\NC lig                \NC table    \NC uses the same substructure as a single item in the \type{lookups} table explained above\NC\NR
\NC char               \NC string   \NC \NC\NR
\NC components         \NC array    \NC linear array of named components\NC\NR
\NC ccnt               \NC number   \NC \NC\NR
\stoptabulate

The \type{anchors} table is indexed by a string signifying the
anchor type, which is one of

\starttabulate[|lT|l|p|]
\NC \ssbf key            \NC \bf type \NC \bf explanation \NC\NR
\NC mark              \NC table   \NC placement mark\NC\NR
\NC basechar          \NC table   \NC mark for attaching combining items to a base char\NC\NR
\NC baselig           \NC table   \NC mark for attaching combining items to a ligature\NC\NR
\NC basemark          \NC table   \NC generic mark for attaching combining items to connect to\NC\NR
\NC centry            \NC table   \NC cursive entry point\NC\NR
\NC cexit             \NC table   \NC cursive exit point\NC\NR
\stoptabulate

The content of these is a short array of defined anchors, with the
entry keys being the anchor names. For all except \type{baselig}, the
value is a single table with this definition:

\starttabulate[|lT|l|p|]
\NC \ssbf key            \NC \bf type \NC \bf explanation \NC\NR
\NC x                  \NC number   \NC x location\NC\NR
\NC y                  \NC number   \NC y location\NC\NR
\NC ttf_pt_index       \NC number   \NC truetype point index, only if given\NC\NR
\stoptabulate

For \type{baselig}, the value is a small array of such anchor sets,
one for each constituent item of the ligature.

For clarification, an anchor table could for example look like this:

\starttyping
['anchor'] = {
    ['basemark'] = {
        ['Anchor-7'] = { ['x']=170, ['y']=1080 }
    },
    ['mark'] ={
        ['Anchor-1'] = { ['x']=160, ['y']=810 },
        ['Anchor-4'] = { ['x']=160, ['y']=800 }
    },
    ['baselig'] = {
        [1] = { ['Anchor-2'] = { ['x']=160, ['y']=650 } },
        [2] = { ['Anchor-2'] = { ['x']=460, ['y']=640 } }
    }
}
\stoptyping
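
A table of this shape can be flattened as in the sketch below, which
treats \type{baselig} separately because of its extra array level. The
helper name \type{list_anchors} is hypothetical, and the coordinates
are assumed to be integers.

\starttyping
-- Flatten an anchor table into a sorted list of description strings.
-- 'baselig' has one anchor set per ligature component, all other
-- anchor types map anchor names directly to { x = ..., y = ... }.
local function list_anchors (anchors)
  local out = { }
  for kind, set in pairs(anchors) do
    if kind == 'baselig' then
      for component, names in ipairs(set) do
        for name, a in pairs(names) do
          out[#out + 1] = string.format('%s[%d] %s %d %d',
            kind, component, name, a.x, a.y)
        end
      end
    else
      for name, a in pairs(set) do
        out[#out + 1] = string.format('%s %s %d %d', kind, name, a.x, a.y)
      end
    end
  end
  table.sort(out)
  return out
end
\stoptyping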

\subsubsubsection{map table}

The top|-|level map is a list of encoding mappings. Each of those is a table itself.

\starttabulate[|lT|l|p|]
\NC \ssbf key            \NC \bf type \NC \bf explanation \NC\NR
\NC enccount           \NC number   \NC \NC\NR
\NC encmax             \NC number   \NC \NC\NR
\NC backmax            \NC number   \NC \NC\NR
\NC remap              \NC table    \NC \NC\NR
\NC map                \NC array    \NC non|-|linear array of mappings\NC\NR
\NC backmap            \NC array    \NC non|-|linear array of backward mappings\NC\NR
\NC enc                \NC table    \NC \NC\NR
\stoptabulate

The \type{remap} table is very small:

\starttabulate[|lT|l|p|]
\NC \ssbf key            \NC \bf type \NC \bf explanation \NC\NR
\NC firstenc           \NC number   \NC \NC\NR
\NC lastenc            \NC number   \NC \NC\NR
\NC infont             \NC number   \NC \NC\NR
\stoptabulate

The \type{enc} table is a bit more verbose:

\starttabulate[|lT|l|p|]
\NC \ssbf key            \NC \bf type \NC \bf explanation \NC\NR
\NC enc_name           \NC string   \NC \NC\NR
\NC char_cnt           \NC number   \NC \NC\NR
\NC char_max           \NC number   \NC \NC\NR
\NC unicode            \NC array    \NC of \UNICODE\ position numbers\NC\NR
\NC psnames            \NC array    \NC of \POSTSCRIPT\ glyph names\NC\NR
\NC builtin            \NC number   \NC \NC\NR
\NC hidden             \NC number   \NC \NC\NR
\NC only_1byte         \NC number   \NC \NC\NR
\NC has_1byte          \NC number   \NC \NC\NR
\NC has_2byte          \NC number   \NC \NC\NR
\NC is_unicodebmp      \NC number   \NC only if nonzero\NC\NR
\NC is_unicodefull     \NC number   \NC only if nonzero\NC\NR
\NC is_custom          \NC number   \NC only if nonzero\NC\NR
\NC is_original        \NC number   \NC only if nonzero\NC\NR
\NC is_compact         \NC number   \NC only if nonzero\NC\NR
\NC is_japanese        \NC number   \NC only if nonzero\NC\NR
\NC is_korean          \NC number   \NC only if nonzero\NC\NR
\NC is_tradchinese     \NC number   \NC only if nonzero [name?]\NC\NR
\NC is_simplechinese   \NC number   \NC only if nonzero\NC\NR
\NC low_page           \NC number   \NC \NC\NR
\NC high_page          \NC number   \NC \NC\NR
\NC iconv_name         \NC string   \NC \NC\NR
\NC iso_2022_escape    \NC string   \NC \NC\NR
\stoptabulate

\subsubsubsection{private table}

This is the font's private \POSTSCRIPT\ dictionary, if any. Keys and
values are both strings.

\subsubsubsection{cidinfo table}

\starttabulate[|lT|l|p|]
\NC \ssbf key            \NC \bf type \NC \bf explanation \NC\NR
\NC registry                  \NC string   \NC \NC\NR
\NC ordering                  \NC string   \NC \NC\NR
\NC supplement                \NC number   \NC \NC\NR
\NC version                   \NC number   \NC \NC\NR
\stoptabulate

\subsubsubsection{pfminfo table}

The \type{pfminfo} table contains most of the OS/2 information:

\starttabulate[|lT|l|p|]
\NC \ssbf key            \NC \bf type \NC \bf explanation \NC\NR
\NC pfmset             \NC number  \NC \NC\NR
\NC winascent_add      \NC number  \NC \NC\NR
\NC windescent_add     \NC number  \NC \NC\NR
\NC hheadascent_add    \NC number  \NC \NC\NR
\NC hheaddescent_add   \NC number  \NC \NC\NR
\NC typoascent_add     \NC number  \NC \NC\NR
\NC typodescent_add    \NC number  \NC \NC\NR
\NC subsuper_set       \NC number  \NC \NC\NR
\NC panose_set         \NC number  \NC \NC\NR
\NC hheadset           \NC number  \NC \NC\NR
\NC vheadset           \NC number  \NC \NC\NR
\NC pfmfamily          \NC number  \NC \NC\NR
\NC weight             \NC number  \NC \NC\NR
\NC width              \NC number  \NC \NC\NR
\NC avgwidth           \NC number  \NC \NC\NR
\NC firstchar          \NC number  \NC \NC\NR
\NC lastchar           \NC number  \NC \NC\NR
\NC fstype             \NC number  \NC \NC\NR
\NC linegap            \NC number  \NC \NC\NR
\NC vlinegap           \NC number  \NC \NC\NR
\NC hhead_ascent       \NC number  \NC \NC\NR
\NC hhead_descent      \NC number  \NC \NC\NR
\NC os2_typoascent     \NC number  \NC \NC\NR
\NC os2_typodescent    \NC number  \NC \NC\NR
\NC os2_typolinegap    \NC number  \NC \NC\NR
\NC os2_winascent      \NC number  \NC \NC\NR
\NC os2_windescent     \NC number  \NC \NC\NR
\NC os2_subxsize       \NC number  \NC \NC\NR
\NC os2_subysize       \NC number  \NC \NC\NR
\NC os2_subxoff        \NC number  \NC \NC\NR
\NC os2_subyoff        \NC number  \NC \NC\NR
\NC os2_supxsize       \NC number  \NC \NC\NR
\NC os2_supysize       \NC number  \NC \NC\NR
\NC os2_supxoff        \NC number  \NC \NC\NR
\NC os2_supyoff        \NC number  \NC \NC\NR
\NC os2_strikeysize    \NC number  \NC \NC\NR
\NC os2_strikeypos     \NC number  \NC \NC\NR
\NC os2_family_class   \NC number  \NC \NC\NR
\NC os2_xheight        \NC number  \NC \NC\NR
\NC os2_capheight      \NC number  \NC \NC\NR
\NC os2_defaultchar    \NC number  \NC \NC\NR
\NC os2_breakchar      \NC number  \NC \NC\NR
\NC os2_vendor         \NC string  \NC \NC\NR
\NC codepages          \NC table  \NC A two-number array of encoded code pages\NC\NR
\NC unicoderanges      \NC table  \NC A four-number array of encoded unicode ranges\NC\NR
\NC panose             \NC table  \NC \NC\NR
\stoptabulate

The \type{panose} subtable has exactly 10 string keys:

\starttabulate[|lT|l|p|]
\NC \ssbf key            \NC \bf type \NC \bf explanation \NC\NR
\NC familytype             \NC string   \NC Values as in the \OPENTYPE\ font specification:
                                        \type {Any}, \type {No Fit}, \type {Text and Display}, \type {Script},
                                        \type {Decorative}, \type {Pictorial} \NC\NR
\NC serifstyle             \NC string   \NC See the \OPENTYPE\ font specification for values\NC\NR
\NC weight                 \NC string   \NC id. \NC\NR
\NC proportion             \NC string   \NC id. \NC\NR
\NC contrast               \NC string   \NC id. \NC\NR
\NC strokevariation        \NC string   \NC id. \NC\NR
\NC armstyle               \NC string   \NC id. \NC\NR
\NC letterform             \NC string   \NC id. \NC\NR
\NC midline                \NC string   \NC id. \NC\NR
\NC xheight                \NC string   \NC id. \NC\NR
\stoptabulate

\subsubsubsection{names table}

Each item has two top|-|level keys:

\starttabulate[|lT|l|p|]
\NC \ssbf key         \NC \bf type \NC \bf explanation \NC\NR
\NC lang                   \NC string   \NC language for this entry \NC\NR
\NC names                  \NC table    \NC \NC\NR
\stoptabulate

The \type{names} keys are the actual \TRUETYPE\ name strings. The
possible keys are:

\starttabulate[|lT|p|]
\NC \ssbf key           \NC \bf explanation \NC\NR
\NC copyright   \NC \NC\NR
\NC family   \NC \NC\NR
\NC subfamily   \NC \NC\NR
\NC uniqueid   \NC \NC\NR
\NC fullname   \NC \NC\NR
\NC version   \NC \NC\NR
\NC postscriptname   \NC \NC\NR
\NC trademark   \NC \NC\NR
\NC manufacturer   \NC \NC\NR
\NC designer   \NC \NC\NR
\NC descriptor   \NC \NC\NR
\NC venderurl   \NC \NC\NR
\NC designerurl   \NC \NC\NR
\NC license   \NC \NC\NR
\NC licenseurl   \NC \NC\NR
\NC idontknow   \NC \NC\NR
\NC preffamilyname   \NC \NC\NR
\NC prefmodifiers   \NC \NC\NR
\NC compatfull   \NC \NC\NR
\NC sampletext   \NC \NC\NR
\NC cidfindfontname   \NC \NC\NR
\NC wwsfamily   \NC \NC\NR
\NC wwssubfamily   \NC \NC\NR
\stoptabulate
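
Looking up one name string for one language could be sketched as
follows. The helper name \type{find_name} is hypothetical, and the
exact spelling of the \type{lang} strings is an assumption that should
be checked against an actual font.

\starttyping
-- names : the top-level 'names' array of a font
-- lang  : the language string to look for (spelling is an assumption)
-- key   : one of the keys listed above, e.g. 'family'
-- Returns the name string, or nil if there is no such entry.
local function find_name (names, lang, key)
  for _, entry in ipairs(names or { }) do
    if entry.lang == lang and entry.names then
      return entry.names[key]
    end
  end
  return nil
end
\stoptyping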

\subsubsubsection{anchor_classes table}

The anchor classes in \type{anchor_classes} have the following keys:

\starttabulate[|lT|l|p|]
\NC \ssbf key            \NC \bf type \NC \bf explanation \NC\NR
\NC name                   \NC string   \NC a descriptive id of this anchor class\NC\NR
\NC lookup                 \NC string   \NC \NC\NR
\NC type                   \NC string   \NC one of \type {mark}, \type {mkmk}, \type {curs}, \type {mklg} \NC\NR
\stoptabulate

% type is actually a lookup subtype, not a feature name. Officially, these strings
% should be gpos_mark2mark etc.

\subsubsubsection{gpos table}

The \type{gpos} table has one array entry for each lookup. (The \type {gpos_} prefix is somewhat redundant.)

\starttabulate[|lT|l|p|]
\NC \ssbf key            \NC \bf type \NC \bf explanation \NC\NR
\NC type                  \NC string   \NC one of
  \type {gpos_single}, \type {gpos_pair}, \type {gpos_cursive},
  \type {gpos_mark2base},\crlf \type {gpos_mark2ligature}, \type {gpos_mark2mark},  \type {gpos_context},\crlf
  \type {gpos_contextchain}
\NC\NR
\NC flags                 \NC table  \NC \NC\NR
\NC name                  \NC string   \NC \NC\NR
\NC features              \NC array   \NC \NC\NR
\NC subtables             \NC array   \NC \NC\NR
\stoptabulate

The flags table has a true value for each of the lookup flags that is
actually set:

\starttabulate[|lT|l|p|]
\NC \ssbf key            \NC \bf type \NC \bf explanation \NC\NR
\NC r2l                    \NC boolean   \NC \NC\NR
\NC ignorebaseglyphs       \NC boolean    \NC \NC\NR
\NC ignoreligatures        \NC boolean    \NC \NC\NR
\NC ignorecombiningmarks   \NC boolean    \NC \NC\NR
\NC mark_class             \NC string    \NC (new in 0.44)\NC\NR
\stoptabulate


The features subtable items of gpos have:

\starttabulate[|lT|l|p|]
\NC \ssbf key            \NC \bf type \NC \bf explanation \NC\NR
\NC tag                    \NC string   \NC \NC\NR
\NC scripts                \NC table    \NC \NC\NR
\NC ismac                  \NC number   \NC (only if true)\NC\NR
\stoptabulate

The scripts table within features has:

\starttabulate[|lT|l|p|]
\NC \ssbf key            \NC \bf type \NC \bf explanation \NC\NR
\NC script                     \NC string          \NC \NC\NR
\NC langs                  \NC array of strings \NC \NC\NR
\stoptabulate


The subtables table has:

\starttabulate[|lT|l|p|]
\NC \ssbf key            \NC \bf type \NC \bf explanation \NC\NR
\NC name                    \NC string   \NC \NC\NR
\NC suffix                  \NC string   \NC (only if used)\NC\NR % used by gpos_single to get a default
\NC anchor_classes          \NC number   \NC (only if used)\NC\NR
\NC vertical_kerning        \NC number   \NC (only if used)\NC\NR
\NC kernclass               \NC table    \NC (only if used)\NC\NR
\stoptabulate


The \type{kernclass} table within a \type{subtables} entry has:

\starttabulate[|lT|l|p|]
\NC \ssbf key            \NC \bf type \NC \bf explanation \NC\NR
\NC firsts                \NC array of strings  \NC \NC\NR
\NC seconds               \NC array of strings   \NC \NC\NR
\NC lookup                \NC string or array \NC associated lookup(s) \NC \NR
\NC offsets               \NC array of numbers  \NC \NC\NR
\stoptabulate

\subsubsubsection{gsub table}

This has identical layout to the \type{gpos} table, except for the
type:

\starttabulate[|lT|l|p|]
\NC \ssbf key            \NC \bf type \NC \bf explanation \NC\NR
\NC type                  \NC string   \NC one of  \type {gsub_single}, \type {gsub_multiple}, \type {gsub_alternate},
  \type {gsub_ligature},\crlf \type {gsub_context},  \type {gsub_contextchain}, \type {gsub_reversecontextchain}
\NC\NR
\stoptabulate
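
Because \type{gpos} and \type{gsub} share one layout, a single helper
can find the lookups that belong to a feature tag in either table.
This is a sketch only; the helper name \type{lookups_for_feature} is
hypothetical.

\starttyping
-- lookups : the top-level 'gpos' or 'gsub' array of a font
-- tag     : a feature tag such as 'kern' or 'liga'
-- Returns the names of all lookups that list this feature tag.
local function lookups_for_feature (lookups, tag)
  local names = { }
  for _, lookup in ipairs(lookups or { }) do
    for _, feature in ipairs(lookup.features or { }) do
      if feature.tag == tag then
        names[#names + 1] = lookup.name
        break               -- count each lookup only once
      end
    end
  end
  return names
end
\stoptyping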


\subsubsubsection{ttf_tables and ttf_tab_saved tables}

\starttabulate[|lT|l|p|]
\NC \ssbf key            \NC \bf type \NC \bf explanation \NC\NR
\NC tag                    \NC string   \NC \NC\NR
\NC len                    \NC number   \NC \NC\NR
\NC maxlen                 \NC number   \NC \NC\NR
\NC data                   \NC number   \NC \NC\NR
\stoptabulate

\subsubsubsection{sm table}

\starttabulate[|lT|l|p|]
\NC \ssbf key            \NC \bf type \NC \bf explanation \NC\NR
\NC type                 \NC string   \NC one of "indic", "context", "lig", "simple", "insert", "kern"\NC\NR
\NC lookup               \NC string   \NC \NC\NR
\NC flags                \NC table    \NC a set of boolean values with
                                          the keys : "vert", "descending", "always"\NC\NR
\NC classes              \NC table    \NC an array of named classes \NC\NR
\NC state                \NC table    \NC \NC\NR
\stoptabulate

The \type{state} table has:

\starttabulate[|lT|l|p|]
\NC \ssbf key            \NC \bf type \NC \bf explanation \NC\NR
\NC next                 \NC number   \NC \NC \NR
\NC flags                \NC number   \NC \NC \NR
\NC context              \NC table    \NC A small table that has 'mark' and
'cur' as possible keys, with the values being lookup names. Only
applies if the \type{sm.type} = \type{context}.\NC\NR
\NC insert               \NC table    \NC A small table that has 'mark' and
'cur' as possible keys, with the values strings. Only
applies if the \type{sm.type} = \type{insert}.\NC\NR
\NC kern               \NC table    \NC A small array with kern data. Only
applies if the \type{sm.type} = \type{kern}.\NC\NR
\stoptabulate


\subsubsubsection{features table}

% handle_macfeat

\starttabulate[|lT|l|p|]
\NC \ssbf key            \NC \bf type \NC \bf explanation \NC\NR
\NC feature                 \NC number   \NC \NC \NR
\NC ismutex                 \NC number   \NC \NC \NR
\NC default_setting         \NC number   \NC \NC \NR
\NC strid                   \NC number   \NC \NC \NR
\NC featname                \NC table    \NC A set of mac names.
macnames are like otfnames except that they also have an 'enc' field \NC \NR
\NC settings                \NC table    \NC \NC \NR
\stoptabulate

The \type{settings} are:

\starttabulate[|lT|l|p|]
\NC \ssbf key            \NC \bf type \NC \bf explanation \NC\NR
\NC setting                 \NC number   \NC \NC \NR
\NC strid                   \NC number   \NC \NC \NR
\NC initially_enabled       \NC number   \NC \NC \NR
\NC setname                 \NC table    \NC A set of mac names.
macnames are like otfnames except that they also have an 'enc' field \NC \NR
\stoptabulate

\subsubsubsection{mm table}

\starttabulate[|lT|l|p|]
\NC \ssbf key            \NC \bf type \NC \bf explanation \NC\NR
\NC axes                 \NC table   \NC array of axis names \NC \NR
\NC instance_count       \NC number   \NC \NC \NR
\NC positions            \NC table   \NC array of instance positions
                                         (\#axes * instances )\NC \NR
\NC defweights           \NC table   \NC array of default weights for instances \NC \NR
\NC cdv                  \NC string  \NC \NC \NR
\NC ndv                  \NC string  \NC \NC \NR
\NC axismaps             \NC table   \NC  \NC \NR
\NC named_instance_count \NC number   \NC \NC \NR
\NC named_instances      \NC table   \NC \NC \NR
\NC apple                \NC number   \NC \NC \NR
\stoptabulate

The \type{axismaps}:

\starttabulate[|lT|l|p|]
\NC \ssbf key            \NC \bf type \NC \bf explanation \NC\NR
\NC blends               \NC table     \NC an array of blend points \NC \NR
\NC designs              \NC table     \NC an array of design values \NC \NR
\NC min                  \NC number   \NC \NC \NR
\NC def                  \NC number   \NC \NC \NR
\NC max                  \NC number   \NC \NC \NR
\NC axisnames            \NC table     \NC a set of mac names \NC \NR
\stoptabulate


The \type{named_instances} is an array of instances:

\starttabulate[|lT|l|p|]
\NC \ssbf key            \NC \bf type \NC \bf explanation \NC\NR
\NC names                \NC table  \NC a set of mac names  \NC \NR
\NC coords               \NC table  \NC an array of coordinates \NC \NR
\stoptabulate


\subsubsubsection{mark_classes table (0.44)}

The keys in this table are mark class names, and the values
are a space-separated string of glyph names in this class.

Note: This table is indeed new in 0.44. The manual said it existed
before then, but in practice it was missing due to a bug.

\subsubsubsection{math table}

\starttabulate[|lT|p|]
\NC ScriptPercentScaleDown \NC \NC \NR
\NC ScriptScriptPercentScaleDown \NC \NC \NR
\NC DelimitedSubFormulaMinHeight \NC \NC \NR
\NC DisplayOperatorMinHeight \NC \NC \NR
\NC MathLeading \NC \NC \NR
\NC AxisHeight \NC \NC \NR
\NC AccentBaseHeight \NC \NC \NR
\NC FlattenedAccentBaseHeight \NC \NC \NR
\NC SubscriptShiftDown \NC \NC \NR
\NC SubscriptTopMax \NC \NC \NR
\NC SubscriptBaselineDropMin \NC \NC \NR
\NC SuperscriptShiftUp \NC \NC \NR
\NC SuperscriptShiftUpCramped \NC \NC \NR
\NC SuperscriptBottomMin \NC \NC \NR
\NC SuperscriptBaselineDropMax \NC \NC \NR
\NC SubSuperscriptGapMin \NC \NC \NR
\NC SuperscriptBottomMaxWithSubscript \NC \NC \NR
\NC SpaceAfterScript \NC \NC \NR
\NC UpperLimitGapMin \NC \NC \NR
\NC UpperLimitBaselineRiseMin \NC \NC \NR
\NC LowerLimitGapMin \NC \NC \NR
\NC LowerLimitBaselineDropMin \NC \NC \NR
\NC StackTopShiftUp \NC \NC \NR
\NC StackTopDisplayStyleShiftUp \NC \NC \NR
\NC StackBottomShiftDown \NC \NC \NR
\NC StackBottomDisplayStyleShiftDown \NC \NC \NR
\NC StackGapMin \NC \NC \NR
\NC StackDisplayStyleGapMin \NC \NC \NR
\NC StretchStackTopShiftUp \NC \NC \NR
\NC StretchStackBottomShiftDown \NC \NC \NR
\NC StretchStackGapAboveMin \NC \NC \NR
\NC StretchStackGapBelowMin \NC \NC \NR
\NC FractionNumeratorShiftUp \NC \NC \NR
\NC FractionNumeratorDisplayStyleShiftUp \NC \NC \NR
\NC FractionDenominatorShiftDown \NC \NC \NR
\NC FractionDenominatorDisplayStyleShiftDown \NC \NC \NR
\NC FractionNumeratorGapMin \NC \NC \NR
\NC FractionNumeratorDisplayStyleGapMin \NC \NC \NR
\NC FractionRuleThickness \NC \NC \NR
\NC FractionDenominatorGapMin \NC \NC \NR
\NC FractionDenominatorDisplayStyleGapMin \NC \NC \NR
\NC SkewedFractionHorizontalGap \NC \NC \NR
\NC SkewedFractionVerticalGap \NC \NC \NR
\NC OverbarVerticalGap \NC \NC \NR
\NC OverbarRuleThickness \NC \NC \NR
\NC OverbarExtraAscender \NC \NC \NR
\NC UnderbarVerticalGap \NC \NC \NR
\NC UnderbarRuleThickness \NC \NC \NR
\NC UnderbarExtraDescender \NC \NC \NR
\NC RadicalVerticalGap \NC \NC \NR
\NC RadicalDisplayStyleVerticalGap \NC \NC \NR
\NC RadicalRuleThickness \NC \NC \NR
\NC RadicalExtraAscender \NC \NC \NR
\NC RadicalKernBeforeDegree \NC \NC \NR
\NC RadicalKernAfterDegree \NC \NC \NR
\NC RadicalDegreeBottomRaisePercent \NC \NC \NR
\NC MinConnectorOverlap \NC \NC \NR
\NC FractionDelimiterSize \NC (new in 0.47.0)\NC \NR
\NC FractionDelimiterDisplayStyleSize \NC (new in 0.47.0)\NC \NR
\stoptabulate

\subsubsubsection{validation_state table}

\starttabulate[|lT|p|]
\NC \ssbf key           \NC \bf explanation \NC\NR
\NC bad_ps_fontname \NC \NC \NR
\NC bad_glyph_table \NC \NC \NR
\NC bad_cff_table \NC \NC \NR
\NC bad_metrics_table \NC \NC \NR
\NC bad_cmap_table \NC \NC \NR
\NC bad_bitmaps_table  \NC \NC \NR
\NC bad_gx_table      \NC \NC \NR
\NC bad_ot_table     \NC \NC \NR
\NC bad_os2_version \NC \NC \NR
\NC bad_sfnt_header \NC \NC \NR
\stoptabulate

\subsubsubsection{horiz_base and vert_base table}

\starttabulate[|lT|l|p|]
\NC \ssbf key            \NC \bf type \NC \bf explanation \NC\NR
\NC tags                 \NC table    \NC an array of script list tags\NC \NR
\NC scripts              \NC table    \NC \NC \NR
\stoptabulate


The \type{scripts} subtable:

\starttabulate[|lT|l|p|]
\NC \ssbf key            \NC \bf type \NC \bf explanation \NC\NR
\NC baseline             \NC table   \NC \NC \NR
\NC default_baseline     \NC number  \NC \NC \NR
\NC lang                 \NC table   \NC \NC \NR
\stoptabulate


The \type{lang} subtable:

\starttabulate[|lT|l|p|]
\NC \ssbf key            \NC \bf type \NC \bf explanation \NC\NR
\NC tag                  \NC string   \NC a script tag \NC \NR
\NC ascent               \NC number   \NC \NC \NR
\NC descent              \NC number   \NC \NC \NR
\NC features             \NC table   \NC \NC \NR
\stoptabulate

The \type{features} points to an array of tables with the same layout
except that in those nested tables, the tag represents a language.

\subsubsubsection{altuni table}

An array of alternate \UNICODE\ values. Inside that array
are hashes with:

\starttabulate[|lT|l|p|]
\NC \ssbf key            \NC \bf type \NC \bf explanation \NC\NR
\NC unicode             \NC number     \NC \NC \NR
\NC variant             \NC number     \NC \NC \NR
\stoptabulate

\subsubsubsection{vert_variants and horiz_variants table}

\starttabulate[|lT|l|p|]
\NC \ssbf key            \NC \bf type \NC \bf explanation \NC\NR
\NC variants             \NC string     \NC \NC \NR
\NC italic_correction    \NC number     \NC \NC \NR
\NC parts                \NC table     \NC \NC \NR
\stoptabulate

The \type{parts} table is an array of smaller tables:

\starttabulate[|lT|l|p|]
\NC \ssbf key            \NC \bf type \NC \bf explanation \NC\NR
\NC component            \NC string     \NC \NC \NR
\NC extender             \NC number     \NC \NC \NR
\NC start                \NC number     \NC \NC \NR
\NC end                  \NC number     \NC \NC \NR
\NC advance              \NC number     \NC \NC \NR
\stoptabulate


\subsubsubsection{mathkern table}

\starttabulate[|lT|l|p|]
\NC \ssbf key            \NC \bf type \NC \bf explanation \NC\NR
\NC top_right            \NC table     \NC \NC \NR
\NC bottom_right         \NC table     \NC \NC \NR
\NC top_left             \NC table     \NC \NC \NR
\NC bottom_left          \NC table     \NC \NC \NR
\stoptabulate

Each of the subtables is an array of small hashes with two keys:

\starttabulate[|lT|l|p|]
\NC \ssbf key            \NC \bf type \NC \bf explanation \NC\NR
\NC height           \NC number     \NC \NC \NR
\NC kern             \NC number     \NC \NC \NR
\stoptabulate

\subsubsubsection{kerns table}

Substructure is identical to the per|-|glyph subtable.

\subsubsubsection{vkerns table}

Substructure is identical to the per|-|glyph subtable.

\subsubsubsection{texdata table}


\starttabulate[|lT|l|p|]
\NC \ssbf key            \NC \bf type \NC \bf explanation \NC\NR
\NC type                   \NC string   \NC \type {unset}, \type {text}, \type {math}, \type {mathext}\NC\NR
\NC params                 \NC array    \NC 22 font numeric parameters\NC\NR
\stoptabulate

\subsubsubsection{lookups table}

Top|-|level \type{lookups} is quite different from the ones at
character level. The keys in this hash are strings, the values the
actual lookups, represented as dictionary tables.

\starttabulate[|lT|l|p|]
\NC \ssbf key            \NC \bf type \NC \bf explanation \NC\NR
\NC type                   \NC number   \NC \NC\NR
\NC format                 \NC enum     \NC one of \type {glyphs}, \type {class}, \type {coverage}, \type {reversecoverage} \NC\NR
\NC tag                    \NC string   \NC \NC\NR
\NC current_class          \NC array   \NC \NC\NR
\NC before_class           \NC array   \NC \NC\NR
\NC after_class            \NC array   \NC \NC\NR
\NC rules                  \NC array   \NC an array of rule items\NC\NR
\stoptabulate

Rule items have one common item and one specialized item:

\starttabulate[|lT|l|p|]
\NC \ssbf key            \NC \bf type \NC \bf explanation \NC\NR
\NC lookups                \NC array    \NC a linear array of lookup names\NC\NR
\NC glyph                  \NC array     \NC only if the parent's format is \type{glyphs}\NC\NR
\NC class                  \NC array     \NC only if the parent's format is \type{class}\NC\NR
\NC coverage               \NC array     \NC only if the parent's format is \type{coverage}\NC\NR
\NC reversecoverage        \NC array     \NC only if the parent's format is \type{reversecoverage}\NC\NR
\stoptabulate

A glyph table is:

\starttabulate[|lT|l|p|]
\NC \ssbf key            \NC \bf type \NC \bf explanation \NC\NR
\NC names                  \NC string   \NC \NC\NR
\NC back                   \NC string   \NC \NC\NR
\NC fore                   \NC string   \NC \NC\NR
\stoptabulate

A class table is:

\starttabulate[|lT|l|p|]
\NC \ssbf key            \NC \bf type \NC \bf explanation \NC\NR
\NC current              \NC array    \NC of numbers \NC\NR
\NC before               \NC array    \NC of numbers  \NC\NR
\NC after                \NC array    \NC of numbers  \NC\NR
\stoptabulate

A coverage table is:

\starttabulate[|lT|l|p|]
\NC \ssbf key            \NC \bf type \NC \bf explanation \NC\NR
\NC current                \NC array    \NC of strings \NC\NR
\NC before                 \NC array    \NC of strings\NC\NR
\NC after                  \NC array    \NC of strings \NC\NR
\stoptabulate

A reversecoverage table is:

\starttabulate[|lT|l|p|]
\NC \ssbf key            \NC \bf type \NC \bf explanation \NC\NR
\NC current                \NC array    \NC of strings \NC\NR
\NC before                 \NC array    \NC of strings\NC\NR
\NC after                  \NC array    \NC of strings \NC\NR
\NC replacements           \NC string   \NC \NC\NR
\stoptabulate

%***********************************************************************

\section{The \luatex{img} library}

The \type{img} library can be used as an alternative to
\tex{pdfximage} and \tex{pdfrefximage}, and the associated \quote {satellite}
commands like \tex{pdfximagebbox}.
Image objects can also be used within virtual fonts
via the \type{image} command listed in~\in{section}[virtualfonts].

\subsection{\luatex{img.new}}

\startfunctioncall
<image> var = img.new()
<image> var = img.new(<table> image_spec)
\stopfunctioncall

This function creates a userdata object of type \quote {image}. The
\type{image_spec} argument is optional. If it is given, it must be
a table, and that table must contain a \type{filename} key.  A number of
other keys can also be useful; these are explained below.

You can either say

\starttyping
a = img.new()
\stoptyping

followed by

\starttyping
a.filename = "foo.png"
\stoptyping

or you can put the file name (and some or all of the other keys)
into a table directly, like so:

\starttyping
a = img.new({filename='foo.pdf', page=1})
\stoptyping

The generated \type{<image>} userdata object allows access to a set of
user|-|specified values as well as a set of values that are normally
filled in and updated automatically by \LUATEX\ itself. Some of those
are derived from the actual image file, others are updated to reflect
the \PDF\ output status of the object.

There is one required user-specified field: the file name
(\type{filename}).  It can optionally be augmented by the requested
image dimensions (\type{width}, \type{depth}, \type{height}),
user-specified image attributes (\type{attr}), the requested \PDF\ page
identifier (\type{page}), the requested boundingbox (\type{pagebox})
for \PDF\ inclusion, and the requested color space object (\type{colorspace}).

The function \type{img.new} does not access the actual image file; it
only creates the \type{<image>} userdata object and initializes some
memory structures. The \type{<image>} object and its internal
structures are automatically garbage collected.

Once the image is scanned, all the values in the \type{<image>}
except \type{width}, \type{height} and \type{depth}, become frozen,
and you cannot change them any more.
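
As a minimal sketch of the two styles described above (the file name
\type{foo.pdf} and the chosen sizes are only placeholders), the
user-specified fields can be given at creation time or assigned afterwards.
Dimensions are in scaled points, so 65536 corresponds to 1pt:

\starttyping
-- no file access happens here; only the userdata object is created
local a = img.new({ filename = "foo.pdf", page = 1, pagebox = "crop" })

-- width and height may be set now or changed later, even after scanning
a.width  = 100 * 65536   -- 100pt in scaled points
a.height =  50 * 65536   --  50pt in scaled points
\stoptyping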

\subsection{\luatex{img.keys}}

\startfunctioncall
<table> keys = img.keys()
\stopfunctioncall

This function returns a list of all the possible \type{image_spec}
keys, both user-supplied and automatic ones.

% hahe: i need to add r/w ro column...
\starttabulate[|l|l|p|]
\NC \bf field name\NC \bf type \NC \bf description \NC \NR
\NC attr          \NC string   \NC the image attributes for \LUATEX \NC \NR
\NC bbox          \NC table    \NC table with 4 boundingbox dimensions
                                   \type{llx}, \type{lly}, \type{urx},
                                   and \type{ury} overruling the \type{pagebox}
                                   entry\NC \NR
\NC colordepth    \NC number   \NC the number of bits used by the color space\NC \NR
\NC colorspace    \NC number   \NC the color space object number \NC \NR
\NC depth         \NC number   \NC the image depth for \LUATEX\
                                   (in scaled points)\NC \NR
\NC filename      \NC string   \NC the image file name \NC \NR
\NC filepath      \NC string   \NC the full (expanded) file name of the image\NC \NR
\NC height        \NC number   \NC the image height for \LUATEX\
                                   (in scaled points)\NC \NR
\NC imagetype     \NC string   \NC one of \type{pdf}, \type{png}, \type{jpg}, \type{jp2},
                                   \type{jbig2}, or \type{nil} \NC \NR
\NC index         \NC number   \NC the \PDF\ image name suffix \NC \NR
\NC objnum        \NC number   \NC the \PDF\ image object number \NC \NR
\NC page          \NC ??       \NC the identifier for the requested image page
                                   (type is number or string,
                                   default is the number 1)\NC \NR
\NC pagebox       \NC string   \NC the requested bounding box, one of
                                   \type {none}, \type {media}, \type {crop},
                                   \type {bleed}, \type {trim}, \type {art} \NC \NR
\NC pages         \NC number   \NC the total number of available pages \NC \NR
\NC rotation      \NC number   \NC the image rotation from included \PDF\ file,
                                   in multiples of 90~deg. \NC \NR
\NC stream        \NC string   \NC the raw stream data for an \type{/XObject}
                                   \type{/Form} object\NC \NR
\NC transform     \NC number   \NC the image transform, integer number 0..7\NC \NR
\NC width         \NC number   \NC the image width for \LUATEX\
                                   (in scaled points)\NC \NR
\NC xres          \NC number   \NC the horizontal natural image resolution
                                   (in \DPI) \NC \NR
\NC xsize         \NC number   \NC the natural image width \NC \NR
\NC yres          \NC number   \NC the vertical natural image resolution
                                   (in \DPI) \NC \NR
\NC ysize         \NC number   \NC the natural image height \NC \NR
\stoptabulate

A running (undefined) dimension in \type{width}, \type{height}, or \type{depth} is
represented as \type{nil} in \LUA, so if you want to load an image at
its \quote {natural} size, you do not have to specify any of those three fields.
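
To see which keys a particular binary supports, the list returned by
\type{img.keys} can simply be printed. This is only a sketch; it assumes
that the returned table is a plain array of strings:

\starttyping
-- print every key name known to the img library
for _, key in ipairs(img.keys()) do
  print(key)
end
\stoptyping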

The \type{stream} parameter allows you to fabricate an \type{/XObject} \type{/Form}
object from a string giving the stream contents,
e.\,g., for a filled rectangle:

\startfunctioncall
a.stream = "0 0 20 10 re f"
\stopfunctioncall

When writing the image, an \type{/XObject} \type{/Form} object is created,
like with embedded \PDF\ file writing. The object is written out only once.
The \type{stream} key requires that the \type{bbox} table is also given.
The \type{stream} key conflicts with the \type{filename} key.
The \type{transform} key works as usual also with \type{stream}.

The \type{bbox} key needs a table with four boundingbox values, e.\,g.:

\startfunctioncall
a.bbox = {"30bp", 0, "225bp", "200bp"}
\stopfunctioncall

This replaces and overrules any given \type{pagebox} value;
with given \type{bbox} the box dimensions coming with an embedded \PDF\ file
are ignored.
The \type{xsize} and \type{ysize} dimensions are set accordingly,
when the image is scaled.
The \type{bbox} parameter is ignored for non-\PDF\ images.
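
Putting the \type{stream} and \type{bbox} keys together, a filled
rectangle could be produced roughly as follows. This is a sketch, not a
tested recipe: the bounding box values are chosen here to match the
20 by 10 rectangle of the stream and are not prescribed by \LUATEX.

\starttyping
local a = img.new()                  -- no filename: it would conflict with stream
a.stream = "0 0 20 10 re f"          -- PDF operators: rectangle, then fill
a.bbox   = { 0, 0, "20bp", "10bp" }  -- required together with stream
img.write(a)                         -- place the form object on the current page
\stoptyping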

The \type{transform} key allows you to mirror and rotate the image in steps of 90~deg.
The default value~0 gives an unmirrored, unrotated image.
Values 1|--|3 give counterclockwise rotation by 90, 180, or 270~degrees,
whereas with values 4|--|7 the image is first mirrored
and then rotated counterclockwise by 90, 180, or 270~degrees.
The \type{transform} operation gives the same visual result
as if you had externally preprocessed the image with a graphics tool
and then used it in \LUATEX.
If a \PDF\ file to be embedded already contains a \type{/Rotate} specification,
the rotation result is the combination of the \type{/Rotate} rotation
followed by the \type{transform} operation.
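
The numbering of the \type{transform} values can be read as a mirror flag
plus a rotation count. The helper below is purely illustrative: the name
\type{describe_transform} is made up here, and it assumes that value~4
means mirroring without any rotation, which is an interpretation of the
description above rather than something it states.

\starttyping
-- split a transform value 0..7 into a mirror flag and a rotation angle
local function describe_transform(t)
  assert(t >= 0 and t <= 7 and t == math.floor(t),
         "transform must be an integer 0..7")
  local mirrored = t >= 4          -- values 4..7: mirrored first
  local degrees  = (t % 4) * 90    -- counterclockwise rotation
  return mirrored, degrees
end

print(describe_transform(3))  -- false   270
print(describe_transform(5))  -- true    90
\stoptyping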

\subsection{\luatex{img.scan}}

\startfunctioncall
<image> var = img.scan(<image> var)
<image> var = img.scan(<table> image_spec)
\stopfunctioncall

When you say \type{img.scan(a)} for a new image, the file is scanned,
and variables such as \type{xsize}, \type{ysize}, \type{imagetype}, number of
\type{pages}, and the resolution are extracted. Each of the \type{width},
\type{height}, and \type{depth} fields is set up according to the image dimensions,
if it was not given an explicit value already.
An image file will never be scanned more than once for a given image variable.
Subsequent \type{img.scan(a)} calls only set up the dimensions again
(if they have been changed by the user in the meantime).

For ease of use, you can also call

\starttyping
a = img.scan ({ filename = "foo.png" })
\stoptyping

without a prior \type{img.new}.

Nothing is written yet at this point, so you can do \type{a=img.scan},
retrieve the available info like image width and height, and then
throw away \type{a} again by saying \type{a=nil}.  In that case no
image object will be reserved in the \PDF, and the used memory will be
cleaned up automatically.
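
A short sketch of this scan, inspect, and discard pattern follows. The
file name is a placeholder, and the fields printed are the automatic ones
listed under \type{img.keys}:

\starttyping
local a = img.scan({ filename = "foo.png" })

print(a.imagetype, a.pages)   -- detected type and number of pages
print(a.xsize, a.ysize)       -- natural image size
print(a.xres, a.yres)         -- natural resolution in dpi

a = nil                       -- nothing was written to the PDF
\stoptyping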

\subsection{\luatex{img.copy}}

\startfunctioncall
<image> var = img.copy(<image> var)
<image> var = img.copy(<table> image_spec)
\stopfunctioncall

If you say \type{a = b}, then both variables point to the same
\type{<image>} object. If you want to write out an image with
different sizes, you can say \type{b=img.copy(a)}.

Afterwards, \type{a} and \type{b} still reference the same actual
image dictionary, but the dimensions for \type{b} can now be changed
from their initial values that were just copies from \type{a}.
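
For instance, the same image could be placed once at its scanned size and
once at half that size. This is a sketch under the assumption that only
the dimensions differ between the two objects; \type{foo.png} is a
placeholder:

\starttyping
local a = img.scan({ filename = "foo.png" })
local b = img.copy(a)                  -- shares the image data with a

-- dimensions are in scaled points, so keep them integral
b.width  = math.floor(a.width  / 2)
b.height = math.floor(a.height / 2)

img.write(a)                           -- full size
img.write(b)                           -- half size, same image dictionary
\stoptyping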

% Hartmut, I don't know if this makes sense. An example of what
% can, and what cannot be changed would be helpful.
% -- will think about it...

\subsection{\luatex{img.write}}

\startfunctioncall
<image> var = img.write(<image> var)
<image> var = img.write(<table> image_spec)
\stopfunctioncall

By \type{img.write(a)} a \PDF\ object number is allocated,
and a whatsit node of subtype \type{pdf_refximage} is generated
and put into the output list.
By this the image \type{a} is placed into the page stream,
and the image file is written out into an image stream object
after the shipping of the current page is finished.

Again you can do a terse call like

\starttyping
img.write ({ filename = "foo.png" })
\stoptyping

The \type{<image>} variable is returned in case you want it for later
processing.

\subsection{\luatex{img.immediatewrite}}

\startfunctioncall
<image> var = img.immediatewrite(<image> var)
<image> var = img.immediatewrite(<table> image_spec)
\stopfunctioncall

By \type{img.immediatewrite(a)} a \PDF\ object number is
allocated, and the image file for image \type{a} is written out
immediately into the \PDF\ file as an image stream object (like
with \tex{immediate}\tex{pdfximage}). The object number of the image
stream dictionary is then available by the \type{objnum} key. No
\type{pdf_refximage} whatsit node is generated. You will need an
\luatex{img.write(a)} or \luatex{img.node(a)} call to let the
image appear on the page, or reference it by another trick; else
you will have a dangling image object in the \PDF\ file.

Also here you can do a terse call like

\starttyping
a = img.immediatewrite ({ filename = "foo.png" })
\stoptyping

The \type{<image>} variable is returned and you will most likely need it.
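
A sketch of the typical sequence: write the image data immediately, read
the object number, and then reference the image so that it does not stay
dangling. The file name is a placeholder:

\starttyping
local a = img.immediatewrite({ filename = "foo.png" })

print(a.objnum)             -- object number of the image stream dictionary

node.write(img.node(a))     -- make the image appear on the page
\stoptyping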

\subsection{\luatex{img.node}}

\startfunctioncall
<node> n = img.node(<image> var)
<node> n = img.node(<table> image_spec)
\stopfunctioncall

This function allocates a \PDF\ object number and returns a
whatsit node of subtype \type{pdf_refximage}, filled with the
image parameters \type{width}, \type{height}, \type{depth}, and
\type{objnum}. Also here you can do a terse call like:

\starttyping
n = img.node ({ filename = "foo.png" })
\stoptyping

This example outputs an image:

\starttyping
node.write(img.node{filename="foo.png"})
\stoptyping

\subsection{\luatex{img.types}}

\startfunctioncall
<table> types = img.types()
\stopfunctioncall

This function returns a list with the supported image file type names;
currently these are \type{pdf}, \type{png}, \type{jpg}, \type{jp2} (JPEG~2000),
and \type{jbig2}.

\subsection{\luatex{img.boxes}}

\startfunctioncall
<table> boxes = img.boxes()
\stopfunctioncall

This function returns a list with the supported \PDF\ page box names;
currently these are \type {media}, \type {crop}, \type {bleed}, \type {trim}, and \type {art}
(all in lowercase letters).

%***********************************************************************

\section{The \luatex{kpse} library}

This library provides two separate, but nearly identical interfaces to
the \KPATHSEA\ file search functionality: there is a \quote{normal}
procedural interface that shares its kpathsea instance with \LUATEX\
itself, and an object oriented interface that is completely on its
own. The object oriented interface and \type{kpse.new} have been added
in \LUATEX\ 0.37.

\subsection{\luatex{kpse.set_program_name} and  \luatex{kpse.new}}

Before the search library can be used at all, its database has to be
initialized. There are three possibilities, two of which belong to the
procedural interface.

First, when \LUATEX\ is used to typeset documents, this initialization
happens automatically and the \KPATHSEA\ executable and program names
are set to \type{luatex} (that is, unless explicitly prohibited by the
user's startup script; see~\in{section}[init] for more details).

Second, in \TEXLUA\ mode, the initialization has to be done explicitly
via the \luatex{kpse.set_program_name} function, which sets the
\KPATHSEA\ executable (and optionally program) name.

\startfunctioncall
kpse.set_program_name(<string> name)
kpse.set_program_name(<string> name, <string> progname)
\stopfunctioncall

The second argument controls the use of the \quote{dotted} values in the
\type{texmf.cnf} configuration file, and defaults to the first argument.

Third, if you prefer the object oriented interface, you have to call a
different function. It has the same arguments, but it returns a
userdata variable.

\startfunctioncall
local kpathsea = kpse.new(<string> name)
local kpathsea = kpse.new(<string> name, <string> progname)
\stopfunctioncall

Apart from these two functions, the calling conventions of the
interfaces are identical. Depending on the chosen interface, you
either call \type{kpse.find_file()} or \type{kpathsea:find_file()},
with identical arguments and return values.
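
The two interfaces can be compared side by side in a \TEXLUA\ script.
This is a sketch; the program name \type{luatex} and the file
\type{plain.tex} are assumptions about the local installation:

\starttyping
-- procedural interface: initialize the shared kpathsea instance
kpse.set_program_name("luatex")
print(kpse.find_file("plain.tex"))

-- object oriented interface: a separate, independent instance
local kpathsea = kpse.new("luatex")
print(kpathsea:find_file("plain.tex"))
\stoptyping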

\subsection{\luatex{find_file}}

The most often used function in the library is \type{find_file}:

\startfunctioncall
<string> f = kpse.find_file(<string> filename)
<string> f = kpse.find_file(<string> filename, <string> ftype)
<string> f = kpse.find_file(<string> filename, <boolean> mustexist)
<string> f = kpse.find_file(<string> filename, <string> ftype, <boolean> mustexist)
<string> f = kpse.find_file(<string> filename, <string> ftype, <number> dpi)
\stopfunctioncall

Arguments:
\startitemize[intro]

\sym{filename}

the name of the file you want to find, with or without extension.

\sym{ftype}

maps to the \type {-format} argument of \KPSEWHICH.  The supported
 \type{ftype} values are the same as the ones supported by the
standalone \type{kpsewhich} program:

\startsimplecolumns
\starttyping
'gf'
'pk'
'bitmap font'
'tfm'
'afm'
'base'
'bib'
'bst'
'cnf'
'ls-R'
'fmt'
'map'
'mem'
'mf'
'mfpool'
'mft'
'mp'
'mppool'
'MetaPost support'
'ocp'
'ofm'
'opl'
'otp'
'ovf'
'ovp'
'graphic/figure'
'tex'
'TeX system documentation'
'texpool'
'TeX system sources'
'PostScript header'
'Troff fonts'
'type1 fonts'
'vf'
'dvips config'
'ist'
'truetype fonts'
'type42 fonts'
'web2c files'
'other text files'
'other binary files'
'misc fonts'
'web'
'cweb'
'enc files'
'cmap files'
'subfont definition files'
'opentype fonts'
'pdftex config'
'lig files'
'texmfscripts'
'lua'
'font feature files'
'cid maps'
'mlbib'
'mlbst'
'clua'
\stoptyping
\stopsimplecolumns

The default type is \type{tex}. Note: this is different from
\KPSEWHICH, which tries to deduce the file type itself from
looking at the supplied extension.  The four types
'font feature files', 'cid maps', 'mlbib', and 'mlbst' were new
additions in \LUATEX\ 0.40.2.


\sym{mustexist}

is similar to \KPSEWHICH's \type{-must-exist}, and the default is \type{false}.
If you specify \type{true} (or a non|-|zero integer), then the \KPSE\ library
will search the disk as well as the \type {ls-R} databases.

\sym{dpi}

This is used for the size argument of the formats \type{pk}, \type{gf}, and \type{bitmap font}.
\stopitemize
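
The call variants listed above could be used as follows in a \TEXLUA\
script. The file names are only examples that are assumed to exist in a
typical installation, and each call returns \type{nil} when nothing is found:

\starttyping
kpse.set_program_name("luatex")

-- default type 'tex'
print(kpse.find_file("plain.tex"))

-- explicit file type
print(kpse.find_file("cmr10", "tfm"))

-- also search the disk, not only the ls-R databases
print(kpse.find_file("plain.tex", "tex", true))

-- bitmap font lookup with a resolution
print(kpse.find_file("cmr10", "pk", 600))
\stoptyping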

\subsection{\luatex{lookup}}

A more powerful (but slower) generic method for finding files is also
available (since 0.51). It returns a string for each found file.

\startfunctioncall
<string> f, ... = kpse.lookup(<string> filename, <table> options)
\stopfunctioncall

The options match command-line arguments from \type{kpsewhich}:

\starttabulate[|l|l|p|]
\NC \ssbf key \NC \ssbf type \NC \ssbf description \NC \NR
\NC debug     \NC number     \NC set debugging flags for this lookup\NC     \NR
\NC format    \NC string     \NC use specific file type (see list above)\NC \NR
\NC dpi       \NC number     \NC use this resolution for this lookup; default 600\NC \NR
\NC path      \NC string     \NC search in the given path\NC \NR
\NC all       \NC boolean    \NC output all matches, not just the first\NC \NR
\NC mustexist \NC boolean    \NC (0.65 and higher) search the disk as well as ls-R if necessary\NC \NR
\NC must-exist\NC boolean    \NC (0.64 and lower) search the disk as well as ls-R if necessary\NC \NR
\NC mktexpk   \NC boolean    \NC disable/enable mktexpk generation for this lookup\NC \NR
\NC mktextex  \NC boolean    \NC disable/enable mktextex generation for this lookup\NC \NR
\NC mktexmf   \NC boolean    \NC disable/enable mktexmf generation for this lookup\NC \NR
\NC mktextfm  \NC boolean    \NC disable/enable mktextfm generation for this lookup\NC \NR
\NC subdir    \NC string
                  or table   \NC only output matches whose directory part
                                 ends with the given string(s) \NC \NR
\stoptabulate
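
Because \type{kpse.lookup} returns one string per match, the results are
conveniently collected in a table. A sketch, with \type{plain.tex} as a
placeholder file name:

\starttyping
kpse.set_program_name("luatex")

-- collect all matches instead of just the first one
local found = { kpse.lookup("plain.tex", { format = "tex", all = true }) }

for i, name in ipairs(found) do
  print(i, name)
end
\stoptyping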

\subsection{\luatex{init_prog}}

Extra initialization for programs that need to generate bitmap fonts.

\startfunctioncall
kpse.init_prog(<string> prefix, <number> base_dpi, <string> mfmode)
kpse.init_prog(<string> prefix, <number> base_dpi, <string> mfmode, <string> fallback)
\stopfunctioncall


\subsection{\luatex{readable_file}}

Test if an (absolute) file name is a readable file.

\startfunctioncall
<string> f = kpse.readable_file(<string> name)
\stopfunctioncall

The return value is the actual absolute filename you should use,
because the disk name is not always the same as the requested name,
due to aliases and system|-|specific handling under e.\,g.\ \MSDOS.

Returns \lua {nil} if the file does not exist or is not readable.

\subsection{\luatex{expand_path}}

Like kpsewhich's \type {-expand-path}:

\startfunctioncall
<string> r = kpse.expand_path(<string> s)
\stopfunctioncall

\subsection{\luatex{expand_var}}

Like kpsewhich's  \type{-expand-var}:

\startfunctioncall
<string> r = kpse.expand_var(<string> s)
\stopfunctioncall

\subsection{\luatex{expand_braces}}

Like kpsewhich's \type{-expand-braces}:

\startfunctioncall
<string> r = kpse.expand_braces(<string> s)
\stopfunctioncall

\subsection{\luatex{show_path}}

Like kpsewhich's \type{-show-path}:

\startfunctioncall
<string> r = kpse.show_path(<string> ftype)
\stopfunctioncall


\subsection{\luatex{var_value}}

Like kpsewhich's \type{-var-value}:

\startfunctioncall
<string> r = kpse.var_value(<string> s)
\stopfunctioncall
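
The following sketch shows these query functions together. The variable
name \type{TEXMFMAIN} is only an example of a variable that is commonly
defined in \type{texmf.cnf}; the output depends on the installation:

\starttyping
kpse.set_program_name("luatex")

print(kpse.var_value("TEXMFMAIN"))      -- value of a variable, by name
print(kpse.expand_var("$TEXMFMAIN"))    -- expand variable references in a string
print(kpse.expand_braces("{a,b}/c"))    -- expand brace alternatives
print(kpse.show_path("tex"))            -- search path for a file type
\stoptyping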

\subsection{\luatex{version}}

Returns the kpathsea version string (new in 0.51).

\startfunctioncall
<string> r = kpse.version()
\stopfunctioncall


\section{The \luatex{lang} library}

This library provides the interface to \LUATEX's structure
representing a language, and the associated functions.

\startfunctioncall
<language> l = lang.new()
<language> l = lang.new(<number> id)
\stopfunctioncall

This function creates a new userdata object. An object of type
\type{<language>} is the first argument to most of the other functions
in the \luatex{lang} library. These functions can also be used as if
they were object methods, using the colon syntax.

Without an argument, the next available internal id number will be
assigned to this object. With argument, an object will be created that
links to the internal language with that id number.

\startfunctioncall
<number> n = lang.id(<language> l)
\stopfunctioncall

returns the internal \tex{language} id number this object refers to.

\startfunctioncall
<string> n = lang.hyphenation(<language> l)
lang.hyphenation(<language> l, <string> n)
\stopfunctioncall

Either returns the current hyphenation exceptions for this language,
or adds new ones. The syntax of the string is explained in~\in{section}[patternsexceptions].

\startfunctioncall
lang.clear_hyphenation(<language> l)
\stopfunctioncall

Clears the exception dictionary for this language.

\startfunctioncall
<string> n = lang.clean(<string> o)
\stopfunctioncall

Creates a hyphenation key from the supplied hyphenation value. The
syntax of the argument string is explained in~\in{section}[patternsexceptions].
This function is useful if
you want to do something else based on the words in a dictionary file,
like spell-checking.

\startfunctioncall
<string> n = lang.patterns(<language> l)
lang.patterns(<language> l, <string> n)
\stopfunctioncall

Adds additional patterns for this language object, or returns the
current set. The syntax of this string is explained in~\in{section}[patternsexceptions].

\startfunctioncall
lang.clear_patterns(<language> l)
\stopfunctioncall

Clears the pattern dictionary for this language.

\startfunctioncall
<number> n = lang.prehyphenchar(<language> l)
lang.prehyphenchar(<language> l, <number> n)
\stopfunctioncall

Gets or sets the \quote{pre|-|break} hyphen character for implicit
hyphenation in this language (initially the hyphen, decimal 45).

\startfunctioncall
<number> n = lang.posthyphenchar(<language> l)
lang.posthyphenchar(<language> l, <number> n)
\stopfunctioncall

Gets or sets the \quote{post|-|break} hyphen character for implicit
hyphenation in this language (initially null, decimal~0, indicating
emptiness).


\startfunctioncall
<number> n = lang.preexhyphenchar(<language> l)
lang.preexhyphenchar(<language> l, <number> n)
\stopfunctioncall

Gets or sets the \quote{pre|-|break} hyphen character for explicit
hyphenation in this language (initially null, decimal~0, indicating
emptiness).

\startfunctioncall
<number> n = lang.postexhyphenchar(<language> l)
lang.postexhyphenchar(<language> l, <number> n)
\stopfunctioncall

Gets or sets the \quote{post|-|break} hyphen character for explicit
hyphenation in this language (initially null, decimal~0, indicating
emptiness).

\startfunctioncall
<boolean> success = lang.hyphenate(<node> head)
<boolean> success = lang.hyphenate(<node> head, <node> tail)
\stopfunctioncall

Inserts hyphenation points (discretionary nodes) in a node list. If
\type{tail} is given as argument, processing stops on that node.
Currently, \type{success} is always true if \type{head} (and \type{tail}, if
specified) are proper nodes, regardless of possible other errors.

Hyphenation works only on \quote{characters}, a special subtype of all
the glyph nodes with the node subtype having the value \type{1}. Glyph
nodes with different subtypes are not processed. See
\in{section~}[charsandglyphs] for more details.
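
The following sketch shows one way to call \type{lang.hyphenate} from the
\type{hyphenate} callback. It assumes that the callback receives the head and
the tail of the list and that the built|-|in behaviour is all that is wanted.

\starttyping
callback.register("hyphenate", function(head, tail)
  -- run the built-in hyphenator on the nodes from head up to tail
  lang.hyphenate(head, tail)
end)
\stoptyping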


\section{The \luatex{lua} library}

This library contains one read|-|only  item:

\starttyping
<string> s = lua.version
\stoptyping

This returns the \LUA\ version identifier string. The value is
currently \directlua {tex.print(lua.version)}.

\subsection{\LUA\ bytecode registers}

\LUA\ registers can be used to communicate \LUA\ functions across \LUA\
chunks. The accepted values for assignments are functions and
\type{nil}. Likewise, the retrieved value is either a function or \type{nil}.

\starttyping
lua.bytecode[<number> n] = <function> f
lua.bytecode[<number> n]()
\stoptyping

The contents of the \luatex{lua.bytecode} array are stored inside the format
file as actual \LUA\ bytecode, so it can also be used to preload \LUA\ code.

Note: The function must not contain any upvalues. Currently, functions
containing upvalues can be stored (and their upvalues are set to
\type{nil}), but this is an artifact of the current \LUA\
implementation and thus subject to change.
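
A small sketch of a register that respects this restriction is given here. The
register number \type{1} is an arbitrary choice, and the stored function refers
only to globals, so it has no upvalues.

\starttyping
-- store a function that refers only to globals (no upvalues)
lua.bytecode[1] = function()
  texio.write_nl("hello from bytecode register 1")
end

-- retrieve and run it, possibly from a different chunk
local f = lua.bytecode[1]
if f then f() end
\stoptyping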

The associated function calls are

\startfunctioncall
<function> f = lua.getbytecode(<number> n)
lua.setbytecode(<number> n, <function> f)
\stopfunctioncall

Note: Since a \LUA\ file loaded using \luatex{loadfile(filename)} is
essentially an anonymous function, a complete file can be stored in a
bytecode register like this:

\startfunctioncall
lua.bytecode[n] = loadfile(filename)
\stopfunctioncall

Now all definitions (functions, variables) contained in the file can be
created by executing this bytecode register:

\startfunctioncall
lua.bytecode[n]()
\stopfunctioncall

Note that the path of the file is stored in the \LUA\ bytecode to be
used in stack backtraces and therefore dumped into the format file if
the above code is used in \INITEX. If it contains private information, such as
the user name, this information is then contained in the format file as
well. This should be kept in mind when preloading files into a bytecode
register in \INITEX.

\subsection{\LUA\ chunk name registers}

There is an array of 65536 (0--65535) potential chunk names for use with
the \type{\directlua} and \type{\latelua} primitives.

\startfunctioncall
lua.name[<number> n] = <string> s
<string> s = lua.name[<number> n]
\stopfunctioncall

If you want to unset a \LUA\ chunk name, you can assign \type{nil} to it.
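
As an illustration, the following sketch names register~1, triggers an error in
a chunk that uses that register, and then clears the name again. It assumes
that the number following \type{\directlua} selects the chunk name register;
the name \type{mychunk} is a placeholder.

\starttyping
\directlua{lua.name[1] = "mychunk"}
\directlua1{error("this error is reported under the chunk name")}
\directlua{lua.name[1] = nil}
\stoptyping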


\section{The \luatex{mplib} library}

The \MP\ library interface registers itself in the table \type{mplib}. It
is based on  \MPLIB\ version \ctxlua{tex.sprint(mplib.version())}.

\subsection{\luatex{mplib.new}}

To create a new \METAPOST\ instance, call

\startfunctioncall
<mpinstance> mp = mplib.new({...})
\stopfunctioncall

This creates the \type{mp} instance object. The argument hash can have a number of
different fields,  as follows:

\starttabulate[|lT|l|p|p|]
\NC \ssbf name  \NC \bf type   \NC \bf description                \NC \bf default \NC\NR
\NC error_line \NC         number \NC error line width            \NC 79 \NC\NR
\NC print_line \NC         number \NC line length in ps output    \NC 100\NC\NR
\NC random_seed \NC        number \NC the initial random seed     \NC variable\NC\NR
\NC interaction \NC        string \NC the interaction mode, one of
\type {batch}, \type {nonstop}, \type {scroll}, \type {errorstop} \NC \type {errorstop}\NC\NR
\NC job_name \NC           string \NC \type {--jobname}           \NC \type {mpout} \NC\NR
\NC find_file \NC          function \NC a function to find files  \NC only local files\NC\NR
\stoptabulate

The \type{find_file} function should be of this form:

\starttyping
<string> found = finder (<string> name, <string> mode, <string> type)
\stoptyping

with:

\starttabulate[|lT|l|p|]
\NC name \NC the requested file \NC   \NR
\NC mode \NC the file mode: \type {r} or \type {w} \NC \NR
\NC type \NC the kind of file, one of: \type {mp}, \type {tfm}, \type {map}, \type {pfb}, \type {enc} \NC \NR
\stoptabulate

Return either the full pathname of the found file, or \type{nil} if
the file cannot be found.

Note that the new version of \MPLIB\ no longer uses binary mem files,
so the way to preload a set of macros is simply to start off with
an \type{input} command in the first \type{mp:execute()} call.
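
A finder can be sketched as follows. The use of \type{kpse.find_file} and the
mapping of \type{pfb} and \type{enc} to the \type{kpse} format names
\type{type1 fonts} and \type{enc files} are assumptions of this example, not
requirements of \MPLIB. The job name and the \type{plain} macro file are
placeholders.

\starttyping
-- assumed translation from MPlib file kinds to kpse format names
local kpse_format = { pfb = "type1 fonts", enc = "enc files" }

local function finder(name, mode, ftype)
  if mode == "w" then
    return name                      -- output files: use the name as given
  end
  -- returns the full path, or nil if the file cannot be found
  return kpse.find_file(name, kpse_format[ftype] or ftype)
end

local mp = mplib.new({ find_file = finder, job_name = "example" })

-- there are no mem files, so the macros are loaded explicitly
local result = mp:execute("input plain;")
\stoptyping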


\subsection{\luatex{mp:statistics}}

You can request statistics with:

\startfunctioncall
<table> stats = mp:statistics()
\stopfunctioncall

This function returns the vital statistics for an \MPLIB\ instance. There are four
fields, giving the maximum number of used items in each of four
allocated object classes:

\starttabulate[|lT|l|p|]
\NC main_memory \NC number \NC memory size \NC\NR
\NC hash_size   \NC number \NC hash size\NC\NR
\NC param_size  \NC number \NC simultaneous macro parameters\NC\NR
\NC max_in_open \NC number \NC input file nesting levels\NC\NR
\stoptabulate

Note that in the new version of \MPLIB, this is informational only. The
objects are all allocated dynamically, so there is no chance of running
out of space unless the available system memory is exhausted.

\subsection{\luatex{mp:execute}}

You can ask the \METAPOST\ interpreter to run a chunk of code by calling

\startfunctioncall
<table> rettable = mp:execute('metapost language chunk')
\stopfunctioncall

for various bits of \METAPOST\ language input. Be sure to check the
\type{rettable.status} (see below) because when a fatal \METAPOST\
error occurs the \MPLIB\ instance will become unusable thereafter.

Generally speaking, it is best to keep your chunks small, but beware
that all chunks have to obey proper syntax, as if each of them were a
small file. For instance, you cannot split a single statement over
multiple chunks.

In contrast with the normal standalone \type{mpost} command, there is
{\em no\/} implied \quote{input} at the start of the first chunk.

\subsection{\luatex{mp:finish}}

\startfunctioncall
<table> rettable = mp:finish()
\stopfunctioncall

If for some reason you want to stop using an \MPLIB\ instance while
processing is not yet actually done, you can call \type{mp:finish}.
Eventually, used memory will be freed and open files will be closed by
the \LUA\ garbage collector, but an explicit \type{mp:finish} is the
only way to capture the final part of the output streams.

\subsection{Result table}

The return value of \type{mp:execute} and \type{mp:finish} is a table
with a few possible keys (only \type {status} is always guaranteed to be present).

\starttabulate[|l|l|p|]
\NC log    \NC string \NC output to the \quote {log} stream \NC \NR
\NC term   \NC string \NC output to the \quote {term} stream \NC \NR
\NC error  \NC string \NC output to the \quote {error} stream (only used for \quote {out of memory})\NC \NR
\NC status \NC number \NC the return value: 0=good, 1=warning, 2=errors, 3=fatal error \NC \NR
\NC fig    \NC table \NC an array of generated figures (if any)\NC \NR
\stoptabulate

When \type{status} equals~3, you should stop using this \MPLIB\ instance
immediately: it is no longer capable of processing input.
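
A wrapper along the following lines can make the status check hard to forget.
This is only a sketch: the function name \type{run} is hypothetical, and
\type{mp} is assumed to be an instance created with \type{mplib.new}.

\starttyping
local function run(mp, chunk)
  local result = mp:execute(chunk)
  if result.log then
    texio.write_nl("log", result.log)   -- copy MetaPost's log output
  end
  if result.status == 3 then
    -- fatal error: the instance cannot process further input
    return nil, result.error or result.term or "fatal MetaPost error"
  end
  return result
end
\stoptyping

The caller is then expected to discard the instance whenever the first return
value is \type{nil}.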

If it is present, each of the entries in the \type{fig} array is a
userdata representing a figure object, and each of those has a number of
object methods you can call:

\starttabulate[|l|l|p|]
\NC boundingbox  \NC function \NC returns the bounding box, as an array of 4 values\NC \NR
\NC postscript   \NC function \NC returns a string that is the ps output of the \type{fig}.
                                  This function accepts two optional integer arguments for
                                  specifying the values of \type{prologues} (first argument)
                                  and \type{procset} (second argument)\NC \NR
\NC svg          \NC function \NC returns a string that is the svg output of the \type{fig}.
                                  This function accepts an optional integer argument for
                                  specifying the value of \type{prologues}\NC \NR
\NC objects      \NC function \NC returns the actual array of graphic objects in this \type{fig} \NC \NR
\NC copy_objects \NC function \NC returns a deep copy of the array of graphic objects in this \type{fig} \NC \NR
\NC filename     \NC function \NC the filename this \type{fig}'s \POSTSCRIPT\ output
                                  would have written to in standalone mode\NC \NR
\NC width        \NC function \NC the \type{charwd} value \NC \NR
\NC height       \NC function \NC the \type{charht} value \NC \NR
\NC depth        \NC function \NC the \type{chardp} value \NC \NR
\NC italcorr     \NC function \NC the \type{charit} value \NC \NR
\NC charcode     \NC function \NC the (rounded) \type{charcode} value \NC \NR
\stoptabulate

{\bf NOTE:} you can call \type{fig:objects()} only once for any one \type{fig} object!

When the boundingbox represents a \quote {negated rectangle}, i.e.\ when the first set
of coordinates is larger than the second set, the picture is empty.
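
The following sketch walks over the figures of a result table, skips empty
pictures, and logs the type of each graphical object. The function name is
hypothetical, and the order of the bounding box values (lower left $x$ and $y$,
then upper right $x$ and $y$) is an assumption of this example.

\starttyping
local function report_figures(result)
  for _, fig in ipairs(result.fig or {}) do
    local bb = fig:boundingbox()        -- assumed: llx, lly, urx, ury
    if bb[1] > bb[3] then
      texio.write_nl("empty figure")    -- negated rectangle
    else
      local objects = fig:objects()     -- call this only once per fig
      for _, object in ipairs(objects) do
        texio.write_nl(object.type)
      end
    end
  end
end
\stoptyping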

Graphical objects come in various types, each of which has a different list of
accessible values. The types are: \type{fill}, \type{outline}, \type{text},
\type{start_clip}, \type{stop_clip}, \type{start_bounds}, \type{stop_bounds}, \type{special}.

There is a helper function (\type{mplib.fields(obj)}) to get the list of
accessible values for a particular object, but you can just as easily
use the tables given below.

All graphical objects have a field \type{type} that gives the object
type as a string value; it is not explicitly mentioned in the following tables.
In the following, \type{number}s are \POSTSCRIPT\ points represented as
a floating point number, unless stated otherwise. Field values that
are of type \type{table} are explained in the next section.

\subsubsection{fill}

\starttabulate[|l|l|p|]
\NC path       \NC table \NC the list of knots \NC \NR
\NC htap       \NC table \NC the list of knots for the reversed trajectory \NC \NR
\NC pen        \NC table \NC knots of the pen \NC \NR
\NC color      \NC table \NC the object's color \NC \NR
\NC linejoin   \NC number \NC line join style (bare number)\NC \NR
\NC miterlimit \NC number \NC miterlimit\NC \NR
\NC prescript  \NC string \NC the prescript text \NC \NR
\NC postscript \NC string \NC the postscript text \NC \NR
\stoptabulate

The entries \type{htap} and \type{pen} are optional.

There is a helper function (\type{mplib.pen_info(obj)}) that returns
a table containing a bunch of vital characteristics of the used pen
(all values are floats):

\starttabulate[|l|l|p|]
\NC width       \NC number \NC width of the pen\NC \NR
\NC sx          \NC number \NC $x$ scale       \NC \NR
\NC rx          \NC number \NC $xy$ multiplier \NC \NR
\NC ry          \NC number \NC $yx$ multiplier \NC \NR
\NC sy          \NC number \NC $y$ scale       \NC \NR
\NC tx          \NC number \NC $x$ offset      \NC \NR
\NC ty          \NC number \NC $y$ offset      \NC \NR
\stoptabulate

\subsubsection{outline}

\starttabulate[|l|l|p|]
\NC path \NC table \NC the list of knots \NC \NR
\NC pen \NC table \NC knots of the pen \NC \NR
\NC color \NC table \NC the object's color \NC \NR
\NC linejoin \NC number \NC line join style (bare number)\NC \NR
\NC miterlimit \NC number \NC miterlimit \NC \NR
\NC linecap \NC number \NC line cap style (bare number)\NC \NR
\NC dash \NC table \NC representation of a dash list\NC \NR
\NC prescript \NC string \NC the prescript text \NC \NR
\NC postscript \NC string \NC the postscript text \NC \NR
\stoptabulate

The entry \type{dash} is optional.

\subsubsection{text}

\starttabulate[|l|l|p|]
\NC text \NC string \NC the text \NC \NR
\NC font \NC string \NC font tfm name \NC \NR
\NC dsize \NC number \NC font size\NC \NR
\NC color \NC table \NC the object's color \NC \NR
\NC width \NC number \NC  \NC \NR
\NC height \NC number \NC  \NC \NR
\NC depth \NC number \NC  \NC \NR
\NC transform \NC table \NC a text transformation \NC \NR
\NC prescript \NC string \NC the prescript text \NC \NR
\NC postscript \NC string \NC the postscript text \NC \NR
\stoptabulate

\subsubsection{special}

\starttabulate[|l|l|p|]
\NC prescript \NC string \NC special text \NC \NR
\stoptabulate

\subsubsection{start_bounds, start_clip}

\starttabulate[|l|l|p|]
\NC path \NC table \NC the list of knots \NC \NR
\stoptabulate

\subsubsection{stop_bounds, stop_clip}

There are no fields available.

\subsection{Subsidiary table formats}

\subsubsection{Paths and pens}

Paths and pens (which are really just a special type of path as far as
\MPLIB\ is concerned) are represented by an array where each entry
is a table that represents a knot.

\starttabulate[|lT|l|p|]
\NC left_type   \NC string \NC when present: 'endpoint', but usually absent \NC \NR
\NC right_type  \NC string \NC like \type{left_type}\NC \NR
\NC x_coord             \NC number \NC X coordinate of this knot\NC \NR
\NC y_coord             \NC number \NC Y coordinate of this knot\NC \NR
\NC left_x              \NC number \NC X coordinate of the precontrol point of this knot\NC \NR
\NC left_y              \NC number \NC Y coordinate of the precontrol point of this knot\NC \NR
\NC right_x             \NC number \NC X coordinate of the postcontrol point of this knot\NC \NR
\NC right_y             \NC number \NC Y coordinate of the postcontrol point of this knot\NC \NR
\stoptabulate

There is one special case: pens that are (possibly transformed)
ellipses have an extra string-valued key \type{type} with value
\type{elliptical} besides the array part containing the knot list.

\subsubsection{Colors}

A color is an integer|-|indexed array with 0, 1, 3 or 4 values:

\starttabulate[|l|l|p|]
\NC 0  \NC marking only \NC no values                                                     \NC\NR
\NC 1  \NC greyscale    \NC one value in the range $(0,1)$, \quote {black} is $0$         \NC\NR
\NC 3  \NC \RGB         \NC three values in the range $(0,1)$, \quote {black} is $0,0,0$  \NC\NR
\NC 4  \NC \CMYK        \NC four values in the range $(0,1)$, \quote {black} is $0,0,0,1$ \NC\NR
\stoptabulate

If the color model of the internal object was \type{uninitialized}, then
it was initialized to the values representing \quote {black} in the colorspace
\type{defaultcolormodel} that was in effect at the time of the \type{shipout}.
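
Because the color model follows from the number of values alone, it can be
recovered with the length operator. The helper below is a sketch with
hypothetical names; it only relies on the table given above.

\starttyping
local model_names = {
  [0] = "marking only",
  [1] = "greyscale",
  [3] = "rgb",
  [4] = "cmyk",
}

-- color is the table found in the 'color' field of a graphical object
local function color_model(color)
  return model_names[#color] or "unknown"
end
\stoptyping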

\subsubsection{Transforms}

Each transform is a six-item array.

\starttabulate[|l|l|p|]
\NC 1 \NC number \NC represents x \NC\NR
\NC 2 \NC number \NC represents y \NC\NR
\NC 3 \NC number \NC represents xx \NC\NR
\NC 4 \NC number \NC represents yx \NC\NR
\NC 5 \NC number \NC represents xy \NC\NR
\NC 6 \NC number \NC represents yy \NC\NR
\stoptabulate

Note that the translation (index 1 and 2) comes first. This differs
from the ordering in  \POSTSCRIPT, where the translation comes last.
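
Reordering a transform for use in \POSTSCRIPT\ code can be sketched as follows.
The helper name is hypothetical, and the example assumes the usual
\POSTSCRIPT\ matrix layout, in which the four linear components come first and
the translation last.

\starttyping
-- t is a transform as found in the 'transform' field of a text object
local function to_postscript_matrix(t)
  -- MPlib order:      x, y, xx, yx, xy, yy
  -- PostScript order: xx, yx, xy, yy, x, y
  return { t[3], t[4], t[5], t[6], t[1], t[2] }
end
\stoptyping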

\subsubsection{Dashes}

Each \type{dash} is a two-item hash, using the same model as \POSTSCRIPT\
for the representation of the dashlist. \type{dashes} is an array of
\quote {on} and \quote {off} values, and \type{offset} is the phase of the pattern.

\starttabulate[|l|l|p|]
\NC dashes \NC hash   \NC an array of on-off numbers \NC\NR
\NC offset \NC number \NC the starting offset value \NC\NR
\stoptabulate
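
Since the model is the same as in \POSTSCRIPT, a dash table maps directly onto
a \type{setdash} operation. The following helper is a sketch with a
hypothetical name; it treats a missing \type{dashes} or \type{offset} entry as
empty and zero respectively, which is an assumption of this example.

\starttyping
-- dash is the table found in the 'dash' field of an outline object
local function setdash(dash)
  local pattern = table.concat(dash.dashes or {}, " ")
  return string.format("[%s] %s setdash", pattern, dash.offset or 0)
end
\stoptyping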

\subsection{Character size information}

These functions find the size of a glyph in a defined font. The
\type{fontname} is the same name as the argument to \type{infont};
the \type{char} is a glyph id in the range 0 to 255; the returned
\type{w} is in AFM units.

\subsubsection{\luatex{mp:char_width}}

\startfunctioncall
<number> w = mp:char_width(<string> fontname, <number> char)
\stopfunctioncall

\subsubsection{\luatex{mp:char_height}}

\startfunctioncall
<number> w = mp:char_height(<string> fontname, <number> char)
\stopfunctioncall

\subsubsection{\luatex{mp:char_depth}}

\startfunctioncall
<number> w = mp:char_depth(<string> fontname, <number> char)
\stopfunctioncall

\section{The \luatex{node} library}

The \luatex{node} library contains functions that facilitate dealing
with (lists of) nodes and their values. They allow you to create, alter,
copy, delete, and insert \LUATEX\ node objects, the core
objects within the typesetter.

\LUATEX\ nodes are represented in \LUA\ as userdata with
the metadata type \luatex{luatex.node}. The various parts within
a node can be accessed using named fields.

Each node has at least the three fields \type{next}, \type{id}, and
\type{subtype}:

\startitemize[intro]

\item The \type{next} field returns the userdata
object for the next node in a linked list of nodes, or
\type{nil}, if there is no next node.

\item The \type{id} indicates \TEX's \quote{node type}. The field \type{id}
has a numeric value for efficiency reasons, but some of the library
functions also accept a string value instead of \type{id}.

\item The \type{subtype} is another number. It often gives further information
about a node of a particular \type{id}, but it is most important when dealing
with \quote{whatsits}, because they are differentiated solely based on their
\type{subtype}.
\stopitemize

The other available fields depend on the \type{id} (and for \quote{whatsits}, the
\type{subtype}) of the node. Further details on the various fields and their
meanings are given in~\in{chapter}[nodes].

Support for \type{unset} (alignment) nodes is partial:
they can be queried and modified from \LUA\ code, but not created.

Nodes can be compared to each other, but you are actually comparing
indices into the node memory. This means that equality tests can only
be trusted under very limited conditions. It will not work correctly
in any situation where one of the two nodes has been freed and|/|or
reallocated: in that case, there will be false positives.

At the moment, memory management of nodes should still be done
explicitly by the user.  Nodes are not \quote{seen} by the \LUA\
garbage collector, so you have to call the node freeing functions
yourself when you are no longer in need of a node (list). Nodes form
linked lists without reference counting, so you have to be careful
that when control returns back to \LUATEX\ itself, you have not
deleted nodes that are still referenced from a \type{next} pointer
elsewhere, and that you did not create nodes that are referenced more
than once.

There are statistics available with regard to the allocated node memory,
which can be handy for tracing.

\subsection{Node handling functions}

\subsubsection{\luatex{node.is_node}}

\startfunctioncall
<boolean> t = node.is_node(<any> item)
\stopfunctioncall

This function returns true if the argument is a userdata object of
type \type{<node>}.

\subsubsection{\luatex{node.types}}

\startfunctioncall
<table> t = node.types()
\stopfunctioncall

This function returns an array that maps node id numbers to node type
strings, providing an overview of the possible top|-|level \type{id}
types.

\subsubsection{\luatex{node.whatsits}}

\startfunctioncall
<table> t = node.whatsits()
\stopfunctioncall

\TEX's \quote{whatsits} all have the same \type{id}. The various subtypes
are defined by their \type{subtype} fields. The function is much like
\luatex{node.types}, except that it provides an array of \type{subtype}
mappings.

\subsubsection{\luatex{node.id}}

\startfunctioncall
<number> id = node.id(<string> type)
\stopfunctioncall

This converts a single type name to its internal numeric
representation.

\subsubsection{\luatex{node.subtype}}

\startfunctioncall
<number> subtype = node.subtype(<string> type)
\stopfunctioncall

This converts a single whatsit name to its internal numeric
representation (\type{subtype}).

\subsubsection{\luatex{node.type}}

\startfunctioncall
<string> type = node.type(<any> n)
\stopfunctioncall

If the argument is a number, then this function converts an internal
numeric representation to an external string representation.
Otherwise, it will return the string \type{node} if the object
represents a node (this is new in 0.65), and \type{nil} otherwise.

\subsubsection{\luatex{node.fields}}

\startfunctioncall
<table> t = node.fields(<number> id)
<table> t = node.fields(<number> id, <number> subtype)
\stopfunctioncall

This function returns an array of valid field names for a particular
type of node. If you want to get the valid fields for a
\quote{whatsit}, you have to supply the second argument also. In other
cases, any given second argument will be silently ignored.

This function accepts string \type{id} and \type{subtype} values as
well.
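
For example, the fields of a glyph node can be listed as in the sketch below.
It makes no assumption about the index at which the returned array starts,
which is why \type{pairs} is used rather than \type{ipairs}.

\starttyping
for index, name in pairs(node.fields("glyph")) do
  texio.write_nl(index .. ": " .. name)
end
\stoptyping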

\subsubsection{\luatex{node.has_field}}

\startfunctioncall
<boolean> t = node.has_field(<node> n, <string> field)
\stopfunctioncall

This function returns a boolean that is only true if \type{n} is
actually a node, and it has the field.

\subsubsection{\luatex{node.new}}

\startfunctioncall
<node> n = node.new(<number> id)
<node> n = node.new(<number> id, <number> subtype)
\stopfunctioncall

Creates a new node. All of the new node's fields are initialized to
either zero or \type{nil} except for \type{id} and \type{subtype} (if
supplied). If you want to create a new whatsit, then the second
argument is required, otherwise it need not be present. As with all
node functions, this function creates a node on the \TEX\ level.

This function accepts string \type{id} and \type{subtype} values as
well.
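
The sketch below creates one ordinary node and one whatsit, using string
values for \type{id} and \type{subtype}. The field names \type{penalty} and
\type{data} are taken to be the ones these node types provide. Both nodes are
freed again because they are never handed over to \TEX.

\starttyping
local p = node.new("penalty")             -- no subtype needed
p.penalty = 10000

local w = node.new("whatsit", "special")  -- whatsits need a subtype
w.data = "example"

-- nodes that are not passed on to TeX must be freed explicitly
node.free(p)
node.free(w)
\stoptyping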

\subsubsection{\luatex{node.free}}

\startfunctioncall
node.free(<node> n)
\stopfunctioncall

Removes the node \type{n} from \TEX's memory. Be careful: no checks
are done on whether this node is still pointed to from a register or some
\type{next} field: it is up to you to make sure that the internal data
structures remain correct.

\subsubsection{\luatex{node.flush_list}}

\startfunctioncall
node.flush_list(<node> n)
\stopfunctioncall

Removes the node \type{n} and the complete node list following
\type{n} from \TEX's memory. Be careful: no checks are done on whether
any of these nodes is still pointed to from a register or some
\type{next} field: it is up to you to make sure that the internal data
structures remain correct.

\subsubsection{\luatex{node.copy}}

\startfunctioncall
<node> m = node.copy(<node> n)
\stopfunctioncall

Creates a deep copy of node \type{n}, including all nested lists as in
the case of a hlist or vlist node. Only the \type{next} field is not
copied.

\subsubsection{\luatex{node.copy_list}}

\startfunctioncall
<node> m = node.copy_list(<node> n)
<node> m = node.copy_list(<node> n, <node> m)
\stopfunctioncall

Creates a deep copy of the node list that starts at \type{n}. If
\type{m} is also given, the copy stops just before node \type{m}.

Note that you cannot copy attribute lists this way. Specialized functions for
dealing with attribute lists will be provided later but are not there yet.
However, there is normally no need to copy attribute lists: when you do
assignments to the \type{attr} field or make changes to specific attributes, the
needed copying and freeing takes place automatically.

\subsubsection{\luatex{node.next} (0.65)}

\startfunctioncall
<node> m = node.next(<node> n)
\stopfunctioncall

Returns the node following this node, or \type{nil} if there is no
such node.

\subsubsection{\luatex{node.prev} (0.65)}

\startfunctioncall
<node> m = node.prev(<node> n)
\stopfunctioncall

Returns the node preceding this node, or \type{nil} if there is no
such node.


\subsubsection{\luatex{node.current_attr} (0.66)}

\startfunctioncall
<node> m = node.current_attr()
\stopfunctioncall

Returns the currently active list of attributes, if there is one.

Note: this function is somewhat experimental, and it returns the {\it
 actual} attribute list, not a copy thereof. 
Therefore, changing any of the attributes in the list will change
these values for all nodes that have the current attribute list
assigned to them.


\subsubsection{\luatex{node.hpack}}

\startfunctioncall
<node> h, <number> b = node.hpack(<node> n)
<node> h, <number> b = node.hpack(<node> n, <number> w, <string> info)
<node> h, <number> b = node.hpack(<node> n, <number> w, <string> info, <string> dir)
\stopfunctioncall

This function creates a new hlist by packaging the list that begins at  node
\type{n} into a horizontal box. With only a single argument, this box
is created using the natural width of its components. In the three
argument form, \type{info} must be either \type{additional} or
\type{exactly}, and \type{w} is the additional (\tex{hbox spread})
or exact (\tex{hbox to}) width to be used.

Direction support added in \LUATEX\ 0.45.

The second return value is the badness of the generated box;
this extension was added in 0.51.

Caveat: at this moment, there can be unexpected side|-|effects to this
function, like updating some of the \tex{marks} and \tex{inserts}.
Also note that the content of \type{h} is the original node list
\type{n}: if you call \type{node.free(h)} you will also free the
node list itself, unless you explicitly set the \type{list} field
to \type{nil} beforehand. And in a similar way, calling
\type{node.free(n)} will invalidate \type{h} as well!
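
The memory caveat can be illustrated with the following sketch, in which
\type{head} is assumed to be the first node of an existing horizontal list.
The width is given in scaled points, so \type{20 * 65536} stands for 20pt.

\starttyping
local box, badness = node.hpack(head, 20 * 65536, "exactly")
texio.write_nl("badness: " .. badness)

-- discard the box node only, keeping the original list alive
box.list = nil
node.free(box)
\stoptyping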

\subsubsection{\luatex{node.vpack} (since 0.36)}

\startfunctioncall
<node> h, <number> b = node.vpack(<node> n)
<node> h, <number> b = node.vpack(<node> n, <number> w, <string> info)
<node> h, <number> b = node.vpack(<node> n, <number> w, <string> info, <string> dir)
\stopfunctioncall

This function creates a new vlist by packaging the list that begins at  node
\type{n} into a vertical box. With only a single argument, this box
is created using the natural height of its components. In the three
argument form, \type{info} must be either \type{additional} or
\type{exactly}, and \type{w} is the additional (\tex{vbox spread}) or exact (\tex{vbox to}) height to be used.

Direction support added in \LUATEX\ 0.45.

The second return value is the badness of the generated box;
this extension was added in 0.51.

See the description of \type{node.hpack()} for a few memory allocation
caveats.

\subsubsection{\luatex{node.dimensions} (0.43)}

\startfunctioncall
<number> w, <number> h, <number> d  = node.dimensions(<node> n)
<number> w, <number> h, <number> d  = node.dimensions(<node> n, <string> dir)
<number> w, <number> h, <number> d  = node.dimensions(<node> n, <node> t)
<number> w, <number> h, <number> d  = node.dimensions(<node> n, <node> t, <string> dir)
\stopfunctioncall

This function calculates the natural in-line dimensions of the node
list starting at node \type{n} and terminating just before node \type{t}
(or the end of the list, if there is no second argument). The return values are scaled
points. An alternative format that starts with glue parameters as the
first three arguments is also possible:

\startfunctioncall
<number> w, <number> h, <number> d  =
  node.dimensions(<number> glue_set, <number> glue_sign,
                 <number> glue_order, <node> n)
<number> w, <number> h, <number> d  =
  node.dimensions(<number> glue_set, <number> glue_sign,
                 <number> glue_order, <node> n, <string> dir)
<number> w, <number> h, <number> d  =
  node.dimensions(<number> glue_set, <number> glue_sign,
                 <number> glue_order, <node> n, <node> t)
<number> w, <number> h, <number> d  =
  node.dimensions(<number> glue_set, <number> glue_sign,
                 <number> glue_order, <node> n, <node> t, <string> dir)
\stopfunctioncall

This calling method takes glue settings into account and is especially
useful for finding the actual width of a sublist of nodes that are
already boxed, for example in code like this, which prints the
width of the space in between the \type{a} and \type{b} as it would
be if \type{\box0} was used as-is:

\starttyping
\setbox0 = \hbox to 20pt {a b}

\directlua{print (node.dimensions(tex.box[0].glue_set,
                                  tex.box[0].glue_sign,
                                  tex.box[0].glue_order,
                                  tex.box[0].head.next,
                                  node.tail(tex.box[0].head))) }
\stoptyping

Direction support added in \LUATEX\ 0.45.

\subsubsection{\luatex{node.mlist_to_hlist}}

\startfunctioncall
<node> h = node.mlist_to_hlist(<node> n,
             <string> display_type, <boolean> penalties)
\stopfunctioncall

This runs the internal mlist to hlist conversion, converting the math list in
\type{n} into the horizontal list \type{h}. The interface is exactly the same as
for the callback \type{mlist_to_hlist}.

\subsubsection{\luatex{node.slide}}

\startfunctioncall
<node> m = node.slide(<node> n)
\stopfunctioncall

Returns the last node of the node list that starts at \type{n}. As a
side|-|effect, it also creates a reverse chain of \type{prev} pointers
between nodes.
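
Once the \type{prev} pointers are in place, the list can be walked backwards,
as in this sketch. It assumes that \type{head} is the first node of a list.

\starttyping
local n = node.slide(head)   -- last node; prev pointers are now set
while n do
  texio.write_nl(node.type(n.id))
  n = node.prev(n)
end
\stoptyping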

\subsubsection{\luatex{node.tail}}

\startfunctioncall
<node> m = node.tail(<node> n)
\stopfunctioncall

Returns the last node of the node list that starts at \type{n}.


\subsubsection{\luatex{node.length}}

\startfunctioncall
<number> i = node.length(<node> n)
<number> i = node.length(<node> n, <node> m)
\stopfunctioncall

Returns the number of nodes contained in the node list that starts at
\type{n}. If \type{m} is also supplied it stops at \type{m} instead of
at the end of the list. The node \type{m} is not counted.

\subsubsection{\luatex{node.count}}

\startfunctioncall
<number> i = node.count(<number> id, <node> n)
<number> i = node.count(<number> id, <node> n, <node> m)
\stopfunctioncall

Returns the number of nodes contained in the node list that starts at
\type{n} that have a matching \type{id} field.
If \type{m} is also supplied, counting stops at \type{m} instead of at
the end of the list. The node \type{m} is not counted.

This function also accepts string \type{id}'s.

\subsubsection{\luatex{node.traverse}}

\startfunctioncall
<node> t = node.traverse(<node> n)
\stopfunctioncall

This is an iterator that loops over the node list that starts at \type{n}.

\subsubsection{\luatex{node.traverse_id}}

\startfunctioncall
<node> t = node.traverse_id(<number> id, <node> n)
\stopfunctioncall

This is an iterator that loops over all the nodes in the list that
starts at \type{n} that have a matching \type{id} field.
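
A typical use is a loop over all glyph nodes of a list, as sketched here. The
variable \type{head} is assumed to be the first node of that list, and
\type{font} and \type{char} are taken to be fields of glyph nodes.

\starttyping
local GLYPH = node.id("glyph")

for g in node.traverse_id(GLYPH, head) do
  texio.write_nl("font " .. g.font .. ", char " .. g.char)
end
\stoptyping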

\subsubsection{\luatex{node.remove}}

\startfunctioncall
<node> head, current = node.remove(<node> head, <node> current)
\stopfunctioncall

This function removes the node \type{current} from the list following
\type{head}. It is your responsibility to make sure it is really part
of that list. The return values are the new \type{head} and
\type{current} nodes. The returned \type{current} is the node
following the \type{current} in the calling argument (or \type{nil}, if
there is no such node), and is only passed back as a convenience. The
returned \type{head} is more important, because if the function is
called with \type{current} equal to \type{head}, it will be changed.
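
The two return values make it possible to delete nodes while looping over a
list. The following sketch removes and frees all kern nodes; the function name
is hypothetical.

\starttyping
local KERN = node.id("kern")

local function strip_kerns(head)
  local current = head
  while current do
    if current.id == KERN then
      local removed = current
      -- head may change when the first node is removed
      head, current = node.remove(head, current)
      node.free(removed)   -- removal does not free the node
    else
      current = current.next
    end
  end
  return head
end
\stoptyping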

\subsubsection{\luatex{node.insert_before}}

\startfunctioncall
<node> head, new = node.insert_before(<node> head, <node> current, <node> new)
\stopfunctioncall

This function inserts the node \type{new} before \type{current} into
the list following \type{head}. It is your responsibility to make sure
that \type{current} is really part of that list. The return values are
the (potentially mutated) \type{head} and the node \type{new}, set up to
be part of the list (with correct \type{next} field). If \type{head}
is initially \type{nil}, it will become \type{new}.

\subsubsection{\luatex{node.insert_after}}

\startfunctioncall
<node> head, new = node.insert_after(<node> head, <node> current, <node> new)
\stopfunctioncall

This function inserts the node \type{new} after \type{current} into
the list following \type{head}. It is your responsibility to make sure
that \type{current} is really part of that list. The return values are
the \type{head} and the node \type{new}, set up to be part of the list
(with correct \type{next} field). If \type{head} is initially
\type{nil}, it will become \type{new}.
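
A sketch of both insertion functions, assuming a non|-|empty list. The
helper name \type{wrap_list} and the penalty value are chosen for
illustration only:

\starttyping
local function wrap_list(head)
    local first = node.new('penalty')
    first.penalty = 10000
    -- inserting before the head node changes the head of the list
    head = node.insert_before(head, head, first)
    local last = node.new('penalty')
    last.penalty = 10000
    -- node.tail returns the last node of the list
    head = node.insert_after(head, node.tail(head), last)
    return head
end
\stoptyping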

\subsubsection{\luatex{node.first_glyph} (0.65)}

\startfunctioncall
<node> n = node.first_glyph(<node> n)
<node> n = node.first_glyph(<node> n, <node> m)
\stopfunctioncall

Returns the first node in the list starting at \type{n} that is a
glyph node with a subtype indicating it is a glyph, or \type{nil}.
If \type{m} is given, processing stops at (and includes) that node,
otherwise processing stops at the end of the list.

Note: this function used to be called \type{first_character}. It has
been renamed in \LUATEX\ 0.65, and the old name is deprecated now.

\subsubsection{\luatex{node.ligaturing}}

\startfunctioncall
<node> h, <node> t, <boolean> success = node.ligaturing(<node> n)
<node> h, <node> t, <boolean> success = node.ligaturing(<node> n, <node> m)
\stopfunctioncall

Apply \TEX|-|style ligaturing to the specified nodelist. The tail node
\type{m} is optional. The two returned nodes \type{h} and \type{t} are
the new head and tail (both \type{n} and \type{m} can change into
a new ligature).

\subsubsection{\luatex{node.kerning}}

\startfunctioncall
<node> h, <node> t, <boolean> success = node.kerning(<node> n)
<node> h, <node> t, <boolean> success = node.kerning(<node> n, <node> m)
\stopfunctioncall

Apply \TEX|-|style kerning to the specified nodelist. The tail node
\type{m} is optional. The two returned nodes \type{h} and \type{t} are
the head and tail (either one of these can be an inserted kern node,
because special kernings with word boundaries are possible).
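
Because the head of the list can change, a caller should continue with
the returned head. A minimal sketch that applies ligaturing first and
kerning afterwards (the helper name is hypothetical):

\starttyping
local function lig_and_kern(head)
    -- each call returns the new head, the new tail, and a success flag
    local h, t, ok = node.ligaturing(head)
    h, t, ok = node.kerning(h)
    return h
end
\stoptyping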

\subsubsection{\luatex{node.unprotect_glyphs}}

\startfunctioncall
node.unprotect_glyphs(<node> n)
\stopfunctioncall

Subtracts 256 from all glyph node subtypes. This and the next
function are helpers to convert from \type{characters} to
\type{glyphs} during node processing.

\subsubsection{\luatex{node.protect_glyphs}}

\startfunctioncall
node.protect_glyphs(<node> n)
\stopfunctioncall

Adds 256 to all glyph node subtypes in the node list starting at
\type{n}, except that if the value is 1, it adds only 255. The special
handling of 1 means that \type{characters} will become \type{glyphs}
after subtraction of 256.

\subsubsection{\luatex{node.last_node}}

\startfunctioncall
<node> n = node.last_node()
\stopfunctioncall

This function pops the last node from \TEX's \quote{current list}.
It returns that node, or \type{nil} if the current list is empty.

\subsubsection{\luatex{node.write}}

\startfunctioncall
node.write(<node> n)
\stopfunctioncall

This is an experimental function that will append a node list to
\TEX's \quote {current list} (the node list is not deep-copied
any more since version 0.38).  There is no error checking yet!
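
A minimal sketch that appends a freshly created rule node to the
current list. The dimensions are arbitrary and given in scaled points:

\starttyping
local r = node.new('rule')
r.width  = 10 * 65536   -- 10pt
r.height =  5 * 65536   --  5pt
r.depth  =  0
-- the node itself is handed over, so do not free or reuse it afterwards
node.write(r)
\stoptyping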

\subsubsection{\luatex{node.protrusion_skippable} (0.60.1)}
\startfunctioncall
<boolean> skippable = node.protrusion_skippable(<node> n)
\stopfunctioncall

Returns \type{true} if, for the purpose of line boundary discovery
when character protrusion is active, this node can be skipped.

\subsection{Attribute handling}

Attributes appear as a linked list of userdata objects in the
\type{attr} field of individual nodes. They can be handled
individually, but it is much safer and more efficient to use the
dedicated functions associated with them.

\subsubsection{\luatex{node.has_attribute}}

\startfunctioncall
<number> v = node.has_attribute(<node> n, <number> id)
<number> v = node.has_attribute(<node> n, <number> id, <number> val)
\stopfunctioncall

Tests if a node has the attribute with number \type{id} set. If
\type{val} is also supplied, also tests if the value matches \type{val}.
It returns the value, or, if no match is found, \type{nil}.

\subsubsection{\luatex{node.set_attribute}}

\startfunctioncall
node.set_attribute(<node> n, <number> id, <number> val)
\stopfunctioncall

Sets the attribute with number \type{id} to the value
\type{val}. Duplicate assignments are ignored. {\em [needs explanation]}

\subsubsection{\luatex{node.unset_attribute}}

\startfunctioncall
<number> v = node.unset_attribute(<node> n, <number> id)
<number> v = node.unset_attribute(<node> n, <number> id, <number> val)
\stopfunctioncall

Unsets the attribute with number \type{id}. If \type{val} is also supplied,
it will only perform this operation if the value matches \type{val}.
Missing attributes or attribute|-|value pairs are ignored.

If the attribute was actually deleted, returns its old
value. Otherwise, returns \type{nil}.
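
The three functions can be combined as in the following sketch. The
attribute number \type{100} is an assumption, and the comments state
what the descriptions above lead one to expect:

\starttyping
local attr = 100                            -- hypothetical attribute number
local n = node.new('glyph')

node.set_attribute(n, attr, 3)
local v = node.has_attribute(n, attr)       -- the value that was set
local w = node.has_attribute(n, attr, 4)    -- nil, the value does not match
local old = node.unset_attribute(n, attr)   -- the old value

node.free(n)
\stoptyping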

\section{The \luatex{pdf} library}

This contains variables and functions that are related to the \PDF\ backend.

%***********************************************************************

\subsection{\luatex{pdf.mapfile}, \luatex{pdf.mapline} (new in 0.53.0)}

\startfunctioncall
pdf.mapfile(<string> map file)
pdf.mapline(<string> map line)
\stopfunctioncall

These two functions can be used to replace primitives \type{\pdfmapfile}
and \type{\pdfmapline} from \PDFTEX. They expect a string as only parameter
and have no return value.

The functions also replace the former variables
\luatex{pdf.pdfmapfile} and \luatex{pdf.pdfmapline}.
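
A sketch of both calls. The map file name is hypothetical, and the map
line follows the syntax that \PDFTEX\ uses for \type{\pdfmapline}:

\starttyping
-- the argument strings have the same form as for the primitives
pdf.mapfile('+mymap.map')
pdf.mapline('+ptmri8r Times-Italic <8r.enc')
\stoptyping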

%***********************************************************************

\subsection{\luatex{pdf.catalog}, \luatex{pdf.info},
    \luatex{pdf.names}, \luatex{pdf.trailer} (new in 0.53.0)}

These variables offer a read-write interface to the corresponding
\PDFTEX\ token lists. The value types are strings.

The corresponding \quote{\type{pdf}} parameter names
\luatex{pdf.pdfcatalog}, \luatex{pdf.pdfinfo}, \luatex{pdf.pdfnames},
and \luatex{pdf.pdftrailer} (all new in 0.47.0)
still work, but are obsolescent (since 0.53.0).

Note: this interface will almost certainly change in the future.

%***********************************************************************

\subsection{\luatex{pdf.pageattributes}, \luatex{pdf.pageresources},
    \luatex{pdf.pagesattributes} (new in 0.53.0)}

These variables offer a read-write interface to related
token lists. The value types are strings. The variables have no
interaction with the corresponding \PDFTEX\ token registers
\tex{pdfpageattr}, \tex{pdfpageresources}, and \tex{pdfpagesattr},
but they are written out to the \PDF\ file directly after
the \PDFTEX\ token registers.

%***********************************************************************

\subsection{\luatex{pdf.h}, \luatex{pdf.v}}

These are the \type{h} and \type{v} values
that define the current location on the output page,
measured from its lower left corner.
The values can be queried % and set
using scaled points as units.

%\starttyping
%pdf.h
%pdf.v
%\stoptyping

Note: this interface will almost certainly change in the future.

% not implemented yet:
% \subsection{\luatex{pdf.seth()}, \luatex{pdf.setv()}}
%
% The function calls for position setting,
% associated with \type{pdf.h} and \type{pdf.v} are
%
% \startfunctioncall
% pdf.seth(<number> n)
% <number> n = pdf.h
% pdf.setv(<number> n)
% <number> n = pdf.v
% \stopfunctioncall

\subsection{\luatex{pdf.print}}

A print function to write stuff to the \PDF\ document
that can be used from within a \tex{latelua} argument.
This function is not to be used inside \tex{directlua}
unless you know {\it exactly} what you are doing.

\startfunctioncall
pdf.print(<string> s)
pdf.print(<string> type, <string> s)
\stopfunctioncall

The optional parameter can be used to mimic the behavior of
\tex{pdfliteral}: the \type{type} is \type{direct} or \type{page}.
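
A minimal sketch, assuming that the function \type{mark} (a hypothetical
name) is called from within \tex{latelua}. It writes a small filled
square using plain \PDF\ operators:

\starttyping
local function mark()
    -- q/Q save and restore the graphics state around the drawing
    pdf.print('direct', ' q 1 0 0 rg 0 0 10 10 re f Q ')
end
\stoptyping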

\subsection{\luatex{pdf.immediateobj}}

This function creates a \PDF\ object
and immediately writes it to the \PDF\ file.
It is modelled after \PDFTEX's \tex{immediate}\tex{pdfobj} primitives.
All function variants return the object number
of the newly generated object.

\startfunctioncall
<number> n = pdf.immediateobj(<string> objtext)
<number> n = pdf.immediateobj("file", <string> filename)
<number> n = pdf.immediateobj("stream", <string> streamtext, <string> attrtext)
<number> n = pdf.immediateobj("streamfile", <string> filename, <string> attrtext)
\stopfunctioncall

The first version puts the \type{objtext} raw into an object.
Only the object wrapper is automatically generated,
but any internal structure (like \type{<< >>} dictionary markers)
needs to be provided by the user.
The second version with keyword \type{"file"} as 1st argument
puts the contents of the file with name \type{filename} raw into the object.
The third version with keyword \type{"stream"} creates a stream object
and puts the \type{streamtext} raw into the stream.
The stream length is automatically calculated.
The optional \type{attrtext} goes into the dictionary of that object.
The fourth version with keyword \type{"streamfile"} does the same as the 3rd one,
except that it reads the stream data raw from a file.

An optional first argument can be given to make the function use a
previously reserved \PDF\ object.

\startfunctioncall
<number> n = pdf.immediateobj(<integer> n, <string> objtext)
<number> n = pdf.immediateobj(<integer> n, "file", <string> filename)
<number> n = pdf.immediateobj(<integer> n, "stream", <string> streamtext, <string> attrtext)
<number> n = pdf.immediateobj(<integer> n, "streamfile", <string> filename, <string> attrtext)
\stopfunctioncall
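
The calling variants can be sketched as follows. The dictionary keys and
the stream data are made up for illustration and have no meaning in \PDF:

\starttyping
-- raw object, the dictionary markers are supplied by the caller
local n = pdf.immediateobj('<< /Example true >>')

-- stream object with an extra entry in its dictionary
local s = pdf.immediateobj('stream', 'some stream data', '/Subtype /Example')

-- fill an object number that was reserved earlier
local r = pdf.reserveobj()
pdf.immediateobj(r, '<< /Example false >>')
\stoptyping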

%***********************************************************************

\subsection{\luatex{pdf.obj}}

This function creates a \PDF\ object,
which is written to the \PDF\ file only when referenced,
e.\,g., by \luatex{pdf.refobj()}.

All function variants return the object number of the newly generated
object, and there are two separate calling modes.

The first mode is modelled after \PDFTEX's \tex{pdfobj} primitive.

\startfunctioncall
<number> n = pdf.obj(<string> objtext)
<number> n = pdf.obj("file", <string> filename)
<number> n = pdf.obj("stream", <string> streamtext, <string> attrtext)
<number> n = pdf.obj("streamfile", <string> filename, <string> attrtext)
\stopfunctioncall

An optional first argument can be given to make the function use a
previously reserved \PDF\ object.

\startfunctioncall
<number> n = pdf.obj(<integer> n, <string> objtext)
<number> n = pdf.obj(<integer> n, "file", <string> filename)
<number> n = pdf.obj(<integer> n, "stream", <string> streamtext, <string> attrtext)
<number> n = pdf.obj(<integer> n, "streamfile", <string> filename, <string> attrtext)
\stopfunctioncall

The second mode accepts a single argument table with key--value pairs.

\startfunctioncall
<number> n = pdf.obj{ type = <string>,
                      immediate = <boolean>,
                      objnum = <number>,
                      attr = <string>,
                      compresslevel = <number>,
                      objcompression = <boolean>,
                      file = <string>,
                      string = <string>}
\stopfunctioncall

The \type{type} field is required and can have the values \type{raw}
and \type{stream}; the other fields are optional
(within constraints).

Note: this mode makes \type{pdf.obj} look more flexible than it
actually is: the constraints from the separate parameter version
still apply, so for example you can't have both \type{string} and
\type{file} at the same time.
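
A sketch of the table mode, using only a few of the keys. The stream
data and the attribute text are made up for illustration:

\starttyping
local n = pdf.obj {
    type   = 'stream',
    string = 'some stream data',
    attr   = '/Subtype /Example',
}
-- the object is only written out once it is referenced
pdf.refobj(n)
\stoptyping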

%***********************************************************************

\subsection{\luatex{pdf.refobj}}

This function,
the \LUA\ version of the \tex{pdfrefobj} primitive,
references an object by its object number,
so that the object will be written out.

\startfunctioncall
pdf.refobj(<integer> n)
\stopfunctioncall

This function works in both the \tex{directlua} and \tex{latelua} environments.
Inside \tex{directlua} a new whatsit node
\quote{pdf_refobj} is created, which will be marked for flushing during
page output and the object is then written directly after the page,
when also the resources objects are written out.
Inside \tex{latelua} the object will be marked for flushing.

This function has no return values.

%***********************************************************************

\subsection{\luatex{pdf.reserveobj}}

This function creates an empty \PDF\ object and returns its number.

\startfunctioncall
<number> n = pdf.reserveobj()
<number> n = pdf.reserveobj("annot")
\stopfunctioncall

\subsection{\luatex{pdf.registerannot} (new in 0.47.0)}

This function adds an object number to the \type{/Annots} array for the
current page without doing anything else. This function can only be
used from within \type{\latelua}.

\startfunctioncall
pdf.registerannot (<number> objnum)
\stopfunctioncall


\section{The \luatex{status} library}

This contains a number of run|-|time configuration items that
you may find useful in message reporting, as well as a
function that returns all of the names and values as a table.

\startfunctioncall
<table> info = status.list()
\stopfunctioncall

The keys in the table are the known items, and the values are their
current values. Almost all of the values in \type{status} are
fetched through a metatable at run|-|time whenever they are
accessed, so you cannot use \type{pairs} on \type{status}, but you
{\it can\/} use \type{pairs} on \type{info}, of course. If you do
not need the full list, you can also ask for a single item by
using its name as an index into \type{status}.

The current list is:

\starttabulate[|lT|p|]
\NC \ssbf key    \NC  \bf explanation \NC\NR
\NC pdf_gone\NC                  written \PDF\ bytes      \NC \NR
\NC pdf_ptr\NC                   not yet written \PDF\ bytes      \NC \NR
\NC dvi_gone\NC                  written \DVI\ bytes      \NC \NR
\NC dvi_ptr\NC                   not yet written \DVI\ bytes      \NC \NR
\NC total_pages\NC               number of written pages      \NC \NR
\NC output_file_name\NC          name of the \PDF\ or \DVI\ file      \NC \NR
\NC log_name\NC                  name of the log file      \NC \NR
\NC banner\NC                    terminal display banner      \NC \NR
\NC var_used\NC                  variable (one|-|word) memory in use \NC \NR
\NC dyn_used\NC                  token (multi|-|word) memory in use  \NC \NR
\NC str_ptr\NC                   number of strings      \NC \NR
\NC init_str_ptr\NC              number of \INITEX\ strings      \NC \NR
\NC max_strings\NC               maximum allowed strings      \NC \NR
\NC pool_ptr\NC                  string pool index      \NC \NR
\NC init_pool_ptr\NC             \INITEX\ string pool index      \NC \NR
\NC pool_size\NC                 current size allocated for string characters \NC \NR
\NC node_mem_usage\NC            a string giving insight into currently used nodes\NC\NR
\NC var_mem_max\NC               number of allocated words for nodes\NC \NR
\NC fix_mem_max\NC               number of allocated words for tokens\NC \NR
\NC fix_mem_end\NC               maximum number of used tokens\NC \NR
\NC cs_count\NC                  number of control sequences      \NC \NR
\NC hash_size\NC                 size of hash       \NC \NR
\NC hash_extra\NC                extra allowed hash  \NC \NR
\NC font_ptr\NC                  number of active fonts      \NC \NR
\NC max_in_stack\NC              max used input stack entries      \NC \NR
\NC max_nest_stack\NC            max used nesting stack entries     \NC \NR
\NC max_param_stack\NC           max used parameter stack entries     \NC \NR
\NC max_buf_stack\NC             max used buffer position      \NC \NR
\NC max_save_stack\NC            max used save stack entries      \NC \NR
\NC stack_size\NC                input stack size      \NC \NR
\NC nest_size\NC                 nesting stack size      \NC \NR
\NC param_size\NC                parameter stack size      \NC \NR
\NC buf_size\NC                  current allocated size of the line buffer \NC \NR
\NC save_size\NC                 save stack size      \NC \NR
\NC obj_ptr\NC                   max \PDF\ object pointer      \NC \NR
\NC obj_tab_size\NC              \PDF\ object table size      \NC \NR
\NC pdf_os_cntr\NC               max \PDF\ object stream pointer      \NC \NR
\NC pdf_os_objidx\NC             \PDF\ object stream index \NC \NR
\NC pdf_dest_names_ptr\NC        max \PDF\ destination pointer       \NC \NR
\NC dest_names_size\NC           \PDF\ destination table size      \NC \NR
\NC pdf_mem_ptr\NC               max \PDF\ memory used      \NC \NR
\NC pdf_mem_size\NC              \PDF\ memory size      \NC \NR
\NC largest_used_mark\NC         max referenced marks class        \NC \NR
\NC filename\NC                  name of the current input file    \NC \NR
\NC inputid\NC                   numeric id of the current input    \NC \NR
\NC linenumber\NC                location in the current input file\NC \NR
\NC lasterrorstring\NC           last error string\NC \NR
\NC luabytecodes\NC              number of active \LUA\ bytecode registers\NC \NR
\NC luabytecode_bytes\NC         number of bytes in \LUA\ bytecode registers\NC \NR
\NC luastate_bytes\NC            number of bytes in use by \LUA\ interpreters\NC \NR
\NC output_active\NC             \type{true} if the \tex{output} routine is active\NC \NR
\NC callbacks\NC                 total number of executed callbacks so far\NC \NR
\NC indirect_callbacks\NC        number of those that were themselves
                                 a result of other callbacks (e.g. file readers)\NC \NR
\NC luatex_svn\NC                the luatex repository id  (added in 0.51)\NC\NR
\NC luatex_version\NC            the luatex version number (added in 0.38)\NC\NR
\NC luatex_revision\NC           the luatex revision string (added in 0.38)\NC\NR
\NC ini_version\NC               \type{true} if this is an \INITEX\ run (added in 0.38)\NC\NR
\stoptabulate
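
Both ways of access can be sketched as follows. The output format is
chosen for illustration only:

\starttyping
-- pairs works on the table returned by status.list()
local info = status.list()
for name, value in pairs(info) do
    texio.write_nl(name .. ' = ' .. tostring(value))
end

-- a single item is fetched by indexing status directly
texio.write_nl('log file: ' .. tostring(status.log_name))
\stoptyping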


\section{The \luatex{tex} library}

The \luatex{tex} table contains a large list of virtual internal \TEX\
parameters that are partially writable.

The designation \quote{virtual} means that these items are not properly
defined in \LUA, but are only front\-ends that are handled by a metatable
that operates on the actual \TEX\ values. As a result, most of the \LUA\
table operators (like \type{pairs} and \type{#}) do not work on such
items.

At the moment, it is possible to access almost every parameter
that has these characteristics:

\startitemize[packed]
\item You can use it after \tex{the}
\item It is a single token.
\item Some special others, see the list below
\stopitemize

This excludes parameters that need extra arguments, like
\tex{the}\tex{scriptfont}.

The subset comprising simple integer and dimension registers is
writable as well as readable (for example \tex{tracingcommands} and
\tex{parindent}).

\subsection{Internal parameter values}

For all the parameters in this section, it is possible to access them
directly using their names as index in the \type{tex} table, or by
using one of the functions \type{tex.get()} and \type{tex.set()}.

The exact parameters and return values differ depending on the actual
parameter, and so does whether \type{tex.set} has any effect. For the
parameters that {\it can\/} be set, it is possible to use
\type{'global'} as the first argument to \type{tex.set}; this makes
the  assignment global instead of local.

\startfunctioncall
tex.set (<string> n, ...)
tex.set ('global', <string> n, ...)
... = tex.get (<string> n)
\stopfunctioncall
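
A short sketch of the function interface next to direct indexing. The
values are arbitrary:

\starttyping
tex.set('tolerance', 500)               -- local assignment
tex.set('global', 'pretolerance', 100)  -- global assignment
local t = tex.get('tolerance')
-- direct access by name is equivalent to a local tex.set
tex.tolerance = t + 100
\stoptyping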

\subsubsection{Integer parameters}

The integer parameters accept and return \LUA\ numbers.

Read-write:

\startcolumns[n=2]
\starttyping
tex.adjdemerits
tex.binoppenalty
tex.brokenpenalty
tex.catcodetable
tex.clubpenalty
tex.day
tex.defaulthyphenchar
tex.defaultskewchar
tex.delimiterfactor
tex.displaywidowpenalty
tex.doublehyphendemerits
tex.endlinechar
tex.errorcontextlines
tex.escapechar
tex.exhyphenpenalty
tex.fam
tex.finalhyphendemerits
tex.floatingpenalty
tex.globaldefs
tex.hangafter
tex.hbadness
tex.holdinginserts
tex.hyphenpenalty
tex.interlinepenalty
tex.language
tex.lastlinefit
tex.lefthyphenmin
tex.linepenalty
tex.localbrokenpenalty
tex.localinterlinepenalty
tex.looseness
tex.mag
tex.maxdeadcycles
tex.month
tex.newlinechar
tex.outputpenalty
tex.pausing
tex.pdfadjustspacing
tex.pdfcompresslevel
tex.pdfdecimaldigits
tex.pdfgamma
tex.pdfgentounicode
tex.pdfimageapplygamma
tex.pdfimagegamma
tex.pdfimagehicolor
tex.pdfimageresolution
tex.pdfinclusionerrorlevel
tex.pdfminorversion
tex.pdfobjcompresslevel
tex.pdfoutput
tex.pdfpagebox
tex.pdfpkresolution
tex.pdfprotrudechars
tex.pdftracingfonts
tex.pdfuniqueresname
tex.postdisplaypenalty
tex.predisplaydirection
tex.predisplaypenalty
tex.pretolerance
tex.relpenalty
tex.righthyphenmin
tex.savinghyphcodes
tex.savingvdiscards
tex.showboxbreadth
tex.showboxdepth
tex.time
tex.tolerance
tex.tracingassigns
tex.tracingcommands
tex.tracinggroups
tex.tracingifs
tex.tracinglostchars
tex.tracingmacros
tex.tracingnesting
tex.tracingonline
tex.tracingoutput
tex.tracingpages
tex.tracingparagraphs
tex.tracingrestores
tex.tracingscantokens
tex.tracingstats
tex.uchyph
tex.vbadness
tex.widowpenalty
tex.year
\stoptyping
\stopcolumns

Read|-|only:

\startcolumns[n=3]
\starttyping
tex.deadcycles
tex.insertpenalties
tex.parshape
tex.prevgraf
tex.spacefactor
\stoptyping
\stopcolumns

\subsubsection{Dimension parameters}

The dimension parameters accept \LUA\ numbers (signifying scaled points)
or strings (with included dimension). The result is always a number in
scaled points.

Read|-|write:

\startcolumns[n=3]
\starttyping
tex.boxmaxdepth
tex.delimitershortfall
tex.displayindent
tex.displaywidth
tex.emergencystretch
tex.hangindent
tex.hfuzz
tex.hoffset
tex.hsize
tex.lineskiplimit
tex.mathsurround
tex.maxdepth
tex.nulldelimiterspace
tex.overfullrule
tex.pagebottomoffset
tex.pageheight
tex.pageleftoffset
tex.pagerightoffset
tex.pagetopoffset
tex.pagewidth
tex.parindent
tex.pdfdestmargin
tex.pdfeachlinedepth
tex.pdfeachlineheight
tex.pdffirstlineheight
tex.pdfhorigin
tex.pdflastlinedepth
tex.pdflinkmargin
tex.pdfpageheight
tex.pdfpagewidth
tex.pdfpxdimen
tex.pdfthreadmargin
tex.pdfvorigin
tex.predisplaysize
tex.scriptspace
tex.splitmaxdepth
tex.vfuzz
tex.voffset
tex.vsize
\stoptyping
\stopcolumns

Read|-|only:

\startcolumns[n=3]
\starttyping
tex.pagedepth
tex.pagefilllstretch
tex.pagefillstretch
tex.pagefilstretch
tex.pagegoal
tex.pageshrink
tex.pagestretch
tex.pagetotal
tex.prevdepth
\stoptyping
\stopcolumns
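
As a sketch of the two accepted value types for the dimension
parameters, with arbitrary values:

\starttyping
tex.hsize = '15cm'              -- string with an included dimension
tex.parindent = 20 * 65536      -- number in scaled points (20pt)
-- reading always gives a number in scaled points
texio.write_nl('hsize in sp: ' .. tex.hsize)
\stoptyping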

\subsubsection{Direction parameters}

The direction parameters are read|-|only and return a \LUA\ string.

\startcolumns[n=3]
\starttyping
tex.bodydir
tex.mathdir
tex.pagedir
tex.pardir
tex.textdir
\stoptyping
\stopcolumns

\subsubsection{Glue parameters}

The glue parameters accept and return a userdata object that
represents a \type{glue_spec} node.

\startcolumns[n=3]
\starttyping
tex.abovedisplayshortskip
tex.abovedisplayskip
tex.baselineskip
tex.belowdisplayshortskip
tex.belowdisplayskip
tex.leftskip
tex.lineskip
tex.parfillskip
tex.parskip
tex.rightskip
tex.spaceskip
tex.splittopskip
tex.tabskip
tex.topskip
tex.xspaceskip
\stoptyping
\stopcolumns

\subsubsection{Muglue parameters}

All muglue parameters are to be used read|-|only and return a \LUA\ string.

\startcolumns[n=3]
\starttyping
tex.medmuskip
tex.thickmuskip
tex.thinmuskip
\stoptyping
\stopcolumns

\subsubsection{Tokenlist parameters}

The tokenlist parameters accept and return \LUA\ strings. \LUA\ strings are
converted to and from token lists using \tex{the}\tex{toks} style
expansion: all category codes are either space (10) or other (12).
It follows that assigning to some of these, like \quote{tex.output},
is actually useless, but it feels bad to make exceptions in view
of a coming extension that will accept full-blown token strings.

\startcolumns[n=3]
\starttyping
tex.errhelp
tex.everycr
tex.everydisplay
tex.everyeof
tex.everyhbox
tex.everyjob
tex.everymath
tex.everypar
tex.everyvbox
tex.output
tex.pdfpageattr
tex.pdfpageresources
tex.pdfpagesattr
tex.pdfpkmode
\stoptyping
\stopcolumns


\subsection{Convert commands}

All \quote{convert} commands are read|-|only and return a \LUA\ string.
The supported commands at this moment are:

\startcolumns[n=2]
\starttyping
tex.AlephVersion
tex.Alephrevision
tex.OmegaVersion
tex.Omegarevision
tex.eTeXVersion
tex.eTeXrevision
tex.formatname
tex.jobname
tex.luatexrevision
tex.luatexdatestamp
tex.pdfnormaldeviate
tex.pdftexbanner
tex.pdftexrevision
tex.fontname(number)
tex.pdffontname(number)
tex.pdffontobjnum(number)
tex.pdffontsize(number)
tex.uniformdeviate(number)
tex.number(number)
tex.romannumeral(number)
tex.pdfpageref(number)
tex.pdfxformname(number)
tex.fontidentifier(number)
\stoptyping
\stopcolumns

If you are wondering why this list looks haphazard: these are all the
cases of the \quote{convert} internal command that do not require an
argument, as well as the ones that require only a simple numeric
value.

The special (lua-only) case of \type{tex.fontidentifier} returns the
\type{csname} string that matches a font id number (if there is one).
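
A sketch of a few of these commands. It assumes that
\type{font.current()} returns the id of the current font:

\starttyping
texio.write_nl('job: ' .. tex.jobname)
-- commands that need a numeric value are called as functions
texio.write_nl(tex.romannumeral(2012))
texio.write_nl(tostring(tex.fontidentifier(font.current())))
\stoptyping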

\subsection{Last item commands}

All \quote{last item} commands are read|-|only and return a number.

The supported commands at this moment are:

\startcolumns[n=3]
\starttyping
tex.lastpenalty
tex.lastkern
tex.lastskip
tex.lastnodetype
tex.inputlineno
tex.pdftexversion
tex.pdflastobj
tex.pdflastxform
tex.pdflastximage
tex.pdflastximagepages
tex.pdflastannot
tex.pdflastxpos
tex.pdflastypos
tex.pdfrandomseed
tex.pdflastlink
tex.luatexversion
tex.Alephversion
tex.Omegaversion
tex.Alephminorversion
tex.Omegaminorversion
tex.eTeXminorversion
tex.eTeXversion
tex.currentgrouplevel
tex.currentgrouptype
tex.currentiflevel
tex.currentiftype
tex.currentifbranch
tex.pdflastximagecolordepth
\stoptyping
\stopcolumns

\subsection{Attribute, count, dimension, skip and token registers}

\TEX's attributes (\tex{attribute}), counters (\tex{count}),
dimensions (\tex{dimen}), skips (\tex{skip}) and token (\tex{toks})
registers can be accessed and written to using two times five virtual
sub|-|tables of the \luatex{tex} table:

\startcolumns[n=3]
\starttyping
tex.attribute
tex.count
tex.dimen
tex.skip
tex.toks
\stoptyping
\stopcolumns

It is possible to use the names of relevant \tex{attributedef}, \tex{countdef},
\tex{dimendef}, \tex{skipdef},  or \tex{toksdef} control sequences as indices
to these tables:

\starttyping
tex.count.scratchcounter = 0
enormous = tex.dimen['maxdimen']
\stoptyping

In this case, \LUATEX\ looks up the value for you on the fly. You have
to use a valid \tex{countdef} (or \tex{attributedef}, or
\tex{dimendef}, or \tex{skipdef}, or \tex{toksdef}), anything else
will generate an error (the intent is to eventually also allow
\type{<chardef tokens>} and even macros that expand into a number).

The attribute and count registers accept and return \LUA\ numbers.

The dimension registers accept \LUA\ numbers (in scaled points) or
strings (with an included absolute dimension; \type {em} and \type {ex} and \type {px}
are forbidden). The result is always a number in scaled points.

The token registers accept and return \LUA\ strings. \LUA\ strings are
converted to and from token lists using \tex{the}\tex{toks} style
expansion: all category codes are either space (10) or other (12).

The skip registers accept and return \type{glue_spec} userdata node
objects (see the description of the node interface elsewhere in this
manual).

As an alternative to array addressing, there are also accessor
functions defined for all cases, for example, here is the set
of possibilities for \type{\skip} registers:

\startfunctioncall
tex.setskip (<number> n, <node> s)
tex.setskip (<string> s, <node> s)
tex.setskip ('global',<number> n, <node> s)
tex.setskip ('global',<string> s, <node> s)
<node> s = tex.getskip (<number> n)
<node> s = tex.getskip (<string> s)
\stopfunctioncall

In the function-based interface, it is possible to define values
globally by using the string \type{'global'} as the first function argument.
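
A sketch that mixes the function interface with array access. The
register numbers and values are arbitrary:

\starttyping
tex.setcount(0, 42)
tex.setcount('global', 1, tex.getcount(0) + 1)
tex.dimen[0] = '1in'    -- stored and returned in scaled points
tex.toks[0]  = 'hello'
texio.write_nl(tex.count[1] .. ' ' .. tex.dimen[0] .. ' ' .. tex.toks[0])
\stoptyping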

\subsection{Character code registers (0.63)}

\TEX's character code tables (\tex{lccode}, \tex{uccode},
\tex{sfcode}, \tex{catcode}, \tex{mathcode}, \tex{delcode}) can be
accessed and written to using six virtual subtables of the \type{tex}
table:

\startcolumns[n=3]
\starttyping
tex.lccode
tex.uccode
tex.sfcode
tex.catcode
tex.mathcode
tex.delcode
\stoptyping
\stopcolumns

The function call interfaces are roughly as above, but there are a few twists.
\type{sfcode}s are the simple ones:

\startfunctioncall
tex.setsfcode (<number> n, <number> s)
tex.setsfcode ('global', <number> n, <number> s)
<number> s = tex.getsfcode (<number> n)
\stopfunctioncall

The function call interface for \type{lccode} and \type{uccode} additionally allows you to set the associated sibling at the same time:

\startfunctioncall
tex.setlccode (['global'], <number> n, <number> lc)
tex.setlccode (['global'], <number> n, <number> lc, <number> uc)
<number> lc = tex.getlccode (<number> n)
tex.setuccode (['global'], <number> n, <number> uc)
tex.setuccode (['global'], <number> n, <number> uc, <number> lc)
<number> uc = tex.getuccode (<number> n)
\stopfunctioncall

The function call interface for \type{catcode} also allows you to
specify a category table to use on assignment or on query (default in
both cases is the current one):

\startfunctioncall
tex.setcatcode (['global'], <number> n, <number> c)
tex.setcatcode (['global'], <number> cattable, <number> n, <number> c)
<number> c = tex.getcatcode (<number> n)
<number> c = tex.getcatcode (<number> cattable, <number> n)
\stopfunctioncall


The interfaces for \type{delcode} and \type{mathcode} use small array tables to
set and retrieve values:

\startfunctioncall
tex.setmathcode (['global'], <number> n, <table> mval )
<table> mval = tex.getmathcode (<number> n)
tex.setdelcode (['global'], <number> n, <table> dval )
<table> dval = tex.getdelcode (<number> n)
\stopfunctioncall

Where the table for \type{mathcode} is an array of 3 numbers, like this:

\starttyping
{<number> mathclass, <number> family, <number> character}
\stoptyping

And the table for \type{delcode} is an array with 4 numbers, like this:

\starttyping
{<number> small_fam, <number> small_char, <number> large_fam, <number> large_char}
\stoptyping

Normally, the third and fourth values in a delimiter code assignment
will be zero according to \tex{Udelcode} usage, but the returned table can have
values there (if the delimiter code was set using \type{\delcode}, for
example). Unset \type{delcode}'s can be recognized because
\type{dval[1]} is $-1$.
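
The different calling conventions can be sketched as follows. The
characters and code values are chosen for illustration only:

\starttyping
local at = string.byte('@')
tex.setcatcode(at, 11)                  -- letter, in the current table
local c = tex.getcatcode(at)

local a = string.byte('a')
tex.setlccode(a, a, string.byte('A'))   -- lccode and uccode in one call

local plus = string.byte('+')
tex.setmathcode(plus, { 2, 0, plus })   -- class 2, family 0

local dval = tex.getdelcode(string.byte('('))
if dval[1] == -1 then
    texio.write_nl('no delcode set for (')
end
\stoptyping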

\subsection{Box registers}

It is possible to set and query actual boxes, using the node
interface as defined in the \luatex{node} library:

\starttyping
tex.box
\stoptyping

for array access, or

\starttyping
tex.setbox(<number> n, <node> s)
tex.setbox('global', <number> n, <node> s)
<node> n = tex.getbox(<number> n)
\stoptyping

for function|-|based access.
In the function-based interface, it is possible to define values
globally by using the string \type{'global'} as the first function argument.

Be warned that an assignment like

\starttyping
tex.box[0] = tex.box[2]
\stoptyping

does not copy the node list, it just duplicates a node pointer.  If
\tex{box2} is cleared by \TEX\ commands later on, the contents
of \tex{box0} become invalid as well. To prevent this from
happening, always use \luatex{node.copy_list()} unless you are
assigning to a temporary variable:

\starttyping
tex.box[0] = node.copy_list(tex.box[2])
\stoptyping
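
A sketch of the function|-|based interface, assuming that box
\type{0} may or may not be void at the time of the call:

\starttyping
local b = tex.getbox(0)
if b then
    -- the dimensions are fields of the box node, in scaled points
    texio.write_nl(string.format('box 0: wd=%dsp ht=%dsp dp=%dsp',
        b.width, b.height, b.depth))
    -- store an independent copy in box 1
    tex.setbox(1, node.copy_list(b))
end
\stoptyping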

%{\bf note: In previous versions of \LUATEX\ there were also three
%virtual tables called \type{tex.wd}, \type{tex.ht}, and \type{tex.dp}
%along with an associated function call interface. These were
%removed in version 0.63. You should switch to using \type{tex.box[].width}
%etc. instead.}
%
%If for some reason you want the functionality of these tables back,
%you can add \LUA\ code to do that for you, like this:
%
%\starttyping
%local box = tex.box
%
%local wd = {
%    __index    = function(t,k)   local bk = box[k] return bk and bk.width or 0 end,
%    __newindex = function(t,k,v) local bk = box[k] if bk then bk.width = v end end,
%}
%local ht = {
%    __index    = function(t,k)   local bk = box[k] return bk and bk.height or 0 end,
%    __newindex = function(t,k,v) local bk = box[k] if bk then bk.height = v end end,
%}
%local dp = {
%    __index    = function(t,k)   local bk = box[k] return bk and bk.depth or 0 end,
%    __newindex = function(t,k,v) local bk = box[k] if bk then bk.depth = v end end,
%}
%
%tex.wd = { } setmetatable(tex.wd,wd)
%tex.ht = { } setmetatable(tex.ht,ht)
%tex.dp = { } setmetatable(tex.dp,dp)
%\stoptyping


\subsection{Math parameters}

It is possible to set and query the internal math parameters
using:

\startfunctioncall
tex.setmath(<string> n, <string> t, <number> n)
tex.setmath('global', <string> n, <string> t, <number> n)
<number> n = tex.getmath(<string> n, <string> t)
\stopfunctioncall

As before, an optional first parameter \type{'global'} indicates a
global assignment.

The first string is the parameter name minus the leading \quote{Umath},
and the second string is the style name minus the trailing \quote{style}.

Just to be complete, the values for the math parameter name are:

\starttyping
quad                axis               operatorsize
overbarkern         overbarrule        overbarvgap
underbarkern        underbarrule       underbarvgap
radicalkern         radicalrule        radicalvgap
radicaldegreebefore radicaldegreeafter radicaldegreeraise
stackvgap           stacknumup         stackdenomdown
fractionrule        fractionnumvgap    fractionnumup
fractiondenomvgap   fractiondenomdown  fractiondelsize
limitabovevgap      limitabovebgap     limitabovekern
limitbelowvgap      limitbelowbgap     limitbelowkern
underdelimitervgap  underdelimiterbgap
overdelimitervgap   overdelimiterbgap
subshiftdrop        supshiftdrop       subshiftdown
subsupshiftdown     subtopmax          supshiftup
supbottommin        supsubbottommax    subsupvgap
spaceafterscript    connectoroverlapmin
ordordspacing       ordopspacing       ordbinspacing     ordrelspacing
ordopenspacing      ordclosespacing    ordpunctspacing   ordinnerspacing
opordspacing        opopspacing        opbinspacing      oprelspacing
opopenspacing       opclosespacing     oppunctspacing    opinnerspacing
binordspacing       binopspacing       binbinspacing     binrelspacing
binopenspacing      binclosespacing    binpunctspacing   bininnerspacing
relordspacing       relopspacing       relbinspacing     relrelspacing
relopenspacing      relclosespacing    relpunctspacing   relinnerspacing
openordspacing      openopspacing      openbinspacing    openrelspacing
openopenspacing     openclosespacing   openpunctspacing  openinnerspacing
closeordspacing     closeopspacing     closebinspacing   closerelspacing
closeopenspacing    closeclosespacing  closepunctspacing closeinnerspacing
punctordspacing     punctopspacing     punctbinspacing   punctrelspacing
punctopenspacing    punctclosespacing  punctpunctspacing punctinnerspacing
innerordspacing     inneropspacing     innerbinspacing   innerrelspacing
inneropenspacing    innerclosespacing  innerpunctspacing innerinnerspacing
\stoptyping

The values for the style parameter name are:

\starttyping
display       crampeddisplay
text          crampedtext
script        crampedscript
scriptscript  crampedscriptscript
\stoptyping
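
For example, the following sketch reads \tex{Umathquad} for
\tex{textstyle} and assigns the same value globally to the cramped
variant. It assumes that the parameter already has a usable value, for
instance because math fonts have been loaded:

\starttyping
-- "quad" stands for \Umathquad, "text" for \textstyle
local q = tex.getmath("quad", "text")
tex.setmath("global", "quad", "crampedtext", q)
texio.write_nl("Umathquad in text style: " .. q .. "sp")
\stoptyping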


\subsection{Special list heads}

The virtual table \luatex{tex.lists} contains the set of internal
registers that keep track of building page lists.


\starttabulate[|lT|p|]
\NC \bf field          \NC \bf description \NC \NR
\NC page_ins_head      \NC  circular list of pending insertions \NC \NR
\NC contrib_head       \NC  the recent contributions \NC \NR
\NC page_head          \NC  the current page content\NC \NR
%\NC temp_head          \NC  \NC \NR
\NC hold_head          \NC used for held-over items for next page\NC \NR
\NC adjust_head        \NC head of the current \tex{vadjust} list \NC \NR
\NC pre_adjust_head    \NC head of the current \tex{vadjust pre} list\NC \NR
% \NC align_head         \NC  \NC \NR
\stoptabulate
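
A minimal sketch that counts the nodes waiting on the contribution
list. It assumes that the field holds the first node of the list (or
\type{nil} when the list is empty) and it uses \luatex{node.traverse}
from the \luatex{node} library:

\starttyping
local head  = tex.lists.contrib_head
local count = 0
if head then
    for n in node.traverse(head) do
        count = count + 1
    end
end
texio.write_nl("recent contributions: " .. count .. " node(s)")
\stoptyping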

\subsection{Semantic nest levels (0.51)}

The virtual table \luatex{tex.nest} contains the currently active
semantic nesting state. It has two main parts: a zero-based array of
userdata for the semantic nest itself, and the numerical value
\type{tex.nest.ptr}, which gives the highest available index. Neither
the array items in \type{tex.nest[]} nor \type{tex.nest.ptr} can be
assigned to (as this would confuse the typesetting engine beyond
repair), but you can assign to the individual values inside the array
items, e.g. \type{tex.nest[tex.nest.ptr].prevdepth}.

\type{tex.nest[tex.nest.ptr]} is the current nest state, \type{tex.nest[0]}
the outermost (main vertical list) level.

The known fields are:

\starttabulate[|lT|l|l|p|]
\NC \ssbf key   \NC \bf type  \NC \bf modes  \NC \bf explanation \NC\NR
\NC mode        \NC number    \NC all        \NC The current mode. This is a number representing the
                                                 main mode at this level:\crlf
                                                 0 = no mode (this happens during \type{\write}),\crlf
                                                 1 = vertical,\crlf
                                                 127 = horizontal,\crlf
                                                 253 = display math,\crlf
                                                 $-1$ = internal vertical,\crlf
                                                 $-127$ = restricted horizontal,\crlf
                                                 $-253$ = inline math.\NC\NR
\NC modeline    \NC number    \NC all        \NC source input line on which this mode was entered,
                                                 negative inside the output routine.\NC\NR
\NC head        \NC node      \NC all        \NC the head of the current list\NC\NR
\NC tail        \NC node      \NC all        \NC the tail of the current list\NC\NR
\NC prevgraf    \NC number    \NC vmode      \NC number of lines in the previous paragraph\NC\NR
\NC prevdepth   \NC number    \NC vmode      \NC depth of the previous paragraph (equal to \type{\pdfignoreddimen}
                                                 when it is to be ignored)\NC\NR
\NC spacefactor \NC number    \NC hmode      \NC the current space factor\NC\NR
\NC dirs        \NC node      \NC hmode      \NC used for temporary storage by the line break algorithm\NC\NR
\NC noad        \NC node      \NC mmode      \NC used for temporary storage of a pending fraction numerator,
                                                 for \type{\over} etc.\NC\NR
\NC delimptr    \NC node      \NC mmode      \NC used for temporary storage of the previous math delimiter,
                                                 for \type{\middle}.\NC\NR
\NC mathdir     \NC boolean   \NC mmode      \NC true when during math processing the \type{\mathdir} is not
                                                 the same as the surrounding \type{\textdir}\NC\NR
\NC mathstyle   \NC number    \NC mmode      \NC the current \type{\mathstyle} \NC\NR
\stoptabulate
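
The following sketch reports the current mode by name. The lookup
table \type{modenames} is only an illustration, built from the numbers
listed for the \type{mode} key:

\starttyping
local modenames = {
    [0]    = "no mode",
    [1]    = "vertical",
    [127]  = "horizontal",
    [253]  = "display math",
    [-1]   = "internal vertical",
    [-127] = "restricted horizontal",
    [-253] = "inline math",
}

local top = tex.nest[tex.nest.ptr]   -- the current nest state
texio.write_nl(string.format("level %d: %s, entered on line %d",
    tex.nest.ptr, modenames[top.mode] or tostring(top.mode),
    top.modeline))
\stoptyping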


\subsection{Print functions}

The \luatex{tex} table also contains the three print functions that
are the major interface from \LUA\ scripting to \TEX.

The arguments to these three functions are all stored in an in|-|memory
virtual file that is fed to the \TEX\ scanner as the result of the
expansion of \tex{directlua}.

The total amount of returnable text from a \tex{directlua} command
is only limited by available system \RAM. However, each separate
printed string has to fit completely in \TEX's input buffer.

The result of using these functions from inside callbacks is undefined
at the moment.

\subsubsection{\luatex{tex.print}}

\startfunctioncall
tex.print(<string> s, ...)
tex.print(<number> n, <string> s, ...)
tex.print(<table> t)
tex.print(<number> n, <table> t)
\stopfunctioncall

Each string argument is treated by \TEX\ as a separate input line.
If there is a table argument instead of a list of strings, this has to
be a consecutive array of strings to print (the first non-string value
will stop the printing process).  This syntax was added in 0.36.

The optional parameter can be used to print the strings using the
catcode regime defined by \tex{catcodetable}~\type{n}. If \type{n} is
$-1$, the currently active catcode regime is used. If \type{n} is
$-2$, the resulting catcodes are the result of \type{\the\toks}: all
category codes are 12 (other) except for the space character, which has
category code 10 (space). Otherwise, if \type{n} is not
a valid catcode table, then it is ignored, and the currently
active catcode regime is used instead.

The very last string of the very last \luatex{tex.print()} command in a
\tex{directlua} does not have the \tex{endlinechar} appended; all
others do.
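
A few calls that illustrate the variants. They are shown as plain
\LUA\ code, as it could appear in a file that is loaded from
\tex{directlua}, so that no extra escaping is needed:

\starttyping
-- two separate input lines, read with the current catcode regime
tex.print("First line", "Second line")

-- the same with a table argument
tex.print({ "First line", "Second line" })

-- -2: every character becomes "other" (a space stays a space), so
-- the percent sign and the dollars are not interpreted by TeX
tex.print(-2, "50% of $x$")
\stoptyping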

\subsubsection{\luatex{tex.sprint}}

\startfunctioncall
tex.sprint(<string> s, ...)
tex.sprint(<number> n, <string> s, ...)
tex.sprint(<table> t)
tex.sprint(<number> n, <table> t)
\stopfunctioncall

Each string argument is treated by \TEX\ as a special kind of input line
that makes it suitable for use as a partial line input mechanism:

\startitemize[packed]
\item \TEX\ does not switch to the \quote{new line} state, so
   that leading spaces are not ignored.
\item No \tex{endlinechar} is inserted.
\item Trailing spaces are not removed.

Note that this does not prevent \TEX\ itself from eating spaces as a
result of interpreting the line. For example, in

\starttyping
before\directlua{tex.sprint("\\relax")tex.sprint(" inbetween")}after
\stoptyping

the space before \type{inbetween} will be gobbled as a result of
the \quote{normal} scanning of \tex{relax}.
\stopitemize

If there is a table argument instead of a list of strings, this has to
be a consecutive array of strings to print (the first non-string value
will stop the printing process).   This syntax was added in 0.36.

The optional argument sets the catcode regime, as with \type{tex.print()}.
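
Because no line ends are inserted, a single piece of input can be
assembled from several strings. A small sketch:

\starttyping
-- the three strings are read as one partial line: \hbox{some text}
tex.sprint("\\hbox{", "some text", "}")
\stoptyping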

\subsubsection{\luatex{tex.tprint}}

\startfunctioncall
tex.tprint({<number> n, <string> s, ...}, {...})
\stopfunctioncall

This function is basically a shortcut for repeated calls to
\luatex{tex.sprint(<number> n, <string> s, ...)}, once for each of
the supplied argument tables.
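
This makes it easy to mix catcode regimes within one stream of input.
In the sketch below the braces are read with the current regime
($-1$) and the text between them with the regime selected by $-2$:

\starttyping
tex.tprint({ -1, "\\hbox{" }, { -2, "100% & more" }, { -1, "}" })

-- roughly the same as
tex.sprint(-1, "\\hbox{")
tex.sprint(-2, "100% & more")
tex.sprint(-1, "}")
\stoptyping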

\subsubsection{\luatex{tex.write}}

\startfunctioncall
tex.write(<string> s, ...)
tex.write(<table> t)
\stopfunctioncall

Each string argument is treated by \TEX\ as a special kind of input
line that makes it suitable for use as a quick way to dump
information:

\startitemize
\item All catcodes on that line are either \quote{space} (for '~') or
     \quote{character} (for all others).
\item There is no \tex{endlinechar} appended.
\stopitemize

If there is a table argument instead of a list of strings, this has to
be a consecutive array of strings to print (the first non-string value
will stop the printing process).  This syntax was added in 0.36.
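
For instance, the following call passes on characters that would
otherwise be special to \TEX:

\starttyping
-- underscore, ampersand and hash arrive as ordinary characters
tex.write("file_name & #1")
\stoptyping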


\subsection{Helper functions}

\subsubsection{\luatex{tex.round}}

\startfunctioncall
<number> n = tex.round(<number> o)
\stopfunctioncall

Rounds \LUA\ number \type{o}, and returns a number that is in the range
of a valid \TEX\ register value. If the number starts out of range, it
generates a \quote{number too big} error as well.

\subsubsection{\luatex{tex.scale}}

\startfunctioncall
<number> n = tex.scale(<number> o, <number> delta)
<table> n = tex.scale(<table> o, <number> delta)
\stopfunctioncall

Multiplies the \LUA\ numbers \type{o} and \type{delta}, and returns a
rounded number that is in the range of a valid \TEX\ register value.
In the table version, it creates a copy of the table with all numeric
top||level values scaled in that manner. If the multiplied number(s) are
out of range, it generates \quote{number too big} error(s) as well.

Note: the precision of the output of this function will depend on your
computer's architecture and operating system, so use with care! An
interface to \LUATEX's internal, 100\% portable scale function will be
added at a later date.
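
A short sketch of \luatex{tex.round} and \luatex{tex.scale}; the input
values are arbitrary:

\starttyping
local pt = 65536                       -- one point in scaled points

local r = tex.round(12.3 * pt)         -- rounded to a whole number
local s = tex.scale(10 * pt, 1.2)      -- rounded product of both numbers

-- table version: numeric top-level values are scaled in the copy
local dims   = { width = 10 * pt, height = 5 * pt }
local scaled = tex.scale(dims, 0.5)

texio.write_nl(string.format("%d %d %d %d",
    r, s, scaled.width, scaled.height))
\stoptyping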

\subsubsection{\luatex{tex.sp} (0.51)}

\startfunctioncall
<number> n = tex.sp(<number> o)
<number> n = tex.sp(<string> s)
\stopfunctioncall

Converts the number \type{o} or a string \type{s} that represents
an explicit dimension into an integer number of scaled points.

For parsing the string, the same scanning and conversion rules are used
that \LUATEX\ would use if it were scanning a dimension specifier in
its \TEX-like input language (this includes generating errors for bad
values), except for the following:

\startitemize[n]
\item only explicit values are allowed, control sequences are not handled
\item infinite dimension units (\type{fil...}) are forbidden
\item \type{mu} units do not generate an error (but may not be useful either)
\stopitemize
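
Some example calls, as a minimal sketch:

\starttyping
local a = tex.sp("1pt")      -- 65536, as one point is 65536sp
local b = tex.sp("2.5cm")    -- any explicit dimension is accepted
local c = tex.sp(65536.4)    -- a number is converted to an integer

texio.write_nl(string.format("%d %d %d", a, b, c))
\stoptyping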

\subsubsection{\luatex{tex.definefont}}

\startfunctioncall
tex.definefont(<string> csname, <number> fontid)
tex.definefont(<boolean> global, <string> csname, <number> fontid)
\stopfunctioncall

Associates \type{csname} with the internal font number \type{fontid}.
The definition is global if (and only if) \type{global} is specified
and true (the setting of \type{globaldefs} is not taken into account).
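
A small sketch, in which the control sequence name
\type{currentcopy} is made up for the occasion and
\luatex{font.current()} from the \luatex{font} library supplies the
id of the current font:

\starttyping
-- local definition: \currentcopy now selects the current font
tex.definefont("currentcopy", font.current())

-- the same, but as a global definition
tex.definefont(true, "currentcopy", font.current())
\stoptyping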


\subsubsection{\luatex{tex.error} (0.61)}

\startfunctioncall
tex.error(<string> s)
tex.error(<string> s, <table> help)
\stopfunctioncall

This creates an error somewhat like the combination of \tex{errhelp}
and \tex{errmessage} would. During this error, deletions are disabled.

The array part of the \type{help} table has to contain strings,
one for each line of error help.
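
A sketch of a typical use. The helper \type{require_positive} and its
messages are hypothetical, only the call to \luatex{tex.error} is
part of the interface:

\starttyping
local function require_positive(name, value)
    if type(value) ~= "number" or value <= 0 then
        tex.error("Invalid value for '" .. name .. "'", {
            "A positive number was expected here.",
            "Correct the input and run the job again.",
        })
    end
end

require_positive("columns", -2)   -- triggers the error
\stoptyping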

\subsection[luaprimitives]{Functions for dealing with primitives}

\subsubsection{\luatex{tex.enableprimitives}}

\startfunctioncall
tex.enableprimitives(<string> prefix, <table> primitive names)
\stopfunctioncall

This function accepts a prefix string and an array of primitive names.

For each combination of \quote{prefix} and \quote{name},
\type{tex.enableprimitives} first verifies that \quote{name} is
an actual primitive (it must be returned by one of the
\type{tex.extraprimitives()} calls explained below, or part of
\TEX82, or \type{\directlua}). If it is not,
\type{tex.enableprimitives} does nothing and skips to the next pair.

But if it is, then it will construct a csname variable by concatenating the
\quote{prefix} and \quote{name}, unless the \quote{prefix} is already the actual
prefix of \quote{name}. In the latter case, it will discard the \quote{prefix},
and just use \quote{name}.

Then it will check for the existence of the constructed csname.
If the csname is currently undefined (note: that is not the same as
\type{\relax}), it will globally define the csname to have the
meaning: run code belonging to the primitive \quote{name}. If for some
reason the csname is already defined, it does nothing and tries the
next pair.

An example:

\starttyping
  tex.enableprimitives('LuaTeX', {'formatname'})
\stoptyping

will define \type{\LuaTeXformatname} with the same intrinsic meaning
as the documented primitive \type{\formatname}, provided that the
control sequence \type{\LuaTeXformatname} is currently undefined.

Second example:

\starttyping
  tex.enableprimitives('Omega',tex.extraprimitives ('omega'))
\stoptyping

will define a whole series of csnames like \type{\Omegatextdir},
\type{\Omegapardir}, etc., but it will stick with \type{\OmegaVersion}
instead of creating the doubly-prefixed \type{\OmegaOmegaVersion}.

Starting with version 0.39.0 (and this is why the above two functions
are needed), \LUATEX\ in \type{--ini} mode contains only the \TEX82
primitives and \type{\directlua}, no extra primitives {\bf at all}.

So, if you want to have all the new functionality available using
their default names, as it is now, you will have to add

\starttyping
  \ifx\directlua\undefined \else
     \directlua {tex.enableprimitives('',tex.extraprimitives ())}
  \fi
\stoptyping

near the beginning of your format generation file. Or you can choose
different prefixes for different subsets, as you see fit.

Calling some form of \type{tex.enableprimitives()} is highly important
though, because if you do not, you will end up with a \TEX82-lookalike
that can run \LUA\ code but not do much else. The defined csnames are
(of course) saved in the format and will be available at runtime.


\subsubsection{\luatex{tex.extraprimitives}}

\startfunctioncall
<table> t = tex.extraprimitives(<string> s, ...)
\stopfunctioncall

This function returns a list of the primitives that originate
from the engine(s) given by the requested string value(s). The
possible values and their (current) return values are:

\startluacode
function out_prim (a)
  local v = tex.extraprimitives(a)
  table.sort(v)
  for _,n in pairs(v) do
    if n == ' ' then
      n = '\\normalcontrolspace'
    end
    tex.print(n .. '\\hskip 4pt plus 5em')
  end
end
\stopluacode

\starttabulate[|l|p|]
\NC \bf name\NC \bf values \NC \NR
\NC tex     \NC \ctxlua{out_prim('tex')    } \NC \NR
\NC core    \NC \ctxlua{out_prim('core')   } \NC \NR
\NC etex    \NC \ctxlua{out_prim('etex')   } \NC \NR
\NC pdftex  \NC \ctxlua{out_prim('pdftex') } \NC \NR
\NC omega   \NC \ctxlua{out_prim('omega')  } \NC \NR
\NC aleph   \NC \ctxlua{out_prim('aleph')  } \NC \NR
\NC luatex  \NC \ctxlua{out_prim('luatex') } \NC \NR
\stoptabulate

Note that \type{'luatex'} does not contain \type{directlua}, as that is
considered to be a core primitive, along with all the \TEX82
primitives, so it is part of the list that is returned from \type{'core'}.

Running \type{tex.extraprimitives()} will give you the complete list
of primitives that are not defined at \LUATEX\ 0.39.0 \type{--ini}
startup. It is exactly equivalent to \type{tex.extraprimitives('etex',
'pdftex', 'omega', 'aleph', 'luatex')}.
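
As a minimal sketch, the returned array can be processed like any
other \LUA\ table. Here the names for two of the engines are sorted
and written to the log file:

\starttyping
local names = tex.extraprimitives("etex", "pdftex")
table.sort(names)

texio.write_nl("log", #names .. " primitives:")
for _, name in ipairs(names) do
    texio.write_nl("log", "  " .. name)
end
\stoptyping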

\subsubsection{\luatex{tex.primitives}}

\startfunctioncall
<table> t = tex.primitives()
\stopfunctioncall

This function returns a hash table listing all primitives that \LUATEX\
knows about. The keys in the hash are primitive names, the values are
tables representing tokens (see~\in{section }[luatokens]). The third value
of each token table is always zero.
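
The sketch below walks over the hash and counts how many primitives
share each command name. It relies on \luatex{token.command_name},
described in the section on the \luatex{token} library; the fallback
string \type{"unknown"} is an addition of this example:

\starttyping
local count = { }
for name, tok in pairs(tex.primitives()) do
    local cmd = token.command_name(tok) or "unknown"
    count[cmd] = (count[cmd] or 0) + 1
end

-- all \if... tests are grouped under the command name if_test
texio.write_nl("if_test primitives: " .. (count["if_test"] or 0))
\stoptyping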

\subsection{Core functionality interfaces}

\subsubsection{\luatex{tex.badness} (0.53)}

\startfunctioncall
<number> b = tex.badness(<number> f, <number> s)
\stopfunctioncall

This helper function is useful during linebreak calculations. \type{f}
and \type{s} are scaled values; the function returns the badness for
when total \type{f} is supposed to be made from amounts that sum to
\type{s}. The returned number is a reasonable approximation of $100(f/s)^3$.
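
A small sketch that compares the returned value with the cubic
formula, for a total of 10pt that has to be made from 20pt; the
numbers are arbitrary:

\starttyping
local pt = 65536
local f, s = 10 * pt, 20 * pt

local b = tex.badness(f, s)
-- the engine works with integers, so expect a value close to,
-- but not necessarily equal to, the result of the formula
texio.write_nl(string.format("badness %d, formula %.1f",
    b, 100 * (f / s) ^ 3))
\stoptyping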

\subsubsection{\luatex{tex.linebreak} (0.53)}

\startfunctioncall
local <node> nodelist, <table> info =
       tex.linebreak(<node> listhead, <table> parameters)
\stopfunctioncall

The understood parameters are as follows:

\starttabulate[|l|l|p|]
\NC \bf name                 \NC \bf type \NC \bf description \NC \NR
\NC pardir                   \NC string   \NC \NC \NR
\NC pretolerance             \NC number   \NC \NC \NR
\NC tracingparagraphs        \NC number   \NC \NC \NR
\NC tolerance                \NC number   \NC \NC \NR
\NC looseness                \NC number   \NC \NC \NR
\NC hyphenpenalty            \NC number   \NC \NC \NR
\NC exhyphenpenalty          \NC number   \NC \NC \NR
\NC pdfadjustspacing         \NC number   \NC \NC \NR
\NC adjdemerits              \NC number   \NC \NC \NR
\NC pdfprotrudechars         \NC number   \NC \NC \NR
\NC linepenalty              \NC number   \NC \NC \NR
\NC lastlinefit              \NC number   \NC \NC \NR
\NC doublehyphendemerits     \NC number   \NC \NC \NR
\NC finalhyphendemerits      \NC number   \NC \NC \NR
\NC hangafter                \NC number   \NC \NC \NR
\NC interlinepenalty         \NC number or table \NC if a table, then it is an array like \type{\interlinepenalties}\NC \NR
\NC clubpenalty              \NC number or table \NC if a table, then it is an array like \type{\clubpenalties}\NC \NR
\NC widowpenalty             \NC number or table \NC if a table, then it is an array like \type{\widowpenalties}\NC \NR
\NC brokenpenalty            \NC number   \NC \NC \NR
\NC emergencystretch         \NC number   \NC in scaled points \NC \NR
\NC hangindent               \NC number   \NC in scaled points \NC \NR
\NC hsize                    \NC number   \NC in scaled points \NC \NR
\NC leftskip                 \NC glue_spec node   \NC \NC \NR
\NC rightskip                \NC glue_spec node   \NC \NC \NR
\NC pdfeachlineheight        \NC number   \NC in scaled points \NC \NR
\NC pdfeachlinedepth         \NC number   \NC in scaled points \NC \NR
\NC pdffirstlineheight       \NC number   \NC in scaled points \NC \NR
\NC pdflastlinedepth         \NC number   \NC in scaled points \NC \NR
\NC pdfignoreddimen          \NC number   \NC in scaled points \NC \NR
\NC parshape                 \NC table   \NC \NC \NR
\stoptabulate

Note that there is no interface for \type{\displaywidowpenalties}; you
have to pass the right choice for \type{widowpenalty} yourself.

The meaning of the various keys should be fairly obvious from the
table (the names match the \TEX\ and \PDFTEX\ primitives) except for
the last 5 entries. The four \type{pdf...line...} keys are ignored if
their value equals \type{pdfignoreddimen}.

It is your own job to make sure that \type{listhead} is a proper
paragraph list: this function does not add any nodes to it. To be
exact, if you want to replace the core line breaking, you may have to
do the following (when you are not actually working in the
\type{pre_linebreak_filter} or \type{linebreak_filter} callbacks, or when the
original list starting at \type{listhead} was generated in horizontal mode):

\startitemize
\item add an \quote{indent box} and perhaps a \type{local_par} node at
  the start (only if you need them)
\item replace any found final glue by an infinite penalty (or add such
  a penalty, if the last node is not a glue)
\item add a glue node for the \type{\parfillskip} after that penalty node
\item make sure all the \type{prev} pointers are OK
\stopitemize

The result is a node list. It still needs to be vpacked if you
want to assign it to a \tex{vbox}.


The returned \type{info} table contains four values that are all numbers:

\starttabulate[|l|p|]
\NC prevdepth   \NC depth of the last line in the broken paragraph \NC \NR
\NC prevgraf    \NC number of lines in the broken paragraph \NC \NR
\NC looseness   \NC the actual looseness value in the broken paragraph \NC \NR
\NC demerits    \NC the total demerits of the chosen solution  \NC \NR
\stoptabulate
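
A rough sketch of a complete call follows. It assumes that box
register~0 (an arbitrary choice) holds an \tex{hbox} whose list has
already been prepared as a proper paragraph list, and that register~2
may be overwritten. \luatex{node.slide} and \luatex{node.vpack} come
from the \luatex{node} library:

\starttyping
local box = tex.box[0]
if box and box.list then
    local head = node.copy_list(box.list)
    node.slide(head)                   -- makes the prev pointers valid

    local lines, info = tex.linebreak(head, {
        hsize     = tex.sp("10cm"),
        tolerance = 1000,
    })

    texio.write_nl(string.format("%d line(s), demerits %d",
        info.prevgraf, info.demerits))

    tex.box[2] = node.vpack(lines)     -- pack the lines into a vbox
end
\stoptyping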

Note there are a few things you cannot interface using this function:
You cannot influence font expansion other than via
\type{pdfadjustspacing}, because the settings for that take place
elsewhere. The same is true for hbadness and hfuzz etc. All these are
in the \type{hpack()} routine, and that fetches its own variables via
globals.

\subsubsection{\luatex{tex.shipout} (0.51)}

\startfunctioncall
tex.shipout(<number> n)
\stopfunctioncall

Ships out box number \type{n} to the output file, and clears the box
register.


\section[texconfig]{The \luatex{texconfig} table}

This is a table that is created empty. A startup \LUA\ script could
fill this table with a number of settings that are read out by
the executable after loading and executing the startup file.

\starttabulate[|lT|l|l|p|]
\NC \ssbf key      \NC \bf type     \NC \bf default \NC \bf explanation \NC\NR
\NC kpse_init \NC boolean \NC true    \NC \type{false} totally disables \KPATHSEA\ initialisation,
                                           and enables interpretation of the following numeric key--value pairs
                                          (only ever unset this if you implement {\it all\/} file
                                          find callbacks!)\NC \NR
\NC shell_escape    \NC    string\NC \type{'f'}\NC Use \type{'y'} or \type{'t'} or \type{'1'} to enable \type{\write18} unconditionally,
                                     \type{'p'} to enable the commands that are listed in \type{shell_escape_commands} (new in 0.37)\NC\NR
\NC shell_escape_commands \NC string\NC \NC Comma-separated list of command names that may be executed by \type{\write18}
                                        when \type{shell_escape} is set to \type{'p'}. Do {\it not\/} use spaces around commas,
                                        separate any required command arguments by using a space, and use the ASCII double quote
                                        (\type{"}) for any needed argument or path quoting  (new in 0.37)\NC\NR
\NC string_vacancies \NC   number\NC  75000\NC cf.\ web2c docs \NC \NR
\NC pool_free \NC               number\NC  5000\NC cf.\ web2c docs \NC \NR
\NC max_strings \NC        number\NC  15000\NC cf.\ web2c docs \NC \NR
\NC strings_free \NC       number\NC  100\NC cf.\ web2c docs \NC \NR
\NC nest_size \NC               number\NC  50\NC cf.\ web2c docs \NC \NR
\NC max_in_open \NC        number\NC  15\NC cf.\ web2c docs \NC \NR
\NC param_size \NC         number\NC  60\NC cf.\ web2c docs \NC \NR
\NC save_size \NC               number\NC  4000\NC cf.\ web2c docs \NC \NR
\NC stack_size \NC         number\NC  300\NC cf.\ web2c docs \NC \NR
\NC dvi_buf_size \NC       number\NC  16384\NC cf.\ web2c docs \NC \NR
\NC error_line \NC         number\NC  79\NC cf.\ web2c docs \NC \NR
\NC half_error_line \NC    number\NC  50\NC cf.\ web2c docs \NC \NR
\NC max_print_line \NC     number\NC  79\NC cf.\ web2c docs \NC \NR
\NC hash_extra \NC         number\NC  0\NC cf.\ web2c docs \NC \NR
\NC pk_dpi \NC             number\NC  72\NC cf.\ web2c docs \NC \NR
\NC trace_file_names \NC boolean \NC true \NC \type{false} disables \TEX's normal file open|-|close
                                              feedback (the assumption is that callbacks will take care of
                                              that) \NC \NR
\NC file_line_error  \NC boolean \NC false \NC do \type{file:line} style error messages\NC \NR
\NC halt_on_error    \NC boolean \NC false \NC abort run on the first encountered error\NC \NR
\NC formatname       \NC string \NC \NC if no format name was given
                                             on the commandline, this key will be tested first
                                             instead of simply quitting\NC \NR
\NC jobname          \NC string \NC \NC if no input file name was given
                                           on the commandline, this key will be tested first
                                           instead of simply giving up\NC \NR
\stoptabulate

{\bf Note:} the numeric values that match web2c parameters are only used if
\type{kpse_init} is explicitly set to \type{false}. In all other cases, the normal values from
\type{texmf.cnf} are used.
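
A minimal sketch of what a startup script could contain. The list of
permitted commands is only an example:

\starttyping
-- report errors as file:line and stop at the first one
texconfig.file_line_error = true
texconfig.halt_on_error   = true

-- only allow the listed commands in \write18
texconfig.shell_escape          = 'p'
texconfig.shell_escape_commands = 'kpsewhich,makeindex'
\stoptyping

The numeric keys are left alone here, because \type{kpse_init} keeps
its default value.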

\section{The \luatex{texio} library}

This library takes care of the low|-|level I/O interface.

\subsection{Printing functions}

\subsubsection{\luatex{texio.write}}

\startfunctioncall
texio.write(<string> target, <string> s, ...)
texio.write(<string> s, ...)
\stopfunctioncall

Without the \type{target} argument, writes all given strings to the same
location(s) \TEX\ writes messages to at this moment. If
\tex{batchmode} is in effect, it writes only to the log,
otherwise  it writes to the log and the terminal.
The optional \type{target} can be one of three possibilities:
\type{term}, \type{log} or \type {term and log}.

Note: If several strings are given, and if the first of these strings
is or might be one of the targets above, the \type{target} must be
specified explicitly to prevent \LUA\ from interpreting the first
string as the target.

\subsubsection{\luatex{texio.write_nl}}

\startfunctioncall
texio.write_nl(<string> target, <string> s, ...)
texio.write_nl(<string> s, ...)
\stopfunctioncall

This function behaves like \luatex{texio.write}, but makes sure that the given strings will
appear at the beginning of a new line. You can pass a single empty string
if you only want to move to the next line.
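
A short sketch of both functions with and without an explicit target;
the message texts are arbitrary:

\starttyping
texio.write_nl("log", "this line goes to the log file only")
texio.write("log", " and this continues on the same line")

-- the first string to print happens to be a target name, so the
-- target itself is given explicitly
texio.write_nl("term and log", "log", " is one of the targets")

-- no target: written to wherever TeX writes messages at this moment
texio.write_nl("a message on a new line")
\stoptyping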

%***********************************************************************

\section[luatokens]{The \luatex{token} library}

The \luatex{token} table contains interface functions to \TEX's
handling of tokens. These functions are most useful when combined with
the \luatex{token_filter} callback, but they could be used standalone
as well.

A token is represented in \LUA\ as a small table. For the moment, this
table consists of three numeric entries:

\starttabulate[|l|l|p|]
\NC \bf index\NC \bf meaning         \NC \bf description \NC \NR
\NC 1      \NC command code        \NC this is a value between~$0$ and~$130$ (approximately)\NC \NR
\NC 2      \NC command modifier    \NC this is a value between~$0$ and~$2^{21}$ \NC \NR
\NC 3      \NC control sequence id \NC for commands that are not the result of control
                                       sequences, like letters and characters, it is zero,
                                       otherwise, it is a number pointing into the \quote
                                       {equivalence table} \NC \NR
\stoptabulate

\subsection{\luatex{token.get_next}}

\startfunctioncall
<token> t = token.get_next()
\stopfunctioncall

This fetches the next input token from the current input source,
without expansion.

\subsection{\luatex{token.is_expandable}}

\startfunctioncall
<boolean> b = token.is_expandable(<token> t)
\stopfunctioncall

This tests if the token \type{t} could be expanded.

\subsection{\luatex{token.expand}}

\startfunctioncall
token.expand(<token> t)
\stopfunctioncall

If a token is expandable, this will expand one level of it, so that
the first token of the expansion will now be the next token to be read
by \luatex{token.get_next()}.

\subsection{\luatex{token.is_activechar}}

\startfunctioncall
<boolean> b = token.is_activechar(<token> t)
\stopfunctioncall

This is a special test that is sometimes handy. Discovering whether
some control sequence is the result of an active character turned out
to be very hard otherwise.

\subsection{\luatex{token.create}}

\startfunctioncall
<token> t = token.create(<string> csname)
<token> t = token.create(<number> charcode)
<token> t = token.create(<number> charcode, <number> catcode)
\stopfunctioncall

This is the token factory. If you feed it a string, then it is the
name of a control sequence (without leading backslash), and it will be
looked up in the equivalence table.

If you feed it a number, then this is assumed to be an input character,
and an optional second number gives its category code.  This means it
is possible to overrule a character's category code, with a few
exceptions: the category codes~0 (escape), 9~(ignored), 13~(active),
14~(comment), and 15 (invalid) cannot occur inside a token. The values~0, 9, 14
and~15 are therefore illegal as input to \luatex{token.create()}, and
active characters will be resolved immediately.

Note: unknown string sequences and never defined active characters
will result in a token representing an \quote{undefined control sequence}
with a near|-|random name. It is {\em not} possible to define brand
new control sequences using \luatex{token.create}!

\subsection{\luatex{token.command_name}}

\startfunctioncall
<string> commandname = token.command_name(<token> t)
\stopfunctioncall

This returns the name associated with the \quote{command} value of the token
in \LUATEX. There is not always a direct connection between these names and
primitives. For instance, all \tex{ifxxx} tests are grouped under
\type {if_test}, and the \quote{command modifier} defines which test is to be run.

\subsection{\luatex{token.command_id}}

\startfunctioncall
<number> i = token.command_id(<string> commandname)
\stopfunctioncall

This is the inverse operation of the previous function: it returns the
number that belongs to a command name, to be used as the first item in
a token table.

\subsection{\luatex{token.csname_name}}

\startfunctioncall
<string> csname = token.csname_name(<token> t)
\stopfunctioncall

This returns the name associated with the \quote{equivalence table} value of
the token in \LUATEX. It returns the string value of the command used
to create the current token, or an empty string if there is no
associated control sequence.

Keep in mind that there are potentially two control sequences that
return the same csname string: single character control sequences
and active characters have the same \quote{name}.

\subsection{\luatex{token.csname_id}}

\startfunctioncall
<number> i = token.csname_id(<string> csname)
\stopfunctioncall

This is the inverse operation of the previous command: it returns the
number that belongs to the given control sequence name, to be used as
the third item in a token table.
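
The name and id functions can be combined into a round trip. The following is
only a sketch: the token for \type{relax} is an arbitrary example, and the
numbers that end up in the log depend on the executable, so none are shown
here.

\starttyping
\directlua{
  local t    = token.create("relax")
  % from token to command name, and back to the command number
  local name = token.command_name(t)
  local id   = token.command_id(name)
  texio.write_nl(name .. " = " .. id)
  % the same round trip for the control sequence name
  local cs   = token.csname_name(t)
  texio.write_nl(cs .. " = " .. token.csname_id(cs))
}
\stoptyping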


\chapter[math]{Math}

The handling of mathematics in \LUATEX\ differs quite a bit from how
\TEX82 (and therefore \PDFTEX) handles math. First, \LUATEX\ adds primitives and
extends some others so that \UNICODE\ input can be used easily. Second, all
of \TEX82's internal special values (for example for operator spacing) have
been made accessible and changeable via control sequences. Third, there are
extensions that make it easier to use \OPENTYPE\ math fonts. And finally,
there are some extensions that have been proposed in the past that are now
added to the engine.

\section{The current math style}

Starting with \LUATEX\ 0.39.0, it is possible to discover the math
style that will be used for a formula in an expandable fashion
(while the math list is still being read).  To make this possible,
\LUATEX\ adds the new primitive: \type{\mathstyle}. This is a
\quote{convert command} like e.g. \type{\romannumeral}: its value can
only be read, not set.

\subsection{\tex{mathstyle}}

The returned value is between 0 and 7 (in math mode), or $-1$
(all other modes). For easy testing, the eight math style commands
have been altered so that they can be used as numeric values, so you
can write code like this:

\starttyping
\ifnum\mathstyle=\textstyle
   \message{normal text style}
\else \ifnum\mathstyle=\crampedtextstyle
   \message{cramped text style}
\fi \fi
\stoptyping

\subsection{\tex{Ustack}}

There are a few math commands in \TEX\ where the style that will be used
is not known straight from the start. These commands (\tex{over},
\tex{atop}, \tex{overwithdelims}, \tex{atopwithdelims}) would
therefore normally return wrong values for \type{\mathstyle}. To
fix this, \LUATEX\ introduces a special prefix command:
\type{\Ustack}:

\starttyping
$\Ustack {a \over b}$
\stoptyping

The \type{\Ustack} command will scan the next brace and start a new
math group with the correct (numerator) math style.
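
A simple way to see the difference is to report \type{\mathstyle} from inside
the numerator, once without and once with the prefix. This is a sketch that
assumes \type{\mathstyle} can be used inside \type{\message}, as its
description as a convert command suggests; the reported values are not listed
here.

\starttyping
$$ {\message{without Ustack: \mathstyle} a \over b} $$
$$ \Ustack {\message{with Ustack: \mathstyle} a \over b} $$
\stoptyping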

\section{Unicode math characters}

Character handling is now extended up to the full \UNICODE\ range. The
extension from 8-bit to 16-bit was already present in \ALEPH\ by means of a
set of extra primitives starting with the \type{\o} prefix; the extension
to full \UNICODE\ (the \type{\U} prefix) is compatible with \XETEX.

The math primitives from \TEX\ and \ALEPH\ are kept as they are, except for
the ones that convert from input to math commands:  \tex{mathcode},
\tex{omathcode}, \tex{delcode}, and \tex{odelcode}. These four now allow
for a 21-bit character argument on the left hand side of the equals sign.

Some of the \ALEPH\ math primitives and the new \LUATEX\ primitives read
more than one separate value.  This is shown in the tables below by a plus
sign in the second column.

The input for such primitives would look like this:

\starttyping
\def\overbrace {\Umathaccent 0 1 "23DE }
\stoptyping


Altered \TEX82 primitives:

\starttabulate[|l|l|l|]
\NC \bf primitive     \NC \bf value range (in hex)  \NC\NR
\NC \tex{mathcode}    \NC 0--10FFFF = 0--8000   \NC\NR
\NC \tex{delcode}     \NC 0--10FFFF = 0--FFFFFF \NC\NR
\stoptabulate

Unaltered:

\starttabulate[|l|l|l|]
\NC \bf primitive     \NC \bf value range (in hex)  \NC\NR
\NC \tex{mathchardef} \NC 0--8000               \NC\NR
\NC \tex{mathchar}    \NC 0--7FFF               \NC\NR
\NC \tex{mathaccent}  \NC 0--7FFF               \NC\NR
\NC \tex{delimiter}   \NC 0--7FFFFFF            \NC\NR
\NC \tex{radical}     \NC 0--7FFFFFF            \NC\NR
\stoptabulate

Altered \ALEPH\ primitives:

\starttabulate[|l|l|l|]
\NC \bf primitive      \NC \bf value range (in hex)           \NC\NR
\NC \tex{omathcode}    \NC 0--10FFFF = 0--8000000         \NC\NR
\NC \tex{odelcode}     \NC 0--10FFFF = 0+0--FFFFFF+FFFFFF \NC\NR
\stoptabulate

Unaltered:

\starttabulate[|l|l|l|]
\NC \bf primitive        \NC \bf value range (in hex)           \NC\NR
\NC \tex{omathchardef}   \NC 0--8000000                     \NC\NR
\NC \tex{omathchar}      \NC 0--7FFFFFF                     \NC\NR
\NC \tex{omathaccent}    \NC 0--7FFFFFF                     \NC\NR
\NC \tex{odelimiter}     \NC 0+0--7FFFFFF + FFFFFF          \NC\NR
\NC \tex{oradical}       \NC 0+0--7FFFFFF + FFFFFF          \NC\NR
\stoptabulate

New primitives that are compatible with \XETEX:

\starttabulate[|l|l|l|l|]
\NC \bf primitive         \NC \bf value range (in hex)               \NC\NR
\NC \tex{Umathchardef}    \NC 0+0+0--7+FF+10FFFF$^1$                 \NC\NR
\NC \tex{Umathcode}       \NC 0--10FFFF = 0+0+0--7+FF+10FFFF$^1$     \NC\NR
\NC \tex{Udelcode}        \NC 0--10FFFF = 0+0--FF+10FFFF$^2$         \NC\NR
\NC \tex{Umathchar}       \NC 0+0+0--7+FF+10FFFF                     \NC\NR
\NC \tex{Umathaccent}     \NC 0+0+0--7+FF+10FFFF$^{2,4}$             \NC\NR
\NC \tex{Udelimiter}      \NC 0+0+0--7+FF+10FFFF$^2$                 \NC\NR
\NC \tex{Uradical}        \NC 0+0--FF+10FFFF$^2$                     \NC\NR
\NC \tex{Umathcharnum}    \NC -80000000--7FFFFFFF$^3$              \NC\NR
\NC \tex{Umathcodenum}    \NC 0--10FFFF = -80000000--7FFFFFFF$^3$  \NC\NR
\NC \tex{Udelcodenum}     \NC 0--10FFFF = -80000000--7FFFFFFF$^3$  \NC\NR
\stoptabulate

Note 1: \type{\Umathchardef<csname>="8"0"0} and \type{\Umathchardef<number>="8"0"0}
are also accepted.

Note 2: The new primitives that deal with delimiter-style objects do not
set up a \quote{large family}. Selecting a suitable size for display
purposes is expected to be dealt with by the font via the
\tex{Umathoperatorsize} parameter (more information can be found in a following section).

Note 3: For these three primitives, all information is packed into a single
signed integer. For the first two (\tex{Umathcharnum} and
\tex{Umathcodenum}), the lowest 21 bits are the character code, the 3
bits above that represent the math class, and the family data is kept in
the topmost bits (this means that the values for math families 128--255 are
actually negative).  For \tex{Udelcodenum} there is no math class; the
math family information is stored in the bits directly on top of the
character code. Using these three commands is not as natural as using the
two- and three-value commands, so unless you know exactly what you are
doing and absolutely require the speedup resulting from the faster input
scanning, it is better to use the verbose commands instead.
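
The packing described in Note 3 can be worked out by hand for a single value.
The class (3), family (1) and character (\type{"3C}) below are arbitrary; the
multipliers follow from the layout given above (21 bits of character code,
then 3 bits of math class, then the family). Under that reading, both lines
should select the same math character.

\starttyping
% character "3C                   =      "3C
% class 3:   3 * "200000  (2^21)  =  "600000
% family 1:  1 * "1000000 (2^24)  = "1000000
% sum                             = "160003C
$ a \Umathchar    3 1 "3C  b $
$ a \Umathcharnum "160003C b $
\stoptyping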

Note 4: As of \LUATEX\ 0.65, \tex{Umathaccent} accepts optional
keywords to control various details regarding math accents. See
\in{section}[mathacc] below for details.


New primitives that exist in \LUATEX\ only (all of these will be explained
in following sections):


\starttabulate[|l|l|l|l|]
\NC \bf primitive         \NC \bf value range (in hex)               \NC\NR
%\NC \tex{Umathbotaccent}  \NC 0+0+0--7+FF+10FFFF                     \NC\NR
%\NC \tex{Umathaccents}    \NC 0+0+0+0+0+0--7+FF+10FFFF+7+FF+10FFFF   \NC\NR
\NC \tex{Uroot}           \NC 0+0--FF+10FFFF$^2$                     \NC\NR
\NC \tex{Uoverdelimiter}  \NC 0+0--FF+10FFFF$^2$                     \NC\NR
\NC \tex{Uunderdelimiter} \NC 0+0--FF+10FFFF$^2$                     \NC\NR
\NC \tex{Udelimiterover}  \NC 0+0--FF+10FFFF$^2$                     \NC\NR
\NC \tex{Udelimiterunder} \NC 0+0--FF+10FFFF$^2$                     \NC\NR
\stoptabulate

\section{Cramped math styles}

\LUATEX\ has four new primitives to set the cramped math styles
directly:

\starttyping
\crampeddisplaystyle
\crampedtextstyle
\crampedscriptstyle
\crampedscriptscriptstyle
\stoptyping

These additional commands are not all that valuable on their own, but
they come in handy as arguments to the math parameter settings that
will be added shortly.

\section{Math parameter settings}

In \LUATEX, the font dimension parameters that \TEX\ used in math
typesetting are now accessible via primitive commands. In fact,
refactoring of the math engine has resulted in many more parameters
than were accessible before.

\starttabulate
\NC \bf primitive name              \NC \bf description \NC \NR
\NC \type{\Umathquad}               \NC the width of 18mu's\NC \NR
\NC \type{\Umathaxis}               \NC height of the vertical center axis of
                                        the math formula above the baseline\NC \NR
\NC \type{\Umathoperatorsize}       \NC minimum size of large operators in display mode \NC \NR
\NC \type{\Umathoverbarkern}        \NC vertical clearance above the rule \NC \NR
\NC \type{\Umathoverbarrule}        \NC the width of the rule \NC \NR
\NC \type{\Umathoverbarvgap}        \NC vertical clearance below the rule \NC \NR
\NC \type{\Umathunderbarkern}       \NC vertical clearance below the rule \NC \NR
\NC \type{\Umathunderbarrule}       \NC the width of the rule \NC \NR
\NC \type{\Umathunderbarvgap}       \NC vertical clearance above the rule \NC \NR
\NC \type{\Umathradicalkern}        \NC vertical clearance above the rule \NC \NR
\NC \type{\Umathradicalrule}        \NC the width of the rule \NC \NR
\NC \type{\Umathradicalvgap}        \NC vertical clearance below the rule \NC \NR
\NC \type{\Umathradicaldegreebefore}\NC the forward kern that takes place before placement of
                                        the radical degree \NC \NR
\NC \type{\Umathradicaldegreeafter} \NC the backward kern that takes place after placement of
                                        the radical degree \NC \NR
\NC \type{\Umathradicaldegreeraise} \NC this is the percentage of the total height and depth of
                                        the radical sign that the degree is raised by. It is
                                        expressed in \type{percents}, so 60\% is expressed as the
                                        integer $60$.\NC \NR
\NC \type{\Umathstackvgap}          \NC vertical clearance between the two
                                        elements in a \type{\atop} stack \NC \NR
\NC \type{\Umathstacknumup}         \NC numerator shift upward in \type{\atop} stack \NC \NR
\NC \type{\Umathstackdenomdown}     \NC denominator shift downward in \type{\atop} stack\NC \NR
\NC \type{\Umathfractionrule}       \NC the width of the rule in a \type{\over}\NC \NR
\NC \type{\Umathfractionnumvgap}    \NC vertical clearance between the numerator and the rule\NC \NR
\NC \type{\Umathfractionnumup}      \NC numerator shift upward in \type{\over} \NC \NR
\NC \type{\Umathfractiondenomvgap}  \NC vertical clearance between the denominator and the rule\NC \NR
\NC \type{\Umathfractiondenomdown}  \NC denominator shift downward in \type{\over} \NC \NR
\NC \type{\Umathfractiondelsize}    \NC minimum delimiter size for \type{\...withdelims}\NC \NR
\NC \type{\Umathlimitabovevgap}     \NC vertical clearance for limits above operators\NC \NR
\NC \type{\Umathlimitabovebgap}     \NC vertical baseline clearance for limits above operators\NC \NR
\NC \type{\Umathlimitabovekern}     \NC space reserved at the top of the limit\NC \NR
\NC \type{\Umathlimitbelowvgap}     \NC vertical clearance for limits below operators\NC \NR
\NC \type{\Umathlimitbelowbgap}     \NC vertical baseline clearance for limits below operators\NC \NR
\NC \type{\Umathlimitbelowkern}     \NC space reserved at the bottom of the limit\NC \NR
\NC \type{\Umathoverdelimitervgap}  \NC vertical clearance for limits above delimiters\NC \NR
\NC \type{\Umathoverdelimiterbgap}  \NC vertical baseline clearance for limits above delimiters\NC \NR
\NC \type{\Umathunderdelimitervgap} \NC vertical clearance for limits below delimiters\NC \NR
\NC \type{\Umathunderdelimiterbgap} \NC vertical baseline clearance for limits below delimiters\NC \NR
\NC \type{\Umathsubshiftdrop}       \NC subscript drop for boxes and subformulas\NC \NR
\NC \type{\Umathsubshiftdown}       \NC subscript drop for characters\NC \NR
\NC \type{\Umathsupshiftdrop}       \NC superscript drop (raise, actually) for boxes and subformulas\NC \NR
\NC \type{\Umathsupshiftup}         \NC superscript raise for characters\NC \NR
\NC \type{\Umathsubsupshiftdown}    \NC subscript drop in the presence of a superscript\NC \NR
\NC \type{\Umathsubtopmax}          \NC the top of standalone subscripts cannot be higher than this above the baseline\NC \NR
\NC \type{\Umathsupbottommin}       \NC the bottom of standalone superscripts cannot be less than this above the baseline\NC \NR
\NC \type{\Umathsupsubbottommax}    \NC the bottom of the superscript of a combined super- and subscript
                                        must be at least as high as this above the baseline\NC \NR
\NC \type{\Umathsubsupvgap}         \NC vertical clearance between super- and subscript\NC \NR
\NC \type{\Umathspaceafterscript}   \NC additional space added after a super- or subscript\NC \NR
\NC \type{\Umathconnectoroverlapmin}\NC minimum overlap between parts in an extensible recipe\NC \NR
\stoptabulate

Each of the parameters in this section can be set by a command like this:

\starttyping
\Umathquad\displaystyle=1em
\stoptyping

They obey grouping, and you can use \type{\the\Umathquad\displaystyle} if needed.
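
The following sketch shows a setting for one of the cramped styles together
with the grouping behaviour. The dimensions are arbitrary, and \type{\message}
is only used to make the values visible.

\starttyping
\Umathquad\crampedtextstyle=10pt
\begingroup
  \Umathquad\crampedtextstyle=12pt
  \message{inside the group: \the\Umathquad\crampedtextstyle}
\endgroup
\message{after the group: \the\Umathquad\crampedtextstyle}
\stoptyping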

\section{Font-based Math Parameters}

While it is nice to have these math parameters available for tweaking, it
would be tedious to have to set each of them by hand. For this reason,
\LUATEX\ initializes a bunch of these parameters whenever you assign a font
identifier to a math family based on either the traditional math font
dimensions in the font (for assignments to math family~2 and~3 using
\TFM|-|based fonts like \type{cmsy} and \type{cmex}), or based on the named
values in a potential \type{MathConstants} table when the font is loaded
via Lua.  If there is a \type{MathConstants} table, this takes precedence
over font dimensions, and in that case no attention is paid to which
family is being assigned to: the \type{MathConstants} table in the last
assigned family sets all parameters.

In the table below, the one-letter style abbreviations and symbolic tfm
font dimension names match those used in the \TeX book. Assignments to
\tex{textfont} set the values for the cramped and uncramped display and
text styles. Use \tex{scriptfont} for the script styles, and
\tex{scriptscriptfont} for the scriptscript styles (totalling eight
parameters for three font sizes). In the \TFM\ case, assignments only happen
in family~2 and family~3 (and of course only for the parameters for which
there are font dimensions).

Besides the parameters below, \LUATEX\ also looks at the \quote{space}
font dimension parameter. For math fonts, this should be set to zero.
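
For the \TFM\ case, the effect of the family assignments can be inspected with
a sketch like this one. The font identifiers \type{\mysy} and \type{\myex} are
hypothetical names chosen for the example; the two parameters shown were
picked because, according to the table below, one is derived from a family~2
font dimension and the other from a family~3 font dimension.

\starttyping
\font\mysy=cmsy10
\font\myex=cmex10
\textfont2=\mysy
\textfont3=\myex
\message{axis: \the\Umathaxis\textstyle}
\message{fraction rule: \the\Umathfractionrule\textstyle}
\stoptyping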

\start

\switchtobodyfont[8pt]

\starttabulate[|l|l|l|p|]
\NC \bf variable                \NC \bf style             \NC \bf default value opentype               \NC \bf default value tfm \NC\NR
\NC \tex{Umathaxis}             \NC --                    \NC AxisHeight                               \NC axis_height \NC\NR
\NC \tex{Umathoperatorsize}     \NC D, D'                 \NC DisplayOperatorMinHeight                 \NC $^6$ \NC\NR
\NC \tex{Umathfractiondelsize}  \NC D, D'                 \NC FractionDelimiterDisplayStyleSize$^9$         \NC delim1 \NC\NR
\NC "                           \NC T, T', S, S', SS, SS' \NC FractionDelimiterSize$^9$                \NC delim2 \NC\NR
\NC \tex{Umathfractiondenomdown}\NC D, D'                 \NC FractionDenominatorDisplayStyleShiftDown \NC denom1 \NC\NR
\NC "                           \NC T, T', S, S', SS, SS' \NC FractionDenominatorShiftDown             \NC denom2 \NC\NR
\NC \tex{Umathfractiondenomvgap}\NC D, D'                 \NC FractionDenominatorDisplayStyleGapMin    \NC 3*default_rule_thickness \NC\NR
\NC "                           \NC T, T', S, S', SS, SS' \NC FractionDenominatorGapMin                \NC default_rule_thickness \NC\NR
\NC \tex{Umathfractionnumup}    \NC D, D'                 \NC FractionNumeratorDisplayStyleShiftUp     \NC num1 \NC\NR
\NC "                           \NC T, T', S, S', SS, SS' \NC FractionNumeratorShiftUp                 \NC num2 \NC\NR
\NC \tex{Umathfractionnumvgap}  \NC D, D'                 \NC FractionNumeratorDisplayStyleGapMin      \NC 3*default_rule_thickness \NC\NR
\NC "                           \NC T, T', S, S', SS, SS' \NC FractionNumeratorGapMin                  \NC default_rule_thickness \NC\NR
\NC \tex{Umathfractionrule}     \NC --                    \NC FractionRuleThickness                    \NC default_rule_thickness \NC\NR
\NC \tex{Umathlimitabovebgap}   \NC --                    \NC UpperLimitBaselineRiseMin                \NC big_op_spacing3 \NC\NR
\NC \tex{Umathlimitabovekern}   \NC --                    \NC 0$^1$                                    \NC big_op_spacing5 \NC\NR
\NC \tex{Umathlimitabovevgap}   \NC --                    \NC UpperLimitGapMin                         \NC big_op_spacing1 \NC\NR
\NC \tex{Umathlimitbelowbgap}   \NC --                    \NC LowerLimitBaselineDropMin                \NC big_op_spacing4 \NC\NR
\NC \tex{Umathlimitbelowkern}   \NC --                    \NC 0$^1$                                    \NC big_op_spacing5 \NC\NR
\NC \tex{Umathlimitbelowvgap}   \NC --                    \NC LowerLimitGapMin                         \NC big_op_spacing2 \NC\NR
\NC \tex{Umathoverdelimitervgap}\NC --                    \NC StretchStackGapBelowMin                  \NC big_op_spacing1 \NC\NR
\NC \tex{Umathoverdelimiterbgap}\NC --                    \NC StretchStackTopShiftUp                   \NC big_op_spacing3 \NC\NR
\NC \tex{Umathunderdelimitervgap}\NC--                    \NC StretchStackGapAboveMin                  \NC big_op_spacing2 \NC\NR
\NC \tex{Umathunderdelimiterbgap}\NC--                    \NC StretchStackBottomShiftDown              \NC big_op_spacing4 \NC\NR
\NC \tex{Umathoverbarkern}      \NC --                    \NC OverbarExtraAscender                     \NC default_rule_thickness \NC\NR
\NC \tex{Umathoverbarrule}      \NC --                    \NC OverbarRuleThickness                     \NC default_rule_thickness \NC\NR
\NC \tex{Umathoverbarvgap}      \NC --                    \NC OverbarVerticalGap                       \NC 3*default_rule_thickness \NC\NR
\NC \tex{Umathquad}             \NC --                    \NC <font_size(f)>$^1$                       \NC math_quad \NC\NR
\NC \tex{Umathradicalkern}      \NC --                    \NC RadicalExtraAscender                     \NC default_rule_thickness \NC\NR
\NC \tex{Umathradicalrule}      \NC --                    \NC RadicalRuleThickness                     \NC <not set>$^2$ \NC\NR
\NC \tex{Umathradicalvgap}      \NC D, D'                 \NC RadicalDisplayStyleVerticalGap           \NC (default_rule_thickness+\crlf
                                                                                                    (abs(math_x_height)/4))$^3$ \NC\NR
\NC "                           \NC T, T', S, S', SS, SS' \NC RadicalVerticalGap                       \NC (default_rule_thickness+\crlf
                                                                                                    (abs(default_rule_thickness)/4))$^3$ \NC\NR
\NC \tex{Umathradicaldegreebefore}\NC --                  \NC RadicalKernBeforeDegree                  \NC <not set>$^2$ \NC\NR
\NC \tex{Umathradicaldegreeafter}\NC --                   \NC RadicalKernAfterDegree                   \NC <not set>$^2$ \NC\NR
\NC \tex{Umathradicaldegreeraise}\NC --                   \NC RadicalDegreeBottomRaisePercent          \NC <not set>$^{2,7}$ \NC\NR
\NC \tex{Umathspaceafterscript} \NC --                    \NC SpaceAfterScript                         \NC script_space$^4$ \NC\NR
\NC \tex{Umathstackdenomdown}   \NC D, D'                 \NC StackBottomDisplayStyleShiftDown         \NC denom1 \NC\NR
\NC "                           \NC T, T', S, S', SS, SS' \NC StackBottomShiftDown                     \NC denom2 \NC\NR
\NC \tex{Umathstacknumup}       \NC D, D'                 \NC StackTopDisplayStyleShiftUp              \NC num1 \NC\NR
\NC "                           \NC T, T', S, S', SS, SS' \NC StackTopShiftUp                          \NC num3 \NC\NR
\NC \tex{Umathstackvgap}        \NC D, D'                 \NC StackDisplayStyleGapMin                  \NC 7*default_rule_thickness \NC\NR
\NC "                           \NC T, T', S, S', SS, SS' \NC StackGapMin                              \NC 3*default_rule_thickness \NC\NR
\NC \tex{Umathsubshiftdown}     \NC --                    \NC SubscriptShiftDown                       \NC sub1 \NC\NR
\NC \tex{Umathsubshiftdrop}     \NC --                    \NC SubscriptBaselineDropMin                 \NC sub_drop \NC\NR
\NC \tex{Umathsubsupshiftdown}  \NC --                    \NC SubscriptShiftDownWithSuperscript$^8$    \NC      \NC\NR
\NC                             \NC                       \NC \quad\ or SubscriptShiftDown             \NC sub2 \NC\NR
\NC \tex{Umathsubtopmax}        \NC --                    \NC SubscriptTopMax                          \NC (abs(math_x_height * 4) / 5) \NC\NR
\NC \tex{Umathsubsupvgap}       \NC --                    \NC SubSuperscriptGapMin                     \NC 4*default_rule_thickness \NC\NR
\NC \tex{Umathsupbottommin}     \NC --                    \NC SuperscriptBottomMin                     \NC (abs(math_x_height) / 4) \NC\NR
\NC \tex{Umathsupshiftdrop}     \NC --                    \NC SuperscriptBaselineDropMax               \NC sup_drop \NC\NR
\NC \tex{Umathsupshiftup}       \NC D                     \NC SuperscriptShiftUp                       \NC sup1 \NC\NR
\NC "                           \NC T, S, SS              \NC SuperscriptShiftUp                       \NC sup2 \NC\NR
\NC "                           \NC D', T', S', SS'       \NC SuperscriptShiftUpCramped                \NC sup3 \NC\NR
\NC \tex{Umathsupsubbottommax}  \NC --                    \NC SuperscriptBottomMaxWithSubscript        \NC (abs(math_x_height * 4) / 5) \NC\NR
\NC \tex{Umathunderbarkern}     \NC --                    \NC UnderbarExtraDescender                   \NC default_rule_thickness \NC\NR
\NC \tex{Umathunderbarrule}     \NC --                    \NC UnderbarRuleThickness                    \NC default_rule_thickness \NC\NR
\NC \tex{Umathunderbarvgap}     \NC --                    \NC UnderbarVerticalGap                      \NC 3*default_rule_thickness \NC\NR
\NC \tex{Umathconnectoroverlapmin}\NC --                  \NC MinConnectorOverlap                      \NC 0$^5$ \NC\NR
\stoptabulate

\stop

Note 1: \OPENTYPE\ fonts set \tex{Umathlimitabovekern} and
\tex{Umathlimitbelowkern} to zero and set \tex{Umathquad} to the font size of the used font,
because these are not supported in the MATH table.

Note 2: \TFM\ fonts do not set \tex{Umathradicalrule} because \TeX82\ uses the height of the radical
instead. If this parameter is still unset when \LUATEX\ has to typeset a radical, a backward
compatibility mode kicks in that assumes that an old-style \TeX\ font is used.  Also, they do
not set \tex{Umathradicaldegreebefore}, \tex{Umathradicaldegreeafter}, and
\tex{Umathradicaldegreeraise}.  These are then automatically initialized to $5/18$quad, $-10/18$quad, and 60.

Note 3: If tfm fonts are used, then the \tex{Umathradicalvgap} is not set until the first time
\LUATEX\ has to typeset a formula because this needs parameters from both family2 and family3.
This provides backward compatibility with \TEX82, but that compatibility is only partial:
once the \tex{Umathradicalvgap} is set, it will not be recalculated any more.

Note 4: (also if tfm fonts are used) A similar situation arises with respect to \tex{Umathspaceafterscript}: it is not
set until the first time \LUATEX\ has to typeset a formula. This provides some backward compatibility with
\TEX82.  But once the \tex{Umathspaceafterscript} is set, \tex{scriptspace} will never be looked at again.

Note 5: Tfm fonts set \tex{Umathconnectoroverlapmin} to zero because
\TeX82\ always stacks extensibles without any overlap.

Note 6: The \tex{Umathoperatorsize} is only used in \type{\displaystyle}, and is only set
in \OPENTYPE\ fonts. In \TFM\ font mode, it is artificially set to one scaled point more than the
initial attempt's size, so that the \quote{first next} will always be tried, just like in \TEX82.

Note 7: The \tex{Umathradicaldegreeraise} is a special case because it is the only parameter that is
expressed in a percentage instead of as a number of scaled points.

Note 8: \type{SubscriptShiftDownWithSuperscript} does not actually exist in the \quote{standard}
\OPENTYPE\ math font Cambria, but it is useful enough to be added. New in version 0.38.

Note 9: \type{FractionDelimiterDisplayStyleSize} and \type{FractionDelimiterSize} do not actually exist in the \quote{standard}
\OPENTYPE\ math font Cambria, but were useful enough to be added. New in version 0.47.


\section{Math spacing setting}

Besides the parameters mentioned in the previous sections, there are
also 64 new primitives to control the math spacing table (as explained in
Chapter~18 of the \TeX book). The primitive names are a simple matter
of combining two math atom types, but for completeness' sake, here is
the whole list:

\startcolumns[n=2]
\starttyping
\Umathordordspacing
\Umathordopspacing
\Umathordbinspacing
\Umathordrelspacing
\Umathordopenspacing
\Umathordclosespacing
\Umathordpunctspacing
\Umathordinnerspacing
\Umathopordspacing
\Umathopopspacing
\Umathopbinspacing
\Umathoprelspacing
\Umathopopenspacing
\Umathopclosespacing
\Umathoppunctspacing
\Umathopinnerspacing
\Umathbinordspacing
\Umathbinopspacing
\Umathbinbinspacing
\Umathbinrelspacing
\Umathbinopenspacing
\Umathbinclosespacing
\Umathbinpunctspacing
\Umathbininnerspacing
\Umathrelordspacing
\Umathrelopspacing
\Umathrelbinspacing
\Umathrelrelspacing
\Umathrelopenspacing
\Umathrelclosespacing
\Umathrelpunctspacing
\Umathrelinnerspacing
\Umathopenordspacing
\Umathopenopspacing
\Umathopenbinspacing
\Umathopenrelspacing
\Umathopenopenspacing
\Umathopenclosespacing
\Umathopenpunctspacing
\Umathopeninnerspacing
\Umathcloseordspacing
\Umathcloseopspacing
\Umathclosebinspacing
\Umathcloserelspacing
\Umathcloseopenspacing
\Umathcloseclosespacing
\Umathclosepunctspacing
\Umathcloseinnerspacing
\Umathpunctordspacing
\Umathpunctopspacing
\Umathpunctbinspacing
\Umathpunctrelspacing
\Umathpunctopenspacing
\Umathpunctclosespacing
\Umathpunctpunctspacing
\Umathpunctinnerspacing
\Umathinnerordspacing
\Umathinneropspacing
\Umathinnerbinspacing
\Umathinnerrelspacing
\Umathinneropenspacing
\Umathinnerclosespacing
\Umathinnerpunctspacing
\Umathinnerinnerspacing
\stoptyping
\stopcolumns

These parameters are of type \type{\muskip}, so setting a parameter
can be done like this:

\starttyping
\Umathopordspacing\displaystyle=4mu plus 2mu
\stoptyping

They are all initialized by initex to the values mentioned in the
table in Chapter~18 of the \TeX book.

Note 1: for ease of use as well as for backward compatibility, \type{\thinmuskip},
\type{\medmuskip} and \type{\thickmuskip} are treated specially. In their case a pointer to
the corresponding internal parameter is saved, not the actual \type{\muskip} value. This
means that any later changes to one of these three parameters will be taken into account.
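
The behaviour described in Note 1 could be illustrated as follows. This is a
sketch of how that note reads, not a tested guarantee; the two spacing
parameters and all values are arbitrary choices.

\starttyping
\thinmuskip=3mu
\Umathordopspacing\textstyle=\thinmuskip  % refers to \thinmuskip itself
\Umathordinnerspacing\textstyle=3mu       % stores the plain value 3mu
\thinmuskip=6mu
% according to Note 1, the ord-op spacing now follows the new \thinmuskip,
% while the ord-inner spacing stays at 3mu
\stoptyping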

Note 2: Careful readers will realise that there are also primitives
for the items marked \type{*} in the \TeX book. These will never
be used, as those combinations of atoms cannot
happen, but it seemed better not to break orthogonality. They are initialized to zero.


\section[mathacc]{Math accent handling}

\LUATEX\ supports both top accents and bottom accents in math mode,
and math accents stretch automatically (if this is supported by the
font the accent comes from, of course). Bottom and combined accents as
well as fixed-width math accents are controlled by optional keywords
following \tex{Umathaccent}.

The keyword \type{bottom} after \tex{Umathaccent} signals that a bottom
accent is needed, and the keyword \type{both} signals that both a top
and a bottom accent are needed (in this case two accents need to be
specified, of course).

Then the set of three integers defining the accent is read. This set
of integers can be prefixed by the \type{fixed} keyword to indicate
that a non-stretching variant is requested (in case of both accents,
this step is repeated).

A simple example:
\starttyping
\Umathaccent both fixed 0 0 "20D7 fixed 0 0 "20D7 {example}
\stoptyping
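
The single-accent variants follow the same pattern. The lines below are a
sketch that reuses the family and character slot of the example above; whether
the accent actually stretches depends on the font.

\starttyping
\Umathaccent        0 0 "20D7 {example}  % stretching top accent
\Umathaccent fixed  0 0 "20D7 {example}  % non-stretching top accent
\Umathaccent bottom 0 0 "20D7 {example}  % stretching bottom accent
\stoptyping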

The primitives \tex{Umathbotaccent} and \tex{Umathaccents} are deprecated since
version 0.65, and will be removed eventually.

If a math top accent has to be placed and the accentee is a character and has a non-zero
\type{top_accent} value, then this value will be used to place the accent instead of
the \type{\skewchar} kern used by \TEX82.

The \type{top_accent} value represents a vertical line somewhere in the accentee. The
accent will be shifted horizontally such that its own \type{top_accent} line coincides
with the one from the accentee. If the \type{top_accent} value of the accent is zero,
then half the width of the accent followed by its italic correction is used instead.

The vertical placement of a top accent depends on the \type{x_height} of the font of the
accentee (as explained in the \TEX book), but if that value turns out to be zero and the
font has a MathConstants table, then \type{AccentBaseHeight} is used instead.

If a math bottom accent has to be placed, the \type{bot_accent} value is checked instead
of \type{top_accent}. Because bottom accents do not exist in \TEX82, the \type{\skewchar}
kern is ignored.

The vertical placement of a bottom accent is straight below the accentee, no correction
takes place.

\section{Math root extension}

The new primitive \type{\Uroot} allows the construction of a radical
noad including a degree field. Its syntax is an extension of \type{\Uradical}:

\starttyping
\Uradical <fam integer> <char integer> <radicand>
\Uroot    <fam integer> <char integer> <degree> <radicand>
\stoptyping

The placement of the degree is controlled by the math parameters
\type{\Umathradicaldegreebefore}, \type{\Umathradicaldegreeafter}, and
\type{\Umathradicaldegreeraise}. The degree will be typeset in \type{\scriptscriptstyle}.
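
As a minimal sketch, a square root and a cube root of the same radicand could
be entered as shown below. It assumes that math family~0 holds a font with a
radical sign at slot \type{"221A}; both the family and the slot are
assumptions made for the example.

\starttyping
$ \Uradical 0 "221A {x+1} $
$ \Uroot    0 "221A {3} {x+1} $
\stoptyping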


\section{Math kerning in super- and subscripts}

The character fields in a lua-loaded OpenType math font can have a \quote{mathkern} table.
The format of this table is the same as the \quote{mathkern} table that is returned by
the \type{fontloader} library, except that all height and kern values have to
be specified in actual scaled points.

When a super- or subscript has to be placed next to a math item, \LUATEX\ checks
whether the super- or subscript and the nucleus are both simple character items. If
they are, and if the fonts of both character items are OpenType fonts (as opposed to
legacy \TEX\ fonts), then \LUATEX\ will use the OpenType MATH algorithm for deciding
on the horizontal placement of the super- or subscript.

This works as follows:

\startitemize
\item The vertical position of the script is calculated.
\item The default horizontal position is flat next to the base character.
\item For superscripts, the italic correction of the base character is added.
\item For a superscript, two vertical values are calculated: the bottom of the
   script (after shifting up), and the top of the base. For a subscript,
   the two values are the top of the (shifted down) script, and the bottom
   of the base.
\item For each of these two locations:
    \startitemize
    \item find the mathkern value at this height for the base
      (for a subscript placement, this is the \type{bottom_right} corner,
       for a superscript placement the \type{top_right} corner)
    \item find the mathkern value at this height for the script
      (for a subscript placement, this is the \type{top_left} corner,
       for a superscript placement the \type{bottom_left} corner)
    \item add the found values together to get a preliminary result.
    \stopitemize
\item The horizontal kern to be applied is the smaller of the two results from
    the previous step.
\stopitemize

The mathkern value at a specific height is the kern value that is specified by the
next higher height and kern pair, or the highest one in the character (if there is no
value high enough in the character), or simply zero (if the character has no mathkern
pairs at all).
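
The lookup rule can be sketched in \LUA. The helper \type{mathkern_at} below is
hypothetical (it is not part of \LUATEX), and it assumes that one corner of the
\quote{mathkern} table is an array of entries with \type{height} and \type{kern}
fields, sorted by increasing height. Whether an entry at exactly the requested
height counts as \quote{higher} is also an assumption of this sketch:

\starttyping
\directlua{
  % Hypothetical helper: kern of one mathkern corner at a given height.
  function mathkern_at(corner, height)
    local last = 0                      % no pairs at all: zero
    for _, entry in ipairs(corner or {}) do
      if entry.height > height then     % the next higher pair wins
        return entry.kern
      end
      last = entry.kern                 % remember the highest pair so far
    end
    return last                         % nothing high enough: highest pair
  end
}
\stoptyping

For a corner holding the pairs $(100000, -5000)$ and $(300000, 10000)$, this
sketch yields $-5000$ at height 50000, and 10000 at heights 200000 and 400000.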

\section{Scripts on horizontally extensible items like arrows}

The new primitives \tex{Uunderdelimiter} and \tex{Uoverdelimiter}
(both from 0.35) allow the placement of a subscript or superscript on
an automatically extensible item. Conversely, \tex{Udelimiterunder} and
\tex{Udelimiterover} (both from 0.37) allow the placement of
an automatically extensible item as a subscript or superscript on a
nucleus.

The vertical placements are controlled by
\tex{Umathunderdelimiterbgap}, \tex{Umathunderdelimitervgap},
\tex{Umathoverdelimiterbgap}, and \tex{Umathoverdelimitervgap} in much the same way as limit
placements on large operators. The superscript in \tex{Uoverdelimiter} is typeset in
a suitable scripted style; the subscript in \tex{Uunderdelimiter} is cramped as well.

\section {Extensible delimiters}

\LUATEX\ internally uses a structure that supports \OPENTYPE\ \quote{MathVariants} as well
as \TFM\ \quote{extensible recipes}.


\section{Other Math changes}

\subsection {Verbose versions of single-character math commands}

\LUATEX\ defines six new primitives that have the same function as
\type{^}, \type{_}, \type{$}, and \type{$$}. %$

\starttabulate[|l|l|]
\NC \bf primitive         \NC \bf explanation                           \NC\NR
\NC \tex{Usuperscript}    \NC Duplicates the functionality of \type{^}  \NC\NR
\NC \tex{Usubscript}      \NC Duplicates the functionality of \type{_}  \NC\NR
\NC \tex{Ustartmath}      \NC Duplicates the functionality of \type{$}, % $
                              when used in non-math mode.  \NC\NR
\NC \tex{Ustopmath}       \NC Duplicates the functionality of \type{$}, % $
                              when used in inline math mode.  \NC\NR
\NC \tex{Ustartdisplaymath}\NC Duplicates the functionality of \type{$$}, % $$
                              when used in non-math mode.  \NC\NR
\NC \tex{Ustopdisplaymath} \NC Duplicates the functionality of \type{$$}, % $$
                              when used in display math mode.  \NC\NR
\stoptabulate

All are new in version 0.38. The \tex{Ustopmath} and \tex{Ustopdisplaymath}
primitives check if the current math mode is the correct one (inline
vs. displayed), but you can freely intermix the four mathon|/|mathoff
commands with explicit dollar sign(s).
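
For illustration, the following lines are a sketch of how the verbose
primitives can be used on their own or mixed with the traditional
single|-|character commands (the formulas themselves are arbitrary):

\starttyping
% inline math without any dollar sign, caret or underscore
\Ustartmath a\Usuperscript{2} + b\Usubscript{i} \Ustopmath

% the same formula, opened with a primitive and closed with a dollar sign
\Ustartmath a^{2} + b\Usubscript{i} $

% display math
\Ustartdisplaymath x\Usuperscript{n} \Ustopdisplaymath
\stoptyping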


\subsection{Allowed math commands in non-math modes}

The commands \type{\mathchar}, \type{\omathchar}, and \type{\Umathchar} and control
sequences that are the result of \type{\mathchardef}, \type{\omathchardef}, or
\type{\Umathchardef}  are also acceptable in the horizontal and vertical modes.
In those cases, the \type{\textfont} from the requested math family is used.
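
A small sketch in plain \TEX\ style: the name \type{\myalpha} is made up for
this example, and the slot numbers assume the classic Computer Modern math
italic font in family~1.

\starttyping
\mathchardef\myalpha="010B

% math mode: the usual behaviour
$\myalpha + \mathchar"010C$

% horizontal mode: accepted by LuaTeX, set in \textfont1
The letters \myalpha\ and \mathchar"010C\ in running text.
\stoptyping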

\section{Math todo}

The following items are still todo.

\startitemize
\item Pre-scripts.
\item Multi-story stacks.
\item Flattened accents for high characters (?).
\item Better control over the spacing around displays and handling of equation numbers.
\item Support for multi-line displays using \MATHML\ style alignment points.
\stopitemize

\chapter[languages]{Languages and characters, fonts and glyphs}

\LUATEX's internal handling of the characters and glyphs that eventually
become typeset is quite different from the way \TEX82 handles those
same objects. The easiest way to explain the difference is to focus on
unrestricted horizontal mode (i.\,e.\ paragraphs) and hyphenation first.
Later on, it will be easy to deal with the differences that occur in
horizontal and math modes.

In \TEX82, the characters you type are converted into \type{char_node}
records when they are encountered by the main control loop. \TEX\
attaches and processes the font information while creating those
records, so that the resulting \quote{horizontal list} contains the final
forms of ligatures and implicit kerning. This packaging is needed because
we may want to get the effective width of, for instance, a horizontal box.

When it becomes necessary to hyphenate words in a paragraph, \TEX\
converts (one word at a time) the \type{char_node} records into a
string array by replacing ligatures with their components and
ignoring the kerning. Then it runs the hyphenation algorithm on this
string, and converts the hyphenated result back into a
\quote{horizontal list} that is subsequently spliced back into
the paragraph stream. Keep in mind that the paragraph may contain unboxed horizontal material,
which then already contains ligatures and kerns; the words therein
are part of the hyphenation process as well.

The \type{char_node} records are somewhat misnamed, as they are glyph
positions in specific fonts, and therefore not really \quote{characters}
in the linguistic sense. There is no language information inside the
\type{char_node} records. Instead, language information is passed along
using \type{language whatsit} records inside the horizontal list.

In \LUATEX, the situation is quite different. The characters you
type are always converted into \type{glyph_node} records with a
special subtype to identify them as being intended as linguistic
characters. \LUATEX\ stores the needed language information in those
records, but does not do any font|-|related processing at the time of
node creation. It only stores the index of the font.

When it becomes necessary to typeset a paragraph, \LUATEX\ first
inserts all hyphenation points right into the whole node list.
Next, it processes all the font information in the whole list
(creating ligatures and adjusting kerning), and finally it adjusts
all the subtype identifiers so that the records are \quote{glyph
nodes} from now on.

That was the broad overview. The rest of this chapter will deal with the
minutiae of the new process.

\section[charsandglyphs]{Characters and glyphs}

\TEX82 (including \PDFTEX) differentiated between \type{char_node}s
and \type{lig_node}s.  The former are simple items that contained
nothing but a \quote{character} and a \quote{font} field, and they
lived in the same memory as tokens did. The latter also contained a
list of components, and a subtype indicating whether this ligature was
the result of a word boundary, and it was stored in the same place as
other nodes like boxes and kerns and glues.

In \LUATEX, these two types are merged into one, somewhat larger
structure called a \type{glyph_node}. Besides having the old
character, font, and component fields, and the new special fields like
\quote{attr} (see~\in{section}[glyphnodes]), these nodes also contain:

\startitemize

\item A subtype, split into four main types:

   \startitemize
   \item \type{character}, for characters to be hyphenated: the lowest
       bit (bit 0) is set to 1.
   \item \type{glyph}, for specific font glyphs: the lowest bit
      (bit 0) is not set.
   \item \type{ligature}, for ligatures (bit 1 is set)
   \item \type{ghost}, for \quote{ghost objects} (bit 2 is set)
   \stopitemize

   The latter two make further use of two extra fields (bits 3 and 4):

   \startitemize
   \item \type{left}, for ligatures created from a left word boundary and
                    for ghosts created from \tex{leftghost}
   \item \type{right}, for ligatures created from a right word boundary and
                    for ghosts created from \tex{rightghost}
   \stopitemize

   For ligatures, both bits can be set at the same time (in case of a single|-|glyph word).

\item \type{glyph_node}s of type \quote{character} also contain language data,
  split into four items that were current when the node was created:
  the \tex{setlanguage} (15 bits), \tex{lefthyphenmin} (8 bits),
  \tex{righthyphenmin} (8 bits), and \tex{uchyph} (1 bit).

\stopitemize

Incidentally, \LUATEX\ allows 32768 separate languages, and words can
be 256 characters long.

Because the \tex{uchyph} value is saved in the actual nodes, its
handling is subtly different from \TEX82: changes to \tex{uchyph}
become effective immediately, not at the end of the current partial
paragraph.

Typeset boxes now always have their language information embedded in
the nodes themselves, so there is no longer a possible dependency on
the surrounding language settings. In \TEX82, a mid-paragraph
statement like \tex{unhbox0} would process the box using the current
paragraph language unless there was a \tex{setlanguage} issued inside
the box. In \LUATEX, all language variables are already frozen.
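
A small sketch of the difference (the language numbers are arbitrary and are
assumed to have patterns loaded):

\starttyping
\language=1
\setbox0=\hbox{hyphenation}   % the nodes in box 0 record language 1

\language=2
Some text \unhbox0\ and more. % LuaTeX: the unboxed word still uses language 1
\stoptyping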


\section{The main control loop}

In \LUATEX's main loop, almost all input characters that are to be
typeset are converted into \type{glyph_node} records with subtype
\quote{character}, but there are a few small exceptions.

First, the \tex{accent} primitive creates nodes with subtype \quote{glyph}
instead of \quote{character}: one for the actual accent and one for the
accentee. The primary reason for this is that \tex{accent} in \TEX82
is explicitly dependent on the current font encoding, so it would not
make much sense to attach a new meaning to the primitive's name, as
that would invalidate many old documents and macro packages. A
secondary reason is that in \TEX82, \tex{accent} prohibits hyphenation
of the current word. Since in \LUATEX\ hyphenation only takes place on
\quote{character} nodes, it is possible to achieve the same effect.

This change of meaning did happen with \tex{char}, that now generates
\quote{character} nodes, consistent with its changed meaning in \XETEX.
The changed status of \tex{char} is not yet finalized, but if it stays
as it is now, a new primitive \tex{glyph} should be added to directly
insert a font glyph id.

Second, all the results of processing in math mode eventually become
nodes with \quote{glyph} subtypes.

Third, the \ALEPH-derived commands \tex{leftghost} and
\tex{rightghost} create nodes of a third subtype: \quote{ghost}. These nodes
are ignored completely by all further processing until the stage where
inter-glyph kerning is added.

Fourth, automatic discretionaries are handled differently. \TEX82
inserts an empty discretionary after sensing an input character that
matches the \tex{hyphenchar} in the current font. This test is wrong,
in our opinion: whether or not hyphenation takes place should not
depend on the current font, it is a language property.

In \LUATEX, it works like this: if \LUATEX\ senses a string of input
characters that matches the value of the new integer parameter
\tex{exhyphenchar}, it will insert an explicit discretionary after that
series of nodes. Initex sets \tex{exhyphenchar=`\-}.
Incidentally, this is a global parameter instead of a
language-specific one because it may be useful to change the value
depending on the document structure instead of the text language.

Note: as of \LUATEX\ 0.63.0, the insertion of discretionaries after
a sequence of explicit hyphens happens at the same time as the other
hyphenation processing, {\it not\/} inside the main control loop.

The only use \LUATEX\ has for \tex{hyphenchar} is in the check
whether a word should be considered for hyphenation at all. If the
\tex{hyphenchar} of the font attached to the first character node in a
word is negative, then hyphenation of that word is abandoned
immediately. {\bf This behavior is added for backward
compatibility only, and \type{\hyphenchar=-1} should not be used as a
means of preventing hyphenation in new \LUATEX\ documents.}

Fifth, \tex{setlanguage} no longer creates whatsits. The meaning of
\tex{setlanguage} is changed so that it is now an integer parameter
like all others. That integer parameter is used in \type{glyph_node}
creation to add language information to the glyph nodes. In
conjunction, the \tex{language} primitive is extended so that it
always also updates the value of \tex{setlanguage}.

Sixth, the \tex{noboundary} command (this command prohibits word
boundary processing where that would normally take place) now does
create whatsits. These whatsits are needed because the exact place of
the \tex{noboundary} command in the input stream has to be retained
until after the ligature and font processing stages.

Finally, there is no longer a \type{main_loop} label in the
code. Remember that \TEX82 did quite a lot of processing while adding
\type{char_nodes} to the horizontal list? For speed reasons, it handled
that processing code outside of the \quote{main control} loop, and only the
first character of any \quote{word} was handled by that \quote{main control} loop.
In \LUATEX, there is no longer a need for that (all hard work is done
later), and the (now very small) bits of character-handling code have
been moved back inline. When \tex{tracingcommands} is on, this is
visible because the full word is reported, instead of just the initial
character.


\section[patternsexceptions]{Loading patterns and exceptions}

The hyphenation algorithm in \LUATEX\ is quite different from the one
in \TEX82, although it uses essentially the same user input.

After expansion, the argument for \tex{patterns} has to be proper
UTF-8 with individual patterns separated by spaces; no \tex{char} or
\tex{chardef}-ed commands are allowed. (The current implementation is
even more strict, and will reject all non|-|\UNICODE\ characters, but
that will be changed in the future. For now, the generated errors are
a valuable tool in discovering font-encoding specific pattern files.)

Likewise, the expanded argument for \tex{hyphenation} also has to be
proper UTF-8, but here a tiny little bit of extra syntax is provided:

\startitemize[n]
\item Three sets of arguments in curly braces (\type{{}{}{}})
   indicate a desired complex discretionary, with arguments
   as in the \tex{discretionary} command in normal document input.
\item \type{-} indicates a desired simple discretionary, cf.\ \tex{-} and
   \type{\discretionary{-}{}{}} in normal document input.
\item Internal command names are ignored. This rule is provided
   especially for \tex{discretionary}, but it also helps to deal with
  \tex{relax} commands that may sneak in.
\item \type{=} indicates a (non-discretionary) hyphen in the document input.
\stopitemize

The expanded argument is first converted back to a space-separated
string while dropping the internal command names. This string is then
converted into a dictionary by a routine that creates key||value pairs
by converting the other listed items. It is important to note that the
keys in an exception dictionary can always be generated from the
values. Here are a few examples:

\starttabulate[|l|l|l|]
\NC \ssbf value          \NC \ssbf implied key (input) \NC \ssbf effect\NC\NR
\NC \type{ta-ble}        \NC table  \NC \type{ta\-ble}
                                        ($=$ \type{ta\discretionary{-}{}{}ble})\NC\NR
\NC \type{ba{k-}{}{c}ken}\NC backen \NC \type{ba\discretionary{k-}{}{c}ken}\NC\NR
\stoptabulate

The resultant patterns and exception dictionary will be stored under
the language code that is the present value of \tex{language}.

In the last line of the table, you see there is no \tex{discretionary}
command in the value: the command is optional in the \TEX-based input
syntax. The underlying reason for that is that it is conceivable that
a whole dictionary of words is stored as a plain text file and loaded
into \LUATEX\ using one of the functions in the \LUA\ \luatex{lang}
library. This loading method is quite a bit faster than going through
the \TEX\ language primitives, but some (most?) of that speed gain
would be lost if it had to interpret command sequences while doing so.
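
As a sketch of these rules (the language number is arbitrary), the first two
lines below should define the same exception, because the command name is
simply dropped:

\starttyping
% with and without the optional command name
\hyphenation{ba{k-}{}{c}ken}
\hyphenation{ba\discretionary{k-}{}{c}ken}

% exceptions are stored under the language that is current at this point
\language=1 \hyphenation{ta-ble}
\stoptyping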

Starting with \LUATEX\ 0.63.0, it is possible to specify extra hyphenation
points in compound words by using \type{{-}{}{-}} for the explicit hyphen
character (replace \type{-} by the actual explicit hyphen character if needed).
For example, this matches the word \quote{multi-word-boundaries} and allows
an extra break between \quote{boun} and \quote{daries}:

\starttyping
\hyphenation{multi{-}{}{-}word{-}{}{-}boun-daries}
\stoptyping

The motivation behind the \ETEX\ extension \tex{savinghyphcodes} was
that hyphenation heavily depended on font encodings. This is no longer
true in \LUATEX, and the corresponding primitive is ignored pending
complete removal. The future semantics of \tex{uppercase} and
\tex{lowercase} are still under consideration; no changes have taken
place yet.


\section{Applying hyphenation}

The internal structures \LUATEX\ uses for the insertion of
discretionaries in words are very different from the ones in \TEX82,
and that means there are some noticeable differences in handling as
well.

First and foremost, there is no \quote{compressed trie} involved in
hyphenation. The algorithm still reads \PATGEN-generated pattern
files, but \LUATEX\ uses a finite state hash to match the patterns
against the word to be hyphenated. This algorithm is based on the
\quote{libhnj} library used by OpenOffice, which in turn is inspired
by \TEX.
The memory allocation for this new implementation is completely
dynamic, so the \WEBC\ setting for \type{trie_size} is ignored.

Differences between \LUATEX\ and \TEX82 that are a direct result of that:

\startitemize
\item \LUATEX\ happily hyphenates the full \UNICODE\ character range.
\item Pattern and exception dictionary size is limited by the
  available memory only; all allocations are done dynamically.
  The trie-related settings in \type{texmf.cnf} are ignored.
\item Because there is no \quote{trie preparation} stage, language patterns
  never become frozen. This means that the primitive \tex{patterns}
  (and its \LUA\ counterpart \luatex{lang.patterns}) can be used at any
  time, not only in initex.
\item Only the string representation of \tex{patterns} and
  \tex{hyphenation} is stored in the format file. At format load time,
  they are simply re-evaluated. It follows that there is no real
  reason to preload languages in the format file. In fact, it is
  usually not a good idea to do so. It is much smarter to load
  patterns no sooner than the first time they are actually needed.
\item \LUATEX\ uses the language-specific variables
\tex{prehyphenchar} and \tex{posthyphenchar} in the creation of
implicit discretionaries, instead of \TEX82's \tex{hyphenchar}, and
the values of the language-specific variables \tex{preexhyphenchar} and
\tex{postexhyphenchar} for explicit discretionaries (instead of
\TEX82's empty discretionary).
\stopitemize

Inserted characters and ligatures inherit their attributes from the
nearest glyph node item (usually the preceding one, but the following
one for the items inserted at the left-hand side of a word).

Word boundaries are no longer implied by font switches, but by
language switches. One word can have two separate fonts and still be
hyphenated correctly (but it cannot have two different languages:
the \tex{setlanguage} command forces a word boundary).

All languages start out with \tex{prehyphenchar=`\-},
\tex{posthyphenchar=0}, \tex{preexhyphenchar=0} and
\tex{postexhyphenchar=0}.
When you assign a value to one of these four parameters, you are
actually changing the settings for the current \tex{language}; this
behavior is compatible with \tex{patterns} and \tex{hyphenation}.
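
A minimal sketch, assuming that language number~3 is a language for which a
hyphen should be repeated at the start of the next line after a break at an
explicit hyphen (the number is arbitrary):

\starttyping
\language=3

% explicit hyphens: the typed hyphen stays at the end of the line,
% and a second one is added at the start of the next line
\preexhyphenchar=0
\postexhyphenchar=`\-

% pattern-based breaks keep the defaults
\prehyphenchar=`\-
\posthyphenchar=0
\stoptyping

Because the assignments apply to the current \tex{language}, other languages
keep their own values.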

\LUATEX\ also hyphenates the first word in a paragraph.

Words can be up to 256 characters long (up from 64 in \TEX82).  Longer
words generate an error right now, but eventually either the
limitation will be removed or perhaps it will become possible to
silently ignore the excess characters (this is what happens in \TEX82,
but there the behavior cannot be controlled).

If you are using the \LUA\ function \type{lang.hyphenate}, you should be
aware that this function expects to receive a list of \quote{character}
nodes. It will not operate properly in the presence of \quote{glyph},
\quote{ligature}, or \quote{ghost} nodes, nor does it know how to deal with
kerning.  In the near future, it will be able to skip over \quote{ghost}
nodes, and we may add a less fuzzy function you can call as well.
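
One place where the list is known to consist of \quote{character} nodes is the
\type{hyphenate} callback. The following sketch replaces the built|-|in pass
with an explicit call; it is only an illustration of where the function fits
in, not a recommended setup:

\starttyping
\directlua{
  % The list still holds "character" nodes at this stage,
  % which is what lang.hyphenate expects.
  callback.register("hyphenate", function(head, tail)
    lang.hyphenate(head, tail)
  end)
}
\stoptyping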

The hyphenation exception dictionary is maintained as a key-value
hash, and that is also dynamic, so the \type{hyph_size} setting is not
used either.

A technical paper detailing the new algorithm will be released as a
separate document.

\section{Applying ligatures and kerning}

After all possible hyphenation points have been inserted in the list,
\LUATEX\ will process the list to convert the \quote{character} nodes into
\quote{glyph} and \quote{ligature} nodes. This is actually done in two stages:
first all ligatures are processed, then all kerning information is
applied to the result list. But those two stages are somewhat
dependent on each other: if the font in use makes it possible to do so,
the ligaturing stage adds virtual \quote{character} nodes to the word
boundaries in the list. While doing so, it removes and interprets
\type{noboundary} nodes. The kerning stage deletes those word boundary
items after it is done with them, and it does the same for \quote{ghost}
nodes. Finally, at the end of the kerning stage, all remaining
\quote{character} nodes are converted to \quote{glyph} nodes.

This work separation is worth mentioning because, if you overrule from
\LUA\ only one of the two callbacks related to font handling, then you
have to make sure you perform the tasks normally done by \LUATEX\
itself in order to make sure that the other, non|-|overruled, routine
continues to function properly.
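
A hedged sketch of what this means in practice: the function below overrules
only the \type{ligaturing} callback, and it still calls the built|-|in routine
\type{node.ligaturing} so that the kerning stage, which is left to \LUATEX,
receives the list in the state it expects. The place marked for custom
processing is purely illustrative:

\starttyping
\directlua{
  callback.register("ligaturing", function(head, tail)
    % custom processing of the node list could go here
    node.ligaturing(head, tail)
  end)
}
\stoptyping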

Work in this area is not yet complete, but most of the possible cases
are handled by our rewritten ligaturing engine. We are working hard to
make sure all of the possible inputs will become supported soon.

For example, take the word \type{office}, hyphenated \type{of-fice},
using a \quote{normal} font with all the \type{f}-\type{f} and
\type{f}-\type{i} type ligatures:

\starttabulate[|l|l|]
\NC Initial:               \NC \type{{o}{f}{f}{i}{c}{e}}\NC\NR
\NC After hyphenation:     \NC \type{{o}{f}{{-},{},{}}{f}{i}{c}{e}}\NC\NR
\NC First ligature stage:  \NC \type{{o}{{f-},{f},{<ff>}}{i}{c}{e}}\NC\NR
\NC Final result:          \NC \type{{o}{{f-},{<fi>},{<ffi>}}{c}{e}} \NC\NR
\stoptabulate

That's bad enough, but let us assume that there is also a hyphenation
point between the \type{f} and the \type{i}, to create
\type{of-f-ice}. Then the final result should be:

\starttyping
{o}{{f-},
    {{f-},
     {i},
     {<fi>}},
    {{<ff>-},
     {i},
     {<ffi>}}}{c}{e}
\stoptyping

with discretionaries in the post-break text as well as in the
replacement text of the top-level discretionary that resulted from the
first hyphenation point.

Here is that nested solution again, in a different representation:

\starttabulate[|l|l|l|l|]
\NC                  \NC pre             \NC post         \NC replace       \NC \NR
\NC topdisc          \NC \type{f-}$^1$   \NC sub1         \NC sub2          \NC \NR
\NC sub1             \NC \type{f-}$^2$   \NC \type{i}$^3$ \NC \type{<fi>}$^4$ \NC \NR
\NC sub2             \NC \type{<ff>-}$^5$\NC \type{i}$^6$ \NC \type{<ffi>}$^7$\NC \NR
\stoptabulate

When line breaking is choosing its breakpoints, the following fields will eventually
be selected:

\starttabulate[|l|l|l|]
\NC \type{of-f-ice} \NC \type{f-}$^1$    \NC \NR
\NC                 \NC \type{f-}$^2$    \NC \NR
\NC                 \NC \type{i}$^3$     \NC \NR
\NC \type{of-fice}  \NC \type{f-}$^1$    \NC \NR
\NC                 \NC \type{<fi>}$^4$  \NC \NR
\NC \type{off-ice}  \NC \type{<ff>-}$^5$ \NC \NR
\NC                 \NC \type{i}$^6$     \NC \NR
\NC \type{office}   \NC \type{<ffi>}$^7$ \NC \NR
\stoptabulate

The current solution in \LUATEX\ is not able to handle nested
discretionaries, but it is in fact smart enough to handle this
fictional \type{of-f-ice} example. It does so by combining two
sequential discretionary nodes as if they were a single object
(where the second discretionary node is treated as an extension
of the first node).

One can observe that the \type{of-f-ice} and \type{off-ice} cases both end
with the same actual post replacement list (\type{i}), and that this
would be the case even if that \type{i} was the first item of a
potential following ligature like \type{ic}. This allows \LUATEX\
to do away with one of the fields, and thus make the whole structure fit
into just two discretionary nodes.

The mapping of the seven list fields to the six fields in this
discretionary node pair is as follows:

\starttabulate[|l|p|]
\NC \bf field \NC \bf description \NC \NR
\NC \type{disc1.pre}     \NC \type{f-}$^1$  \NC \NR
\NC \type{disc1.post}    \NC \type{<fi>}$^4$  \NC \NR
\NC \type{disc1.replace} \NC \type{<ffi>}$^7$ \NC \NR
\NC \type{disc2.pre}     \NC \type{f-}$^2$  \NC \NR
\NC \type{disc2.post}    \NC \type{i}$^{3{,}6}$\NC \NR
\NC \type{disc2.replace} \NC \type{<ff>-}$^5$\NC \NR
\stoptabulate

What is actually generated after ligaturing has been applied is
therefore:

\starttyping
{o}{{f-},
    {<fi>},
    {<ffi>}}
   {{f-},
    {i},
    {<ff>-}}{c}{e}
\stoptyping

The two discretionaries have different subtypes from a discretionary
appearing on its own: the first has subtype 4, and the second has
subtype 5.  The need for these special subtypes stems from the fact
that not all of the fields appear in their \quote{normal} location.
The second discretionary especially looks odd, with things like the
\type{<ff>-} appearing in \type{disc2.replace}. The fact that some of
the fields have different meanings (and different processing code
internally) is what makes it necessary to have different subtypes:
this enables \LUATEX\ to distinguish this sequence of two joined
discretionary nodes from the case of two standalone discretionaries
appearing in a row.


\section{Breaking paragraphs into lines}

This code is still almost unchanged, but because of the
above|-|mentioned changes with respect to discretionaries and ligatures,
line breaking will potentially be different from traditional \TEX.
The actual line breaking code is still based on the \TEX82 algorithms,
and it does not expect there to be discretionaries inside of
discretionaries.

But that situation is now fairly common in \LUATEX, due to the changes
to the ligaturing mechanism. In addition, the \LUATEX\ discretionary
nodes are implemented slightly differently from the \TEX82 nodes: the
\type{no_break} text is now embedded inside the disc node, where
previously these nodes kept their place in the horizontal list (the
discretionary node contained a counter indicating how many nodes to
skip).

The combined effect of these two differences is that \LUATEX\ does not
always use all of the potential breakpoints in a paragraph, especially
when fonts with many ligatures are used.

% TODO:
%   Check \sfcode handling
%   Implement \glyph
%
%   Remove \savinghyphcodes
%   Allow non-UCS characters in \patterns

\chapter[fonts]{Font structure}

All \TEX\ fonts are represented to \LUA\ code as tables, and
internally as C~structures. All keys in the table below are saved in
the internal font structure if they are present in the table returned
by the
\luatex{define_font} callback, or if they result from the normal \TFM|/|\VF\
reading routines if there is no \luatex{define_font} callback defined.

The column \quote{from \VF} means that this key will be created by the
\luatex{font.read_vf()} routine, \quote{from \TFM} means that the key will be created
by the \luatex{font.read_tfm()} routine, and \quote{used} means whether or not the
\LUATEX\ engine itself will do something with the key.

The top|-|level keys in the table are as follows:

\starttabulate[|Tl|l|l|l|l|p|]
\NC \ssbf key  \NC \bf from vf \NC \bf from tfm \NC \bf used\NC \bf value type \NC \bf description \NC\NR
\NC name               \NC yes      \NC yes      \NC yes \NC string \NC metric (file) name\NC\NR
\NC area               \NC no       \NC yes      \NC yes \NC string \NC (directory) location, typically empty\NC\NR
\NC used               \NC no       \NC yes      \NC yes \NC boolean\NC used already? (initial: false)\NC \NR
\NC characters         \NC yes      \NC yes      \NC yes \NC table  \NC the defined glyphs of this font \NC \NR
\NC checksum           \NC yes      \NC yes      \NC no  \NC number \NC default: 0 \NC \NR
\NC designsize         \NC no       \NC yes      \NC yes \NC number \NC expected size (default: 655360 == 10pt) \NC \NR
\NC direction          \NC no       \NC yes      \NC yes \NC number \NC default: 0 (TLT) \NC \NR
\NC encodingbytes      \NC no       \NC no       \NC yes \NC number \NC default: depends on \type {format}\NC\NR
\NC encodingname       \NC no       \NC no       \NC yes \NC string \NC encoding name\NC\NR
\NC fonts              \NC yes      \NC no       \NC yes \NC table  \NC locally used fonts\NC \NR
\NC psname             \NC no       \NC no       \NC yes \NC string
\NC actual (\POSTSCRIPT) name (this is the PS fontname in the
incoming font source, also used as fontname identifier in the \PDF\ output, new in 0.43)\NC\NR
\NC fullname           \NC no       \NC no       \NC yes \NC string \NC output font name, used as a fallback in the \PDF\ output if the psname is not set\NC\NR
\NC header             \NC yes      \NC no       \NC no  \NC string \NC header comments, if any\NC \NR
\NC hyphenchar         \NC no       \NC no       \NC yes \NC number \NC default: \TEX's \tex{hyphenchar} \NC \NR
\NC parameters         \NC no       \NC yes      \NC yes \NC hash   \NC default: 7 parameters, all zero \NC \NR
\NC size               \NC no       \NC yes      \NC yes \NC number \NC loaded (at) size. (default: same as designsize) \NC \NR
\NC skewchar           \NC no       \NC no       \NC yes \NC number \NC default: \TEX's \tex{skewchar}  \NC \NR
\NC type               \NC yes      \NC no       \NC yes \NC string \NC basic type of this font\NC \NR
\NC format             \NC no       \NC no       \NC yes \NC string \NC disk format type\NC \NR
\NC embedding          \NC no       \NC no       \NC yes \NC string \NC \PDF\ inclusion\NC \NR
\NC filename           \NC no       \NC no       \NC yes \NC string \NC disk file name\NC\NR
\NC tounicode          \NC no       \NC yes      \NC yes \NC number \NC if 1, \LUATEX\ assumes per-glyph tounicode entries are
                                                                        present in the font\NC\NR
\NC stretch            \NC no       \NC no       \NC yes \NC number \NC the \quote {stretch} value from \tex{pdffontexpand}\NC\NR
\NC shrink             \NC no       \NC no       \NC yes \NC number \NC the \quote {shrink} value from \tex{pdffontexpand}\NC\NR
\NC step               \NC no       \NC no       \NC yes \NC number \NC the \quote {step} value from \tex{pdffontexpand}\NC\NR
\NC auto_expand        \NC no       \NC no       \NC yes \NC boolean\NC the \quote {autoexpand} keyword from\crlf \tex{pdffontexpand}\NC\NR
\NC expansion_factor   \NC no       \NC no       \NC no  \NC number \NC the actual expansion factor of an expanded font\NC\NR
\NC attributes         \NC no       \NC no       \NC yes \NC string \NC the \tex{pdffontattr}\NC\NR
\NC cache              \NC no       \NC no       \NC yes \NC string \NC this key controls caching of the \LUA\ table on the \type{tex}
                                                                        end. \type{yes}: use a reference to the table that is
                                                                        passed to \LUATEX\  (this is the default). \type{no}: don't store the table
                                                                        reference, don't cache any lua data for this font.
                                                                        \type{renew}: don't store the table reference, but
                                                                        save a reference to the table that is created at the
                                                                        first access to one of its fields in font.fonts.
                                                                        (new in 0.40.0, before that caching was always \type{yes}).
Note: the saved reference is thread-local, so be careful when you are using coroutines: an error will be thrown if the table
has been cached in one thread, but you reference it from another thread ($\approx$ coroutine)\NC\NR
\NC nomath              \NC no       \NC no       \NC yes \NC boolean\NC this key allows a minor speedup for text fonts. If it is
                                                                         present and true, then \LUATEX\ will not check the
                                                                         character entries for math-specific keys. (0.42.0)\NC\NR
\NC slant               \NC no       \NC no       \NC yes \NC number \NC This has the same semantics as the \type{SlantFont} operator
                                                                         in font map files. (0.47.0)\NC\NR
\NC extend              \NC no       \NC no       \NC yes \NC number \NC This has the same semantics as the \type{ExtendFont} operator
                                                                         in font map files. (0.50.0)\NC\NR
\stoptabulate

The key \type{name} is always required. The keys \type{stretch},
\type{shrink}, \type{step} and optionally \type{auto_expand} only
have meaning when used together: they can be used to replace a
post-loading \tex{pdffontexpand} command. The
\type{expansion_factor} is a value that can be present inside a font
in \type{font.fonts}. It is the actual expansion factor (a value
between \type{-shrink} and \type{stretch}, with step \type{step})
of a font that was automatically generated by the font expansion
algorithm. The key \type{attributes} can be used to replace
\tex{pdffontattr}. The key \type{used} is set by the engine when a
font is actively in use; this makes sure that the font's
definition is written to the output file (\DVI\ or \PDF). The
\TFM\ reader sets it to false. The \type{direction} is a number
signalling the \quote{normal} direction for this font. There are
sixteen possibilities:

\starttabulate[|Tc|c|c|c|]
\NC \ssbf number \NC \bf meaning \NC \bf number \NC \bf meaning \NC\NR
\NC 0          \NC LT          \NC 8          \NC TT          \NC\NR
\NC 1          \NC LL          \NC 9          \NC TL          \NC\NR
\NC 2          \NC LB          \NC 10         \NC TB          \NC\NR
\NC 3          \NC LR          \NC 11         \NC TR          \NC\NR
\NC 4          \NC RT          \NC 12         \NC BT          \NC\NR
\NC 5          \NC RL          \NC 13         \NC BL          \NC\NR
\NC 6          \NC RB          \NC 14         \NC BB          \NC\NR
\NC 7          \NC RR          \NC 15         \NC BR          \NC\NR
\stoptabulate

These are \OMEGA|-|style direction abbreviations: the first character
indicates the \quote{first} edge of the character glyphs (the edge that is
seen first in the writing direction), the second the \quote{top} side.
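
The table above is regular: the first letter changes every four entries and
the second letter cycles within each group of four. The following is a minimal
sketch in plain \LUA\ of that decoding. The function name
\type{direction_name} is an assumption made for this illustration, it is not
part of any \LUATEX\ library.

\starttyping
-- Lookup tables derived from the direction table in the text.
local first_edge  = { [0] = 'L', 'R', 'T', 'B' }  -- selected by floor(n / 4)
local second_edge = { [0] = 'T', 'L', 'B', 'R' }  -- selected by n % 4

-- Hypothetical helper: decode a direction number (0..15) into its
-- two-letter abbreviation.
local function direction_name(n)
  assert(n >= 0 and n <= 15 and n == math.floor(n), 'direction must be 0..15')
  return first_edge[math.floor(n / 4)] .. second_edge[n % 4]
end
\stoptyping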

The \type{parameters} is a hash with mixed key types. There are seven
possible string keys, as well as a number of integer indices (these
start from 8 up). The seven strings are actually used instead of the
bottom seven indices, because that gives a nicer user interface.

The names and their internal remapping are:

\starttabulate[|lT|c|]
\NC \ssbf name      \NC \bf internal remapped number \NC\NR
\NC slant         \NC 1  \NC\NR
\NC space         \NC 2  \NC\NR
\NC space_stretch \NC 3  \NC\NR
\NC space_shrink  \NC 4  \NC\NR
\NC x_height      \NC 5  \NC\NR
\NC quad          \NC 6  \NC\NR
\NC extra_space   \NC 7  \NC\LR
\stoptabulate
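
To illustrate the remapping, here is a small sketch in plain \LUA\ that turns
a purely numeric parameter array into the mixed form described above. The
helper name \type{name_parameters} and the sample values are assumptions made
for this example only.

\starttyping
-- Names for the bottom seven indices, in the order of the table above.
local parameter_names = {
  'slant', 'space', 'space_stretch', 'space_shrink',
  'x_height', 'quad', 'extra_space',
}

-- Hypothetical helper: return a copy of 'params' in which the indices
-- 1..7 are replaced by their names. Indices from 8 upwards and keys that
-- are already strings are copied unchanged.
local function name_parameters(params)
  local result = {}
  for key, value in pairs(params) do
    local name = type(key) == 'number' and parameter_names[key]
    result[name or key] = value
  end
  return result
end

-- Illustrative values in scaled points; index 8 stays an integer index.
local p = name_parameters {
  0, 218453, 109226, 72818, 282168, 655361, 72818, 1000
}
\stoptyping

After this call the space width is available as \type{p.space} and the extra
parameter as \type{p[8]}.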

The keys \type{type}, \type{format}, \type{embedding}, \type{fullname} and
\type{filename} are used to embed \OPENTYPE\ fonts in the resulting \PDF.

The \type{characters} table is a list of character hashes indexed by
an integer number. The number is the \quote{internal code} \TEX\ knows this
character by.

Two very special string indexes can be used also: \type{left_boundary} is a
virtual character whose ligatures and kerns are used to handle word
boundary processing. \type{right_boundary} is similar but not actually
used for anything (yet!).

Other index keys are ignored.

Each entry in the \type{characters} table is itself a hash. For example, here is the
character \quote{f} (decimal 102) in the font cmr10 at 10 points:

\starttyping
[102] = {
    ['width'] = 200250,
    ['height'] = 455111,
    ['depth'] = 0,
    ['italic'] = 50973,
    ['kerns'] = {
        [63] = 50973,
        [93] = 50973,
        [39] = 50973,
        [33] = 50973,
        [41] = 50973
    },
    ['ligatures'] = {
        [102] = {
            ['char'] = 11,
            ['type'] = 0
        },
        [108] = {
            ['char'] = 13,
            ['type'] = 0
        },
        [105] = {
            ['char'] = 12,
            ['type'] = 0
        }
    }
}
\stoptyping

The following top|-|level keys can be present inside a character hash:

\starttabulate[|lT|c|c|c|l|p|]
\NC \ssbf key    \NC \bf from vf \NC \bf from tfm \NC \bf used \NC \bf value type \NC \bf description \NC\NR
\NC width      \NC yes         \NC yes          \NC yes      \NC number         \NC character's width, in sp (default 0) \NC\NR
\NC height     \NC no          \NC yes          \NC yes      \NC number         \NC character's height, in sp (default 0) \NC\NR
\NC depth      \NC no          \NC yes          \NC yes      \NC number         \NC character's depth, in sp (default 0) \NC\NR
\NC italic     \NC no          \NC yes          \NC yes      \NC number         \NC character's italic correction, in sp (default zero) \NC\NR
\NC top_accent \NC no          \NC no           \NC maybe    \NC number         \NC character's top accent alignment place, in sp (default zero) \NC\NR
\NC bot_accent \NC no          \NC no           \NC maybe    \NC number         \NC character's bottom accent alignment place, in sp (default zero) \NC\NR
\NC left_protruding  \NC no    \NC no           \NC maybe    \NC number         \NC character's \tex{lpcode}\NC\NR
\NC right_protruding \NC no    \NC no           \NC maybe    \NC number         \NC character's \tex{rpcode}\NC\NR
\NC expansion_factor \NC no    \NC no           \NC maybe    \NC number         \NC character's \tex{efcode}\NC\NR
\NC tounicode  \NC no          \NC no           \NC maybe    \NC string         \NC character's Unicode equivalent(s), in UTF-16BE hexadecimal format\NC\NR
\NC next       \NC no          \NC yes          \NC yes      \NC number         \NC the \quote{next larger} character index \NC\NR
\NC extensible \NC no          \NC yes          \NC yes      \NC table          \NC the constituent parts of an extensible recipe \NC\NR
\NC vert_variants \NC no       \NC no           \NC yes      \NC table          \NC constituent parts of a vertical variant set\NC \NR
\NC horiz_variants\NC no       \NC no           \NC yes      \NC table          \NC constituent parts of a horizontal variant set\NC \NR
\NC kerns      \NC no          \NC yes          \NC yes      \NC table          \NC kerning information \NC\NR
\NC ligatures  \NC no          \NC yes          \NC yes      \NC table          \NC ligaturing information \NC\NR
\NC commands   \NC yes         \NC no           \NC yes      \NC array          \NC virtual font commands \NC\NR
\NC name       \NC no          \NC no           \NC no       \NC string         \NC the character (\POSTSCRIPT) name \NC\NR
\NC index      \NC no          \NC no           \NC yes      \NC number         \NC the (\OPENTYPE\ or \TRUETYPE) font glyph index \NC\NR
\NC used       \NC no          \NC yes          \NC yes      \NC boolean        \NC typeset already (default: false)? \NC\NR
\NC mathkern   \NC no          \NC no           \NC yes      \NC table          \NC math cut-in specifications \NC\NR
\stoptabulate

The values of \type{top_accent}, \type{bot_accent} and \type{mathkern} are used only for math
accent and superscript placement, see the \at{math chapter}[math] in this manual for details.

The values of \type{left_protruding} and \type{right_protruding} are used only when
\tex{pdfprotrudechars} is non-zero.

Whether or not \type{expansion_factor} is used depends on the font's global expansion
settings, as well as on the value of \tex{pdfadjustspacing}.

The usage of \type{tounicode} is this: if this font specifies a \type{tounicode=1} at
the top level, then \LUATEX\ will construct a \type{/ToUnicode} entry for the \PDF\
font (or font subset) based on the character-level \type{tounicode} strings, where
they are available. If a character does not have a sensible \UNICODE\ equivalent,
do not provide a string either (no empty strings).

If the font-level \type{tounicode} is not set, then \LUATEX\ will build up
\type{/ToUnicode} based on the \TEX\ code points you used, and any character-level
\type{tounicode} strings will be ignored. {\it At the moment, the string format is exactly the
format that is expected by Adobe \CMAP\ files (\UTF-16BE in hexadecimal encoding), minus
the enclosing angle brackets. This may change in the future.} Small example: the
\type{tounicode} for a \type{fi} ligature would be \type{00660069}.
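
Such strings are easy to generate. The following sketch in plain \LUA\ builds
the hexadecimal \UTF-16BE string from a list of \UNICODE\ code points,
including surrogate pairs for code points above \type{0xFFFF}. The function
name \type{tounicode_string} is hypothetical, and the sketch does not validate
its input.

\starttyping
-- Hypothetical helper: convert an array of Unicode code points into the
-- UTF-16BE hexadecimal string used for character-level 'tounicode'.
local function tounicode_string(codepoints)
  local parts = {}
  for _, cp in ipairs(codepoints) do
    if cp < 0x10000 then
      parts[#parts + 1] = string.format('%04X', cp)
    else
      -- code points outside the BMP are written as a surrogate pair
      local v    = cp - 0x10000
      local high = 0xD800 + math.floor(v / 0x400)
      local low  = 0xDC00 + v % 0x400
      parts[#parts + 1] = string.format('%04X%04X', high, low)
    end
  end
  return table.concat(parts)
end

local fi = tounicode_string { 0x66, 0x69 }  -- the 'fi' ligature
\stoptyping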

The presence of \type{extensible} will overrule \type{next}, if that is also present.
It in turn can be overruled by \type{vert_variants}.

The \type{extensible} table is very simple:

\starttabulate[|lT|l|p|]
\NC \ssbf key \NC \bf type \NC \bf description                    \NC\NR
\NC top     \NC number   \NC \quote{top} character index        \NC\NR
\NC mid     \NC number   \NC \quote{middle} character index     \NC\NR
\NC bot     \NC number   \NC \quote{bottom} character index     \NC\NR
\NC rep     \NC number   \NC \quote{repeatable} character index \NC\NR
\stoptabulate

The \type{horiz_variants} and \type{vert_variants} are arrays of components. Each of those
components is itself a hash of up to five keys:

\starttabulate[|lT|l|p|]
\NC \ssbf key            \NC \bf type \NC \bf explanation \NC\NR
\NC component            \NC number     \NC The character index (note that this is an encoding number, not a name).\NC \NR
\NC extender             \NC number     \NC One (1) if this part is repeatable, zero (0) otherwise.\NC \NR
\NC start                \NC number     \NC Maximum overlap at the starting side (in scaled points).\NC \NR
\NC end                  \NC number     \NC Maximum overlap at the ending side (in scaled points).\NC \NR
\NC advance              \NC number     \NC Total advance width of this item (can be zero or missing,
                                            then the natural size of the glyph for character \type{component}
                                            is used).\NC \NR
\stoptabulate

The \type{kerns} table is a hash indexed by character index (and
\quote{character index} is defined as either a non|-|negative integer or the
string value \type {right_boundary}), with the values the kerning to be
applied, in scaled points.

The \type{ligatures} table is a hash indexed by character index (and
\quote{character index} is defined as either a non|-|negative integer or the
string value \type {right_boundary}), with the values being yet another small
hash, with two fields:

\starttabulate[|lT|l|p|]
\NC \ssbf key \NC \bf type \NC \bf description \NC\NR
\NC type    \NC number   \NC the type of this ligature command, default 0 \NC\NR
\NC char    \NC number   \NC the character index of the resultant ligature \NC\NR
\stoptabulate

The \type{char} field in a ligature is required.

The \type{type} field inside a ligature is the numerical or string value of one of the eight
possible ligature types supported by \TEX.  When \TEX\ inserts a new ligature, it puts the new
glyph in the middle of the left and right glyphs. The original left and right glyphs can
optionally be retained, and when at least one of them is kept, it is also possible to move the
new \quote{insertion point} forward one or two places. The glyph that ends up to the right of the
insertion point will become the next \quote{left}.

\starttabulate[|l|c|l|l|]
\NC \bf textual (Knuth) \NC \bf number \NC \bf string    \NC \bf result \NC\NR
\NC l + r =: n          \NC 0          \NC \type{=:}     \NC \|n    \NC\NR
\NC l + r =:\| n        \NC 1          \NC \type{=:|}    \NC \|nr   \NC\NR
\NC l + r \|=: n        \NC 2          \NC \type{|=:}    \NC \|ln   \NC\NR
\NC l + r \|=:\| n      \NC 3          \NC \type{|=:|}   \NC \|lnr  \NC\NR
\NC l + r  =:\|\> n     \NC 5          \NC \type{=:|>}   \NC n\|r   \NC\NR
\NC l + r \|=:\> n      \NC 6          \NC \type{|=:>}   \NC l\|n   \NC\NR
\NC l + r \|=:\|\> n    \NC 7          \NC \type{|=:|>}  \NC l\|nr  \NC\NR
\NC l + r \|=:\|\>\> n  \NC 11         \NC \type{|=:|>>} \NC ln\|r  \NC\NR
\stoptabulate

The default value is~0, and can be left out. That signifies a \quote{normal}
ligature where the ligature replaces both original glyphs. In this table
the~\| indicates the final insertion point.
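
Because the \type{type} field accepts either form, a font loader may want to
normalize it. The following is a sketch in plain \LUA\ that maps the string
forms from the table onto their numbers. The helper name \type{ligature} is an
assumption made for this example.

\starttyping
-- String forms and their numeric equivalents, copied from the table above.
local ligature_types = {
  ['=:']   = 0, ['=:|']  = 1, ['|=:']   = 2, ['|=:|']   = 3,
  ['=:|>'] = 5, ['|=:>'] = 6, ['|=:|>'] = 7, ['|=:|>>'] = 11,
}

-- Hypothetical helper: build one ligature entry. 'kind' may be a string,
-- a number, or nil (in which case the default 0 is used).
local function ligature(char, kind)
  local number = 0
  if type(kind) == 'string' then
    number = ligature_types[kind]
  elseif kind ~= nil then
    number = kind
  end
  assert(number, 'unknown ligature type')
  return { char = char, type = number }
end

-- The ligatures of 'f' from the earlier example, written three ways.
local f_ligatures = {
  [102] = ligature(11),        -- ff, default type
  [105] = ligature(12, '=:'),  -- fi, string form
  [108] = ligature(13, 0),     -- fl, numeric form
}
\stoptyping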

The \type{commands} array is explained below.

\section {Real fonts}

Whether or not a \TEX\ font is a \quote{real} font that should be written to
the \PDF\ document is decided by the \type{type} value in the top|-|level
font structure. If the value is \type{real}, then this is a proper
font, and the inclusion mechanism will attempt to add the needed
font object definitions to the \PDF.

Values for \type{type}:

\starttabulate[|Tl|p|]
\NC \ssbf value     \NC \bf description        \NC\NR
\NC real          \NC this is a base font    \NC\NR
\NC virtual       \NC this is a virtual font \NC\NR
\stoptabulate

The actions to be taken depend on a number of different variables:

\startitemize[packed]
\item Whether the used font fits in an 8-bit encoding scheme or not
\item The type of the disk font file
\item The level of embedding requested
\stopitemize

A font that uses anything other than an 8-bit encoding vector has to
be written to the \PDF\ in a different way.

The rule is: if the font table has \type {encodingbytes} set to~2,
then this is a wide font, in all other cases it isn't. The value~2 is
the default for \OPENTYPE\ and \TRUETYPE\ fonts loaded via \LUA. For
\TYPEONE\ fonts, you have to set \type {encodingbytes} to~2
explicitly. For \PK\ bitmap fonts, wide font encoding is not
supported at all.

If no special care is needed, \LUATEX\ currently falls back to the
mapfile|-|based solution used by \PDFTEX\ and \DVIPS. This behavior
will be removed in the future, when the existing code becomes
integrated in the new subsystem.

But if this is a \quote{wide} font, then the new subsystem kicks in, and
some extra fields have to be present in the font structure. In this
case, \LUATEX\ does not use a map file at all.

The extra fields are: \type{format}, \type{embedding}, \type{fullname},
\type{cidinfo} (as explained above), \type{filename}, and the
\type{index} key in the separate characters.

Values for \type{format} are:

\starttabulate[|Tl|p|]
\NC \ssbf value \NC \bf description                                           \NC\NR
\NC type1     \NC this is a \POSTSCRIPT\ \TYPEONE\ font                     \NC\NR
\NC type3     \NC this is a bitmapped (\PK) font                            \NC\NR
\NC truetype  \NC this is a \TRUETYPE\ or \TRUETYPE|-|based \OPENTYPE\ font \NC\NR
\NC opentype  \NC this is a \POSTSCRIPT|-|based \OPENTYPE\ font             \NC\NR
\stoptabulate

(\type{type3} fonts are provided for backward compatibility only, and do not
support the new wide encoding options.)

Values for \type{embedding} are:

\starttabulate[|Tl|p|]
\NC \ssbf value \NC \bf description \NC\NR
\NC no        \NC don't embed the font at all \NC\NR
\NC subset    \NC include and attempt to subset the font \NC\NR
\NC full      \NC include this font in its entirety \NC\NR
\stoptabulate

It is not possible to artificially modify the transformation matrix
for the font at the moment.

The other fields are used as follows: The \type{fullname} will be the
\POSTSCRIPT|/|\PDF\ font name. The \type{cidinfo} will be used as the
character set (the CID \type{/Ordering} and \type{/Registry} keys). The
\type{filename} points to the actual font file. If you include the
full path in the \type{filename} or if the file is in the local
directory, \LUATEX\ will run a little bit more efficiently because it
will not have to re|-|run the \type{find_xxx_file} callback in that
case.
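
As a rough illustration, the sketch below shows what the top level of a wide
font table might look like, together with a small check for the extra fields.
The font name, file path, metrics and the helper \type{missing_wide_fields}
are all invented for this example. The \type{cidinfo} field is left out of the
check because its layout is described elsewhere.

\starttyping
-- Extra top-level fields that a wide font needs (cidinfo not checked here).
local required = { 'format', 'embedding', 'fullname', 'filename' }

-- Hypothetical helper: list the fields that a wide font still lacks.
-- Returns an empty array for fonts that are not wide.
local function missing_wide_fields(f)
  local missing = {}
  if f.encodingbytes ~= 2 then
    return missing
  end
  for _, key in ipairs(required) do
    if f[key] == nil then
      missing[#missing + 1] = key
    end
  end
  for code, c in pairs(f.characters or {}) do
    if c.index == nil then
      missing[#missing + 1] = 'characters[' .. tostring(code) .. '].index'
    end
  end
  return missing
end

-- All names and numbers below are made up for illustration.
local f = {
  name          = 'examplefont',
  type          = 'real',
  encodingbytes = 2,
  format        = 'opentype',
  embedding     = 'subset',
  fullname      = 'ExampleFont-Regular',
  filename      = '/path/to/ExampleFont-Regular.otf',
  characters    = {
    [65] = { width = 400000, index = 36 },
  },
}
\stoptyping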

Be careful: when mixing old and new fonts in one document, it is possible to
create \POSTSCRIPT\ name clashes that can result in printing
errors. When this happens, you have to change the \type{fullname}
of the font.

Typeset strings are written out in a wide format using 2~bytes per
glyph, using the \type{index} key in the character information as
value. The overall effect is like having an encoding based on numbers
instead of traditional (\POSTSCRIPT) name|-|based reencoding. The way
to get the correct \type{index} numbers for \TYPEONE\ fonts is by
loading the font via \type{fontloader.open}; use the table indices as
\type{index} fields.

This type of reencoding means that there is no longer a clear
connection between the text in your input file and the strings in the
output \PDF\ file. Dealing with this is high on the agenda.

\section[virtualfonts]{Virtual fonts}

You have to take the following steps if you want \LUATEX\ to treat the
returned table from \luatex{define_font} as a virtual font:

\startitemize[packed]
\item Set the top|-|level key \type {type} to \type {virtual}.
\item Make sure there is at least one valid entry in \luatex{fonts} (see below).
\item Give a \type {commands} array to every character (see below).
\stopitemize

The presence of the top|-|level \type {type} key with the specific value
\type {virtual} will trigger handling of the rest of the special virtual
font fields in the table, but the mere existence of \type {type} is enough
to prevent \LUATEX\ from looking for a virtual font on its own.

Therefore, this also works \quote{in reverse}: if you are absolutely certain
that a font is not a virtual font, assigning the value \type{base} or
\type{real} to \type{type} will inhibit \LUATEX\ from looking for a virtual font
file, thereby saving you a disk search.

The \luatex{fonts} is another \LUA\ array. The values are one- or two|-|key
hashes themselves, each entry indicating one of the base fonts in a
virtual font. In case your font is referring to itself, you can use the
\type {font.nextid()} function which returns the index of the next to be defined
font which is probably the currently defined one.

An example makes this easy to understand. The following table

\starttyping
fonts = {
    { name = 'ptmr8a', size = 655360 },
    { name = 'psyr', size = 600000 },
    { id = 38 }
}
\stoptyping

says that the first referenced font (index 1) in this virtual font is
\type{ptmr8a} loaded at 10pt, and the second is \type{psyr} loaded
at a little over 9pt. The third one is a previously defined font that
is known to \LUATEX\ as fontid \quote{38}.

The array index numbers are used by the character command definitions
that are part of each character.

The \luatex{commands} array is an array where each item is another small array, with the first
entry representing a command and the extra items being the parameters to that command. The
allowed commands and their arguments are:

\starttabulate[|Tl|l|l|p|]
\NC \ssbf command name  \NC \bf arguments \NC \bf arg type \NC \bf description \NC\NR
\NC font              \NC 1         \NC number    \NC select a new font from the local \luatex{fonts} table\NC\NR
\NC char              \NC 1         \NC number    \NC typeset this character number from the current font,
                                                      and move right by the character's width\NC\NR
\NC node              \NC 1         \NC node      \NC output this node (list), and move right
                                                      by the width of this list\NC\NR
\NC slot              \NC 2         \NC number    \NC a shortcut for the combination of a font and char command\NC\NR
\NC push              \NC 0         \NC           \NC save current position\NC\NR
\NC nop               \NC 0         \NC           \NC do nothing \NC\NR
\NC pop               \NC 0         \NC           \NC pop position \NC\NR
\NC rule              \NC 2         \NC 2 numbers \NC output a rule $ht*wd$, and move right.\NC\NR
\NC down              \NC 1         \NC number    \NC move down on the page\NC\NR
\NC right             \NC 1         \NC number    \NC move right on the page\NC\NR
\NC special           \NC 1         \NC string    \NC output a \tex{special} command\NC\NR
\NC image             \NC 1         \NC image     \NC output an image (the argument can be either an \type{<image>}
                                                      variable or an \type{image_spec} table)\NC\NR
\NC comment           \NC any       \NC any       \NC the arguments of this command are ignored\NC\NR
\stoptabulate

Here is a rather elaborate glyph commands example:
\starttyping
...
commands = {
  {'push'},                     -- remember where we are
  {'right', 5000},              -- move right about 0.08pt
  {'font', 3},                  -- select the fonts[3] entry
  {'char', 97},                 -- place character 97 (ASCII 'a')
  {'pop'},                      -- go all the way back
  {'down', -200000},            -- move upwards by about 3pt
  {'special', 'pdf: 1 0 0 rg'}, -- switch to red color
  {'rule', 500000, 20000},      -- draw a bar
  {'special','pdf: 0 g'}        -- back to black
}
...
\stoptyping

The default value for \type {font} is always~1 at the start of the \type{commands} array.
Therefore, if the virtual font is essentially only a re|-|encoding, then you usually do not
have to create an explicit \quote{font} command in the array.
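
A pure re|-|encoding could be sketched as follows in plain \LUA. The helper
\type{reencode}, its \type{mapping} argument (new code to old code) and the
naming scheme for the result are assumptions made for this example. Kerns and
ligatures are not carried over in this sketch.

\starttyping
-- Hypothetical helper: build a virtual font table that re-encodes 'base'.
-- 'mapping' maps new character codes to codes in the base font.
local function reencode(base, mapping)
  local characters = {}
  for new_code, old_code in pairs(mapping) do
    local c = base.characters[old_code]
    if c then
      characters[new_code] = {
        width  = c.width,
        height = c.height,
        depth  = c.depth,
        italic = c.italic,
        -- no 'font' command: fonts[1] is selected by default
        commands = { { 'char', old_code } },
      }
    end
  end
  return {
    name       = base.name .. '-reencoded',  -- invented naming scheme
    size       = base.size,
    type       = 'virtual',
    fonts      = { { name = base.name, size = base.size } },
    parameters = base.parameters,
    characters = characters,
  }
end
\stoptyping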

Rules inside of \type{commands} arrays are built up using only two dimensions:
they do not have depth. For correct vertical placement, an extra \type{down} command
may be needed.
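
One way to emulate a rule with depth is to move down first, draw a taller
rule, and move back up afterwards. The sketch below assumes that the two
arguments of \type{rule} are height and width, in that order, as in the
example above. The helper name \type{rule_with_depth} is invented for this
illustration.

\starttyping
-- Hypothetical helper: commands for a rule of the given height, depth and
-- width (all in scaled points), with its bottom edge 'depth' below the
-- baseline.
local function rule_with_depth(height, depth, width)
  return {
    { 'down', depth },                  -- go to the bottom edge of the rule
    { 'rule', height + depth, width },  -- draw it; this also moves right
    { 'down', -depth },                 -- return to the baseline
  }
end

local commands = rule_with_depth(400000, 100000, 20000)
\stoptyping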

Regardless of the amount of movement you create within the \type {commands},
the output pointer will always move by exactly the width that was given in
the \type {width} key of the character hash. Any movements that take place
inside the \type{commands} array are ignored on the upper level.

\subsection{Artificial fonts}

Even in a \quote{real} font, there can be virtual characters. When \LUATEX\ encounters a \type {commands}
field inside a character when it becomes time to typeset the character, it will interpret the commands, just
like for a true virtual character. In this case, if you have created no \quote{fonts} array, then the default
(and only) \quote{base} font is taken to be the current font itself. In practice, this means that you can
create virtual duplicates of existing characters which is useful if you want to create composite characters.

Note: this feature does {\it not\/} work the other way around. There can not be \quote{real} characters in a
virtual font! You cannot use this technique for font re-encoding either; you need a truly virtual
font for that (because characters that are already present cannot be altered).
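
As an illustration, the following sketch in plain \LUA\ adds a composite
character to an unused slot of a real font by overlaying two existing
characters. The helper \type{add_composite} is hypothetical, and the
positioning is deliberately naive: the accent is simply printed at the origin
of the base character, without any centering or raising.

\starttyping
-- Hypothetical helper: create character 'code' in font table 'f' as an
-- overlay of the existing characters 'accent' and 'base'. Because no
-- 'fonts' array is involved, 'char' refers to the font itself.
local function add_composite(f, code, base, accent)
  assert(f.characters[code] == nil, 'slot is already in use')
  local b = assert(f.characters[base], 'missing base character')
  assert(f.characters[accent], 'missing accent character')
  f.characters[code] = {
    width    = b.width,
    height   = b.height,
    depth    = b.depth,
    commands = {
      { 'push' },
      { 'char', accent },  -- print the accent ...
      { 'pop' },           -- ... go back to the origin ...
      { 'char', base },    -- ... and print the base on top of it
    },
  }
  return f.characters[code]
end
\stoptyping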

\subsection{Example virtual font}

Finally, here is a plain \TEX\ input file with a virtual font demonstration:

\startbuffer
\directlua {
  callback.register('define_font',
    function (name,size)
      if name == 'cmr10-red' then
        f = font.read_tfm('cmr10',size)
        f.name = 'cmr10-red'
        f.type = 'virtual'
        f.fonts = {{ name = 'cmr10', size = size }}
        for i,v in pairs(f.characters) do
          if (string.char(i)):find('[tacohanshartmut]') then
             v.commands = {
               {'special','pdf: 1 0 0 rg'},
               {'char',i},
               {'special','pdf: 0 g'},
              }
          else
             v.commands = {{'char',i}}
          end
        end
      else
        f = font.read_tfm(name,size)
      end
      return f
    end
  )
}

\font\myfont = cmr10-red at 10pt \myfont  This is a line of text \par
\font\myfontx= cmr10 at 10pt \myfontx Here is another line of text \par
\stopbuffer

\typebuffer

%\getbuffer

\chapter[nodes]{Nodes}

\section{\LUA\ node representation}

\TEX's nodes are represented in \LUA\ as userdata objects with a variable
set of fields. In the following syntax tables, the type of such a
userdata object is represented as \syntax{<node>}.


The current return value of \luatex{node.types()} is:
\ctxlua {
  local d = node.types()
  tex.print('\\type{' .. d[0] .. '} (' .. 0 .. '), ')
  for _,v in pairs(d) do
    if _ > 0 then
      tex.print('\\type{' .. v .. '} (' .. _ .. '), ')
    end
  end
}.

NOTE: The \type {\lastnodetype} primitive is \ETEX\ compliant. The valid
range is still -1 .. 15 and glyph nodes have number 0 (used to be
char node) and ligature nodes are mapped to 7. That way macro
packages can use the same symbolic names as in traditional \ETEX.
Keep in mind that the internal node numbers are different and that
there are more node types than 15.

\subsection{Auxiliary items}

A few node|-|typed userdata objects do not occur in the \quote{normal}
list of nodes, but can be pointed to from within that list. They are
not quite the same as regular nodes, but it is easier for the library
routines to treat them as if they were.

\subsubsection{glue_spec items}

Skips are about the only type of data objects in traditional \TEX\
that are not a simple value. The structure that represents the glue
components of a skip is called a \type {glue_spec}, and it has the following
accessible fields:

\starttabulate[|lT|l|p|]
\NC \ssbf key            \NC \bf type \NC \bf explanation \NC\NR
\NC width          \NC number  \NC \NC\NR
\NC stretch        \NC number  \NC \NC\NR
\NC stretch_order  \NC number  \NC \NC\NR
\NC shrink         \NC number  \NC \NC\NR
\NC shrink_order   \NC number  \NC \NC\NR
\NC writable       \NC boolean \NC If this is false, you can't assign to this \type{glue_spec}
                                   because it is one of the preallocated special cases. New in 0.52\NC\NR
\stoptabulate

These objects are reference counted, so there is actually an extra
field named \type {ref_count} as well. This item type will likely
disappear in the future, and the glue fields themselves will
become part of the nodes referencing glue items.

\subsubsection{attribute{\_}list and attribute items}

The newly introduced attribute registers are non|-|trivial, because
the value that is attached to a node is essentially a sparse array of
key|-|value pairs.

It is generally easiest to deal with attribute lists and attributes
by using the dedicated functions in the \luatex{node} library, but
for completeness, here is the low|-|level interface.

An \type{attribute_list} item is used as a head pointer for a list
of attribute items. It has only one user-visible field:

\starttabulate[|lT|l|p|]
\NC \ssbf field     \NC \bf type \NC  \bf explanation \NC\NR
\NC next          \NC \syntax{<node>}  \NC pointer to the first attribute\NC\NR
\stoptabulate

A normal node's attribute field will point to an item of type
\type{attribute_list}, and the \type{next} field in that item will point
to the first defined \quote{attribute} item, whose \type {next} will
point to the second \quote{attribute} item, etc.

Valid fields in \type{attribute} items:

\starttabulate[|lT|l|p|]
\NC \ssbf field     \NC \bf type \NC  \bf explanation \NC\NR
\NC next           \NC \syntax{<node>}  \NC pointer to the next attribute\NC\NR
\NC number         \NC number  \NC the attribute type id\NC\NR
\NC value          \NC number  \NC the attribute value\NC\NR
\stoptabulate
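
The chain structure can be walked with a simple loop. The sketch below is
written in plain \LUA\ and works on any objects that have the \type{next},
\type{number} and \type{value} fields described above, so it can be tried with
ordinary tables. The function name \type{attribute_value} is an assumption
made for this example. In real code the dedicated functions of the
\luatex{node} library are the better choice.

\starttyping
-- Hypothetical helper: return the value stored for attribute 'id' in the
-- list starting at 'list_head' (an attribute_list item), or nil if the
-- attribute is not set.
local function attribute_value(list_head, id)
  local item = list_head and list_head.next
  while item do
    if item.number == id then
      return item.value
    end
    item = item.next
  end
  return nil
end

-- A stand-in for an attribute list, built from ordinary tables.
local second = { number = 7, value = 42 }
local first  = { number = 3, value = 1, next = second }
local head   = { next = first }
\stoptyping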

\subsubsection{action item}

Valid fields: \showfields{action}\crlf
Id: \showid{action}

These are a special kind of item that only appears inside
pdf start link objects.

\starttabulate[|lT|l|p|]
\NC \ssbf field     \NC \bf type \NC  \bf explanation \NC\NR
\NC action_type   \NC  number   \NC  \NC\NR
\NC action_id     \NC  number or string   \NC  \NC\NR
\NC named_id      \NC  number   \NC  \NC\NR
\NC file          \NC  string   \NC  \NC\NR
\NC new_window    \NC  number   \NC  \NC\NR
\NC data          \NC  string   \NC  \NC\NR
\NC ref_count     \NC  number   \NC  \NC\NR
\stoptabulate

\subsection{Main text nodes}

These are the nodes that comprise actual typesetting commands.

A few fields are present in all nodes regardless of their type, these are:

\starttabulate[|lT|l|p|]
\NC \ssbf field     \NC \bf type \NC  \bf explanation \NC\NR
\NC next    \NC \syntax{<node>}  \NC  The next node in a list, or nil\NC\NR
\NC id      \NC number  \NC  The node's type (\type{id}) number \NC\NR
\NC subtype \NC number  \NC  The node \type{subtype} identifier\NC\NR
\stoptabulate

The \type{subtype} is sometimes just a stub entry. Not all nodes
actually use the \type{subtype}, but this way you can be sure that all
nodes accept it as a valid field name, and that is often handy in node
list traversal. In the following tables \type{next} and \type{id} are
not explicitly mentioned.

Besides these three fields, almost all nodes also have an \type {attr}
field, and there is also a field called \type{prev}. That last field
is always present, but only initialized on explicit request: when the
function \type{node.slide()} is called, it will set up the \type{prev}
fields to be a backwards pointer in the argument node list.
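
The effect on the \type{prev} fields can be modelled in plain \LUA\ on
ordinary tables that have a \type{next} field. This is only a model of the
behaviour described above, not the actual implementation. The function name
\type{slide_model} is invented, and clearing \type{prev} on the first item is
an assumption of this sketch.

\starttyping
-- Model of the backward linking: walk the list through 'next', make each
-- 'prev' point to the preceding item, and return the last item.
local function slide_model(head)
  local previous, current = nil, head
  while current do
    current.prev = previous
    previous, current = current, current.next
  end
  return previous
end

-- Three linked stand-ins for nodes.
local c = { id = 3 }
local b = { id = 2, next = c }
local a = { id = 1, next = b }
local tail = slide_model(a)
\stoptyping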


\subsubsection{hlist nodes}

Valid fields: \showfields{hlist}\crlf
Id: \showid{hlist}

\starttabulate[|lT|l|p|]
\NC \ssbf field     \NC \bf type \NC  \bf explanation \NC\NR
\NC subtype    \NC number  \NC  0 = unknown origin, 1 = created by
linebreaking, 2 = explicit box command. (0.46.0),
3 = paragraph indentation box, 4 = alignment column or row, 5 = alignment cell (0.62.0)\NC\NR
\NC attr       \NC \syntax{<node>}    \NC  The head of the associated attribute list \NC\NR
\NC width      \NC number  \NC  \NC\NR
\NC height     \NC number  \NC  \NC\NR
\NC depth      \NC number  \NC  \NC\NR
\NC shift      \NC number  \NC  a displacement perpendicular to the
                                character progression direction \NC\NR
\NC glue_order \NC number  \NC  a number in the range 0--4, indicating
                                the glue order\NC\NR
\NC glue_set   \NC number  \NC  the calculated glue ratio\NC\NR
\NC glue_sign  \NC number  \NC  \NC\NR
\NC head       \NC \syntax{<node>}    \NC  the first node of the body of this list\NC\NR
\NC dir        \NC string  \NC  the direction of this box. see~\in{}[dirnodes]\NC\NR
\stoptabulate

A warning: never assign a node list to the \type{head} field
unless you are sure its internal link structure is correct, otherwise
an error may result.

Note: the new field name \type{head} was introduced in 0.65 to replace
the old name \type{list}. Use of the name \type{list} is now
deprecated, but it will stay available until at least version 0.80.

\subsubsection{vlist nodes}

Valid fields: As for hlist, except that \quote{shift} is a displacement
perpendicular to the line progression direction, and \quote{subtype} only
has subtypes 0, 4, and 5.

\subsubsection{rule nodes}

Valid fields: \showfields{rule}\crlf
Id: \showid{rule}

\starttabulate[|lT|l|p|]
\NC \ssbf field     \NC \bf type \NC  \bf explanation \NC\NR
\NC subtype    \NC number  \NC  unused\NC\NR
\NC attr       \NC \syntax{<node>}    \NC  \NC\NR
\NC width      \NC number  \NC  the width of the rule; the special value $-1073741824$
                                is used for \quote{running} rule dimensions\NC\NR
\NC height     \NC number  \NC  the height of the rule (can be negative)\NC\NR
\NC depth      \NC number  \NC  the depth of the rule (can be negative)\NC\NR
\NC dir        \NC string  \NC  the direction of this rule, see~\in{}[dirnodes]\NC\NR
\stoptabulate
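
The special width value equals $-2^{30}$. As a minimal sketch in plain \LUA, a
check for it could look as follows. The names \type{running} and
\type{is_running} are hypothetical and not part of the \type{node} library.

\starttyping
-- Hypothetical helper, not part of LuaTeX: recognize the special value
-- that marks a 'running' dimension of a rule.
local running = -1073741824  -- equals -2^30

local function is_running(dimension)
  return dimension == running
end

print(is_running(-2^30))  -- true
print(is_running(65536))  -- false (65536 scaled points is 1pt)
\stoptyping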

\subsubsection{ins nodes}

Valid fields: \showfields{ins}\crlf
Id: \showid{ins}

\starttabulate[|lT|l|p|]
\NC \ssbf field     \NC \bf type \NC  \bf explanation \NC\NR
\NC subtype    \NC number  \NC  the insertion class\NC\NR
\NC attr       \NC \syntax{<node>}    \NC  \NC\NR
\NC cost       \NC number  \NC  the penalty associated with this insert\NC\NR
\NC height     \NC number  \NC  \NC\NR
\NC depth      \NC number  \NC  \NC\NR
\NC head       \NC \syntax{<node>}    \NC the first node of the body of this insert\NC\NR
\NC spec       \NC \syntax{<node>}    \NC a pointer to the \tex{splittopskip} glue spec\NC\NR
\stoptabulate

A warning: never assign a node list to the \type{head} field
unless you are sure its internal link structure is correct, otherwise
an error may result.

Note: the new field name \type{head} was introduced in 0.65 to replace
the old name \type{list}. Use of the name \type{list} is now
deprecated, but it will stay available until at least version 0.80.


\subsubsection{mark nodes}

Valid fields: \showfields{mark}\crlf
Id: \showid{mark}

\starttabulate[|lT|l|p|]
\NC \ssbf field     \NC \bf type \NC  \bf explanation \NC\NR
\NC subtype    \NC number  \NC  unused\NC\NR
\NC attr       \NC \syntax{<node>}    \NC  \NC\NR
\NC class      \NC number  \NC  the mark class\NC\NR
\NC mark       \NC table   \NC  a table representing a token list\NC\NR
\stoptabulate

\subsubsection{adjust nodes}

Valid fields: \showfields{adjust}\crlf
Id: \showid{adjust}

\starttabulate[|lT|l|p|]
\NC \ssbf field     \NC \bf type \NC  \bf explanation \NC\NR
\NC subtype    \NC number  \NC  0 = normal, 1 = \quote{pre}\NC\NR
\NC attr       \NC \syntax{<node>}    \NC  \NC\NR
\NC head       \NC \syntax{<node>}    \NC  adjusted material\NC\NR
\stoptabulate

A warning: never assign a node list to the \type{head} field
unless you are sure its internal link structure is correct, otherwise
an error may result.

Note: the new field name \type{head} was introduced in 0.65 to replace
the old name \type{list}. Use of the name \type{list} is now
deprecated, but it will stay available until at least version 0.80.


\subsubsection{disc nodes}

Valid fields: \showfields{disc}\crlf
Id: \showid{disc}

\starttabulate[|lT|l|p|]
\NC \ssbf field     \NC \bf type \NC  \bf explanation \NC\NR
\NC subtype    \NC number  \NC  indicates the source of a discretionary.
                                0 = the \tex{discretionary} command,
                                1 = the \tex{-} command,
                                2 = added automatically following a \type{-},
                                3 = added by the hyphenation algorithm (simple),
                                4 = added by the hyphenation algorithm (hard, first item),
                                5 = added by the hyphenation algorithm (hard, second item)\NC\NR
\NC attr       \NC \syntax{<node>}    \NC  \NC\NR
\NC pre        \NC \syntax{<node>}    \NC  pointer to the pre|-|break text\NC\NR
\NC post       \NC \syntax{<node>}    \NC  pointer to the post|-|break text\NC\NR
\NC replace    \NC \syntax{<node>}    \NC  pointer to the no|-|break text\NC\NR
\stoptabulate

The subtype numbers~4 and~5 belong to the \quote{of-f-ice} explanation given elsewhere.

A warning: never assign a node list to the pre, post or replace field
unless you are sure its internal link structure is correct, otherwise
an error may result.

\subsubsection{math nodes}

Valid fields: \showfields{math}\crlf
Id: \showid{math}

\starttabulate[|lT|l|p|]
\NC \ssbf field     \NC \bf type \NC  \bf explanation \NC\NR
\NC subtype    \NC number  \NC  0 = \quote{on}, 1 = \quote{off}\NC\NR
\NC attr       \NC \syntax{<node>}    \NC  \NC\NR
\NC surround   \NC number  \NC  width of the \tex{mathsurround} kern\NC\NR
\stoptabulate

\subsubsection{glue nodes}

Valid fields: \showfields{glue}\crlf
Id: \showid{glue}

\starttabulate[|lT|l|p|]
\NC \ssbf field     \NC \bf type \NC  \bf explanation \NC\NR
\NC subtype    \NC number  \NC  0 = \tex{skip},
                                1--18 = internal glue parameters,
                                100--103 = \quote{leader} subtypes \NC\NR
\NC attr       \NC \syntax{<node>}    \NC  \NC\NR
\NC spec       \NC \syntax{<node>}    \NC  pointer to a glue{\_}spec item \NC\NR
\NC leader     \NC \syntax{<node>}    \NC  pointer to a box or rule for leaders\NC\NR
\stoptabulate

The exact meanings of the subtypes are as follows:

\starttabulate[|rT|l|]
\NC  1  \NC \tex{lineskip}                 \NC \NR
\NC  2  \NC \tex{baselineskip}             \NC \NR
\NC  3  \NC \tex{parskip}                  \NC \NR
\NC  4  \NC \tex{abovedisplayskip}         \NC \NR
\NC  5  \NC \tex{belowdisplayskip}         \NC \NR
\NC  6  \NC \tex{abovedisplayshortskip}    \NC \NR
\NC  7  \NC \tex{belowdisplayshortskip}    \NC \NR
\NC  8  \NC \tex{leftskip}                 \NC \NR
\NC  9  \NC \tex{rightskip}                \NC \NR
\NC 10  \NC \tex{topskip}                  \NC \NR
\NC 11  \NC \tex{splittopskip}             \NC \NR
\NC 12  \NC \tex{tabskip}                  \NC \NR
\NC 13  \NC \tex{spaceskip}                \NC \NR
\NC 14  \NC \tex{xspaceskip}               \NC \NR
\NC 15  \NC \tex{parfillskip}              \NC \NR
\NC 16  \NC \tex{thinmuskip}               \NC \NR
\NC 17  \NC \tex{medmuskip}                \NC \NR
\NC 18  \NC \tex{thickmuskip}              \NC \NR
\NC 100 \NC \tex{leaders}                  \NC \NR
\NC 101 \NC \tex{cleaders}                 \NC \NR
\NC 102 \NC \tex{xleaders}                 \NC \NR
\NC 103 \NC \tex{gleaders}                 \NC \NR
\stoptabulate
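
The ranges above can be turned into a small classifier. This is only a sketch
in plain \LUA; the function name \type{glue_origin} and the strings it returns
are assumptions made for illustration, not something \LUATEX\ provides.

\starttyping
-- Hypothetical helper: classify a glue subtype using the ranges above.
local function glue_origin(subtype)
  if subtype == 0 then
    return "skip"
  elseif subtype >= 1 and subtype <= 18 then
    return "parameter"
  elseif subtype >= 100 and subtype <= 103 then
    return "leader"
  end
  return "unknown"
end

print(glue_origin(9))    -- parameter (\rightskip)
print(glue_origin(101))  -- leader (\cleaders)
\stoptyping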

\subsubsection{kern nodes}

Valid fields: \showfields{kern}\crlf
Id: \showid{kern}

\starttabulate[|lT|l|p|]
\NC \ssbf field     \NC \bf type \NC  \bf explanation \NC\NR
\NC subtype    \NC number  \NC  0 = from font,
                                1 = from \tex{kern} or \tex{/},
                                2 = from \tex{accent}\NC\NR
\NC attr       \NC \syntax{<node>}    \NC  \NC\NR
\NC kern      \NC number  \NC  \NC\NR
\stoptabulate


\subsubsection{penalty nodes}

Valid fields: \showfields{penalty}\crlf
Id: \showid{penalty}

\starttabulate[|lT|l|p|]
\NC \ssbf field     \NC \bf type \NC  \bf explanation \NC\NR
\NC subtype    \NC number  \NC  not used\NC\NR
\NC attr       \NC \syntax{<node>}    \NC  \NC\NR
\NC penalty    \NC number  \NC  \NC\NR
\stoptabulate

\subsubsection[glyphnodes]{glyph nodes}

Valid fields: \showfields{glyph}\crlf
Id: \showid{glyph}

\starttabulate[|lT|l|p|]
\NC \ssbf field\NC \ssbf type         \NC \ssbf explanation \NC\NR
\NC subtype    \NC number             \NC bitfield\NC\NR
\NC attr       \NC \syntax{<node>}    \NC \NC\NR
\NC char       \NC number             \NC \NC\NR
\NC font       \NC number             \NC \NC\NR
\NC lang       \NC number             \NC \NC\NR
\NC left       \NC number             \NC \NC\NR
\NC right      \NC number             \NC \NC\NR
\NC uchyph     \NC boolean            \NC \NC\NR
\NC components \NC \syntax{<node>}    \NC pointer to ligature components\NC\NR
\NC xoffset    \NC number             \NC \NC\NR
\NC yoffset    \NC number             \NC \NC\NR
\NC width      \NC number             \NC (new in 0.53)\NC\NR
\NC height     \NC number             \NC (new in 0.53)\NC\NR
\NC depth      \NC number             \NC (new in 0.53)\NC\NR
\stoptabulate

A warning: never assign a node list to the components field
unless you are sure its internal link structure is correct, otherwise
an error may result.

Valid bits for the \type{subtype} field are:

\starttabulate[|c|l|]
\NC \ssbf bit  \NC \bf meaning \NC\NR
\NC 0 \NC character \NC\NR
\NC 1 \NC glyph     \NC\NR
\NC 2 \NC ligature  \NC\NR
\NC 3 \NC ghost     \NC\NR
\NC 4 \NC left      \NC\NR
\NC 5 \NC right     \NC\NR
\stoptabulate

See \in{section}[charsandglyphs] for a detailed description of the
\type{subtype} field.
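
Because the \type{subtype} is a bitfield, several of the bits listed above can
be set at once. The following sketch in plain \LUA\ decodes such a value. The
helper name \type{subtype_bits} is hypothetical, and the bit test uses
arithmetic so that no bit library has to be assumed.

\starttyping
-- Hypothetical helper: list the names of the bits set in a glyph subtype.
local bit_names = {
  [0] = "character", "glyph", "ligature", "ghost", "left", "right"
}

local function subtype_bits(subtype)
  local set = { }
  for bit = 0, 5 do
    -- arithmetic bit test: shift right by 'bit' places, look at the last bit
    if math.floor(subtype / 2^bit) % 2 == 1 then
      set[#set + 1] = bit_names[bit]
    end
  end
  return set
end

print(table.concat(subtype_bits(6), ", "))  -- glyph, ligature
\stoptyping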


\subsubsection{margin{\_}kern nodes}

Valid fields: \showfields{margin_kern}\crlf
Id: \showid{margin_kern}

\starttabulate[|lT|l|p|]
\NC \ssbf field     \NC \bf type \NC  \bf explanation \NC\NR
\NC subtype    \NC number  \NC  0 = left side,
                                1 = right side\NC\NR
\NC attr       \NC \syntax{<node>}    \NC  \NC\NR
\NC width      \NC number  \NC  \NC\NR
\NC glyph      \NC \syntax{<node>}    \NC  \NC\NR
\stoptabulate

\subsection{Math nodes}

These are the so||called \quote{noad}s and the nodes that are specifically
associated with math processing. Most of these nodes contain subnodes, so
the list of possible fields is actually quite small. First, the subnodes:

\subsubsection{Math kernel subnodes}

Many object fields in math mode are either simple characters in a
specific family or math lists or node lists. There are four associated
subnodes that represent these cases (in the following node
descriptions these are indicated by the word \type{<kernel>}).

The \type{next} and \type{prev} fields for these subnodes are unused.

\subsubsubsection{math{\_}char and math{\_}text{\_}char subnodes}

Valid fields: \showfields{math_char}\crlf
Id: \showid{math_char}

\starttabulate[|lT|l|p|]
\NC \ssbf field     \NC \bf type  \NC  \bf explanation \NC\NR
\NC attr       \NC \syntax{<node>}\NC          \NC\NR
\NC char       \NC number         \NC          \NC \NR
\NC fam        \NC number         \NC          \NC\NR
\stoptabulate

The \type{math_char} is the simplest subnode: it contains
the character and family for a single glyph object. The
\type{math_text_char} is a special case that you will not
normally encounter; it arises temporarily during math list conversion
(its sole function is to suppress a following italic correction).

\subsubsubsection{sub{\_}box and sub{\_}mlist subnodes}

Valid fields: \showfields{sub_box}\crlf
Id: \showid{sub_box}

\starttabulate[|lT|l|p|]
\NC \ssbf field     \NC \bf type  \NC  \bf explanation \NC\NR
\NC attr       \NC \syntax{<node>}\NC          \NC\NR
\NC head       \NC \syntax{<node>}\NC          \NC \NR
\stoptabulate

These two subnode types are used for subsidiary list items. For
\type{sub_box}, the \type{head} points to a \quote{normal} vbox or
hbox. For \type{sub_mlist}, the \type{head} points to a math list
that is yet to be converted.

A warning: never assign a node list to the \type{head} field
unless you are sure its internal link structure is correct, otherwise
an error may result.

Note: the new field name \type{head} was introduced in 0.65 to replace
the old name \type{list}. Use of the name \type{list} is now
deprecated, but it will stay available until at least version 0.80.

\subsubsection{Math delimiter subnode}

There is a fifth subnode type that is used exclusively for delimiter
fields. As before, the \type{next} and \type{prev} fields are unused.

\subsubsubsection{delim subnodes}

Valid fields: \showfields{delim}\crlf
Id: \showid{delim}

\starttabulate[|lT|l|p|]
\NC \ssbf field     \NC \bf type  \NC  \bf explanation \NC\NR
\NC attr       \NC \syntax{<node>}\NC          \NC\NR
\NC small_char \NC number         \NC          \NC \NR
\NC small_fam  \NC number         \NC          \NC\NR
\NC large_char \NC number         \NC          \NC \NR
\NC large_fam  \NC number         \NC          \NC\NR
\stoptabulate

The fields \type{large_char} and \type{large_fam} can be zero. In that
case the font that is used for the \type{small_fam} is expected to
provide the large version as an extension to the \type{small_char}.

\subsubsection{Math core nodes}

First, there are the objects (the \TEX  book calls them \quote{atoms})
that are associated with the simple math objects: Ord, Op, Bin, Rel,
Open, Close, Punct, Inner, Over, Under, Vcent. These all have
the same fields, and they are combined into a single node type with
separate subtypes for differentiation.

\subsubsubsection{simple nodes}

Valid fields: \showfields{noad}\crlf
Id: \showid{noad}

\starttabulate[|lT|l|p|]
\NC \ssbf field     \NC \bf type    \NC  \bf explanation \NC\NR
\NC subtype    \NC number           \NC see below    \NC\NR
\NC attr       \NC \syntax{<node>}  \NC          \NC\NR
\NC nucleus    \NC \syntax{<kernel>}\NC          \NC\NR
\NC sub        \NC \syntax{<kernel>}\NC          \NC\NR
\NC sup        \NC \syntax{<kernel>}\NC          \NC\NR
\stoptabulate

Operators are a bit special because they occupy three values of the
\type{subtype} field:

\starttabulate[|lT|p|]
\NC \ssbf number     \NC  \bf node subtype          \NC\NR
\NC 0                \NC  Ord                       \NC\NR
\NC 1                \NC  Op, \type{\displaylimits} \NC\NR
\NC 2                \NC  Op, \type{\limits}        \NC\NR
\NC 3                \NC  Op, \type{\nolimits}      \NC\NR
\NC 4                \NC  Bin                       \NC\NR
\NC 5                \NC  Rel                       \NC\NR
\NC 6                \NC  Open                      \NC\NR
\NC 7                \NC  Close                     \NC\NR
\NC 8                \NC  Punct                     \NC\NR
\NC 9                \NC  Inner                     \NC\NR
\NC 10               \NC  Under                     \NC\NR
\NC 11               \NC  Over                      \NC\NR
\NC 12               \NC  Vcent                     \NC\NR
\stoptabulate

\subsubsubsection{accent nodes}

Valid fields: \showfields{accent}\crlf
Id: \showid{accent}

\starttabulate[|lT|l|p|]
\NC \ssbf field     \NC \bf type  \NC  \bf explanation \NC\NR
\NC subtype    \NC number           \NC the first bit is used for a fixed top accent flag (if the \type{accent} field is present),
                                        the second bit for a fixed bottom accent flag (if the \type{bot_accent} field is present).
                                        Example: the actual value \type{3} means: do not stretch either accent\NC\NR
\NC attr       \NC \syntax{<node>}\NC          \NC\NR
\NC nucleus    \NC \syntax{<kernel>}\NC          \NC \NR
\NC sub        \NC \syntax{<kernel>}\NC          \NC\NR
\NC sup        \NC \syntax{<kernel>}\NC          \NC \NR
\NC accent     \NC \syntax{<kernel>}\NC          \NC\NR
\NC bot_accent \NC \syntax{<kernel>}\NC          \NC\NR
\stoptabulate

\subsubsubsection{style nodes}

Valid fields: \showfields{style}\crlf
Id: \showid{style}

\starttabulate[|lT|l|p|]
\NC \ssbf field     \NC \bf type  \NC  \bf explanation \NC\NR
\NC style    \NC string         \NC  contains the style \NC\NR
\stoptabulate

There are eight possibilities for the string value: one of
\quote{display}, \quote{text}, \quote{script}, or \quote{scriptscript}.
Each of these can have a trailing \type{'} to signify
\quote{cramped} styles.
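
The eight values follow from four base names, each with an optional trailing
quote. A short sketch in plain \LUA\ that enumerates them (the table name
\type{styles} is only used for illustration):

\starttyping
-- Enumerate the eight style strings: four base styles, each with an
-- optional trailing ' for the cramped variant.
local styles = { }
for _, base in ipairs { "display", "text", "script", "scriptscript" } do
  styles[#styles + 1] = base
  styles[#styles + 1] = base .. "'"
end

print(#styles)  -- 8
print(table.concat(styles, " "))
\stoptyping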

\subsubsubsection{choice nodes}

Valid fields: \showfields{choice}\crlf
Id: \showid{choice}

\starttabulate[|lT|l|p|]
\NC \ssbf field  \NC \bf type       \NC  \bf explanation \NC\NR
\NC attr         \NC \syntax{<node>}\NC          \NC\NR
\NC display      \NC \syntax{<node>}\NC          \NC\NR
\NC text         \NC \syntax{<node>}\NC          \NC\NR
\NC script       \NC \syntax{<node>}\NC          \NC\NR
\NC scriptscript \NC \syntax{<node>}\NC          \NC\NR
\stoptabulate

A warning: never assign a node list to the display, text, script, or
scriptscript field unless you are sure its internal link structure is
correct, otherwise an error may result.

\subsubsubsection{radical nodes}

Valid fields: \showfields{radical}\crlf
Id: \showid{radical}

\starttabulate[|lT|l|p|]
\NC \ssbf field     \NC \bf type  \NC  \bf explanation \NC\NR
\NC attr       \NC \syntax{<node>}\NC          \NC\NR
\NC nucleus    \NC \syntax{<kernel>}\NC          \NC \NR
\NC sub        \NC \syntax{<kernel>}\NC          \NC\NR
\NC sup        \NC \syntax{<kernel>}\NC          \NC \NR
\NC left       \NC \syntax{<delim>}\NC          \NC \NR
\NC degree     \NC \syntax{<kernel>}\NC Only set by \type{\Uroot} \NC \NR
\stoptabulate

A warning: never assign a node list to the nucleus, sub, sup, left, or
degree field
unless you are sure its internal link structure is correct, otherwise
an error may result.

\subsubsubsection{fraction nodes}

Valid fields: \showfields{fraction}\crlf
Id: \showid{fraction}

\starttabulate[|lT|l|p|]
\NC \ssbf field     \NC \bf type  \NC  \bf explanation \NC\NR
\NC attr       \NC \syntax{<node>}\NC          \NC\NR
\NC width      \NC number \NC          \NC \NR
\NC num        \NC \syntax{<kernel>}\NC          \NC\NR
\NC denom      \NC \syntax{<kernel>}\NC          \NC \NR
\NC left       \NC \syntax{<delim>}\NC          \NC \NR
\NC right      \NC \syntax{<delim>}\NC          \NC \NR
\stoptabulate

A warning: never assign a node list to the num or denom field
unless you are sure its internal link structure is correct, otherwise
an error may result.

\subsubsubsection{fence nodes}

Valid fields: \showfields{fence}\crlf
Id: \showid{fence}

\starttabulate[|lT|l|p|]
\NC \ssbf field     \NC \bf type  \NC  \bf explanation \NC\NR
\NC subtype    \NC number         \NC 1 = \type{\left},
                                      2 = \type{\middle},
                                      3 = \type{\right} \NC\NR
\NC attr       \NC \syntax{<node>}\NC          \NC\NR
\NC delim       \NC \syntax{<delim>}\NC          \NC \NR
\stoptabulate

\subsection{whatsit nodes}

Whatsit nodes come in many subtypes that you can ask for by running
\luatex{node.whatsits()}:
\ctxlua {for n,name in table.sortedpairs(node.whatsits()) do
  if (n<100) then
     if (n>0) then tex.sprint (', ') end
     tex.sprint('\\type{' .. name .. '} (' .. n .. ')') end
end }

\subsubsection{open nodes}

Valid fields: \showfields{whatsit,open}\crlf
Id: \showid{whatsit,open}

\starttabulate[|lT|l|p|]
\NC \ssbf field     \NC \bf type \NC  \bf explanation \NC\NR
\NC attr      \NC \syntax{<node>}    \NC  \NC\NR
\NC stream    \NC number  \NC \TEX's stream id number\NC\NR
\NC name      \NC string  \NC file name \NC\NR
\NC ext       \NC string  \NC file extension \NC\NR
\NC area      \NC string  \NC file area (this may become obsolete) \NC\NR
\stoptabulate

\subsubsection{write nodes}

Valid fields: \showfields{whatsit,write}\crlf
Id: \showid{whatsit,write}

\starttabulate[|lT|l|p|]
\NC \ssbf field     \NC \bf type \NC  \bf explanation \NC\NR
\NC attr      \NC \syntax{<node>}    \NC  \NC\NR
\NC stream    \NC number  \NC \TEX's stream id number\NC\NR
\NC data      \NC table   \NC a table representing the token list to be written\NC\NR
\stoptabulate

\subsubsection{close nodes}

Valid fields: \showfields{whatsit,close}\crlf
Id: \showid{whatsit,close}

\starttabulate[|lT|l|p|]
\NC \ssbf field     \NC \bf type \NC  \bf explanation \NC\NR
\NC attr      \NC \syntax{<node>}    \NC  \NC\NR
\NC stream    \NC number  \NC \TEX's stream id number\NC\NR
\stoptabulate

\subsubsection{special nodes}

Valid fields: \showfields{whatsit,special}\crlf
Id: \showid{whatsit,special}

\starttabulate[|lT|l|p|]
\NC \ssbf field     \NC \bf type \NC  \bf explanation \NC\NR
\NC attr      \NC \syntax{<node>}    \NC  \NC\NR
\NC data      \NC string  \NC the \tex{special} information\NC\NR
\stoptabulate

\subsubsection{language nodes}


\LUATEX\ does not have language whatsits any more. All language
information is already present inside the glyph nodes themselves.
This whatsit subtype will be removed in the next release.


\subsubsection{local_par nodes}

Valid fields: \showfields{whatsit,local_par}\crlf
Id: \showid{whatsit,local_par}

\starttabulate[|lT|l|p|]
\NC \ssbf field     \NC \bf type \NC  \bf explanation \NC\NR
\NC attr      \NC \syntax{<node>}    \NC  \NC\NR
\NC pen_inter \NC number  \NC local interline penalty (from \tex{localinterlinepenalty})\NC\NR
\NC pen_broken\NC number  \NC local broken penalty (from \tex{localbrokenpenalty})\NC\NR
\NC dir       \NC string  \NC the direction of this par, see~\in{}[dirnodes]\NC\NR
\NC box_left  \NC \syntax{<node>}      \NC the \tex{localleftbox}\NC\NR
\NC box_left_width\NC number\NC width of the \tex{localleftbox}\NC\NR
\NC box_right  \NC \syntax{<node>}      \NC the \tex{localrightbox}\NC\NR
\NC box_right_width\NC number\NC width of the \tex{localrightbox}\NC\NR
\stoptabulate

A warning: never assign a node list to the box_left or box_right field
unless you are sure its internal link structure is correct, otherwise
an error may result.


\subsubsection[dirnodes]{dir nodes}

Valid fields: \showfields{whatsit,dir}\crlf
Id: \showid{whatsit,dir}

\starttabulate[|lT|l|p|]
\NC \ssbf field     \NC \bf type \NC  \bf explanation \NC\NR
\NC attr      \NC \syntax{<node>}    \NC  \NC\NR
\NC dir       \NC string  \NC the direction (but see below)\NC\NR
\NC level     \NC number  \NC nesting level of this direction whatsit\NC\NR
\NC dvi_ptr   \NC number  \NC a saved dvi buffer byte offset\NC\NR
\NC dir_h     \NC number  \NC a saved dvi position\NC\NR
\stoptabulate

A note on \type{dir} strings. Direction specifiers are three-letter
combinations of  \type{T}, \type{B},  \type{R}, and \type{L}.

These are built up out of three separate items:
\startitemize
\item the first is the direction of the \quote{top} of paragraphs.
\item the second is the direction of the \quote{start} of lines.
\item the third is the direction of the \quote{top} of glyphs.
\stopitemize

However, only four combinations are accepted: \type{TLT}, \type{TRT},
\type{RTT}, and \type{LTL}.

Inside actual \type{dir} whatsit nodes, the representation of
\type{dir} is not a three-letter but a four-letter combination. The
first character in this case is always either \type{+} or \type{-},
indicating whether the value is pushed or popped from the direction
stack.
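
The rules above can be summarized in a small parser. This is a sketch in plain
\LUA\ under the assumptions just stated: the helper name \type{parse_dir} and
the keys of the table it returns are hypothetical, and only the four accepted
combinations are recognized.

\starttyping
-- Hypothetical helper: split a dir string into its parts.
local accepted = { TLT = true, TRT = true, RTT = true, LTL = true }

local function parse_dir(dir)
  -- an optional leading + or - is only present inside dir whatsits
  local sign, spec = string.match(dir, "^([+-]?)([TBRL][TBRL][TBRL])$")
  if not spec or not accepted[spec] then
    return nil
  end
  return {
    action    = (sign == "+" and "push") or (sign == "-" and "pop") or nil,
    paragraph = string.sub(spec, 1, 1),  -- 'top' of paragraphs
    line      = string.sub(spec, 2, 2),  -- 'start' of lines
    glyph     = string.sub(spec, 3, 3),  -- 'top' of glyphs
  }
end

local d = parse_dir("+TRT")
print(d.action, d.paragraph, d.line, d.glyph)  -- push T R T
print(parse_dir("BBB"))                        -- nil
\stoptyping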

\subsubsection{pdf_literal nodes}

Valid fields: \showfields{whatsit,pdf_literal}\crlf
Id: \showid{whatsit,pdf_literal}

\starttabulate[|lT|l|p|]
\NC \ssbf field     \NC \bf type \NC  \bf explanation \NC\NR
\NC attr      \NC \syntax{<node>}    \NC  \NC\NR
\NC mode      \NC number  \NC  the \quote{mode} setting of this literal\NC\NR
\NC data      \NC string  \NC the \tex{pdfliteral} information\NC\NR
\stoptabulate

Mode values:

\starttabulate[|lT|p|]
\NC  \ssbf value \NC \ssbf corresponding \tex{pdftex} keyword \NC \NR
\NC 0            \NC setorigin                                \NC \NR
\NC 1            \NC page                                     \NC \NR
\NC 2            \NC direct                                   \NC \NR
\stoptabulate

\subsubsection{pdf_refobj nodes}

Valid fields: \showfields{whatsit,pdf_refobj}\crlf
Id: \showid{whatsit,pdf_refobj}

\starttabulate[|lT|l|p|]
\NC \ssbf field     \NC \bf type \NC  \bf explanation \NC\NR
\NC attr      \NC \syntax{<node>}    \NC  \NC\NR
\NC objnum    \NC number  \NC the referenced \PDF\ object number\NC\NR
\stoptabulate

\subsubsection{pdf_refxform nodes}

Valid fields: \showfields{whatsit,pdf_refxform}\crlf
Id: \showid{whatsit,pdf_refxform}

\starttabulate[|lT|l|p|]
\NC \ssbf field     \NC \bf type \NC  \bf explanation \NC\NR
\NC attr      \NC \syntax{<node>}    \NC  \NC\NR
\NC width     \NC number  \NC \NC \NR
\NC height    \NC number  \NC \NC \NR
\NC depth     \NC number  \NC \NC \NR
\NC objnum    \NC number  \NC the referenced \PDF\ object number\NC\NR
\stoptabulate

Be aware that \type{pdf_refxform} nodes have dimensions that are used by \LUATEX.

\subsubsection{pdf_refximage nodes}

Valid fields: \showfields{whatsit,pdf_refximage}\crlf
Id: \showid{whatsit,pdf_refximage}

\starttabulate[|lT|l|p|]
\NC \ssbf field     \NC \bf type \NC  \bf explanation \NC\NR
\NC attr      \NC \syntax{<node>}    \NC  \NC\NR
\NC width     \NC number  \NC \NC \NR
\NC height    \NC number  \NC \NC \NR
\NC depth     \NC number  \NC \NC \NR
\NC objnum    \NC number  \NC the referenced \PDF\ object number\NC\NR
\stoptabulate

Be aware that \type{pdf_refximage} nodes have dimensions that are used by \LUATEX.

\subsubsection{pdf_annot nodes}

Valid fields: \showfields{whatsit,pdf_annot}\crlf
Id: \showid{whatsit,pdf_annot}

\starttabulate[|lT|l|p|]
\NC \ssbf field     \NC \bf type \NC  \bf explanation \NC\NR
\NC attr      \NC \syntax{<node>}    \NC  \NC\NR
\NC width     \NC number  \NC \NC \NR
\NC height    \NC number  \NC \NC \NR
\NC depth     \NC number  \NC \NC \NR
\NC objnum    \NC number  \NC the referenced \PDF\ object number\NC\NR
\NC data      \NC string  \NC the annotation data\NC\NR
\stoptabulate


\subsubsection{pdf_start_link nodes}

Valid fields: \showfields{whatsit,pdf_start_link}\crlf
Id: \showid{whatsit,pdf_start_link}

\starttabulate[|lT|l|p|]
\NC \ssbf field     \NC \bf type \NC  \bf explanation \NC\NR
\NC attr      \NC \syntax{<node>}    \NC  \NC\NR
\NC width     \NC number  \NC \NC \NR
\NC height    \NC number  \NC \NC \NR
\NC depth     \NC number  \NC \NC \NR
\NC objnum    \NC number  \NC the referenced \PDF\ object number\NC\NR
\NC link_attr \NC table   \NC the link attribute token list\NC\NR
\NC action    \NC \syntax{<node>}    \NC the action to perform\NC\NR
\stoptabulate

\subsubsection{pdf_end_link nodes}

Valid fields: \showfields{whatsit,pdf_end_link}\crlf
Id: \showid{whatsit,pdf_end_link}

\starttabulate[|lT|l|p|]
\NC \ssbf field     \NC \bf type \NC  \bf explanation \NC\NR
\NC attr      \NC \syntax{<node>}    \NC  \NC\NR
\stoptabulate

\subsubsection{pdf_dest nodes}

Valid fields: \showfields{whatsit,pdf_dest}\crlf
Id: \showid{whatsit,pdf_dest}

\starttabulate[|lT|l|p|]
\NC \ssbf field     \NC \bf type \NC  \bf explanation \NC\NR
\NC attr      \NC \syntax{<node>}    \NC  \NC\NR
\NC width     \NC number  \NC \NC \NR
\NC height    \NC number  \NC \NC \NR
\NC depth     \NC number  \NC \NC \NR
\NC named_id  \NC number  \NC is the dest_id a string value?\NC\NR
\NC dest_id   \NC number or string \NC the destination id\NC\NR
\NC dest_type \NC number\NC type of destination\NC\NR
\NC xyz_zoom  \NC number\NC \NC\NR
\NC objnum    \NC number  \NC the \PDF\ object number\NC\NR
\stoptabulate

\subsubsection{pdf_thread nodes}

Valid fields: \showfields{whatsit,pdf_thread}\crlf
Id: \showid{whatsit,pdf_thread}

\starttabulate[|lT|l|p|]
\NC \ssbf field     \NC \bf type \NC  \bf explanation \NC\NR
\NC attr       \NC \syntax{<node>}    \NC  \NC\NR
\NC width      \NC number  \NC \NC \NR
\NC height     \NC number  \NC \NC \NR
\NC depth      \NC number  \NC \NC \NR
\NC named_id   \NC number  \NC is the tread_id a string value?\NC\NR
\NC tread_id   \NC number or string \NC the thread id\NC\NR
\NC thread_attr\NC number           \NC extra thread information\NC\NR
\stoptabulate

\subsubsection{pdf_start_thread nodes}

Valid fields: \showfields{whatsit,pdf_start_thread}\crlf
Id: \showid{whatsit,pdf_start_thread}

\starttabulate[|lT|l|p|]
\NC \ssbf field     \NC \bf type \NC  \bf explanation \NC\NR
\NC attr       \NC \syntax{<node>}    \NC  \NC\NR
\NC width      \NC number  \NC \NC \NR
\NC height     \NC number  \NC \NC \NR
\NC depth      \NC number  \NC \NC \NR
\NC named_id   \NC number  \NC is the tread_id a string value?\NC\NR
\NC tread_id   \NC number or string \NC the thread id\NC\NR
\NC thread_attr\NC number           \NC extra thread information\NC\NR
\stoptabulate

\subsubsection{pdf_end_thread nodes}

Valid fields: \showfields{whatsit,pdf_end_thread}\crlf
Id: \showid{whatsit,pdf_end_thread}

\starttabulate[|lT|l|p|]
\NC \ssbf field     \NC \bf type \NC  \bf explanation \NC\NR
\NC attr      \NC \syntax{<node>}    \NC  \NC\NR
\stoptabulate

\subsubsection{pdf_save_pos nodes}

Valid fields: \showfields{whatsit,pdf_save_pos}\crlf
Id: \showid{whatsit,pdf_save_pos}

\starttabulate[|lT|l|p|]
\NC \ssbf field     \NC \bf type \NC  \bf explanation \NC\NR
\NC attr      \NC \syntax{<node>}    \NC  \NC\NR
\stoptabulate

\subsubsection{late_lua nodes}

Valid fields: \showfields{whatsit,late_lua}\crlf
Id: \showid{whatsit,late_lua}

\starttabulate[|lT|l|p|]
\NC \ssbf field     \NC \bf type \NC  \bf explanation \NC\NR
\NC attr      \NC \syntax{<node>}    \NC  \NC\NR
\NC data      \NC string  \NC data to execute\NC\NR
\NC string    \NC string  \NC data to execute (0.63)\NC\NR
\NC name      \NC string  \NC the name to use for lua error reporting\NC\NR
\stoptabulate

The difference between \type{data} and \type{string} is that on
assignment, the \type{data} field is converted to a token list, cf. use as
\tex{latelua}. The \type{string} version is treated as a literal string.

\subsubsection{pdf_colorstack nodes}

Valid fields: \showfields{whatsit,pdf_colorstack}\crlf
Id: \showid{whatsit,pdf_colorstack}

\starttabulate[|lT|l|p|]
\NC \ssbf field     \NC \bf type \NC  \bf explanation \NC\NR
\NC attr      \NC \syntax{<node>}    \NC  \NC\NR
\NC stack    \NC number  \NC colorstack id number\NC\NR
\NC cmd      \NC number  \NC command to execute\NC\NR
\NC data     \NC string  \NC data\NC\NR
\stoptabulate

\subsubsection{pdf_setmatrix nodes}

Valid fields: \showfields{whatsit,pdf_setmatrix}\crlf
Id: \showid{whatsit,pdf_setmatrix}

\starttabulate[|lT|l|p|]
\NC \ssbf field     \NC \bf type \NC  \bf explanation \NC\NR
\NC attr      \NC \syntax{<node>}    \NC  \NC\NR
\NC data     \NC string  \NC data\NC\NR
\stoptabulate

\subsubsection{pdf_save nodes}

Valid fields: \showfields{whatsit,pdf_save}\crlf
Id: \showid{whatsit,pdf_save}

\starttabulate[|lT|l|p|]
\NC \ssbf field     \NC \bf type \NC  \bf explanation \NC\NR
\NC attr      \NC \syntax{<node>}    \NC  \NC\NR
\stoptabulate

\subsubsection{pdf_restore nodes}

Valid fields: \showfields{whatsit,pdf_restore}\crlf
Id: \showid{whatsit,pdf_restore}

\starttabulate[|lT|l|p|]
\NC \ssbf field     \NC \bf type \NC  \bf explanation \NC\NR
\NC attr      \NC \syntax{<node>}    \NC  \NC\NR
\stoptabulate

\subsubsection{user_defined nodes}

User|-|defined whatsit nodes can only be created and handled from \LUA\
code. In effect, they are an extension to the extension
mechanism. The \LUATEX\ engine will simply step over such whatsits
without ever looking at the contents.

Valid fields: \showfields{whatsit,user_defined}\crlf
Id: \showid{whatsit,user_defined}

\starttabulate[|lT|l|p|]
\NC \ssbf field     \NC \bf type \NC  \bf explanation \NC\NR
\NC attr      \NC \syntax{<node>}    \NC  \NC\NR
\NC user_id  \NC number  \NC id number\NC\NR
\NC type     \NC number  \NC type of the value\NC\NR
\NC value    \NC number  \NC \NC\NR
\NC          \NC string  \NC \NC\NR
\NC          \NC \syntax{<node>}   \NC \NC\NR
\NC          \NC table \NC \NC\NR
\stoptabulate

The \type{type} can have one of five distinct values:

\starttabulate[|lT|p|]
\NC \ssbf value   \NC  \bf explanation \NC\NR
\NC  97           \NC  the value is an attribute node list \NC\NR
\NC  100          \NC  the value is a number \NC\NR
\NC  110          \NC  the value is a node list \NC\NR
\NC  115          \NC  the value is a string\NC\NR
\NC  116          \NC  the value is a token list in \LUA\ table form\NC\NR
\stoptabulate
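
The five numbers coincide with the character codes of the letters \type{a},
\type{d}, \type{n}, \type{s}, and \type{t}, which may serve as a mnemonic. The
sketch below, in plain \LUA, builds a lookup table that way. The table name
\type{value_types} is hypothetical and the descriptions are taken from the
table above.

\starttyping
-- Hypothetical lookup table from the numeric type to a description.
local value_types = {
  [string.byte("a")] = "attribute node list",  --  97
  [string.byte("d")] = "number",               -- 100
  [string.byte("n")] = "node list",            -- 110
  [string.byte("s")] = "string",               -- 115
  [string.byte("t")] = "token list",           -- 116
}

print(value_types[115])  -- string
print(value_types[100])  -- number
\stoptyping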


\chapter{Modifications}

Besides the expected changes caused by new functionality, there are a
number of not|-|so|-|expected changes. These are sometimes a side|-|effect
of a new (conflicting) feature, or, more often than not, a change
necessary to clean up the internal interfaces.

\section{Changes from \TEX\ 3.1415926}

\startitemize

\item The current code base is written in C, not Pascal web (as of \LUATEX~0.42.0).

\item See~\in{chapter}[languages] for many small changes related to paragraph
  building, language handling, and hyphenation. Most important change:
  adding a brace group in the middle of a word (like in \type{of{}fice})
  does not prevent ligature creation.

\item There is no pool file, all strings are embedded during compilation.

\item \type {plus 1 fillll} does not generate an error. The extra \quote{l} is
simply typeset.

\item The upper limit to \tex{endlinechar} and \tex{newlinechar} is 127.

\stopitemize

\section{Changes from \ETEX\ 2.2}

\startitemize

\item The \ETEX\ functionality is always present and enabled
   (but see below about \TEXXET), so the prepended asterisk or
   \type{-etex} switch for \INITEX\ is not needed.

\item \TEXXET\ is not present, so the primitives

\starttyping
\TeXXeTstate
\beginR
\beginL
\endR
\endL
\stoptyping

are missing.

\item Some of the tracing information that is output by \ETEX's \tex{tracingassigns} and
  \tex{tracingrestores} is not there.

\item Register management in \LUATEX\ uses the \ALEPH\ model, so the maximum
  register number is 65535 and the implementation uses a flat array instead of
  the mixed flat|\&|sparse model from \ETEX.

\item \tex{savinghyphcodes} is a no|-|op.
See~\in{chapter}[languages] for details.

\item When kpathsea is used to find files, \LUATEX\ uses the
\type{ofm} file format to search for font metrics. In turn, this means
that \LUATEX\ looks at the \type{OFMFONTS} configuration variable
(like \OMEGA\ and \ALEPH) instead of \type{TFMFONTS} (like \TEX\ and
\PDFTEX). Likewise for virtual fonts (\LUATEX\ uses the variable
\type{OVFFONTS} instead of \type{VFFONTS}).


\stopitemize
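
With the \ALEPH\ register model, the highest register of each class can be
addressed directly. A minimal sketch (the values are arbitrary and only serve
the example):

\starttyping
\count65535=42
\dimen65535=1pt
\toks65535={last token register}
\message{\the\count65535, \the\dimen65535, \the\toks65535}
\stoptyping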

\section{Changes from \PDFTEX\ 1.40}

\startitemize

\item The (experimental) support for snap nodes has been removed, because
it is much more natural to build this functionality on top of node
processing and attributes. The associated primitives that are now gone
are: \tex{pdfsnaprefpoint}, \tex{pdfsnapy}, and \tex{pdfsnapycomp}.

\item The (experimental) support for specialized spacing around nodes
has also been removed. The associated primitives that are now gone are:
\tex{pdfadjustinterwordglue}, \tex{pdfprependkern}, and \tex{pdfappendkern},
as well as the five supporting primitives \tex{knbscode}, \tex{stbscode},
\tex{shbscode}, \tex{knbccode}, and \tex{knaccode}.

\item A number of \quote{utility functions} have been removed:

\startcolumns[n=3]
\starttyping
\pdfelapsedtime
\pdfescapehex
\pdfescapename
\pdfescapestring
\pdffiledump
\pdffilemoddate
\pdffilesize
\pdflastmatch
\pdfmatch
\pdfmdfivesum
\pdfresettimer
\pdfshellescape
\pdfstrcmp
\pdfunescapehex
\stoptyping
\stopcolumns

\item The four primitives that were already marked obsolete in \PDFTEX~1.40
have been removed since \LUATEX~0.42:

\startcolumns[n=2]
\starttyping
\pdfoptionalwaysusepdfpagebox
\pdfoptionpdfinclusionerrorlevel
\pdfforcepagebox
\pdfmovechars
\stoptyping
\stopcolumns


\item A few other experimental primitives are also provided without the
      extra \type{pdf} prefix, so they can also be called like this:

\startcolumns[n=3]
\starttyping
\primitive
\ifprimitive
\ifabsnum
\ifabsdim
\stoptyping
\stopcolumns

\item The \tex{pdftexversion} is set to 200.

\item The PNG transparency fix from 1.40.6 is not applied
(high|-|level support is pending).

\item LFS (\PDF\ files larger than 2GiB) support is not working yet.

\item \LUATEX~0.45.0 introduces two extra token lists, \tex{pdfxformresources}
and \tex{pdfxformattr}, as an alternative to \tex{pdfxform} keywords.

\item As of \LUATEX~0.50.0 it is no longer possible for fonts from embedded
  \PDF\ files to be replaced by or merged with the document fonts of the
  enveloping \PDF\ document. This regression may be temporary, depending on
  how the rewritten font backend will look after beta 0.60.


\stopitemize
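
The unprefixed conditionals behave like their \type{pdf}|-|prefixed
counterparts in \PDFTEX: they compare absolute values. The sketch below
assumes that the format has enabled these primitives under the names shown.

\starttyping
\ifabsnum -5>3 \message{abs(-5) is larger than 3}\fi
\ifabsdim -2pt<1pt \else \message{abs(-2pt) is not below 1pt}\fi
\ifnum\pdftexversion=200 \message{pdftexversion is 200}\fi
\stoptyping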

\section{Changes from \ALEPH\ RC4}

\startitemize

\item Starting with \LUATEX\ 0.63.0, OCP processing is no longer
  supported at all.  As a consequence, the following primitives have
  been removed:

\startcolumns[n=2]
\starttyping
\ocp
\externalocp
\ocplist
\pushocplist
\popocplist
\clearocplists
\addbeforeocplist
\addafterocplist
\removebeforeocplist
\removeafterocplist
\ocptracelevel
\stoptyping
\stopcolumns

\item \LUATEX\ only understands 4~of the 16~direction
specifiers of \ALEPH: \type{TLT} (latin), \type{TRT} (arabic),
\type{RTT} (cjk), \type{LTL} (mongolian). All other direction
specifiers generate an error (\LUATEX\ 0.45).

\item The input translations from \ALEPH\ are not implemented, so the
   related primitives are not available:

\startcolumns[n=2]
\starttyping
\DefaultInputMode
\noDefaultInputMode
\noInputMode
\InputMode
\DefaultOutputMode
\noDefaultOutputMode
\noOutputMode
\OutputMode
\DefaultInputTranslation
\noDefaultInputTranslation
\noInputTranslation
\InputTranslation
\DefaultOutputTranslation
\noDefaultOutputTranslation
\noOutputTranslation
\OutputTranslation
\stoptyping
\stopcolumns

\item The \tex{hoffset} bug when \tex{pagedir TRT} is fixed,
removing the need for an explicit fix to \tex{hoffset}.

\item A bug causing \tex{fam} to fail for family numbers above
    15 is fixed.

\item A fair number of other minor bugs are fixed as well, most of them
related to \tex{tracingcommands} output.

\item The internal function \type{scan_dir()} has been renamed to
\type{scan_direction()} to prevent a naming clash, and it now allows
an optional space after the direction is completely parsed.

\item The \type{^^} notation can also come in five and six item repetitions, to
insert characters that do not fit in the BMP.

\item Glues {\it immediately after} direction change commands are not
legal breakpoints.

\stopitemize
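
The four supported direction specifiers are used in the same way as in
\ALEPH. The following sketch sets up a left|-|to|-|right page and typesets one
right|-|to|-|left paragraph; the sample text is of course only a placeholder.

\starttyping
\pagedir TLT \bodydir TLT     % page and body: left-to-right
{\pardir TRT \textdir TRT     % a right-to-left paragraph
 text set from right to left\par}
\stoptyping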

\section{Changes from standard \WEBC}

\startitemize

\item There is no mltex.

\item There is no enctex.

\item The following commandline switches are silently ignored, even
in non|-|\LUA\ mode:

\starttyping
-8bit
-translate-file=TCXNAME
-mltex
-enc
-etex
\stoptyping

\item \tex{openout} whatsits are not written to the log file.

\item Some of the so|-|called web2c extensions are hard to set up
  in non|-|\KPSE\ mode because \type{texmf.cnf} is not read: \type{shell-escape}
  is off (but that is not a problem because of \LUA's
  \lua{os.execute}), and the paranoia checks on \type{openin} and
  \type{openout} do not happen (however, it is easy for a \LUA\ script
  to do this itself by overloading \lua{io.open}).

\item The \quote{E} option does not do anything useful.

\stopitemize
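
The paranoia check mentioned in the list above could be approximated by
wrapping \lua{io.open}, roughly as follows. The policy shown (refusing to
write to absolute paths or to paths containing \type{..}) is only an
assumption made for this sketch, not the check that \WEBC\ itself performs.
The comments use \type{%} because \LUA's own \type{--} comments do not survive
inside \tex{directlua}.

\starttyping
\directlua{
  local original_open = io.open              % keep the real function
  io.open = function(name, mode)
    mode = mode or "r"
    local writing = mode:find("w", 1, true)  % plain (non-pattern) searches
                 or mode:find("a", 1, true)
                 or mode:find("+", 1, true)
    % hypothetical policy: no writes to absolute or parent paths
    if writing and (name:sub(1, 1) == "/" or name:find("..", 1, true)) then
      return nil, "write access denied: " .. name
    end
    return original_open(name, mode)
  end
}
\stoptyping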

\chapter{Implementation notes}

\section{Primitives overlap}

The primitives

\starttabulate[|l|l|]
\NC \tex{pdfpagewidth} \NC \tex{pagewidth}  \NC \NR
\NC \tex{pdfpageheight}\NC \tex{pageheight} \NC \NR
\NC \tex{fontcharwd}   \NC \tex{charwd}     \NC \NR
\NC \tex{fontcharht}   \NC \tex{charht}     \NC \NR
\NC \tex{fontchardp}   \NC \tex{chardp}     \NC \NR
\NC \tex{fontcharic}   \NC \tex{charit}     \NC \NR
\stoptabulate

are pairwise aliases: each primitive in the left column is an alias of the
one to its right.
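
A small sketch of what this means in practice, assuming a format in which all
of these names are enabled: a value assigned through one name can be read back
through the other, and both font queries report the same width.

\starttyping
\pagewidth=210mm \pageheight=297mm   % same as the \pdfpage... variants
\message{\the\pdfpagewidth}
\message{\the\fontcharwd\font`A}
\message{\the\charwd\font`A}
\stoptyping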

\section{Memory allocation}

The single internal memory heap that traditional \TEX\ used for tokens
and nodes is split into two separate arrays. Each of these will grow
dynamically when needed.

The \type{texmf.cnf} settings related to main memory are no longer
used (these are: \type{main_memory}, \type{mem_bot},
\type{extra_mem_top} and \type{extra_mem_bot}). \quote{Out of main
memory} errors can still occur, but the limiting factor is now the
amount of RAM in your system, not a predefined limit.

Also, the memory (de)allocation routines for nodes are completely
rewritten. The relevant code now lives in the C file \type{texnode.c},
and basically uses a dozen or so \quote{avail} lists instead of a
doubly|-|linked model. An extra function layer is added so that the
code can ask for nodes by type instead of directly requisitioning
a certain amount of memory words.

Because of the split into two arrays and the resulting differences in the data
structures, some of the macros have been duplicated.  For instance, there are now
\type{vlink} and \type{vinfo} as well as \type{token_link} and \type{token_info}. All
access to the variable memory array is now hidden behind a macro called \type{vmem}.

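The implementation of the growth of the two arrays (via reallocation)
introduces a potential pitfall: the memory arrays should never be used
as the left hand side of a statement that can modify the array in
question. The right hand side may trigger a reallocation that moves the
array after the address of the left hand side has already been computed.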

The input line buffer and pool size are now also reallocated when
needed, and the \type{texmf.cnf} settings \type{buf_size} and
\type{pool_size} are silently ignored.

\section{Sparse arrays}

The \tex{mathcode}, \tex{delcode}, \tex{catcode},
\tex{sfcode}, \tex{lccode} and \tex{uccode} tables are now
sparse arrays that are implemented in~C. They are no longer part of
the \TEX\ \quote{equivalence table} and because each had 1.1 million
entries with a few memory words each, this makes a major difference
in memory usage.

The \tex{catcode}, \tex{sfcode}, \tex{lccode} and \tex{uccode} assignments
do not yet show up when using the \ETEX\ tracing routines \tex{tracingassigns}
and \tex{tracingrestores} (the code has simply not been written yet).

A side|-|effect of the current implementation is that \tex{global}
assignments are now more expensive in terms of processing than
non|-|global ones.

See \type{mathcodes.c} and \type{textcodes.c} if you are interested in
the details.

Also, the glyph ids within a font are now managed by means
of a sparse array and glyph ids can go up to index $2^{21}-1$.

\section{Simple single-character csnames}

Single|-|character commands are no longer treated specially in the
internals; they are stored in the hash just like the multi|-|letter
csnames.

The code that displays control sequences explicitly checks if
the length is one when it has to decide whether or not to add a
trailing space.

Active characters are internally implemented as a special type
of multi|-|letter control sequence that uses a prefix that is
otherwise impossible to obtain.

\section{Compressed format}

The format is passed through zlib, allowing it to shrink to roughly
half of the size it would have had in uncompressed form. This takes a
few more CPU cycles but much less disk I/O, so it should still be
faster.

\section{Binary file reading}

All of the internal code is changed in such a way that if one of the
\type{read_xxx_file} callbacks is not set, then the file is read by
a C function using basically the same convention as the callback: a
single read into a buffer big enough to hold the entire file
contents. While this uses more memory than the previous code (that
mostly used \type{getc} calls), it can be quite a bit faster
(depending on your I/O subsystem).

\chapter{Known bugs and limitations, TODO}

There used to be lists of bugs and planned features below here, but that did not
work out too well. There are lists of open bugs and feature requests in the tracker at
\hyphenatedurl{http://tracker.luatex.org}.

\stoptext