% engine=luatex language=uk % $Id$ % TODO: fix layout of function legend descriptions % check numbers % check \luatex command %\nopdfcompression %\loggingall \environment luatexref-env \logo[DFONT] {dfont} \logo[CFF] {cff} \logo[CMAP] {CMap} \logo[PATGEN] {patgen} \logo[MP] {MetaPost} \logo[METAPOST]{MetaPost} \logo[MPLIB] {MPlib} \logo[COCO] {coco} \logo[SUNOS] {SunOS} \logo[BSD] {bsd} \logo[SYSV] {sysv} \logo[DPI] {dpi} \setvariables [document] [beta=0.70.1] \starttext \dontcomplain \nonknuthmode \setups[titlepage] \title{Contents} \placecontent[criterium=text,level=subsection] \chapter{Introduction} \startframedtext[framecolor=red,foregroundcolor=red,width=\hsize,style=\tfa] This book will eventually become the reference manual of \LUATEX. At the moment, it simply reports the behavior of the executable matching the snapshot or beta release date in the title page. \blank Features may come and go. The current version of \LUATEX\ is not meant for production and users cannot depend on stability, nor on functionality staying the same. \blank Nothing is considered stable just yet. This manual therefore simply reflects the current state of the executable. {\bs Absolutely nothing\/} on the following pages is set in stone. When the need arises, anything can (and will) be changed. \blank {\bf If you are not willing to deal with this situation, you should wait for the stable version. Currently we expect the 1.0 release to happen in spring 2012. Full stabilization will not happen soon, the TODO list is still large.} \stopframedtext \blank[2*line] \LUATEX\ consists of a number of interrelated but (still) distinguishable parts: \startitemize[packed] \item \PDFTEX\ version 1.40.9, converted to C (with patches from later releases). \item The direction model and some other bits from \ALEPH\ RC4 converted to C. 
\item \LUA\ 5.1.4 ($+$ coco 1.1.5 $+$ portable bytecode)
\item dedicated \LUA\ libraries
\item various \TEX\ extensions
\item parts of \FONTFORGE\ 2008.11.17
\item the \METAPOST\ library
\item newly written compiled source code to glue it all together
\stopitemize

Neither \ALEPH's I/O translation processes, nor tcx files, nor \ENCTEX\ can be used; these encoding|-|related functions are superseded by a \LUA|-|based solution (reader callbacks). Also, some experimental \PDFTEX\ features are removed. These can be implemented in \LUA\ instead.

\chapter{Basic \TEX\ enhancements}

\section{Introduction}

From day one, \LUATEX\ has offered extra functionality when compared to the superset of \PDFTEX\ and \ALEPH. That has not been limited to the possibility to execute \LUA\ code via \type{\directlua}, but \LUATEX\ also adds functionality via new \TEX-side primitives.

However, starting with beta \type{0.39.0}, most of that functionality is hidden by default. When \LUATEX\ 0.40.0 starts up in \quote{iniluatex} mode (\type{luatex -ini}), it defines only the primitive commands known by \TEX82 and the one extra command \type{\directlua}.

As is fitting, a \LUA\ function has to be called to add the extra primitives to the user environment. The simplest method to get access to all of the new primitive commands is by adding this line to the format generation file:

\starttyping
\directlua { tex.enableprimitives('',tex.extraprimitives()) }
\stoptyping

But be aware that the curly braces may not have the proper \type{\catcode} assigned to them at this early time (giving a \quote{Missing number} error), so it may be necessary to put these assignments

\starttyping
\catcode `\{=1
\catcode `\}=2
\stoptyping

before the above line. More fine-grained control over the primitives is possible; you can look up the details in \in{section}[luaprimitives]. For simplicity's sake, this manual assumes that you have executed the \type{\directlua} command as given above.
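
As a minimal sketch, the two fragments above can be combined at the top of a format generation file. Nothing here goes beyond the commands already shown; only the order matters, and the comments are ours:

\starttyping
% make sure the braces are usable before \directlua is called
\catcode `\{=1
\catcode `\}=2
% then enable all extra primitives, without a name prefix
\directlua { tex.enableprimitives('',tex.extraprimitives()) }
\stoptyping
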
The startup behavior documented above is considered stable in the sense that there will not be backward-incompatible changes any more.

\section{Version information}

There are three new primitives to test the version of \LUATEX:

\starttabulate[|l|p|]
\NC \bf primitive \NC \bf explanation \NC\NR
\NC \tex{luatexversion} \NC a combination of major and minor number, as in \PDFTEX; the current value is {\bf\the\luatexversion} \NC\NR
\NC \tex{luatexrevision} \NC the revision number, as in \PDFTEX; the current value is {\bf\luatexrevision} \NC\NR
\NC \tex{luatexdatestamp} \NC a combination of the local date and hour when the current executable was compiled; the syntax is identical to \tex{luatexrevision}; the value for the executable that generated this document is {\bf\luatexdatestamp}. \NC\NR
\stoptabulate

The official \LUATEX\ version is defined as follows:

\startitemize
\item The major version is the integer result of \tex{luatexversion} divided by 100. The primitive is an \quote{internal variable}, so you may need to prefix its use with \type{\the} depending on the context.
\item The minor version is the two-digit result of \tex{luatexversion} modulo 100.
\item The revision is given by \tex{luatexrevision}. This primitive expands to a positive integer.
\item The full version number consists of the major version, minor version and revision, separated by dots.
\stopitemize

For example, a \tex{luatexversion} of 70 combined with a \tex{luatexrevision} of 1 gives the full version number 0.70.1.

Note that the \tex{luatexdatestamp} depends on both the compilation time and compilation place of the current executable; it is defined in terms of the local time. The purpose of this primitive is solely to be an aid in the development process; do not use it for anything besides debugging.

\section{\UNICODE\ text support}

Text input and output is now considered to be \UNICODE\ text, so input characters can use the full range of \UNICODE\ ($2^{20}+2^{16}-1 = \hbox{0x10FFFF}$).

Later chapters will talk of characters and glyphs.
Although these are not interchangeable, they are closely related. During typesetting, a character is always converted to a suitable graphic representation of that character in a specific font. However, while processing a list of to|-|be|-|typeset nodes, its contents may still be seen as a character. Inside \LUATEX\ there is not yet a clear separation between the two concepts. Until this is implemented, please do not be too harsh on us if we make errors in the usage of the terms.

A few primitives are affected by this, all in a similar fashion: each of them has to accommodate a larger range of acceptable numbers. For instance, \tex{char} now accepts values between~0 and $1{,}114{,}111$. This should not be a problem for well|-|behaved input files, but it could create incompatibilities for input that would have generated an error when processed by older \TEX|-|based engines. The affected commands with an altered initial (left of the equals sign) or secondary (right of the equals sign) value are: \tex{char}, \tex{lccode}, \tex{uccode}, \tex{catcode}, \tex{sfcode}, \tex{efcode}, \tex{lpcode}, \tex{rpcode}, \tex{chardef}.

As far as the core engine is concerned, all input and output to text files is \UTF-8 encoded. Input files can be pre|-|processed using the \luatex{reader} callback. This will be explained in a later chapter.

Output in byte|-|sized chunks can be achieved by using characters just outside of the valid \UNICODE\ range, starting at the value $1{,}114{,}112$ (0x110000). When the time comes to print a character $c\geq 1{,}114{,}112$, \LUATEX\ will actually print the single byte corresponding to $c - 1{,}114{,}112$. For example, $c = 1{,}114{,}112 + 233$ is printed as the single byte 0xE9.

Output to the terminal uses \type{^^} notation for the lower control range ($c<32$), with the exception of \type{^^I}, \type{^^J} and \type{^^M}. These are considered \quote{safe} and therefore printed as-is.
Normalization of the \UNICODE\ input can be handled by a macro package during callback processing (this will be explained in \in{section}[iocallback]).

\section{Extended tables}

All traditional \TEX\ and \ETEX\ registers can be 16-bit numbers as in \ALEPH. The affected commands are:

\startcolumns[n=4]
\starttyping
\count
\dimen
\skip
\muskip
\marks
\toks
\countdef
\dimendef
\skipdef
\muskipdef
\toksdef
\box
\unhbox
\unvbox
\copy
\unhcopy
\unvcopy
\wd
\ht
\dp
\setbox
\vsplit
\stoptyping
\stopcolumns

The glyph properties (like \type {\efcode}) introduced in \PDFTEX\ that deal with font expansion (hz) and character protruding are also 16-bit. Because font memory management has been rewritten, these character properties are no longer shared among font instances that originate from the same metric file.

The behavior documented in the above section is considered stable in the sense that there will not be backward-incompatible changes any more.

\section{Attribute registers}

Attributes are a completely new concept in \LUATEX. Syntactically, they behave a lot like counters: attributes obey \TEX's nesting stack and can be used after \tex{the} etc.\ just like the normal \tex{count} registers.

\startsyntax
\attribute <16-bit number> <optional equals> <32-bit number>!crlf
\attributedef <csname> <optional equals> <16-bit number>
\stopsyntax

Conceptually, an attribute is either \quote{set} or \quote{unset}. Unset attributes have a special negative value to indicate that they are unset; that value is the lowest legal value: \type{-"7FFFFFFF} in hexadecimal, a.k.a. $-2147483647$ in decimal. It follows that the value \type{-"7FFFFFFF} cannot be used as a legal attribute value, but you {\it can\/} assign \type{-"7FFFFFFF} to \quote{unset} an attribute. All attributes start out in this \quote{unset} state in \INITEX\ (prior to 0.37, there could not be valid negative attribute values, and the \quote{unset} value was $-1$).
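
The following lines are a small sketch of the counter|-|like behavior described above. The attribute numbers and the name \type{\myattr} are chosen only for illustration, and the sketch assumes that the extra primitives have been enabled without a prefix:

\starttyping
\attribute1=5              % set attribute 1 directly
\attributedef\myattr=2     % \myattr is a hypothetical name for attribute 2
\myattr=12                 % assign through the defined name
\message{\the\attribute1, \the\myattr}
\myattr=-"7FFFFFFF         % put attribute 2 back in the unset state
\stoptyping

As with \tex{count} registers, assignments made inside a group are undone when the group ends.
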
Attributes can be used as extra counter values, but their usefulness comes mostly from the fact that the numbers and values of all \quote{set} attributes are attached to all nodes created in their scope. These can then be queried from any \LUA\ code that deals with node processing. Further information about how to use attributes for node list processing from \LUA\ is given in~\in{chapter}[nodes]. The behavior documented in the above subsection is considered stable in the sense that there will not be backward-incompatible changes any more. \subsection{Box attributes} Nodes typically receive the list of attributes that is in effect when they are created. This moment can be quite asynchronous. For example: in paragraph building, the individual line boxes are created after the \tex{par} command has been processed, so they will receive the list of attributes that is in effect then, not the attributes that were in effect in, say, the first or third line of the paragraph. Similar situations happen in \LUATEX\ regularly. A few of the more obvious problematic cases are dealt with: the attributes for nodes that are created during hyphenation, kerning and ligaturing borrow their attributes from their surrounding glyphs, and it is possible to influence box attributes directly. When you assemble a box in a register, the attributes of the nodes contained in the box are unchanged when such a box is placed, unboxed, or copied. In this respect attributes act the same as characters that have been converted to references to glyphs in fonts. For instance, when you use attributes to implement color support, each node carries information about its eventual color. In that case, unless you implement mechanisms that deal with it, applying a color to already boxed material will have no effect. Keep in mind that this incompatibility is mostly due to the fact that separate specials and literals are a more unnatural approach to colors than attributes. 
It is possible to fine-tune the list of attributes that are applied to a \type{hbox}, \type{vbox} or \type{vtop} by the use of the keyword \type{attr}. An example:

\starttyping
\attribute2=5
\setbox0=\hbox {Hello}
\setbox2=\hbox attr1=12 attr2=-"7FFFFFFF{Hello}
\stoptyping

This will set the attribute list of box~2 to $1=12$, and the attributes of box~0 will be $2=5$. As you can see, assigning the maximum negative value causes an attribute to be ignored.

The \type{attr} keyword(s) should come before a \type{to} or \type{spread}, if that is also specified.

\section{\LUA\ related primitives}

In order to merge \LUA\ code with \TEX\ input, a few new primitives are needed.

\subsection{\tex{directlua}}

The primitive \tex{directlua} is used to execute \LUA\ code immediately. The syntax is

\startsyntax
\directlua <general text>!crlf
\directlua name <general text> <general text>!crlf
\directlua <16-bit number> <general text>
\stopsyntax

The last \syntax{<general text>} is expanded fully, and then fed into the \LUA\ interpreter. After reading and expansion has been applied to the \syntax{<general text>}, the resulting token list is converted to a string as if it was displayed using \type{\the\toks}. On the \LUA\ side, each \type{\directlua} block is treated as a separate chunk. In such a chunk you can use the \type {local} directive to keep your variables from interfering with those used by the macro package.

The conversion to and from a token list means that you normally cannot use \LUA\ line comments (starting with \type{--}) within the argument. As there typically will be only one \quote{line}, the first line comment will run on until the end of the input. You will either need to use \TEX-style line comments (starting with \%), or change the \TEX\ category codes locally. Another possibility is to say:

\starttyping
\begingroup
\endlinechar=10
\directlua ...
\endgroup
\stoptyping

Then \LUA\ line comments can be used, since \TEX\ does not replace line endings with spaces.
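
A short sketch of the first option, using \TEX-style comments and a \type{local} variable. The variable name \type{total} is ours, and normal category codes are assumed; because \TEX\ strips the comments and joins the lines before \LUA\ sees the code, the chunk arrives as a single line:

\starttyping
\directlua {
  local total = 0                % a TeX comment, never seen by Lua
  for i = 1, 10 do
    total = total + i            % sums the numbers 1 to 10
  end
  tex.print(total)               % feeds 55 back to TeX
}
\stoptyping

Since \type{total} is declared \type{local}, it disappears at the end of the chunk and cannot clash with variables used by the macro package.
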
The \syntax{name <general text>} specifies the name of the \LUA\ chunk, mainly shown in the stack backtrace of error messages created by \LUA\ code. The \syntax{<general text>} is expanded fully, thus macros can be used to generate the chunk name, for example

\starttyping
\directlua name{\jobname:\the\inputlineno} ...
\stoptyping

to include the name of the input file as well as the input line into the chunk name.

Likewise, the \syntax{<16-bit number>} designates a name of a \LUA\ chunk, but in this case the name will be taken from the \type{lua.name} array (see the documentation of the \type{lua} table further in this manual). This syntax is new in version 0.36.0.

The chunk name should not start with a \type{@}, or it will be displayed as a file name (this is a quirk in the current \LUA\ implementation).

\startbuffer
$\pi = \directlua{tex.print(math.pi)}$
\stopbuffer

The \tex{directlua} command is expandable: the results of the \LUA\ code become effective immediately. As an example, the following input:

\typebuffer

will result in

\getbuffer

Because the \syntax{<general text>} is a chunk, the normal \LUA\ error handling is triggered if there is a problem in the included code. The \LUA\ error messages should be clear enough, but the contextual information is still pretty bad. Often, you will only see the line number of the right brace at the end of the code.

While on the subject of errors: some of the things you can do inside \LUA\ code can break up \LUATEX\ pretty badly. If you are not careful while working with the node list interface, you may even end up with assertion errors from within the \TEX\ portion of the executable.

The behavior documented in the above subsection is considered stable in the sense that there will not be backward-incompatible changes any more.

\subsection{\tex{latelua}}

\tex{latelua} stores \LUA\ code in a whatsit that will be processed at the time of shipping out. Its intended use is a cross between \tex{pdfliteral} and \tex{write}.
Within the \LUA\ code you can print \PDF\ statements directly to the \PDF\ file via \type{pdf.print}, or you can write to other output streams via \type{texio.write} or simply using \LUA's I/O routines.

\startsyntax
\latelua <general text>!crlf
\latelua name <general text> <general text>!crlf
\latelua <16-bit number> <general text>
\stopsyntax

Expansion of macros etcetera in the final \type{<general text>} is delayed until just before the whatsit is executed (like in \tex{write}). With regard to the \PDF\ output stream, \tex{latelua} behaves as \tex{pdfliteral page}.

The \syntax{name <general text>} and \syntax{<16-bit number>} behave in the same way as they do for \type{\directlua}.

\subsection{\tex{luaescapestring}}

This primitive converts a \TEX\ token sequence so that it can be safely used as the contents of a \LUA\ string: embedded backslashes, double and single quotes, and newlines and carriage returns are escaped. This is done by prepending an extra token consisting of a backslash with category code~12, and for the line endings, converting them to \type{n} and \type{r} respectively. The token sequence is fully expanded.

\startsyntax
\luaescapestring <general text>
\stopsyntax

Most often, this command is not actually the best way to deal with the differences between \TEX\ and \LUA. In very short bits of \LUA\ code it is often not needed, and for longer stretches of \LUA\ code it is easier to keep the code in a separate file and load it using \LUA's \type{dofile}:

\starttyping
\directlua { dofile('mysetups.lua')}
\stoptyping

\section{New \ETEX\ primitives}

\subsection{\tex{clearmarks}}

This primitive clears a mark class completely, resetting all three connected mark texts to empty.

\startsyntax
\clearmarks <16-bit number>
\stopsyntax

\subsection{\tex{noligs} and \tex{nokerns}}

These primitives prohibit ligature and kerning insertion at the time when the initial node list is built by \LUATEX's main control loop. They are part of a temporary trick and will be removed in the near future.
For now, you need to enable these primitives when you want to do node list processing of \quote{characters}, where \TEX's normal processing would get in the way.

\startsyntax
\noligs <integer>!crlf
\nokerns <integer>
\stopsyntax

These primitives can now be implemented by overloading the ligature building and kerning functions, i.e.\ by assigning dummy functions to their associated callbacks.

\subsection{\tex{formatname}}

\tex{formatname}'s syntax is identical to \tex{jobname}.

In \INITEX, the expansion is empty. Otherwise, the expansion is the value that \tex{jobname} had during the \INITEX\ run that dumped the currently loaded format.

\subsection{\tex{scantextokens}}

The syntax of \tex{scantextokens} is identical to \tex{scantokens}. This primitive is a slightly adapted version of \ETEX's \tex{scantokens}. The differences are:

\startitemize
\item The last (and usually only) line does not have a \tex{endlinechar} appended.
\item \tex{scantextokens} never raises an EOF error, and it does not execute \tex{everyeof} tokens.
\item The \quote{\unknown\ while end of file \unknown} error tests are not executed, allowing the expansion to end on a different grouping level or while a conditional is still incomplete.
\stopitemize

\subsection {Verbose versions of single-character alignment commands (0.45)}

\LUATEX\ defines two new primitives that have the same function as \type{#} and \type{&} in alignments:

\starttabulate[|l|l|l|l|]
\NC \bf primitive \NC \bf explanation \NC\NR
\NC \tex{alignmark} \NC Duplicates the functionality of \char`\#~%
                        inside alignment preambles\NC\NR
\NC \tex{aligntab}  \NC Duplicates the functionality of \char`\&~%
                        inside alignments (and preambles)\NC\NR
\stoptabulate

\subsection{Catcode tables}

Catcode tables are a new feature that allows you to switch to a predefined catcode regime in a single statement. You can have a practically unlimited number of different tables.
The subsystem is backward compatible: if you never use the following commands, your document will not notice any difference in behavior compared to traditional \TEX.

The contents of each catcode table are independent of any other catcode table, and they are stored in and retrieved from the format file.

\subsubsection{\tex{catcodetable}}

\startsyntax
\catcodetable <16-bit number>
\stopsyntax

The primitive \tex{catcodetable} switches to a different catcode table. Such a table has to be previously created using one of the two primitives below, or it has to be zero. Table zero is initialized by \INITEX.

\subsubsection{\tex{initcatcodetable}}

\startsyntax
\initcatcodetable <16-bit number>
\stopsyntax

The primitive \tex{initcatcodetable} creates a new table with catcodes identical to those defined by \INITEX:

\starttabulate[|l|l|l|l|l|]
\NC~0\NC \tt\letterbackslash \NC \NC \tt escape \NC\NR
\NC~5\NC \tt\letterhat\letterhat M \NC return \NC \tt car{\_}ret \NC (this name may change) \NC\NR
\NC~9\NC \tt\letterhat\letterhat @ \NC null \NC \tt ignore \NC\NR
\NC10\NC \tt \NC space \NC \tt spacer \NC\NR
\NC11\NC {\tt a} -- {\tt z} \NC \NC \tt letter \NC\NR
\NC11\NC {\tt A} -- {\tt Z} \NC \NC \tt letter \NC\NR
\NC12\NC everything else \NC \NC \tt other \NC\NR
\NC14\NC \tt\letterpercent \NC \NC \tt comment \NC\NR
\NC15\NC \tt\letterhat\letterhat ? \NC delete \NC \tt invalid{\_}char \NC\NR
\stoptabulate

The new catcode table is allocated globally: it will not go away after the current group has ended. If the supplied number is identical to the currently active table, an error is raised.

\subsubsection{\tex{savecatcodetable}}

\startsyntax
\savecatcodetable <16-bit number>
\stopsyntax

\tex{savecatcodetable} copies the current set of catcodes to a new table with the requested number. The definitions in this new table are all treated as if they were made in the outermost level.

The new table is allocated globally: it will not go away after the current group has ended.
If the supplied number is the currently active table, an error is raised.

\subsection{\tex{suppressfontnotfounderror} (0.11)}

\startsyntax
\suppressfontnotfounderror = 1
\stopsyntax

If this new integer parameter is non|-|zero, then \LUATEX\ will not complain about font metrics that are not found. Instead it will silently skip the font assignment, making the requested csname for the font \tex{ifx} equal to \tex{nullfont}, so that it can be tested against that without bothering the user.

\subsection{\tex{suppresslongerror} (0.36)}

\startsyntax
\suppresslongerror = 1
\stopsyntax

If this new integer parameter is non|-|zero, then \LUATEX\ will not complain about \type{\par} commands encountered in contexts where that is normally prohibited (most prominently in the arguments of non-long macros).

\subsection{\tex{suppressifcsnameerror} (0.36)}

\startsyntax
\suppressifcsnameerror = 1
\stopsyntax

If this new integer parameter is non|-|zero, then \LUATEX\ will not complain about non-expandable commands appearing in the middle of a \type{\ifcsname} expansion. Instead, it will keep getting expanded tokens from the input until it encounters an \type{\endcsname} command. Use with care! This command is experimental: if the input expansion is unbalanced with respect to \type{\csname} \ldots \type{\endcsname} pairs, the \LUATEX\ process may hang indefinitely.

\subsection{\tex{suppressoutererror} (0.36)}

\startsyntax
\suppressoutererror = 1
\stopsyntax

If this new integer parameter is non|-|zero, then \LUATEX\ will not complain about \type{\outer} commands encountered in contexts where that is normally prohibited.

The addition of this command coincides with a change in the \LUATEX\ engine: ever since the snapshot of 20060915, \type{\outer} was simply ignored. That behavior has now been reverted to be \TEX82-compatible by default.
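
Returning to \tex{suppressfontnotfounderror}: the test against \tex{nullfont} mentioned there could look like the sketch below. The font name \type{nosuchfont} and the csname \type{\myfont} are hypothetical, and \type{cmr10} merely serves as a fallback that is assumed to exist:

\starttyping
\suppressfontnotfounderror=1
\font\myfont=nosuchfont        % silently skipped when the metrics are missing
\ifx\myfont\nullfont
  \message{nosuchfont not found, falling back to cmr10}
  \font\myfont=cmr10
\fi
\suppressfontnotfounderror=0   % restore the normal error reporting
\stoptyping
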
\subsection{\tex{outputbox} (0.37)} \startsyntax \outputbox = 65535 \stopsyntax This new integer parameter allows you to alter the number of the box that will be used to store the page sent to the output routine. Its default value is 255, and the acceptable range is from 0 to 65535. \subsection{Font syntax} \LUATEX\ will accept a braced argument as a font name: \starttyping \font\myfont = {cmr10} \stoptyping This allows for embedded spaces, without the need for double quotes. Macro expansion takes place inside the argument. \subsection{File syntax (0.45)} \LUATEX\ will accept a braced argument as a file name: \starttyping \input {plain} \openin 0 {plain} \stoptyping This allows for embedded spaces, without the need for double quotes. Macro expansion takes place inside the argument. \subsection{Images and Forms} \LUATEX\ accepts optional dimension parameters for \type{\pdfrefximage} and \type{\pdfrefxform} in the same format as for \type{\pdfximage}. With images, these dimensions are then used instead of the ones given to \type{\pdfximage}; but the original dimensions are not overwritten, so that a \type{\pdfrefximage} without dimensions still provides the image with dimensions defined by \type{\pdfximage}. These optional parameters are not implemented for \type{\pdfxform}. \starttyping \pdfrefximage width 20mm height 10mm depth 5mm \pdflastximage \pdfrefxform width 20mm height 10mm depth 5mm \pdflastxform \stoptyping \section{Debugging} If \tex{tracingonline} is larger than~2, the node list display will also print the node number of the nodes. \section{Global leaders} There is a new experimental primitive: \type{\gleaders} (a \LUATEX\ extension, added in 0.43). This type of leaders is anchored to the origin of the box to be shipped out. So they are like normal \type{\leaders} in that they align nicely, except that the alignment is based on the {\it largest\/} enclosing box instead of the {\it smallest\/}. 
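
A hedged sketch of how \type{\gleaders} might be used for dotted lines that line up across separate boxes. It assumes that \type{\gleaders} takes the same arguments as \type{\leaders}, which the description above suggests but does not spell out; the text and dimensions are placeholders:

\starttyping
\hbox to \hsize{Chapter one\gleaders\hbox to 1em{\hss.\hss}\hfil 1}
\hbox to \hsize{A much longer chapter title\gleaders\hbox to 1em{\hss.\hss}\hfil 12}
\stoptyping

With \type{\leaders} the dots are aligned relative to each individual \type{\hbox}; with \type{\gleaders} they are aligned relative to the box that is eventually shipped out.
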
\chapter {\LUA\ general} \section[init]{Initialization} \subsection{\LUATEX\ as a \LUA\ interpreter} There are some situations that make \LUATEX\ behave like a standalone \LUA\ interpreter: \startitemize[packed] \item if a \type{--luaonly} option is given on the commandline, or \item if the executable is named \type{texlua} (or \type{luatexlua}), or \item if the only non|-|option argument (file) on the commandline has the extension \type{lua} or \type{luc}. \stopitemize In this mode, it will set \LUA's \type{arg[0]} to the found script name, pushing preceding options in negative values and the rest of the commandline in the positive values, just like the \LUA\ interpreter. \LUATEX\ will exit immediately after executing the specified \LUA\ script and is, in effect, a somewhat bulky standalone \LUA\ interpreter with a bunch of extra preloaded libraries. \subsection{\LUATEX\ as a \LUA\ byte compiler} There are two situations that make \LUATEX\ behave like the \LUA\ byte compiler: \startitemize[packed] \item if a \type{--luaconly} option is given on the commandline, or \item if the executable is named \type{texluac} \stopitemize In this mode, \LUATEX\ is exactly like \type{luac} from the standalone \LUA\ distribution, except that it does not have the \type{-l} switch, and that it accepts (but ignores) the \type{--luaconly} switch. \subsection{Other commandline processing} When the \LUATEX\ executable starts, it looks for the \type{--lua} commandline option. If there is no \type{--lua} option, the commandline is interpreted in a similar fashion as in traditional \PDFTEX\ and \ALEPH. The following command-line switches are understood. 
\starttabulate[|lT|p|] \NC --fmt=FORMAT \NC load the format file FORMAT \NC\NR \NC --lua=FILE \NC load and execute a \LUA\ initialization script\NC\NR \NC --safer \NC disable easily exploitable \LUA\ commands \NC\NR \NC --nosocket \NC disable the \LUA\ socket library \NC\NR \NC --help \NC display help and exit \NC\NR \NC --ini \NC be iniluatex, for dumping formats \NC\NR \NC --interaction=STRING \NC set interaction mode (STRING=batchmode/nonstopmode/\crlf scrollmode/errorstopmode) \NC \NR \NC --halt-on-error \NC stop processing at the first error\NC \NR \NC --kpathsea-debug=NUMBER \NC set path searching debugging flags according to the bits of NUMBER \NC \NR \NC --progname=STRING \NC set the program name to STRING \NC \NR \NC --version \NC display version and exit \NC\NR \NC --credits \NC display credits and exit \NC\NR \NC --recorder \NC enable filename recorder \NC \NR \NC --etex \NC ignored\NC \NR \NC --output-comment=STRING \NC use STRING for DVI file comment instead of date (no effect for PDF)\NC \NR \NC --output-directory=DIR \NC use DIR as the directory to write files to \NC \NR \NC --draftmode \NC switch on draft mode (generates no output PDF)\NC \NR \NC --output-format=FORMAT \NC use FORMAT for job output; FORMAT is 'dvi' or 'pdf' \NC \NR \NC --[no-]shell-escape \NC disable/enable \type{\write18{SHELL COMMAND}} \NC \NR \NC --enable-write18 \NC enable \type{\write18{SHELL COMMAND}} \NC \NR \NC --disable-write18 \NC disable \type{\write18{SHELL COMMAND}} \NC \NR \NC --shell-restricted \NC restrict \type{\write18} to a list of commands given in texmf.cnf \NC \NR \NC --debug-format \NC enable format debugging \NC \NR \NC --[no-]file-line-error \NC disable/enable file:line:error style messages \NC \NR \NC --[no-]file-line-error-style \NC aliases of --[no-]file-line-error \NC \NR \NC --jobname=STRING \NC set the job name to STRING \NC \NR \NC --[no-]parse-first-line \NC disable/enable parsing of the first line of the input file \NC \NR \NC --translate-file= \NC 
ignored \NC \NR \NC --default-translate-file= \NC ignored \NC \NR \NC --8bit \NC ignored \NC \NR \NC --[no-]mktex=FMT \NC disable/enable mktexFMT generation (FMT=tex/tfm)\NC \NR \NC --synctex=NUMBER \NC enable synctex \NC \NR \stoptabulate A note on the creation of the various temporary files and the \type{\jobname}. The value to use for \type{\jobname} is decided as follows: \startitemize \item If \type{--jobname} is given on the command line, its argument will be the value for \tex{jobname}, without any changes. The argument will not be used for actual input so it need not exist. The \type{--jobname} switch only controls the \tex{jobname} setting. \item Otherwise, \tex{jobname} will be the name of the first file that is read from the file system, with any path components and the last extension (the part following the last \type{.}) stripped off. \item An exception to the previous point: if the command line goes into interactive mode (by starting with a command) and there are no files input via \type{\everyjob} either, then the \tex{jobname} is set to \type{texput} as a last resort. \stopitemize The file names for output files that are generated automatically are created by attaching the proper extension (\type{.log}, \type{.pdf}, etc.) to the found \tex{jobname}. These files are created in the directory pointed to by \type{--output-directory}, or in the current directory, if that switch is not present. \blank Without the \type{--lua} option, command line processing works like it does in any other web2c-based typesetting engine, except that \LUATEX\ has a few extra switches. If the \type{--lua} option is present, \LUATEX\ will enter an alternative mode of commandline processing in comparison to the standard web2c programs. In this mode, a small series of actions is taken in order. 
First, it will parse the commandline as usual, but it will only interpret a small subset of the options immediately: \type{--safer}, \type{--nosocket}, \type{--[no-]shell-escape}, \type{--enable-write18}, \type{--disable-write18}, \type{--shell-restricted}, \type{--help}, \type{--version}, and \type{--credits}. Now it searches for the requested \LUA\ initialization script. If it cannot be found using the actual name given on the commandline, a second attempt is made by prepending the value of the environment variable \type{LUATEXDIR}, if that variable is defined in the environment. Then it checks the various safety switches. You can use those to disable some \LUA\ commands that can easily be abused by a malicious document. At the moment, \type{--safer} \type{nil}s the following functions: \starttabulate[|l|l|] \NC \bf library \NC \bf functions \NC \NR \NC \tt os \NC \tt execute exec setenv rename remove tmpdir \NC \NR \NC \tt io \NC \tt popen output tmpfile \NC \NR \NC \tt lfs \NC \tt rmdir mkdir chdir lock touch \NC \NR \stoptabulate Furthermore, it disables loading of compiled \LUA\ libraries (support for these was added in 0.46.0), and it makes \lua{io.open()} fail on files that are opened for anything besides reading. \type{--nosocket} makes the socket library unavailable, so that \LUA\ cannot use networking. The switches \type{--[no-]shell-escape}, \type{--[enable|disable]-write18}, and \type{--shell-restricted} have the same effects as in \PDFTEX, and additionally make \type{io.popen()}, \type{os.execute}, \type{os.exec} and \type{os.spawn} adhere to the requested option. Next the initialization script is loaded and executed. From within the script, the entire commandline is available in the \LUA\ table \lua{arg}, beginning with \lua {arg[0]}, containing the name of the executable. Commandline processing happens very early on. So early, in fact, that none of \TEX's initializations have taken place yet. 
For that reason, the tables that deal with typesetting, like \luatex{tex}, \luatex{token}, \luatex{node} and \luatex{pdf}, are off|-|limits during the execution of the startup file (they are nilled). Special care is taken that \luatex{texio.write} and \luatex{texio.write_nl} function properly, so that you can at least report your actions to the log file when (and if) it eventually becomes opened (note that \TEX\ does not even know its \tex{jobname} yet at this point). See \in{chapter}[libraries] for more information about the \LUATEX-specific \LUA\ extension tables. Everything you do in the \LUA\ initialization script will remain visible during the rest of the run, with the exception of the aforementioned \luatex{tex}, \luatex{token}, \luatex{node} and \luatex{pdf} tables: those will be initialized to their documented state after the execution of the script. You should not store anything in variables or within tables with these four global names, as they will be overwritten completely. We recommend you use the startup file only for your own \TEX|-|independent initializations (if you need any), to parse the commandline, set values in the \luatex{texconfig} table, and register the callbacks you need. \LUATEX\ allows some of the commandline options to be overridden by reading values from the \luatex{texconfig} table at the end of script execution (see the description of the \luatex{texconfig} table later on in this document for more details on which ones exactly). Unless the \luatex{texconfig} table tells \LUATEX\ not to initialize \KPATHSEA\ at all (set \luatex{texconfig.kpse_init} to \type{false} for that), \LUATEX\ acts on some more commandline options after the initialization script is finished: in order to initialize the built|-|in \KPATHSEA\ library properly, \LUATEX\ needs to know the correct program name to use, and for that it needs to check \type{--progname}, or \type{--ini} and \type{--fmt}, if \type{--progname} is missing. 
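
As an illustration of the recommendations above, a startup script passed via \type{--lua} could look like the following minimal sketch. The file name \type{myinit.lua} is hypothetical; the script only relies on \lua{arg}, \luatex{texio.write_nl}, \luatex{texconfig.kpse_init} and \luatex{callback.register} as they are described in this manual.

\starttyping
-- myinit.lua (hypothetical name), used as: luatex --lua=myinit.lua file.tex
-- The tex, token, node and pdf tables are nil at this point, so the
-- script restricts itself to TeX-independent work.

-- Report the command line; arg[0] is the name of the executable.
for i = 0, #arg do
  texio.write_nl(string.format("arg[%d] = %s", i, tostring(arg[i])))
end

-- Let LuaTeX initialize kpathsea itself once the script has finished.
-- Setting this to false would make the script responsible for file lookup.
texconfig.kpse_init = true

-- Callbacks registered here stay active for the rest of the run.
callback.register("process_input_buffer", function(buffer)
  return nil   -- nil: leave the input line untouched
end)
\stoptyping

Anything that needs the \luatex{tex}, \luatex{token}, \luatex{node} or \luatex{pdf} tables has to wait until the document itself runs, for instance inside a \type{\directlua} call.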
\section{\LUA\ changes}

The C coroutine (\COCO) patches from LuaJIT are applied to the \LUA\ core; the version used is 1.1.5. See \hyphenatedurl{http://coco.luajit.org/} for details. This functionality currently (0.45) works neither on non-Intel OpenBSD platforms nor on PowerPC Linux. Additional note: \type{coroutine.wrap()} under Windows does not inherit the state of the random generator; it always has an implicit \type{math.randomseed(1)} that is added by the Windows kernel.

Starting from version 0.45, \LUATEX\ is able to use the kpathsea library to find \type{require()}d modules. For this purpose, \type{package.loaders[2]} is replaced by a different loader function that decides at runtime whether to use kpathsea or the built-in core lua function. It uses \KPATHSEA\ when that is already initialized at that point in time, otherwise it reverts to using the normal \type{package.path} loader. Initialization of \KPATHSEA\ can happen either implicitly (when \LUATEX\ starts up and the startup script has not set \type{texconfig.kpse_init} to false), or explicitly by calling the \LUA\ function \type{kpse.set_program_name()}.

Starting from version 0.46.0 (as an {\bf experimental} feature!) \LUATEX\ is also able to use dynamically loadable \LUA\ libraries, unless \type{--safer} was given as an option on the command line. For this purpose, \type{package.loaders[3]} is replaced by a different loader function that decides at runtime whether to use kpathsea or the built-in core lua function. As in the previous paragraph, it uses \KPATHSEA\ when that is already initialized at that point in time, otherwise it reverts to using the normal \type{package.cpath} loader. This functionality required an extension to kpathsea:

\startnarrower There is a new kpathsea file format: \type{kpse_clua_format} that searches for files with extension \type{.dll} and \type{.so}.
The \type{texmf.cnf} setting for this variable is \type{CLUAINPUTS}, and by default it has this value:

\starttyping
CLUAINPUTS=.:$SELFAUTOLOC/lib/{$progname,$engine,}/lua//
\stoptyping
%$

This path is imperfect (it requires a TDS subtree below the binaries directory), but the architecture has to be in the path somewhere, and the currently simplest way to do that is to search below the binaries directory only. One level up (a \type{lib} directory parallel to \type{bin}) would have been nicer, but that is not doable because \TEXLIVE\ uses a \type{bin/} structure.
\stopnarrower

In keeping with the other \TEX-like programs in \TEXLIVE, the two \LUA\ functions \type{os.execute} and \type{io.popen} (as well as the two new functions \type{os.exec} and \type{os.spawn} that are explained below) take the value of \type{shell_escape} and/or \type{shell_escape_commands} into account. Whenever \LUATEX\ is run with the assumed intention to typeset a document (that is, when it is called as \type{luatex}, as opposed to \type{texlua}, and the commandline option \type{--luaonly} was not given), it will only run the four functions above if the matching texmf.cnf variable(s) or their \type{texconfig} (see~\in{section}[texconfig]) counterparts allow execution of the requested system command. In \quote{script interpreter} runs of \LUATEX, these settings have no effect, and all four functions behave as normal. This change is new in 0.37.0.

The \lua{read("*line")} function from the io library has been adjusted so that it is line|-|ending neutral: any of \type{LF}, \type {CR} or \type{CR+LF} are acceptable line endings.

The \lua{tostring()} printer for numbers has been changed so that it returns~\type{0.00000000000001} instead of~\hbox{\type{1e-14}} (which confused \TEX\ enormously). Values with an even smaller exponent print simply as~\type{0}.
\lua{luafilesystem} has been extended: there are two extra boolean functions (\luatex{lfs.isdir(filename)} and \luatex{lfs.isfile(filename)}) and one extra string field in its attributes table (\type{permissions}). There is an additional function (added in 0.51) \type{lfs.shortname()} which takes a file name and returns its short name on WIN32 platforms. On other platforms, it just returns the given argument. The file name is not tested for existence. Finally, for non-WIN32 platforms only, there is the new function \type{lfs.readlink()} (added in 0.51) that takes an existing symbolic link as argument and returns its content. It returns an error on WIN32. The \lua{string} library has an extra function: \luatex{string.explode(s[,m])}. This function returns an array containing the string argument \type{s} split into sub-strings based on the value of the string argument \type{m}. The second argument is a string that is either empty (this splits the string into characters), a single character (this splits on each occurrence of that character, possibly introducing empty strings), or a single character followed by the plus sign \type{+} (this special version does not create empty sub-strings). The default value for \type{m} is \quote{\type{ +}} (multiple spaces). Note: \type{m} is not hidden by surrounding braces (as it would be if this function was written in \TEX\ macros). The \lua{string} library also has six extra iterators that return strings piecemeal: \startitemize \item \luatex{string.utfvalues(s)} (returns an integer value in the \UNICODE\ range) \item \luatex{string.utfcharacters(s)} (returns a string with a single \UTF-8 token in it) \item \luatex{string.characters(s)} (a string containing one byte) \item \luatex{string.characterpairs(s)} (two strings each containing one byte) will produce an empty second string if the string length was odd. 
\item \luatex{string.bytes(s)} (a single byte value)
\item \luatex{string.bytepairs(s)} (two byte values) will produce nil instead of a number as its second return value if the string length was odd.
\stopitemize

The \luatex{string.characterpairs()} and \luatex{string.bytepairs()} iterators are especially useful in the conversion of UTF-16 encoded data into UTF-8.

Note: The \lua{string} library functions \luatex{len}, \luatex{lower}, \luatex{sub} etc. are not \UNICODE|-|aware. For strings in the UTF-8 encoding, i.e., strings containing characters above code point 127, the corresponding functions from the \lua{slnunicode} library can be used, e.g., \luatex{unicode.utf8.len}, \luatex{unicode.utf8.lower} etc. The exceptions are \luatex{unicode.utf8.find}, which always returns byte positions in a string, and \luatex{unicode.utf8.match} and \luatex{unicode.utf8.gmatch}. While the latter two functions in general {\it are} \UNICODE|-|aware, they fall back to non|-|\UNICODE|-|aware behavior when using the empty capture \lua{()} (other captures work as expected). For the interpretation of character classes in \luatex{unicode.utf8} functions refer to the library sources at \hyphenatedurl{http://luaforge.net/projects/sln}. The \lua{slnunicode} library will be replaced by an internal \UNICODE\ library in a future \LUATEX\ version.

\blank

The \lua{os} library has a few extra functions and variables:

\startitemize
\item \luatex{os.selfdir} is a variable that holds the directory path of the actual executable. For example: {\tt \directlua{tex.sprint(os.selfdir)}} (present since 0.27.0).
\item \luatex{os.exec(commandline)} is a variation on \lua{os.execute}. The \type{commandline} can be either a single string or a single table. If the argument is a table, \LUATEX\ first checks if there is a value at integer index zero. If there is, this is the command to be executed. Otherwise, it will use the value at integer index one. If neither is present, nothing at all happens.
The set of consecutive values starting at integer 1 in the table are the arguments that are passed on to the command (the value at index 1 becomes \type{arg[0]}). The command is searched for in the execution path, so there is normally no need to pass on a fully qualified pathname. If the argument is a string, then it is automatically converted into a table by splitting on whitespace. In this case, it is impossible for the command and first argument to differ from each other. In the string argument format, whitespace can be protected by putting (part of) an argument inside single or double quotes. One layer of quotes is interpreted by \LUATEX, and all occurrences of \tex{"}, \tex{'} or \type{\\} within the quoted text are un-escaped. In the table format, there is no string handling taking place. This function normally does not return control back to the \LUA\ script: the command will replace the current process. However, it will return the two values \type{nil} and \type {'error'} if there was a problem while attempting to execute the command. On Windows, the current process is actually kept in memory until after the execution of the command has finished. This prevents crashes in situations where \TEXLUA\ scripts are run inside integrated \TEX\ environments. The original reason for this command is that it cleans out the current process before starting the new one, making it especially useful for use in \TEXLUA. \item \luatex{os.spawn(commandline)} is a returning version of \lua{os.exec}, with otherwise identical calling conventions. If the command ran ok, then the return value is the exit status of the command. Otherwise, it will return the two values \type{nil} and \type {'error'}. \item \luatex{os.setenv('key','value')} This sets a variable in the environment. Passing \lua{nil} instead of a value string will remove the variable. 
\item \luatex{os.env} This is a hash table containing a dump of the variables and values in the process environment at the start of the run. It is writeable, but the actual environment is {\em not\/} updated automatically.
\item \luatex{os.gettimeofday()} Returns the current \quote {\UNIX\ time}, but as a float. This function is not available on the \SUNOS\ platforms, so do not use this function for portable documents.
\item \luatex{os.times()} Returns the current process times according to the \UNIX\ C library function \quote {times}. This function is not available on the \MSWINDOWS\ and \SUNOS\ platforms, so do not use this function for portable documents.
\item \luatex{os.tmpdir()} This will create a directory in the \quote {current directory} with the name \type{luatex.XXXXXX} where the \type {X}-es are replaced by a unique string. The function also returns this string, so you can \type{lfs.chdir()} into it, or \type{nil} if it failed to create the directory. The user is responsible for cleaning up at the end of the run; it does not happen automatically.
\item \luatex{os.type} This is a string that gives a global indication of the class of operating system. The possible values are currently \type{windows}, \type{unix}, and \type{msdos} (you are unlikely to find the last of these \quote {in the wild}).
\item \luatex{os.name} This is a string that gives a more precise indication of the operating system. These possible values are not yet fixed, and for \type{os.type} values \type{windows} and \type{msdos}, the \type{os.name} values are simply \type{windows} and \type{msdos}. The list for the type \type{unix} is more precise: \type{linux}, \type{freebsd}, \type{kfreebsd} (since 0.51), \type{cygwin} (since 0.53), \type{openbsd}, \type{solaris}, \type{sunos} (pre-solaris), \type{hpux}, \type{irix}, \type{macosx}, \type{gnu} (hurd), \type{bsd} (unknown, but \BSD|-|like), \type{sysv} (unknown, but \SYSV|-|like), \type{generic} (unknown).
(\type{os.version} is planned as a future extension)
\item \luatex{os.uname()} This function returns a table with specific operating system information acquired at runtime. The keys in the returned table are all string valued, and their names are: \type{sysname}, \type{machine}, \type{release}, \type{version}, and \type{nodename}.
\stopitemize

In stock \LUA, many things depend on the current locale. In \LUATEX, we can't do that, because it makes documents unportable. While \LUATEX\ is running it forces the following locale settings:

\starttyping
LC_CTYPE=C
LC_COLLATE=C
LC_NUMERIC=C
\stoptyping

\section {\LUA\ modules}

Some modules that are normally external to \LUA\ are statically linked in with \LUATEX, because they offer useful functionality:

\startitemize
\item \lua{slnunicode}, from the \type {Selene} libraries, \hyphenatedurl{http://luaforge.net/projects/sln}. (version 1.1) This library has been slightly extended so that the \type{unicode.utf8.*} functions also accept the first 256 values of plane~18. This is the range \LUATEX\ uses for raw binary output, as explained above.
\item \lua{luazip}, from the kepler project, \hyphenatedurl{http://www.keplerproject.org/luazip/}. (version 1.2.1, but patched for compilation with \LUA\ 5.1)
\item \lua{luafilesystem}, also from the kepler project, \hyphenatedurl{http://www.keplerproject.org/luafilesystem/}. (version 1.5.0)
\item \lua{lpeg}, by Roberto Ierusalimschy, \hyphenatedurl{http://www.inf.puc-rio.br/~roberto/lpeg/lpeg.html}. (version 0.9.0) Note: \lua{lpeg} is not \UNICODE|-|aware, but interprets strings on a byte|-|per|-|byte basis. This mainly means that \luatex{lpeg.S} cannot be used with characters above code point 127, since those characters are encoded using two bytes, and thus \luatex{lpeg.S} will look for one of those two bytes when matching, not the combination of the two.
The same is true for \luatex{lpeg.R}, although the latter will display an error message if used with characters above code point 127: I.\,e.\ \luatex{lpeg.R('aä')} results in the message \type{bad argument #1 to 'R' (range must have two characters)}, since to \lua{lpeg}, \type{ä} is two 'characters' (bytes), so \type{aä} totals three. \item \lua{lzlib}, by Tiago Dionizio, \hyphenatedurl{http://luaforge.net/projects/lzlib/}. (version 0.2) \item \lua{md5}, by Roberto Ierusalimschy \hyphenatedurl{http://www.inf.puc-rio.br/~roberto/md5/md5-5/md5.html}. \item \lua{luasocket}, by Diego Nehab \hyphenatedurl{http://w3.impa.br/~diego/software/luasocket/} (version 2.0.2). Note: the \type{.lua} support modules from \type{luasocket} are also preloaded inside the executable, there are no external file dependencies. \stopitemize \chapter[libraries]{\LUATEX\ \LUA\ Libraries} The interfacing between \TEX\ and \LUA\ is facilitated by a set of library modules. The \LUA\ libraries in this chapter are all defined and initialized by the \LUATEX\ executable. Together, they allow \LUA\ scripts to query and change a number of \TEX's internal variables, run various internal \TEX\ functions, and set up \LUATEX's hooks to execute \LUA\ code. The following sections are in alphabetical order. \section{The \luatex{callback} library} This library has functions that register, find and list callbacks. A quick note on what callbacks are (thanks, Paul!): Callbacks are entry points to \LUATEX's internal operations, which can be interspersed with additional \LUA\ code, and even replaced altogether. In the first case, \TEX\ is simply augmented with new operations (for instance, a manipulation of the nodes resulting from the paragraph builder); in the second case, its hard-coded behavior (for instance, the paragraph builder itself) is ignored and processing relies on user code only. 
More precisely, the code to be inserted at a given callback is a function (an anonymous function or the name of a function variable); % Is this line useful? it will receive the arguments associated with the callback, if any, and must frequently return some other arguments for \TEX\ to resume its operations. The first task is registering a callback: \startfunctioncall id, error = callback.register ( callback_name, func) id, error = callback.register ( callback_name, nil) id, error = callback.register ( callback_name, false) \stopfunctioncall where the \syntax{callback_name} is a predefined callback name, see below. The function returns the internal \type{id} of the callback or \type{nil}, if the callback could not be registered. In the latter case, \type{error} contains an error message, otherwise it is \type{nil}. \LUATEX\ internalizes the callback function in such a way that it does not matter if you redefine a function accidentally. Callback assignments are always global. You can use the special value \type {nil} instead of a function for clearing the callback. For some minor speed gain, you can assign the boolean \type{false} to the non-file related callbacks, doing so will prevent \LUATEX\ from executing whatever it would execute by default (when no callback function is registered at all). Be warned: this may cause all sorts of grief unless you know {\it exactly} what you are doing! This functionality is present since version 0.38. Currently, callbacks are not dumped into the format file. \startfunctioncall info = callback.list() \stopfunctioncall The keys in the table are the known callback names, the value is a boolean where \type{true} means that the callback is currently set (active). \startfunctioncall f = callback.find (callback_name) \stopfunctioncall If the callback is not set, \luatex{callback.find} returns \type{nil}. 
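
The three functions of the \luatex{callback} library can be combined as in the following minimal sketch. The function name \type{show_line} is made up for this example, and \luatex{process_input_buffer} is used only because its calling convention is simple.

\starttyping
-- A function variable that will serve as the callback.
local function show_line(buffer)
  texio.write_nl("log", "input line: " .. buffer)
  return nil                 -- nil leaves the input line unchanged
end

-- Register it; on failure id is nil and err holds an error message.
local id, err = callback.register("process_input_buffer", show_line)
if not id then
  texio.write_nl("callback.register failed: " .. tostring(err))
end

-- callback.list() maps every known callback name to a boolean.
for name, active in pairs(callback.list()) do
  if active then
    texio.write_nl("log", "active callback: " .. name)
  end
end

-- callback.find returns nil when nothing is registered under that name.
if callback.find("process_input_buffer") then
  -- Passing nil instead of a function clears the callback again.
  callback.register("process_input_buffer", nil)
end
\stoptyping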
\subsection{File discovery callbacks} The behavior documented in this subsection is considered stable in the sense that there will not be backward-incompatible changes any more. \subsubsection{\luatex{find_read_file} and \luatex{find_write_file}} Your callback function should have the following conventions: \startfunctioncall actual_name = function ( id_number, asked_name) \stopfunctioncall Arguments: \startitemize \sym{id_number} This number is zero for the log or \tex {input} files. For \TEX's \tex{read} or \tex{write} the number is incremented by one, so \tex{read0} becomes~1. \sym{asked_name} This is the user|-|supplied filename, as found by \tex{input}, \tex{openin} or \tex{openout}. \stopitemize Return value: \startitemize \sym{actual_name} This is the filename used. For the very first file that is read in by \TEX, you have to make sure you return an \type{actual_name} that has an extension and that is suitable for use as \type{jobname}. If you don't, you will have to manually fix the name of the log file and output file after \LUATEX\ is finished, and an eventual format filename will become mangled. That is because these file names depend on the jobname. You have to return \type{nil} if the file cannot be found. \stopitemize \subsubsection{\luatex{find_font_file}} Your callback function should have the following conventions: \startfunctioncall actual_name = function ( asked_name) \stopfunctioncall The \type{asked_name} is an \OTF\ or \TFM\ font metrics file. Return \type{nil} if the file cannot be found. \subsubsection{\luatex{find_output_file}} Your callback function should have the following conventions: \startfunctioncall actual_name = function ( asked_name) \stopfunctioncall The \type{asked_name} is the \PDF\ or \DVI\ file for writing. 
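
The file discovery callbacks above all map an asked name to an actual name. A minimal sketch for \luatex{find_read_file}, assuming that \KPATHSEA\ has been initialized and that the files of interest are of the \type{tex} format, simply delegates the search:

\starttyping
callback.register("find_read_file", function(id_number, asked_name)
  -- kpse.find_file returns nil when the file is not found, which is
  -- also the value this callback has to return in that case.
  local actual_name = kpse.find_file(asked_name, "tex")
  return actual_name
end)
\stoptyping

A real implementation would typically consult its own search rules or a cache first and only fall back to \luatex{kpse.find_file()} afterwards.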
\subsubsection{\luatex{find_format_file}}

Your callback function should have the following conventions: \startfunctioncall actual_name = function ( asked_name) \stopfunctioncall The \type{asked_name} is a format file for reading (the format file for writing is always opened in the current directory).

\subsubsection{\luatex{find_vf_file}}

Like \luatex{find_font_file}, but for virtual fonts. This applies to both \ALEPH's \OVF\ files and traditional Knuthian \VF\ files.

\subsubsection{\luatex{find_map_file}}

Like \luatex{find_font_file}, but for map files.

\subsubsection{\luatex{find_enc_file}}

Like \luatex{find_font_file}, but for enc files.

\subsubsection{\luatex{find_sfd_file}}

Like \luatex{find_font_file}, but for subfont definition files.

\subsubsection{\luatex{find_pk_file}}

Like \luatex{find_font_file}, but for pk bitmap files. The argument \type{asked_name} is a bit special in this case. Its form is

\starttyping
<base res>dpi/<fontname>.<actual res>pk
\stoptyping

So you may be asked for \type{600dpi/manfnt.720pk}. It is up to you to find a \quote{reasonable} bitmap file to go with that specification.

\subsubsection{\luatex{find_data_file}}

Like \luatex{find_font_file}, but for embedded files (\tex{pdfobj file '...'}).

\subsubsection{\luatex{find_opentype_file}}

Like \luatex{find_font_file}, but for \OPENTYPE\ font files.

\subsubsection{\luatex{find_truetype_file} and \luatex{find_type1_file}}

Your callback function should have the following conventions: \startfunctioncall actual_name = function ( asked_name) \stopfunctioncall The \type{asked_name} is a font file. This callback is called while \LUATEX\ is building its internal list of needed font files, so the actual timing may surprise you. Your return value is later fed back into the matching \luatex{read_file} callback.

Strangely enough, \luatex{find_type1_file} is also used for \OPENTYPE\ (\OTF) fonts.
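
For \luatex{find_pk_file}, the special form of \type{asked_name} can be taken apart with a \LUA\ string pattern. The sketch below is only one possible policy (it looks for the exact resolution and nothing else) and assumes that \KPATHSEA\ is initialized; the local variable names are illustrative.

\starttyping
callback.register("find_pk_file", function(asked_name)
  -- For "600dpi/manfnt.720pk" the captures are "600", "manfnt", "720".
  local base_res, font_name, actual_res =
    string.match(asked_name, "^(%d+)dpi/(.+)%.(%d+)pk$")
  if not font_name then
    return nil          -- unexpected form: report the file as not found
  end
  -- For the pk format, the third argument of kpse.find_file is the dpi.
  return kpse.find_file(font_name, "pk", tonumber(actual_res))
end)
\stoptyping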
\subsubsection{\luatex{find_image_file}} Your callback function should have the following conventions: \startfunctioncall actual_name = function ( asked_name) \stopfunctioncall The \type{asked_name} is an image file. Your return value is used to open a file from the harddisk, so make sure you return something that is considered the name of a valid file by your operating system. \subsection[iocallback]{File reading callbacks} The behavior documented in this subsection is considered stable in the sense that there will not be backward-incompatible changes any more. \subsubsection{\luatex{open_read_file}} Your callback function should have the following conventions: \startfunctioncall
env = function ( file_name) \stopfunctioncall Argument: \startitemize \sym{file_name} The filename returned by a previous \luatex{find_read_file} or the return value of \luatex{kpse.find_file()} if there was no such callback defined. \stopitemize Return value: \startitemize \sym{env} This is a table containing at least one required and one optional callback function for this file. The required field is \luatex{reader} and the associated function will be called once for each new line to be read, the optional one is \luatex{close} that will be called once when \LUATEX\ is done with the file. \LUATEX\ never looks at the rest of the table, so you can use it to store your private per|-|file data. Both the callback functions will receive the table as their only argument. \stopitemize \subsubsubsection{\luatex{reader}} \LUATEX\ will run this function whenever it needs a new input line from the file. \startfunctioncall function(
env) return line end \stopfunctioncall Your function should return either a string or \type{nil}. The value \type{nil} signals that the end of file has occurred, and will make \TEX\ call the optional \luatex{close} function next. \subsubsubsection{\luatex{close}} \LUATEX\ will run this optional function when it decides to close the file. \startfunctioncall function(
env) end \stopfunctioncall Your function should not return any value. \subsubsection{General file readers} There is a set of callbacks for the loading of binary data files. These all use the same interface: \startfunctioncall function( name) return success, data, data_size end \stopfunctioncall The \type{name} will normally be a full path name as it is returned by either one of the file discovery callbacks or the internal version of \luatex{kpse.find_file()}. \startitemize \sym{success} Return \type{false} when a fatal error occurred (e.\,g.\ when the file cannot be found, after all). \sym{data} The bytes comprising the file. \sym{data_size} The length of the \type{data}, in bytes. \stopitemize Return an empty string and zero if the file was found but there was a reading problem. The list of functions is as follows: \starttabulate[|l|p|] \NC \luatex{read_font_file} \NC ofm or tfm files \NC\NR \NC \luatex{read_vf_file} \NC virtual fonts \NC\NR \NC \luatex{read_map_file} \NC map files \NC\NR \NC \luatex{read_enc_file} \NC encoding files \NC\NR \NC \luatex{read_sfd_file} \NC subfont definition files \NC\NR \NC \luatex{read_pk_file} \NC pk bitmap files \NC\NR \NC \luatex{read_data_file} \NC embedded files (\tex{pdfobj file ...}) \NC\NR \NC \luatex{read_truetype_file} \NC \TRUETYPE\ font files \NC\NR \NC \luatex{read_type1_file} \NC \TYPEONE\ font files \NC\NR \NC \luatex{read_opentype_file} \NC \OPENTYPE\ font files \NC\NR \stoptabulate \subsection{Data processing callbacks} \subsubsection{\luatex{process_input_buffer}} This callback allows you to change the contents of the line input buffer just before \LUATEX\ actually starts looking at it. \startfunctioncall function( buffer) return adjusted_buffer end \stopfunctioncall If you return \type{nil}, \LUATEX\ will pretend like your callback never happened. You can gain a small amount of processing time from that. This callback does not replace any internal code. 
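
A minimal sketch of a \luatex{process_input_buffer} callback follows. The task (replacing tab characters by spaces) is chosen purely for illustration; the point is the use of \type{nil} for lines that need no change.

\starttyping
callback.register("process_input_buffer", function(buffer)
  -- Cheap plain-text test first: lines without a tab are left alone.
  if not string.find(buffer, "\t", 1, true) then
    return nil
  end
  -- Replace every tab by one space and hand back the adjusted line.
  -- The parentheses drop the second return value of string.gsub.
  return (string.gsub(buffer, "\t", " "))
end)
\stoptyping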
\subsubsection{\luatex{process_output_buffer} (0.43)} This callback allows you to change the contents of the line output buffer just before \LUATEX\ actually starts writing it to a file as the result of a \tex{write} command. It is only called for output to an actual file (that is, excluding the log, the terminal, and \tex{write18} calls). \startfunctioncall function( buffer) return adjusted_buffer end \stopfunctioncall If you return \type{nil}, \LUATEX\ will pretend like your callback never happened. You can gain a small amount of processing time from that. This callback does not replace any internal code. \subsubsection{\luatex{token_filter}} This callback allows you to replace the way \LUATEX\ fetches lexical tokens. \startfunctioncall function() return
token end \stopfunctioncall The calling convention for this callback is a bit more complicated than for most other callbacks. The function should either return a \LUA\ table representing a valid to|-|be|-|processed token or tokenlist, or something else like \type{nil} or an empty table. If your \LUA\ function does not return a table representing a valid token, it will be immediately called again, until it eventually does return a useful token or tokenlist (or until you reset the callback value to nil). See the description of \luatex{token} for some handy functions to be used in conjunction with this callback. If your function returns a single usable token, then that token will be processed by \LUATEX\ immediately. If the function returns a token list (a table consisting of a list of consecutive token tables), then that list will be pushed to the input stack at a completely new token list level, with its token type set to \quote{inserted}. In either case, the returned token(s) will not be fed back into the callback function. Setting this callback to \type{false} has no effect (because otherwise nothing would happen, forever). \subsection{Node list processing callbacks} The description of nodes and node lists is in~\in{chapter}[nodes]. \subsubsection{\luatex{buildpage_filter}} This callback is called whenever \LUATEX\ is ready to move stuff to the main vertical list. You can use this callback to do specialized manipulation of the page building stage like imposition or column balancing. \startfunctioncall function( extrainfo) end \stopfunctioncall The string \type{extrainfo} gives some additional information about what \TEX's state is with respect to the \quote{current page}. 
The possible values are: \starttabulate[|lT|p|] \NC \ssbf value \NC \bf explanation \NC\NR \NC alignment \NC a (partial) alignment is being added \NC\NR \NC after_output \NC an output routine has just finished \NC\NR \NC box \NC a typeset box is being added \NC\NR %\NC pre_box \NC interline material is being added \NC\NR %\NC adjust \NC \tex{vadjust} material is being added \NC\NR \NC new_graf \NC the beginning of a new paragraph \NC\NR \NC vmode_par \NC \tex{par} was found in vertical mode \NC\NR \NC hmode_par \NC \tex{par} was found in horizontal mode \NC\NR \NC insert \NC an insert is added \NC\NR \NC penalty \NC a penalty (in vertical mode) \NC\NR \NC before_display \NC immediately before a display starts \NC\NR \NC after_display \NC a display is finished \NC\NR \NC end \NC \LUATEX\ is terminating (it's all over)\NC\NR \stoptabulate This callback does not replace any internal code. \subsubsection{\luatex{pre_linebreak_filter}} This callback is called just before \LUATEX\ starts converting a list of nodes into a stack of \tex{hbox}es, after the addition of \type{\parfillskip}. \startfunctioncall function( head, groupcode) return true | false | newhead end \stopfunctioncall The string called \type {groupcode} identifies the nodelist's context within \TEX's processing. The range of possibilities is given in the table below, but not all of those can actually appear in \luatex {pre_linebreak_filter}, some are for the \luatex {hpack_filter} and \luatex {vpack_filter} callbacks that will be explained in the next two paragraphs. 
\starttabulate[|lT|p|]
\NC \ssbf value \NC \bf explanation \NC\NR
\NC \NC main vertical list \NC\NR
\NC hbox \NC \tex{hbox} in horizontal mode \NC\NR
\NC adjusted_hbox\NC \tex{hbox} in vertical mode \NC\NR
\NC vbox \NC \tex{vbox} \NC\NR
\NC vtop \NC \tex{vtop} \NC\NR
\NC align \NC \tex{halign} or \tex{valign} \NC\NR
\NC disc \NC discretionaries \NC\NR
\NC insert \NC packaging an insert \NC\NR
\NC vcenter \NC \tex{vcenter} \NC\NR
\NC local_box \NC \tex{localleftbox} or \tex{localrightbox} \NC\NR
\NC split_off \NC top of a \tex{vsplit} \NC\NR
\NC split_keep \NC remainder of a \tex{vsplit} \NC\NR
\NC align_set \NC alignment cell \NC\NR
\NC fin_row \NC alignment row \NC\NR
\stoptabulate

As for all the callbacks that deal with nodes, the return value can be one of three things:

\startitemize
\item boolean \type{true} signals successful processing
\item a node signals that the \quote{head} node should be replaced by the returned node
\item boolean \type{false} signals that the \quote{head} node list should be ignored and flushed from memory
\stopitemize

This callback does not replace any internal code.

\subsubsection{\luatex{linebreak_filter}}

This callback replaces \LUATEX's line breaking algorithm. \startfunctioncall function( head, is_display) return newhead end \stopfunctioncall The returned node is the head of the list that will be added to the main vertical list, the boolean argument is true if this paragraph is interrupted by a following math display.

If you return something that is not a node, \LUATEX\ will apply the internal linebreak algorithm on the list that starts at \type{head}. Otherwise, the node you return is supposed to be the head of a list of nodes that are all allowed in vertical mode, and at least one of those has to represent a hbox. Failure to do so will result in a fatal error.

Setting this callback to \type{false} is possible, but dangerous, because it is possible you will end up in an unfixable \quote{deadcycles loop}.
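
The fallback rule for \luatex{linebreak_filter} makes it possible to observe paragraphs without replacing the line breaker. The following sketch relies only on the behavior described above (a return value that is not a node hands the list to the internal algorithm); the counter variable is illustrative.

\starttyping
local paragraphs = 0

callback.register("linebreak_filter", function(head, is_display)
  paragraphs = paragraphs + 1
  texio.write_nl("log", string.format("paragraph %d%s", paragraphs,
    is_display and " (interrupted by a math display)" or ""))
  -- false is not a node, so LuaTeX applies its internal line breaking
  -- algorithm to the list that starts at head.
  return false
end)
\stoptyping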
\subsubsection{\luatex{post_linebreak_filter}} This callback is called just after \LUATEX\ has converted a list of nodes into a stack of \tex{hbox}es. \startfunctioncall function( head, groupcode) return true | false | newhead end \stopfunctioncall This callback does not replace any internal code. \subsubsection{\luatex{hpack_filter}} This callback is called when \TEX\ is ready to start boxing some horizontal mode material. Math items and line boxes are ignored at the moment. \startfunctioncall function( head, groupcode, size, packtype [, direction]) return true | false | newhead end \stopfunctioncall The \type{packtype} is either \type{additional} or \type{exactly}. If \type{additional}, then the \type{size} is a \tex{hbox spread ...} argument. If \type{exactly}, then the \type{size} is a \tex{hbox to ...}. In both cases, the number is in scaled points. The \type{direction} is either one of the three-letter direction specifier strings, or \type{nil} (added in 0.45). This callback does not replace any internal code. \subsubsection{\luatex{vpack_filter}} This callback is called when \TEX\ is ready to start boxing some vertical mode material. Math displays are ignored at the moment. This function is very similar to the \luatex{hpack_filter}. Besides the fact that it is called at different moments, there is an extra variable that matches \TEX's \tex{maxdepth} setting. \startfunctioncall function( head, groupcode, size, packtype, maxdepth [, direction]) return true | false | newhead end \stopfunctioncall This callback does not replace any internal code. \subsubsection{\luatex{pre_output_filter}} This callback is called when \TEX\ is ready to start boxing the box 255 for \tex{output}. \startfunctioncall function( head, groupcode, size, packtype, maxdepth [, direction]) return true | false | newhead end \stopfunctioncall This callback does not replace any internal code. 
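
The meaning of \type{size} and \type{packtype} can be made visible with a small
logging filter. This is a minimal sketch, assuming \type{callback.register} and
\type{texio.write_nl} as provided by \LUATEX; it presumes that a plain
\tex{hbox} without a size specification arrives as \type{additional} with a
\type{size} of zero, which is worth verifying on your own setup.

\starttyping
-- Sketch only: report explicit "to" and "spread" specifications.
local sp_per_pt = 65536  -- scaled points per TeX point

callback.register("hpack_filter",
    function(head, groupcode, size, packtype, direction)
        if packtype == "exactly" then
            texio.write_nl("log", string.format(
                "hbox to %.3fpt (%s)", size / sp_per_pt, tostring(groupcode)))
        elseif packtype == "additional" and size ~= 0 then
            texio.write_nl("log", string.format(
                "hbox spread %.3fpt (%s)", size / sp_per_pt, tostring(groupcode)))
        end
        -- true: leave the node list as it is
        return true
    end)
\stoptyping

The same structure would apply to \luatex{vpack_filter}, with the additional
\type{maxdepth} argument placed before \type{direction}.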
\subsubsection{\luatex{hyphenate}} \startfunctioncall function( head, tail) end \stopfunctioncall No return values. This callback has to insert discretionary nodes in the node list it receives. Setting this callback to \type{false} will prevent the internal discretionary insertion pass. \subsubsection{\luatex{ligaturing}} \startfunctioncall function( head, tail) end \stopfunctioncall No return values. This callback has to apply ligaturing to the node list it receives. You don't have to worry about return values because the \type{head} node that is passed on to the callback is guaranteed not to be a glyph_node (if need be, a temporary node will be prepended), and therefore it cannot be affected by the mutations that take place. After the callback, the internal value of the \quote {tail of the list} will be recalculated. The \type{next} of \type{head} is guaranteed to be non-nil. The \type{next} of \type{tail} is guaranteed to be nil, and therefore the second callback argument can often be ignored. It is provided for orthogonality, and because it can sometimes be handy when special processing has to take place. Setting this callback to \type{false} will prevent the internal ligature creation pass. \subsubsection{\luatex{kerning}} \startfunctioncall function( head, tail) end \stopfunctioncall No return values. This callback has to apply kerning between the nodes in the node list it receives. See \type{ligaturing} for calling conventions. Setting this callback to \type{false} will prevent the internal kern insertion pass. \subsubsection{\luatex{mlist_to_hlist}} This callback replaces \LUATEX's math list to node list conversion algorithm. 
\startfunctioncall
function(<node> head, <string> display_type, <boolean> need_penalties)
    return <node> newhead
end
\stopfunctioncall

The returned node is the head of the list that will be added to the vertical or
horizontal list. The string argument is either \quote{text} or \quote{display},
depending on the current math mode. The boolean argument is \type{true} if
penalties have to be inserted in this list, \type{false} otherwise.

Setting this callback to \type{false} is bad: it will almost certainly result in
an endless loop.

\subsection{Information reporting callbacks}

\subsubsection{\luatex{pre_dump} (0.61)}

\startfunctioncall
function()
end
\stopfunctioncall

This function is called just before dumping to a format file starts. It does not
replace any code and there are neither arguments nor return values.

\subsubsection{\luatex{start_run}}

\startfunctioncall
function()
end
\stopfunctioncall

This callback replaces the code that prints \LUATEX's banner. Note that for
successful use, this callback has to be set in the lua initialization script,
otherwise it will be seen only after the run has already started.

\subsubsection{\luatex{stop_run}}

\startfunctioncall
function()
end
\stopfunctioncall

This callback replaces the code that prints \LUATEX's statistics and
\quote{output written to} messages.

\subsubsection{\luatex{start_page_number}}

\startfunctioncall
function()
end
\stopfunctioncall

Replaces the code that prints the \type{[} and the page number at the beginning
of \tex{shipout}. This callback will also override the printing of box
information that normally takes place when \tex{tracingoutput} is positive.

\subsubsection{\luatex{stop_page_number}}

\startfunctioncall
function()
end
\stopfunctioncall

Replaces the code that prints the \type{]} at the end of \tex{shipout}.
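
The two page number callbacks are normally set as a pair. The sketch below
replaces the default progress report with a different format. It assumes
\type{callback.register}, \type{texio.write} and \type{tex.count} as provided
by \LUATEX, and that \type{\count0} holds the page number, as is conventional;
the output format itself is only an example.

\starttyping
-- Sketch only: print "(page N) " instead of "[N]" for each shipped page.
callback.register("start_page_number", function()
    texio.write("term and log", "(page " .. tex.count[0])
end)

callback.register("stop_page_number", function()
    texio.write("term and log", ") ")
end)
\stoptyping

Because \luatex{start_page_number} also suppresses the box information shown
when \tex{tracingoutput} is positive, a replacement like this one is best
limited to runs where that information is not needed.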
\subsubsection{\luatex{show_error_hook}}

\startfunctioncall
function()
end
\stopfunctioncall

This callback is run from inside the \TEX\ error function, and the idea is to
allow you to do some extra reporting on top of what \TEX\ already does (none of
the normal actions are removed). You may find some of the values in the
\luatex{status} table useful.

This callback does not replace any internal code.

\iffalse % this has been retracted for the moment
\startitemize
\sym{message} is the formal error message \TEX\ has given to the user.
              (the line after the '!').
\sym{indicator} is either a filename (when it is a string) or a location
                indicator (a number) that can mean lots of different things
                like a token list id or a \tex{read} number.
\sym{lineno} is the current line number.
\stopitemize

This is an investigative item for 'testing the water' only. The final goal is
the total replacement of \TEX's error handling routines, but that needs lots of
adjustments in the web source because \TEX\ deals with errors in a somewhat
haphazard fashion. This is why the exact definition of \type{indicator} is not
given here.
\fi

\subsection{PDF-related callbacks}

\subsubsection{\luatex{finish_pdffile}}

\startfunctioncall
function()
end
\stopfunctioncall

This callback is called when all document pages are already written to the
\PDF\ file and \LUATEX\ is about to finalize the output document structure. Its
intended use is a final update of \PDF\ dictionaries such as \type{/Catalog} or
\type{/Info}. The callback does not replace any code. There are neither
arguments nor return values.

\subsection{Font-related callbacks}

\subsubsection{\luatex{define_font}}

\startfunctioncall
function(<string> name, <number> size, <number> id) return
font end
\stopfunctioncall

The string \type{name} is the filename part of the font specification, as given
by the user.

The number \type{size} is a bit special:

\startitemize[packed]
\item if it is positive, it specifies an \quote{at size} in scaled points.
\item if it is negative, its absolute value represents a \quote{scaled}
      setting relative to the designsize of the font.
\stopitemize

The \type{id} is the internal number assigned to the font.

The internal structure of the \type{font} table that is to be returned is
explained in \in{chapter}[fonts]. That table is saved internally, so you can
put extra fields in the table for your later \LUA\ code to use.

Setting this callback to \type{false} is pointless as it will prevent font
loading completely but will nevertheless generate errors.

\section{The \luatex{epdf} library}

The \type{epdf} library provides Lua bindings to many \PDF\ access functions
that are defined by the poppler \PDF\ viewer library (written in C$+{}+$ by
Kristian H\o gsberg, based on xpdf by Derek Noonburg). Within \LUATEX\ (and
\PDFTEX), xpdf functionality has long been used to embed \PDF\ files. The
\type{epdf} library makes it possible to inspect an external \PDF\ file. It
gives access to its document structure, e.\,g., catalog, cross-reference table,
individual pages, objects, annotations, info, and metadata. The \type{epdf}
library is still in alpha state: \PDF\ access is currently read|-|only (it is
not yet possible to alter a \PDF\ file or to assemble it from scratch), and
many function bindings are still missing.

To start, a \PDF\ file is opened by calling \type{epdf.open()} with a file
name, e.\,g.:

\starttyping
doc = epdf.open("foo.pdf")
\stoptyping

This normally returns a \type{PDFDoc} userdata variable; but if the file could
not be opened successfully, the value \type{nil} is returned instead of raising
a fatal error.
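
Since a failed \type{epdf.open()} returns \type{nil}, the result should be
checked before any method is called on it. The following is a minimal sketch;
the file name is a placeholder, \type{texio.write_nl} is assumed to be
available for reporting, and \type{getNumPages} is taken from the
\type{PDFDoc} method list given later in this section.

\starttyping
-- Sketch only: guard against a failed epdf.open() before using the document.
local filename = "foo.pdf"  -- placeholder file name
local doc = epdf.open(filename)

if doc == nil then
    texio.write_nl("epdf: could not open " .. filename)
else
    texio.write_nl("epdf: " .. filename .. " has "
        .. doc:getNumPages() .. " page(s)")
end
\stoptyping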
All Lua functions in the \type{epdf} library are named after the poppler
functions listed in the poppler header files for the various classes, e.\,g.,
files \type{PDFDoc.h}, \type{Dict.h}, and \type{Array.h}. These files can be
found in the poppler subdirectory within the \LUATEX\ sources. Which functions
are already implemented in the \type{epdf} library can be found in the
\LUATEX\ source file \type{lepdflib.cc}. For using the \type{epdf} library,
knowledge of the \PDF\ file architecture is indispensable.

There are many different userdata types defined by the \type{epdf} library;
currently these are \type{Annot}, \type{AnnotBorder}, \type{AnnotBorderStyle},
\type{Annots}, \type{Array}, \type{Catalog}, \type{EmbFile}, \type{Dict},
\type{GString}, \type{LinkDest}, \type{Object}, \type{ObjectStream},
\type{Page}, \type{PDFDoc}, \type{PDFRectangle}, \type{Ref}, \type{Stream},
\type{XRef}, and \type{XRefEntry}.

All these userdata names and the Lua access functions closely follow the class
naming in the poppler header files, including the choice of mixed upper and
lower case letters. The Lua function calls use object-oriented syntax, e.\,g.,
the following calls return the \type{Page} object for page~1:

\starttyping
pageref = doc:getCatalog():getPageRef(1)
pageobj = doc:getXRef():fetch(pageref.num, pageref.gen)
\stoptyping

But writing such chained calls is risky, as an intermediate function may return
\type{nil} on error; therefore Lua type checks (e.\,g., against \type{nil})
should be done between function calls. If a non-object item is requested
(e.\,g., a \type{Dict} item by calling \type{page:getPieceInfo()},
cf.~\type{Page.h}) but is not available, the Lua functions return \type{nil}
(without error). If a function should return an \type{Object}, but that object
does not exist, a \type{Null} object is returned instead (also without error;
this is in|-|line with poppler behavior).

All library objects have a \type{__gc} metamethod for garbage collection.
The \type{__tostring} metamethod gives the type name for each object. All object constructors: \startfunctioncall = epdf.open( PDF filename) = epdf.Annot(, , , ) = epdf.Annots(, , ) = epdf.Array() = epdf.Dict() = epdf.Object() = epdf.PDFRectangle() \stopfunctioncall \type{Annot} methods: \startfunctioncall = :isOK() = :getAppearance() = :getBorder() = :match() \stopfunctioncall \type{AnnotBorderStyle} methods: \startfunctioncall = :getWidth() \stopfunctioncall \type{Annots} methods: \startfunctioncall = :getNumAnnots() = :getAnnot() \stopfunctioncall \type{Array} methods: \startfunctioncall :incRef() :decRef() = :getLength() :add() = :get() = :getNF() = :getString() \stopfunctioncall \type{Catalog} methods: \startfunctioncall = :isOK() = :getNumPages() = :getPage() = :getPageRef() = :getBaseURI() = :readMetadata() = :getStructTreeRoot() = :findPage( object number, object generation) = :findDest( name) = :getDests() = :numEmbeddedFiles() = :embeddedFile() = :numJS() = :getJS() = :getOutline() = :getAcroForm() \stopfunctioncall \type{EmbFile} methods: \startfunctioncall = :name() = :description() = :size() = :modDate() = :createDate() = :checksum() = :mimeType() = :streamObject() = :isOk() \stopfunctioncall \type{Dict} methods: \startfunctioncall :incRef() :decRef() = :getLength() :add(, ) :set(, ) :remove() = :is() = :lookup() = :lookupNF() = :lookupInt(, ) = :getKey() = :getVal() = :getValNF() \stopfunctioncall \type{LinkDest} methods: \startfunctioncall = :isOK() = :getKind() = :getKindName() = :isPageRef() = :getPageNum() = :getPageRef() = :getLeft() = :getBottom() = :getRight() = :getTop() = :getZoom() = :getChangeLeft() = :getChangeTop() = :getChangeZoom() \stopfunctioncall \type{Object} methods: \startfunctioncall :initBool() :initInt() :initReal() :initString() :initName() :initNull() :initArray() :initDict() :initStream() :initRef( object number, object generation) :initCmd() :initError() :initEOF() = :fetch() = :getType() = :getTypeName() = :isBool() = 
:isInt() = :isReal() = :isNum() = :isString() = :isName() = :isNull() = :isArray() = :isDict() = :isStream() = :isRef() = :isCmd() = :isError() = :isEOF() = :isNone() = :getBool() = :getInt() = :getReal() = :getNum() = :getString() = :getName() = :getArray() = :getDict() = :getStream() = :getRef() = :getRefNum() = :getRefGen() = :getCmd() = :arrayGetLength() = :arrayAdd() = :arrayGet() = :arrayGetNF() = :dictGetLength() = :dictAdd(, ) = :dictSet(, ) = :dictLookup() = :dictLookupNF() = :dictgetKey() = :dictgetVal() = :dictgetValNF() = :streamIs() = :streamReset() = :streamGetChar() = :streamLookChar() = :streamGetPos() = :streamSetPos() = :streamGetDict() \stopfunctioncall \type{Page} methods: \startfunctioncall = :isOk() = :getNum() = :getMediaBox() = :getCropBox() = :isCropped() = :getMediaWidth() = :getMediaHeight() = :getCropWidth() = :getCropHeight() = :getBleedBox() = :getTrimBox() = :getArtBox() = :getRotate() = :getLastModified() = :getBoxColorInfo() = :getGroup() = :getMetadata() = :getPieceInfo() = :getSeparationInfo() = :getResourceDict() = :getAnnots() = :getLinks() = :getContents() \stopfunctioncall \type{PDFDoc} methods: \startfunctioncall = :isOk() = :getErrorCode() = :getErrorCodeName() = :getFileName() = :getXRef() = :getCatalog() = :getPageMediaWidth() = :getPageMediaHeight() = :getPageCropWidth() = :getPageCropHeight() = :getNumPages() = :readMetadata() = :getStructTreeRoot() = :findPage( object number, object generation) = :getLinks() = :findDest() = :isEncrypted() = :okToPrint() = :okToChange() = :okToCopy() = :okToAddNotes() = :isLinearized() = :getDocInfo() = :getDocInfoNF() = :getPDFMajorVersion() = :getPDFMinorVersion() \stopfunctioncall \type{PDFRectangle} methods: \startfunctioncall = :isValid() \stopfunctioncall %\type{Ref} methods: % %\startfunctioncall %\stopfunctioncall \type{Stream} methods: \startfunctioncall = :getKind() = :getKindName() = :reset() = :close() = :getChar() = :lookChar() = :getRawChar() = :getUnfilteredChar() = 
:unfilteredReset()
= :getPos()
= :isBinary()
= :getUndecodedStream()
= :getDict()
\stopfunctioncall

\type{XRef} methods:

\startfunctioncall
= :isOk()
= :getErrorCode()
= :isEncrypted()
= :okToPrint()
= :okToPrintHighRes()
= :okToChange()
= :okToCopy()
= :okToAddNotes()
= :okToFillForm()
= :okToAccessibility()
= :okToAssemble()
= :getCatalog()
= :fetch( object number, object generation)
= :getDocInfo()
= :getDocInfoNF()
= :getNumObjects()
= :getRootNum()
= :getRootGen()
= :getSize()
= :getTrailerDict()
\stopfunctioncall

%***********************************************************************

\section{The \luatex{font} library}

The font library provides the interface into the internals of the font system,
and it also contains helper functions to load traditional \TEX\ font metrics
formats. Other font loading functionality is provided by the
\luatex{fontloader} library that will be discussed in the next section.

\subsection{Loading a \TFM\ file}

The behavior documented in this subsection is considered stable in the sense
that there will not be backward-incompatible changes any more.

\startfunctioncall
fnt = font.read_tfm( name, s)
\stopfunctioncall

The number \type{s} is a bit special:

\startitemize
\item if it is positive, it specifies an \quote{at size} in scaled points.
\item if it is negative, its absolute value represents a \quote{scaled}
      setting relative to the designsize of the font.
\stopitemize

The internal structure of the metrics font table that is returned is explained
in \in{chapter}[fonts].

\subsection{Loading a \VF\ file}

The behavior documented in this subsection is considered stable in the sense
that there will not be backward-incompatible changes any more.

\startfunctioncall
vf_fnt = font.read_vf( name, s) \stopfunctioncall The meaning of the number \type{s} and the format of the returned table are similar to the ones in the \luatex{read_tfm()} function. \subsection{The fonts array} The whole table of \TEX\ fonts is accessible from \LUA\ using a virtual array. \starttyping font.fonts[n] = { ... }
f = font.fonts[n] \stoptyping See \in{chapter}[fonts] for the structure of the tables. Because this is a virtual array, you cannot call \type{pairs} on it, but see below for the \type{font.each} iterator. The two metatable functions implementing the virtual array are: \startfunctioncall
f = font.getfont( n) font.setfont( n,
f)
\stopfunctioncall

Note that at the moment, each access to \type{font.fonts} or call to
\type{font.getfont} creates a \LUA\ table for the whole font. This process can
be quite slow. In a later version of \LUATEX, this interface will change (it
will start using userdata objects instead of actual tables).

Also note the following: assignments can only be made to fonts that have
already been defined in \TEX, but have not been accessed {\it at all\/} since
that definition. This limits the usability of the write access to
\type{font.fonts} quite a lot; a less stringent ruleset will likely be
implemented later.

\subsection{Checking a font's status}

You can test for the status of a font by calling this function:

\startfunctioncall
f = font.frozen( n)
\stopfunctioncall

The return value is one of \type{true} (unassignable), \type{false} (can be
changed) or \type{nil} (not a valid font at all).

\subsection{Defining a font directly}

You can define your own font into \luatex{font.fonts} by calling this function:

\startfunctioncall
i = font.define(
f) \stopfunctioncall The return value is the internal id number of the defined font (the index into \luatex{font.fonts}). If the font creation fails, an error is raised. The table is a font structure, as explained in \in{chapter}[fonts]. \subsection{Projected next font id} \startfunctioncall i = font.nextid() \stopfunctioncall This returns the font id number that would be returned by a \type{font.define} call if it was executed at this spot in the code flow. This is useful for virtual fonts that need to reference themselves. \subsection{Font id (0.47)} \startfunctioncall i = font.id( csname) \stopfunctioncall This returns the font id associated with \type{csname} string, or $-1$ if \type{csname} is not defined; new in 0.47. \subsection{Currently active font} \startfunctioncall i = font.current() font.current( i) \stopfunctioncall This gets or sets the currently used font number. \subsection{Maximum font id} \startfunctioncall i = font.max() \stopfunctioncall This is the largest used index in \type{font.fonts}. \subsection{Iterating over all fonts} \startfunctioncall for i,v in font.each() do ... end \stopfunctioncall This is an iterator over each of the defined \TEX\ fonts. The first returned value is the index in \type{font.fonts}, the second the font itself, as a \LUA\ table. The indices are listed incrementally, but they do not always form an array of consecutive numbers: in some cases there can be holes in the sequence. \section{The \luatex{fontloader} library (0.36)} \subsection{Getting quick information on a font} \startfunctioncall
info = fontloader.info( filename)
\stopfunctioncall

This function returns either \type{nil}, or a \type{table}, or an array of
small tables (in the case of a TrueType collection). The returned table(s) will
contain six fairly interesting information items from the font(s) defined by
the file:

\starttabulate[|lT|l|p|]
\NC \ssbf key   \NC \bf type \NC \bf explanation \NC\NR
\NC fontname    \NC string   \NC the \POSTSCRIPT\ name of the font\NC\NR
\NC fullname    \NC string   \NC the formal name of the font\NC\NR
\NC familyname  \NC string   \NC the family name this font belongs to\NC\NR
\NC weight      \NC string   \NC a string indicating the color value of the font\NC\NR
\NC version     \NC string   \NC the internal font version\NC\NR
\NC italicangle \NC float    \NC the slant angle\NC\NR
\stoptabulate

Getting information through this function is (sometimes much) more efficient
than loading the font properly, and is therefore handy when you want to create
a dictionary of available fonts based on the contents of a directory.

\subsection{Loading an \OPENTYPE\ or \TRUETYPE\ file}

If you want to use an \OPENTYPE\ font, you have to get the metric information
from somewhere. Using the \type{fontloader} library, the simplest way to get
that information is thus:

\starttyping
function load_font (filename)
  local metrics = nil
  local font = fontloader.open(filename)
  if font then
     metrics = fontloader.to_table(font)
     fontloader.close(font)
  end
  return metrics
end

myfont = load_font('/opt/tex/texmf/fonts/data/arial.ttf')
\stoptyping

The main function call is

\startfunctioncall
f,
w = fontloader.open( filename) f,
w = fontloader.open( filename, fontname)
\stopfunctioncall

The first return value is a userdata representation of the font. The second
return value is a table containing any warnings and errors reported by
fontloader while opening the font. In normal typesetting, you would probably
ignore the second return value, but it can be useful for debugging purposes.

For \TRUETYPE\ collections (when the filename ends in 'ttc') and \DFONT\
collections, you have to use a second string argument to specify which font
you want from the collection. Use the \type{fontname} strings that are returned
by \type{fontloader.info} for that.

To turn the font into a table, \type{fontloader.to_table} is used on the font
returned by \type{fontloader.open}.

\startfunctioncall
f = fontloader.to_table( font) \stopfunctioncall This table cannot be used directly by \LUATEX\ and should be turned into another one as described in~\in{chapter}[fonts]. Do not forget to store the \type{fontname} value in the \type{psname} field of the metrics table to be returned to \LUATEX, otherwise the font inclusion backend will not be able to find the correct font in the collection. See \in{section}[fontloadertables] for details on the userdata object returned by \type{fontloader.open()} and the layout of the \type{metrics} table returned by \type{fontloader.to_table()}. The font file is parsed and partially interpreted by the font loading routines from \FONTFORGE. The file format can be \OPENTYPE, \TRUETYPE, \TRUETYPE\ Collection, \CFF, or \TYPEONE. There are a few advantages to this approach compared to reading the actual font file ourselves: \startitemize \item The font is automatically re|-|encoded, so that the \type{metrics} table for \TRUETYPE\ and \OPENTYPE\ fonts is using \UNICODE\ for the character indices. \item Many features are pre|-|processed into a format that is easier to handle than just the bare tables would be. \item \POSTSCRIPT|-|based \OPENTYPE\ fonts do not store the character height and depth in the font file, so the character boundingbox has to be calculated in some way. \item In the future, it may be interesting to allow \LUA\ scripts access to the font program itself, perhaps even creating or changing the font. \stopitemize A loaded font is discarded with: \startfunctioncall fontloader.close( font) \stopfunctioncall \subsection{Applying a \quote{feature file}} You can apply a \quote{feature file} to a loaded font: \startfunctioncall
errors = fontloader.apply_featurefile( font, filename) \stopfunctioncall A \quote{feature file} is a textual representation of the features in an \OPENTYPE\ font. See\crlf \hyphenatedurl {http://www.adobe.com/devnet/opentype/afdko/topic_feature_file_syntax.html}\crlf and\crlf \hyphenatedurl {http://fontforge.sourceforge.net/featurefile.html}\crlf for a more detailed description of feature files. If the function fails, the return value is a table containing any errors reported by fontloader while applying the feature file. On success, \type{nil} is returned. (the return value is new in 0.65) \subsection{Applying an \quote{\AFM\ file}} You can apply an \quote{\AFM\ file} to a loaded font: \startfunctioncall
errors = fontloader.apply_afmfile( font, filename)
\stopfunctioncall

An \AFM\ file is a textual representation of (some of) the meta information in
a \TYPEONE\ font. See
\hyphenatedurl{ftp://ftp.math.utah.edu/u/ma/hohn/linux/postscript/5004.AFM_Spec.pdf}
for more information about \AFM\ files.

Note: If you open a \TYPEONE\ file named \type{font.pfb} with
\type{fontloader.open()}, the library will automatically search for and apply
\type{font.afm} if it exists in the same directory as the file
\type{font.pfb}. In that case, there is no need for an explicit call to
\type{apply_afmfile()}.

If the function fails, the return value is a table containing any errors
reported by fontloader while applying the \AFM\ file. On success, \type{nil} is
returned. (The return value is new in 0.65.)

\subsection[fontloadertables]{Fontloader font tables}

As mentioned earlier, the return value of \type{fontloader.open()} is a
userdata object. In \LUATEX\ versions before 0.63, the only way to have access
to the actual metrics was to call \type{fontloader.to_table()} on this object,
returning the table structure that is explained in the following subsections.

However, it turns out that the result from \type{fontloader.to_table()}
sometimes needs very large amounts of memory (depending on the font's
complexity and size) so starting with \LUATEX\ 0.63, it is possible to access
the userdata object directly.

In \LUATEX\ 0.63.0, the following is implemented:

\startitemize
\item all top-level keys that would be returned by \type{to_table()}
      can also be accessed directly.
\item the top-level key \quote{glyphs} returns a {\it virtual\/} array that
      allows indices from \type{0} to ($\type{f.glyphmax}-1$).
\item the items in that virtual array (the actual glyphs) are themselves also
      userdata objects, and each has accessors for all of the keys explained in
      the section \quote{Glyph items} below.
\item the top-level key \quote{subfonts} returns an {\it actual} array of
      userdata objects, one for each of the subfonts (or nil, if there are no
      subfonts).
\stopitemize

A short example may be helpful. This code generates a printout of all the glyph
names in the font \type{PunkNova.kern.otf}:

\starttyping
local f = fontloader.open('PunkNova.kern.otf')
print (f.fontname)
local i = 0
while (i < f.glyphmax) do
   local g = f.glyphs[i]
   if g then
      print(g.name)
   end
   i = i + 1
end
fontloader.close(f)
\stoptyping

In this case, the \LUATEX\ memory requirement stays below 100MB on the test
computer, while the internal structure generated by \type{to_table()} needs
more than 2GB of memory (the font itself is 6.9MB in disk size).

In \LUATEX\ 0.63 only the top-level font, the subfont table entries, and the
glyphs are virtual objects; everything else still produces normal \LUA\ values
and tables. In future versions, more return values may be replaced by userdata
objects (as much as needed to keep the memory requirements in check).

If you want to know the valid fields in a font or glyph structure, call the
\type{fields} function on an object of a particular type (either glyph or font
for now, more will be implemented later):

\startfunctioncall
fields = fontloader.fields( font)
fields = fontloader.fields( font_glyph)
\stopfunctioncall

For instance:

\startfunctioncall
local fields = fontloader.fields(f)
local fields = fontloader.fields(f.glyphs[0])
\stopfunctioncall

\subsubsection{Table types}

\subsubsubsection{Top-level}

The top|-|level keys in the returned table are (the explanations in this part
of the documentation are not yet finished):

\starttabulate[|lT|l|p|]
\NC \ssbf key \NC \bf type \NC \bf explanation \NC\NR
\NC table_version \NC number \NC indicates the metrics version (currently~0.3)\NC\NR
\NC fontname \NC string \NC \POSTSCRIPT\ font name\NC\NR
\NC fullname \NC string \NC official (human-oriented) font name\NC\NR
\NC familyname \NC string \NC family name\NC\NR
\NC weight \NC string \NC weight indicator\NC\NR
\NC copyright \NC string \NC copyright information\NC\NR
\NC filename \NC string \NC the file name\NC\NR
\NC version \NC string \NC font version\NC\NR
\NC italicangle \NC float \NC slant angle\NC\NR
\NC units_per_em \NC number \NC 1000 for \POSTSCRIPT-based fonts, usually 2048 for \TRUETYPE\NC\NR
\NC ascent \NC number \NC height of ascender in \type{units_per_em}\NC\NR
\NC descent \NC number \NC depth of descender in \type{units_per_em}\NC\NR
\NC upos \NC float \NC \NC\NR
\NC uwidth \NC float \NC \NC\NR
\NC uniqueid \NC number \NC \NC\NR
\NC glyphcnt \NC number \NC number of included glyphs\NC\NR
\NC glyphs \NC array \NC \NC\NR
\NC glyphmax \NC number \NC maximum used index in the glyphs array\NC\NR
\NC hasvmetrics \NC number \NC \NC\NR
\NC onlybitmaps \NC number \NC \NC\NR
\NC serifcheck \NC number \NC \NC\NR
\NC isserif \NC number \NC \NC\NR
\NC issans \NC number \NC \NC\NR
\NC encodingchanged \NC number \NC \NC\NR
\NC strokedfont \NC number \NC \NC\NR
\NC use_typo_metrics \NC number \NC \NC\NR
\NC weight_width_slope_only \NC number \NC \NC\NR
\NC head_optimized_for_cleartype \NC number \NC \NC\NR
\NC uni_interp \NC enum \NC \type {unset}, \type {none}, \type {adobe}, \type {greek}, \type {japanese}, \type {trad_chinese},
\type {simp_chinese}, \type {korean}, \type {ams}\NC\NR \NC origname \NC string \NC the file name, as supplied by the user\NC\NR \NC map \NC table \NC \NC\NR \NC private \NC table \NC \NC\NR \NC xuid \NC string \NC \NC\NR \NC pfminfo \NC table \NC \NC\NR \NC names \NC table \NC \NC\NR \NC cidinfo \NC table \NC \NC\NR \NC subfonts \NC array \NC \NC\NR \NC commments \NC string \NC \NC\NR \NC fontlog \NC string \NC \NC\NR \NC cvt_names \NC string \NC \NC\NR \NC anchor_classes \NC table \NC \NC\NR \NC ttf_tables \NC table \NC \NC\NR \NC ttf_tab_saved \NC table \NC \NC\NR \NC kerns \NC table \NC \NC\NR \NC vkerns \NC table \NC \NC\NR \NC texdata \NC table \NC \NC\NR \NC lookups \NC table \NC \NC\NR \NC gpos \NC table \NC \NC\NR \NC gsub \NC table \NC \NC\NR \NC sm \NC table \NC \NC\NR \NC features \NC table \NC \NC\NR \NC mm \NC table \NC \NC\NR \NC chosenname \NC string \NC \NC\NR \NC macstyle \NC number \NC \NC\NR \NC fondname \NC string \NC \NC\NR \NC design_size \NC number \NC \NC\NR \NC fontstyle_id \NC number \NC \NC\NR \NC fontstyle_name \NC table \NC \NC\NR \NC design_range_bottom \NC number \NC \NC\NR \NC design_range_top \NC number \NC \NC\NR \NC strokewidth \NC float \NC \NC\NR \NC mark_classes \NC table \NC \NC\NR \NC creationtime \NC number \NC \NC\NR \NC modificationtime \NC number \NC \NC\NR \NC os2_version \NC number \NC \NC\NR \NC sfd_version \NC number \NC \NC\NR \NC math \NC table \NC \NC\NR \NC validation_state \NC table \NC \NC\NR \NC horiz_base \NC table \NC \NC\NR \NC vert_base \NC table \NC \NC\NR \NC extrema_bound \NC number \NC \NC\NR \stoptabulate \subsubsubsection{Glyph items} The \type{glyphs} is an array containing the per|-|character information (quite a few of these are only present if nonzero). 
\starttabulate[|lT|l|p|] \NC \ssbf key \NC \bf type \NC \bf explanation \NC\NR \NC name \NC string \NC the glyph name\NC\NR \NC unicode \NC number \NC unicode code point, or -1\NC\NR \NC boundingbox \NC array \NC array of four numbers, see note below\NC\NR \NC width \NC number \NC only for horizontal fonts\NC\NR \NC vwidth \NC number \NC only for vertical fonts\NC\NR \NC lsidebearing \NC number \NC only if nonzero and not equal to boundingbox[1]\NC\NR \NC class \NC string \NC one of "none", "base", "ligature", "mark", "component" (if not present, the glyph class is \quote{automatic})\NC\NR \NC kerns \NC array \NC only for horizontal fonts, if set\NC\NR \NC vkerns \NC array \NC only for vertical fonts, if set\NC\NR \NC dependents \NC array \NC linear array of glyph name strings, only if nonempty\NC\NR \NC lookups \NC table \NC only if nonempty\NC\NR \NC ligatures \NC table \NC only if nonempty\NC\NR \NC anchors \NC table \NC only if set\NC\NR \NC comment \NC string \NC only if set\NC\NR \NC tex_height \NC number \NC only if set\NC\NR \NC tex_depth \NC number \NC only if set\NC\NR \NC italic_correction \NC number \NC only if set\NC\NR \NC top_accent \NC number \NC only if set\NC\NR \NC is_extended_shape \NC number \NC only if this character is part of a math extension list\NC\NR \NC altuni \NC table \NC alternate \UNICODE\ items \NC\NR \NC vert_variants \NC table \NC \NC \NR \NC horiz_variants \NC table \NC \NC \NR \NC mathkern \NC table \NC \NC \NR \stoptabulate On \type{boundingbox}: The boundingbox information for \TRUETYPE\ fonts and \TRUETYPE-based \OTF\ fonts is read directly from the font file. \POSTSCRIPT-based fonts do not have this information, so the boundingbox of traditional \POSTSCRIPT\ fonts is generated by interpreting the actual bezier curves to find the exact boundingbox. 
This can be a slow process, so starting from \LUATEX\ 0.45, the boundingboxes of \POSTSCRIPT-based \OTF\ fonts (and raw \CFF\ fonts) are calculated using an approximation of the glyph shape based on the actual glyph points only, instead of taking the whole curve into account. This means that glyphs that have missing points at extrema will have a too-tight boundingbox, but the processing is so much faster that in our opinion the tradeoff is worth it. The \type{kerns} and \type{vkerns} are linear arrays of small hashes: \starttabulate[|lT|l|p|] \NC \ssbf key \NC \bf type \NC \bf explanation \NC\NR \NC char \NC string \NC \NC\NR \NC off \NC number \NC \NC\NR \NC lookup \NC string \NC \NC\NR \stoptabulate The \type{lookups} is a hash, based on lookup subtable names, with the value of each key inside that a linear array of small hashes: % TODO: fix this description \starttabulate[|lT|l|p|] \NC \ssbf key \NC \bf type \NC \bf explanation \NC\NR \NC type \NC enum \NC \type {position}, \type {pair}, \type {substitution}, \type {alternate}, \type {multiple}, \type {ligature}, \type {lcaret}, \type {kerning}, \type {vkerning}, \type {anchors}, \type {contextpos}, \type {contextsub}, \type {chainpos}, \type {chainsub}, \type {reversesub}, \type {max}, \type {kernback}, \type {vkernback} \NC\NR \NC specification \NC table \NC extra data \NC\NR \stoptabulate For the first seven values of \type{type}, there can be additional sub|-|information, stored in the sub-table \type{specification}: \starttabulate[|lT|l|p|] \NC \ssbf value \NC \bf type \NC \bf explanation \NC\NR \NC position \NC table \NC a table of the \type {offset_specs} type\NC\NR \NC pair \NC table \NC one string: \type {paired}, and an array of one or two \type {offset_specs} tables: \type{offsets}\NC\NR \NC substitution \NC table \NC one string: \type {variant}\NC\NR \NC alternate \NC table \NC one string: \type {components}\NC\NR \NC multiple \NC table \NC one string: \type {components}\NC\NR \NC ligature \NC table 
\NC two strings: \type {components}, \type {char}\NC\NR \NC lcaret \NC array \NC linear array of numbers\NC\NR \stoptabulate Tables for \type{offset_specs} contain up to four number|-|valued fields: \type{x} (a horizontal offset), \type{y} (a vertical offset), \type{h} (an advance width correction) and \type{v} (an advance height correction). The \type{ligatures} is a linear array of small hashes: \starttabulate[|lT|l|p|] \NC \ssbf key \NC \bf type \NC \bf explanation \NC\NR \NC lig \NC table \NC uses the same substructure as a single item in the \type{lookups} table explained above\NC\NR \NC char \NC string \NC \NC\NR \NC components \NC array \NC linear array of named components\NC\NR \NC ccnt \NC number \NC \NC\NR \stoptabulate The \type{anchor} table is indexed by a string signifying the anchor type, which is one of \starttabulate[|lT|l|p|] \NC \ssbf key \NC \bf type \NC \bf explanation \NC\NR \NC mark \NC table \NC placement mark\NC\NR \NC basechar \NC table \NC mark for attaching combining items to a base char\NC\NR \NC baselig \NC table \NC mark for attaching combining items to a ligature\NC\NR \NC basemark \NC table \NC generic mark for attaching combining items to connect to\NC\NR \NC centry \NC table \NC cursive entry point\NC\NR \NC cexit \NC table \NC cursive exit point\NC\NR \stoptabulate The content of these is a short array of defined anchors, with the entry keys being the anchor names. For all except \type{baselig}, the value is a single table with this definition: \starttabulate[|lT|l|p|] \NC \ssbf key \NC \bf type \NC \bf explanation \NC\NR \NC x \NC number \NC x location\NC\NR \NC y \NC number \NC y location\NC\NR \NC ttf_pt_index \NC number \NC truetype point index, only if given\NC\NR \stoptabulate For \type{baselig}, the value is a small array of such anchor sets, one for each constituent item of the ligature.
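
As a rough illustration of this layout, the following \LUA\ sketch flattens a per|-|glyph anchor table into a linear list, treating \type{baselig} as an array of anchor sets and every other anchor type as a single anchor set. The function name \type{flatten_anchors} and the sample data are hypothetical and only serve this example; they are not part of the font loader.

\starttyping
-- Hypothetical helper: flatten a per-glyph anchor table into a list.
local function flatten_anchors(anchors)
  local list = { }
  for kind, sets in pairs(anchors or { }) do
    if kind == "baselig" then
      -- one anchor set per ligature component
      for component, set in ipairs(sets) do
        for name, point in pairs(set) do
          list[#list+1] = { kind = kind, name = name,
                            component = component,
                            x = point.x, y = point.y }
        end
      end
    else
      -- a single anchor set, keyed by anchor name
      for name, point in pairs(sets) do
        list[#list+1] = { kind = kind, name = name,
                          x = point.x, y = point.y }
      end
    end
  end
  return list
end

-- Made-up sample data in the shape described above.
local sample = {
  mark    = { ["Anchor-1"] = { x = 160, y = 810 } },
  baselig = {
    { ["Anchor-2"] = { x = 160, y = 650 } },
    { ["Anchor-2"] = { x = 460, y = 640 } },
  },
}

for _, a in ipairs(flatten_anchors(sample)) do
  print(a.kind, a.name, a.component or "-", a.x, a.y)
end
\stoptyping

The order of the printed entries is not fixed, because \type{pairs} does not guarantee a traversal order.
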
For clarification, an anchor table could for example look like this : \starttyping ['anchor'] = { ['basemark'] = { ['Anchor-7'] = { ['x']=170, ['y']=1080 } }, ['mark'] ={ ['Anchor-1'] = { ['x']=160, ['y']=810 }, ['Anchor-4'] = { ['x']=160, ['y']=800 } }, ['baselig'] = { [1] = { ['Anchor-2'] = { ['x']=160, ['y']=650 } }, [2] = { ['Anchor-2'] = { ['x']=460, ['y']=640 } } } } \stoptyping \subsubsubsection{map table} The top|-|level map is a list of encoding mappings. Each of those is a table itself. \starttabulate[|lT|l|p|] \NC \ssbf key \NC \bf type \NC \bf explanation \NC\NR \NC enccount \NC number \NC \NC\NR \NC encmax \NC number \NC \NC\NR \NC backmax \NC number \NC \NC\NR \NC remap \NC table \NC \NC\NR \NC map \NC array \NC non|-|linear array of mappings\NC\NR \NC backmap \NC array \NC non|-|linear array of backward mappings\NC\NR \NC enc \NC table \NC \NC\NR \stoptabulate The \type{remap} table is very small: \starttabulate[|lT|l|p|] \NC \ssbf key \NC \bf type \NC \bf explanation \NC\NR \NC firstenc \NC number \NC \NC\NR \NC lastenc \NC number \NC \NC\NR \NC infont \NC number \NC \NC\NR \stoptabulate The \type{enc} table is a bit more verbose: \starttabulate[|lT|l|p|] \NC \ssbf key \NC \bf type \NC \bf explanation \NC\NR \NC enc_name \NC string \NC \NC\NR \NC char_cnt \NC number \NC \NC\NR \NC char_max \NC number \NC \NC\NR \NC unicode \NC array \NC of \UNICODE\ position numbers\NC\NR \NC psnames \NC array \NC of \POSTSCRIPT\ glyph names\NC\NR \NC builtin \NC number \NC \NC\NR \NC hidden \NC number \NC \NC\NR \NC only_1byte \NC number \NC \NC\NR \NC has_1byte \NC number \NC \NC\NR \NC has_2byte \NC number \NC \NC\NR \NC is_unicodebmp \NC number \NC only if nonzero\NC\NR \NC is_unicodefull \NC number \NC only if nonzero\NC\NR \NC is_custom \NC number \NC only if nonzero\NC\NR \NC is_original \NC number \NC only if nonzero\NC\NR \NC is_compact \NC number \NC only if nonzero\NC\NR \NC is_japanese \NC number \NC only if nonzero\NC\NR \NC is_korean \NC number \NC 
only if nonzero\NC\NR \NC is_tradchinese \NC number \NC only if nonzero [name?]\NC\NR \NC is_simplechinese \NC number \NC only if nonzero\NC\NR \NC low_page \NC number \NC \NC\NR \NC high_page \NC number \NC \NC\NR \NC iconv_name \NC string \NC \NC\NR \NC iso_2022_escape \NC string \NC \NC\NR \stoptabulate \subsubsubsection{private table} This is the font's private \POSTSCRIPT\ dictionary, if any. Keys and values are both strings. \subsubsubsection{cidinfo table} \starttabulate[|lT|l|p|] \NC \ssbf key \NC \bf type \NC \bf explanation \NC\NR \NC registry \NC string \NC \NC\NR \NC ordering \NC string \NC \NC\NR \NC supplement \NC number \NC \NC\NR \NC version \NC number \NC \NC\NR \stoptabulate \subsubsubsection{pfminfo table} The \type{pfminfo} table contains most of the OS/2 information: \starttabulate[|lT|l|p|] \NC \ssbf key \NC \bf type \NC \bf explanation \NC\NR \NC pfmset \NC number \NC \NC\NR \NC winascent_add \NC number \NC \NC\NR \NC windescent_add \NC number \NC \NC\NR \NC hheadascent_add \NC number \NC \NC\NR \NC hheaddescent_add \NC number \NC \NC\NR \NC typoascent_add \NC number \NC \NC\NR \NC typodescent_add \NC number \NC \NC\NR \NC subsuper_set \NC number \NC \NC\NR \NC panose_set \NC number \NC \NC\NR \NC hheadset \NC number \NC \NC\NR \NC vheadset \NC number \NC \NC\NR \NC pfmfamily \NC number \NC \NC\NR \NC weight \NC number \NC \NC\NR \NC width \NC number \NC \NC\NR \NC avgwidth \NC number \NC \NC\NR \NC firstchar \NC number \NC \NC\NR \NC lastchar \NC number \NC \NC\NR \NC fstype \NC number \NC \NC\NR \NC linegap \NC number \NC \NC\NR \NC vlinegap \NC number \NC \NC\NR \NC hhead_ascent \NC number \NC \NC\NR \NC hhead_descent \NC number \NC \NC\NR \NC hhead_descent \NC number \NC \NC\NR \NC os2_typoascent \NC number \NC \NC\NR \NC os2_typodescent \NC number \NC \NC\NR \NC os2_typolinegap \NC number \NC \NC\NR \NC os2_winascent \NC number \NC \NC\NR \NC os2_windescent \NC number \NC \NC\NR \NC os2_subxsize \NC number \NC \NC\NR \NC os2_subysize \NC 
number \NC \NC\NR \NC os2_subxoff \NC number \NC \NC\NR \NC os2_subyoff \NC number \NC \NC\NR \NC os2_supxsize \NC number \NC \NC\NR \NC os2_supysize \NC number \NC \NC\NR \NC os2_supxoff \NC number \NC \NC\NR \NC os2_supyoff \NC number \NC \NC\NR \NC os2_strikeysize \NC number \NC \NC\NR \NC os2_strikeypos \NC number \NC \NC\NR \NC os2_family_class \NC number \NC \NC\NR \NC os2_xheight \NC number \NC \NC\NR \NC os2_capheight \NC number \NC \NC\NR \NC os2_defaultchar \NC number \NC \NC\NR \NC os2_breakchar \NC number \NC \NC\NR \NC os2_vendor \NC string \NC \NC\NR \NC codepages \NC table \NC A two-number array of encoded code pages\NC\NR \NC unicoderanges \NC table \NC A four-number array of encoded unicode ranges\NC\NR \NC panose \NC table \NC \NC\NR \stoptabulate The \type{panose} subtable has exactly 10 string keys: \starttabulate[|lT|l|p|] \NC \ssbf key \NC \bf type \NC \bf explanation \NC\NR \NC familytype \NC string \NC Values as in the \OPENTYPE\ font specification: \type {Any}, \type {No Fit}, \type {Text and Display}, \type {Script}, \type {Decorative}, \type {Pictorial} \NC\NR \NC serifstyle \NC string \NC See the \OPENTYPE\ font specification for values\NC\NR \NC weight \NC string \NC id. \NC\NR \NC proportion \NC string \NC id. \NC\NR \NC contrast \NC string \NC id. \NC\NR \NC strokevariation \NC string \NC id. \NC\NR \NC armstyle \NC string \NC id. \NC\NR \NC letterform \NC string \NC id. \NC\NR \NC midline \NC string \NC id. \NC\NR \NC xheight \NC string \NC id. \NC\NR \stoptabulate \subsubsubsection{names table} Each item has two top|-|level keys: \starttabulate[|lT|l|p|] \NC \ssbf key \NC \bf type \NC \bf explanation \NC\NR \NC lang \NC string \NC language for this entry \NC\NR \NC names \NC table \NC \NC\NR \stoptabulate The \type{names} keys are the actual \TRUETYPE\ name strings.
The possible keys are: \starttabulate[|lT|p|] \NC \ssbf key \NC \bf explanation \NC\NR \NC copyright \NC \NC\NR \NC family \NC \NC\NR \NC subfamily \NC \NC\NR \NC uniqueid \NC \NC\NR \NC fullname \NC \NC\NR \NC version \NC \NC\NR \NC postscriptname \NC \NC\NR \NC trademark \NC \NC\NR \NC manufacturer \NC \NC\NR \NC designer \NC \NC\NR \NC descriptor \NC \NC\NR \NC venderurl \NC \NC\NR \NC designerurl \NC \NC\NR \NC license \NC \NC\NR \NC licenseurl \NC \NC\NR \NC idontknow \NC \NC\NR \NC preffamilyname \NC \NC\NR \NC prefmodifiers \NC \NC\NR \NC compatfull \NC \NC\NR \NC sampletext \NC \NC\NR \NC cidfindfontname \NC \NC\NR \NC wwsfamily \NC \NC\NR \NC wwssubfamily \NC \NC\NR \stoptabulate \subsubsubsection{anchor_classes table} Each entry in \type{anchor_classes} has: \starttabulate[|lT|l|p|] \NC \ssbf key \NC \bf type \NC \bf explanation \NC\NR \NC name \NC string \NC a descriptive id of this anchor class\NC\NR \NC lookup \NC string \NC \NC\NR \NC type \NC string \NC one of \type {mark}, \type {mkmk}, \type {curs}, \type {mklg} \NC\NR \stoptabulate % type is actually a lookup subtype, not a feature name. Officially, these strings % should be gpos_mark2mark etc. \subsubsubsection{gpos table} The gpos table has one array entry for each lookup. (The \type {gpos_} prefix is somewhat redundant.)
\starttabulate[|lT|l|p|] \NC \ssbf key \NC \bf type \NC \bf explanation \NC\NR \NC type \NC string \NC one of \type {gpos_single}, \type {gpos_pair}, \type {gpos_cursive}, \type {gpos_mark2base},\crlf \type {gpos_mark2ligature}, \type {gpos_mark2mark}, \type {gpos_context},\crlf \type {gpos_contextchain} \NC\NR \NC flags \NC table \NC \NC\NR \NC name \NC string \NC \NC\NR \NC features \NC array \NC \NC\NR \NC subtables \NC array \NC \NC\NR \stoptabulate The flags table has a true value for each of the lookup flags that is actually set: \starttabulate[|lT|l|p|] \NC \ssbf key \NC \bf type \NC \bf explanation \NC\NR \NC r2l \NC boolean \NC \NC\NR \NC ignorebaseglyphs \NC boolean \NC \NC\NR \NC ignoreligatures \NC boolean \NC \NC\NR \NC ignorecombiningmarks \NC boolean \NC \NC\NR \NC mark_class \NC string \NC (new in 0.44)\NC\NR \stoptabulate The items in the features subtable of gpos have: \starttabulate[|lT|l|p|] \NC \ssbf key \NC \bf type \NC \bf explanation \NC\NR \NC tag \NC string \NC \NC\NR \NC scripts \NC table \NC \NC\NR \NC ismac \NC number \NC (only if true)\NC\NR \stoptabulate The scripts table within features has: \starttabulate[|lT|l|p|] \NC \ssbf key \NC \bf type \NC \bf explanation \NC\NR \NC script \NC string \NC \NC\NR \NC langs \NC array of strings \NC \NC\NR \stoptabulate The subtables table has: \starttabulate[|lT|l|p|] \NC \ssbf key \NC \bf type \NC \bf explanation \NC\NR \NC name \NC string \NC \NC\NR \NC suffix \NC string \NC (only if used)\NC\NR % used by gpos_single to get a default \NC anchor_classes \NC number \NC (only if used)\NC\NR \NC vertical_kerning \NC number \NC (only if used)\NC\NR \NC kernclass \NC table \NC (only if used)\NC\NR \stoptabulate The \type{kernclass} table within subtables has: \starttabulate[|lT|l|p|] \NC \ssbf key \NC \bf type \NC \bf explanation \NC\NR \NC firsts \NC array of strings \NC \NC\NR \NC seconds \NC array of strings \NC \NC\NR \NC lookup \NC string or array \NC associated lookup(s) \NC \NR \NC offsets \NC array of
numbers \NC \NC\NR \stoptabulate \subsubsubsection{gsub table} This has identical layout to the \type{gpos} table, except for the type: \starttabulate[|lT|l|p|] \NC \ssbf key \NC \bf type \NC \bf explanation \NC\NR \NC type \NC string \NC one of \type {gsub_single}, \type {gsub_multiple}, \type {gsub_alternate}, \type {gsub_ligature},\crlf \type {gsub_context}, \type {gsub_contextchain}, \type {gsub_reversecontextchain} \NC\NR \stoptabulate \subsubsubsection{ttf_tables and ttf_tab_saved tables} \starttabulate[|lT|l|p|] \NC \ssbf key \NC \bf type \NC \bf explanation \NC\NR \NC tag \NC string \NC \NC\NR \NC len \NC number \NC \NC\NR \NC maxlen \NC number \NC \NC\NR \NC data \NC number \NC \NC\NR \stoptabulate \subsubsubsection{sm table} \starttabulate[|lT|l|p|] \NC \ssbf key \NC \bf type \NC \bf explanation \NC\NR \NC type \NC string \NC one of "indic", "context", "lig", "simple", "insert", "kern"\NC\NR \NC lookup \NC string \NC \NC\NR \NC flags \NC table \NC a set of boolean values with the keys : "vert", "descending", "always"\NC\NR \NC classes \NC table \NC an array of named classes \NC\NR \NC state \NC table \NC \NC\NR \stoptabulate The \type{state} table has: \starttabulate[|lT|l|p|] \NC \ssbf key \NC \bf type \NC \bf explanation \NC\NR \NC next \NC number \NC \NC \NR \NC flags \NC number \NC \NC \NR \NC context \NC table \NC A small table that has 'mark' and 'cur' as possible keys, with the values being lookup names. Only applies if the \type{sm.type} = \type{context}.\NC\NR \NC insert \NC table \NC A small table that has 'mark' and 'cur' as possible keys, with the values strings. Only applies if the \type{sm.type} = \type{insert}.\NC\NR \NC kern \NC table \NC A small array with kern data. 
Only applies if the \type{sm.type} = \type{kern}.\NC\NR \stoptabulate \subsubsubsection{features table} % handle_macfeat \starttabulate[|lT|l|p|] \NC \ssbf key \NC \bf type \NC \bf explanation \NC\NR \NC feature \NC number \NC \NC \NR \NC ismutex \NC number \NC \NC \NR \NC default_setting \NC number \NC \NC \NR \NC strid \NC number \NC \NC \NR \NC featname \NC table \NC A set of mac names. macnames are like otfnames except that they also have an 'enc' field \NC \NR \NC settings \NC table \NC \NC \NR \stoptabulate The \type{settings} are: \starttabulate[|lT|l|p|] \NC \ssbf key \NC \bf type \NC \bf explanation \NC\NR \NC setting \NC number \NC \NC \NR \NC strid \NC number \NC \NC \NR \NC initially_enabled \NC number \NC \NC \NR \NC setname \NC table \NC A set of mac names. macnames are like otfnames except that they also have an 'enc' field \NC \NR \stoptabulate \subsubsubsection{mm table} \starttabulate[|lT|l|p|] \NC \ssbf key \NC \bf type \NC \bf explanation \NC\NR \NC axes \NC table \NC array of axis names \NC \NR \NC instance_count \NC number \NC \NC \NR \NC positions \NC table \NC array of instance positions (\#axes * instances )\NC \NR \NC defweights \NC table \NC array of default weights for instances \NC \NR \NC cdv \NC string \NC \NC \NR \NC ndv \NC string \NC \NC \NR \NC axismaps \NC table \NC \NC \NR \NC named_instance_count \NC number \NC \NC \NR \NC named_instances \NC table \NC \NC \NR \NC apple \NC number \NC \NC \NR \stoptabulate The \type{axismaps}: \starttabulate[|lT|l|p|] \NC \ssbf key \NC \bf type \NC \bf explanation \NC\NR \NC blends \NC table \NC an array of blend points \NC \NR \NC designs \NC table \NC an array of design values \NC \NR \NC min \NC number \NC \NC \NR \NC def \NC number \NC \NC \NR \NC max \NC number \NC \NC \NR \NC axisnames \NC table \NC a set of mac names \NC \NR \stoptabulate The \type{named_instances} is an array of instances: \starttabulate[|lT|l|p|] \NC \ssbf key \NC \bf type \NC \bf explanation \NC\NR \NC names \NC table 
\NC a set of mac names \NC \NR \NC coords \NC table \NC an array of coordinates \NC \NR \stoptabulate \subsubsubsection{mark_classes table (0.44)} The keys in this table are mark class names, and the values are a space-separated string of glyph names in this class. Note: This table is indeed new in 0.44. The manual said it existed before then, but in practice it was missing due to a bug. \subsubsubsection{math table} \starttabulate[|lT|p|] \NC ScriptPercentScaleDown \NC \NC \NR \NC ScriptScriptPercentScaleDown \NC \NC \NR \NC DelimitedSubFormulaMinHeight \NC \NC \NR \NC DisplayOperatorMinHeight \NC \NC \NR \NC MathLeading \NC \NC \NR \NC AxisHeight \NC \NC \NR \NC AccentBaseHeight \NC \NC \NR \NC FlattenedAccentBaseHeight \NC \NC \NR \NC SubscriptShiftDown \NC \NC \NR \NC SubscriptTopMax \NC \NC \NR \NC SubscriptBaselineDropMin \NC \NC \NR \NC SuperscriptShiftUp \NC \NC \NR \NC SuperscriptShiftUpCramped \NC \NC \NR \NC SuperscriptBottomMin \NC \NC \NR \NC SuperscriptBaselineDropMax \NC \NC \NR \NC SubSuperscriptGapMin \NC \NC \NR \NC SuperscriptBottomMaxWithSubscript \NC \NC \NR \NC SpaceAfterScript \NC \NC \NR \NC UpperLimitGapMin \NC \NC \NR \NC UpperLimitBaselineRiseMin \NC \NC \NR \NC LowerLimitGapMin \NC \NC \NR \NC LowerLimitBaselineDropMin \NC \NC \NR \NC StackTopShiftUp \NC \NC \NR \NC StackTopDisplayStyleShiftUp \NC \NC \NR \NC StackBottomShiftDown \NC \NC \NR \NC StackBottomDisplayStyleShiftDown \NC \NC \NR \NC StackGapMin \NC \NC \NR \NC StackDisplayStyleGapMin \NC \NC \NR \NC StretchStackTopShiftUp \NC \NC \NR \NC StretchStackBottomShiftDown \NC \NC \NR \NC StretchStackGapAboveMin \NC \NC \NR \NC StretchStackGapBelowMin \NC \NC \NR \NC FractionNumeratorShiftUp \NC \NC \NR \NC FractionNumeratorDisplayStyleShiftUp \NC \NC \NR \NC FractionDenominatorShiftDown \NC \NC \NR \NC FractionDenominatorDisplayStyleShiftDown \NC \NC \NR \NC FractionNumeratorGapMin \NC \NC \NR \NC FractionNumeratorDisplayStyleGapMin \NC \NC \NR \NC FractionRuleThickness \NC \NC \NR
\NC FractionDenominatorGapMin \NC \NC \NR \NC FractionDenominatorDisplayStyleGapMin \NC \NC \NR \NC SkewedFractionHorizontalGap \NC \NC \NR \NC SkewedFractionVerticalGap \NC \NC \NR \NC OverbarVerticalGap \NC \NC \NR \NC OverbarRuleThickness \NC \NC \NR \NC OverbarExtraAscender \NC \NC \NR \NC UnderbarVerticalGap \NC \NC \NR \NC UnderbarRuleThickness \NC \NC \NR \NC UnderbarExtraDescender \NC \NC \NR \NC RadicalVerticalGap \NC \NC \NR \NC RadicalDisplayStyleVerticalGap \NC \NC \NR \NC RadicalRuleThickness \NC \NC \NR \NC RadicalExtraAscender \NC \NC \NR \NC RadicalKernBeforeDegree \NC \NC \NR \NC RadicalKernAfterDegree \NC \NC \NR \NC RadicalDegreeBottomRaisePercent \NC \NC \NR \NC MinConnectorOverlap \NC \NC \NR \NC FractionDelimiterSize \NC (new in 0.47.0)\NC \NR \NC FractionDelimiterDisplayStyleSize \NC (new in 0.47.0)\NC \NR \stoptabulate \subsubsubsection{validation_state table} \starttabulate[|lT|p|] \NC \ssbf key \NC \bf explanation \NC\NR \NC bad_ps_fontname \NC \NC \NR \NC bad_glyph_table \NC \NC \NR \NC bad_cff_table \NC \NC \NR \NC bad_metrics_table \NC \NC \NR \NC bad_cmap_table \NC \NC \NR \NC bad_bitmaps_table \NC \NC \NR \NC bad_gx_table \NC \NC \NR \NC bad_ot_table \NC \NC \NR \NC bad_os2_version \NC \NC \NR \NC bad_sfnt_header \NC \NC \NR \stoptabulate \subsubsubsection{horiz_base and vert_base table} \starttabulate[|lT|l|p|] \NC \ssbf key \NC \bf type \NC \bf explanation \NC\NR \NC tags \NC table \NC an array of script list tags\NC \NR \NC scripts \NC table \NC \NC \NR \stoptabulate The \type{scripts} subtable: \starttabulate[|lT|l|p|] \NC \ssbf key \NC \bf type \NC \bf explanation \NC\NR \NC baseline \NC table \NC \NC \NR \NC default_baseline \NC number \NC \NC \NR \NC lang \NC table \NC \NC \NR \stoptabulate The \type{lang} subtable: \starttabulate[|lT|l|p|] \NC \ssbf key \NC \bf type \NC \bf explanation \NC\NR \NC tag \NC string \NC a script tag \NC \NR \NC ascent \NC number \NC \NC \NR \NC descent \NC number \NC \NC \NR \NC features \NC table 
\NC \NC \NR \stoptabulate The \type{features} points to an array of tables with the same layout except that in those nested tables, the tag represents a language. \subsubsubsection{altuni table} An array of alternate \UNICODE\ values. Inside that array are hashes with: \starttabulate[|lT|l|p|] \NC \ssbf key \NC \bf type \NC \bf explanation \NC\NR \NC unicode \NC number \NC \NC \NR \NC variant \NC number \NC \NC \NR \stoptabulate \subsubsubsection{vert_variants and horiz_variants table} \starttabulate[|lT|l|p|] \NC \ssbf key \NC \bf type \NC \bf explanation \NC\NR \NC variants \NC string \NC \NC \NR \NC italic_correction \NC number \NC \NC \NR \NC parts \NC table \NC \NC \NR \stoptabulate The \type{parts} table is an array of smaller tables: \starttabulate[|lT|l|p|] \NC \ssbf key \NC \bf type \NC \bf explanation \NC\NR \NC component \NC string \NC \NC \NR \NC extender \NC number \NC \NC \NR \NC start \NC number \NC \NC \NR \NC end \NC number \NC \NC \NR \NC advance \NC number \NC \NC \NR \stoptabulate \subsubsubsection{mathkern table} \starttabulate[|lT|l|p|] \NC \ssbf key \NC \bf type \NC \bf explanation \NC\NR \NC top_right \NC table \NC \NC \NR \NC bottom_right \NC table \NC \NC \NR \NC top_left \NC table \NC \NC \NR \NC bottom_left \NC table \NC \NC \NR \stoptabulate Each of the subtables is an array of small hashes with two keys: \starttabulate[|lT|l|p|] \NC \ssbf key \NC \bf type \NC \bf explanation \NC\NR \NC height \NC number \NC \NC \NR \NC kern \NC number \NC \NC \NR \stoptabulate \subsubsubsection{kerns table} Substructure is identical to the per|-|glyph subtable. \subsubsubsection{vkerns table} Substructure is identical to the per|-|glyph subtable. 
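
Since both tables are linear arrays of small hashes with the keys \type{char}, \type{off} and \type{lookup}, they can be regrouped with a few lines of \LUA. The following is only a sketch: the function name \type{kerns_by_lookup}, the glyph names and the lookup names are made up for this example and do not come from a real font.

\starttyping
-- Hypothetical sample data in the shape of a kerns array.
local kerns = {
  { char = "V", off = -80, lookup = "pp_l_0_s" },
  { char = "T", off = -60, lookup = "pp_l_0_s" },
  { char = "V", off = -20, lookup = "pp_l_1_s" },
}

-- Regroup a kerns (or vkerns) array as result[lookup][char] = off.
local function kerns_by_lookup(list)
  local result = { }
  for _, k in ipairs(list or { }) do
    local sub = result[k.lookup]
    if not sub then
      sub = { }
      result[k.lookup] = sub
    end
    sub[k.char] = k.off
  end
  return result
end

local grouped = kerns_by_lookup(kerns)
print(grouped["pp_l_0_s"].V, grouped["pp_l_1_s"].V)
\stoptyping

Keeping the entries separated by lookup name avoids mixing offsets that belong to different lookups.
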
\subsubsubsection{texdata table} \starttabulate[|lT|l|p|] \NC \ssbf key \NC \bf type \NC \bf explanation \NC\NR \NC type \NC string \NC \type {unset}, \type {text}, \type {math}, \type {mathext}\NC\NR \NC params \NC array \NC 22 font numeric parameters\NC\NR \stoptabulate \subsubsubsection{lookups table} Top|-|level \type{lookups} is quite different from the ones at character level. The keys in this hash are strings, the values the actual lookups, represented as dictionary tables. \starttabulate[|lT|l|p|] \NC \ssbf key \NC \bf type \NC \bf explanation \NC\NR \NC type \NC number \NC \NC\NR \NC format \NC enum \NC one of \type {glyphs}, \type {class}, \type {coverage}, \type {reversecoverage} \NC\NR \NC tag \NC string \NC \NC\NR \NC current_class \NC array \NC \NC\NR \NC before_class \NC array \NC \NC\NR \NC after_class \NC array \NC \NC\NR \NC rules \NC array \NC an array of rule items\NC\NR \stoptabulate Rule items have one common item and one specialized item: \starttabulate[|lT|l|p|] \NC \ssbf key \NC \bf type \NC \bf explanation \NC\NR \NC lookups \NC array \NC a linear array of lookup names\NC\NR \NC glyph \NC array \NC only if the parent's format is \type{glyphs}\NC\NR \NC class \NC array \NC only if the parent's format is \type{class}\NC\NR \NC coverage \NC array \NC only if the parent's format is \type{coverage}\NC\NR \NC reversecoverage \NC array \NC only if the parent's format is \type{reversecoverage}\NC\NR \stoptabulate A glyph table is: \starttabulate[|lT|l|p|] \NC \ssbf key \NC \bf type \NC \bf explanation \NC\NR \NC names \NC string \NC \NC\NR \NC back \NC string \NC \NC\NR \NC fore \NC string \NC \NC\NR \stoptabulate A class table is: \starttabulate[|lT|l|p|] \NC \ssbf key \NC \bf type \NC \bf explanation \NC\NR \NC current \NC array \NC of numbers \NC\NR \NC before \NC array \NC of numbers \NC\NR \NC after \NC array \NC of numbers \NC\NR \stoptabulate coverage: \starttabulate[|lT|l|p|] \NC \ssbf key \NC \bf type \NC \bf explanation \NC\NR \NC current \NC array
\NC of strings \NC\NR \NC before \NC array \NC of strings\NC\NR \NC after \NC array \NC of strings \NC\NR \stoptabulate reversecoverage: \starttabulate[|lT|l|p|] \NC \ssbf key \NC \bf type \NC \bf explanation \NC\NR \NC current \NC array \NC of strings \NC\NR \NC before \NC array \NC of strings\NC\NR \NC after \NC array \NC of strings \NC\NR \NC replacements \NC string \NC \NC\NR \stoptabulate %*********************************************************************** \section{The \luatex{img} library} The \type{img} library can be used as an alternative to \tex{pdfximage} and \tex{pdfrefximage}, and the associated \quote {satellite} commands like \tex{pdfximagebbox}. Image objects can also be used within virtual fonts via the \type{image} command listed in~\in{section}[virtualfonts]. \subsection{\luatex{img.new}} \startfunctioncall var = img.new() var = img.new(
image_spec) \stopfunctioncall This function creates a userdata object of type \quote {image}. The \type{image_spec} argument is optional. If it is given, it must be a table, and that table must contain a \type{filename} key. A number of other keys can also be useful; these are explained below. You can either say \starttyping a = img.new() \stoptyping followed by \starttyping a.filename = "foo.png" \stoptyping or you can put the file name (and some or all of the other keys) into a table directly, like so: \starttyping a = img.new({filename='foo.pdf', page=1}) \stoptyping The generated \type{<image>} userdata object allows access to a set of user|-|specified values as well as a set of values that are normally filled in and updated automatically by \LUATEX\ itself. Some of those are derived from the actual image file, others are updated to reflect the \PDF\ output status of the object. There is one required user-specified field: the file name (\type{filename}). It can optionally be augmented by the requested image dimensions (\type{width}, \type{depth}, \type{height}), user-specified image attributes (\type{attr}), the requested \PDF\ page identifier (\type{page}), the requested boundingbox (\type{pagebox}) for \PDF\ inclusion, and the requested color space object (\type{colorspace}). The function \type{img.new} does not access the actual image file; it just creates the \type{<image>} userdata object and initializes some memory structures. The \type{<image>} object and its internal structures are automatically garbage collected. Once the image is scanned, all the values in the \type{<image>} except \type{width}, \type{height} and \type{depth} become frozen, and you cannot change them any more. \subsection{\luatex{img.keys}} \startfunctioncall
keys = img.keys() \stopfunctioncall This function returns a list of all the possible \type{image_spec} keys, both user-supplied and automatic ones. % hahe: i need to add r/w ro column... \starttabulate[|l|l|p|] \NC \bf field name\NC \bf type \NC description \NC \NR \NC attr \NC string \NC the image attributes for \LUATEX \NC \NR \NC bbox \NC table \NC table with 4 boundingbox dimensions \type{llx}, \type{lly}, \type{urx}, and \type{ury} overruling the \type{pagebox} entry\NC \NR \NC colordepth \NC number \NC the number of bits used by the color space\NC \NR \NC colorspace \NC number \NC the color space object number \NC \NR \NC depth \NC number \NC the image depth for \LUATEX\ (in scaled points)\NC \NR \NC filename \NC string \NC the image file name \NC \NR \NC filepath \NC string \NC the full (expanded) file name of the image\NC \NR \NC height \NC number \NC the image height for \LUATEX\ (in scaled points)\NC \NR \NC imagetype \NC string \NC one of \type{pdf}, \type{png}, \type{jpg}, \type{jp2}, \type{jbig2}, or \type{nil} \NC \NR \NC index \NC number \NC the \PDF\ image name suffix \NC \NR \NC objnum \NC number \NC the \PDF\ image object number \NC \NR \NC page \NC ?? \NC the identifier for the requested image page (type is number or string, default is the number 1)\NC \NR \NC pagebox \NC string \NC the requested bounding box, one of \type {none}, \type {media}, \type {crop}, \type {bleed}, \type {trim}, \type {art} \NC \NR \NC pages \NC number \NC the total number of available pages \NC \NR \NC rotation \NC number \NC the image rotation from included \PDF\ file, in multiples of 90~deg. 
\NC \NR \NC stream \NC string \NC the raw stream data for an \type{/XObject} \type{/Form} object\NC \NR \NC transform \NC number \NC the image transform, integer number 0..7\NC \NR \NC width \NC number \NC the image width for \LUATEX\ (in scaled points)\NC \NR \NC xres \NC number \NC the horizontal natural image resolution (in \DPI) \NC \NR \NC xsize \NC number \NC the natural image width \NC \NR \NC yres \NC number \NC the vertical natural image resolution (in \DPI) \NC \NR \NC ysize \NC number \NC the natural image height \NC \NR \stoptabulate A running (undefined) dimension in \type{width}, \type{height}, or \type{depth} is represented as \type{nil} in \LUA, so if you want to load an image at its \quote {natural} size, you do not have to specify any of those three fields. The \type{stream} parameter allows you to fabricate an \type{/XObject} \type{/Form} object from a string giving the stream contents, e.\,g., for a filled rectangle: \startfunctioncall a.stream = "0 0 20 10 re f" \stopfunctioncall When writing the image, an \type{/XObject} \type{/Form} object is created, like with embedded \PDF\ file writing. The object is written out only once. The \type{stream} key requires that the \type{bbox} table is also given. The \type{stream} key conflicts with the \type{filename} key. The \type{transform} key works as usual also with \type{stream}. The \type{bbox} key needs a table with four boundingbox values, e.\,g.: \startfunctioncall a.bbox = {"30bp", 0, "225bp", "200bp"} \stopfunctioncall This replaces and overrules any given \type{pagebox} value; with given \type{bbox} the box dimensions coming with an embedded \PDF\ file are ignored. The \type{xsize} and \type{ysize} dimensions are set accordingly, when the image is scaled. The \type{bbox} parameter is ignored for non-\PDF\ images. The \type{transform} key allows you to mirror and rotate the image in steps of 90~deg. The default value~0 gives an unmirrored, unrotated image.
Values 1|--|3 give counterclockwise rotation by 90, 180, or 270~degrees, whereas with values 4|--|7 the image is first mirrored and then rotated counterclockwise by 90, 180, or 270~degrees. The \type{transform} operation gives the same visual result as if you had externally preprocessed the image with a graphics tool and then used it in \LUATEX. If a \PDF\ file to be embedded already contains a \type{/Rotate} specification, the rotation result is the combination of the \type{/Rotate} rotation followed by the \type{transform} operation. \subsection{\luatex{img.scan}} \startfunctioncall var = img.scan( var) var = img.scan(
image_spec) \stopfunctioncall When you say \type{img.scan(a)} for a new image, the file is scanned, and variables such as \type{xsize}, \type{ysize}, \type{imagetype}, number of \type{pages}, and the resolution are extracted. Each of the \type{width}, \type{height}, \type{depth} fields is set up according to the image dimensions, if it was not given an explicit value already. An image file will never be scanned more than once for a given image variable. With all subsequent \type{img.scan(a)} calls only the dimensions are again set up (if they have been changed by the user in the meantime). For ease of use, you can directly say \starttyping a = img.scan ({ filename = "foo.png" }) \stoptyping without a prior \type{img.new}. Nothing is written yet at this point, so you can do \type{a=img.scan}, retrieve the available info like image width and height, and then throw away \type{a} again by saying \type{a=nil}. In that case no image object will be reserved in the \PDF, and the used memory will be cleaned up automatically. \subsection{\luatex{img.copy}} \startfunctioncall var = img.copy( var) var = img.copy(
image_spec)
\stopfunctioncall

If you say \type{a = b}, then both variables point to the same \type{<image>} object. If you want to write out an image with different sizes, you can do a \type{b=img.copy(a)}. Afterwards, \type{a} and \type{b} still reference the same actual image dictionary, but the dimensions for \type{b} can now be changed from their initial values that were just copies from \type{a}.

% Hartmut, I don't know if this makes sense. An example of what
% can, and what cannot be changed would be helpful.
% -- will think about it...

\subsection{\luatex{img.write}}

\startfunctioncall
var = img.write( var)
var = img.write(
image_spec)
\stopfunctioncall

Calling \type{img.write(a)} allocates a \PDF\ object number, generates a whatsit node of subtype \type{pdf_refximage}, and puts that node into the output list. This places the image \type{a} into the page stream, and the image file is written out into an image stream object after the shipping of the current page is finished.

Again you can do a terse call like

\starttyping
img.write ({ filename = "foo.png" })
\stoptyping

The \type{<image>} variable is returned in case you want it for later processing.

\subsection{\luatex{img.immediatewrite}}

\startfunctioncall
var = img.immediatewrite( var)
var = img.immediatewrite(
image_spec)
\stopfunctioncall

Calling \type{img.immediatewrite(a)} allocates a \PDF\ object number and writes the image file for image \type{a} immediately into the \PDF\ file as an image stream object (like with \tex{immediate}\tex{pdfximage}). The object number of the image stream dictionary is then available via the \type{objnum} key. No \type{pdf_refximage} whatsit node is generated. You will need an \luatex{img.write(a)} or \luatex{img.node(a)} call to let the image appear on the page, or reference it by another trick; otherwise you will have a dangling image object in the \PDF\ file.

Here, too, you can do a terse call like

\starttyping
a = img.immediatewrite ({ filename = "foo.png" })
\stoptyping

The \type{<image>} variable is returned and you will most likely need it.

\subsection{\luatex{img.node}}

\startfunctioncall
n = img.node( var)
n = img.node(
image_spec) \stopfunctioncall This function allocates a \PDF\ object number and returns a whatsit node of subtype \type{pdf_refximage}, filled with the image parameters \type{width}, \type{height}, \type{depth}, and \type{objnum}. Also here you can do a terse call like: \starttyping n = img.node ({ filename = "foo.png" }) \stoptyping This example outputs an image: \starttyping node.write(img.node{filename="foo.png"}) \stoptyping \subsection{\luatex{img.types}} \startfunctioncall
types = img.types()
\stopfunctioncall

This function returns a list with the supported image file type names. Currently these are \type{pdf}, \type{png}, \type{jpg}, \type{jp2} (JPEG~2000), and \type{jbig2}.

\subsection{\luatex{img.boxes}}

\startfunctioncall
boxes = img.boxes() \stopfunctioncall This function returns a list with the supported \PDF\ page box names, currently these are \type {media}, \type {crop}, \type {bleed}, \type {trim}, and \type {art} (all in lowercase letters). %*********************************************************************** \section{The \luatex{kpse} library} This library provides two separate, but nearly identical interfaces to the \KPATHSEA\ file search functionality: there is a \quote{normal} procedural interface that shares its kpathsea instance with \LUATEX\ itself, and an object oriented interface that is completely on its own. The object oriented interface and \type{kpse.new} have been added in \LUATEX\ 0.37. \subsection{\luatex{kpse.set_program_name} and \luatex{kpse.new}} Before the search library can be used at all, its database has to be initialized. There are three possibilities, two of which belong to the procedural interface. First, when \LUATEX\ is used to typeset documents, this initialization happens automatically and the \KPATHSEA\ executable and program names are set to \type{luatex} (that is, unless explicitly prohibited by the user's startup script. See~\in{section}[init] for more details). Second, in \TEXLUA\ mode, the initialization has to be done explicitly via the \luatex{kpse.set_program_name} function, which sets the \KPATHSEA\ executable (and optionally program) name. \startfunctioncall kpse.set_program_name( name) kpse.set_program_name( name, progname) \stopfunctioncall The second argument controls the use of the \quote{dotted} values in the \type{texmf.cnf} configuration file, and defaults to the first argument. Third, if you prefer the object oriented interface, you have to call a different function. It has the same arguments, but it returns a userdata variable. 
\startfunctioncall
local kpathsea = kpse.new( name)
local kpathsea = kpse.new( name, progname)
\stopfunctioncall

Apart from these two functions, the calling conventions of the interfaces are identical. Depending on the chosen interface, you either call \type{kpse.find_file()} or \type{kpathsea:find_file()}, with identical arguments and return values.

\subsection{\luatex{find_file}}

The most often used function in the library is \type{find_file}:

\startfunctioncall
f = kpse.find_file( filename)
f = kpse.find_file( filename, ftype)
f = kpse.find_file( filename, mustexist)
f = kpse.find_file( filename, ftype, mustexist)
f = kpse.find_file( filename, ftype, dpi)
\stopfunctioncall

Arguments:
\startitemize[intro]

\sym{filename} the name of the file you want to find, with or without extension.

\sym{ftype} maps to the \type {-format} argument of \KPSEWHICH. The supported \type{ftype} values are the same as the ones supported by the standalone \type{kpsewhich} program:

\startsimplecolumns
\starttyping
'gf'
'pk'
'bitmap font'
'tfm'
'afm'
'base'
'bib'
'bst'
'cnf'
'ls-R'
'fmt'
'map'
'mem'
'mf'
'mfpool'
'mft'
'mp'
'mppool'
'MetaPost support'
'ocp'
'ofm'
'opl'
'otp'
'ovf'
'ovp'
'graphic/figure'
'tex'
'TeX system documentation'
'texpool'
'TeX system sources'
'PostScript header'
'Troff fonts'
'type1 fonts'
'vf'
'dvips config'
'ist'
'truetype fonts'
'type42 fonts'
'web2c files'
'other text files'
'other binary files'
'misc fonts'
'web'
'cweb'
'enc files'
'cmap files'
'subfont definition files'
'opentype fonts'
'pdftex config'
'lig files'
'texmfscripts'
'lua',
'font feature files',
'cid maps',
'mlbib',
'mlbst',
'clua',
\stoptyping
\stopsimplecolumns

The default type is \type{tex}. Note: this is different from \KPSEWHICH, which tries to deduce the file type itself from looking at the supplied extension. The last four types: 'font feature files', 'cid maps', 'mlbib', 'mlbst' were new additions in \LUATEX\ 0.40.2.
\sym{mustexist} is similar to \KPSEWHICH's \type{-must-exist}, and the default is \type{false}. If you specify \type{true} (or a non|-|zero integer), then the \KPSE\ library will search the disk as well as the \type {ls-R} databases. \sym{dpi} This is used for the size argument of the formats \type{pk}, \type{gf}, and \type{bitmap font}. \stopitemize \subsection{\luatex{lookup}} A more powerful (but slower) generic method for finding files is also available (since 0.51). It returns a string for each found file. \startfunctioncall f, ... = kpse.lookup( filename,
options) \stopfunctioncall The options match commandline arguments from \type{kpsewhich}: \starttabulate[|l|l|p|] \NC \ssbf key \NC \ssbf type \NC \ssbf description \NC \NR \NC debug \NC number \NC set debugging flags for this lookup\NC \NR \NC format \NC string \NC use specific file type (see list above)\NC \NR \NC dpi \NC number \NC use this resolution for this lookup; default 600\NC \NR \NC path \NC string \NC search in the given path\NC \NR \NC all \NC boolean \NC output all matches, not just the first\NC \NR \NC mustexist \NC boolean \NC (0.65 and higher) search the disk as well as ls-R if necessary\NC \NR \NC must-exist\NC boolean \NC (0.64 and lower) search the disk as well as ls-R if necessary\NC \NR \NC mktexpk \NC boolean \NC disable/enable mktexpk generation for this lookup\NC \NR \NC mktextex \NC boolean \NC disable/enable mktextex generation for this lookup\NC \NR \NC mktexmf \NC boolean \NC disable/enable mktexmf generation for this lookup\NC \NR \NC mktextfm \NC boolean \NC disable/enable mktextfm generation for this lookup\NC \NR \NC subdir \NC string or table \NC only output matches whose directory part ends with the given string(s) \NC \NR \stoptabulate \subsection{\luatex{init_prog}} Extra initialization for programs that need to generate bitmap fonts. \startfunctioncall kpse.init_prog( prefix, base_dpi, mfmode) kpse.init_prog( prefix, base_dpi, mfmode, fallback) \stopfunctioncall \subsection{\luatex{readable_file}} Test if an (absolute) file name is a readable file. \startfunctioncall f = kpse.readable_file( name) \stopfunctioncall The return value is the actual absolute filename you should use, because the disk name is not always the same as the requested name, due to aliases and system|-|specific handling under e.\,g.\ \MSDOS. Returns \lua {nil} if the file does not exist or is not readable. 
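
The search functions described so far can be combined as in the following minimal sketch. It assumes a standalone \TEXLUA\ run, so the database is initialized explicitly; the file names \type{plain.tex} and \type{cmr10} are only illustrative and may not exist in every installation.

\starttyping
-- sketch only: run in texlua mode, where kpse must be initialized by hand
kpse.set_program_name("luatex")

-- default ftype is 'tex'
local texfile = kpse.find_file("plain.tex")

-- explicit ftype, taken from the list of supported types
local metrics = kpse.find_file("cmr10", "tfm")

-- lookup returns one string per match; collect them in a table
local matches = { kpse.lookup("plain.tex", { format = "tex", all = true }) }

-- readable_file gives back the name that should actually be opened
if texfile then
  print(kpse.readable_file(texfile))
end
print(metrics, #matches)
\stoptyping

When the object oriented interface is preferred, the same calls can be written with the colon syntax on the value returned by \type{kpse.new}.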
\subsection{\luatex{expand_path}}

Like kpsewhich's \type {-expand-path}:

\startfunctioncall
r = kpse.expand_path( s)
\stopfunctioncall

\subsection{\luatex{expand_var}}

Like kpsewhich's \type{-expand-var}:

\startfunctioncall
r = kpse.expand_var( s)
\stopfunctioncall

\subsection{\luatex{expand_braces}}

Like kpsewhich's \type{-expand-braces}:

\startfunctioncall
r = kpse.expand_braces( s)
\stopfunctioncall

\subsection{\luatex{show_path}}

Like kpsewhich's \type{-show-path}:

\startfunctioncall
r = kpse.show_path( ftype)
\stopfunctioncall

\subsection{\luatex{var_value}}

Like kpsewhich's \type{-var-value}:

\startfunctioncall
r = kpse.var_value( s)
\stopfunctioncall

\subsection{\luatex{version}}

Returns the kpathsea version string (new in 0.51).

\startfunctioncall
r = kpse.version()
\stopfunctioncall

\section{The \luatex{lang} library}

This library provides the interface to \LUATEX's structure representing a language, and the associated functions.

\startfunctioncall
l = lang.new()
l = lang.new( id)
\stopfunctioncall

This function creates a new userdata object. An object of type \type{<language>} is the first argument to most of the other functions in the \luatex{lang} library. These functions can also be used as if they were object methods, using the colon syntax.

Without an argument, the next available internal id number will be assigned to this object. With an argument, an object will be created that links to the internal language with that id number.

\startfunctioncall
n = lang.id( l)
\stopfunctioncall

Returns the internal \tex{language} id number this object refers to.

\startfunctioncall
n = lang.hyphenation( l)
lang.hyphenation( l, n)
\stopfunctioncall

Either returns the current hyphenation exceptions for this language, or adds new ones. The syntax of the string is explained in~\in{section}[patternsexceptions].

\startfunctioncall
lang.clear_hyphenation( l)
\stopfunctioncall

Clears the exception dictionary for this language.
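
As a small illustration of the functions above, the following sketch creates a language object, adds two exceptions, and clears them again. The exception words are made up for the example, and their notation assumes the \tex{hyphenation}-like syntax described in~\in{section}[patternsexceptions].

\starttyping
-- sketch only: the words and their break points are illustrative
local l = lang.new()                 -- takes the next free internal id
print(lang.id(l))

lang.hyphenation(l, "ta-ble man-u-script")
print(lang.hyphenation(l))           -- the exceptions stored so far

l:clear_hyphenation()                -- same function, colon syntax
\stoptyping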
\startfunctioncall n = lang.clean( o) \stopfunctioncall Creates a hyphenation key from the supplied hyphenation value. The syntax of the argument string is explained in~\in{section}[patternsexceptions]. This function is useful if you want to do something else based on the words in a dictionary file, like spell-checking. \startfunctioncall n = lang.patterns( l) lang.patterns( l, n) \stopfunctioncall Adds additional patterns for this language object, or returns the current set. The syntax of this string is explained in~\in{section}[patternsexceptions]. \startfunctioncall lang.clear_patterns( l) \stopfunctioncall Clears the pattern dictionary for this language. \startfunctioncall n = lang.prehyphenchar( l) lang.prehyphenchar( l, n) \stopfunctioncall Gets or sets the \quote{pre|-|break} hyphen character for implicit hyphenation in this language (initially the hyphen, decimal 45). \startfunctioncall n = lang.posthyphenchar( l) lang.posthyphenchar( l, n) \stopfunctioncall Gets or sets the \quote{post|-|break} hyphen character for implicit hyphenation in this language (initially null, decimal~0, indicating emptiness). \startfunctioncall n = lang.preexhyphenchar( l) lang.preexhyphenchar( l, n) \stopfunctioncall Gets or sets the \quote{pre|-|break} hyphen character for explicit hyphenation in this language (initially null, decimal~0, indicating emptiness). \startfunctioncall n = lang.postexhyphenchar( l) lang.postexhyphenchar( l, n) \stopfunctioncall Gets or sets the \quote{post|-|break} hyphen character for explicit hyphenation in this language (initially null, decimal~0, indicating emptiness). \startfunctioncall success = lang.hyphenate( head) success = lang.hyphenate( head, tail) \stopfunctioncall Inserts hyphenation points (discretionary nodes) in a node list. If \type{tail} is given as argument, processing stops on that node. Currently, \type{success} is always true if \type{head} (and \type{tail}, if specified) are proper nodes, regardless of possible other errors. 
Hyphenation works only on \quote{characters}, a special subtype of all the glyph nodes with the node subtype having the value \type{1}. Glyph nodes with different subtypes are not processed. See \in{section~}[charsandglyphs] for more details.

\section{The \luatex{lua} library}

This library contains one read|-|only item:

\starttyping
s = lua.version
\stoptyping

This returns the \LUA\ version identifier string. The value is currently \directlua {tex.print(lua.version)}.

\subsection{\LUA\ bytecode registers}

\LUA\ registers can be used to communicate \LUA\ functions across \LUA\ chunks. The accepted values for assignments are functions and \type{nil}. Likewise, the retrieved value is either a function or \type{nil}.

\starttyping
lua.bytecode[ n] = f
lua.bytecode[ n]()
\stoptyping

The contents of the \luatex{lua.bytecode} array are stored inside the format file as actual \LUA\ bytecode, so it can also be used to preload \LUA\ code.

Note: The function must not contain any upvalues. Currently, functions containing upvalues can be stored (and their upvalues are set to \type{nil}), but this is an artifact of the current \LUA\ implementation and thus subject to change.

The associated function calls are

\startfunctioncall
f = lua.getbytecode( n)
lua.setbytecode( n, f)
\stopfunctioncall

Note: Since a \LUA\ file loaded using \luatex{loadfile(filename)} is essentially an anonymous function, a complete file can be stored in a bytecode register like this:

\startfunctioncall
lua.bytecode[n] = loadfile(filename)
\stopfunctioncall

Now all definitions (functions, variables) contained in the file can be created by executing this bytecode register:

\startfunctioncall
lua.bytecode[n]()
\stopfunctioncall

Note that the path of the file is stored in the \LUA\ bytecode to be used in stack backtraces and therefore dumped into the format file if the above code is used in \INITEX. If it contains private information, i.e.
the user name, this information is then contained in the format file as well. This should be kept in mind when preloading files into a bytecode register in \INITEX. \subsection{\LUA\ chunk name registers} There is an array of 65536 (0--65535) potential chunk names for use with the \type{\directlua} and \type{\latelua} primitives. \startfunctioncall lua.name[ n] = s s = lua.name[ n] \stopfunctioncall If you want to unset a lua name, you can assign \type{nil} to it. \section{The \luatex{mplib} library} The \MP\ library interface registers itself in the table \type{mplib}. It is based on \MPLIB\ version \ctxlua{tex.sprint(mplib.version())}. \subsection{\luatex{mplib.new}} To create a new \METAPOST\ instance, call \startfunctioncall mp = mplib.new({...}) \stopfunctioncall This creates the \type{mp} instance object. The argument hash can have a number of different fields, as follows: \starttabulate[|lT|l|p|p|] \NC \ssbf name \NC \bf type \NC \bf description \NC \bf default \NC\NR \NC error_line \NC number \NC error line width \NC 79 \NC\NR \NC print_line \NC number \NC line length in ps output \NC 100\NC\NR \NC random_seed \NC number \NC the initial random seed \NC variable\NC\NR \NC interaction \NC string \NC the interaction mode, one of \type {batch}, \type {nonstop}, \type {scroll}, \type {errorstop} \NC \type {errorstop}\NC\NR \NC job_name \NC string \NC \type {--jobname} \NC \type {mpout} \NC\NR \NC find_file \NC function \NC a function to find files \NC only local files\NC\NR \stoptabulate The \type{find_file} function should be of this form: \starttyping found = finder ( name, mode, type) \stoptyping with: \starttabulate[|lT|l|p|] \NC \bf name \NC \bf the requested file \NC \NR \NC mode \NC the file mode: \type {r} or \type {w} \NC \NR \NC type \NC the kind of file, one of: \type {mp}, \type {tfm}, \type {map}, \type {pfb}, \type {enc} \NC \NR \stoptabulate Return either the full pathname of the found file, or \type{nil} if the file cannot be found. 
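
A finder along the lines described above could be sketched as follows. It assumes that the \luatex{kpse} library has already been initialized; the mapping of \type{pfb} and \type{enc} to \KPATHSEA\ format names and the job name \type{example} are assumptions made for this illustration only.

\starttyping
-- sketch only: translate the two MPlib file kinds that are not
-- kpse format names themselves (an assumption of this example)
local kpse_type = { pfb = "type1 fonts", enc = "enc files" }

local function finder(name, mode, ftype)
  if mode == "w" then
    return name                      -- output files: keep the name as given
  end
  -- returns the full path, or nil when the file cannot be found
  return kpse.find_file(name, kpse_type[ftype] or ftype)
end

local mp = mplib.new({
  find_file   = finder,
  interaction = "nonstop",
  job_name    = "example",           -- hypothetical job name
})
\stoptyping

Without a \type{find_file} entry only local files are found, so a finder of this kind is what makes macro files from the \TEX\ tree reachable.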
Note that the new version of \MPLIB\ no longer uses binary mem files, so the way to preload a set of macros is simply to start off with an \type{input} command in the first \type{mp:execute()} call. \subsection{\luatex{mp:statistics}} You can request statistics with: \startfunctioncall
stats = mp:statistics() \stopfunctioncall This function returns the vital statistics for an \MPLIB\ instance. There are four fields, giving the maximum number of used items in each of four allocated object classes: \starttabulate[|lT|l|p|] \NC main_memory \NC number \NC memory size \NC\NR \NC hash_size \NC number \NC hash size\NC\NR \NC param_size \NC number \NC simultaneous macro parameters\NC\NR \NC max_in_open \NC number \NC input file nesting levels\NC\NR \stoptabulate Note that in the new version of \MPLIB, this is informational only. The objects are all allocated dynamically, so there is no chance of running out of space unless the available system memory is exhausted. \subsection{\luatex{mp:execute}} You can ask the \METAPOST\ interpreter to run a chunk of code by calling \startfunctioncall
rettable = mp:execute('metapost language chunk')
\stopfunctioncall

for various bits of \METAPOST\ language input. Be sure to check the \type{rettable.status} (see below) because when a fatal \METAPOST\ error occurs the \MPLIB\ instance will become unusable thereafter.

Generally speaking, it is best to keep your chunks small, but beware that all chunks have to obey proper syntax, as if each of them were a small file. For instance, you cannot split a single statement over multiple chunks.

In contrast with the normal standalone \type{mpost} command, there is {\em no\/} implied \quote{input} at the start of the first chunk.

\subsection{\luatex{mp:finish}}

\startfunctioncall
rettable = mp:finish()
\stopfunctioncall

If for some reason you want to stop using an \MPLIB\ instance while processing is not yet actually done, you can call \type{mp:finish}. Eventually, used memory will be freed and open files will be closed by the \LUA\ garbage collector, but an explicit \type{mp:finish} is the only way to capture the final part of the output streams.

\subsection{Result table}

The return value of \type{mp:execute} and \type{mp:finish} is a table with a few possible keys (only \type {status} is always guaranteed to be present).

\starttabulate[|l|l|p|]
\NC log    \NC string \NC output to the \quote {log} stream \NC \NR
\NC term   \NC string \NC output to the \quote {term} stream \NC \NR
\NC error  \NC string \NC output to the \quote {error} stream (only used for \quote {out of memory})\NC \NR
\NC status \NC number \NC the return value: 0=good, 1=warning, 2=errors, 3=fatal error \NC \NR
\NC fig    \NC table  \NC an array of generated figures (if any)\NC \NR
\stoptabulate

When \type{status} equals~3, you should stop using this \MPLIB\ instance immediately, because it is no longer capable of processing input.

If it is present, each of the entries in the \type{fig} array is a userdata representing a figure object, and each of those has a number of object methods you can call:

\starttabulate[|l|l|p|]
\NC boundingbox \NC function \NC returns the bounding box, as an array of 4 values\NC \NR
\NC postscript \NC function \NC returns a string that is the ps output of the \type{fig}. This function accepts two optional integer arguments for specifying the values of \type{prologues} (first argument) and \type{procset} (second argument)\NC \NR
\NC svg \NC function \NC returns a string that is the svg output of the \type{fig}.
This function accepts an optional integer argument for specifying the value of \type{prologues}\NC \NR
\NC objects \NC function \NC returns the actual array of graphic objects in this \type{fig} \NC \NR
\NC copy_objects \NC function \NC returns a deep copy of the array of graphic objects in this \type{fig} \NC \NR
\NC filename \NC function \NC the filename this \type{fig}'s \POSTSCRIPT\ output would have written to in standalone mode\NC \NR
\NC width \NC function \NC the \type{charwd} value \NC \NR
\NC height \NC function \NC the \type{charht} value \NC \NR
\NC depth \NC function \NC the \type{chardp} value \NC \NR
\NC italcorr \NC function \NC the \type{charit} value \NC \NR
\NC charcode \NC function \NC the (rounded) \type{charcode} value \NC \NR
\stoptabulate

{\bf NOTE:} you can call \type{fig:objects()} only once for any one \type{fig} object!

When the bounding box represents a \quote {negated rectangle}, i.e.\ when the first set of coordinates is larger than the second set, the picture is empty.

Graphical objects come in various types, each of which has a different list of accessible values. The types are: \type{fill}, \type{outline}, \type{text}, \type{start_clip}, \type{stop_clip}, \type{start_bounds}, \type{stop_bounds}, \type{special}.

There is a helper function (\type{mplib.fields(obj)}) to get the list of accessible values for a particular object, but you can just as easily use the tables given below.

All graphical objects have a field \type{type} that gives the object type as a string value; it is not explicitly mentioned in the following tables. In the following, \type{number}s are \POSTSCRIPT\ points represented as a floating point number, unless stated otherwise. Field values that are of type \type{table} are explained in the next section.
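
The result table and the figure methods can be used together roughly as in the following sketch. It assumes that \type{mp} is an existing instance whose \type{find_file} function can locate \type{plain.mp}, and that the bounding box array is ordered as lower left $x$, lower left $y$, upper right $x$, upper right $y$; the helper name \type{run} is hypothetical.

\starttyping
-- sketch only: 'mp' is assumed to be a working MPlib instance
local function run(instance, chunk)
  local result = instance:execute(chunk)
  if result.status > 1 then          -- 2 = errors, 3 = fatal error
    error(result.log or result.term or result.error or "MetaPost error")
  end
  return result
end

run(mp, "input plain;")              -- there is no implied input
local result = run(mp, "beginfig(1); draw fullcircle scaled 20; endfig;")

for _, fig in ipairs(result.fig or {}) do
  local bbox = fig:boundingbox()
  if bbox[1] > bbox[3] then
    print("empty picture")           -- negated rectangle (assumed ordering)
  else
    -- objects() may be called only once per figure
    for _, object in ipairs(fig:objects()) do
      print(object.type)
    end
  end
end
\stoptyping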
\subsubsection{fill}

\starttabulate[|l|l|p|]
\NC path \NC table \NC the list of knots \NC \NR
\NC htap \NC table \NC the list of knots for the reversed trajectory \NC \NR
\NC pen \NC table \NC knots of the pen \NC \NR
\NC color \NC table \NC the object's color \NC \NR
\NC linejoin \NC number \NC line join style (bare number)\NC \NR
\NC miterlimit \NC number \NC miterlimit\NC \NR
\NC prescript \NC string \NC the prescript text \NC \NR
\NC postscript \NC string \NC the postscript text \NC \NR
\stoptabulate

The entries \type{htap} and \type{pen} are optional.

There is a helper function (\type{mplib.pen_info(obj)}) that returns a table containing the vital characteristics of the pen used (all values are floats):

\starttabulate[|l|l|p|]
\NC width \NC number \NC width of the pen\NC \NR
\NC sx \NC number \NC $x$ scale \NC \NR
\NC rx \NC number \NC $xy$ multiplier \NC \NR
\NC ry \NC number \NC $yx$ multiplier \NC \NR
\NC sy \NC number \NC $y$ scale \NC \NR
\NC tx \NC number \NC $x$ offset \NC \NR
\NC ty \NC number \NC $y$ offset \NC \NR
\stoptabulate

\subsubsection{outline}

\starttabulate[|l|l|p|]
\NC path \NC table \NC the list of knots \NC \NR
\NC pen \NC table \NC knots of the pen \NC \NR
\NC color \NC table \NC the object's color \NC \NR
\NC linejoin \NC number \NC line join style (bare number)\NC \NR
\NC miterlimit \NC number \NC miterlimit \NC \NR
\NC linecap \NC number \NC line cap style (bare number)\NC \NR
\NC dash \NC table \NC representation of a dash list\NC \NR
\NC prescript \NC string \NC the prescript text \NC \NR
\NC postscript \NC string \NC the postscript text \NC \NR
\stoptabulate

The entry \type{dash} is optional.
\subsubsection{text}

\starttabulate[|l|l|p|]
\NC text \NC string \NC the text \NC \NR
\NC font \NC string \NC font tfm name \NC \NR
\NC dsize \NC number \NC font size\NC \NR
\NC color \NC table \NC the object's color \NC \NR
\NC width \NC number \NC \NC \NR
\NC height \NC number \NC \NC \NR
\NC depth \NC number \NC \NC \NR
\NC transform \NC table \NC a text transformation \NC \NR
\NC prescript \NC string \NC the prescript text \NC \NR
\NC postscript \NC string \NC the postscript text \NC \NR
\stoptabulate

\subsubsection{special}

\starttabulate[|l|l|p|]
\NC prescript \NC string \NC special text \NC \NR
\stoptabulate

\subsubsection{start_bounds, start_clip}

\starttabulate[|l|l|p|]
\NC path \NC table \NC the list of knots \NC \NR
\stoptabulate

\subsubsection{stop_bounds, stop_clip}

These objects have no fields.

\subsection{Subsidiary table formats}

\subsubsection{Paths and pens}

Paths and pens (which are really just a special type of path as far as \MPLIB\ is concerned) are represented by an array where each entry is a table that represents a knot.

\starttabulate[|lT|l|p|]
\NC left_type \NC string \NC when present: 'endpoint', but usually absent \NC \NR
\NC right_type \NC string \NC like \type{left_type}\NC \NR
\NC x_coord \NC number \NC X coordinate of this knot\NC \NR
\NC y_coord \NC number \NC Y coordinate of this knot\NC \NR
\NC left_x \NC number \NC X coordinate of the precontrol point of this knot\NC \NR
\NC left_y \NC number \NC Y coordinate of the precontrol point of this knot\NC \NR
\NC right_x \NC number \NC X coordinate of the postcontrol point of this knot\NC \NR
\NC right_y \NC number \NC Y coordinate of the postcontrol point of this knot\NC \NR
\stoptabulate

There is one special case: pens that are (possibly transformed) ellipses have an extra string-valued key \type{type} with value \type{elliptical} besides the array part containing the knot list.
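
To show how the knot fields fit together, here is a minimal sketch that turns a knot array into a string of \PDF\ path operators. The function name \type{path_to_pdf} is hypothetical, every segment is written as a curve, and the path is taken to be open when the first knot has \type{left_type} set to \type{endpoint}; that last test is an assumption of this example.

\starttyping
-- sketch only: convert an MPlib knot array into PDF path operators
local function path_to_pdf(path)
  if #path == 0 then
    return ""
  end
  local open = path[1].left_type == "endpoint"   -- assumed test for open paths
  local out = {}
  -- a segment uses the postcontrol point of its first knot
  -- and the precontrol point of its second knot
  local function curve(from, to)
    out[#out + 1] = string.format("%f %f %f %f %f %f c",
      from.right_x, from.right_y, to.left_x, to.left_y,
      to.x_coord, to.y_coord)
  end
  out[1] = string.format("%f %f m", path[1].x_coord, path[1].y_coord)
  for i = 2, #path do
    curve(path[i - 1], path[i])
  end
  if not open then
    curve(path[#path], path[1])      -- segment back to the first knot
    out[#out + 1] = "h"              -- close the subpath
  end
  return table.concat(out, " ")
end
\stoptyping

The result still needs a painting operator such as \type{S} or \type{f}, chosen according to whether the object is an \type{outline} or a \type{fill}.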
\subsubsection{Colors}

A color is an integer array with 0, 1, 3 or 4 values:

\starttabulate[|l|l|p|]
\NC 0 \NC marking only \NC no values \NC\NR
\NC 1 \NC greyscale \NC one value in the range $[0,1]$, \quote {black} is $0$ \NC\NR
\NC 3 \NC \RGB \NC three values in the range $[0,1]$, \quote {black} is $0,0,0$ \NC\NR
\NC 4 \NC \CMYK \NC four values in the range $[0,1]$, \quote {black} is $0,0,0,1$ \NC\NR
\stoptabulate

If the color model of the internal object was \type{uninitialized}, then it was initialized to the values representing \quote {black} in the colorspace \type{defaultcolormodel} that was in effect at the time of the \type{shipout}.

\subsubsection{Transforms}

Each transform is a six-item array.

\starttabulate[|l|l|p|]
\NC 1 \NC number \NC represents x \NC\NR
\NC 2 \NC number \NC represents y \NC\NR
\NC 3 \NC number \NC represents xx \NC\NR
\NC 4 \NC number \NC represents yx \NC\NR
\NC 5 \NC number \NC represents xy \NC\NR
\NC 6 \NC number \NC represents yy \NC\NR
\stoptabulate

Note that the translation (index 1 and 2) comes first. This differs from the ordering in \POSTSCRIPT, where the translation comes last.

\subsubsection{Dashes}

Each \type{dash} is a two-item hash, using the same model as \POSTSCRIPT\ for the representation of the dashlist. \type{dashes} is an array of \quote {on} and \quote {off} values, and \type{offset} is the phase of the pattern.

\starttabulate[|l|l|p|]
\NC dashes \NC hash \NC an array of on-off numbers \NC\NR
\NC offset \NC number \NC the starting offset value \NC\NR
\stoptabulate

\subsection{Character size information}

These functions find the size of a glyph in a defined font. The \type{fontname} is the same name as the argument to \type{infont}; the \type{char} is a glyph id in the range 0 to 255; the returned \type{w} is in AFM units.
\subsubsection{\luatex{mp:char_width}} \startfunctioncall w = mp:char_width( fontname, char) \stopfunctioncall \subsubsection{\luatex{mp:char_height}} \startfunctioncall w = mp:char_height( fontname, char) \stopfunctioncall \subsubsection{\luatex{mp:char_depth}} \startfunctioncall w = mp:char_depth( fontname, char) \stopfunctioncall \section{The \luatex{node} library} The \luatex{node} library contains functions that facilitate dealing with (lists of) nodes and their values. They allow you to create, alter, copy, delete, and insert \LUATEX\ node objects, the core objects within the typesetter. \LUATEX\ nodes are represented in \LUA\ as userdata with the metadata type \luatex{luatex.node}. The various parts within a node can be accessed using named fields. Each node has at least the three fields \type{next}, \type{id}, and \type{subtype}: \startitemize[intro] \item The \type{next} field returns the userdata object for the next node in a linked list of nodes, or \type{nil}, if there is no next node. \item The \type{id} indicates \TEX's \quote{node type}. The field \type{id} has a numeric value for efficiency reasons, but some of the library functions also accept a string value instead of \type{id}. \item The \type{subtype} is another number. It often gives further information about a node of a particular \type{id}, but it is most important when dealing with \quote{whatsits}, because they are differentiated solely based on their \type{subtype}. \stopitemize The other available fields depend on the \type{id} (and for \quote{whatsits}, the \type{subtype}) of the node. Further details on the various fields and their meanings are given in~\in{chapter}[nodes]. Support for \type{unset} (alignment) nodes is partial: they can be queried and modified from \LUA\ code, but not created. Nodes can be compared to each other, but: you are actually comparing indices into the node memory. This means that equality tests can only be trusted under very limited conditions. 
It will not work correctly in any situation where one of the two nodes has been freed and|/|or reallocated: in that case, there will be false positives.

At the moment, memory management of nodes should still be done explicitly by the user. Nodes are not \quote{seen} by the \LUA\ garbage collector, so you have to call the node freeing functions yourself when you are no longer in need of a node (list). Nodes form linked lists without reference counting, so you have to be careful that when control returns back to \LUATEX\ itself, you have not deleted nodes that are still referenced from a \type{next} pointer elsewhere, and that you did not create nodes that are referenced more than once.

There are statistics available with regard to the allocated node memory, which can be handy for tracing.

\subsection{Node handling functions}

\subsubsection{\luatex{node.is_node}}

\startfunctioncall
t = node.is_node( item)
\stopfunctioncall

This function returns true if the argument is a userdata object of type \type{<node>}.

\subsubsection{\luatex{node.types}}

\startfunctioncall
t = node.types() \stopfunctioncall This function returns an array that maps node id numbers to node type strings, providing an overview of the possible top|-|level \type{id} types. \subsubsection{\luatex{node.whatsits}} \startfunctioncall
t = node.whatsits()
\stopfunctioncall

\TEX's \quote{whatsits} all have the same \type{id}. The various subtypes are defined by their \type{subtype} fields. The function is much like \luatex{node.types}, except that it provides an array of \type{subtype} mappings.

\subsubsection{\luatex{node.id}}

\startfunctioncall
id = node.id( type)
\stopfunctioncall

This converts a single type name to its internal numeric representation.

\subsubsection{\luatex{node.subtype}}

\startfunctioncall
subtype = node.subtype( type)
\stopfunctioncall

This converts a single whatsit name to its internal numeric representation (\type{subtype}).

\subsubsection{\luatex{node.type}}

\startfunctioncall
type = node.type( n)
\stopfunctioncall

If the argument is a number, then this function converts an internal numeric representation to an external string representation. Otherwise, it will return the string \type{node} if the object represents a node (this is new in 0.65), and \type{nil} otherwise.

\subsubsection{\luatex{node.fields}}

\startfunctioncall
t = node.fields( id)
t = node.fields( id, subtype) \stopfunctioncall This function returns an array of valid field names for a particular type of node. If you want to get the valid fields for a \quote{whatsit}, you have to supply the second argument also. In other cases, any given second argument will be silently ignored. This function accepts string \type{id} and \type{subtype} values as well. \subsubsection{\luatex{node.has_field}} \startfunctioncall t = node.has_field( n, field) \stopfunctioncall This function returns a boolean that is only true if \type{n} is actually a node, and it has the field. \subsubsection{\luatex{node.new}} \startfunctioncall n = node.new( id) n = node.new( id, subtype) \stopfunctioncall Creates a new node. All of the new node's fields are initialized to either zero or \type{nil} except for \type{id} and \type{subtype} (if supplied). If you want to create a new whatsit, then the second argument is required, otherwise it need not be present. As with all node functions, this function creates a node on the \TEX\ level. This function accepts string \type{id} and \type{subtype} values as well. \subsubsection{\luatex{node.free}} \startfunctioncall node.free( n) \stopfunctioncall Removes the node \type{n} from \TEX's memory. Be careful: no checks are done on whether this node is still pointed to from a register or some \type{next} field: it is up to you to make sure that the internal data structures remain correct. \subsubsection{\luatex{node.flush_list}} \startfunctioncall node.flush_list( n) \stopfunctioncall Removes the node list \type{n} and the complete node list following \type{n} from \TEX's memory. Be careful: no checks are done on whether any of these nodes is still pointed to from a register or some \type{next} field: it is up to you to make sure that the internal data structures remain correct. 
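
The functions described so far can be tried out with a short sketch like the one below. The node types \type{kern} and \type{penalty} and the whatsit name \type{pdf_literal} are chosen only for illustration; \type{pairs} is used instead of \type{ipairs} so that the sketch makes no assumption about where the index of the returned arrays starts.

\starttyping
-- sketch only: inspect node types, then build and free a short list
local kern_id = node.id("kern")
print(kern_id, node.type(kern_id))

for index, name in pairs(node.fields("kern")) do
  print(index, name)
end

-- for whatsits the subtype has to be supplied as well
for index, name in pairs(node.fields("whatsit", "pdf_literal")) do
  print(index, name)
end

local first  = node.new("kern")
local second = node.new("penalty")
first.next = second                       -- a list of two nodes

print(node.has_field(first, "subtype"))   -- a field that every node has

-- nodes are not garbage collected: free the list explicitly,
-- and do not use 'first' or 'second' afterwards
node.flush_list(first)
\stoptyping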
\subsubsection{\luatex{node.copy}} \startfunctioncall m = node.copy( n) \stopfunctioncall Creates a deep copy of node \type{n}, including all nested lists as in the case of a hlist or vlist node. Only the \type{next} field is not copied. \subsubsection{\luatex{node.copy_list}} \startfunctioncall m = node.copy_list( n) m = node.copy_list( n, m) \stopfunctioncall Creates a deep copy of the node list that starts at \type{n}. If \type{m} is also given, the copy stops just before node \type{m}. Note that you cannot copy attribute lists this way, specialized functions for dealing with attribute lists will be provided later but are not there yet. However, there is normally no need to copy attribute lists as when you do assignments to the \type{attr} field or make changes to specific attributes, the needed copying and freeing takes place automatically. \subsubsection{\luatex{node.next} (0.65)} \startfunctioncall m = node.next( n) \stopfunctioncall Returns the node following this node, or \type{nil} if there is no such node. \subsubsection{\luatex{node.prev} (0.65)} \startfunctioncall m = node.prev( n) \stopfunctioncall Returns the node preceding this node, or \type{nil} if there is no such node. \subsubsection{\luatex{node.current_attr} (0.66)} \startfunctioncall m = node.current_attr() \stopfunctioncall Returns the currently active list of attributes, if there is one. Note: this function is somewhat experimental, and it returns the {\it actual} attribute list, not a copy thereof. Therefore, changing any of the attributes in the list will change these values for all nodes that have the current attribute list assigned to them. \subsubsection{\luatex{node.hpack}} \startfunctioncall h, b = node.hpack( n) h, b = node.hpack( n, w, info) h, b = node.hpack( n, w, info, dir) \stopfunctioncall This function creates a new hlist by packaging the list that begins at node \type{n} into a horizontal box. 
With only a single argument, this box is created using the natural width of its components. In the three argument form, \type{info} must be either \type{additional} or \type{exactly}, and \type{w} is the additional (\tex{hbox spread}) or exact (\tex{hbox to}) width to be used. Direction support added in \LUATEX\ 0.45. The second return value is the badness of the generated box, this extension was added in 0.51. Caveat: at this moment, there can be unexpected side|-|effects to this function, like updating some of the \tex{marks} and \tex{inserts}. Also note that the content of \type{h} is the original node list \type{n}: if you call \type{node.free(h)} you will also free the node list itself, unless you explicitly set the \type{list} field to \type{nil} beforehand. And in a similar way, calling \type{node.free(n)} will invalidate \type{h} as well! \subsubsection{\luatex{node.vpack} (since 0.36)} \startfunctioncall h, b = node.vpack( n) h, b = node.vpack( n, w, info) h, b = node.vpack( n, w, info, dir) \stopfunctioncall This function creates a new vlist by packaging the list that begins at node \type{n} into a vertical box. With only a single argument, this box is created using the natural height of its components. In the three argument form, \type{info} must be either \type{additional} or \type{exactly}, and \type{w} is the additional (\tex{vbox spread}) or exact (\tex{vbox to}) height to be used. Direction support added in \LUATEX\ 0.45. The second return value is the badness of the generated box, this extension was added in 0.51. See the description of \type{node.hpack()} for a few memory allocation caveats. 
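
The ownership caveat mentioned for \type{node.hpack} can be sketched as follows. This is only an illustration under the assumption that the code runs as plain \LUA\ inside \LUATEX; the rule and kern sizes are arbitrary values chosen for the example.

\starttyping
-- build a two-node list: a 10pt wide rule followed by a 2pt kern
local r = node.new("rule")
r.width, r.height, r.depth = 10 * 65536, 5 * 65536, 0
local k = node.new("kern")
k.kern = 2 * 65536
r.next = k

-- package at natural width; the box refers to the list itself, not a copy
local box, badness = node.hpack(r)
print(box.width, badness)   -- the width is expected to be 12pt in scaled points

-- to free the box but keep the content, detach the list first
box.list = nil
node.free(box)

-- the content is still valid here and has to be freed separately
node.flush_list(r)
\stoptyping

Without the assignment \type{box.list = nil}, freeing the box would also free \type{r} and \type{k}, and the later \type{node.flush_list(r)} would operate on freed memory.
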
\subsubsection{\luatex{node.dimensions} (0.43)}

\startfunctioncall
w, h, d = node.dimensions( n)
w, h, d = node.dimensions( n, dir)
w, h, d = node.dimensions( n, t)
w, h, d = node.dimensions( n, t, dir)
\stopfunctioncall

This function calculates the natural in-line dimensions of the node list starting at node \type{n} and terminating just before node \type{t} (or the end of the list, if there is no second argument). The return values are scaled points. An alternative format that starts with glue parameters as the first three arguments is also possible:

\startfunctioncall
w, h, d = node.dimensions( glue_set, glue_sign, glue_order, n)
w, h, d = node.dimensions( glue_set, glue_sign, glue_order, n, dir)
w, h, d = node.dimensions( glue_set, glue_sign, glue_order, n, t)
w, h, d = node.dimensions( glue_set, glue_sign, glue_order, n, t, dir)
\stopfunctioncall

This calling method takes glue settings into account and is especially useful for finding the actual width of a sublist of nodes that are already boxed, for example in code like this, which prints the width of the space in between the \type{a} and \type{b} as it would be if \type{\box0} was used as-is:

\starttyping
\setbox0 = \hbox to 20pt {a b}

\directlua{print (node.dimensions(tex.box[0].glue_set,
                                  tex.box[0].glue_sign,
                                  tex.box[0].glue_order,
                                  tex.box[0].head.next,
                                  node.tail(tex.box[0].head))) }
\stoptyping

Direction support added in \LUATEX\ 0.45.

\subsubsection{\luatex{node.mlist_to_hlist}}

\startfunctioncall
h = node.mlist_to_hlist( n, display_type, penalties)
\stopfunctioncall

This runs the internal mlist to hlist conversion, converting the math list in \type{n} into the horizontal list \type{h}. The interface is exactly the same as for the callback \type{mlist_to_hlist}.

\subsubsection{\luatex{node.slide}}

\startfunctioncall
m = node.slide( n)
\stopfunctioncall

Returns the last node of the node list that starts at \type{n}. As a side|-|effect, it also creates a reverse chain of \type{prev} pointers between nodes.
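
The side|-|effect of \type{node.slide} makes it possible to walk a list backwards. The following sketch assumes that \type{\box0} has been filled beforehand and that the code runs as plain \LUA\ inside \LUATEX; it is an illustration, not a fragment of the \LUATEX\ sources.

\starttyping
local b = tex.box[0]
if b and b.list then
  -- node.slide returns the tail and repairs the prev pointers on the way
  local n = node.slide(b.list)
  while n do
    print(node.type(n.id))   -- type names, printed from last node to first
    n = n.prev
  end
end
\stoptyping
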
\subsubsection{\luatex{node.tail}}

\startfunctioncall
m = node.tail( n)
\stopfunctioncall

Returns the last node of the node list that starts at \type{n}.

\subsubsection{\luatex{node.length}}

\startfunctioncall
i = node.length( n)
i = node.length( n, m)
\stopfunctioncall

Returns the number of nodes contained in the node list that starts at \type{n}. If \type{m} is also supplied it stops at \type{m} instead of at the end of the list. The node \type{m} is not counted.

\subsubsection{\luatex{node.count}}

\startfunctioncall
i = node.count( id, n)
i = node.count( id, n, m)
\stopfunctioncall

Returns the number of nodes contained in the node list that starts at \type{n} that have a matching \type{id} field. If \type{m} is also supplied, counting stops at \type{m} instead of at the end of the list. The node \type{m} is not counted. This function also accepts string \type{id}'s.

\subsubsection{\luatex{node.traverse}}

\startfunctioncall
t = node.traverse( n)
\stopfunctioncall

This is an iterator that loops over the node list that starts at \type{n}.

\subsubsection{\luatex{node.traverse_id}}

\startfunctioncall
t = node.traverse_id( id, n)
\stopfunctioncall

This is an iterator that loops over all the nodes in the list that starts at \type{n} that have a matching \type{id} field.

\subsubsection{\luatex{node.remove}}

\startfunctioncall
head, current = node.remove( head, current)
\stopfunctioncall

This function removes the node \type{current} from the list following \type{head}. It is your responsibility to make sure it is really part of that list. The return values are the new \type{head} and \type{current} nodes. The returned \type{current} is the node following the \type{current} in the calling argument, and is only passed back as a convenience (or \type{nil}, if there is no such node). The returned \type{head} is more important, because if the function is called with \type{current} equal to \type{head}, it will be changed.
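
Since \type{node.remove} may change the head of the list, a loop that removes nodes has to keep track of both return values. A minimal sketch, assuming plain \LUA\ inside \LUATEX; the function name \type{strip_kerns} is hypothetical and chosen only for this example:

\starttyping
-- remove (and free) all kern nodes from the list starting at head
local function strip_kerns(head)
  local kern_id = node.id("kern")
  local n = head
  while n do
    if n.id == kern_id then
      local removed = n
      -- head may change if the first node is removed;
      -- n becomes the node that followed the removed one (or nil)
      head, n = node.remove(head, n)
      node.free(removed)
    else
      n = n.next
    end
  end
  return head
end
\stoptyping

The caller has to use the returned value as the new list head, because the original \type{head} may have been one of the removed nodes.
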
\subsubsection{\luatex{node.insert_before}} \startfunctioncall head, new = node.insert_before( head, current, new) \stopfunctioncall This function inserts the node \type{new} before \type{current} into the list following \type{head}. It is your responsibility to make sure that \type{current} is really part of that list. The return values are the (potentially mutated) \type{head} and the node \type{new}, set up to be part of the list (with correct \type{next} field). If \type{head} is initially \type{nil}, it will become \type{new}. \subsubsection{\luatex{node.insert_after}} \startfunctioncall head, new = node.insert_after( head, current, new) \stopfunctioncall This function inserts the node \type{new} after \type{current} into the list following \type{head}. It is your responsibility to make sure that \type{current} is really part of that list. The return values are the \type{head} and the node \type{new}, set up to be part of the list (with correct \type{next} field). If \type{head} is initially \type{nil}, it will become \type{new}. \subsubsection{\luatex{node.first_glyph} (0.65)} \startfunctioncall n = node.first_glyph( n) n = node.first_glyph( n, m) \stopfunctioncall Returns the first node in the list starting at \type{n} that is a glyph node with a subtype indicating it is a glyph, or \type{nil}. If \type{m} is given, processing stops at (but including) that node, otherwise processing stops at the end of the list. Note: this function used to be called \type{first_character}. It has been renamed in \LUATEX\ 0.65, and the old name is deprecated now. \subsubsection{\luatex{node.ligaturing}} \startfunctioncall h, t, success = node.ligaturing( n) h, t, success = node.ligaturing( n, m) \stopfunctioncall Apply \TEX-style ligaturing to the specified nodelist. The tail node \type{m} is optional. The two returned nodes \type{h} and \type{t} are the new head and tail (both \type{n} and \type{m} can change into a new ligature). 
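
The insertion functions can be combined with the iterators described earlier. The following sketch puts a kern in front of every glyph node of a list. It assumes plain \LUA\ inside \LUATEX; the function name and the \type{amount} parameter are made up for this illustration.

\starttyping
-- insert a kern of the given amount (in scaled points) before each glyph
local function kern_before_glyphs(head, amount)
  for g in node.traverse_id(node.id("glyph"), head) do
    local k = node.new("kern")
    k.kern = amount
    -- the first return value is the (possibly changed) head of the list
    head = node.insert_before(head, g, k)
  end
  return head
end
\stoptyping

Inserting before the node that the iterator currently points at leaves the remainder of the list untouched, so the traversal can continue normally. As with \type{node.remove}, the returned head has to be used by the caller, because it changes when the first node of the list is a glyph.
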
\subsubsection{\luatex{node.kerning}}

\startfunctioncall
h, t, success = node.kerning( n)
h, t, success = node.kerning( n, m)
\stopfunctioncall

Apply \TEX|-|style kerning to the specified nodelist. The tail node \type{m} is optional. The two returned nodes \type{h} and \type{t} are the head and tail (either one of these can be an inserted kern node, because special kernings with word boundaries are possible).

\subsubsection{\luatex{node.unprotect_glyphs}}

\startfunctioncall
node.unprotect_glyphs( n)
\stopfunctioncall

Subtracts 256 from all glyph node subtypes. This and the next function are helpers to convert from \type{characters} to \type{glyphs} during node processing.

\subsubsection{\luatex{node.protect_glyphs}}

\startfunctioncall
node.protect_glyphs( n)
\stopfunctioncall

Adds 256 to all glyph node subtypes in the node list starting at \type{n}, except that if the value is 1, it adds only 255. The special handling of 1 means that \type{characters} will become \type{glyphs} after subtraction of 256.

\subsubsection{\luatex{node.last_node}}

\startfunctioncall
n = node.last_node()
\stopfunctioncall

This function pops the last node from \TEX's \quote{current list}. It returns that node, or \type{nil} if the current list is empty.

\subsubsection{\luatex{node.write}}

\startfunctioncall
node.write( n)
\stopfunctioncall

This is an experimental function that will append a node list to \TEX's \quote {current list} (the node list is not deep-copied any more since version 0.38). There is no error checking yet!

\subsubsection{\luatex{node.protrusion_skippable} (0.60.1)}

\startfunctioncall
skippable = node.protrusion_skippable( n)
\stopfunctioncall

Returns \type{true} if, for the purpose of line boundary discovery when character protrusion is active, this node can be skipped.

\subsection{Attribute handling}

Attributes appear as a linked list of userdata objects in the \type{attr} field of individual nodes.
They can be handled individually, but it is much safer and more efficient to use the dedicated functions associated with them.

\subsubsection{\luatex{node.has_attribute}}

\startfunctioncall
v = node.has_attribute( n, id)
v = node.has_attribute( n, id, val)
\stopfunctioncall

Tests if a node has the attribute with number \type{id} set. If \type{val} is also supplied, also tests if the value matches \type{val}. It returns the value, or, if no match is found, \type{nil}.

\subsubsection{\luatex{node.set_attribute}}

\startfunctioncall
node.set_attribute( n, id, val)
\stopfunctioncall

Sets the attribute with number \type{id} to the value \type{val}. Duplicate assignments are ignored. {\em [needs explanation]}

\subsubsection{\luatex{node.unset_attribute}}

\startfunctioncall
v = node.unset_attribute( n, id)
v = node.unset_attribute( n, id, val)
\stopfunctioncall

Unsets the attribute with number \type{id}. If \type{val} is also supplied, it will only perform this operation if the value matches \type{val}. Missing attributes or attribute|-|value pairs are ignored.

If the attribute was actually deleted, returns its old value. Otherwise, returns \type{nil}.

\section{The \luatex{pdf} library}

This contains variables and functions that are related to the \PDF\ backend.

%***********************************************************************

\subsection{\luatex{pdf.mapfile}, \luatex{pdf.mapline} (new in 0.53.0)}

\startfunctioncall
pdf.mapfile( map file)
pdf.mapline( map line)
\stopfunctioncall

These two functions can be used to replace primitives \type{\pdfmapfile} and \type{\pdfmapline} from \PDFTEX. They expect a string as only parameter and have no return value.

The functions also replace the former variables \luatex{pdf.pdfmapfile} and \luatex{pdf.pdfmapline}.
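
A short usage sketch for the two map functions follows. The argument strings are assumed to follow the same conventions as the arguments of the \PDFTEX\ primitives (a leading \type{+} adds to the existing mappings); the map file name and the font entry are placeholders invented for this example.

\starttyping
-- add all entries from a (hypothetical) map file
pdf.mapfile("+example.map")

-- add a single map entry; the font names are placeholders
pdf.mapline("+ptmr8r Times-Roman <8r.enc <utmr8a.pfb")
\stoptyping
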
%***********************************************************************

\subsection{\luatex{pdf.catalog}, \luatex{pdf.info}, \luatex{pdf.names}, \luatex{pdf.trailer} (new in 0.53.0)}

These variables offer a read-write interface to the corresponding \PDFTEX\ token lists. The value types are strings. The corresponding \quote{\type{pdf}} parameter names \luatex{pdf.pdfcatalog}, \luatex{pdf.pdfinfo}, \luatex{pdf.pdfnames}, and \luatex{pdf.pdftrailer} (all new in 0.47.0) still work, but are obsolescent (since 0.53.0).

Note: this interface will almost certainly change in the future.

%***********************************************************************

\subsection{\luatex{pdf.pageattributes}, \luatex{pdf.pageresources}, \luatex{pdf.pagesattributes} (new in 0.53.0)}

These variables offer a read-write interface to related token lists. The value types are strings. The variables have no interaction with the corresponding \PDFTEX\ token registers \tex{pdfpageattr}, \tex{pdfpageresources}, and \tex{pdfpagesattr}, but they are written out to the \PDF\ file directly after the \PDFTEX\ token registers.

%***********************************************************************

\subsection{\luatex{pdf.h}, \luatex{pdf.v}}

These are the \type{h} and \type{v} values that define the current location on the output page, measured from its lower left corner. The values can be queried % and set
using scaled points as units.

%\starttyping
%pdf.h
%pdf.v
%\stoptyping

Note: this interface will almost certainly change in the future.

% not implemented yet:
% \subsection{\luatex{pdf.seth()}, \luatex{pdf.setv()}}
%
% The function calls for position setting,
% associated with \type{pdf.h} and \type{pdf.v} are
%
% \startfunctioncall
% pdf.seth( n)
% n = pdf.h
% pdf.setv( n)
% n = pdf.v
% \stopfunctioncall

\subsection{\luatex{pdf.print}}

A print function to write stuff to the \PDF\ document that can be used from within a \tex{latelua} argument.
This function is not to be used inside \tex{directlua} unless you know {\it exactly} what you are doing.

\startfunctioncall
pdf.print( s)
pdf.print( type, s)
\stopfunctioncall

The optional parameter can be used to mimic the behavior of \tex{pdfliteral}: the \type{type} is \type{direct} or \type{page}.

\subsection{\luatex{pdf.immediateobj}}

This function creates a \PDF\ object and immediately writes it to the \PDF\ file. It is modelled after \PDFTEX's \tex{immediate}\tex{pdfobj} primitives. All function variants return the object number of the newly generated object.

\startfunctioncall
n = pdf.immediateobj( objtext)
n = pdf.immediateobj("file", filename)
n = pdf.immediateobj("stream", streamtext, attrtext)
n = pdf.immediateobj("streamfile", filename, attrtext)
\stopfunctioncall

The first version puts the \type{objtext} raw into an object. Only the object wrapper is automatically generated, but any internal structure (like \type{<< >>} dictionary markers) needs to be provided by the user.

The second version, with keyword \type{"file"} as first argument, puts the contents of the file with name \type{filename} raw into the object.

The third version, with keyword \type{"stream"}, creates a stream object and puts the \type{streamtext} raw into the stream. The stream length is automatically calculated. The optional \type{attrtext} goes into the dictionary of that object.

The fourth version, with keyword \type{"streamfile"}, does the same as the third one, except that it reads the stream data raw from a file.

An optional first argument can be given to make the function use a previously reserved \PDF\ object.
\startfunctioncall
n = pdf.immediateobj( n, objtext)
n = pdf.immediateobj( n, "file", filename)
n = pdf.immediateobj( n, "stream", streamtext, attrtext)
n = pdf.immediateobj( n, "streamfile", filename, attrtext)
\stopfunctioncall

%***********************************************************************

\subsection{\luatex{pdf.obj}}

This function creates a \PDF\ object, which is written to the \PDF\ file only when referenced, e.\,g., by \luatex{pdf.refobj()}.

All function variants return the object number of the newly generated object, and there are two separate calling modes.

The first mode is modelled after \PDFTEX's \tex{pdfobj} primitive.

\startfunctioncall
n = pdf.obj( objtext)
n = pdf.obj("file", filename)
n = pdf.obj("stream", streamtext, attrtext)
n = pdf.obj("streamfile", filename, attrtext)
\stopfunctioncall

An optional first argument can be given to make the function use a previously reserved \PDF\ object.

\startfunctioncall
n = pdf.obj( n, objtext)
n = pdf.obj( n, "file", filename)
n = pdf.obj( n, "stream", streamtext, attrtext)
n = pdf.obj( n, "streamfile", filename, attrtext)
\stopfunctioncall

The second mode accepts a single argument table with key--value pairs.

\startfunctioncall
n = pdf.obj{ type = ,
             immediate = ,
             objnum = ,
             attr = ,
             compresslevel = ,
             objcompression = ,
             file = ,
             string = }
\stopfunctioncall

The \type{type} field can have the values \type{raw} and \type{stream}. This field is required; the others are optional (within constraints).

Note: this mode makes \type{pdf.obj} look more flexible than it actually is: the constraints from the separate parameter version still apply, so for example you can't have both \type{string} and \type{file} at the same time.

%***********************************************************************

\subsection{\luatex{pdf.refobj}}

This function, the \LUA\ version of the \tex{pdfrefobj} primitive, references an object by its object number, so that the object will be written out.
\startfunctioncall
pdf.refobj( n)
\stopfunctioncall

This function works in both the \tex{directlua} and \tex{latelua} environment. Inside \tex{directlua} a new whatsit node \quote{pdf_refobj} is created, which will be marked for flushing during page output, and the object is then written directly after the page, when the resource objects are also written out. Inside \tex{latelua} the object will be marked for flushing.

This function has no return values.

%***********************************************************************

\subsection{\luatex{pdf.reserveobj}}

This function creates an empty \PDF\ object and returns its number.

\startfunctioncall
n = pdf.reserveobj()
n = pdf.reserveobj("annot")
\stopfunctioncall

\subsection{\luatex{pdf.registerannot} (new in 0.47.0)}

This function adds an object number to the \type{/Annots} array for the current page without doing anything else. This function can only be used from within \type{\latelua}.

\startfunctioncall
pdf.registerannot ( objnum)
\stopfunctioncall

\section{The \luatex{status} library}

This contains a number of run|-|time configuration items that you may find useful in message reporting, as well as an iterator function that gets all of the names and values as a table.

\startfunctioncall
info = status.list() \stopfunctioncall The keys in the table are the known items, the value is the current value. Almost all of the values in \type{status} are fetched through a metatable at run|-|time whenever they are accessed, so you cannot use \type{pairs} on \type{status}, but you {\it can\/} use \type{pairs} on \type{info}, of course. If you do not need the full list, you can also ask for a single item by using its name as an index into \type{status}. The current list is: \starttabulate[|lT|p|] \NC \ssbf key \NC \bf explanation \NC\NR \NC pdf_gone\NC written \PDF\ bytes \NC \NR \NC pdf_ptr\NC not yet written \PDF\ bytes \NC \NR \NC dvi_gone\NC written \DVI\ bytes \NC \NR \NC dvi_ptr\NC not yet written \DVI\ bytes \NC \NR \NC total_pages\NC number of written pages \NC \NR \NC output_file_name\NC name of the \PDF\ or \DVI\ file \NC \NR \NC log_name\NC name of the log file \NC \NR \NC banner\NC terminal display banner \NC \NR \NC var_used\NC variable (one|-|word) memory in use \NC \NR \NC dyn_used\NC token (multi|-|word) memory in use \NC \NR \NC str_ptr\NC number of strings \NC \NR \NC init_str_ptr\NC number of \INITEX\ strings \NC \NR \NC max_strings\NC maximum allowed strings \NC \NR \NC pool_ptr\NC string pool index \NC \NR \NC init_pool_ptr\NC \INITEX\ string pool index \NC \NR \NC pool_size\NC current size allocated for string characters \NC \NR \NC node_mem_usage\NC a string giving insight into currently used nodes\NC\NR \NC var_mem_max\NC number of allocated words for nodes\NC \NR \NC fix_mem_max\NC number of allocated words for tokens\NC \NR \NC fix_mem_end\NC maximum number of used tokens\NC \NR \NC cs_count\NC number of control sequences \NC \NR \NC hash_size\NC size of hash \NC \NR \NC hash_extra\NC extra allowed hash \NC \NR \NC font_ptr\NC number of active fonts \NC \NR \NC max_in_stack\NC max used input stack entries \NC \NR \NC max_nest_stack\NC max used nesting stack entries \NC \NR \NC max_param_stack\NC max used parameter stack entries \NC \NR 
\NC max_buf_stack\NC max used buffer position \NC \NR \NC max_save_stack\NC max used save stack entries \NC \NR \NC stack_size\NC input stack size \NC \NR \NC nest_size\NC nesting stack size \NC \NR \NC param_size\NC parameter stack size \NC \NR \NC buf_size\NC current allocated size of the line buffer \NC \NR \NC save_size\NC save stack size \NC \NR \NC obj_ptr\NC max \PDF\ object pointer \NC \NR \NC obj_tab_size\NC \PDF\ object table size \NC \NR \NC pdf_os_cntr\NC max \PDF\ object stream pointer \NC \NR \NC pdf_os_objidx\NC \PDF\ object stream index \NC \NR \NC pdf_dest_names_ptr\NC max \PDF\ destination pointer \NC \NR \NC dest_names_size\NC \PDF\ destination table size \NC \NR \NC pdf_mem_ptr\NC max \PDF\ memory used \NC \NR \NC pdf_mem_size\NC \PDF\ memory size \NC \NR \NC largest_used_mark\NC max referenced marks class \NC \NR \NC filename\NC name of the current input file \NC \NR \NC inputid\NC numeric id of the current input \NC \NR \NC linenumber\NC location in the current input file\NC \NR \NC lasterrorstring\NC last error string\NC \NR \NC luabytecodes\NC number of active \LUA\ bytecode registers\NC \NR \NC luabytecode_bytes\NC number of bytes in \LUA\ bytecode registers\NC \NR \NC luastate_bytes\NC number of bytes in use by \LUA\ interpreters\NC \NR \NC output_active\NC \type{true} if the \tex{output} routine is active\NC \NR \NC callbacks\NC total number of executed callbacks so far\NC \NR \NC indirect_callbacks\NC number of those that were themselves a result of other callbacks (e.g. file readers)\NC \NR \NC luatex_svn\NC the luatex repository id (added in 0.51)\NC\NR \NC luatex_version\NC the luatex version number (added in 0.38)\NC\NR \NC luatex_revision\NC the luatex revision string (added in 0.38)\NC\NR \NC ini_version\NC \type{true} if this is an \INITEX\ run (added in 0.38)\NC\NR \stoptabulate \section{The \luatex{tex} library} The \luatex{tex} table contains a large list of virtual internal \TEX\ parameters that are partially writable. 
The designation \quote{virtual} means that these items are not properly defined in \LUA, but are only front\-ends that are handled by a metatable that operates on the actual \TEX\ values. As a result, most of the \LUA\ table operators (like \type{pairs} and \type{#}) do not work on such items. At the moment, it is possible to access almost every parameter that has these characteristics: \startitemize[packed] \item You can use it after \tex{the} \item It is a single token. \item Some special others, see the list below \stopitemize This excludes parameters that need extra arguments, like \tex{the}\tex{scriptfont}. The subset comprising simple integer and dimension registers are writable as well as readable (stuff like \tex{tracingcommands} and \tex{parindent}). \subsection{Internal parameter values} For all the parameters in this section, it is possible to access them directly using their names as index in the \type{tex} table, or by using one of the functions \type{tex.get()} and \type{tex.set()}. The exact parameters and return values differ depending on the actual parameter, and so does whether \type{tex.set} has any effect. For the parameters that {\it can\/} be set, it is possible to use \type{'global'} as the first argument to \type{tex.set}; this makes the assignment global instead of local. \startfunctioncall tex.set ( n, ...) tex.set ('global', n, ...) ... = tex.get ( n) \stopfunctioncall \subsubsection{Integer parameters} The integer parameters accept and return \LUA\ numbers. 
Read-write: \startcolumns[n=2] \starttyping tex.adjdemerits tex.binoppenalty tex.brokenpenalty tex.catcodetable tex.clubpenalty tex.day tex.defaulthyphenchar tex.defaultskewchar tex.delimiterfactor tex.displaywidowpenalty tex.doublehyphendemerits tex.endlinechar tex.errorcontextlines tex.escapechar tex.exhyphenpenalty tex.fam tex.finalhyphendemerits tex.floatingpenalty tex.globaldefs tex.hangafter tex.hbadness tex.holdinginserts tex.hyphenpenalty tex.interlinepenalty tex.language tex.lastlinefit tex.lefthyphenmin tex.linepenalty tex.localbrokenpenalty tex.localinterlinepenalty tex.looseness tex.mag tex.maxdeadcycles tex.month tex.newlinechar tex.outputpenalty tex.pausing tex.pdfadjustspacing tex.pdfcompresslevel tex.pdfdecimaldigits tex.pdfgamma tex.pdfgentounicode tex.pdfimageapplygamma tex.pdfimagegamma tex.pdfimagehicolor tex.pdfimageresolution tex.pdfinclusionerrorlevel tex.pdfminorversion tex.pdfobjcompresslevel tex.pdfoutput tex.pdfpagebox tex.pdfpkresolution tex.pdfprotrudechars tex.pdftracingfonts tex.pdfuniqueresname tex.postdisplaypenalty tex.predisplaydirection tex.predisplaypenalty tex.pretolerance tex.relpenalty tex.righthyphenmin tex.savinghyphcodes tex.savingvdiscards tex.showboxbreadth tex.showboxdepth tex.time tex.tolerance tex.tracingassigns tex.tracingcommands tex.tracinggroups tex.tracingifs tex.tracinglostchars tex.tracingmacros tex.tracingnesting tex.tracingonline tex.tracingoutput tex.tracingpages tex.tracingparagraphs tex.tracingrestores tex.tracingscantokens tex.tracingstats tex.uchyph tex.vbadness tex.widowpenalty tex.year \stoptyping \stopcolumns Read|-|only: \startcolumns[n=3] \starttyping tex.deadcycles tex.insertpenalties tex.parshape tex.prevgraf tex.spacefactor \stoptyping \stopcolumns \subsubsection{Dimension parameters} The dimension parameters accept \LUA\ numbers (signifying scaled points) or strings (with included dimension). The result is always a number in scaled points. 
Read|-|write: \startcolumns[n=3] \starttyping tex.boxmaxdepth tex.delimitershortfall tex.displayindent tex.displaywidth tex.emergencystretch tex.hangindent tex.hfuzz tex.hoffset tex.hsize tex.lineskiplimit tex.mathsurround tex.maxdepth tex.nulldelimiterspace tex.overfullrule tex.pagebottomoffset tex.pageheight tex.pageleftoffset tex.pagerightoffset tex.pagetopoffset tex.pagewidth tex.parindent tex.pdfdestmargin tex.pdfeachlinedepth tex.pdfeachlineheight tex.pdffirstlineheight tex.pdfhorigin tex.pdflastlinedepth tex.pdflinkmargin tex.pdfpageheight tex.pdfpagewidth tex.pdfpxdimen tex.pdfthreadmargin tex.pdfvorigin tex.predisplaysize tex.scriptspace tex.splitmaxdepth tex.vfuzz tex.voffset tex.vsize \stoptyping \stopcolumns Read|-|only: \startcolumns[n=3] \starttyping tex.pagedepth tex.pagefilllstretch tex.pagefillstretch tex.pagefilstretch tex.pagegoal tex.pageshrink tex.pagestretch tex.pagetotal tex.prevdepth \stoptyping \stopcolumns \subsubsection{Direction parameters} The direction parameters are read|-|only and return a \LUA\ string. \startcolumns[n=3] \starttyping tex.bodydir tex.mathdir tex.pagedir tex.pardir tex.textdir \stoptyping \stopcolumns \subsubsection{Glue parameters} The glue parameters accept and return a userdata object that represents a \type{glue_spec} node. \startcolumns[n=3] \starttyping tex.abovedisplayshortskip tex.abovedisplayskip tex.baselineskip tex.belowdisplayshortskip tex.belowdisplayskip tex.leftskip tex.lineskip tex.parfillskip tex.parskip tex.rightskip tex.spaceskip tex.splittopskip tex.tabskip tex.topskip tex.xspaceskip \stoptyping \stopcolumns \subsubsection{Muglue parameters} All muglue parameters are to be used read|-|only and return a \LUA\ string. \startcolumns[n=3] \starttyping tex.medmuskip tex.thickmuskip tex.thinmuskip \stoptyping \stopcolumns \subsubsection{Tokenlist parameters} The tokenlist parameters accept and return \LUA\ strings. 
\LUA\ strings are converted to and from token lists using \tex{the}\tex{toks} style expansion: all category codes are either space (10) or other (12). It follows that assigning to some of these, like \quote{tex.output}, is actually useless, but it feels bad to make exceptions in view of a coming extension that will accept full-blown token strings.

\startcolumns[n=3]
\starttyping
tex.errhelp
tex.everycr
tex.everydisplay
tex.everyeof
tex.everyhbox
tex.everyjob
tex.everymath
tex.everypar
tex.everyvbox
tex.output
tex.pdfpageattr
tex.pdfpageresources
tex.pdfpagesattr
tex.pdfpkmode
\stoptyping
\stopcolumns

\subsection{Convert commands}

All \quote{convert} commands are read|-|only and return a \LUA\ string. The supported commands at this moment are:

\startcolumns[n=2]
\starttyping
tex.AlephVersion
tex.Alephrevision
tex.OmegaVersion
tex.Omegarevision
tex.eTeXVersion
tex.eTeXrevision
tex.formatname
tex.jobname
tex.luatexrevision
tex.luatexdatestamp
tex.pdfnormaldeviate
tex.pdftexbanner
tex.pdftexrevision
tex.fontname(number)
tex.pdffontname(number)
tex.pdffontobjnum(number)
tex.pdffontsize(number)
tex.uniformdeviate(number)
tex.number(number)
tex.romannumeral(number)
tex.pdfpageref(number)
tex.pdfxformname(number)
tex.fontidentifier(number)
\stoptyping
\stopcolumns

If you are wondering why this list looks haphazard, these are all the cases of the \quote{convert} internal command that do not require an argument, as well as the ones that require only a simple numeric value.

The special (lua-only) case of \type{tex.fontidentifier} returns the \type{csname} string that matches a font id number (if there is one).

\subsection{Last item commands}

All \quote{last item} commands are read|-|only and return a number.
The supported commands at this moment are: \startcolumns[n=3] \starttyping tex.lastpenalty tex.lastkern tex.lastskip tex.lastnodetype tex.inputlineno tex.pdftexversion tex.pdflastobj tex.pdflastxform tex.pdflastximage tex.pdflastximagepages tex.pdflastannot tex.pdflastxpos tex.pdflastypos tex.pdfrandomseed tex.pdflastlink tex.luatexversion tex.Alephversion tex.Omegaversion tex.Alephminorversion tex.Omegaminorversion tex.eTeXminorversion tex.eTeXversion tex.currentgrouplevel tex.currentgrouptype tex.currentiflevel tex.currentiftype tex.currentifbranch tex.pdflastximagecolordepth \stoptyping \stopcolumns \subsection{Attribute, count, dimension, skip and token registers} \TEX's attributes (\tex{attribute}), counters (\tex{count}), dimensions (\tex{dimen}), skips (\tex{skip}) and token (\tex{toks}) registers can be accessed and written to using two times five virtual sub|-|tables of the \luatex{tex} table: \startcolumns[n=3] \starttyping tex.attribute tex.count tex.dimen tex.skip tex.toks \stoptyping \stopcolumns It is possible to use the names of relevant \tex{attributedef}, \tex{countdef}, \tex{dimendef}, \tex{skipdef}, or \tex{toksdef} control sequences as indices to these tables: \starttyping tex.count.scratchcounter = 0 enormous = tex.dimen['maxdimen'] \stoptyping In this case, \LUATEX\ looks up the value for you on the fly. You have to use a valid \tex{countdef} (or \tex{attributedef}, or \tex{dimendef}, or \tex{skipdef}, or \tex{toksdef}), anything else will generate an error (the intent is to eventually also allow \type{} and even macros that expand into a number). The attribute and count registers accept and return \LUA\ numbers. The dimension registers accept \LUA\ numbers (in scaled points) or strings (with an included absolute dimension; \type {em} and \type {ex} and \type {px} are forbidden). The result is always a number in scaled points. The token registers accept and return \LUA\ strings. 
\LUA\ strings are converted to and from token lists using \tex{the}\tex{toks} style expansion: all category codes are either space (10) or other (12). The skip registers accept and return \type{glue_spec} userdata node objects (see the description of the node interface elsewhere in this manual). As an alternative to array addressing, there are also accessor functions defined for all cases, for example, here is the set of possibilities for \type{\skip} registers: \startfunctioncall tex.setskip ( n, s) tex.setskip ( s, s) tex.setskip ('global', n, s) tex.setskip ('global', s, s) s = tex.getskip ( n) s = tex.getskip ( s) \stopfunctioncall In the function-based interface, it is possible to define values globally by using the string \type{'global'} as the first function argument. \subsection{Character code registers (0.63)} \TEX's character code tables (\tex{lccode}, \tex{uccode}, \tex{sfcode}, \tex{catcode}, \tex{mathcode}, \tex{delcode}) can be accessed and written to using six virtual subtables of the \type{tex} table \startcolumns[n=3] \starttyping tex.lccode tex.uccode tex.sfcode tex.catcode tex.mathcode tex.delcode \stoptyping \stopcolumns The function call interfaces are roughly as above, but there are a few twists. 
\type{sfcode}s are the simple ones: \startfunctioncall tex.setsfcode ( n, s) tex.setsfcode ('global', n, s) s = tex.getsfcode ( n) \stopfunctioncall The function call interface for \type{lccode} and \type{uccode} additionally allows you to set the associated sibling at the same time: \startfunctioncall tex.setlccode (['global'], n, lc) tex.setlccode (['global'], n, lc, uc) lc = tex.getlccode ( n) tex.setuccode (['global'], n, uc) tex.setuccode (['global'], n, uc, lc) uc = tex.getuccode ( n) \stopfunctioncall The function call interface for \type{catcode} also allows you to specify a category table to use on assignment or on query (default in both cases is the current one): \startfunctioncall tex.setcatcode (['global'], n, c) tex.setcatcode (['global'], cattable, n, c) lc = tex.getcatcode ( n) lc = tex.getcatcode ( cattable, n) \stopfunctioncall The interfaces for \type{delcode} and \type{mathcode} use small array tables to set and retrieve values: \startfunctioncall tex.setmathcode (['global'], n,
mval )
mval = tex.getmathcode ( n) tex.setdelcode (['global'], n,
dval )
dval = tex.getdelcode ( n) \stopfunctioncall Where the table for \type{mathcode} is an array of 3 numbers, like this: \starttyping { mathclass, family, character} \stoptyping And the table for \type{delcode} is an array with 4 numbers, like this: \starttyping { small_fam, small_char, large_fam, large_char} \stoptyping Normally, the third and fourth values in a delimiter code assignment will be zero according to \tex{Udelcode} usage, but the returned table can have values there (if the delimiter code was set using \type{\delcode}, for example). Unset \type{delcode}s can be recognized because \type{dval[1]} is $-1$. \subsection{Box registers} It is possible to set and query actual boxes, using the node interface as defined in the \luatex{node} library: \starttyping tex.box \stoptyping for array access, or \starttyping tex.setbox( n, s) tex.setbox('global', n, s) n = tex.getbox( n) \stoptyping for function|-|based access. In the function-based interface, it is possible to define values globally by using the string \type{'global'} as the first function argument. Be warned that an assignment like \starttyping tex.box[0] = tex.box[2] \stoptyping does not copy the node list, it just duplicates a node pointer. If \tex{box2} is cleared by \TEX\ commands later on, the contents of \tex{box0} become invalid as well. To prevent this from happening, always use \luatex{node.copy_list()} unless you are assigning to a temporary variable: \starttyping tex.box[0] = node.copy_list(tex.box[2]) \stoptyping %{\bf note: In previous versions of \LUATEX\ there were also three %virtual tables called \type{tex.wd}, \type{tex.ht}, and \type{tex.dp} %along with an associated function call interface. These were %removed in version 0.63. You should switch to using \type{tex.box[].width} %etc.
instead.} % %If for some reason you want the functionality of these tables back, %you can add \LUA\ code to do that for you, like this: % %\starttyping %local box = tex.box % %local wd = { % __index = function(t,k) local bk = box[k] return bk and bk.width or 0 end, % __newindex = function(t,k,v) local bk = box[k] if bk then bk.width = v end end, %} %local ht = { % __index = function(t,k) local bk = box[k] return bk and bk.height or 0 end, % __newindex = function(t,k,v) local bk = box[k] if bk then bk.height = v end end, %} %local dp = { % __index = function(t,k) local bk = box[k] return bk and bk.depth or 0 end, % __newindex = function(t,k,v) local bk = box[k] if bk then bk.depth = v end end, %} % %tex.wd = { } setmetatable(tex.wd,wd) %tex.ht = { } setmetatable(tex.ht,ht) %tex.dp = { } setmetatable(tex.dp,dp) %\stoptyping \subsection{Math parameters} It is possible to set and query the internal math parameters using: \startfunctioncall tex.setmath( n, t, n) tex.setmath('global', n, t, n) n = tex.getmath( n, t) \stopfunctioncall As before an optional first parameter \type{'global'} indicates a global assignment. The first string is the parameter name minus the leading \quote{Umath}, and the second string is the style name minus the trailing \quote{style}. 
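As a minimal sketch of the two calls described above: the fragment below reads the value of \type{\Umathquad} for text style and then sets it globally to twice that value. The choice of parameter, style and scale factor is an assumption made purely for illustration, and the code presumes that a math font providing this parameter has already been set up.

\starttyping
-- Sketch only: 'quad' and 'text' are the names with the leading
-- "Umath" and the trailing "style" removed.
local quad = tex.getmath('quad', 'text')         -- value in scaled points
tex.setmath('global', 'quad', 'text', 2 * quad)  -- global assignment
\stoptyping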
Just to be complete, the values for the math parameter name are: \starttyping quad axis operatorsize overbarkern overbarrule overbarvgap underbarkern underbarrule underbarvgap radicalkern radicalrule radicalvgap radicaldegreebefore radicaldegreeafter radicaldegreeraise stackvgap stacknumup stackdenomdown fractionrule fractionnumvgap fractionnumup fractiondenomvgap fractiondenomdown fractiondelsize limitabovevgap limitabovebgap limitabovekern limitbelowvgap limitbelowbgap limitbelowkern underdelimitervgap underdelimiterbgap overdelimitervgap overdelimiterbgap subshiftdrop supshiftdrop subshiftdown subsupshiftdown subtopmax supshiftup supbottommin supsubbottommax subsupvgap spaceafterscript connectoroverlapmin ordordspacing ordopspacing ordbinspacing ordrelspacing ordopenspacing ordclosespacing ordpunctspacing ordinnerspacing opordspacing opopspacing opbinspacing oprelspacing opopenspacing opclosespacing oppunctspacing opinnerspacing binordspacing binopspacing binbinspacing binrelspacing binopenspacing binclosespacing binpunctspacing bininnerspacing relordspacing relopspacing relbinspacing relrelspacing relopenspacing relclosespacing relpunctspacing relinnerspacing openordspacing openopspacing openbinspacing openrelspacing openopenspacing openclosespacing openpunctspacing openinnerspacing closeordspacing closeopspacing closebinspacing closerelspacing closeopenspacing closeclosespacing closepunctspacing closeinnerspacing punctordspacing punctopspacing punctbinspacing punctrelspacing punctopenspacing punctclosespacing punctpunctspacing punctinnerspacing innerordspacing inneropspacing innerbinspacing innerrelspacing inneropenspacing innerclosespacing innerpunctspacing innerinnerspacing \stoptyping The values for the style parameter name are: \starttyping display crampeddisplay text crampedtext script crampedscript scriptscript crampedscriptscript \stoptyping \subsection{Special list heads} The virtual table \luatex{tex.lists} contains the set of internal registers that 
keep track of building page lists. \starttabulate[|lT|p|] \NC \bf field \NC \bf description \NC \NR \NC page_ins_head \NC circular list of pending insertions \NC \NR \NC contrib_head \NC the recent contributions \NC \NR \NC page_head \NC the current page content\NC \NR %\NC temp_head \NC \NC \NR \NC hold_head \NC used for held-over items for next page\NC \NR \NC adjust_head \NC head of the current \tex{vadjust} list \NC \NR \NC pre_adjust_head \NC head of the current \tex{vadjust pre} list\NC \NR % \NC align_head \NC \NC \NR \stoptabulate \subsection{Semantic nest levels (0.51)} The virtual table \luatex{tex.nest} contains the currently active semantic nesting state. It has two main parts: a zero-based array of userdata for the semantic nest itself, and the numerical value \type{tex.nest.ptr}, which gives the highest available index. Neither the array items in \type{tex.nest[]} nor \type{tex.nest.ptr} can be assigned to (as this would confuse the typesetting engine beyond repair), but you can assign to the individual values inside the array items, e.g. \type{tex.nest[tex.nest.ptr].prevdepth}. \type{tex.nest[tex.nest.ptr]} is the current nest state, \type{tex.nest[0]} the outermost (main vertical list) level. The known fields are: \starttabulate[|lT|l|l|p|] \NC \ssbf key \NC \bf type \NC \bf modes \NC \bf explanation \NC\NR \NC mode \NC number \NC all \NC The current mode. 
This is a number representing the main mode at this level:\crlf 0 = no mode (this happens during \type{\write}),\crlf 1 = vertical,\crlf 127 = horizontal,\crlf 253 = display math,\crlf $-1$ = internal vertical,\crlf $-127$ = restricted horizontal,\crlf $-253$ = inline math.\NC\NR \NC modeline \NC number \NC all \NC source input line where this mode was entered, negative inside the output routine.\NC\NR \NC head \NC node \NC all \NC the head of the current list\NC\NR \NC tail \NC node \NC all \NC the tail of the current list\NC\NR \NC prevgraf \NC number \NC vmode \NC number of lines in the previous paragraph\NC\NR \NC prevdepth \NC number \NC vmode \NC depth of the previous paragraph (equal to \type{\pdfignoreddimen} when it is to be ignored)\NC\NR \NC spacefactor \NC number \NC hmode \NC the current space factor\NC\NR \NC dirs \NC node \NC hmode \NC used for temporary storage by the line break algorithm\NC\NR \NC noad \NC node \NC mmode \NC used for temporary storage of a pending fraction numerator, for \type{\over} etc.\NC\NR \NC delimptr \NC node \NC mmode \NC used for temporary storage of the previous math delimiter, for \type{\middle}.\NC\NR \NC mathdir \NC boolean \NC mmode \NC true when during math processing the \type{\mathdir} is not the same as the surrounding \type{\textdir}\NC\NR \NC mathstyle \NC number \NC mmode \NC the current \type{\mathstyle} \NC\NR \stoptabulate \subsection{Print functions} The \luatex{tex} table also contains the three print functions that are the major interface from \LUA\ scripting to \TEX. The arguments to these three functions are all stored in an in|-|memory virtual file that is fed to the \TEX\ scanner as the result of the expansion of \tex{directlua}. The total amount of returnable text from a \tex{directlua} command is only limited by available system \RAM. However, each separate printed string has to fit completely in \TEX's input buffer.
The result of using these functions from inside callbacks is undefined at the moment. \subsubsection{\luatex{tex.print}} \startfunctioncall tex.print( s, ...) tex.print( n, s, ...) tex.print(
t) tex.print( n,
t) \stopfunctioncall Each string argument is treated by \TEX\ as a separate input line. If there is a table argument instead of a list of strings, this has to be a consecutive array of strings to print (the first non-string value will stop the printing process). This syntax was added in 0.36. The optional parameter can be used to print the strings using the catcode regime defined by \tex{catcodetable}~\type{n}. If \type{n} is $-1$, the currently active catcode regime is used. If \type{n} is $-2$, the resulting catcodes are the result of \type{\the\toks}: all category codes are 12 (other) except for the space character, that has category code 10 (space). Otherwise, if \type{n} is not a valid catcode table, then it is ignored, and the currently active catcode regime is used instead. The very last string of the very last \luatex{tex.print()} command in a \tex{directlua} will not have the \tex{endlinechar} appended, all others do. \subsubsection{\luatex{tex.sprint}} \startfunctioncall tex.sprint( s, ...) tex.sprint( n, s, ...) tex.sprint(
t) tex.sprint( n,
t) \stopfunctioncall Each string argument is treated by \TEX\ as a special kind of input line that makes it suitable for use as a partial line input mechanism: \startitemize[packed] \item \TEX\ does not switch to the \quote{new line} state, so that leading spaces are not ignored. \item No \tex{endlinechar} is inserted. \item Trailing spaces are not removed. Note that this does not prevent \TEX\ itself from eating spaces as result of interpreting the line. For example, in \starttyping before\directlua{tex.sprint("\\relax")tex.sprint(" inbetween")}after \stoptyping the space before \type{inbetween} will be gobbled as a result of the \quote{normal} scanning of \tex{relax}. \stopitemize If there is a table argument instead of a list of strings, this has to be a consecutive array of strings to print (the first non-string value will stop the printing process). This syntax was added in 0.36. The optional argument sets the catcode regime, as with \type{tex.print()}. \subsubsection{\luatex{tex.tprint}} \startfunctioncall tex.tprint({ n, s, ...}, {...}) \stopfunctioncall This function is basically a shortcut for repeated calls to \luatex{tex.sprint( n, s, ...)}, once for each of the supplied argument tables. \subsubsection{\luatex{tex.write}} \startfunctioncall tex.write( s, ...) tex.write(
t) \stopfunctioncall Each string argument is treated by \TEX\ as a special kind of input line that makes it suitable for use as a quick way to dump information: \startitemize \item All catcodes on that line are either \quote{space} (for '~') or \quote{character} (for all others). \item There is no \tex{endlinechar} appended. \stopitemize If there is a table argument instead of a list of strings, this has to be a consecutive array of strings to print (the first non-string value will stop the printing process). This syntax was added in 0.36. \subsection{Helper functions} \subsubsection{\luatex{tex.round}} \startfunctioncall n = tex.round( o) \stopfunctioncall Rounds the \LUA\ number \type{o}, and returns a number that is in the range of a valid \TEX\ register value. If the number starts out of range, it generates a \quote{number too big} error as well. \subsubsection{\luatex{tex.scale}} \startfunctioncall n = tex.scale( o, delta)
n = tex.scale(table o, delta) \stopfunctioncall Multiplies the \LUA\ numbers \type{o} and \type{delta}, and returns a rounded number that is in the range of a valid \TEX\ register value. In the table version, it creates a copy of the table with all numeric top||level values scaled in that manner. If the multiplied number(s) are out of range, it generates \quote{number too big} error(s) as well. Note: the precision of the output of this function will depend on your computer's architecture and operating system, so use with care! An interface to \LUATEX's internal, 100\% portable scale function will be added at a later date. \subsubsection{\luatex{tex.sp} (0.51)} \startfunctioncall n = tex.sp( o) n = tex.sp( s) \stopfunctioncall Converts the number \type{o} or a string \type{s} that represents an explicit dimension into an integer number of scaled points. For parsing the string, the same scanning and conversion rules are used that \LUATEX\ would use if it were scanning a dimension specifier in its \TEX-like input language (this includes generating errors for bad values), except for the following: \startitemize[n] \item only explicit values are allowed, control sequences are not handled \item infinite dimension units (\type{fil...}) are forbidden \item \type{mu} units do not generate an error (but may not be useful either) \stopitemize \subsubsection{\luatex{tex.definefont}} \startfunctioncall tex.definefont( csname, fontid) tex.definefont( global, csname, fontid) \stopfunctioncall Associates \type{csname} with the internal font number \type{fontid}. The definition is global if (and only if) \type{global} is specified and true (the setting of \type{globaldefs} is not taken into account). \subsubsection{\luatex{tex.error} (0.61)} \startfunctioncall tex.error( s) tex.error( s,
help) \stopfunctioncall This creates an error somewhat like the combination of \tex{errhelp} and \tex{errmessage} would. During this error, deletions are disabled. The array part of the \type{help} table has to contain strings, one for each line of error help. \subsection[luaprimitives]{Functions for dealing with primitives } \subsubsection{\luatex{tex.enableprimitives}} \startfunctioncall tex.enableprimitives( prefix,
primitive names) \stopfunctioncall This function accepts a prefix string and an array of primitive names. For each combination of \quote{prefix} and \quote{name}, \type{tex.enableprimitives} first verifies that \quote{name} is an actual primitive (it must be returned by one of the \type{tex.extraprimitives()} calls explained below, or part of \TEX82, or \type{\directlua}). If it is not, \type{tex.enableprimitives} does nothing and skips to the next pair. But if it is, then it will construct a csname variable by concatenating the \quote{prefix} and \quote{name}, unless the \quote{prefix} is already the actual prefix of \quote{name}. In the latter case, it will discard the \quote{prefix}, and just use \quote{name}. Then it will check for the existence of the constructed csname. If the csname is currently undefined (note: that is not the same as \type{\relax}), it will globally define the csname to have the meaning: run code belonging to the primitive \quote{name}. If for some reason the csname is already defined, it does nothing and tries the next pair. An example: \starttyping tex.enableprimitives('LuaTeX', {'formatname'}) \stoptyping will define \type{\LuaTeXformatname} with the same intrinsic meaning as the documented primitive \type{\formatname}, provided that the control sequence \type{\LuaTeXformatname} is currently undefined. Second example: \starttyping tex.enableprimitives('Omega',tex.extraprimitives ('omega')) \stoptyping will define a whole series of csnames like \type{\Omegatextdir}, \type{\Omegapardir}, etc., but it will stick with \type{\OmegaVersion} instead of creating the doubly-prefixed \type{\OmegaOmegaVersion}. Starting with version 0.39.0 (and this is why the above two functions are needed), \LUATEX\ in \type{--ini} mode contains only the \TEX82 primitives and \type{\directlua}, no extra primitives {\bf at all}.
So, if you want to have all the new functionality available using their default names, as it is now, you will have to add \starttyping \ifx\directlua\undefined \else \directlua {tex.enableprimitives('',tex.extraprimitives ())} \fi \stoptyping near the beginning of your format generation file. Or you can choose different prefixes for different subsets, as you see fit. Calling some form of \type{tex.enableprimitives()} is highly important though, because if you do not, you will end up with a \TEX82-lookalike that can run lua code but not do much else. The defined csnames are (of course) saved in the format and will be available at runtime. \subsubsection{\luatex{tex.extraprimitives}} \startfunctioncall
t = tex.extraprimitives( s, ...) \stopfunctioncall This function returns a list of the primitives that originate from the engine(s) given by the requested string value(s). The possible values and their (current) return values are: \startluacode function out_prim (a) local v = tex.extraprimitives(a) table.sort(v) for _,n in pairs(v) do if n == ' ' then n = '\\normalcontrolspace' end tex.print(n .. '\\hskip 4pt plus 5em') end end \stopluacode \starttabulate[|l|p|] \NC \bf name\NC \bf values \NC \NR \NC tex \NC \ctxlua{out_prim('tex') } \NC \NR \NC core \NC \ctxlua{out_prim('core') } \NC \NR \NC etex \NC \ctxlua{out_prim('etex') } \NC \NR \NC pdftex \NC \ctxlua{out_prim('pdftex') } \NC \NR \NC omega \NC \ctxlua{out_prim('omega') } \NC \NR \NC aleph \NC \ctxlua{out_prim('aleph') } \NC \NR \NC luatex \NC \ctxlua{out_prim('luatex') } \NC \NR \stoptabulate Note that \type{'luatex'} does not contain \type{directlua}, as that is considered to be a core primitive, along with all the \TEX82 primitives, so it is part of the list that is returned from \type{'core'}. Running \type{tex.extraprimitives()} will give you the complete list of primitives that are not defined at \LUATEX\ 0.39.0 \type{-ini} startup. It is exactly equivalent to \type{tex.extraprimitives('etex', 'pdftex', 'omega', 'aleph', 'luatex')} \subsubsection{\luatex{tex.primitives}} \startfunctioncall
t = tex.primitives() \stopfunctioncall This function returns a hash table listing all primitives that \LUATEX\ knows about. The keys in the hash are primitive names, the values are tables representing tokens (see~\in{section }[luatokens]). The third value in each of these tables is always zero. \subsection{Core functionality interfaces} \subsubsection{\luatex{tex.badness} (0.53)} \startfunctioncall b = tex.badness( f, s) \stopfunctioncall This helper function is useful during linebreak calculations. \type{f} and \type{s} are scaled values; the function returns the badness for when total \type{f} is supposed to be made from amounts that sum to \type{s}. The returned number is a reasonable approximation of $100(f/s)^3$. \subsubsection{\luatex{tex.linebreak} (0.53)} \startfunctioncall local nodelist,
info = tex.linebreak( listhead,
parameters) \stopfunctioncall The understood parameters are as follows: \starttabulate[|l|l|p|] \NC \bf name \NC \bf type \NC \bf description \NC \NR \NC pardir \NC string \NC \NC \NR \NC pretolerance \NC number \NC \NC \NR \NC tracingparagraphs \NC number \NC \NC \NR \NC tolerance \NC number \NC \NC \NR \NC looseness \NC number \NC \NC \NR \NC hyphenpenalty \NC number \NC \NC \NR \NC exhyphenpenalty \NC number \NC \NC \NR \NC pdfadjustspacing \NC number \NC \NC \NR \NC adjdemerits \NC number \NC \NC \NR \NC pdfprotrudechars \NC number \NC \NC \NR \NC linepenalty \NC number \NC \NC \NR \NC lastlinefit \NC number \NC \NC \NR \NC doublehyphendemerits \NC number \NC \NC \NR \NC finalhyphendemerits \NC number \NC \NC \NR \NC hangafter \NC number \NC \NC \NR \NC interlinepenalty \NC number or table \NC if a table, then it is an array like \type{\interlinepenalties}\NC \NR \NC clubpenalty \NC number or table \NC if a table, then it is an array like \type{\clubpenalties}\NC \NR \NC widowpenalty \NC number or table \NC if a table, then it is an array like \type{\widowpenalties}\NC \NR \NC brokenpenalty \NC number \NC \NC \NR \NC emergencystretch \NC number \NC in scaled points \NC \NR \NC hangindent \NC number \NC in scaled points \NC \NR \NC hsize \NC number \NC in scaled points \NC \NR \NC leftskip \NC glue_spec node \NC \NC \NR \NC rightskip \NC glue_spec node \NC \NC \NR \NC pdfeachlineheight \NC number \NC in scaled points \NC \NR \NC pdfeachlinedepth \NC number \NC in scaled points \NC \NR \NC pdffirstlineheight \NC number \NC in scaled points \NC \NR \NC pdflastlinedepth \NC number \NC in scaled points \NC \NR \NC pdfignoreddimen \NC number \NC in scaled points \NC \NR \NC parshape \NC table \NC \NC \NR \stoptabulate Note that there is no interface for \type{\displaywidowpenalties}, you have to pass the right choice for \type{widowpenalties} yourself. 
The meaning of the various keys should be fairly obvious from the table (the names match the \TEX\ and \PDFTEX\ primitives) except for the last 5 entries. The four \type{pdf...line...} keys are ignored if their value equals \type{pdfignoreddimen}. It is your own job to make sure that \type{listhead} is a proper paragraph list: this function does not add any nodes to it. To be exact, if you want to replace the core line breaking, you may have to do the following (when you are not actually working in the \type{pre_linebreak_filter} or \type{linebreak_filter} callbacks, or when the original list starting at listhead was generated in horizontal mode): \startitemize \item add an \quote{indent box} and perhaps a \type{local_par} node at the start (only if you need them) \item replace any found final glue by an infinite penalty (or add such a penalty, if the last node is not a glue) \item add a glue node for the \type{\parfillskip} after that penalty node \item make sure all the \type{prev} pointers are OK \stopitemize The result is a node list, it still needs to be vpacked if you want to assign it to a \tex{vbox}. The returned \type{info} table contains four values that are all numbers: \starttabulate[|l|p|] \NC prevdepth \NC depth of the last line in the broken paragraph \NC \NR \NC prevgraf \NC number of lines in the broken paragraph \NC \NR \NC looseness \NC the actual looseness value in the broken paragraph \NC \NR \NC demerits \NC the total demerits of the chosen solution \NC \NR \stoptabulate Note there are a few things you cannot interface using this function: You cannot influence font expansion other than via \type{pdfadjustspacing}, because the settings for that take place elsewhere. The same is true for hbadness and hfuzz etc. All these are in the \type{hpack()} routine, and that fetches its own variables via globals. 
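The following fragment is a minimal sketch of a call to \type{tex.linebreak}, not a complete replacement for the built|-|in paragraph builder. It assumes that box~0 holds an \type{\hbox} whose list has already been prepared as described in the steps above; the box numbers, the width of 200pt and the tolerance value are arbitrary choices made for illustration.

\starttyping
-- Sketch only: break the (already prepared) list from box 0
-- into lines of 200pt and store the result in box 2.
local head = node.copy_list(tex.box[0].list)  -- work on a copy
local lines, info = tex.linebreak(head, {
    hsize     = tex.sp('200pt'),  -- scaled points
    tolerance = 500,
})
-- the result is a plain node list, so vpack it before use
tex.box[2] = node.vpack(lines)
texio.write_nl('log', 'lines: ' .. info.prevgraf ..
                      ', demerits: ' .. info.demerits)
\stoptyping

Keys that are left out of the parameter table are not set here; the sketch passes only the two values it needs to make the example readable.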
\subsubsection{\luatex{tex.shipout} (0.51)} \startfunctioncall tex.shipout( n) \stopfunctioncall Ships out box number \type{n} to the output file, and clears the box register. \section[texconfig]{The \luatex{texconfig} table} This is a table that is created empty. A startup \LUA\ script could fill this table with a number of settings that are read out by the executable after loading and executing the startup file. \starttabulate[|lT|l|l|p|] \NC \ssbf key \NC \bf type \NC \bf default \NC \bf explanation \NC\NR \NC kpse_init \NC boolean \NC true \NC \type{false} totally disables \KPATHSEA\ initialisation, and enables interpretation of the following numeric key--value pairs. (only ever unset this if you implement {\it all\/} file find callbacks!)\NC \NR \NC shell_escape \NC string\NC \type{'f'}\NC Use \type{'y'} or \type{'t'} or \type{'1'} to enable \type{\write18} unconditionally, \type{'p'} to enable the commands that are listed in \type{shell_escape_commands} (new in 0.37)\NC\NR \NC shell_escape_commands \NC string\NC \NC Comma-separated list of command names that may be executed by \type{\write18} even if \type{shell_escape} is set to \type{'p'}. 
Do {\it not\/} use spaces around commas, separate any required command arguments by using a space, and use the ASCII double quote (\type{"}) for any needed argument or path quoting (new in 0.37)\NC\NR \NC string_vacancies \NC number\NC 75000\NC cf.\ web2c docs \NC \NR \NC pool_free \NC number\NC 5000\NC cf.\ web2c docs \NC \NR \NC max_strings \NC number\NC 15000\NC cf.\ web2c docs \NC \NR \NC strings_free \NC number\NC 100\NC cf.\ web2c docs \NC \NR \NC nest_size \NC number\NC 50\NC cf.\ web2c docs \NC \NR \NC max_in_open \NC number\NC 15\NC cf.\ web2c docs \NC \NR \NC param_size \NC number\NC 60\NC cf.\ web2c docs \NC \NR \NC save_size \NC number\NC 4000\NC cf.\ web2c docs \NC \NR \NC stack_size \NC number\NC 300\NC cf.\ web2c docs \NC \NR \NC dvi_buf_size \NC number\NC 16384\NC cf.\ web2c docs \NC \NR \NC error_line \NC number\NC 79\NC cf.\ web2c docs \NC \NR \NC half_error_line \NC number\NC 50\NC cf.\ web2c docs \NC \NR \NC max_print_line \NC number\NC 79\NC cf.\ web2c docs \NC \NR \NC hash_extra \NC number\NC 0\NC cf.\ web2c docs \NC \NR \NC pk_dpi \NC number\NC 72\NC cf.\ web2c docs \NC \NR \NC trace_file_names \NC boolean \NC true \NC \type{false} disables \TEX's normal file open|-|close feedback (the assumption is that callbacks will take care of that) \NC \NR \NC file_line_error \NC boolean \NC false \NC do \type{file:line} style error messages\NC \NR \NC halt_on_error \NC boolean \NC false \NC abort run on the first encountered error\NC \NR \NC formatname \NC string \NC \NC if no format name was given on the commandline, this key will be tested first instead of simply quitting\NC \NR \NC jobname \NC string \NC \NC if no input file name was given on the commandline, this key will be tested first instead of simply giving up\NC \NR \stoptabulate {\bf Note:} the numeric values that match web2c parameters are only used if \type{kpse_init} is explicitly set to \type{false}. In all other cases, the normal values from \type{texmf.cnf} are used. 
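As an illustration, a startup script might fill the table like this. The file name \type{myinit.lua} and the command names in the list are placeholders, not recommendations; only the keys themselves are taken from the table above.

\starttyping
-- Hypothetical startup script, loaded with: luatex --lua=myinit.lua
texconfig.file_line_error = true   -- file:line style error messages
texconfig.halt_on_error   = true   -- stop at the first error
texconfig.shell_escape    = 'p'    -- restricted \write18
-- no spaces around the comma; the command names are examples only
texconfig.shell_escape_commands = 'kpsewhich,repstopdf'
\stoptyping

Because \type{kpse_init} is left at its default here, the numeric web2c|-|style keys would be ignored and are therefore not set in this sketch.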
\section{The \luatex{texio} library} This library takes care of the low|-|level I/O interface. \subsection{Printing functions} \subsubsection{\luatex{texio.write}} \startfunctioncall texio.write( target, s, ...) texio.write( s, ...) \stopfunctioncall Without the \type{target} argument, writes all given strings to the same location(s) \TEX\ writes messages to at this moment. If \tex{batchmode} is in effect, it writes only to the log, otherwise it writes to the log and the terminal. The optional \type{target} can be one of three possibilities: \type{term}, \type{log} or \type {term and log}. Note: If several strings are given, and if the first of these strings is or might be one of the targets above, the \type{target} must be specified explicitly to prevent \LUA\ from interpreting the first string as the target. \subsubsection{\luatex{texio.write_nl}} \startfunctioncall texio.write_nl( target, s, ...) texio.write_nl( s, ...) \stopfunctioncall This function behaves like \luatex{texio.write}, but makes sure that the given strings will appear at the beginning of a new line. You can pass a single empty string if you only want to move to the next line. %*********************************************************************** \section[luatokens]{The \luatex{token} library} The \luatex{token} table contains interface functions to \TEX's handling of tokens. These functions are most useful when combined with the \luatex{token_filter} callback, but they could be used standalone as well. A token is represented in \LUA\ as a small table.
For the moment, this table consists of three numeric entries: \starttabulate[|l|l|p|] \NC \bf index\NC \bf meaning \NC \bf description \NC \NR \NC 1 \NC command code \NC this is a value between~$0$ and~$130$ (approximately)\NC \NR \NC 2 \NC command modifier \NC this is a value between~$0$ and~$2^{21}$ \NC \NR \NC 3 \NC control sequence id \NC for commands that are not the result of control sequences, like letters and characters, it is zero, otherwise, it is a number pointing into the \quote {equivalence table} \NC \NR \stoptabulate \subsection{\luatex{token.get_next}} \startfunctioncall token t = token.get_next() \stopfunctioncall This fetches the next input token from the current input source, without expansion. \subsection{\luatex{token.is_expandable}} \startfunctioncall b = token.is_expandable( t) \stopfunctioncall This tests if the token \type{t} could be expanded. \subsection{\luatex{token.expand}} \startfunctioncall token.expand( t) \stopfunctioncall If a token is expandable, this will expand one level of it, so that the first token of the expansion will now be the next token to be read by \luatex{token.get_next()}. \subsection{\luatex{token.is_activechar}} \startfunctioncall b = token.is_activechar( t) \stopfunctioncall This is a special test that is sometimes handy. Discovering whether some control sequence is the result of an active character turned out to be very hard otherwise. \subsection{\luatex{token.create}} \startfunctioncall token t = token.create( csname) token t = token.create( charcode) token t = token.create( charcode, catcode) \stopfunctioncall This is the token factory. If you feed it a string, then it is the name of a control sequence (without leading backslash), and it will be looked up in the equivalence table. If you feed it a number, then this is assumed to be an input character, and an optional second number gives its category code.
This means it is possible to overrule a character's category code, with a few exceptions: the category codes~0 (escape), 9~(ignored), 13~(active), 14~(comment), and 15 (invalid) cannot occur inside a token. The values~0, 9, 14 and~15 are therefore illegal as input to \luatex{token.create()}, and active characters will be resolved immediately. Note: unknown string sequences and never defined active characters will result in a token representing an \quote{undefined control sequence} with a near|-|random name. It is {\em not} possible to define brand new control sequences using \luatex{token.create}! \subsection{\luatex{token.command_name}} \startfunctioncall commandname = token.command_name( t) \stopfunctioncall This returns the name associated with the \quote{command} value of the token in \LUATEX. There is not always a direct connection between these names and primitives. For instance, all \tex{ifxxx} tests are grouped under \type {if_test}, and the \quote{command modifier} defines which test is to be run. \subsection{\luatex{token.command_id}} \startfunctioncall i = token.command_id( commandname) \stopfunctioncall This returns a number that is the inverse operation of the previous command, to be used as the first item in a token table. \subsection{\luatex{token.csname_name}} \startfunctioncall csname = token.csname_name( t) \stopfunctioncall This returns the name associated with the \quote{equivalence table} value of the token in \LUATEX. It returns the string value of the command used to create the current token, or an empty string if there is no associated control sequence. Keep in mind that there are potentially two control sequences that return the same csname string: single character control sequences and active characters have the same \quote{name}. 
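A short sketch of how the functions described so far fit together: it creates the token for \type{\relax} and writes some of its properties to the log. The choice of \type{relax} is only an example, and no particular output is claimed here.

\starttyping
-- Sketch only: inspect the token that belongs to \relax
local t = token.create('relax')       -- name without the backslash
texio.write_nl('log', token.command_name(t))  -- name for t[1]
texio.write_nl('log', token.csname_name(t))   -- name for t[3]
texio.write_nl('log', tostring(token.is_expandable(t)))
\stoptyping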
\subsection{\luatex{token.csname_id}}

\startfunctioncall
i = token.csname_id( csname)
\stopfunctioncall

This is the inverse operation of the previous command: it returns a number to be used as the third item in a token table.

\chapter[math]{Math}

The handling of mathematics in \LUATEX\ differs quite a bit from how \TEX82 (and therefore \PDFTEX) handles math. First, \LUATEX\ adds primitives and extends some others so that \UNICODE\ input can be used easily. Second, all of \TEX82's internal special values (for example for operator spacing) have been made accessible and changeable via control sequences. Third, there are extensions that make it easier to use \OPENTYPE\ math fonts. And finally, there are some extensions that have been proposed in the past that are now added to the engine.

\section{The current math style}

Starting with \LUATEX\ 0.39.0, it is possible to discover the math style that will be used for a formula in an expandable fashion (while the math list is still being read). To make this possible, \LUATEX\ adds the new primitive: \type{\mathstyle}. This is a \quote{convert command} like e.g. \type{\romannumeral}: its value can only be read, not set.

\subsection{\tex{mathstyle}}

The returned value is between 0 and 7 (in math mode), or $-1$ (all other modes). For easy testing, the eight math style commands have been altered so that they can be used as numeric values, so you can write code like this:

\starttyping
\ifnum\mathstyle=\textstyle
    \message{normal text style}
\else \ifnum\mathstyle=\crampedtextstyle
    \message{cramped text style}
\fi \fi
\stoptyping

\subsection{\tex{Ustack}}

There are a few math commands in \TEX\ where the style that will be used is not known straight from the start. These commands (\tex{over}, \tex{atop}, \tex{overwithdelims}, \tex{atopwithdelims}) would therefore normally return wrong values for \type{\mathstyle}.
To fix this, \LUATEX\ introduces a special prefix command: \type{\Ustack}: \starttyping $\Ustack {a \over b}$ \stoptyping The \type{\Ustack} command will scan the next brace and start a new math group with the correct (numerator) math style. \section{Unicode math characters} Character handling is now extended up to the full \UNICODE\ range. The extension from 8-bit to 16-bit was already present in \ALEPH\ by means of a set of extra primitives starting with the \type{\o} prefix, the extension to full \UNICODE\ (the \type{\U} prefix) is compatible with \XETEX. The math primitives from \TEX\ and \ALEPH\ are kept as they are, except for the ones that convert from input to math commands: \type{mathcode}, \type{omathcode}, \type{delcode}, and \type{odelcode}. These four now allow for a 21-bit character argument on the left hand side of the equals sign. Some of the \ALEPH\ math primitives and the new \LUATEX\ primitives read more than one separate value. This is shown in the tables below by a plus sign in the second column. 
The input for such primitives would look like this: \starttyping \def\overbrace {\Umathaccent 0 1 "23DE } \stoptyping Altered \TEX82 primitives: \starttabulate[|l|l|l|] \NC \bf primitive \NC \bf value range (in hex) \NC\NR \NC \tex{mathcode} \NC 0--10FFFF = 0--8000 \NC\NR \NC \tex{delcode} \NC 0--10FFFF = 0--FFFFFF \NC\NR \stoptabulate Unaltered: \starttabulate[|l|l|l|] \NC \bf primitive \NC \bf value range (in hex) \NC\NR \NC \tex{mathchardef} \NC 0--8000 \NC\NR \NC \tex{mathchar} \NC 0--7FFF \NC\NR \NC \tex{mathaccent} \NC 0--7FFF \NC\NR \NC \tex{delimiter} \NC 0--7FFFFFF \NC\NR \NC \tex{radical} \NC 0--7FFFFFF \NC\NR \stoptabulate Altered \ALEPH\ primitives: \starttabulate[|l|l|l|] \NC \bf primitive \NC \bf value range (in hex) \NC\NR \NC \tex{omathcode} \NC 0--10FFFF = 0--8000000 \NC\NR \NC \tex{odelcode} \NC 0--10FFFF = 0+0--FFFFFF+FFFFFF \NC\NR \stoptabulate Unaltered: \starttabulate[|l|l|l|] \NC \bf primitive \NC \bf value range (in hex) \NC\NR \NC \tex{omathchardef} \NC 0--8000000 \NC\NR \NC \tex{omathchar} \NC 0--7FFFFFF \NC\NR \NC \tex{omathaccent} \NC 0--7FFFFFF \NC\NR \NC \tex{odelimiter} \NC 0+0--7FFFFFF + FFFFFF \NC\NR \NC \tex{oradical} \NC 0+0--7FFFFFF + FFFFFF \NC\NR \stoptabulate New primitives that are compatible with \XETEX: \starttabulate[|l|l|l|l|] \NC \bf primitive \NC \bf value range (in hex) \NC\NR \NC \tex{Umathchardef} \NC 0+0+0--7+FF+10FFFF$^1$ \NC\NR \NC \tex{Umathcode} \NC 0--10FFFF = 0+0+0--7+FF+10FFFF$^1$ \NC\NR \NC \tex{Udelcode} \NC 0--10FFFF = 0+0--FF+10FFFF$^2$ \NC\NR \NC \tex{Umathchar} \NC 0+0+0--7+FF+10FFFF \NC\NR \NC \tex{Umathaccent} \NC 0+0+0--7+FF+10FFFF$^{2,4}$ \NC\NR \NC \tex{Udelimiter} \NC 0+0+0--7+FF+10FFFF$^2$ \NC\NR \NC \tex{Uradical} \NC 0+0--FF+10FFFF$^2$ \NC\NR \NC \tex{Umathcharnum} \NC -80000000--7FFFFFFF$^3$ \NC\NR \NC \tex{Umathcodenum} \NC 0--10FFFF = -80000000--7FFFFFFF$^3$ \NC\NR \NC \tex{Udelcodenum} \NC 0--10FFFF = -80000000--7FFFFFFF$^3$ \NC\NR \stoptabulate Note 1: \type{\Umathchardef="8"0"0} and 
\type{\Umathchardef="8"0"0} are also accepted.

Note 2: The new primitives that deal with delimiter-style objects do not set up a \quote{large family}. Selecting a suitable size for display purposes is expected to be dealt with by the font via the \tex{Umathoperatorsize} parameter (more information in a following section).

Note 3: For these three primitives, all information is packed into a single signed integer. For the first two (\tex{Umathcharnum} and \tex{Umathcodenum}), the lowest 21 bits are the character code, the 3 bits above that represent the math class, and the family data is kept in the topmost bits (this means that the values for math families 128--255 are actually negative). For \tex{Udelcodenum} there is no math class; the math family information is stored in the bits directly on top of the character code. Using these three commands is not as natural as using the two- and three-value commands, so unless you know exactly what you are doing and absolutely require the speedup resulting from the faster input scanning, it is better to use the verbose commands instead.

Note 4: As of \LUATEX\ 0.65, \tex{Umathaccent} accepts optional keywords to control various details regarding math accents. See \in{section}[mathacc] below for details.
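
The packing described in Note 3 for \tex{Umathcharnum} and \tex{Umathcodenum} can be sketched in plain \LUA\ as follows. This is only an illustration under the stated bit layout (21 character bits, 3 class bits, family bits on top, stored in a signed 32-bit integer); the function names \type{pack_mathcharnum} and \type{unpack_mathcharnum} are invented for this example and do not exist in \LUATEX.

\starttyping
-- Hypothetical helpers that follow the bit layout described in Note 3.
local CLASS_UNIT  = 2097152      -- 2^21: the class sits above 21 character bits
local FAMILY_UNIT = 16777216     -- 2^24: the family sits above the 3 class bits
local WRAP        = 4294967296   -- 2^32: used for the signed 32-bit wrap-around

local function pack_mathcharnum(class, family, char)
  local v = family * FAMILY_UNIT + class * CLASS_UNIT + char
  if v >= WRAP / 2 then
    v = v - WRAP                 -- families 128--255 end up negative
  end
  return v
end

local function unpack_mathcharnum(v)
  if v < 0 then
    v = v + WRAP                 -- undo the signed wrap-around
  end
  local char   = v % CLASS_UNIT
  local class  = math.floor(v / CLASS_UNIT) % 8
  local family = math.floor(v / FAMILY_UNIT)
  return class, family, char
end

print(pack_mathcharnum(0, 1, 0x61))
print(unpack_mathcharnum(pack_mathcharnum(7, 255, 0x10FFFF)))
\stoptyping

Plain arithmetic is used instead of bit operators so that the sketch also runs on the \LUA\ 5.1 interpreter embedded in \LUATEX.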
New primitives that exist in \LUATEX\ only (all of these will be explained in following sections):

\starttabulate[|l|l|l|l|]
\NC \bf primitive \NC \bf value range (in hex) \NC\NR
%\NC \tex{Umathbotaccent} \NC 0+0+0--7+FF+10FFFF \NC\NR
%\NC \tex{Umathaccents} \NC 0+0+0+0+0+0--7+FF+10FFFF+7+FF+10FFFF \NC\NR
\NC \tex{Uroot} \NC 0+0--FF+10FFFF$^2$ \NC\NR
\NC \tex{Uoverdelimiter} \NC 0+0--FF+10FFFF$^2$ \NC\NR
\NC \tex{Uunderdelimiter} \NC 0+0--FF+10FFFF$^2$ \NC\NR
\NC \tex{Udelimiterover} \NC 0+0--FF+10FFFF$^2$ \NC\NR
\NC \tex{Udelimiterunder} \NC 0+0--FF+10FFFF$^2$ \NC\NR
\stoptabulate

\section{Cramped math styles}

\LUATEX\ has four new primitives to set the cramped math styles directly:

\starttyping
\crampeddisplaystyle
\crampedtextstyle
\crampedscriptstyle
\crampedscriptscriptstyle
\stoptyping

These additional commands are not all that valuable on their own, but they come in handy as arguments to the math parameter settings that will be added shortly.

\section{Math parameter settings}

In \LUATEX, the font dimension parameters that \TEX\ used in math typesetting are now accessible via primitive commands. In fact, refactoring of the math engine has resulted in many more parameters than were accessible before.
\starttabulate \NC \bf primitive name \NC \bf description \NC \NR \NC \type{\Umathquad} \NC the width of 18mu's\NC \NR \NC \type{\Umathaxis} \NC height of the vertical center axis of the math formula above the baseline\NC \NR \NC \type{\Umathoperatorsize} \NC minimum size of large operators in display mode \NC \NR \NC \type{\Umathoverbarkern} \NC vertical clearance above the rule \NC \NR \NC \type{\Umathoverbarrule} \NC the width of the rule \NC \NR \NC \type{\Umathoverbarvgap} \NC vertical clearance below the rule \NC \NR \NC \type{\Umathunderbarkern} \NC vertical clearance below the rule \NC \NR \NC \type{\Umathunderbarrule} \NC the width of the rule \NC \NR \NC \type{\Umathunderbarvgap} \NC vertical clearance above the rule \NC \NR \NC \type{\Umathradicalkern} \NC vertical clearance above the rule \NC \NR \NC \type{\Umathradicalrule} \NC the width of the rule \NC \NR \NC \type{\Umathradicalvgap} \NC vertical clearance below the rule \NC \NR \NC \type{\Umathradicaldegreebefore}\NC the forward kern that takes place before placement of the radical degree \NC \NR \NC \type{\Umathradicaldegreeafter} \NC the backward kern that takes place after placement of the radical degree \NC \NR \NC \type{\Umathradicaldegreeraise} \NC this is the percentage of the total height and depth of the radical sign that the degree is raised by. 
It is expressed in \type{percents}, so 60\% is expressed as the integer $60$.\NC \NR \NC \type{\Umathstackvgap} \NC vertical clearance between the two elements in a \type{\atop} stack \NC \NR \NC \type{\Umathstacknumup} \NC numerator shift upward in \type{\atop} stack \NC \NR \NC \type{\Umathstackdenomdown} \NC denominator shift downward in \type{\atop} stack\NC \NR \NC \type{\Umathfractionrule} \NC the width of the rule in a \type{\over}\NC \NR \NC \type{\Umathfractionnumvgap} \NC vertical clearance between the numerator and the rule\NC \NR \NC \type{\Umathfractionnumup} \NC numerator shift upward in \type{\over} \NC \NR \NC \type{\Umathfractiondenomvgap} \NC vertical clearance between the denominator and the rule\NC \NR \NC \type{\Umathfractiondenomdown} \NC denominator shift downward in \type{\over} \NC \NR \NC \type{\Umathfractiondelsize} \NC minimum delimiter size for \type{\...withdelims}\NC \NR \NC \type{\Umathlimitabovevgap} \NC vertical clearance for limits above operators\NC \NR \NC \type{\Umathlimitabovebgap} \NC vertical baseline clearance for limits above operators\NC \NR \NC \type{\Umathlimitabovekern} \NC space reserved at the top of the limit\NC \NR \NC \type{\Umathlimitbelowvgap} \NC vertical clearance for limits below operators\NC \NR \NC \type{\Umathlimitbelowbgap} \NC vertical baseline clearance for limits below operators\NC \NR \NC \type{\Umathlimitbelowkern} \NC space reserved at the bottom of the limit\NC \NR \NC \type{\Umathoverdelimitervgap} \NC vertical clearance for limits above delimiters\NC \NR \NC \type{\Umathoverdelimiterbgap} \NC vertical baseline clearance for limits above delimiters\NC \NR \NC \type{\Umathunderdelimitervgap} \NC vertical clearance for limits below delimiters\NC \NR \NC \type{\Umathunderdelimiterbgap} \NC vertical baseline clearance for limits below delimiters\NC \NR \NC \type{\Umathsubshiftdrop} \NC subscript drop for boxes and subformulas\NC \NR \NC \type{\Umathsubshiftdown} \NC subscript drop for characters\NC 
\NR
\NC \type{\Umathsupshiftdrop} \NC superscript drop (raise, actually) for boxes and subformulas\NC \NR
\NC \type{\Umathsupshiftup} \NC superscript raise for characters\NC \NR
\NC \type{\Umathsubsupshiftdown} \NC subscript drop in the presence of a superscript\NC \NR
\NC \type{\Umathsubtopmax} \NC the top of standalone subscripts cannot be higher than this above the baseline\NC \NR
\NC \type{\Umathsupbottommin} \NC the bottom of standalone superscripts cannot be less than this above the baseline\NC \NR
\NC \type{\Umathsupsubbottommax} \NC the bottom of the superscript of a combined super- and subscript must be at least as high as this above the baseline\NC \NR
\NC \type{\Umathsubsupvgap} \NC vertical clearance between super- and subscript\NC \NR
\NC \type{\Umathspaceafterscript} \NC additional space added after a super- or subscript\NC \NR
\NC \type{\Umathconnectoroverlapmin}\NC minimum overlap between parts in an extensible recipe\NC \NR
\stoptabulate

Each of the parameters in this section can be set by a command like this:

\starttyping
\Umathquad\displaystyle=1em
\stoptyping

They obey grouping, and you can use \type{\the\Umathquad\displaystyle} if needed.

\section{Font-based Math Parameters}

While it is nice to have these math parameters available for tweaking, it would be tedious to have to set each of them by hand. For this reason, \LUATEX\ initializes a bunch of these parameters whenever you assign a font identifier to a math family based on either the traditional math font dimensions in the font (for assignments to math family~2 and~3 using \TFM|-|based fonts like \type{cmsy} and \type{cmex}), or based on the named values in a potential \type{MathConstants} table when the font is loaded via Lua. If there is a \type{MathConstants} table, this takes precedence over font dimensions, and in that case no attention is paid to which family is being assigned to: the \type{MathConstants} table in the last assigned family sets all parameters.
In the table below, the one-letter style abbreviations and symbolic tfm font dimension names match those used in the \TeX book. Assignments to \tex{textfont} set the values for the cramped and uncramped display and text styles. Use \tex{scriptfont} for the script styles, and \tex{scriptscriptfont} for the scriptscript styles (totalling eight parameters for three font sizes). In the \TFM\ case, assignments only happen in family~2 and family~3 (and of course only for the parameters for which there are font dimensions).

Besides the parameters below, \LUATEX\ also looks at the \quote{space} font dimension parameter. For math fonts, this should be set to zero.

\start
\switchtobodyfont[8pt]
\starttabulate[|l|l|l|p|]
\NC \bf variable \NC \bf style \NC \bf default value opentype \NC \bf default value tfm \NC\NR
\NC \tex{Umathaxis} \NC -- \NC AxisHeight \NC axis_height \NC\NR
\NC \tex{Umathoperatorsize} \NC D, D' \NC DisplayOperatorMinHeight \NC $^6$ \NC\NR
\NC \tex{Umathfractiondelsize} \NC D, D' \NC FractionDelimiterDisplayStyleSize$^9$ \NC delim1 \NC\NR
\NC " \NC T, T', S, S', SS, SS' \NC FractionDelimiterSize$^9$ \NC delim2 \NC\NR
\NC \tex{Umathfractiondenomdown}\NC D, D' \NC FractionDenominatorDisplayStyleShiftDown \NC denom1 \NC\NR
\NC " \NC T, T', S, S', SS, SS' \NC FractionDenominatorShiftDown \NC denom2 \NC\NR
\NC \tex{Umathfractiondenomvgap}\NC D, D' \NC FractionDenominatorDisplayStyleGapMin \NC 3*default_rule_thickness \NC\NR
\NC " \NC T, T', S, S', SS, SS' \NC FractionDenominatorGapMin \NC default_rule_thickness \NC\NR
\NC \tex{Umathfractionnumup} \NC D, D' \NC FractionNumeratorDisplayStyleShiftUp \NC num1 \NC\NR
\NC " \NC T, T', S, S', SS, SS' \NC FractionNumeratorShiftUp \NC num2 \NC\NR
\NC \tex{Umathfractionnumvgap} \NC D, D' \NC FractionNumeratorDisplayStyleGapMin \NC 3*default_rule_thickness \NC\NR
\NC " \NC T, T', S, S', SS, SS' \NC FractionNumeratorGapMin \NC default_rule_thickness \NC\NR
\NC \tex{Umathfractionrule} \NC -- \NC FractionRuleThickness \NC
default_rule_thickness \NC\NR \NC \tex{Umathlimitabovebgap} \NC -- \NC UpperLimitBaselineRiseMin \NC big_op_spacing3 \NC\NR \NC \tex{Umathlimitabovekern} \NC -- \NC 0$^1$ \NC big_op_spacing5 \NC\NR \NC \tex{Umathlimitabovevgap} \NC -- \NC UpperLimitGapMin \NC big_op_spacing1 \NC\NR \NC \tex{Umathlimitbelowbgap} \NC -- \NC LowerLimitBaselineDropMin \NC big_op_spacing4 \NC\NR \NC \tex{Umathlimitbelowkern} \NC -- \NC 0$^1$ \NC big_op_spacing5 \NC\NR \NC \tex{Umathlimitbelowvgap} \NC -- \NC LowerLimitGapMin \NC big_op_spacing2 \NC\NR \NC \tex{Umathoverdelimitervgap}\NC -- \NC StretchStackGapBelowMin \NC big_op_spacing1 \NC\NR \NC \tex{Umathoverdelimiterbgap}\NC -- \NC StretchStackTopShiftUp \NC big_op_spacing3 \NC\NR \NC \tex{Umathunderdelimitervgap}\NC-- \NC StretchStackGapAboveMin \NC big_op_spacing2 \NC\NR \NC \tex{Umathunderdelimiterbgap}\NC-- \NC StretchStackBottomShiftDown \NC big_op_spacing4 \NC\NR \NC \tex{Umathoverbarkern} \NC -- \NC OverbarExtraAscender \NC default_rule_thickness \NC\NR \NC \tex{Umathoverbarrule} \NC -- \NC OverbarRuleThickness \NC default_rule_thickness \NC\NR \NC \tex{Umathoverbarvgap} \NC -- \NC OverbarVerticalGap \NC 3*default_rule_thickness \NC\NR \NC \tex{Umathquad} \NC -- \NC $^1$ \NC math_quad \NC\NR \NC \tex{Umathradicalkern} \NC -- \NC RadicalExtraAscender \NC default_rule_thickness \NC\NR \NC \tex{Umathradicalrule} \NC -- \NC RadicalRuleThickness \NC $^2$ \NC\NR \NC \tex{Umathradicalvgap} \NC D, D' \NC RadicalDisplayStyleVerticalGap \NC (default_rule_thickness+\crlf (abs(math_x_height)/4))$^3$ \NC\NR \NC " \NC T, T', S, S', SS, SS' \NC RadicalVerticalGap \NC (default_rule_thickness+\crlf (abs(default_rule_thickness)/4))$^3$ \NC\NR \NC \tex{Umathradicaldegreebefore}\NC -- \NC RadicalKernBeforeDegree \NC $^2$ \NC\NR \NC \tex{Umathradicaldegreeafter}\NC -- \NC RadicalKernAfterDegree \NC $^2$ \NC\NR \NC \tex{Umathradicaldegreeraise}\NC -- \NC RadicalDegreeBottomRaisePercent \NC $^{2,7}$ \NC\NR \NC \tex{Umathspaceafterscript} \NC -- \NC 
SpaceAfterScript \NC script_space$^4$ \NC\NR \NC \tex{Umathstackdenomdown} \NC D, D' \NC StackBottomDisplayStyleShiftDown \NC denom1 \NC\NR \NC " \NC T, T', S, S', SS, SS' \NC StackBottomShiftDown \NC denom2 \NC\NR \NC \tex{Umathstacknumup} \NC D, D' \NC StackTopDisplayStyleShiftUp \NC num1 \NC\NR \NC " \NC T, T', S, S', SS, SS' \NC StackTopShiftUp \NC num3 \NC\NR \NC \tex{Umathstackvgap} \NC D, D' \NC StackDisplayStyleGapMin \NC 7*default_rule_thickness \NC\NR \NC " \NC T, T', S, S', SS, SS' \NC StackGapMin \NC 3*default_rule_thickness \NC\NR \NC \tex{Umathsubshiftdown} \NC -- \NC SubscriptShiftDown \NC sub1 \NC\NR \NC \tex{Umathsubshiftdrop} \NC -- \NC SubscriptBaselineDropMin \NC sub_drop \NC\NR \NC \tex{Umathsubsupshiftdown} \NC -- \NC SubscriptShiftDownWithSuperscript$^8$ \NC \NC\NR \NC \NC \NC \quad\ or SubscriptShiftDown \NC sub2 \NC\NR \NC \tex{Umathsubtopmax} \NC -- \NC SubscriptTopMax \NC (abs(math_x_height * 4) / 5) \NC\NR \NC \tex{Umathsubsupvgap} \NC -- \NC SubSuperscriptGapMin \NC 4*default_rule_thickness \NC\NR \NC \tex{Umathsupbottommin} \NC -- \NC SuperscriptBottomMin \NC (abs(math_x_height) / 4) \NC\NR \NC \tex{Umathsupshiftdrop} \NC -- \NC SuperscriptBaselineDropMax \NC sup_drop \NC\NR \NC \tex{Umathsupshiftup} \NC D \NC SuperscriptShiftUp \NC sup1 \NC\NR \NC " \NC T, S, SS, \NC SuperscriptShiftUp \NC sup2 \NC\NR \NC " \NC D', T', S', SS' \NC SuperscriptShiftUpCramped \NC sup3 \NC\NR \NC \tex{Umathsupsubbottommax} \NC -- \NC SuperscriptBottomMaxWithSubscript \NC (abs(math_x_height * 4) / 5) \NC\NR \NC \tex{Umathunderbarkern} \NC -- \NC UnderbarExtraDescender \NC default_rule_thickness \NC\NR \NC \tex{Umathunderbarrule} \NC -- \NC UnderbarRuleThickness \NC default_rule_thickness \NC\NR \NC \tex{Umathunderbarvgap} \NC -- \NC UnderbarVerticalGap \NC 3*default_rule_thickness \NC\NR \NC \tex{Umathconnectoroverlapmin}\NC -- \NC MinConnectorOverlap \NC 0$^5$ \NC\NR \stoptabulate \stop Note 1: \OPENTYPE\ fonts set \tex{Umathlimitabovekern} and 
\tex{Umathlimitbelowkern} to zero and set \tex{Umathquad} to the font size of the font in use, because these are not supported in the MATH table.

Note 2: \TFM\ fonts do not set \tex{Umathradicalrule} because \TeX82\ uses the height of the radical instead. When this parameter is indeed not set when \LUATEX\ has to typeset a radical, a backward compatibility mode will kick in that assumes that an oldstyle \TeX\ font is used. Also, they do not set \tex{Umathradicaldegreebefore}, \tex{Umathradicaldegreeafter}, and \tex{Umathradicaldegreeraise}. These are then automatically initialized to $5/18$quad, $-10/18$quad, and 60.

Note 3: If tfm fonts are used, then the \tex{Umathradicalvgap} is not set until the first time \LUATEX\ has to typeset a formula because this needs parameters from both family~2 and family~3. This provides backward compatibility with \TEX82, but only partially: once the \tex{Umathradicalvgap} is set, it will not be recalculated any more.

Note 4: (also if tfm fonts are used) A similar situation arises with respect to \tex{Umathspaceafterscript}: it is not set until the first time \LUATEX\ has to typeset a formula. This provides some backward compatibility with \TEX82. But once the \tex{Umathspaceafterscript} is set, \tex{scriptspace} will never be looked at again.

Note 5: Tfm fonts set \tex{Umathconnectoroverlapmin} to zero because \TeX82\ always stacks extensibles without any overlap.

Note 6: The \tex{Umathoperatorsize} is only used in \type{\displaystyle}, and is only set in \OPENTYPE\ fonts. In \TFM\ font mode, it is artificially set to one scaled point more than the initial attempt's size, so that the \quote{first next} will always be tried, just like in \TEX82.

Note 7: The \tex{Umathradicaldegreeraise} is a special case because it is the only parameter that is expressed as a percentage instead of as a number of scaled points.
Note 8: \type{SubscriptShiftDownWithSuperscript} does not actually exist in the \quote{standard} Opentype Math font Cambria, but it is useful enough to be added. New in version 0.38. Note 9: \type{FractionDelimiterDisplayStyleSize} and \type{FractionDelimiterSize} do not actually exist in the \quote{standard} Opentype Math font Cambria, but were useful enough to be added. New in version 0.47. \section{Math spacing setting} Besides the parameters mentioned in the previous sections, there are also 64 new primitives to control the math spacing table (as explained in Chapter~18 of the \TeX book). The primitive names are a simple matter of combining two math atom types, but for completeness' sake, here is the whole list: \startcolumns[n=2] \starttyping \Umathordordspacing \Umathordopspacing \Umathordbinspacing \Umathordrelspacing \Umathordopenspacing \Umathordclosespacing \Umathordpunctspacing \Umathordinnerspacing \Umathopordspacing \Umathopopspacing \Umathopbinspacing \Umathoprelspacing \Umathopopenspacing \Umathopclosespacing \Umathoppunctspacing \Umathopinnerspacing \Umathbinordspacing \Umathbinopspacing \Umathbinbinspacing \Umathbinrelspacing \Umathbinopenspacing \Umathbinclosespacing \Umathbinpunctspacing \Umathbininnerspacing \Umathrelordspacing \Umathrelopspacing \Umathrelbinspacing \Umathrelrelspacing \Umathrelopenspacing \Umathrelclosespacing \Umathrelpunctspacing \Umathrelinnerspacing \Umathopenordspacing \Umathopenopspacing \Umathopenbinspacing \Umathopenrelspacing \Umathopenopenspacing \Umathopenclosespacing \Umathopenpunctspacing \Umathopeninnerspacing \Umathcloseordspacing \Umathcloseopspacing \Umathclosebinspacing \Umathcloserelspacing \Umathcloseopenspacing \Umathcloseclosespacing \Umathclosepunctspacing \Umathcloseinnerspacing \Umathpunctordspacing \Umathpunctopspacing \Umathpunctbinspacing \Umathpunctrelspacing \Umathpunctopenspacing \Umathpunctclosespacing \Umathpunctpunctspacing \Umathpunctinnerspacing \Umathinnerordspacing \Umathinneropspacing 
\Umathinnerbinspacing
\Umathinnerrelspacing
\Umathinneropenspacing
\Umathinnerclosespacing
\Umathinnerpunctspacing
\Umathinnerinnerspacing
\stoptyping
\stopcolumns

These parameters are of type \type{\muskip}, so setting a parameter can be done like this:

\starttyping
\Umathopordspacing\displaystyle=4mu plus 2mu
\stoptyping

They are all initialized by initex to the values mentioned in the table in Chapter~18 of the \TeX book.

Note 1: for ease of use as well as for backward compatibility, \type{\thinmuskip}, \type{\medmuskip} and \type{\thickmuskip} are treated specially. In their case a pointer to the corresponding internal parameter is saved, not the actual \type{\muskip} value. This means that any later changes to one of these three parameters will be taken into account.

Note 2: Careful readers will realise that there are also primitives for the items marked \type{*} in the \TeX book. These will not actually be used as those combinations of atoms cannot actually happen, but it seemed better not to break orthogonality. They are initialized to zero.

\section[mathacc]{Math accent handling}

\LUATEX\ supports both top accents and bottom accents in math mode, and math accents stretch automatically (if this is supported by the font the accent comes from, of course). Bottom and combined accents as well as fixed-width math accents are controlled by optional keywords following \tex{Umathaccent}.

The keyword \type{bottom} after \tex{Umathaccent} signals that a bottom accent is needed, and the keyword \type{both} signals that both a top and a bottom accent are needed (in this case two accents need to be specified, of course).

Then the set of three integers defining the accent is read. This set of integers can be prefixed by the \type{fixed} keyword to indicate that a non-stretching variant is requested (in case of both accents, this step is repeated).
A simple example:

\starttyping
\Umathaccent both fixed 0 0 "20D7 fixed 0 0 "20D7 {example}
\stoptyping

The primitives \tex{Umathbotaccent} and \tex{Umathaccents} are deprecated since version 0.65, and will be removed eventually.

If a math top accent has to be placed and the accentee is a character and has a non-zero \type{top_accent} value, then this value will be used to place the accent instead of the \type{\skewchar} kern used by \TEX82.

The \type{top_accent} value represents a vertical line somewhere in the accentee. The accent will be shifted horizontally such that its own \type{top_accent} line coincides with the one from the accentee. If the \type{top_accent} value of the accent is zero, then half the width of the accent followed by its italic correction is used instead.

The vertical placement of a top accent depends on the \type{x_height} of the font of the accentee (as explained in the \TEX book), but if that value turns out to be zero and the font has a MathConstants table, then \type{AccentBaseHeight} is used instead.

If a math bottom accent has to be placed, the \type{bot_accent} value is checked instead of \type{top_accent}. Because bottom accents do not exist in \TEX82, the \type{\skewchar} kern is ignored.

The vertical placement of a bottom accent is straight below the accentee, no correction takes place.

\section{Math root extension}

The new primitive \type{\Uroot} allows the construction of a radical noad including a degree field. Its syntax is an extension of \type{\Uradical}:

\starttyping
\Uradical
\Uroot
\stoptyping

The placement of the degree is controlled by the math parameters \type{\Umathradicaldegreebefore}, \type{\Umathradicaldegreeafter}, and \type{\Umathradicaldegreeraise}. The degree will be typeset in \type{\scriptscriptstyle}.

\section{Math kerning in super- and subscripts}

The character fields in a lua-loaded OpenType math font can have a \quote{mathkern} table.
The format of this table is the same as the \quote{mathkern} table that is returned by the \type{fontloader} library, except that all height and kern values have to be specified in actual scaled points.

When a super- or subscript has to be placed next to a math item, \LUATEX\ checks whether the super- or subscript and the nucleus are both simple character items. If they are, and if the fonts of both character items are OpenType fonts (as opposed to legacy \TEX\ fonts), then \LUATEX\ will use the OpenType MATH algorithm for deciding on the horizontal placement of the super- or subscript.

This works as follows:

\startitemize
\item The vertical position of the script is calculated.
\item The default horizontal position is directly next to the base character.
\item For superscripts, the italic correction of the base character is added.
\item For a superscript, two vertical values are calculated: the bottom of the script (after shifting up), and the top of the base. For a subscript, the two values are the top of the (shifted down) script, and the bottom of the base.
\item For each of these two locations:
  \startitemize
  \item find the mathkern value at this height for the base (for a subscript placement, this is the bottom_right corner, for a superscript placement the top_right corner)
  \item find the mathkern value at this height for the script (for a subscript placement, this is the top_left corner, for a superscript placement the bottom_left corner)
  \item add the found values together to get a preliminary result.
  \stopitemize
\item The horizontal kern to be applied is the smaller of the two results from the previous step.
\stopitemize

The mathkern value at a specific height is the kern value that is specified by the next higher height and kern pair, or the highest one in the character (if there is no value high enough in the character), or simply zero (if the character has no mathkern pairs at all).
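
The lookup rule and the final \quote{smaller of two} step can be sketched in plain \LUA\ as follows. This is an illustration, not the engine's code. It assumes that each corner is an array of entries with \type{height} and \type{kern} fields sorted by increasing height, and that \quote{next higher} means strictly higher. The function names \type{kern_at} and \type{script_kern} are invented for this example, and any conversion between the coordinate systems of base and script is left out.

\starttyping
-- Hypothetical helper: list is an array of { height = h, kern = k }
-- entries sorted by increasing height (field names are an assumption).
local function kern_at(list, height)
  if not list or #list == 0 then
    return 0                      -- no mathkern pairs at all
  end
  for _, p in ipairs(list) do
    if p.height > height then     -- strict comparison is an assumption
      return p.kern               -- the next higher pair wins
    end
  end
  return list[#list].kern         -- nothing high enough: use the highest pair
end

-- Add the base and script kerns at both heights, keep the smaller sum.
local function script_kern(base, script, h1, h2)
  local k1 = kern_at(base, h1) + kern_at(script, h1)
  local k2 = kern_at(base, h2) + kern_at(script, h2)
  return math.min(k1, k2)
end

local base   = { { height = 100, kern = 10 }, { height = 200, kern = 20 } }
local script = { { height = 150, kern = -5 } }
print(script_kern(base, script, 50, 180))
\stoptyping

For a superscript, \type{base} would be the top_right corner of the base character and \type{script} the bottom_left corner of the script; for a subscript, the bottom_right and top_left corners take their place.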
\section{Scripts on horizontally extensible items like arrows}

The new primitives \tex{Uunderdelimiter} and \tex{Uoverdelimiter} (both from 0.35) allow the placement of a subscript or superscript on an automatically extensible item and \tex{Udelimiterunder} and \tex{Udelimiterover} (both from 0.37) allow the placement of an automatically extensible item as a subscript or superscript on a nucleus.

The vertical placements are controlled by \tex{Umathunderdelimiterbgap}, \tex{Umathunderdelimitervgap}, \tex{Umathoverdelimiterbgap}, and \tex{Umathoverdelimitervgap} in a similar way as limit placements on large operators. The superscript in \tex{Uoverdelimiter} is typeset in a suitable scripted style, the subscript in \tex{Uunderdelimiter} is cramped as well.

\section {Extensible delimiters}

\LUATEX\ internally uses a structure that supports \OPENTYPE\ \quote{MathVariants} as well as \TFM\ \quote{extensible recipes}.

\section{Other Math changes}

\subsection {Verbose versions of single-character math commands}

\LUATEX\ defines six new primitives that have the same function as \type{^}, \type{_}, \type{$}, and \type{$$}. %$

\starttabulate[|l|l|l|l|]
\NC \bf primitive \NC \bf explanation \NC\NR
\NC \tex{Usuperscript} \NC Duplicates the functionality of \type{^} \NC\NR
\NC \tex{Usubscript} \NC Duplicates the functionality of \type{_} \NC\NR
\NC \tex{Ustartmath} \NC Duplicates the functionality of \type{$}, % $
    when used in non-math mode. \NC\NR
\NC \tex{Ustopmath} \NC Duplicates the functionality of \type{$}, % $
    when used in inline math mode. \NC\NR
\NC \tex{Ustartdisplaymath}\NC Duplicates the functionality of \type{$$}, % $$
    when used in non-math mode. \NC\NR
\NC \tex{Ustopdisplaymath} \NC Duplicates the functionality of \type{$$}, % $$
    when used in display math mode. \NC\NR
\stoptabulate

All are new in version 0.38. The \tex{Ustopmath} and \tex{Ustopdisplaymath} primitives check if the current math mode is the correct one (inline vs.
displayed), but you can freely intermix the four mathon|/|mathoff commands with explicit dollar sign(s).

\subsection{Allowed math commands in non-math modes}

The commands \type{\mathchar}, \type{\omathchar}, and \type{\Umathchar} and control sequences that are the result of \type{\mathchardef}, \type{\omathchardef}, or \type{\Umathchardef} are also acceptable in the horizontal and vertical modes. In those cases, the \type{\textfont} from the requested math family is used.

\section{Math todo}

The following items are still todo.

\startitemize
\item Pre-scripts.
\item Multi-story stacks.
\item Flattened accents for high characters (?).
\item Better control over the spacing around displays and handling of equation numbers.
\item Support for multi-line displays using \MATHML\ style alignment points.
\stopitemize

\chapter[languages]{Languages and characters, fonts and glyphs}

\LUATEX's internal handling of the characters and glyphs that eventually become typeset is quite different from the way \TEX82 handles those same objects. The easiest way to explain the difference is to focus on unrestricted horizontal mode (i.\,e.\ paragraphs) and hyphenation first. Later on, it will be easy to deal with the differences that occur in horizontal and math modes.

In \TEX82, the characters you type are converted into \type{char_node} records when they are encountered by the main control loop. \TEX\ attaches and processes the font information while creating those records, so that the resulting \quote{horizontal list} contains the final forms of ligatures and implicit kerning. This packaging is needed because we may want to get the effective width of for instance a horizontal box.

When it becomes necessary to hyphenate words in a paragraph, \TEX\ converts (one word at a time) the \type{char_node} records into a string array by replacing ligatures with their components and ignoring the kerning.
Then it runs the hyphenation algorithm on this string, and converts the hyphenated result back into a \quote{horizontal list} that is consecutively spliced back into the paragraph stream. Keep in mind that the paragraph may contain unboxed horizontal material, which then already contains ligatures and kerns and the words therein are part of the hyphenation process. The \type{char_node} records are somewhat misnamed, as they are glyph positions in specific fonts, and therefore not really \quote{characters} in the linguistic sense. There is no language information inside the \type{char_node} records. Instead, language information is passed along using \type{language whatsit} records inside the horizontal list. In \LUATEX, the situation is quite different. The characters you type are always converted into \type{glyph_node} records with a special subtype to identify them as being intended as linguistic characters. \LUATEX\ stores the needed language information in those records, but does not do any font|-|related processing at the time of node creation. It only stores the index of the font. When it becomes necessary to typeset a paragraph, \LUATEX\ first inserts all hyphenation points right into the whole node list. Next, it processes all the font information in the whole list (creating ligatures and adjusting kerning), and finally it adjusts all the subtype identifiers so that the records are \quote{glyph nodes} from now on. That was the broad overview. The rest of this chapter will deal with the minutiae of the new process. \section[charsandglyphs]{Characters and glyphs} \TEX82 (including \PDFTEX) differentiated between \type{char_node}s and \type{lig_node}s. The former are simple items that contained nothing but a \quote{character} and a \quote{font} field, and they lived in the same memory as tokens did. 
The latter also contained a list of components, and a subtype indicating whether this ligature was the result of a word boundary, and it was stored in the same place as other nodes like boxes and kerns and glues. In \LUATEX, these two types are merged into one, somewhat larger structure called a \type{glyph_node}. Besides having the old character, font, and component fields, and the new special fields like \quote{attr} (see~\in{section}[glyphnodes]), these nodes also contain: \startitemize \item A subtype, split into four main types: \startitemize \item \type{character}, for characters to be hyphenated: the lowest bit (bit 0) is set to 1. \item \type{glyph}, for specific font glyphs: the lowest bit (bit 0) is not set. \item \type{ligature}, for ligatures (bit 1 is set) \item \type{ghost}, for \quote{ghost objects} (bit 2 is set) \stopitemize The latter two make further use of two extra fields (bits 3 and 4): \startitemize \item \type{left}, for ligatures created from a left word boundary and for ghosts created from \tex{leftghost} \item \type{right}, for ligatures created from a right word boundary and for ghosts created from \tex{rightghost} \stopitemize For ligatures, both bits can be set at the same time (in case of a single|-|glyph word). \item \type{glyph_node}s of type \quote{character} also contain language data, split into four items that were current when the node was created: the \tex{setlanguage} (15 bits), \tex{lefthyphenmin} (8 bits), \tex{righthyphenmin} (8 bits), and \tex{uchyph} (1 bit). \stopitemize Incidentally, \LUATEX\ allows 32768 separate languages, and words can be 256 characters long. Because the \tex{uchyph} value is saved in the actual nodes, its handling is subtly different from \TEX82: changes to \tex{uchyph} become effective immediately, not at the end of the current partial paragraph. 
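
The subtype bits listed earlier in this section can be decoded with plain arithmetic. The following is a minimal sketch in \LUA\ 5.1 (which has no bitwise operators), assuming the bit layout is exactly as described above; the helper names \type{has_bit} and \type{describe_subtype} are hypothetical and not part of any \LUATEX\ library:

\starttyping
-- Hypothetical helpers; the bit positions follow the list in this section.
local function has_bit(value, n)
    -- true if bit n (counting from 0) is set in value
    return math.floor(value / 2^n) % 2 == 1
end

local function describe_subtype(subtype)
    return {
        character = has_bit(subtype, 0), -- to be hyphenated
        ligature  = has_bit(subtype, 1),
        ghost     = has_bit(subtype, 2),
        left      = has_bit(subtype, 3), -- left word boundary or \leftghost
        right     = has_bit(subtype, 4), -- right word boundary or \rightghost
    }
end

-- Bits 1, 3 and 4 set: 2 + 8 + 16 = 26
local t = describe_subtype(26)
\stoptyping

In this sketch \type{t.ligature}, \type{t.left} and \type{t.right} are true and the other fields are false, which corresponds to the single|-|glyph word case mentioned above, where a ligature carries both boundary bits.
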
Typeset boxes now always have their language information embedded in the nodes themselves, so there is no longer a possible dependency on the surrounding language settings. In \TEX82, a mid-paragraph statement like \tex{unhbox0} would process the box using the current paragraph language unless there was a \tex{setlanguage} issued inside the box. In \LUATEX, all language variables are already frozen. \section{The main control loop} In \LUATEX's main loop, almost all input characters that are to be typeset are converted into \type{glyph_node} records with subtype \quote{character}, but there are a few small exceptions. First, the \tex{accent} primitive creates nodes with subtype \quote{glyph} instead of \quote{character}: one for the actual accent and one for the accentee. The primary reason for this is that \tex{accent} in \TEX82 is explicitly dependent on the current font encoding, so it would not make much sense to attach a new meaning to the primitive's name, as that would invalidate many old documents and macro packages. A secondary reason is that in \TEX82, \tex{accent} prohibits hyphenation of the current word. Since in \LUATEX\ hyphenation only takes place on \quote{character} nodes, using the \quote{glyph} subtype achieves the same effect. This change of meaning did happen with \tex{char}, which now generates \quote{character} nodes, consistent with its changed meaning in \XETEX. The changed status of \tex{char} is not yet finalized, but if it stays as it is now, a new primitive \tex{glyph} should be added to directly insert a font glyph id. Second, all the results of processing in math mode eventually become nodes with \quote{glyph} subtypes. Third, the \ALEPH-derived commands \tex{leftghost} and \tex{rightghost} create nodes of a third subtype: \quote{ghost}. These nodes are ignored completely by all further processing until the stage where inter-glyph kerning is added. Fourth, automatic discretionaries are handled differently.
\TEX82 inserts an empty discretionary after sensing an input character that matches the \tex{hyphenchar} in the current font. This test is wrong, in our opinion: whether or not hyphenation takes place should not depend on the current font; it is a language property. In \LUATEX, it works like this: if \LUATEX\ senses a string of input characters that matches the value of the new integer parameter \tex{exhyphenchar}, it will insert an explicit discretionary after that series of nodes. Initex sets \tex{exhyphenchar=`\-}. Incidentally, this is a global parameter instead of a language-specific one because it may be useful to change the value depending on the document structure instead of the text language. Note: as of \LUATEX\ 0.63.0, the insertion of discretionaries after a sequence of explicit hyphens happens at the same time as the other hyphenation processing, {\it not\/} inside the main control loop. The only use \LUATEX\ has for \tex{hyphenchar} is in the check whether a word should be considered for hyphenation at all. If the \tex{hyphenchar} of the font attached to the first character node in a word is negative, then hyphenation of that word is abandoned immediately. {\bf This behavior is added for backward compatibility only, and \type{\hyphenchar=-1} should not be used as a means of preventing hyphenation in new \LUATEX\ documents.} Fifth, \tex{setlanguage} no longer creates whatsits. The meaning of \tex{setlanguage} is changed so that it is now an integer parameter like all others. That integer parameter is used in \type{glyph_node} creation to add language information to the glyph nodes. In conjunction with this, the \tex{language} primitive is extended so that it always also updates the value of \tex{setlanguage}. Sixth, the \tex{noboundary} command (this command prohibits word boundary processing where that would normally take place) now does create whatsits.
These whatsits are needed because the exact place of the \tex{noboundary} command in the input stream has to be retained until after the ligature and font processing stages. Finally, there is no longer a \type{main_loop} label in the code. Remember that \TEX82 did quite a lot of processing while adding \type{char_node}s to the horizontal list? For speed reasons, it handled that processing code outside of the \quote{main control} loop, and only the first character of any \quote{word} was handled by that \quote{main control} loop. In \LUATEX, there is no longer a need for that (all hard work is done later), and the (now very small) bits of character-handling code have been moved back inline. When \tex{tracingcommands} is on, this is visible because the full word is reported, instead of just the initial character. \section[patternsexceptions]{Loading patterns and exceptions} The hyphenation algorithm in \LUATEX\ is quite different from the one in \TEX82, although it uses essentially the same user input. After expansion, the argument for \tex{patterns} has to be proper UTF-8 with individual patterns separated by spaces; no \tex{char} or \tex{chardef}-ed commands are allowed. (The current implementation is even more strict, and will reject all non|-|\UNICODE\ characters, but that will be changed in the future. For now, the generated errors are a valuable tool in discovering font-encoding specific pattern files.) Likewise, the expanded argument for \tex{hyphenation} also has to be proper UTF-8, but here a little bit of extra syntax is provided: \startitemize[n] \item three sets of arguments in curly braces (\type{{}{}{}}) indicate a desired complex discretionary, with arguments as in the \tex{discretionary} command in normal document input. \item \type{-} indicates a desired simple discretionary, cf. \tex{-} and \type{\discretionary{-}{}{}} in normal document input. \item Internal command names are ignored.
This rule is provided especially for \tex{discretionary}, but it also helps to deal with \tex{relax} commands that may sneak in. \item \type{=} indicates a (non-discretionary) hyphen in the document input. \stopitemize The expanded argument is first converted back to a space-separated string while dropping the internal command names. This string is then converted into a dictionary by a routine that creates key||value pairs by converting the other listed items. It is important to note that the keys in an exception dictionary can always be generated from the values. Here are a few examples: \starttabulate[|l|l|l|] \NC \ssbf value \NC \ssbf implied key (input) \NC \ssbf effect\NC\NR \NC \type{ta-ble} \NC table \NC \type{ta\-ble} ($=$ \type{ta\discretionary{-}{}{}ble})\NC\NR \NC \type{ba{k-}{}{c}ken}\NC backen \NC \type{ba\discretionary{k-}{}{c}ken}\NC\NR \stoptabulate The resultant patterns and exception dictionary will be stored under the language code that is the present value of \tex{language}. In the last line of the table, you see there is no \tex{discretionary} command in the value: the command is optional in the \TEX-based input syntax. The underlying reason for that is that it is conceivable that a whole dictionary of words is stored as a plain text file and loaded into \LUATEX\ using one of the functions in the \LUA\ \luatex{lang} library. This loading method is quite a bit faster than going through the \TEX\ language primitives, but some (most?) of that speed gain would be lost if it had to interpret command sequences while doing so. Starting with \LUATEX\ 0.63.0, it is possible to specify extra hyphenation points in compound words by using \type{{-}{}{-}} for the explicit hyphen character (replace \type{-} by the actual explicit hyphen character if needed). 
For example, this matches the word \quote{multi-word-boundaries} and allows an extra break between \quote{boun} and \quote{daries}:

\starttyping
\hyphenation{multi{-}{}{-}word{-}{}{-}boun-daries}
\stoptyping

The motivation behind the \ETEX\ extension \tex{savinghyphcodes} was that hyphenation heavily depended on font encodings. This is no longer true in \LUATEX, and the corresponding primitive is ignored pending complete removal. The future semantics of \tex{uppercase} and \tex{lowercase} are still under consideration; no changes have taken place yet. \section{Applying hyphenation} The internal structures \LUATEX\ uses for the insertion of discretionaries in words are very different from the ones in \TEX82, and that means there are some noticeable differences in handling as well. First and foremost, there is no \quote{compressed trie} involved in hyphenation. The algorithm still reads \PATGEN-generated pattern files, but \LUATEX\ uses a finite state hash to match the patterns against the word to be hyphenated. This algorithm is based on the \quote{libhnj} library used by OpenOffice, which in turn is inspired by \TEX. The memory allocation for this new implementation is completely dynamic, so the \WEBC\ setting for \type{trie_size} is ignored. Differences between \LUATEX\ and \TEX82 that are a direct result of that: \startitemize \item \LUATEX\ happily hyphenates the full \UNICODE\ character range. \item Pattern and exception dictionary size is limited by the available memory only; all allocations are done dynamically. The trie-related settings in \type{texmf.cnf} are ignored. \item Because there is no \quote{trie preparation} stage, language patterns never become frozen. This means that the primitive \tex{patterns} (and its \LUA\ counterpart \luatex{lang.patterns}) can be used at any time, not only in initex. \item Only the string representation of \tex{patterns} and \tex{hyphenation} is stored in the format file. At format load time, they are simply re-evaluated.
It follows that there is no real reason to preload languages in the format file. In fact, it is usually not a good idea to do so. It is much smarter to load patterns no sooner than the first time they are actually needed. \item \LUATEX\ uses the language-specific variables \tex{prehyphenchar} and \tex{posthyphenchar} in the creation of implicit discretionaries, instead of \TEX82's \tex{hyphenchar}, and the values of the language-specific variables \tex{preexhyphenchar} and \tex{postexhyphenchar} for explicit discretionaries (instead of \TEX82's empty discretionary). \stopitemize Inserted characters and ligatures inherit their attributes from the nearest glyph node item (usually the preceding one, but the following one for the items inserted at the left-hand side of a word). Word boundaries are no longer implied by font switches, but by language switches. One word can have two separate fonts and still be hyphenated correctly (but it cannot have two different languages: the \tex{setlanguage} command forces a word boundary). All languages start out with \tex{prehyphenchar=`\-}, \tex{posthyphenchar=0}, \tex{preexhyphenchar=0} and \tex{postexhyphenchar=0}. When you assign a value to one of these four parameters, you are actually changing the settings for the current \tex{language}; this behavior is compatible with \tex{patterns} and \tex{hyphenation}. \LUATEX\ also hyphenates the first word in a paragraph. Words can be up to 256 characters long (up from 64 in \TEX82). Longer words generate an error right now, but eventually either the limitation will be removed or perhaps it will become possible to silently ignore the excess characters (this is what happens in \TEX82, but there the behavior cannot be controlled). If you are using the \LUA\ function \type{lang.hyphenate}, you should be aware that this function expects to receive a list of \quote{character} nodes.
It will not operate properly in the presence of \quote{glyph}, \quote{ligature}, or \quote{ghost} nodes, nor does it know how to deal with kerning. In the near future, it will be able to skip over \quote{ghost} nodes, and we may add a less fuzzy function you can call as well. The hyphenation exception dictionary is maintained as a key-value hash, and that is also dynamic, so the \type{hyph_size} setting is not used either. A technical paper detailing the new algorithm will be released as a separate document. \section{Applying ligatures and kerning} After all possible hyphenation points have been inserted in the list, \LUATEX\ will process the list to convert the \quote{character} nodes into \quote{glyph} and \quote{ligature} nodes. This is actually done in two stages: first all ligatures are processed, then all kerning information is applied to the resulting list. But those two stages are somewhat dependent on each other: if the font in use makes it possible to do so, the ligaturing stage adds virtual \quote{character} nodes to the word boundaries in the list. While doing so, it removes and interprets \type{noboundary} nodes. The kerning stage deletes those word boundary items after it is done with them, and it does the same for \quote{ghost} nodes. Finally, at the end of the kerning stage, all remaining \quote{character} nodes are converted to \quote{glyph} nodes. This work separation is worth mentioning because, if you overrule from \LUA\ only one of the two callbacks related to font handling, then you have to perform the tasks normally done by \LUATEX\ itself in order to make sure that the other, non|-|overruled, routine continues to function properly. Work in this area is not yet complete, but most of the possible cases are handled by our rewritten ligaturing engine. We are working hard to make sure all of the possible inputs are supported soon.
For example, take the word \type{office}, hyphenated \type{of-fice}, using a \quote{normal} font with all the \type{f}-\type{f} and \type{f}-\type{i} type ligatures: \starttabulate[|l|l|] \NC Initial: \NC \type{{o}{f}{f}{i}{c}{e}}\NC\NR \NC After hyphenation: \NC \type{{o}{f}{{-},{},{}}{f}{i}{c}{e}}\NC\NR \NC First ligature stage: \NC \type{{o}{{f-},{f},{}}{i}{c}{e}}\NC\NR \NC Final result: \NC \type{{o}{{f-},{},{}}{c}{e}} \NC\NR \stoptabulate That's bad enough, but let us assume that there is also a hyphenation point between the \type{f} and the \type{i}, to create \type{of-f-ice}. Then the final result should be: \starttyping {o}{{f-}, {{f-}, {i}, {}}, {{-}, {i}, {}}}{c}{e} \stoptyping with discretionaries in the post-break text as well as in the replacement text of the top-level discretionary that resulted from the first hyphenation point. Here is that nested solution again, in a different representation: \starttabulate[|l|l|l|l|] \NC \NC pre \NC post \NC replace \NC \NR \NC topdisc \NC \type{f-}$^1$ \NC sub1 \NC sub2 \NC \NR \NC sub1 \NC \type{f-}$^2$ \NC \type{i}$^3$ \NC \type{}$^4$ \NC \NR \NC sub2 \NC \type{-}$^5$\NC \type{i}$^6$ \NC \type{}$^7$\NC \NR \stoptabulate When line breaking is choosing its breakpoints, the following fields will eventually be selected: \starttabulate[|l|l|l|] \NC \type{of-f-ice} \NC \type{f-}$^1$ \NC \NR \NC \NC \type{f-}$^2$ \NC \NR \NC \NC \type{i}$^3$ \NC \NR \NC \type{of-fice} \NC \type{f-}$^1$ \NC \NR \NC \NC \type{}$^4$ \NC \NR \NC \type{off-ice} \NC \type{-}$^5$ \NC \NR \NC \NC \type{i}$^6$ \NC \NR \NC \type{office} \NC \type{}$^7$ \NC \NR \stoptabulate The current solution in \LUATEX\ is not able to handle nested discretionaries, but it is in fact smart enough to handle this fictional \type{of-f-ice} example. It does so by combining two sequential discretionary nodes as if they were a single object (where the second discretionary node is treated as an extension of the first node). 
One can observe that the \type{of-f-ice} and \type{off-ice} cases both end with the same actual post replacement list (\type{i}), and that this would be the case even if that \type{i} was the first item of a potential following ligature like \type{ic}. This allows \LUATEX\ to do away with one of the fields, and thus make everything fit into just two discretionary nodes. The mapping of the seven list fields to the six fields in this discretionary node pair is as follows:

\starttabulate[|l|p|]
\NC \bf field \NC \bf description \NC \NR
\NC \type{disc1.pre} \NC \type{f-}$^1$ \NC \NR
\NC \type{disc1.post} \NC \type{}$^4$ \NC \NR
\NC \type{disc1.replace} \NC \type{}$^7$ \NC \NR
\NC \type{disc2.pre} \NC \type{f-}$^2$ \NC \NR
\NC \type{disc2.post} \NC \type{i}$^{3{,}6}$\NC \NR
\NC \type{disc2.replace} \NC \type{-}$^5$\NC \NR
\stoptabulate

What is actually generated after ligaturing has been applied is therefore:

\starttyping
{o}{{f-}, {}, {}}
   {{f-}, {i}, {-}}{c}{e}
\stoptyping

The two discretionaries have different subtypes from a discretionary appearing on its own: the first has subtype 4, and the second has subtype 5. The need for these special subtypes stems from the fact that not all of the fields appear in their \quote{normal} location. The second discretionary especially looks odd, with things like the \type{-} appearing in \type{disc2.replace}. The fact that some of the fields have different meanings (and different processing code internally) is what makes it necessary to have different subtypes: this enables \LUATEX\ to distinguish this sequence of two joined discretionary nodes from the case of two standalone discretionaries appearing in a row.

\section{Breaking paragraphs into lines}

This code is still almost unchanged, but because of the above|-|mentioned changes with respect to discretionaries and ligatures, line breaking will potentially be different from traditional \TEX.
The actual line breaking code is still based on the \TEX82 algorithms, and it does not expect there to be discretionaries inside of discretionaries. But that situation is now fairly common in \LUATEX, due to the changes to the ligaturing mechanism. In addition, the \LUATEX\ discretionary nodes are implemented slightly differently from the \TEX82 nodes: the \type{no_break} text is now embedded inside the disc node, where previously these nodes kept their place in the horizontal list (the discretionary node contained a counter indicating how many nodes to skip). The combined effect of these two differences is that \LUATEX\ does not always use all of the potential breakpoints in a paragraph, especially when fonts with many ligatures are used.

% TODO:
%   Check \sfcode handling
%   Implement \glyph
%
%   Remove \savinghyphcodes
%   Allow non-UCS characters in \patterns

\chapter[fonts]{Font structure}

All \TEX\ fonts are represented to \LUA\ code as tables, and internally as C~structures. All keys in the table below are saved in the internal font structure if they are present in the table returned by the \luatex{define_font} callback, or if they result from the normal \TFM|/|\VF\ reading routines if there is no \luatex{define_font} callback defined. The column \quote{from \VF} means that this key will be created by the \luatex{font.read_vf()} routine, \quote{from \TFM} means that the key will be created by the \luatex{font.read_tfm()} routine, and \quote{used} means whether or not the \LUATEX\ engine itself will do something with the key. The top|-|level keys in the table are as follows:

\starttabulate[|Tl|l|l|l|l|p|]
\NC \ssbf key \NC \bf from vf \NC \bf from tfm \NC \bf used\NC \bf value type \NC \bf description \NC\NR
\NC name \NC yes \NC yes \NC yes \NC string \NC metric (file) name\NC\NR
\NC area \NC no \NC yes \NC yes \NC string \NC (directory) location, typically empty\NC\NR
\NC used \NC no \NC yes \NC yes \NC boolean\NC used already?
(initial: false)\NC \NR \NC characters \NC yes \NC yes \NC yes \NC table \NC the defined glyphs of this font \NC \NR \NC checksum \NC yes \NC yes \NC no \NC number \NC default: 0 \NC \NR \NC designsize \NC no \NC yes \NC yes \NC number \NC expected size (default: 655360 == 10pt) \NC \NR \NC direction \NC no \NC yes \NC yes \NC number \NC default: 0 (TLT) \NC \NR \NC encodingbytes \NC no \NC no \NC yes \NC number \NC default: depends on \type {format}\NC\NR \NC encodingname \NC no \NC no \NC yes \NC string \NC encoding name\NC\NR \NC fonts \NC yes \NC no \NC yes \NC table \NC locally used fonts\NC \NR \NC psname \NC no \NC no \NC yes \NC string \NC actual (\POSTSCRIPT) name (this is the PS fontname in the incoming font source, also used as fontname identifier in the \PDF\ output, new in 0.43)\NC\NR \NC fullname \NC no \NC no \NC yes \NC string \NC output font name, used as a fallback in the \PDF\ output if the psname is not set\NC\NR \NC header \NC yes \NC no \NC no \NC string \NC header comments, if any\NC \NR \NC hyphenchar \NC no \NC no \NC yes \NC number \NC default: TeX's \tex{hyphenchar} \NC \NR \NC parameters \NC no \NC yes \NC yes \NC hash \NC default: 7 parameters, all zero \NC \NR \NC size \NC no \NC yes \NC yes \NC number \NC loaded (at) size. 
(default: same as designsize) \NC \NR \NC skewchar \NC no \NC no \NC yes \NC number \NC default: TeX's \tex{skewchar} \NC \NR \NC type \NC yes \NC no \NC yes \NC string \NC basic type of this font\NC \NR \NC format \NC no \NC no \NC yes \NC string \NC disk format type\NC \NR \NC embedding \NC no \NC no \NC yes \NC string \NC \PDF\ inclusion\NC \NR \NC filename \NC no \NC no \NC yes \NC string \NC disk file name\NC\NR \NC tounicode \NC no \NC yes \NC yes \NC number \NC if 1, \LUATEX\ assumes per-glyph tounicode entries are present in the font\NC\NR \NC stretch \NC no \NC no \NC yes \NC number \NC the \quote {stretch} value from \tex{pdffontexpand}\NC\NR \NC shrink \NC no \NC no \NC yes \NC number \NC the \quote {shrink} value from \tex{pdffontexpand}\NC\NR \NC step \NC no \NC no \NC yes \NC number \NC the \quote {step} value from \tex{pdffontexpand}\NC\NR \NC auto_expand \NC no \NC no \NC yes \NC boolean\NC the \quote {autoexpand} keyword from\crlf \tex{pdffontexpand}\NC\NR \NC expansion_factor \NC no \NC no \NC no \NC number \NC the actual expansion factor of an expanded font\NC\NR \NC attributes \NC no \NC no \NC yes \NC string \NC the \tex{pdffontattr}\NC\NR \NC cache \NC no \NC no \NC yes \NC string \NC this key controls caching of the lua table on the \type{tex} end. \type{yes}: use a reference to the table that is passed to \LUATEX\ (this is the default). \type{no}: don't store the table reference, don't cache any lua data for this font. \type{renew}: don't store the table reference, but save a reference to the table that is created at the first access to one of its fields in font.fonts. (new in 0.40.0, before that caching was always \type{yes}). 
Note: the saved reference is thread-local, so be careful when you are using coroutines: an error will be thrown if the table has been cached in one thread, but you reference it from another thread ($\approx$ coroutine)\NC\NR
\NC nomath \NC no \NC no \NC yes \NC boolean\NC this key allows a minor speedup for text fonts. If it is present and true, then \LUATEX\ will not check the character entries for math-specific keys. (0.42.0)\NC\NR
\NC slant \NC no \NC no \NC yes \NC number \NC This has the same semantics as the \type{SlantFont} operator in font map files. (0.47.0)\NC\NR
\NC extent \NC no \NC no \NC yes \NC number \NC This has the same semantics as the \type{ExtendFont} operator in font map files. (0.50.0)\NC\NR
\stoptabulate

The key \type{name} is always required. The keys \type{stretch}, \type{shrink}, \type{step} and optionally \type{auto_expand} only have meaning when used together: they can be used to replace a post-loading \tex{pdffontexpand} command. The \type{expansion_factor} is a value that can be present inside a font in \type{font.fonts}. It is the actual expansion factor (a value between \type{-shrink} and \type{stretch}, with step \type{step}) of a font that was automatically generated by the font expansion algorithm. The key \type{attributes} can be used to replace \tex{pdffontattr}.

The key \type{used} is set by the engine when a font is actively in use; this makes sure that the font's definition is written to the output file (\DVI\ or \PDF). The \TFM\ reader sets it to false.

The \type{direction} is a number signalling the \quote{normal} direction for this font.
There are sixteen possibilities: \starttabulate[|Tc|c|c|c|] \NC \ssbf number \NC \bf meaning \NC \bf number \NC \bf meaning \NC\NR \NC 0 \NC LT \NC 8 \NC TT \NC\NR \NC 1 \NC LL \NC 9 \NC TL \NC\NR \NC 2 \NC LB \NC 10 \NC TB \NC\NR \NC 3 \NC LR \NC 11 \NC TR \NC\NR \NC 4 \NC RT \NC 12 \NC BT \NC\NR \NC 5 \NC RL \NC 13 \NC BL \NC\NR \NC 6 \NC RB \NC 14 \NC BB \NC\NR \NC 7 \NC RR \NC 15 \NC BR \NC\NR \stoptabulate These are \OMEGA|-|style direction abbreviations: the first character indicates the \quote{first} edge of the character glyphs (the edge that is seen first in the writing direction), the second the \quote{top} side. The \type{parameters} is a hash with mixed key types. There are seven possible string keys, as well as a number of integer indices (these start from 8 up). The seven strings are actually used instead of the bottom seven indices, because that gives a nicer user interface. The names and their internal remapping are: \starttabulate[|lT|c|] \NC \ssbf name \NC \bf internal remapped number \NC\NR \NC slant \NC 1 \NC\NR \NC space \NC 2 \NC\NR \NC space_stretch \NC 3 \NC\NR \NC space_shrink \NC 4 \NC\NR \NC x_height \NC 5 \NC\NR \NC quad \NC 6 \NC\NR \NC extra_space \NC 7 \NC\LR \stoptabulate The keys \type{type}, \type{format}, \type{embedding}, \type{fullname} and \type{filename} are used to embed \OPENTYPE\ fonts in the result \PDF. The \type{characters} table is a list of character hashes indexed by an integer number. The number is the \quote{internal code} \TEX\ knows this character by. Two very special string indexes can be used also: \type{left_boundary} is a virtual character whose ligatures and kerns are used to handle word boundary processing. \type{right_boundary} is similar but not actually used for anything (yet!). Other index keys are ignored. Each character hash itself is a hash. 
For example, here is the character \quote{f} (decimal 102) in the font cmr10 at 10 points: \starttyping [102] = { ['width'] = 200250, ['height'] = 455111, ['depth'] = 0, ['italic'] = 50973, ['kerns'] = { [63] = 50973, [93] = 50973, [39] = 50973, [33] = 50973, [41] = 50973 }, ['ligatures'] = { [102] = { ['char'] = 11, ['type'] = 0 }, [108] = { ['char'] = 13, ['type'] = 0 }, [105] = { ['char'] = 12, ['type'] = 0 } } } \stoptyping The following top|-|level keys can be present inside a character hash: \starttabulate[|lT|c|c|c|l|p|] \NC \ssbf key \NC \bf from vf \NC \bf from tfm \NC \bf used \NC \bf value type \NC \bf description \NC\NR \NC width \NC yes \NC yes \NC yes \NC number \NC character's width, in sp (default 0) \NC\NR \NC height \NC no \NC yes \NC yes \NC number \NC character's height, in sp (default 0) \NC\NR \NC depth \NC no \NC yes \NC yes \NC number \NC character's depth, in sp (default 0) \NC\NR \NC italic \NC no \NC yes \NC yes \NC number \NC character's italic correction, in sp (default zero) \NC\NR \NC top_accent \NC no \NC no \NC maybe \NC number \NC character's top accent alignment place, in sp (default zero) \NC\NR \NC bot_accent \NC no \NC no \NC maybe \NC number \NC character's bottom accent alignment place, in sp (default zero) \NC\NR \NC left_protruding \NC no \NC no \NC maybe \NC number \NC character's \tex{lpcode}\NC\NR \NC right_protruding \NC no \NC no \NC maybe \NC number \NC character's \tex{rpcode}\NC\NR \NC expansion_factor \NC no \NC no \NC maybe \NC number \NC character's \tex{efcode}\NC\NR \NC tounicode \NC no \NC no \NC maybe \NC string \NC character's Unicode equivalent(s), in UTF-16BE hexadecimal format\NC\NR \NC next \NC no \NC yes \NC yes \NC number \NC the \quote{next larger} character index \NC\NR \NC extensible \NC no \NC yes \NC yes \NC table \NC the constituent parts of an extensible recipe \NC\NR \NC vert_variants \NC no \NC no \NC yes \NC table \NC constituent parts of a vertical variant set\NC \NR \NC horiz_variants\NC no 
\NC no \NC yes \NC table \NC constituent parts of a horizontal variant set\NC \NR \NC kerns \NC no \NC yes \NC yes \NC table \NC kerning information \NC\NR \NC ligatures \NC no \NC yes \NC yes \NC table \NC ligaturing information \NC\NR \NC commands \NC yes \NC no \NC yes \NC array \NC virtual font commands \NC\NR \NC name \NC no \NC no \NC no \NC string \NC the character (\POSTSCRIPT) name \NC\NR \NC index \NC no \NC no \NC yes \NC number \NC the (\OPENTYPE\ or \TRUETYPE) font glyph index \NC\NR \NC used \NC no \NC yes \NC yes \NC boolean \NC typeset already (default: false)? \NC\NR \NC mathkern \NC no \NC no \NC yes \NC table \NC math cut-in specifications \NC\NR \stoptabulate The values of \type{top_accent}, \type{bot_accent} and \type{mathkern} are used only for math accent and superscript placement, see the \at{math chapter}[math] in this manual for details. The values of \type{left_protruding} and \type{right_protruding} are used only when \tex{pdfprotrudechars} is non-zero. Whether or not \type{expansion_factor} is used depends on the font's global expansion settings, as well as on the value of \tex{pdfadjustspacing}. The usage of \type{tounicode} is this: if this font specifies a \type{tounicode=1} at the top level, then \LUATEX\ will construct a \type{/ToUnicode} entry for the \PDF\ font (or font subset) based on the character-level \type{tounicode} strings, where they are available. If a character does not have a sensible \UNICODE\ equivalent, do not provide a string either (no empty strings). If the font-level \type{tounicode} is not set, then \LUATEX\ will build up \type{/ToUnicode} based on the \TEX\ code points you used, and any character-level \type{tounicodes} will be ignored. {\it At the moment, the string format is exactly the format that is expected by Adobe \CMAP\ files (\UTF-16BE in hexadecimal encoding), minus the enclosing angle brackets. 
This may change in the future.} Small example: the \type{tounicode} for a \type{fi} ligature would be \type{00660069}.

The presence of \type{extensible} will overrule \type{next}, if that is also present. It in turn can be overruled by \type{vert_variants}.

The \type{extensible} table is very simple:

\starttabulate[|lT|l|p|]
\NC \ssbf key \NC \bf type \NC \bf description \NC\NR
\NC top \NC number \NC \quote{top} character index \NC\NR
\NC mid \NC number \NC \quote{middle} character index \NC\NR
\NC bot \NC number \NC \quote{bottom} character index \NC\NR
\NC rep \NC number \NC \quote{repeatable} character index \NC\NR
\stoptabulate

The \type{horiz_variants} and \type{vert_variants} are arrays of components. Each of those components is itself a hash of up to five keys:

\starttabulate[|lT|l|p|]
\NC \ssbf key \NC \bf type \NC \bf explanation \NC\NR
\NC component \NC number \NC The character index (note that this is an encoding number, not a name).\NC \NR
\NC extender \NC number \NC One (1) if this part is repeatable, zero (0) otherwise.\NC \NR
\NC start \NC number \NC Maximum overlap at the starting side (in scaled points).\NC \NR
\NC end \NC number \NC Maximum overlap at the ending side (in scaled points).\NC \NR
\NC advance \NC number \NC Total advance width of this item (can be zero or missing, then the natural size of the glyph for character \type{component} is used).\NC \NR
\stoptabulate

The \type{kerns} table is a hash indexed by character index (and \quote{character index} is defined as either a non|-|negative integer or the string value \type {right_boundary}), with the values being the kerning to be applied, in scaled points.
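As a sketch of how these pieces fit together, the following character entry combines a \type{kerns} table with a three|-|part \type{vert_variants} array. All slot numbers and dimensions are invented for illustration and do not describe any existing font. Note that \type{end} is a reserved word in \LUA, so that key has to be written as \type{['end']}.

\starttyping
-- hypothetical entry for a stretchable delimiter; all numbers are made up
local character = {
    width  = 300000,
    height = 500000,
    depth  = 100000,
    kerns  = {
        [41]           = -10000,   -- kern against character 41
        right_boundary =   5000,   -- kern against the right boundary
    },
    vert_variants = {
        { component = 200, extender = 0, start = 0,     ['end'] = 20000, advance = 300000 },
        { component = 201, extender = 1, start = 20000, ['end'] = 20000, advance = 150000 },
        { component = 202, extender = 0, start = 20000, ['end'] = 0,     advance = 300000 },
    },
}
\stoptyping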
The \type{ligatures} table is a hash indexed by character index (and \quote{character index} is defined as either a non|-|negative integer or the string value \type {right_boundary}), with the values being yet another small hash, with two fields: \starttabulate[|lT|l|p|] \NC \ssbf key \NC \bf type \NC \bf description \NC\NR \NC type \NC number \NC the type of this ligature command, default 0 \NC\NR \NC char \NC number \NC the character index of the resultant ligature \NC\NR \stoptabulate The \type{char} field in a ligature is required. The \type{type} field inside a ligature is the numerical or string value of one of the eight possible ligature types supported by \TEX. When \TEX\ inserts a new ligature, it puts the new glyph in the middle of the left and right glyphs. The original left and right glyphs can optionally be retained, and when at least one of them is kept, it is also possible to move the new \quote{insertion point} forward one or two places. The glyph that ends up to the right of the insertion point will become the next \quote{left}. \starttabulate[|l|c|l|l|] \NC \bf textual (Knuth) \NC \bf number \NC \bf string \NC result \NC\NR \NC l + r =: n \NC 0 \NC \type{=:} \NC \|n \NC\NR \NC l + r =:\| n \NC 1 \NC \type{=:|} \NC \|nr \NC\NR \NC l + r \|=: n \NC 2 \NC \type{|=:} \NC \|ln \NC\NR \NC l + r \|=:\| n \NC 3 \NC \type{|=:|} \NC \|lnr \NC\NR \NC l + r =:\|\> n \NC 5 \NC \type{=:|>} \NC n\|r \NC\NR \NC l + r \|=:\> n \NC 6 \NC \type{|=:>} \NC l\|n \NC\NR \NC l + r \|=:\|\> n \NC 7 \NC \type{|=:|>} \NC l\|nr \NC\NR \NC l + r \|=:\|\>\> n \NC 11 \NC \type{|=:|>>} \NC ln\|r \NC\NR \stoptabulate The default value is~0, and can be left out. That signifies a \quote{normal} ligature where the ligature replaces both original glyphs. In this table the~\| indicates the final insertion point. The \type{commands} array is explained below. 
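The correspondence between the numeric and string forms in the table above can be written down as a small \LUA\ lookup table. This is only an illustrative sketch: the name \type{ligature_types} is made up here, and the three ligature entries repeat the cmr10 data for \quote{f} with the \type{type} field given in each of its permitted forms.

\starttyping
-- string form -> numeric form, copied from the table above
local ligature_types = {
    ['=:']   = 0, ['=:|']  = 1, ['|=:']   = 2, ['|=:|']   = 3,
    ['=:|>'] = 5, ['|=:>'] = 6, ['|=:|>'] = 7, ['|=:|>>'] = 11,
}

-- ligatures of cmr10's 'f' (102); 'type' may be a number, a string, or left out
local ligatures = {
    [102] = { char = 11 },               -- f + f, type defaults to 0
    [105] = { char = 12, type = '=:' },  -- f + i, string form of type 0
    [108] = { char = 13, type = 0 },     -- f + l, numeric form of type 0
}
\stoptyping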
\section {Real fonts} Whether or not a \TEX\ font is a \quote{real} font that should be written to the \PDF\ document is decided by the \type{type} value in the top|-|level font structure. If the value is \type{real}, then this is a proper font, and the inclusion mechanism will attempt to add the needed font object definitions to the \PDF. Values for \type{type}: \starttabulate[|Tl|p|] \NC \ssbf value \NC \bf description \NC\NR \NC real \NC this is a base font \NC\NR \NC virtual \NC this is a virtual font \NC\NR \stoptabulate The actions to be taken depend on a number of different variables: \startitemize[packed] \item Whether the used font fits in an 8-bit encoding scheme or not \item The type of the disk font file \item The level of embedding requested \stopitemize A font that uses anything other than an 8-bit encoding vector has to be written to the \PDF\ in a different way. The rule is: if the font table has \type {encodingbytes} set to~2, then this is a wide font, in all other cases it isn't. The value~2 is the default for \OPENTYPE\ and \TRUETYPE\ fonts loaded via \LUA. For \TYPEONE\ fonts, you have to set \type {encodingbytes} to~2 explicitly. For \PK\ bitmap fonts, wide font encoding is not supported at all. If no special care is needed, \LUATEX\ currently falls back to the mapfile|-|based solution used by \PDFTEX\ and \DVIPS. This behavior will be removed in the future, when the existing code becomes integrated in the new subsystem. But if this is a \quote{wide} font, then the new subsystem kicks in, and some extra fields have to be present in the font structure. In this case, \LUATEX\ does not use a map file at all. The extra fields are: \type{format}, \type{embedding}, \type{fullname}, \type{cidinfo} (as explained above), \type{filename}, and the \type{index} key in the separate characters. 
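A minimal sketch of the font|-|level fields for a wide font is given below. The font name, file name, width and glyph index are placeholders rather than data from a real font, and \type{cidinfo} is left out because its layout is described earlier in this manual. The values chosen for \type{format} and \type{embedding} are taken from the lists of permitted values.

\starttyping
-- hypothetical wide font table; names and numbers are placeholders
local f = {
    name          = 'myfont',
    type          = 'real',
    encodingbytes = 2,                 -- marks this as a wide font
    format        = 'opentype',
    embedding     = 'subset',
    fullname      = 'MyFont-Regular',
    filename      = 'myfont.otf',
    characters    = {
        -- index is the glyph index inside the font file
        [65] = { width = 400000, index = 34 },
    },
}
\stoptyping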
Values for \type{format} are:

\starttabulate[|Tl|p|]
\NC \ssbf value \NC \bf description \NC\NR
\NC type1 \NC this is a \POSTSCRIPT\ \TYPEONE\ font \NC\NR
\NC type3 \NC this is a bitmapped (\PK) font \NC\NR
\NC truetype \NC this is a \TRUETYPE\ or \TRUETYPE|-|based \OPENTYPE\ font \NC\NR
\NC opentype \NC this is a \POSTSCRIPT|-|based \OPENTYPE\ font \NC\NR
\stoptabulate

(\type{type3} fonts are provided for backward compatibility only, and do not support the new wide encoding options.)

Values for \type{embedding} are:

\starttabulate[|Tl|p|]
\NC \ssbf value \NC \bf description \NC\NR
\NC no \NC don't embed the font at all \NC\NR
\NC subset \NC include and attempt to subset the font \NC\NR
\NC full \NC include this font in its entirety \NC\NR
\stoptabulate

It is not possible to artificially modify the transformation matrix for the font at the moment.

The other fields are used as follows: The \type{fullname} will be the \POSTSCRIPT|/|\PDF\ font name. The \type{cidinfo} will be used as the character set (the CID \type{/Ordering} and \type{/Registry} keys). The \type{filename} points to the actual font file. If you include the full path in the \type{filename} or if the file is in the local directory, \LUATEX\ will run a little bit more efficiently because it will not have to re|-|run the \type{find_xxx_file} callback in that case.

Be careful: when mixing old and new fonts in one document, it is possible to create \POSTSCRIPT\ name clashes that can result in printing errors. When this happens, you have to change the \type{fullname} of the font.

Typeset strings are written out in a wide format using 2~bytes per glyph, using the \type{index} key in the character information as value. The overall effect is like having an encoding based on numbers instead of traditional (\POSTSCRIPT) name|-|based reencoding. The way to get the correct \type{index} numbers for \TYPEONE\ fonts is by loading the font via \type{fontloader.open}; use the table indices as \type{index} fields.
This type of reencoding means that there is no longer a clear connection between the text in your input file and the strings in the output \PDF\ file. Dealing with this is high on the agenda.

\section[virtualfonts]{Virtual fonts}

You have to take the following steps if you want \LUATEX\ to treat the returned table from \luatex{define_font} as a virtual font:

\startitemize[packed]
\item Set the top|-|level key \type {type} to \type {virtual}.
\item Make sure there is at least one valid entry in \luatex{fonts} (see below).
\item Give a \type {commands} array to every character (see below).
\stopitemize

The presence of the top|-|level \type {type} key with the specific value \type {virtual} will trigger handling of the rest of the special virtual font fields in the table, but the mere existence of \type{type} is enough to prevent \LUATEX\ from looking for a virtual font on its own.

Therefore, this also works \quote{in reverse}: if you are absolutely certain that a font is not a virtual font, assigning the value \type{base} or \type{real} to \type{type} will inhibit \LUATEX\ from looking for a virtual font file, thereby saving you a disk search.

The \luatex{fonts} is another \LUA\ array. The values are one- or two|-|key hashes themselves, each entry indicating one of the base fonts in a virtual font. In case your font is referring to itself, you can use the \type {font.nextid()} function, which returns the index of the next font to be defined; that is probably the font currently being defined.

An example makes this easy to understand:

\starttyping
fonts = {
  { name = 'ptmr8a', size = 655360 },
  { name = 'psyr', size = 600000 },
  { id = 38 }
}
\stoptyping

This says that the first referenced font (index 1) in this virtual font is \type{ptmr8a} loaded at 10pt, and the second is \type{psyr} loaded at a little over 9pt. The third one is a previously defined font that is known to \LUATEX\ as fontid \quote{38}.
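For the self|-|referring case, a fragment along the following lines could be used. It is a sketch only: \type{f} and \type{size} are assumed to be the font table under construction and the size argument inside a \luatex{define_font} callback, and cmr10 merely stands in for any base font.

\starttyping
-- inside a define_font callback; f and size are assumed to exist there
f.type  = 'virtual'
f.fonts = {
    { name = 'cmr10', size = size },  -- fonts[1]: a base font loaded by name
    { id = font.nextid() },           -- fonts[2]: most likely this font itself
}
\stoptyping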
The array index numbers are used by the character command definitions that are part of each character.

The \luatex{commands} array is an array in which each item is another small array, with the first entry representing a command and the extra items being the parameters to that command. The allowed commands and their arguments are:

\starttabulate[|Tl|l|l|p|]
\NC \ssbf command name \NC \bf arguments \NC \bf arg type \NC \bf description \NC\NR
\NC font \NC 1 \NC number \NC select a new font from the local \luatex{fonts} table\NC\NR
\NC char \NC 1 \NC number \NC typeset this character number from the current font, and move right by the character's width\NC\NR
\NC node \NC 1 \NC node \NC output this node (list), and move right by the width of this list\NC\NR
\NC slot \NC 2 \NC number \NC a shortcut for the combination of a font and char command\NC\NR
\NC push \NC 0 \NC \NC save current position\NC\NR
\NC nop \NC 0 \NC \NC do nothing \NC\NR
\NC pop \NC 0 \NC \NC pop position \NC\NR
\NC rule \NC 2 \NC 2 numbers \NC output a rule $ht*wd$, and move right.\NC\NR
\NC down \NC 1 \NC number \NC move down on the page\NC\NR
\NC right \NC 1 \NC number \NC move right on the page\NC\NR
\NC special \NC 1 \NC string \NC output a \tex{special} command\NC\NR
\NC image \NC 1 \NC image \NC output an image (the argument can be either an \type{<image>} variable or an \type{image_spec} table)\NC\NR
\NC comment \NC any \NC any \NC the arguments of this command are ignored\NC\NR
\stoptabulate

Here is a rather elaborate glyph commands example:

\starttyping
...
commands = {
    {'push'},                       -- remember where we are
    {'right', 5000},                -- move right about 0.08pt
    {'font', 3},                    -- select the fonts[3] entry
    {'char', 97},                   -- place character 97 (ASCII 'a')
    {'pop'},                        -- go all the way back
    {'down', -200000},              -- move upwards by about 3pt
    {'special', 'pdf: 1 0 0 rg'},   -- switch to red color
    {'rule', 500000, 20000},        -- draw a bar
    {'special', 'pdf: 0 g'}         -- back to black
}
...
\stoptyping

The default value for \type {font} is always~1 at the start of the \type{commands} array. Therefore, if the virtual font is essentially only a re|-|encoding, then you usually do not have to create an explicit \quote{font} command in the array.

Rules inside of \type{commands} arrays are built up using only two dimensions: they do not have depth. For correct vertical placement, an extra \type{down} command may be needed.

Regardless of the amount of movement you create within the \type {commands}, the output pointer will always move by exactly the width that was given in the \type {width} key of the character hash. Any movements that take place inside the \type{commands} array are ignored on the upper level.

\subsection{Artificial fonts}

Even in a \quote{real} font, there can be virtual characters. When \LUATEX\ encounters a \type {commands} field inside a character when it becomes time to typeset the character, it will interpret the commands, just like for a true virtual character. In this case, if you have created no \quote{fonts} array, then the default (and only) \quote{base} font is taken to be the current font itself. In practice, this means that you can create virtual duplicates of existing characters, which is useful if you want to create composite characters.

Note: this feature does {\it not\/} work the other way around. There cannot be \quote{real} characters in a virtual font! You cannot use this technique for font re-encoding either; you need a truly virtual font for that (because characters that are already present cannot be altered).
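The following sketch shows what such a composite character might look like: an underlined copy of the letter \quote{a} added to cmr10 from within a \luatex{define_font} callback. The slot number 256, the dimensions and the rule placement are arbitrary choices made for this illustration, and it has not been verified against a particular \LUATEX\ release.

\starttyping
callback.register('define_font', function (name, size)
    local f = font.read_tfm(name, size)
    if name == 'cmr10' then
        f.type = 'real'                     -- no fonts array: base font is f itself
        local a = f.characters[97]          -- the existing 'a'
        f.characters[256] = {               -- hypothetical unused slot
            width    = a.width,
            height   = a.height,
            depth    = 50000,
            commands = {
                {'push'},                   -- remember the starting point
                {'char', 97},               -- typeset the real 'a'
                {'pop'},                    -- return to the starting point
                {'down', 50000},            -- rules have no depth, so move down
                {'rule', 20000, a.width},   -- height first, then width
            },
        }
    end
    return f
end)
\stoptyping

Because the output pointer always advances by the \type{width} of the character hash, the new character takes up exactly as much space as the original \quote{a}.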
\subsection{Example virtual font}

Finally, here is a plain \TEX\ input file with a virtual font demonstration:

\startbuffer
\directlua {
  callback.register('define_font',
    function (name,size)
      local f
      if name == 'cmr10-red' then
        f = font.read_tfm('cmr10',size)
        f.name = 'cmr10-red'
        f.type = 'virtual'
        f.fonts = {{ name = 'cmr10', size = size }}
        for i,v in pairs(f.characters) do
          if (string.char(i)):find('[tacohanshartmut]') then
            v.commands = {
              {'special','pdf: 1 0 0 rg'},
              {'char',i},
              {'special','pdf: 0 g'},
            }
          else
            v.commands = {{'char',i}}
          end
        end
      else
        f = font.read_tfm(name,size)
      end
      return f
    end
  )
}

\font\myfont = cmr10-red at 10pt \myfont This is a line of text \par
\font\myfontx= cmr10 at 10pt \myfontx Here is another line of text \par
\stopbuffer

\typebuffer

%\getbuffer

\chapter[nodes]{Nodes}

\section{\LUA\ node representation}

\TEX's nodes are represented in \LUA\ as userdata objects with a variable set of fields. In the following syntax tables, the type of such a userdata object is represented as \syntax{<node>}.

The current return value of \luatex{node.types()} is:
\ctxlua {
  local d = node.types()
  tex.print('\\type{' .. d[0] .. '} (' .. 0 .. '), ')
  for _,v in pairs(d) do
    if _ > 0 then
      tex.print('\\type{' .. v .. '} (' .. _ .. '), ')
    end
  end
}.

NOTE: The \type {\lastnodetype} primitive is \ETEX\ compliant. The valid range is still -1 .. 15 and glyph nodes have number 0 (used to be char node) and ligature nodes are mapped to 7. That way macro packages can use the same symbolic names as in traditional \ETEX. Keep in mind that the internal node numbers are different and that there are more node types than 15.

\subsection{Auxiliary items}

A few node|-|typed userdata objects do not occur in the \quote{normal} list of nodes, but can be pointed to from within that list. They are not quite the same as regular nodes, but it is easier for the library routines to treat them as if they were.
\subsubsection{glue_spec items} Skips are about the only type of data objects in traditional \TEX\ that are not a simple value. The structure that represents the glue components of a skip is called a \type {glue_spec}, and it has the following accessible fields: \starttabulate[|lT|l|p|] \NC \ssbf key \NC \bf type \NC \bf explanation \NC\NR \NC width \NC number \NC \NC\NR \NC stretch \NC number \NC \NC\NR \NC stretch_order \NC number \NC \NC\NR \NC shrink \NC number \NC \NC\NR \NC shrink_order \NC number \NC \NC\NR \NC writable \NC boolean \NC If this is true, you can't assign to this \type{glue_spec} because it is one of the preallocated special cases. New in 0.52\NC\NR \stoptabulate These objects are reference counted, so there is actually an extra field named \type {ref_count} as well. This item type will likely disappear in the future, and the glue fields themselves will become part of the nodes referencing glue items. \subsubsection{attribute{\_}list and attribute items} The newly introduced attribute registers are non|-|trivial, because the value that is attached to a node is essentially a sparse array of key|-|value pairs. It is generally easiest to deal with attribute lists and attributes by using the dedicated functions in the \luatex{node} library, but for completeness, here is the low|-|level interface. An \type{attribute_list} item is used as a head pointer for a list of attribute items. It has only one user-visible field: \starttabulate[|lT|l|p|] \NC \ssbf field \NC \bf type \NC \bf explanation \NC\NR \NC next \NC \syntax{} \NC pointer to the first attribute\NC\NR \stoptabulate A normal node's attribute field will point to an item of type \type{attribute_list}, and the \type{next} field in that item will point to the first defined \quote{attribute} item, whose \type {next} will point to the second \quote{attribute} item, etc. 
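The low|-|level walk over such a chain can be sketched as follows. The function name \type{show_attributes} is made up for this example; it relies on the \type{number} and \type{value} fields of attribute items and on \type{texio.write_nl} for output.

\starttyping
-- write all attributes that are set on node n to the log and terminal
local function show_attributes(n)
    local list = n.attr             -- the attribute_list item, or nil
    local a = list and list.next    -- the first attribute item, if any
    while a do
        texio.write_nl('attribute ' .. a.number .. ' = ' .. a.value)
        a = a.next
    end
end
\stoptyping

In day|-|to|-|day code the dedicated functions in the \luatex{node} library remain the more convenient route.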
Valid fields in \type{attribute} items:

\starttabulate[|lT|l|p|]
\NC \ssbf field \NC \bf type \NC \bf explanation \NC\NR
\NC next \NC \syntax{} \NC pointer to the next attribute\NC\NR
\NC number \NC number \NC the attribute type id\NC\NR
\NC value \NC number \NC the attribute value\NC\NR
\stoptabulate

\subsubsection{action item}

Valid fields: \showfields{action}\crlf
Id: \showid{action}

These are a special kind of item that only appears inside pdf start link objects.

\starttabulate[|lT|l|p|]
\NC \ssbf field \NC \bf type \NC \bf explanation \NC\NR
\NC action_type \NC number \NC \NC\NR
\NC action_id \NC number or string \NC \NC\NR
\NC named_id \NC number \NC \NC\NR
\NC file \NC string \NC \NC\NR
\NC new_window \NC number \NC \NC\NR
\NC data \NC string \NC \NC\NR
\NC ref_count \NC number \NC \NC\NR
\stoptabulate

\subsection{Main text nodes}

These are the nodes that comprise actual typesetting commands.

A few fields are present in all nodes regardless of their type. These are:

\starttabulate[|lT|l|p|]
\NC \ssbf field \NC \bf type \NC \bf explanation \NC\NR
\NC next \NC \syntax{} \NC The next node in a list, or nil\NC\NR
\NC id \NC number \NC The node's type (\type{id}) number \NC\NR
\NC subtype \NC number \NC The node \type{subtype} identifier\NC\NR
\stoptabulate

The \type{subtype} is sometimes just a stub entry. Not all nodes actually use the \type{subtype}, but this way you can be sure that all nodes accept it as a valid field name, and that is often handy in node list traversal. In the following tables \type{next} and \type{id} are not explicitly mentioned.

Besides these three fields, almost all nodes also have an \type {attr} field, and there is also a field called \type{prev}. That last field is always present, but only initialized on explicit request: when the function \type{node.slide()} is called, it will set up the \type{prev} fields to be a backwards pointer in the argument node list.
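A backwards traversal that depends on this behaviour could look like the sketch below. The function name \type{walk_backwards} is hypothetical, and the code assumes that \type{node.slide()} returns the last node of the list, which then serves as the starting point.

\starttyping
-- visit a node list from tail to head and log each node's type
local function walk_backwards(head)
    local n = node.slide(head)      -- sets up prev; assumed to return the tail
    while n do
        texio.write_nl(node.type(n.id) .. ' (subtype ' .. tostring(n.subtype) .. ')')
        n = n.prev
    end
end
\stoptyping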
\subsubsection{hlist nodes} Valid fields: \showfields{hlist}\crlf Id: \showid{hlist} \starttabulate[|lT|l|p|] \NC \ssbf field \NC \bf type \NC \bf explanation \NC\NR \NC subtype \NC number \NC 0 = unknown origin, 1 = created by linebreaking, 2 = explicit box command. (0.46.0), 3 = paragraph indentation box, 4 = alignment column or row, 5 = alignment cell (0.62.0)\NC\NR \NC attr \NC \syntax{} \NC The head of the associated attribute list \NC\NR \NC width \NC number \NC \NC\NR \NC height \NC number \NC \NC\NR \NC depth \NC number \NC \NC\NR \NC shift \NC number \NC a displacement perpendicular to the character progression direction \NC\NR \NC glue_order \NC number \NC a number in the range 0--4, indicating the glue order\NC\NR \NC glue_set \NC number \NC the calculated glue ratio\NC\NR \NC glue_sign \NC number \NC \NC\NR \NC head \NC \syntax{} \NC the first node of the body of this list\NC\NR \NC dir \NC string \NC the direction of this box. see~\in{}[dirnodes]\NC\NR \stoptabulate A warning: never assign a node list to the \type{head} field unless you are sure its internal link structure is correct, otherwise an error may result. Note: the new field name \type{head} was introduced in 0.65 to replace the old name \type{list}. Use of the name \type{list} is now deprecated, but it will stay available until at least version 0.80. \subsubsection{vlist nodes} Valid fields: As for hlist, except that \quote{shift} is a displacement perpendicular to the line progression direction, and \quote{subtype} only has subtypes 0, 4, and 5. 
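Since the \type{head} field of a box can itself contain further boxes, list processing is often recursive. The following sketch counts the glyph nodes in a list, descending into nested hlist and vlist nodes; \type{count_glyphs} is a name invented for this example.

\starttyping
local HLIST = node.id('hlist')
local VLIST = node.id('vlist')
local GLYPH = node.id('glyph')

-- count glyph nodes in a list, descending into nested boxes
local function count_glyphs(head)
    local count = 0
    for n in node.traverse(head) do
        if n.id == GLYPH then
            count = count + 1
        elseif n.id == HLIST or n.id == VLIST then
            count = count + count_glyphs(n.head)
        end
    end
    return count
end
\stoptyping

A typical call would pass the \type{head} of a filled box register, for example \type{count_glyphs(tex.box[0].head)}, after checking that the register is not empty.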
\subsubsection{rule nodes}

Valid fields: \showfields{rule}\crlf
Id: \showid{rule}

\starttabulate[|lT|l|p|]
\NC \ssbf field \NC \bf type \NC \bf explanation \NC\NR
\NC subtype \NC number \NC unused\NC\NR
\NC attr \NC \syntax{} \NC \NC\NR
\NC width \NC number \NC the width of the rule; the special value $-1073741824$ is used for \quote{running} glue dimensions\NC\NR
\NC height \NC number \NC the height of the rule (can be negative)\NC\NR
\NC depth \NC number \NC the depth of the rule (can be negative)\NC\NR
\NC dir \NC string \NC the direction of this rule. see~\in{}[dirnodes]\NC\NR
\stoptabulate

\subsubsection{ins nodes}

Valid fields: \showfields{ins}\crlf
Id: \showid{ins}

\starttabulate[|lT|l|p|]
\NC \ssbf field \NC \bf type \NC \bf explanation \NC\NR
\NC subtype \NC number \NC the insertion class\NC\NR
\NC attr \NC \syntax{} \NC \NC\NR
\NC cost \NC number \NC the penalty associated with this insert\NC\NR
\NC height \NC number \NC \NC\NR
\NC depth \NC number \NC \NC\NR
\NC head \NC \syntax{} \NC the first node of the body of this insert\NC\NR
\NC spec \NC \syntax{} \NC a pointer to the \tex{splittopskip} glue spec\NC\NR
\stoptabulate

A warning: never assign a node list to the \type{head} field unless you are sure its internal link structure is correct, otherwise an error may result.

Note: the new field name \type{head} was introduced in 0.65 to replace the old name \type{list}. Use of the name \type{list} is now deprecated, but it will stay available until at least version 0.80.
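Returning to rule nodes for a moment, the special \quote{running} value in the \type{width} field can be illustrated with a small sketch that builds a horizontal rule from \LUA. It assumes that the code runs in vertical mode, so that \type{node.write} appends the rule to a vertical list; the thickness of 0.4pt is an arbitrary choice.

\starttyping
-- an \hrule-like rule: running width, 0.4pt high, no depth
local running = -1073741824         -- the special value from the rule table

local r = node.new('rule')
r.width  = running                  -- stretch to the width of the enclosing box
r.height = 26214                    -- about 0.4pt in scaled points
r.depth  = 0
node.write(r)                       -- append to the current list
\stoptyping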
\subsubsection{mark nodes}

Valid fields: \showfields{mark}\crlf
Id: \showid{mark}

\starttabulate[|lT|l|p|]
\NC \ssbf field \NC \bf type \NC \bf explanation \NC\NR
\NC subtype \NC number \NC unused\NC\NR
\NC attr \NC \syntax{} \NC \NC\NR
\NC class \NC number \NC the mark class\NC\NR
\NC mark \NC table \NC a table representing a token list\NC\NR
\stoptabulate

\subsubsection{adjust nodes}

Valid fields: \showfields{adjust}\crlf
Id: \showid{adjust}

\starttabulate[|lT|l|p|]
\NC \ssbf field \NC \bf type \NC \bf explanation \NC\NR
\NC subtype \NC number \NC 0 = normal, 1 = \quote{pre}\NC\NR
\NC attr \NC \syntax{} \NC \NC\NR
\NC head \NC \syntax{} \NC adjusted material\NC\NR
\stoptabulate

A warning: never assign a node list to the \type{head} field unless you are sure its internal link structure is correct, otherwise an error may result.

Note: the new field name \type{head} was introduced in 0.65 to replace the old name \type{list}. Use of the name \type{list} is now deprecated, but it will stay available until at least version 0.80.

\subsubsection{disc nodes}

Valid fields: \showfields{disc}\crlf
Id: \showid{disc}

\starttabulate[|lT|l|p|]
\NC \ssbf field \NC \bf type \NC \bf explanation \NC\NR
\NC subtype \NC number \NC indicates the source of a discretionary. 0 = the \tex{discretionary} command, 1 = the \tex{-} command, 2 = added automatically following a \type{-}, 3 = added by the hyphenation algorithm (simple), 4 = added by the hyphenation algorithm (hard, first item), 5 = added by the hyphenation algorithm (hard, second item)\NC\NR
\NC attr \NC \syntax{} \NC \NC\NR
\NC pre \NC \syntax{} \NC pointer to the pre|-|break text\NC\NR
\NC post \NC \syntax{} \NC pointer to the post|-|break text\NC\NR
\NC replace \NC \syntax{} \NC pointer to the no|-|break text\NC\NR
\stoptabulate

The subtype numbers~4 and~5 belong to the \quote{of-f-ice} explanation given elsewhere.
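To see which of these sources actually occur in a given list, the discretionaries can be tallied per subtype. The sketch below uses \type{node.traverse_id}; the function name \type{count_discs} is invented here, and the function looks only at the top level of the list it is given.

\starttyping
local DISC = node.id('disc')

-- count the discretionaries in a list, grouped by subtype (0..5)
local function count_discs(head)
    local counts = {}
    for d in node.traverse_id(DISC, head) do
        counts[d.subtype] = (counts[d.subtype] or 0) + 1
    end
    return counts
end
\stoptyping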
A warning: never assign a node list to the pre, post or replace field unless you are sure its internal link structure is correct, otherwise an error may result.

\subsubsection{math nodes}

Valid fields: \showfields{math}\crlf
Id: \showid{math}

\starttabulate[|lT|l|p|]
\NC \ssbf field \NC \bf type \NC \bf explanation \NC\NR
\NC subtype \NC number \NC 0 = \quote{on}, 1 = \quote{off}\NC\NR
\NC attr \NC \syntax{} \NC \NC\NR
\NC surround \NC number \NC width of the \tex{mathsurround} kern\NC\NR
\stoptabulate

\subsubsection{glue nodes}

Valid fields: \showfields{glue}\crlf
Id: \showid{glue}

\starttabulate[|lT|l|p|]
\NC \ssbf field \NC \bf type \NC \bf explanation \NC\NR
\NC subtype \NC number \NC 0 = \tex{skip}, 1--18 = internal glue parameters, 100--103 = \quote{leader} subtypes \NC\NR
\NC attr \NC \syntax{} \NC \NC\NR
\NC spec \NC \syntax{} \NC pointer to a glue{\_}spec item \NC\NR
\NC leader \NC \syntax{} \NC pointer to a box or rule for leaders\NC\NR
\stoptabulate

The exact meanings of the subtypes are as follows:

\starttabulate[|rT|l|]
\NC 1 \NC \tex{lineskip} \NC \NR
\NC 2 \NC \tex{baselineskip} \NC \NR
\NC 3 \NC \tex{parskip} \NC \NR
\NC 4 \NC \tex{abovedisplayskip} \NC \NR
\NC 5 \NC \tex{belowdisplayskip} \NC \NR
\NC 6 \NC \tex{abovedisplayshortskip} \NC \NR
\NC 7 \NC \tex{belowdisplayshortskip} \NC \NR
\NC 8 \NC \tex{leftskip} \NC \NR
\NC 9 \NC \tex{rightskip} \NC \NR
\NC 10 \NC \tex{topskip} \NC \NR
\NC 11 \NC \tex{splittopskip} \NC \NR
\NC 12 \NC \tex{tabskip} \NC \NR
\NC 13 \NC \tex{spaceskip} \NC \NR
\NC 14 \NC \tex{xspaceskip} \NC \NR
\NC 15 \NC \tex{parfillskip} \NC \NR
\NC 16 \NC \tex{thinmuskip} \NC \NR
\NC 17 \NC \tex{medmuskip} \NC \NR
\NC 18 \NC \tex{thickmuskip} \NC \NR
\NC 100 \NC \tex{leaders} \NC \NR
\NC 101 \NC \tex{cleaders} \NC \NR
\NC 102 \NC \tex{xleaders} \NC \NR
\NC 103 \NC \tex{gleaders} \NC \NR
\stoptabulate

\subsubsection{kern nodes}

Valid fields: \showfields{kern}\crlf
Id: \showid{kern}

\starttabulate[|lT|l|p|]
\NC \ssbf field \NC \bf type \NC \bf explanation \NC\NR
\NC subtype \NC number \NC 0 = from font, 1 = from \tex{kern} or \tex{/}, 2 = from \tex{accent}\NC\NR
\NC attr \NC \syntax{} \NC \NC\NR
\NC kern \NC number \NC \NC\NR
\stoptabulate

\subsubsection{penalty nodes}

Valid fields: \showfields{penalty}\crlf
Id: \showid{penalty}

\starttabulate[|lT|l|p|]
\NC \ssbf field \NC \bf type \NC \bf explanation \NC\NR
\NC subtype \NC number \NC not used\NC\NR
\NC attr \NC \syntax{} \NC \NC\NR
\NC penalty \NC number \NC \NC\NR
\stoptabulate

\subsubsection[glyphnodes]{glyph nodes}

Valid fields: \showfields{glyph}\crlf
Id: \showid{glyph}

\starttabulate[|lT|l|p|]
\NC \ssbf field\NC \ssbf type \NC \ssbf explanation \NC\NR
\NC subtype \NC number \NC bitfield\NC\NR
\NC attr \NC \syntax{} \NC \NC\NR
\NC char \NC number \NC \NC\NR
\NC font \NC number \NC \NC\NR
\NC lang \NC number \NC \NC\NR
\NC left \NC number \NC \NC\NR
\NC right \NC number \NC \NC\NR
\NC uchyph \NC boolean \NC \NC\NR
\NC components \NC \syntax{} \NC pointer to ligature components\NC\NR
\NC xoffset \NC number \NC \NC\NR
\NC yoffset \NC number \NC \NC\NR
\NC width \NC number \NC (new in 0.53)\NC\NR
\NC height \NC number \NC (new in 0.53)\NC\NR
\NC depth \NC number \NC (new in 0.53)\NC\NR
\stoptabulate

A warning: never assign a node list to the components field unless you are sure its internal link structure is correct, otherwise an error may result.

Valid bits for the \type{subtype} field are:

\starttabulate[|c|l|]
\NC \ssbf bit \NC \bf meaning \NC\NR
\NC 0 \NC character \NC\NR
\NC 1 \NC glyph \NC\NR
\NC 2 \NC ligature \NC\NR
\NC 3 \NC ghost \NC\NR
\NC 4 \NC left \NC\NR
\NC 5 \NC right \NC\NR
\stoptabulate

See \in{section}[charsandglyphs] for a detailed description of the \type{subtype} field.
\subsubsection{margin{\_}kern nodes} Valid fields: \showfields{margin_kern}\crlf Id: \showid{margin_kern} \starttabulate[|lT|l|p|] \NC \ssbf field \NC \bf type \NC \bf explanation \NC\NR \NC subtype \NC number \NC 0 = left side, 1 = right side\NC\NR \NC attr \NC \syntax{} \NC \NC\NR \NC width \NC number \NC \NC\NR \NC glyph \NC \syntax{} \NC \NC\NR \stoptabulate \subsection{Math nodes} These are the so||called \quote{noad}s and the nodes that are specifically associated with math processing. Most of these nodes contain sub-nodes so that the list of possible fields is actually quite small. First, the subnodes: \subsubsection{Math kernel subnodes} Many object fields in math mode are either simple characters in a specific family or math lists or node lists. There are four associated subnodes that represent these cases (in the following node descriptions these are indicated by the word \type{}). The \type{next} and \type{prev} fields for these subnodes are unused. \subsubsubsection{math{\_}char and math{\_}text{\_}char subnodes} Valid fields: \showfields{math_char}\crlf Id: \showid{math_char} \starttabulate[|lT|l|p|] \NC \ssbf field \NC \bf type \NC \bf explanation \NC\NR \NC attr \NC \syntax{}\NC \NC\NR \NC char \NC number \NC \NC \NR \NC fam \NC number \NC \NC\NR \stoptabulate The \type{math_char} is the simplest subnode field, it contains the character and family for a single glyph object. The \type{math_text_char} is a special case that you will not normally encounter, it arises temporarily during math list conversion (its sole function is to suppress a following italic correction). \subsubsubsection{sub{\_}box and sub{\_}mlist subnodes} Valid fields: \showfields{sub_box}\crlf Id: \showid{sub_box} \starttabulate[|lT|l|p|] \NC \ssbf field \NC \bf type \NC \bf explanation \NC\NR \NC attr \NC \syntax{}\NC \NC\NR \NC head \NC \syntax{}\NC \NC \NR \stoptabulate These two subnode types are used for subsidiary list items. 
For \type{sub_box}, the \type{head} points to a \quote{normal} vbox or hbox. For \type{sub_mlist}, the \type{head} points to a math list that is yet to be converted.

A warning: never assign a node list to the \type{head} field unless you are sure its internal link structure is correct, otherwise an error may result.

Note: the new field name \type{head} was introduced in 0.65 to replace the old name \type{list}. Use of the name \type{list} is now deprecated, but it will stay available until at least version 0.80.

\subsubsection{Math delimiter subnode}

There is a fifth subnode type that is used exclusively for delimiter fields. As before, the \type{next} and \type{prev} fields are unused.

\subsubsubsection{delim subnodes}

Valid fields: \showfields{delim}\crlf
Id: \showid{delim}

\starttabulate[|lT|l|p|]
\NC \ssbf field \NC \bf type \NC \bf explanation \NC\NR
\NC attr \NC \syntax{}\NC \NC\NR
\NC small_char \NC number \NC \NC \NR
\NC small_fam \NC number \NC \NC\NR
\NC large_char \NC number \NC \NC \NR
\NC large_fam \NC number \NC \NC\NR
\stoptabulate

The fields \type{large_char} and \type{large_fam} can be zero; in that case the font that is used for the \type{small_fam} is expected to provide the large version as an extension to the \type{small_char}.

\subsubsection{Math core nodes}

First, there are the objects (the \TEX book calls them \quote{atoms}) that are associated with the simple math objects: Ord, Op, Bin, Rel, Open, Close, Punct, Inner, Over, Under, Vcent. These all have the same fields, and they are combined into a single node type with separate subtypes for differentiation.
\subsubsubsection{simple nodes}

Valid fields: \showfields{noad}\crlf
Id: \showid{noad}

\starttabulate[|lT|l|p|]
\NC \ssbf field \NC \bf type \NC \bf explanation \NC\NR
\NC subtype \NC number \NC see below \NC\NR
\NC attr \NC \syntax{} \NC \NC\NR
\NC nucleus \NC \syntax{}\NC \NC\NR
\NC sub \NC \syntax{}\NC \NC\NR
\NC sup \NC \syntax{}\NC \NC\NR
\stoptabulate

The values of \type{subtype} are listed below. Operators are a bit special because they occupy three subtypes.

\starttabulate[|lT|p|]
\NC \ssbf number \NC \bf node sub type \NC\NR
\NC 0 \NC Ord \NC\NR
\NC 1 \NC Op, \type{\displaylimits} \NC\NR
\NC 2 \NC Op, \type{\limits} \NC\NR
\NC 3 \NC Op, \type{\nolimits} \NC\NR
\NC 4 \NC Bin \NC\NR
\NC 5 \NC Rel \NC\NR
\NC 6 \NC Open \NC\NR
\NC 7 \NC Close \NC\NR
\NC 8 \NC Punct \NC\NR
\NC 9 \NC Inner \NC\NR
\NC 10 \NC Under \NC\NR
\NC 11 \NC Over \NC\NR
\NC 12 \NC Vcent \NC\NR
\stoptabulate

\subsubsubsection{accent nodes}

Valid fields: \showfields{accent}\crlf
Id: \showid{accent}

\starttabulate[|lT|l|p|]
\NC \ssbf field \NC \bf type \NC \bf explanation \NC\NR
\NC subtype \NC number \NC the first bit is used for a fixed top accent flag (if the \type{accent} field is present), the second bit for a fixed bottom accent flag (if the \type{bot_accent} field is present). Example: the actual value \type{3} means: do not stretch either accent\NC\NR
\NC attr \NC \syntax{}\NC \NC\NR
\NC nucleus \NC \syntax{}\NC \NC \NR
\NC sub \NC \syntax{}\NC \NC\NR
\NC sup \NC \syntax{}\NC \NC \NR
\NC accent \NC \syntax{}\NC \NC\NR
\NC bot_accent \NC \syntax{}\NC \NC\NR
\stoptabulate

\subsubsubsection{style nodes}

Valid fields: \showfields{style}\crlf
Id: \showid{style}

\starttabulate[|lT|l|p|]
\NC \ssbf field \NC \bf type \NC \bf explanation \NC\NR
\NC style \NC string \NC contains the style \NC\NR
\stoptabulate

There are eight possibilities for the string value: one of \quote{display}, \quote{text}, \quote{script}, or \quote{scriptscript}. Each of these can have a trailing \type{'} to signify \quote{cramped} styles.
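When processing style nodes from \LUA, it can be handy to separate the base style from the \quote{cramped} marker. The helper below is a plain \LUA\ sketch; the name \type{parse_style} is invented for this example and is not part of any \LUATEX\ library.

\starttyping
-- split a style string into its base name and a 'cramped' flag
local function parse_style(style)
    local base, mark = style:match("^(%a+)('?)$")
    return base, mark == "'"
end

local base, cramped = parse_style("script'")   -- base is 'script', cramped is true
\stoptyping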
\subsubsubsection{choice nodes}

Valid fields: \showfields{choice}\crlf
Id: \showid{choice}

\starttabulate[|lT|l|p|]
\NC \ssbf field \NC \bf type \NC \bf explanation \NC\NR
\NC attr \NC \syntax{}\NC \NC\NR
\NC display \NC \syntax{}\NC \NC\NR
\NC text \NC \syntax{}\NC \NC\NR
\NC script \NC \syntax{}\NC \NC\NR
\NC scriptscript \NC \syntax{}\NC \NC\NR
\stoptabulate

A warning: never assign a node list to the display, text, script, or scriptscript field unless you are sure its internal link structure is correct, otherwise an error may result.

\subsubsubsection{radical nodes}

Valid fields: \showfields{radical}\crlf
Id: \showid{radical}

\starttabulate[|lT|l|p|]
\NC \ssbf field \NC \bf type \NC \bf explanation \NC\NR
\NC attr \NC \syntax{}\NC \NC\NR
\NC nucleus \NC \syntax{}\NC \NC \NR
\NC sub \NC \syntax{}\NC \NC\NR
\NC sup \NC \syntax{}\NC \NC \NR
\NC left \NC \syntax{}\NC \NC \NR
\NC degree \NC \syntax{}\NC Only set by \type{\Uroot} \NC \NR
\stoptabulate

A warning: never assign a node list to the nucleus, sub, sup, left, or degree field unless you are sure its internal link structure is correct, otherwise an error may result.

\subsubsubsection{fraction nodes}

Valid fields: \showfields{fraction}\crlf
Id: \showid{fraction}

\starttabulate[|lT|l|p|]
\NC \ssbf field \NC \bf type \NC \bf explanation \NC\NR
\NC attr \NC \syntax{}\NC \NC\NR
\NC width \NC number \NC \NC \NR
\NC num \NC \syntax{}\NC \NC\NR
\NC denom \NC \syntax{}\NC \NC \NR
\NC left \NC \syntax{}\NC \NC \NR
\NC right \NC \syntax{}\NC \NC \NR
\stoptabulate

A warning: never assign a node list to the num or denom field unless you are sure its internal link structure is correct, otherwise an error may result.
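
For the choice nodes described earlier in this section, \LUA\ code often needs to find the field that belongs to a given style string. The sketch below shows one way to do that in plain \LUA. The helper name \type{choice_field} is hypothetical, and the code assumes that a cramped style (trailing \type{'}) selects the same branch as its uncramped counterpart, as \tex{mathchoice} does in traditional \TEX.

\starttyping
-- The four field names of a choice node that hold math lists.
local choice_fields = {
  display      = true,
  text         = true,
  script       = true,
  scriptscript = true,
}

-- Hypothetical helper: map a style string such as "script'" to the
-- name of the choice node field for that style.
-- Returns nil for anything that is not one of the eight style strings.
local function choice_field(style)
  -- Drop the trailing ' of a cramped style (assumption: same branch).
  local base = style:gsub("'$", "")
  if choice_fields[base] then
    return base
  end
  return nil
end
\stoptyping

Inside \LUATEX, the result could then be used to index a choice node by field name, as in \type{n[choice_field(s)]}. The warning about link structure applies to anything that is assigned to those fields.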
\subsubsubsection{fence nodes} Valid fields: \showfields{fence}\crlf Id: \showid{fence} \starttabulate[|lT|l|p|] \NC \ssbf field \NC \bf type \NC \bf explanation \NC\NR \NC subtype \NC number \NC 1 = \type{\left}, 2 = \type{\middle}, 3 = \type{\right} \NC\NR \NC attr \NC \syntax{}\NC \NC\NR \NC delim \NC \syntax{}\NC \NC \NR \stoptabulate \subsection{whatsit nodes} Whatsit nodes come in many subtypes that you can ask for by running \luatex{node.whatsits()}: \ctxlua {for n,name in table.sortedpairs(node.whatsits()) do if (n<100) then if (n>0) then tex.sprint (', ') end tex.sprint('\\type{' .. name .. '} (' .. n .. ')') end end } \subsubsection{open nodes} Valid fields: \showfields{whatsit,open}\crlf Id: \showid{whatsit,open} \starttabulate[|lT|l|p|] \NC \ssbf field \NC \bf type \NC \bf explanation \NC\NR \NC attr \NC \syntax{} \NC \NC\NR \NC stream \NC number \NC \TEX's stream id number\NC\NR \NC name \NC string \NC file name \NC\NR \NC ext \NC string \NC file extension \NC\NR \NC area \NC string \NC file area (this may become obsolete) \NC\NR \stoptabulate \subsubsection{write nodes} Valid fields: \showfields{whatsit,write}\crlf Id: \showid{whatsit,write} \starttabulate[|lT|l|p|] \NC \ssbf field \NC \bf type \NC \bf explanation \NC\NR \NC attr \NC \syntax{} \NC \NC\NR \NC stream \NC number \NC \TEX's stream id number\NC\NR \NC data \NC table \NC a table representing the token list to be written\NC\NR \stoptabulate \subsubsection{close nodes} Valid fields: \showfields{whatsit,close}\crlf Id: \showid{whatsit,close} \starttabulate[|lT|l|p|] \NC \ssbf field \NC \bf type \NC \bf explanation \NC\NR \NC attr \NC \syntax{} \NC \NC\NR \NC stream \NC number \NC \TEX's stream id number\NC\NR \stoptabulate \subsubsection{special nodes} Valid fields: \showfields{whatsit,special}\crlf Id: \showid{whatsit,special} \starttabulate[|lT|l|p|] \NC \ssbf field \NC \bf type \NC \bf explanation \NC\NR \NC attr \NC \syntax{} \NC \NC\NR \NC data \NC string \NC the \tex{special} 
information\NC\NR
\stoptabulate

\subsubsection{language nodes}

\LUATEX\ does not have language whatsits any more. All language information is already present inside the glyph nodes themselves. This whatsit subtype will be removed in the next release.

\subsubsection{local_par nodes}

Valid fields: \showfields{whatsit,local_par}\crlf
Id: \showid{whatsit,local_par}

\starttabulate[|lT|l|p|]
\NC \ssbf field \NC \bf type \NC \bf explanation \NC\NR
\NC attr \NC \syntax{} \NC \NC\NR
\NC pen_inter \NC number \NC local interline penalty (from \tex{localinterlinepenalty})\NC\NR
\NC pen_broken\NC number \NC local broken penalty (from \tex{localbrokenpenalty})\NC\NR
\NC dir \NC string \NC the direction of this par; see~\in{}[dirnodes]\NC\NR
\NC box_left \NC \syntax{} \NC the \tex{localleftbox}\NC\NR
\NC box_left_width\NC number\NC width of the \tex{localleftbox}\NC\NR
\NC box_right \NC \syntax{} \NC the \tex{localrightbox}\NC\NR
\NC box_right_width\NC number\NC width of the \tex{localrightbox}\NC\NR
\stoptabulate

A warning: never assign a node list to the box_left or box_right field unless you are sure its internal link structure is correct, otherwise an error may result.

\subsubsection[dirnodes]{dir nodes}

Valid fields: \showfields{whatsit,dir}\crlf
Id: \showid{whatsit,dir}

\starttabulate[|lT|l|p|]
\NC \ssbf field \NC \bf type \NC \bf explanation \NC\NR
\NC attr \NC \syntax{} \NC \NC\NR
\NC dir \NC string \NC the direction (but see below)\NC\NR
\NC level \NC number \NC nesting level of this direction whatsit\NC\NR
\NC dvi_ptr \NC number \NC a saved dvi buffer byte offset\NC\NR
\NC dir_h \NC number \NC a saved dvi position\NC\NR
\stoptabulate

A note on \type{dir} strings. Direction specifiers are three-letter combinations of \type{T}, \type{B}, \type{R}, and \type{L}.

These are built up out of three separate items:

\startitemize
\item the first is the direction of the \quote{top} of paragraphs.
\item the second is the direction of the \quote{start} of lines.
\item the third is the direction of the \quote{top} of glyphs. \stopitemize However, only four combinations are accepted: \type{TLT}, \type{TRT}, \type{RTT}, and \type{LTL}. Inside actual \type{dir} whatsit nodes, the representation of \type{dir} is not a three-letter but a four-letter combination. The first character in this case is always either \type{+} or \type{-}, indicating whether the value is pushed or popped from the direction stack. \subsubsection{pdf_literal nodes} Valid fields: \showfields{whatsit,pdf_literal}\crlf Id: \showid{whatsit,pdf_literal} \starttabulate[|lT|l|p|] \NC \ssbf field \NC \bf type \NC \bf explanation \NC\NR \NC attr \NC \syntax{} \NC \NC\NR \NC mode \NC number \NC the \quote{mode} setting of this literal\NC\NR \NC data \NC string \NC the \tex{pdfliteral} information\NC\NR \stoptabulate Mode values: \starttabulate[|lT|p|] \NC \ssbf value \NC \ssbf corresponding \tex{pdftex} keyword \NC \NR \NC 0 \NC setorigin \NC \NR \NC 1 \NC page \NC \NR \NC 2 \NC direct \NC \NR \stoptabulate \subsubsection{pdf_refobj nodes} Valid fields: \showfields{whatsit,pdf_refobj}\crlf Id: \showid{whatsit,pdf_refobj} \starttabulate[|lT|l|p|] \NC \ssbf field \NC \bf type \NC \bf explanation \NC\NR \NC attr \NC \syntax{} \NC \NC\NR \NC objnum \NC number \NC the referenced \PDF\ object number\NC\NR \stoptabulate \subsubsection{pdf_refxform nodes} Valid fields: \showfields{whatsit,pdf_refxform}\crlf Id: \showid{whatsit,pdf_refxform}. \starttabulate[|lT|l|p|] \NC \ssbf field \NC \bf type \NC \bf explanation \NC\NR \NC attr \NC \syntax{} \NC \NC\NR \NC width \NC number \NC \NC \NR \NC height \NC number \NC \NC \NR \NC depth \NC number \NC \NC \NR \NC objnum \NC number \NC the referenced \PDF\ object number\NC\NR \stoptabulate Be aware that \type{pdf_refxform} nodes have dimensions that are used by \LUATEX. 
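
The \type{dir} string conventions and the \type{pdf_literal} mode values described earlier in this section can be captured in a small amount of plain \LUA. This is a sketch under stated assumptions: \type{parse_dir} and \type{pdf_literal_modes} are hypothetical names, not \LUATEX\ functions, and the parser accepts both the three-letter form and the four-letter form with a leading \type{+} or \type{-}.

\starttyping
-- The four direction specifiers that are accepted.
local accepted = { TLT = true, TRT = true, RTT = true, LTL = true }

-- Hypothetical helper: split a dir string into its parts.
-- Returns a table { action = "push" | "pop" | "none", spec = "TLT" }
-- or nil plus a message when the string is not an accepted direction.
local function parse_dir(dir)
  -- Optional + or - prefix, followed by exactly three uppercase letters.
  local sign, spec = dir:match("^([+%-]?)(%u%u%u)$")
  if not spec or not accepted[spec] then
    return nil, "unsupported direction: " .. tostring(dir)
  end
  local action = "none"
  if sign == "+" then
    action = "push"   -- value is pushed on the direction stack
  elseif sign == "-" then
    action = "pop"    -- value is popped from the direction stack
  end
  return { action = action, spec = spec }
end

-- Mode numbers of pdf_literal whatsits and the matching keywords.
local pdf_literal_modes = {
  [0] = "setorigin",
  [1] = "page",
  [2] = "direct",
}
\stoptyping

A table is returned instead of two separate values so that a missing prefix (\type{"none"}) cannot be confused with a rejected string (\type{nil}).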
\subsubsection{pdf_refximage nodes} Valid fields: \showfields{whatsit,pdf_refximage}\crlf Id: \showid{whatsit,pdf_refximage} \starttabulate[|lT|l|p|] \NC \ssbf field \NC \bf type \NC \bf explanation \NC\NR \NC attr \NC \syntax{} \NC \NC\NR \NC width \NC number \NC \NC \NR \NC height \NC number \NC \NC \NR \NC depth \NC number \NC \NC \NR \NC objnum \NC number \NC the referenced \PDF\ object number\NC\NR \stoptabulate Be aware that \type{pdf_refximage} nodes have dimensions that are used by \LUATEX. \subsubsection{pdf_annot nodes} Valid fields: \showfields{whatsit,pdf_annot}\crlf Id: \showid{whatsit,pdf_annot} \starttabulate[|lT|l|p|] \NC \ssbf field \NC \bf type \NC \bf explanation \NC\NR \NC attr \NC \syntax{} \NC \NC\NR \NC width \NC number \NC \NC \NR \NC height \NC number \NC \NC \NR \NC depth \NC number \NC \NC \NR \NC objnum \NC number \NC the referenced \PDF\ object number\NC\NR \NC data \NC string \NC the annotation data\NC\NR \stoptabulate \subsubsection{pdf_start_link nodes} Valid fields: \showfields{whatsit,pdf_start_link}\crlf Id: \showid{whatsit,pdf_start_link} \starttabulate[|lT|l|p|] \NC \ssbf field \NC \bf type \NC \bf explanation \NC\NR \NC attr \NC \syntax{} \NC \NC\NR \NC width \NC number \NC \NC \NR \NC height \NC number \NC \NC \NR \NC depth \NC number \NC \NC \NR \NC objnum \NC number \NC the referenced \PDF\ object number\NC\NR \NC link_attr \NC table \NC the link attribute token list\NC\NR \NC action \NC \syntax{} \NC the action to perform\NC\NR \stoptabulate \subsubsection{pdf_end_link nodes} Valid fields: \showfields{whatsit,pdf_end_link}\crlf Id: \showid{whatsit,pdf_end_link} \starttabulate[|lT|l|p|] \NC \ssbf field \NC \bf type \NC \bf explanation \NC\NR \NC attr \NC \syntax{} \NC \NC\NR \stoptabulate \subsubsection{pdf_dest nodes} Valid fields: \showfields{whatsit,pdf_dest}\crlf Id: \showid{whatsit,pdf_dest} \starttabulate[|lT|l|p|] \NC \ssbf field \NC \bf type \NC \bf explanation \NC\NR \NC attr \NC \syntax{} \NC \NC\NR \NC width \NC 
number \NC \NC \NR \NC height \NC number \NC \NC \NR \NC depth \NC number \NC \NC \NR \NC named_id \NC number \NC is the dest_id a string value?\NC\NR \NC dest_id \NC number or string \NC the destination id\NC\NR \NC dest_type \NC number\NC type of destination\NC\NR \NC xyz_zoom \NC number\NC \NC\NR \NC objnum \NC number \NC the \PDF\ object number\NC\NR \stoptabulate \subsubsection{pdf_thread nodes} Valid fields: \showfields{whatsit,pdf_thread}\crlf Id: \showid{whatsit,pdf_thread} \starttabulate[|lT|l|p|] \NC \ssbf field \NC \bf type \NC \bf explanation \NC\NR \NC attr \NC \syntax{} \NC \NC\NR \NC width \NC number \NC \NC \NR \NC height \NC number \NC \NC \NR \NC depth \NC number \NC \NC \NR \NC named_id \NC number \NC is the tread_id a string value?\NC\NR \NC tread_id \NC number or string \NC the thread id\NC\NR \NC thread_attr\NC number \NC extra thread information\NC\NR \stoptabulate \subsubsection{pdf_start_thread nodes} Valid fields: \showfields{whatsit,pdf_start_thread}\crlf Id: \showid{whatsit,pdf_start_thread} \starttabulate[|lT|l|p|] \NC \ssbf field \NC \bf type \NC \bf explanation \NC\NR \NC attr \NC \syntax{} \NC \NC\NR \NC width \NC number \NC \NC \NR \NC height \NC number \NC \NC \NR \NC depth \NC number \NC \NC \NR \NC named_id \NC number \NC is the tread_id a string value?\NC\NR \NC tread_id \NC number or string \NC the thread id\NC\NR \NC thread_attr\NC number \NC extra thread information\NC\NR \stoptabulate \subsubsection{pdf_end_thread nodes} Valid fields: \showfields{whatsit,pdf_end_thread}\crlf Id: \showid{whatsit,pdf_end_thread} \starttabulate[|lT|l|p|] \NC \ssbf field \NC \bf type \NC \bf explanation \NC\NR \NC attr \NC \syntax{} \NC \NC\NR \stoptabulate \subsubsection{pdf_save_pos nodes} Valid fields: \showfields{whatsit,pdf_save_pos}\crlf Id: \showid{whatsit,pdf_save_pos} \starttabulate[|lT|l|p|] \NC \ssbf field \NC \bf type \NC \bf explanation \NC\NR \NC attr \NC \syntax{} \NC \NC\NR \stoptabulate \subsubsection{late_lua nodes} Valid 
fields: \showfields{whatsit,late_lua}\crlf Id: \showid{whatsit,late_lua} \starttabulate[|lT|l|p|] \NC \ssbf field \NC \bf type \NC \bf explanation \NC\NR \NC attr \NC \syntax{} \NC \NC\NR \NC data \NC string \NC data to execute\NC\NR \NC string \NC string \NC data to execute (0.63)\NC\NR \NC name \NC string \NC the name to use for lua error reporting\NC\NR \stoptabulate The difference between \type{data} and \type{string} is that on assignment, the \type{data} field is converted to a token list, cf. use as \tex{latelua}. The \type{string} version is treated as a literal string. \subsubsection{pdf_colorstack nodes} Valid fields: \showfields{whatsit,pdf_colorstack}\crlf Id: \showid{whatsit,pdf_colorstack} \starttabulate[|lT|l|p|] \NC \ssbf field \NC \bf type \NC \bf explanation \NC\NR \NC attr \NC \syntax{} \NC \NC\NR \NC stack \NC number \NC colorstack id number\NC\NR \NC cmd \NC number \NC command to execute\NC\NR \NC data \NC string \NC data\NC\NR \stoptabulate \subsubsection{pdf_setmatrix nodes} Valid fields: \showfields{whatsit,pdf_setmatrix}\crlf Id: \showid{whatsit,pdf_setmatrix} \starttabulate[|lT|l|p|] \NC \ssbf field \NC \bf type \NC \bf explanation \NC\NR \NC attr \NC \syntax{} \NC \NC\NR \NC data \NC string \NC data\NC\NR \stoptabulate \subsubsection{pdf_save nodes} Valid fields: \showfields{whatsit,pdf_save}\crlf Id: \showid{whatsit,pdf_save} \starttabulate[|lT|l|p|] \NC \ssbf field \NC \bf type \NC \bf explanation \NC\NR \NC attr \NC \syntax{} \NC \NC\NR \stoptabulate \subsubsection{pdf_restore nodes} Valid fields: \showfields{whatsit,pdf_restore}\crlf Id: \showid{whatsit,pdf_restore} \starttabulate[|lT|l|p|] \NC \ssbf field \NC \bf type \NC \bf explanation \NC\NR \NC attr \NC \syntax{} \NC \NC\NR \stoptabulate \subsubsection{user_defined nodes} User|-|defined whatsit nodes can only be created and handled from \LUA\ code. In effect, they are an extension to the extension mechanism. 
The \LUATEX\ engine will simply step over such whatsits without ever looking at the contents. Valid fields: \showfields{whatsit,user_defined}\crlf Id: \showid{whatsit,user_defined} \starttabulate[|lT|l|p|] \NC \ssbf field \NC \bf type \NC \bf explanation \NC\NR \NC attr \NC \syntax{} \NC \NC\NR \NC user_id \NC number \NC id number\NC\NR \NC type \NC number \NC type of the value\NC\NR \NC value \NC number \NC \NC\NR \NC \NC string \NC \NC\NR \NC \NC \syntax{} \NC \NC\NR \NC \NC table \NC \NC\NR \stoptabulate The \type{type} can have one of five distinct values: \starttabulate[|lT|p|] \NC \ssbf value \NC \bf explanation \NC\NR \NC 97 \NC the value is an attribute node list \NC\NR \NC 100 \NC the value is a number \NC\NR \NC 110 \NC the value is a node list \NC\NR \NC 115 \NC the value is a string\NC\NR \NC 116 \NC the value is a token list in \LUA\ table form\NC\NR \stoptabulate \chapter{Modifications} Besides the expected changes caused by new functionality, there are a number of not|-|so|-|expected changes. These are sometimes a side|-|effect of a new (conflicting) feature, or, more often than not, a change necessary to clean up the internal interfaces. \section{Changes from \TEX\ 3.1415926} \startitemize \item The current code base is written in C, not Pascal web (as of \LUATEX~0.42.0). \item See~\in{chapter}[languages] for many small changes related to paragraph building, language handling, and hyphenation. Most important change: adding a brace group in the middle of a word (like in \type{of{}fice}) does not prevent ligature creation. \item There is no pool file, all strings are embedded during compilation. \item \type {plus 1 fillll} does not generate an error. The extra \quote{l} is simply typeset. \item The upper limit to \tex{endlinechar} and \tex{newlinechar} is 127. 
\stopitemize \section{Changes from \ETEX\ 2.2} \startitemize \item The \ETEX\ functionality is always present and enabled (but see below about \TEXXET), so the prepended asterisk or \type{-etex} switch for \INITEX\ is not needed. \item \TEXXET\ is not present, so the primitives \starttyping \TeXXeTstate \beginR \beginL \endR \endL \stoptyping are missing. \item Some of the tracing information that is output by \ETEX's \tex{tracingassigns} and \tex{tracingrestores} is not there. \item Register management in \LUATEX\ uses the \ALEPH\ model, so the maximum value is 65535 and the implementation uses a flat array instead of the mixed flat|\&|sparse model from \ETEX. \item \type{savinghyphcodes} is a no-op. See~\in{chapter}[languages] for details. \item When kpathsea is used to find files, \LUATEX\ uses the \type{ofm} file format to search for font metrics. In turn, this means that \LUATEX\ looks at the \type{OFMFONTS} configuration variable (like \OMEGA\ and \ALEPH) instead of \type{TFMFONTS} (like \TEX\ and \PDFTEX). Likewise for virtual fonts (\LUATEX\ uses the variable \type{OVFFONTS} instead of \type{VFFONTS}). \stopitemize \section{Changes from \PDFTEX\ 1.40} \startitemize \item The (experimental) support for snap nodes has been removed, because it is much more natural to build this functionality on top of node processing and attributes. The associated primitives that are now gone are: \tex{pdfsnaprefpoint}, \tex{pdfsnapy}, and \tex{pdfsnapycomp}. \item The (experimental) support for specialized spacing around nodes has also been removed. The associated primitives that are now gone are: \tex{pdfadjustinterwordglue}, \tex{pdfprependkern}, and \tex{pdfappendkern}, as well as the five supporting primitives \tex{knbscode}, \tex{stbscode}, \tex{shbscode}, \tex{knbccode}, and \tex{knaccode}. 
\item A number of \quote{utility functions} are removed:

\startcolumns[n=3]
\starttyping
\pdfelapsedtime
\pdfescapehex
\pdfescapename
\pdfescapestring
\pdffiledump
\pdffilemoddate
\pdffilesize
\pdflastmatch
\pdfmatch
\pdfmdfivesum
\pdfresettimer
\pdfshellescape
\pdfstrcmp
\pdfunescapehex
\stoptyping
\stopcolumns

\item The four primitives that were already marked obsolete in \PDFTEX~1.40 have been removed since \LUATEX~0.42:

\startcolumns[n=2]
\starttyping
\pdfoptionalwaysusepdfpagebox
\pdfoptionpdfinclusionerrorlevel
\pdfforcepagebox
\pdfmovechars
\stoptyping
\stopcolumns

\item A few other experimental primitives are also provided without the extra \luatex {pdf} prefix, so they can also be called like this:

\startcolumns[n=3]
\starttyping
\primitive
\ifprimitive
\ifabsnum
\ifabsdim
\stoptyping
\stopcolumns

\item The \tex{pdftexversion} is set to 200.

\item The PNG transparency fix from 1.40.6 is not applied (high-level support is pending).

\item LFS (\PDF\ Files larger than 2GiB) support is not working yet.

\item \LUATEX~0.45.0 introduces two extra token lists, \tex{pdfxformresources} and \tex{pdfxformattr}, as an alternative to \tex{pdfxform} keywords.

\item As of \LUATEX~0.50.0 it is no longer possible for fonts from embedded pdf files to be replaced by / merged with the document fonts of the enveloping pdf document. This regression may be temporary, depending on how the rewritten font backend will look after beta 0.60.
\stopitemize

\section{Changes from \ALEPH\ RC4}

\startitemize
\item Starting with \LUATEX\ 0.63.0, OCP processing is no longer supported at all.
As a consequence, the following primitives have been removed:

\startcolumns[n=2]
\starttyping
\ocp
\externalocp
\ocplist
\pushocplist
\popocplist
\clearocplists
\addbeforeocplist
\addafterocplist
\removebeforeocplist
\removeafterocplist
\ocptracelevel
\stoptyping
\stopcolumns

\item \LUATEX\ only understands 4~of the 16~direction specifiers of \ALEPH: \type{TLT} (latin), \type{TRT} (arabic), \type{RTT} (cjk), \type{LTL} (mongolian). All other direction specifiers generate an error (\LUATEX\ 0.45).

\item The input translations from \ALEPH\ are not implemented; the related primitives are not available:

\startcolumns[n=2]
\starttyping
\DefaultInputMode
\noDefaultInputMode
\noInputMode
\InputMode
\DefaultOutputMode
\noDefaultOutputMode
\noOutputMode
\OutputMode
\DefaultInputTranslation
\noDefaultInputTranslation
\noInputTranslation
\InputTranslation
\DefaultOutputTranslation
\noDefaultOutputTranslation
\noOutputTranslation
\OutputTranslation
\stoptyping
\stopcolumns

\item The \tex{hoffset} bug when \tex{pagedir TRT} is fixed, removing the need for an explicit fix to \tex{hoffset}.

\item A bug causing \tex{fam} to fail for family numbers above 15 is fixed.

\item A fair number of other minor bugs are fixed as well, most of these related to \tex{tracingcommands} output.

\item The internal function \type{scan_dir()} has been renamed to \type{scan_direction()} to prevent a naming clash, and it now allows an optional space after the direction is completely parsed.

\item The \type{^^} notation can come in five and six item repetitions also, to insert characters that do not fit in the BMP.

\item Glues {\it immediately after} direction change commands are not legal breakpoints.
\stopitemize \section{Changes from standard \WEBC} \startitemize \item There is no mltex \item There is no enctex \item The following commandline switches are silently ignored, even in non|-|\LUA\ mode: \starttyping -8bit -translate-file=TCXNAME -mltex -enc -etex \stoptyping \item \tex{openout} whatsits are not written to the log file. \item Some of the so|-|called web2c extensions are hard to set up in non|-|\KPSE\ mode because texmf.cnf is not read: \type{shell-escape} is off (but that is not a problem because of \LUA's \lua{os.execute}), and the paranoia checks on \type{openin} and \type{openout} do not happen (however, it is easy for a \LUA\ script to do this itself by overloading \lua{io.open}). \item The \quote{E} option does not do anything useful. \stopitemize \chapter{Implementation notes} \section{Primitives overlap} The primitives \starttabulate[|l|l|] \NC \tex{pdfpagewidth} \NC \tex{pagewidth} \NC \NR \NC \tex{pdfpageheight}\NC \tex{pageheight} \NC \NR \NC \tex{fontcharwd} \NC \tex{charwd} \NC \NR \NC \tex{fontcharht} \NC \tex{charht} \NC \NR \NC \tex{fontchardp} \NC \tex{chardp} \NC \NR \NC \tex{fontcharic} \NC \tex{charit} \NC \NR \stoptabulate are all aliases of each other. \section{Memory allocation} The single internal memory heap that traditional \TEX\ used for tokens and nodes is split into two separate arrays. Each of these will grow dynamically when needed. The \type{texmf.cnf} settings related to main memory are no longer used (these are: \type{main_memory}, \type{mem_bot}, \type{extra_mem_top} and \type{extra_mem_bot}). \quote{Out of main memory} errors can still occur, but the limiting factor is now the amount of RAM in your system, not a predefined limit. Also, the memory (de)allocation routines for nodes are completely rewritten. The relevant code now lives in the C file \type{texnode.c}, and basically uses a dozen or so \quote{avail} lists instead of a doubly|-|linked model. 
An extra function layer is added so that the code can ask for nodes by type instead of directly requisitioning a certain amount of memory words. Because of the split into two arrays and the resulting differences in the data structures, some of the macros have been duplicated. For instance, there are now \type{vlink} and \type{vinfo} as well as \type{token_link} and \type{token_info}. All access to the variable memory array is now hidden behind a macro called \type{vmem}. The implementation of the growth of two arrays (via reallocation) introduces a potential pitfall: the memory arrays should never be used as the left hand side of a statement that can modify the array in question. The input line buffer and pool size are now also reallocated when needed, and the \type{texmf.cnf} settings \type{buf_size} and \type{pool_size} are silently ignored. \section{Sparse arrays} The \tex{mathcode}, \tex{delcode}, \tex{catcode}, \tex{sfcode}, \tex{lccode} and \tex{uccode} tables are now sparse arrays that are implemented in~C. They are no longer part of the \TEX\ \quote{equivalence table} and because each had 1.1 million entries with a few memory words each, this makes a major difference in memory usage. The \tex{catcode}, \tex{sfcode}, \tex{lccode} and \tex{uccode} assignments do not yet show up when using the etex tracing routines \tex{tracingassigns} and \tex{tracingrestores} (code simply not written yet). A side|-|effect of the current implementation is that \tex{global} is now more expensive in terms of processing than non|-|global assignments. See \type{mathcodes.c} and \type{textcodes.c} if you are interested in the details. Also, the glyph ids within a font are now managed by means of a sparse array and glyph ids can go up to index $2^{21}-1$. \section{Simple single-character csnames} Single|-|character commands are no longer treated specially in the internals, they are stored in the hash just like the multiletter csnames. 
The code that displays control sequences explicitly checks if the length is one when it has to decide whether or not to add a trailing space. Active characters are internally implemented as a special type of multi-letter control sequence that uses a prefix that is otherwise impossible to obtain.

\section{Compressed format}

The format is passed through zlib, allowing it to shrink to roughly half of the size it would have had in uncompressed form. This takes a few more CPU cycles but much less disk I/O, so it should still be faster.

\section{Binary file reading}

All of the internal code is changed in such a way that if one of the \type{read_xxx_file} callbacks is not set, then the file is read by a C function using basically the same convention as the callback: a single read into a buffer big enough to hold the entire file contents. While this uses more memory than the previous code (that mostly used \type{getc} calls), it can be quite a bit faster (depending on your I/O subsystem).

\chapter{Known bugs and limitations, TODO}

There used to be a list of bugs and planned features below here, but that did not work out too well. There are lists of open bugs and feature requests in the tracker at \hyphenatedurl{http://tracker.luatex.org}.

\stoptext