\documentclass[pagesize=auto]{scrartcl} \usepackage{fixltx2e} \usepackage{xspace} \usepackage{lmodern} \usepackage{mflogo} \usepackage[T1]{fontenc} \usepackage{textcomp} \usepackage{array} \usepackage{booktabs} \usepackage{microtype} \usepackage{hyperref} \newcommand*{\ArabTeX}{Arab\kern-0.12em\TeX\@\xspace} \newcommand*{\meta}[1]{\textlangle\textsl{#1}\textrangle} \newcommand*{\symb}[1]{\textsf{\textlangle#1\textrangle}} \pdfstringdefDisableCommands{% \def\ArabTeX{ArabTeX\xspace}% \def\meta#1{<#1>}% \def\symb#1{<#1>}% } \newenvironment{codetable}[1]{% \catcode`\"=12 \catcode`\'=12 \catcode`\`=12 \catcode`\_=12 \catcode`\^=12 \catcode`\~=12 \par \nopagebreak \medskip \noindent \quitvmode \tabular{@{}*{\numexpr#1-1\relax}{I@{\qquad}}I@{}}% }{% \endtabular \par \medskip } \newcolumntype{I}{>{\ttfamily}r@{\hspace{0.6em}}l} \title{The \ArabTeX package} \author{Klaus Lagally} \date{20.11.1993} \begin{document} \maketitle \tableofcontents \section{\ArabTeX Version~3 (20.11.1993)} The introduction below is slightly out of date but may be used as a first start. \section{\ArabTeX Version~2 (05.11.1992)} \subsection{What is \ArabTeX?} \ArabTeX is a package extending the capabilities of \TeX/\LaTeX\ to generate the arabic writing from an ASCII transliteration for texts in several languages using the arabic script. It consists of a \TeX\ macro package and an arabic font in several sizes, presently only available in the Naskhi style. \ArabTeX will run with Plain \TeX\ and also with \LaTeX; other additions to \TeX\ have not been tried. \ArabTeX is primarily intended for generating the arabic writing, but the scientific transcription can be also easily generated. For other languages using the arabic script limited support is available. \subsection{Installing \ArabTeX:} The installation procedure is system dependent. You have to install the ``\textsf{nash14}'' font with its ``\texttt{*.pk}'' and ``\texttt{*.tfm}'' files on the font search path of your \TeX\ system, and the ``\texttt{*.sty}'' files and ``\texttt{arabtex.tex}'' on the source search path of your system. Possibly you will have to rename the ``\texttt{*.pk}'' files according to local conventions, and as a last resort you can try to recreate the font from the ``\texttt{*.mf}'' \MF\ sources. Additional fonts if available are installed analogously. \subsection{Activating ArabTeX:} With Plain \TeX, load the ArabTeX macros by ``\verb+\input arabtex+''.\@ With \LaTeX, include the option ``\texttt{arabtex}'' in the document header. In both cases several additional files will be loaded automatically. \ArabTeX defines several additional commands as indicated below, and also a large number of internal commands which could lead to storage overflow in a small TeX implementation. All internal commands contain an ``at'' sign \texttt{<@>} in their names and thus should not interfere with any user defined commands (but possibly with \TeX\ extensions we do not know about). With Plain \TeX, the arabic font is only available at the normal 14~point size which ought to cooperate well with the ``\textsf{cm}'' fonts at 10~points. For other sizes, change the ``\verb+\magnification+'' or define additional font identifiers yourself. To change the default, inspect ``\texttt{arabtex.tex}'' and redefine the ``\verb+\pnash+'' command accordingly. With \LaTeX, the size changing commands will also operate on the arabic font. \subsection{Input to ArabTeX:} After activating \ArabTeX, your modified \TeX/\LaTeX\ system will recognize the following items: % \begin{itemize} \item normal \TeX/\LaTeX\ text and commands, \item short arabic quotations bracketed by \verb+<+ and \verb+>+; these must fit on one line of output, and you have to select one of the Arabic writing styles, e.g \verb+\setarab+, before using this feature. A quotation may also be started with \verb+\<+. \item longer arabic texts bracketed by \verb+\begin{arabtext}+ and \verb+\end{arabtext}+, called ``Arabic Environments'' in the sequel. \end{itemize} An Arabic Environment consists of one or several paragraphs separated by blank lines or \verb+\par+ commands. Every paragraph and every arabic quotation is a sequence of the following kinds of items, separated by blank spaces or newlines: % \begin{itemize} \item isolated (legal) special characters, interpreted as the corresponding arabic special character; \item ``numbers'', character sequences starting with a digit. A ``number'' will be translated in the normal writing sequence from left to right even if it contains letters and/or special characters; \item ``arabic quotes'', coded as two left quotes or two right quotes each; \item ``words'', character sequences starting with a letter or special character followed by a letter. The (coded) characters of a word will in the output be arranged from right to left. \item \TeX/\LaTeX\ control sequences WITHOUT parameters. These will be executed immediately. \item \ArabTeX control sequences with or without parameters. These will be executed immediately. \item a sequence of items enclosed in curly braces \verb+{+ and \verb+}+. The output from the constituents will be arranged from right to left and must fit on one output line. As far as \TeX\ is concerned, this is NO group. This feature may not be nested. \end{itemize} Output from all items will be arranged from right to left, lines will be broken as necessary. Inside an Arabic Environment, but NOT in an arabic quotation, you may also have: \begin{itemize} \item short mathematical insertions, bracketed by SINGLE \verb+$+ signs. They must fit on one output line and are processed as usual; \item short non-arabic text quotations, bracketed by \verb+<+ and \verb+>+. These must fit on one output line and introduce a new level of grouping, so if they contain any \TeX/\LaTeX\ assignments the effects of these will be local. \end{itemize} Control sequences in an Arabic Environment may be of the following kinds: \begin{itemize} \item \ArabTeX option changing commands. These may also be used outside an Arabic Environment and generally have a global effect; \item \verb+\\+ for a new line; \item \verb+\par+ (or a blank line) for a new paragraph, \verb+\noindent+ for a new paragraph without indentation (NOT in arabic quotations); \item \verb+\emphasize+ \meta{item} will put a bar over the next \meta{item}; \item \verb+\setnash+, \verb+\setnashbf+, \verb+\setnastaliq+ font selection commands, see below; \item size changing \LaTeX\ commands like \verb+\large+ etc., only if you use \LaTeX; \item most other \TeX/\LaTeX\ commands make no sense in an Arabic Environment. \item you MUST NOT nest another \LaTeX\ environment inside an Arabic Environment (except possibly display math which we did not test, and might work); \item if you really need to use a control sequence with parameters, define a new \TeX\ macro or enclose the whole construct in curly braces \verb+{+ and \verb+}+. \end{itemize} \subsection{Font selection:} For space economy, only the \textsf{Naskh} font is available by default. With \LaTeX, additional fonts can be loaded by the document style options ``\texttt{nashbf}'' and/or ``\texttt{nastaliq}'' (when available). Users of Plain \TeX\ can load and define suitable fonts themselves. The following font selection commands are available: \begin{itemize} \item \verb+\setnash+ (default) selects the \textsf{Naskh} font. \item \verb+\setnashbf+ selects a bold-face version of \textsf{Naskh}. \item \verb+\setnastaliq+ selects the \textsf{Nastaliq} font. \end{itemize} If a font is not available or has not been loaded, the corresponding command will select the default font. With \LaTeX, the size changing commands will also operate on the additional fonts. \subsection{Input coding:} The ASCII input notation for arabic text is modelled closely after the transliteration standards ISO/R\,233 and DIN\,31\,635. As these standards do not guarantee unique re-transliteration and are also not ASCII compatible, some modifications were necessary. These follow the general rules: % \begin{itemize} \item if the transliteration uses a single letter, code that letter; \item if the transliteration uses a letter with a diacritical mark, put a special character similar to the diacritical mark BEFORE the letter. \end{itemize} \subsubsection{Additional characters generally available:} \begin{codetable}{6} b & bah & d & dal & .s & ssad & f & fah & h & hah & ' & hamza \\ t & tah & _d & dhal & .d & ddad & q & qaf & w & waw & N & tanween \\ _t & thah & r & rah & .t & ttah & k & kaf & y & yah & Y & alif maqsoura \\ ^g & geem & z & zay & .z & tthah & l & lam & g & gaf & _A & alif maqsoura \\ .h & hhah & s & seen & ` & `ain & m & meem & p & pah & T & tah marbouta \\ _h & khah & ^s & sheen & .g & ghain & n & noon & v & vah & W & waw (see below) \end{codetable} \subsubsection{Standard arabic and persian characters:} \begin{codetable}{1} c & hhah with hamza \\ ^c & gim with three dots (below) \\ ,c & khah with three dots (above) \\ ^z & zay with three dots (above) \\ ~n & kaf with three dots (Ottoman) \\ ~l & law with a bow accent (Kurdish) \\ ~r & rah with two bows (Kurdish) \end{codetable} See also ``Urdu'' and ``Pashto'' below. \subsubsection{Additional coding rules:} \begin{itemize} \item For long vowels, use capital letters \verb+A+, \verb+I+, \verb+U+, or \verb+_a+, \verb+_i+, \verb+_u+. \item As the transliteration is ambiguous, use \verb+T+ for \symb{tah marbouta}, \verb+N+ for \symb{tanween}, \verb+Y+ or \verb+_A+ for \symb{alif maqsoura}. \item Short vowels \verb+a+, \verb+i+, \verb+u+ need not generally be written except in the following cases: \begin{itemize} \item at the beginning of a word where they generate \symb{alif}, \item adjacent to \symb{hamza} where they will influence the carrier, \item when the transcription is wanted, \item in \verb+\fullvocalize+ mode. \end{itemize} \item \symb{hamza} is denoted by a single RIGHT quote; its carrier will be determined from the context according to the rules for writing arabic words. If that is not wanted, ``quote'' it (see below). \item \symb{`ain} is a single LEFT quote, don't confuse it with \symb{hamza}! \item \symb{madda} is generated by a right quote (\symb{hamza}) before \verb+A+: \verb+'A+. \item The ``invisible letter'' \verb+|+ may be inserted in order to break unwanted ligatures and to influence the \symb{hamza} writing. It will not show in the arabic output or in the transcription. \item \symb{tashdid} is indicated by doubling the appropriate letter. \item The article is always written \verb+al-+ (with hyphen!). \item Hyphens \verb+-+ may be used freely, and generally do not change the writing, but will show up in the transcription. At the beginning and the end of a word they enforce the use of the connection form of the adjacent letter (if it exists), like e.g. in the date \verb+1400 h-+. \item A double hyphen \verb+--+ between two otherwise joining letters will break any ligature and will insert a horizontal stroke (\symb{tatweel}, \symb{kashida}) without appearing in the transcription. It may be used repeatedly. \end{itemize} \subsection{Quoting:} A double quote \verb+"+ will modify the meaning of the following character as follows: % \begin{itemize} \item if a short vowel follows, the appropriate diacritical mark \symb{fatha}, \symb{kasra}, \symb{damma} will be put on the preceding character even if the vocalization is off otherwise. If \verb+N+ follows the short vowel, the appropriate form of \symb{tanween} will be generated instead. At the beginning of a word, \symb{alif} is assumed as the first character. If the previous word ended with a vowel, \symb{wasla} is generated instead of the vowel indicator. \item if the following character is a single right quote, a \symb{hamza} mark will be put on the preceding character even if in conflict with the \symb{hamza} rules. \item if the following character is the ``invisible letter'' \verb+|+, the connection between the adjacent letters will be broken and a small space inserted. \item otherwise: a \symb{sukun} will be put on the preceding character. The following character will be processed again. \end{itemize} % The double quote will not show up in the transcription. \subsection{Ligatures:} There is no way to explicitly indicate ligatures as a large number of them are generated automatically. Any unwanted ligature can be suppressed by interposing the invisible letter \verb+|+ between the two letters otherwise combined into a ligature. After ``\verb+\ligsfalse+'' ligatures in the middle of a word will not normally be produced; for some texts this looks better. You can return to the normal strategy by ``\verb+\ligstrue+''. \subsection{Vocalization:} There are three modes of rendering short vowels: % \begin{labeling}{\texttt{\string\fullvocalize}:} \item[\texttt{\string\fullvocalize}:] \begin{itemize} \item every short vowel will generate the corresponding diacritic mark \symb{fatha}, \symb{kasra}, \symb{damma}. \item If \verb+N+ follows a short vowel, the corresponding form of \symb{tanween} is generated instead. \item \verb+_a+ will produce a \symb{qur'an alif} accent instead of an explicit \symb{alif} character which is coded \verb+A+. \item if a long vowel follows a consonant, the corresponding short vowel is implied. The long vowel itself carries no diacritical mark. \item if no vowel is given after a consonant, \symb{sukun} will be generated except if a double ``sun letter'' follows \symb{lam}. \item \symb{alif} at the beginning of a word carries \symb{wasla} instead of the vowel indicator if the preceding word ended with a vowel. \end{itemize} \item[\texttt{\string\vocalize}:] as above, but \symb{sukun} and \symb{wasla} will not be generated except if explicitly indicated by ``quoting''. \item[\texttt{\string\novocalize}:] no diacritics will be generated except if explicitly asked for by ``quoting''. \end{labeling} % In all modes, a double consonant will generate \symb{tashdid}, and \verb+'A+ always generates \symb{madda} on \symb{alif}. After \verb+aN+ the silent \symb{alif} character is generated if necessary. The silent \symb{alif} may also be explicitly indicated by \verb+aNa+ or \verb+aNA+, or coded literally as \verb+A+ in \verb+\novocalize+ mode. If a silent \symb{alif maqsoura} is wanted instead, write \verb+aNY+, \verb+aN_A+, \verb+Y+ or \verb+_A+. A silent \symb{alif} after \symb{waw} is indicated by \verb+Ua+, \verb+UA+ or \verb+Wa+, \verb+WA+ (with a capital \verb+W+!). \subsection{Transcription:} In addition to the arabic writing, the standard scientific transcription may also be obtained from a fully vocalized input text. This is indicated by ``\verb+\transtrue+'' and may be switched off again by ``\verb+\transfalse+''. If ONLY the transcription is wanted, you can deactivate the arabic writing by ``\verb+\arabfalse+''; it can be reactivated by ``\verb+\arabtrue+''. If both modes are active their output will be interleaved line by line. The transcription mode assumes that the input text is in the Arabic language and has been coded according to the rules given above. For words from other languages the transcription might be in error. For Arabic text, the following special cases are handled: % \begin{itemize} \item after the article, a double consonant will be assimilated; \item an initial vowel will be omitted if the preceding word ended with a vowel. If that is not wanted start with \symb{hamza}. \item a silent \symb{alif} or \symb{alif maqsoura} after \verb+N+ (\symb{tanwin}) and \verb+U+ is omitted in the transcription. The same happens after \symb{waw} if it is written \verb+W+. \end{itemize} For space economy, the transcription module is NOT loaded by default. If you want to use it, add the style option ``\texttt{atrans}'' with \LaTeX; and with Plain \TeX, say ``\verb+\input atrans.sty+''. \subsection{Support for other languages:} \ArabTeX is primarily intended for writing texts in classical and modern Arabic, but it also provides limited support for several other languages that are customarily written in the arabic alphabet. The vocalization and the transcription cannot generally be expected to be correct, but might work by accident. In order to switch to the conventions for one of these languages, say ``\verb+\setfarsi+'', ``\verb+\seturdu+'', ``\verb+\setpashto+'', ``\verb+\setmaghribi+''; ``\verb+\setarab+'' is the default and can also be used to switch back to the arabic conventions. \subsubsection{Farsi, Dari:} All characters needed for writing Farsi are available by default. The short vowels \verb+e+ and \verb+o+ are mapped to \symb{i} and \symb{u}, the long vowels \verb+E+ and \verb+O+ to \symb{I} and \symb{U}. The \symb{izafet} connection may be written literally, which may look awkward in the case of \verb+h"'+, or always as \verb+-i+ (with hyphen); then the correct spelling will be determined from the context. Likewise the \symb{yah-i-wahdat} can always be written \verb+-I+. The final \symb{yah} carries no dots. Farsi uses the Nasta`liq font if available. \subsubsection{Ottoman:} see Farsi. \subsubsection{Kurdish:} see Farsi. \subsubsection{Urdu:} The additional characters in Urdu are coded as follows: % \begin{codetable}{1} h & always denotes the ``two-eyed \symb{hah}'' \\ ,h & the ``wavy'' \symb{hah} letter \\ ,t & \symb{tah} with a small \symb{ttah} accent \\ ,d & \symb{dal} with a small \symb{ttah} accent \\ ,r & \symb{rah} with a small \symb{ttah} accent \\ .n & \symb{noon} without a dot (modifies a preceding vowel) \\ E & \symb{yah bari} in the final position, otherwise mapped to \symb{yah} \\ O & mapped to \symb{U} \end{codetable} The short vowels \verb+e+ and \verb+o+ are mapped to \symb{a} and \symb{u}. \emph{Note:} Some of the given codings also occur in Pashto but with a different meaning, see below. Urdu uses the Nasta`liq font if available. \subsubsection{Pashto:} The additional characters for Pashto are coded as follows: % \begin{codetable}{1} ,t & \symb{tah} with a small loop \\ ,d & \symb{dal} with a small loop \\ ,r & \symb{rah} with a small loop \\ .n & \symb{noon} with a small loop \\ g & \symb{gaf} is written with a small loop instead of a bar \\ ,z & \symb{rah} with one dot above and one below \\ ,s & \symb{sin} with one dot above and one below \\ e & like \symb{a}, with a \symb{zwarakay} mark if vocalized \\ e'i & \symb{yah} with \symb{hamza} \\ E & \symb{yah} with two dots below aligned vertically \\ Ey & \symb{yah} written with a final stroke \\ o & mapped to \symb{u} \\ O & mapped to \symb{U} \\ w"' & \symb{hamza} on \symb{waw} \\ h"' & \symb{hamza} on \symb{hah} \end{codetable} The \symb{qur'an alif} accent is not available for Pashto. The rules for \symb{izafet} and \symb{yah-i-wahdat} apply. Note: Some of the given codings also occur in Urdu but with a different meaning, see above. For writing some words in the Urdu style, write the command \verb+\seturdu+ and afterwards switch back to the Pashto conventions by \verb+\setpashto+. \subsubsection{Maghribi:} This is just a different writing convention. \symb{fah} is written with one dot below the letter, \symb{qaf} with one dot above the normal letter form of \symb{fah}. The three dots of \symb{vah} are put below the letter. Otherwise like Arabic. \subsection{Miscellaneous features:} As \ArabTeX is slow, it will produce some terminal output while running to indicate it is still alive. If that is not wanted, say ``\verb+\quiet+'' or ``\verb*+\tracingarab = 0 +'' (outside an Arabic Environment). ``\verb*+\tracingarab = 1 +'' will report arabic paragraphs, a value of 2: arabic lines and insertions, a value of 3 or more: individual items. Whether \symb{yah} in the final position carries dots or not depends on the chosen language convention. You can override this by ``\verb+\yahdots+'' and ``\verb+\yahnodots+''. To reproduce erroneous or archaic texts exactly as they are, the following additional codings are available: % \begin{codetable}{1} .k & \symb{kaf} in final position without a diacritical mark \\ .f & \symb{fah} without a dot \\ .b & \symb{bah} without a dot \\ .n & \symb{noon} without a dot (not available for Pashto) \\ Y & \symb{alif maqsoura}, \symb{yah} without dots in all positions. \\ \end{codetable} \subsection{How to move from Version 1 to Version 2} Version 2 is not fully compatible with Version 1; however, moving to the new version should cause little problems, and is recommended as version 1 is no longer supported. Apart from some extensions, most changes were introduced in order to better conform to the transliteration standards, and to have less compatibility problems with \TeX\ and \LaTeX. Further versions are expected to be upward compatible if no grave bugs turn up. The main differences between versions 1 and 2 are: % \begin{itemize} \item The font size has increased, so the document layout may change. The old font can no more be used. \item Some arabic characters are now coded differently: \symb{`ain} is denoted by a left quote, and \verb+c+, \verb+^z+, \verb+^t+, and \verb+.n+ denote different characters from what they did before. This was changed in order to better conform to the standard transliteration. \item There are a lot more ligatures than before. This normally need not concern the user. \item \verb+\vocalize+ will no more generate \symb{sukun} and \symb{wasla} except if explicitly indicated by quoting. See \verb+\fullvocalize+. \item Arabic Environments are always bracketed by the new control sequences\\ \verb+\begin{arabtext}+ and \verb+\end{arabtext}+ even if only the transcription is wanted. \item Short arabic quotations are now bracketed by \verb+\<+ and \verb+>+ so \verb+<+ has its standard \TeX\ meaning. \end{itemize} We recommend converting existent input files to the new notation. If that is impractical in special cases, the \LaTeX\ option ``\texttt{oldarabtex}'' and/or the command ``\verb+\oldarabtex+'' will switch back to most of the old conventions (and problems). This shortcut will probably go away in some future version. \subsection{Acknowledgments:} The development of \ArabTeX would not have been possible without the assistance of many people. Apart from my local team, helpful advice came among others from Wolfdietrich Fischer, Ahmed El-Hadi, Abdelsalam Heddaya, Iqbal Khan, Tom Koornwinder, Eberhard Krueger, Asif Lakehsar, Jan Lodder, Richard Lorch, Eberhard Mattes, and Bernd Raichle. I also have to thank the many people who sent bug reports and comments. \subsection{Please send bug reports, suggestions and inquiries to the author:} \noindent Prof. Klaus Lagally\\ Institut fuer Informatik\\ Universitaet Stuttgart\\ Breitwiesenstrasse 20--22\\ D-70565 Stuttgart \\ GERMANY \medskip \noindent \texttt{lagally@informatik.uni-stuttgart.de} \bigskip \noindent \textbf{Copyright \textcopyright\ 1990--1993, Klaus Lagally} \end{document}