\documentclass[a4paper]{article}

\usepackage{array}

\makeatletter
\@ifundefined{l@nohyphenation}{\newlanguage\l@nohyphenation}{}
\DeclareRobustCommand\meta[1]{%
   \ensuremath{\langle}%
   \sbox{\z@}{%
      \setlanguage\l@nohyphenation
      \normalfont\itshape #1\/%
      \setlanguage\language
   }%
   \unhbox\z@
   \ensuremath{\rangle}%
}
\makeatother
\DeclareRobustCommand\marg[1]{%
   \texttt{\char`\{}\meta{#1}\texttt{\char`\}}%
}
\DeclareRobustCommand\cs[1]{\texttt{\char`\\#1}}

\makeatletter
\DeclareTextFontCommand\textsmaller{%
   \fontsize{\scaledsize{\f@size}}{\f@baselineskip}\selectfont
}
\newcommand\scaledsize[1]{%
   \ifdim #1\p@>6\p@
      \ifdim #1\p@>7\p@
         \ifdim #1\p@>8\p@
            \ifdim #1\p@>9\p@
               \ifdim #1\p@>10\p@
                  \ifdim #1\p@>11\p@
                     \ifdim #1\p@>12\p@
                        \ifdim #1\p@>14\p@ 14%
                        \else 12\fi
                     \else 11\fi
                  \else 10\fi
               \else 9\fi
            \else 8\fi
         \else 7\fi
      \else 6\fi
   \else 5\fi
}
\makeatother
\DeclareRobustCommand\ETX{\textsmaller{ETX}}
\DeclareRobustCommand\PDF{\textsmaller{PDF}}

% From tugboat.cls
\def\thinskip{\hskip 0.16667em\relax}
\def\endash{--}
\def\emdash{\endash-}
\makeatletter
\def\d@sh#1#2{\unskip#1\thinskip#2\thinskip\ignorespaces}
\def\dash{\d@sh\nobreak\endash}
\def\Dash{\d@sh\nobreak\emdash}
\def\ldash{\d@sh\empty{\hbox{\endash}\nobreak}}
\def\rdash{\d@sh\nobreak\endash}
\def\Ldash{\d@sh\empty{\hbox{\emdash}\nobreak}}
\def\Rdash{\d@sh\nobreak\emdash}
\newcommand{\La}%
   {L\kern-.36em
        {\setbox0\hbox{T}%
         \vbox to\ht0{\hbox{$\m@th$%
                            \csname S@\f@size\endcsname
                            \fontsize\sf@size\z@
                            \math@fontsfalse\selectfont
                            A}%
                      \vss}%
        }}
\IfFileExists{mflogo.sty}%
  {\RequirePackage{mflogo}}%
 {\@latex@warning@no@line
     {Package mflogo.sty not available --\MessageBreak
       Proceeding to emulate mflogo.sty}
   \DeclareRobustCommand\logofamily{%
     \not@math@alphabet\logofamily\relax
     \fontencoding{U}\fontfamily{logo}\selectfont}
   \DeclareTextFontCommand{\textlogo}{\logofamily}
   \def\MF{\textlogo{META}\-\textlogo{FONT}\@}
   \def\MP{\textlogo{META}\-\textlogo{POST}\@}
   \DeclareFontFamily{U}{logo}{}
   \DeclareFontShape{U}{logo}{m}{n}{%
     <8><9>gen*logo%
     <10><10.95><12><14.4><17.28><20.74><24.88>logo10%
   }{}
   \DeclareFontShape{U}{logo}{m}{sl}{%
     <8><9>gen*logosl%
     <10><10.95><12><14.4><17.28><20.74><24.88>logosl10%
   }{}
   \DeclareFontShape{U}{logo}{m}{it}{%
     <->ssub*logo/m/sl%
   }{}%
  }
\makeatother
\def\AllTeX{(\La\kern-.075em)\kern-.075em\TeX}

\usepackage{shortvrb}
\MakeShortVerb{\|}

\newcommand{\TeXOmega}{Omega}
\DeclareRobustCommand\eTeX{\ensuremath{\varepsilon}-\kern-.125em\TeX}
\DeclareRobustCommand\package[1]{\textsf{#1}}

\providecommand*{\href}[2]{#2}
\newcommand*{\ctanref}[2]{\href{ftp://ftp.ctan.org/#1}{#2}}

\title{Writing \ETX\ format font encoding specifications}
\author{Lars Hellstr\"om}
\date{2003/07/09}

\begin{document}
\maketitle

\begin{abstract}
  This paper explains how one writes formal specifications of font 
  encodings for \LaTeX\ and suggests a ratification procedure for such 
  specifications.
\end{abstract}


\tableofcontents

\vspace{0mm plus 35mm}
\pagebreak[2]


\section{Introduction}

One of the many difficult problems any creator of a new typesetting 
system encounters is that of \emph{font construction}\Dash to create 
fonts that provide all information that the typesetting system needs 
to do its job. From the early history of \TeX, we learn that this 
problem is so significant that it motivated the creation of \TeX's 
companion and equal \MF, whose implementation proved to be an even 
greater scientific challenge than \TeX\ was. It is also a tell-tale 
sign that the \texttt{fonts} subtree of the te\TeX\ distribution is 
about three times as large as the \texttt{tex} subtree: fonts are 
important, and not at all trivial to generate.

The most respected and celebrated part of font construction is 
\emph{font design}\Ldash the creation from practically nothing of new 
letter (and symbol) shapes, in pursuit of an artistic 
vision\Dash but it is also something very few people have the time 
and skill to carry through. More common is the task of \emph{font 
installation}, where one has to solve the very concrete problem of how 
to set up an existing font so that it can be used with \AllTeX. The 
subproblems in this domain range from the very technical\Ldash how 
to make different pieces of software ``talk'' to each other, for 
example making information in file format~$A$ available to 
program~$B$\Dash to the almost artistic\Ldash finding values for 
glyph metrics and kerns that will make them look good in text\Dash 
but these extremes tend to be clearly defined even if solving them 
can be hard, so they are not what will be considered here. Rather, 
this paper is about a class of more subtle problems that have to do 
with how a font is organised.

The technical name for such a ``font organisation'' is a \emph{font 
encoding}. In some contexts, font encodings are assumed to be mere 
mappings from a set of ``slots'' to a set of glyph identifiers, but 
in \TeX\ the concept entails much more; the various aspects are 
detailed in subsequent sections. For the moment, it is sufficient to 
observe that the role that a font encoding plays in a typesetting 
system is that of a standard: it describes what an author can expect 
from a font, so that a document or macro package can be written that 
works with a large class of fonts rather than just for one font family. 
The world of \AllTeX\ would be very different if papers published in 
journal $X$ that is printed in commercial font $Y$ could not use 
essentially the same sources as the author prepared for typesetting 
in the free font $Z$. Fine-tuning of a document (overfull lines, bad 
page breaks, etc.\@) depends on the exact font used, but it is a great 
convenience that one can typeset a well-coded body of text under a 
rather wide range of values for the layout parameters (of which the 
main font family is one) and still expect the result to look decent, often even 
good. Had font encodings not been standardised, the results might not 
even have been readable.

When font encodings are viewed as standards, the historical state of 
most \AllTeX\ font encodings becomes rather embarrassing, as they 
lack something as fundamental as proper specifications! The typical 
origin of a font encoding has been that some\-one creates a font that 
behaves noticeably different from other fonts, macro packages are then 
created to support this new font, and in time other people create other 
fonts that work with the same macros. At the end of this story the new 
encoding exists, but it is not clear who created it, and there is 
probably no document that describes all aspects of the encoding. Later 
contributors have typically had to rely on a combination of imitation 
of previous works, folklore, and reverse engineering of existing 
software when trying to figure out what they need to provide, but the 
results are not always verifiable. Furthermore the errors in this area 
are usually silent\Ldash the classical error being that a `\textdollar' 
was substituted for a `\textsterling' (or vice versa)\Dash which means 
they can only be discovered through careful proofreading, and then 
only \emph{provided} that there exists a document which exercises 
all aspects of the font encoding. Since font encodings interact with 
hyphenation, exhaustive font verification through proofreading is 
probably beyond the capabilities of any living \TeX pert on purely 
linguistic grounds.

Proper specifications of font encodings make the task of font 
installation\Ldash and to some extent also the task of font design, 
as it too is subject to the technicalities of font encodings\Dash 
much simpler, as there is then a document that authoritatively gives 
all details of a font encoding. This paper even goes one step further, 
and proposes (i)~a standard format for formal specifications of 
\AllTeX\ font encodings and (ii)~a process through which such 
specifications can be ratified as \emph{the} specification of a 
particular encoding. My hope is that future \AllTeX\ font encodings 
will have proper specifications from the start, as this will greatly 
simplify making more fonts available in these encodings, and perhaps 
also make font designers aware of the subtler points of \AllTeX\ font 
design, as many details have been poorly documented.

The proposed file format for encoding specifications is a development 
of the \textsf{fontinst}~\cite{fontinst-pre} \ETX\ format. One reason 
for this choice was that it is an established format; many of those 
who are making fonts already use it, even if for a slightly 
different purpose. Another major reason is that an \ETX\ file is both 
a \LaTeX\ document and a processable data file; this is the same kind 
of bilinguality that has made the \texttt{.dtx} format so useful. 
Finally the \ETX\ format makes it easy to create experimental font 
installations when a new encoding is being designed; \textsf{fontinst} 
can directly read the file, but the file can also be automatically 
converted to a PostScript encoding vector if that approach seems more 
convenient.
On the other hand, there are some features\Ldash most notably the 
prominent role of the glyph names\Dash of the \ETX\ format that would 
probably have been done differently in a file format that was built 
from scratch, but this is necessary for several of the advantages 
listed above.


% \paragraph*{Why should one make formal specifications?}
% Because the informal specifications that we have today 
% are incomplete and hard to use. E.g.\ the \LaTeX\ 
% \meta{enc}\texttt{enc.def} files only say something about the 
% characters that are accessed via commands, and even for those you 
% really have to do reverse engineering to figure out what the 
% encoding contains. To figure out what the remaining characters 
% should do you have to compile what the various user manuals claim 
% to work and then work backwards from that, but I don't think the 
% general problem of which character tokens are allowed in input is 
% thoroughly treated anywhere. On top of that, \LaTeX\ itself 
% contributes some character tokens when the document is being 
% typeset.\footnote{This is basically the ``a \texttt{T}$*$ encoding 
% must contain the characters \ldots'' problem that was the reason 
% that the \texttt{T2} encoding had to be split up.}
% 
% On the other side of things there are the files which tell e.g.
% \textsf{fontinst} or \textsf{AFMtoTFM} what the target font 
% encoding is. These are basically recipes which are known (?\@) to 
% produce valid results, and they do usually provide more 
% information about the encoding than the sources listed above, but 
% they don't give much information about where the recipe can be 
% modified.
    
% \paragraph*{Why use the \ETX\ format?}


\section{Points to keep in mind}

\subsection{Characters, glyphs, and slots}

One fundamental difference one must understand is that between 
characters and glyphs. A \emph{character} is a semantic entity---it 
carries some meaning, even if you usually have to combine several 
characters to make up even one word---whereas a \emph{glyph} simply 
is a piece of graphics. In printed text, glyphs are used to represent 
characters and the first step of reading is to determine which 
character(s) a given glyph is representing.\footnote{Some \PDF\ viewers 
also try to accomplish this, but in general they need extra 
information to do it right. The generic solution provided is to embed 
a \emph{ToUnicode CMap}\Ldash which is precisely a map from slots to 
characters\Rdash in the \PDF\ font object.}

In the output, \TeX\ neither deals with characters nor glyphs, really 
(although many of its messages speak of characters), but with 
\emph{slots}, which essentially are numbered positions in a font. To 
\TeX, a slot is simply something which can have certain metric 
properties (width, height, depth, etc.\@) but to the driver which 
actually does the printing the slot also specifies a glyph. The same 
slot in two different fonts can correspond to two quite different 
characters.
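
As a small illustration, consider the classical Computer Modern case in 
the \texttt{OT1} encoding, where slot~36 holds a dollar sign in the 
upright font but a sterling sign in the text italic. The following 
fragment is only a sketch for experimentation; it assumes that the 
standard Computer Modern fonts are the ones installed for the 
\texttt{cmr} family:
\begin{quote}\begin{verbatim}
{\fontencoding{OT1}\fontfamily{cmr}\fontseries{m}%
 \fontshape{n}\selectfont  \char36 \quad
 \fontshape{it}\selectfont \char36 }
\end{verbatim}\end{quote}
Both |\char36| commands reference the same slot; only the font, and 
hence the glyph printed, differs.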

For completeness it should also be mentioned that the \emph{input} of 
\TeX\ is a stream of semantic entities and thus \TeX\ is dealing with 
characters on that side, but the input is not the subject of this 
paper.


\subsection{Ligatures}

In typography, a \emph{ligature} is a glyph which has been formed by 
joining glyphs that represent two or more characters; this joining can 
involve quite a lot of deformation of the original shapes. Examples 
of ligatures are the `fi' ligature (from `f' and `i'), the `\AE' 
ligature (from `A' and `E'), and the `\textit{\&}' character (from `E' 
and `t'), the latter two of which have evolved to become characters of 
their own. For those ligatures (such as `fi') that have not evolved to 
characters, \TeX\ has a mechanism for forming the ligature out of the 
characters it is composed from, under the guidance of ligature\slash 
kerning programs found in the font.

More technically, what happens is really that if the |\char| (or 
equivalent) for one slot is immediately followed by the |\char| (or 
equivalent) for another (or the same) slot and there is a ligaturing 
instruction in the \texttt{\small LIGKERN} table of the current font 
which applies to this slot pair then this ligaturing instruction is 
executed. This usually replaces the two slots in the pair with a 
single new slot specified by the ligaturing instruction (it could 
also keep one or both of the original slots, but that is less common). 
\TeX\ has no idea about whether these replacements change the meaning 
of anything, but it assumes that they do not, and it is up to the 
font designer to ensure that this is the case.

Apart from forming ligatures in text, the ligaturing mechanism of 
\TeX\ is traditionally also employed for another task which is much 
more problematic. Ligatures are also used to produce certain 
characters which are not part of visible ASCII---the most common are 
the endash (typed as |--|) and the emdash (typed as |---|). This is a 
problem because it violates \TeX's assumption that the meaning is 
unchanged; the classical problem with this appears in the \texttt{OT2} 
encoding, where the Unicode character \texttt{U+0446} 
(\textsc{cyrillic small letter tse}) could be typed as |ts|, whilst 
the |t| and |s| by themselves produced Unicode characters 
\texttt{U+0442} (\textsc{cyrillic small letter te}) and \texttt{U+0441} 
(\textsc{cyrillic small letter es}) respectively. \TeX's hyphenation 
mechanism can however decompose ligatures, so it sometimes happened 
that the \textsc{tse} was hyphenated as \textsc{te}-\textsc{es}, 
which is quite different from what was intended. Since this is such 
an obvious disadvantage, the use of ligatures for forming non-English 
letters quickly disappeared after 8-bit input encodings became 
available. The practice still remains in use for punctuation, however, 
and the font designer must be aware of this. For many font encodings 
there is a set of ligatures which must be present and replace two or 
more characters by a single, different character. These ligatures are 
called \emph{mandatory ligatures} in this paper.

The use of mandatory ligatures in new font encodings is strongly 
discouraged, for a number of reasons. The main problem is that they 
create unhealthy dependencies between input and output encoding, 
whereas these should ideally be totally independent. Using ligatures 
in this way complicates the internal representation of text, and it 
also makes it much harder to typeset text where those ligatures are not 
wanted (such as verbatim text). Furthermore it creates problems with 
kerning, since the ``ligature'' has not yet been formed when a kern 
to the left of it is inserted. Finally, a much better solution (when 
it is available) is to use an \TeXOmega\ translation process 
(see~\cite[Sec.~8--11]{Omega-doc}), since that \emph{is} independent 
of the font, different translations can be combined, and they can 
easily handle even ``abbreviations'' much more complicated than those 
ligatures can deal with.


\subsection{Output stages}

On its way out of \LaTeX\ towards the printed text, a character passes 
through a number of stages. The following five seem to cover what is 
relevant for the present discussion:
\begin{enumerate}
  \item \emph{\LaTeX\ Internal Character Representation} (LICR); 
    see~\cite{LaTeXCompanion}, Section~7.11 for a full description. 
    At this point the character is a character token (e.g.~|a|), 
    a text command (e.g.~|\ss|), or a combination (e.g.~|\H{o}|).
  \item \emph{Horizontal material;} this is what the character is 
    en route from \TeX's mouth to its stomach. For most characters 
    this is equivalent to a single |\char| command (e.g.\ |a| is 
    equivalent to |\char|\,|97|), but some require more than one, some 
    are combined using the |\accent| and |\char| commands, some 
    involve rules and\slash or kerns, and some are built using boxes 
    that arbitrarily combine the above elements.
  \item \emph{DVI commands;} these are the DVI file commands that 
    produce the printed representation of the character.
  \item \emph{Printed text;} this is the graphical representation of 
    the character, e.g.\ as ink on paper or as a pattern on a computer 
    screen. Here the text consists of glyphs.
  \item \emph{Interpreted text;} this is essentially printed text 
    modulo equivalence of interpretation, hence the text doesn't really 
    reach this stage until someone reads it. Here the text consists of 
    characters.
\end{enumerate}

In theory there is a universal mapping from LICR to interpreted text, 
but various technical restrictions make it impossible to simultaneously 
support the entire mapping. A \LaTeX\ encoding selects a restriction 
of this mapping to a limited set which will be ``well supported'' 
(meaning kerning and such between characters in the set works), whereas 
elements outside this set at best can be supported through temporary 
encoding changes. The encoding also specifies a decomposition of the 
mapping into one part which maps LICR to horizontal material and one 
part which maps horizontal material to interpreted text. The first 
part is realized by the text command definitions usually found in the 
\meta{enc}\texttt{enc.def} file for the encoding. The second part is 
the font encoding, the specification of which is the topic of this 
paper. It is also worth noticing that an actual font is a mapping of 
horizontal material to printed text.
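
To make the first part of this decomposition concrete, the 
declarations below show how the LICR object |\"a| is mapped to 
horizontal material in two encodings. They are reproduced from memory 
of the standard \texttt{ot1enc.def} and \texttt{t1enc.def} files, so 
they should be checked against the installed files before being relied 
upon:
\begin{quote}\begin{verbatim}
% OT1: \"a becomes \accent127 a
\DeclareTextAccent{\"}{OT1}{127}
% T1: \"a becomes \char228; other base letters use \accent4
\DeclareTextAccent{\"}{T1}{4}
\DeclareTextComposite{\"}{T1}{a}{228}
\end{verbatim}\end{quote}
Thus the same LICR object yields an |\accent| construction in one 
encoding and a single |\char| in the other; only the latter is 
equivalent to a sequence of |\char| commands.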

An alternative decomposition of the mapping from LICR to interpreted 
text would be at the DVI command level, but even though this 
decomposition is realized in most \TeX\ implementations, it has very 
little relevance for the discussion of encodings. The main reason for 
this is that it depends not only on the encoding of a font, but 
also on its metrics. Furthermore it is worth noticing that in e.g.\ 
pdf\TeX\ there need not be a DVI command level.


\subsection{Hyphenation}

There are strong connections between font encoding and hyphenation because 
\TeX's hyphenation mechanism operates on horizontal material; more 
precisely the hyphenation mechanism only works on pieces of horizontal 
material that are equivalent to sequences of |\char| commands. This 
implies that hyphenation patterns, as selected via the |\language| 
parameter, are not only for a specific language, they are also for a 
specific font encoding.

The hyphenation mechanism uses the |\lccode| values to distinguish 
between three types of slots: lower case letters (|\lccode|\(\,n = 
n\)), upper case letters (|\lccode|\(\,n \notin \{0,n\}\)), and 
non-letters (|\lccode|\(\,n = 0\)); only the first two types can be 
part of a hyphenatable word and only lower case letters are needed 
in the hyphenation patterns. This does however place severe 
restrictions on how letters can be placed in a text font because 
\TeX\ uses the same |\lccode| values for all text in a paragraph and 
therefore these values cannot be changed whenever the encoding changes. 
In \LaTeX\ the |\lccode| table is not allowed to change at all and 
consequently all text font encodings must work using the standard set 
of |\lccode| values.
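
The classification can be inspected directly. The following lines are 
a sketch which assumes the default |\lccode| table of \LaTeX; the 
values in the comments are those of that standard table and would 
differ if the table had been modified:
\begin{quote}\begin{verbatim}
\typeout{\the\lccode`\a} % 97, same as the slot: lower case
\typeout{\the\lccode`\A} % 97, not 0 and not 65: upper case
\typeout{\the\lccode`\1} % 0: non-letter
\end{verbatim}\end{quote}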

In \eTeX\ each set of hyphenation patterns has its own set of 
|\lccode| values for hyphenation, so the problem isn't as severe 
there. The hyphenation mechanism of \TeXOmega\ should become 
completely independent of the font encoding, although the last time I 
checked it was still operating on material encoded according to 
a font encoding.


\subsection{Production and specification \ETX\ files}

Finally, it is worth pointing out the difference between an \ETX\ file 
created for the specification of a font encoding and one created for 
being used in actually producing fonts with this encoding. They are 
usually not the same. Specification \ETX s certainly may be of direct 
use in the production of fonts---especially experimental fonts 
produced as part of the work on a new encoding---but they are usually 
not ideal for the purpose. In particular there is often a need to 
switch between alternative names for a glyph to accommodate what is 
actually in the fonts, but such trickeries are undesirable 
complications in a specification. On the other hand a production 
\ETX\ file has little need for verbose comments, whereas they are rather 
an advantage in a specification \ETX\ file.

Therefore one shouldn't be surprised if there are two \ETX\ files for a 
specific encoding: one which is a specification version and one which 
is a production version. If both might need to be in the same 
directory then one should, as a rule of thumb, include a 
`\texttt{spec}' in the name of the specification version.


\section{Font encoding specifications}
\label{Sec:FontEncSpec}

\subsection{Basic principles}

Most features of the font encoding are categorized as either 
\emph{mandatory} or \emph{ordinary}. The mandatory features are what 
macros may rely on, whereas the ordinary simply are something which 
fonts with this encoding normally provide. Font designers may choose 
to provide other features than the ordinary, but are recommended to 
provide the ordinary features to the extent that the available 
resources permit.

Many internal references in the specification are in the form of 
\emph{glyph names} and the choice of these is a slightly tricky 
matter. From the point of formal specification, the choices can be 
completely arbitrary, but from the point of practical usefulness they 
most likely are not. One of the main advantages of the \ETX\ format 
for specifications is that such specifications can also be used to 
make experimental implementations, but this requires that the glyph 
names in the specification are the same as those used in the fonts 
from which the experimental implementation should be built. Yet 
another aspect is that the glyph names are best chosen to be the ones 
one can expect to find in actual fonts, as that will make things 
easier for other people that want to make non-experimental 
implementations later. For this last purpose, a good reference is 
Adobe's technical note on Unicode and glyph names~\cite{unicodesign}. 
For most common glyphs, \cite{unicodesign} ends up recommending that 
one should follow the Adobe glyph list~\cite{AGL}, which however has 
the peculiar trait of recommending names of the form 
\texttt{afii}\textit{ddddd} (rather than the Unicode-based alternative 
\texttt{uni}\textit{xxxx}) for most non-Latin glyphs. This is somewhat 
put in perspective by~\cite{ATN5013}.


\subsection{Slot assignments}

The purpose of the slot assignments is to specify for each slot which 
character or characters it is mapped to. That one slot is mapped to 
many characters is an unfortunate, but not very uncommon, reality in 
many encodings, as limitations in font size have often encouraged 
identifications of two characters which are almost the same. It should 
be avoided in new encodings.

Slot assignments are done using the |\nextslot| command and a 
|\setslot| \dots\ |\endsetslot| construction as follows:
\begin{quote}
  |\nextslot|\marg{slot number}\\*
  |\setslot|\marg{glyph name}\\*
  \mbox{\quad}\meta{slot commands}\\*
  |\endsetslot|
\end{quote}
A typical example of this is
\begin{quote}\begin{verbatim}
\nextslot{65}
\setslot{A}
  \Unicode{0041}{LATIN CAPITAL LETTER A}
\endsetslot
\end{verbatim}\end{quote}
which gets typeset as
\begin{quote}
  \textbf{Slot 65 `\texttt{A}'}\\*
  Unicode character \texttt{U+0041}, \textsc{latin capital letter a}.
\end{quote}

The |\nextslot| command does not typeset anything; it simply stores 
the slot number in a counter, for later use by |\setslot|. The 
|\endsetslot| command increments this counter by one. Hence the 
|\nextslot| command is unnecessary between |\setslot|s for consecutive 
slots. Besides |\nextslot|, there is also a command |\skipslots| which 
increments the slot number counter by a specified amount. The 
argument of both |\nextslot| and |\skipslots| can be arbitrary 
\package{fontinst} integer expressions (see~\cite{fontinst-man}). All 
\TeX\ \meta{number}s that survive full expansion are valid 
\package{fontinst} integer expressions, but for example |`\~| isn't, 
as |\~| is a macro which will break before the expression is typeset. 
These cases can however be fixed by preceding the \TeX\ \meta{number} 
by |\number|, as |\number`\~| survives full expansion by expanding to 
|126|.
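
The interplay of these commands may be easier to see in a fragment. 
The following is an illustrative sketch only, not taken from any 
ratified specification; the choice to skip the slots between `B' and 
`Z' is made purely to demonstrate |\skipslots|, and the glyph name 
\texttt{asciitilde} is assumed to be the one used by the fonts at hand:
\begin{quote}\begin{verbatim}
\nextslot{65}
\setslot{A}
  \Unicode{0041}{LATIN CAPITAL LETTER A}
\endsetslot
\setslot{B}
  \Unicode{0042}{LATIN CAPITAL LETTER B}
\endsetslot
\skipslots{23}
\setslot{Z}
  \Unicode{005A}{LATIN CAPITAL LETTER Z}
\endsetslot
\nextslot{\number`\~}
\setslot{asciitilde}
  \Unicode{007E}{TILDE}
\endsetslot
\end{verbatim}\end{quote}
No |\nextslot| is needed before \texttt{B}, since the |\endsetslot| of 
\texttt{A} has already advanced the counter to 66. After the 
|\endsetslot| of \texttt{B} the counter is 67, so skipping 23 slots 
places \texttt{Z} in slot 90.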

The main duty of the \meta{slot commands} is to specify the target 
character (or characters) for this slot. The simplest way of doing 
this is to use the |\Unicode| command, which has the syntax
\begin{quote}
  |\Unicode|\marg{code point}\marg{name}
\end{quote}
The \meta{code point} is the number of the character (in hexadecimal 
notation, usually a four-digit number) and the \meta{name} is the name. 
Case is insignificant in these arguments. If a slot corresponds to a 
string of characters rather than to a single character, then one uses 
the |\charseq| command, which has the syntax
\begin{quote}
  |\charseq|\marg{\cs{Unicode} commands}
\end{quote}
e.g.
\begin{quote}\begin{verbatim}
\nextslot{30}
\setslot{ffi}
  \charseq{
    \Unicode{0066}{LATIN SMALL LETTER F}
    \Unicode{0066}{LATIN SMALL LETTER F}
    \Unicode{0069}{LATIN SMALL LETTER I}
  }
\endsetslot
\end{verbatim}\end{quote}
Several |\Unicode| commands not in the argument of a |\charseq| 
instead mean that each of the listed characters is a valid 
interpretation of the slot.
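
As a hedged illustration of the latter case, a slot that serves as 
both the ASCII hyphen-minus and the typographical hyphen could be 
specified as follows (this fragment is a sketch and is not quoted from 
an existing specification):
\begin{quote}\begin{verbatim}
\nextslot{45}
\setslot{hyphen}
  \Unicode{002D}{HYPHEN-MINUS}
  \Unicode{2010}{HYPHEN}
\endsetslot
\end{verbatim}\end{quote}
Had the two |\Unicode| commands instead been wrapped in a |\charseq|, 
the slot would have stood for the two characters in sequence.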

If a character cannot be specified in terms of Unicode code points then 
the specification should simply be a description in text which 
identifies the character. Such descriptions are written using the 
|\comment| command
\begin{quote}
  |\comment|\marg{text}
\end{quote}
It is worth noticing that the \meta{text} is technically only an 
argument of |\comment| when the program processing the \ETX\ file is 
ignoring |\comment| commands. This means |\verb| and similar 
catcode-changing commands \emph{can} be used in the \meta{text}. The 
|\par| command is on the other hand not allowed in the \meta{text}.

The |\comment| command should also be used for any further piece of 
explanation of or commentary to the character used for the slot, if the 
exposition seems to need it. There can be any number of |\comment| 
commands in the \meta{slot commands}.
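
A sketch of such a description is given below. It concerns the small 
zero which in the \texttt{T1} encoding is placed after a percent sign 
to form a per-mille sign; the wording of the comment is merely a 
suggestion, and it is an assumption of this example that no Unicode 
code point is to be assigned to the slot:
\begin{quote}\begin{verbatim}
\nextslot{24}
\setslot{perthousandzero}
  \comment{A small zero which, when placed after a
     percent sign, turns it into a per-mille sign.}
\endsetslot
\end{verbatim}\end{quote}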


\subsection{Ligatures}

There are three classes of ligatures in the font encoding 
specifications: mandatory, ordinary, and odd. Mandatory ligatures must 
be present in any font which complies with the encoding, whereas 
ordinary and odd ligatures need not be. No clear distinction can be 
made between ordinary and odd ligatures, but a non-mandatory ligature 
should be categorized as ordinary if it makes sense for the majority 
of users, and as odd otherwise. Hence the `fi' ligature is 
categorized as ordinary in the \texttt{T1} encoding (although it 
makes no sense in Turkish), whereas the `ij' ligature is odd.

In the \ETX\ format, a ligature is specified using one of the slot 
commands
\begin{quote}
  |\Ligature|\marg{ligtype}\marg{right}\marg{new}\\
  |\ligature|\marg{ligtype}\marg{right}\marg{new}\\
  |\oddligature|\marg{note}\marg{ligtype}\marg{right}\marg{new}
\end{quote}
|\Ligature| is used for mandatory ligatures, |\ligature| is used for 
ordinary ligatures, and |\oddligature| is used for odd ligatures. The 
\meta{right} and \meta{new} arguments are names of the glyphs being 
assigned to the slots involved in this ligature. The \meta{right} 
specifies the right part in the slot pair being affected by the 
ligature, whereas the left part is the one of the |\setslot| \dots\ 
|\endsetslot| construction in which the ligaturing command is placed. 
The \meta{new} specifies a new slot which will be inserted by the 
ligaturing instruction. The \meta{ligtype} is the actual ligaturing 
instruction that will be used; it must be |LIG|, |/LIG|, |/LIG>|, 
|LIG/|, |LIG/>|, |/LIG/|, |/LIG/>|, or |/LIG/>>|. The slashes specify 
retention of the left or right original character; the |>| signs 
specify passing over that many slots in the result without further 
ligature processing. \meta{note}, finally, is a piece of text which 
explains when the odd ligature may be appropriate. It is typeset as a 
footnote.

As an example of ligatures we find the following in the specification 
of the \texttt{T1} encoding:
\begin{quote}
  |\nextslot{33}|\\
  |\setslot{exclam}|\\
  |  \Unicode{0021}{EXCLAMATION MARK}|\\
  |  \Ligature{LIG}{quoteleft}{exclamdown}|\\
  |\endsetslot|
\end{quote}
It is typeset as
\begin{quote}
  \textbf{Slot 33 `\texttt{exclam}'}\\*
  Unicode character \texttt{U+0021}, \textsc{exclamation mark}.\\*
  \textbf{Mandatory ligature} 
  \texttt{exclam}${}*{}$\texttt{quoteleft}${}\rightarrow
  {}$\texttt{exclamdown}
\end{quote}
With other \meta{ligtype}s there may be more names listed on the 
right hand side and possibly a `$\lfloor$' symbol showing the 
position at which ligature processing will start afterwards.
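
Ordinary ligatures are specified in the same manner. The following 
fragment is a sketch of how the familiar f-ligatures might be entered 
for the \texttt{T1} encoding; it is not quoted from the specification, 
and the glyph names are assumed to be those found in the fonts at hand:
\begin{quote}\begin{verbatim}
\nextslot{102}
\setslot{f}
  \Unicode{0066}{LATIN SMALL LETTER F}
  \ligature{LIG}{i}{fi}
  \ligature{LIG}{f}{ff}
  \ligature{LIG}{l}{fl}
\endsetslot
\end{verbatim}\end{quote}
To illustrate the other \meta{ligtype}s, suppose the left glyph is 
\texttt{x}, the \meta{right} is \texttt{y}, and the \meta{new} is 
\texttt{z} (all three names being hypothetical). Then |LIG| leaves 
\texttt{z}, |/LIG| leaves \texttt{x}~\texttt{z}, |LIG/| leaves 
\texttt{z}~\texttt{y}, and |/LIG/| leaves 
\texttt{x}~\texttt{z}~\texttt{y}.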
  

\subsection{Math font specialities}

There are numerous technicalities which are special to math fonts, but 
only a few of them are exhibited in \ETX\ files.\footnote{For an 
overview of the subject, see for example Vieth~\cite{Vieth2001}.} Most 
of these have to do with the \TeX\ mechanisms that find sufficiently 
large characters for commands like |\left|, |\sqrt|, and |\widetilde|.

The first mechanism for this is that a character in a font can sort of 
say ``If I'm too small, then try character \dots\ instead''. This 
is expressed in an \ETX\ file using the |\nextlarger| command, which 
has the syntax
\begin{quote}
  |\nextlarger|\marg{glyph name}
\end{quote}
The second mechanism constructs a sufficiently large character from 
smaller pieces; this is known as a `varchar' or `extensible character'. 
This is expressed in an \ETX\ file using an ``extensible recipe'', the 
syntax for which is
\begin{quote}
  |\varchar| \meta{varchar commands} |\endvarchar|
\end{quote}
where each \meta{varchar command} is one of
\begin{quote}
  |\varrep|\marg{glyph name}\\
  |\vartop|\marg{glyph name}\\
  |\varmid|\marg{glyph name}\\
  |\varbot|\marg{glyph name}
\end{quote}
There can be at most one of each and their order is irrelevant. The 
most important is the |\varrep| command, as that is the part which is 
repeated until the character is sufficiently large. The |\vartop|, 
|\varmid|, and |\varbot| commands are used to specify some other part 
which should be put at the top, middle, and bottom of the extensible 
character respectively. Not all extensible recipes use all of these, 
however.

As an example, here is how a very large left brace is constructed:
\begin{center}
  \begin{tabular}{>{%
    \fontencoding{OMX}\fontfamily{cmex}\selectfont
    $\vcenter\bgroup\hbox\bgroup
  }l<{\egroup\egroup$} l}
    \char"38& For |\vartop{bracelefttp}|\\
    \char"3E& For |\varrep{braceex}|\\
    \char"3C& For |\varmid{braceleftmid}|\\
    \char"3E& Again for |\varrep{braceex}|\\
    \char"3A& For |\varbot{braceleftbt}|
  \end{tabular}
\end{center}
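
Combining the pieces from the table above, the recipe for such a left 
brace might be written as follows in an \ETX\ file. This is only a 
sketch: the glyph name \texttt{braceleftBigg} for the largest 
fixed-size brace, and the choice of \texttt{bracelefttp} as the slot 
carrying the recipe, are assumptions made for the sake of 
illustration, and the slot numbers have been left out.
\begin{quote}\begin{verbatim}
% Assumed name for the largest fixed-size left brace
\setslot{braceleftBigg}
  \nextlarger{bracelefttp}
\endsetslot
% Assumed host slot for the extensible recipe
\setslot{bracelefttp}
  \varchar
    \vartop{bracelefttp}
    \varrep{braceex}
    \varmid{braceleftmid}
    \varbot{braceleftbt}
  \endvarchar
\endsetslot
\end{verbatim}\end{quote}
Here the |\nextlarger| chain ends in the slot holding the extensible 
recipe, so that \TeX\ falls back on the recipe when none of the 
fixed-size braces is large enough.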

Both |\nextlarger| and |\varchar| commands are like |\ligature| in 
that they describe ordinary features for the encoding; they appear in 
a specification \ETX\ file mainly to explain the purpose of some 
ordinary character. There is no such thing as a mandatory |\nextlarger| 
or |\varchar|, but varchars are occasionally used to a similar effect. 
In these cases, the character generated by the extensible recipe is 
something quite different from what a |\char| for that slot would 
produce. Thus for the slot to produce the expected result it must be 
referenced using a |\delimiter| or |\radical| primitive, since those 
are the only ones which make use of the extensible recipe. The effect 
is that the slot has a \emph{semimandatory} assignment; the result of 
|\char| is unspecified (as for a slot with an ordinary assignment), but 
the result for a large delimiter or radical is not (as for a slot with 
a mandatory assignment). 

Thus some math fonts have an extra section ``Semimandatory characters'' 
between the mandatory and ordinary character sections. In that section 
for the \texttt{OMX} encoding we find for example 
\begin{quote}\begin{verbatim}
\nextslot{60}
\setslot{braceleftmid}
  \Unicode{2016}{DOUBLE VERTICAL LINE}
  \comment{This is the large size of the |\Arrowvert| 
     delimiter, a glyphic variation on |\Vert|. 
     The \texttt{braceleftmid} glyph ordinarily 
     placed in this slot must not be too tall, 
     or else the extensible recipe actually producing 
     the character might sometimes not be used.}
  \varchar
    \varrep{arrowvertex}
  \endvarchar
\endsetslot
\end{verbatim}\end{quote}
which is typeset as
\begin{quote}
  \textbf{Slot 60 `\texttt{braceleftmid}'}\\*
  Unicode character \texttt{U+2016}, \textsc{double vertical line}.\\
  This is the large size of the |\Arrowvert| 
  delimiter, a glyphic variation on |\Vert|. 
  The \texttt{braceleftmid} glyph ordinarily 
  placed in this slot must not be too tall, 
  or else the extensible recipe actually producing 
  the character might sometimes not be used.\\
  \textbf{Extensible glyph:}\\*
  \textbf{Repeated} \texttt{arrowvertex}
\end{quote}




\subsection{Fontdimens}

Each \TeX\ font contains a list of fontdimens, numbered from $1$ and 
up, which are accessible via the |\fontdimen| \TeX\ primitive. Quite a 
few are also used implicitly by \TeX\ and therefore cannot be left out 
even if they are totally irrelevant, but as one can always include 
some extra fontdimens in a font---the only bounds on how many 
fontdimens there may be are the general bound on the size of a TFM 
file and the amount of font memory \TeX\ has available---this is 
usually not a problem.

The reason fontdimens are part of font encoding specifications is 
that the meaning of e.g.\ |\fontdimen|\,|8| varies between different 
fonts depending on their encoding; thus the encoding specification 
must define the quantity stored in each |\fontdimen| parameter. This 
is done using the |\setfontdimen| command, which has the syntax
\begin{quote}
  |\setfontdimen|\marg{number}\marg{name}
\end{quote}
The \meta{number} is the fontdimen number (as a sequence of decimal 
digits where the first digit isn't zero) and the \meta{name} is a 
symbolic name for the quantity.

The standard list of symbolic names for fontdimen quantities appears 
below; the listed quantities should always be described using the names 
in this list. Encoding specifications that employ other quantities as 
fontdimens should include definitions of these quantities. Those 
quantities that are defined as ``Formula parameter \dots'' have to 
do with how mathematical formulae are rendered and are usually much 
too complicated to explain here. For exact definitions of these 
parameters, the reader is referred to Appendix~G of \textit{The 
\TeX book}~\cite{TeXbook}.
\begin{list}{}{%
   \setlength\labelwidth{0pt}%
   \setlength\itemindent{-\leftmargin}%
   \def\makelabel#1{\hspace{\labelsep}\normalfont\itshape #1}%
   \setlength\itemsep{0.5\itemsep}%
   \setlength\parsep{0.5\parsep}%
}
\item[acccapheight]
  The height of accented full capitals.
\item[ascender]
  The height of lower case letters with ascenders.
\item[axisheight] Formula parameter $\sigma_{22}$.
\item[baselineskip]
  The font designer's recommendation for natural length of the 
  \TeX\ parameter |\baselineskip|.
\item[bigopspacing1] Formula parameter $\xi_{9}$.
\item[bigopspacing2] Formula parameter $\xi_{10}$.
\item[bigopspacing3] Formula parameter $\xi_{11}$.
\item[bigopspacing4] Formula parameter $\xi_{12}$.
\item[bigopspacing5] Formula parameter $\xi_{13}$.
\item[capheight]
  The height of full capitals.
\item[defaultrulethickness] Formula parameter $\xi_{8}$.
\item[delim1] Formula parameter $\sigma_{20}$.
\item[delim2] Formula parameter $\sigma_{21}$.
\item[denom1] Formula parameter $\sigma_{11}$.
\item[denom2] Formula parameter $\sigma_{12}$.
\item[descender]
  The depth of lower case letters with descenders.
\item[digitwidth]
  The median width of the digits in the font.
\item[extraspace]
  The natural width of extra interword glue at the end of a sentence. 
  \TeX\ implicitly uses this parameter if |\spacefactor| is $2000$ or 
  more and |\xspaceskip| is zero.
\item[interword]
  The natural width of interword glue (spaces). \TeX\ implicitly uses 
  this parameter unless |\spaceskip| is nonzero.
\item[italicslant]
  The slant per point of the font. Unlike all other fontdimens, it is 
  not proportional to the font size. 
\item[maxdepth]
  The maximal depth over all slots in the font.
\item[maxheight]
  The maximal height over all slots in the font.
\item[num1] Formula parameter $\sigma_{8}$.
\item[num2] Formula parameter $\sigma_{9}$.
\item[num3] Formula parameter $\sigma_{10}$.
\item[quad]
  The quad width of the font, normally approximately equal to the 
  font size and\slash or the width of an `M'. Also implicitly available 
  as the length unit |em| and used for determining the size of the 
  length unit |mu|.
\item[shrinkword]
  The (finite) shrink component of interword glue (spaces). \TeX\ 
  implicitly uses this parameter unless |\spaceskip| is nonzero.
\item[stretchword]
  The (finite) stretch component of interword glue (spaces). \TeX\ 
  implicitly uses this parameter unless |\spaceskip| is nonzero.
\item[sub1] Formula parameter $\sigma_{16}$.
\item[sub2] Formula parameter $\sigma_{17}$.
\item[subdrop] Formula parameter $\sigma_{19}$.
\item[sup1] Formula parameter $\sigma_{13}$.
\item[sup2] Formula parameter $\sigma_{14}$.
\item[sup3] Formula parameter $\sigma_{15}$.
\item[supdrop] Formula parameter $\sigma_{18}$.
\item[verticalstem]
  The dominant width of vertical stems. This quantity is meant to be used 
  as a measure of how ``dark'' the font is.
\item[xheight]
  The x-height (height of lower case letters without ascenders). Also 
  implicitly available as the length unit |ex|.
\end{list}
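
As an illustration, the seven fontdimens that \TeX\ uses implicitly in 
text fonts might be declared as follows. The numbering shown is the 
conventional one for text fonts and is given here as an assumption; 
a particular encoding specification may continue the list with further 
quantities.
\begin{quote}\begin{verbatim}
% Conventional text font numbering (an assumption, see above)
\setfontdimen{1}{italicslant}
\setfontdimen{2}{interword}
\setfontdimen{3}{stretchword}
\setfontdimen{4}{shrinkword}
\setfontdimen{5}{xheight}
\setfontdimen{6}{quad}
\setfontdimen{7}{extraspace}
\end{verbatim}\end{quote}
In the same manner, a math symbol font encoding in which formula 
parameter $\sigma_{22}$ is stored as fontdimen $22$ would contain the 
line |\setfontdimen{22}{axisheight}|.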


\subsection{The codingscheme}

The final encoding-dependent piece of information in a \TeX\ font is 
the codingscheme, which is essentially a string declaring what 
encoding the font has. This information is currently only used by 
programs that convert the information in a \TeX\ font to some other 
format and these use it to identify the glyphs in the font. Therefore 
this string should be chosen so that the contents of the slots in the 
font can be positively identified. Observe that the encoding 
specification by itself does not provide enough information for this, 
since there are usually a couple of slots that do not contain 
mandatory characters. On the other hand, it is not a problem in this 
context if the font leaves some of the slots (even mandatory ones) 
empty as that is anyway easily detected. The only problem is with 
fonts where the slots are assigned to other characters than the ones 
specified in the encoding.

For that reason, it is appropriate to assign two codingscheme strings 
to each encoding. The main codingscheme is for fonts where all slots 
(mandatory and ordinary alike) have been assigned according to the 
specification or have been left empty. The variant codingscheme is for 
fonts where some ordinary slots have been assigned other characters 
than the ones listed in the specification, but where the mandatory 
slots are still assigned according to the specification or are left 
empty. The font encoding specification should give the main 
codingscheme name, whereas the variant codingscheme name could be 
formed by adding \verb*| VARIANT| to the main codingscheme name.

Technically the codingscheme is specified by setting the 
\texttt{codingscheme} string variable. This has the syntax
\begin{quote}
  |\setstr{codingscheme}|\marg{codingscheme name}
\end{quote}
e.g.
\begin{quote}
  |\setstr{codingscheme}{EXTENDED TEX FONT ENCODING - LATIN}|
\end{quote}
which is typeset as
\begin{quote}
  \textbf{Default} s(\texttt{codingscheme}) = 
  \verb*|EXTENDED TEX FONT ENCODING - LATIN|
\end{quote}
A codingscheme name may be at most 40 characters long and may not 
contain parentheses. If the entire \verb*| VARIANT| cannot be suffixed 
to a main name because the result would be too long (as in the above 
example) then use the first 40 characters of the result.
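
To make the truncation concrete: the main name in the above example is 
34 characters long, so suffixing \verb*| VARIANT| would give 42 
characters, and the last two must be dropped. Following the rule 
above, the variant codingscheme would thus be set as
\begin{quote}
  |\setstr{codingscheme}{EXTENDED TEX FONT ENCODING - LATIN VARIA}|
\end{quote}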


\subsection{Overall document structure}
\label{Ssec:Structure}

The overall structure of a font encoding specification should be 
roughly the following
\begin{quote}
  |\relax|\\
  |\documentclass[twocolumn]{article}|\\
  |\usepackage[specification]{fontdoc}|\\
  \meta{preamble}\\
  |\begin{document}|\\
  \meta{title}\\
  \meta{manifest}\\
  |\encoding|\\
  \meta{body}\\
  |\endencoding|\\
  \meta{discussion}\\
  \meta{change history}\\
  \meta{bibliography}\\
  |\end{document}|
\end{quote}
The commands described in the preceding subsections must all go in 
the \meta{body} part of the document, as that is the only part of the 
file which actually gets processed as a data file. The part before 
|\encoding| is skipped and the part after |\endencoding| is never 
even input, so whatever appears there is only part of the \LaTeX\ 
document. For the purposes of processing as a data file, the 
important markers in the file are the |\relax|, the |\encoding|, and 
the |\endencoding| commands.

The \meta{title} is the usual |\maketitle| (and the like) stuff. The 
person or persons who appear as author(s) are elsewhere in this paper 
described as the \emph{encoding proposers}. The \meta{title} should 
also give the date when the specification was last changed.

The \meta{manifest} is an important, although usually pretty short, 
part of the specification. It is a piece of text which explains the 
purpose of the encoding (in particular what it can be used for) and 
the basic ideas (if any) which have been used in its construction. It 
is often best marked up as an abstract.

The \meta{discussion} is the place for any longer comments on the 
encoding, such as analyses of different implementations, comparisons 
with other encodings, etc. This is also the place to explain any more 
general structures in the encoding, such as the arrow kit in the 
proposed \texttt{MS2} encoding~\cite{ClasenVieth}. In cases where the 
specification is mainly a formulation of what is already an 
established standard the \meta{discussion} is often rather short as 
the relevant discussion has already been published elsewhere, but 
it is anyway a service to the reader to include this information. 
References to the original documents should always be given.

It might be convenient to include an FAQ section at the end of the 
discussion. This is particularly suited for explaining things where 
one has to look for a while and consult the references to find the 
relevant information.

The \meta{change history} documents how the specification has changed 
over time. It is preferably detailed, as each detail in an encoding 
is important, but one should not be surprised if it is nonetheless 
rather short, simply because there haven't been that many changes.

The \meta{bibliography} is an important part of the specification. It 
should at the very least include all the sources which have been used 
in compiling the encoding specification, regardless of whether they 
are printed, available on the net, merely ``personal communication'', 
or something else. It is also a service to the reader to include in 
the bibliography some more general references for related matters.

The \meta{preamble} is just a normal \LaTeX\ preamble and there are no 
restrictions on defining new commands in it, although use of such 
commands in the \meta{body} part is subject to the same restrictions 
as use of any general \LaTeX\ command. The preamble should however 
\emph{not} load any packages not part of the required suite of 
\LaTeX\ packages, as that may prevent users who do not have these 
packages from typesetting the specification. Likewise, the 
specification should \emph{not} require that some special font is 
available. Glyph examples for characters are usually better 
referenced via Unicode character charts than via special fonts. 

An exception to this rule about packages is that the specification 
must load the \package{fontdoc} package, as shown in the outline 
above, since that defines the |\setslot| etc.\ commands that should 
appear in the \meta{body}. This should not be a problem, as the 
\package{fontdoc} package can preferably be kept in the same directory 
as the collection of encoding specifications (see below). The 
\texttt{specification} option should be passed to the package to let it know 
that the file being processed is an encoding specification---otherwise 
|\Ligature| and |\ligature| will get the same formatting, for one. It 
is not necessary to use the \package{article} document class, and 
neither must it be passed the \texttt{twocolumn} option, but it is 
customary to do so. In principle any other document class within 
required \LaTeX\ will do just as well.

If you absolutely think that using some non-required package 
significantly improves the specification, then try writing the code so 
that it loads the package only if it is available and provide some 
kind of fallback definition for sites where it is not. E.g.\ the 
\package{url} package could be loaded as
\begin{verbatim}
\IfFileExists{url.sty}{\usepackage{url}}{}
\providecommand\url{\verb}
\end{verbatim}
The |\url| command defined by this is not equivalent to the command 
defined by the \package{url} package, but it can serve fairly well 
(with a couple of extra overfull lines as only ill effect) if its 
use is somewhat restricted.

Finally, a technical restriction on the \meta{preamble}, \meta{title}, 
and \meta{manifest} is that they must not contain any mismatched 
|\if|s (of any type) or |\fi|s, as \TeX\ conditionals will be used for 
skipping those parts of the file when it is processed as a data file. 
If the definition of some macro includes mismatched |\if|s or |\fi|s 
(this will probably occur only rarely) then include some extra code 
so that they do match.
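
For instance, a preamble definition such as the first one below 
contains an |\iffalse| without a matching |\fi|. One way of restoring 
the balance, sketched here with the hypothetical macro names 
\cs{openhidden} and \cs{unusedbalance}, is to add a definition whose 
only purpose is to contain the missing |\fi|:
\begin{quote}\begin{verbatim}
% Hypothetical macro; its definition has an unmatched conditional
\newcommand\openhidden{\iffalse}
% Never used; only balances the conditional above while the
% preamble is being skipped
\newcommand\unusedbalance{\fi}
\end{verbatim}\end{quote}
Since neither macro is expanded when it is defined, this makes no 
difference when the file is typeset as a \LaTeX\ document.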


% All technical parts of the encoding specification (slot assignments, 
% fontdimens, etc.\@) have to be in the \meta{encoding commands} part. 
% The other parts are suitably used for longer commentry, such as the 
% mainfest (see below), revision history, and bibliography.
% 
% When the file is being typeset as a \LaTeX\ document there is nothing 
% special going on. The |\encoding| and |\endencoding| commands may set 
% some internal variables, but otherwise they do very little. When the 
% file is being read by \package{fontinst}, things are quite different. 
% Everything between the initial |\relax| and |\encoding| is skipped, 
% and the file is not read further than to the |\endencoding|. Hence 
% the \meta{preamble}, \meta{\LaTeX\ text 1}, and \meta{\LaTeX\ text 2} 
% can contain pretty much anything (with a few exceptions) which is 
% legal in a \LaTeX\ document. 


\subsection{Encoding specification body syntax}

The \meta{body} part of an encoding specification must comply with a 
much stricter syntax than the rest of the file. The \meta{body} is 
a sequence of \meta{encoding command}s, each of which should be one 
of the following:
\begin{quote}
  |\setslot|\marg{glyph name} \meta{slot commands} |\endsetslot|\\
  |\nextslot|\marg{number}\\
  |\skipslots|\marg{number}\\
  |\setfontdimen|\marg{number}\marg{name}\\
  |\setstr{codingscheme}|\marg{codingscheme name}\\
  |\needsfontinstversion|\marg{version number}
\end{quote}
The |\needsfontinstversion| command is usually placed immediately 
after the |\encoding| command. The \meta{version number} must be at 
least |1.918| for many of the features described in this file to be 
available, and at least |1.928| if the |\charseq| command is used.

The \meta{slot commands} are likewise a sequence of \meta{slot 
command}s, each of which should be one of the following:
\begin{quote}
  |\Unicode|\marg{code point}\marg{name}\\
  |\charseq|\marg{\cs{Unicode} commands}\\
  |\comment|\marg{text}\\
  |\Ligature|\marg{ligtype}\marg{right}\marg{new}\\
  |\ligature|\marg{ligtype}\marg{right}\marg{new}\\
  |\oddligature|\marg{note}\marg{ligtype}\marg{right}\marg{new}\\
  |\nextlarger|\marg{glyph name}\\
  |\varchar| \meta{varchar commands} |\endvarchar|
\end{quote}
where \meta{varchar commands} similarly is a sequence of \meta{varchar 
command}s, each of which should be one of the following:
\begin{quote}
  |\varrep|\marg{glyph name}\\
  |\vartop|\marg{glyph name}\\
  |\varmid|\marg{glyph name}\\
  |\varbot|\marg{glyph name}
\end{quote}
Finally, one can include any number of \meta{comment command}s between 
any two encoding, slot, or varchar commands. The comment commands are
\begin{quote}
  |\begincomment| \meta{\LaTeX\ text} |\endcomment|\\
  |\label|\marg{reference label}
\end{quote}
The \meta{\LaTeX\ text} can be pretty much any \LaTeX\ code that can 
appear in conditional text. (|\begincomment| is either |\iffalse| or 
|\iftrue| depending on whether the encoding specification is 
processed as a data file or typeset as a \LaTeX\ document respectively. 
|\endcomment| is always |\fi|.) The |\label| command is just 
the normal \LaTeX\ |\label| command; when it is used in a \meta{slot 
commands} it references that particular slot (by number and glyph 
name).

The full syntax of the \ETX\ format can be found in the 
\package{fontinst} manual~\cite{fontinst-man}, but font encoding 
specifications only need a subset of that. 
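
Putting the pieces together, a very small specification file might 
look as follows. This is a sketch for illustration only: the title, 
the author, the codingscheme name, the fontdimen numbering, and the 
label are all made up, and the single slot is the \texttt{exclam} 
example from above.
\begin{quote}\begin{verbatim}
\relax
\documentclass[twocolumn]{article}
\usepackage[specification]{fontdoc}
% Hypothetical title, author, and date
\title{A hypothetical example encoding}
\author{A. N. Proposer}
\date{2004/08/07}
\begin{document}
\maketitle
\begin{abstract}
  Purpose and basic ideas of the encoding.
\end{abstract}
\encoding
\needsfontinstversion{1.918}
% Made-up codingscheme name
\setstr{codingscheme}{HYPOTHETICAL EXAMPLE ENCODING}
\setfontdimen{1}{italicslant}
\begincomment
  \section{Punctuation}
\endcomment
\nextslot{33}
\setslot{exclam}
  \Unicode{0021}{EXCLAMATION MARK}
  \Ligature{LIG}{quoteleft}{exclamdown}
  \label{slot:exclam}
\endsetslot
\endencoding
\section{Discussion}
Longer comments, change history, and bibliography.
\end{document}
\end{verbatim}\end{quote}
Only the part from |\encoding| to |\endencoding| is read when the 
file is processed as a data file; everything else matters only when 
it is typeset.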


\subsection{Additional \package{fontdoc} features}

There is an ``in comment paragraph'' form |\textunicode| of the 
|\Unicode| command. Both commands have the same syntax, but 
|\textunicode| is only allowed in ``comment'' contexts. A typical use 
of |\textunicode| is
\begin{quote}
  |\comment{An |\dots\\
  \quad\dots| this is \textunicode{2012}{FIGURE DASH}; in |\dots\\
  |}|
\end{quote}
which is typeset as
\begin{quote}
  An \dots\ this is \texttt{U+2012} (\textsc{figure dash}); in \dots
\end{quote}

The \package{fontdoc} package inputs a configuration file 
\texttt{fontdoc.cfg} if that exists. This can be used to pass 
additional options to the package. The only currently available 
options that this could be of interest for are the \texttt{hypertex} 
and \texttt{pdftex} options, which hyperlink each \texttt{U+}\dots\ 
generated by |\Unicode| or |\textunicode| (using Hyper\TeX\ or 
pdf\TeX\ conventions\footnote{One could just as well do the same 
thing using some other convention if a suitable definition of 
\cs{FD@codepoint} is included in \texttt{fontdoc.cfg}. See the 
\package{fontinst} sources~\cite{fontinst-pre} for more details.} 
respectively) to a corresponding glyph image on the Unicode consortium 
website. To use this feature one should put the line
\begin{quote}
  |\ExecuteOptions{hypertex}|
\end{quote}
or
\begin{quote}
  |\ExecuteOptions{pdftex}|
\end{quote}
in the \texttt{fontdoc.cfg} file. \emph{Please} do not include this 
option in the |\usepackage|\nolinebreak[1]|{fontdoc}| of an encoding 
specification file as that can be a severe annoyance for people whose 
\TeX\ program or DVI viewers do not support the necessary extensions.
% Hyper\TeX\ |\special|s.


\section{Font encoding ratification}

This section describes a suggested ratification process for font 
encoding specifications. As there are fewer technical matters that 
impose restrictions on what it may look like, it is probably more 
subjective than the other parts of this paper.

\medskip

A specification in the process of being ratified can be in one of 
three different stages: \emph{draft}, \emph{beta}, or \emph{final}. 
Initially the specification is in the draft stage, during which it 
will be scrutinized and can be subject to major changes. A 
specification which is in the beta stage has got a formal approval 
but the encoding in question may still be subject to some minor 
changes if weighty arguments present themselves. Once the 
specification has reached the final stage, the encoding may not 
change at all.


\subsection{Getting to the draft stage}

The process of taking an encoding to the draft stage can be 
summarized in the following steps. Being in the draft stage doesn't 
really say anything about whether the encoding is in any way correct 
or useful, except in that some people (the encoding proposers) 
believe it is and are willing to spend some time on ratifying it. 

\paragraph{Write an encoding specification} The first step is to 
write a specification for the font encoding in question. This 
document must not only technically describe the encoding but also 
explain what the encoding is for and why it was created. See 
Subsection~\ref{Ssec:Structure} for details on how the document is 
preferably organised.

\paragraph{Request an encoding name} The second step is to write to 
the \LaTeX3 project and request a \LaTeX\ encoding name for the 
encoding. This mail should be in the form of a \LaTeX\ bug report, it 
must be sent to 
\begin{quote}
  \href{mailto:latex-bugs@latex-project.org}%
  {\texttt{latex-bugs@latex-project.org}},
\end{quote}
and it must include the encoding specification file. Suggestions for 
an encoding name are appreciated, but not necessarily accepted.
The purpose of this mail is \emph{not} to get an approval of the 
encoding, but only to have a reasonable name assigned to it.

\paragraph{Upload the specification to CTAN} The third step is to make 
the encoding specification publicly available by uploading it to 
CTAN. Encoding specifications are collected in the
\begin{quote}
  \ctanref{info/encodings}{\texttt{info/encodings}}
\end{quote}
directory (which should also contain the most recent version of this 
paper). The name of the uploaded file should be 
\meta{encoding name}\texttt{draft.etx}. The reason for this naming is 
that it must be clear that the specification has not yet been ratified.

\paragraph{Announce the encoding} When the upload has been confirmed, 
it is time to announce the encoding by posting a message about it to 
the relevant forums. Most important is the \texttt{tex-fonts} mailing 
list, since that is where new encodings should be debated. Messages 
should also be posted to the \texttt{comp.text.tex} newsgroup and any 
forums related to the intended use of the encoding: an encoding for 
Sanskrit should be announced on Indian \TeX\ users forums, an 
encoding for printing chess positions should be announced on some 
chess-with-\TeX\ user forum, etc.; to the extent that such forums exist.

The full address of the \texttt{tex-fonts} mailing list is
\begin{quote}
  \texttt{tex-fonts@math.utah.edu}
\end{quote}
This list rejects postings from non-members, so you need to subscribe 
to it before you can post your announcement. This is done by sending 
a `subscribe me' mail to
\begin{quote}
  \href{mailto:tex-fonts-request@math.utah.edu}
    {\texttt{tex-fonts-request@math.utah.edu}}
\end{quote}
The list archives can be found at
\begin{quote}
  \href{http://www.math.utah.edu/mailman/listinfo/tex-fonts}
    {\textsc{http}:/\slash \texttt{www.math.utah.edu}\slash 
    \texttt{mailman}\slash \texttt{listinfo}\slash 
    \texttt{tex-fonts}}
\end{quote}
A tip is to read through the messages from a couple of months 
before you write up your announcement, as that should help you get 
acquainted with the normal style on the list. Please do not send 
messages encoded in markup languages (notably, \textsmaller{HTML}, 
\textsmaller{XML}, and word processor formats) to the list.

\paragraph{Experimental encodings} There is a point in going through 
the above procedure even for experimental encodings, i.e., encodings 
whose names start with an \texttt{E}. Of course there is no point in 
ratifying a specification of an experimental encoding, as it is very 
likely to frequently change, but having a proper name assigned to the 
encoding and uploading its specification to CTAN makes it much simpler 
for other people to learn about and make references to the encoding.


\subsection{From draft to beta stage}

The main difference between a draft stage and a beta stage 
specification is that beta stage specifications have been scrutinized 
by other people and found to be free of errors. The practical 
implementation of this is that a debate is held (in the normal 
anarchical manner of mailing list debates) on the \texttt{tex-fonts} 
mailing list. In particular the following aspects of the 
specification should be checked:
\begin{enumerate}
  \item \emph{Is the encoding technically correct?} 
    There are many factors which affect what \TeX\ does and it is 
    easy to overlook some. (The \cs{lccode}s seem to be particularly 
    troublesome, in this respect.) Sometimes fonts simply cannot work 
    as an encoding specifies they should and it is important that 
    such defects in the encoding are discovered at an early stage.
  \item \emph{Are there any errors in the specification?}
    A font encoding specification is largely a table and typos are 
    easy to make. Proof-reading may be boring, but it is very, very 
    important.
  \item \emph{Is the specification sufficiently precise?}
    Are there any omissions, ambiguities, inaccuracies, or completely 
    irrelevant material in the specification? There shouldn't be.
\end{enumerate}
During the debate, the encoding proposers should hear what other 
people have to say about the encoding draft, revise it accordingly 
when some flaw is pointed out, and upload the revised version. This 
cycle may well have to be repeated several times before everyone is 
content. It is worth pointing out that in practice the debate should 
turn out to be more of a collective authoring of the specification 
than a defense of its validity. There is no point in going into it 
expecting the worst.

Unfortunately, it might happen that there never is a complete agreement 
on an encoding specification---depending on which side one takes, either 
the encoding proposers refuse to correct obvious flaws in it, or someone 
on the list insists that there is a flaw although there is obviously 
not---but hopefully that will never happen. If it anyway does happen 
then the person objecting should send a mail whose subject contains the 
phrase ``formal protest against XXX encoding'' (with XXX replaced by 
whatever the encoding is called) to the list. Then it will be up to 
the powers that be to decide on the fate of the encoding (see below).

\paragraph{Summarize the debate} When the debate on the encoding is 
over---e.g.\ a month after anyone last posted anything new on the 
subject---then the encoding proposers should summarize the debate on 
the encoding specification draft and post this summary as a follow-up 
on the original mail to \texttt{latex-bugs}. This summary should list 
the changes that have been made to the encoding, what suggestions there 
were for changes which have not been included, and whether there were 
any formal protests against the encoding. The summary should also explain 
what the proposers want to have done with the encoding. In the 
usual case this is having it advanced to beta stage, but the proposers 
might alternatively at this point have reached the conclusion that the 
encoding wasn't such a good idea to start with and therefore withdraw 
it, possibly to come again later with a different proposal.

In response to this summary, the \LaTeX-project people may do one of 
three things:
\begin{itemize}
  \item 
    If the proposers want the encoding specification advanced and 
    there are no formal protests against this, then the encoding 
    should be advanced to the beta stage. The \LaTeX-project 
    people do this by adding the encoding to the list of approved 
    (beta or final stage) encodings that they [presumably] maintain.
  \item 
    If the proposers want to withdraw the encoding specification 
    then the name assigned to it should once again be made available 
    for use for other encodings.
  \item
    If the proposers want the encoding specification advanced but 
    there is some formal protest against this, then the entire matter 
    should be handed over to some suitable authority, as a suggestion 
    some technical TUG committee, for resolution.
\end{itemize}


\paragraph{Update the specification on CTAN} When the specification 
has reached the beta stage, its file on CTAN should be updated to say 
so. In particular the file name should be changed from \meta{encoding 
name}\texttt{draft.etx} to \meta{encoding name}\texttt{spec.etx}.


\paragraph{Modifying beta stage encodings} If a beta stage encoding is 
modified then the revised specification should go through the above 
procedure of ratification again before it can replace the previous 
\meta{encoding name}\texttt{spec.etx} file on CTAN. The revised 
version should thus initially be uploaded as \meta{encoding 
name}\texttt{draft.etx}, reannounced, and redebated. It can however 
be expected that such debates will not be as extensive as the 
original debates.


\subsection{From beta stage to final stage}

The requirements for going from beta stage to final stage are more 
about showing that the encoding has reached a certain maturity than 
about demonstrating any technical merits of it. The main difference in 
usefulness between a beta stage encoding and a final stage encoding is 
that the latter can be considered safe for archival purposes, whereas 
one should have certain reservations against such use of beta stage 
encodings.

It seems reasonable that the following conditions should have to be 
fulfilled before a beta stage encoding can be made a final stage 
encoding:
\begin{itemize}
  \item At least one year must have passed since the last change was 
    made to the specification.
  \item At least two people other than the proposer must have 
    succeeded in implementing the encoding in a font.
\end{itemize}
It is quite possible that some condition should be added or some of 
the above conditions reformulated.


% References updated 2004/08/07.
\begin{thebibliography}{???}
\bibitem{ATN5013}
  Adobe Systems Incorporated:
  \textit{Adobe Standard Cyrillic Font Specification}, 
  Adobe Technical Note \#5013, 1998;
  \href{http://partners.adobe.com/asn/developer/pdfs/tn/%
  5013.Cyrillic_Font_Spec.pdf}{\textsc{http}:/\slash 
  \texttt{partners.adobe.com}\slash \texttt{asn}\slash 
  \texttt{developer}\slash \texttt{pdfs}\slash \texttt{tn}\slash 
  \texttt{5013.Cyrillic\_Font\_Spec.pdf}}.
\bibitem{AGL}
  Adobe Systems Incorporated: \textit{Adobe Glyph List},
  text file, 1998,
  \href{http://partners.adobe.com/asn/developer/type/glyphlist.txt}
  {\textsc{http}:/\slash \texttt{partners.adobe.com}\slash
  \texttt{asn}\slash \texttt{developer}\slash \texttt{type}\slash
  \texttt{glyphlist.txt}}.
\bibitem{unicodesign}
  Adobe Systems Incorporated:
  \textit{Adobe Solutions Network: Unicode and Glyph Names}, 
  web page, 1998,
  \href{http://partners.adobe.com/asn/developer/type/unicodegn.html}
  {\textsc{http}:/\slash \texttt{partners.adobe.com}\slash
  \texttt{asn}\slash \texttt{developer}\slash \texttt{type}\slash
  \texttt{unicodegn.html}}.
\bibitem{ClasenVieth}
  Matthias Clasen and Ulrik Vieth:
  \textit{Towards a new Math Font Encoding for (La)TeX},
  March 1998, presented at EuroTeX'98;
  \href{http://tug.org/twg/mfg/papers/current/mfg-euro-all.ps.gz}
  {\textsc{http}:/\slash \texttt{tug.org}\slash \texttt{twg}\slash
  \texttt{mfg}\slash \texttt{papers}\slash \texttt{current}\slash
  \texttt{mfg-euro-all.ps.gz}}.
\bibitem{fontinst-man}
  Alan Jeffrey, Rowland McDonnell, Ulrik Vieth, and Lars Hellstr\"om:
  \textit{\package{fontinst}---font installation software for \TeX} 
  (manual), 2004, 
  \ctanref{fonts/utilities/fontinst/doc/fontinst.tex}{%
  \textsc{ctan}:\discretionary{}{}{\thinspace}%
  \texttt{fonts}\slash \texttt{utilities}\slash 
  \texttt{fontinst}\slash \texttt{doc}\slash \texttt{fontinst.tex}}.
% \bibitem{fontinst}
%   Alan Jeffrey, Sebastian Rahtz, and Ulrik Vieth: 
%   \textit{The \package{fontinst} utility}, documented source code, 
%   v\,1.801,
%   \ctanref{fonts/utilities/fontinst/source}{%
%   \textsc{ctan}:\discretionary{}{}{\thinspace}%
%   \texttt{fonts}\slash \texttt{utilities}\slash 
%   \texttt{fontinst}\slash \texttt{source}/}.
\bibitem{fontinst-pre}
  Alan Jeffrey, Sebastian Rahtz, Ulrik Vieth, and Lars Hellstr\"om: 
  \textit{The \package{fontinst} utility}, documented source code, 
  v\,1.9xx,
  \ctanref{fonts/utilities/fontinst/source}{%
  \textsc{ctan}:\discretionary{}{}{\thinspace}%
  \texttt{fonts}\slash \texttt{utilities}\slash 
  \texttt{fontinst}\slash \texttt{source}/}.
\bibitem{TeXbook}
  Donald E.\ Knuth, Duane Bibby (illustrations): \textit{The \TeX book}, 
  Ad\-di\-son--Wes\-ley, 1991; 
  volume A of \textit{Computers and typesetting}.
\bibitem{LaTeXCompanion}
  Frank Mittelbach and Michel Goossens, with Johannes Braams, 
  David Carlisle, and Chris Rowley:
  \textit{The \LaTeX\ Companion} (second edition), 
  Ad\-di\-son--Wes\-ley, 2004; ISBN~0-201-36299-6.
\bibitem{Omega-doc}
  John Plaice and Yannis Haralambous:
  \textit{Draft documentation for the Omega system},
  version~1.12, 1999; 
  \href{http://omega.cse.unsw.edu.au:8080/doc-1.12.ps}{%
  \textsc{http}:/\slash \texttt{omega.cse.unsw.edu.au:8080}\slash
  \texttt{doc-1.12.ps}}.
%   \textsc{ctan}:\discretionary{}{}{\thinspace}%
%   \texttt{systems}\slash \texttt{omega}\slash 
%   \texttt{omega-doc-1.8.tar.gz}.
\bibitem{Vieth2001}
  Ulrik Vieth: 
  \textit{Math typesetting in \TeX: The~good, the~bad, the~ugly},
  to appear in the proceedings of Euro\TeX\ 2001;
  \href{http://www.ntg.nl/eurotex/vieth.pdf}{%
  \textsc{http}:/\slash \texttt{www.ntg.nl}\slash 
  \texttt{eurotex}\slash \texttt{vieth.pdf}}.
\end{thebibliography}


\end{document}