% -*-latex-*-
% Document name: /u/sy/beebe/tex/talks/special/special.ltx
% Creator: Nelson H.F. Beebe [beebe@magna.math.utah.edu]
% Creation Date: Sun Nov 11 07:06:19 1990
% 1.02 -- [04-Jun-1991] fix two small typos
% 1.01 -- [12-Nov-1990] last major changes in original version

%%  @texfile{
%%      author          = "Nelson H. F. Beebe",
%%      version         = "1.02",
%%      date            = "04 June 1991",
%%      filename        = "special.ltx",
%%      address         = "Center for Scientific Computing
%%                         Department of Mathematics
%%                         South Physics Building
%%                         University of Utah
%%                         Salt Lake City, UT 84112
%%                         USA
%%                         Tel: (801) 581-5254",
%%      checksum        = "1491    6916   48967",
%%      email           = "beebe@math.utah.edu (Internet)",
%%      codetable       = "ISO/ASCII",
%%      keywords        = "",
%%      supported       = "yes",
%%      docstring       = "This document contains a proposal for the
%%                         handling of \\special and paper
%%                         specifications by DVI drivers.
%%
%%                         The checksum field above contains the
%%                         standard UNIX wc (word count) utility
%%                         output of lines, words, and characters;
%%                         eventually, a better checksum scheme should
%%                         be developed."
%%      }

\documentstyle[special,ltugboat]{article}

\title{A Proposal for \protect\TeX{}
      {\tt\char92special} Commands and
      \protect\DVI{} Driver Paper
      Specification}

\author{Nelson H. F. Beebe}

\address{Center for Scientific Computing\\
        Department of Mathematics\\
        220 South Physics Building\\
        University of Utah\\
        Salt Lake City, UT 84112\\
        USA\\
        Tel: (801) 581-5254\\
        FAX: (801) 581-4148}

\netaddress[\network{Internet}]{Beebe@math.utah.edu}

\begin{document}

\maketitle

\bibliographystyle{plain}

\section{Introduction}

\TeX{} is now a {\em de facto\/} standard; its
source code is frozen, with the version number
converging to $\pi$ as increasingly rare bugs are
fixed \cite{Knuth:TB11-4-???-???}.
\TeX{} has been implemented on nearly every
computer architecture commercially available
today, from personal computers to supercomputers,
on a wide variety of operating systems.

\TeX{}'s principal output is a {\em
device-independent file}, the \DVI{} file, which
contains a compact description of where to set
characters on the page.  It does not contain any
descriptions of the characters themselves, only a
reference to the fonts in which they are found.
A few other programs besides \TeX{} also produce
\DVI{} files.

It is the job of separate programs, known as
\DVI{} drivers, to translate this description into
a form suitable for some output device, which
might be a printer, a display screen, a
phototypesetter, or even another \DVI{} file.

Because a separate \DVI{} driver is needed for
each combination of output device and operating
system, there is the potential for an explosion in
the number of auxiliary programs that may be
required to obtain usable output from \TeX{}, and,
regrettably, that seems to have happened.

\section{The \protect\DVI{} driver problem}

I have previously espoused the view
\cite{Beebe:TB8-1-41} that the proper way to
prevent the proliferation of \DVI{} driver
programs is to write a {\em family} of drivers
that supports
a wide variety of output devices, sharing common
source code as much as possible.  The code must
be highly portable, so as to work on a wide
variety of operating systems.  My implementation
of such a family of drivers has been
well-received, and many thousands of copies of the
programs are in use around the world.

The last public release, version 2.10 of October
1988, consists of about 30~000 lines of code for
19 drivers, together with about 8400 lines of
documentation, corresponding to about 150 typeset
pages.  Five major operating environments (Atari,
DEC TOPS-20, DEC \VAX{} VMS, IBM PC, and \UNIX{})
are supported, with several different compilers.
Ports have been carried out to other operating
systems as well, but the changes have not been
made generally available.

The widespread use of those programs has confirmed
my thesis, but has also demonstrated that they
have some deficiencies.  This is to be expected in
any software product, whether public or
commercial.  Even \TeX{} has evolved from its
original design.

Consequently, in the fall of 1988 I set out to
redesign the driver family to remedy all of the
deficiencies, to further enhance portability to
new operating systems and new compilers, to make
it easier to modify existing driver code to
support other output devices, and to extend and
enhance the documentation.  The development
version is known as 3.0.

This work is, alas, far from complete, and I
sometimes wonder whether Don Knuth will finish
Volume 4 of the {\em Art of Computer Programming}
before I finish the new driver family.  However,
considerable progress has been made.  The number
of output devices and operating environments
supported has more than doubled.  The source code
is now over 55~000 lines; for comparison, \TeX{}
and \MF{} are each about 20~000 lines when
prettyprinted.  There are 23~500 lines of
documentation, corresponding to about
% dvidrive      256
% dviman         53
% dviman2        37
% dvistatu       20
% dvi.ps         33
% Total =       399
400 typeset pages.  The `manual pages' are written
in a subset of \LaTeX{}, then converted
automatically to \UNIX{} \verb|troff| format,
\VAX{} VMS \verb|help| file format, Emacs
\TeX{}info format, and a simple ASCII text file
format.

\section{Standardization of \protect\DVI{}
drivers}

As might be expected, the proliferation of \DVI{}
driver code written by many authors has led to
considerable variation in driver interfaces and
operation.  While human interfaces unavoidably
depend somewhat on the operating environment, one
can demand that the same {\em capabilities\/}
(e.g.\ page selection and order, paper sizes, page
origin offset, file paths, startup-file support,
\ldots{}) be available in all drivers.

Operational differences are less excusable.  For
example, most programs, including \TeX{}, have
fixed limits arising from compile-time choices of
internal storage sizes.  User annoyance and
frustration result when those limits are reached
prematurely.  Even on the same output device,
different drivers show slight variations in page
positioning and in the placement of rules and
characters.  Worse, some drivers may refuse
to print certain \DVI{} files, because internal
limits are reached, or a particular font cannot be
found.

To address these problems, a committee of the
\TeX{} Users Group was established to develop a
standard for \DVI{} drivers.  Completion of a
level-0 draft is imminent.

This draft is intended to define minimal standards
that all \DVI{} drivers should adhere to.  It does
not address some of the thornier issues,
particularly the \TeX{} \verb|\special| command,
which will be covered in a higher-level draft
standard yet to be prepared.

\section{The \protect\TeX{} {\tt\char92special}
         command}

When \TeX{} was first defined in 1977--78, its
author realized that there would be a need for
extensibility.  He chose to provide this in a very
simple form---an arbitrary string provided as the
argument to the \verb|\special| command is
macro-expanded, then passed verbatim to the \DVI{}
file without further interpretation.

To guide \TeX{} users and authors of \DVI{}
drivers, he offered this advice
\cite[pp.~228--229]{Knuth:texbook}:
%
  \begin{quote}
The $\langle$token list$\rangle$ in a
\verb|\special| command should consist of a
keyword followed if necessary by a space and
appropriate arguments.  For example,
%
\begin{verbatim}
\special{halftone pic1}
\end{verbatim}
%
might mean that a picture on the file \verb|pic1|
should be inserted on the current page, with its
reference point at the current position.

$\vdots$

\noindent
Software programs that convert \verb|dvi| files to
printed or display output should be able to fail
gracefully when they don't recognize your special
keywords.

$\vdots$

\noindent
However, the author anticipates that certain
standards for common graphics operations will
emerge in the \TeX{} user community, after careful
experiments have been made by groups of people;
then there will be a chance for some uniformity in
the use of \verb|\special| extensions.
  \end{quote}

As Knuth noted, the most common use of
\verb|\special| is to inform the \DVI{} driver
that a graphics file is to be inserted in the
output.  Many other possibilities exist, including
paper specification, operator messages, grey
shading, change bars, color selection, page
overlays, and output device control.

With very few exceptions, existing drivers,
including my own 2.10 version, have adopted {\em
ad hoc\/} syntax for the \verb|\special| string.
The result is that the \DVI{} file is no longer
device-independent; it depends both on the output
device, and {\em on the particular driver
that is expected to process it}.

\section{Improving the handling of
        {\tt\char92special} commands}

In the 3.0 \DVI{} driver development, I had to
solve the \verb|\special| problem.  This section
will describe how, and why, I did so.

While the complete source code for the 3.0 drivers
will not be released for some time, the part
described in this article for \verb|\special|
strings and paper specifications is complete, and
{\bf I am making it available immediately, and
without any restrictions whatsoever, to authors of
\DVI{} drivers for incorporation in their
programs.}

The source code is written in ANSI C
\cite{ANSI:c89}.  C is already used for many
\DVI{} driver programs; for drivers that are not
written in C, it should be considerably easier to
start with this code and reprogram it in some
other modern language, such as Pascal or Modula-2,
than to redevelop equivalent code from scratch.

The previous section observed that most existing
drivers have chosen an arbitrary syntax for the
\verb|\special| strings they support.
This is undesirable, for at least these reasons:
%
  \begin{itemize}
    \item
          The chosen syntax is mostly unique to a
          particular driver, and therefore
          seriously compromises document
          portability.
    \item
          The syntax is not obviously extensible.
    \item
          The syntax cannot always be
          unambiguously parsed.
    \item
          The output device, or driver, to which
          the \verb|\special| applies is not
          determinable.
    \item
          The capabilities are weak, and fail to
          address many of the potential uses of
          the \verb|\special| command.
  \end{itemize}

The syntax that I have developed completely
resolves these objections.  It has the following
features:
%
  \begin{itemize}
    \item
          The \verb|\special| command string is
          defined to contain a program written in
          a small language that consists of
          sequences of assignment statements,
          possibly with embedded comments.
    \item
          The \verb|\special| language is {\em
          rigorously defined\/} by a programming
          language grammar, based on the C
          language grammar
          \cite{ANSI:c89,Harbison:carm-2}.
          Correct parsers for the language can be
          developed using any of several standard
          methods that are well-known in computer
          science
          \cite{Aho:red-dragon,Aho:green-dragon,%
          Holub:compiler-design,%
          Schreiner:compiler}
          and the \UNIX{} world
          \cite{Johnson:yacc,Lesk:lex}.
          Implementations of some of these are
          available from several sources, and for
          several operating systems
          \cite{Abraxas:pcyacc,Donnelly:bison,%
          Gray:lex,Holub:compiler-design,%
          MKS:yacc,Paxson:flex}.
    \item
          The language is {\em extensible}.  An
          assignment statement consists of a
          keyword\slash value pair.  Several
          keywords are already defined, and {\em
          new ones can be added without
          invalidating existing uses of the
          language}.
    \item
          Keywords are typed, and constant values
          assigned to them must be of the same
          type.  The supported types are scalar
          strings, numbers, and dimensions.  The
          latter include all of \TeX{}'s standard
          dimensional units.
   \item
          There is {\em no limit\/} (other than
          host memory) on the length of a constant
          string.
    \item
          Value string concatenation is supported
          in the style of ANSI C \cite{ANSI:c89},
          avoiding the often severe line length
          limitations of text editors, operating
          systems, and file systems.
   \item
          Provision is made for encoding {\em all}
          characters in the host character set, so
          that, e.g.\ binary printer control
          sequences can be incorporated as {\em
          printable}, and {\em portable}, text in
          \TeX{} documents.
    \item
          A particular keyword, \verb|language|,
          is provided to permit the user to
          specify the output device language, or
          the \DVI{} driver, to which the
          \verb|\special| command is directed.
    \item
          The language is general enough that it
          can be used for other purposes.  In my
          3.0 \DVI{} driver software, the complex
          issue of paper specification is handled
          by the same language, and importantly,
          by the {\em same parser\/} that is used
          for \verb|\special| strings.
  \end{itemize}
%
In the actual implementation of the parser, I
chose {\em not\/} to use one of the above-cited
parser generators.  There are two important
reasons for this.

  \begin{itemize}
    \item
          Parser generators convert a grammar file
          to an output program that is
          impractical to modify by hand.
          Portability and extensibility of the
          drivers would be compromised if part of
          their source code could only be
          generated on certain systems, or with
          proprietary software.
    \item
          Parser generators encode the language
          keywords into the parser code, usually
          in incomprehensible forms; examine the
          parsing tables in a
          \verb|yacc|-generated parser
          \cite{Johnson:yacc} to see why.
  \end{itemize}

By suitable abstractions, it has proven possible
to create a recursive-descent parser
\cite{Aho:red-dragon,Aho:green-dragon} for the
language in which {\em the keywords and value
storage locations are provided in a table passed
to the parser}.  The parser code is therefore
completely portable, and {\em independent\/} of
the keywords in the language it parses.  The same
code is used for both the \verb|\special| command
strings, and for paper specification.
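The table-driven idea can be illustrated with a
minimal sketch in ANSI~C.  It is not the 3.0
driver code: the type names, the structure
layout, and the function \verb|find_keyword()|
are hypothetical.  The lookup routine knows
nothing about any particular keyword; it simply
searches whatever table it is handed, comparing
names without regard to letter case.
%
\begin{verbatim}
#include <ctype.h>
#include <stddef.h>

/* Hypothetical value types */
enum kw_type { T_STRING, T_NUMBER,
               T_DIMEN };

/* One keyword: its name, its type,
   and where to store its value */
struct keyword {
    const char   *name;
    enum kw_type  type;
    void         *value;
};

/* Compare names, ignoring case */
static int same_name(const char *a,
                     const char *b)
{
    while (*a != '\0' && *b != '\0') {
        if (tolower((unsigned char)*a) !=
            tolower((unsigned char)*b))
            return 0;
        a++;
        b++;
    }
    return *a == *b;
}

/* Search a table ended by a NULL
   name; the parser never needs to
   know which keywords it holds. */
const struct keyword *
find_keyword(const struct keyword *tab,
             const char *name)
{
    for (; tab->name != NULL; tab++)
        if (same_name(tab->name, name))
            return tab;
    return NULL;
}

/* Example table: the driver supplies
   the keywords and their storage. */
static char *include_file;
static char *language_name;
const struct keyword special_kw[] = {
    { "include",  T_STRING,
      &include_file },
    { "language", T_STRING,
      &language_name },
    { NULL,       T_STRING, NULL }
};
\end{verbatim}
%
Adding a keyword then means adding one table
entry; \verb|find_keyword()| and the rest of the
parser stay unchanged.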

\section{A proposed syntax for the
          {\tt\char92special} command}

The preceding section has described the motivation
for a new approach to the definition of a
\verb|\special| language.  What does the language
look like?  Some examples will give the general
flavor before we describe the details of the
grammar.  Here are some fragments of \TeX{} input
with \verb|\special| commands intended for a
\DVI{} driver that produces \POSTSCRIPT{}; each of
these works with \verb|dvialw| in the version 3.0
development.
%
\begin{verbatim}
% Display a picture with the
% upper-left corner at the current
% point
\special{language "PostScript",
         include "pict.eps"}

% Display a picture at its original
% absolute page position
\special{language "PostScript",
         overlay "pict.eps"}

% Use literal PostScript to draw a
% 1in box with lower-left corner at
% TeX's current point
\special{language "PostScript",
  literal
    "newpath
        % move origin from upper-left
        % to lower-left
        0 -72 translate
        0 0 moveto
        0 72 rlineto
        72 0 rlineto
        0 -72 rlineto
        -72 0 rlineto
    closepath
    4 setlinewidth
    stroke
    showpage"}

% Display a figure at half size
\special{language "PostScript",
  literal "0.5 0.5 scale",
  include "pict.eps"}

% Display the figure in landscape
% mode by rotating the coordinates
% about the center of the bounding
% box
\special{language "PostScript",
  literal
   "BoxWidth 2 div
    BoxHeight 2 div translate
    90 rotate
    BoxWidth -2 div
    BoxHeight -2 div translate",
  include "pict.eps"}
\end{verbatim}

Naturally, the details of a \verb|\special|
command invocation should be hidden away in
suitable macros that are easy to use.  Here are
some examples from a recent document illustrating
the incorporation by \verb|dvialw| of
\POSTSCRIPT{} figures from a variety of sources:
%
\begin{verbatim}
\newcommand{\FIGPLOT}[4]{%
  % Arg 1 = EPS file to plot
  % Arg 2 = figure caption
  % Arg 3 = width in inches
  % Arg 4 = height in inches
  \begin{figure}[htb]
    \Figrule\smallskip
    \begin{center}
      \setlength{\unitlength}{1.0in}
      \begin{picture}(#3,#4)(0,0)%
        \put(0,0){\special{
            language "PostScript",
            position "bottom left",
    literal
      "/SX {#3 72 mul BoxWidth div} def
      /SY {#4 72 mul BoxHeight div} def
      1 SX sub BoxLLX mul
      1 SY sub BoxLLY
      mul translate
      SX SY scale",
            include "#1"}}%
        \put(0,0){\circle*{0.5}}%
        \put(0,0){\dashbox{0.1}%
                 (#3,#4)[t]{}}%
      \end{picture}%
    \end{center}
    \caption{\tolerance=6000
        \emergencystretch=3pt
        #2
        File: {\tt #1}.
        Picture size: #3in wide by
        #4in high.}
    \label{#1}
    \smallskip\Figrule
  \end{figure}
}

\newcommand{\Figrule}{\hrule
                      width \linewidth
                      height 2pt
                      depth 2pt \relax}

\FIGPLOT{roseart.ps}{Adobe Illustrator
                    1.0b2 rose art
                    (scaled 1:2)}
        {3.4861}{4.625}

\FIGPLOT{golfart.ps}{Test of golfart
                    scaling
                    (scaled $1:2$).}
        {3.95833}{4.82639}

\FIGPLOT{tiger.ps}{A bitmapped image.}
        {4.5}{3.0107}

\end{verbatim}

The \verb|literal| string makes use of
\POSTSCRIPT{} macros output by \verb|dvialw| to
define the position (\verb|BoxLLX|, \verb|BoxLLY|,
\verb|BoxURX|, and \verb|BoxURY|) and size
(\verb|BoxHeight| and \verb|BoxWidth|) of the
bounding box.  The current page position is also
available as (\verb|CurrentX|, \verb|CurrentY|),
and the paper size as \verb|PaperHeight| and
\verb|PaperWidth|.  All of these are in standard
\POSTSCRIPT{} units of big points.  These
quantities are needed to support things like
figure transformations, landscape mode, change
bars, and grey shading.
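The arithmetic in the \verb|literal| string of
\verb|\FIGPLOT| can be checked with a short C
sketch.  It assumes, as stated above, that
\verb|BoxLLX|, \verb|BoxWidth|, and the other
bounding-box quantities are in big points; the
structure and the function
\verb|figplot_transform()| are hypothetical
names.  The translation is chosen so that the
lower-left corner of the bounding box is a fixed
point: a coordinate $x$ is mapped to
$(1 - S_X)L + S_X x$, where $L$ is \verb|BoxLLX|,
and this equals $L$ when $x = L$.  The figure is
therefore scaled toward that corner, and its
width becomes $S_X$ times \verb|BoxWidth|, which
is exactly \verb|#3| inches.
%
\begin{verbatim}
/* Affine map x' = tx + sx*x and
   y' = ty + sy*y, in big points */
struct xform { double sx, sy, tx, ty; };

/* Reproduce the FIGPLOT literal:
   scale the box (llx,lly)-(urx,ury)
   to w by h inches, keeping its
   lower-left corner fixed. */
struct xform
figplot_transform(double llx, double lly,
                  double urx, double ury,
                  double w, double h)
{
    struct xform t;

    /* SX and SY; 72 bp per inch */
    t.sx = w * 72.0 / (urx - llx);
    t.sy = h * 72.0 / (ury - lly);
    /* 1 SX sub BoxLLX mul, etc. */
    t.tx = (1.0 - t.sx) * llx;
    t.ty = (1.0 - t.sy) * lly;
    return t;
}
\end{verbatim}
%
For example, a bounding box 144bp (2in) wide that
is to be shown 1in wide gives $S_X = 0.5$.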

If a \verb|\special| contains both a
\verb|literal| and an \verb|include| or
\verb|overlay| statement, then the literal string
is output {\em before} the inserted file.  Should
the reverse order be required, then the literal
string must be specified in a separate following
\verb|\special|.

Finally, here are some examples of the same
language, now used to parse paper specifications.
The first is a command-line, or startup-file,
switch which provides a paper program as the
switch value:
%
\begin{verbatim}
-paper:{paper="letter";width=8.5in;
        height=11in;dev_init="...";}
\end{verbatim}
%
In a startup file, such options can be written
more clearly:
%
\begin{verbatim}
-paper:
{
  % standard US paper size
  paper = "letter";
  width = 8.5in;
  height = 11in;

  % printer origin is off by 0.05in
  x_origin = 1.05in;
  y_origin = 1in;

  % printer wraps coordinates, so
  % we need clipping turned on
  x_clip = 1;
  y_clip = 1;

  % not all of page is imageable
  x_left = 0.3in;
  x_right = 0.3in;
  y_top = 0.5in;
  y_bottom = 0.5in;

  % print pages from last to first
  output_order = -1;

  % adjacent strings are concatenated
  dev_init = "...."
       "...."
       "....";
  % final formfeed and printer reset
  dev_term = "\f\033E";
  page_init = "....";
  page_term = "....";
}

-paper=
{
  paper         = "ALW-note";
  use           = "letter";
  x_left        = 0.41in;
  x_right       = 0.41in;
  y_top         = 0.42in;
  y_bottom      = 0.42in;
}
\end{verbatim}

The last example illustrates a feature of the
paper specification language; the \verb|use|
keyword references a paper type defined elsewhere
whose specifications are copied into the internal
data structures for the new type before the new
values are installed.    This makes it easy to
prepare modifications of base forms.  For example,
most laser printers use the same size paper, but
differ in the imageable area and output stacking
order.  The example above defines a paper
type known to the Apple LaserWriter in terms of a
standard paper type.
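The copy-then-override rule for \verb|use| can be
sketched in a few lines of C.  The structure
below holds only a hypothetical subset of the
paper keywords, with dimensions assumed to be in
inches, and \verb|derive_form()| is an
illustrative name.  The point is the order of
operations: the base form is copied first, and
the values given in the new specification are
installed afterwards, so they take precedence.
%
\begin{verbatim}
/* Hypothetical subset of a paper
   form; dimensions in inches */
struct paper {
    const char *name;
    double width, height;
    double x_left, x_right;
    double y_top, y_bottom;
};

/* Build a form from `base' (the form
   named by use), then install the
   margins the new form supplies. */
struct paper
derive_form(const struct paper *base,
            const char *name,
            double x_left, double x_right,
            double y_top, double y_bottom)
{
    struct paper p = *base;  /* copy */

    p.name     = name;       /* then  */
    p.x_left   = x_left;     /* override */
    p.x_right  = x_right;
    p.y_top    = y_top;
    p.y_bottom = y_bottom;
    return p;
}
\end{verbatim}
%
Applied to the \verb|letter| form above, this
gives \verb|ALW-note| the 8.5in by 11in size of
\verb|letter| together with the LaserWriter
margins.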

Comments run from a percent sign to the end of
the line (as in \TeX{}), and letter case is {\em
not significant} in variable names.  Whitespace
is ignored, so the
specification can be formatted for readability, or
for compactness.

Dimensions can be given in any unit known to
\TeX{} (bp cc cm dd in mm pc pt sp).
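Since \POSTSCRIPT{} output is expressed in big
points, every dimension must eventually be
reduced to that unit.  Here is a minimal C
sketch, using the unit relations given in
\cite{Knuth:texbook} (one inch is 72.27pt, one
pica 12pt, one big point 1/72in, one inch 2.54cm,
one didot point 1238/1157pt, one cicero 12dd, and
one scaled point 1/65536pt).  The function name
\verb|to_bp()| is hypothetical, and in this
sketch unit names are matched in lowercase only.
%
\begin{verbatim}
#include <string.h>

/* Size of each unit in TeX points */
static const struct {
    const char *name;
    double pt;
} units[] = {
    { "pt", 1.0 },
    { "pc", 12.0 },
    { "in", 72.27 },
    { "bp", 72.27 / 72.0 },
    { "cm", 72.27 / 2.54 },
    { "mm", 72.27 / 25.4 },
    { "dd", 1238.0 / 1157.0 },
    { "cc", 12.0 * 1238.0 / 1157.0 },
    { "sp", 1.0 / 65536.0 }
};

/* Convert value, given in unit, to
   big points; return 0 if the unit
   is unknown, else 1. */
int to_bp(double value, const char *unit,
          double *bp)
{
    size_t i, n = sizeof units
                  / sizeof units[0];

    for (i = 0; i < n; i++)
        if (strcmp(units[i].name,
                   unit) == 0) {
            *bp = value * units[i].pt
                  * 72.0 / 72.27;
            return 1;
        }
    return 0;
}
\end{verbatim}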

The presence of a left brace following the paper
switch signals that a forms definition follows;
otherwise, the following token is a paper name.
To facilitate collection of the complete
specification at a higher level without having to
parse it in detail when the switch and its value
are collected, braces must be balanced; escape
sequences and comments provide ways to ensure
this.
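Because balance is required, the code that
collects a switch value needs to count braces,
but not to understand strings, escapes, or
comments.  A minimal C sketch of such a collector
follows; the function name \verb|braced_length()|
and its return convention are assumptions.
%
\begin{verbatim}
#include <stddef.h>

/* Return the length of the braced
   value at s (which must start with
   a left brace), counting both outer
   braces, or 0 if the braces never
   balance.  Braces in strings and
   comments are counted too, which is
   why they must be kept balanced. */
size_t braced_length(const char *s)
{
    size_t i;
    long depth = 0;

    if (s[0] != '{')
        return 0;
    for (i = 0; s[i] != '\0'; i++) {
        if (s[i] == '{')
            depth++;
        else if (s[i] == '}' &&
                 --depth == 0)
            return i + 1;
    }
    return 0;             /* unbalanced */
}
\end{verbatim}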

\section{The language grammar}

The grammar for the language is based on the C
programming language grammar given in Appendix B
of \cite{Harbison:carm-2}, with changes affecting
hexadecimal escape sequences in strings, and
concatenation of adjacent strings, as specified in
the ANSI C standard \cite{ANSI:c89}.

Adjacent string concatenation is a convenient way
of working around limitations on line length when
long strings are needed, and adding support for it
took only four lines of code.  Hexadecimal escape
sequences of arbitrary length permit transparent
support for character sizes larger than 8 bits.
Octal escape sequences remain limited to 3 digits
for backward compatibility; hexadecimal escape
sequences are new with ANSI C.
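A decoder for these escape rules might look like
the following C sketch.  It is an illustration,
not the 3.0 parser: the name \verb|decode()| is
hypothetical, the input is assumed to be the text
between the double quotes, and each decoded value
is stored in an \verb|unsigned char|, so
hexadecimal values wider than 8 bits are
truncated here; a driver supporting larger
characters would store them in a wider type.
%
\begin{verbatim}
#include <ctype.h>
#include <string.h>

#define ISOCT(c) ((c) >= '0' && (c) <= '7')

/* Value of the hex digit c */
static int hexval(int c)
{
    if (isdigit(c))
        return c - '0';
    return tolower(c) - 'a' + 10;
}

/* Decode the body of a "..." string
   from src into dst, which must hold
   strlen(src) bytes.  Returns the
   byte count, or (size_t)-1 for a
   malformed escape sequence. */
size_t decode(const char *src,
              unsigned char *dst)
{
    static const char from[] =
        "abfnrtv\\'\"";
    static const char to[] =
        "\a\b\f\n\r\t\v\\'\"";
    const unsigned char *s =
        (const unsigned char *)src;
    const char *p;
    unsigned long v;
    size_t n = 0;
    int k;

    while (*s != '\0') {
        if (*s != '\\') {
            dst[n++] = *s++;
            continue;
        }
        s++;                   /* skip \ */
        v = 0;
        if (*s == 'x') {       /* hex: any */
            for (k = 0, s++;   /* length  */
                 isxdigit(*s); k++, s++)
                v = 16 * v + hexval(*s);
            if (k == 0)
                return (size_t)-1;
        } else if (ISOCT(*s)) {
            /* octal: at most 3 digits */
            for (k = 0; k < 3 && ISOCT(*s);
                 k++, s++)
                v = 8 * v + (*s - '0');
        } else if (*s != '\0' &&
                   (p = strchr(from, *s))
                       != NULL) {
            v = (unsigned char)to[p - from];
            s++;
        } else {
            return (size_t)-1;
        }
        dst[n++] = (unsigned char)v;
    }
    return n;
}
\end{verbatim}
%
With this decoder, the string \verb|"\033[I"|
used in the catcode example later in this article
yields the three bytes ESC, \verb|[|, and
\verb|I|.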

In the following grammar, the suffix \verb|-opt|
means that the item is optional.  For brevity,
numeric constants are not specified in grammatical
form here.  They are parsed by the ANSI C library
routine, \verb|strtod()|, which expects numbers in
the form (\verb|[ ]| marks optional fields,
\verb={|}= marks alternatives):
%
\begin{verbatim}
[whitespace][sign][digits][. digits]
  [{e|E}[sign]digits]
\end{verbatim}

Except in quoted strings, tokens may not contain
embedded blanks.  Thus, 210mm is legal input, but
210\verb*| |mm is not.
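The following C sketch shows one way to combine
\verb|strtod()| with the no-embedded-blanks rule:
after the number, the two characters immediately
following must form one of the nine unit names.
The function \verb|scan_dimension()| and its
interface are hypothetical.
%
\begin{verbatim}
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/* Scan a dimension-constant at s.  On
   success store the number in *value
   and the unit in unit[], and return
   a pointer just past the unit;
   otherwise return NULL. */
const char *scan_dimension(const char *s,
                           double *value,
                           char unit[3])
{
    static const char names[] =
        " bp cc cm dd in mm pc pt sp ";
    char key[5];
    char *end;

    *value = strtod(s, &end);
    if (end == s)
        return NULL;      /* no number */

    /* The unit must follow at once:
       "210mm" is legal, "210 mm" is
       not. */
    if (!isalpha((unsigned char)end[0]) ||
        !isalpha((unsigned char)end[1]))
        return NULL;
    key[0] = ' ';
    key[1] = end[0];
    key[2] = end[1];
    key[3] = ' ';
    key[4] = '\0';
    if (strstr(names, key) == NULL)
        return NULL;      /* not a unit */

    unit[0] = end[0];
    unit[1] = end[1];
    unit[2] = '\0';
    return end + 2;
}
\end{verbatim}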

Here is the grammar, in a Backus-Naur style
notation:
%
\begin{verbatim}
program:
  statement

statement:
  assignment-statement
  compound-statement
  null-statement

assignment-statement:
  name = constant
  name : constant
  name   constant

compound-statement:
  { statement-list-opt }

null-statement:
  ,
  ;

statement-list:
  statement
  statement ; statement-list
  statement , statement-list

constant:
  dimension-constant
  float-constant
  string-constant
  name

dimension-constant:
  float-constant dimension-unit

dimension-unit: one of
  bp cc cm dd in mm pc pt sp

string-constant:
  simple-string-constant
  string-constant simple-string-constant

simple-string-constant:
  " character-sequence-opt "
  ' raw-character-sequence-opt '

character-sequence:
  character
  character-sequence character

raw-character-sequence:
  raw-character
  raw-character-sequence raw-character

character:
  printing-character
  escape-character

raw-character:
  printing-character
  \'
  \ (when not followed by ')

printing-character: one of
  (note that " and \ are omitted,
  and ' may be specified by \'
  as well)
  <space> !   # $ % &   ( ) * + , -
  . / 0 1 2 3 4 5 6 7 8 9 : ; < = > ?
  @ A B C D E F G H I J K L M
  N O P Q R S T U V W X Y Z [   ] ^ _
  ` a b c d e f g h i j k l m
  n o p q r s t u v w x y z { | } ~

escape-character:
  \ escape-code

escape-code:
  character-escape-code
  octal-escape-code
  hexadecimal-escape-code

character-escape-code:
  a b f n r t v \ ' "

octal-escape-code:
  octal-digit
  octal-digit octal-digit
  octal-digit octal-digit octal-digit

hexadecimal-escape-code:
  x hexadecimal-digit
  hexadecimal-escape-code hexadecimal-digit

octal-digit: one of
  0 1 2 3 4 5 6 7

hexadecimal-digit: one of
  0 1 2 3 4 5 6 7 8 9
  A B C D E F
  a b c d e f

name:
  letter
  letter extended-letter-sequence

extended-letter-sequence:
  extended-letter
  extended-letter-sequence extended-letter

letter: one of alphabetic or
  underscore characters
  A B C D E F G H I J K L M
  N O P Q R S T U V W X Y Z
  a b c d e f g h i j k l m
  n o p q r s t u v w x y z
  _

extended-letter: one of
  0 1 2 3 4 5 6 7 8 9
  A B C D E F G H I J K L M
  N O P Q R S T U V W X Y Z
  a b c d e f g h i j k l m
  n o p q r s t u v w x y z
  - . _
\end{verbatim}

For readers unfamiliar with programming language
grammars, a short explanation is in order.  The
beginning rules
%
\begin{verbatim}
program:
  statement

statement:
  assignment-statement
  compound-statement
  null-statement
\end{verbatim}
%
say that a \verb|program| is a \verb|statement|,
and that a \verb|statement| is either an {\tt
assignment-statement}, or a {\tt
compound-statement}, or a {\tt null-statement}.
Further rules in turn define what each of these
is.  The last rule says that an {\tt
extended-letter} is a digit, letter, hyphen, dot,
or underscore.

The characters permitted in {\tt extended-letter}
are chosen
%
\begin{itemize}
  \item
        to avoid conflict with characters
        otherwise significant in the grammar, and
  \item
        to cover the most common filename syntax,
        so that simple filenames can be written
        without quotes and still be collected as
        single name tokens in assignments.
\end{itemize}

This grammar supports two kinds of quoted strings.
The {\em normal\/} kind is delimited by double
quotes, and inside it are recognized all the
escape sequences supported by the C language.  The
{\em raw\/} kind is delimited by single quotes;
only escape-single-quote pairs are recognized
inside it.  This is more convenient when it is
necessary to have strings with several
backslashes, since it then avoids having to double
all of them.  Once normal and raw strings are
parsed, they are stored identically.

German \TeX{} styles often make the double-quote
character active, so that it adds an umlaut
accent to the following character; users of such
styles can happily use the raw string form to
avoid conflict.

Backslashes in literal strings and filenames pose
a small problem for the user, because \TeX{} will
ordinarily try to interpret control sequences
triggered by backslashes in the argument of the
\verb|\special| command.

For filenames, IBM PC DOS is the only operating
system that normally would use backslashes, and
then only as directory separators.  In most
cases, you should omit directory paths anyway, and
rely instead on the \CODE{DVIINPUTS} search path
to let the drivers find the files at run time;
doing so will enhance document portability.  If
you still wish to use a directory path in the
\verb|\special| command, you can exploit an
unadvertised feature of PC DOS; namely, system
calls accept forward slashes as equivalent to
backslashes, so you can use forward slashes
instead.  This is normally not possible with
PC DOS commands that accept filenames on the
command line, because their simplistic parsing
confuses such names with option switches.

Literal strings are therefore likely to be the
only place where backslashes may be unavoidable.
Although it would have been possible to choose
an escape character other than backslash for such
strings, this would likely prove confusing to
those users who are used to C and \UNIX{}, where
the backslash escape character is firmly
entrenched.

Fortunately, the solution is not difficult,
because \TeX{} does not have backslash hardcoded
as a control sequence prefix; you can change it by
altering \TeX{}'s catcodes.  Thus instead of
writing something like
%
\begin{verbatim}
\special{literal = "\033[I"}
\end{verbatim}
%
\noindent
which would raise a \TeX{} {\em Undefined
control sequence\/} error, you can instead write
%
\begin{verbatim}
{
  \catcode`\@=0
  \catcode`\\=12
  @special{literal = "\033[I"}
}
\end{verbatim}
%
\noindent
This changes the \TeX{} control sequence prefix
from backslash to at-sign, and gives backslash a
meaning that will not cause problems.  The
surrounding braces ensure that the changes
disappear when the braced group is exited.  The
catcodes are of course ugly magic numbers, so if
you do this more than once, you should hide them
in a macro with a more meaningful name, and use
that macro in place of the first two lines
in the group.

The grammar supports statement separators (rather
than terminators), and they may be either commas
or semicolons.  In a simple language, it is
convenient to allow both kinds of separators.
Since there is a null statement, the separator is
optional after the last statement in a sequence.

Drivers will supply an implicit brace pair
surrounding the \verb|\special| string retrieved
from the \DVI{} file, to ensure that
multi-statement text looks like the compound
statement required by the grammar.

Finally, note that the assignment statement may
use either the equals or colon operator, or the
operator may be omitted altogether.  This supports
the common forms
%
\begin{verbatim}
keyword = value
keyword : value
keyword   value
\end{verbatim}
%
Because the values have very limited syntactic
possibilities, this creates no ambiguity.
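How little machinery these statement forms need
can be seen from a minimal C sketch of a scanner
for a flat statement list, such as the text
inside the implicit braces.  It is a deliberate
simplification of the grammar above, not the 3.0
parser: values may only be unescaped
double-quoted strings or single tokens of
extended letters (names, numbers, and
dimensions), and raw strings, escapes, string
concatenation, and nested braces are not
handled.  Whitespace and percent comments are
skipped, the \verb|=| or \verb|:| operator is
optional, and commas and semicolons both act as
separators.  The names \verb|struct assign| and
\verb|parse_list()| are hypothetical.
%
\begin{verbatim}
#include <ctype.h>
#include <stddef.h>

#define UC(c) ((unsigned char)(c))

/* One assignment; name and value
   point into the input text. */
struct assign {
    const char *name, *val;
    size_t nlen, vlen;
};

/* Skip whitespace and % comments */
static const char *skip(const char *s)
{
    for (;;) {
        while (isspace(UC(*s)))
            s++;
        if (*s != '%')
            return s;
        while (*s != '\0' && *s != '\n')
            s++;
    }
}

/* The extended-letter set */
static int ext(int c)
{
    return isalnum(c) || c == '-' ||
           c == '.' || c == '_';
}

/* Parse a flat statement list; return
   the number of assignments stored in
   out, or -1 on a syntax error. */
int parse_list(const char *s,
               struct assign *out,
               int max)
{
    struct assign *a;
    int n = 0;

    for (;;) {
        s = skip(s);
        if (*s == '\0')
            return n;
        if (*s == ',' || *s == ';') {
            s++;          /* null stmt */
            continue;
        }
        if (n == max ||
            !(isalpha(UC(*s)) || *s == '_'))
            return -1;
        a = &out[n];
        a->name = s;
        while (ext(UC(*s)))
            s++;
        a->nlen = (size_t)(s - a->name);
        s = skip(s);
        if (*s == '=' || *s == ':')
            s = skip(s + 1); /* optional */
        if (*s == '"') {         /* string */
            a->val = ++s;
            while (*s != '\0' && *s != '"')
                s++;
            if (*s == '\0')
                return -1;
            a->vlen = (size_t)(s - a->val);
            s++;
        } else if (ext(UC(*s))) {
            a->val = s;    /* name/number */
            while (ext(UC(*s)))
                s++;
            a->vlen = (size_t)(s - a->val);
        } else {
            return -1;
        }
        n++;
        s = skip(s);
        if (*s == ',' || *s == ';')
            s++;
        else if (*s != '\0')
            return -1;
    }
}
\end{verbatim}
%
Given the text \verb|include: pict.eps;|, for
instance, the scanner stores one assignment whose
name is \verb|include| and whose value is
\verb|pict.eps|, exactly as it would for
\verb|include = pict.eps| or
\verb|include pict.eps|.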

\section{The {\tt\char92special} language}

The preceding section defined the grammar for the
\verb|\special| language.  We now need to define
what keywords will be recognized.  As emphasized
above, the language is {\em extensible}, and the
parser that I have implemented for it makes it
very easy to add new keywords {\em without
touching a single line of the parser code itself}.
For example, only a short specification like
%
\begin{verbatim}
{
  {"include", 7},
  (symbol_value*)&spec.include,
  T_STRING
},
\end{verbatim}
%
needs to be added to a table to define a new
keyword, together with a small amount of code to
do something with the value returned by the parser
for that keyword.

The current set of keywords recognized is given in
the following table:
%
\begin{center}
  \begin{tabular}{llp{1.3in}}
\hline
Keyword & Value & Action \\
\hline
{\tt boundingbox} & string & Define the
                             coordinates of the
                             lower-left and
                             upper-right corners
                             of the box which
                             bounds the figure
                             input by an
                             \verb|include| or
                             \verb|overlay|
                             command.\\

{\tt graphics} & string  & Execute the generic
                           graphics primitives in
                           string (not yet
                           defined).\\
{\tt include} & filename & Insert file contents
                           with relative page
                           positioning.\\
{\tt language} & string &  Name the output-device
                           language  for which
                           this \verb|\special|
                           is intended.\\
{\tt literal} & string   & Insert literal output
                           device code.\\
{\tt message} & string   & Supply an operator
                           message to be sent to
                           the terminal and log
                           file.\\
{\tt options} & string   & Not yet defined.\\
{\tt overlay} & filename & Insert file contents
                           with absolute page
                           positioning.\\
{\tt position} & string  & Specify the reference
                             point on an inserted
                             figure which is to be
                             mapped to the current
                             page position.\\
\hline
  \end{tabular}
\end{center}
%
In a series of assignment statements, the order of
the keywords is not significant, except that if
duplicate keywords are specified, the value of the
last one is used.  It is not necessary to supply a
final newline in the strings or files; one will be
provided implicitly to ensure correct parsing.

The \verb|graphics| keyword value is intended to
be used to support a simple generic graphics
language, yet to be defined.  Such a language
would make it possible to obtain simple line
graphics on virtually any output device.

The \verb|options| keyword value could be used to
supply device-dependent information; no particular
values have yet been defined in my 3.0 \DVI{}
driver code.

The \verb|message| string provides a means for
operator communication; for example,
%
\begin{verbatim}
message "Thesis bond paper for this job"
\end{verbatim}
%
The message is sent verbatim to the terminal and
the log file.

The \CODE{position} keyword specifies a string
that should contain two blank-separated words.
The first should be one of \CODE{top},
\CODE{middle}, or \CODE{bottom}, and the second
should be one of \CODE{left}, \CODE{center}, or
\CODE{right}.  These words may be abbreviated to a
single letter if desired.  Together, they select
one of nine points on the bounding box (four
corners, four edge centers, and the box center),
which is to be placed at the \TeX{} current point.
If this keyword is not given, the default is
%
\begin{verbatim}
position = "top left"
\end{verbatim}
%
\noindent
The point selected by this keyword (or by default)
will be the {\em reference point\/} for the
insertion of graphics files.
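
The mapping from a \verb|position| string to a
reference point can be sketched in C as below.
This is a minimal illustration under stated
assumptions: \verb|struct bbox|,
\verb|reference_point|, and \verb|word_is| are
hypothetical names, the words are assumed to be
written in lower case, and the bounding box is in
\POSTSCRIPT{} coordinates with $y$ increasing
upward, so that \verb|top| selects \verb|ury|.
%
\begin{verbatim}
#include <stdio.h>
#include <string.h>

/* Bounding box in big points, y increasing upward. */
struct bbox { double llx, lly, urx, ury; };

/* True if w is the full word or its first letter. */
static int word_is(const char *w, const char *full)
{
    return (w[0] == full[0] && w[1] == '\0')
        || strcmp(w, full) == 0;
}

/* Set (*x,*y) to the point that position selects
   on b.  A NULL position means "top left".
   Returns 0 on success, -1 for an invalid string. */
int reference_point(const char *position,
                    const struct bbox *b,
                    double *x, double *y)
{
    char v[16], h[16], extra;
    double px, py;

    if (position == NULL)
        position = "top left";
    if (sscanf(position, "%15s %15s %c",
               v, h, &extra) != 2)
        return -1;          /* need exactly two words */

    if (word_is(v, "top"))         py = b->ury;
    else if (word_is(v, "middle")) py = (b->lly + b->ury) / 2;
    else if (word_is(v, "bottom")) py = b->lly;
    else return -1;

    if (word_is(h, "left"))        px = b->llx;
    else if (word_is(h, "center")) px = (b->llx + b->urx) / 2;
    else if (word_is(h, "right"))  px = b->urx;
    else return -1;

    *x = px;
    *y = py;
    return 0;
}
\end{verbatim}
%
For an \verb|include|, the driver would then shift
the figure so that this point falls on the \TeX{}
current point.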

The following remarks are particular to
\POSTSCRIPT{} devices, but the possible
generalizations to others should be evident.

Literal \POSTSCRIPT{} code from a file or a
literal string is expected to be well-behaved, and
preferably, should conform to Adobe's Encapsulated
\POSTSCRIPT{} File format version 2.0 or later
\cite{Adobe:epsf-spec}, and to Adobe's
\POSTSCRIPT{} Document Structuring Conventions,
version 2.0 or later \cite{Adobe:docstruct-spec}.
It may contain a \verb|showpage|, which is
disabled temporarily by the \DVI{} driver during
the execution of the \verb|\special| strings, but
it should not contain any of these operators:
%
\begin{center}
  \tt
  \begin{tabular}{lll}
  \hline
  banddevice    & initgraphics  & setdevice    \\
  copypage      & initmatrix    & setmatrix    \\
  erasepage     & note          & setpageparams\\
  exitserver    & nulldevice    & setsccbatch  \\
  framedevice   & quit          & setscreen    \\
  grestoreall   & renderbands   & settransfer  \\
  initclip \\
  \hline
  \end{tabular}
\end{center}
%
If it does, erroneous output is virtually certain.
While these commands could be disabled like
\verb|showpage| is, Adobe's Encapsulated
\POSTSCRIPT{} guidelines do not recommend doing
so.

The imported \POSTSCRIPT{} should write into its
own dictionary if it needs to define objects.
Because the size of a dictionary must be specified
when it is created, the macros that mark the start
and end of the imported \POSTSCRIPT{} (called
\verb|SB| and \verb|SE| in \verb|dvialw|) cannot
create a standard one in advance to protect the
\DVI{} driver's own dictionary from corruption.

The \verb|language| keyword should specify
\verb|"PS"| or \verb|"PostScript"|; letter case
does not matter.  If any other non-empty value is
found, the \verb|\special| command is ignored by a
\POSTSCRIPT{} driver, since it presumably applies
to some other output device.  However, if no
\verb|language| keyword is given, the driver
assumes it should process the rest of the
\verb|\special| command.

Files specified by \verb|include| and
\verb|overlay| keywords are searched for in the
\verb|DVIINPUTS| path.
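
The syntax of the \verb|DVIINPUTS| value is not
specified here.  The following minimal C sketch
assumes a colon-separated list of directories, as
in UNIX \verb|TEXINPUTS| settings, with an empty
element (or an unset variable) meaning the current
directory; \verb|search_path| and
\verb|find_dvi_input| are hypothetical names.
%
\begin{verbatim}
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Look for name in each directory listed in path.
   Assumption: elements are separated by ':' and an
   empty element (or a NULL path) means the current
   directory.  On success the full file name is left
   in result and 0 is returned; otherwise -1. */
int search_path(const char *path, const char *name,
                char *result, size_t size)
{
    const char *p = (path != NULL) ? path : "";

    for (;;) {
        const char *end = strchr(p, ':');
        size_t len = (end != NULL) ? (size_t)(end - p)
                                   : strlen(p);
        FILE *fp;
        int n;

        if (len == 0)
            n = snprintf(result, size, "%s", name);
        else
            n = snprintf(result, size, "%.*s/%s",
                         (int)len, p, name);
        if (n >= 0 && (size_t)n < size &&
            (fp = fopen(result, "r")) != NULL) {
            fclose(fp);
            return 0;
        }
        if (end == NULL)
            return -1;
        p = end + 1;
    }
}

/* Search the directories named by DVIINPUTS. */
int find_dvi_input(const char *name,
                   char *result, size_t size)
{
    return search_path(getenv("DVIINPUTS"), name,
                       result, size);
}
\end{verbatim}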

In the common {\tt include filename} case, the
upper-left corner of the \POSTSCRIPT{} bounding
box will be placed at the current point.  The
\POSTSCRIPT{} file must then contain (usually near
the start) a comment of the form
%
\begin{verbatim}
%%BoundingBox: llx lly urx ury
\end{verbatim}
%
specifying the bounding box lower-left and
upper-right coordinates in standard \POSTSCRIPT{}
units (big points, 1bp = 1/72 inch).
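
Recognizing this comment is straightforward.  In
the minimal C sketch below, the function name
\verb|parse_bbox_line| and its return convention
are hypothetical; the coordinates are read as
floating-point numbers so that both integer and
fractional values are accepted.
%
\begin{verbatim}
#include <stdio.h>
#include <string.h>

/* Classify one line of a PostScript file.
   Returns 1 for "%%BoundingBox: llx lly urx ury"
   (the four values are stored), 0 for
   "%%BoundingBox: (atend)", and -1 otherwise. */
int parse_bbox_line(const char *line,
                    double *llx, double *lly,
                    double *urx, double *ury)
{
    static const char tag[] = "%%BoundingBox:";
    const char *p;

    if (strncmp(line, tag, sizeof tag - 1) != 0)
        return -1;
    p = line + sizeof tag - 1;
    if (sscanf(p, "%lf %lf %lf %lf",
               llx, lly, urx, ury) == 4)
        return 1;
    if (strstr(p, "(atend)") != NULL)
        return 0;
    return -1;
}
\end{verbatim}
%
Dividing \verb|urx - llx| by 72 then gives the
figure width in inches.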

Alternatively, if the comment
%
\begin{verbatim}
%%BoundingBox: (atend)
\end{verbatim}
%
is found in the file, the last 4096 characters of
the file will be searched to find a bounding box
comment that specifies the coordinates of the two
corners.  The {\em last\/} such comment found is
the one used; this requirement permits correct
handling of inserted files that themselves contain
nested \POSTSCRIPT{} files.
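
The search of the file's tail can be sketched in C
as follows.  This is an illustration, not a
definitive implementation: \verb|find_atend_bbox| is
a hypothetical name, the file is assumed to be
opened in binary mode so that \verb|fseek| offsets
are byte counts, and for brevity the sketch does
not check that each comment begins a line.
%
\begin{verbatim}
#include <stdio.h>
#include <string.h>

#define TAIL_SIZE 4096  /* trailing characters examined */

/* Find the last "%%BoundingBox:" comment with four
   numbers in the final TAIL_SIZE characters of fp.
   Returns 1 and fills bb[] on success, else 0. */
int find_atend_bbox(FILE *fp, double bb[4])
{
    char buf[TAIL_SIZE + 1];
    const char *p;
    size_t n;
    long size;
    int found = 0;

    if (fseek(fp, 0L, SEEK_END) != 0 ||
        (size = ftell(fp)) < 0)
        return 0;
    if (fseek(fp, size > TAIL_SIZE ? size - TAIL_SIZE
                                   : 0L, SEEK_SET) != 0)
        return 0;
    n = fread(buf, 1, TAIL_SIZE, fp);
    buf[n] = '\0';

    /* Walk forward; each later match replaces the
       earlier one, so the last comment wins. */
    for (p = buf;
         (p = strstr(p, "%%BoundingBox:")) != NULL;
         p++) {
        double v[4];
        if (sscanf(p + 14, "%lf %lf %lf %lf",
                   &v[0], &v[1], &v[2], &v[3]) == 4) {
            memcpy(bb, v, sizeof v);
            found = 1;
        }
    }
    return found;
}
\end{verbatim}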

In the {\tt overlay filename} case, the
\POSTSCRIPT{} file to be included will be mapped
onto the page at precisely the coordinates it
specifies, where the page origin is in the
lower-left corner, $x$ increasing to the right,
and $y$ increasing upward.  Any
\verb|%%BoundingBox| specification is ignored,
since it is not required for positioning.  This
option might be used to print an overlay page.
For actions that are to be done on every page,
such as printing a logo, or a string like {\tt
Draft} or {\tt Company Confidential}, it is more
efficient to redefine the \POSTSCRIPT{}
\verb|showpage| command instead.

If the \POSTSCRIPT{} file cannot be opened, if the
\verb|\special| command string cannot be
recognized, or, for relative positioning, if the
bounding box cannot be determined, a warning
message is issued and the \verb|\special| command
is ignored.

Numerous drivers already support \verb|\special|
command strings of the form {\tt include
filename}; this parser will recognize them.

A key point here is the \verb|language| keyword.
If it is {\em not\/} given, the \DVI{} driver must
assume that the \verb|\special| command is
intended for it, and attempt to process it.  Thus,
%
\begin{verbatim}
\special{include tiger.eps}
\end{verbatim}
%
will be handled as before.  However, when the
\verb|language| keyword is found, its value
determines whether the \DVI{} driver processes
the \verb|\special| or ignores it.

Every \DVI{} driver must recognize a generic
language choice relevant to its output device,
such as {\tt PostScript} or {\tt Epson}.  In
addition, each driver must recognize its own name
as a language value.
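
The decision a driver makes from the
\verb|language| value can be sketched in C.  In
this minimal sketch, \verb|should_process| and
\verb|same_name| are hypothetical names; the text
requires case-insensitive matching only for the
\POSTSCRIPT{} names, and applying it to the driver
name as well is an assumption.
%
\begin{verbatim}
#include <ctype.h>
#include <stddef.h>

/* Case-insensitive equality of two strings. */
static int same_name(const char *a, const char *b)
{
    while (*a != '\0' && *b != '\0' &&
           tolower((unsigned char)*a) ==
           tolower((unsigned char)*b)) {
        a++;
        b++;
    }
    return *a == '\0' && *b == '\0';
}

/* Should this driver process a \special?
   language: language value, or NULL if absent
   generic:  NULL-terminated generic names,
             e.g. "PS" and "PostScript"
   driver:   the driver's own name, e.g. "dvialw" */
int should_process(const char *language,
                   const char *const generic[],
                   const char *driver)
{
    size_t i;

    if (language == NULL || language[0] == '\0')
        return 1;          /* no language: assume ours */
    for (i = 0; generic[i] != NULL; i++)
        if (same_name(language, generic[i]))
            return 1;
    return same_name(language, driver);
}
\end{verbatim}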

The reason for this requirement is as follows.
When startup files are supported, their names are
derived from the driver names.  In my 3.0 \DVI{}
driver code, a driver named \verb|dvialw| will
search for startup files named \verb|dvialw.ini|
in a list of standard places.  The default
behavior of a particular driver can be changed
merely by storing a copy of its executable program
under a different name, and providing a
corresponding startup file.  Typically, this would
be done to provide easy-to-use variants of a basic
driver for different paper types, or different
page orientations.   If the user wishes to
incorporate driver-specific \verb|\special|
strings, permitting the \verb|language| value to
be the driver name provides that flexibility.
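
Deriving the startup file name from the name under
which the driver was invoked might look like the
following C sketch.  The function
\verb|startup_file_name| is hypothetical, it assumes
UNIX-style \verb|/| directory separators in
\verb|argv[0]|, and it does not show the search
through the list of standard places.
%
\begin{verbatim}
#include <stdio.h>
#include <string.h>

/* Build "<program>.ini" from argv[0], dropping any
   directory prefix, e.g. "/usr/local/bin/dvialw"
   gives "dvialw.ini".  Returns 0 on success, -1 if
   the name does not fit in result. */
int startup_file_name(const char *argv0,
                      char *result, size_t size)
{
    const char *base = strrchr(argv0, '/');
    int n;

    base = (base != NULL) ? base + 1 : argv0;
    n = snprintf(result, size, "%s.ini", base);
    return (n >= 0 && (size_t)n < size) ? 0 : -1;
}
\end{verbatim}
%
A copy of the driver stored under a hypothetical
name such as \verb|dvialw-a4| would then look for
\verb|dvialw-a4.ini|.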

Existing mini-languages for graphics, such as
\verb|eepic|, \verb|epic|, \verb|tpic|, and
\verb|xpic|, can be handled properly by using the
\verb|graphics| and \verb|language| keywords
together:
%
\begin{verbatim}
\special{language = "tpic",
         graphics = "..."}
\end{verbatim}

The \DVI{} Driver Standards Committee has debated
whether drivers should issue warning messages
about \verb|\special| commands that they are
unable to process.  In the absence of the
\verb|language| keyword, I believe that such
warnings are desirable, although the driver should
provide an option to suppress such warnings.

However, when a \verb|language| value is found, it
is important that the driver {\em silently
ignore\/} any \verb|\special| command whose
\verb|language| value it does not recognize.  The
presence of that value is sufficient evidence that
the user intends the command to be ignored by some
drivers, and certainly does not want those drivers
to complain about it.
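
The warning policy argued for above can be
summarized as a small decision function.  This C
sketch is illustrative only: the enumeration
\verb|special_action|, the function
\verb|classify_special|, and its flag arguments are
hypothetical.  A \verb|\special| whose
\verb|language| value does match the driver but
which still cannot be processed (for example,
because a file is missing) is assumed to draw a
warning, like the failure cases described earlier.
%
\begin{verbatim}
/* What to do with a \special after examining it. */
enum special_action {
    SPECIAL_DONE,    /* processed                  */
    SPECIAL_WARN,    /* cannot process: warn       */
    SPECIAL_SILENT   /* ignore without comment     */
};

/* language_given: a non-empty language value found
   language_ours:  it names this driver or device
   handled:        the driver managed to process it
   quiet:          user asked for warnings off    */
enum special_action classify_special(int language_given,
                                     int language_ours,
                                     int handled,
                                     int quiet)
{
    if (language_given && !language_ours)
        return SPECIAL_SILENT;  /* for another driver */
    if (handled)
        return SPECIAL_DONE;
    return quiet ? SPECIAL_SILENT : SPECIAL_WARN;
}
\end{verbatim}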

I expect that with more powerful, and
standardized, \verb|\special| command support of
the type described in this paper, use of
\verb|\special|s will increase.  Consider, for
example, a document that makes heavy use of color
or grey-scale requests via \verb|\special|
commands; there could be hundreds, or even
thousands, of them in a document of modest size.
Were the driver to issue warnings for all of them,
the terminal output or log file would be flooded
with mostly useless warning messages that obscure
much more important information.  The
\verb|language| value provides a standard means to
prevent this.

\section{Paper specification}

Paper handling and specification are complex
issues that may require future extensions.  Thus,
it is desirable to have a flexible means of
specifying paper characteristics, and a reasonable
scheme seems to be to use a small extensible
language to define it.  The assignment-statement
language whose grammar was presented above is
suitable for this purpose.

Some examples of the paper specifications
supported by my 3.0 \DVI{} driver code were given
earlier.  Here, we define the keywords recognized.

 \begin{center}
   \begin{tabular}{llp{1.1in}}
     \hline
Keyword         & Type          & Description \\
     \hline
\CODE{dev_init}        & string        & initiate device use of paper\\
\CODE{dev_term}        & string        & terminate device use of paper\\
\CODE{height}          & dimension     & paper height\\
\CODE{output_order}    & number        & negative for printing last to first\\
\CODE{paper}           & string        & paper form name\\
\CODE{use}             & string        & name of copied paper form\\
\CODE{width}           & dimension     & paper width\\
\CODE{x_clip}          & number        & clip in x direction if non-zero\\
\CODE{x_left}          & dimension     & width of left unprintable margin\\
\CODE{x_origin}        & dimension  & horizontal offset of \TeX{} (0,0) point\\
\CODE{x_right}         & dimension     & width of right unprintable margin\\
\CODE{y_bottom}        & dimension     & \sloppy
                                         width of bottom unprintable margin\\
\CODE{y_clip}          & number        & clip in y direction if non-zero\\
\CODE{y_origin}        & dimension  & vertical offset of \TeX{} (0,0) point\\
\CODE{y_top}           & dimension     & width of top unprintable margin\\
     \hline
  \end{tabular}
 \end{center}
%
The \CODE{paper} keyword defines a name that is
used to tag the collected parameters.  If the form
name already exists, assignments will replace
previous values.  Otherwise, a new form is
created.

The \CODE{use} keyword names an existing form
whose parameters are to be copied to a new one
named by the \CODE{paper} keyword in the same
program.  This copying happens {\em before\/} any
of the other keyword assignments are done.  The
order of the statements in the program does not
matter, because the results of the assignments are
collected in a temporary form before copying to
the specified form.  Recursive form references
are supported; just don't make them circular!  The
\CODE{use} keyword should normally be employed to
make private modifications of standard form
types.
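
The copy-then-assign rule for \CODE{paper} and
\CODE{use} can be sketched in C.  Only a few of the
parameters are shown, and the types
\verb|paper_params|, \verb|paper_form|, and
\verb|paper_program|, the bit mask recording which
keywords were assigned, and the functions
\verb|find_form| and \verb|apply_paper_program| are
all hypothetical.  The parser is assumed to have
already collected every assignment of one program
into a \verb|paper_program|, which is why statement
order does not matter.
%
\begin{verbatim}
#include <string.h>

#define MAX_FORMS 16

/* Hypothetical subset of parameters (inches). */
struct paper_params {
    double width, height, x_origin, y_origin;
    int output_order;
};

struct paper_form {
    char name[32];
    struct paper_params p;
};

/* One paper program, collected before it is applied. */
struct paper_program {
    const char *paper;     /* required form name        */
    const char *use;       /* form to copy, or NULL     */
    struct paper_params p; /* assigned values           */
    unsigned set;          /* which fields were assigned */
};

enum { SET_WIDTH = 1, SET_HEIGHT = 2, SET_X_ORIGIN = 4,
       SET_Y_ORIGIN = 8, SET_ORDER = 16 };

static struct paper_form forms[MAX_FORMS];
static int nforms;

struct paper_form *find_form(const char *name)
{
    int i;

    for (i = 0; i < nforms; i++)
        if (strcmp(forms[i].name, name) == 0)
            return &forms[i];
    return NULL;
}

/* Returns 0 on success, -1 for a missing or unknown
   form name or a full form table. */
int apply_paper_program(const struct paper_program *prog)
{
    struct paper_form *dest, *src = NULL;
    struct paper_params r;

    if (prog->paper == NULL ||
        strlen(prog->paper) >= sizeof forms[0].name)
        return -1;
    if (prog->use != NULL &&
        (src = find_form(prog->use)) == NULL)
        return -1;

    dest = find_form(prog->paper);
    if (dest != NULL)
        r = dest->p;               /* replace old values */
    else
        memset(&r, 0, sizeof r);   /* brand-new form     */
    if (src != NULL)
        r = src->p;                /* copying comes first */

    /* then the explicit assignments */
    if (prog->set & SET_WIDTH)    r.width = prog->p.width;
    if (prog->set & SET_HEIGHT)   r.height = prog->p.height;
    if (prog->set & SET_X_ORIGIN) r.x_origin = prog->p.x_origin;
    if (prog->set & SET_Y_ORIGIN) r.y_origin = prog->p.y_origin;
    if (prog->set & SET_ORDER)
        r.output_order = prog->p.output_order;

    if (dest == NULL) {
        if (nforms == MAX_FORMS)
            return -1;
        dest = &forms[nforms++];
        strcpy(dest->name, prog->paper);
    }
    dest->p = r;
    return 0;
}
\end{verbatim}
%
In this sketch a form must already exist before
\CODE{use} can name it, so chains of forms work
naturally and an unknown name is reported as an
error.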

Some printers misbehave if they are presented with
data that are off the page, or too close to the
margins; for example, the Hewlett-Packard LaserJet
wraps such coordinates horizontally.  For such
devices, the \CODE{x_clip} and \CODE{y_clip}
values should be set non-zero.

Few printers place the (0,0) origin exactly in the
upper-left page corner; instead, they have it
slightly offset at some other point, which we call
(\CODE{x_origin},\CODE{y_origin}).  The standard
\LaTeX{} file, \FN{testpage.\-tex}, can be used to
determine the correct settings of these values.
If you print its typeset output, the upper-left
corner of the inner frame should be exactly one
inch from the page edges.  Suppose you actually
find that that corner lies 0.75in from the left
edge, and 1.1in from the top edge.  This means the
printer's (0,0) point lies 0.25in to the left of,
and 0.1in below, the upper-left page corner.  Setting
\CODE{x_origin = -0.25in} and \CODE{y_origin =
0.1in} will compensate, so the next time you print
the test page, the inner frame should be correctly
positioned.
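
The arithmetic of this example fits in a small C
function.  It is a sketch: the name
\verb|origin_correction| is hypothetical, and it
simply encodes the sign convention of the example
above, where each correction is the measured
distance minus the expected one inch, to be added
to any \CODE{x_origin} and \CODE{y_origin} values
already in effect.
%
\begin{verbatim}
/* Corrections to add to x_origin and y_origin,
   given where the inner frame's upper-left corner
   of the testpage.tex output actually landed
   (inches from the left and top paper edges). */
void origin_correction(double measured_left,
                       double measured_top,
                       double *dx_origin,
                       double *dy_origin)
{
    const double expected = 1.0; /* TeX (0,0): 1in in */

    *dx_origin = measured_left - expected;
    *dy_origin = measured_top - expected;
}
\end{verbatim}
%
With the measurements above (0.75in and 1.1in),
it returns $-0.25$ and $+0.1$.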

Most printers are incapable of printing very close
to the edges of the physical page; the margin
values \CODE{x_left}, \CODE{x_right},
\CODE{y_bottom}, and \CODE{y_top} should be set to
indicate the relevant limits.  Sometimes these
values can be found in the printer documentation.
However, if the physical paper position relative
to the printing mechanism is adjustable, as it is
for most dot-matrix printers, you may have to
experiment.  If you print the \FN{testpage.tex}
typeset output, the tick marks in the four margins
will usually not print near the paper edges; use
them as a guide to setting reasonable values for
the margin values.

\DVI{} drivers that require a page bitmap will
allocate memory corresponding to the paper surface
inside of these margins.  Wide margin settings
can therefore reduce the amount of memory
required; that in turn can reduce the number of
bitmap strips that must be processed for
high-resolution printers, speeding the output.
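
A rough estimate of the saving can be made with
the following C sketch.  It assumes a monochrome
bitmap with one bit per pixel and each row padded
to a whole byte; \verb|bitmap_bytes| is a
hypothetical name, and all lengths are in inches.
%
\begin{verbatim}
#include <stddef.h>

/* Bytes for a one-bit-per-pixel bitmap covering
   only the area inside the four margins. */
size_t bitmap_bytes(double width, double height,
                    double x_left, double x_right,
                    double y_top, double y_bottom,
                    int dpi)
{
    long cols = (long)((width - x_left - x_right)
                       * dpi + 0.5);
    long rows = (long)((height - y_top - y_bottom)
                       * dpi + 0.5);

    if (cols <= 0 || rows <= 0)
        return 0;
    /* round each row up to a whole byte */
    return (size_t)((cols + 7) / 8) * (size_t)rows;
}
\end{verbatim}
%
For 8.5in by 11in paper at 300 dots per inch, this
gives 1052700 bytes with zero margins and 945000
bytes with 0.25in margins on all four sides.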

The standard \TeX{} and \LaTeX{} macro packages
are parametrized to assume that the \TeX{} (0,0)
point will be exactly one inch in from the left,
and one inch down from the top.  They also usually
assume American paper sizes.  Text widths and
heights are then chosen to ensure identical top
and bottom margins, and except for two-sided
printing styles, identical left and right margins.
While the \DVI{} driver \OPTION{x} and \OPTION{y}
command line options can be used to adjust the
output position, it is usually better to do so by
setting paper parameters.

 \begin{sloppypar}
For example, ISO A4 paper is 210mm (8.2677in)
wide; \TeX{} macro packages assume 6.5in text
width with 1in left and right margins.  To center
that text on A4 paper, the 1in margins need to be
reduced by (8.5 - 8.2677)/2 = 0.1161in, so we
could put \CODE{x_origin = +0.1161in}.  Similarly,
the A4 height of 297mm (11.6929in) exceeds the
11in U.~S. paper height, so (11.6929 - 11.0)/2 =
0.3465in must be added to each of the top and
bottom margins.  That can be accomplished by
setting \CODE{y_origin = -0.3465in}.  Of course,
if you already have non-zero values of these
parameters, you will have to adjust them
accordingly; just {\em add\/} the above offsets to
the existing values.
 \end{sloppypar}
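
The same calculation for any paper size can be
sketched in C.  The function name
\verb|centering_offsets| is hypothetical; it
assumes text laid out for 8.5in by 11in paper and
follows the sign convention of the A4 example
above, so each offset is the U.~S. dimension minus
the paper dimension, divided by two.
%
\begin{verbatim}
/* Offsets that center text laid out for 8.5in x 11in
   paper on a page of the given size (millimetres).
   A positive x_origin narrows the left margin and a
   negative y_origin widens the top margin.  Add the
   results to any offsets already in use. */
void centering_offsets(double width_mm, double height_mm,
                       double *dx_origin,
                       double *dy_origin)
{
    const double mm_per_inch = 25.4;

    *dx_origin = (8.5 - width_mm / mm_per_inch) / 2.0;
    *dy_origin = (11.0 - height_mm / mm_per_inch) / 2.0;
}
\end{verbatim}
%
For A4 (210mm by 297mm) it yields about
+0.1161in and $-0.3465$in, the values derived above.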

If you routinely use non-American paper sizes,
then you probably should be using a style file
modification that accounts for the different page
dimensions, rather than fiddling with paper
positioning on your output device.

The \CODE{output_order} value should be set
negative if you want pages printed from last to
first.  This provides an alternative to the
\OPTION{backwards} command line option, but
affects only the paper forms types it is defined
for.  If \CODE{output_order} is negative, the
\DVI{} drivers will simply flip the current
setting of the backwards-printing switch, which
may have already been set from the command line.
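
The flipping rule takes only a few lines of C.  In
this sketch the names \verb|effective_backwards|
and \verb|page_order| are hypothetical, and pages
are simply numbered from 1 to the page count.
%
\begin{verbatim}
/* Effective backwards-printing switch: a negative
   output_order flips the command-line setting. */
int effective_backwards(int cmdline_backwards,
                        int output_order)
{
    return (output_order < 0) ? !cmdline_backwards
                              : cmdline_backwards;
}

/* Fill order[0..npages-1] with page numbers
   1..npages in the sequence they are printed. */
void page_order(int npages, int backwards,
                int order[])
{
    int i;

    for (i = 0; i < npages; i++)
        order[i] = backwards ? npages - i : i + 1;
}
\end{verbatim}
%
Flipping, rather than forcing backwards printing,
means that a \OPTION{backwards} option given on the
command line restores first-to-last order for such
a paper form.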

If the printer needs to receive some magic codes
to select an alternate paper type (e.g.\ some
high-speed laser printers support multiple input
paper trays), it will be necessary for the \DVI{}
driver to write them into the output file.  The
\CODE{dev_init} and \CODE{dev_term} strings
provide for this.  The \DVI{} drivers output the
initialization string at the start of the job, and
the termination string at the end.  These are
output verbatim with nothing added, not even a
newline.

For example, if you are using the \POSTSCRIPT{}
driver, \verb|dvialw|, on a system that does not
have a \POSTSCRIPT{} printer spooler, you might
want the end of the file to have the \POSTSCRIPT{}
serial line job terminator character, \CTL{D}.
You could arrange that by setting
%
\begin{verbatim}
dev_term = "\004";
\end{verbatim}
%
\noindent
in a paper program.
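
How escapes such as \verb|\004| are interpreted is
not spelled out here.  The C sketch below assumes
C-style octal escapes of up to three digits, treats
any other backslash-escaped character as itself (a
simplification of C, where \verb|\n| would be a
newline), and writes the decoded bytes with
\verb|fwrite| so that nothing, not even a newline,
is added.  The functions \verb|decode_escapes| and
\verb|write_device_string| are hypothetical.
%
\begin{verbatim}
#include <stdio.h>
#include <string.h>

/* Decode backslash escapes in a paper-program string.
   Assumption: a backslash and 1-3 octal digits give
   that character code; a backslash and any other
   character give that character.  Returns the number
   of bytes stored in out (which may include NULs). */
size_t decode_escapes(const char *in, char *out)
{
    size_t n = 0;

    while (*in != '\0') {
        if (in[0] == '\\' && in[1] >= '0' && in[1] <= '7') {
            int value = 0, digits = 0;
            in++;
            while (digits < 3 && *in >= '0' && *in <= '7') {
                value = value * 8 + (*in - '0');
                in++;
                digits++;
            }
            out[n++] = (char)value;
        } else if (in[0] == '\\' && in[1] != '\0') {
            out[n++] = in[1];
            in += 2;
        } else {
            out[n++] = *in++;
        }
    }
    return n;
}

/* Write a dev_init or dev_term value verbatim:
   no newline or other bytes are added. */
int write_device_string(FILE *fp, const char *value)
{
    char buf[256];
    size_t n;

    if (strlen(value) >= sizeof buf)
        return -1;         /* too long for this sketch */
    n = decode_escapes(value, buf);
    return fwrite(buf, 1, n, fp) == n ? 0 : -1;
}
\end{verbatim}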

The \DVI{} drivers already know how to initialize and
terminate their output devices under normal
conditions, so you should rarely need to specify
\CODE{dev_init} and \CODE{dev_term} values.

\bibliography{special}

\makesignature

\end{document}
