\documentclass{ltxdoc}

\usepackage{fontspec,polski,bredzenie,soul,booktabs,unicode-math}

\setmainfont[Numbers=OldStyle]{STIX Two Text}
\setmathfont{STIX Two Math}

\edef\MaxBredzenie{\csname bredzenie@max\endcsname}

\title{Package \texttt{bredzenie} – \MaxBredzenie\
  paragraphs of pseudo-Polish at your disposal}

\author{Marcin Woliński}

% Get the version and date of bredzenie:
\def\getVersion#1 #2 #3\relax{%
  \date{#2, #1}%
}
\expandafter\expandafter\expandafter
\getVersion\csname ver@bredzenie.sty\endcsname\relax

\begin{document}
\selecthyphenation{english}
\maketitle

Graphic designers often need some semantically neutral material to
showcase a document layout.  This problem is usually solved by using
the classic pseudo-Latin text \textit{Lorem ipsum dolor sit amet…}.
This text has been made available by Patrick Happel in the form of the
\LaTeX\ package \texttt{lipsum.sty}.

However, languages differ in word lengths, characteristic frequency of
letters and their $n$-grams, and so on.  This means that a page of
text in Latin will look different than a page of Polish.  For the
development of Polish document classes I have developed the package
\texttt{bredzenie}\footnote{In Polish \emph{bredzenie} means ‘talking
  nonsense’.}, which provides access to several paragraphs of
pseudo-Polish.

The text has been generated with Hidden Markov Models and,
alternatively, Recurrent Neural Networks trained on a corpus of
Polish.  Although the text makes absolutely no sense, it exhibits
correct statistical characteristics, it “looks” Polish (in particular
Polish hyphenation patterns are applicable).

The implementation uses e\TeX\ features (\cs{numexpr} and
\cs{protected}), but that should be OK with all contemporary \TeX\
implementations.

\section{User Interface}
\label{sec:interface}

The package can be loaded with the usual
\begin{verbatim}
\usepackage{bredzenie}
\end{verbatim}
The text provided by the package is encoded in UTF-8, so if one is
using old-fashioned 8-bit \TeX, an earlier
\verb|\usepackage[utf8]{inputenc}| invocation is mandatory.


\begin{macro}{\bredzenie}
The main command provided by the package is \cs{bredzenie} with one mandatory
argument.\footnote{This is different than in package \texttt{lipsum},
where the argument is optional.}  This command fetches one or more of \csname
bredzenie@max\endcsname\ stored paragraphs.  The
argument has to be a number of a paragraph (starting with 1) or a
range specified by two numbers:
\begin{verbatim}
\bredzenie{1}
\bredzenie{105-108}
\end{verbatim}
For example, the range 105–108 generates:
\begin{quotation}%\itshape
  \selecthyphenation{polish}
  \bredzenie{105-108}
\end{quotation}

\begin{macro}{\BredzenieSep}
  When a range of text snippets is fetched, they are separated with
  calls to \cs{BredzenieSep}.  This command is defined in the package
  as \cs{par}, so the snippets are treated as separate paragraphs.
  However, no separator is added at the end, so bredzenie can also be
  used within a paragraph (in this context short fragments number
  201–210 are very useful):
\begin{verbatim}
\bredzenie{201} \textsc{\bredzenie{202}} \bredzenie{203}
\textit{\bredzenie{205}} \bredzenie{206} 
\(\sum_{i=1}^{210}\alpha_iv_i\) \bredzenie{207}.
\end{verbatim}
\begin{quotation}
  \selecthyphenation{polish}
\bredzenie{201} \textsc{\bredzenie{202}} \bredzenie{203}
\textit{\bredzenie{205}} \bredzenie{206} 
\(\sum_{i=1}^{210}\alpha_iv_i\) \bredzenie{207}.
\end{quotation}
For special effects \cs{BredzenieSep} can be redefined.
\end{macro}

The command \cs{bredzenie} is fully expandable, so you can use it also
in the contexts like
\begin{verbatim}
\message{\bredzenie{72}}
\end{verbatim}
or in an \cs{edef}. For example, the command \cs{ul} from the package
\texttt{soul} needs to be given explicit text, not a command providing
text.  This can be achieved in the following way:
\begin{verbatim}
\edef\tobeunderlined{\bredzenie{205}}
\bredzenie{202} \bredzenie{204}
\expandafter\ul\expandafter{\tobeunderlined}
\bredzenie{206}…
\end{verbatim}
\begin{quotation}
  \selecthyphenation{polish}
\edef\tobeunderlined{\bredzenie{205}}
\bredzenie{202} \bredzenie{204}
\expandafter\ul\expandafter{\tobeunderlined}
\bredzenie{206}…
\end{quotation}
\end{macro}

The following three commands can be redefined by the user to fine-tune
the appearance of the typeset text.
\begin{macro}{\BredzenieHyphen}
According to a Polish typographical rule, explicit hyphens used in
compound words should be doubled when they fall at a line break.  Such
explicit hyphens in \texttt{bredzenie} are represented with the
command \cs{BredzenieHyphen}.
\end{macro}

\begin{macro}{\BredzenieDash}
Dashes used as punctuation are represented with \cs{BredzenieDash}.
The package provides a definition  for this command as an en-dash with
spaces around.  If other variation on dashes is preferred, this
command can be redefined.
\end{macro}

\begin{macro}{\BredzenieNbsp}
  A Polish typographical rule forbids breaking lines after any
  one-letter word.  This is achieved in \LaTeX\ using the \verb|~|
  command inserted at appropriate spots.  However, in the texts
  provided by \texttt{bredzenie} such spaces are represented as
  \cs{BredzenieNbsp} to allow for redefinition.
\end{macro}


\section{The text}

For your reading pleasure (provided you know Polish) here is the full
bredzenie.  The paragraphs are numbered to ease selection of most
catchy fragments. See also Table~\ref{tab:index} for information on
type of text represented by particular paragraphs.
\def\tablename{Table}
\begin{table}[t]
  \caption{A guide to the paragraphs of bredzenie}
  \label{tab:index}
  \begin{center}
    \begin{tabular}{cp{9cm}}
      \toprule
      range & \multicolumn{1}{c}{how generated}\\\midrule
      1–50 & Hidden Markov Model trained on a balanced corpus of
             Polish texts from 1960s (numbers written with words, no abbrevs)\\
      51–100 & LSTM Recurrent Neural Network trained on 1-milion
            subcorpus of the National Corpus of Polish \url{nkjp.pl}
               (numbers, abbrevs, some blog text)\\
      101–\MaxBredzenie & LSTM trained on several novels by Henryk
                       Sienkiewicz (many dialogues)\\
      \bottomrule
    \end{tabular}
  \end{center}
\end{table}


\bigskip
\selecthyphenation{polish}
\makeatletter
\newcounter{numbred}
\let\bredzenie@orig\bredzenie@minmax
\def\bredzenie@minmax{\stepcounter{numbred}%
  \leavevmode\llap{\textsuperscript{\bfseries\thenumbred\quad\indent}}%
  \bredzenie@orig
}

\bredzenie{1-\MaxBredzenie}

\end{document}


%%% Local Variables:
%%% mode: latex
%%% mode: flyspell
%%% TeX-master: t
%%% TeX-engine: xetex
%%% TeX-PDF-mode: t
%%% coding: utf-8
%%% ispell-local-dictionary: "british"
%%% End: