diff --git a/algorithm.tex b/algorithm.tex
new file mode 100644
index 0000000..eb71f74
--- /dev/null
+++ b/algorithm.tex
@@ -0,0 +1,103 @@
+\documentclass{article}
+\begin{document}
+\title{From math lists to MathML}
+\subtitle{The algorithm in luamml}
+\author{Marcel}
+\maketitle
+\section{General concepts}
+In the following I assume basic familiarity with both Lua\TeX's representation of math noads and MathML.
+
+\subsection{Representation of XML elements}
+In many places, \luamml\ passes around XML elements. Every element is represented by a Lua table.
+Element \texttt 0 must always be present and is a string representing the tag name.
+The positive integer elements of the table represent child elements (either strings for direct text content or nested tables for nested elements).
+All string members which do not start with a colon are attributes, whose value is the result of applying \texttt{tostring} to the field value.
+This implies that these values should almost always be strings, except that the value \texttt 0 (since it never needs a unit) can often be set as a number.
+For example, the XML document
+\begin{verbatim}
+<math block="display">
+  <mn>0</mn>
+  <mo>&lt;</mo>
+  <mi mathvariant="normal">x</mi>
+</math>
+\end{verbatim}
+would be represented by the Lua table
+\begin{verbatim}
+{[0] = "math", block="display",
+  {[0] = "mn", "0"},
+  {[0] = "mo", "<"},
+  {[0] = "mi", mathvariant="normal", "x"}
+}
+\end{verbatim}
+
+\subsection{Expression cores}
+MathML knows the concept of \enquote{embellished operators}:
+\begin{blockquote}
+  The precise definition of an \enquote{embellished operator} is:
+  \begin{itemize}
+    \item an \tag{mo} element;
+    \item or one of the elements \tag{msub}, \tag{msup}, \tag{msubsup}, \tag{munder}, \tag{mover}, \tag{munderover}, \tag{mmultiscripts}, \tag{mfrac}, or \tag{semantics} (§ 5.1 Annotation Framework), whose first argument exists and is an embellished operator;
+    \item or one of the elements \tag{mstyle}, \tag{mphantom}, or \tag{mpadded}, such that an mrow containing the same arguments would be an embellished operator;
+    \item or an \tag{maction} element whose selected sub-expression exists and is an embellished operator;
+    \item or an \tag{mrow} whose arguments consist (in any order) of one embellished operator and zero or more space-like elements.
+  \end{itemize}
+\end{blockquote}
+For every embellished operator, MathML calls the \tag{mo} element defining the embellished operator the \enquote{core} of the embellished operator.
+
+\Luamml\ makes this slightly more general: every expression is represented by a pair of elements, the expression and its core.
+The core is always an \tag{mo}, \tag{mi}, or \tag{mn} element, \texttt{nil}, or a special marker for space-like elements.
+
+The core is an \tag{mo} element if and only if the element is an embellished operator; it then represents the core of that embellished operator.
+The core is an \tag{mi} or an \tag{mn} element if and only if the element would be an embellished operator with this core if that core were an \tag{mo} element.
+The core is the special space-like marker for space-like elements. Otherwise, the core is \texttt{nil}.
+
+\section{Translation of math noads}
+A math list can contain the following node types: noad, fence, fraction, radical, accent, style, choice, ins, mark, adjust, boundary, whatsit, penalty, disc, glue, and kern. The \enquote{noads}
+
+\subsection{Translation of kernel noads}
+The math noads of this list contain nested kernel noads, so as a first step we look at how kernel noads are translated to MathML elements.
+
+\subsubsection{\texttt{math\_char} kernel noads}
+First, the family and character value in the \texttt{math\_char} are used to look up the Unicode character value of this \texttt{math\_char}.
+(For \texttt{unicode-math}, this is usually just the character value. Legacy math has to be remapped based on the family.)
+Then there are two cases: the digits \texttt{0} to \texttt{9} are mapped to \tag{mn} elements, and everything else becomes an \tag{mi} element with \texttt{mathvariant} set to \texttt{normal}.
+(The \texttt{mathvariant} value might get suppressed if the character defaults to mathvariant \texttt{normal}.)
+In either case, the \texttt{tex:family} attribute is set to the family number if it is not \texttt{0}.
+
+The core is always set to the expression itself. E.g.\ the \texttt{math\_char} kernel noad \verb+\fam3 a+ would become (assuming no remapping for this family)
+\begin{verbatim}
+{[0] = 'mi',
+  mathvariant = 'normal',
+  ["tex:family"] = 3,
+  "a"
+}
+\end{verbatim}
+
+\subsubsection{\texttt{sub\_box} kernel noads}
+I am open to suggestions on how to convert them properly.
+
+\subsubsection{\texttt{sub\_mlist} kernel noads}
+The inner list is converted to an \tag{mrow} element, with the core being the core of the \tag{mrow} element. See the rules for this later.
+
+\subsubsection{\texttt{delim} kernel noads}
+If the \texttt{small\_char} is zero, these get converted to space-like elements of the form
+\begin{verbatim}
+{[0] = 'mspace',
+  width = '1.196pt',
+}
+\end{verbatim}
+where 1.196 is replaced by the current value of \verb+\nulldelimiterspace+ converted to \texttt{bp} (a MathML \texttt{pt} is 1/72 inch, i.e.\ a \TeX\ \texttt{bp}).
+
+Otherwise, the same rules as for \texttt{math\_char} apply,
+except that instead of \tag{mi} or \tag{mn} elements,
+\tag{mo} elements are generated,
+\texttt{mathvariant} is never set,
+\texttt{stretchy} is set to \texttt{true} if the operator is not on the list of default stretchy operators in the MathML specification,
+and the \texttt{lspace} and \texttt{rspace} attributes are set to zero.
+
+\subsubsection{\texttt{acc} kernel noads}
+Depending on the surrounding element containing the \texttt{acc} kernel noad, it is either stretchy or not.
+If it is stretchy, the same rules as for \texttt{delim} apply, except that \texttt{lspace} and \texttt{rspace} are not set.
+Otherwise, the \texttt{stretchy} attribute is set to \texttt{false} if the operator is on the list of default stretchy operators.
+
+\end{document}