\documentclass{article} \begin{document} \title{From math lists to MathML} \subtitle{The algorithm in luamml} \author{Marcel} \maketitle \section{General concepts} In the following I assume basic familiarity with both Lua\TeX's representation of math noads and MathML. \subsection{Representation of XML elements} In many places, \luamml\ passes around XML elements. Every element is represented by a Lua table. Element \texttt 0 must always be present and is a string representing the tag name. The positive integer elements of the table represent child elements (either strings for direct text content or nested tables for nested elements). All string members which do not start with a colon are attributes, whose value is the result of applying \texttt{tostring} to the field value. This implies that these values should almost always be strings, except that the value \texttt 0 (since it never needs a unit) can often be set as a number. For example the XML document \begin{verbatim} 0 < x \end{verbatim} would be represented by the Lua table \begin{verbatim} {[0] = "math", block="display", {[0] = "mn", "0"}, {[0] = "mo", "<"}, {[0] = "mi", mathvariant="normal", "x"} } \end{verbatim} \subsection{Expression cores} MathML knows the concept of \enquote{embellished operators}: \begin{blockquote} The precise definition of an \enquote{embellished operator} is: \begin{itemize} \item an \tag{mo} element; \item or one of the elements \tag{msub}, \tag{msup}, \tag{msubsup}, \tag{munder}, \tag{mover}, \tag{munderover}, \tag{mmultiscripts}, \tag{mfrac}, or \tag{semantics} (ยง 5.1 Annotation Framework), whose first argument exists and is an embellished operator; \item or one of the elements \tag{mstyle}, \tag{mphantom}, or \tag{mpadded}, such that an mrow containing the same arguments would be an embellished operator; \item or an \tag{maction} element whose selected sub-expression exists and is an embellished operator; \item or an \tag{mrow} whose arguments consist (in any order) of one embellished operator and zero or more space-like elements. \end{itemize} \end{blockquote} For every embellished operator, MathML calls the \tag{mo} element defining the embellished operator the \enquote{core} of the embellished operator. \Luamml\ makes this slightly more general: Every expression is represented by a pair of two elements: The expression and it's core. The core is always a \tag{mo}, \tag{mi}, or \tag{mn}, \texttt{nil} or s special marker for space like elements. If and only if the element is a embellished operator the core is a \tag{mo} element representing the core of the embellished operator. The core is a \tag{mi} or a \tag{mn} element if and only if the element would be an embellished operator with this core if this element where a \tag{mo} element. The core is the special space like marker for space like elements. Otherwise the core is \texttt{nil}. \section{Translation of math noads} A math lists can contain the following node types: noad, fence, fraction, radical, accent, style, choice, ins, mark, adjust, boundary, whatsit, penalty, disc, glue, and kern. The \enquote{noads} \subsection{Translation of kernel noads} The math noads of this list contain nested kernel noads. So in the first step, we look into how kernel nodes are translated to math nodes. \subsubsection{\texttt{math_char} kernel noads} First the family and character value in the \texttt{math_char} are used to lookup the Unicode character value of this \texttt{math_char}. (For \textt{unicode-math}, this is usually just the character value. Legacy maths has to be remapped based on the family.) Then there are two cases: The digits \texttt{0} to \texttt{9} are mapped to \tag{mn} elements, everything else becomes a \tag{mi} element with \texttt{mathvariant} set to \texttt{normal}. (The \texttt{mathvariant} value might get suppressed if the character defaults to mathvariant \texttt{normal}.) In either case, the \texttt{tex:family} attribute is set to the family number if it's not \texttt{0}. The core is always set to the expression itself. E.g.\ the \texttt{math_char} kernel noad \verb+\fam3 a+ would become (assuming no remapping for this family) \begin{verbatim} {[0] = 'mi', mathvariant = 'normal', ["tex:family"] = 3, "a" } \end{verbatim} \subsection{\texttt{sub_box} kernel noads} I am open to suggestions how to convert them properly. \subsection{\texttt{sub_mlist} kernel noads} The inner list is converted as a \tag{mrow} element, with the core being the core of the \tag{mrow} element. See the rules for this later. \subsection{\texttt{delim} kernel noads} If the \texttt{small_char} is zero, these get converted as space like elements of the form \begin{verbatim} {[0] = 'mspace', width = '1.196pt', } \end{verbatim} where 1.196 is replaced by the current value of \verb+\nulldelimiterspace+ converted to \texttt{bp}. Otherwise the same rules as for \texttt{math_char} apply, except that instead of \texttt{mi} or \tag{mn} elements, \texttt{mo} elements are generated, \texttt{mathvariant} is never set, \texttt{stretchy} is set to \texttt{true} if the operator is not on the list of default stretchy operators in the MathML specification nd \texttt{lspace} and \texttt{rspace} attributes are set to zero. \subsection{\texttt{acc} kernel noads} Depending on the surrounding element containing the \texttt{acc} kernel noad, it is either stretchy or not. If it's stretchy, the same rules as for \texttt{delim} apply, except that \texttt{lspace} and \texttt{rspace} are not set. Otherwise the \textt{stretchy} attribute is set to false if the operator is on the list of default stretchy operators. \end{document}