diff --git a/support/luamml-algorithm.tex b/support/luamml-algorithm.tex index 507a0bd..0f06311 100644 --- a/support/luamml-algorithm.tex +++ b/support/luamml-algorithm.tex @@ -1,98 +1,98 @@ -\newcommand\Luamml{\pkg{Luamml}} -\newcommand\luamml{\pkg{luamml}} -\newcommand\xmltag[1]{\texttt{<#1>}} -\section{\Luamml's representation of XML and MathML} -In the following I assume basic familiarity with both Lua\TeX's representation of math noads and MathML. - -\subsection{Representation of XML elements} -In many places, \luamml\ passes around XML elements. Every element is represented by a Lua table. -Element \texttt 0 must always be present and is a string representing the tag name. -The positive integer elements of the table represent child elements (either strings for direct text content or nested tables for nested elements). -All string members which do not start with a colon are attributes, whose value is the result of applying \texttt{tostring} to the field value. -This implies that these values should almost always be strings, except that the value \texttt 0 (since it never needs a unit) can often be set as a number. -For example the XML document -\begin{verbatim} - - 0 - < - x - -\end{verbatim} -would be represented by the Lua table -\begin{verbatim} -{[0] = "math", block="display", - {[0] = "mn", "0"}, - {[0] = "mo", "<"}, - {[0] = "mi", mathvariant="normal", "x"} -} -\end{verbatim} - -\subsection{Expression cores} -MathML knows the concept of \enquote{embellished operators}: -\begin{blockquote} - The precise definition of an \enquote{embellished operator} is: - \begin{itemize} - \item an \xmltag{mo} element; - \item or one of the elements \xmltag{msub}, \xmltag{msup}, \xmltag{msubsup}, \xmltag{munder}, \xmltag{mover}, \xmltag{munderover}, \xmltag{mmultiscripts}, \xmltag{mfrac}, or \xmltag{semantics} (§ 5.1 Annotation Framework), whose first argument exists and is an embellished operator; - \item or one of the elements \xmltag{mstyle}, \xmltag{mphantom}, or \xmltag{mpadded}, such that an mrow containing the same arguments would be an embellished operator; - \item or an \xmltag{maction} element whose selected sub-expression exists and is an embellished operator; - \item or an \xmltag{mrow} whose arguments consist (in any order) of one embellished operator and zero or more space-like elements. - \end{itemize} -\end{blockquote} -For every embellished operator, MathML calls the \xmltag{mo} element defining the embellished operator the \enquote{core} of the embellished operator. - -\Luamml\ makes this slightly more general: Every expression is represented by a pair of two elements: The expression and it's core. -The core is always a \xmltag{mo}, \xmltag{mi}, or \xmltag{mn}, \texttt{nil} or s special marker for space like elements. - -If and only if the element is a embellished operator the core is a \xmltag{mo} element representing the core of the embellished operator. -The core is a \xmltag{mi} or a \xmltag{mn} element if and only if the element would be an embellished operator with this core if this element where a \xmltag{mo} element. -The core is the special space like marker for space like elements. Otherwise the core is \texttt{nil}. - -\subsection{Translation of math noads} -A math lists can contain the following node types: noad, fence, fraction, radical, accent, style, choice, ins, mark, adjust, boundary, whatsit, penalty, disc, glue, and kern. The \enquote{noads} - -\subsubsection{Translation of kernel noads} -The math noads of this list contain nested kernel noads. So in the first step, we look into how kernel nodes are translated to math nodes. - -\paragraph{\texttt{math_char} kernel noads} -First the family and character value in the \texttt{math_char} are used to lookup the Unicode character value of this \texttt{math_char}. -(For \texttt{unicode-math}, this is usually just the character value. Legacy maths has to be remapped based on the family.) -Then there are two cases: The digits \texttt{0} to \texttt{9} are mapped to \xmltag{mn} elements, everything else becomes a \xmltag{mi} element with \texttt{mathvariant} set to \texttt{normal}. -(The \texttt{mathvariant} value might get suppressed if the character defaults to mathvariant \texttt{normal}.) -In either case, the \texttt{tex:family} attribute is set to the family number if it's not \texttt{0}. - -The core is always set to the expression itself. E.g.\ the \texttt{math_char} kernel noad \verb+\fam3 a+ would become (assuming no remapping for this family) -\begin{verbatim} -{[0] = 'mi', - mathvariant = 'normal', - ["tex:family"] = 3, - "a" -} -\end{verbatim} - -\subsubsection{\texttt{sub_box} kernel noads} -I am open to suggestions how to convert them properly. - -\subsubsection{\texttt{sub_mlist} kernel noads} -The inner list is converted as a \xmltag{mrow} element, with the core being the core of the \xmltag{mrow} element. See the rules for this later. - -\subsubsection{\texttt{delim} kernel noads} -If the \texttt{small_char} is zero, these get converted as space like elements of the form -\begin{verbatim} -{[0] = 'mspace', - width = '1.196pt', -} -\end{verbatim} -where 1.196 is replaced by the current value of \verb+\nulldelimiterspace+ converted to \texttt{bp}. - -Otherwise the same rules as for \texttt{math_char} apply, -except that instead of \texttt{mi} or \xmltag{mn} elements, -\texttt{mo} elements are generated, -\texttt{mathvariant} is never set, -\texttt{stretchy} is set to \texttt{true} if the operator is not on the list of default stretchy operators in the MathML specification -nd \texttt{lspace} and \texttt{rspace} attributes are set to zero. - -\subsubsection{\texttt{acc} kernel noads} -Depending on the surrounding element containing the \texttt{acc} kernel noad, it is either stretchy or not. -If it's stretchy, the same rules as for \texttt{delim} apply, except that \texttt{lspace} and \texttt{rspace} are not set. -Otherwise the \texttt{stretchy} attribute is set to false if the operator is on the list of default stretchy operators. +% \newcommand\Luamml{\pkg{Luamml}} +% \newcommand\luamml{\pkg{luamml}} +% \newcommand\xmltag[1]{\texttt{<#1>}} +% \section{\Luamml's representation of XML and MathML} +% In the following I assume basic familiarity with both Lua\TeX's representation of math noads and MathML. +% +% \subsection{Representation of XML elements} +% In many places, \luamml\ passes around XML elements. Every element is represented by a Lua table. +% Element \texttt 0 must always be present and is a string representing the tag name. +% The positive integer elements of the table represent child elements (either strings for direct text content or nested tables for nested elements). +% All string members which do not start with a colon are attributes, whose value is the result of applying \texttt{tostring} to the field value. +% This implies that these values should almost always be strings, except that the value \texttt 0 (since it never needs a unit) can often be set as a number. +% For example the XML document +% \begin{verbatim} +% +% 0 +% < +% x +% +% \end{verbatim} +% would be represented by the Lua table +% \begin{verbatim} +% {[0] = "math", block="display", +% {[0] = "mn", "0"}, +% {[0] = "mo", "<"}, +% {[0] = "mi", mathvariant="normal", "x"} +% } +% \end{verbatim} +% +% \subsection{Expression cores} +% MathML knows the concept of \enquote{embellished operators}: +% \begin{blockquote} +% The precise definition of an \enquote{embellished operator} is: +% \begin{itemize} +% \item an \xmltag{mo} element; +% \item or one of the elements \xmltag{msub}, \xmltag{msup}, \xmltag{msubsup}, \xmltag{munder}, \xmltag{mover}, \xmltag{munderover}, \xmltag{mmultiscripts}, \xmltag{mfrac}, or \xmltag{semantics} (§ 5.1 Annotation Framework), whose first argument exists and is an embellished operator; +% \item or one of the elements \xmltag{mstyle}, \xmltag{mphantom}, or \xmltag{mpadded}, such that an mrow containing the same arguments would be an embellished operator; +% \item or an \xmltag{maction} element whose selected sub-expression exists and is an embellished operator; +% \item or an \xmltag{mrow} whose arguments consist (in any order) of one embellished operator and zero or more space-like elements. +% \end{itemize} +% \end{blockquote} +% For every embellished operator, MathML calls the \xmltag{mo} element defining the embellished operator the \enquote{core} of the embellished operator. +% +% \Luamml\ makes this slightly more general: Every expression is represented by a pair of two elements: The expression and it's core. +% The core is always a \xmltag{mo}, \xmltag{mi}, or \xmltag{mn}, \texttt{nil} or s special marker for space like elements. +% +% If and only if the element is a embellished operator the core is a \xmltag{mo} element representing the core of the embellished operator. +% The core is a \xmltag{mi} or a \xmltag{mn} element if and only if the element would be an embellished operator with this core if this element where a \xmltag{mo} element. +% The core is the special space like marker for space like elements. Otherwise the core is \texttt{nil}. +% +% \subsection{Translation of math noads} +% A math lists can contain the following node types: noad, fence, fraction, radical, accent, style, choice, ins, mark, adjust, boundary, whatsit, penalty, disc, glue, and kern. The \enquote{noads} +% +% \subsubsection{Translation of kernel noads} +% The math noads of this list contain nested kernel noads. So in the first step, we look into how kernel nodes are translated to math nodes. +% +% \paragraph{\texttt{math_char} kernel noads} +% First the family and character value in the \texttt{math_char} are used to lookup the Unicode character value of this \texttt{math_char}. +% (For \texttt{unicode-math}, this is usually just the character value. Legacy maths has to be remapped based on the family.) +% Then there are two cases: The digits \texttt{0} to \texttt{9} are mapped to \xmltag{mn} elements, everything else becomes a \xmltag{mi} element with \texttt{mathvariant} set to \texttt{normal}. +% (The \texttt{mathvariant} value might get suppressed if the character defaults to mathvariant \texttt{normal}.) +% In either case, the \texttt{tex:family} attribute is set to the family number if it's not \texttt{0}. +% +% The core is always set to the expression itself. E.g.\ the \texttt{math_char} kernel noad \verb+\fam3 a+ would become (assuming no remapping for this family) +% \begin{verbatim} +% {[0] = 'mi', +% mathvariant = 'normal', +% ["tex:family"] = 3, +% "a" +% } +% \end{verbatim} +% +% \subsubsection{\texttt{sub_box} kernel noads} +% I am open to suggestions how to convert them properly. +% +% \subsubsection{\texttt{sub_mlist} kernel noads} +% The inner list is converted as a \xmltag{mrow} element, with the core being the core of the \xmltag{mrow} element. See the rules for this later. +% +% \subsubsection{\texttt{delim} kernel noads} +% If the \texttt{small_char} is zero, these get converted as space like elements of the form +% \begin{verbatim} +% {[0] = 'mspace', +% width = '1.196pt', +% } +% \end{verbatim} +% where 1.196 is replaced by the current value of \verb+\nulldelimiterspace+ converted to \texttt{bp}. +% +% Otherwise the same rules as for \texttt{math_char} apply, +% except that instead of \texttt{mi} or \xmltag{mn} elements, +% \texttt{mo} elements are generated, +% \texttt{mathvariant} is never set, +% \texttt{stretchy} is set to \texttt{true} if the operator is not on the list of default stretchy operators in the MathML specification +% nd \texttt{lspace} and \texttt{rspace} attributes are set to zero. +% +% \subsubsection{\texttt{acc} kernel noads} +% Depending on the surrounding element containing the \texttt{acc} kernel noad, it is either stretchy or not. +% If it's stretchy, the same rules as for \texttt{delim} apply, except that \texttt{lspace} and \texttt{rspace} are not set. +% Otherwise the \texttt{stretchy} attribute is set to false if the operator is on the list of default stretchy operators.