Hermes - a semantic XML+MathML+Unicode e-publishing/self-archiving tool for LaTeX-authored scientific articles
Download latest version: 0.9.12, released on 28 Nov. 2006
Examples
Some results of Hermes-assisted conversions are hosted here (the Max Planck Society and Springer could not afford hosting these examples); the source distribution also contains an article in LaTeX source, as well as a content-oriented source sample.
What is Hermes?
What for?
Hermes is here to help individuals with self-archiving, libraries with long-term archiving, and publishers with obtaining a reference document for their various specific services.
How does it work?
Hermes follows the steps below, in the specified order:
- semantically seeds a copy of your TeX source
- lets the TeX program do its job (texing) on this semantically enriched source
- parses the resulting semantic dvi
- generates the XML reference document, a semantic XML reflection of your TeX source.
It works on Linux, Windows and OS X.
What is the Hermes reference document?
It is a Unicode XML document with a generic structure, containing free text and various XML vocabularies.
It contains the semantics Hermes managed to recover from the LaTeX source.
Its validating XML Schema will be published once this generic structure becomes less fluid.
Currently, the generic structure consists of:
- sections,
- presentation hints (currently font names and sizes),
- free text ((accented) TeX glyphs mapped to their Unicode equivalents),
- metadata (title, author, date etc.),
- bibliography,
- internal and external references (no need for special LaTeX packages to get these activated in the XML),
- tables, images.
These items are in a one-to-one relationship with the corresponding
structures in the source/semantic dvi. This list is extensible: LaTeX environments automatically produce an XML structure.
The XML vocabularies reflect the vocabularies used in the LaTeX source, e.g. mathematical regions in the LaTeX source correspond to MathML regions in the reference document.
MathML is the only validatable XML vocabulary currently implemented and supported by Hermes (SVG and other vocabularies, like MARC or other open standards, may follow if users are interested).
Of MathML, only MathML-presentation is generated if Hermes is used to translate legacy LaTeX files without manual intervention on the source (here, by legacy LaTeX files I mean sources which were not edited with semantic vocabularies in mind).
MathML-content can only be generated if a newly authored LaTeX source uses the semantic LaTeX macros available in the Hermes distribution.
- the automatic generation of MathML-presentation is possible only if the LaTeX math expressions are originally well-formed, that is, made of balanced expressions (paired delimiters); this should not be an issue, because typing mathematics in LaTeX is a commitment to a controlled vocabulary anyway;
- use of the '\frac' macro is encouraged over the '\over' macro (however, the seed utility delimits the regions covered by the '\over' macro, bringing it closer to the effects of a '\frac' macro).
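As an illustration of the two points above, here is a small LaTeX fragment. The expressions are made up for this example and are not taken from the Hermes distribution.

```latex
% Balanced expressions: every \left has its \right, every { has its }.
\[
  \left( \frac{a+b}{c} \right)^{2}
\]

% Encouraged: \frac marks explicitly where numerator and denominator start and end.
\[
  \frac{x+1}{y-1}
\]

% Discouraged: with \over, the extent of the fraction is only implied
% by the enclosing group.
\[
  {x+1 \over y-1}
\]
```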
Installation requirements
A standard LaTeX system, gcc, bison, flex, make and libxml/xslt should be on your system in order to compile the program and produce the proper example output. Windows developers can check out the Cygwin distribution; Windows users will have a binary distribution (hermes.exe and seed.exe), issued (almost) synchronously with the source distribution.
Developers and Unix users can unpack the source distro and run make.
After a successful 'make' you get:
- hermes and seed binaries;
- content.s.dvi - the semantic dvi result of a latex run on content.s.tex, which, in turn, is produced from content.tex by seed;
- content.xml - the reference document (XML+MathML-content) obtained by using Hermes semantic TeX macros;
- content.pub.xml - a renderable transformation of the reference document as an XHTML file with embedded MathML content;
- the Hermes stylesheet, pub.xslt, is used in this transformation, but you can use your own for different results/looks;
- the same goes for the other example file, article.tex (i.e. you will get article.pub.xml, the renderable instance of article.lib.xml, the reference document Hermes generated from the source).
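A quick way to confirm that 'make' produced the files listed above is sketched below. The function name check_build is made up for this example; the file names are the ones from the list, and on Windows the binaries would be hermes.exe and seed.exe instead.

```sh
# check_build [DIR]: report which of the files listed above are missing
# in DIR (default: the current directory). Returns non-zero if any is missing.
check_build() {
    dir=${1:-.}
    missing=0
    for f in hermes seed content.s.dvi content.xml content.pub.xml; do
        if [ ! -e "$dir/$f" ]; then
            echo "missing: $f"
            missing=1
        fi
    done
    return "$missing"
}
```

Run it from the unpacked source directory as: check_build .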
General use
Follow the steps below:
'Validate' your source:
- write an (AMS)LaTeX text containing mathematical expressions; LaTeX it and fix all your editing errors ;).
- run latex document.tex; if you didn't get a dvi, return to the first step.
Use Hermes to get the reference document (library) and renderable (publish) XML files:
- run ./seed document.tex; if you didn't get document.s.tex, go to found-a-bug.
- run latex document.s.tex; if you didn't get a document.s.dvi, go to found-a-bug.
- run ./hermes document.s.dvi > document.lib.xml; if you didn't get a document.lib.xml, go to found-a-bug.
- run xsltproc pub.xslt document.lib.xml > document.pub.xml; if you didn't get a document.pub.xml, go to found-a-bug.
- now you can archive or send document.lib.xml to your library, and post your document.pub.xml on your website, along with the MathML-stylesheets for others to read/reuse.
found-a-bug:
fix it :).
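The Hermes steps above can be chained in a small shell script. This is only a sketch: the function names (hermes_names, need_file, hermes_convert) are made up for this example, and it assumes that seed, hermes and pub.xslt sit in the current directory, that the source file is there too, and that file names contain no whitespace. Since it is not documented here whether seed and hermes signal failure through their exit status, the sketch checks for the expected output file after each step, as the steps above do.

```sh
#!/bin/sh
# hermes_names BASE: print the four file names the steps are expected to produce.
hermes_names() {
    printf '%s\n' "$1.s.tex" "$1.s.dvi" "$1.lib.xml" "$1.pub.xml"
}

# need_file FILE: the "found-a-bug" check, i.e. is there a non-empty FILE?
need_file() {
    [ -s "$1" ] && return 0
    echo "found-a-bug: $1 was not produced" >&2
    return 1
}

# hermes_convert FILE.tex: run seed, latex, hermes and xsltproc in order,
# stopping at the first step that does not produce its output file.
hermes_convert() {
    base=${1%.tex}

    # Remove stale outputs so a leftover file cannot hide a failed step.
    for f in $(hermes_names "$base"); do
        rm -f "$f"
    done

    ./seed "$base.tex"
    need_file "$base.s.tex" || return 1

    latex "$base.s.tex"
    need_file "$base.s.dvi" || return 1

    ./hermes "$base.s.dvi" > "$base.lib.xml"
    need_file "$base.lib.xml" || return 1

    xsltproc pub.xslt "$base.lib.xml" > "$base.pub.xml"
    need_file "$base.pub.xml" || return 1
}
```

After sourcing the script, a conversion would be started with: hermes_convert document.tex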
Architecture of Hermes
- a set of helper (La)TeX macros (the 'dlt.tex' file),
- a scanner, written for flex, tokenizing the semantic dvi file,
- a parser, written for bison; the grammar generates the XML output.
Developer's tips
- does not replace or modify the functionality of the TeX engine, so it should not restrict the set of macros used for authoring: it uses the dvi format as its input (it relies on the transparency of the TeX '\special' command).
- does NOT make inferences for MathML-content; instead, Hermes provides a set of LaTeX authoring macros (called Hermes semantic macros) to enable an author to write mathematical expressions which are covered by the MathML-content standard (not tested extensively).
- (almost) preserves the presentational output of the original source documents. Remember, Hermes is intended to produce a document whose semantics are equivalent to those of the (La)TeX source, but which is better suited to long-term archiving or publisher processing; the final look depends entirely on the stylesheet you use to create a renderable instance of this document.
- gives authors the freedom to semantically enhance (parts of) their original document at their own pace: Hermes can generate a mixture of MathML-presentation and MathML-content. It is easily extensible to generate a mixture of other controlled vocabularies too.
- all the glyphs in the following TeX fonts are mapped to their Unicode (UTF-8 encoded) counterparts: fonts having standard names as specified in fontname, and the following fonts with non-standard names: cm.., ams, px.., tx.., ec.., tc.., ty.., euf.., [l,w]asy. If the source document uses a font which is not in this list, Hermes dies noisily (listing the fonts which are not mapped yet) before even parsing the text. The list of supported fonts with non-standard names can easily be extended in future versions, at users' request (for any glyph which has a Unicode equivalent).
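The list of non-standard font names above can be read as a set of name prefixes. The sketch below encodes that reading as a shell pattern; it is not how Hermes itself performs the check. The function name is_listed_font is made up for this example, and treating 'ams' as a literal prefix is an assumption. Fonts with standard fontname names are also supported, so a 'no' from this function does not by itself mean that Hermes will reject the font.

```sh
# is_listed_font NAME: succeed if NAME starts with one of the non-standard
# prefixes listed above (cm.., ams, px.., tx.., ec.., tc.., ty.., euf.., [l,w]asy).
is_listed_font() {
    case $1 in
        cm*|ams*|px*|tx*|ec*|tc*|ty*|euf*|lasy*|wasy*) return 0 ;;
        *) return 1 ;;
    esac
}
```

For example, is_listed_font cmr10 succeeds, while is_listed_font ptmr7t fails because that name matches none of the listed prefixes.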
To do
- test Hermes on various collections of TeX documents (arxiv)
- refine the LaTeX document structure Hermes is aware of
- refine the presentation oriented information
- check the completeness of the content oriented macros (used to generate MathML content) provided with the distribution
- add domain specific controlled vocabularies
Credits
Hermes is covered by the GNU GPL, and developed by Romeo Anghelache.
It was created in the EU-funded MoWGLI research project (ended in Feb. 2005), as a task for Living Reviews in Relativity, from the Max Planck Institute for Gravitational Physics, Berlin suburbs area, Germany.
Since June 2015, the LRR journal has become the property of Springer, as anything public in capitalism is bound to become someone's private property/profit.
Its further development was partially supported by:
- Max Planck Institute for Gravitational Physics, January - June 2006,
- EDPSciences, April 2006,
- Design Science, July - October 2006.