Re: PDF vs Markup Languages

From: Peter Murray-Rust <peter.murray-rust_at_NOTTINGHAM.AC.UK>
Date: Fri, 11 Sep 1998 06:16:21 -0400

[I have just discovered this list - please excuse me if I post inappropriately].

I would like to argue very strongly for the introduction and support of XML as
*the* standard for scientific publishing. XML has the power not only to create
structured documents but to supply *interoperable* domain-specific markup in a
way that no other technology can easily do. Thus the W3C and the AMS have collaborated
in producing MathML (Mathematics Markup Language), and I have
developed a similar approach for molecular sciences (Chemical Markup Language, CML).
Using the recent W3C namespace proposal it is
possible to combine MathML and CML with (x)HTML to (say) produce a full publication on
physical chemistry.
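
To make the namespace idea concrete, here is a minimal sketch in Python (standard library only) of one document mixing the three vocabularies. The element names, the sample content and especially the CML namespace URI are my own placeholders for illustration, not taken from the CML or MathML specifications as they stand; the XHTML and MathML URIs are the ones the W3C later published.

```python
import xml.etree.ElementTree as ET

# Prefix -> namespace URI. The CML URI is a hypothetical placeholder.
NS = {
    "h": "http://www.w3.org/1999/xhtml",
    "m": "http://www.w3.org/1998/Math/MathML",
    "c": "http://example.org/cml",  # assumption, for illustration only
}

# One document, three vocabularies, kept apart by namespace prefixes.
DOC = """\
<h:html xmlns:h="http://www.w3.org/1999/xhtml"
        xmlns:m="http://www.w3.org/1998/Math/MathML"
        xmlns:c="http://example.org/cml">
  <h:body>
    <h:p>The ideal gas law:</h:p>
    <m:math><m:mi>p</m:mi><m:mi>V</m:mi><m:mo>=</m:mo><m:mi>n</m:mi><m:mi>R</m:mi><m:mi>T</m:mi></m:math>
    <h:p>applies approximately to:</h:p>
    <c:molecule id="methane" title="methane"/>
    <c:molecule id="argon" title="argon"/>
  </h:body>
</h:html>
"""


def count_by_namespace(xml_text):
    """Count elements per namespace URI in a mixed-vocabulary document."""
    root = ET.fromstring(xml_text)
    counts = {}
    for el in root.iter():
        # ElementTree stores qualified names as '{uri}localname'.
        uri = el.tag[1:].split("}")[0] if el.tag.startswith("{") else ""
        counts[uri] = counts.get(uri, 0) + 1
    return counts


def molecule_ids(xml_text):
    """Pick out only the chemistry components, ignoring text and maths."""
    root = ET.fromstring(xml_text)
    return [m.get("id") for m in root.iterfind(".//c:molecule", NS)]


if __name__ == "__main__":
    print(count_by_namespace(DOC))
    print(molecule_ids(DOC))
```

The point of the sketch is only that a chemistry-aware tool can find its own elements without knowing anything about the maths or the hypertext around them.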

XML is capable of supporting full publications and I have shown this in chemistry
(an interactive demo is being mounted, alongside a number of static presentations).
Scientific papers consist of a relatively
small number of components such as: metadata, hypertext, links, numeric quantities
(with units and errors), graphs, tables, images/pixmaps, bibliography. XML supports
the creation of specifications (DTDs or schemas) and tools for each of these in
a parallel and independent manner. Together with maths and chemistry, most of the
components of a scientific paper are now amenable to construction in XML.

An overwhelming advantage of XML over other formats is that papers can be searched and
their components re-used. Thus if a scientific paper is marked up at a detailed
level it is possible to give generic requests like "calculate the energies
for all molecules in this paper", "are there any first-order differential equations?",
"convert all quantities to SI units". The distinction between documents and data
disappears for the first time. Different disciplines differ in their emphasis on
data capture as part of the publication process. My own (structural biology and
crystallography) are extremely keen on data capture. In my view XML is a radical
tool and revolutionary (in the political sense). It is now technically possible, with
a combined authoring/browsing tool, for authors and readers to communicate directly.
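
As a rough illustration of a generic request such as "convert all quantities to SI units", the Python sketch below walks a small marked-up paper and rewrites each quantity in place. The <paper> and <quantity> elements and the "units" attribute are hypothetical names invented for this example, and the conversion table covers only the three units that appear in it.

```python
import xml.etree.ElementTree as ET

# Hypothetical markup: each numeric quantity carries a "units" attribute.
PAPER = """\
<paper>
  <para>The bond length is <quantity units="angstrom">1.54</quantity>.</para>
  <para>The barrier is <quantity units="kcal/mol">12.0</quantity>.</para>
  <para>The cell edge is <quantity units="nm">0.5</quantity>.</para>
</paper>
"""

# Source unit -> (multiplication factor, SI unit).
# 1 kcal = 4184 J uses the thermochemical calorie.
TO_SI = {
    "angstrom": (1e-10, "m"),
    "nm": (1e-9, "m"),
    "kcal/mol": (4184.0, "J/mol"),
}


def convert_to_si(xml_text):
    """Return the parsed tree with every known quantity rewritten in SI."""
    root = ET.fromstring(xml_text)
    for q in root.iter("quantity"):
        units = q.get("units")
        if units in TO_SI:
            factor, si_unit = TO_SI[units]
            q.text = repr(float(q.text) * factor)
            q.set("units", si_unit)
        # Quantities with unknown units are left untouched.
    return root


if __name__ == "__main__":
    converted = convert_to_si(PAPER)
    for q in converted.iter("quantity"):
        print(q.text, q.get("units"))
```

The same loop could just as well feed the numbers to a calculation instead of rewriting them, which is what I mean by the document also being the data.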

My dream is that combined author/browsers will be developed to support the XML publication
process. I have created a prototype (JUMBO2), and this is offered
under the Open Source approach - probably under an Artistic
License when I have modified it. That means that anyone can take the source and
modify it with a few reasonable restrictions on preserving authors' moral rights.
JUMBO2 is offered as the starting point for a communal XML activity - it's *generic*
(i.e. will manage any XML application, not just chemistry). An additional technology
that could make a large impact on scientific publishing is linking (via XLink) to
controlled vocabularies and dictionaries maintained by learned societies and similar
organisations.

Apart from the evangelism, therefore, I'd be very grateful to hear from any like-minded
people who would like to support the direct XML-based communication of author<-->reader
(after all, they are usually the same). And these tools, if universally adopted,
would help cut *technical* publication costs enormously. Papers can be syntactically
validated before submission and may also have some semantic validity applied
client-side (e.g. 'reasonable values of data'). They should also help referees -
XML manuscripts could be annotated in situ with the annotations being available
to the editor but excisable before publication (if required).
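
A minimal Python sketch of what such client-side checks might look like follows. It is deliberately modest: the standard library parser only tests well-formedness, which stands in here for full validation against a DTD, and the single semantic rule (no negative temperatures in kelvin) is an example I have made up. The <temperature> and <annotation> element names are likewise hypothetical.

```python
import xml.etree.ElementTree as ET


def check_manuscript(xml_text):
    """Return a list of problems; an empty list means all checks passed."""
    try:
        root = ET.fromstring(xml_text)
    except ET.ParseError as err:
        # Syntactic failure: nothing else can be checked.
        return [f"not well-formed: {err}"]

    problems = []
    # Example semantic rule: absolute temperatures cannot be negative.
    for t in root.iter("temperature"):
        if t.get("units") != "K":
            continue
        try:
            value = float(t.text)
        except (TypeError, ValueError):
            problems.append(f"temperature is not a number: {t.text!r}")
            continue
        if value < 0:
            problems.append(f"negative temperature in kelvin: {value}")
    return problems


def strip_annotations(xml_text):
    """Return the manuscript with every <annotation> element removed.

    Element.remove also drops the text that follows the element (its
    tail), so this sketch assumes annotations are not placed mid-sentence.
    """
    root = ET.fromstring(xml_text)
    for parent in list(root.iter()):
        for child in list(parent):
            if child.tag == "annotation":
                parent.remove(child)
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    draft = (
        "<paper><temperature units='K'>298.15</temperature>"
        "<annotation>Referee 1: was this measured?</annotation></paper>"
    )
    print(check_manuscript(draft))
    print(strip_annotations(draft))
```

An author would run the first function before submitting; an editor would run the second on the refereed copy before it goes to publication.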

I hope this isn't too passionate. I have argued strongly (see vsms/talks) for a
greater openness in the publication process and believe that innovative
experimentation is critical at this stage.

Peter Murray-Rust
Director, Virtual School of Molecular Sciences
University of Nottingham, UK (Virtual HyperGlossary) (CML)
Received on Tue Aug 25 1998 - 19:17:43 BST

This archive was generated by hypermail 2.3.0 : Fri Dec 10 2010 - 19:45:26 GMT