XML: The DNA of scientific articles




What does XML mean? Since the emergence of the Internet, we have heard about a format that lets machines read information easily: XML. But what does XML actually stand for?

The predecessors of XML are SGML and HTML. The first came into existence in the 1980s, when the main problem for electronic publishing was the lack of a common form in which to publish. The second emerged in the 1990s as a way to expand how information could be presented: layout, graphic design, etc. A few years later, XML appeared as a solution for representing the information itself.

XML stands for Extensible Markup Language. XML is a metalanguage; in other words, it is a language used to define other languages. It was developed by the World Wide Web Consortium (W3C). XML provides a uniform method for describing and exchanging structured data. It describes structure and semantics, not just the format of the information.

Why is the use of XML so important?

〉The content is kept separate from any decision about how the information is presented.

〉It is an international standard that is independent of platforms.

〉XML is an open format that any application can interpret.

〉XML can be exchanged between systems that were designed for this purpose.

Because XML is a metalanguage, anyone can create their own language. Numerous XML languages have been created for many purposes. Among the main ones are MathML for mathematics and VML for vector images.

For XML to serve as a means of communication, both parties need to know the same language; this is where it becomes necessary to agree on the vocabulary used to compose the information. A vocabulary is the set of words specific to a language or subject. In this context, the DTD [Document Type Definition] and the XSD [XML Schema Definition] describe the vocabulary of the language: they specify which tags may be used in documents and how.
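As a rough illustration of what "sharing a vocabulary" means, the Python sketch below checks whether a document uses only tags from an agreed list. It is not DTD or XSD validation: real schemas also constrain order, nesting and attributes. The `ALLOWED_TAGS` set, the `unknown_tags` helper and the `<record>` wrapper are all assumptions made up for this example.

```python
import xml.etree.ElementTree as ET

# Hypothetical vocabulary: the only tags this toy "language" allows.
ALLOWED_TAGS = {"record", "title", "year", "author"}


def unknown_tags(xml_text):
    """Return the sorted tag names in xml_text that fall outside the vocabulary."""
    root = ET.fromstring(xml_text)
    used = {element.tag for element in root.iter()}
    return sorted(used - ALLOWED_TAGS)


valid = "<record><title>The Digital Native Journal</title><year>2018</year></record>"
invalid = "<record><title>The Digital Native Journal</title><colour>blue</colour></record>"

print(unknown_tags(valid))    # []
print(unknown_tags(invalid))  # ['colour']
```

A document that uses a tag the other party does not know, such as `<colour>` here, cannot be interpreted reliably; this is the gap that a DTD or an XSD closes in a formal way.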

What is the difference between XML and JATS?

JATS [Journal Article Tag Suite] is the vocabulary used for scientific journals. This vocabulary defines the set of tags for marking up the data in scientific articles, among other kinds of documents in a given field. JATS is a technical standard of the National Information Standards Organization [NISO], published as NISO Z39.96-2012 and revised as NISO Z39.96-2015 [JATS 1.1; current standard]. It originally derives from a standard defined by the NLM [National Library of Medicine] in the USA.

To understand this language, we can think of a catalogue of data about the books in a public library. The cataloguing tool identifies and separates the entries in the database. At the same time, the information can be organized and sorted by year, theme, editor, author, etc. All the possibilities for organizing the database depend on tags. For example, a researcher's work is identified by the title, <title>The Digital Native Journal</title>; the year, <year>2018</year>; the format, <format>HTML</format>; the author, <author>cygnusmind</author>; the URL, <url></url>; the summary, <summary>……….</summary>; keywords; etc.
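The catalogue idea can be sketched in a few lines of Python. The sketch assumes that each entry is wrapped in a `<record>` element inside a `<catalogue>` root; neither wrapper appears in the example above, and the second and third records are invented purely so that there is something to group. The `group_by` helper is likewise hypothetical.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# The <catalogue> and <record> wrappers and the two extra entries are
# assumptions added for illustration only.
CATALOGUE = """
<catalogue>
  <record>
    <title>The Digital Native Journal</title>
    <year>2018</year><format>HTML</format><author>cygnusmind</author>
  </record>
  <record>
    <title>Hypothetical Second Entry</title>
    <year>2019</year><format>PDF</format><author>cygnusmind</author>
  </record>
  <record>
    <title>Hypothetical Third Entry</title>
    <year>2018</year><format>PDF</format><author>another author</author>
  </record>
</catalogue>
""".strip()


def group_by(xml_text, tag):
    """Group record titles by the text of the given child tag."""
    groups = defaultdict(list)
    for record in ET.fromstring(xml_text).iter("record"):
        groups[record.findtext(tag)].append(record.findtext("title"))
    return dict(groups)


by_year = group_by(CATALOGUE, "year")
by_author = group_by(CATALOGUE, "author")
print(by_year)
print(by_author)
```

Because every field is tagged, the same records can be regrouped by year, author or format without touching the data itself; only the tag name passed to `group_by` changes.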

However, a library catalogue only covers what we know as FRONT. This is the part that identifies general information such as the author and institutional affiliation, including the summary and keywords.

BODY refers to almost the entire text: it tags sections such as introduction, methods and results, as well as the sequence of paragraphs, the handling of tables, the rendering of formulas, the position of images within the text, etc.

BACK, or the end of the article, is where the tagging finishes, in sections such as the reference list or the bibliography consulted. Those sections are of great importance for creating links within the text: they make it possible to identify citations and to produce today's well-known and valued bibliometric indicators such as the Journal Impact Factor (JIF), SCImago Journal Rank (SJR), CiteScore (CS), h-index, etc. It is important to point out that dozens of bibliometric indicators are derived from references and from multiple ways of grouping data (author, country, institution, etc.), but such results or end products can be generated only if the information is identified.
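To make the three parts concrete, here is a minimal Python sketch that reads a tiny JATS-style article and counts how often each reference is cited in the text. The element names (`front`, `article-meta`, `article-title`, `body`, `sec`, `xref`, `back`, `ref-list`, `ref`, `mixed-citation`) follow the JATS vocabulary, but the article content, the reference texts and the ids `B1` and `B2` are placeholders invented for the example, and a real JATS file carries far more metadata.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# A deliberately tiny JATS-style article; all content is placeholder text.
ARTICLE = """
<article>
  <front>
    <article-meta>
      <title-group>
        <article-title>XML: The DNA of scientific articles</article-title>
      </title-group>
    </article-meta>
  </front>
  <body>
    <sec>
      <title>Introduction</title>
      <p>XML describes structure <xref ref-type="bibr" rid="B1">[1]</xref> and
      travels between systems <xref ref-type="bibr" rid="B2">[2]</xref>. It also
      preserves content <xref ref-type="bibr" rid="B1">[1]</xref>.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="B1"><mixed-citation>Placeholder reference one.</mixed-citation></ref>
      <ref id="B2"><mixed-citation>Placeholder reference two.</mixed-citation></ref>
    </ref-list>
  </back>
</article>
""".strip()

root = ET.fromstring(ARTICLE)

# FRONT: general information about the article.
title = root.findtext("front/article-meta/title-group/article-title")

# BODY: the sections of the text.
section_titles = [sec.findtext("title") for sec in root.iterfind("body/sec")]

# BACK: the reference list, indexed by each reference's id.
references = {
    ref.get("id"): ref.findtext("mixed-citation")
    for ref in root.iterfind("back/ref-list/ref")
}

# The link between BODY and BACK: each bibliographic <xref> points at a <ref> id.
citation_counts = Counter(
    xref.get("rid")
    for xref in root.find("body").iter("xref")
    if xref.get("ref-type") == "bibr"
)

print(title)
print(section_titles)
for ref_id, count in sorted(citation_counts.items()):
    print(ref_id, count, references[ref_id])
```

Counting citations per reference is the simplest possible citation indicator; the point is only that it becomes computable once citations and references are tagged and linked.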

To appreciate the complexity and extent of the JATS tagging vocabulary, it is worth mentioning that more than 200 metadata elements exist, although many of them are not used consistently. Even so, the more that is tagged, the more semantic information is captured and the more interoperability is enhanced.

In addition, XML markup helps in several respects:

  1. Data recovery: how to stand out amid the magnitude of information found on the Web, which grows exponentially from day to day. Tagging and markup give search engines, with their “robots” or “spiders”, greater possibilities to find content immediately and efficiently. The markup itself provides such information: for example, via metadata, search engines can identify the conditions of use of a piece of content by recognizing Creative Commons licenses.
  2. Digital identity: In the digital world, identity is a somewhat worn-out and controversial notion. It is the path and the trace left on the Web, a consequence of how content relates to other content. In the scientific and academic context, identity most often means the visibility and positioning tied to the resources obtained, that is, the identity conveyed by metadata. Obtaining the right metadata and defining a framework for success is essential to stand out and to present the best scientific, academic, technical and cultural production. In a few words, it is what attaches an identity [e.g. of a journal, author, institution] to the Web.
  3. Digital preservation: The preservation of printed documents has been a real problem to solve around the world. Digital preservation, however, implies the ability to access stored data, in addition to backing it up. Because XML is a standardized format, it can still be decoded over time and used with tools or software created in the future.
  4. Indexing: Current academic production faces pressure to keep up with indexes and to demonstrate quality and compliance with international standards. This requires some degree of interaction with the indexes, which involves exchanging information in XML files, ranging from files tagged only with FRONT or BACK, for the use of references in citation analysis processes, up to the full text tagged in XML.
  5. Visualization and ubiquity: when it is said that XML is the DNA of the scientific article, it means that XML is the code behind the scientific information, which by itself supports multiple processes, including the on-demand generation of reading formats such as PDF, HTML and ePUB, among others.
  6. Interoperability: XML is the format par excellence for scientific exchange, allowing a substantial increase in the visibility and reach of content published on the Web.
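Two of the points above, recognizing a license through metadata and generating a reading format from the XML, can be sketched in Python as follows. The article is a placeholder, the license URL is only an example value, and the `to_html` helper is a hypothetical, heavily simplified stand-in for a real JATS-to-HTML conversion, which would normally be done with a full stylesheet.

```python
import xml.etree.ElementTree as ET
from html import escape

XLINK_HREF = "{http://www.w3.org/1999/xlink}href"

# Placeholder article; the license URL is an example value, not a statement
# about how any real article is licensed.
ARTICLE = """
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <article-meta>
      <title-group>
        <article-title>XML: The DNA of scientific articles</article-title>
      </title-group>
      <permissions>
        <license xlink:href="https://creativecommons.org/licenses/by/4.0/"/>
      </permissions>
    </article-meta>
  </front>
  <body>
    <sec>
      <title>Introduction</title>
      <p>XML separates content from presentation.</p>
    </sec>
  </body>
</article>
""".strip()

root = ET.fromstring(ARTICLE)

# Data recovery: a machine can read the conditions of use from the metadata.
license_url = root.find("front/article-meta/permissions/license").get(XLINK_HREF)
is_creative_commons = license_url.startswith("https://creativecommons.org/")


def to_html(article):
    """Render the title, section headings and paragraphs as simple HTML."""
    title = article.findtext("front/article-meta/title-group/article-title")
    parts = [f"<h1>{escape(title)}</h1>"]
    for sec in article.iterfind("body/sec"):
        parts.append(f"<h2>{escape(sec.findtext('title'))}</h2>")
        for paragraph in sec.iterfind("p"):
            parts.append(f"<p>{escape(''.join(paragraph.itertext()))}</p>")
    return "\n".join(parts)


html_page = to_html(root)
print(is_creative_commons)
print(html_page)
```

The same parsed tree feeds both the license check and the HTML output, which is the sense in which a single XML source can support several processes at once.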

For these reasons, we have no hesitation in saying that XML is the DNA on which the work of The Digital Native Journal rests.

How to cite this article: Cygnusmind (2019). XML: The DNA of scientific articles. Retrieved from Cygnusmind´s Blog:
