XML: The DNA of scientific articles

www.cygnusmind.com

by Cygnusmind.com

ARTICLE

What does XML mean? Since the emergence of the Internet we have heard about an information language that machines can read easily: the XML format. But what does XML actually stand for?

The predecessors of XML are the SGML and HTML formats. The first came into existence in the 1980s, when the main problem for electronic publishing was finding a common form in which to publish. The second emerged in the 1990s as a way to expand how information could be presented: layout, graphic design, etc. A few years later, the XML format appeared as a solution for representing information.

XML stands for Extensible Markup Language. XML is a metalanguage; in other words, it is a language used to define other languages. It was developed by the World Wide Web Consortium (W3C). XML provides a uniform method for describing and exchanging structured data. It describes structure and semantics, not just the format of the information.

Why is the use of XML so important?

〉Content is kept separate from any notion of how the information is presented.

〉It is an international standard, independent of platforms.

〉XML is an open format that any application can interpret.

〉XML can be exchanged between systems that were designed for this purpose.

Because XML is a metalanguage, anyone can create their own language. Numerous XML languages have been created for many purposes. Among the main ones are MathML for mathematics and VML for vector images.

For XML to serve as a form of communication, both parties need to speak the same language; this is where agreeing on a shared vocabulary to compose the information becomes necessary. A vocabulary is the set of words specific to a language or subject. In this context, the DTD [Document Type Definition] and the XSD [XML Schema Definition] describe the vocabulary used in the language: they specify which tags may be used in a document and how.
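As a rough illustration of what a DTD or XSD does, the following Python sketch checks a document against a tiny hand-written vocabulary. The `VOCABULARY` table and the element names are hypothetical, and this simplified check is only a stand-in: a real workflow would rely on a validating parser and the actual DTD or XSD.

```python
import xml.etree.ElementTree as ET

# Hypothetical vocabulary: each element name maps to the children it may contain.
VOCABULARY = {
    "record": {"title", "year", "author"},
    "title": set(),
    "year": set(),
    "author": set(),
}

def find_violations(xml_text):
    """Return messages for every tag the vocabulary does not allow."""
    root = ET.fromstring(xml_text)
    if root.tag not in VOCABULARY:
        return [f"unknown root element <{root.tag}>"]
    problems = []
    # Walk every element and compare its children with the allowed set.
    for parent in root.iter():
        allowed = VOCABULARY.get(parent.tag, set())
        for child in parent:
            if child.tag not in allowed:
                problems.append(f"<{child.tag}> is not allowed inside <{parent.tag}>")
    return problems

valid = "<record><title>XML</title><year>2019</year></record>"
invalid = "<record><title>XML</title><colour>blue</colour></record>"

print(find_violations(valid))
print(find_violations(invalid))
```

Both documents are well-formed XML, but only the first one speaks the agreed vocabulary; that distinction is what a shared DTD or XSD makes checkable by machine.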

What is the difference between XML and JATS?

JATS [Journal Article Tag Suite] is the vocabulary used for scientific journals. It defines the set of tags for marking up the data in scientific articles, among other specific documents, in any field. JATS is a technical standard of the National Information Standards Organization [NISO], first published as NISO Z39.96-2012 and revised as NISO Z39.96-2015 (JATS 1.1). It derives from a standard originally defined by the NLM [National Library of Medicine] in the USA.

To understand this language, think of the catalogue of the books in a public library. The cataloguing tool identifies and separates the records in the database. At the same time, the information can be organized by year, theme, editor, author, etc. All the possibilities for organizing the database depend on tags. For example, a researcher’s work is identified by the title, <title>The Digital Native Journal</title>; the year, <year>2018</year>; the format, <format>HTML</format>; the author, <author>cygnusmind</author>; the URL, <url>https://www.cygnusmind.com/blog/</url>; the summary, <summary>……….</summary>; keywords, etc.
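To see how a machine reads such a catalogue entry, here is a minimal Python sketch using the standard library parser. The wrapping `<record>` element is an assumption made for the example (the tags need a single root to form a document), and the summary is left out because its content is not given.

```python
import xml.etree.ElementTree as ET

# The tags come from the example above; the <record> wrapper is hypothetical.
RECORD = """
<record>
  <title>The Digital Native Journal</title>
  <year>2018</year>
  <format>HTML</format>
  <author>cygnusmind</author>
  <url>https://www.cygnusmind.com/blog/</url>
</record>
"""

def read_record(xml_text):
    """Turn each child tag of the record into a dictionary entry."""
    root = ET.fromstring(xml_text)
    return {child.tag: (child.text or "").strip() for child in root}

fields = read_record(RECORD)
for name, value in fields.items():
    print(f"{name}: {value}")
```

Because every value is labeled by its tag, the program never has to guess which string is the title and which is the author.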

However, a library catalogue only captures what we know as FRONT. This is the part that identifies general information such as the author and institutional affiliation, including the summary and keywords.

BODY covers almost the entire text: it tags sections such as the introduction, methods and results, as well as the sequence of paragraphs, the handling of tables, the rendering of formulas, the position of images within the text, etc.

BACK, the end of the article, is where the tagging concludes, in sections such as the reference list or bibliography. Those sections are of high importance for creating links within the text: they make it possible to identify citations and to produce today’s well-known and valued bibliometric indicators such as the Journal Impact Factor (JIF), SCImago Journal Rank (SJR), CiteScore (CS), h-index, etc. It is important to point out that dozens of bibliometric indicators are derived from references and from multiple ways of grouping the data (author, country, institution, etc.), but such results or end products can only be generated if the information is identified.
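The three parts can be sketched in a few lines of Python. The document below is a heavily reduced, hypothetical article whose element names follow common JATS usage (`front`, `body`, `back`, `xref`, `ref-list`); the titles and references are placeholders, and a real JATS file carries far more metadata.

```python
import xml.etree.ElementTree as ET

# Hypothetical, minimal JATS-style article with placeholder content.
ARTICLE = """
<article>
  <front>
    <article-meta>
      <title-group><article-title>Sample title</article-title></title-group>
      <kwd-group><kwd>XML</kwd><kwd>JATS</kwd></kwd-group>
    </article-meta>
  </front>
  <body>
    <sec><title>Introduction</title>
      <p>Earlier work <xref ref-type="bibr" rid="B1">[1]</xref> is discussed.</p>
    </sec>
    <sec><title>Methods</title><p>Placeholder text.</p></sec>
  </body>
  <back>
    <ref-list>
      <ref id="B1"><mixed-citation>Placeholder reference one.</mixed-citation></ref>
      <ref id="B2"><mixed-citation>Placeholder reference two.</mixed-citation></ref>
    </ref-list>
  </back>
</article>
"""

root = ET.fromstring(ARTICLE)

# FRONT: general information about the article.
title = root.findtext("front/article-meta/title-group/article-title")
keywords = [kwd.text for kwd in root.iter("kwd")]

# BODY: the sections of the text, in reading order.
sections = [sec.findtext("title") for sec in root.findall("body/sec")]

# BACK: the reference list, keyed by the id that citations point to.
references = {ref.get("id"): ref.findtext("mixed-citation") for ref in root.iter("ref")}

# Link each in-text citation to its reference through the rid attribute.
cited = [x.get("rid") for x in root.find("body").iter("xref")
         if x.get("ref-type") == "bibr"]
uncited = sorted(set(references) - set(cited))

print(title, keywords, sections)
print("cited:", cited, "uncited:", uncited)
```

The link between an `xref` in the body and a `ref` in the back is the kind of identified information that citation counting builds on.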

To appreciate the complexity and breadth of the JATS tagging vocabulary, it is worth mentioning that more than 200 metadata elements exist, although many of them are not used consistently. Even so, more semantic information is tagged and interoperability is enhanced.

Besides, XML markup helps in several respects:

  1. Data retrieval: how to stand out amid the magnitude of information on the Web, which grows exponentially from day to day. Tagging and markup give search engines, with their “robots” or “spiders”, greater possibilities to find content immediately and efficiently. Metadata serves exactly this purpose: for example, search engines can identify the conditions of use of a piece of content by recognizing its Creative Commons license.
  2. Digital identity: In the digital world, identity is a somewhat worn-out and controversial notion. It is the path and the trace left on the Web; it is a consequence of how content interrelates. In the scientific and academic context, where visibility and positioning are most often tied to the resources obtained, identity resides in metadata. Obtaining the right metadata and defining a framework for success is essential in order to stand out and present the best scientific, academic, technical and cultural production. In a few words, metadata is what ties an identity [e.g. of a journal, author, institution] to the Web.
  3. Digital preservation: The preservation of printed documents has been a real problem around the world. Digital preservation, however, implies the capacity to access the stored data, beyond merely backing it up. Because XML is a standardized format, it can still be decoded over time by tools or software created in the future.
  4. Indexing: Current academic production faces pressure to keep indexes up to date and to demonstrate quality and compliance with international standards. This forces a greater or lesser degree of interaction with the indexes, ranging from the exchange of XML files tagged only as FRONT or BACK, so that references can be used in citation analysis processes, up to the full text tagged in XML.
  5. Visualization and ubiquity: when we say that the XML format is the DNA of the scientific article, we mean that it is the code behind the scientific information, which by itself supports multiple processes, including the generation of reading formats. Thus PDF, HTML, ePUB and other formats can be produced immediately.
  6. Interoperability: XML is the format par excellence for scientific exchange, substantially increasing the visibility and reach of content published on the Web.
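The Creative Commons example from the first point can be sketched as follows. The fragment assumes the license is recorded the way JATS files commonly do it, as a `<license>` element inside `<permissions>` with an `xlink:href` attribute; the license statement is a placeholder, and a production harvester would need to handle other encodings as well.

```python
import xml.etree.ElementTree as ET

# Namespace of the xlink:href attribute.
XLINK = "http://www.w3.org/1999/xlink"

# Hypothetical metadata fragment with a placeholder license statement.
FRONT = """
<article-meta xmlns:xlink="http://www.w3.org/1999/xlink">
  <permissions>
    <license license-type="open-access"
             xlink:href="https://creativecommons.org/licenses/by/4.0/">
      <license-p>Placeholder license statement.</license-p>
    </license>
  </permissions>
</article-meta>
"""

def creative_commons_licenses(xml_text):
    """Return the URLs of all Creative Commons licenses declared in the metadata."""
    root = ET.fromstring(xml_text)
    urls = []
    for lic in root.iter("license"):
        # ElementTree stores namespaced attributes as "{namespace}name".
        href = lic.get(f"{{{XLINK}}}href", "")
        if "creativecommons.org/licenses/" in href:
            urls.append(href)
    return urls

print(creative_commons_licenses(FRONT))
```

A crawler that finds this URL knows the conditions of use without having to interpret any prose on the page.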

For these reasons, we have no hesitation in saying that XML is the DNA in which the work of The Digital Native Journal lies.

How to cite this article: Cygnusmind (2019). XML: The DNA of scientific articles. Retrieved from Cygnusmind´s Blog: https://www.cygnusmind.com/blog/en/article/xml-the-dna-of-scientific-articles/

The Native Digital Journal

by Cygnusmind.com

ARTICLE

The first scientific journal to take peer review into account was born in 1665: Philosophical Transactions of the Royal Society. Today it has more than 350 years of uninterrupted publication. Such vision is to be expected from an institution that defines itself as “The independent scientific academy of the United Kingdom and the Commonwealth.”

Today we can confirm that it is a truly digital journal in the context of electronic journals. The closer you get to it, the better your experience of sharing articles through a complex communication system. However, we can hardly say what a “digital” journal is in the full sense of the term. We can, moreover, think of a communication system in which The Native Digital Journal encompasses the whole publishing process; unfortunately, this is a concept whose meaning has emerged late.

Since we are talking about the structure and the ways of presenting the content of scientific journals – not about literary culture or lifestyle – let us speak plainly in scientific terms. Our hypothesis: Native Digital Journals are few, and journals, paradoxically, do not know how to take advantage of advances in digital technology to communicate their messages better. Our conclusion: we are unable to find Native Digital Journals. What we find are electronic journals, most of them poor copies of their printed versions. In other words, the way they are configured remains limited by the paradigms of paper and printing; they are still published only in PDF and lack the advantages of digital technology.

Let us discuss this further.

A journal is natively digital when it has a solid concept and design, conceived from the very beginning in the digital realm. At no point in any of its processes – discourse, opinion, production, publication, distribution, dissemination, the collection of works, the formation of a community of authors and readers – does the role and logic of print interfere. On top of that, this is the way to enhance positioning and consolidate well-recognized prestige.

The Native Digital Journal is developed, manifests itself and continues to be modified in the digital sphere, even if it emerged many decades ago. “The World Wide Web is not just its medium of diffusion, but its backbone that holds the micro-system of scientific communication which represents every publishing” (cygnusmind.com).

What is more, hypertext is the organization of information – text, data, sounds, images, etc. – through connections colloquially called “links”. This makes it possible to branch the information, as explained below:

  1. To read in depth, follow up and consult, not necessarily in sequence. You can start reading wherever you want; you can approach a scientific text from the methods, discussion, conclusions or even the sources of information.
  2. The participation of the individual – reader, viewer, reviewer – determines the plan, the direction, the time and the manner in which links are used; the individual decides when and how at every moment, taking an active role in exploring the information.
  3. The text – subject to peer review – is the substantial element in scientific matters. Moreover, in the digital realm the DNA of scientific discourse is given by the structure and display of information from the digital code: it is inherent in the text.
  4. The digital code offers an advantage for requirements that science depends on:

a. Replicability

b. Construction of new science from existing science

c. Visibility

d. Interoperability

In this sense, we could consider that mathematical formulas, chemical reactions, equations and other procedural elements cease to be merely mathematical or written signs. They become executable actions that process data, and repeatable and reusable methodologies. In the digital formula, the symbol calls for action.

Tables contain data and actions that can be processed, sorted and graphed. They are active units of information that can be queried and interrogated. What is the main reason for this? Science knows that conclusions depend on how reality has been questioned – the method – and that their soundness also depends on the characteristics of the data about reality; it is, of course, better to have the data at hand.
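A small Python sketch shows what it means for a table to be an active unit of information. The markup uses the `table-wrap`/`table` structure found in JATS files, but the column names and values are invented for illustration only.

```python
import xml.etree.ElementTree as ET

# Hypothetical table; the samples and values are made-up illustration data.
TABLE = """
<table-wrap>
  <table>
    <thead><tr><th>sample</th><th>value</th></tr></thead>
    <tbody>
      <tr><td>A</td><td>3.2</td></tr>
      <tr><td>B</td><td>1.5</td></tr>
      <tr><td>C</td><td>2.7</td></tr>
    </tbody>
  </table>
</table-wrap>
"""

root = ET.fromstring(TABLE)

# Column names come from the header row.
headers = [th.text for th in root.iter("th")]

# Each body row becomes a dictionary keyed by column name.
rows = [
    dict(zip(headers, (td.text for td in tr.iter("td"))))
    for tr in root.findall("table/tbody/tr")
]

# The same data can now be sorted or summarized instead of merely read.
by_value = sorted(rows, key=lambda row: float(row["value"]))
mean_value = sum(float(row["value"]) for row in rows) / len(rows)

print([row["sample"] for row in by_value])
print(round(mean_value, 2))
```

The same table printed in a PDF would have to be retyped before anyone could ask it a new question.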

Apart from that, in the digital realm, and speaking of visualization, the image ceases to be a marginal or ornamental reference and becomes a source of information in itself, allowing us to enrich the discourse with previously unthinkable elements: a tomography for the detection of cancer, a microscope image, a field of stars, etc. The image can be represented as coded information, with shape and color in each pixel.

For the first time, movement, interaction and sound can be integrated into a scientific text, not only to improve the reading experience, but to generate more knowledge from enriched elements.

Amid all the existing information – more than a million articles are written each year – searching for or locating something on the Web becomes unmanageable. This immeasurable volume of information can only be processed by machines in order to route it to readers. To do so, the information must be processed, sorted, filtered, grouped and treated; this depends on the way in which it is structured.
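As a minimal sketch of that dependence on structure, the following Python function groups article records by any tagged field. The `<catalogue>` and `<record>` wrappers, the titles and the author names are all hypothetical placeholders.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Hypothetical catalogue with placeholder records.
CATALOGUE = """
<catalogue>
  <record><title>Article one</title><year>2018</year><author>author-a</author></record>
  <record><title>Article two</title><year>2019</year><author>author-b</author></record>
  <record><title>Article three</title><year>2018</year><author>author-b</author></record>
</catalogue>
"""

def group_by(xml_text, field):
    """Group article titles by the value of one tagged field."""
    groups = defaultdict(list)
    for record in ET.fromstring(xml_text).iter("record"):
        groups[record.findtext(field)].append(record.findtext("title"))
    return dict(groups)

print(group_by(CATALOGUE, "year"))
print(group_by(CATALOGUE, "author"))
```

Grouping by year or by author is the same operation here; only the tag name changes, which is possible because each value was identified when the record was tagged.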

To this end, tagging information in XML under the JATS standard is the DNA of the text; it is the beginning of the semantic web and of knowledge and wisdom; without it, The Native Digital Journal does not exist.

The Extensible Markup Language – better known as XML – is the language of machines, and JATS (Journal Article Tag Suite) is its vocabulary for journal articles. This language defines the sets of tags – metadata – into which the information of a scientific article is structured. In this way, we can clearly locate the title, <title>The Native Digital Journal</title>; the year, <year>2018</year>; the format, <format>HTML</format>; the author(s), <author>cygnusmind</author>; the URL, <url>https://www.cygnusmind.com/blog/</url>; the summary, <summary>………</summary>; keywords, etc.

XML-JATS adds value at each phase of the process.

Moreover, crafting the discourse consists of providing semantics and structure beyond the written form and its release. Its scope changes the way the expression of a finding is conceived: without limits on pages, attachments, very high-resolution images, visualization formats, etc., all of which an article marked up in XML allows. Let us explain this below:

  1. Formatting: Interpretation of the text from the XML file to generate different reading formats such as PDF, ePUB, HTML and others.
  2. Publication: Online production of digital formats of scientific articles for enriched reading and appropriate use of the information.
  3. Distribution and dissemination: Provides accurate and adequate information to search engines, which facilitates linking and visibility. It also allows the information to be present in thousands of libraries, content aggregators and specialized portals around the world.
  4. Collection of works and community of authors and readers: As one of the five basic principles of library science puts it, “To each user their information, and to each piece of information its user”. A text with structure and semantics reaches interested readers and authors, extending its reach beyond geographic or language limits to form a solid community that will support the journal.
  5. Communication and interaction: Globalization has allowed us to see the social diversity of processes and to recognize that information is produced in multiple languages. Paradoxically, the use and appreciation of HTML has been lost, even though it allows automatic translation (traduttore, traditore) between languages, with imperfections but steadily improving.
  6. Preservation: With XML-JATS, The Native Digital Journal preserves its content for future formats not yet known and guarantees its adaptation to future technology, remaining independent of media, formats and, of course, trademarks. This is what the DNA of the Native Digital Journal makes possible.
  7. Positioning: building the prestige and recognition of a scientific journal. Increasing scope and visibility makes it more likely that the expected impact will be obtained. Similarly, as the collection of academic works grows, selectivity and rigorous quality criteria contribute to prestige and recognition within the community.
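The first phase in the list, generating reading formats from one XML source, can be sketched in Python. The article below is a hypothetical, reduced JATS-style document with placeholder text, and the two renderers are deliberately simple; real production pipelines typically use dedicated stylesheets and handle many more elements.

```python
import xml.etree.ElementTree as ET
from html import escape

# Hypothetical, minimal article with placeholder text.
ARTICLE = """
<article>
  <front><article-meta><title-group>
    <article-title>Sample title</article-title>
  </title-group></article-meta></front>
  <body>
    <sec><title>Introduction</title><p>First paragraph with &lt;markup&gt; characters.</p></sec>
    <sec><title>Methods</title><p>Second paragraph.</p></sec>
  </body>
</article>
"""

def article_parts(xml_text):
    """Yield (kind, text) pairs in reading order from the single XML source."""
    root = ET.fromstring(xml_text)
    yield "title", root.findtext("front/article-meta/title-group/article-title", default="")
    for sec in root.findall("body/sec"):
        yield "heading", sec.findtext("title", default="")
        for p in sec.findall("p"):
            # itertext() flattens any inline markup inside the paragraph.
            yield "paragraph", "".join(p.itertext())

def to_html(xml_text):
    """Render the article as a simple HTML fragment."""
    tags = {"title": "h1", "heading": "h2", "paragraph": "p"}
    return "\n".join(f"<{tags[kind]}>{escape(text)}</{tags[kind]}>"
                     for kind, text in article_parts(xml_text))

def to_plain_text(xml_text):
    """Render the same article as plain text, with headings in upper case."""
    return "\n\n".join(text if kind == "paragraph" else text.upper()
                       for kind, text in article_parts(xml_text))

html_output = to_html(ARTICLE)
text_output = to_plain_text(ARTICLE)
print(html_output)
print(text_output)
```

Both outputs come from the same tagged source, so adding a further reading format means writing another renderer rather than laying the article out again.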

Thus far, this is a profound evolution in the transmission of knowledge. Once content is released from paper, various aspects of the scientific publishing and editing process can be rethought much more freely: pagination does not exist in the digital journal, nor is it necessary to cite the page. There is no limit on the length of an article, no typographical requirement, etc., and the characteristics of print must be reconfigured for the digital environment: typography, spacing, alignment and every typographic aspect must be designed for ease of reading on electronic devices.

The “continuous” Native Digital Journal: there is no reason whatsoever to hold back an article from going online once it is finished, that is, once peer review and copyediting are complete. When this is not done, it is because of an atavism inherited from the printed version and from classification systems whose practices do not fit the digital realm or The Native Digital Journal. The classification system of volume, issue, year and page, with a fixed number of works per issue, is an atavism typical of the parameters of print publication. Back then, an issue had to be prepared with a set of items and the layout had to be sent to a designer, and the whole process became an archaic protocol. Today that process is not necessary. The design is done through programming, and the output in HTML, ePUB, PDF, etc. is immediate and carries the personality of the editor under one concept: the design moves and adjusts to the document. In this regard, presenting issues in advance could be considered an inappropriate editorial practice because of the confusion involved: referring to something of the future in a present that already exists. All this can be resolved by becoming a “continuous” Native Digital Journal, because in the digital realm going online is an editorial decision, no longer subordinate to a bygone technical process (layout).

Today the world of scientific communication faces a complex situation regarding roles and processes. The editor has been and remains the guarantor of the quality and integrity of a journal’s content, supported by the anonymous reviewers. But the communication of information and the Native Digital Journal demand advanced technology, knowledge and continuous renewal. The editor needs allies to help the journal take the step; it is not just a matter of technology and programming, it is a conceptual matter carried into the technological realm. It can be resolved by those who have experience in the publishing world and also know how to make the most of the technological world.

How to cite this article: Cygnusmind (2019). The Native Digital Journal. Retrieved from Cygnusmind’s blog: https://www.cygnusmind.com/blog/xml/the-digital-native-journal/