XML Documents – An Introduction


XML was developed by the W3C between 1996 and 1998 to provide a universal format for describing structured documents and data, and it allows data to be self-describing. It is essentially a simplified subset of the Standard Generalized Markup Language (SGML). SGML was published as a standard in 1986 as a metalanguage for describing other languages, and XML was designed to enable generic SGML to be processed on the web in much the same way as HTML is today. What’s more, XML is free: it has no legal constraints and doesn’t belong to anyone, so it can’t be hijacked or pirated. You can choose to use commercial software to work with it, but you don’t pay a fee for XML itself. So make the most of XML documents for free data storage!

XML describes a class of data objects called XML documents, and the XML DOM (Document Object Model) provides the means to manipulate them through code, either on web pages or in applications. The nesting of tags creates a tree-like structure, which greatly simplifies the handling of these documents.
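As a rough illustration of the tree structure and of manipulating a document through the DOM, here is a minimal Python sketch using the standard library's xml.dom.minidom module. The library document, its element names and the outline helper are invented for this example and do not come from any real application.

```python
from xml.dom.minidom import parseString

# Hypothetical sample document, invented for this illustration.
SOURCE = (
    "<library>"
    "<book id='b1'><title>XML Basics</title></book>"
    "<book id='b2'><title>DOM in Practice</title></book>"
    "</library>"
)


def outline(node, depth=0):
    """Return one indented line per element, following the nesting of the tags."""
    lines = []
    if node.nodeType == node.ELEMENT_NODE:
        lines.append("  " * depth + node.tagName)
        for child in node.childNodes:
            lines.extend(outline(child, depth + 1))
    return lines


document = parseString(SOURCE)
root = document.documentElement

# Read the tree: print the element hierarchy.
print("\n".join(outline(root)))

# Manipulate the tree: add a third book through the DOM API.
new_book = document.createElement("book")
new_book.setAttribute("id", "b3")
new_title = document.createElement("title")
new_title.appendChild(document.createTextNode("Schemas Explained"))
new_book.appendChild(new_title)
root.appendChild(new_book)

# Collect the text of every title element, including the new one.
titles = [t.firstChild.data for t in root.getElementsByTagName("title")]
print(titles)
```

Because every element is a node with a parent and children, the same short recursive function can walk a document of any depth.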

One great advantage of XML is that it allows a markup language to be created from scratch, so different industries and professions can develop custom languages that accurately handle their industry-specific data. This is well reflected in the explosion of new markup languages ending in ‘ML’, such as Wireless Markup Language, Chemical Markup Language, Speech Synthesis Markup Language, and Gene Expression Markup Language.

In the years since XML’s completion, it has been adopted across the board with great enthusiasm, as it offers greater flexibility in transferring data between different applications on different platforms and machines, and its descriptive tags can make searches more accurate. Its reliance on Unicode makes it international, which only adds to its attractiveness.

XML allows multi-use of content

XML documents are increasingly being used because today’s problems require the flexibility and simplicity that XML offers. XML enables you to create structured and semi-structured documents that can be transferred and read by people and programs in multiple formats (for example, pages that can be read on the web, on BYOD devices and in print). This “multi-use” of content is the driving force behind the adoption of XML technology.

Even given the digital revolution, a lot of the world’s information is still locked in paper, in unsearchable documents with proprietary file formats, or in web pages where search engines return too much data and not enough information. Most organisations spend significant sums creating documents that can’t be easily found or distributed because they are unstructured.

The flexibility of XML lets business users create structured XML documents that can be leveraged for multiple purposes in-house and exchanged with people and businesses around the world. XML breaks new ground by connecting the front office business users with the back office developers.

Bill Trippe, in his article “Do XML Editors Matter?” (Transform Volume 10, Issue 10, page 27), makes the point by saying, “You can view XML as the bridge between the two worlds of structured (relational) and unstructured (document) data.”

XML can carry information suitable for both computers and people. Computer-generated XML is dynamically created by a program for B2B ecommerce or other server-to-server transactions. These applications are addressed by XML standards such as ebXML and SOAP. Human-authored content uses XML for improved search capabilities, multi-channelled publication, and syndication. These applications are addressed by standards such as MathML, NewsML, VoiceXML, and many other custom XML dialects.

While highly structured data is independent of the style used to present it, unstructured data is full of style and format. Compare plain text, which carries no style, with rich text, which is full of it.

Text documents meant for human authoring and reading have design needs that XML is well placed to address. Examples of semi-structured documents include catalogues, news reports, and technical documentation. Even highly structured data becomes semi-structured if it includes comments, descriptions, or instructions meant to be read by people.

XML documents support the development of semi-structured documents that contain both relational meta data (the structure) and free-form (unstructured) formatted text. The meta data (that is, the XML tags) meets the programmatic need for structure. Without meta data, a computer program cannot understand the content. Formatted text meets the human and business need to express richly styled content. Without style, the content is dry and unattractive.

As you read this paragraph you will notice that it too is an example of formatted text. Most document editors display content as WYSIWYG. For a business user to comfortably create semi-structured textual documents, a document editor must allow the author to add style to the text.

Variations of Structured and Unstructured Data

Between highly structured and unstructured data lie two kinds of semi-structured data, giving four variations in all:

  • highly structured data
  • structured data with unstructured elements
  • unstructured documents with tagged meta data
  • unstructured documents

Structured data with unstructured elements is commonly used in web forms, where most fields are tightly constrained (for example, “Town” must be selected from a list and “Postcode” must be all letters and numbers), yet a ‘comment’ field is available for human-readable content.

For example,

<product>
<name>Deluxe Widget</name>
<listprice units="lsd">£19.95</listprice>
<radius>6mm</radius>
<description>
This <em>deluxe <strong>gold</strong> plated</em> product fits most attachments.
</description>
</product>

For this kind of XML document, use a DTD or schema to validate the structure, and include an unstructured element that allows both text and tags. In a DTD, this element would typically be defined as

<!ELEMENT description ANY>
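To show how a program might treat the two parts of such a document differently, here is a minimal Python sketch using the standard library's xml.etree.ElementTree module. It reads the tightly constrained fields directly and flattens the free-form description into plain text. The variable names are chosen for this example, and the sketch only parses the document; it does not validate it against a DTD or schema.

```python
import xml.etree.ElementTree as ET

# The product document from the example above, written with plain ASCII quotes.
PRODUCT = """<product>
<name>Deluxe Widget</name>
<listprice units="lsd">£19.95</listprice>
<radius>6mm</radius>
<description>
This <em>deluxe <strong>gold</strong> plated</em> product fits most attachments.
</description>
</product>"""

product = ET.fromstring(PRODUCT)

# Structured fields: each value sits in a predictable place.
name = product.findtext("name")
price_units = product.find("listprice").get("units")

# Unstructured field: text and inline tags are mixed together.
description = product.find("description")

# itertext() yields every piece of text in document order, ignoring the tags;
# split() and join() then normalise the whitespace.
plain_text = " ".join("".join(description.itertext()).split())

# The inline formatting tags are still available if a renderer needs them.
inline_tags = [el.tag for el in description.iter() if el is not description]

print(name, price_units)
print(plain_text)
print(inline_tags)
```

The structured fields can feed a database or a price calculation, while the description can be passed on, tags and all, to whatever displays it.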

Unstructured documents with tagged meta data are less common but offer the best promise for content that can be effectively searched. HTML provides a few tags that describe meaning rather than appearance, like <ADDRESS> and <CODE>, but XML provides the flexibility to create custom tags.

Examples,

<owner studentid="1234">Joe Bloggs</owner> owns a <automobile model="JOB LOGG7">Volkswagen Golf</automobile>.
<my:conditional value="birds">
<my:reference>
<my:author>Hen Len</my:author> in his article <my:title type="article">Why Chickens have Wings</my:title> <my:bibliography>(<my:source><my:periodical>Poultry Monthly</my:periodical> <my:issue>September 2015</my:issue></my:source>, page <my:page>9</my:page>)</my:bibliography> dispels the usual stereotypes of flightless birds.
</my:reference>
</my:conditional>

This kind of XML document must be well formed to allow processing by an XML parser but is usually not validated against a DTD or schema. For such a document, XHTML is a natural choice because it is well formed, has extensive formatting capability, and custom XML tags can be added without causing display problems in browsers. Note that the namespace prefix “my” is used to distinguish the custom XML tags from standard HTML tags.
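The following Python sketch, again using the standard library's xml.etree.ElementTree module, suggests how a program could pull the tagged meta data out of running text. It uses a simplified version of the reference example wrapped in a paragraph element. The namespace URI is a placeholder invented for this illustration, since a prefix such as “my” must be bound to a URI before a parser will accept it.

```python
import xml.etree.ElementTree as ET

# Placeholder namespace URI, chosen only for this sketch.
MY_NS = "http://example.com/my"

# A simplified version of the reference example, wrapped in a paragraph
# element that declares the "my" prefix.
FRAGMENT = f"""<p xmlns:my="{MY_NS}">
<my:author>Hen Len</my:author> in his article
<my:title type="article">Why Chickens have Wings</my:title>
(<my:periodical>Poultry Monthly</my:periodical>, page <my:page>9</my:page>)
dispels the usual stereotypes of flightless birds.
</p>"""

paragraph = ET.fromstring(FRAGMENT)
ns = {"my": MY_NS}

# Pick the meta data out of the running text by tag name.
metadata = {
    "author": paragraph.findtext("my:author", namespaces=ns),
    "title": paragraph.findtext("my:title", namespaces=ns),
    "periodical": paragraph.findtext("my:periodical", namespaces=ns),
    "page": paragraph.findtext("my:page", namespaces=ns),
}
print(metadata)

# A prefix that is never declared makes the document not well formed.
try:
    ET.fromstring("<my:author>Hen Len</my:author>")
    undeclared_prefix_rejected = False
except ET.ParseError:
    undeclared_prefix_rejected = True
print(undeclared_prefix_rejected)
```

No DTD or schema is involved here: the parser only needs the text to be well formed, which matches how this kind of document is normally handled.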

XML Unlocks Information – Designing an XML DTD or Schema

XML transfers information between two parties, whether human or machine. Just as two people must know the same language, both parties must speak the same XML dialect. The dialect, defined in the DTD (document type definition) or schema, is the vocabulary and grammar used to describe the information being transferred.

The producer and the processor of XML information must share a common DTD or schema. Because the DTD or schema is vital to the success of XML, this article provides guidelines for designing a DTD or schema. Even if you are not designing a DTD or schema, it is worthwhile understanding the rationale behind their design, since it is the structure of XML data that gives it meaning. This structure changes a random sequence of unintelligible words to speech, that is, it transforms data to information.

When designing a DTD or schema for XML data, analyse the nature of the data and how it is created and processed. Consider how data is stored in a relational database, with a clearly defined structure of records, fields and tables.

Before you begin your design, decide whether to store each piece of data as the value of an attribute or as text (even if numeric) inside an element. Generally, it is better to store data in elements, as this approach is more flexible when used with XSL. (XSL, and in particular XSLT, is a specification for transforming XML to HTML or some other XML structure.)
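One aspect of this choice can be sketched in Python with the standard library's xml.etree.ElementTree module. An attribute value is always a single flat string, whereas an element can later be given child elements of its own. The product documents below, including the value and unit child elements, are invented for this example.

```python
import xml.etree.ElementTree as ET

# The same data stored two ways (both documents are invented for this sketch).
AS_ATTRIBUTES = '<product name="Deluxe Widget" radius="6mm"/>'
AS_ELEMENTS = "<product><name>Deluxe Widget</name><radius>6mm</radius></product>"

# Reading the value is equally easy either way.
from_attribute = ET.fromstring(AS_ATTRIBUTES).get("radius")
from_element = ET.fromstring(AS_ELEMENTS).findtext("radius")
print(from_attribute, from_element)

# An element can grow child elements later, for instance to separate the
# number from its unit. An attribute cannot hold nested structure.
EXTENDED = (
    "<product><name>Deluxe Widget</name>"
    "<radius><value>6</value><unit>mm</unit></radius></product>"
)
radius = ET.fromstring(EXTENDED).find("radius")
value = float(radius.findtext("value"))
unit = radius.findtext("unit")
print(value, unit)
```

With the attribute version, splitting the number from the unit would mean parsing the string "6mm" in every program that reads the document.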

Always consider who produces the XML data. If it is produced and processed programmatically, a developer-friendly perspective is appropriate. In fact, XML for B2B transactions should be designed from this perspective to generate fast, reliable and efficient transfer of information. However, if a human is going to create or read the XML data, consider their needs when designing a DTD or schema.

Elements and Attributes

An attribute is a name-value pair that follows the tag name inside an opening tag. An element is a tag along with its attributes and all the text and elements that it encloses. Elements within another element are called child elements. Consider the following example.

<tag_name attr_name1="value1" attr_name2="value2">
<child_tag attr_name3="value3" />
<child_with_text>This is some text</child_with_text>
This text is part of the tag_name element
</tag_name>

As seen in this illustration, the tags are tag_name, child_tag and child_with_text. The attribute attr_name1 has a value of “value1”. The element, tag_name, consists of the following attributes and child elements:

Attributes: attr_name1, attr_name2

Child Elements: child_tag, child_with_text

In XML, every attribute value must be enclosed in straight single (') or double (") quotes; typographic “curly” quotes are not accepted. Also, every tag must have a closing tag or end with “/>”. Since the child_tag element has no child elements or text, the tag ends with “/>” instead of being followed by a closing tag such as “</child_tag>”.
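These rules can be checked with any XML parser. The short Python sketch below uses the standard library's xml.etree.ElementTree module to test a few variations on the tags from the illustration. The is_well_formed helper and the sample labels are names made up for this example.

```python
import xml.etree.ElementTree as ET


def is_well_formed(text):
    """Return True if the parser accepts the text, False if it reports an error."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


samples = {
    "double quotes": '<child_tag attr_name3="value3" />',
    "single quotes": "<child_tag attr_name3='value3' />",
    "unquoted value": "<child_tag attr_name3=value3 />",
    "missing closing tag": "<child_with_text>This is some text",
    # \u201d is the typographic (curly) double quote character.
    "typographic quotes": "<child_tag attr_name3=\u201dvalue3\u201d />",
}

results = {label: is_well_formed(text) for label, text in samples.items()}
for label, ok in results.items():
    print(f"{label}: {'well formed' if ok else 'rejected'}")
```

Only the first two samples are accepted. The typographic quotes case is worth noting because word processors and web publishing tools often substitute such characters automatically when XML is pasted into them.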

Michael C. Daconta, in his article “Are Elements and Attributes Interchangeable?” (XML Journal volume 2 issue 7, page 42), presents eight practical rules for deciding whether to use elements or attributes. Some rules depend on whether the design is implemented in a DTD or schema. DTDs cannot enforce constraints between attributes and elements as extensively as schemas can. As a result, the decision to use an attribute may depend on whether a value is constrained.

Viewing XML Files

You can view XML files in any browser by clicking on a link, typing the URL or double-clicking on the name of an XML file in a folder. If you open an XML file in Firefox, it will display the document with colour-coded root and child elements. A plus sign (+) or minus sign (-) to the left of an element can be clicked to expand or collapse the element structure. If you want to view the raw XML source, you must select “view source” from the browser menu.

Note: Do not expect XML files to be formatted like an HTML document.

Viewing an invalid XML file

If you open an XML file that contains errors, such as a missing closing tag, the browser will report an error instead of displaying the document.

Why Does XML Display Like This?

XML documents do not carry information about how to display data.

Since XML tags are “invented” by the author of the document, browsers do not know if a tag like <table> describes an HTML table or a dining table.

Without any information about how to display the data, most browsers will just display the XML document as it is.

The next step

The number of XML documents in use increases every day. XML is one of the main languages for exchanging financial information between businesses over the Internet, and it makes that exchange easy.

So why wait? Why not learn more about XML and start using it today!

We offer a range of XML custom training courses.
