XML (Extensible Markup Language) is used to describe documents and data in a standardized, text-based format. XML provides a powerful and robust framework for data transfer.
The strength of XML lies in the flexible hierarchy of the data structures it provides. XML's rules are a small, simple set that focuses on standardizing the way data is organized without limiting the content in any way. A simple analogy is a language with a strict grammar whose words are made up by its speakers as they need them.
Specifically, it is important to understand the basic XML structures, the formatting rules that make a document well-formed, and namespaces.
Fig.1. Hierarchy for a Book Reference
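The basic XML structures are best understood in the context of an example information set. Consider a bibliography containing two book references:
- Iacone, SJ. Write to the Point. Career Press, 2003. ISBN 1-56414-639-1.
- Benz, B. and Durant, JR. XML Programming Bible. Wiley Publishing, Inc, 2003. ISBN 0-7645-3829-2.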
Each record consists of an author listing (last name, initials), the title of the book, the publisher and the year published; in addition, the ISBN uniquely identifies each book. From these fields it is possible to construct a data hierarchy for a Book Reference record, as shown in Fig.1.
The associated XML definition is shown in the code sample below. The key parts of the definition include:
- Elements, written as <element></element>; examples from the code sample include <bibliography></bibliography>, <reference></reference> and <authors></authors>
- Attributes, such as id="1" and isbn="1-56414-639-1"
- Text, such as the book title in <title>Write to the Point</title>
- Comments, such as <!--Author Listing Start-->
<!--XML Definition for a Bibliography-->
<bibliography>
  <!--First Book Reference Start-->
  <reference id="1" isbn="1-56414-639-1">
    <!--Author Listing Start-->
    <authors>
      <author>
        <last_name>Iacone</last_name>
        <initials>SJ</initials>
      </author>
    </authors>
    <!--Author Listing End-->
    <title>Write to the Point</title>
    <publisher>Career Press</publisher>
    <publish_date>2003</publish_date>
  </reference>
  <!--First Book End-->
  <!--Second Book Reference Start-->
  <reference id="2" isbn="0-7645-3829-2">
    <!--Author Listing Start-->
    <authors>
      <author>
        <last_name>Benz</last_name>
        <initials>B</initials>
      </author>
      <author>
        <last_name>Durant</last_name>
        <initials>JR</initials>
      </author>
    </authors>
    <!--Author Listing End-->
    <title>XML Programming Bible</title>
    <publisher>Wiley Publishing, Inc</publisher>
    <publish_date>2003</publish_date>
  </reference>
  <!--Second Book End-->
</bibliography>
<!--Bibliography End-->
XML has a core set of formatting requirements that a document must meet to be considered well-formed. These requirements are the grammar of the language referred to above.
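As a minimal sketch of what "well-formed" means in practice, the document below has exactly one root element, closes every start tag, and matches tag names exactly, including case. The element names (note, title, body) are hypothetical and chosen only for illustration.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Exactly one root element (note) encloses all other elements -->
<note>
  <!-- Every start tag has a matching end tag with identical case -->
  <title>Reminder</title>
  <body>Submit the report</body>
</note>
```

Changing the closing tag to </Title> would make a parser reject the document, because XML names are case-sensitive.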
General formatting rules for XML are:
The following are the rules for elements, according to the XML standard:
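- Element names are case-sensitive, must start with a letter or an underscore, and cannot contain spaces. Colons should only be used when a namespace has been defined; see Namespaces below for more detail.
- Every element must have a start tag and a matching end tag, e.g. <title></title>.
- An empty element can be closed within its start tag, e.g. <img src="images/example.gif" />. XML provides this shortcut for any empty element: <empty_element></empty_element> can be written as <empty_element/>.
- Elements must be properly nested. The following is not well-formed, because <em> is opened inside <b> but closed outside it:
<p>This text is <b>not <em>well-formed</b> or valid</em> XML</p>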
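The mis-nested paragraph in the rules above can be repaired by closing <em> before <b>. The sketch below shows that repair alongside both forms of an empty element; the wrapping <examples> element is hypothetical and only gives the snippet a single root.

```xml
<examples>
  <!-- Properly nested: em opens and closes inside b -->
  <p>This text is <b>now <em>well-formed</em></b> XML</p>
  <!-- Two equivalent ways of writing the same empty element -->
  <empty_element></empty_element>
  <empty_element/>
  <!-- An empty element can still carry attributes -->
  <img src="images/example.gif" />
</examples>
```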
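Attributes are name-value pairs placed inside an element's start tag and separated by whitespace, e.g. id="2" isbn="0-7645-3829-2" in the second book reference. Attribute values must always be quoted. They can contain apostrophes as long as they are enclosed in double quotes; e.g. source="Roget's Thesaurus" is valid.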
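A short sketch of the quoting rules, using hypothetical book and quote elements: attribute values may be delimited by either double or single quotes, and the predefined entities &quot; and &apos; can stand in for the quote characters when both kinds are needed in one value.

```xml
<attribute_examples>
  <!-- Attributes are separated by whitespace, not commas -->
  <reference id="2" isbn="0-7645-3829-2"/>
  <!-- An apostrophe is allowed inside double quotes -->
  <book source="Roget's Thesaurus"/>
  <!-- A double quote is allowed inside single quotes -->
  <quote text='She said "Hello"'/>
  <!-- Both characters in one value, written as predefined entities -->
  <quote text="She said &quot;Hello&quot; and didn&apos;t stop"/>
</attribute_examples>
```

A literal < is never allowed inside an attribute value; write &lt; instead.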
Text usually represents the actual data associated with an element. The only considerations for text are whitespace (spaces, tabs, line breaks), which makes a document more readable, and troublesome characters such as &, <, >, " and ', which can confuse an XML parser.
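XML predefines five entities for these characters: &amp; (&), &lt; (<), &gt; (>), &quot; (") and &apos; ('). In text, & and < must always be escaped; alternatively, a CDATA section passes its content through without interpreting it as markup. A minimal sketch, where the formula element is a hypothetical name:

```xml
<text_examples>
  <!-- & and < must be escaped; &gt; is optional here but often used for symmetry -->
  <formula>a &lt; b &amp;&amp; b &gt; c</formula>
  <!-- Inside CDATA the same characters can be written literally -->
  <formula><![CDATA[a < b && b > c]]></formula>
</text_examples>
```

Both formula elements carry the same text, a < b && b > c. A CDATA section cannot itself contain the sequence ]]>, because that sequence ends it.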
An XML parser passes whitespace in text through unchanged, so
<element>Hello- -World!</element>
is not changed to <element>Hello--World!</element> but remains the same.
Comments make it easier to understand an XML document and must adhere strictly to the format:
<!--comment-->
In the bibliography example above, the comments include <!--XML Definition for a Bibliography--> and <!--First Book Reference Start-->.
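Comments can appear between elements and can span several lines, which makes them convenient for temporarily disabling markup. Two restrictions apply: a comment may not contain a double hyphen, and it cannot be placed inside a tag. A sketch based on a simplified bibliography, where the third reference is hypothetical:

```xml
<bibliography>
  <!-- A single-line comment between elements -->
  <reference id="1" isbn="1-56414-639-1"/>
  <!--
    A multi-line comment that disables a hypothetical third reference:
    <reference id="3"/>
  -->
</bibliography>
```

A comment also cannot precede the XML Declaration, because the declaration must come first in the document.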
The XML Declaration identifies the document as an XML document. Although it is not required, it provides important information to any program trying to interpret the XML file; when present, it must be the very first thing in the file.
The XML Declaration takes the following form:
<?xml version="1.0" encoding="UTF-16" standalone="yes" ?>
UTF stands for UCS (Universal Character Set) Transformation Format. UTF-8 encodes each character as one to four 8-bit units, while UTF-16 encodes each character as one or two 16-bit units. More detail on the available Unicode formats can be found at www.unicode.org.
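The three pseudo-attributes of the declaration can be annotated as in the minimal sketch below, which uses a hypothetical empty root. The comments follow the declaration because nothing, not even whitespace or a comment, may precede it.

```xml
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<!-- version: required whenever the declaration is present -->
<!-- encoding: optional; without it and without a byte order mark, UTF-8 is normally assumed -->
<!-- standalone: optional; "yes" means no external markup declarations affect the content -->
<bibliography/>
```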
Namespaces differentiate elements and attributes defined in different documents or related to different data sets. They help to ensure that element and attribute names are unique, which is important when information is shared between applications or published.
Namespaces also help to identify groups and types of information within the current document. The bibliography example above is extended to include a basic namespace declaration:
<?xml version="1.0" encoding="UTF-16" standalone="no"?>
<!--The XML Declaration above must come first in the document-->
<!--Definition of the Root Element including a Namespace Declaration-->
<bibliography xmlns:bib="http://www.k2workflow.com/bibliography">
  <!--First Book Reference Start-->
  <bib:reference bib:id="1" bib:isbn="1-56414-639-1">
    <!--Author Listing Start-->
    <authors>
      <author>
        <last_name>Iacone</last_name>
        <initials>SJ</initials>
      </author>
    </authors>
    <!--Author Listing End-->
    <bib:title>Write to the Point</bib:title>
    <bib:publisher>Career Press</bib:publisher>
    <bib:publish_date>2003</bib:publish_date>
  </bib:reference>
  <!--First Book End-->
  ...
The following changes in the XML code sample are important:
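- The namespace declaration xmlns:bib="http://www.k2workflow.com/bibliography" has three distinct parts: the reserved attribute name xmlns, the prefix bib, and the namespace URI http://www.k2workflow.com/bibliography. URLs are typically used as the URI to uniquely identify namespaces; however, the URL does not have to link to an actual file.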
Several public namespace URIs are in common use, including:
  - xmlns:html="http://www.w3.org/1999/xhtml"
  - xmlns:xs="http://www.w3.org/2001/XMLSchema"
  - xmlns:msdata="urn:schemas-microsoft-com:xml-msdata"
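A default namespace is declared without a prefix (xmlns="URI") and applies to the element that declares it and to all of its unprefixed descendant elements. To cancel the default namespace for an element, include an empty namespace declaration, as on the name element in the following snippet:
<p xmlns="http://www.w3.org/1999/xhtml">I met <name xmlns="">John David</name> on holiday</p>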
- Elements and attributes that belong to the namespace are qualified with the bib prefix, e.g. <bib:title> and bib:isbn="1-56414-639-1".
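For contrast with the prefixed bib declaration, here is a sketch that declares the same bibliography URI as a default namespace instead, which the original example does not do. It also shows a detail that often surprises: a default namespace applies to unprefixed elements but never to unprefixed attributes. The review paragraph and the note element are hypothetical additions.

```xml
<!-- Default namespace: no prefix, applies to this element and its unprefixed descendants -->
<bibliography xmlns="http://www.k2workflow.com/bibliography">
  <!-- reference and title inherit the default namespace; the unprefixed
       attribute id does not, because default namespaces never apply to attributes -->
  <reference id="1">
    <title>Write to the Point</title>
    <!-- A hypothetical review paragraph in the XHTML namespace, bound to the prefix html -->
    <html:p xmlns:html="http://www.w3.org/1999/xhtml">A short review</html:p>
    <!-- xmlns="" removes the default namespace for this hypothetical note element -->
    <note xmlns="">Borrowed from the library</note>
  </reference>
</bibliography>
```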
This is not a comprehensive introduction to XML, but rather an orientation to XML as used in K2.