Since its inception, XML has at times been seen as the cure-all for every problem related to Web applications and integration projects. However, poorly written XML can slow down an integration project or, worse, cause it to collapse.
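When developing integration systems such as Web services or other business-to-business functions, developers may encounter the following problems when writing XML: documents that are never validated against a DTD or schema, element names that are ambiguous to other developers, and standards that keep changing.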
Preventing the use of poorly written XML is more complicated than most developers realize. The key to successfully using XML in an integration project is first understanding the inefficiencies that may cause poorly written XML, and then applying a rule-based system that establishes policies that can be adhered to. This article will outline the many drawbacks of XML, and will address how a rule-based system can prevent the use of poorly written XML in integration projects.
Understanding XML
The Extensible Markup Language (XML) is a family of technologies that describe structured data. By using XML companies can create common information formats and share this information on the World Wide Web. For example, a company can create an XML document to exchange information about its products over the Internet. For a simple example of an XML document, see Listing 1.
XML and Its Inefficiencies
Although the example XML document in Listing 1 appears to be written correctly, how can developers be completely sure that the code is valid and well-formed, is comprehensible to other developers, and adheres to specific standards? The answer to this question lies in a rule-based system that can establish team policies and practices to prevent poorly written XML.
The following sections will outline some of the inefficiencies that can lead to problematic XML, and will address how a rule-based system can prevent the use of poorly written XML in integration projects. After all, system performance is only as good as the data received and the instructions given. If the XML contains errors, the systems that consume it are likely to fail.
Validating XML
One of the main benefits of XML is that it provides mechanisms for verifying document validity. There are two basic mechanisms: the Document Type Definition (DTD) and XML Schema. When creating an XML document, developers can reference either of these mechanisms from within the document itself. The referenced DTD or schema specifies which elements and attributes the document may contain, and the order in which child elements must appear.
Defining DTDs
The following is an example of a simple DTD that can be referenced by an XML document:
<!-- ProductList DTD -->
<!ELEMENT ProductList (Product)*>
<!ELEMENT Product (#PCDATA)>
<!ATTLIST Product
    color (red|green|yellow|weird) #REQUIRED
    file CDATA #REQUIRED
    id CDATA #REQUIRED
    isFruit (true|false) 'true'>
To reference this DTD from an XML document, the following header can be added to the beginning of the XML document:
<?xml version="1.0" encoding="US-ASCII"?>
<!DOCTYPE ProductList PUBLIC "-//OnlineGrocer//ProductList//EN" "ProductList.dtd">
A DTD is a specification based on the rules of the Standard Generalized Markup Language (SGML) and provides basic verification of XML documents. DTDs provide mechanisms for expressing which elements are allowed and what the composition of each element can be. Legal attributes can be defined per element type, and legal attribute values can be defined per attribute.
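To make these constraints concrete, the following Python sketch restates the rules of the sample ProductList DTD as hand-written checks, using the standard-library xml.etree.ElementTree module. It illustrates what the DTD declares and is not a DTD validator. The function name check_product_list and the two sample documents are assumptions made for this example.

```python
import xml.etree.ElementTree as ET

# Constraints copied from the sample ProductList DTD.
DECLARED_ATTRIBUTES = {"color", "file", "id", "isFruit"}
REQUIRED_ATTRIBUTES = ("color", "file", "id")
ALLOWED_COLORS = {"red", "green", "yellow", "weird"}
ALLOWED_IS_FRUIT = {"true", "false"}


def check_product_list(xml_text):
    """Return a list of violations of the sample DTD's rules (hypothetical helper)."""
    root = ET.fromstring(xml_text)
    problems = []
    if root.tag != "ProductList":
        problems.append(f"root element must be ProductList, found {root.tag}")
    for position, child in enumerate(root, start=1):
        if child.tag != "Product":
            problems.append(f"child {position}: unexpected element {child.tag}")
            continue
        # <!ELEMENT Product (#PCDATA)> allows text only, no child elements.
        if len(child) > 0:
            problems.append(f"child {position}: Product may contain only text")
        for name in child.attrib:
            if name not in DECLARED_ATTRIBUTES:
                problems.append(f"child {position}: undeclared attribute {name}")
        for name in REQUIRED_ATTRIBUTES:
            if name not in child.attrib:
                problems.append(f"child {position}: missing required attribute {name}")
        color = child.get("color")
        if color is not None and color not in ALLOWED_COLORS:
            problems.append(f"child {position}: illegal color {color}")
        # isFruit is optional in the DTD and defaults to 'true' when omitted.
        is_fruit = child.get("isFruit", "true")
        if is_fruit not in ALLOWED_IS_FRUIT:
            problems.append(f"child {position}: illegal isFruit value {is_fruit}")
    return problems


# Hypothetical sample documents for this sketch.
GOOD = '<ProductList><Product color="red" file="apple.jpg" id="1">Apple</Product></ProductList>'
BAD = '<ProductList><Product color="purple" id="2">Plum</Product></ProductList>'

print(check_product_list(GOOD))
print(check_product_list(BAD))
```

The first document passes every check. The second is reported twice: once for the missing file attribute and once for the color value that the DTD does not list. In practice a validating parser would perform these checks directly from the DTD.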
Defining Schemas
For an example of a simple schema that can be referenced by an XML document, see Listing 2. To reference this schema from an XML document, the xsi:noNamespaceSchemaLocation attribute can be specified on the root ProductList element as follows:
<ProductList xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:noNamespaceSchemaLocation="ProductList.xsd">
An XML schema, like a DTD, defines a set of legal elements, attributes, and attribute values. However, XML schemas provide more robust verification of XML documents. XML schemas are namespace-aware and also cover data types, data bounds, inheritance between schema types, and context-sensitive data values, none of which are covered by DTDs.
Lack of DTD/Schema Enforcement
While validating against a DTD or schema can establish the validity of an XML document, nothing requires developers to reference a DTD or schema at all. In fact, developers need only follow simple syntax rules for an XML document to be "well-formed." However, a well-formed document is not necessarily a valid document. Without a DTD or schema to check against, there is no way to verify whether the XML document is valid. Therefore, measures must be taken to ensure that XML documents do, in fact, reference a DTD or schema.
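The gap between well-formed and valid can be seen in a few lines of Python. The sketch below uses the standard-library xml.etree.ElementTree parser, which checks well-formedness only and does not validate. The helper name is_well_formed and the sample documents are assumptions for illustration.

```python
import xml.etree.ElementTree as ET


def is_well_formed(xml_text):
    """Return True if the text parses as XML; this says nothing about validity."""
    try:
        ET.fromstring(xml_text)
    except ET.ParseError:
        return False
    return True


# Well-formed, but invalid against the sample ProductList DTD: "purple" is not
# a legal color and the required file attribute is missing.
WELL_FORMED_BUT_INVALID = (
    '<ProductList>'
    '<Product color="purple" id="2">Plum</Product>'
    '</ProductList>'
)

# Not even well-formed: the Product element is never closed.
MALFORMED = '<ProductList><Product color="red">Apple</ProductList>'

print(is_well_formed(WELL_FORMED_BUT_INVALID))  # True
print(is_well_formed(MALFORMED))                # False
```

The parser accepts the first document without complaint even though it breaks the DTD's rules, which is exactly why a well-formedness check alone is not enough.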
Using Rules to Enforce Document Validity
To guarantee that an XML document references a DTD or schema, development teams can adopt a rule-based system that detects and prevents errors within the XML code. Developers can create rules that impose constraints on XML documents to verify validity. For example, a rule can be created that requires an XML document to contain the sample schema header:
<ProductList xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:noNamespaceSchemaLocation="ProductList.xsd">
If the document is missing the specified header, an error is reported that alerts the developer to the violation.
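One way such a rule might be written is sketched below in Python with the standard-library xml.etree.ElementTree module. The function name check_schema_reference and the wording of its error message are hypothetical, and the rule looks only for a schema reference on the root element. Detecting a DOCTYPE declaration would need a different parser hook and is left out.

```python
import xml.etree.ElementTree as ET

XSI_NAMESPACE = "http://www.w3.org/2001/XMLSchema-instance"
# ElementTree stores namespaced attribute names as "{namespace-uri}local-name".
SCHEMA_ATTRIBUTES = (
    f"{{{XSI_NAMESPACE}}}noNamespaceSchemaLocation",
    f"{{{XSI_NAMESPACE}}}schemaLocation",
)


def check_schema_reference(xml_text):
    """Hypothetical rule: return an error message, or None if the rule passes."""
    root = ET.fromstring(xml_text)
    for attribute in SCHEMA_ATTRIBUTES:
        # An empty attribute value does not count as a reference.
        if root.get(attribute):
            return None
    return f"<{root.tag}> does not reference a schema"


# Sample documents (assumptions for this sketch).
WITH_SCHEMA = (
    '<ProductList xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" '
    'xsi:noNamespaceSchemaLocation="ProductList.xsd"/>'
)
WITHOUT_SCHEMA = '<ProductList/>'

print(check_schema_reference(WITH_SCHEMA))
print(check_schema_reference(WITHOUT_SCHEMA))
```

A rule like this could run as part of a build or check-in step, so that documents without a schema reference are rejected before they reach the integration system.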
Human-Readable or Ambiguous Code?
As seen in the sample XML document, XML is human-readable. In other words, XML is created in plain text and utilizes actual words or phrases that have specific meanings to developers. However, even though XML can be read and written by humans, it does not necessarily mean that humans can understand XML - developers can still create unreadable XML code. An element that has a specific meaning to one developer may be of no use, or make no sense, to another developer.
For instance, developers can create XML that is completely unintelligible to one another - consider XML tags that are written in Polish or Japanese. Code doesn't need to be written in another country to be ambiguous either - code written in the same language and shared between companies can be quite cryptic as well. For example, the element <Trans> can mean anything from transform to transaction to Trans-Am, depending on the developer and the application.
Establishing Team-Naming Conventions
To prevent ambiguous XML code, development teams must mutually agree upon a standard XML vocabulary. With a standard language in place, developers within a team will be more likely to understand each other's code.
Rules can then be established to verify that code follows these conventions, covering anything from W3C guidelines for a specific language, to team naming standards, to project-specific design requirements, to the proper usage of custom XML tags.
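A naming rule of this kind might look like the following Python sketch, again using the standard-library xml.etree.ElementTree module. The convention itself (UpperCamelCase element names, plus a short list of banned abbreviations with their agreed replacements) is an assumption chosen for illustration, as is the function name check_element_names. A real team would substitute its own vocabulary.

```python
import re
import xml.etree.ElementTree as ET

# Assumed team convention: element names are written in UpperCamelCase.
NAME_PATTERN = re.compile(r"^[A-Z][a-z0-9]+(?:[A-Z][a-z0-9]+)*$")
# Assumed list of ambiguous abbreviations and the names agreed on instead.
BANNED_NAMES = {"Trans": "Transaction"}


def check_element_names(xml_text):
    """Return one message per element whose name breaks the assumed convention."""
    root = ET.fromstring(xml_text)
    problems = []
    for element in root.iter():
        # Drop the "{namespace-uri}" prefix ElementTree adds to namespaced tags.
        name = element.tag.rsplit("}", 1)[-1]
        if name in BANNED_NAMES:
            problems.append(f"<{name}> is ambiguous; use <{BANNED_NAMES[name]}>")
        elif not NAME_PATTERN.match(name):
            problems.append(f"<{name}> does not follow UpperCamelCase")
    return problems


# Sample document (an assumption for this sketch).
SAMPLE = (
    '<Order>'
    '<Trans>42</Trans>'
    '<ship_to>Warsaw</ship_to>'
    '<Product>Apple</Product>'
    '</Order>'
)

for problem in check_element_names(SAMPLE):
    print(problem)
```

For the sample document, the rule flags the ambiguous Trans element and the ship_to element, and accepts Order and Product.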
Chaos of Standards
Although the W3C has made an effort to establish a common language, vocabulary, and protocol for XML, these standards are still in development and are constantly changing. Companies that adhere to proposed standards that are not yet fully mature must be prepared to keep up with any future changes to those standards. For example, a standard that is in existence today may not exist six months from now. Without stability in XML standards, developers are forced either to keep up with the rapid changes or to fall behind.
WS-I Basic Profile to the Rescue
In spite of the chaos of standards that may exist for XML development, the release of Basic Profile provides some guidance to developers seeking a widely used XML standard. The Web Services Interoperability Organization (WS-I) Basic Profile standard consists of specifications that establish a baseline for interoperable Web services. These specifications include guidelines that cover XML 1.0.
Developers can now depend on Basic Profile as a common framework for implementing XML and building integration projects. There are more than 25 WS-I member companies that support Basic Profile. Therefore, developers can be confident that the XML standards they use will not be subject to constant flux and change.
Summary
Although XML is meant to be a flexible, easy to use, and fully portable solution for Web applications and integration projects, it is not the cure-all that many once thought it to be. The inefficiency of XML is well known among enterprise developers, but it remains ignored in exchange for the perceived advantages of XML such as flexibility, ease of use, and portability. However, the reality of the issue is that XML has a number of drawbacks that enterprise developers should be leery of when creating integration systems.
The key to successfully using XML in an integration project is to first understand the inefficiencies that may cause poorly written XML, and then utilize the proper techniques that verify correctness at each level of the implementation.