
Friday, March 02, 2007

XML Learning Notes 1

What Is XML?
XML was designed to replace delimited data, as well as other data formats, with something standard, easy to use and to understand, and powerful.

  1. Advantages of XML
    1. XML documents are easily readable and self-describing: like HTML, an XML document contains tags that indicate what each piece of data is.
    2. XML is interoperable: nothing about XML ties it to any particular operating system or underlying technology.
    3. XML documents are hierarchical, so it's easy to add related data to a node without making the document unwieldy.
    4. You don't have to write the parser yourself; object-based parser components for XML are readily available.
    5. Changes to your document, such as adding new elements, won't break the parser as long as the document remains well-formed.
  2. XML Document Structure and Syntax
    1. Declaration
      The XML declaration looks essentially the same in every XML document:

      <?xml version="1.0"?>

      The declaration says two things: this is an XML document (duh), and this document conforms to the XML 1.0 W3C recommendation. It can also carry optional attributes such as encoding (for example, encoding="ISO-8859-1").

      The XML declaration is optional. When it does exist, however, it must appear at the very start of the document, with nothing (not even whitespace) before it.
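Both rules can be checked with a minimal sketch. It assumes a modern .NET SDK, because it uses C# 9 top-level statements rather than the class Program layout used elsewhere in these notes. TryParse is a hypothetical helper, not a framework method; it only reports whether XmlDocument.LoadXml accepts a string:

```csharp
using System;
using System.Xml;

// Hypothetical helper (not part of .NET): true if LoadXml accepts the string.
static bool TryParse(string xml)
{
    try
    {
        new XmlDocument().LoadXml(xml);
        return true;
    }
    catch (XmlException)
    {
        return false;
    }
}

bool withDeclaration = TryParse("<?xml version=\"1.0\"?><ORDERS></ORDERS>");
bool withoutDeclaration = TryParse("<ORDERS></ORDERS>");           // declaration is optional
bool leadingWhitespace = TryParse(" <?xml version=\"1.0\"?><ORDERS></ORDERS>"); // must be first
bool spaceAfterMarker = TryParse("<? xml version=\"1.0\"?><ORDERS></ORDERS>");  // "xml" must follow "<?" directly

Console.WriteLine($"with declaration:    {withDeclaration}");    // True
Console.WriteLine($"without declaration: {withoutDeclaration}"); // True
Console.WriteLine($"leading whitespace:  {leadingWhitespace}");  // False
Console.WriteLine($"space after <?:      {spaceAfterMarker}");   // False
```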

    2. Elements (XML elements are sometimes also called nodes.)
      An element is a part of an XML document that contains data. An XML document must have exactly one top-level (root) element to be parsable.

      <?xml version="1.0"?>
      <ORDERS>
      </ORDERS>

      Lack of a closing tag will cause the document to be unparsable.

      Tag names in XML are case sensitive.
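The same kind of hypothetical TryParse helper can confirm the element rules above: one root element, matching close tags, and case-sensitive names. This is again a sketch that assumes C# 9 top-level statements:

```csharp
using System;
using System.Xml;

// Hypothetical helper (not part of .NET): true if LoadXml accepts the string.
static bool TryParse(string xml)
{
    try
    {
        new XmlDocument().LoadXml(xml);
        return true;
    }
    catch (XmlException)
    {
        return false;
    }
}

bool oneRoot      = TryParse("<ORDERS></ORDERS>");
bool twoRoots     = TryParse("<ORDERS></ORDERS><ORDERS></ORDERS>"); // more than one top-level element
bool noRoot       = TryParse("<?xml version=\"1.0\"?>");             // no element at all
bool unclosed     = TryParse("<ORDERS>");                            // missing close tag
bool caseMismatch = TryParse("<ORDERS></orders>");                   // names are case sensitive

Console.WriteLine($"{oneRoot} {twoRoots} {noRoot} {unclosed} {caseMismatch}");
// True False False False False
```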
    3. Elements That Contain Data

      <?xml version="1.0"?>
      <ORDERS>
        <ORDER>
          <DATETIME>1/4/2000 9:32 AM</DATETIME>
          <ID>33849</ID>
          <CUSTOMER>Steve Farben</CUSTOMER>
          <TOTALAMOUNT>3456.92</TOTALAMOUNT>
        </ORDER>
      </ORDERS>

      The extra markup adds some overhead to the size of the document, but in most cases it doesn't slow data transfer down significantly.

      At the same time, there is a way to express data more succinctly in an XML document, without the need for as many open and closing markup tags. You can do this through the use of attributes.
    4. Attributes
      An attribute is another way to enclose a piece of data in an XML document. An attribute is always part of an element; it typically modifies or is related to the information in the node.

      <?xml version="1.0"?>
      <ORDERS>
        <ORDER id="33849" custid="406">
          <DATETIME>1/4/2000 9:32 AM</DATETIME>
          <TOTALAMOUNT>3456.92</TOTALAMOUNT>
        </ORDER>
      </ORDERS>

      Attribute values are always enclosed in quotation marks. Using attributes tends to reduce the total size of the document (because you don't need to store open and close tags for the element). This has the effect of reducing the amount of markup at the expense (in some cases) of readability.

      Note that you are allowed to use either single or double quotation marks anywhere XML requires quotes.

      But remember that XML is a bit more rigid than HTML; a bracket out of place or a mismatched close tag will cause the entire document to be unparsable.
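Reading attributes from code works a little differently from reading element data. The sketch below assumes C# 9 top-level statements and embeds the order from the example above as a string. The custid value uses single quotes to show that both quote styles parse. The status attribute is made up, and is used only to show what happens when an attribute is missing:

```csharp
using System;
using System.Xml;

// The attribute example above, embedded as a string.
const string orderXml =
    "<ORDERS>" +
    "<ORDER id=\"33849\" custid='406'>" +
    "<DATETIME>1/4/2000 9:32 AM</DATETIME>" +
    "<TOTALAMOUNT>3456.92</TOTALAMOUNT>" +
    "</ORDER>" +
    "</ORDERS>";

XmlDocument doc = new XmlDocument();
doc.LoadXml(orderXml);

// SelectSingleNode returns an XmlNode; ORDER is an element, so cast to use GetAttribute.
XmlElement order = (XmlElement)doc.SelectSingleNode("/ORDERS/ORDER");
string id = order.GetAttribute("id");
string custId = order.GetAttribute("custid");
string missing = order.GetAttribute("status");  // hypothetical attribute: returns ""
string total = order["TOTALAMOUNT"].InnerText;  // element data, for comparison

Console.WriteLine($"id={id}, custid={custId}, total={total}, status='{missing}'");
// id=33849, custid=406, total=3456.92, status=''
```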
    5. Enclosing Character Data
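      Some characters, such as < and &, have special meaning in XML markup, so they can't appear as-is inside element data. You've got two ways to deal with this problem in XML: either replace the forbidden characters with character entities or use a CDATA section to delimit the entire data field.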
      1. Using Character Entities
        The idea is to take a character that might be interpreted as a part of markup and replace it with an escape sequence to prevent the parser from going haywire.

        <?xml version="1.0"?>
        <ORDERS>
          <ORDER id="33849">
            <NAME>Jones &amp; Williams Certified Public Accountants</NAME>
            <DATETIME>1/4/2000 9:32 AM</DATETIME>
            <TOTALAMOUNT>3456.92</TOTALAMOUNT>
          </ORDER>
        </ORDERS>

        Instead of an ampersand, the &amp; character entity is used. (If a data element contains a left angle bracket, it should be escaped with the &lt; character entity.) XML predefines five such entities: &lt; (<), &gt; (>), &amp; (&), &quot; (") and &apos; (').

        When you use an XML parser to extract data containing character entities, the parser automatically converts them back to the characters they represent.
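You can check the round trip with XmlDocument. This small sketch assumes C# 9 top-level statements; the EXPR element and its content are made up for illustration:

```csharp
using System;
using System.Xml;

XmlDocument doc = new XmlDocument();

// Parsing: the parser turns &amp; back into a plain ampersand.
doc.LoadXml("<NAME>Jones &amp; Williams Certified Public Accountants</NAME>");
string name = doc.DocumentElement.InnerText;

// All five predefined entities decode the same way.
doc.LoadXml("<CHARS>&lt;&gt;&amp;&quot;&apos;</CHARS>");
string chars = doc.DocumentElement.InnerText;

// Writing: assigning InnerText escapes markup characters for you (EXPR is hypothetical).
XmlElement expr = doc.CreateElement("EXPR");
expr.InnerText = "a < b & c";
string escaped = expr.OuterXml;

Console.WriteLine(name);    // Jones & Williams Certified Public Accountants
Console.WriteLine(chars);   // <>&"'
Console.WriteLine(escaped); // <EXPR>a &lt; b &amp; c</EXPR>
```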

      2. Using CDATA Sections
        <![CDATA[]]>
        A CDATA section tells the XML parser not to interpret any characters that appear inside it as markup. The only sequence a CDATA section cannot contain is its own terminator, ]]>.

        <?xml version="1.0"?>
        <ORDERS>
          <ORDER id="33849">
            <NAME><![CDATA[Jones & Williams Certified Public Accountants]]></NAME>
            <DATETIME>1/4/2000 9:32 AM</DATETIME>
            <TOTALAMOUNT>3456.92</TOTALAMOUNT>
          </ORDER>
        </ORDERS>
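In the .NET DOM, a CDATA section is a separate node type, and InnerText returns its content unescaped. The sketch below assumes C# 9 top-level statements; the FORMULA element is hypothetical:

```csharp
using System;
using System.Xml;

XmlDocument doc = new XmlDocument();
doc.LoadXml("<NAME><![CDATA[Jones & Williams <CPAs>]]></NAME>");

// The DOM keeps the CDATA section as its own node type.
XmlNode section = doc.DocumentElement.FirstChild;
string innerText = doc.DocumentElement.InnerText;

// Building a CDATA section in code: nothing inside it is escaped (FORMULA is hypothetical).
XmlElement formula = doc.CreateElement("FORMULA");
formula.AppendChild(doc.CreateCDataSection("5 < 6 && 7 > 3"));
string outer = formula.OuterXml;

Console.WriteLine(section.NodeType); // CDATA
Console.WriteLine(innerText);        // Jones & Williams <CPAs>
Console.WriteLine(outer);            // <FORMULA><![CDATA[5 < 6 && 7 > 3]]></FORMULA>
```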

    6. Abbreviated Close-Tag Syntax
      For elements that contain no data, you can use an abbreviated syntax for element tags to reduce the amount of markup overhead contained in your document.

      <?xml version="1.0"?>
      <ORDERS>
        <ORDER id="33849" custid="406">
          <DATETIME>1/4/2000 9:32 AM</DATETIME>
          <TOTALAMOUNT />
        </ORDER>
      </ORDERS>
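Both spellings of an empty element parse to an element with no children. The DOM's XmlElement.IsEmpty property records which syntax the source used, and that syntax is reused when the document is saved. A short sketch, assuming C# 9 top-level statements:

```csharp
using System;
using System.Xml;

XmlDocument doc = new XmlDocument();
doc.LoadXml("<ORDER><TOTALAMOUNT /><DATETIME></DATETIME></ORDER>");

XmlElement shortForm = (XmlElement)doc.DocumentElement.FirstChild; // <TOTALAMOUNT />
XmlElement longForm = (XmlElement)doc.DocumentElement.LastChild;   // <DATETIME></DATETIME>

// Neither element has any content...
Console.WriteLine($"{shortForm.HasChildNodes} {longForm.HasChildNodes}"); // False False
// ...but IsEmpty remembers which syntax the document used.
Console.WriteLine($"{shortForm.IsEmpty} {longForm.IsEmpty}");             // True False
```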


Accessing XML Data Using .NET Framework Classes

Before .NET came along, two predominant ways were used to parse an XML document: the XML Document Object Model (DOM) and the Simple API for XML (SAX).
  1. About Simple API for XML (SAX)
    Simple API for XML (SAX) was designed to provide a higher level of performance and a simpler programmability model than XML DOM. It uses a fundamentally different programmability model. Instead of reading in the entire document at once and exposing the elements of the document as nodes, SAX provides an event-driven model for parsing XML.
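    SAX is not supported in .NET. The closest equivalent is the XmlReader family (such as the XmlTextReader object described below), which uses a pull model: your code asks for the next node instead of receiving events.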
  2. Using the XML Document Object Model
    • The XML Document Object Model (DOM) is a programming interface used to parse XML documents.
    • The XML DOM does its magic by taking an XML document and exposing it in the form of a complex object hierarchy.
    • In fact, the XML DOM recommendation segregates the objects in the DOM into two groups: fundamental classes and extended classes. Fundamental classes are the ones that application developers find most useful; the extended classes are primarily useful to tools developers and people who like to pummel themselves with detail.
    • The fundamental classes of the XML DOM as implemented in the .NET framework are XmlNode, XmlNodeList, and XmlNamedNodeMap.
      (Figure 10.1. Fundamental XML DOM objects.)
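    • In general, to work with an XML document using the Document Object Model, you first load the document into an XmlDocument object (for example, with its Load method) and then work with the nodes it exposes.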
    • Example:

      //The Full Contents of the books.xml Document Example
      <BOOKS>
        <BOOK>
          <TITLE>C# Developer's Guide To ASP.NET, XML and ADO.NET</TITLE>
          <AUTHOR id='101' location='San Francisco'>Jeffrey P. McManus</AUTHOR>
          <AUTHOR id='107' location='Seattle'>Chris Kinsman</AUTHOR>
        </BOOK>
      </BOOKS>

      //The full content of http://www.w3schools.com/xml/note.xml
      <?xml version="1.0" encoding="ISO-8859-1"?>
      <!-- Edited with XML Spy v2007 (http://www.altova.com) -->
      <note>
        <to>Tove</to>
        <from>Jani</from>
        <heading>Reminder</heading>
        <body>Don't forget me this weekend!</body>
      </note>

      //Loading a Local XML File Using the XmlDocument's .Load() Method
      <%@ Page Language="C#" Debug="true" %>
      <%@ Import Namespace="System.Xml" %>

      <SCRIPT runat="server">
      void Page_Load(Object Sender, EventArgs e)
      {
          // Load books.xml from the same folder as this page and echo it to the browser.
          XmlDocument xd = new XmlDocument();
          xd.Load(Server.MapPath("books.xml"));
          Response.Write(xd.OuterXml);
          xd = null;
      }
      </SCRIPT>

      //Loading a Local XML File Using the XmlDocument's .Load() Method (console application version)
      using System;
      using System.Xml;

      namespace LearningXml
      {
          class Program
          {
              static void Main(string[] args)
              {
                  const string xmlPath = "c:\\books.xml";
                  XmlDocument xml = new XmlDocument();
                  xml.Load(xmlPath);
                  Console.Write(xml.OuterXml);
                  Console.Read(); // keep the console window open
              }
          }
      }

      //Loading an XML File That Resides on a Web Server
      <%@ Page Language="C#" Debug="true" %>
      <%@ Import Namespace="System.Xml" %>

      <SCRIPT runat="server">
      void Page_Load(Object Sender, EventArgs e)
      {
          // Load books.xml over HTTP and echo it to the browser.
          XmlDocument xd = new XmlDocument();
          xd.Load("http://www.myserver.com/books.xml");
          Response.Write(xd.OuterXml);
          xd = null;
      }
      </SCRIPT>

      //Loading an XML File That Resides on a Web Server (console application version)
      using System;
      using System.Xml;

      namespace LearningXml
      {
          class Program
          {
              static void Main(string[] args)
              {
                  const string xmlPath = "http://www.w3schools.com/xml/note.xml";
                  XmlDocument xml = new XmlDocument();
                  xml.Load(xmlPath);
                  Console.Write(xml.OuterXml);
                  Console.Read(); // keep the console window open
              }
          }
      }
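Both Load examples need books.xml on disk or on a reachable web server. For experimenting, XmlDocument also has a LoadXml method that parses a string directly. The minimal sketch below assumes the books.xml content shown above is embedded as a string literal, and uses C# 9 top-level statements:

```csharp
using System;
using System.Xml;

// The books.xml content from above, embedded as a string instead of read from disk.
const string booksXml =
    "<BOOKS>" +
    "<BOOK>" +
    "<TITLE>C# Developer's Guide To ASP.NET, XML and ADO.NET</TITLE>" +
    "<AUTHOR id='101' location='San Francisco'>Jeffrey P. McManus</AUTHOR>" +
    "<AUTHOR id='107' location='Seattle'>Chris Kinsman</AUTHOR>" +
    "</BOOK>" +
    "</BOOKS>";

XmlDocument xd = new XmlDocument();
xd.LoadXml(booksXml); // parse from a string rather than a path or URL
Console.WriteLine(xd.OuterXml);
Console.WriteLine($"Root element: {xd.DocumentElement.Name}");             // Root element: BOOKS
Console.WriteLine($"Authors: {xd.GetElementsByTagName("AUTHOR").Count}"); // Authors: 2
```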

  3. Viewing Document Data Using the XmlNode Object
    • The XmlNode object represents a node in the XML document. Through it you can reach the node's attributes and child nodes, as well as every other part of an XML document.
    • When you've loaded an XML document to parse it, your next step usually involves retrieving that document's top-level node. Be careful with the FirstChild property: if the document has an XML declaration, FirstChild returns the declaration, not the root element. The DocumentElement property always returns the root element.

    • Note also that the value contained in an XML node is returned by the InnerText property in .NET, not by the .text property as it was in the COM-based MSXML library.
    • Use OuterXml when you want the node's own markup included. InnerXml returns the markup of the node's children, and InnerText returns only their text content.

      using System;
      using System.Xml;

      namespace LearningXml
      {
          class Program
          {
              static void Main(string[] args)
              {
                  const string xmlPath = "http://www.w3schools.com/xml/note.xml";
                  XmlDocument xml = new XmlDocument();
                  xml.Load(xmlPath);
                  // The first child node is the XML declaration, so FirstChild won't work here.
                  // DocumentElement returns the root <note> element directly.
                  XmlNode node = xml.DocumentElement;
                  foreach (XmlNode nd in node.ChildNodes)
                  {
                      if (nd.Name == "to")
                          Console.WriteLine("The message is sent to '{0}'.", nd.InnerText);
                  }
                  Console.Read();
              }
          }
      }
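The difference between FirstChild and DocumentElement, and between the inner and outer properties, is easiest to see on a small in-memory document. The sketch assumes C# 9 top-level statements. The <b> element inside <body> is added only to give InnerXml some markup to return:

```csharp
using System;
using System.Xml;

XmlDocument doc = new XmlDocument();
doc.LoadXml("<?xml version=\"1.0\"?><note><to>Tove</to><body>Do not <b>forget</b> me</body></note>");

// With a declaration present, FirstChild is the declaration, not <note>.
Console.WriteLine(doc.FirstChild.NodeType);  // XmlDeclaration
Console.WriteLine(doc.DocumentElement.Name); // note

XmlNode body = doc.DocumentElement["body"];
Console.WriteLine(body.InnerText); // Do not forget me
Console.WriteLine(body.InnerXml);  // Do not <b>forget</b> me
Console.WriteLine(body.OuterXml);  // <body>Do not <b>forget</b> me</body>
```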

  4. Using the XmlTextReader Object
    • The XmlTextReader object provides a method of accessing XML data that is both easier to code and potentially more efficient than the full-blown XML DOM.
    • Parsing an XML document using the XmlTextReader object involves a few steps. First, you create the object, optionally passing in a filename or URL that represents the source of XML to parse. Next, call the Read method of the XmlTextReader object repeatedly until it returns false.
    • The type of the current node is exposed through the XmlTextReader object's NodeType property. Its data can be retrieved through the Value property, or through the ReadString method, which returns the text content of the current element or text node as a string. (Starting with .NET 2.0, methods such as ReadElementContentAsInt return typed values instead.)
    • Most of the time, the NodeType property will be XmlNodeType.Element (an element tag) or XmlNodeType.Text (the data contained in a tag). Read never stops on attributes; to visit them you call a method such as MoveToNextAttribute, which positions the reader on an XmlNodeType.Attribute node.

      //Extracting the Message Sender Using the XmlTextReader Object
      using System;
      using System.Xml;

      namespace LearningXml
      {
          class Program
          {
              static void Main(string[] args)
              {
                  const string xmlPath = "http://www.w3schools.com/xml/note.xml";
                  XmlTextReader reader = new XmlTextReader(xmlPath);
                  bool isFromNode = false;
                  try
                  {
                      while (reader.Read())
                      {
                          switch (reader.NodeType)
                          {
                              case XmlNodeType.Element:
                                  // Remember whether we're inside the <from> element.
                                  isFromNode = (reader.Name == "from");
                                  break;
                              case XmlNodeType.Text:
                                  if (isFromNode)
                                  {
                                      Console.WriteLine("The message is from {0}", reader.ReadString());
                                      Console.Read();
                                      return;
                                  }
                                  break;
                          }
                      }
                  }
                  finally
                  {
                      reader.Close(); // release the underlying stream or connection
                  }
              }
          }
      }
    • The XmlTextReader works well for both large and small documents. Because it reads the document as a forward-only stream instead of building the whole tree in memory, it should perform better than the XML DOM parser under most circumstances, particularly for large documents.

    • However, like the DOM, it too has its own set of limitations. The XmlTextReader object doesn't have the capability to scroll or jump around among various areas in the document. Also, as its name implies, the XmlTextReader object permits you only to read data; you can't use it to make changes in existing node values or add new nodes to an existing document.
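Because Read never lands on attributes, reading them with XmlTextReader takes an explicit step. This minimal sketch assumes C# 9 top-level statements and reads an in-memory string through a StringReader instead of a URL:

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.IO;
using System.Xml;

const string ordersXml =
    "<ORDERS><ORDER id=\"33849\" custid=\"406\">" +
    "<TOTALAMOUNT>3456.92</TOTALAMOUNT>" +
    "</ORDER></ORDERS>";

List<string> attributes = new List<string>();
decimal total = 0m;
string currentElement = "";

// XmlTextReader can read from any TextReader, not just a filename or URL.
XmlTextReader reader = new XmlTextReader(new StringReader(ordersXml));
try
{
    while (reader.Read())
    {
        switch (reader.NodeType)
        {
            case XmlNodeType.Element:
                currentElement = reader.Name;
                // Read() never stops on attributes, so visit them explicitly.
                while (reader.MoveToNextAttribute())
                    attributes.Add(reader.Name + "=" + reader.Value);
                reader.MoveToElement(); // back to the element before the next Read()
                break;
            case XmlNodeType.Text:
                if (currentElement == "TOTALAMOUNT")
                    total = decimal.Parse(reader.Value, CultureInfo.InvariantCulture);
                break;
            case XmlNodeType.EndElement:
                currentElement = "";
                break;
        }
    }
}
finally
{
    reader.Close();
}

Console.WriteLine(string.Join(", ", attributes));                // id=33849, custid=406
Console.WriteLine(total.ToString(CultureInfo.InvariantCulture)); // 3456.92
```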

  5. Writing XML Data Using the XmlTextWriter Object
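    • You could write an XML file with a general-purpose object such as TextWriter, but the XmlTextWriter object provides some advantages over building the markup by hand.
    • The main benefit of using XmlTextWriter is that it keeps track of the document's structure as you write and throws an exception for many calls that would produce malformed XML, such as closing an element that was never opened. (It does not validate against a schema.) The class also has a number of useful features, such as the ability to control formatting (indentation), the quote character used for attributes, and the output encoding.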

      using System;
      using System.Text;
      using System.Xml;

      namespace LearningXml
      {
          class Program
          {
              static void Main(string[] args)
              {
                  /*
                   * The XML file content to create:
                   * <?xml version="1.0" encoding="utf-8"?>
                   * <BOOK CaseSensitive="true">
                   *   <TITLE>C# Developer's Guide</TITLE>
                   * </BOOK>
                   */
                  const string xmlPath = "c:\\book.xml";
                  XmlTextWriter writer = new XmlTextWriter(xmlPath, Encoding.UTF8);
                  // Indent child elements so the file matches the layout shown above.
                  writer.Formatting = Formatting.Indented;
                  /*
                   * Normally we don't include exception-handling code in our brief code examples
                   * (mainly because we're lazy sods, but also because it sometimes detracts from the point of the example).
                   * In this case, though, we've included a handler to emphasize that it's important
                   * to handle exceptions in code that creates or modifies files.
                   * If you leave out the exception handler in file-handling code, it's easy to make a mistake
                   * that prevents the file from being closed properly, for example, which is a bad thing.
                   */
                  try
                  {
                      /*
                       * WriteStartDocument begins the document and writes the XML declaration.
                       * It's optional: if you skip it, XmlTextWriter simply writes no declaration,
                       * which is fine because XML itself does not require one.
                       */
                      writer.WriteStartDocument();
                      // Create the root node of the document.
                      writer.WriteStartElement("BOOK");
                      // To create attributes associated with nodes, use the WriteAttributeString method.
                      writer.WriteAttributeString("CaseSensitive", "true");
                      // Insert a node underneath the root node.
                      writer.WriteElementString("TITLE", "C# Developer's Guide");
                      // Close the root element, then the document.
                      // (WriteEndDocument would also close any elements still open.)
                      writer.WriteEndElement();
                      writer.WriteEndDocument();
                  }
                  catch (Exception ex)
                  {
                      Console.Write("Exception: {0}", ex.Message);
                  }
                  finally
                  {
                      // Flush and Close commit the file to disk and release it, even after an exception.
                      writer.Flush();
                      writer.Close();
                      Console.Read();
                  }
              }
          }
      }
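To inspect what XmlTextWriter produces without touching the disk, you can point it at a StringWriter. The sketch below makes that substitution, assumes C# 9 top-level statements, and reads the result back with XmlDocument to confirm it is well-formed. Because a StringWriter holds a .NET string, the declaration will name a UTF-16 encoding rather than utf-8:

```csharp
using System;
using System.IO;
using System.Xml;

// Write into an in-memory StringWriter instead of c:\book.xml.
StringWriter buffer = new StringWriter();
XmlTextWriter writer = new XmlTextWriter(buffer);
string xml;
try
{
    writer.Formatting = Formatting.Indented; // one element per line
    writer.Indentation = 2;                  // two spaces per nesting level
    writer.WriteStartDocument();
    writer.WriteStartElement("BOOK");
    writer.WriteAttributeString("CaseSensitive", "true");
    writer.WriteElementString("TITLE", "C# Developer's Guide");
    writer.WriteEndElement();
    writer.WriteEndDocument();
    writer.Flush();
    xml = buffer.ToString();
}
finally
{
    writer.Close();
}

Console.WriteLine(xml);

// Read the result back to confirm it is well-formed.
XmlDocument check = new XmlDocument();
check.LoadXml(xml);
Console.WriteLine(check.DocumentElement.GetAttribute("CaseSensitive")); // true
Console.WriteLine(check.DocumentElement["TITLE"].InnerText);           // C# Developer's Guide
```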

  6. Navigating and Updating Documents Using the XmlNodeReader Object
    • In many ways, the XmlNodeReader object represents the best of all worlds. It provides a simpler programmability model than the XmlDocument object, yet it integrates with the standard DOM objects nicely.
    • To create an XmlNodeReader, you typically load an XmlDocument object first and then pass it (or any XmlNode within it) to the XmlNodeReader constructor.
  7. Navigating Through the Document Using the XmlNodeReader Object
    • After you've created and populated the XmlNodeReader, you can use it to move through the document programmatically.
    • You do this by calling the XmlNodeReader's Read method repeatedly; each call moves to the next node in the document (elements, text, and so on), one node at a time.

      using System;
      using System.Xml;

      namespace LearningXml
      {
          class Program
          {
              static void Main(string[] args)
              {
                  const string xmlPath = "http://www.w3schools.com/xml/note.xml";
                  XmlDocument xd = new XmlDocument();
                  xd.Load(xmlPath);
                  // Wrap the loaded DOM document in a reader.
                  XmlNodeReader reader = new XmlNodeReader(xd);
                  /*
                   * Repeated calls to Read control the loop the same way they do with the XmlTextReader object.
                   * Read returns true when it successfully moves to the next node and false when
                   * there is nothing left to read, so a simple while loop displays every node in the document.
                   */
                  while (reader.Read())
                      Console.WriteLine(reader.Name + "-" + reader.Value);
                  reader.Close();
                  Console.Read();
              }
          }
      }
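The node-by-node behavior of Read is easier to see on a tiny document. This sketch assumes C# 9 top-level statements and uses an in-memory version of note.xml, trimmed to two child elements:

```csharp
using System;
using System.Collections.Generic;
using System.Xml;

XmlDocument xd = new XmlDocument();
xd.LoadXml("<note><to>Tove</to><from>Jani</from></note>");

List<string> elementNames = new List<string>();
List<string> textValues = new List<string>();

// Each Read() call moves to the next node: elements, text, end tags, and so on.
XmlNodeReader reader = new XmlNodeReader(xd);
while (reader.Read())
{
    if (reader.NodeType == XmlNodeType.Element)
        elementNames.Add(reader.Name);
    else if (reader.NodeType == XmlNodeType.Text)
        textValues.Add(reader.Value);
}
reader.Close();

Console.WriteLine(string.Join(",", elementNames)); // note,to,from
Console.WriteLine(string.Join(",", textValues));   // Tove,Jani
```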
  8. Using XPath Queries to Retrieve XML Data
    • XPath is a standard that defines a syntax for retrieving data from an XML document. You can use XPath query syntax to retrieve data from an XML document without having to traverse the entire document. In .NET, you do this using the XPathDocument and XPathNavigator objects.

  9. Manipulating the Current Node Using the XPath Iterator's Current Property
    using System;
    using System.Xml.XPath;

    namespace LearningXml
    {
        class Program
        {
            static void Main(string[] args)
            {
                const string xmlPath = "http://www.w3schools.com/xml/note.xml";
                // To begin performing XPath queries, you start by creating an XPathDocument object.
                // This object is analogous to the XmlDocument object, but it is read-only and optimized for XPath.
                XPathDocument xpd = new XPathDocument(xmlPath);
                /*
                 * After you've created an XPathDocument object, you use its CreateNavigator method
                 * to create an XPathNavigator object. The XPathNavigator performs the actual XPath query;
                 * it returns an iterator (an instance of System.Xml.XPath.XPathNodeIterator)
                 * that you can use to access each of the nodes returned by the query.
                 */
                XPathNavigator nav = xpd.CreateNavigator();
                /*
                 * The Select method of the XPathNavigator object enables you to filter and retrieve
                 * subsets of XML data. You construct an XPath expression and pass it to Select.
                 * An XPath expression is a compact way of querying an XML document without writing
                 * code to walk through the whole thing yourself; often a single line is enough.
                 *
                 * The product of this operation is a selection, a subset of XML nodes that can be
                 * processed independently of the rest of the document. You traverse the selection
                 * with the XPathNodeIterator returned by Select.
                 */
                XPathNodeIterator iterator = nav.Select("note/from");

                while (iterator.MoveNext())
                {
                    /*
                     * To work with each selected node, use the iterator's Current property.
                     * Current is itself an XPathNavigator positioned on the selected node,
                     * with a rich set of properties and methods for inspecting it.
                     */
                    Console.WriteLine("The message is from " + iterator.Current);
                    Console.WriteLine("The property value is " + iterator.Current.Value);
                }
                Console.Read();
            }
        }
    }
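XPathDocument can also read from a TextReader, which makes XPath easy to try offline. The sketch below assumes C# 9 top-level statements and a trimmed, in-memory copy of note.xml. It also uses XPathNavigator.Evaluate, which returns the value of an expression such as count() instead of a node set:

```csharp
using System;
using System.IO;
using System.Xml.XPath;

const string noteXml =
    "<note><to>Tove</to><from>Jani</from><heading>Reminder</heading></note>";

// Build the XPathDocument from a TextReader instead of a URL.
XPathDocument xpd = new XPathDocument(new StringReader(noteXml));
XPathNavigator nav = xpd.CreateNavigator();

// Iterate over a node-set result, as in the listing above.
string from = "";
XPathNodeIterator iterator = nav.Select("/note/from");
while (iterator.MoveNext())
    from = iterator.Current.Value;

// Evaluate handles expressions that return a value rather than nodes; count() yields a double.
double childCount = (double)nav.Evaluate("count(/note/*)");

Console.WriteLine(from);       // Jani
Console.WriteLine(childCount); // 3
```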
  10. Changing Values in an XML Document
    • In addition to navigating in an XML document using the various objects described in this chapter, you can also use the XML DOM to make changes in an XML document. Using DOM objects, you can:

      • Insert a node into the document

      • Remove a child node

      • Change the value of an element


      using System;
      using System.Xml;

      namespace LearningXml
      {
          class Program
          {
              static void Main(string[] args)
              {
                  const string xmlPath = "C:\\books.xml";
                  const string newXmlPath = "C:\\new_books.xml";
                  XmlDocument xd = new XmlDocument();
                  xd.Load(xmlPath);

                  XmlNode root = xd.DocumentElement;

                  /*
                   * Insert a BOOK element.
                   * To insert a node, you use the InsertAfter or InsertBefore method of the XmlNode object.
                   * Each takes two parameters: the new child node to insert and a reference to an existing node.
                   * The location of the existing node determines where the new node goes.
                   */
                  XmlElement eleBook = xd.CreateElement("BOOK");
                  root.InsertAfter(eleBook, root.FirstChild);

                  // Insert a TITLE element beneath the new BOOK element.
                  XmlElement eleTitle = xd.CreateElement("TITLE");
                  eleTitle.InnerText = "My Title";
                  eleBook.AppendChild(eleTitle);

                  /*
                   * Changes made through the DOM affect only the in-memory representation
                   * of the document, not the way it is stored on disk.
                   * To persist the changes, call the Save method of the XmlDocument object.
                   * Here the result goes to a new file, so the original books.xml stays untouched.
                   */
                  xd.Save(newXmlPath);

                  Console.Read();
              }
          }
      }
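The listing above covers inserting a node. The other two operations in the list, changing an element's value and removing a child node, could be sketched as follows. The sketch assumes C# 9 top-level statements and an in-memory copy of books.xml. The "Old Title" text is made up:

```csharp
using System;
using System.Xml;

// In-memory copy of books.xml; "Old Title" is placeholder data.
XmlDocument xd = new XmlDocument();
xd.LoadXml(
    "<BOOKS><BOOK>" +
    "<TITLE>Old Title</TITLE>" +
    "<AUTHOR id=\"101\">Jeffrey P. McManus</AUTHOR>" +
    "<AUTHOR id=\"107\">Chris Kinsman</AUTHOR>" +
    "</BOOK></BOOKS>");

XmlNode book = xd.DocumentElement.FirstChild;

// Change the value of an element: assign its InnerText.
book["TITLE"].InnerText = "My Title";

// Remove a child node: find it, then ask its parent to remove it.
XmlNode author = book.SelectSingleNode("AUTHOR[@id='107']");
book.RemoveChild(author);

Console.WriteLine(xd.OuterXml);
// <BOOKS><BOOK><TITLE>My Title</TITLE><AUTHOR id="101">Jeffrey P. McManus</AUTHOR></BOOK></BOOKS>
```

As with insertion, these changes stay in memory until you call Save.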

  11. Querying XML Documents Using XPath Expressions
    • XPath is a set-based query syntax for extracting data from an XML document.
    • XPath enables you to extract data from an XML document using a compact expression, ideally with a single line of code. It's generally a more concise way to extract information buried deep within an XML document. The compactness of XPath can come at a price, however: readability.
    • Although the complete XPath syntax is quite involved (and beyond the scope of this book), you should know about certain commonly used operations as you approach XML processing using the .NET framework classes. The three most common XPath scenarios are
      • Retrieving a subset of nodes that match a certain value
      • Retrieving one or more nodes based on the value of an attribute
      • Retrieving all the parent and child nodes where an attribute of a child node matches a certain value
    • A shortcut exists to retrieve the root element of a document, and it doesn't even require you to know the root element's name. The XPath expression /* always retrieves the document's root element (and, by extension, all of its descendants).

      using System;
      using System.Xml;

      namespace LearningXml
      {
          class Program
          {
              static void Main(string[] args)
              {
                  const string xmlPath = "C:\\books.xml";
                  string friendlyInputMessage = "Please input the xpath that you want to search: ";
                  bool isNoInput = false;
                  while (!isNoInput)
                  {
                      string input = string.Empty;
                      try
                      {
                          Console.WriteLine(friendlyInputMessage);
                          input = Console.ReadLine();
                          // An empty line (or the end of input) ends the session.
                          if (!string.IsNullOrEmpty(input))
                          {
                              XmlNodeList nodeList = QueryXml(xmlPath, input);
                              string output = string.Empty;
                              foreach (XmlNode node in nodeList)
                                  output += node.OuterXml;
                              Console.WriteLine(output);
                          }
                          else
                              isNoInput = true;
                      }
                      catch (Exception ex)
                      {
                          Console.WriteLine("{0} Your input is '{1}'", ex.Message, input);
                      }
                  }
              }

              private static XmlNodeList QueryXml(string xmlPath, string queryPath)
              {
                  /*
                   * XPath Query Expression: /BOOKS/BOOK/AUTHOR[@id = "107"]/parent::*
                   * The @ symbol indicates that id is an attribute rather than the name of a node.
                   * Note that this expression retrieves multiple instances of a given author
                   * when a document contains multiple books by the same author.
                   * The parent::* step at the end selects the parent BOOK node of each matching AUTHOR,
                   * so the result includes the whole book entry rather than just the author.
                   *
                   * You can combine multiple query criteria using the and and or operators:
                   * BOOKS/BOOK/AUTHOR[@location="Seattle" or @location="San Francisco"]
                   *
                   * XPath Query Expression: /BOOKS/BOOK/AUTHOR[@location != "Seattle"]
                   * XPath Query Expression: /BOOKS/BOOK/AUTHOR[@id > 105]
                   *
                   * In addition to querying on attributes, you can also query on the text contained in nodes.
                   * XPath Query Expression: /BOOKS/BOOK/TITLE[. = "How to Pluck a Purdue Chicken"]/parent::*
                   * The dot (.) operator is XPath-ese for "right here" (the current node).
                   *
                   * There is another way to retrieve all the books whose title matches a given text string
                   * without using parent::*: put the condition on the child element in square brackets.
                   * This retrieves the BOOK nodes (including their TITLE children) with the title you specify.
                   * XPath Query Expression: /BOOKS/BOOK[TITLE = 'How to Pluck a Purdue Chicken']
                   */
                  XmlDocument xd = new XmlDocument();
                  xd.Load(xmlPath);
                  /*
                   * If you were interested in retrieving only one instance of the AUTHOR node,
                   * you could use the SelectSingleNode method (rather than SelectNodes).
                   * This ensures that you retrieve only the first matching node.
                   */
                  return xd.SelectNodes(queryPath);
              }
          }
      }
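To check the expressions from the comments without typing them in one at a time, you could run them against an in-memory copy of books.xml. This sketch assumes C# 9 top-level statements. CountMatches is a hypothetical helper, not a framework method:

```csharp
using System;
using System.Xml;

// In-memory copy of books.xml instead of C:\books.xml.
const string booksXml =
    "<BOOKS><BOOK>" +
    "<TITLE>C# Developer's Guide To ASP.NET, XML and ADO.NET</TITLE>" +
    "<AUTHOR id='101' location='San Francisco'>Jeffrey P. McManus</AUTHOR>" +
    "<AUTHOR id='107' location='Seattle'>Chris Kinsman</AUTHOR>" +
    "</BOOK></BOOKS>";

XmlDocument xd = new XmlDocument();
xd.LoadXml(booksXml);

// Hypothetical helper: how many nodes an XPath expression selects.
int CountMatches(string xpath) => xd.SelectNodes(xpath).Count;

string[] queries =
{
    "/BOOKS/BOOK/AUTHOR[@id = '107']/parent::*",
    "/BOOKS/BOOK/AUTHOR[@location = 'Seattle' or @location = 'San Francisco']",
    "/BOOKS/BOOK/AUTHOR[@location != 'Seattle']",
    "/BOOKS/BOOK/AUTHOR[@id > 105]",
    "/BOOKS/BOOK[TITLE = 'How to Pluck a Purdue Chicken']",
};
foreach (string q in queries)
    Console.WriteLine($"{CountMatches(q)} <- {q}");

// /* returns the root element whatever its name; SelectSingleNode returns only the first match.
Console.WriteLine(xd.SelectSingleNode("/*").Name);                      // BOOKS
Console.WriteLine(xd.SelectSingleNode("/BOOKS/BOOK/AUTHOR").InnerText); // Jeffrey P. McManus
```

With this data, the five queries select 1, 2, 1, 1 and 0 nodes respectively. The last one matches nothing because books.xml contains no book with that title.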
