Related Topics: XML Magazine

XML: Article

Create XML Easily with FOR XML PATH

New FOR XML features simplify XML creation

Do you love XML? Have you been using XML with SQL Server? Many people have, starting way back when with SQL Server 7.0. Back then, there was no support for XML in the database, so we had to write external programs to convert the relational data into an XML format. This was time-consuming and often inefficient. When SQL Server 2000 came out, with its integrated support for XML, there were a lot of high expectations. Unfortunately, XML still couldn't be easily stored in the database, although it could be created and consumed. XML could be stored in the database as a large string of text, but this was problematic at best. Large VARCHAR and TEXT strings are hard to manipulate. This worked, though, and programmers managed to create great systems. Still, something more was needed. Many people looked forward to the next version of SQL Server, and hoped for a better solution.

SQL Server 2005 is now here, and with it comes a new XML data type and support for the new XQuery language. XML can now be stored directly in the database, queried, and even edited in place. In addition, the existing creation and consumption capabilities from SQL 2000 have been enhanced. XML can be created and consumed much more easily than ever before. With all of these enhancements, how should one proceed?

Last year, I couldn't wait for the ability to store XML in the database. I was sure that this capability would be the answer to all of my XML-related problems. Because SQL Server 2005 now has an XML data type, we can store entire documents directly in a column of a table. I was positive that this would be the first new feature that I would use. However, an unexpected thing occurred along the way: Microsoft enhanced the FOR XML clause. It is much easier to create XML from relational data than it ever was before. I've found that I haven't been using the XML data type nearly as often as I expected. Instead, I've been storing data relationally, creating the XML as needed. I believe that this gives my applications the best features of both worlds. Therefore, I wish to pass this information on to you.

This article will introduce some of the new XML-creation features in SQL Server 2005 and provide some guidance regarding their use. It will demonstrate how much easier it is to create XML in SQL Server 2005 that it ever was in SQL Server 2000. We'll walk through the creation of a small XML document and discuss each step in detail. Because of space limitations, the XML data type and the related XQuery language will be covered in a future column.

In order to fully appreciate the new SQL Server 2005 features, you must first understand the difficulties faced by SQL Server 2000 developers. These are best illustrated with an example. Let's examine a simple XML document that I created from data in the new AdventureWorks database (see Listing 1). The document has a single root element that contains multiple Category elements. These elements, in turn, contain multiple SubCategory elements. (Ellipses are used to indicate sections where similar elements have been removed for space reasons.) The Category and SubCategory elements each have a Name attribute. This small, straightforward document is complex enough for our purposes.

In Listing 2, I've created a FOR XML EXPLICIT query that creates the demonstration document. This query is representative of those commonly created with SQL Server 2000. Notice that there are three individual SELECT statements joined together via UNION ALL clauses. Each SELECT statement corresponds to a node in the resulting XML document. Each column in the SELECT statements is named according to a special scheme, which helps define the structure of the resulting document. There are also columns ("Tag" and "Parent") that exist for no other purpose than to define the document schema. To make matters worse, the ORDER BY clause must be carefully designed in order to make the resulting document nest properly. While this process works, with larger documents it can quickly become too complex to be practical. In addition, because of the fact that it is based on UNION clauses, it is highly repetitive. Maintenance of such queries can become very difficult. (I once created such a query with 12 individual SELECT statements. What a nightmare that was!)

FOR XML Enhancements
SQL Server 2005 has a solution to these shortcomings. There is a new PATH directive in the FOR XML clause. This directive allows much of the structure, or schema, of the resulting XML document to be specified via XPath. Take the code in Listing 3 for example. The SELECT statement returns a single column whose alias is an XPath expression. The expression indicates that the column results should become an attribute called "Name." (In XPath, the "@" symbol indicates an attribute. The lack of this symbol indicates an element. See an XPath reference for more information.) When executed, this code will create the XML document shown in Listing 4. As you can see, this creates an element for each row, with the attribute name that we specified; however, each element is named "row." This is probably not what we would want in our actual document. Fortunately, there is a solution for this too. In Listing 5, I've altered the SELECT statement so that the resulting elements will be named "Category." The results are shown in Listing 6.

If you glance back at Listing 1, you'll notice that we've managed to reproduce the Category elements contained in that document. Next, we'll need to write some code for the SubCategory elements. This code, along with its results, is shown in Listing 7. This SELECT statement is very similar to the one for the Category elements, except that the data is retrieved from a different table and the results get placed in different elements.

Now that we have the data for both sets of elements, we need a way to combine them together appropriately. The new TYPE directive can be used for this. The TYPE directive specifies that the XML created in a FOR XML query should be returned as an instance of the XML data type. The data can then be stored in an XML variable, placed in an XML column in a table, or further manipulated in T-SQL statements. It is this last capability that we will use. With the TYPE directive, we now have the ability to nest FOR XML PATH queries. This allows us to create XML documents of almost any depth. You can see this technique in use in Listing 8. I've combined the queries for the Category and SubCategory elements, linking them together with an appropriate WHERE clause. The XML produced is shown in Listing 9.

Although the XML document is now well formed and properly nested, it is not yet valid. XML documents should have only one root element. If you look at our document, you'll see that we have four instances of the Category element, all at the root level. We need to change our last query so that it produces one root element and includes the Category elements underneath this new root. To do this, we can take advantage of one final enhancement. The FOR XML clause now has a ROOT directive, which indicates that a root element should be created for us. By default, this root element will be given the name "root." If we so desire, we can also specify the name of this root element. In Listing 10 you can see the query with this directive added. The final XML document is shown in Listing 11.

We've now come full circle. If you compare Listing 1 (produced with FOR XML EXPLICIT) and Listing 11 (produced with FOR XML PATH), you'll see that they are identical. However, the new syntax is much cleaner and easier to create (compare Listings 2 and 10 to see this). The performance of the two implementations is similar, although the execution plan for the FOR XML PATH query is noticeably smaller. Because of this, FOR XML PATH queries should edge out FOR XML EXPLICIT queries when larger or more complicated documents are created. When you factor in readability and maintenance, the new method wins hands down.

When I started using the XML support in SQL Server 2005, I fully expected that I would be storing data inside tables as XML. I thought that this would be the easiest way to send XML to applications outside the server. As I've learned more about the product, I have seen how the improved FOR XML clause makes the creation of XML documents easier to accomplish and maintain. This has changed my perceptions. With the new FOR XML PATH clause, we can store our data relationally and easily produce the XML when needed. In this way, we can take advantage of the capabilities of both data formats. We can use the relational data in a JOIN, for example, and produce the XML whenever necessary. With SQL Server 2005, the choice is now ours.

More Stories By Jerry Dixon

Jerry Dixon is a senior developer and architect for ACH Food Companies in Memphis, Tennessee. Over the past 16 years he has led development projects for a number of enterprise, mid-level, and small business organizations. While he has fulfilled multiple roles as an infrastructure designer, database administrator, and software developer, he specializes in XML, SQL and ASP.NET. He is a co-leader and frequent presenter at the Memphis .NET User Group. Jerry holds the following Microsoft certifications: MCSD (VB 6.0 and .NET), MCDBA (SQL 2000), MCSA (Windows 2000 and 2003), MCSE (Windows 2000), MCAD (.NET), MCT. He resides in Olive Branch, MS with his wife and son.

Comments (3)

Share your thoughts on this story.

Add your comment
You must be signed in to add a comment. Sign-in | Register

In accordance with our Comment Policy, we encourage comments that are on topic, relevant and to-the-point. We will remove comments that include profanity, personal attacks, racial slurs, threats of violence, or other inappropriate material that violates our Terms and Conditions, and will block users who make repeated violations. We ask all readers to expect diversity of opinion and to treat one another with dignity and respect.