We've hinted at some of the uses of metadata as they might apply to our philosophy. When we discussed namespaces, we used the appearance of names from foreign namespaces as a clue to the meaning of an unknown XML vocabulary. Let's take a direct look at some of the specific ways we can use metadata to improve our cooperative applications.
First and foremost, we can validate documents using schemas.
This is a mixed blessing. Validation lets us enforce the rules of a vocabulary
rigorously. Sometimes, though, we can improve the reliability of our
applications by relaxing unimportant syntactic rules. If XML Data or XML DCD
becomes a W3C recommendation, we might be able to do both. An open
model as defined by XML Data or XML DCD would allow us to enforce the rules
that are important to us while admitting other content in a flexible way. Each
metadata effort offers some interesting features that, if adopted as a
recommendation and implemented in a parser component, could help
us apply our five principles. The XML metadata world is full of tantalizing
possibilities and short on fulfillment. However, at the time of writing, a working
group of the W3C was producing a schema language for XML using an XML syntax; more
information was available at http://www.w3.org/XML/Activity#schema-wg.
Strong typing of data and content organizing features, like
the group element's order attribute in the XML Schema preview,
help client applications search XML
documents written in an unfamiliar vocabulary for data on which they can
operate. A client that operates on text strings knows to skip numeric types. A
calculation-oriented application would seek out numeric types. Groups indicate
some association between elements, so a client would logically view such
elements as part of a whole — a data series, a group of alternatives,
properties of the containing parent. A client application working with metadata
can make useful suggestions to a human user. If the application requires some
numeric inputs, the application would present those types found in the document
to the user. Based on the order attribute, the user
interface could be reconfigured: a single selection list box for one,
a group of mandatory inputs for seq, or a multiple selection list box for many.
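The mapping from the order attribute to a user interface control might be sketched as follows. The function names optionsFor and renderGroupControl, and the HTML they emit, are our own illustration rather than part of MSXML or any schema proposal; we assume a missing order attribute behaves like seq, which is how the traversal code later in this chapter treats it.

```javascript
// Hypothetical helpers (not part of MSXML or any schema proposal): choose an
// HTML control for a schema group from its order attribute.
function optionsFor(names) {
  return names.map(n => "<option value='" + n + "'>" + n + "</option>").join("");
}

function renderGroupControl(order, names) {
  // Assumption: a missing order attribute behaves like "seq".
  const effective = order == null ? "seq" : order;
  switch (effective) {
    case "one":   // exactly one alternative: single-selection list box
      return "<select>" + optionsFor(names) + "</select>";
    case "many":  // any subset of the alternatives: multiple-selection list box
      return "<select multiple>" + optionsFor(names) + "</select>";
    case "seq":   // every element, in order: one mandatory input apiece
      return names.map(n => n + " <input name='" + n + "'>").join("<br>");
    default:
      throw new Error("Unknown order value: " + order);
  }
}

// Example: two of the EMPLOYEEQUERY alternatives rendered as if order were "one".
console.log(renderGroupControl("one", ["BYNAME", "BYHIRE"]));
```

A real builder would also need to react to the user's choice in the "one" and "many" cases, which is exactly the complication the experiment later sidesteps.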
For example, in the XML Schema
experiment at the end of this chapter, we will encounter a schema intended for
building SQL queries. Its BYHIRE element lists a hire date and a comparison
operator in sequence (seq being the default order for element-only content),
and the operator is restricted to the enumerated values
"GT GE LT LE EQ", which represent the
operators >, >=, <, <=, and = used in SQL
WHERE clauses.
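A service receiving such a query has to turn the enumerated operator back into SQL. The sketch below shows one way to do that; SQL_OPERATORS and toWherePredicate are hypothetical names, and building SQL by string concatenation is for illustration only (a production service should bind parameters instead).

```javascript
// Hypothetical mapping from the OPERATOR enumeration to SQL comparison operators.
const SQL_OPERATORS = { GT: ">", GE: ">=", LT: "<", LE: "<=", EQ: "=" };

// Build one WHERE-clause predicate. The column name is supplied by the caller;
// concatenation is illustrative only (a real service should bind parameters).
function toWherePredicate(column, op, value) {
  if (!Object.prototype.hasOwnProperty.call(SQL_OPERATORS, op)) {
    throw new Error("Unsupported operator: " + op);
  }
  // Double embedded single quotes so the SQL literal stays well-formed.
  const literal = "'" + String(value).replace(/'/g, "''") + "'";
  return column + " " + SQL_OPERATORS[op] + " " + literal;
}

console.log(toWherePredicate("HIREDATE", "GE", "1998-01-01")); // HIREDATE >= '1998-01-01'
```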
XML is a great advance over native data formats because it explicitly tags and labels each item of content. Metadata extends this by providing information about the structure, types, and relationships of the marked-up data. We've assumed that software clients will increasingly encounter unfamiliar vocabularies as networks grow decentralized. Once an XML metadata standard reaches recommendation status, client applications will find parsing the schema document as useful as parsing the data document.
A validating parser will only tell us when the data we use violates the rules of the vocabulary in question. The ability to parse a metadata document and discover the structure of the vocabulary enables us to avoid errors in the first place. Generally speaking, the metadata proposals we have seen in this chapter do not result in schema documents that proceed in a top-down fashion like our typical XML data document. In fact, since we usually like to define the component parts of larger structures before defining the overall structure, schema documents will usually be organized bottom-up. Once we have a root element definition, however, we can use the metadata definition to find its components. We can then either walk the parse tree or use the XSL pattern matching syntax to extract each of those components in turn. While this may not be terribly efficient from a programming point of view, keep in mind we're learning the structure that will be applied to all documents written to the vocabulary specified in the schema.
A certain amount of digging to learn the structure of the vocabulary will mean we can operate on data in a format we've never seen before. If we are using the specialization scheme presented in the last chapter, a client receiving a vocabulary more specialized than it desired can discover the structure of the specialized parts of the new vocabulary. This is a very powerful capability.
A common task in our network experience will be composing queries to select data. We established the convention of using directory entries to indicate the vocabulary of our services back in Chapter Two. It is likely that the vocabulary for the queries will contain elements drawn from the response vocabulary. If, for example, we were searching for a person we would expect to provide a name for which to search. The name would certainly be part of the XML sent in response to the query. An application that understands the name element — and we are assuming a client understands the vocabulary it requires — could reasonably request input from the user for this element. The application, however, might not understand the vocabulary used to query the service. That vocabulary might change based on the SQL query used to implement the search. Some services might permit the submission of batches of unrelated queries while others would only accept one query based on one alternative at a time, e.g., searching by name or by age, but not both in one request.
Once a metadata recommendation is released by the W3C, we might reasonably extend our directory schema to include an entry for specifying the URL of the query schema for a particular service. A client searching for a given data vocabulary would locate a server, as we do now, but first retrieve and parse the query schema for that service. With that in hand, a user or programmatic interface (for human and software agents, respectively) could be created dynamically. The client application would package the input data according to the query schema and transmit that in its request. This would give us considerably more flexibility than we presently have. Right now, we assume knowledge of both the response and query vocabularies. With a metadata capability, we could loosen our requirement for knowledge of the query vocabulary considerably. As we saw in the preceding section, we could also loosen the requirement to understand the response vocabulary to some extent. The degree to which we could loosen the requirement would of course depend on how much metadata is included in the schema. RDF and XML Data are at one extremely flexible end of the spectrum, with more limited proposals like XML DCD at the other.
It is now time to get to work and try some experiments. Using the technology preview in MSXML, let's see what we can implement. We'll first see how XML Schema can be used to validate a document, then develop the ability to extract metadata from a schema, and finally try to build a dynamic query builder at the proof of concept level.
It is well worth repeating that the metadata capabilities we're going to use are based on notes submitted to the W3C, not published recommendations. We are exploring, hoping to find the scope of future capabilities. It seems fairly certain that we will eventually see a metadata recommendation; it is certain that the syntax and capabilities of components implementing that recommendation will be a dramatic change from what we present here. Nevertheless, these experiments have value. Not only do they show us a path forward into a future in which we can communicate more effectively between clients and services, but the final metadata standards that are implemented in commercial parsers will likely be similar in spirit to what we will see here.
We can control whether the parser performs validation with
the validateOnParse property. If this property is true,
which is the default value, the parser
validates each document as it parses it. Errors, as we know, are reported
through the parseError property.
Using a DTD, we must include a DOCTYPE
declaration naming the DTD for the document. The technology preview, however,
also allows MSXML to validate a document using an XML Schema. An xmlns
attribute declaring a namespace is treated specially: the parser will download
the resource named by the attribute value unless the URI bears the urn prefix
or a DOCTYPE has been declared for the document. So the resource named by this
attribute must be an XML Schema file.
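The rule in the preceding paragraph can be restated as a small decision function. shouldFetchSchema is our own name; it models the behaviour described in the text and is not an MSXML API.

```javascript
// Model of the rule described above; not an MSXML API.
// namespaceUri: value of an xmlns attribute; hasDoctype: whether a DOCTYPE is declared.
function shouldFetchSchema(namespaceUri, hasDoctype) {
  if (hasDoctype) return false;                  // DOCTYPE declared: nothing is downloaded
  if (/^urn:/i.test(namespaceUri)) return false; // urn: names are identifiers only
  return true;                                   // anything else names a schema to download
}

console.log(shouldFetchSchema("urn:schemas-microsoft-com:xml-data", false)); // false
```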
Having a schema encoded as an XML document means we can use
our existing tools and experience to pick the schema apart and discover its
structure and content constraints. This capability will usually be used to guide the
construction of a document written according to an unfamiliar schema. It could
also be used to look for overlapping structures in two schemas. For example, if
we suspect two schemas are talking about the same topic, we might compare their
structures. Two group elements composed of the same number and types of
elements, regardless of name, would strongly suggest similarity of the encoded
content. Similar constraints found in datatype schema elements would also be
clues. Of course, the dt:type attributes would
be of additional assistance to us in comparing two schemas.
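A crude version of that structural comparison is sketched below. It assumes each group has already been extracted from its schema into an array of { name, type } objects, where type stands in for the member's dt:type; treating untyped members as "string" and ignoring member order are our own choices, not rules from any proposal.

```javascript
// Hypothetical structural comparison of two groups, ignoring element names.
// A group is modeled as an array of { name, type } objects; untyped members
// are treated as "string".
function structuralSignature(group) {
  // Sorting makes the comparison independent of member order (our choice).
  return group.map(member => member.type || "string").sort().join(",");
}

function looksSimilar(groupA, groupB) {
  return groupA.length === groupB.length &&
    structuralSignature(groupA) === structuralSignature(groupB);
}

const employee = [{ name: "NAME", type: "string" }, { name: "HIREDATE", type: "date" }];
const staff = [{ name: "STARTED", type: "date" }, { name: "FULLNAME" }];
console.log(looksSimilar(employee, staff)); // true
```

A match is only a clue, as the text says; two groups of two strings will "match" whether or not they describe the same thing.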
We're about to embark on an experiment. Formatting a query for a service as an XML document is a common task. The service extracts search criteria from the query document, performs a database query, and returns the results as an XML document written in the vocabulary specified in the directory. So far, we've assumed that the users of a vocabulary would understand it and also understand the required query vocabulary. Is it possible, though, to parse a query vocabulary schema document and dynamically generate a query document builder? That is precisely what we shall try in our experiment.
Suppose we have a service that generates documents in the EMPLOYEE
vocabulary in response to queries
from clients. It might be the front end to a database of employee information.
The queries are documents written to an XML schema. We'll assume the client
application has obtained the URL for this schema document from the directory.
For our purposes, we'll allow ourselves to directly input the URL in a Web
page. When the user presses a button labeled Create Form, we want the script in
the page to dynamically generate a user interface for composing a query for the
service. The user interface should present us with the structure from the
schema and allow us to input search criteria in the appropriate places. Once
the user has finished entering values, he can click another button: Complete Query.
The button handler script will compose a query according to the schema using
the input parameters from the form and show us the XML in an alert box. This is
our experimental page before anything is generated:
The source code for the experiment is found in the file QueryBuilder.html.
It can be downloaded from our site http://www.wrox.com/
or run from http://webdev.wrox.co.uk/books/2270/.
We need a schema for queries against the Employee service.
We'll follow a couple of loose conventions. Since schemas can contain any
number of top level element definitions, we need some way of specifying the
root element name in our query vocabulary. Let's follow the DTD practice of
naming the schema for the root node. This convention is solely for the purposes
of our experiment. If this were a production application, we could store the
name of the root in the directory or ask the client application to supply it.
We could also adopt a convention of simply appending the word Query
to the name of the response vocabulary.
A more important convention is to use the names of elements in the response
vocabulary to name elements in the query vocabulary. For example, if we wish to
search by the employee's name, we shall have to supply a name against which to
search. The element supplying this should be called NAME
since that is the matching element name in the Employee vocabulary.
For reasons of simplicity that we shall go into later, let's allow batched queries. That is, we will allow the user to provide parameters for searches by name, by hire date, by manager, and by department in a single request document. If all parameters were provided, multiple queries would be generated. Here is the schema we shall use.
<Schema name="EMPLOYEEQUERY"
        xmlns="urn:schemas-microsoft-com:xml-data"
        xmlns:dt="urn:schemas-microsoft-com:datatypes">
    <ElementType name="NAME" content="textOnly"/>
    <ElementType name="BYNAME" content="eltOnly">
        <description>Search by employee name</description>
        <element type="NAME"/>
    </ElementType>
    <ElementType name="HIREDATE" content="textOnly" dt:type="date"/>
    <ElementType name="OPERATOR" content="textOnly" dt:type="enumeration"
                 dt:values="GT GE LT LE EQ">
        <description>Comparison operator for search</description>
    </ElementType>
    <ElementType name="BYHIRE" content="eltOnly">
        <description>Search with respect to hire date specified</description>
        <element type="HIREDATE"/>
        <element type="OPERATOR"/>
    </ElementType>
    <ElementType name="MANAGER" content="textOnly"/>
    <ElementType name="BYMANAGER" content="eltOnly">
        <description>Search by employee's manager's name</description>
        <element type="MANAGER"/>
    </ElementType>
    <ElementType name="DEPARTMENT" content="textOnly"/>
    <ElementType name="BYDEPARTMENT" content="eltOnly">
        <element type="DEPARTMENT"/>
    </ElementType>
    <ElementType name="EMPLOYEEQUERY" content="eltOnly">
        <description>Query for Employee Service</description>
        <group order="many">
            <element type="BYNAME"/>
            <element type="BYHIRE"/>
            <element type="BYMANAGER"/>
            <element type="BYDEPARTMENT"/>
        </group>
    </ElementType>
</Schema>
This slightly formidable schema simply says that a document
rooted with the EMPLOYEEQUERY element can
contain any mix of BYNAME, BYHIRE, BYMANAGER,
and BYDEPARTMENT elements. Each of these provides
the parameters for an appropriate search. For every search type except BYHIRE,
we simply provide the string for which
to search. Searching by hire date requires the additional specification of an
operator to tell us whether the search should look before or after the input
date.
This schema is not what one would normally expect. One would
normally specify an order attribute of one on the group so that the user would be
asked to select which type of search he wished to perform. This would, however,
add a substantial degree of complexity to our query form builder. In a
production system, we would have to compose multiple pages or replace the form
on our page as the user selected a search type. Since this is a proof of
concept, we'll make the simplifying assumption of allowing any or all of the
search types to be specified at once. That way, our query builder can simply
generate a user input form that captures the entire structure. Note, however,
this isn't exactly what many means. While the
form matching this query schema will have one section for BYNAME, one for BYHIRE,
and so forth, a document could have more than one of these types and be valid.
Again, since we are simply exploring the feasibility of this concept, we'll
leave the fine points for later work.
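The gap between the form and the schema is easy to see with a simplified content-model check. The sketch below is our own, not MSXML's validator: it ignores minOccurs and maxOccurs and reads many as "any listed element, any number of times, in any order", as described above.

```javascript
// Simplified check of whether a list of child element names satisfies a
// group's order attribute. minOccurs/maxOccurs are ignored; "many" means any
// listed element, any number of times, in any order.
function satisfiesOrder(order, allowed, children) {
  switch (order) {
    case "one":
      return children.length === 1 && allowed.includes(children[0]);
    case "seq":
      return children.length === allowed.length &&
        children.every((name, i) => name === allowed[i]);
    case "many":
      return children.every(name => allowed.includes(name));
    default:
      throw new Error("Unknown order value: " + order);
  }
}

const queryGroup = ["BYNAME", "BYHIRE", "BYMANAGER", "BYDEPARTMENT"];
// Two BYNAME searches: valid under "many", though our form offers only one.
console.log(satisfiesOrder("many", queryGroup, ["BYNAME", "BYNAME"])); // true
```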
Given that, how shall we map schema element definitions to form elements?
The core of our query builder is the ability to traverse a
schema document so that we start with the root element of the vocabulary and
recursively retrieve the definitions of its contained elements. In this way, we
learn the structure of the vocabulary from the top down. Since the definitions
can be in any order, we should use the selectSingleNode()
method to search the parse tree for the definitions we require. First, let's
get the ElementType element
corresponding to the root element of our query vocabulary — in our example, EMPLOYEEQUERY:
var nameNode = schema.documentElement.attributes.getNamedItem("name");
if (nameNode != null)
{
    queryRootName = nameNode.nodeValue;
    var rootDef = schema.documentElement.selectSingleNode("//ElementType[@name='"
                  + queryRootName + "']");
    if (rootDef != null)
        TraverseSchema(schema, rootDef, query, null, qform);
    else
        // The schema's name attribute does not match any ElementType
        alert("ElementType named " + queryRootName + " not found. " +
              "Requires root element name.");
}
else
    alert("Name not set -- cannot determine root element");
The root element of the schema file is, of course, <Schema>. In the informal convention we
established, the name attribute should
provide the name of the root element in our query vocabulary. If the
attribute is not found, our convention
definitely isn't being observed, so we fail with an error message. If it is
found, the line
queryRootName = nameNode.nodeValue;
will give us the name of the root node, which in this case
will be EMPLOYEEQUERY. Now we use
the parser to search for the ElementType
element whose name attribute matches
the name we just retrieved. Since each element type definition appears exactly
once, we can use selectSingleNode. If the
search turns up empty, the convention isn't being followed — the author of the
schema provided a name that does not match an ElementType
definition. However, if it returns a node, we can begin to descend through the
schema parse tree.
To do this, we use the recursive TraverseSchema()
function:
function TraverseSchema(schemaParser, schemaNode, docParser, docNode, qform)
{
    var currentNode;
    switch (schemaNode.nodeName)
    {
        case "ElementType":
            // Emit form markup, add a node to the shell, then descend
            BuildQuery(schemaParser, docParser, schemaNode, docNode, qform);
            currentNode = BuildDoc(docParser, schemaNode, docNode);
            for (var nk = 0; nk < schemaNode.childNodes.length; nk++)
                TraverseSchema(schemaParser, schemaNode.childNodes(nk), docParser,
                               currentNode, qform);
            break;
        case "group":
            // seq, many, or no order: expand every member; one is not handled
            var orderType = schemaNode.attributes.getNamedItem("order");
            if (orderType == null || orderType.text == "many" ||
                orderType.text == "seq")
            {
                for (var nj = 0; nj < schemaNode.childNodes.length; nj++)
                    TraverseSchema(schemaParser, schemaNode.childNodes.item(nj),
                                   docParser, docNode, qform);
            }
            break;
        case "attribute":
            break;
        case "datatype":
            break;
        case "description":
            BuildQuery(schemaParser, docParser, schemaNode, docNode, qform);
            break;
        case "element":
            // Follow the type reference to its ElementType definition
            var elementDef =
                schemaParser.documentElement.selectSingleNode(
                    "//ElementType[@name='" + schemaNode.attributes.getNamedItem(
                    "type").nodeValue + "']");
            if (elementDef != null)
                TraverseSchema(schemaParser, elementDef, docParser, docNode, qform);
            break;
        case "AttributeType":
            // Neither call generates anything for AttributeType yet, so the
            // nInputItemCount counter is left alone to keep form names in step
            BuildQuery(schemaParser, docParser, schemaNode, null, qform);
            BuildDoc(docParser, schemaNode, docNode);
            break;
    }
}
The switch statement provides the appropriate processing for
each type of schema element we will encounter (recall we're inside the one and
only Schema element). ElementType, group, and element
schema elements are the only elements that provide structural information. The datatype, attribute,
and AttributeType elements may provide attribute
and constraint information, but that is not of interest to us at the moment.
An ElementType element may declare text content, element content, or both, and
its child nodes (element references, groups, and descriptions) carry that
information. For this reason, we call TraverseSchema()
on each child node. The same is true of group.
An <element> tag's type attribute refers to
a corresponding ElementType element, so
we need to do a search for that element to find its definition. The name we
need to search for is specified in the current node's type
attribute and will be found in the target's name
attribute:
var elementDef = schemaParser.documentElement.selectSingleNode("//ElementType[@name='" +
schemaNode.attributes.getNamedItem("type").nodeValue +
"']");
If it is not found, there is a problem and we will be unable
to go deeper into that particular subtree. Normally, however, we should find a
match, in which case we submit it to TraverseSchema()
to continue processing.
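The traversal logic can be exercised without a browser by replacing the MSXML parse tree with plain objects. The sketch below is a simplified counterpart of TraverseSchema(), not the code the page runs: querySchema, ref(), findElementType(), visit(), traverse(), and outline() are hypothetical names, descriptions are omitted from the model, and a recursive schema would loop forever here just as it would in the original.

```javascript
// Plain-object model of the EMPLOYEEQUERY schema, standing in for the MSXML
// parse tree (descriptions omitted). ref() models <element type="..."/>.
const ref = type => ({ kind: "element", type });
const querySchema = {
  name: "EMPLOYEEQUERY",
  elementTypes: [
    { name: "NAME", children: [] },
    { name: "BYNAME", children: [ref("NAME")] },
    { name: "HIREDATE", children: [] },
    { name: "OPERATOR", children: [] },
    { name: "BYHIRE", children: [ref("HIREDATE"), ref("OPERATOR")] },
    { name: "MANAGER", children: [] },
    { name: "BYMANAGER", children: [ref("MANAGER")] },
    { name: "DEPARTMENT", children: [] },
    { name: "BYDEPARTMENT", children: [ref("DEPARTMENT")] },
    { name: "EMPLOYEEQUERY", children: [{ kind: "group", order: "many",
        children: [ref("BYNAME"), ref("BYHIRE"), ref("BYMANAGER"), ref("BYDEPARTMENT")] }] }
  ]
};

// Counterpart of selectSingleNode("//ElementType[@name='...']").
function findElementType(schema, name) {
  return schema.elementTypes.find(def => def.name === name) || null;
}

// Counterpart of the "element" and "group" cases of TraverseSchema().
function visit(schema, node, depth, out) {
  if (node.kind === "element") {
    const def = findElementType(schema, node.type);
    if (def !== null) traverse(schema, def, depth, out); // unresolved refs stop here
  } else if (node.kind === "group" && node.order !== "one") {
    // seq, many, or no order: expand every member; "one" is skipped
    for (const child of node.children) visit(schema, child, depth, out);
  }
}

// Counterpart of the "ElementType" case: record the name, then descend.
function traverse(schema, def, depth, out) {
  out.push("  ".repeat(depth) + def.name);
  for (const child of def.children) visit(schema, child, depth + 1, out);
  return out;
}

// Entry point: resolve the root from the schema's name attribute.
function outline(schema) {
  const root = findElementType(schema, schema.name);
  if (root === null) throw new Error("ElementType named " + schema.name + " not found");
  return traverse(schema, root, 0, []);
}

console.log(outline(querySchema).join("\n"));
```

The indented outline it prints is the top-down structure of the vocabulary, recovered from bottom-up definitions.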
If you run this code in a debugger using our example schema,
you will see that we properly traverse the schema. Of course, we haven't
generated any output. The output we are looking for is an HTML form for
eliciting user input, so let's add some code to TraverseSchema()
to generate that form.
Some items are going to require input from the user.
Specifically, we will need inputs whenever an element can have text or mixed
content, and when an attribute is defined other than the datatype namespace attributes, e.g.,
dt:type. We will make some simplifying
assumptions to keep the example from becoming unwieldy. We will generate an
input element in the page when we encounter text and mixed content elements. If
we detect an enumeration datatype, we will create a single selection listbox
and populate it with the individual items in the dt:values
attribute.
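These rules reduce to a small decision per ElementType. The sketch below, with the hypothetical name inputKindFor, restates them; def holds the content and dt:type values of a definition, and treating a missing content attribute as "mixed" is our assumption about the XDR default (the example schema always supplies it).

```javascript
// Hypothetical decision function restating the simplifying assumptions above.
// def holds an ElementType's content and dt:type values; a missing content
// attribute is treated as "mixed" (an assumption about the XDR default).
function inputKindFor(def) {
  const content = (def.content || "mixed").toLowerCase();
  if (content === "empty" || content === "eltonly") return "none"; // label only
  if (def.dtType === "enumeration") return "select"; // list box built from dt:values
  return "text";                                     // free-form input element
}

console.log(inputKindFor({ content: "textOnly", dtType: "enumeration" })); // select
```

Lower-casing the content value lets "eltOnly" and "eltonly" be handled alike, as the page's own code does by listing both spellings.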
There is another, critical simplifying assumption. Groups
present no problem if the order is seq. If the order is many, we can present a
form with all the grouped elements, i.e., as if the order had been seq.
This will generate a document that obeys
the schema, but we cannot generate all the combinations permitted under the many
order attribute value. Similarly, one presents some difficulty. To resolve these
situations, we would require some input from the user. We could obtain this
with dialog boxes if we implemented the query builder as a Java applet or as an
ActiveX control. For our purposes, it is enough to handle seq and many
the same way and omit support for one.
We are also going to omit support for AttributeType. We could handle this in a
manner similar to how we will handle ElementType,
so we're going to leave this as an exercise for production. The datatype
element will be supported to the
extent of assigning the proper dt:type
attribute to elements when the definition occurs in a datatype
element. The other attributes could be used in a production system to support
field level validation.
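As an illustration of such field-level validation, the sketch below checks a user's input against a few dt:type values before it is written into the query. validateField is a hypothetical name; only "date" (assumed to be the ISO 8601 form YYYY-MM-DD), "int", and "enumeration" are covered, and any other type is accepted as plain text.

```javascript
// Hypothetical field-level validation driven by dt:type. "date" is assumed to
// be ISO 8601 (YYYY-MM-DD); unsupported or missing types accept any text.
function validateField(value, dtType, enumValues) {
  switch (dtType) {
    case "date": {
      const m = /^(\d{4})-(\d{2})-(\d{2})$/.exec(value);
      if (!m) return false;
      const d = new Date(Date.UTC(+m[1], +m[2] - 1, +m[3]));
      // Reject dates such as 1999-02-30 that Date would silently roll over.
      return d.getUTCFullYear() === +m[1] && d.getUTCMonth() === +m[2] - 1 &&
        d.getUTCDate() === +m[3];
    }
    case "int":
      return /^[+-]?\d+$/.test(value);
    case "enumeration":
      return enumValues.split(" ").includes(value);
    default:
      return true;
  }
}

console.log(validateField("1999-02-30", "date")); // false
```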
Now that we've decided what we won't do, let's decide how
we're going to generate the user interface for our supported elements. We've
built the essential navigational features in the preceding section. We will
again traverse the schema in a top-down approach, but we will create the
required user input form elements and the shell of an XML document that follows
the schema as we go. The shell consists of the tags for the finished query
document, but not the text values that the user will provide. The shell
document then becomes our guide when it comes time to retrieve user inputs. We
will traverse the shell document and fill each tag with a value from a
similarly named HTML input element. Here's what the finished page looks like
after generating a page for the EMPLOYEEQUERY
schema:
This isn't the nicest user interface page we've ever seen, but remember that it was generated without manual intervention. The contents of this form derive entirely from the schema. If we provided the URL for a different XML schema, the form would change. Let's see how we got from simply traversing the schema document to generating an HTML form.
This necessitates some changes to the button handler for creating the form:
var nameNode = schema.documentElement.attributes.getNamedItem("name");
if (nameNode != null)
{
    queryRootName = nameNode.nodeValue;
    var rootDef = schema.documentElement.selectSingleNode("//ElementType[@name='" +
                  queryRootName + "']");
    if (rootDef != null)
        TraverseSchema(schema, rootDef, query, null, qform);
    else
        // The schema's name attribute does not match any ElementType
        alert("ElementType named " + queryRootName + " not found. " +
              "Requires root element name.");
}
else
    alert("Name not set -- cannot determine root element");
Notice that we have some new parameters in TraverseSchema(). Along with the
schema parser (the schema variable), we pass a second instance of MSXML, query,
for use in building the shell document, the current shell document node (null
at the moment), and the <DIV>
where we will be generating the user interface, qform.
Now we turn our attention to TraverseSchema().
The actions on ElementType are typical.
In addition to the schema traversal issues, we have issues related to creating
HTML form elements and issues related to building the shell document. We handle
these in the functions BuildQuery() and BuildDoc(), respectively.
case "ElementType":
BuildQuery(schemaParser, docParser, schemaNode, docNode, qform);
currentNode = BuildDoc(docParser, schemaNode, docNode);
for (var nk = 0; nk < schemaNode.childNodes.length; nk++)
TraverseSchema(schemaParser, schemaNode.childNodes(nk), docParser,
currentNode, qform);
break;
BuildQuery() doesn't
change anything in terms of the way we traverse the schema. BuildDoc(),
however, is going to create new
nodes in the shell document, so we will want to keep track of the current node
in the shell document processing. This is handled with the currentNode variable. BuildQuery()
takes both parsers, the current schema node, the current shell document node,
and the <DIV> as
parameters.
function BuildQuery(parserSchema, parserDoc, nodeSchema, nodeDoc, qform)
{
    var str, eltType, oEnums;
    switch (nodeSchema.nodeName)
    {
        case "ElementType":
            var eltName = nodeSchema.attributes.getNamedItem("name").nodeValue;
            qform.insertAdjacentHTML("beforeEnd", eltName);
            var contentType = nodeSchema.attributes.getNamedItem("content").text;
            switch (contentType)
            {
                case "empty":
                case "eltOnly":
                case "eltonly":
                    qform.insertAdjacentHTML("beforeEnd", "<p/>");
                    break;
                case "textOnly":
                case "textonly":
                case "mixed":
                    eltType = nodeSchema.attributes.getNamedItem("dt:type");
                    if (eltType != null && eltType.text == "enumeration")
                    {
                        oEnums = nodeSchema.attributes.getNamedItem("dt:values");
                        if (oEnums != null)
                            PopulateEnumeration(eltName + nInputItemCount, oEnums.text,
                                                qform);
                    }
                    else
                    {
                        str = " <input size='40' name ='" + eltName +
                              nInputItemCount + "'><p/>";
                        qform.insertAdjacentHTML("beforeEnd", str);
                    }
                    nInputItemCount++;
                    break;
            }
            break;
        case "description":
            qform.insertAdjacentHTML("beforeEnd", nodeSchema.text + "<p/>");
            break;
    }
}
For every ElementType, we write out the name of the element as a label for the user:
var eltName = nodeSchema.attributes.getNamedItem("name").nodeValue;
qform.insertAdjacentHTML("beforeEnd", eltName);
var contentType = nodeSchema.attributes.getNamedItem("content").text;
If the element is empty or contains only element content, we simply write out a paragraph HTML element to provide some formatting. Things get interesting when we reach mixed and text content elements, however. We need to provide an input element and concern ourselves with whether this is a free or an enumerated value:
case "textOnly":
case "textonly":
case "mixed":
eltType = nodeSchema.attributes.getNamedItem("dt:type");
if (eltType != null && eltType.text == "enumeration")
{
oEnums = nodeSchema.attributes.getNamedItem("dt:values");
if (oEnums != null)
PopulateEnumeration(eltName + nInputItemCount, oEnums.text,
qform);
}
else
{
str = " <input size='40' name ='" + eltName +
nInputItemCount + "'><p/>";
qform.insertAdjacentHTML("beforeEnd", str);
}
nInputItemCount++;
break;
}
break;
Note the global variable nInputItemCount.
We need to generate unique identifiers for all the form elements we create,
and we'll need to recreate these when we go back and pick up the user's inputs, so
we've come up with the following rule: each form element is identified by the
name of the schema element it represents, with the current value of
nInputItemCount appended.
As long as we increment this variable when traversing the
shell in the same order as when we traversed the schema, we'll be able to
retrieve the values of the form elements we generate. Here's how PopulateEnumeration() works:
function PopulateEnumeration(sName, sVals, qform)
{
    var nStart, nFinish, nValsLength, sOptVal, str;
    // Write out the start of the list box
    str = " <select type='select-one' id='" + sName + "'>";
    if (sVals != null)
    {
        // Each space-delimited value becomes one <OPTION>
        nValsLength = sVals.length;
        nStart = 0;
        nFinish = sVals.indexOf(" ");
        while (nStart < nValsLength && nFinish != -1)
        {
            sOptVal = sVals.substring(nStart, nFinish);
            str += "<OPTION value='" + sOptVal + "'>" + sOptVal + "</OPTION>";
            nStart = nFinish + 1;
            nFinish = sVals.indexOf(" ", nStart);
        }
        // The last value has no trailing space
        if (nStart < nValsLength)
        {
            sOptVal = sVals.substring(nStart, nValsLength);
            str += "<OPTION value='" + sOptVal + "'>" + sOptVal + "</OPTION>";
        }
    }
    str += "</SELECT><p/>";
    qform.insertAdjacentHTML("beforeEnd", str);
}
We always generate a single-selection list box (type="select-one"). Parsing the
enumeration values creates the <OPTION>
elements. We know these are delimited by spaces, so the parsing can be
accomplished with the JavaScript String.indexOf()
and String.substring() methods.
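The option-building loop can be tested on its own once it is separated from the page. buildOptions is a hypothetical, pure-string version of that loop using the same indexOf() and substring() approach; unlike PopulateEnumeration(), it also skips empty values produced by consecutive spaces, which is our own small addition.

```javascript
// Pure-string version of PopulateEnumeration()'s option loop. Consecutive
// spaces are skipped so they do not produce empty options (our addition).
function buildOptions(sVals) {
  let html = "";
  let start = 0;
  while (start < sVals.length) {
    let end = sVals.indexOf(" ", start);
    if (end === -1) end = sVals.length;  // last value has no trailing space
    if (end > start) {
      const v = sVals.substring(start, end);
      html += "<OPTION value='" + v + "'>" + v + "</OPTION>";
    }
    start = end + 1;
  }
  return html;
}

console.log(buildOptions("GT GE"));
```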
That takes care of handling the form building side of ElementType schema nodes.
TraverseSchema() also takes care of building
the shell document:
currentNode = BuildDoc(docParser, schemaNode, docNode);
BuildDoc() looks like
this:
function BuildDoc(docParser, schNode, docNode)
{
    var newNode;
    switch (schNode.nodeName)
    {
        case "ElementType":
            // Create the shell element and attach it to the tree
            var eltName = schNode.attributes.getNamedItem("name").nodeValue;
            newNode = docParser.createElement(eltName);
            if (docNode == null)
                docParser.documentElement = newNode;
            else
                docNode.appendChild(newNode);
            // Text-bearing elements get an empty placeholder for the user's input
            var contentType = schNode.attributes.getNamedItem("content").text;
            if (contentType == "mixed" || contentType == "textonly" ||
                contentType == "textOnly")
            {
                newNode.appendChild(docParser.createTextNode(""));
            }
            // Copy dt:type so the service can process the value as typed data
            var eltType = schNode.attributes.getNamedItem("dt:type");
            if (eltType != null)
            {
                var newAttr = docParser.createAttribute("dt:type");
                newAttr.nodeValue = eltType.nodeValue;
                newNode.attributes.setNamedItem(newAttr);
            }
            break;
        case "AttributeType":
            break;
    }
    return newNode;
}
Whenever we encounter an ElementType
node in the schema, we want to create an element in the shell document that
takes its name from the name attribute of the ElementType element in
the schema. If the current document node is null,
we have an empty document. In that case, the newly created element becomes the
documentElement in the DOM tree. Otherwise,
the new node is appended as a child of the current node.
It is important to insert a placeholder for the user's
inputs while we are generating the shell document. If we waited until later,
the new PCDATA elements would
continually change the number of child nodes which the mixed content nodes
have. Also, if we insert the placeholder in its parent tag now, we can use this
as a guide when it comes time to retrieve the user's inputs. Since both mixed
content and text nodes require PCDATA at some point, we create a text node and
append it to the current node without changing the current node:
newNode.appendChild(docParser.createTextNode(""));
While processing an ElementType
schema element, we may also encounter a dt:type
attribute. We want to add this to our shell document element to facilitate
typed processing by the service that receives the query we are building.
var eltType = schNode.attributes.getNamedItem("dt:type");
if (eltType != null)
{
var newAttr = docParser.createAttribute("dt:type");
newAttr.nodeValue = eltType.nodeValue;
newNode.attributes.setNamedItem(newAttr);
}
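The shell that BuildDoc() produces can be pictured with plain objects instead of MSXML nodes. In the sketch below, makeShellNode() and toXml() are hypothetical names; each shell node is { name, attrs, children, text }, where text is "" for a placeholder and null otherwise, and a missing content attribute is assumed to mean "mixed".

```javascript
// Plain-object stand-in for BuildDoc(): each shell node is
// { name, attrs, children, text }; text is "" for a placeholder, null otherwise.
function makeShellNode(def) {
  // Assumption: a missing content attribute is treated as "mixed".
  const content = (def.content || "mixed").toLowerCase();
  const node = {
    name: def.name,
    attrs: {},
    children: [],
    text: content === "textonly" || content === "mixed" ? "" : null
  };
  if (def.dtType) node.attrs["dt:type"] = def.dtType; // copied for typed processing
  return node;
}

// Serialize a shell node. Values are assumed not to need XML escaping.
function toXml(node) {
  const attrs = Object.entries(node.attrs)
    .map(([k, v]) => " " + k + "='" + v + "'")
    .join("");
  const inner = (node.text === null ? "" : node.text) +
    node.children.map(toXml).join("");
  return "<" + node.name + attrs + ">" + inner + "</" + node.name + ">";
}

// Usage: the BYHIRE branch of the EMPLOYEEQUERY shell.
const byHire = makeShellNode({ name: "BYHIRE", content: "eltOnly" });
byHire.children.push(makeShellNode({ name: "HIREDATE", content: "textOnly", dtType: "date" }));
byHire.children.push(makeShellNode({ name: "OPERATOR", content: "textOnly", dtType: "enumeration" }));
console.log(toXml(byHire));
```

This sketch writes the dt: prefix without declaring it; a namespace-aware consumer would need xmlns:dt="urn:schemas-microsoft-com:datatypes" declared on the root element.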
That's the entire picture for TraverseSchema()
and ElementType schema elements. There is one
other small change to TraverseSchema() from the
simple traversal case, and that is our support for description elements. Our
labels are the element names. We hope that the schema author chose descriptive
names, but we will provide one more bit of assistance to our users. By writing
out the contents of the description element, the user will have a bit of text
to help her figure out what the schema element means:
case "description":
BuildQuery(schemaParser, docParser, schemaNode, docNode, qform);
break;
In BuildQuery() we saw these
lines:
case "description":
qform.insertAdjacentHTML("beforeEnd", nodeSchema.text +
"<p/>");
break;
Simply put, BuildQuery()
retrieves the text the schema author entered and places it on a line of its
own.
After our user has clicked the Create Form button, he has a user
interface for providing element values, and we have a shell query document.
When the user clicks the Complete Query button, we want to complete the shell
document with PCDATA values retrieved from the user interface. Our button
handler, OnComplete(), simply resets the nInputItemCount variable and calls the
document traversal function TraverseForm(). After the traversal, it displays
the finished XML document in an alert box.
function OnComplete()
{
var root = query.documentElement;
nInputItemCount = 0;
// Depends on having built a shell query document via the form builder
if (root != null)
{
// Recurse through the tree populating inputs, then display XML
TraverseForm(root);
alert(query.xml);
}
else
alert("You must have built a form prior to selecting this option.");
}
The traversal function has only two tasks to perform. First, when it encounters a text node in the shell, it must go out to the user interface and retrieve the value of the corresponding HTML form element. Second, regardless of the node type encountered, it must keep the recursion going to complete the document traversal, which it does by calling itself with each child node of the current node.
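function TraverseForm(node)
{
switch (node.nodeType)
{
case PCDATA:
// Placeholder text node: its form input is named after the parent
// element plus the running input counter
var str = node.parentNode.nodeName + nInputItemCount++;
node.nodeValue = document.all(str).value;
break;
}
// Recurse into every child to continue the document traversal
for (var ni = 0; ni < node.childNodes.length; ni++)
TraverseForm(node.childNodes.item(ni));
}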
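The fill pass can be exercised without a browser by replacing document.all with a plain lookup table. In this hypothetical sketch, which is not the chapter's code, nodes are plain objects carrying the same nodeType, nodeName, nodeValue, childNodes, and parentNode fields used above. TEXT_NODE stands in for the PCDATA constant (3 is the DOM's numeric code for text nodes), and formValues maps the generated input names to user entries. The element names LAST and FIRST are invented for illustration. The sketch makes the key assumption visible: the counter only lines up with the input names if text nodes are visited in the same document order the builder used when it named them.

```javascript
const TEXT_NODE = 3; // DOM nodeType code for text nodes

// Hypothetical fill pass: walk the shell tree depth-first and copy each
// user entry into the matching empty text node.
function fillShell(root, formValues) {
  let count = 0; // plays the role of nInputItemCount, reset on every run
  function visit(node) {
    if (node.nodeType === TEXT_NODE) {
      const key = node.parentNode.nodeName + count++;
      if (!(key in formValues)) throw new Error("no input named " + key);
      node.nodeValue = formValues[key];
    }
    for (const child of node.childNodes) visit(child);
  }
  visit(root);
  return root;
}

// Build a tiny shell by hand: EMPLOYEEQUERY containing LAST and FIRST,
// each leaf holding an empty placeholder text node.
function el(name, children) {
  const node = { nodeName: name, nodeType: 1, nodeValue: null, childNodes: [], parentNode: null };
  for (const c of children) { c.parentNode = node; node.childNodes.push(c); }
  return node;
}
function text() {
  return { nodeName: "#text", nodeType: TEXT_NODE, nodeValue: "", childNodes: [], parentNode: null };
}

const shell = el("EMPLOYEEQUERY", [el("LAST", [text()]), el("FIRST", [text()])]);
fillShell(shell, { LAST0: "Smith", FIRST1: "Ann" });
```

Keeping the counter local to fillShell, instead of in a page-level variable, means a second run cannot start from a stale value. OnComplete() gets the same effect by resetting nInputItemCount before each traversal.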
We can verify that this works by running the query builder
against the schema found in employeequery.xml
and providing the form inputs seen in the user interface illustration.

This proof of concept clearly omits some features that are
mandatory for a production system. AttributeType
schema elements must be supported, and all the attributes of datatype
elements need to be handled. The order attribute of group elements needs a
handling mechanism, ideally in two forms: a dialog-based mechanism for
interaction with human users and a programmatic approach for software clients.
Form elements should be laid out more clearly. Finally, it would be useful to
show human users a comparison of the known (e.g., EMPLOYEE) schema for the
desired service and the unknown (e.g., EMPLOYEEQUERY) schema, to help them
determine what to input into the various form elements.
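For the programmatic half of the group-order item, a software client could turn a group's order attribute into a plan for which elements go into the shell query. The sketch below is hypothetical. It assumes the three order values of Microsoft's schema preview (seq, one, many), treats seq as the default when the attribute is absent, and uses a made-up canSupply callback that reports whether the client has data for a given element. Repetition, which many allows, is ignored for simplicity.

```javascript
// Hypothetical programmatic handling of a group's order attribute.
// group: { order: "seq" | "one" | "many" (optional), elements: [names] }
// canSupply(name): true if this client has a value for that element.
function planGroup(group, canSupply) {
  const order = group.order || "seq"; // assumed default when absent
  switch (order) {
    case "seq":
      // Every element must appear, in the declared sequence.
      return group.elements.slice();
    case "one": {
      // Exactly one alternative: take the first the client can supply.
      const pick = group.elements.find(canSupply);
      if (pick === undefined) throw new Error("no alternative can be supplied");
      return [pick];
    }
    case "many":
      // Any subset in any order: include only what the client can supply.
      return group.elements.filter(canSupply);
    default:
      throw new Error("unknown order value: " + order);
  }
}
```

A dialog-based version for human users would ask the user to make the same choice that canSupply makes here automatically.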
Some of these items will be influenced by the specific nature of the client application. Others will stem from the syntax of whichever metadata proposal reaches W3C Recommendation status. What is important at this stage in the evolution of XML metadata is an appreciation of the issues involved in traversing an XML schema and mapping it, without manual intervention, into something useful to a user.
Metadata clearly has great potential for cooperative applications roaming a wide and loosely knit network. Metadata does the following for XML and its use in implementing our principles:
Of course, as has been noted, all the proposals we saw and worked with in this chapter, save namespaces and RDF, are still W3C Notes, not finished recommendations. For this reason, we shouldn't be too quick to adopt metadata techniques into our toolkit. As we finish this chapter, it is appropriate to reflect on what we know, where metadata is going, and what we should do now to prepare our applications for its eventual inclusion. After all, our fifth principle said in part that services should support extension. Metadata not only supports extension; it will itself be an extension of what we can do now! Here are the things we should be doing:
First, you should encourage vocabulary authors in your organization to develop and document formal metadata for their work. This can be in DTD or other schema form; the important thing is to capture this knowledge while it is still fresh. Additionally, the effort will force authors to think through the ramifications of their vocabularies, which will result in better, more descriptive vocabularies. When a metadata recommendation is available, the documentation in hand can be translated into models according to the published recommendation. In the meantime, DTDs or XML syntax schemas can be used during development to validate test documents. Failing to do this may result in unintended changes to the vocabulary that will then have to be supported as legacy features.
Next, consider how you might use metadata in your organization. A closed intranet will have less need for metadata than an application based on an extranet or the public Internet; at most, a query building capability will be needed. Vocabulary discovery is less likely to be required, since you should be able to obtain metadata internally for important service vocabularies. Developers of applications and components that will work in a wider environment will want to consider their metadata discovery needs. A component builder will want to consider searching for useful information in unknown vocabularies. An extranet application might need to learn the structure of specialized vocabularies or search for overlap by examining the structure and content of element and attribute definitions.
Finally, architects of network applications and systems should keep abreast of developments in the W3C Metadata Activity and among XML parser vendors. You will want to know the syntax and capabilities of whichever metadata model shapes the final recommendation (or recommendations: there is no guarantee the W3C won't support several in order to address various needs, and vendors may pick and choose what they will implement). You will also want to be ready with a parser or metadata tool that supports the final recommendation.
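One simple starting point for the overlap search mentioned for extranet applications is to compare element declarations by name and declared type. The sketch below is hypothetical. Each vocabulary is summarized as a Map from element name to its dt:type (or null), and the similarity score is the Jaccard index of the two name sets. Jaccard is a generic measure swapped in here for illustration; none of the metadata proposals define it.

```javascript
// Hypothetical overlap check between two vocabularies, each summarized as
// a Map from element name to its declared dt:type (or null if untyped).
function vocabularyOverlap(known, unknown) {
  const shared = [];
  const typeConflicts = [];
  for (const [name, type] of unknown) {
    if (!known.has(name)) continue;
    shared.push(name);
    const knownType = known.get(name);
    // Flag a conflict only when both sides declare a type and they differ.
    if (type && knownType && type !== knownType) typeConflicts.push(name);
  }
  // Jaccard similarity of the two name sets: |A intersect B| / |A union B|.
  const unionSize = known.size + unknown.size - shared.length;
  const similarity = unionSize === 0 ? 0 : shared.length / unionSize;
  return { shared, typeConflicts, similarity };
}

// Usage with invented vocabularies.
const known = new Map([["LAST", "string"], ["FIRST", "string"], ["SALARY", "int"]]);
const unknown = new Map([["LAST", "string"], ["SALARY", "float"], ["DEPT", null]]);
const report = vocabularyOverlap(known, unknown);
```

Matching on names alone is deliberately crude. A real component would also weigh structure and description text, as the paragraph above suggests.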
The reader is reminded to check http://www.w3.org/TR/
frequently to obtain the latest status of W3C efforts.
In this chapter we completed our examination of XML as a data transmission format by taking up the topic of XML metadata. We defined metadata as "data about data". We assessed the current state of metadata by examining XML Namespaces, a working draft tangentially related to metadata proper. We looked at the future of metadata in the Web development world by reviewing the major submissions before the W3C Metadata Activity. These are:
We then considered some future uses for metadata in the context of our development philosophy. Having done so, we demonstrated a prototype query builder. To do this we had to learn the syntax supported in Microsoft's XML Schema technology preview.
Metadata offers enormous power and flexibility that will help make networked applications more cooperative and robust in the face of unexpected data formats. It is a natural continuation of our use of tagged data — XML — for capturing our service data.