The Resource Description Framework (RDF) is certainly the most ambitious of the metadata efforts from the W3C Metadata Activity; it became a W3C Recommendation on 22 February 1999. RDF is a syntax for describing resources, where a resource is anything that can be designated by a URI. RDF does not specify a vocabulary for describing resources. Rather, it provides the means for vocabulary authors to build up descriptions and facts about some topic of interest. It was influenced by the W3C's experience with PICS, but it attempts to break out of the narrow model of PICS by providing a generalized model for describing resources.
RDF is a model for defining statements about resources. Each resource possesses one or more properties, each of which has a value. The model provides a means of defining classes of resources and properties. These classes are used to build statements that assert facts about resources. RDF also defines a syntax for writing a schema for a vocabulary. A schema is analogous to a DTD, but is much more expressive. The schema uses the model defined for some vocabulary to express the structure of documents written in that vocabulary. The statements in the schema place constraints on the statements that can be made in a document conforming to it.
The basic RDF model is built from three types of objects:
Resources can be almost anything: a document, a collection of documents, a site, even a specific portion of a document. This allows RDF to describe almost anything that can be placed online.
Properties have well-defined meanings. This means that constraints are placed on a property to define the types of resources to which it can be applied, the range and types of values it can take on, and how it relates to other properties. These constraints are a major reason why RDF is so expressive — the constraints give meaning to the properties, and hence to the resources they describe.
Statements are triples consisting of a subject resource, a predicate property, and an object value. Objects can be literal values or resources, making complex statements possible. Consider the natural language statement:
The topic of urn:this-book is designing distributed applications.
The subject resource is urn:this-book. The property is topic, and the object is designing distributed applications.
Strictly speaking, properties are a subtype of resources. This is important from a theoretical perspective, but it is simpler for our introductory purposes to think of them as entirely separate entities. Our common sense view of them as separate items will make it easier to conceptualize the RDF model.
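To make the triple structure concrete, here is a minimal Python sketch (Python is our choice for illustration; it is not part of RDF) that models the example statement as a (subject, predicate, object) tuple. The URI http://example.org/terms#topic is a hypothetical stand-in for the topic property, which the text names but gives no URI for. In keeping with the point that properties are a subtype of resources, the predicate is modeled as a resource.

```python
from typing import NamedTuple, Union


class Resource(NamedTuple):
    """Anything that can be designated by a URI."""
    uri: str


class Statement(NamedTuple):
    """An RDF statement: a (subject, predicate, object) triple."""
    subject: Resource
    predicate: Resource          # properties are themselves resources
    obj: Union[Resource, str]    # the object may be a resource or a literal


# Hypothetical URI for the "topic" property; the text names only "topic".
TOPIC = Resource("http://example.org/terms#topic")

statement = Statement(
    subject=Resource("urn:this-book"),
    predicate=TOPIC,
    obj="designing distributed applications",  # a literal object
)

print(statement.subject.uri, "--", statement.predicate.uri, "->", statement.obj)
```

Because the object slot accepts either a literal or a Resource, one statement's object can be the subject of another, which is how richer descriptions are chained together.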
One property defined in the basic RDF model is type, which gives RDF a way to assign types to resources. Resources and properties use a class typing mechanism, so a given class may be declared a subclass of another class. The RDF Schema namespace has names for the class of all resources (Resource) and for the subClassOf property. By successively defining new classes of resources and properties, a vocabulary builder can develop RDF statements of arbitrary complexity and meaning.
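The effect of type and subClassOf can be sketched as a small inference step: a resource typed with some class is also an instance of every class that class descends from. The Python below assumes a hypothetical dictionary of subClassOf links, using the Customer and RetailCustomer classes from the schema example later in this section; it is an illustration of the idea, not an RDF processor.

```python
from typing import Dict, Set

# subClassOf links: class -> set of direct superclasses (hypothetical data).
# "Resource" stands for the RDF-defined root class.
SUBCLASS_OF: Dict[str, Set[str]] = {
    "Customer": {"Resource"},
    "RetailCustomer": {"Customer"},
}


def superclasses(cls: str) -> Set[str]:
    """Return every class that cls is (transitively) a subclass of."""
    result: Set[str] = set()
    stack = [cls]
    while stack:
        for parent in SUBCLASS_OF.get(stack.pop(), set()):
            if parent not in result:
                result.add(parent)
                stack.append(parent)
    return result


def inferred_types(direct_type: str) -> Set[str]:
    """A resource whose type is T is also an instance of every superclass of T."""
    return {direct_type} | superclasses(direct_type)


print(sorted(inferred_types("RetailCustomer")))
# ['Customer', 'Resource', 'RetailCustomer']
```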
Constraints are a specialized type of property. RDF Schema refines them into the range and the domain of a property. Where typing gives us specialized properties, constraints bound a property, thereby giving it definition and meaning.
RDF also defines a variety of containers and collection classes. As we have seen in the previous chapter, it is often necessary to discuss collections of objects. RDF's container classes are much more sophisticated than ours. They define a variety of ordering and containment models.
An examination of RDF container classes is outside the scope of this book. The
full W3C RDF Recommendation can be found at http://www.w3.org/TR/REC-rdf-syntax/
RDF would be of little more than theoretical value if it did not include a format for transmitting data models. The creators of RDF chose to define an XML vocabulary for this task. This vocabulary defines resources and properties in a typed system similar to object oriented languages like C++ and Java.
The terminology of RDF can be overly theoretical in places. A few words on terminology for those of us who are not set theorists are therefore in order. RDF is a model for talking about things. Those things we can discuss, use, or otherwise refer to in an RDF schema are called resources. Both classes and properties are kinds of resources in the RDF model. Each property has a range (the set of values it can take on) and a domain (the class to which the property applies).
Let's illustrate these concepts with a very simple RDF schema. Suppose we wish to talk about our retail customers. For generality, we'd like to say that retail customers are a specialized type of some customer class. This is done with the following lines:
<rdfs:Class rdf:ID="Customer">
<rdfs:comment>Generic class for describing customers</rdfs:comment>
<rdfs:subClassOf
rdf:resource="http://www.w3.org/TR/WD-rdf-schema#Resource"/>
</rdfs:Class>
<rdfs:Class rdf:ID="RetailCustomer">
<rdfs:comment>Derived class for describing retail customers</rdfs:comment>
<rdfs:subClassOf
rdf:resource="#Customer"/>
</rdfs:Class>
The rdf and rdfs namespaces are part of the RDF proposal
and are declared elsewhere in our schema document. Our class named Customer is a subclass of the RDF-defined
class Resource. RetailCustomer,
in turn, is a subclass of Customer. Now let's give
our customer a way to pay for purchases. RetailCustomer
should have a property whose value names one of a set of credit
cards. That is accomplished with this property definition:
<rdf:Property rdf:ID="paymentType">
<rdfs:range rdf:resource="#CreditCards"/>
<rdfs:domain rdf:resource="#RetailCustomer"/>
</rdf:Property>
Our property is named paymentType.
It takes on a value from the class CreditCards,
which we shall define shortly. The property's domain (the class to which it
can apply) is the class RetailCustomer. We know
that the values for this property will be drawn from a small, fixed set of names for
the major credit card types. First we define a class for these values:
<rdfs:Class rdf:ID="CreditCards"/>
Next we define the members of this class, one per card type:
<CreditCards rdf:ID="MasterCard"/>
<CreditCards rdf:ID="AmericanExpress"/>
<CreditCards rdf:ID="Visa"/>
<CreditCards rdf:ID="OtherCredit"/>
Perhaps we are interested in keeping track of who referred
this customer to us. This should be a property whose value is a resource of the
type Customer, so that an instance of any class derived from Customer
can serve as the value. That way, we could accept
referrals from RetailCustomer instances
or from as-yet undefined WholesaleCustomer
instances without having to enumerate these specific derived classes.
Similarly, if we derive more classes from Customer,
their instances can act as referrers without modifying the range
declaration.
<rdf:Property rdf:ID="referrer">
<rdfs:range rdf:resource="#Customer"/>
<rdfs:domain rdf:resource="#RetailCustomer"/>
</rdf:Property>
Our property is called referrer;
it can be applied to the RetailCustomer class, and
its value must be a resource of the Customer
class. Since we have previously defined that class, no further specification is
necessary. Here's the full text of our simple RDF schema:
<rdf:RDF xmlns:rdf="http://www.w3.org/TR/WD-rdf-syntax#"
xmlns:rdfs="http://www.w3.org/TR/WD-rdf-schema#">
<rdfs:Class rdf:ID="Customer">
<rdfs:comment>Generic class for describing customers</rdfs:comment>
<rdfs:subClassOf rdf:resource=
"http://www.w3.org/TR/WD-rdf-schema#Resource"/>
</rdfs:Class>
<rdfs:Class rdf:ID="RetailCustomer">
<rdfs:comment>Derived class for describing retail customers</rdfs:comment>
<rdfs:subClassOf rdf:resource="#Customer"/>
</rdfs:Class>
<rdf:Property rdf:ID="paymentType">
<rdfs:range rdf:resource="#CreditCards"/>
<rdfs:domain rdf:resource="#RetailCustomer"/>
</rdf:Property>
<rdf:Property rdf:ID="referrer">
<rdfs:range rdf:resource="#Customer"/>
<rdfs:domain rdf:resource="#RetailCustomer"/>
</rdf:Property>
<rdfs:Class rdf:ID="CreditCards"/>
<CreditCards rdf:ID="MasterCard"/>
<CreditCards rdf:ID="AmericanExpress"/>
<CreditCards rdf:ID="Visa"/>
<CreditCards rdf:ID="OtherCredit"/>
</rdf:RDF>
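As a rough sketch of how a program might consume this schema, the Python below parses a copy of it with the standard xml.etree.ElementTree module, collects each property's domain and range, and checks proposed statements against them, treating domain and range as constraints in the way described above. The copy omits the comments, and the local_id helper accepts either rdf:ID or an unqualified ID attribute. The WholesaleCustomer class is the hypothetical one from the referrer discussion; check is our own helper, not part of any RDF toolkit.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/TR/WD-rdf-syntax#"
RDFS = "http://www.w3.org/TR/WD-rdf-schema#"

# The customer schema from this section, without the rdfs:comment elements.
SCHEMA = """<rdf:RDF xmlns:rdf="http://www.w3.org/TR/WD-rdf-syntax#"
         xmlns:rdfs="http://www.w3.org/TR/WD-rdf-schema#">
  <rdfs:Class rdf:ID="Customer">
    <rdfs:subClassOf rdf:resource="http://www.w3.org/TR/WD-rdf-schema#Resource"/>
  </rdfs:Class>
  <rdfs:Class rdf:ID="RetailCustomer">
    <rdfs:subClassOf rdf:resource="#Customer"/>
  </rdfs:Class>
  <rdf:Property rdf:ID="paymentType">
    <rdfs:range rdf:resource="#CreditCards"/>
    <rdfs:domain rdf:resource="#RetailCustomer"/>
  </rdf:Property>
  <rdf:Property rdf:ID="referrer">
    <rdfs:range rdf:resource="#Customer"/>
    <rdfs:domain rdf:resource="#RetailCustomer"/>
  </rdf:Property>
  <rdfs:Class rdf:ID="CreditCards"/>
  <CreditCards rdf:ID="MasterCard"/>
  <CreditCards rdf:ID="AmericanExpress"/>
  <CreditCards rdf:ID="Visa"/>
  <CreditCards rdf:ID="OtherCredit"/>
</rdf:RDF>"""


def local_id(elem):
    """Read rdf:ID, falling back to an unqualified ID attribute."""
    return elem.get(f"{{{RDF}}}ID") or elem.get("ID")


def ref(elem):
    """Turn rdf:resource="#Name" (or a full URI ending in #Name) into Name."""
    return elem.get(f"{{{RDF}}}resource").split("#")[-1]


root = ET.fromstring(SCHEMA)

# class name -> set of direct superclass names
subclass_of = {
    local_id(c): {ref(s) for s in c.findall(f"{{{RDFS}}}subClassOf")}
    for c in root.findall(f"{{{RDFS}}}Class")
}
# property name -> (domain, range)
properties = {
    local_id(p): (ref(p.find(f"{{{RDFS}}}domain")), ref(p.find(f"{{{RDFS}}}range")))
    for p in root.findall(f"{{{RDF}}}Property")
}
# Members of CreditCards are typed nodes with no namespace in this schema.
credit_cards = {local_id(e) for e in root.findall("CreditCards")}


def is_a(cls, target):
    """True if cls equals target or is a (transitive) subclass of it."""
    seen, stack = set(), [cls]
    while stack:
        c = stack.pop()
        if c == target:
            return True
        if c not in seen:
            seen.add(c)
            stack.extend(subclass_of.get(c, ()))
    return False


def check(prop, subject_class, value_class):
    """Subject must fall in the property's domain, value in its range."""
    domain, rng = properties[prop]
    return is_a(subject_class, domain) and is_a(value_class, rng)


print(properties["referrer"])                                  # ('RetailCustomer', 'Customer')
print(check("referrer", "RetailCustomer", "RetailCustomer"))   # True
print(check("referrer", "Customer", "RetailCustomer"))         # False: outside the domain

# A later, hypothetical subclass is accepted without touching the range.
subclass_of["WholesaleCustomer"] = {"Customer"}
print(check("referrer", "RetailCustomer", "WholesaleCustomer"))  # True
```

The last check is the payoff of declaring the range as Customer rather than listing derived classes: adding WholesaleCustomer required no change to the referrer property.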
RDF, quite simply, is far too ambitious for our purposes. Many of its assignments are nothing more than names. A complicated system of mappings between names and resources is needed to discern meaning. More advanced features, e.g., ranges, domains, and container classes, are needed to communicate metadata regarding the topic under discussion. These features, however, are a bit too much for the simple kinds of automated metadata applications we are likely to support in the immediate future. If RDF can be supported, it is a powerful mechanism for communicating intellectual models. Our needs, however, are somewhat simpler.
Indeed, both XML and our development philosophy share the belief that simple features that can be readily implemented are more useful than complex features that can be implemented only with great difficulty. Given some XML vocabulary, we'd like to be able to discover the proper structure for a document that conforms to that vocabulary. This is far simpler to implement. We really need a better way of encoding a DTD. This is what the remaining proposals aim to achieve.
The Meta Content Framework (MCF) is similar to RDF, although
it doesn't seem to have influenced quite so many later efforts as has RDF. Like
RDF (and, indeed, most of the metadata proposals), the MCF uses a directed
graph model of nodes and edges to build conceptual models. Objects are the
nodes and property values are the edges. An XML vocabulary is provided for
encoding MCF models. Subclassing and inheritance are permitted. Like RDF, MCF uses a core
set of property and object types to describe more complicated types, which in turn
describe still more complicated types, until the complete metadata model is described. An interesting
property of MCF is that its authors anticipated using MCF to define
componentized blocks of metadata. These blocks would then be combined through
the XML linking specification to compose complete metadata models. In this way,
MCF blocks found to be useful to particular problems could be reused by other
vocabulary authors working on related problems. The following illustration
shows a simple MCF schema for this book. The book object is derived from the
category (MCF’s term for class) Book, which
in turn derives from Document. The book has
chapters (i.e., the book is the domain of the Chapter
category), which take their values from the category English_Prose.
That category is derived from the category text.
Note that typeof, domain, and range
are properties of their respective objects.
Here’s the XML document that captures the information in the illustration above:
<xml-mcf>
<Category id="Designing_Distributed_Applications">
<name>Designing_Distributed_Applications</name>
<superType unit="Book"/>
<description>The category whose sole member is this book</description>
</Category>
<Category id="Book">
<name>Book</name>
<superType unit="Document"/>
<description>The notion of a bound book</description>
</Category>
<!-- The supertype, Page, is a category from MCF itself. -->
<Category id="Document">
<name>Document</name>
<superType unit="Page"/>
<description>A generalized document</description>
</Category>
<Category id="Chapter">
<name>Chapter</name>
<superType unit="Page"/>
<description>The notion of an organized sequence of pages</description>
<domain unit="Designing_Distributed_Applications"/>
<range unit="English_Prose"/>
</Category>
<Category id="English_Prose">
<name>English_Prose</name>
<superType unit="text"/>
<description>The notion of prose written in English</description>
</Category>
<Category id="text">
<name>text</name>
<superType unit="Page"/>
<description>The notion of some organized natural language</description>
</Category>
</xml-mcf>
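Because each MCF category names its supertype, the inheritance chain in this document can be recovered mechanically. The Python sketch below parses an abridged copy of the document (names and descriptions dropped) with xml.etree.ElementTree and follows superType links upward until it reaches a category not defined in the document, such as MCF's own Page. The lineage function is our own illustration, not part of MCF.

```python
import xml.etree.ElementTree as ET

# The MCF document above, abridged to ids, superTypes, domain, and range.
MCF = """<xml-mcf>
  <Category id="Designing_Distributed_Applications"><superType unit="Book"/></Category>
  <Category id="Book"><superType unit="Document"/></Category>
  <Category id="Document"><superType unit="Page"/></Category>
  <Category id="Chapter"><superType unit="Page"/>
    <domain unit="Designing_Distributed_Applications"/>
    <range unit="English_Prose"/></Category>
  <Category id="English_Prose"><superType unit="text"/></Category>
  <Category id="text"><superType unit="Page"/></Category>
</xml-mcf>"""

root = ET.fromstring(MCF)

# category id -> the single superType it names
super_type = {
    c.get("id"): c.find("superType").get("unit") for c in root.findall("Category")
}


def lineage(category):
    """Follow superType links upward; stop at an undefined category or a cycle."""
    chain = [category]
    while chain[-1] in super_type and super_type[chain[-1]] not in chain:
        chain.append(super_type[chain[-1]])
    return chain


print(" -> ".join(lineage("Designing_Distributed_Applications")))
# Designing_Distributed_Applications -> Book -> Document -> Page
print(" -> ".join(lineage("English_Prose")))
# English_Prose -> text -> Page

chapter = root.find("Category[@id='Chapter']")
print(chapter.find("domain").get("unit"), "->", chapter.find("range").get("unit"))
# Designing_Distributed_Applications -> English_Prose
```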
The W3C MCF Note can be found at http://www.w3.org/TR/NOTE-MCF-XML/.
XML Data is an ambitious proposal for the definition of
schemas. Like RDF, it can express both conceptual and syntactic models. To
clarify, a DTD is an example of a syntactic model – it specifies the allowable
syntax of some vocabulary, whereas a relational database schema is a conceptual
model, as it describes things and the relations between things in the model.
XML Data also uses an XML vocabulary as its own syntax. It can
express all the information of a conventional XML DTD, but it adds strong
typing of elements and attributes. In addition, constraints may be placed on
the value and use of an element. XML Data also supports inheritance of types,
which allows us to conveniently extend existing definitions. Further aiding
authors of schemas is the ability to use a defined element type as a complex
structure. Hence, our RetailCustomer from the
RDF discussion may be used as a basic type in later schemas.
Unlike a DTD, an XML Data schema allows you to declare a
model open. In an open model, the syntactic rules
laid down in the schema do not preclude the inclusion of content not covered in
the schema. This might be useful in cases when we wish to precisely define some
content but are indifferent to other content that might be added to documents.
If the model is declared closed, an XML Data schema specifies content in the
same formal manner as a DTD. In that case, all content must be explicitly
described in the schema to be permitted in a document conforming to the model.
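The difference between open and closed models can be sketched in a few lines. The Python function below is hypothetical and deliberately simplified: it checks only which child element names appear, ignoring order and cardinality, and the CUSTOMER, NAME, PHONE, and NOTE elements are invented for the example. It does not use XML Data's own syntax.

```python
import xml.etree.ElementTree as ET
from typing import List, Set


def undeclared_children(elem: ET.Element, declared: Set[str], open_model: bool) -> List[str]:
    """Report child elements that the schema does not declare.

    Open model: extra content is tolerated, so nothing is reported.
    Closed model (DTD-like): every undeclared child is reported.
    Only element names are checked; order and cardinality are ignored.
    """
    if open_model:
        return []
    return [child.tag for child in elem if child.tag not in declared]


# Hypothetical document and declaration, invented for this example.
customer = ET.fromstring(
    "<CUSTOMER><NAME>Ann</NAME><PHONE>555-0100</PHONE><NOTE>gift</NOTE></CUSTOMER>"
)
declared = {"NAME", "PHONE"}

print(undeclared_children(customer, declared, open_model=True))   # []
print(undeclared_children(customer, declared, open_model=False))  # ['NOTE']
```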
In order to embrace conceptual models such as relational database schemas, XML
Data introduces relations, a concept in which an element acts as a reference to
another. This is like the notion of primary and foreign keys in a database; an
element contained in one item of content establishes a relationship with
another item of content. The element in question is a key or index into the
other content. Aliases are also permitted. This allows us to establish subtle
concepts. An element can have an alias, or correlative in XML Data's terminology,
which establishes the context of a relationship. For example, we might have a STUDIED
element with the correlative STUDENT. This establishes that STUDIED
is an alias for STUDENT, in the context of the student's
relation to the topic she studies.
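The key-and-reference idea behind relations can be illustrated without XML Data's own syntax, which we do not reproduce here. In the hypothetical document below, each STUDENT carries an id (playing the role of a primary key) and an ENROLLMENT refers to a student through a STUDENTREF element (playing the role of a foreign key); the Python sketch indexes the students and follows the reference. All element names and the resolve helper are invented for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical document; element names are illustrative only.
DOC = """<school>
  <STUDENT id="s1"><NAME>Ada</NAME></STUDENT>
  <STUDENT id="s2"><NAME>Grace</NAME></STUDENT>
  <ENROLLMENT><STUDENTREF>s2</STUDENTREF><TOPIC>Compilers</TOPIC></ENROLLMENT>
</school>"""

root = ET.fromstring(DOC)

# Index the "primary key" side: every STUDENT by its id attribute.
students = {s.get("id"): s for s in root.findall("STUDENT")}


def resolve(enrollment):
    """Follow the STUDENTREF "foreign key" to the STUDENT it identifies."""
    return students[enrollment.findtext("STUDENTREF")]


for e in root.findall("ENROLLMENT"):
    print(resolve(e).findtext("NAME"), "studied", e.findtext("TOPIC"))
# Grace studied Compilers
```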
We will not discuss XML Data and the related proposal that follows, XML Document Content Description, in great depth here. A partial implementation is included with the version of MSXML that ships with Internet Explorer 5.0; this implementation, intended as a technology preview, is termed XML Schema. We will discuss it at length and develop some prototype code using it later in this chapter.
For further information on XML Data see the W3C Note on
their Web site at http://www.w3.org/TR/1998/NOTE-XML-data/
The XML Document Content Description (DCD) proposal is an attempt to extract the subset of XML Data's features that permit the encoding of a DTD in XML. It is thus a simplification of XML Data that addresses a pressing need in a valuable way. Its authors modified the syntax of XML Data so that DCD would be more closely aligned with RDF.
DCD also offers a few features that cannot be expressed in an XML 1.0 DTD. The first, and perhaps most important to the exchange of business data using XML, is the ability to specify the data type of elements and attributes. One criticism of XML is that it expresses all values as text, leaving the native data type in question. DCD identifies a host of native types drawn from common programming languages as well as the core tokenized types defined in XML 1.0.
DCD explored two additional features in appendices to the
main submission. The first is the ability to nest element type definitions
within other definitions in order to declare an element type with scope local
to the containing element type definition. The second, of somewhat broader use,
is the inheritance and subclassing mechanism. This borrows a powerful technique
from the world of object oriented programming. Element and attribute type
definitions can be extensions of simpler type definitions. When a type
definition includes the element <Extends Type="some_type_definition"/>,
it inherits all the elements and properties previously defined for the type some_type_definition.
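The effect of an Extends declaration can be sketched as merging a base type's definition into the derived one. The Python below uses a hypothetical in-memory form of type definitions rather than DCD syntax, and the Address and ShippingAddress types are invented for the example; it shows only that a derived type sees its base type's elements followed by its own.

```python
from typing import Dict, List, Optional

# Hypothetical in-memory type definitions: each lists its own child elements
# and optionally names a base type, as <Extends Type="..."/> would.
TYPES: Dict[str, dict] = {
    "Address": {"extends": None, "elements": ["Street", "City"]},
    "ShippingAddress": {"extends": "Address", "elements": ["Instructions"]},
}


def effective_elements(type_name: str) -> List[str]:
    """Return the base type's elements (recursively) followed by this type's own."""
    definition = TYPES[type_name]
    base: Optional[str] = definition["extends"]
    inherited = effective_elements(base) if base else []
    return inherited + definition["elements"]


print(effective_elements("ShippingAddress"))
# ['Street', 'City', 'Instructions']
```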
For further information on DCDs see the W3C Note on their
Web site at http://www.w3.org/TR/NOTE-dcd
Internet Explorer 5.0 supports metadata in several ways. First, it uses the current draft of the namespaces specification. Second, it uses namespaces to provide an approach to typing of elements. This is coupled with Microsoft's extensions to the DOM so that a program can retrieve the value of an element in either text (i.e., as it appears in the document) or native binary data format (e.g., int, float). Finally, it offers a technology preview termed XML Schema. This is based on the XML Data proposal, but only supports the feature subset that is also part of the XML DCD proposal. These features may be used to explore the metadata in XML and suggest ways we could use it in our applications.
The various metadata efforts seen in this chapter cover a spectrum from the highly ambitious to the narrowly focused. Each minimally gives us a way to capture the same metadata about a vocabulary that a DTD expresses. Each goes further, however, adding more expressive techniques for describing data. That is what is interesting to us in terms of the third principle of developing cooperative network applications:
3. Services shall be provided as self-describing data.
The more descriptive our data can be, the better. An automated consumer of service data such as an agent may encounter an unfamiliar vocabulary. Unlike a human consumer, the robot needs a great deal of help in exploring the data. When the thicket of metadata efforts is cleared, service programmers will have a very powerful tool for providing that help. Since these efforts use XML for their own syntax, we have the added benefit of being able to reuse MSXML and other XML parsers with which we may be familiar.
There are many occasions when the textual contents of an element represent a typed value other than text in the domain we are describing. This is most obvious in the case of numeric values. The integer 1234 requires two bytes of storage in its native form on a PC (as a 16-bit integer). In XML's default character encoding, UTF-8, it consumes four bytes, one per digit. Worse, before we can use it in calculations, we must convert the string to numeric form. Beyond the issues of storage and conversion, if we simply use unadorned text, the type of the data remains implicit knowledge. If we use the data type namespace, however, we can make the type explicit. This might be useful if we wanted to examine a document in an unknown format. For example, a graphing component might search a document for collections of numeric types. If found, these could be presented to the user for selection of what data to put in a graph. Use of the data type namespace also allows us to manipulate data in native form. For example, if I have this element
<VELOCITY xmlns:dt="urn:schemas-microsoft-com:datatypes" dt:type="r8">1.5E5</VELOCITY>
I can retrieve it as either the string 1.5E5 or as an eight-byte floating point
numeric value. The DOM extensions to support this consist of two properties of
the Node class:
| Property | Description |
|---|---|
| nodeTypedValue | read-write; typed value of the node |
| dataType | read-write; the type of the node |
Twenty-five types frequently encountered in programming languages are supported. Additionally, the XML 1.0 recommendation defines ten enumerated or tokenized types and these are supported as well. The definitive list of types supported is found at http://www.microsoft.com/workshop/xml/schema/reference/datatypes.asp.
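To show what nodeTypedValue adds over the plain text, here is a Python sketch that emulates it with xml.etree.ElementTree; it is not the MSXML API. It assumes the Microsoft data type namespace urn:schemas-microsoft-com:datatypes and handles only a few dt:type names (i2, i4, r8, boolean, string), with the Python conversion for each chosen by us. It also checks the storage comparison made earlier for the integer 1234.

```python
import struct
import xml.etree.ElementTree as ET

# Microsoft's data type namespace (assumed here, as used with IE 5.0's MSXML).
DT = "urn:schemas-microsoft-com:datatypes"

# Our assumed conversions for a handful of dt:type names (illustrative subset).
CONVERTERS = {
    "i2": int,
    "i4": int,
    "r8": float,
    "boolean": lambda s: s.strip() == "1",  # assumed to be written as 0 or 1
    "string": str,
}


def typed_value(elem: ET.Element):
    """Emulate nodeTypedValue: convert the element's text according to its
    dt:type attribute, falling back to the plain string when the type is
    missing or not in our table."""
    dtype = elem.get(f"{{{DT}}}type")
    return CONVERTERS.get(dtype, str)(elem.text or "")


velocity = ET.fromstring(f'<VELOCITY xmlns:dt="{DT}" dt:type="r8">1.5E5</VELOCITY>')
print(repr(velocity.text))    # '1.5E5' (the text form)
print(typed_value(velocity))  # 150000.0 (the native eight-byte float)

# The storage comparison from the text: a 16-bit integer vs. UTF-8 digits.
print(len(struct.pack("<h", 1234)), len("1234".encode("utf-8")))  # 2 4
```

Falling back to the string for unknown types mirrors the idea that untyped content is simply text; a fuller emulation would cover the complete list of types referenced above.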