Managing XML data: XML catalogs


From "http://www.ibm.com/developerworks/cn/xml/x-mxd3.html"

Managing XML data: XML catalogs add indirection for style sheets, DTDs, and schemas
Elliotte Rusty Harold, Associate Professor, Polytechnic University

June 30, 2005

An old programmer's adage holds that any problem can be solved by adding another level of indirection. The adage applies to XML too. Many problems with loading schemas, DTDs, and style sheets can be solved neatly by a catalog, which introduces a level of indirection between the XML parser and the network loader. An XML catalog lets the consumer of a document substitute its own set of mappings for the actual URLs or public identifiers given in the XML document itself. This can improve both the speed and the security of XML processing.
XML documents contain many URLs, often relative, that locate style sheets, schemas, DTDs, and the like. If the URLs are absolute, they may point to systems hidden behind a firewall. Even when the URLs are accessible, for performance reasons you may prefer a local cache to repeatedly downloading the same DTD from the same remote server halfway around the world.

For example, the IBM developerWorks site uses an XML template that begins like this:

<?xml version="1.0"?>
<?xml-stylesheet type="application/xml+xslt" href="C:\IBM developerWorks\article-author-package\developerworks\xsl\dw-document-html-4.0.xsl"?>
<dw-document xsi:noNamespaceSchemaLocation=
  "C:\IBM developerWorks\article-author-package\developerworks\schema\dw-document-4.0.xsd"
  xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">

Note the references to the style sheet in the C:\IBM developerWorks\article-author-package\developerworks\xsl directory and to the schema in the C:\IBM developerWorks\article-author-package\developerworks\schema directory. These are Microsoft® Windows® path names. I write my articles on a Mac, where these files are stored in different locations. So before I start writing, I first change the URLs to point into my own file system:

<?xml-stylesheet type="application/xml+xslt"
  href="../developerWorks/xsl/dw-document-html-4.0.xsl"?>
<dw-document
  xsi:noNamespaceSchemaLocation=
  "../developerWorks/schema/dw-document-4.0.xsd"
  xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">

After finishing the first draft, I send the article to my editor. Because her machine runs Windows, she has to change the URLs to point to her own copies of the style sheet and schema before she can work on the article. When she returns the edited manuscript with her queries, I change all the URLs back. I then send the revised manuscript back to her, and she forwards it to the developerWorks production group, which changes the URLs a third time. This process is more than a little inefficient.

XML catalogs solve this problem by maintaining a list that maps standard URLs and system identifiers to the specific locations of local copies. Each user can store the common files (such as schemas, DTDs, and style sheets) wherever is convenient, as long as the local catalog file is changed to match those locations. When a parser, style sheet processor, schema validator, or other tool reads the document, it loads the supporting files from the URLs in the catalog rather than from those in the document itself.

Besides simplifying the work of authors and editors, catalogs have several other advantages. For example, suppose you are reading an XHTML document from a remote site (for example, www.w3.org). Such documents usually contain a document type declaration like this:

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
  "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">

If the parser reads the DTD, it must not only load the XML document from a remote Web server but also read the DTD from http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd, which may be even farther away. Network speed and latency become critical. A catalog can instead direct the parser to load a local copy of the same DTD, which is much faster.

URL redirection can also resist certain attacks. For example, someone who supplies an XML document to your system can change the system identifier of the external DTD subset, thereby changing the DTD used for validation. A catalog lets the person who parses the document, rather than the person who edited it, choose the DTD. This redirection does not provide complete protection, however, because a few attacks can use the internal DTD subset as a vector, and catalogs do not affect the internal DTD subset.

Beyond simple caching, a catalog can also substitute a different DTD or schema. For example, you might want to use a variant of the XHTML DTD that defines only the entities and declares no elements or attributes. Such a DTD is faster to parse and apply than the full DTD, even when the full DTD is loaded from the local system. You can also change default attribute values by modifying certain ATTLIST declarations. Whatever the reason for choosing a catalog, the result is the same: the catalog puts the person reading the document, rather than the person who edited it, in charge of the DTD (or schema, or style sheet).
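The substitution described above can be sketched in plain Java without any catalog library. This is a minimal illustration, not the catalog mechanism itself: the class name EntityOnlyDtdDemo, the two-entity DTD, and the textOf helper are all hypothetical, and the mapping is hard-coded in an EntityResolver rather than read from a catalog file.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

final class EntityOnlyDtdDemo {
    // Hypothetical stand-in for the XHTML DTD: entity declarations only,
    // no element or attribute declarations.
    static final String ENTITY_ONLY_DTD =
        "<!ENTITY nbsp \"&#160;\">\n<!ENTITY copy \"&#169;\">\n";

    static final String XHTML_STRICT = "-//W3C//DTD XHTML 1.0 Strict//EN";

    /** Parses xml and returns the root element's text, serving the tiny DTD above
        whenever the parser asks for the XHTML Strict public identifier. */
    static String textOf(String xml) {
        try {
            DocumentBuilder builder =
                DocumentBuilderFactory.newInstance().newDocumentBuilder();
            builder.setEntityResolver((publicId, systemId) ->
                XHTML_STRICT.equals(publicId)
                    ? new InputSource(new StringReader(ENTITY_ONLY_DTD))
                    : null); // null: let the parser open the system identifier itself
            Document doc = builder.parse(new InputSource(new StringReader(xml)));
            return doc.getDocumentElement().getTextContent();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String xml = "<!DOCTYPE html PUBLIC \"" + XHTML_STRICT + "\"\n"
            + "  \"http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd\">\n"
            + "<html>&copy; 2005</html>";
        // The &copy; reference is expanded from the substitute DTD, not from www.w3.org.
        System.out.println(textOf(xml));
    }
}
```

The document still names the W3C URL, but the reader of the document decides what is actually loaded, which is the shift in responsibility the paragraph above describes.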

Catalog syntax

Listing 1 shows a simple catalog. The catalog itself is an XML document. The root element is catalog in the urn:oasis:names:tc:entity:xmlns:xml:catalog namespace. This catalog contains three public elements, each of which maps a particular public identifier to a specific URL. For example, the public identifier -//W3C//DTD XHTML 1.0 Strict//EN is mapped to the URL file:///opt/xml/xhtml/DTD/xhtml1-strict.dtd.

Listing 1. A simple catalog for XHTML
<?xml version='1.0'?>
<catalog xmlns="urn:oasis:names:tc:entity:xmlns:xml:catalog">
  <public publicId="-//W3C//DTD XHTML 1.0 Transitional//EN"
          uri="file:///opt/xml/xhtml/DTD/xhtml1-transitional.dtd"/>
  <public publicId="-//W3C//DTD XHTML 1.0 Strict//EN"
          uri="file:///opt/xml/xhtml/DTD/xhtml1-strict.dtd"/>
  <public publicId="-//W3C//DTD XHTML 1.0 Frameset//EN"
          uri="file:///opt/xml/xhtml/DTD/xhtml1-frameset.dtd"/>
</catalog>

Suppose a parser configured with this catalog reads a document that begins as follows:

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
  "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">

The parser does not need a second network connection to download the DTD from http://www.w3.org. Instead, it loads the DTD from the local file system at the path /opt/xml/xhtml/DTD/xhtml1-strict.dtd.

Of course, a catalog can also redirect to an http URL or a relative URL. For example, it could refer to a copy of the DTD on a local Web server rather than on the remote server, or to a DTD in the same directory as the source document.
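The lookup a catalog-aware parser performs for public identifiers can be approximated with a small table. The sketch below is a hand-rolled illustration under stated assumptions, not a real catalog implementation: the class PublicIdTable and its map and xhtml methods are hypothetical names, the entries are copied from Listing 1, and the whitespace normalization of public identifiers that real resolvers perform is omitted.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import org.xml.sax.EntityResolver;
import org.xml.sax.InputSource;

final class PublicIdTable implements EntityResolver {
    // Public identifier -> replacement URI, in insertion order.
    private final Map<String, String> uriByPublicId = new LinkedHashMap<>();

    /** Adds one mapping, playing the role of a catalog's public element. */
    PublicIdTable map(String publicId, String uri) {
        uriByPublicId.put(publicId, uri);
        return this;
    }

    /** Returns the mapped location, or null so the parser falls back to the system identifier. */
    @Override
    public InputSource resolveEntity(String publicId, String systemId) {
        String local = (publicId == null) ? null : uriByPublicId.get(publicId);
        return (local == null) ? null : new InputSource(local);
    }

    /** The three mappings from Listing 1. */
    static PublicIdTable xhtml() {
        String base = "file:///opt/xml/xhtml/DTD/";
        return new PublicIdTable()
            .map("-//W3C//DTD XHTML 1.0 Transitional//EN", base + "xhtml1-transitional.dtd")
            .map("-//W3C//DTD XHTML 1.0 Strict//EN", base + "xhtml1-strict.dtd")
            .map("-//W3C//DTD XHTML 1.0 Frameset//EN", base + "xhtml1-frameset.dtd");
    }

    public static void main(String[] args) {
        InputSource source = xhtml().resolveEntity(
            "-//W3C//DTD XHTML 1.0 Strict//EN",
            "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd");
        System.out.println(source.getSystemId());
    }
}
```

An instance can be passed to XMLReader.setEntityResolver or DocumentBuilder.setEntityResolver. Returning null for unknown identifiers leaves the parser's normal behavior in place.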

A catalog can also remap system identifiers by using system elements with a systemId attribute instead of public elements with a publicId attribute. This kind of remapping is useful for DTDs and entity definitions that are referenced only by a system identifier, not by a public identifier. Listing 2 shows how to load the local copies of the XHTML DTDs based on their URLs at the W3C site rather than on their public identifiers. (Listing 2 is really just an illustration; public identifiers are usually more reliable.)

Listing 2. An XHTML catalog based on system identifiers
<?xml version='1.0'?>
<catalog xmlns="urn:oasis:names:tc:entity:xmlns:xml:catalog">
  <system systemId="http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd"
          uri="file:///opt/xml/xhtml/DTD/xhtml1-strict.dtd"/>
  <system systemId="http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd"
          uri="file:///opt/xml/xhtml/DTD/xhtml1-transitional.dtd"/>
  <system systemId="http://www.w3.org/TR/xhtml1/DTD/xhtml1-frameset.dtd"
          uri="file:///opt/xml/xhtml/DTD/xhtml1-frameset.dtd"/>
</catalog>

Style sheets and other documents that are not referenced through system or public identifiers can be remapped with the uri element. The element's name attribute specifies the URI to map from, and its uri attribute specifies the URI to map to. Listing 3 shows how to redirect requests for http://schemas.xmlsoap.org/wsdl/soap/ to http://localhost:8888/schemas/soap.xsd.

Listing 3. Loading the SOAP schema from a local Web server
<?xml version='1.0'?>
<catalog xmlns="urn:oasis:names:tc:entity:xmlns:xml:catalog">
  <uri name="http://schemas.xmlsoap.org/wsdl/soap/"
       uri="http://localhost:8888/schemas/soap.xsd"/>
</catalog>

Catalogs are also useful for rewriting entire URL trees. The rewriteSystem and rewriteURI elements specify an alternative location for all the files on a particular server or in a particular directory. Listing 4 shows how to redirect requests for files in the http://www.example.com/data/ directory to http://www.example.net/mirror/.

Listing 4. Rewriting URIs
<?xml version='1.0'?>
<catalog xmlns="urn:oasis:names:tc:entity:xmlns:xml:catalog">
  <rewriteURI uriStartString="http://www.example.com/data/"
              rewritePrefix="http://www.example.net/mirror/"/>
</catalog>

For example, if a parser using this catalog requests the file http://www.example.com/data/tic/article.xsl, it actually gets the document http://www.example.net/mirror/tic/article.xsl. Rewriting is prefix-based, and limited to prefixes. You cannot, for example, use rewriteURI to redirect all requests for .html files to requests for .xhtml files.
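Prefix rewriting amounts to a simple string operation, sketched below in Java. The class PrefixRewriter and its rewrite method are hypothetical names chosen for illustration, and the sketch handles a single rewrite entry only; a real catalog processor also has to choose among several entries and normalize URIs before comparing them.

```java
final class PrefixRewriter {
    /**
     * Applies one rewriteURI-style entry: if uri begins with uriStartString,
     * that prefix is replaced by rewritePrefix; otherwise uri is returned unchanged.
     */
    static String rewrite(String uri, String uriStartString, String rewritePrefix) {
        if (!uri.startsWith(uriStartString)) {
            return uri; // no match: this entry does not apply
        }
        return rewritePrefix + uri.substring(uriStartString.length());
    }

    public static void main(String[] args) {
        // The entry from Listing 4 applied to the request discussed in the text.
        System.out.println(rewrite(
            "http://www.example.com/data/tic/article.xsl",
            "http://www.example.com/data/",
            "http://www.example.net/mirror/"));
    }
}
```

Because only the leading part of the URI is inspected, nothing in this operation can change a file extension at the end of the URI, which is why the .html to .xhtml redirection is out of reach.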

System identifiers and URIs

Having both uri and system elements, or both rewriteURI and rewriteSystem elements, may seem a bit odd. After all, every system identifier is a URI, and no construct ever contains both a URI and a system identifier. The system and rewriteSystem elements apply only to what the XML 1.0 specification defines as system identifiers, mainly the URIs used in document type declarations and external entity definitions. The uri and rewriteURI elements are used for everything else.

Although I used a separate catalog file for each kind of element, you can put all of them in a single catalog. If the same identifier has multiple mappings, the one that appears first takes priority. If a resource has multiple identifiers (for example, a DTD with both a public identifier and a system identifier), the behavior is system dependent, although a prefer="system" or prefer="public" attribute on elements in the catalog can indicate which one should be chosen.

Catalogs also have several more advanced features for more complex redirection, including:

  • The xml:base attribute, for resolving relative URLs
  • The delegatePublic and delegateSystem elements, for loading additional catalogs for particular kinds of public and system identifiers
  • The nextCatalog element, for chaining multiple catalogs together
  • The group element, for combining multiple entries
  • The <?oasis-xml-catalog?> processing instruction in a document's prolog, for specifying a document-specific catalog

However, public, system, rewriteSystem, uri, and rewriteURI are sufficient for the most common cases.

Catalog software

A great deal of XML software has built-in XML catalog support. For example, the Gnome Project's libxml C library automatically loads the catalog in /etc/xml/catalog. You can change where it searches for catalogs by specifying a new location in the $XML_CATALOG_FILES environment variable. If you do not want to load any catalog, set $XML_CATALOG_FILES to an empty string.

If your program is written in the Java™ language and uses a SAX parser to read XML, you can install Norm Walsh's catalog resolver (now part of the Apache XML Commons project) as the EntityResolver. Similarly, it can be used as a TrAX URIResolver to resolve the URLs in an XSLT style sheet's xsl:import and xsl:include elements as well as in the document() function. For example, the following code configures a SAX parser to use a catalog:

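import org.xml.sax.EntityResolver;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.XMLReaderFactory;
EntityResolver resolver = new org.apache.xml.resolver.tools.CatalogResolver();
XMLReader reader = XMLReaderFactory.createXMLReader();
reader.setEntityResolver(resolver);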

CatalogResolver objects consult the xml.catalog.files Java system property to find catalogs. The property contains a semicolon-separated list of catalog file URIs.

The Apache Forrest documentation framework and the Apache Cocoon Web publishing framework both use the XML Commons CatalogResolver class and catalog files to resolve the links in the documents they serve.

Other major tools, libraries, and environments have similar options. Consult the relevant documentation to find out how to load catalog files. Although the details of activating catalog support vary among tools and libraries, the catalog format itself is consistent.
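As one example of such an option, Java 9 and later ship a catalog API of their own in the javax.xml.catalog package. It is newer than this article and is not one of the tools discussed above, so the sketch below is offered only as an illustration. The class name JdkCatalogDemo and the temporary catalog file are assumptions; the catalog entries are taken from Listings 1 and 3.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import javax.xml.catalog.Catalog;
import javax.xml.catalog.CatalogFeatures;
import javax.xml.catalog.CatalogManager;

final class JdkCatalogDemo {
    // One public entry from Listing 1 and the uri entry from Listing 3.
    static final String CATALOG_XML =
        "<?xml version='1.0'?>\n"
        + "<catalog xmlns=\"urn:oasis:names:tc:entity:xmlns:xml:catalog\">\n"
        + "  <public publicId=\"-//W3C//DTD XHTML 1.0 Strict//EN\"\n"
        + "          uri=\"file:///opt/xml/xhtml/DTD/xhtml1-strict.dtd\"/>\n"
        + "  <uri name=\"http://schemas.xmlsoap.org/wsdl/soap/\"\n"
        + "       uri=\"http://localhost:8888/schemas/soap.xsd\"/>\n"
        + "</catalog>\n";

    /** Writes the catalog to a temporary file and loads it with the JDK catalog API. */
    static Catalog load() {
        try {
            Path file = Files.createTempFile("catalog", ".xml");
            file.toFile().deleteOnExit();
            Files.write(file, CATALOG_XML.getBytes(StandardCharsets.UTF_8));
            return CatalogManager.catalog(CatalogFeatures.defaults(), file.toUri());
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        Catalog catalog = load();
        // Each call returns the replacement location, or null if no entry matches.
        System.out.println(catalog.matchPublic("-//W3C//DTD XHTML 1.0 Strict//EN"));
        System.out.println(catalog.matchURI("http://schemas.xmlsoap.org/wsdl/soap/"));
    }
}
```

The same package also offers CatalogManager.catalogResolver, which returns an object usable as a SAX EntityResolver or a TrAX URIResolver, much like the CatalogResolver class shown earlier.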

Conclusion

The world will never standardize on a single file system layout. Moving XML documents between systems breaks their links to style sheets, schemas, DTDs, and other meta content. XML catalogs provide a useful level of indirection that maintains link integrity even when files are not in the locations the document expects. Catalogs are especially valuable when you want to keep XML documents and their supporting files synchronized across heterogeneous systems that are not simply mirror copies of one another. By loading locally cached copies rather than remote network resources, catalogs also speed up XML processing. Finally, by preventing DTD substitution and by keeping the XML parser from reaching out through the firewall, catalogs can improve security. Because many of the tools you use probably have built-in catalog support, catalogs are an easy solution to many difficult problems.
