In this article I'm going to show a common use case for XML Catalogs. Using them is recommended not only to avoid certain issues; it can also drastically improve performance. I'll start by explaining an issue that I faced recently and conclude with the resolution.
Issue
To start with, I got the following exception:
java.io.IOException: Server returned HTTP response code: 429 for URL: http://www.w3.org/Graphics/SVG/1.1/DTD/svg11.dtd
The HTTP code 429 stands for "Too Many Requests", which can appear when:
The user has sent too many requests in a given amount of time. Intended for use with rate limiting schemes.
Just to provide some context, I have an Apache Cocoon based application that does a lot of XSLT processing with Saxon. It appears that every time Saxon reads an XML document with a DTD reference, it tries to fetch the DTD source for validation. Obviously, if the processing rate is high enough and there is no caching, you can create a lot of excessive network traffic and hit the rate limit. The same issue has been kindly explained by W3C.
Solution
An XML Catalog maps resource addresses to local copies of the same resources. Thus, XML Catalogs can bring big benefits when there are many external references in your XML documents. Finally, let's look at an example catalog that resolved the above issue by using local SVG DTD files:
PUBLIC "-//W3C//DTD SVG 1.1//EN" "svg11.dtd"
So it looks pretty simple: the catalog maps the SVG formal public identifier to the local copy of the main DTD file. Both this file, named catalog, and all the required SVG DTD files are located under META-INF/cocoon/entities/catalog as a standard location for Cocoon. Now, as you can read in How to use a catalog file and the Cocoon catalog documentation, we need to create a CatalogManager.properties file that must be placed on the Java classpath:
catalogs=META-INF/cocoon/entities/catalog
relative-catalogs=false
static-catalog=yes
verbosity=1
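For readers who want to see the lookup itself, here is a minimal sketch using the JDK's built-in javax.xml.catalog API (Java 9 and later). This is not the resolver that Cocoon uses, and that JDK API reads OASIS XML catalogs rather than the plain-text format shown above, so the sketch writes an assumed XML equivalent of the PUBLIC entry into a temporary directory. The class name SvgCatalogDemo, the file name catalog.xml and the helper methods are hypothetical and only serve the illustration.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import javax.xml.catalog.CatalogFeatures;
import javax.xml.catalog.CatalogManager;
import javax.xml.catalog.CatalogResolver;
import org.xml.sax.InputSource;

class SvgCatalogDemo {

    // Assumed OASIS XML equivalent of the plain-text PUBLIC entry.
    static final String CATALOG_XML =
        "<?xml version=\"1.0\"?>\n"
        + "<catalog xmlns=\"urn:oasis:names:tc:entity:xmlns:xml:catalog\">\n"
        + "  <public publicId=\"-//W3C//DTD SVG 1.1//EN\" uri=\"svg11.dtd\"/>\n"
        + "</catalog>\n";

    /** Writes the catalog into the given directory and returns a resolver backed by it. */
    static CatalogResolver resolverFor(Path dir) throws IOException {
        Path catalog = dir.resolve("catalog.xml");
        Files.write(catalog, CATALOG_XML.getBytes(StandardCharsets.UTF_8));
        return CatalogManager.catalogResolver(CatalogFeatures.defaults(), catalog.toUri());
    }

    /** Returns the location that the catalog maps the SVG 1.1 DTD to. */
    static String resolveSvgDtd(CatalogResolver resolver) {
        InputSource source = resolver.resolveEntity(
            "-//W3C//DTD SVG 1.1//EN",
            "http://www.w3.org/Graphics/SVG/1.1/DTD/svg11.dtd");
        return source.getSystemId();
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("svg-catalog");
        // The relative uri "svg11.dtd" is resolved against the catalog file's location.
        System.out.println(resolveSvgDtd(resolverFor(dir)));
    }
}
```

Running main should print a file: URI ending in svg11.dtd instead of the w3.org address. The sketch only asks the resolver for the mapping and never opens the DTD, so no svg11.dtd file needs to exist in the temporary directory.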
To conclude, XML Catalogs strike me as a not very well-known mechanism that deserves to be used as a good practice. Besides avoiding the rate limit issue, they improved performance several times over in certain cases. This can happen if the application is hidden behind a slow proxy and the DTD is fetched dozens of times in a pipeline.