
Using XML Catalogs in Cocoon

In this article I'll show a common use case for XML Catalogs. Using them is recommended not only to avoid certain issues but also because it can drastically improve performance. I'll start by explaining an issue I faced recently and conclude with the resolution.

Issue
To start with, I got the following exception:
java.io.IOException: Server returned HTTP response code: 429 for URL: http://www.w3.org/Graphics/SVG/1.1/DTD/svg11.dtd
HTTP status code 429 stands for "Too Many Requests", which is described as follows:
The user has sent too many requests in a given amount of time. Intended for use with rate limiting schemes.
Just to provide some context, I have an Apache Cocoon based application that does a lot of XSLT processing with Saxon. It appears that every time Saxon reads an XML document with a DTD reference, it tries to fetch the DTD from the referenced URL. Obviously, if the processing rate is high enough and there is no caching, this creates a lot of unnecessary network traffic and eventually hits the rate limit. The same issue has been kindly explained by W3C.
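To see the behaviour in isolation, here is a minimal sketch that uses the JDK's built-in DOM parser rather than Saxon or Cocoon. The class name DtdRequestProbe and the method requestedSystemIds are hypothetical names chosen for this illustration. The entity resolver records each external identifier the parser asks for and hands back an empty stand-in, so nothing is fetched over the network. With the JDK's default parser, the DTD is requested even though validation is switched off.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.InputSource;

// Hypothetical helper for illustration; not part of Cocoon or Saxon.
class DtdRequestProbe {

    /** Parses the document and returns every external system ID the parser asked for. */
    static List<String> requestedSystemIds(String xml) throws Exception {
        List<String> requested = new ArrayList<>();
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setValidating(false); // no validation requested
        DocumentBuilder builder = factory.newDocumentBuilder();
        builder.setEntityResolver((publicId, systemId) -> {
            requested.add(systemId); // record what would have been downloaded
            return new InputSource(new StringReader("")); // empty stand-in, no network access
        });
        builder.parse(new InputSource(new StringReader(xml)));
        return requested;
    }

    public static void main(String[] args) throws Exception {
        String svg = "<!DOCTYPE svg PUBLIC \"-//W3C//DTD SVG 1.1//EN\" "
                + "\"http://www.w3.org/Graphics/SVG/1.1/DTD/svg11.dtd\">"
                + "<svg xmlns=\"http://www.w3.org/2000/svg\"/>";
        // Each parsed document triggers its own request for the DTD.
        System.out.println(requestedSystemIds(svg));
    }
}
```

Without a resolver or a catalog, each recorded identifier would become a real HTTP request to www.w3.org, once per parsed document.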

Solution
An XML Catalog maps resource identifiers to local copies of the same resources. Thus, XML Catalogs can bring big benefits when your XML documents contain many external references. Finally, let's look at an example catalog that resolved the above issue by using local SVG DTD files:
PUBLIC "-//W3C//DTD SVG 1.1//EN" "svg11.dtd"
So it looks pretty simple: the entry maps the SVG formal public identifier to the local copy of the main DTD file. Both this file, named catalog, and all the required SVG DTD files are located under META-INF/cocoon/entities/catalog, the standard location for Cocoon. Now, as described in "How to use a catalog file" and the Cocoon catalog documentation, we need to create a CatalogManager.properties file and place it on the Java classpath:
catalogs=META-INF/cocoon/entities/catalog
relative-catalogs=false
static-catalog=yes
verbosity=1
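Outside Cocoon, the same public-identifier mapping can be tried with the javax.xml.catalog API that ships with Java 9 and later. This is a swapped-in technique, not the Cocoon setup above: that API reads the OASIS XML catalog format instead of the plain-text catalog shown earlier. The sketch below therefore writes an equivalent one-entry XML catalog. The class name LocalSvgCatalog, the file name catalog.xml and the temporary directory are assumptions made for the example.

```java
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import javax.xml.catalog.CatalogFeatures;
import javax.xml.catalog.CatalogManager;
import javax.xml.catalog.CatalogResolver;
import org.xml.sax.InputSource;

// Hypothetical helper for illustration; requires Java 9+.
class LocalSvgCatalog {

    static final String SVG_PUBLIC_ID = "-//W3C//DTD SVG 1.1//EN";
    static final String SVG_SYSTEM_ID = "http://www.w3.org/Graphics/SVG/1.1/DTD/svg11.dtd";

    /** Writes a one-entry XML catalog into dir and returns a resolver backed by it. */
    static CatalogResolver resolverFor(Path dir) throws Exception {
        String catalogXml =
              "<?xml version=\"1.0\"?>\n"
            + "<catalog xmlns=\"urn:oasis:names:tc:entity:xmlns:xml:catalog\">\n"
            + "  <public publicId=\"" + SVG_PUBLIC_ID + "\" uri=\"svg11.dtd\"/>\n"
            + "</catalog>\n";
        Path catalogFile = dir.resolve("catalog.xml"); // assumed file name
        Files.write(catalogFile, catalogXml.getBytes(StandardCharsets.UTF_8));
        return CatalogManager.catalogResolver(CatalogFeatures.defaults(), catalogFile.toUri());
    }

    /** Returns the local URI that the catalog maps the SVG 1.1 DTD to. */
    static String resolveSvgDtd(Path dir) throws Exception {
        InputSource source = resolverFor(dir).resolveEntity(SVG_PUBLIC_ID, SVG_SYSTEM_ID);
        return source.getSystemId();
    }

    public static void main(String[] args) throws Exception {
        Path dir = Files.createTempDirectory("svg-catalog");
        // Prints a file: URI next to the catalog instead of the www.w3.org URL.
        System.out.println(resolveSvgDtd(dir));
    }
}
```

The relative uri value is resolved against the location of the catalog file, which is why the DTD copies sit next to the catalog. A resolver obtained this way can be passed to DocumentBuilder.setEntityResolver, since CatalogResolver implements EntityResolver.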
To conclude, XML Catalogs strike me as a not very well-known mechanism that should be used as a matter of good practice. Besides avoiding the rate limit issue, the catalog made processing several times faster in certain cases. This can happen when the application sits behind a slow proxy and the DTD is fetched dozens of times in a pipeline.
