Pither.com / Simon
Development, systems administration, parenting and business

Custom (private) DTD in HTTPBuilder XML response

The Groovy HTTPBuilder library is extremely useful for interacting with all sorts of things, including remote web services that send back XML responses. In these cases HTTPBuilder will automatically use Groovy's XmlSlurper to parse the response and give you back a GPathResult to make navigating the XML super easy.

However, today I've been trying to parse XML that's returned (from a site I have no control over) with only 'SYSTEM "awkward.dtd"' to identify the DTD. I do have a copy of the DTD and I'm quite happy for it to be used in the parsing, but with only a bare filename specified it turned out to be fairly tricky to control where the parser looked for the file.

Generally speaking, you can pass a custom EntityResolver to XmlSlurper to help it find DTDs. However, in this case I don't have direct access to the XmlSlurper instance, because HTTPBuilder creates it for me.
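For comparison, this is roughly what the direct approach looks like when you do own the XmlSlurper. It is only a sketch under a few assumptions: the DTD content and the sample document are made up for illustration, the import assumes Groovy 3 or later (older releases ship the class as groovy.util.XmlSlurper, which needs no import), and the three-argument constructor is there because newer Groovy releases reject DOCTYPE declarations unless you ask for them.

```groovy
import groovy.xml.XmlSlurper   // groovy.util.XmlSlurper on Groovy 2.x and earlier
import org.xml.sax.EntityResolver
import org.xml.sax.InputSource

// Stand-in for the local copy of the DTD (hypothetical content).
def dtdText = '<!ELEMENT root (#PCDATA)>'

// Records every system ID the parser asks about, for inspection.
def seen = []

def resolver = { String publicId, String systemId ->
    seen << systemId
    // The parser may hand over the system ID already expanded to an
    // absolute URI, so match on the trailing file name only.
    if (systemId?.endsWith('awkward.dtd')) {
        return new InputSource(new StringReader(dtdText))
    }
    return null // let the parser fall back to its default lookup
} as EntityResolver

// Arguments: validating = false, namespaceAware = true, allowDocTypeDeclaration = true
def slurper = new XmlSlurper(false, true, true)
slurper.entityResolver = resolver

// Made-up response body that references the DTD by bare filename.
def xml = '''<?xml version="1.0"?>
<!DOCTYPE root SYSTEM "awkward.dtd">
<root>hello</root>'''

def result = slurper.parseText(xml)
println result.text()
println seen
```

Printing the recorded system IDs is a cheap way to see exactly what string the parser is trying to resolve before you write any matching logic.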

HTTPBuilder (the class) uses a ParserRegistry to provide the default XML parsing method. Looking at the source code, it seems that HTTPBuilder already provides a custom, catalog-based EntityResolver to XmlSlurper. ParserRegistry also provides a convenient [addCatalog](http://groovy.codehaus.org/modules/http-builder/apidocs/groovyx/net/http/ParserRegistry.html#addCatalog(java.net.URL)) method to allow extra catalog definitions, which are intended to define (or at least help resolve) the location of DTDs. The tricky bit was working out what to put in a new catalog!

Firstly I wanted to get some debug information out, so that I could see more clearly how my DTD was currently being searched for. The CatalogResolver and related classes will load a few properties from a CatalogManager.properties file on the classpath. To switch on most debug messages, I only needed one property:

verbosity=5

I'm doing all of this in a Grails application, so I created this file in src/java to get it added to the root of my running application classpath.

This revealed just one method call before the exception...

resolveSystem(file:///home/simon/path/to/webapp/awkward.dtd)
2010-11-16 15:45:53,355 [http-8080-1] ERROR errors.GrailsExceptionResolver  - /home/simon/path/to/webapp/awkward.dtd (No such file or directory)
java.io.FileNotFoundException: /home/simon/path/to/webapp/awkward.dtd (No such file or directory)

So it's already resolved the filename to an absolute path (I could just put the DTD file there, but that wouldn't help for the production environment) and is searching for it as a System ID.

This meant my extra catalog definition would have to manipulate the system ID to get the DTD found. I failed miserably to find any (readable and current) documentation on how to write these catalog files, so I ended up reading the source of the SAX parser to figure out what I needed...

<?xml version="1.0"?>
<!DOCTYPE catalog PUBLIC "-//OASIS//DTD Entity Resolution XML Catalog V1.0//EN"
  "http://www.oasis-open.org/committees/entity/release/1.0/catalog.dtd">

<catalog xmlns="urn:oasis:names:tc:entity:xmlns:xml:catalog" prefer="system" xml:base=".">

  <!-- http://xml.apache.org/commons/components/resolver/resolver-article.html -->

  <systemSuffix systemIdSuffix="awkward.dtd" uri="awkward.dtd"/>

</catalog>
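To make the effect of the systemSuffix entry concrete, here is a toy model of the matching in Groovy. It is not the resolver's real code: resolveBySuffix is a hypothetical helper, the /srv/production path is invented, and the "longest matching suffix wins" rule is my reading of the OASIS XML Catalogs specification. The real resolver also goes on to resolve the returned uri against the catalog's base, which is not modelled here.

```groovy
// Toy model of <systemSuffix> matching; NOT the resolver's real code.
// Each entry maps a systemIdSuffix to a uri, mirroring the catalog above.
def entries = [
    'awkward.dtd': 'awkward.dtd',
]

// Hypothetical helper: return the uri of the longest matching suffix, or null.
def resolveBySuffix = { String systemId ->
    def match = entries.keySet()
        .findAll { systemId.endsWith(it) }
        .max { it.length() }
    match ? entries[match] : null
}

// The development path from the debug output and a made-up production path
// both end in the same file name, so both hit the same catalog entry.
println resolveBySuffix('file:///home/simon/path/to/webapp/awkward.dtd')
println resolveBySuffix('file:///srv/production/awkward.dtd')
println resolveBySuffix('file:///srv/production/other.dtd')
```

The point is that the absolute prefix the parser adds no longer matters: whatever directory the application runs from, the suffix still matches.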

The final step was to get this catalog file loaded (again from the classpath via placement in src/java). The catalog resolver used in HTTPBuilder is static, so I only needed to load it once in BootStrap.groovy...

import groovyx.net.http.ParserRegistry

class BootStrap {
  def init = { servletContext ->
    // Load the extra catalog from the root of the classpath (src/java/my-catalog.xml)
    ParserRegistry.addCatalog(BootStrap.getResource('/my-catalog.xml'))
  }
}