Schema-Aware Parsing and Saxon

If you use XSD schemas for your DITA documents, you will very likely want to process them outside of the Toolkit or an IDE like OxygenXML. Doing this requires a little one-time setup.

At the time of writing, the Saxon XSLT engine is packaged in three versions: Home Edition, Professional Edition, and Enterprise Edition. Of these three packages, only Enterprise Edition provides schema-aware XSLT processing directly.

However, because Saxon can use any JAXP parser, you can give it a schema-aware parser by setting a few features on the Apache Xerces parser.

You do this by creating a simple Java class that wraps the Xerces parser and sets its configuration, and then telling Saxon to use that class as its parser, which you can do on the command line.

The Open Toolkit does this configuration for you automatically, but if you want to run Saxon outside of the Toolkit you may need to set it up yourself.

The Java class looks like this:
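package org.example.xerces;

import org.apache.xml.resolver.CatalogManager;
import org.apache.xml.resolver.tools.ResolvingXMLReader;
import org.xml.sax.SAXNotRecognizedException;
import org.xml.sax.SAXNotSupportedException;

/**
 * Catalog-resolving XML reader that also turns on XML Schema processing
 * in the underlying parser.
 */
public class SchemaValidatingCatalogResolvingXMLReader extends
		ResolvingXMLReader {

	/**
	 * Creates a reader that uses the default catalog manager.
	 *
	 * @throws SAXNotRecognizedException if the parser does not recognize a feature
	 * @throws SAXNotSupportedException if the parser cannot set a feature
	 */
	public SchemaValidatingCatalogResolvingXMLReader() throws SAXNotRecognizedException, SAXNotSupportedException {
		super();
		init();
	}

	/**
	 * Sets the parser features needed for schema-aware parsing.
	 *
	 * @throws SAXNotRecognizedException if the parser does not recognize a feature
	 * @throws SAXNotSupportedException if the parser cannot set a feature
	 */
	private void init() throws SAXNotRecognizedException, SAXNotSupportedException {
		// System.err.println(" + INFO: Using " + this.getClass().getName());
		// Do not force validation on every document.
		this.setFeature("http://xml.org/sax/features/validation", false);
		// Turn on XML Schema (XSD) support in the parser.
		this.setFeature("http://apache.org/xml/features/validation/schema", true);
		// Validate only when the document specifies a grammar.
		this.setFeature("http://apache.org/xml/features/validation/dynamic", true);
	}

	/**
	 * Creates a reader that uses the given catalog manager.
	 *
	 * @param catalogManager the catalog manager used to resolve entities
	 * @throws SAXNotRecognizedException if the parser does not recognize a feature
	 * @throws SAXNotSupportedException if the parser cannot set a feature
	 */
	public SchemaValidatingCatalogResolvingXMLReader(CatalogManager catalogManager) throws SAXNotRecognizedException, SAXNotSupportedException {
		super(catalogManager);
		init();
	}

}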

The class shown depends on the Apache XML Commons Resolver library (resolver.jar), which does the catalog resolution you need.
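To see what these parser features do in isolation, the following is a minimal sketch that sets the same three features on the JDK's default JAXP parser, without the catalog resolver. It assumes a hypothetical one-element schema and document written to a temporary directory. The names SchemaFeatureDemo and readDefaultedClass are made up for illustration, and the tiny schema is not the real DITA schema. With the features set, the parser should report the attribute default declared in the schema even though the document does not specify that attribute.

```java
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical demo class, not part of the Toolkit or Saxon.
class SchemaFeatureDemo {

    /** Parses a tiny document and returns the "class" attribute reported for its root element. */
    static String readDefaultedClass() {
        try {
            Path dir = Files.createTempDirectory("schema-demo");
            Path xsd = dir.resolve("topic.xsd");
            Path xml = dir.resolve("topic.xml");

            // Illustrative schema: "class" has a default value, as DITA attributes often do.
            Files.write(xsd, (
                "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'>"
              + "<xs:element name='topic'><xs:complexType>"
              + "<xs:attribute name='id' type='xs:string'/>"
              + "<xs:attribute name='class' type='xs:string' default='- topic/topic '/>"
              + "</xs:complexType></xs:element></xs:schema>")
                .getBytes(StandardCharsets.UTF_8));

            // The document names its schema but does not specify "class".
            Files.write(xml, (
                "<topic id='t1' xmlns:xsi='http://www.w3.org/2001/XMLSchema-instance'"
              + " xsi:noNamespaceSchemaLocation='" + xsd.toUri() + "'/>")
                .getBytes(StandardCharsets.UTF_8));

            SAXParserFactory factory = SAXParserFactory.newInstance();
            factory.setNamespaceAware(true); // schema processing needs namespaces
            XMLReader reader = factory.newSAXParser().getXMLReader();

            // The same three features the wrapper class sets.
            reader.setFeature("http://xml.org/sax/features/validation", false);
            reader.setFeature("http://apache.org/xml/features/validation/schema", true);
            reader.setFeature("http://apache.org/xml/features/validation/dynamic", true);

            final String[] found = new String[1];
            reader.setContentHandler(new DefaultHandler() {
                @Override
                public void startElement(String uri, String localName,
                                         String qName, Attributes atts) {
                    if ("topic".equals(localName)) {
                        found[0] = atts.getValue("class");
                    }
                }
            });
            reader.parse(xml.toUri().toString());
            return found[0];
        } catch (Exception e) {
            throw new IllegalStateException("Schema-aware parse failed", e);
        }
    }

    public static void main(String[] args) {
        System.out.println("class attribute seen by the handler: " + readDefaultedClass());
    }
}
```

If the features are left off, the handler would normally see no "class" attribute at all, which is why a stylesheet that matches on schema-supplied defaults needs a parser configured this way.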

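To use it, you would normally package this class into a jar (for example, "mySchemaParser.jar"), include that jar in the Java class path you use to run Saxon, and specify the class name with Saxon's "-x" option. Later Saxon releases write this option as "-x:classname", so check the documentation for your version. For example, in a Windows batch file: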
set CLASSPATH=%CLASSPATH%;%PROJECT_PATH%\java\lib\resolver.jar;
set CLASSPATH=%CLASSPATH%;%PROJECT_PATH%\java\lib\saxon.jar;
set CLASSPATH=%CLASSPATH%;%PROJECT_PATH%\java\lib\mySchemaParser.jar;

java -cp %CLASSPATH% net.sf.saxon.Transform -x org.example.xerces.SchemaValidatingCatalogResolvingXMLReader %1 %2 %3
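The command line above has a rough counterpart in Java code: JAXP lets you hand the transformer a SAXSource that carries its own XMLReader. The following is a minimal sketch under some assumptions. It uses the JDK's default parser and identity transformer so that it runs without extra jars, and the class name CustomReaderTransform and its method are hypothetical. In a real setup you would presumably construct the wrapper class shown earlier instead of the plain reader and use Saxon's TransformerFactory. Unlike the "-x" option, this approach covers only the principal source document.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.SAXParserFactory;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.sax.SAXSource;
import javax.xml.transform.stream.StreamResult;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;

// Hypothetical helper, shown only to illustrate the JAXP wiring.
class CustomReaderTransform {

    /** Runs an identity transform over the given XML using an explicitly configured reader. */
    static String transform(String xml) {
        try {
            SAXParserFactory factory = SAXParserFactory.newInstance();
            factory.setNamespaceAware(true);

            // Stand-in for the custom parser class; configured with the same features.
            XMLReader reader = factory.newSAXParser().getXMLReader();
            reader.setFeature("http://xml.org/sax/features/validation", false);
            reader.setFeature("http://apache.org/xml/features/validation/schema", true);
            reader.setFeature("http://apache.org/xml/features/validation/dynamic", true);

            // newTransformer() with no stylesheet yields an identity transformation.
            Transformer identity = TransformerFactory.newInstance().newTransformer();

            StringWriter out = new StringWriter();
            identity.transform(
                new SAXSource(reader, new InputSource(new StringReader(xml))),
                new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException("Transformation failed", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(transform("<topic id=\"t1\"><title>Hi</title></topic>"));
    }
}
```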