Reviews : Java Books : Java and XSLT :


Title: Java and XSLT
ISBN: 0-596-00143-6, Order Number: 143-6
US Price: $39.95
© O'Reilly & Associates, Inc.

Writing the custom parser

Writing the actual SAX parser sounds harder than it really is. The process basically involves implementing the org.xml.sax.XMLReader interface, which provides numerous methods you can safely ignore for most applications. For example, when parsing a CSV file, it is probably not necessary to deal with namespaces or validation. The code for AbstractXMLReader.java is shown in Example 5-5. This is an abstract class that provides basic implementations of every method in the XMLReader interface except for the parse( ) method. This means that all you need to do to write a parser is create a subclass and override this single method.

Example 5-5: AbstractXMLReader.java

package com.oreilly.javaxslt.util;

import java.io.IOException;
import java.util.*;
import org.xml.sax.*;

/**
 * An abstract class that implements the SAX2 XMLReader 
 * interface. The intent of this class is to make it easy 
 * for subclasses to act as SAX2 XMLReader implementations. 
 * This makes it possible, for example, for them to emit SAX2 
 * events that can be fed into an XSLT processor for 
 * transformation.
 */
public abstract class AbstractXMLReader 
			implements org.xml.sax.XMLReader {
    private Map featureMap = new HashMap(  );
    private Map propertyMap = new HashMap(  );
    private EntityResolver entityResolver;
    private DTDHandler dtdHandler;
    private ContentHandler contentHandler;
    private ErrorHandler errorHandler;

    /**
     * The only abstract method in this class. Derived classes 
     * can parse any source of data and emit SAX2 events to the 
	 * ContentHandler.
     */
    public abstract void parse(InputSource input) 
				throws IOException,
            SAXException;

    public boolean getFeature(String name)
            throws SAXNotRecognizedException, 
				SAXNotSupportedException {
        Boolean featureValue = (Boolean) this.featureMap.get(name);
        return (featureValue == null) ? false
                : featureValue.booleanValue(  );
    }

    public void setFeature(String name, boolean value)
            throws SAXNotRecognizedException, 
				SAXNotSupportedException {
        this.featureMap.put(name, new Boolean(value));
    }

    public Object getProperty(String name)
            throws SAXNotRecognizedException, 
				SAXNotSupportedException {
        return this.propertyMap.get(name);
    }

    public void setProperty(String name, Object value)
            throws SAXNotRecognizedException, 
				SAXNotSupportedException {
        this.propertyMap.put(name, value);
    }

    public void setEntityResolver(EntityResolver entityResolver) {
        this.entityResolver = entityResolver;
    }

    public EntityResolver getEntityResolver(  ) {
        return this.entityResolver;
    }

    public void setDTDHandler(DTDHandler dtdHandler) {
        this.dtdHandler = dtdHandler;
    }

    public DTDHandler getDTDHandler(  ) {
        return this.dtdHandler;
    }

    public void setContentHandler(ContentHandler contentHandler) {
        this.contentHandler = contentHandler;
    }

    public ContentHandler getContentHandler(  ) {
        return this.contentHandler;
    }

    public void setErrorHandler(ErrorHandler errorHandler) {
        this.errorHandler = errorHandler;
    }

    public ErrorHandler getErrorHandler(  ) {
        return this.errorHandler;
    }

    public void parse(String systemId) throws IOException, 
		SAXException {
        parse(new InputSource(systemId));
    }
}
Note: Color coded lines have been broken for display purposes.

Creating the subclass, CSVXMLReader, involves overriding the parse( ) method and actually scanning through the CSV file, emitting SAX events as elements in the file are encountered. While the SAX portion is very easy, parsing the CSV file is a little more challenging. To make this class as flexible as possible, it was designed to parse through any CSV file that a spreadsheet such as Microsoft Excel can export. For simple data, your CSV file might look like this:

Burke,Eric,M
Burke,Jennifer,L
Burke,Aidan,G

The XML representation of this file is shown in Example 5-6. The only real drawback here is that CSV files are strictly positional, meaning that names are not assigned to each column of data. This means that the XML output merely contains a sequence of three <value> elements for each line, so your stylesheet will have to select items based on position.

Example 5-6: Example XML output from CSV parser

<?xml version="1.0" encoding="UTF-8"?>
<csvFile>
  <line>
    <value>Burke</value>
    <value>Eric</value>
    <value>M</value>
  </line>
  <line>
    <value>Burke</value>
    <value>Jennifer</value>
    <value>L</value>
  </line>
  <line>
    <value>Burke</value>
    <value>Aidan</value>
    <value>G</value>
  </line>
</csvFile>

One enhancement would be to design the CSV parser so it could accept a list of meaningful column names as parameters, and these could be used in the XML that is generated. Another option would be to write an XSLT stylesheet that transformed this initial output into another form of XML that used meaningful column names. To keep the code example relatively manageable, these features were omitted from this implementation. But there are some complexities to the CSV file format that have to be considered. For example, fields that contain commas must be surrounded with quotes:

"Consultant,Author,Teacher",Burke,Eric,M
Teacher,Burke,Jennifer,L
None,Burke,Aidan,G

To further complicate matters, fields may also contain quotes ("). In this case, they are doubled up, much in the same way you use double backslash characters (\\) in Java to represent a single backslash. In the following example, the first column contains a single quote, so the entire field is quoted, and the single quote is doubled up:

"test""quote"

This would be interpreted as:

test"quote,Teacher,Burke,Jennifer,L

The code in Example 5-7 shows the complete implementation of the CSV parser.

How to Add Java Applets to Your Site

New on the Java Boutique:

New Review:

Time Management Made Easy with the Quartz Enterprise Job Scheduler
Why not just use the Java timer API? This open source scheduling API boasts simplicity, ease-of-integration, a well-rounded feature set, and it's free!

New Applet:

Reverse Complement
Reverse Complement is a simple applet that converts DNA or RNA sequences into three useful formats.

Elsewhere on internet.com:

WebDeveloper Java
Lots of Java information on webdeveloper.com

WDVL Java
Thorough Java resource at the Web Developer's Virtual Library.

ScriptSearch Java
Hundreds of free Java code files to download.

jGuru: Your View of the Java Universe
Customizable portal with online training, FAQs, regular news updates, and tutorials.