Writing the custom parser
Writing the actual SAX parser sounds harder than it really is. The
process involves implementing the
org.xml.sax.XMLReader interface, which declares numerous
methods that need only trivial implementations in most applications.
For example, when parsing a CSV file, it is probably not necessary to
deal with namespaces or validation. The code for
AbstractXMLReader.java is shown in
Example 5-5. This is an abstract class
that provides basic implementations of every method in the
XMLReader interface except for the parse( )
method. This means that all you need to do to write a parser is create
a subclass and override this single method.
Example 5-5: AbstractXMLReader.java
package com.oreilly.javaxslt.util;

import java.io.IOException;
import java.util.*;
import org.xml.sax.*;

/**
 * An abstract class that implements the SAX2 XMLReader
 * interface. The intent of this class is to make it easy
 * for subclasses to act as SAX2 XMLReader implementations.
 * This makes it possible, for example, for them to emit SAX2
 * events that can be fed into an XSLT processor for
 * transformation.
 */
public abstract class AbstractXMLReader
        implements org.xml.sax.XMLReader {
    // features and properties are simply stored by name
    private Map<String, Boolean> featureMap = new HashMap<>( );
    private Map<String, Object> propertyMap = new HashMap<>( );
    private EntityResolver entityResolver;
    private DTDHandler dtdHandler;
    private ContentHandler contentHandler;
    private ErrorHandler errorHandler;

    /**
     * The only abstract method in this class. Derived classes
     * can parse any source of data and emit SAX2 events to the
     * ContentHandler.
     */
    public abstract void parse(InputSource input)
            throws IOException, SAXException;

    public boolean getFeature(String name)
            throws SAXNotRecognizedException, SAXNotSupportedException {
        // a feature that was never set is reported as false
        Boolean featureValue = this.featureMap.get(name);
        return (featureValue == null) ? false
                : featureValue.booleanValue( );
    }

    public void setFeature(String name, boolean value)
            throws SAXNotRecognizedException, SAXNotSupportedException {
        this.featureMap.put(name, Boolean.valueOf(value));
    }

    public Object getProperty(String name)
            throws SAXNotRecognizedException, SAXNotSupportedException {
        return this.propertyMap.get(name);
    }

    public void setProperty(String name, Object value)
            throws SAXNotRecognizedException, SAXNotSupportedException {
        this.propertyMap.put(name, value);
    }

    public void setEntityResolver(EntityResolver entityResolver) {
        this.entityResolver = entityResolver;
    }

    public EntityResolver getEntityResolver( ) {
        return this.entityResolver;
    }

    public void setDTDHandler(DTDHandler dtdHandler) {
        this.dtdHandler = dtdHandler;
    }

    public DTDHandler getDTDHandler( ) {
        return this.dtdHandler;
    }

    public void setContentHandler(ContentHandler contentHandler) {
        this.contentHandler = contentHandler;
    }

    public ContentHandler getContentHandler( ) {
        return this.contentHandler;
    }

    public void setErrorHandler(ErrorHandler errorHandler) {
        this.errorHandler = errorHandler;
    }

    public ErrorHandler getErrorHandler( ) {
        return this.errorHandler;
    }

    // convenience overload that delegates to the abstract parse( )
    public void parse(String systemId) throws IOException, SAXException {
        parse(new InputSource(systemId));
    }
}
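To illustrate how little a subclass has to do, here is a minimal sketch that overrides only parse( ) and emits a single fixed element. GreetingReader, EventRecorder, and GreetingDemo are hypothetical names invented for this illustration. The standard org.xml.sax.helpers.XMLFilterImpl class stands in for AbstractXMLReader purely so that the block compiles on its own; with the class from Example 5-5 available, the subclass would extend AbstractXMLReader instead.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import org.xml.sax.Attributes;
import org.xml.sax.ContentHandler;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.AttributesImpl;
import org.xml.sax.helpers.DefaultHandler;
import org.xml.sax.helpers.XMLFilterImpl;

// Hypothetical reader: ignores its input and emits one fixed element.
// XMLFilterImpl is an assumption used as a stand-in for AbstractXMLReader.
class GreetingReader extends XMLFilterImpl {
    @Override
    public void parse(InputSource input) throws IOException, SAXException {
        ContentHandler handler = getContentHandler();
        if (handler == null) {
            return; // nobody registered to receive events
        }
        Attributes noAttrs = new AttributesImpl();
        char[] text = "hello".toCharArray();
        handler.startDocument();
        handler.startElement("", "greeting", "greeting", noAttrs);
        handler.characters(text, 0, text.length);
        handler.endElement("", "greeting", "greeting");
        handler.endDocument();
    }
}

// Records the events it receives so they can be inspected afterwards.
class EventRecorder extends DefaultHandler {
    final List<String> events = new ArrayList<>();

    @Override
    public void startElement(String uri, String localName, String qName,
                             Attributes atts) {
        events.add("start:" + qName);
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        events.add("text:" + new String(ch, start, length));
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        events.add("end:" + qName);
    }
}

class GreetingDemo {
    // Wires the reader to the recorder and returns the recorded events.
    static List<String> run() {
        GreetingReader reader = new GreetingReader();
        EventRecorder recorder = new EventRecorder();
        reader.setContentHandler(recorder);
        try {
            reader.parse(new InputSource());
        } catch (IOException | SAXException e) {
            throw new IllegalStateException(e);
        }
        return recorder.events;
    }

    public static void main(String[] args) {
        System.out.println(run());
    }
}
```

The recorder here plays the role that an XSLT processor's ContentHandler would play in a real transformation: the reader neither knows nor cares who consumes the events.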
Creating the subclass, CSVXMLReader, involves overriding
the parse( ) method and actually scanning through the
CSV file, emitting SAX events as elements in the file are encountered.
While the SAX portion is very easy, parsing the CSV file is a little
more challenging. To make this class as flexible as possible, it was
designed to parse through any CSV file that a spreadsheet such as
Microsoft Excel can export. For simple data, your CSV file might
look like this:
Burke,Eric,M
Burke,Jennifer,L
Burke,Aidan,G
The XML representation of this file is shown in
Example 5-6. The only real drawback here
is that CSV files are strictly positional, meaning that names are not
assigned to each column of data. This means that the XML output merely
contains a sequence of three <value> elements for
each line, so your stylesheet will have to select items based on
position.
Example 5-6: Example XML output from CSV parser
<?xml version="1.0" encoding="UTF-8"?>
<csvFile>
  <line>
    <value>Burke</value>
    <value>Eric</value>
    <value>M</value>
  </line>
  <line>
    <value>Burke</value>
    <value>Jennifer</value>
    <value>L</value>
  </line>
  <line>
    <value>Burke</value>
    <value>Aidan</value>
    <value>G</value>
  </line>
</csvFile>
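The mapping from CSV lines to this structure can be sketched as follows. This is an illustration only, not the implementation from Example 5-7: it handles unquoted data such as the three-line file above, and SimpleCsvEmitter, emit, and toXml are hypothetical names. The toXml helper flattens the events into a compact string for inspection and performs no XML escaping.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import org.xml.sax.Attributes;
import org.xml.sax.ContentHandler;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.AttributesImpl;
import org.xml.sax.helpers.DefaultHandler;

class SimpleCsvEmitter {
    private static final Attributes EMPTY = new AttributesImpl();

    // Emits csvFile, line, and value events for unquoted CSV input only.
    static void emit(BufferedReader in, ContentHandler out)
            throws IOException, SAXException {
        out.startDocument();
        out.startElement("", "csvFile", "csvFile", EMPTY);
        String line;
        while ((line = in.readLine()) != null) {
            out.startElement("", "line", "line", EMPTY);
            // a limit of -1 keeps trailing empty fields
            for (String field : line.split(",", -1)) {
                out.startElement("", "value", "value", EMPTY);
                out.characters(field.toCharArray(), 0, field.length());
                out.endElement("", "value", "value");
            }
            out.endElement("", "line", "line");
        }
        out.endElement("", "csvFile", "csvFile");
        out.endDocument();
    }

    // Hypothetical helper: renders the events as a compact string.
    // It does not escape special characters such as < or &.
    static String toXml(String csv) {
        StringBuilder sb = new StringBuilder();
        DefaultHandler collector = new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName,
                                     String qName, Attributes atts) {
                sb.append('<').append(qName).append('>');
            }

            @Override
            public void endElement(String uri, String localName,
                                   String qName) {
                sb.append("</").append(qName).append('>');
            }

            @Override
            public void characters(char[] ch, int start, int length) {
                sb.append(ch, start, length);
            }
        };
        try {
            emit(new BufferedReader(new StringReader(csv)), collector);
        } catch (IOException | SAXException e) {
            throw new IllegalStateException(e);
        }
        return sb.toString();
    }
}
```

Because every field becomes an anonymous value element, a consumer can tell the columns apart only by their order within each line.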
One enhancement would be to design the CSV parser so it could
accept a list of meaningful column names as parameters, and these
could be used in the XML that is generated. Another option would
be to write an XSLT stylesheet that transformed this initial output
into another form of XML that used meaningful column names. To keep
the code example relatively manageable, these features were omitted
from this implementation. But there are some complexities to the CSV
file format that have to be considered. For example, fields that
contain commas must be surrounded with quotes:
"Consultant,Author,Teacher",Burke,Eric,M
Teacher,Burke,Jennifer,L
None,Burke,Aidan,G
To further complicate matters, fields may also contain quote characters (").
In this case, each quote is doubled, much as you write a double
backslash (\\) in Java to represent a single
backslash. In the following example, the first column contains one
quote character, so the entire field is quoted and the embedded quote is
doubled:
"test""quote",Teacher,Burke,Jennifer,L
This would be interpreted as:
test"quote,Teacher,Burke,Jennifer,L
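The quoting rules described above can be sketched as a small state machine that walks one line character by character. This is a minimal illustration under the stated rules, not the code from Example 5-7; CsvLineSplitter and split are hypothetical names, and the sketch assumes a field never spans more than one line.

```java
import java.util.ArrayList;
import java.util.List;

class CsvLineSplitter {
    // Splits one CSV line into fields, honoring quoted fields and
    // doubled quotes. Assumes a field does not continue on the next line.
    static List<String> split(String line) {
        List<String> fields = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        boolean inQuotes = false;
        for (int i = 0; i < line.length(); i++) {
            char c = line.charAt(i);
            if (inQuotes) {
                if (c == '"') {
                    if (i + 1 < line.length() && line.charAt(i + 1) == '"') {
                        current.append('"'); // doubled quote is a literal quote
                        i++;                 // skip the second quote
                    } else {
                        inQuotes = false;    // closing quote
                    }
                } else {
                    current.append(c);       // commas are ordinary text here
                }
            } else if (c == '"') {
                inQuotes = true;             // opening quote
            } else if (c == ',') {
                fields.add(current.toString());
                current.setLength(0);        // start the next field
            } else {
                current.append(c);
            }
        }
        fields.add(current.toString());      // last field has no trailing comma
        return fields;
    }
}
```

With this sketch, a field written as "Consultant,Author,Teacher" comes back as a single value containing two commas, and "test""quote" comes back as test"quote.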
The code in Example 5-7 shows the
complete implementation of the CSV parser.