Parsing with XMLStreamReader
Now it's time to see how this works by parsing the document in Listing 1 using the XMLStreamReader API. XMLStreamReader parses an XML document with a cursor that moves forward through the document one parse event at a time. Its next() method advances the cursor to the next parse event and returns that event's type. The getEventType() method returns the type of the event the cursor is currently on. The code snippets that follow are taken from the XMLParser.java application, shown in Listing 3.
In the XMLParser.java application, first import the StAX classes, along with the java.io classes needed to open the file:
import java.io.*;
import javax.xml.stream.*;
import javax.xml.stream.events.*;
import javax.xml.stream.XMLInputFactory;
Next, create an XMLInputFactory, through which you will obtain an XMLStreamReader:
XMLInputFactory inputFactory = XMLInputFactory.newInstance();
Next, create an InputStream over the file to be parsed, then create an XMLStreamReader from the XMLInputFactory object created earlier:
InputStream input = new FileInputStream(new File("C:/STAX/catalog.xml"));
XMLStreamReader xmlStreamReader = inputFactory.createXMLStreamReader(input);
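The setup steps above can be sketched as one small, runnable program. This is a minimal sketch rather than the code from Listing 3: the class name OpenReaderSketch, the helper initialEvent(), and the temporary file that stands in for catalog.xml are all assumptions made for illustration. It also closes the stream explicitly, because XMLStreamReader.close() frees the parser's resources but does not close the underlying input source.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

public class OpenReaderSketch {

    // Opens the file, creates a reader, and reports the event the cursor starts on.
    static int initialEvent(Path xmlFile) throws IOException, XMLStreamException {
        XMLInputFactory inputFactory = XMLInputFactory.newInstance();
        // try-with-resources closes the stream; the reader's close() does not.
        try (InputStream input = Files.newInputStream(xmlFile)) {
            XMLStreamReader reader = inputFactory.createXMLStreamReader(input);
            try {
                return reader.getEventType();
            } finally {
                reader.close(); // frees parser resources only
            }
        }
    }

    public static void main(String[] args) throws Exception {
        // Hypothetical stand-in for catalog.xml so the sketch runs anywhere.
        Path tmp = Files.createTempFile("catalog", ".xml");
        Files.write(tmp, "<catalog/>".getBytes(StandardCharsets.UTF_8));
        try {
            int event = initialEvent(tmp);
            System.out.println(event == XMLStreamConstants.START_DOCUMENT);
        } finally {
            Files.delete(tmp);
        }
    }
}
```

A freshly created reader is positioned on the START_DOCUMENT event, which is why the first call to next() returns whatever follows the document start.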
The hasNext() method returns true if more parsing events are available.
Next, obtain the next parsing event using the next() method:
int event = xmlStreamReader.next();
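In practice, hasNext() and next() are usually combined in a loop that runs until the document is exhausted. The following is a minimal sketch of that loop, not the code from Listing 3: the class name EventLoopSketch, the helper eventTypes(), and the inline XML string (used in place of catalog.xml so the example is self-contained) are assumptions.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

public class EventLoopSketch {

    // Walks the whole document and records the int returned by each next() call.
    static List<Integer> eventTypes(String xml) throws XMLStreamException {
        XMLStreamReader reader =
                XMLInputFactory.newInstance().createXMLStreamReader(new StringReader(xml));
        List<Integer> types = new ArrayList<>();
        try {
            while (reader.hasNext()) {
                types.add(reader.next());
            }
        } finally {
            reader.close();
        }
        return types;
    }

    public static void main(String[] args) throws XMLStreamException {
        // Hypothetical sample input; prints the numeric event codes in document order.
        System.out.println(eventTypes("<catalog><journal>Java</journal></catalog>"));
    }
}
```

The loop ends after END_DOCUMENT has been returned, at which point hasNext() reports false.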
An advantage of StAX parsing over SAX parsing is that the application, not the parser, decides when the next event is read, so a parse event can be skipped simply by invoking the next() method again, as shown in the following code. For example, if the parse event is of type ENTITY_DECLARATION, a developer may decide whether to obtain the event information or to move on to the next event:
if (event == XMLStreamConstants.ENTITY_DECLARATION) {
    event = xmlStreamReader.next();
}
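The same skipping idea applies to any event type. As a minimal sketch, the example below ignores every event except START_ELEMENT and collects the element names. The class name SkipEventsSketch, the helper elementNames(), and the sample XML string are assumptions for illustration and do not come from Listing 3.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

public class SkipEventsSketch {

    // Collects element local names, skipping comments, text, and all other events.
    static List<String> elementNames(String xml) throws XMLStreamException {
        XMLStreamReader reader =
                XMLInputFactory.newInstance().createXMLStreamReader(new StringReader(xml));
        List<String> names = new ArrayList<>();
        try {
            while (reader.hasNext()) {
                int event = reader.next();
                if (event != XMLStreamConstants.START_ELEMENT) {
                    continue; // skip this event: just move on to the next one
                }
                names.add(reader.getLocalName());
            }
        } finally {
            reader.close();
        }
        return names;
    }

    public static void main(String[] args) throws XMLStreamException {
        // Hypothetical sample input containing a comment that is skipped.
        System.out.println(elementNames("<catalog><!-- skipped --><journal/></catalog>"));
    }
}
```

Because the application pulls events, nothing has to be registered or unregistered to skip one. The loop simply does not look at events it is not interested in.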
Parsing may also be suspended by not invoking the next() method.
The next() method returns an int that identifies the parsing event. The possible values are defined as constants in XMLStreamConstants.
The different event types returned by the XMLStreamReader are listed in Table 1.
| Event Type | Description |
|---|---|
| START_DOCUMENT | Start of a document |
| START_ELEMENT | Start of an element |
| ATTRIBUTE | An element attribute |
| NAMESPACE | A namespace declaration |
| CHARACTERS | Characters, which may be text or white space |
| COMMENT | A comment |
| SPACE | Ignorable white space |
| PROCESSING_INSTRUCTION | A processing instruction |
| DTD | A DTD |
| ENTITY_REFERENCE | An entity reference |
| CDATA | A CDATA section |
| END_ELEMENT | End of an element |
| END_DOCUMENT | End of a document |
| ENTITY_DECLARATION | An entity declaration |
| NOTATION_DECLARATION | A notation declaration |
Table 1. XMLStreamReader Events
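Because next() and getEventType() return plain int values, a small helper that maps each value back to its constant name can make debug output easier to read. The sketch below is one possible way to write it. The class name EventNames and the method name() are assumptions and are not part of the StAX API.

```java
import javax.xml.stream.XMLStreamConstants;

public class EventNames {

    // Maps an event type code to the name of the matching XMLStreamConstants field.
    static String name(int eventType) {
        switch (eventType) {
            case XMLStreamConstants.START_DOCUMENT:         return "START_DOCUMENT";
            case XMLStreamConstants.START_ELEMENT:          return "START_ELEMENT";
            case XMLStreamConstants.ATTRIBUTE:              return "ATTRIBUTE";
            case XMLStreamConstants.NAMESPACE:              return "NAMESPACE";
            case XMLStreamConstants.CHARACTERS:             return "CHARACTERS";
            case XMLStreamConstants.COMMENT:                return "COMMENT";
            case XMLStreamConstants.SPACE:                  return "SPACE";
            case XMLStreamConstants.PROCESSING_INSTRUCTION: return "PROCESSING_INSTRUCTION";
            case XMLStreamConstants.DTD:                    return "DTD";
            case XMLStreamConstants.ENTITY_REFERENCE:       return "ENTITY_REFERENCE";
            case XMLStreamConstants.CDATA:                  return "CDATA";
            case XMLStreamConstants.END_ELEMENT:            return "END_ELEMENT";
            case XMLStreamConstants.END_DOCUMENT:           return "END_DOCUMENT";
            case XMLStreamConstants.ENTITY_DECLARATION:     return "ENTITY_DECLARATION";
            case XMLStreamConstants.NOTATION_DECLARATION:   return "NOTATION_DECLARATION";
            default:                                        return "UNKNOWN(" + eventType + ")";
        }
    }

    public static void main(String[] args) {
        System.out.println(name(XMLStreamConstants.START_ELEMENT));
        System.out.println(name(XMLStreamConstants.END_DOCUMENT));
    }
}
```

The default branch is there so that an unexpected value is reported rather than silently mislabeled.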
These different parsing events enable you to obtain data and metadata in the XML document. If the parsing event type is START_DOCUMENT, use the getEncoding() method to obtain the input encoding, if it is known (getCharacterEncodingScheme() returns the encoding declared in the XML declaration), and use the getVersion() method to obtain the XML version declared in the document.
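As a minimal sketch of reading document-level metadata, the example below inspects the reader while it is still positioned on START_DOCUMENT. The class name StartDocumentSketch, the helper describeDocument(), and the sample document are assumptions for illustration. Any of the three getters may return null when the information is not declared or not known, so the code does not assume a value is present.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

public class StartDocumentSketch {

    // Summarizes the version and encoding information available at START_DOCUMENT.
    static String describeDocument(byte[] xml) throws XMLStreamException {
        XMLStreamReader reader = XMLInputFactory.newInstance()
                .createXMLStreamReader(new ByteArrayInputStream(xml));
        try {
            if (reader.getEventType() != XMLStreamConstants.START_DOCUMENT) {
                return "not at START_DOCUMENT";
            }
            // Each getter may return null if the value is undeclared or unknown.
            return "version=" + reader.getVersion()
                    + ", encoding=" + reader.getEncoding()
                    + ", declaredEncoding=" + reader.getCharacterEncodingScheme();
        } finally {
            reader.close();
        }
    }

    public static void main(String[] args) throws XMLStreamException {
        // Hypothetical sample document with an explicit XML declaration.
        String xml = "<?xml version=\"1.0\" encoding=\"UTF-8\"?><catalog/>";
        System.out.println(describeDocument(xml.getBytes(StandardCharsets.UTF_8)));
    }
}
```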
Likewise, if you're working with a START_ELEMENT event type, use the getPrefix() method to return the element prefix and the getNamespaceURI() method to return the namespace bound to that prefix, or the default namespace. To obtain the local name of the element, use the getLocalName() method, and for the number of attributes, the getAttributeCount() method. For the attribute at a specified index i, getAttributePrefix(i) returns the attribute prefix, getAttributeNamespace(i) the attribute namespace, getAttributeLocalName(i) the attribute local name, and getAttributeValue(i) the attribute value. If the event type is CHARACTERS or COMMENT, obtain the text using the getText() method.
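The accessor methods described above can be combined into a short sketch that reports element names, attributes, and text. This is an illustration under stated assumptions, not the code from Listing 3: the class name ElementDetailsSketch, the helper describe(), and the sample XML are hypothetical. Whitespace-only text is filtered with isWhiteSpace(), and long text may arrive as several CHARACTERS events unless the factory is configured to coalesce them.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

public class ElementDetailsSketch {

    // Produces one line per element, attribute, and non-blank text event.
    static List<String> describe(String xml) throws XMLStreamException {
        XMLStreamReader reader =
                XMLInputFactory.newInstance().createXMLStreamReader(new StringReader(xml));
        List<String> lines = new ArrayList<>();
        try {
            while (reader.hasNext()) {
                int event = reader.next();
                if (event == XMLStreamConstants.START_ELEMENT) {
                    lines.add("element " + reader.getLocalName());
                    // Attributes are read by index from the current START_ELEMENT.
                    for (int i = 0; i < reader.getAttributeCount(); i++) {
                        lines.add("attribute " + reader.getAttributeLocalName(i)
                                + "=" + reader.getAttributeValue(i));
                    }
                } else if (event == XMLStreamConstants.CHARACTERS && !reader.isWhiteSpace()) {
                    lines.add("text " + reader.getText());
                }
            }
        } finally {
            reader.close();
        }
        return lines;
    }

    public static void main(String[] args) throws XMLStreamException {
        // Hypothetical sample input, loosely modeled on a catalog entry.
        for (String line : describe("<journal title=\"XML Zone\">Hello</journal>")) {
            System.out.println(line);
        }
    }
}
```

The attribute getters are only valid while the cursor is on a START_ELEMENT event, which is why they are called inside that branch before the loop advances.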
Listing 4 shows the output from parsing the example XML document, catalog.xml.
Listing 3 shows the Java application used to parse the XML document. You can run the app as a command-line application or in an IDE such as Eclipse. Remember: if you run XMLParser.java without first running the XMLWriter.java application (shown in Listing 2), you'll need to copy catalog.xml (shown in Listing 1) to the C:/StAX directory.