javaboutique
Java and XML: putting SAX to work

Contents
Why use XML?
Reading an XML file
Putting SAX to work
A complete event handler program for SAX
Sorting the data

Reading an XML file

In this article we'll look at how to read and process an XML file from a Java program. There are several ways of doing this; one powerful option is SAX, the Simple API for XML. SAX is typically used when you don't need to hold all the data from the XML file in memory: you might be searching for specific elements, or counting the elements of a certain kind. If it's more convenient to read the whole file and keep it in memory as a tree structure, then DOM, the Document Object Model API, could be a better choice.

SAX is an event-based parser: as it reads the XML file, it calls methods in your program when certain events occur. These methods are called call-back methods, and there are 4 SAX interfaces that declare them. To make it easier to write your program, which acts as the event handler, you can extend a SAX convenience class called DefaultHandler. This class has default implementations of all the call-back methods, so you only need to override the ones you're interested in. The most important events are when a SAX parser detects the

  • beginning of an element. The call-back method is "startElement"
  • end of an element. The call-back method is "endElement"
  • content of an element. The call-back method is "characters"
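As a minimal sketch of the "only override what you need" idea, the handler below extends DefaultHandler and overrides just "startElement" to count elements with a given name. The class name ElementCounter, the helper method count, and the sample XML (including the "dvds" and "cd" elements) are made up for illustration and are not part of the article's example.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical handler: counts how many times one element name occurs.
class ElementCounter extends DefaultHandler {
    private final String wanted;
    private int count;

    ElementCounter(String wanted) {
        this.wanted = wanted;
    }

    // The only call-back we care about; all others keep their default (empty) behaviour.
    @Override
    public void startElement(String uri, String localName, String qName,
                             Attributes attributes) {
        if (wanted.equals(qName)) {
            count++;
        }
    }

    // Parses the XML text and returns the number of matching start-tags.
    static int count(String xml, String elementName) {
        ElementCounter handler = new ElementCounter(elementName);
        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse XML", e);
        }
        return handler.count;
    }

    public static void main(String[] args) {
        // Sample data invented for this sketch.
        String xml = "<dvds><dvd/><dvd/><cd/></dvds>";
        System.out.println(count(xml, "dvd")); // prints 2
    }
}
```

Nothing is kept in memory here apart from the counter, which is the kind of job SAX is well suited for.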

The following figure illustrates the "call-back" mechanism when a simple XML structure is read. The 8 events that occur and the methods called are shown:

So it works like this: first you call the SAX parser from your program, and then you wait… This is how event-based programming works: you hand control to another party. Suddenly the "startElement" method is called, and SAX tells you that it found the "dvd" start-tag. Then you wait again… and now "startElement" is called once more, this time because the "title" start-tag has been found. Patient as we are, we wait again until "characters" is called. This time we are told that the string "The Matrix" has been read from the XML file. We have to save this text, because we're only told once. Let's wait one more time… Now "endElement" is called, and SAX informs us that this was the end of the "title" element. We can therefore safely save "The Matrix" as the element value for "title".
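The sequence of calls can be observed with a small handler that simply records each call-back as it arrives. This is only a sketch: the class name EventTrace and the helper trace are hypothetical, and the XML is written on one line (with a made-up value for "length") so that no whitespace events get in the way.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical handler: records every call-back in the order it is received.
class EventTrace extends DefaultHandler {
    final List<String> events = new ArrayList<>();

    @Override
    public void startElement(String uri, String localName, String qName,
                             Attributes attributes) {
        events.add("startElement " + qName);
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        events.add("endElement " + qName);
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        // Only the slice [start, start + length) of the array belongs to this event.
        events.add("characters \"" + new String(ch, start, length) + "\"");
    }

    // Hands control to the SAX parser and returns the recorded events.
    static List<String> trace(String xml) {
        EventTrace handler = new EventTrace();
        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse XML", e);
        }
        return handler.events;
    }

    public static void main(String[] args) {
        // The length value is invented for illustration.
        String xml = "<dvd><title>The Matrix</title><length>136</length></dvd>";
        for (String event : trace(xml)) {
            System.out.println(event);
        }
    }
}
```

The parse call does not return until the whole document has been read; everything your program learns about the file, it learns inside the call-backs.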

What I'd like to emphasize is that all the data from the XML file is handed to your program, but only once. It's your responsibility as the programmer to keep track of where you are in the XML hierarchy, and to save the element values that you receive.
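One possible way to keep track of the current position, assuming a stack of open element names is enough for your needs, is sketched below. The class name PathTracker and the "dvd/title" path notation are inventions for this example, not something SAX provides.

```java
import java.io.StringReader;
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical handler: remembers which elements are currently open.
class PathTracker extends DefaultHandler {
    private final Deque<String> open = new ArrayDeque<>();
    final List<String> paths = new ArrayList<>();

    @Override
    public void startElement(String uri, String localName, String qName,
                             Attributes attributes) {
        open.addLast(qName);                 // we are now inside this element
        paths.add(String.join("/", open));   // e.g. "dvd/title"
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        open.removeLast();                   // we have left the element again
    }

    // Returns the path of every element in document order.
    static List<String> pathsOf(String xml) {
        PathTracker handler = new PathTracker();
        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse XML", e);
        }
        return handler.paths;
    }

    public static void main(String[] args) {
        // The length value is invented for illustration.
        String xml = "<dvd><title>The Matrix</title><length>136</length></dvd>";
        System.out.println(pathsOf(xml)); // prints [dvd, dvd/title, dvd/length]
    }
}
```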

The signatures of the 3 call-back methods are:

void startElement(String uri, String localName, String qName,
                  Attributes attributes)
  • uri and localName are used with namespaces, which we'll not cover in this article
  • qName is the name of the element found--i.e. "dvd", "title" or "length" in the example above
  • attributes holds the attributes of the element, if it has any. For example: <dvd category="horror">
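To show how the attributes parameter might be used, here is a small sketch that collects the "category" attribute of every "dvd" element. The class name CategoryReader, the helper categoriesOf, and the surrounding "dvds" root element are assumptions made for the example.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical handler: reads the "category" attribute of each "dvd" element.
class CategoryReader extends DefaultHandler {
    final List<String> categories = new ArrayList<>();

    @Override
    public void startElement(String uri, String localName, String qName,
                             Attributes attributes) {
        if ("dvd".equals(qName)) {
            // getValue returns null when the element has no such attribute.
            String category = attributes.getValue("category");
            if (category != null) {
                categories.add(category);
            }
        }
    }

    // Returns the category values in document order.
    static List<String> categoriesOf(String xml) {
        CategoryReader handler = new CategoryReader();
        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse XML", e);
        }
        return handler.categories;
    }

    public static void main(String[] args) {
        // Sample data invented for this sketch; the second dvd has no category.
        String xml = "<dvds><dvd category=\"horror\"/><dvd/></dvds>";
        System.out.println(categoriesOf(xml)); // prints [horror]
    }
}
```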
void endElement(String uri, String localName, String qName)

The parameters are as above.

void characters(char[] ch, int start, int length)
  • ch holds the characters read
  • start is the starting position in the character array
  • length is the number of characters to use from the array

Note: The "characters" method may be called more than once for a single element. This means that you have to buffer the characters until you reach the end tag of the element. This is easily done using a StringBuffer "b":

b.append(ch, start, length);

As an example, "The Matrix" could be delivered in three parts: "The", " ", and "Matrix".
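The effect of buffering can be tried out without a parser by calling "characters" by hand, the way a parser might when it delivers the text in pieces. This is a sketch only: the class name TextBuffer and the method text are hypothetical, and the chunk boundaries are simulated rather than produced by a real parser.

```java
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical handler: collects character chunks in a StringBuffer.
class TextBuffer extends DefaultHandler {
    private final StringBuffer b = new StringBuffer();

    @Override
    public void characters(char[] ch, int start, int length) {
        // Append only the slice that belongs to this call.
        b.append(ch, start, length);
    }

    String text() {
        return b.toString();
    }

    public static void main(String[] args) {
        TextBuffer handler = new TextBuffer();
        // Simulate a parser delivering "The Matrix" in three parts.
        // The first array holds extra characters to show why start and length matter.
        handler.characters("<title>The".toCharArray(), 7, 3);
        handler.characters(" ".toCharArray(), 0, 1);
        handler.characters("Matrix".toCharArray(), 0, 6);
        System.out.println(handler.text()); // prints The Matrix
    }
}
```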

The way you use the 3 call-back methods will usually be like this:

  • in startElement: empty the StringBuffer so you're ready to receive the characters inside a tag
  • in characters: collect the characters in your StringBuffer
  • in endElement: you now have the name of the tag and the characters inside it, so it's up to you to do whatever is needed

One thing that might come as a surprise is that the newline characters and spaces in front of the tags are also delivered to the "characters" method. This is why you should empty your StringBuffer every time "startElement" is called.
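Putting the three call-backs together could look roughly like the sketch below. The class name DvdHandler, the map of values, and the "length" value in the sample XML are assumptions for illustration. Note that the handler only stores text for the "title" and "length" elements: when the end of "dvd" is reached, the buffer holds leftovers from the last child element plus whitespace, so it is deliberately skipped.

```java
import java.io.StringReader;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical handler: saves the text of the "title" and "length" elements.
class DvdHandler extends DefaultHandler {
    private final StringBuffer b = new StringBuffer();
    final Map<String, String> values = new LinkedHashMap<>();

    @Override
    public void startElement(String uri, String localName, String qName,
                             Attributes attributes) {
        b.setLength(0); // empty the buffer, discarding whitespace seen before this tag
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        b.append(ch, start, length); // may be called several times per element
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        // Only these two elements carry text we want to keep.
        if ("title".equals(qName) || "length".equals(qName)) {
            values.put(qName, b.toString());
        }
    }

    // Parses the XML text and returns element name -> element value.
    static Map<String, String> read(String xml) {
        DvdHandler handler = new DvdHandler();
        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse XML", e);
        }
        return handler.values;
    }

    public static void main(String[] args) {
        // Indented XML, so newlines and spaces reach "characters" as well.
        // The length value is invented for illustration.
        String xml = "<dvd>\n  <title>The Matrix</title>\n  <length>136</length>\n</dvd>";
        System.out.println(read(xml)); // prints {title=The Matrix, length=136}
    }
}
```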

Having said this, I have to admit that there are more than the 8 events in the example above. The newline characters and the leading spaces that follow them actually trigger 3 more events belonging to the "dvd" tag. You should simply ignore them.
