How to read XML using SAX parser
In the previous article we talked about the DOM parser and provided different examples for parsing and reading the elements of an XML document. The SAX parser is another XML parser provided by the JDK. Unlike DOM, it does not build an in-memory tree of the whole document, so it uses much less memory and is usually faster on large documents.
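The SAX parser doesn’t load the whole document into memory. Instead, it reads the document sequentially as a stream and fires callback events (start of document, start of element, character data, end of element, end of document). The developer handles these events to process each tag as it is read.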
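To make the callback model concrete, here is a minimal sketch that records every event the parser fires, in order. The class names CallbackOrderDemo and EventLogger are illustrative and not part of this tutorial’s code. The XML is parsed from an in-memory string through InputSource and StringReader, so no file is needed.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;

import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

public class CallbackOrderDemo {

    // Records each SAX callback as a short string so the event order is visible
    static class EventLogger extends DefaultHandler {
        final List<String> events = new ArrayList<>();

        @Override
        public void startDocument() {
            events.add("startDocument");
        }

        @Override
        public void endDocument() {
            events.add("endDocument");
        }

        @Override
        public void startElement(String uri, String localName, String qName, Attributes attributes) {
            events.add("start " + qName);
        }

        @Override
        public void endElement(String uri, String localName, String qName) {
            events.add("end " + qName);
        }

        @Override
        public void characters(char[] ch, int start, int length) {
            String text = new String(ch, start, length).trim();
            // Skip the whitespace-only text that sits between tags
            if (!text.isEmpty()) {
                events.add("text " + text);
            }
        }
    }

    // Parses the given XML string and returns the callbacks in the order they fired
    public static List<String> collectEvents(String xml) throws Exception {
        SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
        EventLogger logger = new EventLogger();
        parser.parse(new InputSource(new StringReader(xml)), logger);
        return logger.events;
    }

    public static void main(String[] args) throws Exception {
        collectEvents("<student><id>1</id></student>").forEach(System.out::println);
    }
}
```

For the input in main(), this prints startDocument, start student, start id, text 1, end id, end student, endDocument, one per line. No element is ever held in memory after its end tag is reported.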
If you’re interested in the StAX or DOM parser, please refer to these tutorials: StAX parser, DOM parser.
1- Students.xml
Suppose we have the following Students.xml file:
```xml
<students>
    <student graduated="true">
        <id>1</id>
        <name>Hussein</name>
    </student>
    <student>
        <id>2</id>
        <name>Alex</name>
    </student>
</students>
```
2- Student.java
For mapping purposes, we create Student.java, whose instances hold the data of each student element inside Students.xml:
```java
package com.programmer.gate;

public class Student {

    private int id;
    private String name;
    private boolean isGraduated;

    public int getId() {
        return id;
    }

    public void setId(int id) {
        this.id = id;
    }

    public String getName() {
        return name;
    }

    public void setName(String name) {
        this.name = name;
    }

    public boolean isGraduated() {
        return isGraduated;
    }

    public void setGraduated(boolean isGraduated) {
        this.isGraduated = isGraduated;
    }
}
```
3- Define SAX handler
In this section, we’re going to parse Students.xml and populate a List of Student objects from it.
SAX parses documents using a handler. To define our own custom handler, we create a class called SAXHandler that extends DefaultHandler:
```java
package com.programmer.gate;

import java.util.ArrayList;
import java.util.List;

import org.xml.sax.Attributes;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

public class SAXHandler extends DefaultHandler {

    private List<Student> students = null;
    private Student student = null;
    // Text of the current element; characters() may be called more than once per element
    private final StringBuilder elementValue = new StringBuilder();

    @Override
    public void startDocument() throws SAXException {
        students = new ArrayList<>();
    }

    @Override
    public void startElement(String uri, String localName, String qName, Attributes attributes) throws SAXException {
        // Start a fresh text buffer for every element
        elementValue.setLength(0);
        if (qName.equalsIgnoreCase("student")) {
            student = new Student();
            // getValue() returns null when the attribute is absent; Boolean.parseBoolean(null) is false
            String graduated = attributes.getValue("graduated");
            student.setGraduated(Boolean.parseBoolean(graduated));
        }
    }

    @Override
    public void endElement(String uri, String localName, String qName) throws SAXException {
        if (qName.equalsIgnoreCase("student")) {
            students.add(student);
        } else if (qName.equalsIgnoreCase("id")) {
            student.setId(Integer.parseInt(elementValue.toString().trim()));
        } else if (qName.equalsIgnoreCase("name")) {
            student.setName(elementValue.toString().trim());
        }
    }

    @Override
    public void characters(char[] ch, int start, int length) throws SAXException {
        // Append rather than overwrite: the parser may split one text node across several calls
        elementValue.append(ch, start, length);
    }

    public List<Student> getStudents() {
        return students;
    }
}
```
Here is a brief description of the above code:
- startDocument(): Called once when the parser starts parsing the document. We use it to create an empty students list.
- endDocument(): Called once when the parser finishes parsing the document. Our handler doesn’t override it, so the empty implementation inherited from DefaultHandler is used.
- startElement(): Called when the parser reaches the start tag of an element.
  - qName: the element (tag) name, e.g. student.
  - attributes: the attributes attached to the element, e.g. graduated.
  - In the above example, we instantiate a new Student object whenever the parser starts parsing a student element, and set its graduated flag from the attribute.
- endElement(): Called when the parser reaches the end tag of an element.
  - qName: the element (tag) name.
  - In the above example, we add the current Student object to the students list when we reach the end of a student element. If the ending element is id or name, we set the id or name of the current Student object.
- characters(): Called with the character data (text) inside the current element. The parser may deliver a single text node across several calls, so the text should be appended rather than overwritten. We keep the text in a class field called elementValue so that we can access it inside endElement().
- getStudents(): Exposes the populated list of Student objects so that caller classes can use it.
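Because the parser is free to split one text node across several characters() calls, the safe pattern is this: clear a buffer in startElement(), append to it in characters(), and read it in endElement(). The sketch below shows that pattern in isolation. The class TextAccumulationDemo is hypothetical. It uses a name containing the &amp; entity reference, which is a case where some parsers deliver the text in pieces. Whether a given parser actually splits it is implementation-dependent, but the accumulated result is the same either way.

```java
import java.io.StringReader;

import javax.xml.parsers.SAXParserFactory;

import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

public class TextAccumulationDemo {

    // Captures the full text of the last <name> element, however many
    // characters() calls the parser uses to deliver it
    static class NameHandler extends DefaultHandler {
        private final StringBuilder buffer = new StringBuilder();
        private String name;

        @Override
        public void startElement(String uri, String localName, String qName, Attributes attributes) {
            buffer.setLength(0); // fresh buffer for each element
        }

        @Override
        public void characters(char[] ch, int start, int length) {
            buffer.append(ch, start, length); // append, never overwrite
        }

        @Override
        public void endElement(String uri, String localName, String qName) {
            if (qName.equals("name")) {
                name = buffer.toString().trim();
            }
        }
    }

    public static String readName(String xml) throws Exception {
        NameHandler handler = new NameHandler();
        SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);
        return handler.name;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(readName("<student><name>Tom &amp; Jerry</name></student>"));
    }
}
```

Running main() prints Tom & Jerry. A handler that assigned new String(ch, start, length) on each call would keep only the last chunk if the parser split the text.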
4- Parse Students.xml
Now we create our main class, ReadXMLWithSAX, which parses Students.xml using SAXParser.
```java
package com.programmer.gate;

import java.io.File;
import java.io.IOException;
import java.util.List;

import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;

import org.xml.sax.SAXException;

public class ReadXMLWithSAX {

    public static void main(String[] args) {
        try {
            SAXParserFactory factory = SAXParserFactory.newInstance();
            SAXParser saxParser = factory.newSAXParser();
            SAXHandler saxHandler = new SAXHandler();

            // The handler's callbacks fire while the parser streams through the file
            saxParser.parse(new File("Students.xml"), saxHandler);

            List<Student> students = saxHandler.getStudents();
            for (Student student : students) {
                System.out.println("Student Id = " + student.getId());
                System.out.println("Student Name = " + student.getName());
                System.out.println("Is student graduated? " + student.isGraduated());
            }
        } catch (ParserConfigurationException | SAXException | IOException ex) {
            ex.printStackTrace();
        }
    }
}
```
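The factory in ReadXMLWithSAX keeps its default settings, so namespace processing is off and the handler matches elements by qName. If your documents use namespaces, you can turn on setNamespaceAware(true). The parser then reports each element’s namespace URI and its unprefixed local name, and you can match on those instead of on a prefix that may change. The following is a small sketch of that option. The class NamespaceAwareDemo and the namespace URI urn:example:students are hypothetical.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

import javax.xml.parsers.SAXParserFactory;

import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

public class NamespaceAwareDemo {

    // Records "uri|localName" for every element the parser starts
    static class NameRecorder extends DefaultHandler {
        final List<String> names = new ArrayList<>();

        @Override
        public void startElement(String uri, String localName, String qName, Attributes attributes) {
            // localName has no prefix; qName, if reported, keeps it (e.g. s:student)
            names.add(uri + "|" + localName);
        }
    }

    public static List<String> elementNames(String xml) throws Exception {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setNamespaceAware(true); // off by default
        NameRecorder recorder = new NameRecorder();
        factory.newSAXParser().parse(new InputSource(new StringReader(xml)), recorder);
        return recorder.names;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<s:students xmlns:s=\"urn:example:students\"><s:student/></s:students>";
        elementNames(xml).forEach(System.out::println);
    }
}
```

With namespace awareness enabled, both elements report urn:example:students as their URI, with the local names students and student. A handler written this way keeps working if the document switches to a different prefix or to a default namespace.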
After running the above main method, we get the following output:
```
Student Id = 1
Student Name = Hussein
Is student graduated? true
Student Id = 2
Student Name = Alex
Is student graduated? false
```
5- Source Code
You can download the source code from this repository: Read-XML