
XML File Parsing: SAX

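DOM parsing reads the entire document into memory, builds a DOM tree from it, and then exposes the content through the DOM API. That is convenient for small files. For large XML files, however, holding the whole tree in memory becomes expensive and parsing efficiency drops sharply, which is where SAX comes in. SAX is event-driven: it turns the XML document into a sequence of events and leaves it to a separate event handler to decide what to do with each one. Parsing happens while the document is being read. The rest of this article briefly walks through how SAX parsing is implemented.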

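1. Main objects

SAXParserFactory: the parser factory

SAXParser: the parser, obtained from the parser factory

ContentHandler, DTDHandler, ErrorHandler, EntityResolver: the event-handler interfaces

DefaultHandler: implements all four of the interfaces above with empty default methods; in practice you simply extend DefaultHandler and override the callbacks you need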
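To show how these objects fit together, here is a minimal sketch: the factory creates the parser, and the parser pushes events into a DefaultHandler subclass. The class name SaxQuickStart and the helper elementNames are made up for illustration; only the JAXP and SAX calls themselves come from the JDK.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical demo class, not part of the original article.
public class SaxQuickStart {

    // Parses the XML text and returns the element names in the order
    // their start tags were reported by the parser.
    public static List<String> elementNames(String xml) throws Exception {
        List<String> names = new ArrayList<>();

        // 1. Factory -> 2. parser
        SAXParserFactory factory = SAXParserFactory.newInstance();
        SAXParser parser = factory.newSAXParser();

        // 3. Handler: override only the callbacks we care about.
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName,
                                     String qName, Attributes attributes) {
                names.add(qName);
            }
        };

        // 4. Parsing drives the callbacks while the input is being read.
        parser.parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)), handler);
        return names;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(elementNames("<world><name>China</name></world>"));
    }
}
```

The factory is left at its defaults here, so it is not namespace-aware and the raw tag name arrives in qName.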

2. The XML document

This is the same XML file used in the earlier DOM parsing article.

<?xml version="1.0" encoding="UTF-8"?>
<world>
    <comuntry id="1">
        <name>China</name>
        <capital>Beijing</capital>
        <population>1234</population>
        <area>960</area>
    </comuntry>
    <comuntry id="2">
        <name id="">America</name>
        <capital>Washington</capital>
        <population>234</population>
        <area>900</area>
    </comuntry>
    <comuntry id="3">
        <name >Japan</name>
        <capital>Tokyo</capital>
        <population>234</population>
        <area>60</area>
    </comuntry>
    <comuntry id="4">
        <name >Russia</name>
        <capital>Moscow</capital>
        <population>34</population>
        <area>1960</area>
    </comuntry>
</world>
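As a rough illustration of what a SAX handler for this document could look like, the sketch below maps each comuntry id to the text of its name element. The class CountryNames and the method namesById are hypothetical names chosen for this example; the element and attribute names are taken from the XML above.

```java
import java.io.StringReader;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical demo class, not part of the original article.
public class CountryNames {

    // Returns id -> country name, in document order.
    public static Map<String, String> namesById(String xml) throws Exception {
        Map<String, String> result = new LinkedHashMap<>();

        DefaultHandler handler = new DefaultHandler() {
            private String currentId;       // id of the enclosing <comuntry>
            private StringBuilder text;     // non-null only while inside <name>

            @Override
            public void startElement(String uri, String localName,
                                     String qName, Attributes atts) {
                if ("comuntry".equals(qName)) {
                    currentId = atts.getValue("id");
                } else if ("name".equals(qName)) {
                    text = new StringBuilder();
                }
            }

            @Override
            public void characters(char[] ch, int start, int length) {
                // Text may arrive in several chunks, so append rather than assign.
                if (text != null) {
                    text.append(ch, start, length);
                }
            }

            @Override
            public void endElement(String uri, String localName, String qName) {
                if ("name".equals(qName)) {
                    result.put(currentId, text.toString().trim());
                    text = null;
                }
            }
        };

        SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);
        return result;
    }
}
```

SAX keeps no tree, so the handler has to carry its own state (currentId and text) from one callback to the next.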

3. The main interfaces

EntityResolver:

package org.xml.sax;

import java.io.IOException;

public interface EntityResolver {

    /**
     * Allow the application to resolve external entities.
     *
     * <p>The parser will call this method before opening any external
     * entity except the top-level document entity.  Such entities include
     * the external DTD subset and external parameter entities referenced
     * within the DTD (in either case, only if the parser reads external
     * parameter entities), and external general entities referenced
     * within the document element (if the parser reads external general
     * entities).  The application may request that the parser locate
     * the entity itself, that it use an alternative URI, or that it
     * use data provided by the application (as a character or byte
     * input stream).</p>
     *
     * <p>Application writers can use this method to redirect external
     * system identifiers to secure and/or local URIs, to look up
     * public identifiers in a catalogue, or to read an entity from a
     * database or other input source (including, for example, a dialog
     * box).  Neither XML nor SAX specifies a preferred policy for using
     * public or system IDs to resolve resources.  However, SAX specifies
     * how to interpret any InputSource returned by this method, and that
     * if none is returned, then the system ID will be dereferenced as
     * a URL.  </p>
     *
     * <p>If the system identifier is a URL, the SAX parser must
     * resolve it fully before reporting it to the application.</p>
     *
     * @param publicId The public identifier of the external entity
     *        being referenced, or null if none was supplied.
     * @param systemId The system identifier of the external entity
     *        being referenced.
     * @return An InputSource object describing the new input source,
     *         or null to request that the parser open a regular
     *         URI connection to the system identifier.
     * @exception org.xml.sax.SAXException Any SAX exception, possibly
     *            wrapping another exception.
     * @exception java.io.IOException A Java-specific IO exception,
     *            possibly the result of creating a new InputStream
     *            or Reader for the InputSource.
     * @see org.xml.sax.InputSource
     */
    public abstract InputSource resolveEntity (String publicId,
                                               String systemId)
        throws SAXException, IOException;

}
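One common use of resolveEntity is to keep the parser from fetching an external DTD over the network. The sketch below substitutes an empty input for every external entity and records which system identifiers were requested. The class OfflineResolverDemo, the method requestedEntities, and the example.invalid URL in the usage note are all assumptions made for illustration.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.EntityResolver;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical demo class, not part of the original article.
public class OfflineResolverDemo {

    // Parses the XML text without opening any external entity and returns
    // the system identifiers the parser asked the resolver about.
    public static List<String> requestedEntities(String xml) throws Exception {
        List<String> requested = new ArrayList<>();

        XMLReader reader = SAXParserFactory.newInstance().newSAXParser().getXMLReader();
        reader.setContentHandler(new DefaultHandler());
        reader.setEntityResolver(new EntityResolver() {
            @Override
            public InputSource resolveEntity(String publicId, String systemId) {
                requested.add(systemId);
                // Returning a non-null source replaces the external entity;
                // an empty reader means "treat it as empty".
                return new InputSource(new StringReader(""));
            }
        });

        reader.parse(new InputSource(new StringReader(xml)));
        return requested;
    }
}
```

For example, parsing `<!DOCTYPE world SYSTEM "http://example.invalid/world.dtd"><world/>` this way should call the resolver once for the DTD and never open a connection. Returning null instead would let the parser dereference the system ID itself.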

DTDHandler:

package org.xml.sax;

/**
 * Receive notification of basic DTD-related events.
 *
 * <blockquote>
 * <em>This module, both source code and documentation, is in the
 * Public Domain, and comes with <strong>NO WARRANTY</strong>.</em>
 * See <a href='http://www.saxproject.org'>http://www.saxproject.org</a>
 * for further information.
 * </blockquote>
 *
 * <p>If a SAX application needs information about notations and
 * unparsed entities, then the application implements this
 * interface and registers an instance with the SAX parser using
 * the parser's setDTDHandler method.  The parser uses the
 * instance to report notation and unparsed entity declarations to
 * the application.</p>
 *
 * <p>Note that this interface includes only those DTD events that
 * the XML recommendation <em>requires</em> processors to report:
 * notation and unparsed entity declarations.</p>
 *
 * <p>The SAX parser may report these events in any order, regardless
 * of the order in which the notations and unparsed entities were
 * declared; however, all DTD events must be reported after the
 * document handler's startDocument event, and before the first
 * startElement event.
 * (If the {@link org.xml.sax.ext.LexicalHandler LexicalHandler} is
 * used, these events must also be reported before the endDTD event.)
 * </p>
 *
 * <p>It is up to the application to store the information for
 * future use (perhaps in a hash table or object tree).
 * If the application encounters attributes of type "NOTATION",
 * "ENTITY", or "ENTITIES", it can use the information that it
 * obtained through this interface to find the entity and/or
 * notation corresponding with the attribute value.</p>
 *
 * @since SAX 1.0
 * @author David Megginson
 * @see org.xml.sax.XMLReader#setDTDHandler
 */
public interface DTDHandler {

    /**
     * Receive notification of a notation declaration event.
     *
     * <p>It is up to the application to record the notation for later
     * reference, if necessary;
     * notations may appear as attribute values and in unparsed entity
     * declarations, and are sometime used with processing instruction
     * target names.</p>
     *
     * <p>At least one of publicId and systemId must be non-null.
     * If a system identifier is present, and it is a URL, the SAX
     * parser must resolve it fully before passing it to the
     * application through this event.</p>
     *
     * <p>There is no guarantee that the notation declaration will be
     * reported before any unparsed entities that use it.</p>
     *
     * @param name The notation name.
     * @param publicId The notation's public identifier, or null if
     *        none was given.
     * @param systemId The notation's system identifier, or null if
     *        none was given.
     * @exception org.xml.sax.SAXException Any SAX exception, possibly
     *            wrapping another exception.
     * @see #unparsedEntityDecl
     * @see org.xml.sax.Attributes
     */
    public abstract void notationDecl (String name,
                                       String publicId,
                                       String systemId)
        throws SAXException;

    /**
     * Receive notification of an unparsed entity declaration event.
     *
     * <p>Note that the notation name corresponds to a notation
     * reported by the {@link #notationDecl notationDecl} event.
     * It is up to the application to record the entity for later
     * reference, if necessary;
     * unparsed entities may appear as attribute values.
     * </p>
     *
     * <p>If the system identifier is a URL, the parser must resolve it
     * fully before passing it to the application.</p>
     *
     * @exception org.xml.sax.SAXException Any SAX exception, possibly
     *            wrapping another exception.
     * @param name The unparsed entity's name.
     * @param publicId The entity's public identifier, or null if none
     *        was given.
     * @param systemId The entity's system identifier.
     * @param notationName The name of the associated notation.
     * @see #notationDecl
     * @see org.xml.sax.Attributes
     */
    public abstract void unparsedEntityDecl (String name,
                                             String publicId,
                                             String systemId,
                                             String notationName)
        throws SAXException;

}
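The two DTDHandler callbacks are easiest to understand with a document that actually declares a notation and an unparsed entity. The sketch below records both kinds of declaration. The class DtdDeclarationsDemo, the method declarations, and the png/flag declarations in the usage note are invented for this example.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.DTDHandler;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;

// Hypothetical demo class, not part of the original article.
public class DtdDeclarationsDemo {

    // Returns a short description of every notation and unparsed entity
    // declaration the parser reports for the given XML text.
    public static List<String> declarations(String xml) throws Exception {
        List<String> seen = new ArrayList<>();

        XMLReader reader = SAXParserFactory.newInstance().newSAXParser().getXMLReader();
        reader.setDTDHandler(new DTDHandler() {
            @Override
            public void notationDecl(String name, String publicId, String systemId) {
                seen.add("notation " + name);
            }

            @Override
            public void unparsedEntityDecl(String name, String publicId,
                                           String systemId, String notationName) {
                seen.add("entity " + name + " uses " + notationName);
            }
        });

        reader.parse(new InputSource(new StringReader(xml)));
        return seen;
    }
}
```

An input such as `<!DOCTYPE world [<!NOTATION png SYSTEM "image/png"><!ENTITY flag SYSTEM "flag.png" NDATA png>]><world/>` should yield one notation entry and one entity entry. The interface makes no promise about their relative order, so code that needs both should store them and look them up later.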

ContentHandler:

package org.xml.sax;

/**
 * Receive notification of the logical content of a document.
 *
 * <blockquote>
 * <em>This module, both source code and documentation, is in the
 * Public Domain, and comes with <strong>NO WARRANTY</strong>.</em>
 * See <a href='http://www.saxproject.org'>http://www.saxproject.org</a>
 * for further information.
 * </blockquote>
 *
 * <p>This is the main interface that most SAX applications
 * implement: if the application needs to be informed of basic parsing
 * events, it implements this interface and registers an instance with
 * the SAX parser using the {@link org.xml.sax.XMLReader#setContentHandler
 * setContentHandler} method.  The parser uses the instance to report
 * basic document-related events like the start and end of elements
 * and character data.</p>
 *
 * <p>The order of events in this interface is very important, and
 * mirrors the order of information in the document itself.  For
 * example, all of an element's content (character data, processing
 * instructions, and/or subelements) will appear, in order, between
 * the startElement event and the corresponding endElement event.</p>
 *
 * <p>This interface is similar to the now-deprecated SAX 1.0
 * DocumentHandler interface, but it adds support for Namespaces
 * and for reporting skipped entities (in non-validating XML
 * processors).</p>
 *
 * <p>Implementors should note that there is also a
 * <code>ContentHandler</code> class in the <code>java.net</code>
 * package; that means that it's probably a bad idea to do</p>
 *
 * <pre>import java.net.*;
 * import org.xml.sax.*;
 * </pre>
 *
 * <p>In fact, "import ...*" is usually a sign of sloppy programming
 * anyway, so the user should consider this a feature rather than a
 * bug.</p>
 *
 * @since SAX 2.0
 * @author David Megginson
 * @see org.xml.sax.XMLReader
 * @see org.xml.sax.DTDHandler
 * @see org.xml.sax.ErrorHandler
 */
public interface ContentHandler
{

    /**
     * Receive an object for locating the origin of SAX document events.
     *
     * <p>SAX parsers are strongly encouraged (though not absolutely
     * required) to supply a locator: if it does so, it must supply
     * the locator to the application by invoking this method before
     * invoking any of the other methods in the ContentHandler
     * interface.</p>
     *
     * <p>The locator allows the application to determine the end
     * position of any document-related event, even if the parser is
     * not reporting an error.  Typically, the application will
     * use this information for reporting its own errors (such as
     * character content that does not match an application's
     * business rules).  The information returned by the locator
     * is probably not sufficient for use with a search engine.</p>
     *
     * <p>Note that the locator will return correct information only
     * during the invocation SAX event callbacks after
     * {@link #startDocument startDocument} returns and before
     * {@link #endDocument endDocument} is called.  The
     * application should not attempt to use it at any other time.</p>
     *
     * @param locator an object that can return the location of
     *                any SAX document event
     * @see org.xml.sax.Locator
     */
    public void setDocumentLocator (Locator locator);

    /**
     * Receive notification of the beginning of a document.
     *
     * <p>The SAX parser will invoke this method only once, before any
     * other event callbacks (except for {@link #setDocumentLocator
     * setDocumentLocator}).</p>
     *
     * @throws org.xml.sax.SAXException any SAX exception, possibly
     *            wrapping another exception
     * @see #endDocument
     */
    public void startDocument ()
        throws SAXException;

    /**
     * Receive notification of the end of a document.
     *
     * <p><strong>There is an apparent contradiction between the
     * documentation for this method and the documentation for {@link
     * org.xml.sax.ErrorHandler#fatalError}.  Until this ambiguity is
     * resolved in a future major release, clients should make no
     * assumptions about whether endDocument() will or will not be
     * invoked when the parser has reported a fatalError() or thrown
     * an exception.</strong></p>
     *
     * <p>The SAX parser will invoke this method only once, and it will
     * be the last method invoked during the parse.  The parser shall
     * not invoke this method until it has either abandoned parsing
     * (because of an unrecoverable error) or reached the end of
     * input.</p>
     *
     * @throws org.xml.sax.SAXException any SAX exception, possibly
     *            wrapping another exception
     * @see #startDocument
     */
    public void endDocument()
        throws SAXException;

    /**
     * Begin the scope of a prefix-URI Namespace mapping.
     *
     * <p>The information from this event is not necessary for
     * normal Namespace processing: the SAX XML reader will
     * automatically replace prefixes for element and attribute
     * names when the <code>http://xml.org/sax/features/namespaces</code>
     * feature is <var>true</var> (the default).</p>
     *
     * <p>There are cases, however, when applications need to
     * use prefixes in character data or in attribute values,
     * where they cannot safely be expanded automatically; the
     * start/endPrefixMapping event supplies the information
     * to the application to expand prefixes in those contexts
     * itself, if necessary.</p>
     *
     * <p>Note that start/endPrefixMapping events are not
     * guaranteed to be properly nested relative to each other:
     * all startPrefixMapping events will occur immediately before the
     * corresponding {@link #startElement startElement} event,
     * and all {@link #endPrefixMapping endPrefixMapping}
     * events will occur immediately after the corresponding
     * {@link #endElement endElement} event,
     * but their order is not otherwise
     * guaranteed.</p>
     *
     * <p>There should never be start/endPrefixMapping events for the
     * "xml" prefix, since it is predeclared and immutable.</p>
     *
     * @param prefix the Namespace prefix being declared.
     *  An empty string is used for the default element namespace,
     *  which has no prefix.
     * @param uri the Namespace URI the prefix is mapped to
     * @throws org.xml.sax.SAXException the client may throw
     *            an exception during processing
     * @see #endPrefixMapping
     * @see #startElement
     */
    public void startPrefixMapping (String prefix, String uri)
        throws SAXException;

    /**
     * End the scope of a prefix-URI mapping.
     *
     * <p>See {@link #startPrefixMapping startPrefixMapping} for
     * details.  These events will always occur immediately after the
     * corresponding {@link #endElement endElement} event, but the order of
     * {@link #endPrefixMapping endPrefixMapping} events is not otherwise
     * guaranteed.</p>
     *
     * @param prefix the prefix that was being mapped.
     *  This is the empty string when a default mapping scope ends.
     * @throws org.xml.sax.SAXException the client may throw
     *            an exception during processing
     * @see #startPrefixMapping
     * @see #endElement
     */
    public void endPrefixMapping (String prefix)
        throws SAXException;

    /**
     * Receive notification of the beginning of an element.
     *
     * <p>The Parser will invoke this method at the beginning of every
     * element in the XML document; there will be a corresponding
     * {@link #endElement endElement} event for every startElement event
     * (even when the element is empty). All of the element's content will be
     * reported, in order, before the corresponding endElement
     * event.</p>
     *
     * <p>This event allows up to three name components for each
     * element:</p>
     *
     * <ol>
     * <li>the Namespace URI;</li>
     * <li>the local name; and</li>
     * <li>the qualified (prefixed) name.</li>
     * </ol>
     *
     * <p>Any or all of these may be provided, depending on the
     * values of the <var>http://xml.org/sax/features/namespaces</var>
     * and the <var>http://xml.org/sax/features/namespace-prefixes</var>
     * properties:</p>
     *
     * <ul>
     * <li>the Namespace URI and local name are required when
     * the namespaces property is <var>true</var> (the default), and are
     * optional when the namespaces property is <var>false</var> (if one is
     * specified, both must be);</li>
     * <li>the qualified name is required when the namespace-prefixes property
     * is <var>true</var>, and is optional when the namespace-prefixes property
     * is <var>false</var> (the default).</li>
     * </ul>
     *
     * <p>Note that the attribute list provided will contain only
     * attributes with explicit values (specified or defaulted):
     * #IMPLIED attributes will be omitted.  The attribute list
     * will contain attributes used for Namespace declarations
     * (xmlns* attributes) only if the
     * <code>http://xml.org/sax/features/namespace-prefixes</code>
     * property is true (it is false by default, and support for a
     * true value is optional).</p>
     *
     * <p>Like {@link #characters characters()}, attribute values may have
     * characters that need more than one <code>char</code> value.  </p>
     *
     * @param uri the Namespace URI, or the empty string if the
     *        element has no Namespace URI or if Namespace
     *        processing is not being performed
     * @param localName the local name (without prefix), or the
     *        empty string if Namespace processing is not being
     *        performed
     * @param qName the qualified name (with prefix), or the
     *        empty string if qualified names are not available
     * @param atts the attributes attached to the element.  If
     *        there are no attributes, it shall be an empty
     *        Attributes object.  The value of this object after
     *        startElement returns is undefined
     * @throws org.xml.sax.SAXException any SAX exception, possibly
     *            wrapping another exception
     * @see #endElement
     * @see org.xml.sax.Attributes
     * @see org.xml.sax.helpers.AttributesImpl
     */
    public void startElement (String uri, String localName,
                              String qName, Attributes atts)
        throws SAXException;

    /**
     * Receive notification of the end of an element.
     *
     * <p>The SAX parser will invoke this method at the end of every
     * element in the XML document; there will be a corresponding
     * {@link #startElement startElement} event for every endElement
     * event (even when the element is empty).</p>
     *
     * <p>For information on the names, see startElement.</p>
     *
     * @param uri the Namespace URI, or the empty string if the
     *        element has no Namespace URI or if Namespace
     *        processing is not being performed
     * @param localName the local name (without prefix), or the
     *        empty string if Namespace processing is not being
     *        performed
     * @param qName the qualified XML name (with prefix), or the
     *        empty string if qualified names are not available
     * @throws org.xml.sax.SAXException any SAX exception, possibly
     *            wrapping another exception
     */
    public void endElement (String uri, String localName,
                            String qName)
        throws SAXException;

    /**
     * Receive notification of character data.
     *
     * <p>The Parser will call this method to report each chunk of
     * character data.  SAX parsers may return all contiguous character
     * data in a single chunk, or they may split it into several
     * chunks; however, all of the characters in any single event
     * must come from the same external entity so that the Locator
     * provides useful information.</p>
     *
     * <p>The application must not attempt to read from the array
     * outside of the specified range.</p>
     *
     * <p>Individual characters may consist of more than one Java
     * <code>char</code> value.  There are two important cases where this
     * happens, because characters can't be represented in just sixteen bits.
     * In one case, characters are represented in a <em>Surrogate Pair</em>,
     * using two special Unicode values. Such characters are in the so-called
     * "Astral Planes", with a code point above U+FFFF.  A second case involves
     * composite characters, such as a base character combining with one or
     * more accent characters. </p>
     *
     * <p> Your code should not assume that algorithms using
     * <code>char</code>-at-a-time idioms will be working in character
     * units; in some cases they will split characters.  This is relevant
     * wherever XML permits arbitrary characters, such as attribute values,
     * processing instruction data, and comments as well as in data reported
     * from this method.  It's also generally relevant whenever Java code
     * manipulates internationalized text; the issue isn't unique to XML.</p>
     *
     * <p>Note that some parsers will report whitespace in element
     * content using the {@link #ignorableWhitespace ignorableWhitespace}
     * method rather than this one (validating parsers <em>must</em>
     * do so).</p>
     *
     * @param ch the characters from the XML document
     * @param start the start position in the array
     * @param length the number of characters to read from the array
     * @throws org.xml.sax.SAXException any SAX exception, possibly
     *            wrapping another exception
     * @see #ignorableWhitespace
     * @see org.xml.sax.Locator
     */
    public void characters (char ch[], int start, int length)
        throws SAXException;

    /**
     * Receive notification of ignorable whitespace in element content.
     *
     * <p>Validating Parsers must use this method to report each chunk
     * of whitespace in element content (see the W3C XML 1.0
     * recommendation, section 2.10): non-validating parsers may also
     * use this method if they are capable of parsing and using
     * content models.</p>
     *
     * <p>SAX parsers may return all contiguous whitespace in a single
     * chunk, or they may split it into several chunks; however, all of
     * the characters in any single event must come from the same
     * external entity, so that the Locator provides useful
     * information.</p>
     *
     * <p>The application must not attempt to read from the array
     * outside of the specified range.</p>
     *
     * @param ch the characters from the XML document
     * @param start the start position in the array
     * @param length the number of characters to read from the array
     * @throws org.xml.sax.SAXException any SAX exception, possibly
     *            wrapping another exception
     * @see #characters
     */
    public void ignorableWhitespace (char ch[], int start, int length)
        throws SAXException;

    /**
     * Receive notification of a processing instruction.
     *
     * <p>The Parser will invoke this method once for each processing
     * instruction found: note that processing instructions may occur
     * before or after the main document element.</p>
     *
     * <p>A SAX parser must never report an XML declaration (XML 1.0,
     * section 2.8) or a text declaration (XML 1.0, section 4.3.1)
     * using this method.</p>
     *
     * <p>Like {@link #characters characters()}, processing instruction
     * data may have characters that need more than one <code>char</code>
     * value. </p>
     *
     * @param target the processing instruction target
     * @param data the processing instruction data, or null if
     *        none was supplied.  The data does not include any
     *        whitespace separating it from the target
     * @throws org.xml.sax.SAXException any SAX exception, possibly
     *            wrapping another exception
     */
    public void processingInstruction (String target, String data)
        throws SAXException;

    /**
     * Receive notification of a skipped entity.
     * This is not called for entity references within markup constructs
     * such as element start tags or markup declarations.  (The XML
     * recommendation requires reporting skipped external entities.
     * SAX also reports internal entity expansion/non-expansion, except
     * within markup constructs.)
     *
     * <p>The Parser will invoke this method each time the entity is
     * skipped.  Non-validating processors may skip entities if they
     * have not seen the declarations (because, for example, the
     * entity was declared in an external DTD subset).  All processors
     * may skip external entities, depending on the values of the
     * <code>http://xml.org/sax/features/external-general-entities</code>
     * and the
     * <code>http://xml.org/sax/features/external-parameter-entities</code>
     * properties.</p>
     *
     * @param name the name of the skipped entity.  If it is a
     *        parameter entity, the name will begin with '%', and if
     *        it is the external DTD subset, it will be the string
     *        "[dtd]"
     * @throws org.xml.sax.SAXException any SAX exception, possibly
     *            wrapping another exception
     */
    public void skippedEntity (String name)
        throws SAXException;

}
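The three name components passed to startElement, and the prefix-mapping events, only become meaningful once namespace processing is switched on. The sketch below enables it on the factory and logs what the handler receives. The class NamespaceEventsDemo, the method events, and the urn:example:world namespace in the usage note are invented for illustration.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical demo class, not part of the original article.
public class NamespaceEventsDemo {

    // Returns a log of prefix-mapping and start-element events.
    public static List<String> events(String xml) throws Exception {
        List<String> log = new ArrayList<>();

        SAXParserFactory factory = SAXParserFactory.newInstance();
        // JAXP factories are not namespace-aware unless asked to be.
        factory.setNamespaceAware(true);

        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void startPrefixMapping(String prefix, String uri) {
                log.add("prefix " + prefix + "=" + uri);
            }

            @Override
            public void startElement(String uri, String localName,
                                     String qName, Attributes atts) {
                // With namespace processing on, uri and localName are filled in.
                log.add("start {" + uri + "}" + localName);
            }
        };

        factory.newSAXParser().parse(new InputSource(new StringReader(xml)), handler);
        return log;
    }
}
```

For `<w:world xmlns:w="urn:example:world"><w:name>China</w:name></w:world>`, the prefix mapping is reported immediately before the startElement of the element that declares it, as the Javadoc above describes.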

ErrorHandler:

package org.xml.sax;

/**
 * Basic interface for SAX error handlers.
 *
 * <blockquote>
 * <em>This module, both source code and documentation, is in the
 * Public Domain, and comes with <strong>NO WARRANTY</strong>.</em>
 * See <a href='http://www.saxproject.org'>http://www.saxproject.org</a>
 * for further information.
 * </blockquote>
 *
 * <p>If a SAX application needs to implement customized error
 * handling, it must implement this interface and then register an
 * instance with the XML reader using the
 * {@link org.xml.sax.XMLReader#setErrorHandler setErrorHandler}
 * method.  The parser will then report all errors and warnings
 * through this interface.</p>
 *
 * <p><strong>WARNING:</strong> If an application does <em>not</em>
 * register an ErrorHandler, XML parsing errors will go unreported,
 * except that <em>SAXParseException</em>s will be thrown for fatal errors.
 * In order to detect validity errors, an ErrorHandler that does something
 * with {@link #error error()} calls must be registered.</p>
 *
 * <p>For XML processing errors, a SAX driver must use this interface
 * in preference to throwing an exception: it is up to the application
 * to decide whether to throw an exception for different types of
 * errors and warnings.  Note, however, that there is no requirement that
 * the parser continue to report additional errors after a call to
 * {@link #fatalError fatalError}.  In other words, a SAX driver class
 * may throw an exception after reporting any fatalError.
 * Also parsers may throw appropriate exceptions for non-XML errors.
 * For example, {@link XMLReader#parse XMLReader.parse()} would throw
 * an IOException for errors accessing entities or the document.</p>
 *
 * @since SAX 1.0
 * @author David Megginson
 * @see org.xml.sax.XMLReader#setErrorHandler
 * @see org.xml.sax.SAXParseException
 */
public interface ErrorHandler {

    /**
     * Receive notification of a warning.
     *
     * <p>SAX parsers will use this method to report conditions that
     * are not errors or fatal errors as defined by the XML
     * recommendation.  The default behaviour is to take no
     * action.</p>
     *
     * <p>The SAX parser must continue to provide normal parsing events
     * after invoking this method: it should still be possible for the
     * application to process the document through to the end.</p>
     *
     * <p>Filters may use this method to report other, non-XML warnings
     * as well.</p>
     *
     * @param exception The warning information encapsulated in a
     *                  SAX parse exception.
     * @exception org.xml.sax.SAXException Any SAX exception, possibly
     *            wrapping another exception.
     * @see org.xml.sax.SAXParseException
     */
    public abstract void warning (SAXParseException exception)
        throws SAXException;

    /**
     * Receive notification of a recoverable error.
     *
     * <p>This corresponds to the definition of "error" in section 1.2
     * of the W3C XML 1.0 Recommendation.  For example, a validating
     * parser would use this callback to report the violation of a
     * validity constraint.  The default behaviour is to take no
     * action.</p>
     *
     * <p>The SAX parser must continue to provide normal parsing
     * events after invoking this method: it should still be possible
     * for the application to process the document through to the end.
     * If the application cannot do so, then the parser should report
     * a fatal error even if the XML recommendation does not require
     * it to do so.</p>
     *
     * <p>Filters may use this method to report other, non-XML errors
     * as well.</p>
     *
     * @param exception The error information encapsulated in a
     *                  SAX parse exception.
     * @exception org.xml.sax.SAXException Any SAX exception, possibly
     *            wrapping another exception.
     * @see org.xml.sax.SAXParseException
     */
    public abstract void error (SAXParseException exception)
        throws SAXException;

    /**
     * Receive notification of a non-recoverable error.
     *
     * <p><strong>There is an apparent contradiction between the
     * documentation for this method and the documentation for {@link
     * org.xml.sax.ContentHandler#endDocument}.  Until this ambiguity
     * is resolved in a future major release, clients should make no
     * assumptions about whether endDocument() will or will not be
     * invoked when the parser has reported a fatalError() or thrown
     * an exception.</strong></p>
     *
     * <p>This corresponds to the definition of "fatal error" in
     * section 1.2 of the W3C XML 1.0 Recommendation.  For example, a
     * parser would use this callback to report the violation of a
     * well-formedness constraint.</p>
     *
     * <p>The application must assume that the document is unusable
     * after the parser has invoked this method, and should continue
     * (if at all) only for the sake of collecting additional error
     * messages: in fact, SAX parsers are free to stop reporting any
     * other events once this method has been invoked.</p>
     *
     * @param exception The error information encapsulated in a
     *                  SAX parse exception.
     * @exception org.xml.sax.SAXException Any SAX exception, possibly
     *            wrapping another exception.
     * @see org.xml.sax.SAXParseException
     */
    public abstract void fatalError (SAXParseException exception)
        throws SAXException;

}
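To make the three severity levels concrete, here is a small sketch of an ErrorHandler that records every report and rethrows fatal errors so that parsing stops at a well-defined point. The class CollectingErrorHandlerDemo and the method problems are hypothetical names for this example.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.ErrorHandler;
import org.xml.sax.InputSource;
import org.xml.sax.SAXParseException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical demo class, not part of the original article.
public class CollectingErrorHandlerDemo {

    // Parses the XML text and returns one log line per reported problem.
    public static List<String> problems(String xml) throws Exception {
        List<String> log = new ArrayList<>();

        XMLReader reader = SAXParserFactory.newInstance().newSAXParser().getXMLReader();
        reader.setContentHandler(new DefaultHandler());
        reader.setErrorHandler(new ErrorHandler() {
            @Override
            public void warning(SAXParseException e) {
                log.add("warning at line " + e.getLineNumber());
            }

            @Override
            public void error(SAXParseException e) {
                // Recoverable (e.g. validity) errors: record and keep going.
                log.add("error at line " + e.getLineNumber());
            }

            @Override
            public void fatalError(SAXParseException e) throws SAXParseException {
                // Well-formedness violations: record, then abort the parse.
                log.add("fatal at line " + e.getLineNumber());
                throw e;
            }
        });

        try {
            reader.parse(new InputSource(new StringReader(xml)));
        } catch (SAXParseException e) {
            // Already recorded by fatalError above.
        }
        return log;
    }
}
```

Calling problems on a malformed input such as `<world><name></world>` should return a single fatal entry, while a well-formed document returns an empty list.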

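The listings above are the source of the four basic event-handler interfaces; their Javadoc describes what each callback is responsible for.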

4. Implementing SAX parsing. There are two main parts: defining the parsing rules (the event handler) and reading the file.

 

The event handler, MyHandler.java

import java.io.IOException;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.Locator;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;
import org.xml.sax.helpers.DefaultHandler;

public class MyHandler extends DefaultHandler {

    /**
     * Begin the scope of a prefix-URI namespace mapping.
     * This event is not needed for normal namespace processing:
     * when the http://xml.org/sax/features/namespaces feature is true (the default),
     * the SAX XML reader replaces prefixes in element and attribute names automatically.
     * Parameters:
     *    prefix : the namespace prefix
     *    uri : the namespace URI
     */
    @Override
    public void startPrefixMapping(String prefix, String uri)
            throws SAXException {
        System.out.println("(startPrefixMapping)start prefix_mapping : xmlns:"+prefix+" = "
                +"\""+uri+"\"");
    }

    /**
     * End the scope of a prefix-URI mapping.
     * @param prefix  the prefix that was being mapped
     */
    @Override
    public void endPrefixMapping(String prefix) throws SAXException {
        System.out.println("(endPrefixMapping)end prefix_mapping : "+prefix);
    }

    /**
     * Receive notification of the end of the document.
     */
    @Override
    public void endDocument() throws SAXException {
        System.out.println("(endDocument)doument is ended");
    }

    /**
     * Receive notification of the end of an element.
     * Parameters:
     *    uri : the element's namespace URI
     *    localName : the element's local name (without prefix)
     *    qName : the element's qualified name (with prefix)
     */
    @Override
    public void endElement(String uri, String localName, String qName)
            throws SAXException {
        System.out.println("(endElement)end element : "+qName+"("+uri+")");
    }

    /**
     * Receive notification of ignorable whitespace in element content.
     * Parameters:
     *     ch : the characters from the XML document
     *     start : the start position in the array
     *     length : the number of characters to read from the array
     */
    @Override
    public void ignorableWhitespace(char[] ch, int start, int length)
            throws SAXException {
        // Escape control characters so the whitespace is visible when printed.
        StringBuffer buffer = new StringBuffer();
        for(int i = start ; i < start+length ; i++){
            switch(ch[i]){
                case '\\':buffer.append("\\\\");break;
                case '\r':buffer.append("\\r");break;
                case '\n':buffer.append("\\n");break;
                case '\t':buffer.append("\\t");break;
                case '\"':buffer.append("\\\"");break;
                default : buffer.append(ch[i]);
            }
        }
        System.out.println("(ignorableWhitespace)ignorable whitespace("+length+"): "+buffer.toString());
    }

    /**
     * Receive an object for locating the origin of SAX document events.
     * Parameters:
     *     locator : an object that can return the location of any SAX document event
     */
    @Override
    public void setDocumentLocator(Locator locator) {
        System.out.println("(setDocumentLocator)set document_locator : (lineNumber = "+locator.getLineNumber()
                +",columnNumber = "+locator.getColumnNumber()
                +",systemId = "+locator.getSystemId()
                +",publicId = "+locator.getPublicId()+")");
    }

    /**
     * Receive notification of the beginning of the document.
     */
    @Override
    public void startDocument() throws SAXException {
        System.out.println("(startDocument)document is startting");
    }

    /**
     * Receive notification of the beginning of an element.
     * Parameters:
     *    uri : the element's namespace URI
     *    localName : the element's local name (without prefix)
     *    qName : the element's qualified name (with prefix)
     *    attributes : the attributes attached to the element
     */
    @Override
    public void startElement(String uri, String localName, String qName,
            Attributes attributes) throws SAXException {
        System.out.println("(startElement)start element : "+qName+"("+uri+")");
    }

    /**
     * Receive notification of a notation declaration event.
     * Parameters:
     *     name - the notation name.
     *     publicId - the notation's public identifier, or null if none was given.
     *     systemId - the notation's system identifier, or null if none was given.
     */
    @Override
    public void notationDecl(String name, String publicId, String systemId)
            throws SAXException 
{        // TODO Auto-generated method stub        System.out.println("(notationDecl)notation declare : (name = "+name                  +",systemId = "+publicId                  +",publicId = "+systemId+")");      }        /**      * 允许应用程序解析外部实体。      * 解析器将在打开任何外部实体(顶级文档实体除外)前调用此方法      * 参数意义如下:      *     publicId : 被引用的外部实体的公共标识符,如果未提供,则为 null。      *     systemId : 被引用的外部实体的系统标识符。      * 返回:      *     一个描述新输入源的 InputSource 对象,或者返回 null,      *     以请求解析器打开到系统标识符的常规 URI 连接。      */      @Override    public InputSource resolveEntity(String publicId, String systemId)            throws IOException, SAXException {        // TODO Auto-generated method stub        return super.resolveEntity(publicId, systemId);    }        /**      * 接收跳过的实体的通知。      * 参数意义如下:       * name : 所跳过的实体的名称。如果它是参数实体,则名称将以 ‘%‘ 开头,      *            如果它是外部 DTD 子集,则将是字符串 "[dtd]"      */      @Override    public void skippedEntity(String name) throws SAXException {        // TODO Auto-generated method stub        System.out.println("(skippedEntity)the name of the skipped entity : "+name);     }        /**      * 接收未解析的实体声明事件的通知。      * 参数意义如下:      *     name - 未解析的实体的名称。      *     publicId - 实体的公共标识符,如果未提供,则为 null。      *     systemId - 实体的系统标识符。      *     notationName - 相关注释的名称。      */     @Override    public void unparsedEntityDecl(String name, String publicId,            String systemId, String notationName) throws SAXException {        // TODO Auto-generated method stub          System.out.println("(unparsedEntityDecl)unparsed entity declare : (name = "+name                      +",systemId = "+publicId                      +",publicId = "+systemId                      +",notationName = "+notationName+")");      }        /**      * 接收处理指令的通知。      * 参数意义如下:      *     target : 处理指令目标      *     data : 处理指令数据,如果未提供,则为 null。      */      @Override    public void processingInstruction(String target, String data)            throws SAXException {        // TODO Auto-generated method stub  
       System.out.println("(processingInstruction)process instruction : (target = \""                      +target+"\",data = http://www.mamicode.com/""+data+"/")");    }        /**      * 接收字符数据的通知。      * 在DOM中 ch[begin:end] 相当于Text节点的节点值(nodeValue)      */    @Override    public void characters(char[] ch, int start, int length)            throws SAXException {        // TODO Auto-generated method stub          StringBuffer buffer = new StringBuffer();              for(int i = start ; i < start+length ; i++){                  switch(ch[i]){                      case ‘\\‘:buffer.append("\\\\");break;                      case ‘\r‘:buffer.append("\\r");break;                      case ‘\n‘:buffer.append("\\n");break;                      case ‘\t‘:buffer.append("\\t");break;                      case ‘\"‘:buffer.append("\\\"");break;                      default : buffer.append(ch[i]);                   }              }              System.out.println("(characters)characters("+length+"): "+buffer.toString());     }    /**     * 错误异常处理 可恢复     */    @Override    public void error(SAXParseException e) throws SAXException {        // TODO Auto-generated method stub         System.err.println("(error)Error ("+e.getLineNumber()+","                      +e.getColumnNumber()+") : "+e.getMessage());    }        /**     * 致命性错误处理 不可恢复     */    @Override    public void fatalError(SAXParseException e) throws SAXException {        // TODO Auto-generated method stub         System.err.println("(fatalError)FatalError ("+e.getLineNumber()+","                      +e.getColumnNumber()+") : "+e.getMessage());    }        /**     * 警告处理     */    @Override    public void warning(SAXParseException e) throws SAXException {        // TODO Auto-generated method stub         System.err.println("(warning)("+e.getLineNumber()+","                      +e.getColumnNumber()+") : "+e.getMessage());    }}

 

Running the parser:

SAXParse.java

import java.io.File;
import java.io.FileInputStream;
import java.io.FileNotFoundException;
import java.io.IOException;

import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;

import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.XMLReader;

/**
 * 1. Obtain an instance of the SAX parser factory.
 * 2. Obtain a SAX parser from the factory.
 * 3. Turn the XML document into an input stream so the SAX parser can read it.
 * 4. Parse the XML document.
 */
public class SAXParse {

    public static void main(String[] args) {
        // Obtain the SAX parser factory.
        SAXParserFactory factory = SAXParserFactory.newInstance();
        try {
            // Create the parser and get its underlying XMLReader.
            SAXParser parser = factory.newSAXParser();
            XMLReader xmlReader = parser.getXMLReader();
            InputSource input = new InputSource(new FileInputStream(new File("world.xml")));
            // Register the handler for content events, then parse.
            xmlReader.setContentHandler(new MyHandler());
            xmlReader.parse(input);
        } catch (ParserConfigurationException | SAXException e) {
            e.printStackTrace();
        } catch (FileNotFoundException e) {
            e.printStackTrace();
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
}
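Note that the driver above registers the handler only through setContentHandler, so the parser does not call the handler's error, fatalError and warning overrides. One way to have a single DefaultHandler receive every kind of event is to pass it to SAXParser.parse, which registers it for all handler roles. The following is a minimal sketch of that approach; the class name ErrorHandlerDemo, the method collectFatalErrors and the in-memory test documents are assumptions made for illustration and are not part of the original code.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;

import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical demo class, not part of the original article.
public class ErrorHandlerDemo {

    /** Parses the given XML text and returns the fatal errors reported to the handler. */
    public static List<String> collectFatalErrors(String xml) throws Exception {
        final List<String> errors = new ArrayList<>();
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void fatalError(SAXParseException e) throws SAXException {
                errors.add("line " + e.getLineNumber() + ": " + e.getMessage());
                throw e; // a fatal error ends the parse
            }
        };
        SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
        try {
            // parse(InputStream, DefaultHandler) uses the handler for content
            // events and for error reporting alike.
            parser.parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)), handler);
        } catch (SAXException e) {
            // Expected when the input is not well-formed; already recorded above.
        }
        return errors;
    }

    public static void main(String[] args) throws Exception {
        // Malformed: <name> is never closed.
        System.out.println(collectFatalErrors("<world><name>China</world>"));
        // Well-formed: no fatal errors are reported.
        System.out.println(collectFatalErrors("<world><name>China</name></world>"));
    }
}
```

If you prefer to keep working with the XMLReader directly, calling its setErrorHandler method with the same handler instance should have a comparable effect for the three error callbacks.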

5. Output

(setDocumentLocator)set document_locator : (lineNumber = 1,columnNumber = 1,systemId = null,publicId = null)
(startDocument)document is startting
(startElement)start element : world()
(characters)characters(2): \n\t
(startElement)start element : comuntry()
(characters)characters(3): \n\t\t
(startElement)start element : name()
(characters)characters(5): China
(endElement)end element : name()
(characters)characters(3): \n\t\t
(startElement)start element : capital()
(characters)characters(7): Beijing
(endElement)end element : capital()
(characters)characters(3): \n\t\t
(startElement)start element : population()
(characters)characters(4): 1234
(endElement)end element : population()
(characters)characters(3): \n\t\t
(startElement)start element : area()
(characters)characters(3): 960
(endElement)end element : area()
(characters)characters(2): \n\t
(endElement)end element : comuntry()
(characters)characters(2): \n\t
(startElement)start element : comuntry()
(characters)characters(3): \n\t\t
(startElement)start element : name()
(characters)characters(7): America
(endElement)end element : name()
(characters)characters(3): \n\t\t
(startElement)start element : capital()
(characters)characters(10): Washington
(endElement)end element : capital()
(characters)characters(3): \n\t\t
(startElement)start element : population()
(characters)characters(3): 234
(endElement)end element : population()
(characters)characters(3): \n\t\t
(startElement)start element : area()
(characters)characters(3): 900
(endElement)end element : area()
(characters)characters(2): \n\t
(endElement)end element : comuntry()
(characters)characters(2): \n\t
(startElement)start element : comuntry()
(characters)characters(3): \n\t\t
(startElement)start element : name()
(characters)characters(5): Japan
(endElement)end element : name()
(characters)characters(3): \n\t\t
(startElement)start element : capital()
(characters)characters(5): Tokyo
(endElement)end element : capital()
(characters)characters(3): \n\t\t
(startElement)start element : population()
(characters)characters(3): 234
(endElement)end element : population()
(characters)characters(3): \n\t\t
(startElement)start element : area()
(characters)characters(2): 60
(endElement)end element : area()
(characters)characters(2): \n\t
(endElement)end element : comuntry()
(characters)characters(2): \n\t
(startElement)start element : comuntry()
(characters)characters(3): \n\t\t
(startElement)start element : name()
(characters)characters(6): Russia
(endElement)end element : name()
(characters)characters(3): \n\t\t
(startElement)start element : capital()
(characters)characters(6): Moscow
(endElement)end element : capital()
(characters)characters(3): \n\t\t
(startElement)start element : population()
(characters)characters(2): 34
(endElement)end element : population()
(characters)characters(3): \n\t\t
(startElement)start element : area()
(characters)characters(4): 1960
(endElement)end element : area()
(characters)characters(2): \n\t
(endElement)end element : comuntry()
(characters)characters(1): \n
(endElement)end element : world()
(endDocument)doument is ended
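In the output above, every element is printed with empty parentheses and no startPrefixMapping events appear. The sample document declares no namespaces, and a parser obtained from SAXParserFactory is not namespace-aware unless setNamespaceAware(true) is called on the factory. The sketch below compares the two modes. The class NamespaceDemo, the prefix g and the namespace URI urn:example:geo are made up for illustration; they do not come from the original world.xml.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

import javax.xml.parsers.SAXParserFactory;

import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical demo class, not part of the original article.
public class NamespaceDemo {

    /** Parses xml and returns a log of prefix-mapping and start-element events. */
    public static List<String> events(String xml, boolean namespaceAware) throws Exception {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setNamespaceAware(namespaceAware); // the factory default is false
        final List<String> log = new ArrayList<>();
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void startPrefixMapping(String prefix, String uri) {
                log.add("prefix " + prefix + "=" + uri);
            }

            @Override
            public void startElement(String uri, String localName, String qName, Attributes atts) {
                log.add("start " + qName + " (" + uri + ")");
            }
        };
        factory.newSAXParser().parse(new InputSource(new StringReader(xml)), handler);
        return log;
    }

    public static void main(String[] args) throws Exception {
        // Assumed sample document with one namespace declaration.
        String xml = "<g:world xmlns:g=\"urn:example:geo\"><g:name>China</g:name></g:world>";
        System.out.println("namespace-aware:     " + events(xml, true));
        System.out.println("not namespace-aware: " + events(xml, false));
    }
}
```

With namespace awareness switched on, the handler should receive the prefix mapping before the element that declares it, and each startElement call carries the namespace URI in its uri argument.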

 

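6. That completes the SAX parse. This is only a very simple read-through of the document; a real application needs to customize the handler for its own purposes.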
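As one possible customization, the handler can build objects instead of printing events. The sketch below collects each comuntry element of the sample document (the element name is spelled as in world.xml) into a small value object. The classes CountryReader, Country and CountryHandler are hypothetical names introduced here, and the embedded XML is a shortened copy of the sample. Text is accumulated in a StringBuilder because the parser may deliver the text of one element in several characters calls.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

import javax.xml.parsers.SAXParserFactory;

import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical example class, not part of the original article.
public class CountryReader {

    /** Simple value object for one <comuntry> element. */
    public static class Country {
        public String id, name, capital;
        public int population, area;
    }

    /** Builds Country objects while the parser streams through the document. */
    static class CountryHandler extends DefaultHandler {
        final List<Country> countries = new ArrayList<>();
        private final StringBuilder text = new StringBuilder();
        private Country current;

        @Override
        public void startElement(String uri, String localName, String qName, Attributes atts) {
            text.setLength(0); // discard whitespace collected between elements
            if ("comuntry".equals(qName)) {
                current = new Country();
                current.id = atts.getValue("id");
            }
        }

        @Override
        public void characters(char[] ch, int start, int length) {
            text.append(ch, start, length); // may arrive in several chunks
        }

        @Override
        public void endElement(String uri, String localName, String qName) {
            if (current == null) {
                return; // outside any <comuntry>, e.g. the closing </world>
            }
            String value = text.toString().trim();
            switch (qName) {
                case "name": current.name = value; break;
                case "capital": current.capital = value; break;
                case "population": current.population = Integer.parseInt(value); break;
                case "area": current.area = Integer.parseInt(value); break;
                case "comuntry": countries.add(current); current = null; break;
                default: break;
            }
        }
    }

    /** Parses the XML text and returns the countries it contains. */
    public static List<Country> parse(String xml) throws Exception {
        CountryHandler handler = new CountryHandler();
        SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);
        return handler.countries;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<world>\n"
                + "  <comuntry id=\"1\"><name>China</name><capital>Beijing</capital>"
                + "<population>1234</population><area>960</area></comuntry>\n"
                + "  <comuntry id=\"2\"><name>America</name><capital>Washington</capital>"
                + "<population>234</population><area>900</area></comuntry>\n"
                + "</world>";
        for (Country c : parse(xml)) {
            System.out.println(c.id + ": " + c.name + ", capital " + c.capital
                    + ", population " + c.population + ", area " + c.area);
        }
    }
}
```

Because only the element currently being read is held in memory, this style keeps the memory advantage of SAX over DOM while still giving the rest of the program ordinary Java objects to work with.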
