2021SC@SDUSC [Application and Practice of Software Engineering] Cocoon Project 9-xml Folder Analysis

Keywords: Java xml

JaxpSAXParser

1. Summary

A SAX parser component that delegates to any JAXP 1.1 compliant parser.
It extends AbstractJaxpParser.

2. Main Properties

// SAX parser factory
protected SAXParserFactory factory;
// Whether namespace declarations should also be reported as 'xmlns:' attributes; defaults to false
protected boolean nsPrefixes = false;
// Whether parsing should stop when a warning is reported; defaults to true
protected boolean stopOnWarning = true;
// Whether parsing should stop when a recoverable error is reported; defaults to true
protected boolean stopOnRecoverableError = true;
// Whether comment events between the startDTD and endDTD events should be dropped; defaults to false
protected boolean dropDtdComments = false;
// Class name of the SAX parser factory
protected String saxParserFactoryName = "javax.xml.parsers.SAXParserFactory";

3. Methods

public void setDropDtdComments(boolean dropDtdComments) {
        this.dropDtdComments = dropDtdComments;
}
  • Should comment() events from DTDs be dropped? The default is false, i.e. they are kept.
  • Because this implementation does not support the DeclHandler interface at all, receiving only the comments from a DTD is of little use. In addition, comment events from the internal DTD subset would otherwise show up again in the serialized output.
public void setNsPrefixes(boolean nsPrefixes) {
        this.nsPrefixes = nsPrefixes;
}
  • Should namespace declarations also be reported as 'xmlns:' attributes? The default is false.
  • Note: setting this to true can confuse some XSL processors (such as Saxon).
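To make the effect of this flag concrete, here is a minimal sketch using a plain JAXP XMLReader. It assumes the flag corresponds to the standard SAX2 feature http://xml.org/sax/features/namespace-prefixes; the class NsPrefixesDemo and its method are hypothetical and not part of Cocoon.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical demo class, not part of Cocoon.
class NsPrefixesDemo {
    /** Returns how many attributes the parser reports for the root element. */
    static int rootAttributeCount(boolean nsPrefixes) throws Exception {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setNamespaceAware(true);
        XMLReader reader = factory.newSAXParser().getXMLReader();
        // Standard SAX2 feature: also report xmlns declarations as attributes.
        reader.setFeature("http://xml.org/sax/features/namespace-prefixes", nsPrefixes);

        final int[] count = {-1};
        reader.setContentHandler(new DefaultHandler() {
            @Override
            public void startElement(String uri, String local, String qName, Attributes atts) {
                if (count[0] < 0) {
                    count[0] = atts.getLength(); // only record the root element
                }
            }
        });
        // The root element carries one namespace declaration and one ordinary attribute.
        reader.parse(new InputSource(new StringReader("<a xmlns:x='urn:x' id='1'/>")));
        return count[0];
    }

    public static void main(String[] args) throws Exception {
        System.out.println("nsPrefixes=false: " + rootAttributeCount(false));
        System.out.println("nsPrefixes=true:  " + rootAttributeCount(true));
    }
}
```

With the feature enabled, the xmlns:x declaration is counted as an attribute alongside id, which is the kind of extra attribute that may trip up some XSL processors.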
public void setSaxParserFactoryName(String saxParserFactoryName) {
        this.saxParserFactoryName = saxParserFactoryName;
}
  • Sets the name of the SAXParserFactory implementation class to use, instead of relying on the standard JAXP lookup mechanism (SAXParserFactory.newInstance()).
  • This lets you choose explicitly which JAXP implementation to use when several are available on the classpath.
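The choice between the standard lookup and an explicitly named factory could be sketched as follows. This is only an illustration: it uses the JAXP method SAXParserFactory.newInstance(String, ClassLoader) (available since Java 6), which is not necessarily how Cocoon loads the class, and FactoryByNameDemo is a hypothetical name.

```java
import javax.xml.parsers.SAXParserFactory;

// Hypothetical demo class, not part of Cocoon.
class FactoryByNameDemo {
    // Same default value as the saxParserFactoryName field shown earlier.
    static final String DEFAULT_NAME = "javax.xml.parsers.SAXParserFactory";

    /** Creates a factory either through the standard lookup or from an explicit class name. */
    static SAXParserFactory createFactory(String factoryName) {
        if (DEFAULT_NAME.equals(factoryName)) {
            // Standard JAXP mechanism: system property, service provider, built-in default.
            return SAXParserFactory.newInstance();
        }
        // Explicit implementation class; null means "use the context class loader".
        return SAXParserFactory.newInstance(factoryName, null);
    }

    public static void main(String[] args) {
        SAXParserFactory standard = createFactory(DEFAULT_NAME);
        System.out.println("Standard lookup picked: " + standard.getClass().getName());
    }
}
```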
public void fatalError( final SAXParseException spe )
throws SAXException
  • Receives notification of a fatal (non-recoverable) error.
public void setStopOnRecoverableError(boolean stopOnRecoverableError)
  • Determines whether the parser should stop parsing when a recoverable error occurs. The default is true.
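How the stopOnWarning and stopOnRecoverableError flags might influence error handling can be sketched with a standalone SAX ErrorHandler. This is an assumption about the general pattern, not Cocoon's actual code; StoppingErrorHandler and its message format are hypothetical.

```java
import org.xml.sax.ErrorHandler;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;

// Hypothetical stand-in illustrating the two "stop" flags; not Cocoon's implementation.
class StoppingErrorHandler implements ErrorHandler {
    private final boolean stopOnWarning;
    private final boolean stopOnRecoverableError;

    StoppingErrorHandler(boolean stopOnWarning, boolean stopOnRecoverableError) {
        this.stopOnWarning = stopOnWarning;
        this.stopOnRecoverableError = stopOnRecoverableError;
    }

    @Override
    public void warning(SAXParseException spe) throws SAXException {
        if (stopOnWarning) {
            throw new SAXException(describe("Warning", spe), spe); // aborts the parse
        }
        System.err.println(describe("Warning", spe)); // otherwise just report it
    }

    @Override
    public void error(SAXParseException spe) throws SAXException {
        if (stopOnRecoverableError) {
            throw new SAXException(describe("Error", spe), spe); // aborts the parse
        }
        System.err.println(describe("Error", spe));
    }

    @Override
    public void fatalError(SAXParseException spe) throws SAXException {
        // Fatal errors always stop parsing, regardless of the flags.
        throw new SAXException(describe("Fatal error", spe), spe);
    }

    private static String describe(String kind, SAXParseException spe) {
        return kind + " parsing " + spe.getSystemId() + " (line " + spe.getLineNumber()
                + ", column " + spe.getColumnNumber() + "): " + spe.getMessage();
    }
}
```

Such a handler would be registered with XMLReader.setErrorHandler() before calling parse().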
public void parse( final InputSource in,
                       final ContentHandler contentHandler,
                       final LexicalHandler lexicalHandler )
    throws SAXException, IOException {
        final XMLReader tmpReader = this.setupXMLReader();

        try {
            LexicalHandler theLexicalHandler = null;
            if ( null == lexicalHandler 
                 && contentHandler instanceof LexicalHandler) {
                theLexicalHandler = (LexicalHandler)contentHandler;
            }   
            if( null != lexicalHandler ) {
                theLexicalHandler = lexicalHandler;
            }
            if (theLexicalHandler != null) {
                if (this.dropDtdComments) {
                    theLexicalHandler = new DtdCommentEater(theLexicalHandler);
                }
                tmpReader.setProperty( "http://xml.org/sax/properties/lexical-handler",
                                       theLexicalHandler );
            }
        } catch( final SAXException e ) {
            final String message =
                "SAX2 driver does not support property: " +
                "'http://xml.org/sax/properties/lexical-handler'";
            this.getLogger().warn( message );
        }
        tmpReader.setContentHandler( contentHandler );

        tmpReader.parse( in );
    }
  1. Supplement: ContentHandler

ContentHandler receives notification of the logical content of a document.
It is the main interface that most SAX applications implement: an application that needs to be informed of basic parsing events implements this interface and registers an instance with the SAX parser using setContentHandler(). The parser uses that instance to report basic document-related events, such as the start and end of elements and character data.
The order of events in this interface matters because it mirrors the order of information in the document itself. For example, all of an element's content (character data, processing instructions, and/or child elements) appears, in order, between the startElement event and the corresponding endElement event.

public void setContentHandler (ContentHandler handler);
  • Allows an application to register a content event handler. If the application does not register a content handler, all content events reported by the SAX parser are silently ignored.
  • Applications may register a new or different handler in the middle of a parse, and the SAX parser must begin using the new handler immediately.
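The event ordering described above can be observed with a small handler that logs what it receives. This is a minimal sketch using the JDK's default JAXP parser; ContentEventsDemo is a hypothetical class name.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical demo class, not part of Cocoon.
class ContentEventsDemo {
    /** Parses the XML and returns the content events in the order they were reported. */
    static List<String> events(String xml) throws Exception {
        final List<String> log = new ArrayList<>();
        final StringBuilder text = new StringBuilder();
        XMLReader reader = SAXParserFactory.newInstance().newSAXParser().getXMLReader();
        reader.setContentHandler(new DefaultHandler() {
            // Character data may arrive in several chunks, so it is buffered
            // and logged once the next element boundary is reached.
            private void flushText() {
                if (text.length() > 0) {
                    log.add("text:" + text);
                    text.setLength(0);
                }
            }
            @Override
            public void startElement(String uri, String local, String qName, Attributes atts) {
                flushText();
                log.add("start:" + qName);
            }
            @Override
            public void endElement(String uri, String local, String qName) {
                flushText();
                log.add("end:" + qName);
            }
            @Override
            public void characters(char[] ch, int start, int length) {
                text.append(ch, start, length);
            }
        });
        reader.parse(new InputSource(new StringReader(xml)));
        return log;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(events("<a><b>hi</b></a>"));
    }
}
```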
  2. Supplement: LexicalHandler

LexicalHandler is an optional SAX2 extension handler that provides lexical information about an XML document, such as comments and CDATA section boundaries. XML readers are not required to recognize this handler, and it is not part of core-only SAX2 distributions.
The events in the lexical handler apply to the entire document, not just to the document element, and all lexical handler events must appear between the content handler's startDocument and endDocument events.
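As a rough illustration of how lexical events interleave with content events, the sketch below registers one handler for both roles through the same lexical-handler property used in the parse() method above. It relies on org.xml.sax.ext.DefaultHandler2, which implements both interfaces; LexicalEventsDemo is a hypothetical class name.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;
import org.xml.sax.ext.DefaultHandler2;

// Hypothetical demo class, not part of Cocoon.
class LexicalEventsDemo {
    /** Returns content and lexical events in the order the parser reported them. */
    static List<String> events(String xml) throws Exception {
        final List<String> log = new ArrayList<>();
        DefaultHandler2 handler = new DefaultHandler2() {
            @Override public void startDocument() { log.add("startDocument"); }
            @Override public void endDocument() { log.add("endDocument"); }
            @Override public void startElement(String uri, String local, String qName, Attributes atts) {
                log.add("start:" + qName);
            }
            @Override public void startDTD(String name, String publicId, String systemId) {
                log.add("startDTD:" + name);
            }
            @Override public void endDTD() { log.add("endDTD"); }
            @Override public void comment(char[] ch, int start, int length) {
                log.add("comment:" + new String(ch, start, length));
            }
        };
        XMLReader reader = SAXParserFactory.newInstance().newSAXParser().getXMLReader();
        reader.setContentHandler(handler);
        // A parser that does not support this property throws a SAXException here.
        reader.setProperty("http://xml.org/sax/properties/lexical-handler", handler);
        reader.parse(new InputSource(new StringReader(xml)));
        return log;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(events("<!DOCTYPE r [<!ELEMENT r ANY>]><!-- note --><r/>"));
    }
}
```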

4. Nested Classes

JaxpSAXParser.DtdCommentEater:

A LexicalHandler implementation that drops all comment events occurring between startDTD and endDTD. All other events are forwarded unchanged to another LexicalHandler.

protected static class DtdCommentEater implements LexicalHandler {
        protected LexicalHandler next;
        protected boolean inDTD;
        public DtdCommentEater(LexicalHandler nextHandler) {
            this.next = nextHandler;
        }
        public void startDTD (String name, String publicId, String systemId)
        throws SAXException {
            inDTD = true;
            next.startDTD(name, publicId, systemId);
        }
        public void endDTD ()
        throws SAXException {
            inDTD = false;
            next.endDTD();
        }
        public void startEntity (String name)
        throws SAXException {
            next.startEntity(name);
        }
        public void endEntity (String name)
        throws SAXException {
            next.endEntity(name);
        }
        public void startCDATA ()
        throws SAXException {
            next.startCDATA();
        }
        public void endCDATA ()
        throws SAXException {
            next.endCDATA();
        }
        public void comment (char ch[], int start, int length)
        throws SAXException {
            if (!inDTD) {
                next.comment(ch, start, length);
            }
        }
    }
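The effect of this wrapper can be tried out without Cocoon on the classpath. The sketch below uses a simplified stand-in with the same comment-dropping logic; the names DtdCommentFilterDemo and DtdCommentFilter are hypothetical, and unlike the real class the stand-in does not forward entity or CDATA events.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.XMLReader;
import org.xml.sax.ext.DefaultHandler2;
import org.xml.sax.ext.LexicalHandler;

// Hypothetical demo class, not part of Cocoon.
class DtdCommentFilterDemo {
    /** Simplified stand-in for DtdCommentEater (entity and CDATA events are not forwarded). */
    static class DtdCommentFilter extends DefaultHandler2 {
        private final LexicalHandler next;
        private boolean inDTD;
        DtdCommentFilter(LexicalHandler next) { this.next = next; }
        @Override public void startDTD(String name, String publicId, String systemId) throws SAXException {
            inDTD = true;
            next.startDTD(name, publicId, systemId);
        }
        @Override public void endDTD() throws SAXException {
            inDTD = false;
            next.endDTD();
        }
        @Override public void comment(char[] ch, int start, int length) throws SAXException {
            if (!inDTD) {
                next.comment(ch, start, length); // comments inside the DTD are swallowed
            }
        }
    }

    /** Returns the comments that reach the final handler. */
    static List<String> comments(String xml, boolean dropDtdComments) throws Exception {
        final List<String> seen = new ArrayList<>();
        LexicalHandler collector = new DefaultHandler2() {
            @Override public void comment(char[] ch, int start, int length) {
                seen.add(new String(ch, start, length));
            }
        };
        XMLReader reader = SAXParserFactory.newInstance().newSAXParser().getXMLReader();
        reader.setProperty("http://xml.org/sax/properties/lexical-handler",
                dropDtdComments ? new DtdCommentFilter(collector) : collector);
        reader.parse(new InputSource(new StringReader(xml)));
        return seen;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<!DOCTYPE r [<!-- in dtd --><!ELEMENT r ANY>]><r><!-- in body --></r>";
        System.out.println("kept:    " + comments(xml, false));
        System.out.println("dropped: " + comments(xml, true));
    }
}
```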

Included methods:

public void startDTD (String name, String publicId, String systemId)
        throws SAXException
  • Reports the start of the DTD declaration, if any. This method is intended to report the beginning of the DOCTYPE declaration; it is not called if the document has no DOCTYPE declaration.

  • All declarations reported through DTDHandler or DeclHandler events must appear between the startDTD and endDTD events. Declarations are assumed to belong to the internal DTD subset unless they appear between startEntity and endEntity events. Comments and processing instructions from the DTD should also be reported between the startDTD and endDTD events, in their original (logical) order of occurrence. However, they are not required to appear in their correct locations relative to DTDHandler or DeclHandler events.

  • Note that the startDTD/endDTD events appear within the startDocument/endDocument events of the ContentHandler, and before the first startElement event.

  • Parameters:
    name -- The document type name.
    publicId -- The declared public identifier for the external DTD subset, or null if none was declared.
    systemId -- The declared system identifier for the external DTD subset, or null if none was declared. (Note that it is not resolved against the document base URI.)

public void endDTD ()
        throws SAXException
  • Reports the end of the DTD declaration. This method is intended to report the end of the DOCTYPE declaration; it is not called if the document has no DOCTYPE declaration.
public void startEntity (String name)
        throws SAXException
  • Reports the beginning of some internal and external XML entities. Reporting of parameter entities (including the external DTD subset) is optional, and SAX2 drivers that report LexicalHandler events may not implement it. The http://xml.org/sax/features/lexical-handler/parameter-entities feature can be used to query or control the reporting of parameter entities.

  • General entities are reported with their regular names, parameter entities have '%' prepended to their names, and the external DTD subset has the pseudo-entity name '[dtd]'.

  • When a SAX2 driver provides these events, all other events must be properly nested within start/end entity events. There is no additional requirement that events from DeclHandler or DTDHandler be properly ordered.

  • Note that skipped entities are reported through the skippedEntity() event, which is part of the ContentHandler interface.

  • Parameters:
    name -- The name of the entity. If it is a parameter entity, the name begins with '%', and if it is the external DTD subset, it is '[dtd]'.
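The naming convention can be checked with a short sketch that collects the names passed to startEntity(). Whether and which entities are reported depends on the parser; the example assumes the JDK's default JAXP parser, and EntityNamesDemo is a hypothetical class name.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;
import org.xml.sax.ext.DefaultHandler2;

// Hypothetical demo class, not part of Cocoon.
class EntityNamesDemo {
    /** Returns the entity names reported through LexicalHandler.startEntity(). */
    static List<String> entityNames(String xml) throws Exception {
        final List<String> names = new ArrayList<>();
        XMLReader reader = SAXParserFactory.newInstance().newSAXParser().getXMLReader();
        reader.setProperty("http://xml.org/sax/properties/lexical-handler", new DefaultHandler2() {
            @Override
            public void startEntity(String name) {
                // General entities arrive under their plain name; parameter entities
                // would start with '%', and an external DTD subset would be "[dtd]".
                names.add(name);
            }
        });
        reader.parse(new InputSource(new StringReader(xml)));
        return names;
    }

    public static void main(String[] args) throws Exception {
        // One internal general entity "e", referenced once in element content.
        System.out.println(entityNames("<!DOCTYPE r [<!ENTITY e 'text'>]><r>&e;</r>"));
    }
}
```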

public void startCDATA ()
        throws SAXException
  • Reports the start of a CDATA section. The contents of the CDATA section are reported through the regular characters event; this event is intended only to report the boundary.
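Since the CDATA content itself arrives through characters(), a handler has to track the boundaries to tell the two kinds of text apart. A minimal sketch, with the hypothetical class name CdataBoundaryDemo:

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;
import org.xml.sax.ext.DefaultHandler2;

// Hypothetical demo class, not part of Cocoon.
class CdataBoundaryDemo {
    /** Returns { text outside CDATA sections, text inside CDATA sections }. */
    static String[] split(String xml) throws Exception {
        final StringBuilder outside = new StringBuilder();
        final StringBuilder inside = new StringBuilder();
        DefaultHandler2 handler = new DefaultHandler2() {
            private boolean inCdata;
            @Override public void startCDATA() { inCdata = true; }   // boundary only, no content
            @Override public void endCDATA() { inCdata = false; }
            @Override public void characters(char[] ch, int start, int length) {
                // The same callback delivers both ordinary text and CDATA content.
                (inCdata ? inside : outside).append(ch, start, length);
            }
        };
        XMLReader reader = SAXParserFactory.newInstance().newSAXParser().getXMLReader();
        reader.setContentHandler(handler);
        reader.setProperty("http://xml.org/sax/properties/lexical-handler", handler);
        reader.parse(new InputSource(new StringReader(xml)));
        return new String[] { outside.toString(), inside.toString() };
    }

    public static void main(String[] args) throws Exception {
        String[] parts = split("<r>x<![CDATA[a<b]]>y</r>");
        System.out.println("outside=" + parts[0] + ", inside=" + parts[1]);
    }
}
```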
public void comment (char ch[], int start, int length)
        throws SAXException
  • Reports an XML comment anywhere in the document. This callback is used for comments inside or outside the document element, including comments in the external DTD subset (if read). Comments in the DTD must be properly nested inside the startDTD/endDTD and startEntity/endEntity events (if used).
  • Parameters:
    ch -- An array holding the characters of the comment.
    start -- The starting position in the array.
    length -- The number of characters to use from the array.

Posted by Dave100 on Fri, 26 Nov 2021 12:22:45 -0800