Ruben Laguna’s blog

StAX: OutOfMemoryError When Parsing Big Files

Java 6 includes StAX. When I tried to parse an Evernote backup file with it, I got an OutOfMemoryError.

```
java.lang.OutOfMemoryError: Java heap space
     at com.sun.org.apache.xerces.internal.util.XMLStringBuffer.append(XMLStringBuffer.java:205)
     at com.sun.org.apache.xerces.internal.impl.XMLDocumentScannerImpl.refresh(XMLDocumentScannerImpl.java:1520)
     at com.sun.org.apache.xerces.internal.impl.XMLEntityScanner.invokeListeners(XMLEntityScanner.java:2070)
     at com.sun.org.apache.xerces.internal.impl.XMLEntityScanner.peekChar(XMLEntityScanner.java:486)
     at com.sun.org.apache.xerces.internal.impl.XMLDocumentFragmentScannerImpl$FragmentContentDriver.next(XMLDocumentFragmentScannerImpl.java:2679)
     at com.sun.org.apache.xerces.internal.impl.XMLDocumentScannerImpl.next(XMLDocumentScannerImpl.java:648)
     at com.sun.org.apache.xerces.internal.impl.XMLNSDocumentScannerImpl.next(XMLNSDocumentScannerImpl.java:140)
     at com.sun.org.apache.xerces.internal.impl.XMLStreamReaderImpl.next(XMLStreamReaderImpl.java:548)
     at
```

After a bit of googling I found bug report 6536111, which says this should be fixed in 1.6.0_14. I tried Sun's 1.6.0_16 with no luck: I got exactly the same error.

By the way, I get this error on both Windows Vista and Mac OS X 10.5.6.

So I decided to use Woodstox instead (which is also a StAX API implementation). It worked like a charm.

At first I thought I would need to put the Woodstox jars in the endorsed directory (-Djava.endorsed.dirs="xxx"), but that turned out not to be necessary at all.

You just put the Woodstox jars (stax2-api-3.0.1.jar, woodstox-core-lgpl-4.0.5.jar) on the classpath and that's it. In my case I was using it in a NetBeans Platform Application (RCP), so I created a NetBeans Library Wrapper containing the two jars and made my module depend on this new library wrapper.

```xml
<class-path-extension>
    <runtime-relative-path>ext/woodstox-core-lgpl-4.0.5.jar</runtime-relative-path>
    <binary-origin>release/modules/ext/woodstox-core-lgpl-4.0.5.jar</binary-origin>
</class-path-extension>

<class-path-extension>
    <runtime-relative-path>ext/stax2-api-3.0.1.jar</runtime-relative-path>
    <binary-origin>release/modules/ext/stax2-api-3.0.1.jar</binary-origin>
</class-path-extension>
```

The jars use the Service Provider (SPI) feature of jar files to register themselves as a StAX implementation. No code changes are needed: you still use the StAX interfaces to do the parsing, but the Woodstox implementation is used instead.
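
A quick way to check which implementation was actually picked up is to print the class of the factory that `XMLInputFactory.newInstance()` returns. This is only a minimal sketch; the class name `WhichStax` and its helper method are made up for illustration. With the Woodstox jars on the classpath I would expect a class from the `com.ctc.wstx` package, and without them the implementation bundled with the JDK.

```java
import javax.xml.stream.XMLInputFactory;

// Hypothetical helper class, not part of the original application.
class WhichStax {
    /** Returns the fully qualified class name of the StAX input factory the JVM resolved. */
    static String inputFactoryClassName() {
        return XMLInputFactory.newInstance().getClass().getName();
    }

    public static void main(String[] args) {
        // The printed name depends on which StAX implementation is on the classpath.
        System.out.println(inputFactoryClassName());
    }
}
```

If the printed name still belongs to the JDK's own implementation, the Woodstox jars are probably not visible to the class loader that runs the parsing code.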

```java
...
...
in = new FileInputStream(toAdd);
XMLInputFactory factory = XMLInputFactory.newInstance();
// DTD support is switched off for this parse.
factory.setProperty(XMLInputFactory.SUPPORT_DTD, Boolean.FALSE);
XMLStreamReader parser = factory.createXMLStreamReader(in);
int inHeader = 0;
// Pull events one at a time until the end of the document.
for (int event = parser.next();
        event != XMLStreamConstants.END_DOCUMENT;
        event = parser.next()) {
    switch (event) {
        case XMLStreamConstants.START_ELEMENT:
            if ("title".equals(parser.getLocalName())) {
                inHeader++;
            }
            ...
            ...
            ...
```
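
For reference, here is a self-contained variant of the same loop that can be run as is. It is a sketch and not the code from my application: the class `TitleCounter`, the method `countTitles` and the tiny sample document are invented for illustration, and it reads from an in-memory string instead of a file. It only uses the standard `javax.xml.stream` API, so it runs unchanged with either the JDK parser or Woodstox.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical example class, not part of the original application.
class TitleCounter {
    /** Streams through the XML and counts the start tags named "title". */
    static int countTitles(InputStream in) throws XMLStreamException {
        XMLInputFactory factory = XMLInputFactory.newInstance();
        factory.setProperty(XMLInputFactory.SUPPORT_DTD, Boolean.FALSE);
        XMLStreamReader parser = factory.createXMLStreamReader(in);
        int titles = 0;
        try {
            // hasNext() becomes false once END_DOCUMENT has been delivered.
            while (parser.hasNext()) {
                if (parser.next() == XMLStreamConstants.START_ELEMENT
                        && "title".equals(parser.getLocalName())) {
                    titles++;
                }
            }
        } finally {
            parser.close(); // releases parser resources; does not close the stream
        }
        return titles;
    }

    public static void main(String[] args) throws Exception {
        // Made-up sample document; the element names are placeholders.
        String xml = "<export><note><title>A</title></note>"
                + "<note><title>B</title></note></export>";
        InputStream in = new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8));
        System.out.println(countTitles(in)); // prints 2
    }
}
```

Because the reader only ever holds the current event, memory use should stay roughly flat regardless of file size, provided the implementation underneath behaves.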


Copyright © 2015 - Ruben Laguna - Powered by Octopress