Channel: Using Python Iterparse For Large XML Files - Stack Overflow

Using Python Iterparse For Large XML Files

I need to write a parser in Python that can process some extremely large files (> 2 GB) on a computer without much memory (only 2 GB). I want to use iterparse in lxml to do it.

My file is of the format:

    <item><title>Item 1</title><desc>Description 1</desc></item>
    <item><title>Item 2</title><desc>Description 2</desc></item>
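A fragment like the one above is not a complete XML document on its own, because XML requires a single root element. The sketch below assumes the real file wraps the items in a root element (the name `<items>` is hypothetical) and parses an in-memory copy with lxml's `iterparse`, pulling out each title and description.

```python
import io

from lxml import etree

# Hypothetical <items> root; the <item> markup is the sample from the question.
SAMPLE = (
    b"<items>"
    b"<item><title>Item 1</title><desc>Description 1</desc></item>"
    b"<item><title>Item 2</title><desc>Description 2</desc></item>"
    b"</items>"
)

records = []
# iterparse accepts a filename or a file-like object; tag="item" restricts
# the yielded events to <item> elements ("end" events by default), so each
# element is complete, children included, when the loop body sees it.
for event, elem in etree.iterparse(io.BytesIO(SAMPLE), tag="item"):
    # findtext returns the text of the first matching child, or None.
    records.append((elem.findtext("title"), elem.findtext("desc")))

print(records)  # [('Item 1', 'Description 1'), ('Item 2', 'Description 2')]
```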

and so far my solution is:

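    from lxml import etree

    # MYFILE is a placeholder for the path to the large XML file
    context = etree.iterparse(MYFILE, tag='item')
    for event, elem in context:
        # the child element in the file is <desc>, not <description>
        print(elem.xpath('desc/text()'))
    del context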
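One way to see why the loop above still consumes memory: `iterparse` builds the tree incrementally as it reads, so every processed `<item>` stays attached to the root after the loop body is done with it. The sketch below uses a synthetic in-memory document (the `build_sample` helper and the `<items>` root are hypothetical) and counts how many children the root still holds at the end.

```python
import io

from lxml import etree


def build_sample(n):
    """Hypothetical test document: n <item> elements under an <items> root."""
    items = b"".join(
        b"<item><title>Item %d</title><desc>Description %d</desc></item>" % (i, i)
        for i in range(1, n + 1)
    )
    return b"<items>" + items + b"</items>"


root = None
for event, elem in etree.iterparse(io.BytesIO(build_sample(1000)), tag="item"):
    elem.xpath("desc/text()")  # stand-in for the per-item processing
    if root is None:
        root = elem.getparent()  # the <items> root element

# Nothing was cleared, so every <item> is still attached to the root.
print(len(root))  # 1000
```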

Unfortunately, this solution still uses a lot of memory. I think the problem is that after processing each "item" element I need to do something to clean up the empty children. Can anyone suggest what I should do after processing my data to clean up properly?
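A commonly used cleanup pattern with lxml is to clear each element once it has been processed and then delete the already-processed siblings that precede it, so the root does not accumulate empty `<item>` shells. A minimal sketch, assuming a synthetic in-memory document with an `<items>` root (`build_sample` and `fast_iter` are hypothetical helper names, not part of lxml):

```python
import io

from lxml import etree


def build_sample(n):
    """Hypothetical test document: n <item> elements under an <items> root."""
    items = b"".join(
        b"<item><title>Item %d</title><desc>Description %d</desc></item>" % (i, i)
        for i in range(1, n + 1)
    )
    return b"<items>" + items + b"</items>"


def fast_iter(context, func):
    """Call func on each element from an iterparse context, freeing processed ones.

    Returns the root element so the caller can inspect what is left.
    """
    root = None
    for event, elem in context:
        func(elem)
        if root is None:
            root = elem.getparent()
        # Drop this element's children, text and attributes.
        elem.clear()
        # Remove earlier siblings, which were already processed and cleared.
        while elem.getprevious() is not None:
            del elem.getparent()[0]
    return root


descriptions = []
context = etree.iterparse(io.BytesIO(build_sample(1000)), tag="item")
root = fast_iter(context, lambda elem: descriptions.append(elem.findtext("desc")))
del context

# All items were processed, but only the last (empty) <item> remains in the tree.
print(len(descriptions), len(root))  # 1000 1
```

The order matters: the element is processed first, then cleared, and only then are its predecessors detached, so nothing is removed before the loop body has read it. If you keep the xpath call from the question, note that lxml's default "smart" string results hold a reference back to their element, which can keep cleared elements alive; plain strings such as those returned by findtext avoid that.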


Viewing all articles
Browse latest Browse all 7

Trending Articles



<script src="https://jsc.adskeeper.com/r/s/rssing.com.1596347.js" async> </script>