I need to write a parser in Python that can process some extremely large files (> 2 GB) on a machine with very little memory (only 2 GB of RAM). I wanted to use iterparse in lxml to do it.
My file is of the format:
    <item>
      <title>Item 1</title>
      <desc>Description 1</desc>
    </item>
    <item>
      <title>Item 2</title>
      <desc>Description 2</desc>
    </item>
and so far my solution is:
    from lxml import etree

    context = etree.iterparse(MYFILE, tag='item')
    for event, elem in context:
        print(elem.xpath('desc/text()'))
    del context
Unfortunately, this solution is still eating up a lot of memory. I think the problem is that after dealing with each <item> I need to do something to clean it up, since iterparse still builds the whole tree and the already-parsed elements pile up. Can anyone offer some suggestions on what I might do after processing my data to properly clean up?
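For reference, this is the kind of per-element cleanup I have been experimenting with: a sketch based on elem.clear() plus deleting the already-processed siblings that the root still holds (I am not sure whether both steps are actually needed):

    from lxml import etree

    context = etree.iterparse(MYFILE, tag='item')
    for event, elem in context:
        print(elem.xpath('desc/text()'))
        # free this element's children and text now that it is processed
        elem.clear()
        # also delete already-processed siblings still referenced by the root
        while elem.getprevious() is not None:
            del elem.getparent()[0]
    del context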