
Remove insignificant whitespace from xml string with Python

Posted by: skyl on April 29, 2010

If you have arrived at this page and haven't yet looked at lxml for Python, you almost certainly want to take a look at http://codespeak.net/lxml/ .

Say you have a unicode string like:

<RateV3Request USERID="123412341234">\n    <Package ID="1ST">\n        <Service>FIRST CLASS</Service>\n        <FirstClassMailType>LETTER</FirstClassMailType>\n        <ZipOrigination>44106</ZipOrigination>\n        <ZipDestination>20770</ZipDestination>\n        <Pounds>0</Pounds>\n        <Ounces>3.5</Ounces>\n        <Size>REGULAR</Size>\n        <Machinable>true</Machinable>\n    </Package>\n    <Package ID="2ND">\n        <Service>PRIORITY</Service>\n        <ZipOrigination>44106</ZipOrigination>\n        <ZipDestination>20770</ZipDestination>\n        <Pounds>1</Pounds>\n        <Ounces>8</Ounces>\n        <Container>NONRECTANGULAR</Container>\n        <Size>LARGE</Size>\n        <Width>15</Width>\n        <Length>30</Length>\n        <Height>15</Height>\n        <Girth>55</Girth>\n    </Package>\n    <Package ID="3RD">\n        <Service>ALL</Service>\n        <ZipOrigination>90210</ZipOrigination>\n        <ZipDestination>96698</ZipDestination>\n        <Pounds>8</Pounds>\n        <Ounces>32</Ounces>\n        <Container/>\n        <Size>REGULAR</Size>\n        <Machinable>true</Machinable>\n    </Package>\n</RateV3Request>\n

You want all of those unsightly newlines and spaces to go away, perhaps because you want to urlencode this string as a GET parameter of a URL request to, let's say, the USPS API, http://www.usps.com/webtools/htm/Rate-Calculators-v2-3.htm#_Toc220743990 .

This task is not as easy as you might think unless you know the right tools.
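To see why the whitespace matters for a GET parameter, here is a minimal sketch using only the standard library (Python 3). The two-element sample and the parameter name "XML" are assumptions made for illustration; they are not taken from the USPS documentation.

```python
from urllib.parse import urlencode

# Hypothetical sample, much shorter than the request shown above.
indented = '<Package ID="1ST">\n    <Pounds>0</Pounds>\n</Package>\n'
compact = '<Package ID="1ST"><Pounds>0</Pounds></Package>'

# "XML" is an assumed parameter name, used only for illustration.
indented_query = urlencode({"XML": indented})
compact_query = urlencode({"XML": compact})

# Each newline becomes "%0A" and each space becomes "+",
# so the indentation only makes the query string longer.
print(indented_query)
print(compact_query)
print(len(indented_query) - len(compact_query), "extra characters")
```

The indented version carries no extra information, only extra characters in the URL.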

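# Python 3; on Python 2 use "from StringIO import StringIO" instead.
from io import StringIO
from lxml import etree

# your_dirty_xml is the whitespace-laden unicode string shown above.
# remove_blank_text=True drops whitespace-only text between elements.
parser = etree.XMLParser(remove_blank_text=True)
tree = etree.parse(StringIO(your_dirty_xml), parser)
# encoding="unicode" returns a str; without it, Python 3 gets bytes.
final_string = etree.tostring(tree.getroot(), encoding="unicode")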

final_string is then:

<RateV3Request USERID="123412341234"><Package ID="1ST"><Service>FIRST CLASS</Service><FirstClassMailType>LETTER</FirstClassMailType><ZipOrigination>44106</ZipOrigination><ZipDestination>20770</ZipDestination><Pounds>0</Pounds><Ounces>3.5</Ounces><Size>REGULAR</Size><Machinable>true</Machinable></Package><Package ID="2ND"><Service>PRIORITY</Service><ZipOrigination>44106</ZipOrigination><ZipDestination>20770</ZipDestination><Pounds>1</Pounds><Ounces>8</Ounces><Container>NONRECTANGULAR</Container><Size>LARGE</Size><Width>15</Width><Length>30</Length><Height>15</Height><Girth>55</Girth></Package><Package ID="3RD"><Service>ALL</Service><ZipOrigination>90210</ZipOrigination><ZipDestination>96698</ZipDestination><Pounds>8</Pounds><Ounces>32</Ounces><Container/><Size>REGULAR</Size><Machinable>true</Machinable></Package></RateV3Request>
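If you need this in more than one place, the same idea can be wrapped in a small helper. This is only a sketch, assuming Python 3 and an installed lxml; the function name minify_xml and the short sample are made up for illustration. It uses etree.fromstring so no StringIO wrapper is needed.

```python
from lxml import etree


def minify_xml(xml_text):
    """Return xml_text with whitespace-only text between elements removed."""
    parser = etree.XMLParser(remove_blank_text=True)
    root = etree.fromstring(xml_text, parser)
    # encoding="unicode" asks lxml for a str rather than bytes.
    return etree.tostring(root, encoding="unicode")


# Hypothetical sample, shaped like one Package from the request above.
sample = (
    '<Package ID="1ST">\n'
    '    <Service>FIRST CLASS</Service>\n'
    '    <Container/>\n'
    '</Package>\n'
)
print(minify_xml(sample))
```

Only the whitespace between tags is dropped. Text inside an element, such as the space in FIRST CLASS, is left alone.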
