If you have arrived at this page and haven't looked at lxml for Python yet, you almost certainly want to take a look at http://codespeak.net/lxml/ .
Say you have a unicode string like:
<RateV3Request USERID="123412341234">\n <Package ID="1ST">\n <Service>FIRST CLASS</Service>\n <FirstClassMailType>LETTER</FirstClassMailType>\n <ZipOrigination>44106</ZipOrigination>\n <ZipDestination>20770</ZipDestination>\n <Pounds>0</Pounds>\n <Ounces>3.5</Ounces>\n <Size>REGULAR</Size>\n <Machinable>true</Machinable>\n </Package>\n <Package ID="2ND">\n <Service>PRIORITY</Service>\n <ZipOrigination>44106</ZipOrigination>\n <ZipDestination>20770</ZipDestination>\n <Pounds>1</Pounds>\n <Ounces>8</Ounces>\n <Container>NONRECTANGULAR</Container>\n <Size>LARGE</Size>\n <Width>15</Width>\n <Length>30</Length>\n <Height>15</Height>\n <Girth>55</Girth>\n </Package>\n <Package ID="3RD">\n <Service>ALL</Service>\n <ZipOrigination>90210</ZipOrigination>\n <ZipDestination>96698</ZipDestination>\n <Pounds>8</Pounds>\n <Ounces>32</Ounces>\n <Container/>\n <Size>REGULAR</Size>\n <Machinable>true</Machinable>\n </Package>\n</RateV3Request>\n
You want all of those unsightly newlines and spaces to go away, perhaps because you want to URL-encode this string as a GET parameter of a request to, let's say, the USPS API, http://www.usps.com/webtools/htm/Rate-Calculators-v2-3.htm#_Toc220743990 .
This task is not as easy as you might think unless you know the right tools.
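from io import StringIO  # on Python 2: from StringIO import StringIO
from lxml import etree

# remove_blank_text=True drops the whitespace-only text between elements
parser = etree.XMLParser(remove_blank_text=True)
tree = etree.parse(StringIO(your_dirty_xml), parser)
# on Python 3 tostring() returns bytes; pass encoding="unicode" to get a str
final_string = etree.tostring(tree.getroot())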
final_string is then:
<RateV3Request USERID="123412341234"><Package ID="1ST"><Service>FIRST CLASS</Service><FirstClassMailType>LETTER</FirstClassMailType><ZipOrigination>44106</ZipOrigination><ZipDestination>20770</ZipDestination><Pounds>0</Pounds><Ounces>3.5</Ounces><Size>REGULAR</Size><Machinable>true</Machinable></Package><Package ID="2ND"><Service>PRIORITY</Service><ZipOrigination>44106</ZipOrigination><ZipDestination>20770</ZipDestination><Pounds>1</Pounds><Ounces>8</Ounces><Container>NONRECTANGULAR</Container><Size>LARGE</Size><Width>15</Width><Length>30</Length><Height>15</Height><Girth>55</Girth></Package><Package ID="3RD"><Service>ALL</Service><ZipOrigination>90210</ZipOrigination><ZipDestination>96698</ZipDestination><Pounds>8</Pounds><Ounces>32</Ounces><Container/><Size>REGULAR</Size><Machinable>true</Machinable></Package></RateV3Request>
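The same recipe can be wrapped in a small helper and handed straight to a URL encoder. What follows is only a minimal sketch for Python 3: compact_xml is a hypothetical helper name, the input is a shortened version of the request above, and the "XML" query parameter name is an assumption made for illustration, so check the API documentation for the real one.

```python
from urllib.parse import urlencode

from lxml import etree


def compact_xml(dirty_xml):
    """Return dirty_xml as a str with whitespace-only text nodes removed."""
    parser = etree.XMLParser(remove_blank_text=True)
    # fromstring() parses a string directly, so no StringIO wrapper is needed.
    root = etree.fromstring(dirty_xml, parser)
    # encoding="unicode" makes tostring() return a str instead of bytes.
    return etree.tostring(root, encoding="unicode")


# Shortened sample input (an assumption, cut down from the request above).
dirty = (
    '<RateV3Request USERID="123412341234">\n'
    '  <Package ID="1ST">\n'
    '    <Service>FIRST CLASS</Service>\n'
    '  </Package>\n'
    '</RateV3Request>\n'
)
clean = compact_xml(dirty)

# "XML" is a hypothetical parameter name used only for illustration.
query = urlencode({"XML": clean})
```

Note that remove_blank_text only drops text that consists entirely of whitespace; the space inside FIRST CLASS is real content and survives.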