Python: unicode_literals, StringIO, and the right way to do things


Yes, this is a rambling Unicode question.

I have this code snippet:

```python
from __future__ import unicode_literals

from io import StringIO  # text buffer; accepts the unicode that resp.text returns

import requests
from lxml import etree


class Review(object):
    def __init__(self, site_name):
        self.parser = etree.HTMLParser()
        # other things

    def get_root(self, url):
        # snip snip
        resp = requests.get(url)
        html = resp.text  # requests has already decoded the body to unicode
        root = etree.parse(StringIO(html), self.parser)
        return root
```

That works.

In Python 3, I do something like this:

```python
from io import StringIO
from urllib import request

# (stuff to detect the encoding of the page: sets req and detected_encoding)
response = request.urlopen(req)
html = response.read().decode(detected_encoding)  # bytes -> str
root = etree.parse(StringIO(html), self.parser)
```

with a lot of ugly code to handle the case where the page's declared encoding isn't its actual encoding.
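One common way to tame the "declared encoding is wrong" case is to try a short list of candidate encodings in order. This is only a sketch under assumptions: the function name decode_html and the fallback order (declared charset, then UTF-8, then cp1252, then Latin-1) are illustrative choices, not something from the question. Latin-1 comes last because it maps every byte value to a character, so it never raises.

```python
from typing import Optional


def decode_html(raw: bytes, declared: Optional[str] = None) -> str:
    """Decode raw page bytes, trying the declared charset first.

    Hypothetical helper: the candidate order is an assumption, not a standard.
    """
    candidates = []
    if declared:
        candidates.append(declared)
    candidates.extend(["utf-8", "cp1252"])
    for encoding in candidates:
        try:
            return raw.decode(encoding)
        except (UnicodeDecodeError, LookupError):
            # Wrong guess (undecodable bytes) or unknown codec name: try the next one.
            continue
    # Latin-1 maps all 256 byte values, so this final decode cannot fail.
    return raw.decode("latin-1")
```

In the Python 3 snippet, something like html = decode_html(response.read(), response.headers.get_content_charset()) could stand in for the bare .decode(detected_encoding) call; get_content_charset() returns None when the Content-Type header carries no charset.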

My issue is that unicode_literals is voodoo to me, and I'm embarrassed by my ignorance. Why does root = etree.parse(StringIO(html), self.parser) magically work most of the time when unicode_literals is imported, and what is the actually right thing to do in Python 2.7?

For example, I have this construct in some Django code that I'm fixing:

```python
stuff = StringIO(unicode(request.body))
```

and it's bad and wrong. I can't explain why it's bad and wrong, except that it breaks on many encodings that are not UTF-8.
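To see why it breaks: in Python 2.7, unicode(some_bytes) with no encoding argument decodes with the default codec, which is ASCII unless it has been changed, so any byte above 0x7F raises UnicodeDecodeError. The sketch below reproduces that in Python 3 by calling .decode("ascii") explicitly, since Python 3 has no implicit bytes-to-text conversion. The function name py2_style_unicode and the sample payloads are made up for illustration.

```python
def py2_style_unicode(body: bytes) -> str:
    # Mimics Python 2's unicode(body): decode with the default ASCII codec.
    return body.decode("ascii")


ascii_body = b"name=bob"
utf8_body = "name=José".encode("utf-8")  # hypothetical form payload

print(py2_style_unicode(ascii_body))  # plain ASCII decodes fine
try:
    py2_style_unicode(utf8_body)
except UnicodeDecodeError as exc:
    # The UTF-8 bytes for "é" are above 0x7F, so the ASCII codec rejects them.
    print("fails:", exc.reason)
```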

I know that in Python 3 strings are, well, strings with an encoding, while in Python 2.7 they are ASCII. StringIO lets me treat a string as a buffer. I also know that stuff = StringIO(unicode(request.body)) sorta/kinda works with unicode_literals imported, but I don't know why. That means I don't know the right thing to do to avoid writing a lot of ugly code to detect the encoding of Django's request.body, which is why I'm posting this.
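On the buffer part: io.StringIO (in Python 3, and also in Python 2.7's io module) holds text only, while io.BytesIO holds bytes only, so you have to decide which side of the decode you are on before wrapping the data. A small sketch with a made-up payload, run under Python 3:

```python
import io

raw = "héllo".encode("utf-8")  # bytes, as they arrive off the wire

byte_buf = io.BytesIO(raw)                   # bytes in, bytes out
text_buf = io.StringIO(raw.decode("utf-8"))  # decode first, then text in/out

print(byte_buf.read())  # b'h\xc3\xa9llo'
print(text_buf.read())  # héllo

try:
    io.StringIO(raw)  # passing bytes to a text buffer is rejected
except TypeError as exc:
    print("TypeError:", exc)
```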

tl;dr

What does unicode_literals do in Python 2.7, how do I fix the Django error in stuff = StringIO(unicode(request.body)), and what are the side effects?

Many thanks.

The unicode_literals import does not affect the code StringIO(unicode(request.body)). It only changes the type of string literals that don't use a prefix in Python 2.

Without unicode_literals:

```python
u'y'  # unicode string
b'z'  # byte string
'x'   # byte string
```

With unicode_literals:

```python
from __future__ import unicode_literals

u'y'  # unicode string
b'z'  # byte string
'x'   # *unicode* string
```

When you use unicode_literals, you get the same behaviour as in Python 3.3+ (you couldn't use u'' in Python 3.0 to 3.2).
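Because Python 3.3+ already behaves like Python 2 with unicode_literals, you can check the literal types there directly. This sketch assumes Python 3, where str is the unicode text type and bytes is the byte string type:

```python
# In Python 3 (and Python 2 with unicode_literals), unprefixed literals are text.
literals = {
    "u'y'": u'y',
    "b'z'": b'z',
    "'x'": 'x',
}
for source, value in literals.items():
    print(source, "->", type(value).__name__)
```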

The correct way to convert request.body from a byte string to a unicode string is to specify the encoding explicitly when decoding it.

```python
stuff = StringIO(request.body.decode('utf-8'))
```

If the encoding is not UTF-8, change the encoding to match.
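A sketch of "change the encoding" without hard-coding UTF-8: decode with the charset the request declared, if any, and fall back to UTF-8 otherwise. The helper name body_to_stream is hypothetical, and the plain bytes/charset arguments stand in for a real Django request. In Django, the HttpRequest.encoding attribute (None means settings.DEFAULT_CHARSET applies) is one likely source for that charset, but check how your Django version populates it.

```python
import io
from typing import Optional


def body_to_stream(body: bytes, charset: Optional[str] = None,
                   default: str = "utf-8") -> io.StringIO:
    """Decode a raw request body and wrap it in a text buffer.

    Hypothetical helper: `charset` is whatever the request declared (or None),
    and `default` is used when nothing was declared.
    """
    text = body.decode(charset or default)
    return io.StringIO(text)


# Usage with made-up payloads standing in for request.body:
utf8_stuff = body_to_stream("señal".encode("utf-8"))
latin1_stuff = body_to_stream("señal".encode("latin-1"), charset="latin-1")
print(utf8_stuff.read(), latin1_stuff.read())
```

In a view this might read stuff = body_to_stream(request.body, request.encoding). Decoding stays strict on purpose, so a wrong charset still raises UnicodeDecodeError instead of silently producing mangled text.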

