Yes, this is a rambling Unicode question.
I have this code snippet:
    from __future__ import unicode_literals
    from io import StringIO  # assumed; the original snippet does not show this import

    import requests
    from lxml import etree


    class Review(object):
        def __init__(self, site_name):
            self.parser = etree.HTMLParser()
            # other things

        def get_root(self, url):
            # snip snip
            resp = requests.get(url)
            html = resp.text  # requests has already decoded this to unicode
            root = etree.parse(StringIO(html), self.parser)
            return root
That works.
In Python 3, I do it like this:
    from urllib import request

    # stuff to detect the encoding of the page
    response = request.urlopen(req)
    html = response.read().decode(detected_encoding)
    root = etree.parse(StringIO(self.html_doc), self.parser)
with a lot of ugly code to handle the case where the page's declared encoding isn't its actual encoding.
My issue is that unicode_literals is voodoo to me, and I'm embarrassed by my ignorance. Why does root = etree.parse(StringIO(html), self.parser) magically work most of the time when unicode_literals is imported, and what is the actual right thing to do in Python 2.7?
For example, I have this construct in some Django code that I am fixing:

    stuff = StringIO(unicode(request.body))

and it is bad and wrong. I can't explain why it is bad and wrong, except that it breaks on many encodings that are not UTF-8.
I know strings are, well, strings with an encoding in Python 3, and ASCII in Python 2.7. StringIO lets me treat a string as a buffer. And I know that stuff = StringIO(unicode(request.body)) will sorta/kinda work when unicode_literals is imported, but I don't know why. That means I don't know the right thing to do to avoid writing a lot of ugly code to detect the encoding of Django's request.body, which is why I am posting this.
TL;DR
What does unicode_literals do in Python 2.7, will it fix the Django error in stuff = StringIO(unicode(request.body)), and does it have side effects?
Much thanks.
The unicode_literals import does not affect the code StringIO(unicode(request.body)). It only changes the type of string literals that have no prefix in Python 2.
Without unicode_literals:

    u'y'  # unicode string
    b'z'  # byte string
    'x'   # byte string
With unicode_literals:

    from __future__ import unicode_literals

    u'y'  # unicode string
    b'z'  # byte string
    'x'   # *unicode* string
When you use unicode_literals, you get the same behaviour as Python 3.3+ (you couldn't use the u'' prefix in Python 3.0 to 3.2).
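If you want to see that the import is only a compile-time switch for unprefixed literals, here is a small sketch that compiles the same source text with and without the flag. The names FLAG, SOURCE, without_flag and with_flag are made up for this illustration. On Python 2.7 only the type of 'x' should differ between the two runs; on Python 3 both runs should give the same types, since unprefixed literals are already unicode there.

```python
import __future__

# The compiler flag that `from __future__ import unicode_literals` switches on.
FLAG = __future__.unicode_literals.compiler_flag

# Same source text both times: one unprefixed, one u'' and one b'' literal.
SOURCE = "('x', u'y', b'z')"

# dont_inherit=True keeps this module's own future flags out of the comparison.
without_flag = eval(compile(SOURCE, '<demo>', 'eval', 0, True))
with_flag = eval(compile(SOURCE, '<demo>', 'eval', FLAG, True))

# Print the type names of the three literals for each run.
print([type(value).__name__ for value in without_flag])
print([type(value).__name__ for value in with_flag])
```

Nothing in there touches a call such as unicode(request.body): that is a runtime conversion of a value, not a literal.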
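The correct way to convert request.body from a byte string to a unicode string is to specify the encoding explicitly when you decode it. Without one, unicode() in Python 2 falls back to the default encoding, which is normally ASCII, so any non-ASCII byte raises an error.

    stuff = StringIO(request.body.decode('utf-8'))

If the encoding is not UTF-8, change the encoding to match.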
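As a rough sketch of what that could look like in practice, assuming you get the declared charset from somewhere (for example the request's Content-Type header): body_to_text_stream and its charset parameter are hypothetical names, not part of Django, and the fallback to UTF-8 with replacement characters is just one possible policy.

```python
import io


def body_to_text_stream(body, charset=None):
    """Decode a byte string explicitly and wrap the text in a buffer.

    `charset` is the encoding the client declared, if any (hypothetical
    parameter); UTF-8 is only this sketch's default.
    """
    encoding = charset or 'utf-8'
    try:
        text = body.decode(encoding)
    except (UnicodeDecodeError, LookupError):
        # Wrong or unknown charset: decode as UTF-8 and mark bad bytes
        # with U+FFFD instead of raising.
        text = body.decode('utf-8', 'replace')
    return io.StringIO(text)


# A Latin-1 encoded body: valid neither as ASCII nor as UTF-8.
body = u'caf\xe9'.encode('latin-1')

# Roughly what unicode(body) does on Python 2: an implicit ASCII decode.
try:
    body.decode('ascii')
except UnicodeDecodeError as exc:
    print('ascii decode failed: %s' % exc.reason)

# With the right charset the text round-trips.
print(body_to_text_stream(body, 'latin-1').read() == u'caf\xe9')
```

Whether replacing undecodable bytes is acceptable depends on what you do with the text afterwards; for some uses, letting the error propagate is the safer choice.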