python - unicode().decode('utf-8', 'ignore') raising UnicodeEncodeError
Here's the code:
>>> z = u'\u2022'.decode('utf-8', 'ignore')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib/python2.6/encodings/utf_8.py", line 16, in decode
    return codecs.utf_8_decode(input, errors, True)
UnicodeEncodeError: 'latin-1' codec can't encode character u'\u2022' in position 0: ordinal not in range(256)
Why is a UnicodeEncodeError raised when using .decode?
Why is an error still raised when 'ignore' is passed?
When I first started messing around with Python strings and Unicode, it took me a while to understand the jargon of decode and encode too, so here's a post that may help:
Think of decoding as going from a regular bytestring to unicode, and encoding as going from unicode. In other words:

you de-code a str to produce a unicode string

and en-code a unicode string to produce a str.
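That round trip can be sketched as follows (a minimal example; the b'...' prefix makes it a plain str under Python 2.6+ and bytes under Python 3):

```python
# The UTF-8 bytes for the bullet character u'\u2022'.
byte_string = b'\xe2\x80\xa2'

# de-code the bytestring to produce a unicode string
unicode_string = byte_string.decode('utf-8')

# en-code the unicode string to produce a bytestring again
round_trip = unicode_string.encode('utf-8')
assert round_trip == byte_string
```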
So:

>>> unicode_char = u'\xb0'
>>> encodedchar = unicode_char.encode('utf-8')

encodedchar will contain the unicode character, displayed in the selected encoding (in this case, utf-8).
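This is also why the traceback above is a UnicodeEncodeError: in Python 2, calling .decode on an object that is already unicode first implicitly encodes it to bytes with the default codec, and it is that hidden encode step that fails. The 'ignore' argument only applies to the decode itself, so it never gets a chance to help. A minimal guard (ensure_unicode is a hypothetical helper name, not from the original post) avoids the problem by decoding only actual bytestrings:

```python
def ensure_unicode(value, encoding='utf-8'):
    """Hypothetical guard: decode only real bytestrings.

    Values that are already unicode are returned unchanged, so the
    implicit encode step that raises UnicodeEncodeError never runs.
    """
    if isinstance(value, bytes):  # Python 2 str / Python 3 bytes
        return value.decode(encoding, 'ignore')
    return value

# Both calls yield the same unicode bullet character.
assert ensure_unicode(u'\u2022') == u'\u2022'
assert ensure_unicode(b'\xe2\x80\xa2') == u'\u2022'
```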