Python encoding and system encoding issues.

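Closing out this thread.
I had never looked closely at encodings or at the LC_* variables set via export, so last night I Googled the whole topic thoroughly.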

1. Check the interpreter's default encoding with sys.getdefaultencoding(); on Python 2 this is usually ascii.

2. In a terminal, get the input and output encodings from sys.stdin.encoding and sys.stdout.encoding. Normally both should be utf-8; to set them, use export PYTHONIOENCODING=UTF-8

3. u'中文' == '中文'.decode(encoding)
Here encoding is the value of sys.stdin.encoding.
So when it is utf-8: '中文'.decode('utf-8') == u'\u4e2d\u6587'
When it is ASCII: '中文'.decode('ISO-8859-1') == u'\xe4\xb8\xad\xe6\x96\x87'

4. os.path.exists(path): when path contains Chinese characters, try to convert that part to utf-8 before concatenating it with the English parts of the path.

5. When printing, prefer to call encode('utf-8') on the output first.
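
A minimal sketch of points 1 to 5 above. The notes describe Python 2, where a plain '中文' literal is a byte string; this sketch assumes Python 3 and spells the bytes out explicitly so that it behaves the same under any locale. The helper names report_encodings, to_text and write_utf8 are made up for illustration, and the file name notes.txt is hypothetical.

```python
import io
import os
import sys


def report_encodings():
    """Collect the encodings mentioned in points 1 and 2."""
    return {
        "default": sys.getdefaultencoding(),
        # .encoding can be missing or None when a stream has been replaced
        "stdin": getattr(sys.stdin, "encoding", None),
        "stdout": getattr(sys.stdout, "encoding", None),
    }


# Point 3: the UTF-8 bytes of "中文", as a Python 2 str literal would hold them
RAW = b"\xe4\xb8\xad\xe6\x96\x87"
AS_UTF8 = RAW.decode("utf-8")         # two characters: U+4E2D U+6587
AS_LATIN1 = RAW.decode("iso-8859-1")  # six characters, one per byte


def to_text(value, encoding="utf-8"):
    """Point 4: decode byte strings so every path component is text."""
    return value.decode(encoding) if isinstance(value, bytes) else value


def write_utf8(text, stream):
    """Point 5: encode explicitly instead of relying on the stream's encoding."""
    stream.write(text.encode("utf-8") + b"\n")


if __name__ == "__main__":
    print(report_encodings())
    print(len(AS_UTF8), len(AS_LATIN1))
    # Hypothetical path: a byte-string directory joined with a text file name
    path = os.path.join(to_text(RAW), to_text("notes.txt"))
    print(os.path.exists(path))
    # A BytesIO stands in for a binary stream such as sys.stdout.buffer
    buf = io.BytesIO()
    write_utf8(AS_UTF8, buf)
    print(buf.getvalue())
```

On Python 3, sys.getdefaultencoding() reports utf-8 rather than ascii, and mixing bytes and text in os.path.join raises a TypeError instead of triggering an implicit ascii decode, which is why to_text normalizes each component first.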

https://wiki.archlinux.org/index.php/Locale_(简体中文)

https://segmentfault.com/a/1190000004357933

http://www.w2bc.com/article/216391

http://stackoverflow.com/questions/2596714/why-does-python-print-unicode-characters-when-the-default-encoding-is-ascii

http://blog.csdn.net/liuyukuan/article/details/50855748


8 replies • 2017-04-16 04:19:51 +08:00

zhihaofans

2017-04-15 00:15:18 +08:00 via iPhone

→python3

coolair

2017-04-15 00:37:14 +08:00 via Android

.decode('gbk')

2017-04-15 00:54:17 +08:00

I think the problem is in the value you are feeding in. Otherwise, what does len(i) give you?

wwqgtxx

2017-04-15 11:11:18 +08:00 via iPhone

Just switch to python3 already and stop fighting with encoding issues.

SuT2i

2017-04-15 21:13:59 +08:00

Python3 doesn't have these problems.

dant

2017-04-15 23:41:13 +08:00

With LC_ALL=C, Python does not know what encoding the literal you typed is in, so it defaults to ISO-8859-1.
When you then encode, the conversion follows the ISO-8859-1 -> UTF-8 rules.

dant

2017-04-15 23:45:51 +08:00

A correction: when u'呵呵' is parsed, the UTF-8 representation of "呵呵" (E5 91 B5 E5 91 B5) is treated as ISO-8859-1 and converted into the Unicode code point sequence (U+00E5 U+0091 U+00B5 U+00E5 U+0091 U+00B5).
When you then encode, it is that code point sequence that gets encoded as UTF-8.
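
The correction above can be reproduced step by step. This is a sketch assuming Python 3, where the mis-decoding has to be done by hand instead of happening while the literal is parsed; the variable names are illustrative only.

```python
text = "\u5475\u5475"                     # "呵呵"
utf8_bytes = text.encode("utf-8")         # E5 91 B5 E5 91 B5

# Misread each UTF-8 byte as one ISO-8859-1 character
mojibake = utf8_bytes.decode("iso-8859-1")
code_points = ["U+%04X" % ord(ch) for ch in mojibake]

# Encoding that sequence as UTF-8 doubles the length: every byte >= 0x80
# turns into a two-byte sequence starting with C2 or C3
double_encoded = mojibake.encode("utf-8")

# The damage is reversible as long as no bytes were dropped along the way
repaired = double_encoded.decode("utf-8").encode("iso-8859-1").decode("utf-8")

print(code_points)
print(double_encoded.hex())
```

The extra C2 and C3 lead bytes in the double-encoded output are the usual sign that UTF-8 data was decoded as ISO-8859-1 somewhere upstream.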

lzjun

2017-04-16 04:19:51 +08:00

For encoding issues, see: https://foofish.net/why-python-encoding-is-tricky.html