ICode9

精准搜索请尝试: 精确搜索
首页 > 其他分享> 文章详细

ValueError: All strings must be XML compatible: Unicode or ASCII, no NULL bytes

2021-07-15 15:02:20  阅读:1065  来源: 互联网

标签:XML lxml etree no py bytes File line html5lib


出现这个错,是因为编码的问题。

Traceback (most recent call last):
  File "/tmp/a.py", line 4, in <module>
    html5lib.parse('<p>&#1;', treebuilder='lxml')
  File "/home/simon/.virtualenvs/weasyprint/lib/python3.3/site-packages/html5lib/html5parser.py", line 28, in parse
    return p.parse(doc, encoding=encoding)
  File "/home/simon/.virtualenvs/weasyprint/lib/python3.3/site-packages/html5lib/html5parser.py", line 224, in parse
    parseMeta=parseMeta, useChardet=useChardet)
  File "/home/simon/.virtualenvs/weasyprint/lib/python3.3/site-packages/html5lib/html5parser.py", line 93, in _parse
    self.mainLoop()
  File "/home/simon/.virtualenvs/weasyprint/lib/python3.3/site-packages/html5lib/html5parser.py", line 183, in mainLoop
    new_token = phase.processCharacters(new_token)
  File "/home/simon/.virtualenvs/weasyprint/lib/python3.3/site-packages/html5lib/html5parser.py", line 991, in processCharacters
    self.tree.insertText(token["data"])
  File "/home/simon/.virtualenvs/weasyprint/lib/python3.3/site-packages/html5lib/treebuilders/_base.py", line 320, in insertText
    parent.insertText(data)
  File "/home/simon/.virtualenvs/weasyprint/lib/python3.3/site-packages/html5lib/treebuilders/etree_lxml.py", line 240, in insertText
    builder.Element.insertText(self, data, insertBefore)
  File "/home/simon/.virtualenvs/weasyprint/lib/python3.3/site-packages/html5lib/treebuilders/etree.py", line 108, in insertText
    self._element.text += data
  File "lxml.etree.pyx", line 921, in lxml.etree._Element.text.__set__ (src/lxml/lxml.etree.c:41467)
  File "apihelpers.pxi", line 652, in lxml.etree._setNodeText (src/lxml/lxml.etree.c:18888)
  File "apihelpers.pxi", line 1335, in lxml.etree._utf8 (src/lxml/lxml.etree.c:24701)
ValueError: All strings must be XML compatible: Unicode or ASCII, no NULL bytes or control characters

再生成文档过程中,突然间发现出现此错误。本来想着是通过改变编码的方式,来解决这类问题,如下所示:

p = document.add_paragraph(u"哈哈 ")
或者是:
p = document.add_paragraph(p.encode('utf-8').decode("utf-8")) 

但是我使用了上述的两种方法,错误仍然存在,后面就用了替换的方法,解决了眼前的错误(虽然目前妥协了,但是后面如果发现又更好的解决方式,会再来更新的):

s = re.sub(u"[\\x00-\\x08\\x0b\\x0e-\\x1f\\x7f]", "", s)
p = self.doc.add_paragraph(s)

 

标签:XML,lxml,etree,no,py,bytes,File,line,html5lib
来源: https://www.cnblogs.com/lxz123/p/15015557.html

本站声明: 1. iCode9 技术分享网(下文简称本站)提供的所有内容,仅供技术学习、探讨和分享;
2. 关于本站的所有留言、评论、转载及引用,纯属内容发起人的个人观点,与本站观点和立场无关;
3. 关于本站的所有言论和文字,纯属内容发起人的个人观点,与本站观点和立场无关;
4. 本站文章均是网友提供,不完全保证技术分享内容的完整性、准确性、时效性、风险性和版权归属;如您发现该文章侵犯了您的权益,可联系我们第一时间进行删除;
5. 本站为非盈利性的个人网站,所有内容不会用来进行牟利,也不会利用任何形式的广告来间接获益,纯粹是为了广大技术爱好者提供技术内容和技术思想的分享性交流网站。

专注分享技术,共同学习,共同进步。侵权联系[81616952@qq.com]

Copyright (C)ICode9.com, All Rights Reserved.

ICode9版权所有