How do I get BeautifulSoup 4 to respect a self-closing tag?
- Author: 丶InMyHeart
- Source: 51数据库
- 2022-10-28
Question
This question is specific to BeautifulSoup4, which makes it different from these earlier questions:
- Why is BeautifulSoup modifying my self-closing elements?
- selfClosingTags in BeautifulSoup
Since BeautifulStoneSoup (the previous XML parser) is gone, how can I get bs4 to respect a new self-closing tag? For example:
import bs4
S = '''<foo> <bar a="3"/> </foo>'''
soup = bs4.BeautifulSoup(S, selfClosingTags=['bar'])
print(soup.prettify())
This does not self-close the bar tag, but it does emit a warning. What is the tree builder that bs4 is referring to, and how do I self-close the tag?
/usr/local/lib/python2.7/dist-packages/bs4/__init__.py:112: UserWarning: BS4 does not respect the selfClosingTags argument to the BeautifulSoup constructor. The tree builder is responsible for understanding self-closing tags.
  "BS4 does not respect the selfClosingTags argument to the "
<html>
 <body>
  <foo>
   <bar a="3">
   </bar>
  </foo>
 </body>
</html>
Recommended answer
To parse XML, pass "xml" as the second argument to the BeautifulSoup constructor.
soup = bs4.BeautifulSoup(S, 'xml')
You will need to have lxml installed.
You no longer need to pass selfClosingTags:
In [1]: import bs4

In [2]: S = '''<foo> <bar a="3"/> </foo>'''

In [3]: soup = bs4.BeautifulSoup(S, 'xml')

In [4]: print(soup.prettify())
<?xml version="1.0" encoding="utf-8"?>
<foo>
 <bar a="3"/>
</foo>
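As a rough illustration of what "the tree builder is responsible" means, the sketch below parses the same markup with two different builders and compares how each serializes bar. It assumes Python 3 and a reasonably recent bs4. The helper name render_with and the choice of "html.parser" as the HTML builder are illustrative assumptions, not part of the original answer. The "xml" builder needs lxml, so the sketch catches bs4's FeatureNotFound exception instead of assuming lxml is present.

```python
from bs4 import BeautifulSoup, FeatureNotFound

S = '''<foo> <bar a="3"/> </foo>'''


def render_with(markup, features):
    """Hypothetical helper: parse `markup` with the requested tree builder
    and return the serialized document, or None if no installed parser
    provides that feature (e.g. "xml" without lxml)."""
    try:
        soup = BeautifulSoup(markup, features)
    except FeatureNotFound:
        return None
    return str(soup)


# An HTML builder only treats known void elements (br, img, ...) as
# self-closing, so an unknown tag such as <bar> gets a separate end tag.
html_out = render_with(S, "html.parser")

# The XML builder lets any element without children be written as
# self-closing.
xml_out = render_with(S, "xml")

print("html.parser:", html_out)
if xml_out is None:
    print("xml: unavailable (is lxml installed?)")
else:
    print("xml:", xml_out)
```

The point of the comparison is that self-closing behaviour is a property of the parser chosen in the second constructor argument, which is why the old selfClosingTags keyword no longer has any effect.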
