Python--xml模块

首页 > 代码库 > Python--xml模块

2024-10-03 07:47:39 218人阅读

XML是实现不同语言或程序之间进行数据交换的协议,XML文件格式如下

读xml文件

<data>
    <country name="Liechtenstein">
        <rank updated="yes">2</rank>
        <year>2023</year>
        <gdppc>141100</gdppc>
        <neighbor direction="E" name="Austria" />
        <neighbor direction="W" name="Switzerland" />
    </country>
    <country name="Singapore">
        <rank updated="yes">5</rank>
        <year>2026</year>
        <gdppc>59900</gdppc>
        <neighbor direction="N" name="Malaysia" />
    </country>
    <country name="Panama">
        <rank updated="yes">69</rank>
        <year>2026</year>
        <gdppc>13600</gdppc>
        <neighbor direction="W" name="Costa Rica" />
        <neighbor direction="E" name="Colombia" />
    </country>
</data>

from xml.etree import ElementTree #导入xml处理模块

XML()模块函数

功能：解析字符串形式的xml，返回的xml的最外层标签节点，也就是一级标签节点【有参】

使用方法：模块名称.XML(xml字符串变量)

格式如：a = ElementTree.XML(neir)

text模块关键字

功能：获取标签里的字符串

使用方法：要获取字符串的标签节点变量.text

格式如：b = a.text

http请求处理xmlQQ在线状态

#!/usr/bin/env python
# -*- coding:utf8 -*-
"""http请求处理xmlQQ在线状态"""
import requests #导入http请求模块
from xml.etree import ElementTree #导入xml处理模块
http =requests.get("http://www.webxml.com.cn//webservices/qqOnlineWebService.asmx/qqCheckOnline?qqCode=729088188") #发送http请求
http.encoding = "utf-8" #http请求编码
neir = http.text #获取http请求的xml字符串代码
print(neir) #打印获取到http请求的xml字符串代码
print("\n")

a = ElementTree.XML(neir) #解析xml，返回的xml的最外层标签节点，也就是一级标签节点
print(a) #打印xml的最外层标签节点，也就是一级标签节点
print("\n")

b = a.text #获取节点标签里的字符串
print(b) #可以根据这个标签的字符串来判断QQ是否在线
#输出
# <?xml version="1.0" encoding="utf-8"?>
# <string xmlns="http://WebXml.com.cn/">V</string>

# <Element ‘{http://WebXml.com.cn/}string‘ at 0x0000005692820548>

# V

注意：返回以Element开头的为标签节点如：<Element ‘{http://WebXml.com.cn/}DataSet‘ at 0x0000008C179C0548>

iter()模块函数

功能：获取一级标签节点下的，多个同名同级的标签节点，可跨级的获取节点，返回迭代节点，需要for循环出标签【有参】

使用方法：解析xml节点变量.iter("要获取的标签名称")

格式如：b = a.iter("TrainDetailInfo")

#!/usr/bin/env python
# -*- coding:utf8 -*-
"""http请求处理列车时刻表"""
import requests #导入http请求模块
from xml.etree import ElementTree #导入xml处理模块
http =requests.get("http://www.webxml.com.cn/WebServices/TrainTimeWebService.asmx/getDetailInfoByTrainCode?TrainCode=k234&UserID=") #发送http请求
http.encoding = "utf-8" #http请求编码
neir = http.text #获取http请求的xml字符串代码

a = ElementTree.XML(neir) #解析xml，返回的xml的最外层标签节点，也就是一级标签节点
print(a) #打印xml的最外层标签节点，也就是一级标签节点
print("\n")

b = a.iter("TrainDetailInfo") #获取一级标签下的，多个同名同级的标签节点，返回迭代节点，需要for循环出标签节点
for i in b:
    print(i) #循环出所有的TrainDetailInfo标签节点
# 输出
# <Element ‘{http://WebXml.com.cn/}DataSet‘ at 0x000000B52FC7F548>
#
#
# <Element ‘TrainDetailInfo‘ at 0x000000B52FC7FB88>
# <Element ‘TrainDetailInfo‘ at 0x000000B52FC7FD18>
# <Element ‘TrainDetailInfo‘ at 0x000000B52FC7FEA8>
# <Element ‘TrainDetailInfo‘ at 0x000000B52FC98098>
# <Element ‘TrainDetailInfo‘ at 0x000000B52FC98228>
# <Element ‘TrainDetailInfo‘ at 0x000000B52FC983B8>
# <Element ‘TrainDetailInfo‘ at 0x000000B52FC98548>
# <Element ‘TrainDetailInfo‘ at 0x000000B52FC986D8>
# <Element ‘TrainDetailInfo‘ at 0x000000B52FC98868>
# <Element ‘TrainDetailInfo‘ at 0x000000B52FC989F8>
# <Element ‘TrainDetailInfo‘ at 0x000000B52FC98B88>
# <Element ‘TrainDetailInfo‘ at 0x000000B52FC98D18>
# <Element ‘TrainDetailInfo‘ at 0x000000B52FC98EA8>
# <Element ‘TrainDetailInfo‘ at 0x000000B52FC9B098>
# <Element ‘TrainDetailInfo‘ at 0x000000B52FC9B228>
# <Element ‘TrainDetailInfo‘ at 0x000000B52FC9B3B8>
# <Element ‘TrainDetailInfo‘ at 0x000000B52FC9B548>
# <Element ‘TrainDetailInfo‘ at 0x000000B52FC9B6D8>

tag模块关键字

功能：获取标签的名称,返回标签名称

使用方法：标签节点变量.tag

格式如：i.tag

attrib模块关键字

功能：获取标签的属性，以字典形式返回标签属性

使用方法：标签节点变量.attrib

格式如：i.attrib

#!/usr/bin/env python
# -*- coding:utf8 -*-
"""http请求处理列车时刻表"""
import requests #导入http请求模块
from xml.etree import ElementTree #导入xml处理模块
http =requests.get("http://www.webxml.com.cn/WebServices/TrainTimeWebService.asmx/getDetailInfoByTrainCode?TrainCode=k234&UserID=") #发送http请求
http.encoding = "utf-8" #http请求编码
neir = http.text #获取http请求的xml字符串代码

a = ElementTree.XML(neir) #解析xml，返回的xml的最外层标签节点，也就是一级标签节点
print(a) #打印xml的最外层标签节点，也就是一级标签节点
print("\n")

b = a.iter("TrainDetailInfo") #获取一级标签下的，多个同名同级的标签节点，返回迭代节点，需要for循环出标签节点
for i in b:
    print(i.tag,i.attrib) #tag获取标签的名称，attrib获取标签的属性
# 输出
# <Element ‘{http://WebXml.com.cn/}DataSet‘ at 0x0000008C179C0548>
#
#
# TrainDetailInfo {‘{urn:schemas-microsoft-com:xml-msdata}rowOrder‘: ‘0‘, ‘{urn:schemas-microsoft-com:xml-diffgram-v1}id‘: ‘TrainDetailInfo1‘, ‘{urn:schemas-microsoft-com:xml-diffgram-v1}hasChanges‘: ‘inserted‘}
# TrainDetailInfo {‘{urn:schemas-microsoft-com:xml-msdata}rowOrder‘: ‘1‘, ‘{urn:schemas-microsoft-com:xml-diffgram-v1}id‘: ‘TrainDetailInfo2‘, ‘{urn:schemas-microsoft-com:xml-diffgram-v1}hasChanges‘: ‘inserted‘}
# TrainDetailInfo {‘{urn:schemas-microsoft-com:xml-msdata}rowOrder‘: ‘2‘, ‘{urn:schemas-microsoft-com:xml-diffgram-v1}id‘: ‘TrainDetailInfo3‘, ‘{urn:schemas-microsoft-com:xml-diffgram-v1}hasChanges‘: ‘inserted‘}
# TrainDetailInfo {‘{urn:schemas-microsoft-com:xml-msdata}rowOrder‘: ‘3‘, ‘{urn:schemas-microsoft-com:xml-diffgram-v1}id‘: ‘TrainDetailInfo4‘, ‘{urn:schemas-microsoft-com:xml-diffgram-v1}hasChanges‘: ‘inserted‘}
# TrainDetailInfo {‘{urn:schemas-microsoft-com:xml-msdata}rowOrder‘: ‘4‘, ‘{urn:schemas-microsoft-com:xml-diffgram-v1}id‘: ‘TrainDetailInfo5‘, ‘{urn:schemas-microsoft-com:xml-diffgram-v1}hasChanges‘: ‘inserted‘}
# TrainDetailInfo {‘{urn:schemas-microsoft-com:xml-msdata}rowOrder‘: ‘5‘, ‘{urn:schemas-microsoft-com:xml-diffgram-v1}id‘: ‘TrainDetailInfo6‘, ‘{urn:schemas-microsoft-com:xml-diffgram-v1}hasChanges‘: ‘inserted‘}
# TrainDetailInfo {‘{urn:schemas-microsoft-com:xml-msdata}rowOrder‘: ‘6‘, ‘{urn:schemas-microsoft-com:xml-diffgram-v1}id‘: ‘TrainDetailInfo7‘, ‘{urn:schemas-microsoft-com:xml-diffgram-v1}hasChanges‘: ‘inserted‘}
# TrainDetailInfo {‘{urn:schemas-microsoft-com:xml-msdata}rowOrder‘: ‘7‘, ‘{urn:schemas-microsoft-com:xml-diffgram-v1}id‘: ‘TrainDetailInfo8‘, ‘{urn:schemas-microsoft-com:xml-diffgram-v1}hasChanges‘: ‘inserted‘}
# TrainDetailInfo {‘{urn:schemas-microsoft-com:xml-msdata}rowOrder‘: ‘8‘, ‘{urn:schemas-microsoft-com:xml-diffgram-v1}id‘: ‘TrainDetailInfo9‘, ‘{urn:schemas-microsoft-com:xml-diffgram-v1}hasChanges‘: ‘inserted‘}
# TrainDetailInfo {‘{urn:schemas-microsoft-com:xml-msdata}rowOrder‘: ‘9‘, ‘{urn:schemas-microsoft-com:xml-diffgram-v1}id‘: ‘TrainDetailInfo10‘, ‘{urn:schemas-microsoft-com:xml-diffgram-v1}hasChanges‘: ‘inserted‘}

find()模块函数

功能：查找一个标签节点下的子标签节点，返回子标签节点【有参】

使用方法：父标签节点变量.find("要查找的子标签名称")

格式如：i.find("TrainStation")

#!/usr/bin/env python
# -*- coding:utf8 -*-
"""http请求处理xmll列出时刻表"""
import requests #导入http请求模块
from xml.etree import ElementTree #导入xml处理模块
http =requests.get("http://www.webxml.com.cn/WebServices/TrainTimeWebService.asmx/getDetailInfoByTrainCode?TrainCode=k234&UserID=") #发送http请求
http.encoding = "utf-8" #http请求编码
neir = http.text #获取http请求的xml字符串代码

a = ElementTree.XML(neir) #解析xml，返回的xml的最外层标签节点，也就是一级标签节点
print(a) #打印xml的最外层标签节点，也就是一级标签节点
print("\n")

b = a.iter("TrainDetailInfo") #获取一级标签下的，多个同名同级的标签节点，返回迭代句柄，需要for循环出标签节点
for i in b:
    print(i.find("TrainStation")) #find()查找一个标签节点下的子标签节点
# 输出
# <Element ‘{http://WebXml.com.cn/}DataSet‘ at 0x000000BF7CFC0548>
#
#
# <Element ‘TrainStation‘ at 0x000000BF7CFC0BD8>
# <Element ‘TrainStation‘ at 0x000000BF7CFC0D68>
# <Element ‘TrainStation‘ at 0x000000BF7CFC0EF8>
# <Element ‘TrainStation‘ at 0x000000BF7CFD90E8>
# <Element ‘TrainStation‘ at 0x000000BF7CFD9278>
# <Element ‘TrainStation‘ at 0x000000BF7CFD9408>
# <Element ‘TrainStation‘ at 0x000000BF7CFD9598>
# <Element ‘TrainStation‘ at 0x000000BF7CFD9728>
# <Element ‘TrainStation‘ at 0x000000BF7CFD98B8>
# <Element ‘TrainStation‘ at 0x000000BF7CFD9A48>
# <Element ‘TrainStation‘ at 0x000000BF7CFD9BD8>
# <Element ‘TrainStation‘ at 0x000000BF7CFD9D68>
# <Element ‘TrainStation‘ at 0x000000BF7CFD9EF8>
# <Element ‘TrainStation‘ at 0x000000BF7CFDC0E8>
# <Element ‘TrainStation‘ at 0x000000BF7CFDC278>
# <Element ‘TrainStation‘ at 0x000000BF7CFDC408>
# <Element ‘TrainStation‘ at 0x000000BF7CFDC598>
# <Element ‘TrainStation‘ at 0x000000BF7CFDC728>

拿出标签里需要的数据

#!/usr/bin/env python
# -*- coding:utf8 -*-
"""http请求处理xmlQQ在线状态"""
import requests #导入http请求模块
from xml.etree import ElementTree #导入xml处理模块
http =requests.get("http://www.webxml.com.cn/WebServices/TrainTimeWebService.asmx/getDetailInfoByTrainCode?TrainCode=k567&UserID=") #发送http请求
http.encoding = "utf-8" #http请求编码
neir = http.text #获取http请求的xml字符串代码

a = ElementTree.XML(neir) #解析xml，返回的xml的最外层标签节点，也就是一级标签节点
print(a) #打印xml的最外层标签节点，也就是一级标签节点
print("\n")

b = a.iter("TrainDetailInfo") #获取一级标签下的，多个同名同级的标签节点，返回迭代句柄，需要for循环出标签节点
for i in b:
    print(i.find("TrainStation").text,i.find("ArriveTime").text,i.find("StartTime").text,i.find("KM").text) #获取标签里的字符串
# 输出
# <Element ‘{http://WebXml.com.cn/}DataSet‘ at 0x000000B9C3775408>
#
#
# 天津（车次：k567） None 15:30:00 0
# 唐山 17:06:00 17:11:00 114
# 昌黎 18:19:00 18:22:00 226
# 北戴河 18:43:00 18:46:00 254
# 秦皇岛 19:04:00 19:09:00 275
# 山海关 19:34:00 19:50:00 292
# 绥中 20:54:00 20:58:00 357
# 兴城 21:34:00 21:38:00 405
# 葫芦岛 21:58:00 22:02:00 426
# 锦州 22:47:00 22:53:00 476
# 沟帮子 23:42:00 23:46:00 540
# 沈阳 02:19:00 02:31:00 718
# 四平 05:16:00 05:19:00 906
# 八面城 05:40:00 05:42:00 934
# 双辽 06:28:00 06:33:00 996
# 保康 07:28:00 07:30:00 1076
# 太平川 07:56:00 08:00:00 1110
# 开通 08:33:00 08:36:00 1159
# 洮南 09:21:00 09:24:00 1227
# 白城 09:52:00 10:04:00 1259
# 镇赉 10:31:00 10:34:00 1297
# 泰来 11:22:00 11:24:00 1361
# 江桥 12:02:00 12:06:00 1409
# 三间房 12:51:00 12:53:00 1447
# 齐齐哈尔 13:26:00 None 1477

注意：xml通过XML()解析得到一级节点后，可以用for循环节点，和嵌套循环节点来得到想要的节点【重点】

如下列：

#!/usr/bin/env python
# -*- coding:utf8 -*-
"""
注意：xml通过ElementTree.XML()解析得到一级节点后，可以用for循环节点，和嵌套循环节点来得到想要的节点
如下列：
<data>
    <country name="Liechtenstein">
        <rank updated="yes">2</rank>
        <year>2023</year>
        <gdppc>141100</gdppc>
        <neighbor direction="E" name="Austria" />
        <neighbor direction="W" name="Switzerland" />
    </country>
    <country name="Singapore">
        <rank updated="yes">5</rank>
        <year>2026</year>
        <gdppc>59900</gdppc>
        <neighbor direction="N" name="Malaysia" />
    </country>
    <country name="Panama">
        <rank updated="yes">69</rank>
        <year>2026</year>
        <gdppc>13600</gdppc>
        <neighbor direction="W" name="Costa Rica" />
        <neighbor direction="E" name="Colombia" />
    </country>
</data>
"""
a = open("xml.xml","r",encoding="utf-8") #本地打开一个xml文件
b = a.read() #读出文件内容
from xml.etree import ElementTree #导入xml处理模块
c = ElementTree.XML(b) #解析xml得到，第一个节点，也就是一级节点
for i in c: #循环一级节点里的节点，得到二级节点
    for s in i: #循环二级节点里的节点，得到三级节点下的节点
        print(s) #打印出三级节点
# 输出
# <Element ‘rank‘ at 0x000000A4C3B083B8>
# <Element ‘year‘ at 0x000000A4C3B08408>
# <Element ‘gdppc‘ at 0x000000A4C3B08458>
# <Element ‘neighbor‘ at 0x000000A4C3B084A8>
# <Element ‘neighbor‘ at 0x000000A4C3B084F8>
# <Element ‘rank‘ at 0x000000A4C3B08598>
# <Element ‘year‘ at 0x000000A4C3B085E8>
# <Element ‘gdppc‘ at 0x000000A4C3B08638>
# <Element ‘neighbor‘ at 0x000000A4C3B08688>
# <Element ‘rank‘ at 0x000000A4C3B08728>
# <Element ‘year‘ at 0x000000A4C3B08778>
# <Element ‘gdppc‘ at 0x000000A4C3B087C8>
# <Element ‘neighbor‘ at 0x000000A4C3B08818>
# <Element ‘neighbor‘ at 0x000000A4C3B08868>

重点：推荐iter()跨级获取父点，如iter()无法满足需求考虑用for循环节在配合使用。：for循环节点，配合iter()跨级获取父点，配合find()获取子字典，可以获取到所有你想需要的节点。

解析XML

解析xml有两种方式，一种是字符串方式解析XML()，一种是xml文件直接解析parse()

字符串方式解析XML()

#!/usr/bin/env python
# -*- coding:utf8 -*-
a = open("xml.xml","r",encoding="utf-8") #以utf-8编码只读模式打开
b = a.read() #读出文件里的字符串
a.close()
from xml.etree import ElementTree #导入xml解析模块
c = ElementTree.XML(b) #解析字符串形式的xml，返回的xml的最外层标签节点，也就是一级标签节点
print(c)
# 输出
# <Element ‘data‘ at 0x000000D2756B3228>

xml文件直接解析parse() 注意：parse()方式解析xml是即可读，也可对xml文件写入的，包括，增，删，改

#!/usr/bin/env python
# -*- coding:utf8 -*-
from xml.etree import ElementTree #导入xml解析模块
c = ElementTree.parse("xml.xml") #打开一个xml文件
b = c.getroot() #获取xml文件的一级节点
print(b)
# 输出
# <Element ‘data‘ at 0x0000006C681C3228>

写xml文件包括（增，删，改）

parse()模块函数

功能：打开xml文件解析，直接打开一个xml文件，parse()方式解析xml是即可读，也可对xml文件写入的，包括，增，删，改【有参】

使用方法：模块名称.parse("要打开解析的xml文件路径或名称")

格式如：c = ElementTree.parse("xml.xml")

getroot()模块函数

功能：获取parse()方式解析的xml节点，返回的一级节点，也就是最外层节点

使用方法：打开xml文件解析变量.getroot()

格式如：b = c.getroot()

#!/usr/bin/env python
# -*- coding:utf8 -*-
from xml.etree import ElementTree #导入xml解析模块
c = ElementTree.parse("xml.xml") #打开一个xml文件
b = c.getroot() #获取xml文件的一级节点
print(b)
# 输出
# <Element ‘data‘ at 0x000000F9AF0D3228>

注意：parse()解析xml文件可以读写，只是解析方式不同，其他的获取里面的标签节点，和获取标签里的字符串的方式，都和XML()解析是一样的

write()模块函数

功能：写入保存修改的xml文件【有参】

使用方法：parse()解析变量.write("保存文件名称",encoding=‘字符编码‘, xml_declaration=True, short_empty_elements=False)

参数说明

encoding="字符编码"

xml_declaration=True ：写入xml文件时是否写入xml版本信息和字符编码如：<?xml version=‘1.0‘ encoding=‘utf-8‘?> True 写入 False 不写入

short_empty_elements=False ：是否自动将没有值的空标签保存为单标签 True 是 False 否

格式如：c.write("xml.xml", encoding=‘utf-8‘)

修改一个标签里的字符串

#!/usr/bin/env python
# -*- coding:utf8 -*-
from xml.etree import ElementTree #导入xml解析模块
c = ElementTree.parse("xml.xml") #打开一个xml文件
b = c.getroot() #获取xml文件的一级节点
print(b,"\n") #打印获取xml文件的一级节点
d = b.iter("rank") #获取一级节点下的，所有名称为rank的节点
for i in d: #循环出所有rank节点
    f1 = int(i.text) + 1 #获取rank节点标签里的字符串，然后转换成数字类型加1，赋值给f1
    i.text = str(f1) #然后将rank节点里的字符串，修改成f1的值
c.write("xml.xml", encoding=‘utf-8‘,xml_declaration=True, short_empty_elements=False) #最后将写入保存

set()模块函数

功能：为节点标签添加属性【有参】
使用方法：节点变量.set("标签属性名称","标签属性值")
格式如：i.set("linguixiou2","yes2")

为标签添加属性

#!/usr/bin/env python
# -*- coding:utf8 -*-
from xml.etree import ElementTree #导入xml解析模块
c = ElementTree.parse("xml.xml") #打开一个xml文件
b = c.getroot() #获取xml文件的一级节点
print(b,"\n") #打印获取xml文件的一级节点
d = b.iter("year") #获取一级节点下的，所有名称为rank的节点
for i in d: #循环出所有rank节点
    i.set("linguixiou2","yes2") #为当前节点标签添加属性
    i.set("linguixiou","yes") #为当前节点标签添加属性

c.write("xml.xml", encoding=‘utf-8‘) #最后将写入保存

del 模块关键字

功能：删除标签属性
使用方法：del 当前标签节点.attrib["要删除的标签属性名称"]
格式如：del i.attrib["linguixiou"]

删除标签属性

#!/usr/bin/env python
# -*- coding:utf8 -*-
from xml.etree import ElementTree #导入xml解析模块
c = ElementTree.parse("xml.xml") #打开一个xml文件
b = c.getroot() #获取xml文件的一级节点
print(b,"\n") #打印获取xml文件的一级节点
d = b.iter("year") #获取一级节点下的，所有名称为rank的节点
for i in d: #循环出所有rank节点
    i.set("linguixiou2","yes2") #为当前节点标签添加属性
    i.set("linguixiou","yes") #为当前节点标签添加属性
    del i.attrib["linguixiou"] #（删除标签的属性）del删除，i当前节点，attrib获取标签名称，["linguixiou"]标签名称里的属性名称

c.write("xml.xml", encoding=‘utf-8‘) #最后将写入保存

findall()模块函数

功能：获取一级节点，下的多个同名同级的节点，返回成一个列表，每个元素是一个节点
使用方法：一级节点变量.findall("要获取的节点标签名称")
格式如：d = b.findall("country")

查找多个同名同级的节点的某一个节点

#!/usr/bin/env python
# -*- coding:utf8 -*-
from xml.etree import ElementTree #导入xml解析模块
c = ElementTree.parse("xml.xml") #打开一个xml文件
b = c.getroot() #获取xml文件的一级节点
print(b,"\n") #打印获取xml文件的一级节点
d = b.findall("country") #获取一级节点，下的多个同名同级的节点，返回成一个列表，每个元素是一个节点
print(d,"\n")
e = d[0].find("rank") #索引列表里的第0个元素节点，查找0个元素节点下的rank子节点
print(e)
# 输出
# <Element ‘data‘ at 0x000000278AB63228>
#
# [<Element ‘country‘ at 0x000000278AD12B88>, <Element ‘country‘ at 0x000000278AD27548>, <Element ‘country‘ at 0x000000278AD276D8>]
#
# <Element ‘rank‘ at 0x000000278AD273B8>

remove()模块函数

功能：删除父节点下的子节点
使用方法：父节点变量.remove(从父节点找到子节点)
格式如：e.remove(e.find("rank"))

#!/usr/bin/env python
# -*- coding:utf8 -*-
from xml.etree import ElementTree as xml #导入xml解析模块
c = xml.parse("xml.xml") #打开一个xml文件
b = c.getroot() #获取xml文件的一级节点
e = b.find("country") #获取根节点下的country节点
e.remove(e.find("rank")) #删除country节点下的gdppc节点

c.write("xml.xml", encoding=‘utf-8‘,xml_declaration=True, short_empty_elements=False) #最后将写入保存

Element()模块函数

功能：创建标签节点，注意只是创建了节点，需要结合append()追加到父节点下
使用方法：要创建的父级节点变量.Element("标签名称",attrib={‘键值对字典形式的标签属性})
格式如：xml.Element(‘er‘, attrib={‘name‘: ‘2‘})

append()模块函数

功能：追加标签节点，将一个创建的节点，追加到父级节点下
使用方法：父级节点变量.append("创建节点变量")
格式如：geng.append(b1)

ElementTree()模块函数

功能1：创建一个ElementTree对象，生成xml，追加等操作后生成xml

注意：XML()解析的Element对象，使用ElementTree()创建一个ElementTree对象，也可以修改保存的

使用方法：模块名称.ElementTree(根节点)
格式如：tree = xml.ElementTree(geng)

新建一个xml文件1，注意这个方式生成的xml文件是没自动缩进的

#!/usr/bin/env python
# -*- coding:utf8 -*-
from xml.etree import ElementTree as xml #as将模块名称重命名为xml
geng = xml.Element("geng") #创建一级节点，根节点

b1 = xml.Element(‘er‘, attrib={‘name‘: ‘2‘}) #创建二级节点
b2 = xml.Element(‘er‘, attrib={‘name‘: ‘2‘})#创建二级节点
geng.append(b1) #将二级节点添加到根节点下
geng.append(b2) #将二级节点添加到根节点下

c1 = xml.Element(‘san‘, attrib={‘name‘: ‘3‘})#创建三级节
c2 = xml.Element(‘san‘, attrib={‘name‘: ‘3‘})#创建三级节
b1.append(c1) #将三级节点添加到二级节点下
b2.append(c2) #将三级节点添加到二级节点下

tree = xml.ElementTree(geng)  #生成xml
tree.write(‘oooo.xml‘,encoding=‘utf-8‘,xml_declaration=True, short_empty_elements=False)

SubElement()【推荐】

功能：创建节点追加节点，并且可以定义标签名称，标签属性，以及标签的text文本值
使用方法：定义变量 = 模块名称.SubElement(要追加的父级变量,"标签名称",attrib={"字典形式标签属性"})

　　　　　定义变量.text = "标签的text文本值"
格式如：ch1 = xml.SubElement(geng,"er",attrib={"name":"zhi"})

　　　　ch1.text = "二级标签"

新建xml文件2，注意这个方式生成的xml文件是没自动缩进的

#!/usr/bin/env python
# -*- coding:utf8 -*-
from xml.etree import ElementTree as xml #as将模块名称重命名为xml
geng = xml.Element("geng") #创建一级节点，根节点

ch1 = xml.SubElement(geng,"er",attrib={"name":"zhi"})
ch1.text = "二级标签"

ch2 = xml.SubElement(ch1,"er3",attrib={"name":"zhi3"})
ch2.text = "三级标签"

tree = xml.ElementTree(geng)  #生成xml
tree.write(‘oooo.xml‘,encoding=‘utf-8‘,xml_declaration=True, short_empty_elements=False)

新建xml文件3【推荐】，自动缩进

缩进需要引入 xml文件夹下dom文件夹里的 minidom 模块文件

from xml.etree import ElementTree as ET 引入ElementTree模块

from xml.dom import minidom 引入模块

创建节点等操作用ElementTree模块，minidom只做缩进处理

#!/usr/bin/env python
# -*- coding:utf8 -*-
from xml.etree import ElementTree as ET
from xml.dom import minidom
def prettify(elem):
    """将节点转换成字符串，并添加缩进。"""
    rough_string = ET.tostring(elem, ‘utf-8‘)
    reparsed = minidom.parseString(rough_string)
    return reparsed.toprettyxml(indent="\t")
# 创建根节点
root = ET.Element("famliy") #创建一级节点
a = ET.SubElement(root,"erji",attrib={"mna":"zhi"}) #创建二级节点

b1 = ET.SubElement(a,"mingzi") #创建三级节点
b1.text = "林贵秀"
b2 = ET.SubElement(a,"xingbie") #创建三级节点
b2.text = "男"
b3 = ET.SubElement(a,"nianl") #创建三级节点
b3.text = "32"

raw_str = prettify(root) #将整个根节点传入函数里缩进处理
f = open("xxxoo.xml",‘w‘,encoding=‘utf-8‘) #打开xxxoo.xml文件
f.write(raw_str) #将缩进处理好的整个xml写入文件
f.close() #关闭打开的文件

XML 命名空间提供避免元素命名冲突的方法。

命名冲突

在 XML 中，元素名称是由开发者定义的，当两个不同的文档使用相同的元素名时，就会发生命名冲突。

这个 XML 文档携带着某个表格中的信息：

<table>
   <tr>
   <td>Apples</td>
   <td>Bananas</td>
   </tr>
</table>

这个 XML 文档携带有关桌子的信息（一件家具）：

<table>
   <name>African Coffee Table</name>
   <width>80</width>
   <length>120</length>
</table>

假如这两个 XML 文档被一起使用，由于两个文档都包含带有不同内容和定义的 <table> 元素，就会发生命名冲突。

XML 解析器无法确定如何处理这类冲突。

使用前缀来避免命名冲突

此文档带有某个表格中的信息：

<h:table>
   <h:tr>
   <h:td>Apples</h:td>
   <h:td>Bananas</h:td>
   </h:tr>
</h:table>

此 XML 文档携带着有关一件家具的信息：

<f:table>
   <f:name>African Coffee Table</f:name>
   <f:width>80</f:width>
   <f:length>120</f:length>
</f:table>

现在，命名冲突不存在了，这是由于两个文档都使用了不同的名称来命名它们的 <table> 元素 (<h:table> 和 <f:table>)。

通过使用前缀，我们创建了两种不同类型的 <table> 元素。

register_namespace()模块函数

功能：创建命名空间

使用方法：模块名称.register_namespace("命名名称","命名名称值")
格式如：ET.register_namespace(‘com‘,"http://www.company.com")

#!/usr/bin/env python
# -*- coding:utf8 -*-
from xml.etree import ElementTree as ET
ET.register_namespace(‘com‘,"http://www.company.com") #register_namespace("命名名称","命名名称值")创建命名空间
"""
创建命名空间后，后面创建节点的时候，定义节点标签，和定义节点标签属性的时候，在里面加入命名名称值 如【"{命名名称值}节点标签名称"】，这样会自动将命名名称值转换成命名名称
简单的理解就是，给标签，标签属性，加上一个标示，防止名称冲突
"""
root = ET.Element("{http://www.company.com}STUFF")
body = ET.SubElement(root, "{http://www.company.com}MORE_STUFF", attrib={"{http://www.company.com}hhh": "123"})
body.text = "STUFF EVERYWHERE!"
tree = ET.ElementTree(root)
tree.write("page.xml",xml_declaration=True,encoding=‘utf-8‘,method="xml")
#生成如下
# <?xml version=‘1.0‘ encoding=‘utf-8‘?>
# <com:STUFF xmlns:com="http://www.company.com">
#     <com:MORE_STUFF com:hhh="123">STUFF EVERYWHERE!</com:MORE_STUFF>
# </com:STUFF>

重点总结

一.解析xml文件
有两种方式
字符串方式：XML()解析，返回的是Element对象，直接得到根节点
文件方式：parse()解析，返回的是ElementTree对象，要通过getroot()来得到Element对象，得到根节点
重点：ElementTree对象才可以写入文件，Element无法写入文件，所以如果是Element解析的，需要ElementTree(根节点变量)函数来创建ElementTree对象，后就可以写入了

二.ElementTree对象
1.ElementTree 类创建，可以通过ElementTree(xxxx)创建对象，parse()底层还是调用ElementTree()创建的
2.ElementTree 对象，通过getroot()获取根节点
ElementTree() 创建一个ElementTree对象，生成xml
write() 内存中的xml写入文件

三.Element对象
iter() 获取一级标签节点下的，多个同名同级的标签节点，可跨级的获取节点，返回迭代节点，需要for循环出标签【有参】
findall() 获取一级节点，下的多个同名同级的节点，返回成一个列表，每个元素是一个节点
find() 查找一个标签节点下的子标签节点，返回子标签节点【有参】
set() 为节点标签添加属性【有参】
remove() 删除父节点下的子节点
text 获取标签里的文本字符串
tag 获取标签的名称,返回标签名称
attrib 获取标签的属性，以字典形式返回标签属性
del 删除标签属性
Element() 创建标签节点，注意只是创建了节点，需要结合append()追加到父节点下
append() 追加标签节点，将一个创建的节点，追加到父级节点下
SubElement() 创建节点追加节点，并且可以定义标签名称，标签属性，以及标签的text文本值
register_namespace() 创建命名空间


Element，操作详细源码

Python--xml模块

声明：以上内容来自用户投稿及互联网公开渠道收集整理发布，本网站不拥有所有权，未作人工编辑处理，也不承担相关法律责任，若内容有误或涉及侵权可进行投诉：投诉/举报工作人员会在5个工作日内联系你，一经查实，本站将立刻删除涉嫌侵权内容。

联系
我们

首页 > 代码库 > Python--xml模块