首页 > 代码库 > 2Python进阶强化训练之csv|json|xml|excel高

2Python进阶强化训练之csv|json|xml|excel高

2024-09-21 12:07:18 214人阅读

Python进阶强化训练之csv|json|xml|excel高

如何读写csv数据？

实际案例

我们可以通过http://table.finance.yahoo.com/table.csv?s=000001.sz，这个url获取中国股市(深市)数据集，它以csv数据格式存储：

Date,Open,High,Low,Close,Volume,Adj Close
2016-09-15,9.06,9.06,9.06,9.06,000,9.06
2016-09-14,9.17,9.18,9.05,9.06,42148100,9.06
2016-09-13,9.18,9.21,9.14,9.19,46093100,9.19
2016-09-12,9.29,9.32,9.13,9.16,75658100,9.16
2016-09-09,9.40,9.43,9.36,9.38,32743100,9.38
2016-09-08,9.39,9.42,9.38,9.40,29521800,9.40
2016-09-07,9.41,9.42,9.37,9.40,45937300,9.40
2016-09-06,9.42,9.43,9.36,9.41,57473800,9.41
2016-09-05,9.46,9.46,9.40,9.42,46993600,9.42
2016-09-02,9.43,9.46,9.42,9.45,36879600,9.45
2016-09-01,9.49,9.52,9.42,9.45,48013100,9.45
2016-08-31,9.46,9.50,9.43,9.49,48974600,9.49
2016-08-30,9.42,9.47,9.41,9.47,59508100,9.47
2016-08-29,9.41,9.45,9.38,9.42,56523100,9.42
2016-08-26,9.45,9.47,9.40,9.45,50223300,9.45
2016-08-25,9.42,9.45,9.34,9.44,61738900,9.44
2016-08-24,9.41,9.43,9.38,9.43,73228100,9.43

请将平安银行这支股票，在2016年中成交量超过14000000的记录存储到另一个csv文件中

解决方案

使用标准库中的csv模块，可以使用其中reader和writer完成文件读写

下载数据

>>> from urllib import urlretrieve
# 获取平安银行股票信息，保存到pingan.csv文件中
>>> urlretrieve(‘http://table.finance.yahoo.com/table.csv?s=000001.sz‘, ‘pingan.csv‘)
(‘pingan.csv‘, <httplib.HTTPMessage instance at 0x1a997e8>)

使用csv模块进行读

>>> import csv
>>> rf = open(‘pingan.csv‘, ‘rb‘)
>>> reader = csv.reader(rf)
# 获取的对象是一个可迭代的
>>> reader.next()
[‘Date‘, ‘Open‘, ‘High‘, ‘Low‘, ‘Close‘, ‘Volume‘, ‘Adj Close‘]
>>> reader.next()
[‘2016-09-15‘, ‘9.06‘, ‘9.06‘, ‘9.06‘, ‘9.06‘, ‘000‘, ‘9.06‘]

使用csv模块进行写

>>> wf = open(‘pingan_copy.csv‘, ‘wb‘)
>>> writer = csv.writer(wf)
>>> writer.writerow([‘2016-09-14‘, ‘9.17‘, ‘9.18‘, ‘9.05‘, ‘9.06‘, ‘42148100‘, ‘9.06‘])
>>> writer.writerow(reader.next())                                                     
>>> wf.flush()

查看写入的文件内容

[root@iZ28i253je0Z ~]# cat pingan_copy.csv 
2016-09-14,9.17,9.18,9.05,9.06,42148100,9.06
2016-09-13,9.18,9.21,9.14,9.19,46093100,9.19

如上的问题解决方案如下：

#!/use/bin/env python
# _*_ coding:utf-8 _*_

import csv

with open(‘pingan.csv‘, ‘rb‘) as rf:
    reader = csv.reader(rf)
    with open(‘pingan2016.csv‘, ‘wb‘) as wf:
        writer = csv.writer(wf)
        headers = reader.next()
        writer.writerow(headers)
        for row in reader:
            if row[0] < ‘2016-01-01‘:
                break
            if int(row[5]) >= 50000000:
                writer.writerow(row)

如何读写json数据？

实际案例

在web应用中常用JSON(JavaScript Object Notation)格式传输数据，在python中如何读写json数据?

解决方案

使用标准库中的json模块，其中loads、dumps函数可以完成json数据的读写

将数据类型转换为字符串

>>> import json
# 创建一个列表
>>> l = [1,2,‘asd‘,{‘blgo_url‘,‘ansheng.me‘}]
# 使用dumps转换为字符串
>>> json.dumps(l)
‘[1, 2, "asd", {"blgo_url": "ansheng.me"}]‘
# 去掉空格
>>> json.dumps(l, separators=[‘,‘,‘:‘])
‘[1,2,"asd",{"blgo_url":"ansheng.me"}]‘
# 排序
>>> d = {‘b‘:None,‘a‘:‘111‘,‘g‘:‘Null‘}
>>> json.dumps(d, sort_keys=True)
‘{"a": "111", "b": null, "g": "Null"}‘

将字符串转换为数据类型

>>> json.loads(‘[1,2,"asd",{"blgo_url":"ansheng.me"}]‘)
[1, 2, ‘asd‘, {‘blgo_url‘: ‘ansheng.me‘}]

如何解析简单的xml文档？

实际案例

如以下XML文档，如何使用python进行解析？

<?xml version="1.0" encoding="utf-8" ?>
<data>
    <country name="Liechtenstein">
        <rank update="yes">2</rank>
        <year>2008</year>
        <gdppc>141100</gdppc>
        <neighbor name="Austria" direction="E"/>
        <neighbor name="Switzerland" direction="W"/>
    </country>
    <country name="Singapore">
        <rank update="yes">5</rank>
        <year>2011</year>
        <gdppc>59900</gdppc>
        <neighbor name="Malaysia" direction="N"/>
    </country>
    <country name="Panama">
        <rank update="yes">69</rank>
        <year>2011</year>
        <gdppc>13600</gdppc>
        <neighbor name="Costa Rica" direction="W"/>
        <neighbor name="Colombia" direction="E"/>
    </country>
</data>

解决方案

可以使用标准库中的xml.etree.ElementTree，其中的parse函数可以解析XML文档

# 导入parse
>>> from xml.etree.ElementTree import parse
>>> f = open(‘a.xml‘)
# 获得ElementTree对象
>>> et = parse(f)
# 获取根节点，也就是data
>>> root = et.getroot()
>>> root
<Element ‘data‘ at 0x00000203ECA1E728>
# 查看标签
>>> root.tag
‘data‘
# 属性
>>> root.attrib
{}
# 文本
>>> root.text.strip()
‘‘
# 获得一个节点的子元素，然后在获取每个子元素的属性
>>> for child in root: print(child.get(‘name‘))
...
Liechtenstein
Singapore
Panama
# 根据标签寻找子元素，每次之寻找第一个
>>> root.find(‘country‘)
<Element ‘country‘ at 0x00000203ECBD3228>
# 寻找所有
>>> root.findall(‘country‘)
[<Element ‘country‘ at 0x00000203ECBD3228>, <Element ‘country‘ at 0x00000203ECBDBC28>, <Element ‘country‘ at 0x00000203ECBDBDB8>]
# 获得一个生成器对象
>>> root.iterfind(‘country‘)
<generator object prepare_child.<locals>.select at 0x00000203ECBC5FC0>
>>> for e in root.iterfind(‘country‘): print(e.get(‘name‘))
...
Liechtenstein
Singapore
Panama
# 获取所有的元素节点
>>> list(root.iter())
[<Element ‘data‘ at 0x00000203ECA1E728>, <Element ‘country‘ at 0x00000203ECBD3228>, <Element ‘rank‘ at 0x00000203ECBDBA98>, <Element ‘year‘ at 0x00000203ECBDBAE8>, <Element ‘gdppc‘ at 0x00000203ECBDBB38>, <Element ‘neighbor‘ at 0x00000203ECBDBB88>, <Element ‘neighbor‘ at 0x00000203ECBDBBD8>, <Element ‘country‘ at 0x00000203ECBDBC28>, <Element ‘rank‘ at 0x00000203ECBDBC78>, <Element ‘year‘ at 0x00000203ECBDBCC8>, <Element ‘gdppc‘ at 0x00000203ECBDBD18>, <Element ‘neighbor‘ at 0x00000203ECBDBD68>, <Element ‘country‘ at 0x00000203ECBDBDB8>, <Element ‘rank‘ at 0x00000203ECBDBE08>, <Element ‘year‘ at 0x00000203ECBDBE58>, <Element ‘gdppc‘ at 0x00000203ECBDBEA8>, <Element ‘neighbor‘ at 0x00000203ECBDBEF8>, <Element ‘neighbor‘ at 0x00000203ECBDBF48>]
# 寻找标签为rank的子节点
>>> list(root.iter(‘rank‘))
[<Element ‘rank‘ at 0x00000203ECBDBA98>, <Element ‘rank‘ at 0x00000203ECBDBC78>, <Element ‘rank‘ at 0x00000203ECBDBE08>]

查找的高级用法

# 匹配country下的所有子节点
>>> root.findall(‘country/*‘)
[<Element ‘rank‘ at 0x00000203ECBDBA98>, <Element ‘year‘ at 0x00000203ECBDBAE8>, <Element ‘gdppc‘ at 0x00000203ECBDBB38>, <Element ‘neighbor‘ at 0x00000203ECBDBB88>, <Element ‘neighbor‘ at 0x00000203ECBDBBD8>, <Element ‘rank‘ at 0x00000203ECBDBC78>, <Element ‘year‘ at 0x00000203ECBDBCC8>, <Element ‘gdppc‘ at 0x00000203ECBDBD18>, <Element ‘neighbor‘ at 0x00000203ECBDBD68>, <Element ‘rank‘ at 0x00000203ECBDBE08>, <Element ‘year‘ at 0x00000203ECBDBE58>, <Element ‘gdppc‘ at 0x00000203ECBDBEA8>, <Element ‘neighbor‘ at 0x00000203ECBDBEF8>, <Element ‘neighbor‘ at 0x00000203ECBDBF48>]
# 找到所有节点的rank
>>> root.findall(‘.//rank‘)
[<Element ‘rank‘ at 0x00000203ECBDBA98>, <Element ‘rank‘ at 0x00000203ECBDBC78>, <Element ‘rank‘ at 0x00000203ECBDBE08>]
# 找到rank的父对象
>>> root.findall(‘.//rank/..‘)
[<Element ‘country‘ at 0x00000203ECBD3228>, <Element ‘country‘ at 0x00000203ECBDBC28>, <Element ‘country‘ at 0x00000203ECBDBDB8>]
# 查找country包含name属性的
>>> root.findall(‘country[@name]‘)
[<Element ‘country‘ at 0x00000203ECBD3228>, <Element ‘country‘ at 0x00000203ECBDBC28>, <Element ‘country‘ at 0x00000203ECBDBDB8>]
>>> root.findall(‘country[@age]‘)
[]
# 查找属性等于特定值的元素
>>> root.findall(‘country[@name="Singapore"]‘)
[<Element ‘country‘ at 0x00000203ECBDBC28>]
# 查找必须包含某一个子元素
>>> root.findall(‘country[rank]‘)
[<Element ‘country‘ at 0x00000203ECBD3228>, <Element ‘country‘ at 0x00000203ECBDBC28>, <Element ‘country‘ at 0x00000203ECBDBDB8>]
# 查找子元素等于指定的值
>>> root.findall(‘country[rank="5"]‘)
[<Element ‘country‘ at 0x00000203ECBDBC28>]
# 根据位置查找，从1开始
>>> root.findall(‘country‘)
[<Element ‘country‘ at 0x00000203ECBD3228>, <Element ‘country‘ at 0x00000203ECBDBC28>, <Element ‘country‘ at 0x00000203ECBDBDB8>]
# 根据位置查找
>>> root.findall(‘country[1]‘)
[<Element ‘country‘ at 0x00000203ECBD3228>]
# 倒数第一个
>>> root.findall(‘country[last()]‘)
[<Element ‘country‘ at 0x00000203ECBDBDB8>]
# 倒数第二个
>>> root.findall(‘country[last()-1]‘)
[<Element ‘country‘ at 0x00000203ECBDBC28>]

如何构建xml文档？

实际案例

某些时候，我们需要将其他格式数据转换为xml，例如下面的字符串如何转换为XML？

解决方案

使用标准库中的xml.etree.ElementTree，构建ElementTree，使用write方法写入文件

>>> from xml.etree.ElementTree import Element, ElementTree
>>> from xml.etree.ElementTree import tostring
>>> e = Element(‘Data‘)
>>> e.set(‘name‘, ‘abc‘)
>>> e.text = ‘123‘
# 查看结果
>>> tostring(e)
b‘<Data name="abc">123</Data>‘
>>> e2 = Element(‘Row‘)
>>> e3 = Element(‘Open‘)
>>> e3.text = ‘9.17‘
>>> e2.append(e3)
>>> e.text = None
>>> e.append(e2)
>>> tostring(e)
b‘<Data name="abc"><Row><Open>9.17</Open></Row></Data>‘
>>> et = ElementTree(e)
>>> et.write(‘demo.xml‘)

解决如上问题的脚本如下：

import csv
from xml.etree.ElementTree import Element, ElementTree

def pretty(e, level=0):
    if len(e) > 0:
        e.text = ‘\n‘ + ‘\t‘ * (level + 1)
        for child in e:
            pretty(child, level + 1)
        child.tail = child.tail[:-1]
    e.tail = ‘\n‘ + ‘\t‘ * level
    
def csvToXML(fname):
    with open(fname, ‘rb‘) as f:
        reader = csv.reader(f)
        headers = reader.next()
        root = Element(‘Data‘)
        for row in reader:
            eRow = Element(‘Row‘)
            root.append(eRow)
            for tag, text in zip(headers, row):
                e = Element(tag)
                e.text = text
                eRow.append(e)
    pretty(root)
    return ElementTree(root)
    
et = csvToXML(‘pingan.csv‘)
et.write(‘pingan.xml‘)

转换好的文件内容为：

<Data>
        <Row>
                <Date>2016-09-15</Date>
                <Open>9.06</Open>
                <High>9.06</High>
                <Low>9.06</Low>
                <Close>9.06</Close>
                <Volume>000</Volume>
                <Adj Close>9.06</Adj Close>
        </Row>
        <Row>
                <Date>2016-09-14</Date>
                <Open>9.17</Open>
                <High>9.18</High>
                <Low>9.05</Low>
                <Close>9.06</Close>
                <Volume>42148100</Volume>
                <Adj Close>9.06</Adj Close>
        </Row>
        .....

如何读写excel文件？

实际案例

Microsoft Excel是日常办公中使用最频繁的软件，起数据格式为xls,xlsx,一种非常常用的电子表格。

某小学某班成绩记录在excel了文件中，内容如下：

姓名    语文    数学    外语
小明    95    96    94
张三    85    84    92
王五    86    85    75
小哈    96    92    100

利用python读写excel了，添加总分列，计算每人总分

解决方案

使用第三方库xlrd和xlwt，这两个库分别用于excel读和写

安装这两个模块

pip3 install xlrd
pip3 install xlwt

脚本如下：

#!/usr/bin/env python
# _*_ coding:utf-8 _*_

# 导入xlrd和xlwt
import xlrd
import xlwt

# 打开excel文件
rbook = xlrd.open_workbook(‘x.xlsx‘)
# 表
rsheet = rbook.sheet_by_index(0)

# 添加一个总分的列
# 列
nc = rsheet.ncols
# 第0行，列，文本类型，文字，内容
rsheet.put_cell(0, nc, xlrd.XL_CELL_TEXT, ‘总分‘, None)

# 迭代表中的所有数据，计算总分
for row in range(1, rsheet.nrows):
    # 计算每行的总分，跳过第0行,0==姓名,sum对列表进行求和，t等于最后加上拿出来的分数
    t = sum(rsheet.row_values(row, 1))
    # 写入数据
    rsheet.put_cell(row, nc, xlrd.XL_CELL_NUMBER, t, None)
    
# 写入到文件中
wbook = xlwt.Workbook()
wsheet = wbook.add_sheet(rsheet.name)

# 对其方式，垂直和水平都是剧中
style = xlwt.easyxf(‘align: vertical center, horizontal center‘)
# rsheet的每个单元格写入到wsheet中
for r in range(rsheet.nrows):  # 行
    for c in range(rsheet.ncols):  # 列
        wsheet.write(r, c, rsheet.cell_value(r, c), style)
        
wbook.save(‘output.xlsx‘)

计算结果如下：

姓名    语文    数学    外语    总分
小明    95    96    94    285
张三    85    84    92    261
王五    86    85    75    246
小哈    96    92    100    288

#Python

本文出自 “Eden” 博客，转载请与作者联系！

2Python进阶强化训练之csv|json|xml|excel高

声明：以上内容来自用户投稿及互联网公开渠道收集整理发布，本网站不拥有所有权，未作人工编辑处理，也不承担相关法律责任，若内容有误或涉及侵权可进行投诉：投诉/举报工作人员会在5个工作日内联系你，一经查实，本站将立刻删除涉嫌侵权内容。

联系
我们

首页 > 代码库 > 2Python进阶强化训练之csv|json|xml|excel高

2Python进阶强化训练之csv|json|xml|excel高

Python进阶强化训练之csv|json|xml|excel高

如何读写csv数据？

解决方案

如何读写json数据？

解决方案

如何解析简单的xml文档？

解决方案

如何构建xml文档？

解决方案

如何读写excel文件？

解决方案

看完仍有疑问？有类似问题直接问程序猿