Basic Usage of the pandas Module in Python (1)


2024-09-01 21:28:20, 218 views

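1. pandas overview

The name pandas comes from "panel data analysis". pandas is built on top of NumPy and provides strong support for time series analysis. It has two main data structures: Series and DataFrame.

2. Data structure: Series

A Series is like a combination of a one-dimensional array and a dictionary (map). It consists of a set of values and a matching set of data labels (the index). Both the values and the index labels are backed by one-dimensional ndarrays. The index can be thought of as a row index. When a Series is displayed, the index labels appear on the left and the values on the right.

Series usage example: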

import pandas as pd
from pandas import Series, DataFrame

print('Create a Series from a one-dimensional array')
x = Series([1, 2, 3, 4])
print(x)
'''
0    1
1    2
2    3
3    4
'''
print(x.values)  # [1 2 3 4]
# The default labels are the integers 0 to 3
print(x.index)  # RangeIndex(start=0, stop=4, step=1)

print('Specify the index of a Series')  # the index can be thought of as row labels
x = Series([1, 2, 3, 4], index=['a', 'b', 'd', 'c'])
print(x)
'''
a    1
b    2
d    3
c    4
'''
print(x.index)  # Index(['a', 'b', 'd', 'c'], dtype='object')
print(x['a'])  # get a value by its label: 1
x['d'] = 6  # assign by label
print(x[['c', 'a', 'd']])  # similar to NumPy fancy indexing
'''
c    4
a    1
d    6
'''
print(x[x > 2])  # similar to NumPy boolean indexing
'''
d    6
c    4
'''
print('b' in x)  # dict-like membership test on the index: True
print('e' in x)  # False


print('Create a Series from a dict')
data = {'a': 1, 'b': 2, 'd': 3, 'c': 4}
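
The dict-based construction that the example starts above can be sketched roughly as follows. This is a minimal illustration, not the original author's code: the variable `y` and the extra label 'e' are assumptions added to show what happens when a requested label is missing from the dict.

```python
import pandas as pd

# Build a Series from a dict: keys become the index labels, values the data.
data = {'a': 1, 'b': 2, 'd': 3, 'c': 4}
x = pd.Series(data)
print(x)

# Passing an explicit index selects and orders the labels; a label that is
# not a key of the dict ('e' here, chosen for illustration) gets NaN.
y = pd.Series(data, index=['a', 'b', 'e'])
print(y)
```

Because NaN is a floating-point value, `y` is likely to be stored as float64 even though the dict holds integers.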





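3. Data structure: DataFrame

A DataFrame is a tabular data structure with both a row index and a column index. It holds an ordered collection of columns, and each column can have a different value type (numeric, string, boolean, and so on). Every row and every column of a DataFrame is a Series whose name attribute is the corresponding row label or column label.

The DataFrame constructor accepts several kinds of input, including a dict of equal-length lists or arrays, a dict of Series, a dict of dicts, a two-dimensional ndarray, and another DataFrame.

DataFrame usage example:

print('Create a DataFrame from a dict; the keys become the column names.')
data = {'state': ['ok', 'ok', 'good', 'bad'],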
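
Since the listing above breaks off, here is a self-contained sketch of building a DataFrame from a dict. Only the 'state' values come from the text; the 'year' and 'pop' columns and the row labels are hypothetical and serve only to make the example runnable.

```python
import pandas as pd

# Hypothetical data: only the 'state' values appear in the article;
# 'year', 'pop' and the row labels are illustrative assumptions.
data = {'state': ['ok', 'ok', 'good', 'bad'],
        'year': [2000, 2001, 2002, 2003],
        'pop': [3.7, 3.6, 2.4, 0.9]}
frame = pd.DataFrame(data, index=['a', 'b', 'c', 'd'])
print(frame)

col = frame['state']   # a column is a Series named after the column label
row = frame.loc['a']   # a row is a Series named after the row label
print(col.name, row.name)
```

This also illustrates the point made earlier: both the column and the row come back as Series, and their name attribute carries the label they were selected by.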




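4. Index objects

pandas Index objects manage axis labels and axis names. When a Series or DataFrame is built, any array or other sequence used as labels is converted to an Index object. Index objects are immutable.

Code example: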

import numpy
from pandas import Index
print('Get the Index object')
x = Series(range(3), index=['a', 'b', 'c'])
index = x.index
print(index)
# Index(['a', 'b', 'c'], dtype='object')
print(index[0:2])
# Index(['a', 'b'], dtype='object')
try:
    index[0] = 'd'
except TypeError:  # an Index does not support item assignment
    print("Index is immutable")

print('Construct and use an Index object')
index = Index(numpy.arange(3))
obj2 = Series([1.5, -2.5, 0], index=index)
print(obj2)
'''
0    1.5
1   -2.5
2    0.0
dtype: float64
'''
print(obj2.index is index)  # True in the original output


print('Check whether a column/row label exists')
data = {'pop': [2.4, 2.9],
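
The membership check named in the last heading can be sketched as below. The original listing is cut off, so the 'year' column and the row labels 'r1' and 'r2' are hypothetical.

```python
import pandas as pd

# Hypothetical frame: the 'year' column and the row labels are assumptions.
frame = pd.DataFrame({'pop': [2.4, 2.9], 'year': [2001, 2002]},
                     index=['r1', 'r2'])

print('pop' in frame.columns)  # is there a column labelled 'pop'?
print('r1' in frame.index)     # is there a row labelled 'r1'?
print('r3' in frame.index)     # a label that was never added
```

Both `frame.columns` and `frame.index` are Index objects, so the `in` operator tests labels, not cell values.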




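5. Basic functionality


Reindexing rows/columns (which can also drop or add rows/columns): the reindex function

Options for the method argument of reindex: ffill (or pad) fills forward from the previous valid value; bfill (or backfill) fills backward from the next valid value.

Code example: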

print('Reindex; new labels are filled with NaN by default')
x = Series([4, 7, 5], index=['a', 'b', 'c'])
y = x.reindex(['a', 'b', 'c', 'd'])
print(y)
'''
a    4.0
b    7.0
c    5.0
d    NaN
dtype: float64
'''
print(x.reindex(['a', 'b', 'c', 'd'], fill_value=0))  # fill new labels with 0 instead of NaN
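
The method option of reindex, and reindexing along both axes of a DataFrame, can be sketched as follows. The data and the names `s`, `filled`, `df` and `r` are made up for illustration.

```python
import pandas as pd

# Hypothetical Series with gaps in its integer index.
s = pd.Series(['blue', 'red', 'green'], index=[0, 2, 4])

# method='ffill' carries the last valid value forward into the new labels.
filled = s.reindex(range(6), method='ffill')
print(filled)

# For a DataFrame, index= reindexes rows and columns= reindexes columns.
df = pd.DataFrame({'A': [1, 2], 'B': [3, 4]}, index=['a', 'b'])
r = df.reindex(index=['a', 'b', 'c'], columns=['B', 'A'])
print(r)
```

Filling with method requires the existing index to be sorted (monotonic), which is why the sketch uses an increasing integer index.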

Dropping an entire row or column: the drop function

print('Drop rows from a Series by label')
x = Series(numpy.arange(4), index=['a', 'b', 'c', 'd'])
print(x.drop('c'))
'''
a    0
b    1
d    3
dtype: int32
'''
print(x.drop(['a', 'b']))  # drop several labels at once
'''
c    2
d    3
dtype: int32
'''

print('Drop rows/columns from a DataFrame by label')
x = DataFrame(numpy.arange(16).reshape((4, 4)),
              index=['a', 'b', 'c', 'd'],
              columns=['A', 'B', 'C', 'D'])
print(x)
'''
    A   B   C   D
a   0   1   2   3
b   4   5   6   7
c   8   9  10  11
d  12  13  14  15
'''
print(x.drop(['A', 'B'], axis=1))  # drop columns A and B (axis=1 is the column axis)
'''
    C   D
a   2   3
b   6   7
c  10  11
d  14  15
'''
print(x.drop('a', axis=0))  # drop row a (axis=0 is the row axis)
'''
    A   B   C   D
b   4   5   6   7
c   8   9  10  11
d  12  13  14  15
'''
print(x.drop(['a', 'b'], axis=0))
'''
    A   B   C   D
c   8   9  10  11
d  12  13  14  15
'''

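Indexing, selection, and filtering:

DataFrame indexing options: frame[col] selects a column (or a list of columns), frame.loc[...] selects by label, and frame.iloc[...] selects by integer position.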



print('Array-style and dict-style indexing on a Series')
x = Series(numpy.arange(4), index=['a', 'b', 'c', 'd'])
print(x['b'])  # 1, indexed by label like a dict
print(x.iloc[1])  # 1, indexed by position like an array
print(x.iloc[[1, 3]])  # fancy indexing by position
'''
b    1
d    3
dtype: int32
'''
print(x[x < 2])  # boolean indexing
'''
a    0
b    1
dtype: int32
'''
print('Slicing a Series')
print(x['a':'c'])  # label slices include both endpoints; the start label must come before the end label
'''
a    0
b    1
c    2
'''
x['a':'c'] = 5
print(x)
'''
a    5
b    5
c    5
d    3
'''

print('Indexing a DataFrame')
data = DataFrame(numpy.arange(16).reshape((4, 4)),
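
The DataFrame indexing listing stops after its first line. The sketch below assumes the frame uses the same row and column labels as the drop example earlier; that choice, and the particular selections shown, are assumptions and not the author's original code.

```python
import numpy as np
import pandas as pd

# Assumed labels, copied from the earlier drop example.
data = pd.DataFrame(np.arange(16).reshape((4, 4)),
                    index=['a', 'b', 'c', 'd'],
                    columns=['A', 'B', 'C', 'D'])

print(data['A'])                  # one column, returned as a Series
print(data[['A', 'C']])           # several columns, returned as a DataFrame
print(data[:2])                   # first two rows (positional slice)
print(data[data['C'] > 5])        # rows where column C is greater than 5
print(data.loc['b', ['A', 'B']])  # by label: row b, columns A and B
print(data.iloc[2, 1])            # by position: third row, second column
```

Using loc for labels and iloc for positions keeps the two meanings of an integer key apart, which matters most when the index itself holds integers.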

Arithmetic and data alignment

Code example:

print('DataFrame arithmetic: overlapping labels are combined element-wise, non-overlapping ones become NaN')
x = DataFrame(numpy.arange(9.).reshape((3, 3)),
              columns=['A', 'B', 'C'],
              index=['a', 'b', 'c'])
y = DataFrame(numpy.arange(12).reshape((4, 3)),
              columns=['A', 'B', 'C'],
              index=['a', 'b', 'c', 'd'])
print(x)
print(y)
print(x + y)
'''
      A     B     C
a   0.0   2.0   4.0
b   6.0   8.0  10.0
c  12.0  14.0  16.0
d   NaN   NaN   NaN
'''
print('fill_value fills the non-overlapping parts of x/y before the operation; it does not fill NaN in the result')
print(x.add(y, fill_value=0))  # x itself is unchanged
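
To make the difference concrete, the following sketch rebuilds the same two frames and compares filling before the addition with filling afterwards. The name `total` and the use of fillna for the comparison are additions for illustration.

```python
import numpy as np
import pandas as pd

x = pd.DataFrame(np.arange(9.).reshape((3, 3)),
                 columns=['A', 'B', 'C'], index=['a', 'b', 'c'])
y = pd.DataFrame(np.arange(12).reshape((4, 3)),
                 columns=['A', 'B', 'C'], index=['a', 'b', 'c', 'd'])

# Row d is missing from x, so it is treated as 0 before adding:
# row d of the result equals row d of y instead of NaN.
total = x.add(y, fill_value=0)
print(total)

# Filling after the addition gives a different answer: NaN becomes 0.
print((x + y).fillna(0))
```

With fill_value, row d of the result holds 9, 10 and 11 taken from y, whereas filling the finished sum replaces that row with zeros.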

Applying NumPy functions and mapping

Code example:

print('Applying NumPy functions to a Series/DataFrame')
frame = DataFrame(numpy.arange(9).reshape(3, 3),
                  columns=['A', 'B', 'C'],
                  index=['a', 'b', 'c'])
print(frame)
'''
   A  B  C
a  0  1  2
b  3  4  5
c  6  7  8
'''
print(numpy.square(frame))
'''
    A   B   C
a   0   1   4
b   9  16  25
c  36  49  64
'''

series = frame.A
print(series)
'''
a    0
b    3
c    6
'''
print(numpy.square(series))
'''
a     0
b     9
c    36
'''


print('Lambda (anonymous function) and apply')
print(frame)
'''
   A  B  C
a  0  1  2
b  3  4  5
c  6  7  8
'''
print(frame.max())
'''
A    6
B    7
C    8
'''
f = lambda x: x.max() - x.min()
print(frame.apply(f))  # applied to each column
'''
A    6
B    6
C    6
'''
print(frame.apply(f, axis=1))  # applied to each row
'''
a    2
b    2
c    2
'''
def f(x):  # x is a Series (one column of frame); returning a Series yields a DataFrame
    return Series([x.min(), x.max()], index=['min', 'max'])
print(frame.apply(f))
'''
     A  B  C
min  0  1  2
max  6  7  8
'''

print('applymap and map: applied to every element')
_format = lambda x: '%.2f' % x
print(frame.applymap(_format))  # for a DataFrame (deprecated in favor of DataFrame.map since pandas 2.1)
'''
      A     B     C
a  0.00  1.00  2.00
b  3.00  4.00  5.00
c  6.00  7.00  8.00
'''
print(frame['A'].map(_format))  # for a Series
'''
a    0.00
b    3.00
c    6.00
Name: A, dtype: object
'''
