pandas Learning Series (1): Time Series

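I recently entered a Tianchi competition on forecasting airport passenger flow. The task calls for time-series prediction, so I started using Python's pandas library.

pandas turned out to be very capable, so I am recording my learning notes here.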

# -*- coding: utf-8 -*-
# Count the passengers due to depart within the next 3 hours
import os
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from sklearn.preprocessing import MinMaxScaler

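os.chdir('C:/Users/Administrator/Desktop/competition/1017')
df = pd.read_csv('airport_gz_departure_chusai_2ndround.csv', usecols=[2, 3])
df = df.dropna(axis=0)    # drop rows that contain missing values
# Convert both columns to Timestamp first, so the filter and the sort compare times rather than strings
df.flight_time = pd.to_datetime(df.flight_time)
df.checkin_time = pd.to_datetime(df.checkin_time)
df = df[df.flight_time > df.checkin_time]    # drop rows where flight_time is not later than checkin_time
df = df.sort_values(by='flight_time')    # sort the data by flight_time
# Drop rows where check-in is 12 or more hours before the flight; the 12 is a parameter you need to tune yourself
df = df[(df.flight_time - df.checkin_time) < pd.Timedelta(hours=12)]
df = df.flight_time
dataset = pd.DatetimeIndex(df.values)    # convert to DatetimeIndex (the old pd.tseries.index path is gone in recent pandas)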

times = pd.date_range(start='2016-09-10 19:00:00', end='2016-09-25 15:00:00', freq='10min')
contact_nums = []

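for time in times:
    start = np.where(dataset > time)[0]    # positions of flights later than the current time
    window_end = time + pd.Timedelta(hours=3)    # count passengers departing within 3 hours of the current time
    end = np.where(dataset <= window_end)[0]
    if len(start) == 0 or len(end) == 0:    # no flight after the current time, or none before the window closes
        contact_nums.append(0)
    else:
        contact_nums.append(end[-1] - start[0] + 1)    # valid because dataset is sorted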


df = pd.DataFrame(contact_nums, index=times, columns=['num'])
df.to_csv('C:/Users/Administrator/Desktop/competition/DataProcessing/Person_to_fly.csv', index_label='time_back')

scaler = MinMaxScaler(feature_range = (0,1))
contact_nums = scaler.fit_transform(np.reshape(np.array(contact_nums), (len(contact_nums), 1)).astype('float32'))
plt.plot(scaler.inverse_transform(contact_nums))
plt.show()
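
The loop above scans the whole DatetimeIndex twice for every 10-minute step, which gets slow on a large file. Since the flight times are already sorted, the same count (flights later than the current time and no later than the current time plus 3 hours) can probably be done with binary search instead. The following is only a sketch of that idea and not part of the original script: the function name `count_departures` and its parameters are made up for illustration, and the sample timestamps are invented.

```python
import numpy as np
import pandas as pd


def count_departures(flight_times, window_starts, horizon):
    """Count flights with window_start < flight_time <= window_start + horizon.

    Hypothetical helper for illustration; not from the original script.
    """
    # np.searchsorted needs sorted input, so sort defensively
    flights = pd.DatetimeIndex(flight_times).sort_values().to_numpy()
    starts = pd.DatetimeIndex(window_starts)
    # number of flights at or before each window start
    lo = np.searchsorted(flights, starts.to_numpy(), side='right')
    # number of flights at or before each window end
    hi = np.searchsorted(flights, (starts + horizon).to_numpy(), side='right')
    # the difference is the number of flights inside each window
    return pd.Series(hi - lo, index=starts, name='num')


# Invented sample data, only to show the call
sample_flights = ['2016-09-10 19:05', '2016-09-10 20:00',
                  '2016-09-10 22:00', '2016-09-10 23:30']
sample_starts = pd.date_range('2016-09-10 19:00', periods=3, freq='60min')
print(count_departures(sample_flights, sample_starts, pd.Timedelta(hours=3)))
```

With the variables from the script, the call would look like `count_departures(dataset, times, pd.Timedelta(hours=3))`, and the resulting Series could be written out with `to_csv` in the same way as the DataFrame.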

 
