首页 > 代码库 > UFO长啥样?--Python数据分析来告诉你
UFO长啥样?--Python数据分析来告诉你
前言
真心讲,长这么大,还没有见过UFO长啥样,偶然看到美国UFO报告中心有关于UFO时间记录的详细信息,突然想分析下这些记录里都包含了那些有趣的信息,于是有了这次的分析过程。
当然,原始数据包含的记录信息比较多,我只是进了了比较简单的分析,有兴趣的童鞋可以一起来分析,别忘了也给大家分享下您的分析情况哦。
本次分析的主要内容涉及以下几个方面:
- UFO长啥样?
- UFO在哪些地方出现的次数较多?
- UFO在哪些年份出现的次数较多?
- 热力图同时显示哪些州和哪些年UFO出现次数最多
import pandas as pdimport numpy as npimport matplotlib.pyplot as plt% matplotlib inlineplt.style.use(‘ggplot‘)
1 数据整理与清洗
df = pd.read_csv(‘nuforc_events.csv‘)
print(df.shape) # 查看数据的结构print(df.head())
(110265, 13) Event_Time Event_Date Year Month Day Hour Minute 0 2017-04-20T14:15:00Z 2017-04-20 2017.0 4.0 20.0 14.0 15.0 1 2017-04-20T04:56:00Z 2017-04-20 2017.0 4.0 20.0 4.0 56.0 2 2017-04-19T23:55:00Z 2017-04-19 2017.0 4.0 19.0 23.0 55.0 3 2017-04-19T23:50:00Z 2017-04-19 2017.0 4.0 19.0 23.0 50.0 4 2017-04-19T23:29:00Z 2017-04-19 2017.0 4.0 19.0 23.0 29.0 City State Shape Duration 0 Palmyra NJ Other 5 minutes 1 Bridgeview IL Light 20 seconds 2 Newton AL Triangle 5 seconds 3 Newton AL Triangle 5-6 minutes 4 Denver CO Light 1 hour Summary 0 I observed an aircraft that seemed to look odd. 1 Bridgeview, IL, blue light. ((anonymous report)) 2 Silent triangle UFO. 3 My friend and I stepped outside hoping to catc... 4 Moved slow but made quick turns staying and ci... Event_URL 0 http://www.nuforc.org/webreports/133/S133726.html 1 http://www.nuforc.org/webreports/133/S133720.html 2 http://www.nuforc.org/webreports/133/S133724.html 3 http://www.nuforc.org/webreports/133/S133723.html 4 http://www.nuforc.org/webreports/133/S133721.html
- 由于存在许多包含NaN的数据信息,在进行分析之前,先用dropna()方法去除包含NaN的行数
df_clean = df.dropna()print(df_clean.shape) # 查看去除Nan后还有多少行print(df_clean.head())
(95004, 13) Event_Time Event_Date Year Month Day Hour Minute 0 2017-04-20T14:15:00Z 2017-04-20 2017.0 4.0 20.0 14.0 15.0 1 2017-04-20T04:56:00Z 2017-04-20 2017.0 4.0 20.0 4.0 56.0 2 2017-04-19T23:55:00Z 2017-04-19 2017.0 4.0 19.0 23.0 55.0 3 2017-04-19T23:50:00Z 2017-04-19 2017.0 4.0 19.0 23.0 50.0 4 2017-04-19T23:29:00Z 2017-04-19 2017.0 4.0 19.0 23.0 29.0 City State Shape Duration 0 Palmyra NJ Other 5 minutes 1 Bridgeview IL Light 20 seconds 2 Newton AL Triangle 5 seconds 3 Newton AL Triangle 5-6 minutes 4 Denver CO Light 1 hour Summary 0 I observed an aircraft that seemed to look odd. 1 Bridgeview, IL, blue light. ((anonymous report)) 2 Silent triangle UFO. 3 My friend and I stepped outside hoping to catc... 4 Moved slow but made quick turns staying and ci... Event_URL 0 http://www.nuforc.org/webreports/133/S133726.html 1 http://www.nuforc.org/webreports/133/S133720.html 2 http://www.nuforc.org/webreports/133/S133724.html 3 http://www.nuforc.org/webreports/133/S133723.html 4 http://www.nuforc.org/webreports/133/S133721.html
- 由于1900年以前的数据较少,这里选择1900年以后的数据来进行分析,如下:
df_clean = df_clean[df_clean[‘Year‘]>=1900] # 获取1900年以后的数据来进行分析
- 查看导入的每列数据的数据类型,通过运行结果,可以看到,“Event_Date”列并不是日期类型,因此要将之转换。
- 可以采用pd.to_datetime()方法来操作
df_clean.dtypes
Event_Time objectEvent_Date objectYear float64Month float64Day float64Hour float64Minute float64City objectState objectShape objectDuration objectSummary objectEvent_URL objectdtype: object
- 用pd.to_datetime()方法来将str格式的日期转换成日期类型
pd.to_datetime(df_clean[‘Event_Date‘]) # 1061-12-31年不能显示# OutOfBoundsDatetime: Out of bounds nanosecond timestamp: 1061-12-31 00:00:00df_clean.dtypes
Event_Time objectEvent_Date objectYear float64Month float64Day float64Hour float64Minute float64City objectState objectShape objectDuration objectSummary objectEvent_URL objectdtype: object
2 UFO长啥样?
- 按UFO出现的形状类型来分析,统计不同类型的UFO出现的次数
s_shape = df_clean.groupby(‘Shape‘)[‘Event_Date‘].count()print(type(s_shape))s_shape.sort_values(inplace=True)s_shape
<class ‘pandas.core.series.Series‘>ShapeChanged 1Hexagon 1Pyramid 1Flare 1Round 2Crescent 2Delta 7Cross 287Cone 383Egg 842Teardrop 866Chevron 1187Diamond 1405Cylinder 1495Rectangle 1620Flash 1717Cigar 2313Changing 2378Formation 3070Oval 4332Disk 5841Sphere 6482Other 6658Unknown 6887Fireball 7785Triangle 9358Circle 9818Light 20254Name: Event_Date, dtype: int64
剔除特殊情况
- 剔除出现次数少于10次的类型
- 剔除“Unknown”及“Other”类型
s_shape_normal = s_shape[s_shape.values>10]s_shape_normal
ShapeCross 287Cone 383Egg 842Teardrop 866Chevron 1187Diamond 1405Cylinder 1495Rectangle 1620Flash 1717Cigar 2313Changing 2378Formation 3070Oval 4332Disk 5841Sphere 6482Other 6658Unknown 6887Fireball 7785Triangle 9358Circle 9818Light 20254Name: Event_Date, dtype: int64
s_shape_normal = s_shape_normal[s_shape_normal.index.isin([‘Unknown‘, ‘Other‘])==False]s_shape_normal
ShapeCross 287Cone 383Egg 842Teardrop 866Chevron 1187Diamond 1405Cylinder 1495Rectangle 1620Flash 1717Cigar 2313Changing 2378Formation 3070Oval 4332Disk 5841Sphere 6482Fireball 7785Triangle 9358Circle 9818Light 20254Name: Event_Date, dtype: int64
from matplotlib import font_manager as fmfrom matplotlib import cmlabels = s_shape_normal.indexsizes = s_shape_normal.valuesexplode = (0.1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0.1) # "explode" , show the selected slicefig, axes = plt.subplots(figsize=(10,5),ncols=2) # 设置绘图区域大小ax1, ax2 = axes.ravel()colors = cm.rainbow(np.arange(len(sizes))/len(sizes)) # colormaps: Paired, autumn, rainbow, gray,spring,Darkspatches, texts, autotexts = ax1.pie(sizes, labels=labels, autopct=‘%1.0f%%‘,explode=explode, shadow=False, startangle=150, colors=colors, labeldistance=1.2,pctdistance=1.05, radius=0.95)# labeldistance: 控制labels显示的位置# pctdistance: 控制百分比显示的位置# radius: 控制切片突出的距离ax1.axis(‘equal‘) # 重新设置字体大小proptease = fm.FontProperties()proptease.set_size(‘xx-small‘)# font size include: ‘xx-small’,x-small’,‘small’,‘medium’,‘large’,‘x-large’,‘xx-large’ or number, e.g. ‘12‘plt.setp(autotexts, fontproperties=proptease)plt.setp(texts, fontproperties=proptease)ax1.set_title(‘Shapes‘, loc=‘center‘)# ax2 只显示图例(legend)ax2.axis(‘off‘)ax2.legend(patches, labels, loc=‘center left‘, fontsize=9)# plt.tight_layout()# plt.savefig("pie_shape_ufo.png", bbox_inches=‘tight‘)plt.savefig(‘ufo_shapes.jpg‘)plt.show()
运行结果如下:
3 UFO在美国那些州(state)出现的次数比较多?
按”State”进行分组运算,统计ufo在各个州出现的次数
s_state = df_clean.groupby(‘State‘)[‘Event_Date‘].count()print(type(s_state))s_state.head()
<class ‘pandas.core.series.Series‘>StateAB 438AK 472AL 930AR 791AZ 3488Name: Event_Date, dtype: int64
将分析得到的结果进行可视化显示,如下:
fig, ax1 = plt.subplots(figsize=(12,8))width = 0.5state = s_state.indexx_pos1 = np.arange(len(state))y1 = s_state.valuesax1.bar(x_pos1, y1,color=‘#4F81BD‘,align=‘center‘, width=width, label=‘Amounts‘, linewidth=0)ax1.set_title(‘Amount of reporting UFO events by State ‘)ax1.set_xlim(-1, len(state))ax1.set_xticks(x_pos1)ax1.set_xticklabels(state, rotation = -90)ax1.set_ylabel(‘Amount‘)fig.savefig(‘ufo_state.jpg‘)plt.show()
运行结果如下:
从上图可看出,ufo在加州(CA)出现的总次数明显比其他地方多,难道是ufo偏爱加州人民?
4 UFO在哪些年份出现的次数较多?
按”Year”进行分组运算,统计ufo在各个年份出现的次数
# df_clean[‘Year‘].astype(int)s_year = df_clean.groupby(df_clean[‘Year‘].astype(int))[‘Event_Date‘].count()print(type(s_year))s_year.head()
<class ‘pandas.core.series.Series‘>Year1905 11910 21920 11925 11929 1Name: Event_Date, dtype: int64
将分析得到的结果进行可视化显示,如下:
fig, ax = plt.subplots(figsize=(12,20))# fig, ax1 = plt.subplots(figsize=(12,8))# fig, axes = plt.subplots(nrows=2, figsize=(12,8))# fig, axes = plt.subplots(ncols=2, figsize=(18,4))year = s_year.indexy_pos = np.arange(len(year))x_value = http://www.mamicode.com/s_year.values"hljs-string" style="color: #dd1144;">‘#4F81BD‘,align=‘center‘, label=‘Amounts‘, linewidth=0)ax.set_title(‘Amount of reporting UFO events by Year ‘)ax.set_ylim(-0.5, len(year)-0.5)ax.set_yticks(y_pos)ax.set_yticklabels(year, rotation = 0, fontsize=6)ax.set_xlabel(‘Amount‘)plt.savefig(‘ufo_year.jpg‘)plt.show()
运行结果如下:
从上图可看出,近年来UFO出现的报告次数最多
5 1997年以后的UFO事件分析
- 通过上述分析可看出,1997年以前,报告发现UFO的事件相对较少,下面将针对1997年以后的情况进行分析
df_97 = df_clean[(df_clean[‘Year‘]>=1997)]df_97[‘Year‘] = df_97[‘Year‘].astype(int)# df_97.astype({‘Year‘:int})print(df_97.shape)print(df_97.head())
(86041, 13) Event_Time Event_Date Year Month Day Hour Minute 0 2017-04-20T14:15:00Z 2017-04-20 2017 4.0 20.0 14.0 15.0 1 2017-04-20T04:56:00Z 2017-04-20 2017 4.0 20.0 4.0 56.0 2 2017-04-19T23:55:00Z 2017-04-19 2017 4.0 19.0 23.0 55.0 3 2017-04-19T23:50:00Z 2017-04-19 2017 4.0 19.0 23.0 50.0 4 2017-04-19T23:29:00Z 2017-04-19 2017 4.0 19.0 23.0 29.0 City State Shape Duration 0 Palmyra NJ Other 5 minutes 1 Bridgeview IL Light 20 seconds 2 Newton AL Triangle 5 seconds 3 Newton AL Triangle 5-6 minutes 4 Denver CO Light 1 hour Summary 0 I observed an aircraft that seemed to look odd. 1 Bridgeview, IL, blue light. ((anonymous report)) 2 Silent triangle UFO. 3 My friend and I stepped outside hoping to catc... 4 Moved slow but made quick turns staying and ci... Event_URL 0 http://www.nuforc.org/webreports/133/S133726.html 1 http://www.nuforc.org/webreports/133/S133720.html 2 http://www.nuforc.org/webreports/133/S133724.html 3 http://www.nuforc.org/webreports/133/S133723.html 4 http://www.nuforc.org/webreports/133/S133721.html
将数据按”Year”和”State”进行分组运算,如下:
df_amount_year = df_97.groupby([‘Year‘, ‘State‘])[‘Event_Date‘].size().reset_index()df_amount_year.columns = [‘Year‘, ‘State‘, ‘Amount‘]print(df_amount_year.head())
Year State Amount0 1997 AB 61 1997 AK 52 1997 AL 83 1997 AR 104 1997 AZ 127
import seaborn as snsdf_pivot = df_amount_year.pivot_table(index=‘State‘, columns=‘Year‘, values=‘Amount‘)f, ax = plt.subplots(figsize = (10, 15))cmap = sns.cubehelix_palette(start = 1, rot = 3, gamma=0.8, as_cmap = True)sns.heatmap(df_pivot, cmap = cmap, linewidths = 0.05, ax = ax)ax.set_title(‘Amounts per State and Year since Year 1997‘)ax.set_xlabel(‘Year‘)ax.set_ylabel(‘State‘)ax.set_xticklabels(ax.get_xticklabels(), rotation=90)f.savefig(‘ufo_per_year_state.jpg‘)
- 上图中,颜色越深的地方,表示UFO事件报告的次数越多。
原始数据来源于美国的UFO事件报告中心。
更多精彩内容请关注公众号:
“Python数据之道”
?
UFO长啥样?--Python数据分析来告诉你
声明:以上内容来自用户投稿及互联网公开渠道收集整理发布,本网站不拥有所有权,未作人工编辑处理,也不承担相关法律责任,若内容有误或涉及侵权可进行投诉: 投诉/举报 工作人员会在5个工作日内联系你,一经查实,本站将立刻删除涉嫌侵权内容。