Data Cleaning 2


2024-08-17 07:55:09 222 views

1. When we want to match several datasets on a shared column (here DBN), that column has to act as a unifying key across all of them. If the column contains duplicated values in one dataset, we first filter that dataset down to the rows we want in a DataFrame.

    class_size = data["class_size"]
    class_size = class_size[class_size["GRADE "] == "09-12"]
    class_size = class_size[class_size["PROGRAM TYPE"] == "GEN ED"]
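
The two filters above can also be combined into one boolean mask. The following is a minimal sketch on a small made-up table: the rows and the AVERAGE CLASS SIZE values are hypothetical and only stand in for the real class_size data.

```python
import pandas as pd

# Hypothetical sample rows standing in for data["class_size"].
class_size = pd.DataFrame({
    "DBN": ["01M292", "01M292", "01M332", "01M378"],
    "GRADE ": ["09-12", "09-12", "06", "09-12"],  # column name as written in the text
    "PROGRAM TYPE": ["GEN ED", "CTT", "GEN ED", "GEN ED"],
    "AVERAGE CLASS SIZE": [22.0, 18.0, 25.0, 30.0],
})

# Each comparison yields a boolean Series; & keeps rows where both are True.
mask = (class_size["GRADE "] == "09-12") & (class_size["PROGRAM TYPE"] == "GEN ED")
filtered = class_size[mask]

print(filtered["DBN"].tolist())  # ['01M292', '01M378']
```

The parentheses around each comparison are required, because & binds more tightly than == in Python.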

2. Once we have filtered the rows, we want to condense the rows that still share the same value in that column into one by using groupby() and an aggregation function.

    group_by = class_size.groupby("DBN")  # group_by is a special type of object called GroupBy
    # Aggregate each group by taking the mean of its numeric columns (a shorthand
    # for aggregating with the mean). At this moment, the index of class_size
    # changes to the grouped-by value (DBN).
    class_size = group_by.mean(numeric_only=True)
    class_size.reset_index(inplace=True)  # restore the default integer index (0, 1, 2, ...); DBN becomes a regular column again
    data["class_size"] = class_size
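
To see what the grouping step does to the index, here is a small sketch on hypothetical data (the school names and class sizes are invented for illustration). It assumes only the numeric columns should be averaged.

```python
import pandas as pd

# Hypothetical filtered rows: DBN 01M292 appears twice.
filtered = pd.DataFrame({
    "DBN": ["01M292", "01M292", "01M378"],
    "SCHOOL NAME": ["School A", "School A", "School B"],
    "AVERAGE CLASS SIZE": [20.0, 24.0, 30.0],
})

# One row per DBN; non-numeric columns are dropped by numeric_only=True.
per_school = filtered.groupby("DBN").mean(numeric_only=True)
index_name_after_groupby = per_school.index.name  # the index is now DBN

# Move DBN back into a regular column with a default 0..n-1 index.
per_school = per_school.reset_index()

print(index_name_after_groupby)                   # DBN
print(per_school["AVERAGE CLASS SIZE"].tolist())  # [22.0, 30.0]
```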

3. Convert all the numeric strings to numbers by using the pd.to_numeric() function:

    import pandas as pd

    cols = ["AP Test Takers ", "Total Exams Taken", "Number of Exams with scores 3 4 or 5"]

    for col in cols:
        # errors="coerce" turns values that cannot be parsed as numbers into NaN
        data["ap_2010"][col] = pd.to_numeric(data["ap_2010"][col], errors="coerce")
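
As a quick illustration of what errors="coerce" does, the sketch below converts a hypothetical column in which one entry is not a number. The placeholder value "s" is an assumption made for this example.

```python
import pandas as pd

# Hypothetical column of number strings with one unparseable entry.
ap = pd.DataFrame({"AP Test Takers ": ["39", "s", "255"]})

# Unparseable values become NaN instead of raising an error.
ap["AP Test Takers "] = pd.to_numeric(ap["AP Test Takers "], errors="coerce")

print(ap["AP Test Takers "].isna().tolist())  # [False, True, False]
```

Because NaN is a floating-point value, the converted column has a float dtype even though the valid entries were whole numbers.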

4. After cleaning each dataset, we would like to combine them so that we can plot them. Normally we use the merge() function to combine two datasets.

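    combined = data["sat_results"]

    # DBN is assumed to be the shared key column.
    # how="left" keeps every row of combined; how="inner" keeps only rows found in both.
    combined = combined.merge(data["ap_2010"], on="DBN", how="left")
    combined = combined.merge(data["graduation"], on="DBN", how="inner")
    print(combined.shape)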
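
The difference between how="left" and how="inner" is easiest to see on tiny tables. This is a sketch with invented values. It assumes DBN is the key column, and the names sat_score and Total Cohort are used only for illustration.

```python
import pandas as pd

# Hypothetical miniature versions of the three datasets.
sat = pd.DataFrame({"DBN": ["01M292", "01M448", "01M450"], "sat_score": [1122, 1172, 1149]})
ap = pd.DataFrame({"DBN": ["01M448", "01M450"], "AP Test Takers ": [39.0, 19.0]})
grad = pd.DataFrame({"DBN": ["01M292", "01M448"], "Total Cohort": [78, 124]})

# Left merge: every SAT row survives; schools without AP data get NaN.
left = sat.merge(ap, on="DBN", how="left")

# Inner merge: only schools present on both sides survive.
inner = left.merge(grad, on="DBN", how="inner")

print(left.shape)   # (3, 3)
print(inner.shape)  # (2, 4)
```

A left merge never drops rows from the left table, so it suits datasets with patchy coverage. An inner merge shrinks the result to the DBN values that both tables share.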

5. Finally, we want to extract part of each value in a column (here, the first two characters of DBN) by using the apply() function:

    index = combined.index

    def get_first_two_char(dbn):
        # Return the first two characters of a DBN string.
        return dbn[0:2]

    # Usually, where we would need a for loop over a DataFrame, we can use apply() to simplify it.
    combined["school_dist"] = combined["DBN"].apply(get_first_two_char)
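
For simple string slicing, pandas also offers the vectorized .str accessor, which should give the same result as apply() here. The sketch below compares the two on hypothetical DBN values.

```python
import pandas as pd

# Hypothetical DBN values.
combined = pd.DataFrame({"DBN": ["01M292", "02M047", "10X141"]})

def get_first_two_char(dbn):
    # Return the first two characters of a DBN string.
    return dbn[0:2]

# apply() calls the function once per element of the Series.
via_apply = combined["DBN"].apply(get_first_two_char)

# The .str accessor slices every string in the Series at once.
via_str = combined["DBN"].str[0:2]

print(via_apply.tolist())  # ['01', '02', '10']
print(via_str.tolist())    # ['01', '02', '10']
```

apply() remains the more general tool, since it accepts any Python function, while .str covers common string operations without a helper function.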
