Data Cleaning 3

1. Compute the pairwise correlations between the columns with corr(), then look at the entries for sat_score.

  correlations = combined.corr(method = "pearson")
  print(correlations["sat_score"])

Note: A correlation coefficient ranges from -1 to 1. A value close to 1 means the two columns are positively correlated, a value close to -1 means they are negatively correlated, and a value close to 0 means there is little or no linear correlation.
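To make the -1 to 1 range concrete, here is a minimal sketch that computes the Pearson coefficient by hand. The helper name `pearson` and all sample values are made up for illustration; this is not how pandas implements corr(), only the standard textbook formula.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of products of deviations from the means (unnormalized covariance).
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

# Made-up values, for illustration only.
hours = [1, 2, 3, 4, 5]
rising = [10, 20, 30, 40, 50]    # grows as hours grows
falling = [50, 40, 30, 20, 10]   # shrinks as hours grows
unrelated = [3, 1, 4, 1, 3]      # no linear trend against hours

print(pearson(hours, rising))     # close to 1
print(pearson(hours, falling))    # close to -1
print(pearson(hours, unrelated))  # close to 0
```

The same reading applies to each number in correlations["sat_score"]: the sign gives the direction of the relationship and the magnitude gives its strength.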

2. Next, plot the data with the plot() function.

  %matplotlib inline

  import matplotlib.pyplot as plt

  combined.plot('total_enrollment', 'sat_score', kind="scatter")  # plot(x, y, kind)

3. Then we can filter the data to dig out the information we need.
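One common way to filter is a boolean mask. The sketch below assumes a small stand-in for the combined DataFrame; the rows, the school_name column, and both thresholds are made up for illustration and are not taken from the original data.

```python
import pandas as pd

# Hypothetical stand-in for the `combined` DataFrame; all rows are made up.
combined = pd.DataFrame({
    "school_name": ["A", "B", "C", "D"],
    "total_enrollment": [350, 1200, 800, 150],
    "sat_score": [1450.0, 1100.0, 950.0, 980.0],
})

# Keep only rows where both conditions hold.
# The thresholds (1000 and 1000) are arbitrary illustration values.
low_enrollment = combined[
    (combined["total_enrollment"] < 1000) & (combined["sat_score"] < 1000)
]

print(low_enrollment["school_name"].tolist())
```

Each condition needs its own parentheses because & binds more tightly than < in Python.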

4. Map the schools in the area we are interested in.

  from mpl_toolkits.basemap import Basemap

  # llcrnrlat / llcrnrlon = latitude / longitude of the lower-left corner.
  # urcrnrlat / urcrnrlon = latitude / longitude of the upper-right corner.
  m = Basemap(projection="merc", llcrnrlat=40.496044, urcrnrlat=40.915256, llcrnrlon=-74.255735, urcrnrlon=-73.700272, resolution="i")
  m.drawmapboundary(fill_color='#85A6D9')
  m.drawcoastlines(color='#6D5F47', linewidth=.4)
  m.drawrivers(color='#6D5F47', linewidth=.4)

  latitudes = combined["lat"].tolist()
  longitudes = combined["lon"].tolist()

  m.scatter(longitudes, latitudes, s=20, zorder=2, latlon=True)  # scatter() is given plain lists here, which is why tolist() is used above
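The four corner arguments describe a latitude/longitude box, so a point outside that box will not appear on the map. As a rough sketch, the check below tests coordinates against the same corner values before plotting. The helper in_bounds and the sample points are hypothetical and are not part of Basemap.

```python
# Corner coordinates copied from the Basemap call above.
LLCRNRLAT, URCRNRLAT = 40.496044, 40.915256
LLCRNRLON, URCRNRLON = -74.255735, -73.700272

def in_bounds(lat, lon):
    """Return True if (lat, lon) lies inside the map's corner box."""
    return LLCRNRLAT <= lat <= URCRNRLAT and LLCRNRLON <= lon <= URCRNRLON

# Made-up sample points as (latitude, longitude) pairs.
points = [(40.7128, -74.0060), (42.3601, -71.0589)]

# Only the first point falls inside the box.
inside = [p for p in points if in_bounds(*p)]
print(inside)
```

Note the argument order: the helper takes latitude first, while m.scatter() takes longitudes first.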

5. We can change the parameters of scatter() to change the 
  plt.show()
