The basic steps of data analysis:
- Get to know your data.
- Compute some summary statistics (staring at the numbers like a zombie won't give you any insight!).
- Visualize the data to get a better feeling for it.
1. numpy comes with built-in functions for generating data
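As a minimal sketch of this point, the snippet below draws random heights and weights from normal distributions and stacks them into one 2-D array. The variable names (`height`, `weight`, `np_city`), the distribution parameters, and the seed are assumptions made up for illustration; they do not come from the notes.

```python
import numpy as np

# Seeded generator so the example is reproducible (the seed value is arbitrary)
rng = np.random.default_rng(42)

# Hypothetical data: 5000 heights (m) and weights (kg) from normal distributions
height = np.round(rng.normal(loc=1.75, scale=0.20, size=5000), 2)
weight = np.round(rng.normal(loc=60.32, scale=15.0, size=5000), 2)

# Stack the two 1-D arrays as the columns of a single 2-D array
np_city = np.column_stack((height, weight))

print(np_city.shape)           # (5000, 2)
print(np.mean(np_city[:, 0]))  # sample mean of the height column
```

The resulting array has one row per simulated person, so a column slice such as `np_city[:, 0]` can be passed straight to the statistical functions.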
2. numpy provides a range of statistical functions
    # np_baseball is assumed to be available (a 2-D numpy array)
    import numpy as np

    # Create np_height from the first column of np_baseball
    np_height = np_baseball[:, 0]

    # Print the mean of np_height
    print(np.mean(np_height))

    # Print the median of np_height
    print(np.median(np_height))
/
    # np_baseball is assumed to be available (a 2-D numpy array)
    import numpy as np

    # Mean height (first column)
    avg = np.mean(np_baseball[:, 0])
    print("Average: " + str(avg))

    # Median height
    med = np.median(np_baseball[:, 0])
    print("Median: " + str(med))

    # Standard deviation of height
    stddev = np.std(np_baseball[:, 0])
    print("Standard Deviation: " + str(stddev))

    # Correlation between the first and second columns
    # (np.corrcoef returns a 2x2 correlation matrix)
    corr = np.corrcoef(np_baseball[:, 0], np_baseball[:, 1])
    print("Correlation: " + str(corr))
/
    # heights and positions are assumed to be available as lists
    import numpy as np

    # Convert positions and heights to numpy arrays
    np_positions = np.array(positions)
    np_heights = np.array(heights)

    # Heights of the goalkeepers
    gk_heights = np_heights[np_positions == 'GK']

    # Heights of the other players
    other_heights = np_heights[np_positions != 'GK']

    # Print the median height of goalkeepers
    print("Median height of goalkeepers: " + str(np.median(gk_heights)))

    # Print the median height of other players
    print("Median height of other players: " + str(np.median(other_heights)))
3. numpy itself does not seem to offer data visualization...
Visualization is one way to gain insight and intuition from data!