pandas is a Python module for importing and cleaning up data.

http://yam.gift/2017/02/15/list-dict-series-dataframe-ndarray-transform/

pandas represents missing values as numpy.nan.

To check a whole Series or DataFrame for missing values, use isnull().

eg:

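pd.isnull(df1)  # df1 is a DataFrame

To check a single value, np.isnan() can be used.

eg: np.isnan(df1.iloc[0, 3])  # checks row 0, column 3 of df1 (.ix has been removed from pandas; use .iloc for positions)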
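To make the difference between the two checks concrete, here is a minimal sketch. The DataFrame contents are made up for illustration; only pd.isnull, pd.isna, np.isnan and .iloc are real API.

```python
import numpy as np
import pandas as pd

# Hypothetical data: one missing value in each column.
df1 = pd.DataFrame({"a": [1.0, np.nan, 3.0], "b": [np.nan, 5.0, 6.0]})

# Element-wise check over the whole DataFrame; returns a boolean DataFrame.
mask = pd.isnull(df1)

# Summing the boolean mask counts the missing values in each column.
missing_per_column = mask.sum()

# Check a single cell by position (row 0, column 1).
cell_is_nan = np.isnan(df1.iloc[0, 1])

# pd.isna also handles None and non-numeric values,
# for which np.isnan raises a TypeError.
none_is_missing = pd.isna(None)
```

For single values of unknown type, pd.isna is usually the safer choice, since np.isnan only accepts numeric input.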

Troubleshooting NaN problems: http://www.cnblogs.com/itdyb/p/5806688.html

nulltype: http://www.cnblogs.com/BeginMan/p/3153983.html

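pandas has two main data structures: 1. Series; 2. DataFrame

A DataFrame's shape counts only the data: the header row (the column labels) is not included, while the first column is included as long as it is an ordinary data column and not the index.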
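A small sketch of what shape counts, assuming the data comes from a CSV file. The CSV text here is invented for the example.

```python
import io
import pandas as pd

# Hypothetical CSV: one header row, two data rows, three columns.
csv_text = "id,name,age\n1,Ann,30\n2,Bob,25\n"

# By default the header row becomes the column labels and every
# column is treated as data, so the first column is counted.
df_default = pd.read_csv(io.StringIO(csv_text))
shape_default = df_default.shape

# With index_col=0 the first column becomes the index
# and no longer counts toward the shape.
df_indexed = pd.read_csv(io.StringIO(csv_text), index_col=0)
shape_indexed = df_indexed.shape
```

Here shape_default is (2, 3) and shape_indexed is (2, 2): the header never counts, and the first column counts only while it is a data column.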

loc

iloc

Usage details:

http://blog.csdn.net/qq_16234613/article/details/62046057

http://blog.csdn.net/u014607457/article/details/51290237

http://blog.csdn.net/zhu418766417/article/details/52718063
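As a quick companion to the links on loc and iloc, here is a minimal sketch of the difference between the two. The frame and its labels are hypothetical.

```python
import pandas as pd

# Hypothetical frame with string row labels.
df = pd.DataFrame(
    {"x": [10, 20, 30], "y": [1.5, 2.5, 3.5]},
    index=["a", "b", "c"],
)

# loc selects by label: row "b", column "x".
by_label = df.loc["b", "x"]

# iloc selects by integer position: row 1, column 0 (the same cell).
by_position = df.iloc[1, 0]

# Label slices include both endpoints ...
label_slice = df.loc["a":"b"]

# ... while position slices exclude the end, like Python lists.
position_slice = df.iloc[0:1]
```

The slicing difference is the usual source of surprises: df.loc["a":"b"] returns two rows, df.iloc[0:1] returns one.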

pandas has the data frame (DataFrame)

and a specific pandas object called the series (Series)

df[df['Age']>60][['Sex','Pclass','Age','Survived']]

df[df['Age'].isnull()][['Sex', 'Pclass', 'Age']]

len(df[ (df['Sex'] == 'male') & (df['Pclass'] == i) ])

df['Gender'] = df['Sex'].map( {'female': 0, 'male': 1} ).astype(int)

df = df.drop(['Age'], axis=1)

df = df.dropna()

Convert to an array: train_data = df.values
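The snippets above look like they come from a Titanic-style dataset. The following sketch runs them end to end on a tiny made-up frame; the values are assumptions, only the column names are taken from the snippets.

```python
import numpy as np
import pandas as pd

# Tiny hypothetical stand-in for the real dataset.
df = pd.DataFrame({
    "Sex": ["male", "female", "male", "female"],
    "Pclass": [1, 3, 1, 2],
    "Age": [65.0, np.nan, 22.0, 70.0],
    "Survived": [0, 1, 0, 1],
})

# Rows where Age > 60, restricted to a few columns.
older = df[df["Age"] > 60][["Sex", "Pclass", "Age", "Survived"]]

# Rows where Age is missing.
missing_age = df[df["Age"].isnull()][["Sex", "Pclass", "Age"]]

# Number of male passengers in each class (the loop variable i
# in the original snippet is the class number).
males_per_class = {
    i: len(df[(df["Sex"] == "male") & (df["Pclass"] == i)])
    for i in (1, 2, 3)
}

# Encode Sex as an integer column.
df["Gender"] = df["Sex"].map({"female": 0, "male": 1}).astype(int)

# Drop the Age column first, then drop any rows that still have NaN.
df = df.drop(["Age"], axis=1)
df = df.dropna()

# Convert to a NumPy array.
train_data = df.values
```

The order of the last steps matters: dropping Age before dropna() keeps the row whose only missing value was Age.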

pandas data structures

http://blog.csdn.net/AmourDeMai/article/details/51097635 [very good]

http://pandas.pydata.org/pandas-docs/stable/dsintro.html

Data structures

https://pandas.pydata.org/pandas-docs/stable/dsintro.html [official documentation]

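The main pandas data structures are Series (corresponding to a one-dimensional array) and DataFrame (corresponding to a two-dimensional array). Older versions also had Panel (three-dimensional), Panel4D (four-dimensional) and PanelND (N-dimensional), but these have since been removed from pandas. Series and DataFrame are by far the most widely used, so the notes below cover these two.

Creating a Series from an ndarray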

s = pd.Series(np.random.randn(5), index = list('ABCDE'))

Creating a Series from a dict

In [19]: d = {'a': 1, 'b': 2, 'c': 3}

In [20]: pd.Series(d)

Creating a Series from a scalar

pd.Series(5., index=['a', 'b', 'c', 'd', 'e'])
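The three ways of creating a Series can be put side by side in one runnable sketch. The reindexed variant is an extra illustration not in the notes above: it shows what happens when the requested index contains a label the dict lacks.

```python
import numpy as np
import pandas as pd

# From an ndarray, with explicit labels.
from_array = pd.Series(np.random.randn(5), index=list("ABCDE"))

# From a dict: the keys become the index.
d = {"a": 1, "b": 2, "c": 3}
from_dict = pd.Series(d)

# From a dict with an explicit index: labels missing from the dict
# ("d" here) get NaN, which also turns the dtype into float64.
reindexed = pd.Series(d, index=["b", "c", "d"])

# From a scalar: the value is repeated for every label.
from_scalar = pd.Series(5.0, index=["a", "b", "c"])
```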

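A Series behaves like an ndarray

s.iloc[0]  # s[0] works in older pandas, but positional [] on a labelled Series is deprecated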

s[s > s.median()]

s[:3]

s.iloc[[4, 2, 1]]  # older pandas also accepts s[[4, 2, 1]]

np.exp(s)

A Series supports NumPy-style operations such as +, -, *, / and exp.

A Series behaves like a dictionary
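A minimal sketch of the ndarray-like behaviour, using fixed values instead of random ones so the results are predictable. It uses .iloc for positional access, which is an assumption about which pandas version is in use (plain [] with integers is deprecated on labelled Series in recent releases).

```python
import numpy as np
import pandas as pd

# Hypothetical Series with fixed values.
s = pd.Series([1.0, 2.0, 3.0, 4.0, 5.0], index=list("abcde"))

# Positional access.
first = s.iloc[0]

# Boolean filtering: keep values above the median (3.0).
above_median = s[s > s.median()]

# Positional slice and positional fancy indexing.
head = s.iloc[:3]
picked = s.iloc[[4, 2, 1]]

# Arithmetic and NumPy functions work element-wise and keep the index.
doubled = s * 2
exponent = np.exp(s)
```

Note that np.exp(s) returns a Series, not a bare ndarray, so the labels survive the operation.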

s['a']

s.get('f')
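A short sketch of the dict-like side, on a hypothetical three-element Series. The point of s.get is that it returns None (or a default) for a missing label, where s['f'] would raise a KeyError.

```python
import numpy as np
import pandas as pd

s = pd.Series([1, 2, 3], index=["a", "b", "c"])

# Look up by label, like a dict key.
value = s["a"]

# The in operator tests the index labels, not the values.
has_c = "c" in s
has_f = "f" in s

# get returns None for a missing label, or the given default.
missing = s.get("f")
fallback = s.get("f", np.nan)

# Assigning to a new label adds an entry.
s["d"] = 4
```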

When two Series with different indexes are combined, the labels that do not appear in both produce NaN:

In [9]: s[1:] + s[:-1]
Out[9]:
a         NaN
b   -2.729308
c   -0.919524
d    0.876880
e    5.863378
f         NaN
dtype: float64
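The alignment rule can be reproduced with fixed values. This is a sketch under the assumption of a four-element Series; the add(..., fill_value=0) call is one way to avoid the NaN if a missing label should count as zero.

```python
import numpy as np
import pandas as pd

s = pd.Series([1.0, 2.0, 3.0, 4.0], index=list("abcd"))

# s.iloc[1:] has labels b, c, d; s.iloc[:-1] has labels a, b, c.
# The result covers the union of labels; "a" and "d" each appear
# on only one side, so they come out as NaN.
total = s.iloc[1:] + s.iloc[:-1]

# With fill_value=0, a label missing on one side is treated as 0.
filled = s.iloc[1:].add(s.iloc[:-1], fill_value=0)
```

Operations align on labels, not positions, which is why the overlapping labels b and c are simply doubled.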

Creating a DataFrame

From a dict

d = {'one': pd.Series([1, 2, 3], index=list('abc')), 'two': pd.Series([1, 2, 3, 4], index=list('abcd'))}

df = pd.DataFrame(d)

df.index

df.columns
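Running the dict-of-Series example shows how the two indexes are combined. The subset variant at the end is an extra illustration: passing index and columns selects from the dict, and a column name that is not in it gives an all-NaN column.

```python
import pandas as pd

d = {
    "one": pd.Series([1, 2, 3], index=list("abc")),
    "two": pd.Series([1, 2, 3, 4], index=list("abcd")),
}

# The row index is the union of the two Series indexes; "one" has
# no value for label "d", so that cell is NaN.
df = pd.DataFrame(d)
row_labels = list(df.index)
column_labels = list(df.columns)

# Selecting rows and columns at construction time; "three" is not
# a key of d, so the whole column is NaN.
subset = pd.DataFrame(d, index=["d", "b", "a"], columns=["two", "three"])
```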

Some operations

del df['two']

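three = df.pop('three')  # pops the three column out of df into the variable three (df must already have a three column)

df['foo'] = 'bar'  # every row of the new column is set to 'bar'

df['one_trunc'] = df['one'][:2]  # rows not covered by the slice are filled with NaN

df.insert(1, 'bar', df['one'])  # lets you choose the position: inserts a new column bar at column position 1

df.assign(ration = df['one'] / df['bar'])  # assign returns a new DataFrame with the extra column; df itself is not modified

df.loc['b']  # loc selects a row by its label

df.iloc[2]  # iloc selects by integer position: iloc[row positions, column positions]

df.iloc[2, :]  # the row at position 2 (the third row) with all columns; use df.iloc[2, :-1] to leave out the last column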
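The column operations above can be run in sequence on a small made-up frame. The starting values and the derived three column are assumptions for the sketch. The column is named ratio here, and the slice uses .iloc to stay explicit about positional access.

```python
import pandas as pd

df = pd.DataFrame(
    {"one": [1.0, 2.0, 3.0], "two": [4.0, 5.0, 6.0]},
    index=list("abc"),
)

# A derived column, so there is something to pop.
df["three"] = df["one"] * df["two"]
three = df.pop("three")  # removed from df, returned as a Series

del df["two"]            # delete a column in place
df["foo"] = "bar"        # scalar is broadcast to every row

# Only labels a and b get a value; c is filled with NaN.
df["one_trunc"] = df["one"].iloc[:2]

# Insert a copy of "one" at column position 1.
df.insert(1, "bar", df["one"])

# assign returns a new frame; df itself is left unchanged.
with_ratio = df.assign(ratio=df["one"] / df["bar"])

row_b = df.loc["b"]                  # by label
row_2 = df.iloc[2]                   # by position (third row)
row_2_all = df.iloc[2, :]            # same row, all columns
row_2_but_last = df.iloc[2, :-1]     # same row, last column left out
```

To keep the result of assign, bind it to a name as done with with_ratio, or reassign it to df.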
