Pandas Objects: DataFrame

DataFrame is the other fundamental data structure in Pandas, alongside Series.

1. DataFrame as a generalized NumPy array
>> import numpy as np
>> import pandas as pd

>> area_dict = {
        "California":423967,
        "Texas":695662,
        "New York":141297,
        "Florida":170312,
        "Illinois":149995
        }
>> area = pd.Series(area_dict)

California    423967
Florida       170312
Illinois      149995
New York      141297
Texas         695662
dtype: int64


>> population_dict = {
        "California":38332521,
        "Texas":26448193,
        "New York":19651127,
        "Florida":19552860,
        "Illinois":12882135
        }
>> population = pd.Series(population_dict)

California    38332521
Florida       19552860
Illinois      12882135
New York      19651127
Texas         26448193
dtype: int64

1. Combine the two Series objects above, area and population, using a dictionary to build a single two-dimensional object that holds both:

>> states = pd.DataFrame({'population':population,'area':area})

              area  population
California  423967    38332521
Florida     170312    19552860
Illinois    149995    12882135
New York    141297    19651127
Texas       695662    26448193

2. Inspect the row index
>> states.index
Index(['California', 'Florida', 'Illinois', 'New York', 'Texas'], dtype='object')

3. Inspect the column names
>> states.columns
Index(['area', 'population'], dtype='object')

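A DataFrame can therefore be viewed as a generalized two-dimensional NumPy array in which both the rows and the columns carry labels that can be used to access the data.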
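
To make the array analogy concrete, the sketch below rebuilds `states` and looks at the NumPy array underneath it. It is only a minimal illustration; the name `values` is introduced here for the example and does not appear in the text above.

```python
import pandas as pd

area = pd.Series({"California": 423967, "Texas": 695662,
                  "New York": 141297, "Florida": 170312,
                  "Illinois": 149995})
population = pd.Series({"California": 38332521, "Texas": 26448193,
                        "New York": 19651127, "Florida": 19552860,
                        "Illinois": 12882135})
states = pd.DataFrame({'population': population, 'area': area})

# The underlying data is an ordinary two-dimensional NumPy array
values = states.to_numpy()
print(type(values), values.shape)   # 5 rows (states) by 2 columns

# Row and column labels address a single cell by name
print(states.loc['Texas', 'area'])
```

Unlike a bare array, the cell is looked up by its labels rather than by integer position.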


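2. DataFrame as a specialized dictionary

A dictionary maps a key to a value; a DataFrame maps a column name to a Series of column data. For example, indexing with the column name 'area' returns the Series object that holds the area data: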

>> states['area']
California    423967
Florida       170312
Illinois      149995
New York      141297
Texas         695662
Name: area, dtype: int64
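
The dictionary analogy goes a little further than indexing. The following sketch, which assumes a reduced two-state version of `states`, shows membership tests, `keys()`, and key-style assignment; the `density` column is a hypothetical addition used only for illustration.

```python
import pandas as pd

states = pd.DataFrame({
    'population': pd.Series({"California": 38332521, "Texas": 26448193}),
    'area': pd.Series({"California": 423967, "Texas": 695662}),
})

# Column names behave like dictionary keys
print('area' in states)        # membership is tested against column names
print(list(states.keys()))     # keys() returns the column labels

# Like a dict, assigning to a new key adds an entry (here, a new column).
# 'density' is a hypothetical column, not part of the original example.
states['density'] = states['population'] / states['area']
print(states['density'])
```

Note the contrast with a two-dimensional NumPy array, where `data[0]` returns the first row: on a DataFrame, `data['col']` returns a column.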

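3. Constructing DataFrame objects

A Pandas DataFrame can be constructed in many ways.

1. From a single Series object. A DataFrame is a collection of Series objects,
  so a single-column DataFrame can be built from one Series: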
>> population_dict = {
        "California":38332521,
        "Texas":26448193,
        "New York":19651127,
        "Florida":19552860,
        "Illinois":12882135
        }
>> population = pd.Series(population_dict)

California    38332521
Florida       19552860
Illinois      12882135
New York      19651127
Texas         26448193
dtype: int64

>> pd.DataFrame(population,columns=['population'])

            population
California    38332521
Florida       19552860
Illinois      12882135
New York      19651127
Texas         26448193
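
As an alternative sketch of the same idea, a Series can also convert itself with `Series.to_frame`. This uses a shortened three-state version of `population`, and the name `df` is introduced only for the example.

```python
import pandas as pd

population = pd.Series({"California": 38332521, "Texas": 26448193,
                        "New York": 19651127})

# to_frame() turns a Series into a one-column DataFrame;
# the name argument supplies the column label
df = population.to_frame(name='population')
print(df.columns.tolist())
print(df.shape)
```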



2. From a list of dictionaries
>> data = [{'a':i,'b':2*i} for i in range(3)]
[{'a': 0, 'b': 0}, {'a': 1, 'b': 2}, {'a': 2, 'b': 4}]

>> pd.DataFrame(data)
   a  b
0  0  0
1  1  2
2  2  4
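
The dictionaries in the list do not need identical keys. The sketch below assumes two small made-up records that share only the key 'b', and shows that Pandas fills the gaps with NaN.

```python
import pandas as pd

# The two dictionaries share only the key 'b'
df = pd.DataFrame([{'a': 1, 'b': 2}, {'b': 3, 'c': 4}])
print(df)

# Count the missing entries per column; 'a' and 'c' each have one
print(df.isna().sum())
```

Because NaN is a floating-point value, the columns containing it are stored as floats rather than integers.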


3. From a dictionary of Series objects, as before:
>> data = pd.DataFrame({'population':population,'area':area})

              area  population
California  423967    38332521
Florida     170312    19552860
Illinois    149995    12882135
New York    141297    19651127
Texas       695662    26448193
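
When the Series in the dictionary do not share the same index, the constructor aligns them on the union of their labels. The sketch below uses deliberately mismatched, shortened versions of `area` and `population` to illustrate this.

```python
import pandas as pd

# Only 'Texas' appears in both Series
area = pd.Series({"California": 423967, "Texas": 695662})
population = pd.Series({"Texas": 26448193, "Florida": 19552860})

df = pd.DataFrame({'population': population, 'area': area})
print(df)

# Labels missing from one Series become NaN in that column
print(df.isna())
```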


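4. From a two-dimensional NumPy array.
Given a two-dimensional array, you can create a DataFrame with whatever row and column labels you specify.
If either is omitted, it defaults to an integer index.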

>> pd.DataFrame(np.random.rand(3,2),
             columns = ['foo','bar'],
             index = ['a','b','c'])

        foo       bar
a  0.882203  0.474690
b  0.969104  0.842780
c  0.637580  0.755599
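
For comparison, here is a small sketch of the default case with no labels supplied. It uses a deterministic array from `np.arange` instead of random numbers so that the result is reproducible; `arr` and `df` are names chosen for the example.

```python
import numpy as np
import pandas as pd

arr = np.arange(6).reshape(3, 2)   # a small deterministic 3x2 array

# No index or columns given: both default to integers starting at 0
df = pd.DataFrame(arr)
print(df.index.tolist())     # [0, 1, 2]
print(df.columns.tolist())   # [0, 1]
```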


5. From a NumPy structured array
>> A = np.zeros(3,dtype=[('A','i8'),('B','f8')])
array([(0, 0.), (0, 0.), (0, 0.)], dtype=[('A', '<i8'), ('B', '<f8')])

>> pd.DataFrame(A)
   A    B
0  0  0.0
1  0  0.0
2  0  0.0
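
A structured array with non-zero data makes it easier to see that each field keeps its own type. In this sketch the field names follow the example above, while the values are made up purely for illustration.

```python
import numpy as np
import pandas as pd

# Field names 'A' and 'B' match the example above; the values are invented
A = np.array([(1, 1.5), (2, 2.5), (3, 3.5)],
             dtype=[('A', 'i8'), ('B', 'f8')])

df = pd.DataFrame(A)
print(df)

# Each field becomes a column with the corresponding dtype
print(df.dtypes)
```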
