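Pandas has three data structures: Series, DataFrame, and MultiIndex. (Older versions also had Panel, a three-dimensional container that was later removed in favor of a DataFrame with a MultiIndex.) Of these: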
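A Series is a one-dimensional labeled array that can hold any data type (integers, strings, floats, Python objects, and so on). It consists of two parts: the data and the axis labels associated with it, which are collectively called the index.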
import numpy as np
import pandas as pd
# Inspect the data and its type
s = pd.Series(np.random.rand(5))
print('s = \n', s)
print('\ntype(s) = ', type(s))
# .index returns the Series index; here it is a RangeIndex
print('\ns.index = {0}, type(s.index) = {1}'.format(s.index, type(s.index)))
# .values returns the Series values as an ndarray
print('\ns.values = {0}, type(s.values) = {1}'.format(s.values, type(s.values)))
Output:
s =
0 0.809852
1 0.096700
2 0.202184
3 0.838067
4 0.344407
dtype: float64
type(s) = <class 'pandas.core.series.Series'>
s.index = RangeIndex(start=0, stop=5, step=1), type(s.index) = <class 'pandas.core.indexes.range.RangeIndex'>
s.values = [0.80985162 0.09670019 0.20218442 0.83806729 0.34440732], type(s.values) = <class 'numpy.ndarray'>
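A Series is created with the constructor below, where data is the input (a dict, an ndarray, a scalar, and so on), index holds the labels, and dtype sets the element type:
import pandas as pd
pd.Series(data=None, index=None, dtype=None)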
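Creating a Series from a dict: the dict keys become the index and the dict values become the values.
Note: the keys are often strings, but they do not have to be. Any hashable object (an integer, for example) can serve as a key.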
import pandas as pd
dic = {'red':100, 'blue':200, 'green': 500, 'yellow':1000}
s = pd.Series(data=dic)
print('s = \n', s)
Output:
s =
red 100
blue 200
green 500
yellow 1000
dtype: int64
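As a minimal sketch of how dict keys map to labels, the example below uses made-up data (by_year and picked are illustrative names, not from the text above). It shows that integer keys work as labels, and that passing index together with a dict selects labels in the given order, filling labels that are absent from the dict with NaN.

```python
import pandas as pd

# Illustrative data: integer keys are valid labels too.
by_year = pd.Series({2019: 1.5, 2020: 2.5, 2021: 3.5})
print(by_year.loc[2020])  # label-based lookup with an integer label

# Passing index together with a dict selects labels in that order;
# a label missing from the dict ('purple') is filled with NaN.
dic = {'red': 100, 'blue': 200, 'green': 500}
picked = pd.Series(data=dic, index=['green', 'red', 'purple'])
print(picked)
```

Because NaN is a float, the selected values are no longer stored as integers once a label is missing.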
If the values have more than one type, the dtype becomes object:
import pandas as pd
dic = {'a': 1, 'b': 'hello', 'c': 3, '4': 4, '5': 5}
s = pd.Series(data=dic)
print('s = \n', s)
Output:
s =
a 1
b hello
c 3
4 4
5 5
dtype: object
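To see what the object dtype means in practice, the sketch below (with illustrative names mixed and numeric) checks that each element of a mixed Series keeps its own Python type, while a Series holding only integers gets a numeric dtype.

```python
import pandas as pd

# Mixed value types: pandas falls back to the generic object dtype.
mixed = pd.Series({'a': 1, 'b': 'hello', 'c': 3})
print(mixed.dtype)
print([type(v).__name__ for v in mixed])  # each element keeps its own type

# Only integers: pandas picks an integer dtype instead.
numeric = pd.Series({'a': 1, 'c': 3})
print(numeric.dtype)
```

An object Series is flexible, but numeric operations on it are generally slower than on a Series with a numeric dtype.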
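Creating a Series from a one-dimensional ndarray; when no index is given, a default integer index starting at 0 is used:
import numpy as np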
import pandas as pd
arr = np.random.randn(5)
s = pd.Series(data=arr)
print('arr = ', arr)
print('\ns = \n', s)
Output:
arr = [ 1.82866366 0.75174314 -1.67554372 1.51687102 0.7632735 ]
s =
0 1.828664
1 0.751743
2 -1.675544
3 1.516871
4 0.763273
dtype: float64
import numpy as np
import pandas as pd
arr = np.random.randn(5)
print('arr = ', arr)
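# index parameter: sets the labels; its length must match the length of the data
# dtype parameter: sets the element type
# np.object was removed in NumPy 1.24; use the built-in object instead
s = pd.Series(data=arr, index=['a', 'b', 'c', 'd', 'e'], dtype=object)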
print('\ns = \n', s)
Output:
arr = [-0.30063079 -1.0600119 -1.13511772 0.75371044 -0.87985218]
s =
a -0.300631
b -1.06001
c -1.13512
d 0.75371
e -0.879852
dtype: object
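The requirement that index and data have the same length can be checked directly. The sketch below deliberately passes three labels for five values and catches the resulting ValueError; the variable error_message is only an illustrative helper.

```python
import numpy as np
import pandas as pd

arr = np.arange(5)
error_message = None

try:
    # Five values but only three labels: the lengths do not match.
    pd.Series(data=arr, index=['a', 'b', 'c'])
except ValueError as exc:
    error_message = str(exc)
    print('ValueError:', error_message)
```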
If data is a scalar value, an index must be provided. The value is repeated to match the length of the index:
import pandas as pd
# Series creation method 3: from a scalar
s = pd.Series(data=10, index=range(4))
print('\ns = \n', s)
Output:
s =
0 10
1 10
2 10
3 10
dtype: int64
To make the index and data of a Series easier to work with, Series provides two attributes, index and values:
import pandas as pd
dic = {'red':100, 'blue':200, 'green': 500, 'yellow':1000}
color_count = pd.Series(data=dic)
print('color_count = \n', color_count)
print('\ncolor_count.index = ', color_count.index)
Output:
color_count =
red 100
blue 200
green 500
yellow 1000
dtype: int64
color_count.index = Index(['red', 'blue', 'green', 'yellow'], dtype='object')
import pandas as pd
dic = {'red':100, 'blue':200, 'green': 500, 'yellow':1000}
color_count = pd.Series(data=dic)
print('color_count = \n', color_count)
print('\ncolor_count.values = ', color_count.values)
Output:
color_count =
red 100
blue 200
green 500
yellow 1000
dtype: int64
color_count.values = [ 100 200 500 1000]
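Beyond printing index and values, it is often useful to convert them to plain Python or NumPy objects, or to walk through labels and values together. The sketch below reuses the color_count data and introduces two illustrative names, labels and counts.

```python
import pandas as pd

color_count = pd.Series({'red': 100, 'blue': 200, 'green': 500, 'yellow': 1000})

labels = color_count.index.tolist()  # plain Python list of the labels
counts = color_count.to_numpy()      # NumPy array of the values

# items() yields (label, value) pairs in index order.
for label, count in color_count.items():
    print(label, count)
```

to_numpy() is the accessor the pandas documentation recommends over .values for new code, although both return the underlying data here.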
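Data can also be retrieved by position with .iloc (or by label with .loc):
# Plain color_count[2] relies on a positional fallback that newer pandas versions deprecate
color_count.iloc[2]
# Result
500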
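The difference between label-based and position-based access can be sketched as follows, reusing the color_count data. The names by_position, by_label, subset, and sliced are illustrative only.

```python
import pandas as pd

color_count = pd.Series({'red': 100, 'blue': 200, 'green': 500, 'yellow': 1000})

by_position = color_count.iloc[2]            # third element, counting from 0
by_label = color_count.loc['green']          # element labeled 'green'
subset = color_count.loc[['red', 'yellow']]  # several labels at once
sliced = color_count.iloc[1:3]               # positions 1 and 2 (end excluded)

print(by_position, by_label)
print(subset)
print(sliced)
```

Using .loc and .iloc explicitly avoids ambiguity about whether a key is a label or a position, which matters most when the index itself contains integers.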
import numpy as np
import pandas as pd
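# Series name attribute: name
# name is a parameter of the Series constructor that gives the Series a name
# .name is an attribute that returns the name (a str here); if no name was set, it is None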
s1 = pd.Series(np.random.randn(5))
print('s1 = \n{0}, \ns1.name = {1}, type(s1.name) = {2}'.format(s1, s1.name, type(s1.name)))
print('-' * 100)
s2 = pd.Series(np.random.randn(5), name='test')
print('s2 = \n{0}, \ns2.name = {1}, type(s2.name) = {2}'.format(s2, s2.name, type(s2.name)))
print('-' * 100)
# .rename() returns a new Series with the given name; the original Series is unchanged
s3 = s2.rename('hehehe')
print('s3 = \n{0}, \ns3.name = {1}, type(s3.name) = {2}'.format(s3, s3.name, type(s3.name)))
Output:
s1 =
0 0.327221
1 1.111763
2 0.412881
3 -0.823193
4 0.855757
dtype: float64,
s1.name = None, type(s1.name) = <class 'NoneType'>
----------------------------------------------------------------------------------------------------
s2 =
0 0.399637
1 -1.042004
2 -0.725770
3 -2.863925
4 -0.840557
Name: test, dtype: float64,
s2.name = test, type(s2.name) = <class 'str'>
----------------------------------------------------------------------------------------------------
s3 =
0 0.399637
1 -1.042004
2 -0.725770
3 -2.863925
4 -0.840557
Name: hehehe, dtype: float64,
s3.name = hehehe, type(s3.name) = <class 'str'>
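One place where the name matters is conversion to a DataFrame: the Series name becomes the column label. The sketch below uses made-up data (scores, frame, and renamed are illustrative names) and also confirms that rename() leaves the original Series untouched.

```python
import pandas as pd

scores = pd.Series([90, 85, 77], name='score')

# to_frame() builds a one-column DataFrame; the column takes the Series name.
frame = scores.to_frame()
print(frame.columns.tolist())

# rename() returns a new Series; the original keeps its name.
renamed = scores.rename('points')
print(scores.name, renamed.name)
```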