Creating and inserting pandas data: Creating Categorical Data in Pandas

16. Creating Categorical Data in Pandas

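The previous chapter introduced the basic meaning of Categorical Data. This chapter looks in more detail at how to create and use this data type.

First, the difference between Categorical Data and categories needs restating. Categorical Data consists of two parts, categories and codes. categories is the finite set of unique category values; codes holds, for each value in the Categorical data, the integer position of that value within categories, and it is the codes that are actually stored.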
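The relationship between values, categories and codes can be checked directly. The following is a minimal sketch (the variable names values, categories, codes and rebuilt are chosen here for illustration) that rebuilds the original values by looking each code up in categories:

```python
import pandas as pd

values = ["apple", "pearl", "orange", "apple", "orange"]
cat = pd.Categorical(values)

# categories: the unique values; codes: one integer position per element
categories = list(cat.categories)
codes = cat.codes.tolist()

# Rebuild the original values by looking each code up in categories
rebuilt = [categories[c] for c in codes]

print(categories)          # ['apple', 'orange', 'pearl']
print(codes)               # [0, 2, 1, 0, 1]
print(rebuilt == values)   # True
```

Only the small integers in codes are stored per element; each distinct string is stored once in categories.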

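16.1 Creating Categorical Data

Pandas offers several ways to create Categorical data. A column of an existing DataFrame can be converted to the categorical type, a Categorical can be constructed directly, and the return value of some functions is itself Categorical data.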
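As an illustration of a function whose return value is Categorical data, pd.cut is one such case when given a plain list. The sketch below is not from the original chapter; the bin edges and labels are made up for the demonstration:

```python
import pandas as pd

prices = [5.20, 3.50, 7.30, 5.00, 7.50]

# Bin the prices into three labelled ranges: (0, 4], (4, 6], (6, 10].
# The edges and labels are hypothetical values chosen for this demo.
binned = pd.cut(prices, bins=[0, 4, 6, 10], labels=["cheap", "medium", "expensive"])

print(type(binned).__name__)      # Categorical
print(list(binned))               # ['medium', 'cheap', 'expensive', 'medium', 'expensive']
print(list(binned.categories))    # ['cheap', 'medium', 'expensive']
```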

1). Creating with astype('category'). This converts a column of a DataFrame directly to Categorical Data.

from __future__ import print_function  # lets the print() calls below also run on Python 2

import pandas as pd

idx = [1, 2, 3, 5, 6, 7, 9, 4, 8]
name = ["apple", "pearl", "orange", "apple", "orange", "orange", "apple", "pearl", "orange"]
price = [5.20, 3.50, 7.30, 5.00, 7.50, 7.30, 5.20, 3.70, 7.30]

df = pd.DataFrame({"fruit": name, "price": price}, index=idx)

# Convert the fruit column to the category dtype
df['fruit'] = df['fruit'].astype('category')

print(df, "\n")

print("df.price.values")
print(df.price.values, "\n")

print("df.fruit.values")
print(df.fruit.values, "\n")

This is the example used in the previous chapter: the fruit column of the DataFrame df is converted directly from an ordinary Series to categorical data, i.e. the category dtype.

    fruit  price
1   apple    5.2
2   pearl    3.5
3  orange    7.3
5   apple    5.0
6  orange    7.5
7  orange    7.3
9   apple    5.2
4   pearl    3.7
8  orange    7.3

df.price.values
[5.2 3.5 7.3 5.  7.5 7.3 5.2 3.7 7.3]

df.fruit.values
[apple, pearl, orange, apple, orange, orange, apple, pearl, orange]
Categories (3, object): [apple, orange, pearl]
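To confirm that the conversion took effect, the column's dtype can be inspected, and the categories and codes of a categorical column can be reached through the .cat accessor. A minimal sketch on a shortened, four-row version of the data (the names before and after are illustrative):

```python
import pandas as pd

df = pd.DataFrame({"fruit": ["apple", "pearl", "orange", "apple"],
                   "price": [5.20, 3.50, 7.30, 5.00]})

before = df["fruit"].dtype                      # dtype before the conversion
df["fruit"] = df["fruit"].astype("category")
after = df["fruit"].dtype                       # dtype after the conversion

print(before, "->", after)

# The .cat accessor exposes the categories and codes of a categorical column
print(list(df["fruit"].cat.categories))   # ['apple', 'orange', 'pearl']
print(df["fruit"].cat.codes.tolist())     # [0, 2, 1, 0]
```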

2). Creating a Categorical directly with pandas.Categorical

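from __future__ import print_function  # lets the print() calls below also run on Python 2

import pandas as pd

val = ["apple", "pearl", "orange", "apple", "orange"]
cat = pd.Categorical(val)   # build a Categorical straight from a Python list

print("type is", type(cat))
print("*" * 20)

print("categorical data:")
print(cat)
print("*" * 20)

print(cat.categories)
print(cat.codes)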

Program output:

type is
********************
categorical data:
[apple, pearl, orange, apple, orange]
Categories (3, object): [apple, orange, pearl]
********************
Index([u'apple', u'orange', u'pearl'], dtype='object')
[0 2 1 0 1]

val is a Python list, whereas cat is of the categorical data type. It has the attributes categories and codes, which hold the categories and the encoding used to store the data.
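In the output above the categories were inferred from the values and appear in sorted order. The pd.Categorical constructor also accepts an explicit categories argument, in which case the codes follow the order given. A short sketch, where the particular order is arbitrary and chosen only for illustration:

```python
import pandas as pd

val = ["apple", "pearl", "orange", "apple", "orange"]

# Explicit categories: the given order is kept instead of the sorted default
cat = pd.Categorical(val, categories=["pearl", "orange", "apple"])

print(list(cat.categories))   # ['pearl', 'orange', 'apple']
print(cat.codes.tolist())     # [2, 0, 1, 2, 1]
```

The values are the same as before; only the codes change, because each code is a position in the categories list.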

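3). Generating Categorical Data from categories and codes. The categories must be unique and finite; the codes may be any sequence of integers that are valid positions in categories (or -1 for a missing value).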

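from __future__ import print_function  # lets the print() calls below also run on Python 2

import pandas as pd

val = ["apple", "pearl", "orange", "apple", "orange"]
cat = pd.Categorical(val)

print("type is", type(cat))
print("*" * 20)

print("categorical data:")
print(cat)
print("*" * 20)

print(cat.categories)
print(cat.codes)
print("*" * 20)

codes = pd.Series([0, 1, 0, 2, 1, 0, 2, 0])

print("create categorical data:")
print(cat.take(codes))                        # elements of cat at positions 0, 1, 0, 2, ...
print(pd.Categorical.take(cat, codes))        # the same call, made through the class
print(cat.from_codes(codes, cat.categories))  # categories looked up by code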

Program output:

type is
********************
categorical data:
[apple, pearl, orange, apple, orange]
Categories (3, object): [apple, orange, pearl]
********************
Index([u'apple', u'orange', u'pearl'], dtype='object')
[0 2 1 0 1]
********************
create categorical data:
[apple, pearl, apple, orange, pearl, apple, orange, apple]
Categories (3, object): [apple, orange, pearl]
[apple, pearl, apple, orange, pearl, apple, orange, apple]
Categories (3, object): [apple, orange, pearl]
[apple, orange, apple, pearl, orange, apple, pearl, apple]
Categories (3, object): [apple, orange, pearl]

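In the program, cat is a Categorical created from the list val, so it has the categories and codes attributes. The rest of the program uses cat and its categories to produce further Categoricals.

Calling take on a Categorical instance. take treats the integers it receives as positions in the Categorical itself, not as category codes, and returns the elements at those positions. Here the values in codes are used as positions in cat, so 0 selects cat[0] (apple), 1 selects cat[1] (pearl) and 2 selects cat[2] (orange).

print(cat.take(codes))

The selected data:

[apple, pearl, apple, orange, pearl, apple, orange, apple]
Categories (3, object): [apple, orange, pearl]

Calling take through the pd.Categorical class. In this form there are two arguments: the pd.Categorical instance cat and the list of positions. The call is equivalent to cat.take(codes).

print(pd.Categorical.take(cat, codes))

Result:

[apple, pearl, apple, orange, pearl, apple, orange, apple]
Categories (3, object): [apple, orange, pearl]

Calling from_codes. from_codes is a class method (which is why it can also be called on an instance, as here). It takes the codes and the categories, and element i of the result is categories[codes[i]]. With categories [apple, orange, pearl], code 1 gives orange and code 2 gives pearl, so this result differs from the two take results.

print(cat.from_codes(codes, cat.categories))

Result:

[apple, orange, apple, pearl, orange, apple, pearl, apple]
Categories (3, object): [apple, orange, pearl]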
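take and from_codes interpret the same integers differently, which is easy to miss because both results contain the same three fruits. The sketch below puts the two side by side; the names taken and decoded are illustrative, and a plain list is used for the integers instead of a Series:

```python
import pandas as pd

cat = pd.Categorical(["apple", "pearl", "orange", "apple", "orange"])
codes = [0, 1, 0, 2, 1, 0, 2, 0]

# take: the integers are positions in cat itself (cat[0], cat[1], ...)
taken = cat.take(codes)

# from_codes: the integers are positions in cat.categories
decoded = pd.Categorical.from_codes(codes, categories=cat.categories)

print(list(taken))     # ['apple', 'pearl', 'apple', 'orange', 'pearl', 'apple', 'orange', 'apple']
print(list(decoded))   # ['apple', 'orange', 'apple', 'pearl', 'orange', 'apple', 'pearl', 'apple']
```

To build a new Categorical from an encoding, from_codes is the call that matches the description; take is a positional selection from an existing Categorical.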

16.2 Inserting Categorical Data into a DataFrame

Categorical data created with pandas.Categorical can be inserted into a DataFrame as a column.

from __future__ import print_function  # lets the print() calls below also run on Python 2

import pandas as pd

idx = [1, 2, 3, 5, 6, 7, 9, 4, 8]
fruit = ["apple", "pearl", "orange", "apple", "orange", "orange", "apple", "pearl", "orange"]
price = [5.20, 3.50, 7.30, 5.00, 7.50, 7.30, 5.20, 3.70, 7.30]

df = pd.DataFrame({"price": price}, index=idx)
print(df)

cat = pd.Categorical(fruit)
df["fruit"] = cat   # insert the Categorical as a new column
print(df)

print(cat.codes)
print(cat.categories)

Program output:

   price
1    5.2
2    3.5
3    7.3
5    5.0
6    7.5
7    7.3
9    5.2
4    3.7
8    7.3
   price   fruit
1    5.2   apple
2    3.5   pearl
3    7.3  orange
5    5.0   apple
6    7.5  orange
7    7.3  orange
9    5.2   apple
4    3.7   pearl
8    7.3  orange
[0 2 1 0 1 1 0 2 1]
Index([u'apple', u'orange', u'pearl'], dtype='object')

Of course, it also works to create the DataFrame first and then convert a column with astype('category').
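The two routes can be compared on a small example. This is a sketch on a shortened, four-row version of the data, with the hypothetical names df1 and df2 for the two DataFrames; it checks that both columns end up with the same dtype, categories and codes:

```python
import pandas as pd

idx = [1, 2, 3, 5]
fruit = ["apple", "pearl", "orange", "apple"]
price = [5.20, 3.50, 7.30, 5.00]

# Route 1: insert a ready-made Categorical as a new column
df1 = pd.DataFrame({"price": price}, index=idx)
df1["fruit"] = pd.Categorical(fruit)

# Route 2: build the DataFrame first, then convert the column
df2 = pd.DataFrame({"price": price, "fruit": fruit}, index=idx)
df2["fruit"] = df2["fruit"].astype("category")

print(df1["fruit"].dtype, df2["fruit"].dtype)
print(list(df1["fruit"].cat.categories) == list(df2["fruit"].cat.categories))
print(df1["fruit"].cat.codes.tolist() == df2["fruit"].cat.codes.tolist())
```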
