Converting categorical variables to dummy variables in pandas

get_dummies

pandas.get_dummies(data, prefix=None, prefix_sep='_', dummy_na=False, columns=None,
                   sparse=False, drop_first=False)

Convert categorical variables into dummy/indicator variables.

Parameters
data : array-like, Series, or DataFrame
prefix : string, list of strings, or dict of strings, default None
    String to prepend to the generated column names. Pass a list with length equal to
    the number of columns when calling get_dummies on a DataFrame. Alternatively,
    prefix can be a dictionary mapping column names to prefixes.
prefix_sep : string, default '_'
    Separator/delimiter placed between the prefix and the category name. Or pass a
    list or dictionary, as with prefix.
dummy_na : bool, default False
    Add a column to indicate NaNs. If False, NaNs are ignored.
columns : list-like, default None
    Column names in the DataFrame to be encoded. If columns is None, all columns
    with object or category dtype are converted.
sparse : bool, default False
    Whether the dummy columns should be sparse or not. Older pandas versions return
    a SparseDataFrame if data is a Series or if all columns are included, and otherwise
    a DataFrame with some SparseBlocks. SparseDataFrame was removed in pandas 1.0;
    from that version on, the result is a regular DataFrame whose dummy columns are
    backed by sparse arrays.
drop_first : bool, default False
    Whether to get k-1 dummies out of k categorical levels by removing the first level.
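To make the dummy_na and drop_first descriptions above concrete, here is a minimal sketch. The Series `s` and its values are made up for illustration and do not come from the original post.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data: three levels ("a", "b", "c") plus one missing value.
s = pd.Series(["a", "b", "c", np.nan], name="level")

# Default: one column per level; the NaN row gets no indicator set.
full = pd.get_dummies(s)

# dummy_na=True: one extra indicator column for the missing value.
with_na = pd.get_dummies(s, dummy_na=True)

# drop_first=True: k-1 columns, the first level ("a") is removed.
reduced = pd.get_dummies(s, drop_first=True)

print(list(full.columns))     # ['a', 'b', 'c']
print(with_na.shape[1])       # 4
print(list(reduced.columns))  # ['b', 'c']
```

Note that with drop_first=True and dummy_na=False, a row holding the dropped level "a" and a row holding NaN both come out as all zeros, so the two can no longer be told apart. If missing values matter, combining drop_first with dummy_na=True is probably the safer choice.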

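import numpy as np
import pandas as pd
from pandas.api.types import CategoricalDtype

# Ordered categorical dtype for the four seasons (春 spring, 夏 summer, 秋 autumn, 冬 winter)
cat_dat = CategoricalDtype(categories=['春','夏','秋','冬'], ordered=True)
data = pd.DataFrame({'季节': ['春','夏','冬',np.nan]})  # 季节 = season
data['季节'] = data['季节'].astype(cat_dat)
# The second positional argument of get_dummies is prefix, so pass it by keyword
pd.get_dummies(data, prefix=['季节'], prefix_sep='-', dummy_na=True, drop_first=False)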
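One reason to cast to a CategoricalDtype before encoding, as the example above does, is that get_dummies then emits one column per declared category, in the declared order, even for categories that never occur in the data. The sketch below illustrates this with English labels; the names `season_type` and `observed` are made up for this illustration.

```python
import pandas as pd
from pandas.api.types import CategoricalDtype

# Hypothetical dtype declaring four categories in a fixed order.
season_type = CategoricalDtype(
    categories=["spring", "summer", "autumn", "winter"], ordered=True
)

# Only two of the four categories actually occur in the data.
observed = pd.Series(["winter", "spring"], dtype=season_type)

# Categorical input: one column per declared category, in declared order.
as_category = pd.get_dummies(observed)

# Plain object input: only the observed values, in sorted order.
as_object = pd.get_dummies(observed.astype(object))

print(list(as_category.columns))  # ['spring', 'summer', 'autumn', 'winter']
print(list(as_object.columns))    # ['spring', 'winter']
```

This can help keep the column layout identical across datasets that do not all contain every category.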


# Define df before using it; columns A and B are strings (object dtype), C is numeric
df = pd.DataFrame({'A':['a','b','c'], 'B':['b','a','c'], 'C':[1,2,3]})
pd.get_dummies(df)

pd.get_dummies(df, prefix={'B':'col_B','A':'col_A'}, prefix_sep='-', columns=['A','B'])

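prefix: sets the prefix of the generated columns. By default the original column name is used, so each new column is named original column name + separator + category.

prefix_sep: sets the separator between the prefix and the category in the generated column names. The default is '_'.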
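The naming rules for prefix and prefix_sep can be checked by comparing the column names of a default call with those of a customized call. This is a small sketch reusing the same toy DataFrame as above; the variable names `default_names` and `custom_names` are made up for illustration.

```python
import pandas as pd

df = pd.DataFrame({"A": ["a", "b", "c"], "B": ["b", "a", "c"], "C": [1, 2, 3]})

# Default naming: <original column name>_<category>; numeric column C is left as-is.
default_names = pd.get_dummies(df)

# Custom naming: per-column prefixes from a dict, joined with '-'.
custom_names = pd.get_dummies(
    df,
    prefix={"A": "col_A", "B": "col_B"},
    prefix_sep="-",
    columns=["A", "B"],
)

print(list(default_names.columns))
# ['C', 'A_a', 'A_b', 'A_c', 'B_a', 'B_b', 'B_c']
print(list(custom_names.columns))
# ['C', 'col_A-a', 'col_A-b', 'col_A-c', 'col_B-a', 'col_B-b', 'col_B-c']
```

In both cases the columns that are not encoded (here C) come first, followed by the dummy columns.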

