Pandas DataFrame: splitting a string into multiple columns (a typical example)

I have a string column, shown below, that I want to convert into separate columns. I tried splitting it, but the output is not what I need.
 

*-----------------------------------------------------------------------------*
|  Total Visitor                                                              |
*-----------------------------------------------------------------------------*
|  2x Adult, 1x Adult + Audio Guide                                           |
|  2x Adult, 2x Youth, 1x Children                                            | 
|  5x Adult + Audio Guide, 1x Children + Audio Guide, 1x Senior + Audio Guide |
*-----------------------------------------------------------------------------*



This is the code I used to split the string; it does not give me the expected output.
 

df = data["Total Visitor"].str.split(",", n = 1, expand = True)



After splitting the string, my expected output should look like the table below:
 

*------------------------------------------------------------------------------------------------------------------*
| Adult    | Adult + Audio Guide    | Youth    | Children    | Children + AG             | Senior + AG             |
*------------------------------------------------------------------------------------------------------------------*
| 2x Adult | 1x Adult + Audio Guide | -        | -           | -                         | -                       |
| 2x Adult | -                      | 2x Youth | 1x Children | -                         | -                       |
| -        | 5x Adult + Audio Guide | -        | -           | 1x Children + Audio Guide | 1x Senior + Audio Guide |
*------------------------------------------------------------------------------------------------------------------*



How can I achieve this? Any help or guidance would be great.

 

 

Best answer

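The idea is to build a list of dictionaries, one per row. Each key is the item with its leading count removed by the regex ^\d+x\s+ (^ is the start of the string, \d+ is one or more digits, x is the literal letter, and \s+ is one or more whitespace characters); each value is the original item. The list is then passed to the DataFrame constructor: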
 

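import re
import pandas as pd

# df is the question's input DataFrame (named `data` there).
# One dict per row: key = category without the leading "Nx ", value = the original item.
L = [dict([(re.sub(r'^\d+x\s+', '', y), y) for y in x.split(', ')]) for x in df['Total Visitor']]

# Categories missing from a row come out as NaN; show them as '-'.
out = pd.DataFrame(L).fillna('-')
print(out)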
      Adult     Adult + Audio Guide     Youth     Children  \
0  2x Adult  1x Adult + Audio Guide         -            -   
1  2x Adult                       -  2x Youth  1x Children   
2         -  5x Adult + Audio Guide         -            -   

      Children + Audio Guide     Senior + Audio Guide  
0                          -                        -  
1                          -                        -  
2  1x Children + Audio Guide  1x Senior + Audio Guide  
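
For anyone who wants to try this without the original data, here is a minimal self-contained sketch of the same idea. The sample rows are copied from the question's table, and the helper name `category` and the variable `wide` are made up for illustration; they are not part of the answer above.

```python
import re

import pandas as pd

# Sample input reconstructed from the question's table (an assumption about the real data).
df = pd.DataFrame({
    "Total Visitor": [
        "2x Adult, 1x Adult + Audio Guide",
        "2x Adult, 2x Youth, 1x Children",
        "5x Adult + Audio Guide, 1x Children + Audio Guide, 1x Senior + Audio Guide",
    ]
})


def category(item: str) -> str:
    """Hypothetical helper: drop the leading '<count>x ' to get the column name."""
    return re.sub(r"^\d+x\s+", "", item)


# One dict per row, mapping category -> original item text.
rows = [
    {category(item): item for item in cell.split(", ")}
    for cell in df["Total Visitor"]
]

# Categories that a row does not contain become NaN, then '-'.
wide = pd.DataFrame(rows).fillna("-")
print(wide)
```

Note that the items are split on ", " (comma plus space). If the real data has inconsistent spacing after the commas, splitting on "," and calling `.strip()` on each item is probably safer.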



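Another similar idea is to split each item on 'x ' and use the part after it as the dict key, which becomes the column name: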
 

# Split only at the first 'x ' so a category that itself contains 'x ' stays intact.
L = [dict([(y.split('x ', 1)[1], y) for y in x.split(', ')]) for x in df['Total Visitor']]

out = pd.DataFrame(L).fillna('-')
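
One caveat with splitting on 'x ': without a split limit, a category name that happens to contain "x " gets cut short. The sketch below shows the difference on a single cell. The string "1x Box Seat" is a made-up example and does not appear in the question's data.

```python
# One cell from the question's column.
cell = "5x Adult + Audio Guide, 1x Children + Audio Guide, 1x Senior + Audio Guide"

# maxsplit=1 splits only at the first "x ", i.e. right after the count.
row = {item.split("x ", 1)[1]: item for item in cell.split(", ")}
print(row)

# Hypothetical category whose name contains "x " ("Box Seat").
tricky = "1x Box Seat"
print(tricky.split("x ")[1])     # Bo        (name truncated)
print(tricky.split("x ", 1)[1])  # Box Seat  (name preserved)
```

For the data shown in the question both forms give the same keys, so the limit only matters if such names can occur.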
