Text data processing with pandas in Python

1. Import the module
>>> import pandas as pd
2. Load the table data
>>> titanic = pd.read_csv(r"C:\Users\Administrator\Desktop\titanic.csv")
>>> titanic
     PassengerId  Survived  Pclass                                                 Name     Sex   Age  SibSp  Parch            Ticket     Fare Cabin Embarked
0              1         0       3                              Braund, Mr. Owen Harris    male  22.0      1      0         A/5 21171   7.2500   NaN        S
1              2         1       1  Cumings, Mrs. John Bradley (Florence Briggs Thayer)  female  38.0      1      0          PC 17599  71.2833   C85        C
2              3         1       3                               Heikkinen, Miss. Laina  female  26.0      0      0  STON/O2. 3101282   7.9250   NaN        S
3              4         1       1         Futrelle, Mrs. Jacques Heath (Lily May Peel)  female  35.0      1      0            113803  53.1000  C123        S
4              5         0       3                             Allen, Mr. William Henry    male  35.0      0      0            373450   8.0500   NaN        S
..           ...       ...     ...                                                  ...     ...   ...    ...    ...               ...      ...   ...      ...
886          887         0       2                                Montvila, Rev. Juozas    male  27.0      0      0            211536  13.0000   NaN        S
887          888         1       1                         Graham, Miss. Margaret Edith  female  19.0      0      0            112053  30.0000   B42        S
888          889         0       3             Johnston, Miss. Catherine Helen "Carrie"  female   NaN      1      2        W./C. 6607  23.4500   NaN        S
889          890         1       1                                Behr, Mr. Karl Howell    male  26.0      0      0            111369  30.0000  C148        C
890          891         0       3                                  Dooley, Mr. Patrick    male  32.0      0      0            370376   7.7500   NaN        Q

[891 rows x 12 columns]
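If titanic.csv is not at hand, the later steps can still be tried on a small sample. The sketch below is only an illustration: it assumes a three-row excerpt with six of the columns shown above, and reads it from an in-memory string instead of a file. The names `csv_text` and `sample` are made up for this example.

```python
import io

import pandas as pd

# Three rows copied from the output above, limited to six columns.
# Names contain commas, so they must be quoted in the CSV text.
csv_text = '''PassengerId,Survived,Pclass,Name,Sex,Age
1,0,3,"Braund, Mr. Owen Harris",male,22.0
2,1,1,"Cumings, Mrs. John Bradley (Florence Briggs Thayer)",female,38.0
3,1,3,"Heikkinen, Miss. Laina",female,26.0
'''

# read_csv accepts a file path or any file-like object.
sample = pd.read_csv(io.StringIO(csv_text))
print(sample.shape)
print(sample["Name"])
```

Because the quoted names are parsed as single fields, the commas inside them do not create extra columns.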
3. Convert the names in the Name column to lowercase
>>> titanic["Name"].str.lower()
0                                  braund, mr. owen harris
1      cumings, mrs. john bradley (florence briggs thayer)
2                                   heikkinen, miss. laina
3             futrelle, mrs. jacques heath (lily may peel)
4                                 allen, mr. william henry
                              ...                         
886                                  montvila, rev. juozas
887                           graham, miss. margaret edith
888               johnston, miss. catherine helen "carrie"
889                                  behr, mr. karl howell
890                                    dooley, mr. patrick
Name: Name, Length: 891, dtype: object
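Note that str.lower() returns a new Series and leaves the original column unchanged. A minimal sketch, assuming a short Series of names copied from the output above (`names` and `lowered` are illustrative variable names):

```python
import pandas as pd

# Two names copied from the output above; the real column has 891 rows.
names = pd.Series(
    ["Braund, Mr. Owen Harris", "Heikkinen, Miss. Laina"], name="Name"
)

# str.lower() builds a new Series; `names` itself is not modified.
lowered = names.str.lower()

print(lowered[0])
print(names[0])
```

To keep the lowercase version in the table, the result has to be assigned to a column, for example titanic["Name_lower"] = titanic["Name"].str.lower(), where Name_lower is a hypothetical column name.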
4. Split each name on the comma, turning every row into a list
>>> titanic["Name"].str.split(",")
0                                  [Braund,  Mr. Owen Harris]
1      [Cumings,  Mrs. John Bradley (Florence Briggs Thayer)]
2                                   [Heikkinen,  Miss. Laina]
3             [Futrelle,  Mrs. Jacques Heath (Lily May Peel)]
4                                 [Allen,  Mr. William Henry]
                                ...                          
886                                  [Montvila,  Rev. Juozas]
887                           [Graham,  Miss. Margaret Edith]
888               [Johnston,  Miss. Catherine Helen "Carrie"]
889                                  [Behr,  Mr. Karl Howell]
890                                    [Dooley,  Mr. Patrick]
Name: Name, Length: 891, dtype: object
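In the output above, the second element of each list keeps the space that followed the comma. One possible way to avoid it is to split on ", " instead, and to pass expand=True when separate columns are more convenient than lists. This is a sketch on a few sample names, not part of the original walkthrough; `names` and `parts` are illustrative variable names.

```python
import pandas as pd

names = pd.Series(
    ["Braund, Mr. Owen Harris", "Heikkinen, Miss. Laina"], name="Name"
)

# Split on comma plus space, at most once, and spread the pieces
# over columns 0 and 1 of a new DataFrame.
parts = names.str.split(", ", n=1, expand=True)

print(parts[0].tolist())
print(parts[1].tolist())
```

With n=1 a name that happens to contain a second ", " is still split into exactly two pieces.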
5. Extract the first element of each name list
>>> titanic["Surname"] = titanic["Name"].str.split(",").str.get(0)
>>> titanic["Surname"]
0         Braund
1        Cumings
2      Heikkinen
3       Futrelle
4          Allen
         ...    
886     Montvila
887       Graham
888     Johnston
889         Behr
890       Dooley
Name: Surname, Length: 891, dtype: object
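The str accessor also supports indexing, so .str[0] gives the same result as .str.get(0). The sketch below checks that on a few sample names and additionally pulls out the second element with the leading space stripped; the variable names are illustrative only.

```python
import pandas as pd

names = pd.Series(
    ["Braund, Mr. Owen Harris", "Heikkinen, Miss. Laina", "Allen, Mr. William Henry"],
    name="Name",
)

parts = names.str.split(",")

# Two equivalent ways to take the first list element of every row.
by_get = parts.str.get(0)
by_index = parts.str[0]

# The second element starts with a space, so strip it.
given = parts.str.get(1).str.strip()

print(by_get.tolist())
print(given.tolist())
```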
6. Find the passengers whose name contains a given string
>>> titanic[titanic["Name"].str.contains("Countess")]
     PassengerId  Survived  Pclass                                                      Name     Sex   Age  SibSp  Parch  Ticket  Fare Cabin Embarked Surname
759          760         1       1  Rothes, the Countess. of (Lucy Noel Martha Dyer-Edwards)  female  33.0      0      0  110152  86.5   B77        S  Rothes
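By default str.contains treats the pattern as a regular expression and is case-sensitive, which can give surprising matches. The sketch below uses a few names from the table to show the regex=False and case=False options; the variable names are illustrative.

```python
import pandas as pd

names = pd.Series(
    [
        "Braund, Mr. Owen Harris",
        "Cumings, Mrs. John Bradley (Florence Briggs Thayer)",
        "Rothes, the Countess. of (Lucy Noel Martha Dyer-Edwards)",
    ],
    name="Name",
)

# As a regex, "." matches any character, so "Mr." also matches "Mrs".
as_regex = names.str.contains("Mr.")

# With regex=False the pattern is a literal substring.
as_literal = names.str.contains("Mr.", regex=False)

# case=False ignores upper/lower case.
countess = names.str.contains("countess", case=False)

print(as_regex.tolist())
print(as_literal.tolist())
print(countess.tolist())
```

The resulting boolean Series can be used as a row filter in the same way as in the step above.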
7. Replace the value male with M and the value female with F
>>> titanic["Sex_short"] = titanic["Sex"].replace({"male": "M", "female": "F"})
>>> titanic
     PassengerId  Survived  Pclass                                                 Name     Sex   Age  SibSp  Parch            Ticket     Fare Cabin Embarked    Surname Sex_short
0              1         0       3                              Braund, Mr. Owen Harris    male  22.0      1      0         A/5 21171   7.2500   NaN        S     Braund         M
1              2         1       1  Cumings, Mrs. John Bradley (Florence Briggs Thayer)  female  38.0      1      0          PC 17599  71.2833   C85        C    Cumings         F
2              3         1       3                               Heikkinen, Miss. Laina  female  26.0      0      0  STON/O2. 3101282   7.9250   NaN        S  Heikkinen         F
3              4         1       1         Futrelle, Mrs. Jacques Heath (Lily May Peel)  female  35.0      1      0            113803  53.1000  C123        S   Futrelle         F
4              5         0       3                             Allen, Mr. William Henry    male  35.0      0      0            373450   8.0500   NaN        S      Allen         M
..           ...       ...     ...                                                  ...     ...   ...    ...    ...               ...      ...   ...      ...        ...       ...
886          887         0       2                                Montvila, Rev. Juozas    male  27.0      0      0            211536  13.0000   NaN        S   Montvila         M
887          888         1       1                         Graham, Miss. Margaret Edith  female  19.0      0      0            112053  30.0000   B42        S     Graham         F
888          889         0       3             Johnston, Miss. Catherine Helen "Carrie"  female   NaN      1      2        W./C. 6607  23.4500   NaN        S   Johnston         F
889          890         1       1                                Behr, Mr. Karl Howell    male  26.0      0      0            111369  30.0000  C148        C       Behr         M
890          891         0       3                                  Dooley, Mr. Patrick    male  32.0      0      0            370376   7.7500   NaN        Q     Dooley         M

[891 rows x 14 columns]
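Series.map also accepts a dictionary, but it behaves differently for values that are missing from the dictionary: replace leaves them as they are, while map turns them into NaN. The sketch below illustrates this with a made-up value "unknown", which does not occur in the Titanic data.

```python
import pandas as pd

# "unknown" is a hypothetical value added only to show the difference.
sex = pd.Series(["male", "female", "unknown"], name="Sex")
mapping = {"male": "M", "female": "F"}

replaced = sex.replace(mapping)  # values not in the dict are kept
mapped = sex.map(mapping)        # values not in the dict become NaN

print(replaced.tolist())
print(mapped.isna().tolist())
```

For the Sex column, which only contains male and female, both approaches would give the same result.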
8. Summary

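The str accessor makes string methods available on every element of a column, and replace is a convenient way to convert values according to a given dictionary.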
