DataFrame: set difference, deduplication, concatenation, and shuffling

Demo data df_nba:

+-------------+-----------+---------+--------+--------+
|     name    |    team   | poision | height | weight |
+-------------+-----------+---------+--------+--------+
|    jordan   |   Bulls   |    SG   |  198   |  98.0  |
|     kobe    |   Lakers  |    SG   |  198   |  96.0  |
|    Oneil    |   Lakers  |    C    |  216   | 147.0  |
|   McGradyg  |  Rockets  |    SF   |  203   | 101.0  |
|    jordan   |   Bulls   |    SG   |  198   |  98.0  |
|     kobe    |   Lakers  |    SG   |  198   |  96.0  |
|  Larry_Bird |   Boston  |    PF   |  206   | 100.0  |
|   Iverson   |    76s    |    PG   |  183   |  75.0  |
|     kobe    |   Lakers  |    SG   |  198   |  96.0  |
|    jordan   |   Bulls   |    SG   |  198   |  98.0  |
|   Olajuwon  |  Rockets  |    C    |  208   | 116.0  |
+-------------+-----------+---------+--------+--------+

df_res:

+-------------+-----------+---------+--------+--------+
|     name    |    team   | poision | height | weight |
+-------------+-----------+---------+--------+--------+
|    James    | Cavaliers |    PF   |  206   | 113.0  |
|    Jabbar   | Cavaliers |    PF   |  228   | 120.0  |
| Chamberlain |    76s    |    PF   |  216   | 125.0  |
|   Russell   |   Boston  |    PF   |  206   |  98.0  |
+-------------+-----------+---------+--------+--------+
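To follow along, the two tables above can be rebuilt roughly as below. This is a minimal sketch: the row values are copied from the tables, the column spelling `poision` is kept exactly as printed, and the dtypes (integer heights, float weights) are an assumption.

```python
import pandas as pd

# Column names as printed in the tables above (including the "poision" spelling).
columns = ["name", "team", "poision", "height", "weight"]

nba_rows = [
    ("jordan", "Bulls", "SG", 198, 98.0),
    ("kobe", "Lakers", "SG", 198, 96.0),
    ("Oneil", "Lakers", "C", 216, 147.0),
    ("McGradyg", "Rockets", "SF", 203, 101.0),
    ("jordan", "Bulls", "SG", 198, 98.0),
    ("kobe", "Lakers", "SG", 198, 96.0),
    ("Larry_Bird", "Boston", "PF", 206, 100.0),
    ("Iverson", "76s", "PG", 183, 75.0),
    ("kobe", "Lakers", "SG", 198, 96.0),
    ("jordan", "Bulls", "SG", 198, 98.0),
    ("Olajuwon", "Rockets", "C", 208, 116.0),
]

res_rows = [
    ("James", "Cavaliers", "PF", 206, 113.0),
    ("Jabbar", "Cavaliers", "PF", 228, 120.0),
    ("Chamberlain", "76s", "PF", 216, 125.0),
    ("Russell", "Boston", "PF", 206, 98.0),
]

df_nba = pd.DataFrame(nba_rows, columns=columns)
df_res = pd.DataFrame(res_rows, columns=columns)

print(df_nba.shape, df_res.shape)
```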

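Set difference of two DataFrame objects

Keep only the rows of df whose value in column 'a' does not appear in a_to_drop (df, 'a', and a_to_drop are generic placeholders here, not the demo data):

 res = df[~df['a'].isin(a_to_drop)]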
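The one-liner above compares a single column only. As a hedged sketch, here are both that single-column variant and a whole-row difference built with `merge(..., indicator=True)`. The frames `left` and `right` and their values are toy data invented for illustration.

```python
import pandas as pd

# Toy frames (hypothetical values, not the demo tables).
left = pd.DataFrame({
    "name": ["jordan", "kobe", "Oneil", "Iverson"],
    "team": ["Bulls", "Lakers", "Lakers", "76s"],
})
right = pd.DataFrame({
    "name": ["kobe", "Iverson"],
    "team": ["Lakers", "Nuggets"],
})

# Variant 1: difference on one key column, same idea as the one-liner above.
diff_by_name = left[~left["name"].isin(right["name"])]

# Variant 2: difference on whole rows. A left merge with indicator=True adds a
# "_merge" column; rows marked "left_only" have no exact match in `right`.
merged = left.merge(
    right.drop_duplicates(), how="left", on=["name", "team"], indicator=True
)
diff_by_row = merged[merged["_merge"] == "left_only"].drop(columns="_merge")

print(diff_by_name)
print(diff_by_row)
```

The two variants can disagree: a row whose name matches but whose team differs is dropped by the first and kept by the second.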

Removing duplicates

  • df_nba.drop_duplicates(subset=['name','team'], keep='last', inplace=False)
+------------+---------+---------+--------+--------+
|    name    |   team  | poision | height | weight |
+------------+---------+---------+--------+--------+
|   Oneil    |  Lakers |    C    |  216   | 147.0  |
|  McGradyg  | Rockets |    SF   |  203   | 101.0  |
| Larry_Bird |  Boston |    PF   |  206   | 100.0  |
|  Iverson   |   76s   |    PG   |  183   |  75.0  |
|    kobe    |  Lakers |    SG   |  198   |  96.0  |
|   jordan   |  Bulls  |    SG   |  198   |  98.0  |
|  Olajuwon  | Rockets |    C    |  208   | 116.0  |
+------------+---------+---------+--------+--------+
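The `keep` argument decides which copy of each duplicate group survives. The sketch below, on a small toy frame (values chosen for illustration), compares `keep="first"`, `keep="last"`, and `keep=False`, and shows that the surviving rows keep their original index labels unless the index is reset.

```python
import pandas as pd

# Toy frame with repeated (name, team) pairs.
df = pd.DataFrame({
    "name": ["jordan", "kobe", "Oneil", "jordan", "kobe"],
    "team": ["Bulls", "Lakers", "Lakers", "Bulls", "Lakers"],
})

keys = ["name", "team"]
keep_first = df.drop_duplicates(subset=keys, keep="first")  # first copy survives
keep_last = df.drop_duplicates(subset=keys, keep="last")    # last copy survives
keep_none = df.drop_duplicates(subset=keys, keep=False)     # every duplicated row is dropped

# Surviving rows keep their original index labels.
print(keep_first.index.tolist())
print(keep_last.index.tolist())
print(keep_none.index.tolist())

# Renumber from 0 if a clean index is wanted.
clean = keep_last.reset_index(drop=True)
print(clean)
```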

Concatenating data

  • data = pd.concat([df_nba, df_res], axis = 0)
+-------------+-----------+---------+--------+--------+
|     name    |    team   | poision | height | weight |
+-------------+-----------+---------+--------+--------+
|    jordan   |   Bulls   |    SG   |  198   |  98.0  |
|     kobe    |   Lakers  |    SG   |  198   |  96.0  |
|    Oneil    |   Lakers  |    C    |  216   | 147.0  |
|   McGradyg  |  Rockets  |    SF   |  203   | 101.0  |
|    jordan   |   Bulls   |    SG   |  198   |  98.0  |
|     kobe    |   Lakers  |    SG   |  198   |  96.0  |
|  Larry_Bird |   Boston  |    PF   |  206   | 100.0  |
|   Iverson   |    76s    |    PG   |  183   |  75.0  |
|     kobe    |   Lakers  |    SG   |  198   |  96.0  |
|    jordan   |   Bulls   |    SG   |  198   |  98.0  |
|   Olajuwon  |  Rockets  |    C    |  208   | 116.0  |
|    James    | Cavaliers |    PF   |  206   | 113.0  |
|    Jabbar   | Cavaliers |    PF   |  228   | 120.0  |
| Chamberlain |    76s    |    PF   |  216   | 125.0  |
|   Russell   |   Boston  |    PF   |  206   |  98.0  |
+-------------+-----------+---------+--------+--------+
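One detail the printed table hides: `pd.concat` with `axis=0` keeps each input's own index labels, so the result can carry repeated labels. A minimal sketch with two toy frames (`top` and `bottom` are hypothetical names) shows the difference that `ignore_index=True` makes.

```python
import pandas as pd

# Two toy frames with the same columns.
top = pd.DataFrame({"name": ["jordan", "kobe"], "height": [198, 198]})
bottom = pd.DataFrame({"name": ["James", "Russell"], "height": [206, 206]})

# Default: rows are stacked and each frame's index labels are kept as-is.
stacked = pd.concat([top, bottom], axis=0)

# ignore_index=True discards the old labels and renumbers from 0.
renumbered = pd.concat([top, bottom], axis=0, ignore_index=True)

print(stacked.index.tolist())
print(renumbered.index.tolist())
```

Repeated labels matter when selecting by label later: `stacked.loc[0]` would return two rows here.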

Shuffling data

  • data = df_nba.sample(frac=1)
+------------+---------+---------+--------+--------+
|    name    |   team  | poision | height | weight |
+------------+---------+---------+--------+--------+
|   jordan   |  Bulls  |    SG   |  198   |  98.0  |
|   jordan   |  Bulls  |    SG   |  198   |  98.0  |
|    kobe    |  Lakers |    SG   |  198   |  96.0  |
|  McGradyg  | Rockets |    SF   |  203   | 101.0  |
|   jordan   |  Bulls  |    SG   |  198   |  98.0  |
|    kobe    |  Lakers |    SG   |  198   |  96.0  |
| Larry_Bird |  Boston |    PF   |  206   | 100.0  |
|  Iverson   |   76s   |    PG   |  183   |  75.0  |
|  Olajuwon  | Rockets |    C    |  208   | 116.0  |
|    kobe    |  Lakers |    SG   |  198   |  96.0  |
|   Oneil    |  Lakers |    C    |  216   | 147.0  |
+------------+---------+---------+--------+--------+
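`sample(frac=1)` returns every row in a random order, so each run prints a different table. If a reproducible shuffle is needed, one option is to pass `random_state`; the seed value 42 and the toy frame below are arbitrary choices for illustration.

```python
import pandas as pd

# Toy frame; any DataFrame works the same way.
df = pd.DataFrame({"name": ["jordan", "kobe", "Oneil", "Iverson", "Olajuwon"]})

# frac=1 samples 100% of the rows without replacement, i.e. a permutation.
shuffled = df.sample(frac=1, random_state=42)

# The same seed gives the same permutation again.
again = df.sample(frac=1, random_state=42)

# The shuffled rows keep their original index labels; reset them if needed.
shuffled_clean = shuffled.reset_index(drop=True)

print(shuffled)
print(shuffled_clean)
```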

 
