[pandas] Tutorial: 2 - Reading and Writing Tabular Data

Reading and writing tabular data with pandas

  • Read the Titanic passenger data

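pandas provides the read_csv() function, which reads data stored in a CSV file into a pandas DataFrame. pandas supports many file formats and data sources out of the box (CSV, Excel, SQL, JSON, Parquet, …), and each one has a reader function whose name starts with read_*.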

import pandas as pd 

titanic = pd.read_csv("data/titanic.csv")
titanic  # displayed only to check that the data loaded correctly
     PassengerId  Survived  Pclass  \
0              1         0       3   
1              2         1       1   
2              3         1       3   
3              4         1       1   
4              5         0       3   
..           ...       ...     ...   
886          887         0       2   
887          888         1       1   
888          889         0       3   
889          890         1       1   
890          891         0       3   

                                                  Name     Sex   Age  SibSp  \
0                              Braund, Mr. Owen Harris    male  22.0      1   
1    Cumings, Mrs. John Bradley (Florence Briggs Th...  female  38.0      1   
2                               Heikkinen, Miss. Laina  female  26.0      0   
3         Futrelle, Mrs. Jacques Heath (Lily May Peel)  female  35.0      1   
4                             Allen, Mr. William Henry    male  35.0      0   
..                                                 ...     ...   ...    ...   
886                              Montvila, Rev. Juozas    male  27.0      0   
887                       Graham, Miss. Margaret Edith  female  19.0      0   
888           Johnston, Miss. Catherine Helen "Carrie"  female   NaN      1   
889                              Behr, Mr. Karl Howell    male  26.0      0   
890                                Dooley, Mr. Patrick    male  32.0      0   
...
888      2        W./C. 6607  23.4500   NaN        S  
889      0            111369  30.0000  C148        C  
890      0            370376   7.7500   NaN        Q  

[891 rows x 12 columns]
  • View the first N rows
titanic.head(8)

DataFrame.head(8) shows the first 8 rows. To see the last 8 rows instead, use .tail(8).
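
As a minimal sketch of head() and tail(), the example below uses a small made-up DataFrame (the column names passenger_id and fare are placeholders, not the Titanic file) so that it runs without any data file.

```python
import pandas as pd

# Hypothetical stand-in data; the tutorial's titanic.csv is not needed here.
df = pd.DataFrame({
    "passenger_id": list(range(1, 21)),
    "fare": [float(i) for i in range(20)],
})

first_rows = df.head(8)    # first 8 rows
last_rows = df.tail(8)     # last 8 rows
default_rows = df.head()   # with no argument, head() returns 5 rows

print(first_rows.shape)
print(last_rows["passenger_id"].tolist())
```

Both methods return a new DataFrame, so the result can be stored in a variable or chained with other calls.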

  • Check the data type of each column
titanic.dtypes
PassengerId      int64
Survived         int64
Pclass           int64
Name            object
Sex             object
Age            float64
SibSp            int64
Parch            int64
Ticket          object
Fare           float64
Cabin           object
Embarked        object
dtype: object

dtypes is accessed without parentheses because it is an attribute of DataFrame and Series, not a method.
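
The attribute access can be sketched on a small made-up DataFrame (the three columns only mimic the Titanic ones and the values are placeholders). A single column, which is a Series, also exposes a dtype attribute.

```python
import pandas as pd

# Hypothetical sample with an integer, a text, and a float column.
df = pd.DataFrame({
    "Pclass": [3, 1, 3],
    "Name": ["A", "B", "C"],
    "Age": [22.0, 38.0, None],  # None becomes NaN, so the column stays float
})

column_types = df.dtypes    # attribute, no parentheses; one dtype per column
age_type = df["Age"].dtype  # dtype of a single column (a Series)

print(column_types)
print(age_type)
```

The result of df.dtypes is itself a Series indexed by column name, which is why the output in the tutorial ends with its own "dtype: object" line.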

  • Export the DataFrame to an Excel file
titanic.to_excel("titanic.xlsx", sheet_name="passengers", index=False)

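The read_* functions read data into pandas, while the to_* methods store data. The to_excel() method saves the data as an Excel file. In this example, the sheet is named passengers instead of the default Sheet1. Setting index=False means the row index labels are not saved in the spreadsheet.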
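
The pairing of to_* and read_*, and the effect of index=False, can be sketched with a round trip. This example swaps in to_csv() and read_csv() with in-memory buffers, which is an assumption made here so that it runs without an Excel engine installed. The index parameter of to_excel() works the same way, and the two-row DataFrame is placeholder data.

```python
import io

import pandas as pd

# Hypothetical two-row sample; not the real Titanic data.
df = pd.DataFrame({"Name": ["Braund", "Cumings"], "Age": [22.0, 38.0]})

# index=False: the row labels (0, 1) are not written out.
without_index = io.StringIO()
df.to_csv(without_index, index=False)

# Default (index=True): the row labels are written as an extra first column.
with_index = io.StringIO()
df.to_csv(with_index)

# Rewind both buffers and read them back with the matching read_* function.
without_index.seek(0)
with_index.seek(0)
restored = pd.read_csv(without_index)
restored_extra = pd.read_csv(with_index)

print(restored.columns.tolist())        # ['Name', 'Age']
print(restored_extra.columns.tolist())  # ['Unnamed: 0', 'Name', 'Age']
```

Without index=False, the saved index comes back as an unnamed column on reading, which is usually unwanted when the index is only the default row numbering.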

  • Get a technical summary with info()
titanic.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 891 entries, 0 to 890
Data columns (total 12 columns):
 #   Column       Non-Null Count  Dtype  
---  ------       --------------  -----  
 0   PassengerId  891 non-null    int64  
 1   Survived     891 non-null    int64  
 2   Pclass       891 non-null    int64  
 3   Name         891 non-null    object 
 4   Sex          891 non-null    object 
 5   Age          714 non-null    float64
 6   SibSp        891 non-null    int64  
 7   Parch        891 non-null    int64  
 8   Ticket       891 non-null    object 
 9   Fare         891 non-null    float64
 10  Cabin        204 non-null    object 
 11  Embarked     889 non-null    object 
dtypes: float64(2), int64(5), object(5)
memory usage: 83.7+ KB
