Python and Data Science in Practice Course, Chapter 3, pandas: DataFrame

import numpy as np
import pandas as pd
from pandas import Series,DataFrame

Creating a DataFrame

import webbrowser
link = "https://www.tiobe.com/tiobe-index/"
webbrowser.open(link)     # Open the site, then copy the ranking table from it by hand

True

df = pd.read_clipboard()  # Read the clipboard contents and parse them into a DataFrame
df
   Aug 2020  Aug 2019  Change  Programming Language  Ratings  Change.1
0         1         2                             C   16.98%    +1.83%
1         2         1                          Java   14.43%    -1.60%
2         3         3     NaN                Python    9.69%    -0.33%
3         4         4     NaN                   C++    6.84%    +0.78%
4         5         5     NaN                    C#    4.68%    +0.83%
5         6         6     NaN          Visual Basic    4.66%    +0.97%
6         7         7     NaN            JavaScript    2.87%    +0.62%
7         8        20                             R    2.79%    +1.97%
8         9         8                           PHP    2.24%    +0.17%
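
The clipboard step depends on whatever was copied by hand, so it cannot be rerun reliably. As a minimal sketch, assuming the values printed above, the same table can be typed in as a dict. The name `data` is only illustrative, and the mostly empty Change column is left out.

```python
import pandas as pd

# Values copied by hand from the printed table (TIOBE index, Aug 2020).
# The "Change" column is omitted because it carried no usable data.
data = {
    "Aug 2020": [1, 2, 3, 4, 5, 6, 7, 8, 9],
    "Aug 2019": [2, 1, 3, 4, 5, 6, 7, 20, 8],
    "Programming Language": ["C", "Java", "Python", "C++", "C#",
                             "Visual Basic", "JavaScript", "R", "PHP"],
    "Ratings": ["16.98%", "14.43%", "9.69%", "6.84%", "4.68%",
                "4.66%", "2.87%", "2.79%", "2.24%"],
    "Change.1": ["+1.83%", "-1.60%", "-0.33%", "+0.78%", "+0.83%",
                 "+0.97%", "+0.62%", "+1.97%", "+0.17%"],
}

# A dict of equal-length lists becomes one column per key.
df = pd.DataFrame(data)
print(df)
```
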
type(df)

pandas.core.frame.DataFrame

DataFrame attributes

df.columns         # Get the column names

Index(['Aug 2020', 'Aug 2019', 'Change', 'Programming Language', 'Ratings',
       'Change.1'],
      dtype='object')

df.Ratings            # Return one column's values with df.column_name

0 16.98%
1 14.43%
2 9.69%
3 6.84%
4 4.68%
5 4.66%
6 2.87%
7 2.79%
8 2.24%
Name: Ratings, dtype: object
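
The `dtype: object` above means the ratings are stored as strings, so they cannot be summed or compared numerically yet. A small sketch of one way to convert them, assuming every value ends in a percent sign (the names `ratings` and `numeric` are illustrative):

```python
import pandas as pd

# Three of the rating strings from the output above.
ratings = pd.Series(["16.98%", "14.43%", "9.69%"], name="Ratings")

# Strip the trailing "%" and convert the remaining text to floats.
numeric = ratings.str.rstrip("%").astype(float)

print(numeric.dtype)
print(numeric.max())
```

After the conversion the column supports arithmetic and sorting by value rather than by text.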

df_new = DataFrame(df, columns=["Aug 2020","Programming Language"])  # Build a new DataFrame from only the columns we need
df_new
   Aug 2020  Programming Language
0         1                     C
1         2                  Java
2         3                Python
3         4                   C++
4         5                    C#
5         6          Visual Basic
6         7            JavaScript
7         8                     R
8         9                   PHP
df["Aug 2020"]

0 1
1 2
2 3
3 4
4 5
5 6
6 7
7 8
8 9
Name: Aug 2020, dtype: int64

type(df["Aug 2020"])        # A single column of a DataFrame is a Series

pandas.core.series.Series
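
The two access styles used so far are not interchangeable. The sketch below, on a small made-up frame, shows that dot access only works when the column name is a valid Python identifier, and that passing a list of names returns a DataFrame instead of a Series.

```python
import pandas as pd

df = pd.DataFrame({
    "Aug 2020": [1, 2, 3],
    "Ratings": ["16.98%", "14.43%", "9.69%"],
})

by_attr = df.Ratings              # works: "Ratings" is a valid identifier
by_key = df["Aug 2020"]           # brackets are required: the name has a space
one_col_frame = df[["Aug 2020"]]  # list of names -> DataFrame, not Series

print(type(by_key).__name__)
print(type(one_col_frame).__name__)
print(by_attr.equals(df["Ratings"]))
```

Bracket access works for every column name, so it is the safer default in scripts.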

df_new = DataFrame(df, columns=["Aug 2020","Programming Language","Sep 2020"])
# Requesting a column that the original DataFrame lacks does not raise an error;
# pandas creates the column and fills it with NaN
df_new
   Aug 2020  Programming Language  Sep 2020
0         1                     C       NaN
1         2                  Java       NaN
2         3                Python       NaN
3         4                   C++       NaN
4         5                    C#       NaN
5         6          Visual Basic       NaN
6         7            JavaScript       NaN
7         8                     R       NaN
8         9                   PHP       NaN
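
This tolerant behaviour belongs to the constructor, not to column selection in general. As a hedged sketch on a small made-up frame: `reindex(columns=...)` gives the same NaN-filled column, while plain bracket selection with a missing name raises `KeyError` in pandas 1.0 and later. The names `wanted` and `missing_raises` are illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    "Aug 2020": [1, 2, 3],
    "Programming Language": ["C", "Java", "Python"],
})
wanted = ["Aug 2020", "Programming Language", "Sep 2020"]

# Both of these create the missing column and fill it with NaN.
via_ctor = pd.DataFrame(df, columns=wanted)
via_reindex = df.reindex(columns=wanted)
print(via_reindex)

# Bracket selection is strict: an unknown column name is an error.
missing_raises = False
try:
    df[wanted]
except KeyError as exc:
    missing_raises = True
    print("KeyError:", exc)
```
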

Assigning values to a DataFrame

Method 1: assigning a list-like sequence (here a range)

df_new["Sep 2020"] = range(0,9)
df_new
   Aug 2020  Programming Language  Sep 2020
0         1                     C         0
1         2                  Java         1
2         3                Python         2
3         4                   C++         3
4         5                    C#         4
5         6          Visual Basic         5
6         7            JavaScript         6
7         8                     R         7
8         9                   PHP         8

Method 2: assigning a NumPy array

df_new["Sep 2020"] = np.arange(1,10)
df_new
   Aug 2020  Programming Language  Sep 2020
0         1                     C         1
1         2                  Java         2
2         3                Python         3
3         4                   C++         4
4         5                    C#         5
5         6          Visual Basic         6
6         7            JavaScript         7
7         8                     R         8
8         9                   PHP         9
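
Both methods above work because the sequence has exactly one value per row. A minimal sketch of what happens otherwise, using a made-up three-row frame and hypothetical column names (`rank`, `year`, `too_short`):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"Programming Language": ["C", "Java", "Python"]})

# One value per row: accepted.
df["rank"] = np.arange(1, len(df) + 1)

# A scalar is broadcast to every row.
df["year"] = 2020

# A list or array of the wrong length is rejected with ValueError.
try:
    df["too_short"] = [1, 2]
except ValueError as exc:
    print("ValueError:", exc)

print(df)
```
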

Method 3: assigning a Series

df_new["Sep 2020"] = pd.Series(np.arange(2,11))
df_new
   Aug 2020  Programming Language  Sep 2020
0         1                     C         2
1         2                  Java         3
2         3                Python         4
3         4                   C++         5
4         5                    C#         6
5         6          Visual Basic         7
6         7            JavaScript         8
7         8                     R         9
8         9                   PHP        10
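
A Series differs from a list or array in one important way: pandas matches its values to rows by index label, not by position. The sketch below makes this visible with a Series whose index is reversed; the column names `by_label` and `by_position` are illustrative.

```python
import pandas as pd

df = pd.DataFrame({"Programming Language": ["C", "Java", "Python"]})

# Same three values, but the Series index runs 2, 1, 0.
s = pd.Series([10, 20, 30], index=[2, 1, 0])

df["by_label"] = s                # aligned on labels: row 0 receives 30
df["by_position"] = s.to_numpy()  # plain array: assigned in order, row 0 receives 10

print(df)
```

In the Method 3 example the default index 0..8 happens to match the frame's index, which is why the result looks positional.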

Assigning only part of a column

df_new["Sep 2020"] = pd.Series([100,200],index=[1,2])
# Although only two values were supplied, the whole column is replaced:
# pandas aligns the Series on the index, and rows without a matching label become NaN
df_new
   Aug 2020  Programming Language  Sep 2020
0         1                     C       NaN
1         2                  Java     100.0
2         3                Python     200.0
3         4                   C++       NaN
4         5                    C#       NaN
5         6          Visual Basic       NaN
6         7            JavaScript       NaN
7         8                     R       NaN
8         9                   PHP       NaN
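
If the goal is to change two rows and keep the rest, assigning through `.loc` is one option. This is a sketch on a small made-up frame, not part of the original course code; `replaced` and `updated` are illustrative names.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "Programming Language": ["C", "Java", "Python", "C++"],
    "Sep 2020": np.arange(1, 5),
})

# Whole-column assignment: rows absent from the Series become NaN.
replaced = df.copy()
replaced["Sep 2020"] = pd.Series([100, 200], index=[1, 2])

# .loc assignment: only rows 1 and 2 change, the others keep their values.
updated = df.copy()
updated.loc[[1, 2], "Sep 2020"] = [100, 200]

print(replaced)
print(updated)
```

The NaN values also explain the `100.0` in the output above: a column that holds NaN is stored as floating point.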
