Data Science [3]: Basic DataFrame Operations (Part 2)

Google API

Use the Google Books API (https://developers.google.com/books/docs/overview) to fetch the data. In practice this means concatenating a query URL and requesting it. This may require [...].
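As a minimal sketch of the URL concatenation described above, the snippet below builds a query URL with the standard library only. The endpoint is the one used in this article's example; `build_query_url` is a hypothetical helper name, not part of any Google library.

```python
from urllib.parse import urlencode

# Endpoint used in this article's example code.
BASE_URL = "https://www.googleapis.com/books/v1/volumes"

def build_query_url(topic):
    """Return the full request URL for a topic (hypothetical helper)."""
    # urlencode escapes spaces and other reserved characters in the query value.
    return BASE_URL + "?" + urlencode({"q": topic})

url = build_query_url("Data Science")
print(url)  # https://www.googleapis.com/books/v1/volumes?q=Data+Science
```

Encoding the query this way avoids putting a raw space into the URL when the topic has more than one word.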

Example: query books on a given topic

Note: json.loads() returns a regular Python object (here a dict), not a JSON string.

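import requests
import json

# Google Books API
# See: https://developers.google.com/books/

def get(topic=""):
    """Query the Google Books volumes endpoint and return the parsed response as a dict."""
    BASEURL = 'https://www.googleapis.com/books/v1/volumes'

    # Pass the query through params so requests URL-encodes topics such as "Data Science".
    response = requests.get(BASEURL, params={'q': topic})

    # Raise on HTTP errors rather than returning the Response object,
    # so callers can always index the result like a dict.
    response.raise_for_status()
    return json.loads(response.content.decode('utf-8'))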

python = get("Python")
data_science = get("Data Science")
data_analytics = get("Data Analysis")
machine_learning = get("Machine Learning")
deep_learning = get("Deep Learning")
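To make the return type of json.loads() concrete, here is a small offline sketch. The `raw` string is a hand-written stand-in for a response body, not real API output; only the `items` and `volumeInfo` keys mirror what the rest of this article uses.

```python
import json

# Hand-written sample text; an assumption standing in for a real response body.
raw = '{"totalItems": 1, "items": [{"id": "abc", "volumeInfo": {"title": "Example"}}]}'

parsed = json.loads(raw)

# The result is made of ordinary Python containers.
print(type(parsed).__name__)           # dict
print(type(parsed["items"]).__name__)  # list

# Nested fields are reached with normal indexing.
first_title = parsed["items"][0]["volumeInfo"]["title"]
print(first_title)                     # Example
```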

Converting JSON to a DataFrame

Use the json_normalize function.


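import pandas as pd

def json2df(book_json):
    # Each element of 'items' becomes one row; nested keys become dotted column names.
    return pd.json_normalize(book_json['items'])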

python_df = json2df(python)
data_science_df = json2df(data_science)
data_analytics_df = json2df(data_analytics)
machine_learning_df = json2df(machine_learning)
deep_learning_df = json2df(deep_learning)

python_df.to_csv("python.csv", index=False)
data_science_df.to_csv("data_science.csv", index=False)
data_analytics_df.to_csv("data_analytics.csv", index=False)
machine_learning_df.to_csv("machine_learning.csv", index=False)
deep_learning_df.to_csv("deep_learning.csv", index=False)
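To see where column names such as volumeInfo.title come from, the sketch below runs json_normalize on a tiny hand-made structure. The records are invented for illustration and only imitate the shape of the `items` list.

```python
import pandas as pd

# Invented records that imitate the nesting of the 'items' list.
book_json = {
    "items": [
        {"id": "a1", "volumeInfo": {"title": "Book A", "authors": ["Ann Lee"]}},
        {"id": "b2", "volumeInfo": {"title": "Book B", "authors": ["Bo Chen", "Eve Park"]}},
    ]
}

df = pd.json_normalize(book_json["items"])

# Nested dict keys are flattened into dotted column names.
print(sorted(df.columns))
# List values are not expanded; each cell keeps the whole list.
print(df.loc[1, "volumeInfo.authors"])
```

Note that the authors stay as Python lists inside the cells, which matters for the filtering examples later on.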

Renaming DataFrame columns

Use the DataFrame object's rename method.


python_df.rename(columns={'volumeInfo.title':'Title', 'volumeInfo.authors':'Authors'}, inplace=True)
data_science_df.rename(columns={'volumeInfo.title':'Title', 'volumeInfo.authors':'Authors'}, inplace=True)
data_analytics_df.rename(columns={'volumeInfo.title':'Title', 'volumeInfo.authors':'Authors'}, inplace=True)
machine_learning_df.rename(columns={'volumeInfo.title':'Title', 'volumeInfo.authors':'Authors'}, inplace=True)
deep_learning_df.rename(columns={'volumeInfo.title':'Title', 'volumeInfo.authors':'Authors'}, inplace=True)
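Since the same mapping is applied to every frame, one possible variation is to keep the mapping in a single dict and loop over the frames. The sketch below uses two invented frames; `RENAMES` is just an illustrative name.

```python
import pandas as pd

# One shared mapping from the flattened names to short names.
RENAMES = {"volumeInfo.title": "Title", "volumeInfo.authors": "Authors"}

# Invented sample frames; the second one has no authors column.
df_a = pd.DataFrame({"id": ["a1"], "volumeInfo.title": ["Book A"],
                     "volumeInfo.authors": [["Ann Lee"]]})
df_b = pd.DataFrame({"id": ["b2"], "volumeInfo.title": ["Book B"]})

# Without inplace=True, rename returns a new DataFrame and leaves the input untouched.
# Keys missing from a frame are ignored by default.
renamed = [df.rename(columns=RENAMES) for df in (df_a, df_b)]

print(list(renamed[0].columns))  # ['id', 'Title', 'Authors']
print(list(renamed[1].columns))  # ['id', 'Title']
```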


Adding a column to a DataFrame

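Assign to a new column name to add the column, then use the concat function to combine the DataFrames.
Example: add a "Topic" column to each topic's DataFrame.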

# Assigning a scalar broadcasts it to every row.
python_df['Topic'] = 'Python'
data_science_df['Topic'] = 'Data Science'
data_analytics_df['Topic'] = 'Data Analysis'
machine_learning_df['Topic'] = 'Machine Learning'
deep_learning_df['Topic'] = 'Deep Learning'

all_df = pd.concat([python_df, data_science_df, data_analytics_df, machine_learning_df, deep_learning_df])
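A small sketch of what concat does to the row index, using invented frames. By default each frame keeps its own index labels, which is why the combined output later in this article shows repeated labels such as 7, 0, 1, 2, 3. Passing ignore_index=True is one way to renumber the rows if that is preferred.

```python
import pandas as pd

# Invented sample frames.
py = pd.DataFrame({"Title": ["Book A", "Book B"]})
ml = pd.DataFrame({"Title": ["Book C"]})

# A scalar assignment fills the new column for every row.
py["Topic"] = "Python"
ml["Topic"] = "Machine Learning"

# Default behaviour: original index labels are kept, so labels repeat.
stacked = pd.concat([py, ml])
print(stacked.index.tolist())     # [0, 1, 0]

# ignore_index=True assigns a fresh 0..n-1 index.
renumbered = pd.concat([py, ml], ignore_index=True)
print(renumbered.index.tolist())  # [0, 1, 2]
```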

Writing a DataFrame to CSV

Use the to_csv function.

all_df.to_csv("all_topics.csv", index=False)
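One thing worth knowing before reloading such a file: cells that hold Python lists (like the authors) are written as text, so they come back as strings. The sketch below shows this with an in-memory buffer instead of a file on disk; the sample row is invented, and using ast.literal_eval to rebuild the list is only one possible approach.

```python
import ast
import io

import pandas as pd

# Invented sample row with a list-valued cell.
df = pd.DataFrame({"Title": ["Book A"], "Authors": [["Ann Lee", "Eve Park"]]})

# Write to an in-memory buffer rather than a real file.
buffer = io.StringIO()
df.to_csv(buffer, index=False)
buffer.seek(0)

loaded = pd.read_csv(buffer)
print(type(loaded.loc[0, "Authors"]).__name__)  # str

# Parse the text form of the list back into a real list.
loaded["Authors"] = loaded["Authors"].map(ast.literal_eval)
print(loaded.loc[0, "Authors"])                 # ['Ann Lee', 'Eve Park']
```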

Filtering DataFrame rows by value

Using the .str accessor with string methods

Example: get all rows whose Title contains "data"

# Lowercase the titles first so the match is case-insensitive.
contain_data_df = all_df[all_df['Title'].str.lower().str.contains("data")]
contain_data_df.head()
(index) | kind | id | etag | selfLink | Title | volumeInfo.subtitle | Authors | volumeInfo.publishedDate | volumeInfo.description | volumeInfo.industryIdentifiers | ... | volumeInfo.categories | saleInfo.listPrice.amount | saleInfo.listPrice.currencyCode | saleInfo.retailPrice.amount | saleInfo.retailPrice.currencyCode | saleInfo.buyLink | saleInfo.offers | accessInfo.epub.acsTokenLink | Topic | accessInfo.pdf.acsTokenLink
7 | books#volume | 6omNDQAAQBAJ | 8i8xnUEyo14 | https://www.googleapis.com/books/v1/volumes/6o... | Python Data Science Handbook | Essential Tools for Working with Data | [Jake VanderPlas] | 2016-11-21 | For many researchers, Python is a first-class ... | [{'type': 'ISBN_13', 'identifier': '9781491912... | ... | [Computers] | 59.99 | USD | 59.99 | USD | https://play.google.com/store/books/details?id... | [{'finskyOfferType': 1, 'listPrice': {'amountI... | NaN | Python | NaN
0 | books#volume | vfi3DQAAQBAJ | Q0KYt+x/bgk | https://www.googleapis.com/books/v1/volumes/vf... | R for Data Science | Import, Tidy, Transform, Visualize, and Model ... | [Hadley Wickham, Garrett Grolemund] | 2016-12-12 | "This book introduces you to R, RStudio, and t... | [{'type': 'ISBN_13', 'identifier': '9781491910... | ... | [Computers] | NaN | NaN | NaN | NaN | NaN | NaN | NaN | Data Science | NaN
1 | books#volume | TFpVDwAAQBAJ | MqNMwqcnbUk | https://www.googleapis.com/books/v1/volumes/TF... | Data Science | NaN | [John D. Kelleher, Brendan Tierney] | 2018-04-13 | A concise introduction to the emerging field o... | [{'type': 'ISBN_13', 'identifier': '9780262535... | ... | [Computers] | NaN | NaN | NaN | NaN | NaN | NaN | NaN | Data Science | NaN
2 | books#volume | 6omNDQAAQBAJ | 86otz4UmKRE | https://www.googleapis.com/books/v1/volumes/6o... | Python Data Science Handbook | Essential Tools for Working with Data | [Jake VanderPlas] | 2016-11-21 | For many researchers, Python is a first-class ... | [{'type': 'ISBN_13', 'identifier': '9781491912... | ... | [Computers] | 59.99 | USD | 59.99 | USD | https://play.google.com/store/books/details?id... | [{'finskyOfferType': 1, 'listPrice': {'amountI... | NaN | Data Science | NaN
3 | books#volume | xb29DwAAQBAJ | QKJ7stkk3Ac | https://www.googleapis.com/books/v1/volumes/xb... | Introduction to Data Science | Data Analysis and Prediction Algorithms with R | [Rafael A. Irizarry] | 2019-11-20 | Introduction to Data Science: Data Analysis an... | [{'type': 'ISBN_13', 'identifier': '9781000708... | ... | [Mathematics] | NaN | NaN | NaN | NaN | NaN | NaN | http://books.google.com/books/download/Introdu... | Data Science | http://books.google.com/books/download/Introdu...

5 rows × 52 columns
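As an alternative sketch of the same filter, str.contains can do the case-insensitive match itself through its case parameter, and na=False treats missing titles as non-matches. The sample titles below are invented.

```python
import pandas as pd

# Invented sample titles, including a missing value.
df = pd.DataFrame({"Title": ["Python Data Science Handbook", "Deep Learning",
                             None, "DATA at Scale"]})

# case=False ignores letter case; na=False makes missing titles count as "no match".
mask = df["Title"].str.contains("data", case=False, na=False)
matches = df[mask]

print(matches["Title"].tolist())  # ['Python Data Science Handbook', 'DATA at Scale']
```

Without na=False, a missing title produces a missing value in the mask, and boolean indexing with such a mask can fail.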

Using map with a lambda expression

Example: select all rows where an author's first or last name starts with E


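# Check the first two words of each author's name. Rows with no author list are skipped,
# and slicing with [:2] avoids an IndexError on single-word names.
author_e_df = all_df[all_df['Authors'].map(lambda row: isinstance(row, list) and any(part[0] == 'E' for name in row for part in name.split()[:2]))]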
author_e_df.head()
(index) | kind | id | etag | selfLink | Title | volumeInfo.subtitle | Authors | volumeInfo.publishedDate | volumeInfo.description | volumeInfo.industryIdentifiers | ... | volumeInfo.categories | saleInfo.listPrice.amount | saleInfo.listPrice.currencyCode | saleInfo.retailPrice.amount | saleInfo.retailPrice.currencyCode | saleInfo.buyLink | saleInfo.offers | accessInfo.epub.acsTokenLink | Topic | accessInfo.pdf.acsTokenLink
7 | books#volume | xDszEAAAQBAJ | ju27MhIAQrM | https://www.googleapis.com/books/v1/volumes/xD... | Build a Career in Data Science | NaN | [Emily Robinson, Jacqueline Nolis] | 2020-03-06 | Summary You are going to need more than techni... | [{'type': 'ISBN_13', 'identifier': '9781638350... | ... | [Computers] | 28.99 | USD | 28.99 | USD | https://play.google.com/store/books/details?id... | [{'finskyOfferType': 1, 'listPrice': {'amountI... | http://books.google.com/books/download/Build_a... | Data Science | NaN
7 | books#volume | fBPEAgAAQBAJ | zEKoyMUn6e8 | https://www.googleapis.com/books/v1/volumes/fB... | Beginning Statistics with Data Analysis | NaN | [Frederick Mosteller, Stephen E. Fienberg, Rob... | 2013-11-20 | This introduction to the world of statistics c... | [{'type': 'ISBN_13', 'identifier': '9780486782... | ... | [Mathematics] | 24.95 | USD | 14.72 | USD | https://play.google.com/store/books/details?id... | [{'finskyOfferType': 1, 'listPrice': {'amountI... | http://books.google.com/books/download/Beginni... | Data Analysis | http://books.google.com/books/download/Beginni...
3 | books#volume | NP5bBAAAQBAJ | evXnYOuFPGY | https://www.googleapis.com/books/v1/volumes/NP... | Introduction to Machine Learning | NaN | [Ethem Alpaydin] | 2014-08-29 | The goal of machine learning is to program com... | [{'type': 'ISBN_13', 'identifier': '9780262028... | ... | [Computers] | NaN | NaN | NaN | NaN | NaN | NaN | NaN | Machine Learning | NaN
4 | books#volume | AGQ4DQAAQBAJ | 4KxWueVGUyI | https://www.googleapis.com/books/v1/volumes/AG... | Machine Learning | The New AI | [Ethem Alpaydin] | 2016-10-07 | A concise overview of machine learning—compute... | [{'type': 'ISBN_13', 'identifier': '9780262529... | ... | [Computers] | NaN | NaN | NaN | NaN | NaN | NaN | NaN | Machine Learning | NaN
5 | books#volume | LrT4DwAAQBAJ | RCyLWjQWJQQ | https://www.googleapis.com/books/v1/volumes/Lr... | Introduction to Deep Learning | NaN | [Eugene Charniak] | 2019-01-29 | A project-based guide to the basics of deep le... | [{'type': 'ISBN_13', 'identifier': '9780262039... | ... | [Computers] | NaN | NaN | NaN | NaN | NaN | NaN | NaN | Deep Learning | NaN

5 rows × 52 columns
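When the condition gets long, a named function can be easier to read and test than a nested lambda. The sketch below is one possible variant, not the article's exact rule: `has_e_initial` is a hypothetical helper that checks every word of every author name and treats missing author lists as non-matches. The sample rows are invented apart from author names that appear in the output above.

```python
import pandas as pd

def has_e_initial(authors):
    """Return True if any word of any author name starts with 'E' (hypothetical helper)."""
    # Missing authors show up as NaN (a float), not a list.
    if not isinstance(authors, list):
        return False
    return any(part.startswith("E") for name in authors for part in name.split())

# Invented sample rows: one missing author list and one single-word name.
df = pd.DataFrame({
    "Title": ["T1", "T2", "T3", "T4"],
    "Authors": [["Ethem Alpaydin"], ["Jake VanderPlas"], float("nan"),
                ["Stephen E. Fienberg", "Cher"]],
})

# map applies the function to each cell and yields a boolean mask.
selected = df[df["Authors"].map(has_e_initial)]
print(selected["Title"].tolist())  # ['T1', 'T4']
```

Note that a middle initial such as "E." also counts as a match here, just as it would for the second word of a name in the lambda version.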
