Sorting and manipulating the DataFrame index
- Read 'monthly_max_temp.csv' into a DataFrame called weather1 with 'Month' as the index.
- Sort the index of weather1 in alphabetical order using the .sort_index() method and store the result in weather2.
- Sort the index of weather1 in reverse alphabetical order by specifying the additional keyword argument ascending=False inside .sort_index(), and store the result in weather3.
- Use the .sort_values() method to sort weather1 in increasing numerical order according to the values of the column 'Max TemperatureF', and store the result in weather4.

# Import pandas
import pandas as pd
# Read 'monthly_max_temp.csv' into a DataFrame: weather1
weather1 = pd.read_csv("monthly_max_temp.csv", index_col='Month')
# Print the head of weather1
print(weather1.head())
# Sort the index of weather1 in alphabetical order: weather2
weather2 = weather1.sort_index()
# Print the head of weather2
print(weather2.head())
# Sort the index of weather1 in reverse alphabetical order: weather3
weather3 = weather1.sort_index(ascending=False)
# Print the head of weather3
print(weather3.head())
# Sort weather1 numerically using the values of 'Max TemperatureF': weather4
weather4 = weather1.sort_values('Max TemperatureF')
# Print the head of weather4
print(weather4.head())
- Reindex weather1 using the .reindex() method with the list year as the argument, which contains the abbreviations for each month.
- Reindex weather1 just as you did above, this time chaining the .ffill() method to replace the null values with the last preceding non-null value.

# Import pandas
import pandas as pd
# Reindex weather1 using the list year: weather2
weather2 = weather1.reindex(year)
# Print weather2
print(weather2)
# Reindex weather1 using the list year with forward-fill: weather3
weather3 = weather1.reindex(year).ffill()
# Print weather3
print(weather3)
- Create a new DataFrame common_names by reindexing names_1981 using the Index of the DataFrame names_1881 of older names.
- Print the shape of the new common_names DataFrame. This has been done for you. It should be the same as that of names_1881.
- Drop the rows of common_names that have null counts using the .dropna() method. These rows correspond to names that fell out of fashion between 1881 & 1981.
- Print the shape of the new common_names DataFrame. This has been done for you, so hit 'Submit Answer' to see the result!

# Import pandas
import pandas as pd
# Reindex names_1981 with index of names_1881: common_names
common_names = names_1981.reindex(names_1881.index)
# Print shape of common_names
print(common_names.shape)
# Drop rows with null counts: common_names
common_names = common_names.dropna()
# Print shape of new common_names
print(common_names.shape)
- Create a new DataFrame temps_f by extracting the columns 'Min TemperatureF', 'Mean TemperatureF', & 'Max TemperatureF' from weather as a new DataFrame temps_f. To do this, pass the relevant columns as a list to weather[].
- Create a new DataFrame temps_c from temps_f using the formula (temps_f - 32) * 5/9.
- Rename the columns of temps_c to replace 'F' with 'C' using the .str.replace('F', 'C') method on temps_c.columns.
- Print the first 5 rows of temps_c. This has been done for you, so hit 'Submit Answer' to see the result!
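The solution code for this exercise is missing from these notes; below is a minimal sketch, assuming the weather DataFrame is pre-loaded as in the exercise.

# Extract the temperature columns from weather: temps_f
temps_f = weather[['Min TemperatureF', 'Mean TemperatureF', 'Max TemperatureF']]
# Convert Fahrenheit to Celsius: temps_c
temps_c = (temps_f - 32) * 5/9
# Rename the columns of temps_c, replacing 'F' with 'C'
temps_c.columns = temps_c.columns.str.replace('F', 'C')
# Print the first 5 rows of temps_c
print(temps_c.head())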
- Read the file 'GDP.csv' into a DataFrame called gdp, using parse_dates=True and index_col='DATE'.
- Create a DataFrame post2008 by slicing gdp such that it comprises all rows from 2008 onward.
- Print the last 8 rows of the slice post2008. This has been done for you. This data has quarterly frequency so the indices are separated by three-month intervals.
- Create the DataFrame yearly by resampling the slice post2008 by year. Remember, you need to chain .resample() (using the alias 'A' for annual frequency) with some kind of aggregation; you will use the aggregation method .last() to select the last element when resampling.
- Compute the percentage growth of the resampled DataFrame yearly with .pct_change() * 100.

# Import pandas
import pandas as pd
# Read 'GDP.csv' into a DataFrame: gdp
gdp = pd.read_csv('GDP.csv', parse_dates=True, index_col='DATE')
# Slice all the gdp data from 2008 onward: post2008
post2008 = gdp.loc['2008':]
# Print the last 8 rows of post2008
print(post2008.tail(8))
# Resample post2008 by year, keeping last(): yearly
yearly = post2008.resample('A').last()
# Print yearly
print(yearly)
# Compute percentage growth of yearly: yearly['growth']
yearly['growth'] = yearly.pct_change() * 100
# Print yearly again
print(yearly)
- Read the DataFrames sp500 & exchange from the files 'sp500.csv' & 'exchange.csv' respectively, using parse_dates=True and index_col='Date'.
- Extract the columns 'Open' & 'Close' from the DataFrame sp500 as a new DataFrame dollars and print the first 5 rows.
- Construct a new DataFrame pounds by converting US dollars to British pounds. You'll use the .multiply() method of dollars with exchange['GBP/USD'] and axis='rows'.
- Print the first 5 rows of pounds. This has been done for you, so hit 'Submit Answer' to see the results!

# Import pandas
import pandas as pd
# Read 'sp500.csv' into a DataFrame: sp500
sp500 = pd.read_csv('sp500.csv', parse_dates=True, index_col='Date')
# Read 'exchange.csv' into a DataFrame: exchange
exchange = pd.read_csv("exchange.csv", parse_dates=True, index_col='Date')
# Subset 'Open' & 'Close' columns from sp500: dollars
dollars = sp500[['Open', 'Close']]
# Print the head of dollars
print(dollars.head())
# Convert dollars to pounds: pounds
pounds = dollars.multiply(exchange['GBP/USD'], axis='rows')
# Print the head of pounds
print(pounds.head())
- Read the files 'sales-jan-2015.csv', 'sales-feb-2015.csv' and 'sales-mar-2015.csv' into the DataFrames jan, feb, and mar respectively, using parse_dates=True and index_col='Date'.
- Extract the 'Units' column of jan, feb, and mar to create the Series jan_units, feb_units, and mar_units respectively.
- Construct the Series quarter1 by appending feb_units to jan_units and then appending mar_units to the result. Use chained calls to the .append() method to do this.
- Verify that quarter1 has the individual Series stacked vertically. To do this:
  - Print the slice containing rows from jan 27, 2015 to feb 2, 2015.
  - Print the slice containing rows from feb 26, 2015 to mar 7, 2015.
- Compute & print the total sales in quarter1. This has been done for you, so hit 'Submit Answer' to see the result!
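This exercise's solution isn't in the notes; a minimal sketch follows. Note that Series.append() was removed in pandas 2.0 (pd.concat() is the modern replacement); the sketch uses the course-era API.

# Import pandas
import pandas as pd
# Read the three monthly sales files
jan = pd.read_csv('sales-jan-2015.csv', parse_dates=True, index_col='Date')
feb = pd.read_csv('sales-feb-2015.csv', parse_dates=True, index_col='Date')
mar = pd.read_csv('sales-mar-2015.csv', parse_dates=True, index_col='Date')
# Extract the 'Units' column of each DataFrame
jan_units = jan['Units']
feb_units = feb['Units']
mar_units = mar['Units']
# Append feb_units, then mar_units, to jan_units: quarter1
quarter1 = jan_units.append(feb_units).append(mar_units)
# Verify the slices, then print the total sales
print(quarter1.loc['jan 27, 2015':'feb 2, 2015'])
print(quarter1.loc['feb 26, 2015':'mar 7, 2015'])
print(quarter1.sum())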
- Create an empty list called units. This has been done for you.
- Use a for loop to iterate over [jan, feb, mar]:
  - In each iteration of the loop, append the 'Units' column of each DataFrame to units.
- Concatenate the Series contained in the list units into a longer Series called quarter1 using pd.concat(). Specify axis='rows' to stack the Series vertically.
- Verify that quarter1 has the individual Series stacked vertically by printing slices. This has been done for you, so hit 'Submit Answer' to see the result!

# Initialize empty list: units
units = []
# Build the list of Series
for month in [jan, feb, mar]:
    units.append(month['Units'])
# Concatenate the list: quarter1
quarter1 = pd.concat(units, axis='rows')
# Print slices from quarter1
print(quarter1.loc['jan 27, 2015':'feb 2, 2015'])
print(quarter1.loc['feb 26, 2015':'mar 7, 2015'])
- Create a 'year' column in the DataFrames names_1881 and names_1981, with values of 1881 and 1981 respectively. Recall that assigning a scalar value to a DataFrame column broadcasts that value throughout.
- Create a new DataFrame called combined_names by appending the rows of names_1981 underneath the rows of names_1881. Specify the keyword argument ignore_index=True to make a new RangeIndex of unique integers for each row.
- Extract all rows from combined_names that have the name 'Morgan'. To do this, use the .loc[] accessor with an appropriate filter. The relevant column of combined_names here is 'name'.
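No solution code for this exercise survives in the notes; a minimal sketch, assuming names_1881 and names_1981 are pre-loaded (DataFrame.append() was removed in pandas 2.0, so substitute pd.concat() on newer versions):

# Broadcast the year into a new column of each DataFrame
names_1881['year'] = 1881
names_1981['year'] = 1981
# Append names_1981 below names_1881: combined_names
combined_names = names_1881.append(names_1981, ignore_index=True)
# Extract all rows with the name 'Morgan'
print(combined_names.loc[combined_names['name'] == 'Morgan'])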
- Iterate over medal_types in the for loop.
- Inside the for loop:
  - Create the file name file_name using string interpolation with the loop variable medal. This has been done for you. The expression "%s_top5.csv" % medal evaluates as a string with the value of medal replacing %s in the format string.
  - Create the list of column names called columns. This has been done for you.
  - Read file_name into a DataFrame called medal_df. Specify the keyword arguments header=0, index_col='Country', and names=columns to get the correct row and column Indexes.
  - Append medal_df to medals using the list .append() method.
- Concatenate the list of DataFrames medals horizontally (using axis='columns') to create a single DataFrame called medals. Print it in its entirety.

medals = []  # pre-defined as an empty list in the exercise environment
for medal in medal_types:
    # Create the file name: file_name
    file_name = "%s_top5.csv" % medal
    # Create list of column names: columns
    columns = ['Country', medal]
    # Read file_name into a DataFrame: medal_df
    medal_df = pd.read_csv(file_name, header=0, index_col='Country', names=columns)
    # Append medal_df to medals
    medals.append(medal_df)
# Concatenate medals horizontally: medals
medals = pd.concat(medals, axis='columns')
# Print medals
print(medals)
- Within the for loop:
  - Read file_name into a DataFrame called medal_df. Specify the index to be 'Country'.
  - Append medal_df to medals.
- Concatenate the list of DataFrames medals into a single DataFrame called medals. Be sure to use the keyword argument keys=['bronze', 'silver', 'gold'] to create a vertically stacked DataFrame with a MultiIndex.
- Print the new DataFrame medals. This has been done for you, so hit 'Submit Answer' to see the result!

medals = []  # pre-defined as an empty list in the exercise environment
for medal in medal_types:
    file_name = "%s_top5.csv" % medal
    # Read file_name into a DataFrame: medal_df
    medal_df = pd.read_csv(file_name, index_col='Country')
    # Append medal_df to medals
    medals.append(medal_df)
# Concatenate medals: medals
medals = pd.concat(medals, keys=['bronze', 'silver', 'gold'])
# Print medals in entirety
print(medals)
- Create a new DataFrame medals_sorted with the entries of medals sorted. Use .sort_index(level=0) to ensure the Index is sorted suitably.
- Create an alias for pd.IndexSlice called idx. A slicer pd.IndexSlice is required when slicing on the inner level of a MultiIndex.
- Slice the rows for 'United Kingdom' using the .loc[] accessor with idx[:, 'United Kingdom'], :.
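The notes omit the solution here; a minimal sketch, assuming the MultiIndexed medals DataFrame from the previous exercise:

# Import pandas
import pandas as pd
# Sort the entries of medals: medals_sorted
medals_sorted = medals.sort_index(level=0)
# Create an alias for pd.IndexSlice: idx
idx = pd.IndexSlice
# Slice all rows where the inner index level is 'United Kingdom'
print(medals_sorted.loc[idx[:, 'United Kingdom'], :])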
- Construct a new DataFrame february with MultiIndexed columns by concatenating the list dataframes. Use axis=1 to stack the DataFrames horizontally and the keyword argument keys=['Hardware', 'Software', 'Service'] to construct a hierarchical Index from each DataFrame.
- Print summary information from february using the .info() method. This has been done for you.
- Create an alias called idx for pd.IndexSlice.
- Create a slice called slice_2_8 from february (using .loc[] & idx) that comprises rows between Feb. 2, 2015 and Feb. 8, 2015 from columns under 'Company'.
- Print slice_2_8. This has been done for you, so hit 'Submit Answer' to see the sliced data!

# Concatenate dataframes: february
february = pd.concat(dataframes, axis=1, keys=['Hardware', 'Software', 'Service'])
# Print february.info()
print(february.info())
# Assign pd.IndexSlice: idx
idx = pd.IndexSlice
# Create the slice: slice_2_8
slice_2_8 = february.loc['Feb 2, 2015':'Feb 8, 2015', idx[:, 'Company']]
# Print slice_2_8
print(slice_2_8)
- Create a list called month_list consisting of the tuples ('january', jan), ('february', feb), and ('march', mar).
- Create an empty dictionary called month_dict.
- Inside the for loop:
  - Group month_data by 'Company' and use .sum() to aggregate.
- Construct a new DataFrame called sales by concatenating the DataFrames stored in month_dict.
- Create an alias for pd.IndexSlice and print all sales by 'Mediacore'. This has been done for you, so hit 'Submit Answer' to see the result!

# Make the list of tuples: month_list
month_list = [('january', jan), ('february', feb), ('march', mar)]
# Create an empty dictionary: month_dict
month_dict = dict()
for month_name, month_data in month_list:
    # Group month_data: month_dict[month_name]
    month_dict[month_name] = month_data.groupby('Company').sum()
# Concatenate data in month_dict: sales
sales = pd.concat(month_dict)
# Print sales
print(sales)
# Print all sales by Mediacore
idx = pd.IndexSlice
print(sales.loc[idx[:, 'Mediacore'], :])
- Make a new DataFrame china_annual by resampling the DataFrame china with .resample('A').last() (i.e., with annual frequency, keeping the last value) and chaining two method calls:
  - .pct_change(10) to compute the percentage change with an offset of ten years.
  - .dropna() to eliminate rows containing null values.
- Make a new DataFrame us_annual by resampling the DataFrame us exactly as you resampled china.
- Concatenate china_annual and us_annual to construct a DataFrame called gdp. Use join='inner' to perform an inner join and use axis=1 to concatenate horizontally.
- Print the result of resampling gdp every decade (i.e., using .resample('10A')) and aggregating with the method .last(). This has been done for you, so hit 'Submit Answer' to see the result!

# Resample and tidy china: china_annual
china_annual = china.resample('A').last().pct_change(10).dropna()
# Resample and tidy us: us_annual
us_annual = us.resample('A').last().pct_change(10).dropna()
# Concatenate china_annual and us_annual: gdp
gdp = pd.concat([china_annual, us_annual], axis=1, join='inner')
# Resample gdp and print
print(gdp.resample('10A').last())
Some operations for concatenating and merging with pd.merge()
- Using pd.merge(), merge the DataFrames revenue and managers on the 'city' column of each. Store the result as merge_by_city.
- Print merge_by_city. This has been done for you.
- Merge the DataFrames revenue and managers on the 'branch_id' column of each. Store the result as merge_by_id.
- Print merge_by_id. This has been done for you, so hit 'Submit Answer' to see the result!

# Merge revenue with managers on 'city': merge_by_city
merge_by_city = pd.merge(revenue, managers, on='city')
# Print merge_by_city
print(merge_by_city)
# Merge revenue with managers on 'branch_id': merge_by_id
merge_by_id = pd.merge(revenue, managers, on='branch_id')
# Print merge_by_id
print(merge_by_id)
- Merge the DataFrames revenue and managers into a single DataFrame called combined using the 'city' and 'branch' columns from the appropriate DataFrames. In pd.merge(), you will have to specify the parameters left_on and right_on appropriately.
- Print the new DataFrame combined.

# Merge revenue & managers on 'city' & 'branch': combined
combined = pd.merge(revenue, managers, left_on='city', right_on='branch')
# Print combined
print(combined)
- Create a column called 'state' in the DataFrame revenue, consisting of the list ['TX','CO','IL','CA'].
- Create a column called 'state' in the DataFrame managers, consisting of the list ['TX','CO','CA','MO'].
- Merge the DataFrames revenue and managers using three columns: 'branch_id', 'city', and 'state'. Pass them in as a list to the on parameter of pd.merge().
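This exercise's solution isn't in the notes; a minimal sketch, assuming revenue and managers are pre-loaded:

# Import pandas
import pandas as pd
# Add 'state' columns to both DataFrames
revenue['state'] = ['TX', 'CO', 'IL', 'CA']
managers['state'] = ['TX', 'CO', 'CA', 'MO']
# Merge on all three shared columns: combined
combined = pd.merge(revenue, managers, on=['branch_id', 'city', 'state'])
# Print combined
print(combined)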
- Execute a right merge using pd.merge() with revenue and sales to yield a new DataFrame revenue_and_sales. Use how='right' and on=['city', 'state'].
- Print the new DataFrame revenue_and_sales. This has been done for you.
- Execute a left merge with sales and managers to yield a new DataFrame sales_and_managers. Use how='left', left_on=['city', 'state'], and right_on=['branch', 'state'].
- Print the new DataFrame sales_and_managers. This has been done for you, so hit 'Submit Answer' to see the result!
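Again no solution code survives; a minimal sketch, assuming revenue, sales, and managers are pre-loaded:

# Import pandas
import pandas as pd
# Right merge on 'city' and 'state': revenue_and_sales
revenue_and_sales = pd.merge(revenue, sales, how='right', on=['city', 'state'])
print(revenue_and_sales)
# Left merge with differing key names: sales_and_managers
sales_and_managers = pd.merge(sales, managers, how='left',
                              left_on=['city', 'state'],
                              right_on=['branch', 'state'])
print(sales_and_managers)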
- Perform an ordered merge on austin and houston using pd.merge_ordered(). Store the result as tx_weather.
- Print tx_weather. You should notice that the rows are sorted by the date but it is not possible to tell which observation came from which city.
- Perform another ordered merge on austin and houston, this time specifying the keyword arguments on='date' and suffixes=['_aus','_hus'] so that the rows can be distinguished. Store the result as tx_weather_suff.
- Print tx_weather_suff to examine its contents. This has been done for you.
- Perform a third ordered merge on austin and houston. In addition to the on and suffixes parameters, specify the keyword argument fill_method='ffill' to use forward-filling to replace NaN entries with the most recent non-null entry, and hit 'Submit Answer' to examine the contents of the merged DataFrames!
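No solution code is included here; a minimal sketch, assuming austin and houston are pre-loaded (the result name tx_weather_fill for the third merge is an assumption):

# Import pandas
import pandas as pd
# Ordered merge of austin and houston: tx_weather
tx_weather = pd.merge_ordered(austin, houston)
print(tx_weather)
# Ordered merge with suffixes: tx_weather_suff
tx_weather_suff = pd.merge_ordered(austin, houston, on='date',
                                   suffixes=['_aus', '_hus'])
print(tx_weather_suff)
# Ordered merge with forward-filling: tx_weather_fill (name assumed)
tx_weather_fill = pd.merge_ordered(austin, houston, on='date',
                                   suffixes=['_aus', '_hus'],
                                   fill_method='ffill')
print(tx_weather_fill)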
- Merge auto and oil using pd.merge_asof() with left_on='yr' and right_on='Date'. Store the result as merged.
- Print the tail of merged. This has been done for you.
- Resample merged using 'A' (annual frequency) and on='Date'. Select [['mpg','Price']] and aggregate the mean. Store the result as yearly.
- Print yearly and yearly.corr(), which shows the Pearson correlation between the resampled 'Price' and 'mpg'.

# Merge auto and oil: merged
merged = pd.merge_asof(auto, oil, left_on='yr', right_on='Date')
# Print the tail of merged
print(merged.tail())
# Resample merged: yearly
yearly = merged.resample('A', on='Date')[['mpg', 'Price']].mean()
# Print yearly
print(yearly)
# print yearly.corr()
print(yearly.corr())
- Within the for loop:
  - Read file_path into a DataFrame. Assign the result to the year key of medals_dict.
  - Select only the columns 'Athlete', 'NOC', and 'Medal' from medals_dict[year].
  - Create a new column called 'Edition' in the DataFrame medals_dict[year] whose entries are all year.
- Concatenate the dictionary of DataFrames medals_dict into a DataFrame called medals. Specify the keyword argument ignore_index=True to prevent repeated integer indices.
- Print the first and last 5 rows of medals. This has been done for you, so hit 'Submit Answer' to see the result!
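The loop's solution code is missing from the notes; a minimal sketch, assuming the editions DataFrame is pre-loaded and that the file-name template 'summer_%d.csv' matches the course files:

# Import pandas
import pandas as pd
# Create an empty dictionary: medals_dict
medals_dict = {}
for year in editions['Edition']:
    # Create the file path: file_path (template assumed)
    file_path = 'summer_%d.csv' % year
    # Load file_path into a DataFrame: medals_dict[year]
    medals_dict[year] = pd.read_csv(file_path)
    # Select only the relevant columns
    medals_dict[year] = medals_dict[year][['Athlete', 'NOC', 'Medal']]
    # Assign year to the 'Edition' column
    medals_dict[year]['Edition'] = year

# Concatenate medals_dict into a DataFrame: medals
medals = pd.concat(medals_dict, ignore_index=True)
# Print first and last 5 rows of medals
print(medals.head())
print(medals.tail())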
- Set the index of the DataFrame editions to be 'Edition' (using the method .set_index()). Save the result as totals.
- Extract the 'Grand Total' column from totals and assign the result back to totals.
- Divide the DataFrame medal_counts by totals along each row. You will have to use the .divide() method with the option axis='rows'. Assign the result to fractions.
- Print the first & last 5 rows of the DataFrame fractions. This has been done for you, so hit 'Submit Answer' to see the results!
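The corresponding solution code is absent; a minimal sketch, assuming editions and medal_counts are pre-loaded:

# Set the index of editions: totals
totals = editions.set_index('Edition')
# Extract the 'Grand Total' column: totals
totals = totals['Grand Total']
# Divide medal_counts by totals along each row: fractions
fractions = medal_counts.divide(totals, axis='rows')
# Print first & last 5 rows of fractions
print(fractions.head())
print(fractions.tail())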
- Create mean_fractions by chaining the methods .expanding().mean() to fractions.
- Compute the percentage change in mean_fractions down each column by applying .pct_change() and multiplying by 100. Assign the result to fractions_change.
- Reset the index of fractions_change using the .reset_index() method. This will make 'Edition' an ordinary column.
- Print the first & last 5 rows of the DataFrame fractions_change. This has been done for you, so hit 'Submit Answer' to see the results!

# Apply the expanding mean: mean_fractions
mean_fractions = fractions.expanding().mean()
# Compute the percentage change: fractions_change
fractions_change = mean_fractions.pct_change() * 100
# Reset the index of fractions_change: fractions_change
fractions_change = fractions_change.reset_index('Edition')
# Print first & last 5 rows of fractions_change
print(fractions_change.head())
print(fractions_change.tail())