More operations on DataFrames
- Slice the row labels 'Perry' to 'Potter' and assign the output to p_counties.
- Print the p_counties DataFrame. This has been done for you.
- Slice the row labels 'Potter' to 'Perry' in reverse order. To do this for hypothetical row labels 'a' and 'b', you could use a stepsize of -1 like so: df.loc['b':'a':-1].
- Print the p_counties_rev DataFrame. This has also been done for you, so hit 'Submit Answer' to see the result of your slicing!
# Slice the row labels 'Perry' to 'Potter': p_counties
p_counties = election.loc['Perry':'Potter']
# Print the p_counties DataFrame
print(p_counties)
# Slice the row labels 'Potter' to 'Perry' in reverse order: p_counties_rev
p_counties_rev = election.loc['Potter':'Perry':-1]
# Print the p_counties_rev DataFrame
print(p_counties_rev)
Column indexing and data indexing in pandas DataFrames
- Slice the columns from the starting column to 'Obama' and assign the result to left_columns.
- Slice the columns from 'Obama' to 'winner' and assign the result to middle_columns.
- Slice the columns from 'Romney' to the end and assign the result to right_columns.
- The code to print the first 5 rows of left_columns, middle_columns, and right_columns has been written, so hit 'Submit Answer' to see the results!
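A minimal sketch of these column slices, assuming the course's election DataFrame is loaded (the column order is taken from the exercise text):
# Slice from the starting column to 'Obama': left_columns
left_columns = election.loc[:, :'Obama']
# Slice from 'Obama' to 'winner': middle_columns
middle_columns = election.loc[:, 'Obama':'winner']
# Slice from 'Romney' to the end: right_columns
right_columns = election.loc[:, 'Romney':]
print(left_columns.head())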
- Create the list of row labels ['Philadelphia', 'Centre', 'Fulton'] and assign it to rows.
- Create the list of column labels ['winner', 'Obama', 'Romney'] and assign it to cols.
- Create a new DataFrame by selecting with rows and cols in .loc[] and assign it to three_counties.
- Print the three_counties DataFrame. This has been done for you, so hit 'Submit Answer' to see your new DataFrame.
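A sketch of the list-based selection, under the same assumption that election is loaded:
# Create the lists of row and column labels
rows = ['Philadelphia', 'Centre', 'Fulton']
cols = ['winner', 'Obama', 'Romney']
# Select with both lists in .loc[]: three_counties
three_counties = election.loc[rows, cols]
print(three_counties)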
- Import numpy as np.
- Create a boolean array for the condition where the 'margin' column is less than 1 and assign it to too_close.
- Set the relevant entries of the 'winner' column, where the result was too close to call, to np.nan.
- Print the output of election.info(). This has been done for you, so hit 'Submit Answer' to see the results.
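One way this masking could look; a sketch assuming election has a numeric 'margin' column:
# Import numpy
import numpy as np
# Create the boolean array: too_close
too_close = election['margin'] < 1
# Where the result was too close to call, set 'winner' to np.nan
election.loc[too_close, 'winner'] = np.nan
print(election.info())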
- Select the 'age' and 'cabin' columns of titanic and create a new DataFrame df.
- Print the shape of df. This has been done for you.
- Drop rows in df with how='any' and print the shape.
- Drop rows in df with how='all' and print the shape.
- Drop columns from the titanic DataFrame that have more than 1000 missing values by specifying the thresh and axis keyword arguments. Print the output of .info() from this.
# Select the 'age' and 'cabin' columns: df
df = titanic[['age','cabin']]
# Print the shape of df
print(df.shape)
# Drop rows in df with how='any' and print the shape
print(df.dropna(how='any').shape)
# Drop rows in df with how='all' and print the shape
print(df.dropna(how='all').shape)
# Call .dropna() with thresh=1000 and axis='columns' and print the output of .info() from titanic
print(titanic.dropna(thresh=1000, axis='columns').info())
- Apply the to_celsius function over the ['Mean TemperatureF','Mean Dew PointF'] columns of the weather DataFrame.
- Reassign the column labels of df_celsius to ['Mean TemperatureC','Mean Dew PointC'].
# Write a function to convert degrees Fahrenheit to degrees Celsius: to_celsius
def to_celsius(F):
    return 5/9*(F - 32)
# Apply the function over 'Mean TemperatureF' and 'Mean Dew PointF': df_celsius
df_celsius = weather[['Mean TemperatureF','Mean Dew PointF']].apply(to_celsius)
# Reassign the columns df_celsius
df_celsius.columns = ['Mean TemperatureC', 'Mean Dew PointC']
# Print the output of df_celsius.head()
print(df_celsius.head())
- Create a dictionary with the key:value pairs 'Obama':'blue' and 'Romney':'red'.
- Use the .map() method on the 'winner' column using the red_vs_blue dictionary you created.
- Print the output of election.head(). This has been done for you, so hit 'Submit Answer' to see the new column!
# Create the dictionary: red_vs_blue
red_vs_blue = {'Obama': 'blue', 'Romney': 'red'}
# Use the dictionary to map the 'winner' column to the new column: election['color']
election['color'] = election['winner'].map(red_vs_blue)
# Print the output of election.head()
print(election.head())
- Import zscore from scipy.stats.
- Call zscore with election['turnout'] as input.
- Print the output of type(turnout_zscore). This has been done for you.
- Assign turnout_zscore to a new column in election as 'turnout_zscore'.
- Print the output of election.head(). This has been done for you, so hit 'Submit Answer' to view the result.
# Import zscore from scipy.stats
from scipy.stats import zscore
# Call zscore with election['turnout'] as input: turnout_zscore
turnout_zscore = zscore(election['turnout'])
# Print the type of turnout_zscore
print(type(turnout_zscore))
# Assign turnout_zscore to a new column: election['turnout_zscore']
election['turnout_zscore'] = turnout_zscore
# Print the output of election.head()
print(election.head())
Some operations on the index:
- Create a list new_idx with the same elements as in sales.index, but with all characters capitalized.
- Assign new_idx to sales.index.
- Print the sales dataframe. This has been done for you, so hit 'Submit Answer' to see how the index changed.
- Assign the string 'MONTHS' to sales.index.name to create a name for the index.
- Print the sales dataframe to see the index name you just created.
- Now assign the string 'PRODUCTS' to sales.columns.name to give a name to the set of columns.
- Print the sales dataframe again to see the columns name you just created.
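A sketch of both index edits, assuming a small sales DataFrame indexed by lowercase month abbreviations:
# Build new_idx by upper-casing every label in sales.index
new_idx = [idx.upper() for idx in sales.index]
# Assign it back to the index
sales.index = new_idx
# Name the index and the set of columns
sales.index.name = 'MONTHS'
sales.columns.name = 'PRODUCTS'
print(sales)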
- Create a MultiIndex by setting the index of sales to be the columns ['state', 'month'].
- Sort the MultiIndex using the .sort_index() method.
- Print the sales DataFrame. This has been done for you, so hit 'Submit Answer' to verify that indeed you have an index with the fields state and month!
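A sketch of building and sorting the MultiIndex, assuming sales has 'state' and 'month' columns:
# Set the MultiIndex and sort it
sales = sales.set_index(['state', 'month'])
sales = sales.sort_index()
print(sales)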
- Set the index of sales to be the column 'state'.
- Print the sales DataFrame to verify that indeed you have an index with state values.
- Access the data from 'NY' and print it to verify that you obtain two rows.
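A sketch of the nonunique-index lookup, assuming the 'state' column holds repeated state abbreviations:
# Set the index to the 'state' column
sales = sales.set_index('state')
print(sales)
# Access the (two) rows labeled 'NY'
print(sales.loc['NY'])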
For a MultiIndex, slice(None) selects everything in a level; for example: stocks.loc[(slice(None), slice('2016-10-03', '2016-10-04')), :]
- Look up data for the New York state ('NY') in month 1.
- Look up data for the California and Texas states ('CA', 'TX') in month 2.
- Look up data for all states in month 2. Use (slice(None), 2) to extract all rows in month 2.
# Look up data for NY in month 1: NY_month1
NY_month1 = sales.loc[('NY', 1), :]
# Look up data for CA and TX in month 2: CA_TX_month2
CA_TX_month2 = sales.loc[(['CA','TX'], 2), :]
# Look up data for all states in month 2: all_month2
all_month2 = sales.loc[(slice(None), 2), :]
Pivot tables (pivot)
- Pivot the users DataFrame with the rows indexed by 'weekday', the columns indexed by 'city', and the values populated with 'visitors'.
- Pivot the users DataFrame with the 'signups' indexed by 'weekday' in the rows and 'city' in the columns.
- Pivot the users DataFrame with both 'signups' and 'visitors' pivoted - that is, all the variables. This will happen automatically if you do not specify an argument for the values parameter of .pivot().
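A sketch of the three pivots, assuming users has the columns 'weekday', 'city', 'visitors', and 'signups':
# Pivot 'visitors': weekday rows, city columns
visitors_pivot = users.pivot(index='weekday', columns='city', values='visitors')
# Pivot 'signups' the same way
signups_pivot = users.pivot(index='weekday', columns='city', values='signups')
# Pivot all remaining variables by omitting values
pivot = users.pivot(index='weekday', columns='city')
print(pivot)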
- Define a DataFrame byweekday with the 'weekday' level of users unstacked.
- Print the byweekday DataFrame to see the new data layout. This has been done for you.
- Stack byweekday by 'weekday' and print it to check if you get the same layout as the original users DataFrame.
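A sketch, assuming users now carries a MultiIndex with levels 'city' and 'weekday':
# Unstack the 'weekday' level into columns: byweekday
byweekday = users.unstack(level='weekday')
print(byweekday)
# Stack it back and compare with the original layout
print(byweekday.stack(level='weekday'))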
- Define a DataFrame newusers with the 'city' level stacked back into the index of bycity.
- Swap the levels of the index of newusers.
- Print newusers and verify that the index is not sorted. This has been done for you.
- Sort the index of newusers.
- Print newusers and verify that the index is now sorted. This has been done for you.
- Verify that newusers equals users. This has been done for you, so hit 'Submit Answer' to see the result.
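A sketch of restoring the index order, assuming bycity = users.unstack(level='city') from the previous step:
# Stack 'city' back into the index: newusers
newusers = bycity.stack(level='city')
# Swap the index levels, then sort to restore the original order
newusers = newusers.swaplevel(0, 1)
print(newusers)
newusers = newusers.sort_index()
print(newusers)
# Verify the round trip
print(newusers.equals(users))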
- Reset the index of visitors_by_city_weekday with .reset_index().
- Print visitors_by_city_weekday and verify that you have just a range index, 0, 1, 2, 3. This has been done for you.
- Melt visitors_by_city_weekday to move the city names from the column labels to values in a single column called city.
- Print visitors to check that the city values are in a single column now and that the dataframe is longer and skinnier.
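A sketch of the reset/melt round trip, assuming visitors_by_city_weekday is the 'visitors' pivot from above:
import pandas as pd
# Reset the index to get a range index
visitors_by_city_weekday = visitors_by_city_weekday.reset_index()
print(visitors_by_city_weekday)
# Melt the city columns back into a single 'visitors' column
visitors = pd.melt(visitors_by_city_weekday, id_vars=['weekday'], value_name='visitors')
print(visitors)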
- Define a DataFrame skinny where you melt the 'visitors' and 'signups' columns of users into a single column.
- Print skinny to verify the results. Note the value column that had the cell values in users.
- Set the index of users to ['city', 'weekday'].
- Print the DataFrame users_idx to see the new index.
- Melt users_idx with the keyword argument col_level=0.
- Define a DataFrame count_by_weekday1 that shows the count of each column with the parameter aggfunc='count'. The index here is 'weekday'.
- Print count_by_weekday1. This has been done for you.
- Replace aggfunc='count' with aggfunc=len and verify you obtain the same result.
- Define a DataFrame signups_and_visitors that shows the breakdown of signups and visitors by day, as well as the totals. Use aggfunc=sum to do this.
- Print signups_and_visitors. This has been done for you.
- Add margins=True to the .pivot_table() method to obtain the totals.
- Print signups_and_visitors_total. This has been done for you, so hit 'Submit Answer' to see the result.
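A sketch of the pivot tables, again assuming the users DataFrame from the earlier exercises:
# Count every column per weekday (aggfunc=len gives the same result)
count_by_weekday1 = users.pivot_table(index='weekday', aggfunc='count')
# Sum signups and visitors per weekday
signups_and_visitors = users.pivot_table(index='weekday', aggfunc=sum)
# Add the 'All' totals row with margins=True
signups_and_visitors_total = users.pivot_table(index='weekday', aggfunc=sum, margins=True)
print(signups_and_visitors_total)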
- Group titanic by the 'pclass' column and save the result as by_class.
- Aggregate the 'survived' column of by_class using .count(). Save the result as count_by_class.
- Print count_by_class. This has been done for you.
- Group titanic by the 'embarked' and 'pclass' columns. Save the result as by_mult.
- Aggregate the 'survived' column of by_mult using .count(). Save the result as count_mult.
- Print count_mult. This has been done for you, so hit 'Submit Answer' to view the result.
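A sketch of both counts, assuming the course's titanic DataFrame is loaded:
# Group by 'pclass' and count survivors per class
by_class = titanic.groupby('pclass')
count_by_class = by_class['survived'].count()
print(count_by_class)
# Group by 'embarked' and 'pclass' and count again
by_mult = titanic.groupby(['embarked', 'pclass'])
count_mult = by_mult['survived'].count()
print(count_mult)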
- Read life_fname into a DataFrame called life and set the index to 'Country'.
- Read regions_fname into a DataFrame called regions and set the index to 'Country'.
- Group life by the region column of regions and store the result in life_by_region.
- Print the mean over the 2010 column of life_by_region.
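A sketch of grouping one DataFrame by a column of another; life_fname and regions_fname are the exercise's file paths, and the '2010' column label being a string is an assumption:
import pandas as pd
life = pd.read_csv(life_fname, index_col='Country')
regions = pd.read_csv(regions_fname, index_col='Country')
# Group life by the region column of regions (aligned on the 'Country' index)
life_by_region = life.groupby(regions['region'])
print(life_by_region['2010'].mean())  # assumes year columns are string labels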
- Group titanic by 'pclass' and save the result as by_class.
- Select the 'age' and 'fare' columns from by_class and save the result as by_class_sub.
- Aggregate by_class_sub using 'max' and 'median'. You'll have to pass 'max' and 'median' in the form of a list to .agg().
- Use .loc[] to print all of the rows and the column specification ('age','max'). This has been done for you.
- Use .loc[] to print all of the rows and the column specification ('fare','median').
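A sketch of the multiple aggregation, assuming titanic is loaded:
by_class = titanic.groupby('pclass')
by_class_sub = by_class[['age', 'fare']]
# Aggregate with a list; the result gets a column MultiIndex
aggregated = by_class_sub.agg(['max', 'median'])
print(aggregated.loc[:, ('age', 'max')])
print(aggregated.loc[:, ('fare', 'median')])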
- Read 'gapminder.csv' into a DataFrame with index_col=['Year','region','Country']. Sort the index.
- Group gapminder with a level of ['Year','region'] using its level parameter. Save the result as by_year_region.
- Define the function spread which returns the maximum and minimum of an input series. This has been done for you.
- Create a dictionary with 'population':'sum', 'child_mortality':'mean' and 'gdp':spread as aggregator. This has been done for you.
- Use the aggregator dictionary to aggregate by_year_region. Save the result as aggregated.
- Print aggregated. This has been done for you, so hit 'Submit Answer' to view the result.
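A sketch of the dictionary-based aggregation; the spread definition (max minus min) is an assumption, since these notes don't reproduce the course's version:
import pandas as pd
gapminder = pd.read_csv('gapminder.csv', index_col=['Year', 'region', 'Country']).sort_index()
by_year_region = gapminder.groupby(level=['Year', 'region'])
# Assumed definition of spread
def spread(series):
    return series.max() - series.min()
aggregator = {'population': 'sum', 'child_mortality': 'mean', 'gdp': spread}
aggregated = by_year_region.agg(aggregator)
print(aggregated.tail())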
- Read 'sales.csv' into a DataFrame with index_col='Date' and parse_dates=True.
- Create a groupby object with sales.index.strftime('%a') as input and assign it to by_day.
- Aggregate the 'Units' column of by_day with the .sum() method. Save the result as units_sum.
- Print units_sum. This has been done for you, so hit 'Submit Answer' to see the result.
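A sketch of grouping by a transformed DatetimeIndex:
import pandas as pd
sales = pd.read_csv('sales.csv', index_col='Date', parse_dates=True)
# Group by abbreviated weekday name ('Mon', 'Tue', ...)
by_day = sales.groupby(sales.index.strftime('%a'))
units_sum = by_day['Units'].sum()
print(units_sum)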
- Import zscore from scipy.stats.
- Group gapminder_2010 by 'region' and transform the ['life','fertility'] columns by zscore.
- Construct a boolean Series of the bitwise or between standardized['life'] < -3 and standardized['fertility'] > 3.
- Filter gapminder_2010 using .loc[] and the outliers Boolean Series. Save the result as gm_outliers.
- Print gm_outliers. This has been done for you, so hit 'Submit Answer' to see the results.
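A sketch of detecting outliers with a grouped transform, assuming gapminder_2010 is loaded:
from scipy.stats import zscore
# Standardize 'life' and 'fertility' within each region
standardized = gapminder_2010.groupby('region')[['life', 'fertility']].transform(zscore)
# Flag extreme values in either column
outliers = (standardized['life'] < -3) | (standardized['fertility'] > 3)
gm_outliers = gapminder_2010.loc[outliers]
print(gm_outliers)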
- Group titanic by 'sex' and 'pclass'. Save the result as by_sex_class.
- Write a function called impute_median() that fills missing values with the median of a series. This has been done for you.
- Call .transform() with impute_median on the 'age' column of by_sex_class.
- Print the output of titanic.tail(10). This has been done for you - hit 'Submit Answer' to see how the missing values have now been imputed.
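A sketch of the group-wise imputation, assuming titanic is loaded:
# Fill missing values with the group median
def impute_median(series):
    return series.fillna(series.median())
by_sex_class = titanic.groupby(['sex', 'pclass'])
titanic['age'] = by_sex_class['age'].transform(impute_median)
print(titanic.tail(10))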
- Group gapminder_2010 by 'region'. Save the result as regional.
- Apply the disparity function on regional, and save the result as reg_disp.
- Use .loc[] to select ['United States','United Kingdom','China'] from reg_disp and print the results.
- Group sales by 'Company'. Save the result as by_company.
- Compute and print the sum of the 'Units' column of by_company.
- Call .filter() on by_company with lambda g:g['Units'].sum() > 35 as input and print the result.
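A sketch of .filter(), assuming the sales DataFrame from the earlier exercise:
by_company = sales.groupby('Company')
print(by_company['Units'].sum())
# Keep only the groups whose total units exceed 35
by_com_filt = by_company.filter(lambda g: g['Units'].sum() > 35)
print(by_com_filt)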
- Create a Boolean Series of titanic['age'] < 10 and call .map with {True:'under 10', False:'over 10'}.
- Group titanic by the under10 Series and then compute and print the mean of the 'survived' column.
- Group titanic by the under10 Series as well as the 'pclass' column and then compute and print the mean of the 'survived' column.
- Extract the 'NOC' column from the DataFrame medals and assign the result to country_names. Notice that this Series has repeated entries for every medal (of any type) a country has won in any Edition of the Olympics.
- Create the Series medal_counts by applying .value_counts() to the Series country_names.
- Construct the pivot table counted from the DataFrame medals, aggregating by count. Use 'NOC' as the index, 'Athlete' for the values, and 'Medal' for the columns.
- Modify the DataFrame counted by adding a column counted['totals']. The new column 'totals' should contain the result of taking the sum along the columns (i.e., use .sum(axis='columns')).
- Overwrite the DataFrame counted by sorting it with the .sort_values() method. Specify the keyword argument ascending=False.
- Print the first 15 rows of counted using .head(15). This has been done for you, so hit 'Submit Answer' to see the result.
- Group medals by 'NOC'.
- Select the 'Sport' column from country_grouped and apply .nunique().
- Sort Nsports in descending order with .sort_values() and ascending=False.
- Print Nsports. This has been done for you, so hit 'Submit Answer' to see the result.
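A sketch of the distinct-sport count, assuming the course's medals DataFrame is loaded:
country_grouped = medals.groupby('NOC')
# Number of distinct sports per country, sorted descending
Nsports = country_grouped['Sport'].nunique().sort_values(ascending=False)
print(Nsports.head(15))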
- Create a Boolean Series called during_cold_war by extracting all rows from medals for which the 'Edition' is >= 1952 and <= 1988.
- Create a Boolean Series called is_usa_urs by extracting rows from medals for which 'NOC' is either 'USA' or 'URS'.
- Filter the medals DataFrame using during_cold_war and is_usa_urs to create a new DataFrame called cold_war_medals.
- Group cold_war_medals by 'NOC'.
- Create a Series Nsports from country_grouped using indexing & chained methods: extract the column 'Sport'; use .nunique() to get the number of unique elements in each group; apply .sort_values(ascending=False) to rearrange the Series.
- Print Nsports. This has been done for you, so hit 'Submit Answer' to see the result!
- Construct medals_won_by_country using medals.pivot_table(). The index should be the years ('Edition') & the columns should be country ('NOC'). The values should be 'Athlete' (which captures every medal regardless of kind) & the aggregation method should be 'count' (which captures the total number of medals won).
- Create cold_war_usa_usr_medals by slicing the pivot table medals_won_by_country. Your slice should contain the editions from years 1952:1988 and only the columns 'USA' & 'URS' from the pivot table.
- Create most_medals by applying the .idxmax() method to cold_war_usa_usr_medals. Be sure to use axis='columns'.
- Apply .value_counts() to most_medals and print the result. The result reported gives the number of times each of the USA or the USSR won more Olympic medals in total than the other between 1952 and 1988.
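A sketch of the whole pipeline, assuming medals is loaded:
# Medal counts per year per country
medals_won_by_country = medals.pivot_table(index='Edition', columns='NOC',
                                           values='Athlete', aggfunc='count')
# Slice the Cold War years and the two countries
cold_war_usa_usr_medals = medals_won_by_country.loc[1952:1988, ['USA', 'URS']]
# Which country won more medals in each edition?
most_medals = cold_war_usa_usr_medals.idxmax(axis='columns')
print(most_medals.value_counts())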
- Create a DataFrame usa with data only for the USA.
- Group usa such that ['Edition', 'Medal'] is the index. Aggregate the count over 'Athlete'.
- Use .unstack() with level='Medal' to reshape the DataFrame usa_medals_by_year.
- Plot usa_medals_by_year. This has been done for you, so hit 'Submit Answer' to see the plot!
- Redefine the 'Medal' column of the DataFrame medals as an ordered categorical. To do this, use pd.Categorical() with three keyword arguments: values=medals.Medal, categories=['Bronze', 'Silver', 'Gold'], and ordered=True.
- Print medals.info().
- Plot usa_medals_by_year as an area plot. This has been done for you, so hit 'Submit Answer' to see how the plot has changed!
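A sketch of the categorical conversion, assuming medals is loaded:
import pandas as pd
# Ordered categorical: Bronze < Silver < Gold
medals['Medal'] = pd.Categorical(values=medals['Medal'],
                                 categories=['Bronze', 'Silver', 'Gold'],
                                 ordered=True)
print(medals.info())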