hongyesuifeng

class Statistical Thinking in Python (Part 1)

Some basic techniques for exploratory data analysis (EDA)

Import matplotlib.pyplot and seaborn as their usual aliases (plt and sns).
Use seaborn to set the plotting defaults.
Plot a histogram of the Iris versicolor petal lengths using plt.hist() and the provided NumPy array versicolor_petal_length.
Show the histogram using plt.show().

# Import plotting modules
import matplotlib.pyplot as plt
import seaborn as sns

# Set default Seaborn style
sns.set()
# Plot histogram of versicolor petal lengths

plt.hist(versicolor_petal_length)

# Show histogram

plt.show()

Label the axes. Don't forget that you should always include units in your axis labels. Your y-axis label is just 'count'. Your x-axis label is 'petal length (cm)'. The units are essential!
Display the plot constructed in the above steps using plt.show().

# Plot histogram of versicolor petal lengths
_ = plt.hist(versicolor_petal_length)

# Label axes
plt.xlabel("petal length (cm)")
plt.ylabel("count")

# Show histogram
plt.show()

A common rule of thumb: the number of histogram bins is the square root of the number of data points

Import numpy as np. This gives access to the square root function, np.sqrt().
Determine how many data points you have using len().
Compute the number of bins using the square root rule.
Convert the number of bins to an integer using the built in int() function.
Generate the histogram and make sure to use the bins keyword argument.
Hit 'Submit Answer' to plot the figure and see the fruit of your labors!

# Import numpy
import numpy as np

# Compute number of data points: n_data
n_data = len(versicolor_petal_length)

# Number of bins is the square root of number of data points: n_bins
n_bins = np.sqrt(n_data)

# Convert number of bins to integer: n_bins
n_bins = int(n_bins)

# Plot the histogram
plt.hist(versicolor_petal_length, bins=n_bins)

# Label axes
_ = plt.xlabel('petal length (cm)')
_ = plt.ylabel('count')

# Show histogram
plt.show()
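
The square-root rule above can be checked on its own, without plotting. The following is a minimal sketch; the helper name sqrt_rule_bins and the array example_data are hypothetical stand-ins, not part of the exercise.

```python
import numpy as np


def sqrt_rule_bins(data):
    """Return the number of histogram bins given by the square-root rule."""
    # int() truncates toward zero, as in the exercise above
    return int(np.sqrt(len(data)))


# Hypothetical stand-in for versicolor_petal_length: 50 measurements
example_data = np.linspace(3.0, 5.1, 50)

n_bins = sqrt_rule_bins(example_data)
print(n_bins)
```

With 50 data points the rule gives 7 bins, because the square root of 50 is about 7.07 and int() truncates.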

_ = sns.swarmplot(x='state', y='dem_share', data=df_swing)
_ = plt.xlabel('state')
_ = plt.ylabel('percent of vote for Obama')
plt.show()

In the IPython Shell, inspect the DataFrame df using df.head(). This will let you identify which column names you need to pass as the x and y keyword arguments in your call to sns.swarmplot().
Use sns.swarmplot() to make a bee swarm plot from the DataFrame containing the Fisher iris data set, df. The x-axis should contain each of the three species, and the y-axis should contain the petal lengths.
Label the axes.
Show your plot.

# Create bee swarm plot with Seaborn's default settings
sns.swarmplot(x='species', y='petal length (cm)', data=df)

# Label the axes
plt.xlabel('species')
plt.ylabel('petal length (cm)')

# Show the plot
plt.show()

Define a function with the signature ecdf(data). Within the function definition,

Compute the number of data points, n, using the len() function.
The x-values are the sorted data. Use the np.sort() function to perform the sorting.
The y data of the ECDF go from 1/n to 1 in equally spaced increments. You can construct this using np.arange(). Remember, however, that the end value in np.arange() is not inclusive. Therefore, np.arange() will need to go from 1 to n+1. Be sure to divide this by n.
The function returns the values x and y.

def ecdf(data):
    """Compute ECDF for a one-dimensional array of measurements."""

    # Number of data points: n
    n = len(data)

    # x-data for the ECDF: x
    x = np.sort(data)

    # y-data for the ECDF: y
    y = np.arange(1, n+1) / n

    return x, y
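
To see what ecdf() returns, it may help to run it on a tiny array. The sketch below repeats the function so that it is self-contained; the four-element input is a made-up example, not course data.

```python
import numpy as np


def ecdf(data):
    """Compute ECDF for a one-dimensional array of measurements."""
    n = len(data)
    x = np.sort(data)                # sorted measurements
    y = np.arange(1, n + 1) / n      # 1/n, 2/n, ..., 1
    return x, y


# Made-up input with four unsorted values
x, y = ecdf(np.array([3.0, 1.0, 2.0, 4.0]))
print(x)
print(y)
```

Here x comes back sorted as 1, 2, 3, 4 and y is 0.25, 0.5, 0.75, 1.0, so each y value is the fraction of data points less than or equal to the matching x value.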

Use ecdf() to compute the ECDF of versicolor_petal_length. Unpack the output into x_vers and y_vers.
Plot the ECDF as dots. Remember to include marker = '.' and linestyle = 'none' in addition to x_vers and y_vers as arguments inside plt.plot().
Set the margins of the plot with plt.margins() so that no data points are cut off. Use a 2% margin.
Label the axes. You can label the y-axis 'ECDF'.
Show your plot.

# Compute ECDF for versicolor data: x_vers, y_vers
x_vers, y_vers = ecdf(versicolor_petal_length)

# Generate plot
plt.plot(x_vers, y_vers, marker='.', linestyle='none')

# Make the margins nice
plt.margins(0.02)

# Label the axes
plt.ylabel('ECDF')
plt.xlabel('petal length (cm)')

# Display the plot
plt.show()

Compute ECDFs for each of the three species using your ecdf() function. The variables setosa_petal_length, versicolor_petal_length, and virginica_petal_length are all in your namespace. Unpack the ECDFs into x_set, y_set, x_vers, y_vers and x_virg, y_virg, respectively.
Plot all three ECDFs on the same plot as dots. To do this, you will need three plt.plot() commands. Assign the result of each to _.
Specify 2% margins.
A legend and axis labels have been added for you, so hit 'Submit Answer' to see all the ECDFs!

# Compute ECDFs
x_set, y_set = ecdf(setosa_petal_length)
x_vers, y_vers = ecdf(versicolor_petal_length)
x_virg, y_virg = ecdf(virginica_petal_length)

# Plot all ECDFs on the same plot
_ = plt.plot(x_set, y_set, marker='.', linestyle='none', )
_ = plt.plot(x_vers, y_vers, marker='.', linestyle='none', )
_ = plt.plot(x_virg, y_virg, marker='.', linestyle='none', )

# Make nice margins
plt.margins(0.02)

# Annotate the plot
plt.legend(('setosa', 'versicolor', 'virginica'), loc='lower right')
_ = plt.xlabel('petal length (cm)')
_ = plt.ylabel('ECDF')

# Display the plot
plt.show()

Computing percentiles

Create percentiles, a NumPy array of percentiles you want to compute. These are the 2.5th, 25th, 50th, 75th, and 97.5th. You can do so by creating a list containing these ints/floats and converting the list to a NumPy array using np.array(). For example, np.array([30, 50]) would create an array consisting of the 30th and 50th percentiles.
Use np.percentile() to compute the percentiles of the petal lengths from the Iris versicolor samples. The variable versicolor_petal_length is in your namespace.
Print the percentiles.

# Specify array of percentiles: percentiles
percentiles = np.array([2.5, 25, 50, 75, 97.5])

# Compute percentiles: ptiles_vers
ptiles_vers = np.percentile(versicolor_petal_length, percentiles)

# Print the result
print(ptiles_vers)
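
As a quick sanity check of np.percentile(), the sketch below uses a made-up array of the integers 1 to 101 (an assumption, not course data). With NumPy's default linear interpolation, the q-th percentile of this array works out to 1 + q.

```python
import numpy as np

# Made-up data: the integers 1, 2, ..., 101
data = np.arange(1, 102)

# Same percentiles as in the exercise above
percentiles = np.array([2.5, 25, 50, 75, 97.5])

# Default method is linear interpolation between order statistics
ptiles = np.percentile(data, percentiles)
print(ptiles)
```

The 50th percentile is the median (51 here), and the 2.5th and 97.5th percentiles bracket the central 95% of the data.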

Plotting the ECDF and marking the percentile points:

Plot the percentiles as red diamonds on the ECDF. Pass the x and y co-ordinates - ptiles_vers and percentiles/100 - as positional arguments and specify the marker='D', color='red' and linestyle='none' keyword arguments. The argument for the y-axis - percentiles/100 has been specified for you.
Display the plot.

# Plot the ECDF
_ = plt.plot(x_vers, y_vers, '.')
plt.margins(0.02)
_ = plt.xlabel('petal length (cm)')
_ = plt.ylabel('ECDF')

# Overlay percentiles as red diamonds.
_ = plt.plot(ptiles_vers, percentiles/100, marker='D', color='red',
linestyle='none')

# Show the plot
plt.show()

Computing the variance

Create an array called differences that is the difference between the petal lengths (versicolor_petal_length) and the mean petal length. The variable versicolor_petal_length is already in your namespace as a NumPy array so you can take advantage of NumPy's vectorized operations.
Square each element in this array. For example, x**2 squares each element in the array x. Store the result as diff_sq.
Compute the mean of the elements in diff_sq using np.mean(). Store the result as variance_explicit.
Compute the variance of versicolor_petal_length using np.var(). Store the result as variance_np.
Print both variance_explicit and variance_np in one print call to make sure they are consistent.

# Array of differences to mean: differences
differences = versicolor_petal_length - np.mean(versicolor_petal_length)

# Square the differences: diff_sq
diff_sq = differences ** 2

# Compute the mean square difference: variance_explicit
variance_explicit = np.mean(diff_sq)

# Compute the variance using NumPy: variance_np
variance_np = np.var(versicolor_petal_length)

# Print the results
print(variance_explicit,variance_np)
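
The two results agree because np.var() divides by n by default (ddof=0), which is exactly the mean of the squared differences. The sketch below illustrates this on a small made-up array and also shows the sample variance (ddof=1) for contrast; the array is an assumption, not course data.

```python
import numpy as np

# Made-up data with mean 5 and squared differences summing to 32
data = np.array([2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0])

# Mean of squared differences from the mean (divides by n)
variance_explicit = np.mean((data - np.mean(data)) ** 2)

# NumPy's default variance also divides by n (ddof=0)
variance_np = np.var(data)

# Sample variance divides by n - 1 instead
variance_sample = np.var(data, ddof=1)

print(variance_explicit, variance_np, variance_sample)
```

The first two values are both 4.0, while the ddof=1 version is 32/7, slightly larger.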

Computing the Pearson correlation coefficient

Define a function with signature pearson_r(x, y).
- Use np.corrcoef() to compute the correlation matrix of x and y (pass them to np.corrcoef() in that order).
- The function returns entry [0,1] of the correlation matrix.
Compute the Pearson correlation between the data in the arrays versicolor_petal_length and versicolor_petal_width. Assign the result to r.
Print the result.

def pearson_r(x, y):
    """Compute Pearson correlation coefficient between two arrays."""
    # Compute correlation matrix: corr_mat
    corr_mat = np.corrcoef(x, y)

    # Return entry [0,1]
    return corr_mat[0,1]

# Compute Pearson correlation coefficient for I. versicolor: r
r = pearson_r(versicolor_petal_length,versicolor_petal_width)

# Print the result
print(r)
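
For reference, the Pearson coefficient is the covariance divided by the product of the standard deviations. The sketch below computes it directly and compares it with np.corrcoef(); the function name pearson_r_explicit and the two small arrays are hypothetical, chosen only for illustration.

```python
import numpy as np


def pearson_r_explicit(x, y):
    """Pearson r from its definition, without np.corrcoef()."""
    dx = x - np.mean(x)
    dy = y - np.mean(y)
    # Covariance over the product of standard deviations; the 1/n factors cancel
    return np.sum(dx * dy) / np.sqrt(np.sum(dx ** 2) * np.sum(dy ** 2))


# Made-up, nearly linear data
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 7.8, 10.1])

r_explicit = pearson_r_explicit(x, y)
r_np = np.corrcoef(x, y)[0, 1]
print(r_explicit, r_np)
```

Both values should match to floating-point precision and lie close to 1, since y is almost exactly proportional to x.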

Draw samples out of the Binomial distribution using np.random.binomial(). You should use parameters n = 100 and p = 0.05, and set the size keyword argument to 10000.
Compute the CDF using your previously written ecdf() function.
Plot the CDF with axis labels. The x-axis here is the number of defaults out of 100 loans, while the y-axis is the CDF.
Show the plot.

# Take 10,000 samples out of the binomial distribution: n_defaults
n_defaults = np.random.binomial(100, 0.05, size = 10000)

# Compute CDF: x, y
x, y = ecdf(n_defaults)

# Plot the CDF with axis labels
plt.plot(x, y, marker = '.', linestyle = 'none' )
plt.xlabel('the number of defaults out of 100 loans')
plt.ylabel('CDF')

# Show the plot
plt.show()

Sampling from and plotting the Normal distribution

Draw 100,000 samples from a Normal distribution that has a mean of 20 and a standard deviation of 1. Do the same for Normal distributions with standard deviations of 3 and 10, each still with a mean of 20. Assign the results to samples_std1, samples_std3 and samples_std10, respectively.
Plot a histogram of each of the samples; for each, use 100 bins, along with the keyword arguments density=True (the replacement for normed=True, which was removed in Matplotlib 3.1) and histtype='step'. The latter keyword argument makes the plot look much like the smooth theoretical PDF. You will need to make 3 plt.hist() calls.
Hit 'Submit Answer' to make a legend, showing which standard deviations you used, and show your plot! There is no need to label the axes because we have not defined what is being described by the Normal distribution; we are just looking at shapes of PDFs.

# Draw 100000 samples from Normal distribution with stds of interest: samples_std1, samples_std3, samples_std10
samples_std1 = np.random.normal(20,1,100000)
samples_std3 = np.random.normal(20,3,100000)
samples_std10 = np.random.normal(20,10,100000)

# Make histograms
# density=True replaces normed=True, which was removed in Matplotlib 3.1
plt.hist(samples_std1, density=True, histtype='step', bins=100)
plt.hist(samples_std3, density=True, histtype='step', bins=100)
plt.hist(samples_std10, density=True, histtype='step', bins=100)

# Make a legend, set limits and show plot
_ = plt.legend(('std = 1', 'std = 3', 'std = 10'))
plt.ylim(-0.01, 0.42)
plt.show()

Comparing the data with a fitted Normal distribution

Compute mean and standard deviation of Belmont winners' times with the two outliers removed. The NumPy array belmont_no_outliers has these data.
Take 10,000 samples out of a normal distribution with this mean and standard deviation using np.random.normal().
Compute the CDF of the theoretical samples and the ECDF of the Belmont winners' data, assigning the results to x_theor, y_theor and x, y, respectively.
Hit submit to plot the CDF of your samples with the ECDF, label your axes and show the plot.

# Compute mean and standard deviation: mu, sigma
mu = np.mean(belmont_no_outliers)
sigma = np.std(belmont_no_outliers)

# Sample out of a normal distribution with this mu and sigma: samples
samples = np.random.normal(mu,sigma,10000)

# Get the CDF of the samples and of the data
x_theor,y_theor = ecdf(samples)
x, y = ecdf(belmont_no_outliers)

# Plot the CDFs and show the plot
_ = plt.plot(x_theor, y_theor)
_ = plt.plot(x, y, marker='.', linestyle='none')
plt.margins(0.02)
_ = plt.xlabel('Belmont winning time (sec.)')
_ = plt.ylabel('CDF')
plt.show()

The Exponential distribution

Define a function with call signature successive_poisson(tau1, tau2, size=1) that samples the waiting time for a no-hitter and a hit of the cycle.

Draw waiting times tau1 (size number of samples) for the no-hitter out of an exponential distribution and assign to t1.
Draw waiting times tau2 (size number of samples) for hitting the cycle out of an exponential distribution and assign to t2.
The function returns the sum of the waiting times for the two events.

def successive_poisson(tau1, tau2, size=1):
    # Draw samples out of first exponential distribution: t1
    t1 = np.random.exponential(tau1, size)

    # Draw samples out of second exponential distribution: t2
    t2 = np.random.exponential(tau2, size)

    return t1 + t2
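
Because the mean of an exponential distribution equals its scale parameter, the mean total waiting time should be close to tau1 + tau2. The sketch below checks this numerically. It is a variant of the function above with one addition that is an assumption, not part of the exercise: an optional rng argument (a NumPy Generator) so that the result is reproducible.

```python
import numpy as np


def successive_poisson(tau1, tau2, size=1, rng=None):
    """Sum of two exponential waiting times with means tau1 and tau2."""
    # rng is a hypothetical extra parameter used only for reproducibility
    rng = np.random.default_rng() if rng is None else rng
    t1 = rng.exponential(tau1, size)
    t2 = rng.exponential(tau2, size)
    return t1 + t2


rng = np.random.default_rng(0)
waiting_times = successive_poisson(764, 715, size=100000, rng=rng)

# Expected mean is tau1 + tau2 = 1479
sample_mean = np.mean(waiting_times)
print(sample_mean)
```

The sample mean will not be exactly 1479, but with 100,000 draws it should land within a few units of it.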

Use your successive_poisson() function to draw 100,000 samples out of the distribution of waiting times for observing a no-hitter and a hitting of the cycle.
Plot the PDF of the waiting times using the step histogram technique of a previous exercise. Don't forget the necessary keyword arguments. You should use bins=100, density=True (the replacement for normed=True, which was removed in Matplotlib 3.1), and histtype='step'.
Label the axes.
Show your plot.

# Draw samples of waiting times: waiting_times
waiting_times = successive_poisson(764, 715, size=100000)

# Make the histogram
# density=True replaces normed=True, which was removed in Matplotlib 3.1
plt.hist(waiting_times, bins=100, density=True, histtype='step')

# Label axes
plt.xlabel('total waiting time (games)')
plt.ylabel('PDF')

# Show the plot
plt.show()

Polynomial fitting (linear regression)

Compute the slope and intercept of the regression line using np.polyfit(). Remember, fertility is on the y-axis and illiteracy on the x-axis.
Print out the slope and intercept from the linear regression.
To plot the best fit line, create an array x that consists of 0 and 100 using np.array(). Then, compute the theoretical values of y based on your regression parameters. I.e., y = a * x + b.
Plot the data and the regression line on the same plot. Be sure to label your axes.
Hit 'Submit Answer' to display your plot.

# Plot the illiteracy rate versus fertility
_ = plt.plot(illiteracy, fertility, marker='.', linestyle='none')
plt.margins(0.02)
_ = plt.xlabel('percent illiterate')
_ = plt.ylabel('fertility')

# Perform a linear regression using np.polyfit(): a, b
a, b = np.polyfit(illiteracy, fertility,1)

# Print the results to the screen
print('slope =', a, 'children per woman / percent illiterate')
print('intercept =', b, 'children per woman')

# Make theoretical line to plot
x = np.array([0,100])
y = a * x + b

# Add regression line to your plot
_ = plt.plot(x, y)

# Draw the plot
plt.show()

Specify the values of the slope for which to compute the RSS. Use np.linspace() to get 200 points in the range between 0 and 0.1. For example, to get 100 points in the range between 0 and 0.5, you could use np.linspace() like so: np.linspace(0, 0.5, 100).
Initialize an array, rss, to contain the RSS using np.empty_like() and the array you created above. The empty_like() function returns a new array with the same shape and type as a given array (in this case, a_vals).
Write a for loop to compute the sum of RSS of the slope. Hint: the RSS is given by np.sum((y_data - a * x_data - b)**2). The variable b you computed in the last exercise is already in your namespace. Here, fertility is the y_data and illiteracy the x_data.
Plot the RSS (rss) versus slope (a_vals).
Hit 'Submit Answer' to see the plot!

# Specify slopes to consider: a_vals
a_vals = np.linspace(0, 0.1, 200)

# Initialize sum of square of residuals: rss
rss = np.empty_like(a_vals)

# Compute sum of square of residuals for each value of a_vals
for i, a in enumerate(a_vals):
    rss[i] = np.sum((fertility - a*illiteracy - b)**2)

# Plot the RSS
plt.plot(a_vals, rss, '-')
plt.xlabel('slope (children per woman / percent illiterate)')
plt.ylabel('sum of square of residuals')

plt.show()
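
The point of the RSS scan is that, with the intercept held at its fitted value, the slope that minimizes the RSS is the one np.polyfit() returns. The sketch below checks this on synthetic data; x_data and y_data are made up (a line with slope 0.05 plus noise) and stand in for illiteracy and fertility.

```python
import numpy as np

# Synthetic stand-ins for illiteracy (x_data) and fertility (y_data)
rng = np.random.default_rng(1)
x_data = np.linspace(0, 100, 50)
y_data = 0.05 * x_data + 1.9 + rng.normal(0, 0.3, size=x_data.size)

# Least-squares slope and intercept
a_fit, b_fit = np.polyfit(x_data, y_data, 1)

# Scan the RSS over a grid of slopes, keeping the intercept fixed
a_vals = np.linspace(0, 0.1, 200)
rss = np.empty_like(a_vals)
for i, a in enumerate(a_vals):
    rss[i] = np.sum((y_data - a * x_data - b_fit) ** 2)

# Grid slope with the smallest RSS
a_best = a_vals[np.argmin(rss)]
print(a_fit, a_best)
```

The two slopes agree to within the grid spacing, which is the resolution limit of a scan like this.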

Compute the parameters for the slope and intercept using np.polyfit(). The Anscombe data are stored in the arrays x and y.
Print the slope a and intercept b.
Generate theoretical x and y data from the linear regression. Your x array, which you can create with np.array(), should consist of 3 and 15. To generate the y data, multiply the slope by x_theor and add the intercept.
Plot the Anscombe data as a scatter plot and then plot the theoretical line. Remember to include the marker='.' and linestyle='none' keyword arguments in addition to x and y when you plot the Anscombe data as a scatter plot. You do not need these arguments when plotting the theoretical line.
Hit 'Submit Answer' to see the plot!

# Perform linear regression: a, b
a, b = np.polyfit(x,y,1)

# Print the slope and intercept
print(a, b)

# Generate theoretical x and y data: x_theor, y_theor
x_theor = np.array([3, 15])
y_theor = x_theor * a + b

# Plot the Anscombe data and theoretical line
_ = plt.plot(x_theor, y_theor)
_ = plt.plot(x,y,marker='.',linestyle='none')

# Label the axes
plt.xlabel('x')
plt.ylabel('y')

# Show the plot
plt.show()

Resampling (bootstrapping) when the sample is small

Write a for loop to acquire 50 bootstrap samples of the rainfall data and plot their ECDF.
- Use np.random.choice() to generate a bootstrap sample from the NumPy array rainfall. Be sure that the size of the resampled array is len(rainfall).
- Use the function ecdf() that you wrote in the prequel to this course to generate the x and y values for the ECDF of the bootstrap sample bs_sample.
- Plot the ECDF values. Specify color='gray' (to make gray dots) and alpha=0.1 (to make them semi-transparent, since we are overlaying so many) in addition to the marker='.' and linestyle='none' keyword arguments.
Use ecdf() to generate x and y values for the ECDF of the original rainfall data available in the array rainfall.
Plot the ECDF values of the original data.
Hit 'Submit Answer' to visualize the samples!

for _ in range(50):
    # Generate bootstrap sample: bs_sample
    bs_sample = np.random.choice(rainfall, size=len(rainfall))

    # Compute and plot ECDF from bootstrap sample
    x, y = ecdf(bs_sample)
    _ = plt.plot(x, y, marker='.', linestyle='none',
                 color='gray', alpha=0.1)

# Compute and plot ECDF from original data
x, y = ecdf(rainfall)
_ = plt.plot(x, y, marker='.')

# Make margins and label axes
plt.margins(0.02)
_ = plt.xlabel('yearly rainfall (mm)')
_ = plt.ylabel('ECDF')

# Show the plot
plt.show()

Draw 10000 bootstrap replicates of the mean annual rainfall using your draw_bs_reps() function and the rainfall array. Hint: Pass in np.mean for func to compute the mean.
- As a reminder, draw_bs_reps() accepts 3 arguments: data, func, and size.
Compute and print the standard error of the mean of rainfall.
- The formula to compute this is np.std(data) / np.sqrt(len(data)).
Compute and print the standard deviation of your bootstrap replicates bs_replicates.
Make a histogram of the replicates using the density=True keyword argument (the replacement for normed=True, which was removed in Matplotlib 3.1) and 50 bins.
Hit 'Submit Answer' to see the plot!

# Take 10,000 bootstrap replicates of the mean: bs_replicates
bs_replicates = draw_bs_reps(rainfall, np.mean, 10000)

# Compute and print SEM
sem = np.std(rainfall) / np.sqrt(len(rainfall))
print(sem)

# Compute and print standard deviation of bootstrap replicates
bs_std = np.std(bs_replicates)
print(bs_std)

# Make a histogram of the results
# density=True replaces normed=True, which was removed in Matplotlib 3.1
_ = plt.hist(bs_replicates, bins=50, density=True)
_ = plt.xlabel('mean annual rainfall (mm)')
_ = plt.ylabel('PDF')

# Show the plot
plt.show()
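
These notes call draw_bs_reps() without defining it. The following is a minimal sketch of what such a function might look like, assuming the data, func, size signature described above. The helper bootstrap_replicate_1d, the optional rng argument, and the synthetic rainfall_example array are all assumptions made for illustration.

```python
import numpy as np


def bootstrap_replicate_1d(data, func, rng):
    """Apply func to one bootstrap sample (resampling with replacement)."""
    return func(rng.choice(data, size=len(data)))


def draw_bs_reps(data, func, size=1, rng=None):
    """Draw `size` bootstrap replicates of the statistic func."""
    # rng is a hypothetical extra parameter used only for reproducibility
    rng = np.random.default_rng() if rng is None else rng
    bs_replicates = np.empty(size)
    for i in range(size):
        bs_replicates[i] = bootstrap_replicate_1d(data, func, rng)
    return bs_replicates


# Synthetic stand-in for the rainfall array
rng = np.random.default_rng(2)
rainfall_example = rng.normal(800, 100, size=130)

bs_replicates = draw_bs_reps(rainfall_example, np.mean, size=10000, rng=rng)

# Standard error of the mean versus spread of the bootstrap replicates
sem = np.std(rainfall_example) / np.sqrt(len(rainfall_example))
bs_std = np.std(bs_replicates)
print(sem, bs_std)
```

The two printed numbers should be close, which is the observation the exercise is built around: the standard deviation of bootstrap replicates of the mean approximates the standard error of the mean.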

Computing confidence intervals

Generate 10000 bootstrap replicates of τ from the nohitter_times data using your draw_bs_reps() function. Recall that the optimal τ is calculated as the mean of the data.
Compute the 95% confidence interval using np.percentile() and passing in two arguments: The array bs_replicates, and the list of percentiles - in this case 2.5 and 97.5.
Print the confidence interval.
Plot a histogram of your bootstrap replicates. This has been done for you, so hit 'Submit Answer' to see the plot!

# Draw bootstrap replicates of the mean no-hitter time (equal to tau): bs_replicates
bs_replicates = draw_bs_reps(nohitter_times,np.mean,10000)

# Compute the 95% confidence interval: conf_int
conf_int = np.percentile(bs_replicates,[2.5,97.5])

# Print the confidence interval
print('95% confidence interval =', conf_int, 'games')

# Plot the histogram of the replicates
# density=True replaces normed=True, which was removed in Matplotlib 3.1
_ = plt.hist(bs_replicates, bins=50, density=True)
_ = plt.xlabel(r'$\tau$ (games)')
_ = plt.ylabel('PDF')

# Show the plot
plt.show()

Pairs bootstrap: resampling the data and fitting a linear regression

Define a function with call signature draw_bs_pairs_linreg(x, y, size=1) to perform pairs bootstrap estimates on linear regression parameters.

Use np.arange() to set up an array of indices going from 0 to len(x). These are what you will resample and use to pick values out of the x and y arrays.
Use np.empty() to initialize the slope and intercept replicate arrays to be of size size.
Write a for loop to:
- Resample the indices inds. Use np.random.choice() to do this.
- Make new x and y arrays bs_x and bs_y using the resampled indices bs_inds. To do this, slice x and y with bs_inds.
- Use np.polyfit() on the new x and y arrays and store the computed slope and intercept.
Return the pair bootstrap replicates of the slope and intercept.

def draw_bs_pairs_linreg(x, y, size=1):
    """Perform pairs bootstrap for linear regression."""

    # Set up array of indices to sample from: inds
    inds = np.arange(len(x))

    # Initialize replicates: bs_slope_reps, bs_intercept_reps
    bs_slope_reps = np.empty(size)
    bs_intercept_reps = np.empty(size)

    # Generate replicates
    for i in range(size):
        bs_inds = np.random.choice(inds, size=len(inds))
        bs_x, bs_y = x[bs_inds], y[bs_inds]
        bs_slope_reps[i], bs_intercept_reps[i] = np.polyfit(bs_x, bs_y, 1)

    return bs_slope_reps, bs_intercept_reps
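
A typical use of the pairs bootstrap is a confidence interval for the slope. The sketch below shows one way to do that on synthetic data. The optional rng argument and the arrays x_example and y_example are assumptions made so that the example is reproducible and self-contained; they are not part of the exercise.

```python
import numpy as np


def draw_bs_pairs_linreg(x, y, size=1, rng=None):
    """Perform pairs bootstrap for linear regression."""
    # rng is a hypothetical extra parameter used only for reproducibility
    rng = np.random.default_rng() if rng is None else rng
    inds = np.arange(len(x))
    bs_slope_reps = np.empty(size)
    bs_intercept_reps = np.empty(size)
    for i in range(size):
        # Resample (x, y) pairs together by resampling their indices
        bs_inds = rng.choice(inds, size=len(inds))
        bs_slope_reps[i], bs_intercept_reps[i] = np.polyfit(
            x[bs_inds], y[bs_inds], 1)
    return bs_slope_reps, bs_intercept_reps


# Synthetic stand-ins for illiteracy and fertility
rng = np.random.default_rng(5)
x_example = np.linspace(0, 100, 60)
y_example = 0.05 * x_example + 2.0 + rng.normal(0, 0.3, size=x_example.size)

bs_slope_reps, bs_intercept_reps = draw_bs_pairs_linreg(
    x_example, y_example, size=1000, rng=rng)

# 95% confidence interval of the slope, and the slope fitted to all the data
slope_ci = np.percentile(bs_slope_reps, [2.5, 97.5])
a_fit = np.polyfit(x_example, y_example, 1)[0]
print(a_fit, slope_ci)
```

Resampling indices rather than values keeps each x paired with its own y, which is what distinguishes a pairs bootstrap from resampling the two arrays independently.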

Generate an array of x-values consisting of 0 and 100 for the plot of the regression lines. Use the np.array() function for this.
Write a for loop in which you plot a regression line with a slope and intercept given by the pairs bootstrap replicates. Do this for 100 lines.
- When plotting the regression lines in each iteration of the for loop, recall the regression equation y = a*x + b. Here, a is bs_slope_reps[i] and b is bs_intercept_reps[i].
- Specify the keyword arguments linewidth=0.5, alpha=0.2, and color='red' in your call to plt.plot().
Make a scatter plot with illiteracy on the x-axis and fertility on the y-axis. Remember to specify the marker='.' and linestyle='none' keyword arguments.
Label the axes, set a 2% margin, and show the plot. This has been done for you, so hit 'Submit Answer' to visualize the bootstrap regressions!

# Generate array of x-values for bootstrap lines: x
x = np.array([0,100])

# Plot the bootstrap lines
for i in range(100):
    _ = plt.plot(x, bs_slope_reps[i]*x + bs_intercept_reps[i],
                 linewidth=0.5, alpha=0.2, color='red')

# Plot the data
_ = plt.plot(illiteracy, fertility, marker='.', linestyle='none')

# Label axes, set the margins, and show the plot
_ = plt.xlabel('illiteracy')
_ = plt.ylabel('fertility')
plt.margins(0.02)
plt.show()

Visualizing permutation sampling

Write a for loop to generate 50 permutation samples, compute their ECDFs, and plot them.
- Generate a permutation sample pair from rain_july and rain_november using your permutation_sample() function.
- Generate the x and y values for an ECDF for each of the two permutation samples using your ecdf() function.
- Plot the ECDF of the first permutation sample (x_1 and y_1) as dots. Do the same for the second permutation sample (x_2 and y_2).
Generate x and y values for ECDFs for the rain_july and rain_november data and plot the ECDFs using respectively the keyword arguments color='red' and color='blue'.
Label your axes, set a 2% margin, and show your plot. This has been done for you, so just hit 'Submit Answer' to view the plot!

for _ in range(50):
    # Generate permutation samples
    perm_sample_1, perm_sample_2 = permutation_sample(rain_july, rain_november)

    # Compute ECDFs
    x_1, y_1 = ecdf(perm_sample_1)
    x_2, y_2 = ecdf(perm_sample_2)

    # Plot ECDFs of permutation sample
    _ = plt.plot(x_1, y_1, marker='.', linestyle='none',
                 color='red', alpha=0.02)
    _ = plt.plot(x_2, y_2, marker='.', linestyle='none',
                 color='blue', alpha=0.02)

# Create and plot ECDFs from original data
x_1, y_1 = ecdf(rain_july)
x_2, y_2 = ecdf(rain_november)
_ = plt.plot(x_1, y_1, marker='.', linestyle='none', color='red')
_ = plt.plot(x_2, y_2, marker='.', linestyle='none', color='blue')

# Label axes, set margin, and show plot
plt.margins(0.02)
_ = plt.xlabel('monthly rainfall (mm)')
_ = plt.ylabel('ECDF')
plt.show()
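
The function permutation_sample() is used above but not defined in these notes. A minimal sketch of one possible implementation follows: pool the two data sets, shuffle the pool, and split it back into two arrays of the original lengths. The optional rng argument and the two small input arrays are assumptions for illustration.

```python
import numpy as np


def permutation_sample(data1, data2, rng=None):
    """Generate a permutation sample from two data sets."""
    # rng is a hypothetical extra parameter used only for reproducibility
    rng = np.random.default_rng() if rng is None else rng

    # Pool the data and shuffle the pooled array
    data = np.concatenate((data1, data2))
    permuted_data = rng.permutation(data)

    # Split back into two arrays with the original lengths
    perm_sample_1 = permuted_data[:len(data1)]
    perm_sample_2 = permuted_data[len(data1):]
    return perm_sample_1, perm_sample_2


# Made-up inputs of different lengths
a = np.array([1.0, 2.0, 3.0])
b = np.array([10.0, 20.0, 30.0, 40.0])
perm_a, perm_b = permutation_sample(a, b, rng=np.random.default_rng(3))
print(perm_a, perm_b)
```

Every value from the pooled data appears exactly once across the two outputs, so the permutation simulates the hypothesis that both samples come from the same distribution.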

Computing p-values:

Define a function with call signature diff_of_means(data_1, data_2) that returns the differences in means between two data sets, mean of data_1 minus mean of data_2.
Use this function to compute the empirical difference of means that was observed in the frogs.
Draw 10,000 permutation replicates of the difference of means.
Compute the p-value.
Print the p-value.

def diff_of_means(data_1, data_2):
    """Difference in means of two arrays."""

    # The difference of means of data_1, data_2: diff
    diff = np.mean(data_1) - np.mean(data_2)

    return diff

# Compute difference of mean impact force from experiment: empirical_diff_means
empirical_diff_means = diff_of_means(force_a, force_b)

# Draw 10,000 permutation replicates: perm_replicates
perm_replicates = draw_perm_reps(force_a, force_b,
diff_of_means, size=10000)

# Compute p-value: p
p = np.sum( perm_replicates >= empirical_diff_means) / len(perm_replicates)

# Print the result
print('p-value =', p)
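
The helper draw_perm_reps() is also not defined in these notes. The sketch below shows one plausible version, assuming the data_1, data_2, func, size signature used above, and runs a complete permutation test on synthetic data. The rng argument and the arrays group_a and group_b are assumptions standing in for force_a and force_b.

```python
import numpy as np


def diff_of_means(data_1, data_2):
    """Difference in means of two arrays."""
    return np.mean(data_1) - np.mean(data_2)


def draw_perm_reps(data_1, data_2, func, size=1, rng=None):
    """Generate `size` permutation replicates of func(sample_1, sample_2)."""
    # rng is a hypothetical extra parameter used only for reproducibility
    rng = np.random.default_rng() if rng is None else rng
    pooled = np.concatenate((data_1, data_2))
    perm_replicates = np.empty(size)
    for i in range(size):
        # Shuffle the pooled data and split at the original boundary
        permuted = rng.permutation(pooled)
        perm_replicates[i] = func(permuted[:len(data_1)],
                                  permuted[len(data_1):])
    return perm_replicates


# Synthetic stand-ins for force_a and force_b
rng = np.random.default_rng(4)
group_a = rng.normal(0.7, 0.2, size=20)
group_b = rng.normal(0.4, 0.2, size=20)

observed = diff_of_means(group_a, group_b)
perm_replicates = draw_perm_reps(group_a, group_b, diff_of_means,
                                 size=10000, rng=rng)

# Fraction of replicates at least as extreme as the observed difference
p = np.sum(perm_replicates >= observed) / len(perm_replicates)
print('p-value =', p)
```

The p-value is the fraction of shuffled data sets whose difference of means is at least as large as the observed one, so a small value suggests the two groups differ by more than chance labeling would produce.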

Translate the impact forces of Frog B such that its mean is 0.55 N.
Use your draw_bs_reps() function to take 10,000 bootstrap replicates of the mean of your translated forces.
Compute the p-value by finding the fraction of your bootstrap replicates that are less than the observed mean impact force of Frog B. Note that the variable of interest here is force_b.
Print your p-value.

# Make an array of translated impact forces: translated_force_b
translated_force_b = force_b - np.mean(force_b) + 0.55

# Take bootstrap replicates of Frog B's translated impact forces: bs_replicates
bs_replicates = draw_bs_reps(translated_force_b, np.mean, 10000)

# Compute fraction of replicates that are less than the observed Frog B force: p
p = np.sum(bs_replicates <= np.mean(force_b)) / 10000

# Print the p-value
print('p = ', p)

Construct Boolean arrays, dems and reps that contain the votes of the respective parties; e.g., dems has 153 True entries and 91 False entries.
Write a function, frac_yay_dems(dems, reps), that returns the fraction of Democrats that voted yay. The first input is an array of Booleans. Two inputs are required to use your draw_perm_reps() function, but the second is not used.
Use your draw_perm_reps() function to draw 10,000 permutation replicates of the fraction of Democrat yay votes.
Compute and print the p-value.

# Construct arrays of data: dems, reps
dems = np.array([True] * 153 + [False] * 91)
reps = np.array([True] * 136 + [False] * 35)

def frac_yay_dems(dems, reps):
    """Compute fraction of Democrat yay votes."""
    frac = np.sum(dems) / len(dems)
    return frac

# Acquire permutation samples: perm_replicates
perm_replicates = draw_perm_reps(dems, reps, frac_yay_dems, 10000)

# Compute and print p-value: p
p = np.sum(perm_replicates <= 153/244) / len(perm_replicates)
print('p-value =', p)

Compute the observed difference in mean inter-nohitter time using diff_of_means().
Generate 10,000 permutation replicates of the difference of means using draw_perm_reps().
Compute and print the p-value.

# Compute the observed difference in mean inter-no-hitter times: nht_diff_obs
nht_diff_obs = diff_of_means(nht_dead, nht_live)

# Acquire 10,000 permutation replicates of difference in mean no-hitter time: perm_replicates
perm_replicates = draw_perm_reps(nht_dead, nht_live,
diff_of_means, size=10000)

# Compute and print the p-value: p
p = np.sum(perm_replicates <= nht_diff_obs) / len(perm_replicates)
print('p-val =',p)

Compute the observed Pearson correlation between illiteracy and fertility.
Initialize an array to store your permutation replicates.
Write a for loop to draw 10,000 replicates:
- Permute the illiteracy measurements using np.random.permutation().
- Compute the Pearson correlation between the permuted illiteracy array, illiteracy_permuted, and fertility.
Compute and print the p-value from the replicates.

# Compute observed correlation: r_obs
r_obs = pearson_r(illiteracy, fertility)

# Initialize permutation replicates: perm_replicates
perm_replicates = np.empty(10000)

# Draw replicates
for i in range(10000):
    # Permute illiteracy measurements: illiteracy_permuted
    illiteracy_permuted = np.random.permutation(illiteracy)

    # Compute Pearson correlation
    perm_replicates[i] = pearson_r(illiteracy_permuted, fertility)

# Compute p-value: p
p = np.sum(perm_replicates >= r_obs) / len(perm_replicates)
print('p-val =', p)

Use your ecdf() function to generate x,y values from the control and treated arrays for plotting the ECDFs.
Plot the ECDFs on the same plot.
The margins have been set for you, along with the legend and axis labels. Hit 'Submit Answer' to see the result!

# Compute x,y values for ECDFs
x_control, y_control = ecdf(control)
x_treated, y_treated = ecdf(treated)

# Plot the ECDFs
plt.plot(x_control, y_control, marker='.', linestyle='none')
plt.plot(x_treated, y_treated, marker='.', linestyle='none')

# Set the margins
plt.margins(0.02)

# Add a legend
plt.legend(('control', 'treated'), loc='lower right')

# Label axes and show plot
plt.xlabel('millions of alive sperm per mL')
plt.ylabel('ECDF')
plt.show()

Compute the mean alive sperm count of control minus that of treated.
Compute the mean of all alive sperm counts. To do this, first concatenate control and treated and take the mean of the concatenated array.
Generate shifted data sets for both control and treated such that the shifted data sets have the same mean. This has already been done for you.
Generate 10,000 bootstrap replicates of the mean each for the two shifted arrays. Use your draw_bs_reps() function.
Compute the bootstrap replicates of the difference of means.
The code to compute and print the p-value has been written for you. Hit 'Submit Answer' to see the result!

# Compute the difference in mean sperm count: diff_means
diff_means = np.mean(control) - np.mean(treated)

# Compute mean of pooled data: mean_count
mean_count = np.mean(np.concatenate((control, treated)))

# Generate shifted data sets
control_shifted = control - np.mean(control) + mean_count
treated_shifted = treated - np.mean(treated) + mean_count

# Generate bootstrap replicates
bs_reps_control = draw_bs_reps(control_shifted,
np.mean, size=10000)
bs_reps_treated = draw_bs_reps(treated_shifted,
np.mean, size=10000)

# Get replicates of difference of means: bs_replicates
bs_replicates = bs_reps_control - bs_reps_treated

# Compute and print p-value: p
p = np.sum(bs_replicates >= np.mean(control) - np.mean(treated)) \
/ len(bs_replicates)
print('p-value =', p)

GPT（GenerativePre-trainedTransformer）1️⃣什么是GPT？GPT（GenerativePre-trainedTransformer，生成式预训练Transformer）是由OpenAI开发的基于Transformer解码器（Decoder）的自回归（Autoregressive）语言模型。它能够通过大量无监督数据预训练，然后微调（Fine-tuning）以适应特
产品和品牌谁的优先级更高？看看 Curve 的初版界面就知道了安全区块链
撰文：BramVanRoelen，Maven11Capital产品主管编译：Tia，TechubNews「初创公司在不同阶段应如何平衡产品建设与品牌营销：初期应专注于构建优秀产品，品牌营销应在后期逐步增加，避免过早依赖品牌包装。」每周，总有一些初创公司雇佣昂贵的代理商来为他们设计「品牌故事」。但Aave却从一个看起来像黑客马拉松项目的小玩意，成长为DeFi借贷市场的中坚力量。这不是巧合——这是一个
从负数绝对值的计算来看Ruby的一个“奇葩”行为
计算一个数的绝对值是非常基础的操作，几乎所有主流的编程语言都内置了相应的函数或方法。在PHP、Python、SQL等语言中，直接调用abs()函数即可，例如abs(-1)。到了Java、C#这类面向对象的语言中，abs()通常是Math类的静态方法，调用时要加上前缀Math.，即Math.abs(-1)。Go语言就要稍微麻烦一点了，因为math包中的Abs()函数仅支持float64类型的参数，如
特朗普家族搅局加密界：原以为的「正本清源」却成了深陷泥潭区块链web3比特币
作者：Techub精选编译原标题：Crypto’sFirstFamilyIsDeepeningtheSwamp撰文：LionelLaurent，彭博社观点专栏作家编译：J1N，TechubNews美国总统特朗普的次子EricTrump认为现在是购买以太坊的好时机，他认为由于他对以太坊的支持推动了币价的短暂上涨。但与此同时，现在也是政客和监管机构采取行动的好时机，以建立更严格的监管措施，针对特朗普家
Python Playwright 打包报错 Please run the following command to download new browsers 卡尔特斯 Python python
想做一个浏览器自动化的小插件，本地安装了Playwright，测试可以正常打开浏览器自动化。但是在使用PyInstaller将Python代码打包成app/exe后，打开应用程序报错：playwright._impl._api_types.Error:Executabledoesn'texistat/Users/dengzemiao/Desktop/Project/python/dist/main
【FAQ】HarmonyOS SDK 闭源开放能力 — IAP Kit（4） harmonyos-next
1.问题描述：发布了一个订阅，看日志显示订阅发布成功了，但是在消费的时候没有值，这个是什么原因？人脸活体检测返回上一页App由沉浸式变为非沉浸式多了上下安全区域。解决方案：对于公共事件来说就是提供这个能力，需要调用方保证时序，订阅成功之后再发广播才能收到。2.问题描述：微信支付，支付宝支付，银联支付SDK是否已经支持？解决方案：1、支付宝：鸿蒙支付SDK获取链接：https://opendocs.
【FAQ】HarmonyOS SDK 闭源开放能力 —Remote Communication Kit harmonyos-next
1.问题描述：DynamicDnsRule有没有示例？这个地址是怎么解析出来https://developer.huawei.com/consumer/cn/doc/harmonyos-refere...解决方案：'DynamicDnsRule'：表示优先使用函数中返回的地址。/***域名和端口会自行获取，不需要传入，这边需要开发者指定Ip地址数组*@paramhost域名*@param_端口*@
【FAQ】HarmonyOS SDK 闭源开放能力 —Push Kit（7） harmonyos-next
1.问题描述：推送通知到手机，怎么配置拉起应用指定的页面？解决方案：1、如果点击通知栏打开默认Ability的话，actionType可以设置为0，同时可以在.clickAction.data中，指定待跳转的page页面，命名为pageUri。2、然后在UIAbility的onNewWant或者onCreate方法中解析配置的pageUri；3、如果应用进程不存在将会触发onCreate方法，可以
跟着案例一次搞定React-Hooks Coder螺丝钉 React react.js javascript 前端
1.ReactHooks是什么ReactHooks是ReactV16.8版本新增的特性，即在不编写类组件的情况下使用state以及React的新特性。React官网提供了10个HooksAPI,来满足我们在函数组件中定义状态，提供类似生命周期的功能和一些高级特性。2.Hooks的诞生背景2.1.类组件的不足状态逻辑难以复用：在旧版本的React中，想要实现逻辑的复用，需要使用到HOC或者Rende
未成年人模式护航，保障安全健康上网 harmonyos
为保护未成年人的上网环境，预防未成年人沉迷网络，帮助未成年人培养积极健康的用网习惯，HarmonyOSSDK提供未成年人模式功能，在华为设备上加强对面向未成年人的产品和服务的管理。场景介绍（应用跟随系统未成年人模式状态变化）1.查询系统状态：建议应用跟随系统未成年人模式状态切换，随系统一同开启或关闭未成年人模式。应用启动时可以查询系统的未成年人模式是否开启。未成年人模式开启时，应用应主动切换为未成
喜讯！全知科技案例获2024全国智慧医保大赛优胜奖安全
2024年11月5日，国家医保局主办的2024年全国智慧医保大赛决赛落幕。国家医保局党组书记、局长章轲、局党组成员、副局长颜清辉，重庆市人民政府副市长但彦铮出席颁奖典礼。大赛以“数字中国智慧医保”为主题，从“数字技术助力医保服务、医保改革和医保管理”以及“医保数据要素赋能百业千行”两个角度出发，共设置了三大主题赛道，包括智慧医保实践案例、智慧医保创新应用、医保数据要素赋能。参赛案例涉及新技术赋能医
Android studio：如何在同一个页面显示多个fragment 剑客狼心 android studio android ide
家母罹患肝癌，可在水滴筹页面查看详情实现一个简单的效果：创建TestOneFragmentpublicclassTestOneFragmentextendsFragment{@OverridepublicViewonCreateView(LayoutInflaterinflater,ViewGroupcontainer,BundlesavedInstanceState){//使用一个简单的布局文件
windows7 IIS远程执行代码漏洞ms15-034，导致系统蓝屏 dhl383561030 linux 安全
一.windows7系统打开iis服务方法1.控制面板-程序-程序和功能-打开关闭功能-internet服务-万维网全选-WEB管理工具全选,要保证镜像光盘是打开状态、防火墙是关闭的。2.可以使用默认网站，但是需要进行绑定。在绑定完毕之后要进行物理机访问是否成功。3.ms15-034是IIS漏洞ms-17-010是smb漏洞二、通过MSF进行漏洞验证：1.msfconsole#进入msf2.sea
Kubernetes (K8S)决定弃用 Docker！Kubernetes (K8S)学习详解熙媛学习笔记 java docker jenkins linux 服务器
确实如此。Kubernetes现已弃用Docker！！！目前，Kubernetes中的Docker支持功能现已弃用，并将在之后的版本中被删除。Kubernetes之前使用的是一个名为dockershim的模块，用以实现对Docker的CRI支持。但Kubernetes社区发现了与之相关的维护问题，因此建议大家考虑使用包含CRI完整实现（兼容v1alpha1或v1）的可用容器运行时。简而言之，Doc
机器翻译技术的演进与未来趋势：从规则到神经网络的革新 Echo_Wish 人工智能前沿技术机器翻译神经网络人工智能
随着全球化的不断推进和多语言交流的日益频繁，机器翻译（MachineTranslation,MT）技术的需求日益增长。机器翻译技术经历了从基于规则的方法到统计方法，再到如今的神经网络方法的发展历程。本文将探讨机器翻译技术的演进过程及其未来趋势，并结合Python代码示例，展示现代机器翻译技术的应用。一、机器翻译技术的发展历程1.基于规则的机器翻译（RBMT）早期的机器翻译技术主要基于规则（Rule
面试总结：Qt 信号槽机制与 MOC 原理 TravisBytes QT 编程问题档案面试 qt 职场和发展
目录1.基本概念1.1信号（Signal）1.2槽（Slot）1.3连接（Connect）2.MOC（Meta-ObjectCompiler）是什么？2.1为什么需要MOC2.2工作流程2.3`Q_OBJECT`宏的意义3.信号槽的底层原理3.1发射信号（emit）3.2调用槽函数3.3新旧语法的实现差异4.使用示例4.1常规：QObject子类中信号槽4.2Lambdas作为槽（现代写法）5.常
高等代数笔记5：线性变换 p_wh 高等代数
线性映射的定义与性质线性映射的定义数学研究的主题是空间与变换，对于代数学而言，空间指的是赋予了某种运算结构的集合，变换则是空间到空间的映射。线性代数则是研究线性空间及其上的映射。但是，研究的对象不是所有的映射，而是特殊的一类映射，这类映射和线性运算紧密联系，称为线性映射。定义5.1V1,V2V_1,V_2V1,V2是KKK的两个线性空间，f:V1→V2f:V_1\toV_2f:V1→V2是V1V_
python同花顺交易接口_开启量化第一步！同花顺iFinD数据接口免费版简易操作教程... weixin_39564527 python同花顺交易接口
金融市场波动频繁，投资往往会夹杂非理性的情绪。而量化交易，旨在以先进的数学模型替代人为的主观判断，利用计算机技术从庞大的历史数据中海选能带来超额收益的多种“大概率”事件以制定策略，辅助投资者进行理性投资。不过计算机分析存在一定的技术门槛，有没有简单易学的量化交易方式，能够快速获取有价值的投资策略方案呢？同花顺iFinD数据接口免费版提供简易的操作与丰富的实操案例，将作为引路者，带你迈入量化世界！P
如何用ruby来写hadoop的mapreduce并生成jar包 wudixiaotie mapreduce
ruby来写hadoop的mapreduce，我用的方法是rubydoop。怎么配置环境呢： 1.安装rvm：不说了网上有 2.安装ruby：由于我以前是做ruby的，所以习惯性的先安装了ruby，起码调试起来比jruby快多了。 3.安装jruby： rvm install jruby然后等待安
java编程思想 -- 访问控制权限百合不是茶 java 访问控制权限单例模式
访问权限是java中一个比较中要的知识点,它规定者什么方法可以访问,什么不可以访问一:包访问权限; 自定义包: package com.wj.control; //包 public class Demo { //定义一个无参的方法 public void DemoPackage(){ System.out.println("调用
[生物与医学]请审慎食用小龙虾 comsci 生物
现在的餐馆里面出售的小龙虾,有一些是在野外捕捉的,这些小龙虾身体里面可能带有某些病毒和细菌,人食用以后可能会导致一些疾病,严重的甚至会死亡..... 所以,参加聚餐的时候,最好不要点小龙虾...就吃养殖的猪肉,牛肉,羊肉和鱼,等动物蛋白质
org.apache.jasper.JasperException: Unable to compile class for JSP: 商人shang maven 2.2 jdk1.8
环境： jdk1.8 maven tomcat7-maven-plugin 2.0 原因： tomcat7-maven-plugin 2.0 不知吃 jdk 1.8，换成 tomcat7-maven-plugin 2.2就行，即 <plugin>
你的垃圾你处理掉了吗?GC oloz GC
前序:本人菜鸟，此文研究学习来自网络，各位牛牛多指教　 1.垃圾收集算法的核心思想　　Java语言建立了垃圾收集机制，用以跟踪正在使用的对象和发现并回收不再使用(引用)的对象。该机制可以有效防范动态内存分配中可能发生的两个危险：因内存垃圾过多而引发的内存耗尽，以及不恰当的内存释放所造成的内存非法引用。　　垃圾收集算法的核心思想是：对虚拟机可用内存空间，即堆空间中的对象进行识别
shiro 和 SESSSION 杨白白 shiro
shiro 在web项目里默认使用的是web容器提供的session，也就是说shiro使用的session是web容器产生的，并不是自己产生的，在用于非web环境时可用其他来源代替。在web工程启动的时候它就和容器绑定在了一起，这是通过web.xml里面的shiroFilter实现的。通过session.getSession()方法会在浏览器cokkice产生JESSIONID，当关闭浏览器，此
移动互联网终端淘宝客如何实现盈利小桔子移動客戶端淘客淘寶App
2012年淘宝联盟平台为站长和淘宝客带来的分成收入突破30亿元，同比增长100%。而来自移动端的分成达1亿元，其中美丽说、蘑菇街、果库、口袋购物等App运营商分成近5000万元。可以看出，虽然目前阶段PC端对于淘客而言仍旧是盈利的大头，但移动端已经呈现出爆发之势。而且这个势头将随着智能终端(手机，平板)的加速普及而更加迅猛
wordpress小工具制作 aichenglong wordpress 小工具
wordpress 使用侧边栏的小工具，很方便调整页面结构小工具的制作过程 1 在自己的主题文件中新建一个文件夹(如widget)，在文件夹中创建一个php(AWP_posts-category.php) 小工具是一个类,想侧边栏一样，还得使用代码注册，他才可以再后台使用，基本的代码一层不变 <?php class AWP_Post_Category extends WP_Wi
JS微信分享 AILIKES js
// 所有功能必须包含在 WeixinApi.ready 中进行 WeixinApi.ready(function(Api) { // 微信分享的数据 var wxData = { &nb
封装探讨百合不是茶 JAVA面向对象封装
//封装属性方法将某些东西包装在一起，通过创建对象或使用静态的方法来调用，称为封装；封装其实就是有选择性地公开或隐藏某些信息，它解决了数据的安全性问题，增加代码的可读性和可维护性在 Aname类中申明三个属性，将其封装在一个类中：通过对象来调用例如 1： //属性将其设为私有姓名 name 可以公开
jquery radio/checkbox change事件不能触发的问题 bijian1013 JavaScript jquery
我想让radio来控制当前我选择的是机动车还是特种车，如下所示： <html> <head> <script src="http://ajax.googleapis.com/ajax/libs/jquery/1.7.1/jquery.min.js" type="text/javascript"><
AngularJS中安全性措施 bijian1013 JavaScript AngularJS 安全性 XSRF JSON漏洞
在使用web应用中，安全性是应该首要考虑的一个问题。AngularJS提供了一些辅助机制，用来防护来自两个常见攻击方向的网络攻击。一.JSON漏洞当使用一个GET请求获取JSON数组信息的时候（尤其是当这一信息非常敏感，
[Maven学习笔记九]Maven发布web项目 bit1129 maven
基于Maven的web项目的标准项目结构 user-project user-core user-service user-web src
【Hive七】Hive用户自定义聚合函数(UDAF) bit1129 hive
用户自定义聚合函数，用户提供的多个入参通过聚合计算(求和、求最大值、求最小值)得到一个聚合计算结果的函数。问题：UDF也可以提供输入多个参数然后输出一个结果的运算，比如加法运算add(3，5)，add这个UDF需要实现UDF的evaluate方法,那么UDF和UDAF的实质分别究竟是什么？ Double evaluate(Double a, Double b)
通过 nginx-lua 给 Nginx 增加 OAuth 支持 ronin47
前言：我们使用Nginx的Lua中间件建立了OAuth2认证和授权层。如果你也有此打算，阅读下面的文档，实现自动化并获得收益。SeatGeek 在过去几年中取得了发展，我们已经积累了不少针对各种任务的不同管理接口。我们通常为新的展示需求创建新模块，比如我们自己的博客、图表等。我们还定期开发内部工具来处理诸如部署、可视化操作及事件处理等事务。在处理这些事务中，我们使用了几个不同的接口来认证： &n
利用tomcat-redis-session-manager做session同步时自定义类对象属性保存不上的解决方法 bsr1983 session
在利用tomcat-redis-session-manager做session同步时，遇到了在session保存一个自定义对象时，修改该对象中的某个属性，session未进行序列化，属性没有被存储到redis中。在 tomcat-redis-session-manager的github上有如下说明： Session Change Tracking As noted in the &qu
《代码大全》表驱动法-Table Driven Approach-1 bylijinnan java 算法
关于Table Driven Approach的一篇非常好的文章： http://www.codeproject.com/Articles/42732/Table-driven-Approach package com.ljn.base; import java.util.Random; public class TableDriven { public
Sybase封锁原理 chicony Sybase
昨天在操作Sybase IQ12.7时意外操作造成了数据库表锁定，不能删除被锁定表数据也不能往其中写入数据。由于着急往该表抽入数据，因此立马着手解决该表的解锁问题。无奈此前没有接触过Sybase IQ12.7这套数据库产品，加之当时已属于下班时间无法求助于支持人员支持，因此只有借助搜索引擎强大的
java异常处理机制 CrazyMizzz java
java异常关键字有以下几个，分别为 try catch final throw throws 他们的定义分别为 try： Opening exception-handling statement. catch： Captures the exception. finally： Runs its code before terminating
hive 数据插入DML语法汇总 daizj hive DML 数据插入
Hive的数据插入DML语法汇总1、Loading files into tables语法：1) LOAD DATA [LOCAL] INPATH 'filepath' [OVERWRITE] INTO TABLE tablename [PARTITION (partcol1=val1, partcol2=val2 ...)]解释：1)、上面命令执行环境为hive客户端环境下： hive>l
工厂设计模式 dcj3sjt126com 设计模式
使用设计模式是促进最佳实践和良好设计的好办法。设计模式可以提供针对常见的编程问题的灵活的解决方案。工厂模式工厂模式（Factory）允许你在代码执行时实例化对象。它之所以被称为工厂模式是因为它负责“生产”对象。工厂方法的参数是你要生成的对象对应的类名称。 Example #1 调用工厂方法（带参数） <?phpclass Example{
mysql字符串查找函数 dcj3sjt126com mysql
FIND_IN_SET(str,strlist) 假如字符串str 在由N 子链组成的字符串列表strlist 中，则返回值的范围在1到 N 之间。一个字符串列表就是一个由一些被‘,’符号分开的自链组成的字符串。如果第一个参数是一个常数字符串，而第二个是type SET列，则 FIND_IN_SET() 函数被优化，使用比特计算。如果str不在strlist 或st
jvm内存管理 easterfly jvm
一、JVM堆内存的划分分为年轻代和年老代。年轻代又分为三部分：一个eden,两个survivor。工作过程是这样的：e区空间满了后，执行minor gc，存活下来的对象放入s0, 对s0仍会进行minor gc，存活下来的的对象放入s1中，对s1同样执行minor gc，依旧存活的对象就放入年老代中；年老代满了之后会执行major gc，这个是stop the word模式，执行
CentOS-6.3安装配置JDK-8 gengzg centos
JAVA_HOME=/usr/java/jdk1.8.0_45 JRE_HOME=/usr/java/jdk1.8.0_45/jre PATH=$PATH:$JAVA_HOME/bin:$JRE_HOME/bin CLASSPATH=.:$JAVA_HOME/lib/dt.jar:$JAVA_HOME/lib/tools.jar:$JRE_HOME/lib export JAVA_HOME
【转】关于web路径的获取方法 huangyc1210 Web 路径
假定你的web application 名称为news,你在浏览器中输入请求路径： http://localhost:8080/news/main/list.jsp 则执行下面向行代码后打印出如下结果： 1、 System.out.println(request.getContextPath()); //可返回站点的根路径。也就是项
php里获取第一个中文首字母并排序远去的渡口数据结构 PHP
很久没来更新博客了，还是觉得工作需要多总结的好。今天来更新一个自己认为比较有成就的问题吧。最近在做储值结算，需求里结算首页需要按门店的首字母A-Z排序。我的数据结构原本是这样的： Array ( [0] => Array ( [sid] => 2885842 [recetcstoredpay] =&g
java内部类 hm4123660 java 内部类匿名内部类成员内部类方法内部类
　在Java中，可以将一个类定义在另一个类里面或者一个方法里面，这样的类称为内部类。内部类仍然是一个独立的类，在编译之后内部类会被编译成独立的.class文件，但是前面冠以外部类的类名和$符号。内部类可以间接解决多继承问题,可以使用内部类继承一个类，外部类继承一个类，实现多继承。 &nb
Caused by: java.lang.IncompatibleClassChangeError: class org.hibernate.cfg.Exten zhb8015
maven pom.xml关于hibernate的配置和异常信息如下，查了好多资料，问题还是没有解决。只知道是包冲突，就是不知道是哪个包....遇到这个问题的分享下是怎么解决的。。 maven pom: <dependency> <groupId>org.hibernate</groupId> <ar
Spark 性能相关参数配置详解－任务调度篇 Stark_Summer spark cache cpu 任务调度 yarn
随着Spark的逐渐成熟完善, 越来越多的可配置参数被添加到Spark中来, 本文试图通过阐述这其中部分参数的工作原理和配置思路, 和大家一起探讨一下如何根据实际场合对Spark进行配置优化。由于篇幅较长，所以在这里分篇组织，如果要看最新完整的网页版内容，可以戳这里：http://spark-config.readthedocs.org/，主要是便
css3滤镜 wangkeheng html css
经常看到一些网站的底部有一些灰色的图标，鼠标移入的时候会变亮，开始以为是js操作src或者bg呢，搜索了一下，发现了一个更好的方法：通过css3的滤镜方法。 html代码： <a href='' class='icon'><img src='utv.jpg' /></a> css代码： .icon{-webkit-filter: graysc