Python Regression Code: Simple Linear Regression, Part 1 -- Basics (implemented in Python)


Contents

1. Basic concepts

2. Visualizing SSE/SSR/SST

3. Two kinds of simple regression

4. The simple linear regression equation

5. The estimated regression equation

6. The least-squares regression line passes through the centroid

7. Predicted values

8. The error term

9. Slope formula

10. Intercept formula

11. Coefficient of determination R**2

12. Testing the linear relationship (F test)

13. Testing the correlation coefficient

14. Residuals

15. The Adjusted R2 Value

16. Standard error of the regression coefficient

17. Residual analysis

19. Reading the OLS output

20. Collinearity in simple regression

21. Bootstrap

A brief introduction to the correlation coefficient

The Pearson correlation coefficient measures whether two data sets lie on a single line; it quantifies the linear relationship between interval-scale variables, for example national income vs. household savings deposits, height vs. weight, or high-school grades vs. college-entrance-exam scores. When both variables are normally distributed continuous variables and the relationship between them is linear, the degree of correlation is expressed with the product-moment correlation coefficient, of which Pearson's simple correlation coefficient is the main one.

Its formula is:

r = sum((x_i - mean(x)) * (y_i - mean(y))) / sqrt( sum((x_i - mean(x))**2) * sum((y_i - mean(y))**2) )
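As a minimal illustration of this formula (the height/weight numbers below are made up purely for demonstration), scipy.stats.pearsonr returns the coefficient together with a two-sided p value:

import numpy as np
from scipy import stats

# hypothetical example data: two positively related variables
height = np.array([160, 165, 170, 175, 180, 185])
weight = np.array([52, 58, 63, 70, 74, 82])

r, p = stats.pearsonr(height, weight)   # correlation coefficient and two-sided p value
print('Pearson r = {:.4f}, p = {:.4f}'.format(r, p))

# the same value computed directly from the definition above
r_manual = np.sum((height - height.mean()) * (weight - weight.mean())) / \
           np.sqrt(np.sum((height - height.mean())**2) * np.sum((weight - weight.mean())**2))
print('Pearson r (by formula) = {:.4f}'.format(r_manual))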

Interpreting the value range

The larger the absolute value of the correlation coefficient, the stronger the correlation: the closer it is to 1 or -1, the stronger the relationship; the closer it is to 0, the weaker the relationship.

The strength of the relationship is usually judged by the following ranges of |r|:

0.8-1.0  very strong correlation

0.6-0.8  strong correlation

0.4-0.6  moderate correlation

0.2-0.4  weak correlation

0.0-0.2  very weak or no correlation

https://docs.scipy.org/doc/scipy-0.14.0/reference/generated/scipy.stats.linregress.html
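scipy.stats.linregress, linked above, returns the slope, intercept, correlation coefficient, p value and standard error of the slope in one call; a minimal sketch with made-up data:

import numpy as np
from scipy import stats

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])

slope, intercept, r_value, p_value, std_err = stats.linregress(x, y)
print('slope={:.3f}, intercept={:.3f}, r^2={:.3f}, p={:.4f}, se(slope)={:.3f}'
      .format(slope, intercept, r_value**2, p_value, std_err))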

1. Basic concepts

Dependent variable:

the variable being predicted, denoted y.

Independent variable:

one or more variables used to predict or explain y, denoted x.

Linear relationships:

Positive linear correlation: as one variable increases, the other also increases.

Negative linear correlation: as one variable increases, the other decreases.

Perfect linear correlation: all observed points fall exactly on a straight line (a functional relationship).

Nonlinear correlation: as one variable increases, the other may increase or decrease with no regular pattern.

2. Visualizing SSE/SSR/SST

3. Two kinds of simple regression

1. Only one variable involved.

2. One independent variable and one dependent variable.

4. The simple linear regression equation

y = B0 + B1*x + e

B0 is the intercept.

B1 is the slope.

e is the error term, a random variable. It captures the influence on y of random factors other than the linear relationship between x and y, i.e. the variation in y that cannot be explained by that linear relationship.

5. The estimated regression equation

The least-squares method gives the estimated regression equation y_hat = b0 + b1*x.

If we knew the population parameters (the true slope and intercept), we could use the simple linear regression equation directly.

In practice we cannot observe the population parameters, so we use sample statistics to obtain estimates of the slope and intercept.

For a horizontal line with slope 0, SSE = SST.

6. The least-squares regression line passes through the centroid

The least-squares regression line passes through the centroid (mean of variable 1, mean of variable 2).

Least squares:

the method that minimizes SSE is the least-squares method.

7. Predicted values

With only one variable: the prediction is the mean.

With two variables: a point estimate from the regression line.

Predictions can be made for the mean value of y or for an individual value of y.

8. The error term

e is the error term, a random variable. It captures the influence on y of random factors other than the linear relationship between x and y, i.e. the variation in y that cannot be explained by that linear relationship.

The error term should satisfy:

normality + equal variance (homoscedasticity) + independence.

In the regression model the error term is an independent random variable with expected value 0, constant variance, and a normal distribution. If these assumptions about the error term do not hold, the tests performed on the model are not trustworthy. One way to check the assumptions about the error term is residual analysis.

(Residual analysis plot)

9. Slope formula

b1 = sum((x_i - mean(x)) * (y_i - mean(y))) / sum((x_i - mean(x))**2) = Sxy / Sxx

10. Intercept formula

Because the best-fitting line passes through the centroid (mean of variable 1, mean of variable 2), the intercept follows from the slope and the centroid:

b0 = mean(y) - b1 * mean(x)
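A short sketch of these two formulas applied to the Tobacco/Alcohol sample that appears later in this post (plain numpy, no fitting library):

import numpy as np

x = np.array([4.03, 3.76, 3.77, 3.34, 3.47, 2.92, 3.20, 2.71, 3.53, 4.51])
y = np.array([6.47, 6.13, 6.19, 4.89, 5.63, 4.52, 5.89, 4.79, 5.27, 6.08])

b1 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean())**2)  # slope = Sxy/Sxx
b0 = y.mean() - b1 * x.mean()                                             # intercept via the centroid
print('slope b1 = {:.4f}, intercept b0 = {:.4f}'.format(b1, b0))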

11. Coefficient of determination R**2

Correlation:

a variable is influenced by many factors, which makes the relationship between two variables uncertain. Such an uncertain relationship between variables is called correlation.

In other words, one variable does not determine the other; several variables jointly determine how the other one develops.

Correlation coefficient:

the strength of the relationship between variables.

Coefficient of determination R**2:

R**2 = SSR/SST = 1 - SSE/SST

The larger the share of SST taken up by SSR, the larger R**2.
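A minimal sketch of the decomposition SST = SSR + SSE and of R**2, computed by hand on the same Tobacco/Alcohol sample used later in this post:

import numpy as np

x = np.array([4.03, 3.76, 3.77, 3.34, 3.47, 2.92, 3.20, 2.71, 3.53, 4.51])
y = np.array([6.47, 6.13, 6.19, 4.89, 5.63, 4.52, 5.89, 4.79, 5.27, 6.08])

b1 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean())**2)
b0 = y.mean() - b1 * x.mean()
y_hat = b0 + b1 * x

SST = np.sum((y - y.mean())**2)      # total sum of squares
SSR = np.sum((y_hat - y.mean())**2)  # regression (explained) sum of squares
SSE = np.sum((y - y_hat)**2)         # error (residual) sum of squares

print('SST = {:.4f}, SSR + SSE = {:.4f}'.format(SST, SSR + SSE))
print('R^2 = SSR/SST = {:.4f}'.format(SSR / SST))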

12. Testing the linear relationship (F test)

Before building the model we assumed that x and y are linearly related, but whether that assumption holds has to be verified by a test.

The test of the linear relationship is the F test; it checks whether the linear relationship between the independent variable x and the dependent variable y is significant.

How do we judge whether the model fits? Use analysis of variance to compute the F statistic, F = (SSR/1) / (SSE/(n-2)) = MSR/MSE, and check whether its p value is below 0.05.

SSE has n-2 degrees of freedom (n observations minus the 2 estimated parameters).

SSR has 1 degree of freedom (2 variables, 2 - 1 = 1).

(Visualization of SST, SSR and SSE)
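The F statistic and its p value can be read directly from a statsmodels fit; a minimal sketch using the Tobacco/Alcohol sample that appears later in this post (the smf alias is just a naming choice):

import pandas as pd
import statsmodels.formula.api as smf

# same Tobacco/Alcohol sample used later in this post
data = pd.DataFrame({
    'Tobacco': [4.03, 3.76, 3.77, 3.34, 3.47, 2.92, 3.20, 2.71, 3.53, 4.51],
    'Alcohol': [6.47, 6.13, 6.19, 4.89, 5.63, 4.52, 5.89, 4.79, 5.27, 6.08]})

result = smf.ols('Alcohol ~ Tobacco', data).fit()
print('F = {:.4f}, p = {:.4f}'.format(result.fvalue, result.f_pvalue))
# SSR has 1 degree of freedom, SSE has n-2; if p < 0.05 the linear relation is significant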

13. Testing the correlation coefficient

Even if the coefficient of determination R**2 looks good, you still have to consider the sample size before generalizing the estimate to the population; if the sample is too small, or r is too small, the result does not generalize to the population.

In that case use a t test, t = (r*math.sqrt(n-2))/(math.sqrt(1-r**2)), with n-2 degrees of freedom.
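A minimal sketch of this t test, reusing the sample correlation r**2 = 0.615 (so r is about 0.784) and n = 10 from the example later in this post:

import math
from scipy import stats

r, n = 0.784, 10                           # sample correlation and sample size (from the later example)
t_score = r * math.sqrt(n - 2) / math.sqrt(1 - r**2)
t_crit = stats.t.isf(0.05 / 2, n - 2)      # two-sided critical value at alpha = 0.05
print('t = {:.4f}, critical value = {:.4f}'.format(t_score, t_crit))
print('significant' if abs(t_score) > t_crit else 'not significant')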


14. Residuals

A residual is the difference between the observed value y_i and the fitted value y_hat_i:

e_i = y_i - y_hat_i

The sum of the squared residuals is SSE:

SSE = sum((y_i - y_hat_i)**2)

If a simple linear model fits the data well, SSE is small.

The goal of simple linear regression is to build the linear model whose residual sum of squares is smallest.

Exercise

http://book.2cto.com/201512/58842.html

A restaurant system records daily sales for each dish; sample data are shown in Table 3-7.

Data file: demo/data/catering_sale_all.xls

Analysing the correlations between the sales of different dishes shows how the dishes relate to one another -- whether they are substitutes, complements, or unrelated -- which helps with purchasing raw materials. The Python code is shown in Listing 3-4.

Listing 3-4: correlation analysis of restaurant sales data

#-*- coding: utf-8 -*-
# Correlation analysis of restaurant sales data
from __future__ import print_function
import pandas as pd

catering_sale = '../data/catering_sale_all.xls'  # restaurant data, contains several columns

data = pd.read_excel(catering_sale, index_col = u'日期')  # read the data, using the "日期" (date) column as the index

data.corr()  # correlation matrix: correlation coefficient between every pair of dishes
data.corr()[u'百合酱蒸凤爪']  # only the correlations of "百合酱蒸凤爪" with the other dishes
data[u'百合酱蒸凤爪'].corr(data[u'翡翠蒸香茜饺'])  # correlation between "百合酱蒸凤爪" and "翡翠蒸香茜饺"

Code file: demo/code/correlation_analyze.py

The code above computes correlation coefficients in three different ways. Running it gives the correlation between any two dishes; for example, running "data.corr()[u'百合酱蒸凤爪']" produces the following result.

>>> data.corr()[u'百合酱蒸凤爪']

百合酱蒸凤爪 1.000000

翡翠蒸香茜饺 0.009206

金银蒜汁蒸排骨 0.016799

乐膳真味鸡 0.455638

蜜汁焗餐包 0.098085

生炒菜心 0.308496

铁板酸菜豆腐 0.204898

香煎韭菜饺 0.127448

香煎萝卜糕 -0.090276

原汁原味菜心 0.428316

Name: 百合酱蒸凤爪, dtype: float64

The result shows that when a customer orders "百合酱蒸凤爪", the correlation with orders of "翡翠蒸香茜饺", "金银蒜汁蒸排骨", "香煎萝卜糕", "铁板酸菜豆腐", "香煎韭菜饺" and other staple-type dishes is low, whereas the correlation with "乐膳真味鸡", "生炒菜心" and "原汁原味菜心" is relatively high.

15. The Adjusted R2 Value

http://www.graphpad.com/guides/prism/6/curve-fitting/index.htm?reg_interpreting_the_adjusted_r2.htm

http://www.statisticshowto.com/adjusted-r2/

http://www.360doc.com/content/16/1213/10/33459258_614269488.shtml

n is the number of observations, k is the number of parameters.

A quick and simple way to compare models is to pick the one with the larger adjusted R-squared.

R-squared describes only the sample at hand; for the whole population it is not very informative.

The adjusted R-squared is never larger than R-squared.

If you keep adding useless variables, the adjusted R-squared decreases;

if you add useful variables, the adjusted R-squared increases.

The adjusted R-squared formula:

When independent variables are added to a model, the coefficient of determination keeps increasing, so with enough variables the fit always looks good even when it is not. R2 is therefore adjusted; the result, written Ra2, is called the adjusted coefficient of determination.

R2 = SSR/SST = 1 - SSE/SST

Ra2 = 1 - (SSE/dfE)/(SST/dfT) = 1 - (SSE/(n-k-1)) / (SST/(n-1))
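A one-line helper implementing Ra2 (this mirrors the Adjust_Rsquare function that appears later in this post); the example numbers reuse the R**2 = 0.615 obtained there, with n = 10 observations and k = 1 predictor:

def adjusted_r_square(r_square, n, k):
    """Adjusted R^2 = 1 - (1 - R^2) * (n - 1) / (n - k - 1)."""
    return 1 - (1 - r_square) * (n - 1) / (n - k - 1)

print(adjusted_r_square(0.615, 10, 1))   # about 0.567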

Why you should not use R2 to compare models

R2 quantifies how well a model fits the data, so it seems as though it would be an easy way to compare models. It sure sounds easy -- pick the model with the larger R2. The problem with this approach is that there is no penalty for adding more parameters. So the model with more parameters will bend and twist more to come nearer the points, and so almost always has a higher R2. If you use R2 as the criterion for picking the best model, you'd almost always pick the model with the most parameters.

The adjusted R2 accounts for the number of parameters fit

The adjusted R2 always has a lower value than R2 (unless you are fitting only one parameter). The equations below show why.

The equations above show how the adjusted R2 is computed. The sum-of-squares of the residuals from the regression line or curve has n-K degrees of freedom, where n is the number of data points and K is the number of parameters fit by the regression. The total sum-of-squares is the sum of the squares of the distances from a horizontal line through the mean of all Y values. Since it only has one parameter (the mean), its degrees of freedom equals n-1. The adjusted R2 is smaller than the ordinary R2 whenever K is greater than 1.

Using adjusted R2 as a quick and dirty way to compare models

A quick and easy way to compare models is to choose the one with the larger adjusted R2. Choose to report this value on the Diagnostics tab.

Comparing models with adjusted R2 is not a standard method for comparing nonlinear models (it is standard for multiple linear regression), and we suggest that you use the extra sum-of-squares F test or compare AICc instead. If you do compare models by comparing adjusted R2, make sure that identical data, weighted identically, are used for all fits.

Adjusted R2 in linear regression

Prism doesn't report the adjusted R2 with linear regression, but you can fit a straight line with nonlinear regression.

If X and Y are not linearly related at all, the best-fit slope is expected to be 0.0. If you analyzed many randomly selected samples, half the samples would have a slope that is positive and half the samples would have a negative slope. But in all these cases, R2 would be positive (or zero). R2 can never be negative (unless you constrain the slope or intercept so it is forced to fit worse than a horizontal line). In contrast, the adjusted R2 can be negative. If you analyzed many randomly selected samples, you'd expect the adjusted R2 to be positive in half the samples and negative in the other half.

Here is a simple way to think about the distinction. The R2 quantifies the linear relationship in the sample of data you are analyzing. Even if there is no underlying relationship, there almost certainly is some relationship in that sample. The adjusted R2 is smaller than R2 and is your best estimate of the degree of relationship in the underlying population.

Adjusted R2 / Adjusted R-Squared: What is it used for?


Adjusted R2: Overview

Adjusted R2 is a special form of R2, the coefficient of determination.


R2 shows how well terms (data points) fit a curve or line. Adjusted R2 also indicates how well terms fit a curve or line, but adjusts for the number of terms in a model. If you add more and more useless variables to a model, adjusted r-squared will decrease. If you add more useful variables, adjusted r-squared will increase.

Adjusted R2 will always be less than or equal to R2. You only need R2 when working with samples. In other words, R2 isn't necessary when you have data from an entire population.

Adjusted R2 = 1 - [(1 - R2)(N - 1) / (N - K - 1)]

where:

N is the number of points in your data sample.

K is the number of independent regressors, i.e. the number of variables in your model, excluding the constant.

If you already know R2 then it's a fairly simple formula to work with. However, if you do not already have R2 then you'll probably not want to calculate this by hand! (If you must, see How to Calculate the Coefficient of Determination.) There are many statistical packages that can calculate adjusted r-squared for you. Adjusted r-squared is given as part of Excel regression output. See: Excel regression analysis output explained.

Meaning of Adjusted R2

Both R2 and the adjusted R2 give you an idea of how many data points fall within the line of the regression equation. However, there is one main difference between R2 and the adjusted R2: R2 assumes that every single variable explains the variation in the dependent variable, while the adjusted R2 tells you the percentage of variation explained by only the independent variables that actually affect the dependent variable.

How Adjusted R2 Penalizes You

The adjusted R2 will penalize you for adding independent variables (K in the equation) that do not fit the model. Why? In regression analysis, it can be tempting to add more variables to the data as you think of them. Some of those variables will be significant, but you can't be sure that the significance is not just by chance. The adjusted R2 compensates for this by penalizing you for those extra variables.

Problems with R2 that are corrected with an adjusted R2

R2 increases with every predictor added to a model. As R2 always increases and never decreases, it can appear to be a better fit with the more terms you add to the model. This can be completely misleading.

Similarly, if your model has too many terms and too many high-order polynomials you can run into the problem of over-fitting the data. When you over-fit data, a misleadingly high R2 value can lead to misleading projections.

16. Standard error of the regression coefficient

1. Standard error of the regression coefficient

Because the standard deviation of a sample statistic is its standard error, the standard error of a regression coefficient is simply its standard deviation. With repeated sampling, each sample yields one estimated regression coefficient; k samples give k estimated coefficients, and their standard deviation is the standard error of the regression coefficient (see "standard deviation vs. standard error" for details).

2. Standard error of the regression

Yi = Xi*beta + epsilon, where (Xi, Yi) are the observations, beta is the true regression coefficient, and epsilon is the error term;

Yi = Xi*beta_hat + mu, where (Xi, Yi) are the observations, beta_hat is the estimated regression coefficient, and mu is the residual.

(1) The standard error of the regression is the estimate of the standard deviation of the error term.

Each sample gives only one estimate of the regression coefficient, but because the sample contains n individuals, every observation has a residual (one sample gives n residuals), so each sample has a residual standard deviation. The sample residual variance is an unbiased estimator of the (population) error variance, which means the standard error of the regression is the residual standard deviation (also called the root mean squared error, RMSE):

s = sqrt( sum(e_i**2) / (n - k) ), where n is the sample size, k is the number of estimated parameters, and i indexes the individuals in the sample.

Squaring the standard error of the regression gives the residual variance (also called the mean squared error, MSE):

MSE = sum(e_i**2) / (n - k), with n, k and i as above.

The mean squared error measures how much the data scatter around the fit: the smaller the MSE, the more precisely the model describes the data.
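A minimal sketch of computing the standard error of the regression (RMSE) and the MSE from an OLS fit, using the Tobacco/Alcohol sample that appears later in this post; result.ssr and result.mse_resid are standard statsmodels attributes:

import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

data = pd.DataFrame({
    'Tobacco': [4.03, 3.76, 3.77, 3.34, 3.47, 2.92, 3.20, 2.71, 3.53, 4.51],
    'Alcohol': [6.47, 6.13, 6.19, 4.89, 5.63, 4.52, 5.89, 4.79, 5.27, 6.08]})
result = smf.ols('Alcohol ~ Tobacco', data).fit()

n, k = len(data), 2                # k = number of estimated parameters (intercept + slope)
sse = result.ssr                   # residual sum of squares
mse = sse / (n - k)                # mean squared error = estimate of the error variance
rmse = np.sqrt(mse)                # standard error of the regression (root mean squared error)
print('MSE = {:.4f}, RMSE = {:.4f}'.format(mse, rmse))
print('statsmodels mse_resid = {:.4f}'.format(result.mse_resid))   # should agree with mse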

Log-Likelihood (the value of the log-likelihood function)

How to read the log-likelihood value: it is usually negative, and a larger actual value (not absolute value) is better. First, the basic reasoning: for a discrete distribution the likelihood is a probability in the interval 0-1, so its logarithm is negative; for a continuous variable the likelihood is a probability density, which is not restricted to 0-1, so the sign of the log-likelihood is not fixed. Second, the interpretation of the Eviews formula: the value depends mainly on the residual sum of squares (and the sample size). Only when the ratio of the residual sum of squares to the sample size is very small can the bracketed term become negative and the value positive, indicating a very good fit; otherwise the value is negative, and the smaller its absolute value, the smaller the residual sum of squares and the better the fit.

Minus two times the natural log of the likelihood (-2 log L) is often used to describe model fit: the smaller it is, the better the fit.

The likelihood-ratio statistic follows a chi-squared distribution and is a test statistic, so it is not a matter of "larger is better" or "smaller is better": if the statistic exceeds the chi-squared critical value, reject the null hypothesis; otherwise do not reject it.

Maximum likelihood estimation (larger is better) optimizes the likelihood function (no negative sign).

Here is why the objective function is defined as -2 log-likelihood:

the log function is monotonic and makes the computation easier. Some software minimizes rather than maximizes, which is why there is a negative sign (-); a regression problem is conventionally defined as minimizing the sum of squared errors. The factor 2 is there for convenience in hypothesis testing, because 2*(log-likelihood ratio) follows a chi-squared distribution with the appropriate degrees of freedom.

17. Residual analysis

1. Homoscedasticity (equal variance)

2. Normality

Durbin-Watson test

np.sum( np.diff( result.resid.values )**2.0 )

Out[18]: 3.1437096272928842

DW = np.sum( np.diff( result.resid.values )**2.0 )/result.ssr

DW

Out[20]: 1.9753463429714668

print('Durbin-Watson: {:.5f}'.format( DW ))

Durbin-Watson: 1.97535

The D.W. statistic is used to check whether the residuals are autocorrelated. OLS regression assumes that the error terms are independent of one another; if the residuals are autocorrelated, the estimates are biased in their inference and the model's explanatory power suffers.

A D.W. statistic around 2 indicates no first-order autocorrelation in the residuals; the further it departs from 2, the more the explanatory power of the model you have built is compromised.

In linear regression we always assume that the residuals are independent of one another (uncorrelated). If the independence assumption is violated, some model fits become questionable. For example, positive correlation between error terms tends to inflate the coefficient t values, making predictors look important when in fact they may not be.

The Durbin-Watson statistic tests for autocorrelation in the regression residuals by determining whether the correlation between two adjacent error terms is zero. The test is based on the assumption that the errors are generated by a first-order autoregressive process. To draw a conclusion, look up the lower critical value LD and the upper critical value UD in a DW table using the sample size n and the number of independent variables k', and judge the autocorrelation of the residuals by the following rules:

(1) If 0 < DW < LD, the residuals show positive autocorrelation;

(2) if LD < DW < UD, the test is inconclusive;

(3) if UD < DW < 4-UD, the residuals show no (first-order) autocorrelation;

(4) if 4-UD < DW < 4-LD, the test is inconclusive;

(5) if 4-LD < DW < 4, the residuals show negative autocorrelation.


The proof of the Gauss-Markov theorem shows that OLS estimates have minimum variance only under homoscedasticity and no autocorrelation. When the model has autocorrelated errors, the OLS estimator is still unbiased but is no longer efficient. As with heteroscedasticity, this means there exist other estimators whose estimation error is smaller than that of OLS; in other words, for a model with autocorrelation, the parameters should be estimated by some other method.

1. Autocorrelation does not affect the linearity or unbiasedness of the OLS estimator, but it destroys its efficiency.

2. The coefficient estimates will have considerably larger variance under autocorrelation.

3. The t tests on the coefficients become unreliable.

4. The model's predictive ability breaks down.

Jarque-Bera Test

H0: skewness (S) and excess kurtosis (K) are both 0.

H1: at least one of skewness (S) and excess kurtosis (K) is not 0.

With small samples the test tends to reject H0 even when the data are actually normal.

The Jarque-Bera test is another test that considers skewness (S) and kurtosis (K). The null hypothesis is that the distribution is normal, that both the skewness and excess kurtosis equal zero, or alternatively, that the skewness is zero and the regular run-of-the-mill kurtosis is three. Unfortunately, with small samples the Jarque-Bera test is prone to rejecting the null hypothesis -- that the distribution is normal -- when in fact it is true.
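A minimal sketch of running the Jarque-Bera test on a set of residuals with scipy.stats.jarque_bera (the "residuals" here are simulated, purely for illustration):

import numpy as np
from scipy import stats

np.random.seed(0)
residuals = np.random.randn(100)                 # hypothetical residuals

jb_stat, jb_p = stats.jarque_bera(residuals)     # H0: skewness = 0 and excess kurtosis = 0
print('JB = {:.4f}, p = {:.4f}'.format(jb_stat, jb_p))
# p > 0.05: no evidence against normality (but note the small-sample caveat above)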

Condition Number

If it is greater than 30, the predictors have substantial collinearity.

The condition number measures the sensitivity of a function's output to its input. When two predictor variables are highly correlated, which is called multicollinearity, the coefficients or factors of those predictor variables can fluctuate erratically for small changes in the data or the model. Ideally, similar models should be similar, i.e., have approximately equal coefficients. Multicollinearity can cause numerical matrix inversion to crap out, or produce inaccurate results (see Kaplan 2009). One approach to this problem in regression is the technique of ridge regression, which is available in the Python package sklearn.

We calculate the condition number by taking the eigenvalues of the product of the predictor variables (including the constant vector of ones) and then taking the square root of the ratio of the largest eigenvalue to the smallest eigenvalue. If the condition number is greater than 30, then the regression may have multicollinearity.
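Following the recipe above, a minimal sketch of computing the condition number by hand for the Tobacco predictor plus a constant column (statsmodels also reports its own condition number in the OLS summary):

import numpy as np

x = np.array([4.03, 3.76, 3.77, 3.34, 3.47, 2.92, 3.20, 2.71, 3.53, 4.51])
X = np.column_stack((np.ones(len(x)), x))          # design matrix with the constant column

eigvals = np.linalg.eigvalsh(np.dot(X.T, X))       # eigenvalues of X'X (symmetric, so eigvalsh)
cond = np.sqrt(eigvals.max() / eigvals.min())      # sqrt of largest/smallest eigenvalue ratio
print('condition number = {:.2f}'.format(cond))    # > 30 suggests multicollinearity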

AIC

Full name: Akaike information criterion.

The Akaike information criterion was created by the Japanese statistician Hirotugu Akaike and is based on the concept of entropy.

The Akaike information criterion (AIC) is a measure of how well a statistical model fits, created and developed by the Japanese statistician Hirotugu Akaike. Built on the concept of entropy, it balances the complexity of the estimated model against how well the model fits the data.

Formula:

In the general case used here, AIC can be written as:

AIC = (2k - 2L)/n

The fewer the parameters, the smaller the AIC and the better the model.

The more observations, the smaller the AIC and the better the model.

Its underlying assumption is that the model errors are independent and normally distributed.

Here k is the number of parameters in the fitted model, L is the log-likelihood, and n is the number of observations.

The size of AIC depends on L and k: the smaller k is, the smaller the AIC; the larger L is, the smaller the AIC. A small k means a parsimonious model, and a large L means an accurate model. Like the adjusted coefficient of determination, AIC therefore balances parsimony and accuracy when evaluating a model.

Concretely, L = -(n/2)*ln(2*pi) - (n/2)*ln(SSE/n) - n/2, where n is the sample size and SSE is the residual sum of squares.

Adding free parameters improves the goodness of fit, so AIC encourages goodness of fit while discouraging overfitting: the preferred model is the one with the smallest AIC. The AIC approach looks for the model that explains the data best with the fewest free parameters.
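A small sketch that mirrors the convention used in the text, AIC = (2k - 2L)/n with L computed from SSE (note that statsmodels' result.aic uses the more common 2k - 2*log-likelihood without dividing by n); the numeric inputs below are made-up illustration values:

import math

def aic(n, k, sse):
    """AIC = (2k - 2L)/n with L = -(n/2)*ln(2*pi) - (n/2)*ln(sse/n) - n/2, as in the text above."""
    L = -(n / 2.0) * math.log(2 * math.pi) - (n / 2.0) * math.log(sse / n) - n / 2.0
    return (2 * k - 2 * L) / n

# hypothetical values: 10 observations, 2 fitted parameters, residual sum of squares 1.28
print(aic(10, 2, 1.28))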

AICc and AICu

For small samples, AIC is replaced by AICc:

AICc = AIC + 2k(k+1)/(n-k-1)

As n grows, AICc converges to AIC, so AICc can be used for any sample size (Burnham and Anderson, 2004).

McQuarrie and Tsai (1998: 22) define AICc as:

AICc = ln(RSS/n) + (n+k)/(n-k-2),

and propose the closely related index AICu:

AICu = ln[RSS/(n-k)] + (n+k)/(n-k-2).

QAIC

QAIC (quasi-AIC) is defined as:

QAIC = 2k - (2*lnL)/c

where c is the variance inflation (overdispersion) factor, so QAIC adjusts for overdispersion (or lack of fit).

For small samples, QAIC becomes:

QAICc = QAIC + 2k(2k+1)/(n-k-1)

Confidence interval for the mean: for a given value x0 of the independent variable, an interval estimate of the mean of the dependent variable y.

Prediction interval for an individual value: for a given value x0 of the independent variable, an interval estimate of an individual value of the dependent variable y.

# -*- coding: utf-8 -*-
"""
Created on Mon Jul 10 11:04:51 2017

@author: toby
"""

# Import standard packages
import numpy as np
import matplotlib.pyplot as plt
import scipy.stats as stats


def fitLine(x, y, alpha=0.05, newx=[], plotFlag=1):
    ''' Fit a curve to the data using a least squares 1st order polynomial fit '''
    # Summary data
    n = len(x)  # number of samples
    Sxx = np.sum(x**2) - np.sum(x)**2/n
    # Syy = np.sum(y**2) - np.sum(y)**2/n    # not needed here
    Sxy = np.sum(x*y) - np.sum(x)*np.sum(y)/n
    mean_x = np.mean(x)
    mean_y = np.mean(y)

    # Linefit
    b = Sxy/Sxx
    a = mean_y - b*mean_x

    # Residuals
    fit = lambda xx: a + b*xx
    residuals = y - fit(x)
    var_res = np.sum(residuals**2)/(n-2)
    sd_res = np.sqrt(var_res)

    # Confidence intervals
    se_b = sd_res/np.sqrt(Sxx)
    se_a = sd_res*np.sqrt(np.sum(x**2)/(n*Sxx))
    df = n-2                            # degrees of freedom
    tval = stats.t.isf(alpha/2., df)    # appropriate t value
    ci_a = a + tval*se_a*np.array([-1, 1])
    ci_b = b + tval*se_b*np.array([-1, 1])

    # create series of new test x-values to predict for
    npts = 100
    px = np.linspace(np.min(x), np.max(x), num=npts)
    se_fit = lambda x: sd_res * np.sqrt(1./n + (x-mean_x)**2/Sxx)
    se_predict = lambda x: sd_res * np.sqrt(1 + 1./n + (x-mean_x)**2/Sxx)

    print(('Summary: a={0:5.4f}+/-{1:5.4f}, b={2:5.4f}+/-{3:5.4f}'.format(a, tval*se_a, b, tval*se_b)))
    print(('Confidence intervals: ci_a=({0:5.4f} - {1:5.4f}), ci_b=({2:5.4f} - {3:5.4f})'.format(ci_a[0], ci_a[1], ci_b[0], ci_b[1])))
    print(('Residuals: variance = {0:5.4f}, standard deviation = {1:5.4f}'.format(var_res, sd_res)))
    print(('alpha = {0:.3f}, tval = {1:5.4f}, df={2:d}'.format(alpha, tval, df)))

    # Return info
    ri = {'residuals': residuals,
          'var_res': var_res,
          'sd_res': sd_res,
          'alpha': alpha,
          'tval': tval,
          'df': df}

    if plotFlag == 1:
        # Plot the data
        plt.figure()
        plt.plot(px, fit(px), 'k', label='Regression line')
        # plt.plot(x, y, 'k.', label='Sample observations', ms=10)
        plt.plot(x, y, 'k.')
        x.sort()
        limit = (1-alpha)*100
        plt.plot(x, fit(x)+tval*se_fit(x), 'r--', lw=2, label='Confidence limit ({0:.1f}%)'.format(limit))
        plt.plot(x, fit(x)-tval*se_fit(x), 'r--', lw=2)
        plt.plot(x, fit(x)+tval*se_predict(x), '--', lw=2, color=(0.2, 1, 0.2), label='Prediction limit ({0:.1f}%)'.format(limit))
        plt.plot(x, fit(x)-tval*se_predict(x), '--', lw=2, color=(0.2, 1, 0.2))
        plt.xlabel('X values')
        plt.ylabel('Y values')
        plt.title('Linear regression and confidence limits')

        # configure legend
        plt.legend(loc=0)
        leg = plt.gca().get_legend()
        ltext = leg.get_texts()
        plt.setp(ltext, fontsize=14)

        # show the plot
        outFile = 'regression_wLegend.png'
        plt.savefig(outFile, dpi=200)
        print('Image saved to {0}'.format(outFile))
        plt.show()

    if newx != []:
        try:
            newx.size
        except AttributeError:
            newx = np.array([newx])

        print(('Example: x = {0}+/-{1} => se_fit = {2:5.4f}, se_predict = {3:6.5f}'
               .format(newx[0], tval*se_predict(newx[0]), se_fit(newx[0]), se_predict(newx[0]))))

        newy = (fit(newx), fit(newx)-se_predict(newx), fit(newx)+se_predict(newx))
        return (a, b, (ci_a, ci_b), ri, newy)
    else:
        return (a, b, (ci_a, ci_b), ri)


def Draw_confidenceInterval(x, y):
    x = np.array(x)
    y = np.array(y)
    goodIndex = np.invert(np.logical_or(np.isnan(x), np.isnan(y)))
    (a, b, (ci_a, ci_b), ri, newy) = fitLine(x[goodIndex], y[goodIndex], alpha=0.01, newx=np.array([1, 4.5]))


y = [6.47, 6.13, 6.19, 4.89, 5.63, 4.52, 5.89, 4.79, 5.27, 6.08]
x = [4.03, 3.76, 3.77, 3.34, 3.47, 2.92, 3.20, 2.71, 3.53, 4.51]

Draw_confidenceInterval(x, y)

19. Reading the OLS output

# -*- coding: utf-8 -*-
# Spearman's correlation coefficient for ranked data, Pearson correlation,
# and a detailed OLS report for simple linear regression
import math, pylab, scipy
import numpy as np
import scipy.stats as stats
from scipy.stats import t
from scipy.stats import f
import pandas as pd
import matplotlib.pyplot as plt
from statsmodels.stats.diagnostic import lillifors
import normality_check          # the author's own helper module for normality tests
import statsmodels.formula.api as sm

x = [4.03, 3.76, 3.77, 3.34, 3.47, 2.92, 3.20, 2.71, 3.53, 4.51]
y = [6.47, 6.13, 6.19, 4.89, 5.63, 4.52, 5.89, 4.79, 5.27, 6.08]
list_group = [x, y]
sample = len(x)

# significance level
a = 0.05

# visualize the data
plt.plot(x, y, 'ro')


# Spearman rank correlation, a nonparametric test
def Spearmanr(x, y):
    print("use spearmanr,Nonparametric tests")
    # warn if the samples have different sizes
    if len(x) != len(y):
        print("warming,the samples are not equal!")
    r, p = stats.spearmanr(x, y)
    print("spearman r**2:", r**2)
    print("spearman p:", p)
    if sample < 500 and p > 0.05:
        print("when sample < 500,p has no mean(>0.05)")
        print("when sample > 500,p has mean")


# Pearson correlation, a parametric test
def Pearsonr(x, y):
    print("use Pearson,parametric tests")
    r, p = stats.pearsonr(x, y)
    print("pearson r**2:", r**2)
    print("pearson p:", p)
    if sample < 30:
        print("when sample <30,pearson has no mean")


# Pearson correlation, parametric test, with a detailed OLS report
def Pearsonr_details(x, y, xLabel, yLabel, formula):
    n = len(x)
    df = n - 2
    data = pd.DataFrame({yLabel: y, xLabel: x})
    result = sm.ols(formula, data).fit()
    print(result.summary())

    # significance of the linear relation (F test on the model)
    print('\n')
    print("linear relation Significant test:...................................")
    # if the p value of the F test < 0.05, reject H0 (no relation): x and y have a significant linear relation
    if result.f_pvalue < 0.05:
        print("P value of f test<0.05,the linear relation is right.")

    # significance test of R
    print('\n')
    print("R significant test:...................................")
    r_square = result.rsquared
    r = math.sqrt(r_square)
    t_score = r * math.sqrt(n - 2) / (math.sqrt(1 - r**2))
    t_std = t.isf(a / 2, df)
    if t_score > t_std:
        print("R is significant according to its sample size")
    else:
        print("R is not significant")

    # residual analysis
    print('\n')
    print("residual error analysis:...................................")
    states = normality_check.check_normality(result.resid)
    if states == True:
        print("the residual error are normal distributed")
    else:
        print("the residual error are not normal distributed")

    # skewness and kurtosis of the residuals
    Skew = stats.skew(result.resid, bias=True)
    Kurtosis = stats.kurtosis(result.resid, fisher=False, bias=True)
    if round(Skew, 1) == 0:
        print("residual errors normality Skew:in middle,perfect match")
    elif round(Skew, 1) > 0:
        print("residual errors normality Skew:close right")
    elif round(Skew, 1) < 0:
        print("residual errors normality Skew:close left")
    if round(Kurtosis, 1) == 3:
        print("residual errors normality Kurtosis:in middle,perfect match")
    elif round(Kurtosis, 1) > 3:
        print("residual errors normality Kurtosis:more peak")
    elif round(Kurtosis, 1) < 3:
        print("residual errors normality Kurtosis:more flat")

    # autocorrelation check (Durbin-Watson)
    print('\n')
    print("autocorrelation test:...................................")
    DW = np.sum(np.diff(result.resid.values)**2.0) / result.ssr
    if round(DW, 1) == 2:
        print("Durbin-Watson close to 2,there is no autocorrelation.OLS model works well")

    # multicollinearity check
    print('\n')
    print("multicollinearity test:")
    conditionNumber = result.condition_number
    if conditionNumber > 30:
        print("conditionNumber>30,multicollinearity exists")
    else:
        print("conditionNumber<=30,multicollinearity not exists")

    # residual plot, used to check homoscedasticity
    Draw_residual(list(result.resid))

'''
result.rsquared
Out[28]: 0.61510660055413524
'''


# Kendall's tau, a nonparametric test
def Kendalltau(x, y):
    print("use kendalltau,Nonparametric tests")
    r, p = stats.kendalltau(x, y)
    print("kendalltau r**2:", r**2)
    print("kendalltau p:", p)


# choose the appropriate model
def R_mode(x, y, xLabel, yLabel, formula):
    # normality test
    Normal_result = normality_check.NormalTest(list_group)
    print("normality result:", Normal_result)
    if len(list_group) > 2:
        Kendalltau(x, y)
    if Normal_result == False:
        Spearmanr(x, y)
        Kendalltau(x, y)
    if Normal_result == True:
        Pearsonr_details(x, y, xLabel, yLabel, formula)


# adjusted R squared
def Adjust_Rsquare(r_square, n, k):
    adjust_rSquare = 1 - ((1 - r_square) * (n - 1) * 1.0 / (n - k - 1))
    return adjust_rSquare

'''
n=len(x)
n=10
k=1
r_square=0.615
Adjust_Rsquare(r_square,n,k)
Out[11]: 0.566875
'''


# plotting
def Plot(x, y, yLabel, xLabel, Title):
    plt.plot(x, y, 'ro')
    plt.ylabel(yLabel)
    plt.xlabel(xLabel)
    plt.title(Title)
    plt.show()


# plot parameters
yLabel = 'Alcohol'
xLabel = 'Tobacco'
Title = 'Sales in Several UK Regions'
Plot(x, y, yLabel, xLabel, Title)
formula = 'Alcohol ~ Tobacco'


# plot the residuals
def Draw_residual(residual_list):
    x = [i for i in range(1, len(residual_list) + 1)]
    y = residual_list
    pylab.plot(x, y, 'ro')
    pylab.title("draw residual to check wrong number")

    # Pad margins so that markers don't get clipped by the axes
    pylab.margins(0.3)

    # draw the grid
    pylab.grid(True)
    pylab.show()


R_mode(x, y, xLabel, yLabel, formula)

'''
result.fittedvalues is the array of predicted y values

result.fittedvalues
Out[42]:
0 6.094983
1 5.823391
2 5.833450
3 5.400915
4 5.531682
5 4.978439
6 5.260090
7 4.767201
8 5.592035
9 6.577813
dtype: float64

# skewness of the residuals
S = stats.skew(result.resid, bias=True)
Out[44]: -0.013678125910039975

K = stats.kurtosis(result.resid, fisher=False,bias=True)
K
Out[47]: 1.5271300905736027
'''

20. Collinearity in simple regression

Example from the official repository:

https://github.com/thomas-haslwanter/statsintro_python/blob/master/ISP/Code_Quantlets/11_LinearModels/simpleModels/swim100m.csv

Load the data:

# -*- coding: utf-8 -*-
'''Simple linear models.
- "model_formulas" is based on examples in Kaplan's book "Statistical Modeling".
- "polynomial_regression" shows how to work with simple design matrices, like MATLAB's "regress" command.
'''

# Copyright(c) 2015, Thomas Haslwanter. All rights reserved, under the CC BY-SA 4.0 International License

# Import standard packages
import numpy as np
import pandas as pd

# additional packages
from statsmodels.formula.api import ols
import statsmodels.regression.linear_model as sm
from statsmodels.stats.anova import anova_lm


def model_formulas():
    ''' Define models through formulas '''
    # Get the data:
    # Development of world record times for the 100m Freestyle, for men and women.
    data = pd.read_csv('swim100m.csv')

    # Different models
    model1 = ols("time ~ sex", data).fit()         # one factor
    model2 = ols("time ~ sex + year", data).fit()  # two factors
    model3 = ols("time ~ sex * year", data).fit()  # two factors with interaction

    # Model information
    print((model1.summary()))
    print((model2.summary()))
    print((model3.summary()))

    # ANOVAs
    print('----------------- Results ANOVAs: Model 1 -----------------------')
    print((anova_lm(model1)))

    print('--------------------- Model 2 -----------------------------------')
    print((anova_lm(model2)))

    print('--------------------- Model 3 -----------------------------------')
    model3Results = anova_lm(model3)
    print(model3Results)

    # Just to check the correct run
    return model3Results['F'][0]  # should be 156.1407931415788


def polynomial_regression():
    ''' Define the model directly through the design matrix.
        Similar to MATLAB's "regress" command.
    '''
    # Generate the data: a noisy second order polynomial
    # To get reproducable values, I provide a seed value
    np.random.seed(987654321)
    t = np.arange(0, 10, 0.1)
    y = 4 + 3*t + 2*t**2 + 5*np.random.randn(len(t))

    # --- >>> START stats <<< ---
    # Make the fit. Note that this is another "OLS" than the one in "model_formulas",
    # as it works directly with the design matrix!
    M = np.column_stack((np.ones(len(t)), t, t**2))
    res = sm.OLS(y, M).fit()
    # --- >>> STOP stats <<< ---

    # Display the results
    print('Summary:')
    print((res.summary()))
    print(('The fit parameters are: {0}'.format(str(res.params))))
    print('The confidence intervals are:')
    print((res.conf_int()))

    return res.params  # should be [ 4.74244177, 2.60675788, 2.03793634]


if __name__ == '__main__':
    model_formulas()
    polynomial_regression()

The first model

model1 = ols("time ~ sex", data).fit()   # one factor

Several statistics indicate that the model is a poor fit:

R-squared: too small, so the model explains too little.

AIC/BIC: too large, so the model is not appropriate.

Omnibus: p is essentially 0, so the residuals are not normally distributed and the model is not appropriate.

Durbin-Watson: much smaller than 2, so the residuals show clear autocorrelation.

Model 2

model2 = ols("time ~ sex + year", data).fit()   # two factors

AIC/BIC: too large, so the model is not appropriate.

Omnibus: p is essentially 0, so the residuals are not normally distributed and the model is not appropriate.

Durbin-Watson: much smaller than 2, so the residuals show clear autocorrelation.

Condition number: far above 30, so multicollinearity is severe.

Model 3

model3 = ols("time ~ sex * year", data).fit()   # two factors with interaction

AIC/BIC: too large, so the model is not appropriate.

Omnibus: p is essentially 0, so the residuals are not normally distributed and the model is not appropriate.

Durbin-Watson: much smaller than 2, so the residuals show clear autocorrelation.

Condition number: far above 30, so multicollinearity is severe.

21. Bootstrap

Installing scikits.bootstrap:

http://scikits.scipy.org/bootstrap

https://wenku.baidu.com/view/0c449147336c1eb91a375d39.html

Scikits.bootstrap provides bootstrap confidence interval algorithms for scipy.

At present, it is rather feature-incomplete and in flux. However, the functions that have been written should be relatively stable as far as results.

Much of the code has been written based off the descriptions from Efron and Tibshirani's Introduction to the Bootstrap, and results should match the results obtained from following those explanations. However, the current ABC code is based off of the modified-BSD-licensed R port of the Efron bootstrap code, as I do not believe I currently have a sufficient understanding of the ABC method to write the code independently.

In any case, please contact me (Constantine Evans ) with any questions or suggestions. I'm trying to add documentation, and will be adding tests as well. I'm especially interested, however, in how the API should actually look; please let me know if you think the package should be organized differently.

The package is licensed under the Modified BSD License.

pip install scikits.bootstrap

bootstrap.py

# -*- coding: utf-8 -*-
''' Example of bootstrapping the confidence interval for the mean of a sample distribution
This function requires "bootstrap.py", which is available from
https://github.com/cgevans/scikits-bootstrap
'''

# Copyright(c) 2015, Thomas Haslwanter. All rights reserved, under the CC BY-SA 4.0 International License

import scikits

# Import standard packages
import matplotlib.pyplot as plt
import scipy as sp
from scipy import stats

# additional packages
import scikits.bootstrap as bootstrap


def generate_data():
    ''' Generate the data for the bootstrap simulation '''
    # To get reproducable values, I provide a seed value
    sp.random.seed(987654321)

    # Generate a non-normally distributed datasample
    data = stats.poisson.rvs(2, size=1000)

    # Show the data
    plt.plot(data, '.')
    plt.title('Non-normally distributed dataset: Press any key to continue')
    plt.waitforbuttonpress()
    plt.close()

    return(data)


def calc_bootstrap(data):
    ''' Find the confidence interval for the mean of the given data set with bootstrapping. '''
    # --- >>> START stats <<< ---
    # Calculate the bootstrap
    CIs = bootstrap.ci(data=data, statfunction=sp.mean)
    # --- >>> STOP stats <<< ---

    # Print the data: the "*" turns the array "CIs" into a list
    print(('The confidence intervals for the mean are: {0} - {1}'.format(*CIs)))

    return CIs


if __name__ == '__main__':
    data = generate_data()
    calc_bootstrap(data)
    input('Done')

bootstrapping 解决不知道分布情况下,计算平均值的置信区间

你可能感兴趣的:(python回归代码)