Preliminaries
In this practical we will do some model checking and model choice in R.
We need the following packages:

- ggplot2: implements the ggplot grammar of graphics in R.
- tidyverse: designed to make it easy to install and load multiple 'tidyverse' packages in a single step.
- MASS: functions and datasets to support Venables and Ripley, "Modern Applied Statistics with S" (4th edition, 2002).
- caret: for easy machine learning workflows.
- splines and splines2: regression spline functions and classes, including the bSpline() basis used below.
- mgcv: generalized additive (mixed) models, some of their extensions, and other generalized ridge regression with multiple smoothing parameter estimation by (Restricted) Marginal Likelihood, Generalized Cross Validation and others.

Make sure that these packages are downloaded and installed in R. We use the require() function to load them into the R library. Note, this does the same as library() in this case.
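If any of these are missing, one way to install them from CRAN is the one-liner below (splines ships with base R, so it is not listed):

install.packages(c("ggplot2", "tidyverse", "MASS", "caret", "splines2", "mgcv"))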
We will use the Boston data set in the MASS package to predict the median house value (medv), in Boston suburbs, based on the explanatory variable lstat (percentage of lower status of the population).
We want to build some models and then assess how well they do. For this we are going to randomly split the data into a training set (80%, for building a predictive model) and an evaluation set (20%, for evaluating the model).
As we work through the models we will calculate the usual metrics for model fit, e.g. R2 and RMSE, using
the validation data set, i.e. we will see how well it does at predicting ‘new’ data (out-of-sample validation).
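As a reminder of what these metrics measure: for $n$ validation observations $y_i$ with predictions $\hat{y}_i$, $RMSE = \sqrt{\frac{1}{n}\sum_{i=1}^{n}(y_i - \hat{y}_i)^2}$ (smaller is better), while caret's R2() by default reports the squared correlation between observed and predicted values (closer to 1 is better).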
options(warn = -1) # ignore the warnings (this also removes Jupyter's red warning output)
require(ggplot2)
require(MASS)
require(caret)
require(splines)
require(tidyverse)
require(mgcv)
require(splines2)
# load the data
data("Boston")
head(Boston)
|   | crim | zn | indus | chas | nox | rm | age | dis | rad | tax | ptratio | black | lstat | medv |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 0.00632 | 18 | 2.31 | 0 | 0.538 | 6.575 | 65.2 | 4.0900 | 1 | 296 | 15.3 | 396.90 | 4.98 | 24.0 |
| 2 | 0.02731 | 0 | 7.07 | 0 | 0.469 | 6.421 | 78.9 | 4.9671 | 2 | 242 | 17.8 | 396.90 | 9.14 | 21.6 |
| 3 | 0.02729 | 0 | 7.07 | 0 | 0.469 | 7.185 | 61.1 | 4.9671 | 2 | 242 | 17.8 | 392.83 | 4.03 | 34.7 |
| 4 | 0.03237 | 0 | 2.18 | 0 | 0.458 | 6.998 | 45.8 | 6.0622 | 3 | 222 | 18.7 | 394.63 | 2.94 | 33.4 |
| 5 | 0.06905 | 0 | 2.18 | 0 | 0.458 | 7.147 | 54.2 | 6.0622 | 3 | 222 | 18.7 | 396.90 | 5.33 | 36.2 |
| 6 | 0.02985 | 0 | 2.18 | 0 | 0.458 | 6.430 | 58.7 | 6.0622 | 3 | 222 | 18.7 | 394.12 | 5.21 | 28.7 |
# Split the data into training and test sets
set.seed(123)
# createDataPartition() does the splitting: applied to Boston$medv with
# p = 0.8, the training set holds 80% of the data; list = FALSE returns a
# matrix of row indices rather than a list
training.samples <- Boston$medv %>%
  createDataPartition(p = 0.8, list = FALSE)
train.data <- Boston[training.samples, ]
test.data <- Boston[-training.samples, ]
First let's have a look at the relationship between the two variables:

ggplot(train.data, aes(x = lstat, y = medv)) +
  geom_point() +
  geom_smooth(method = 'loess', formula = y ~ x)
This suggests a non-linear relationship between the two variables.
The standard linear regression model equation can be written as $medv = \beta_0 + \beta_1 \times lstat$.
# Fit the model
model1 <- lm(medv ~ lstat, data = train.data)
summary(model1)
Call:
lm(formula = medv ~ lstat, data = train.data)
Residuals:
Min 1Q Median 3Q Max
-15.218 -4.011 -1.123 2.025 24.459
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 34.6527 0.6230 55.62 <2e-16 ***
lstat -0.9561 0.0428 -22.34 <2e-16 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 6.144 on 405 degrees of freedom
Multiple R-squared: 0.5521, Adjusted R-squared: 0.551
F-statistic: 499.2 on 1 and 405 DF, p-value: < 2.2e-16
# Make predictions
predictions <- model1 %>%
  predict(test.data)
# Model performance
model1_performance <- data.frame(
  RMSE = RMSE(predictions, test.data$medv), # root mean squared error
  # R-squared is the ratio of the regression sum of squares to the total sum
  # of squares, i.e. the proportion of the total variation explained by the
  # regression; it lies between 0 and 1, and the closer to 1 the better the fit
  R2 = R2(predictions, test.data$medv)
)
model1_performance
| RMSE | R2 |
|---|---|
| 6.503817 | 0.513163 |
This gives an RMSE of 6.503817 and an R2 of 0.513163. The R2 is low, which is not surprising given the relationship is non-linear!
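As a quick sanity check (a sketch, reusing the predictions object from above), the same numbers can be computed by hand, which makes explicit what caret's helpers do:

# RMSE is the root mean squared prediction error; caret's R2() is, by default,
# the squared correlation between observed and predicted values
sqrt(mean((test.data$medv - predictions)^2)) # should match the RMSE above
cor(test.data$medv, predictions)^2           # should match the R2 above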
ggplot(data = train.data, aes(x = lstat, y = medv)) +
  geom_point() +
  geom_smooth(method = lm, formula = y ~ x)
A polynomial regression adds polynomial or quadratic terms to the regression equation as follows:
$medv = \beta_0 + \beta_1 \times lstat + \beta_2 \times lstat^2$
To create a predictor $x^2$ you can use the function I(), e.g. I(x^2). This raises x to the power of 2.
model2 <- lm(medv ~ lstat + I(lstat^2), data = train.data)
model2

Call:
lm(formula = medv ~ lstat + I(lstat^2), data = train.data)

Coefficients:
(Intercept)        lstat   I(lstat^2)
    42.5736      -2.2673       0.0412
Or, you can use the poly() function:

model2 <- lm(medv ~ poly(lstat, 2, raw = TRUE), data = train.data)
# Make predictions
predictions2 <- model2 %>%
  predict(test.data)
# Model performance
model2_performance <- data.frame(
  RMSE2 = RMSE(predictions2, test.data$medv),
  R22 = R2(predictions2, test.data$medv)
)
model2_performance
| RMSE2 | R22 |
|---|---|
| 5.630727 | 0.6351934 |
ggplot(data = train.data, aes(x = lstat, y = medv)) +
  geom_point() +
  geom_smooth(method = lm, formula = y ~ poly(x, 2, raw = TRUE))
This gives a slightly smaller RMSE (than with the linear model) and an increase in R2 from 0.51 to 0.63.
Not bad, but can we do better?
How about trying a polynomial of order 6?
# Fit the model
model3 <- lm(medv ~ poly(lstat, 6, raw = TRUE), data = train.data)
summary(model3)
Call:
lm(formula = medv ~ poly(lstat, 6, raw = TRUE), data = train.data)
Residuals:
Min 1Q Median 3Q Max
-13.1962 -3.1527 -0.7655 2.0404 26.7661
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 7.788e+01 6.844e+00 11.379 < 2e-16 ***
poly(lstat, 6, raw = TRUE)1 -1.767e+01 3.569e+00 -4.952 1.08e-06 ***
poly(lstat, 6, raw = TRUE)2 2.417e+00 6.779e-01 3.566 0.000407 ***
poly(lstat, 6, raw = TRUE)3 -1.761e-01 6.105e-02 -2.885 0.004121 **
poly(lstat, 6, raw = TRUE)4 6.845e-03 2.799e-03 2.446 0.014883 *
poly(lstat, 6, raw = TRUE)5 -1.343e-04 6.290e-05 -2.136 0.033323 *
poly(lstat, 6, raw = TRUE)6 1.047e-06 5.481e-07 1.910 0.056910 .
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 5.188 on 400 degrees of freedom
Multiple R-squared: 0.6845, Adjusted R-squared: 0.6798
F-statistic: 144.6 on 6 and 400 DF, p-value: < 2.2e-16
# Make predictions
predictions3 <- model3 %>%
  predict(test.data)
# Model performance
model3_performance <- data.frame(
  RMSE3 = RMSE(predictions3, test.data$medv),
  R23 = R2(predictions3, test.data$medv)
)
model3_performance
| RMSE3 | R23 |
|---|---|
| 5.349512 | 0.6759031 |
ggplot(data = train.data, aes(x = lstat, y = medv)) +
  geom_point() +
  stat_smooth(method = lm, formula = y ~ poly(x, 6, raw = TRUE))
Reduced RMSE and an increase in R2, but could you interpret the coefficients?
Splines provide a way to smoothly interpolate between fixed points, called knots. Polynomial regression is computed between knots; in other words, splines are series of polynomial segments strung together, joining at knots.
The R package splines2 includes the function bSpline() for creating a B-spline term in a regression model. You need to specify two parameters: the degree of the polynomial and the location of the knots. Here we place knots at the quartiles of lstat:
knots <- quantile(train.data$lstat, p = c(0.25, 0.5, 0.75))
And we will create a model using a cubic spline (each segment has a polynomial regression of degree = 3):

model4 <- lm(medv ~ bSpline(lstat, knots = knots), data = train.data)
# Make predictions
predictions <- model4 %>%
  predict(test.data)
# Model performance
model4_performance <- data.frame(
  RMSE = RMSE(predictions, test.data$medv),
  R2 = R2(predictions, test.data$medv)
)
summary(model4)
model4_performance
Call:
lm(formula = medv ~ bSpline(lstat, knots = knots), data = train.data)
Residuals:
Min 1Q Median 3Q Max
-12.952 -3.106 -0.821 2.063 26.861
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 52.290 3.344 15.639 < 2e-16 ***
bSpline(lstat, knots = knots)1 -15.740 4.884 -3.223 0.00137 **
bSpline(lstat, knots = knots)2 -28.181 3.094 -9.109 < 2e-16 ***
bSpline(lstat, knots = knots)3 -30.083 3.724 -8.077 7.89e-15 ***
bSpline(lstat, knots = knots)4 -41.640 3.713 -11.214 < 2e-16 ***
bSpline(lstat, knots = knots)5 -41.442 5.014 -8.265 2.08e-15 ***
bSpline(lstat, knots = knots)6 -41.308 4.716 -8.760 < 2e-16 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 5.184 on 400 degrees of freedom
Multiple R-squared: 0.685, Adjusted R-squared: 0.6803
F-statistic: 145 on 6 and 400 DF, p-value: < 2.2e-16
| RMSE | R2 |
|---|---|
| 5.366847 | 0.6796817 |
ggplot(data = train.data, aes(x = lstat, y = medv)) +
  geom_point() +
  geom_smooth(method = lm, formula = y ~ splines2::bSpline(x, df = 3))
A slight increase in R2, but RMSE has gone up a little.
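To see where the six spline coefficients in the summary come from, it can help to inspect the basis matrix that bSpline() builds (a quick check reusing the knots defined above; with 3 interior knots and the default cubic degree, the basis should have 3 + 3 = 6 columns):

# One column per spline coefficient: 3 interior knots + degree 3 = 6 basis functions
basis <- splines2::bSpline(train.data$lstat, knots = knots)
dim(basis) # one row per training observation, 6 columns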
Where you have a non-linear relationship, polynomial regression may not be flexible enough to capture it, and spline terms require specifying the knots. Generalised additive models, or GAMs, provide a mechanism to automatically fit a spline regression, i.e. you don't have to choose the knots. This can be done using the mgcv package:
model5 <- gam(medv ~ s(lstat), data = train.data)
# Make predictions
predictions <- model5 %>%
  predict(test.data)
# Model performance
model5_performance <- data.frame(
  RMSE = RMSE(predictions, test.data$medv),
  R2 = R2(predictions, test.data$medv)
)
summary(model5)
model5_performance
Family: gaussian
Link function: identity
Formula:
medv ~ s(lstat)
Parametric coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 22.5106 0.2567 87.69 <2e-16 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Approximate significance of smooth terms:
edf Ref.df F p-value
s(lstat) 7.355 8.338 104.1 <2e-16 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
R-sq.(adj) = 0.681 Deviance explained = 68.7%
GCV = 27.381 Scale est. = 26.819 n = 407
| RMSE | R2 |
|---|---|
| 5.318856 | 0.6760512 |
ggplot(data = train.data, aes(x = lstat, y = medv)) +
  geom_point() +
  geom_smooth(method = gam, formula = y ~ s(x))
The term s(lstat) tells the gam() function to fit a smooth function, with the default being to use a 'penalised' spline (with the number of knots and their location found using penalty functions).
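You can still influence the fit if you want to, for example by raising the maximum basis dimension via the k argument and letting the penalty decide how much of it to use (a sketch; k = 20 is an arbitrary illustrative choice, not part of the original analysis):

# A richer basis: the penalty shrinks the effective degrees of freedom back
# down, so the reported edf should remain well below k
model5b <- gam(medv ~ s(lstat, k = 20), data = train.data, method = "REML")
summary(model5b)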
We want to try to identify the intra- (within) and inter- (between) yearly trends.
# Load the data, remember to set the working directory or use Import Dataset
CO2 <- read.csv("D:/Code/Datasets/manua_loa_co2.csv", header = TRUE)
head(CO2)
|   | year | co2 | month | Date |
|---|---|---|---|---|
| 1 | 1958 | 315.71 | 3 | 1/03/1958 |
| 2 | 1958 | 317.45 | 4 | 1/04/1958 |
| 3 | 1958 | 317.50 | 5 | 1/05/1958 |
| 4 | 1958 | 317.10 | 6 | 1/06/1958 |
| 5 | 1958 | 315.86 | 7 | 1/07/1958 |
| 6 | 1958 | 314.93 | 8 | 1/08/1958 |
We want to look at the inter-annual (between-year) trend first. We can convert the date into a continuous time variable (keeping the full series for visualisation and taking a subset of years for modelling).
CO2$time <- as.integer(as.Date(CO2$Date, format = "%d/%m/%Y"))
CO2_dat <- CO2
CO2 <- CO2[CO2$year %in% 2000:2010, ]
ggplot(CO2_dat, aes(x = time, y = co2)) +
  geom_line()
The model being fit here is of the form $y = \beta_0 + f_{trend}(time) + \epsilon$, where $\epsilon \sim N(0, \sigma^2)$. We can fit a GAM to these data as follows:
CO2_time <- gam(co2 ~ s(time), data = CO2, method = "REML")
summary(CO2_time)
Family: gaussian
Link function: identity
Formula:
co2 ~ s(time)
Parametric coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 379.5817 0.1906 1992 <2e-16 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Approximate significance of smooth terms:
edf Ref.df F p-value
s(time) 1 1.001 1104 <2e-16 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
R-sq.(adj) = 0.894 Deviance explained = 89.5%
-REML = 291.24 Scale est. = 4.7949 n = 132
plot(CO2_time)
Note the effective degrees of freedom (edf) is one, which indicates a linear fit.
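Since the edf is one, the smooth has effectively collapsed to a straight line, so a plain linear model should give a near-identical fit (a quick cross-check, not part of the original analysis):

# With edf = 1 the GAM is essentially linear in time, so lm() should report
# an R-squared close to the 0.894 shown above
CO2_lm <- lm(co2 ~ time, data = CO2)
summary(CO2_lm)$r.squared

All well and good, until we check how well the model fits: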
# Split the output into 4 panes
par(mfrow = c(2, 2))
gam.check(CO2_time)
Method: REML Optimizer: outer newton
full convergence after 8 iterations.
Gradient range [-0.0001447502,6.463421e-05]
(score 291.2359 & scale 4.79491).
Hessian positive definite, eigenvalue range [0.0001447177,64.99994].
Model rank = 10 / 10
Basis dimension (k) checking results. Low p-value (k-index<1) may
indicate that k is too low, especially if edf is close to k'.
k' edf k-index p-value
s(time) 9 1 0.16 <2e-16 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
The residuals have a clear rise-and-fall pattern: clearly there are some within-year patterns. Let's try again, and introduce something called a cyclical smoother. This will be a model of the form $y = \beta_0 + f_{cyclical}(month) + f_{trend}(time) + \epsilon$, where $\epsilon \sim N(0, \sigma^2)$. For the cyclical smoother $f_{cyclical}(month)$ we use the bs argument to choose the type of smoother, and the k argument to choose the number of knots (as cyclic cubic regression splines have a set number of knots). We use 12 knots, because there are 12 months.
# Fit the model
CO2_season_time <- gam(co2 ~ s(month, bs = 'cc', k = 12) + s(time), data = CO2, method = "REML")
# Look at the smoothed terms
par(mfrow = c(1, 2))
plot(CO2_season_time)
We can see that the cyclical smoother is picking up the monthly rise and fall in CO2. Let's see how the model diagnostics look now:
par(mfrow = c(1, 2))
gam.check(CO2_season_time)
Method: REML Optimizer: outer newton
full convergence after 6 iterations.
Gradient range [-2.640054e-06,5.25847e-08]
(score 87.72571 & scale 0.1441556).
Hessian positive definite, eigenvalue range [1.026183,65.43149].
Model rank = 20 / 20
Basis dimension (k) checking results. Low p-value (k-index<1) may
indicate that k is too low, especially if edf is close to k'.
k' edf k-index p-value
s(month) 10.00 8.67 0.72 <2e-16 ***
s(time) 9.00 6.61 0.87 0.045 *
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
summary(CO2_season_time)
Family: gaussian
Link function: identity
Formula:
co2 ~ s(month, bs = "cc", k = 12) + s(time)
Parametric coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 379.58174 0.03305 11486 <2e-16 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Approximate significance of smooth terms:
edf Ref.df F p-value
s(month) 8.67 10.00 410.5 <2e-16 ***
s(time) 6.61 7.74 4909.2 <2e-16 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
R-sq.(adj) = 0.997 Deviance explained = 99.7%
-REML = 87.726 Scale est. = 0.14416 n = 132
Much better indeed: the residuals look normally distributed with no obvious pattern over time.
What are the R2 and RMSE for this method? Are they better than for the long-term trend model?
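One possible way to check (a sketch; note this evaluates the fits in-sample, since we did not hold out a test set for the CO2 data):

# Compare the two GAMs on the modelled years using the same caret helpers
pred_trend <- predict(CO2_time, CO2)
pred_season <- predict(CO2_season_time, CO2)
data.frame(
  model = c("trend only", "season + trend"),
  RMSE = c(RMSE(pred_trend, CO2$co2), RMSE(pred_season, CO2$co2)),
  R2 = c(R2(pred_trend, CO2$co2), R2(pred_season, CO2$co2))
)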