
Implementing 100 Algorithms in Python: Machine Learning Algorithms Implemented in Python

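The seven algorithms covered are:

Linear regression

Logistic regression

Perceptron

K-nearest neighbors

K-means clustering

Neural network with a single hidden layer

Multinomial logistic regression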

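01 Linear Regression

In linear regression we want to build a model that fits the relationship between a dependent variable y and one or more independent (predictor) variables x.

Given:

a dataset $\{(x^{(1)}, y^{(1)}), \dots, (x^{(m)}, y^{(m)})\}$

where each $x^{(i)}$ is a d-dimensional vector $x^{(i)} = (x^{(i)}_1, \dots, x^{(i)}_d)$

and each $y^{(i)}$ is a scalar target variable

A linear regression model can be viewed as a very simple neural network:

it has a real-valued weight vector $w = (w_1, \dots, w_d)$

it has a real-valued bias b

it uses the identity function as its activation function

A linear regression model can be trained using either

a) gradient descent, or

b) the normal equation (closed-form solution): $w = (X^T X)^{-1} X^T y$

where X is a matrix of shape $(n_{samples}, n_{features})$ that holds all training examples, one per row (so $n_{samples} = m$ and $n_{features} = d$).

The normal equation requires computing the inverse of $X^T X$. The computational complexity of this operation lies between $O(n_{features}^{2.4})$ and $O(n_{features}^{3})$, depending on the implementation. Therefore, if the number of features in the training set is large, training with the normal equation becomes very slow.

Training a linear regression model involves several steps. First (in step 0) the model parameters are initialized. The remaining steps are repeated for a specified number of training iterations or until the parameters have converged.

Step 0:

Initialize the weight vector and bias with zeros (or small random values), or compute the parameters directly using the normal equation

Step 1 (only needed when training with gradient descent):

Compute a linear combination of the input features and weights. This can be done in one step for all training examples, using vectorization and broadcasting:

$$\hat{y} = X \cdot w + b$$

where X is a matrix of shape $(n_{samples}, n_{features})$ that holds all training examples, and · denotes the dot product.

Step 2 (only needed when training with gradient descent):

Compute the cost (mean squared error) over the training set:

$$J(w, b) = \frac{1}{m} \sum_{i=1}^{m} \left( \hat{y}^{(i)} - y^{(i)} \right)^2$$

Step 3 (only needed when training with gradient descent):

Compute the partial derivatives of the cost function with respect to each parameter:

$$\frac{\partial J}{\partial w_j} = \frac{2}{m} \sum_{i=1}^{m} \left( \hat{y}^{(i)} - y^{(i)} \right) x^{(i)}_j, \qquad \frac{\partial J}{\partial b} = \frac{2}{m} \sum_{i=1}^{m} \left( \hat{y}^{(i)} - y^{(i)} \right)$$

The gradient containing all partial derivatives can then be computed as follows:

$$\nabla_w J = \frac{2}{m} X^T \cdot (\hat{y} - y), \qquad \nabla_b J = \frac{2}{m} \sum_{i=1}^{m} \left( \hat{y}^{(i)} - y^{(i)} \right)$$

Step 4 (only needed when training with gradient descent):

Update the weight vector and bias:

$$w = w - \eta \, \nabla_w J, \qquad b = b - \eta \, \nabla_b J$$

where $\eta$ is the learning rate.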
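The two training routes described above can be compared on a tiny problem. The sketch below is illustrative only: the noise-free toy data and the names `X_demo`, `y_demo`, `w_closed`, and `w_gd` are assumptions made for this example. It also solves the linear system with `np.linalg.solve` instead of forming the inverse explicitly, which is a swapped-in (but mathematically equivalent) way of evaluating the normal equation.

```python
import numpy as np

# Hypothetical toy data: y = 5 + 3x with no noise, so the exact answer is known.
rng = np.random.default_rng(0)
X_demo = 2 * rng.random((50, 1))
y_demo = 5 + 3 * X_demo

# Prepend a column of ones so the bias is learned as the first weight.
X_b = np.c_[np.ones((X_demo.shape[0], 1)), X_demo]

# Normal equation: solve (X^T X) w = X^T y rather than inverting X^T X.
w_closed = np.linalg.solve(X_b.T @ X_b, X_b.T @ y_demo)

# Gradient descent on the same data (steps 1 to 4).
w_gd = np.zeros((2, 1))
eta = 0.1  # learning rate, chosen by hand for this toy problem
for _ in range(5000):
    y_hat = X_b @ w_gd                                     # step 1
    grad = (2 / X_b.shape[0]) * X_b.T @ (y_hat - y_demo)   # step 3
    w_gd -= eta * grad                                     # step 4

print("closed form:     ", w_closed.ravel())
print("gradient descent:", w_gd.ravel())
```

Both approaches should recover a bias close to 5 and a weight close to 3. On real data with many features, gradient descent avoids the cost of solving the $n_{features} \times n_{features}$ system.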

In [4]:

import numpy as np

import matplotlib.pyplot as plt

from sklearn.model_selection import train_test_split

np.random.seed(123)

Dataset

In [5]:

# We will use a simple training set

X = 2 * np.random.rand(500, 1)

y = 5 + 3 * X + np.random.randn(500, 1)

fig = plt.figure(figsize=(8,6))

plt.scatter(X, y)

plt.title("Dataset")

plt.xlabel("First feature")

plt.ylabel("Second feature")

plt.show()

In [6]:

# Split the data into a training and test set

X_train, X_test, y_train, y_test = train_test_split(X, y)

print(f'Shape X_train: {X_train.shape}')

print(f'Shape y_train: {y_train.shape}')

print(f'Shape X_test: {X_test.shape}')

print(f'Shape y_test: {y_test.shape}')

Shape X_train: (375, 1)

Shape y_train: (375, 1)

Shape X_test: (125, 1)

Shape y_test: (125, 1)

Linear regression class

In [23]:

class LinearRegression:

    def __init__(self):
        pass

    def train_gradient_descent(self, X, y, learning_rate=0.01, n_iters=100):
        """
        Trains a linear regression model using gradient descent
        """
        # Step 0: Initialize the parameters
        n_samples, n_features = X.shape
        self.weights = np.zeros(shape=(n_features,1))
        self.bias = 0
        costs = []

        for i in range(n_iters):
            # Step 1: Compute a linear combination of the input features and weights
            y_predict = np.dot(X, self.weights) + self.bias

            # Step 2: Compute cost over training set
            cost = (1 / n_samples) * np.sum((y_predict - y)**2)
            costs.append(cost)

            if i % 100 == 0:
                print(f"Cost at iteration {i}: {cost}")

            # Step 3: Compute the gradients
            dJ_dw = (2 / n_samples) * np.dot(X.T, (y_predict - y))
            dJ_db = (2 / n_samples) * np.sum((y_predict - y))

            # Step 4: Update the parameters
            self.weights = self.weights - learning_rate * dJ_dw
            self.bias = self.bias - learning_rate * dJ_db

        return self.weights, self.bias, costs

    def train_normal_equation(self, X, y):
        """
        Trains a linear regression model using the normal equation
        """
        # X is expected to already contain a leading column of ones,
        # so the bias is learned as the first weight and self.bias stays 0
        self.weights = np.dot(np.dot(np.linalg.inv(np.dot(X.T, X)), X.T), y)
        self.bias = 0

        return self.weights, self.bias

    def predict(self, X):
        return np.dot(X, self.weights) + self.bias

Training with gradient descent

In [24]:

regressor = LinearRegression()

w_trained, b_trained, costs = regressor.train_gradient_descent(X_train, y_train, learning_rate=0.005, n_iters=600)

fig = plt.figure(figsize=(8,6))

# One cost value is recorded per iteration, so use len(costs) for the x-axis
plt.plot(np.arange(len(costs)), costs)

plt.title("Development of cost during training")

plt.xlabel("Number of iterations")

plt.ylabel("Cost")

plt.show()

Cost at iteration 0: 66.45256981003433

Cost at iteration 100: 2.2084346146095934

Cost at iteration 200: 1.2797812854182806

Cost at iteration 300: 1.2042189195356685

Cost at iteration 400: 1.1564867816573

Cost at iteration 500: 1.121391041394467

Testing (gradient descent model)

In [28]:

n_samples, _ = X_train.shape

n_samples_test, _ = X_test.shape

y_p_train = regressor.predict(X_train)

y_p_test = regressor.predict(X_test)

error_train = (1 / n_samples) * np.sum((y_p_train - y_train) ** 2)

error_test = (1 / n_samples_test) * np.sum((y_p_test - y_test) ** 2)

print(f"Error on training set: {np.round(error_train, 4)}")

print(f"Error on test set: {np.round(error_test)}")

Error on training set: 1.0955

Error on test set: 1.0

Training with the normal equation

# To compute the parameters using the normal equation, we add a bias value of 1 to each input example

X_b_train = np.c_[np.ones((n_samples)), X_train]

X_b_test = np.c_[np.ones((n_samples_test)), X_test]

reg_normal = LinearRegression()

w_trained = reg_normal.train_normal_equation(X_b_train, y_train)

Testing (normal equation model)

y_p_train = reg_normal.predict(X_b_train)

y_p_test = reg_normal.predict(X_b_test)

error_train = (1 / n_samples) * np.sum((y_p_train - y_train) ** 2)

error_test = (1 / n_samples_test) * np.sum((y_p_test - y_test) ** 2)

print(f"Error on training set: {np.round(error_train, 4)}")

print(f"Error on test set: {np.round(error_test, 4)}")

Error on training set: 1.0228

Error on test set: 1.0432

Visualizing the test predictions

# Plot the test predictions

fig = plt.figure(figsize=(8,6))

plt.scatter(X_train, y_train)

plt.scatter(X_test, y_p_test)

plt.xlabel("First feature")

plt.ylabel("Second feature")

plt.show()

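02 Logistic Regression

In logistic regression we model a binary output variable as a function of a linear combination of the input features. For example, we could try to predict the outcome of an election (win or lose) from the amount of money and time a candidate spent campaigning. The algorithm works as follows.

Given:

a dataset $\{(x^{(1)}, y^{(1)}), \dots, (x^{(m)}, y^{(m)})\}$

where each $x^{(i)}$ is a d-dimensional vector $x^{(i)} = (x^{(i)}_1, \dots, x^{(i)}_d)$

and each $y^{(i)} \in \{0, 1\}$ is a binary target variable

A logistic regression model can be viewed as a very simple neural network:

it has a real-valued weight vector $w = (w_1, \dots, w_d)$

it has a real-valued bias b

it uses the sigmoid function as its activation function

Unlike linear regression, logistic regression has no closed-form solution. However, the cost function is convex, so we can train the model with gradient descent. In fact, provided the learning rate is small enough and enough training iterations are used, gradient descent (or any other optimization algorithm) converges toward the global minimum.

Training a logistic regression model involves several steps. First (in step 0) the model parameters are initialized. The remaining steps are repeated for a specified number of training iterations or until the parameters have converged.

Step 0: Initialize the weight vector and bias with zeros (or small random values)

Step 1: Compute a linear combination of the input features and weights. This can be done in one step for all training examples, using vectorization and broadcasting:

$$a = X \cdot w + b$$

where X is a matrix of shape $(n_{samples}, n_{features})$ that holds all training examples, and · denotes the dot product.

Step 2: Apply the sigmoid activation function, which returns values between 0 and 1:

$$\hat{y} = \sigma(a) = \frac{1}{1 + \exp(-a)}$$

Step 3: Compute the cost over the whole training set.

We want to model the probability that the target value is 1. During training we therefore adapt the parameters such that the model outputs high values for examples with a positive label (true label 1) and low values for examples with a negative label (true label 0). This is reflected in the cost function:

$$J(w, b) = -\frac{1}{m} \sum_{i=1}^{m} \left[ y^{(i)} \log \hat{y}^{(i)} + \left(1 - y^{(i)}\right) \log \left(1 - \hat{y}^{(i)}\right) \right]$$

Step 4: Compute the gradient of the cost function with respect to the weight vector and bias.

A detailed explanation of this derivation can be found here (https://stats.stackexchange.com/questions/278771/how-is-the-cost-function-from-logistic-regression-derivated).

The general form is:

$$\frac{\partial J}{\partial w_j} = \frac{1}{m} \sum_{i=1}^{m} \left( \hat{y}^{(i)} - y^{(i)} \right) x^{(i)}_j$$

For the derivative with respect to the bias, the input $x^{(i)}_j$ is 1.

Step 5: Update the weights and bias.

$$w = w - \eta \, \nabla_w J, \qquad b = b - \eta \, \nabla_b J$$

where $\eta$ is the learning rate.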
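One way to gain confidence in the gradient formula from step 4 is to compare it with a finite-difference approximation of the cost from step 3. The following is a minimal sketch under stated assumptions: the random toy data and the helper names `sigmoid`, `loss`, `dw_num`, and `db_num` are made up for this check and are not part of the article's class.

```python
import numpy as np

def sigmoid(a):
    return 1 / (1 + np.exp(-a))

def loss(w, b, X, y):
    # Binary cross-entropy averaged over all examples (step 3)
    y_hat = sigmoid(X @ w + b)
    return -np.mean(y * np.log(y_hat) + (1 - y) * np.log(1 - y_hat))

# Hypothetical toy problem: 20 examples, 3 features, random binary labels
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(20, 3))
y_demo = rng.integers(0, 2, size=(20, 1)).astype(float)
w = rng.normal(size=(3, 1))
b = 0.5

# Analytic gradients (step 4)
y_hat = sigmoid(X_demo @ w + b)
dw = X_demo.T @ (y_hat - y_demo) / X_demo.shape[0]
db = np.mean(y_hat - y_demo)

# Numerical gradients via central differences
eps = 1e-6
dw_num = np.zeros_like(w)
for j in range(w.shape[0]):
    step = np.zeros_like(w)
    step[j] = eps
    dw_num[j] = (loss(w + step, b, X_demo, y_demo)
                 - loss(w - step, b, X_demo, y_demo)) / (2 * eps)
db_num = (loss(w, b + eps, X_demo, y_demo)
          - loss(w, b - eps, X_demo, y_demo)) / (2 * eps)

print("max |dw - dw_num|:", np.max(np.abs(dw - dw_num)))
print("|db - db_num|:    ", abs(db - db_num))
```

If the analytic and numerical gradients agree to several decimal places, the update rule in step 5 is moving the parameters along the true gradient of the cost.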

In [24]:

import numpy as np

from sklearn.model_selection import train_test_split

from sklearn.datasets import make_blobs

import matplotlib.pyplot as plt

np.random.seed(123)

%matplotlib inline

Dataset

In [25]:

# We will perform logistic regression using a simple toy dataset of two classes

X, y_true = make_blobs(n_samples= 1000, centers=2)

fig = plt.figure(figsize=(8,6))

plt.scatter(X[:,0], X[:,1], c=y_true)

plt.title("Dataset")

plt.xlabel("First feature")

plt.ylabel("Second feature")

plt.show()

In [26]:

# Reshape targets to get column vector with shape (n_samples, 1)

y_true = y_true[:, np.newaxis]

# Split the data into a training and test set

X_train, X_test, y_train, y_test = train_test_split(X, y_true)

print(f'Shape X_train: {X_train.shape}')

print(f'Shape y_train: {y_train.shape}')

print(f'Shape X_test: {X_test.shape}')

print(f'Shape y_test: {y_test.shape}')

Shape X_train: (750, 2)

Shape y_train: (750, 1)

Shape X_test: (250, 2)

Shape y_test: (250, 1)

Logistic regression class

In [27]:

class LogisticRegression:

    def __init__(self):
        pass

    def sigmoid(self, a):
        return 1 / (1 + np.exp(-a))

    def train(self, X, y_true, n_iters, learning_rate):
        """
        Trains the logistic regression model on given data X and targets y
        """
        # Step 0: Initialize the parameters
        n_samples, n_features = X.shape
        self.weights = np.zeros((n_features, 1))
        self.bias = 0
        costs = []

        for i in range(n_iters):
            # Step 1 and 2: Compute a linear combination of the input features and weights,
            # apply the sigmoid activation function
            y_predict = self.sigmoid(np.dot(X, self.weights) + self.bias)

            # Step 3: Compute the cost over the whole training set.
            cost = (- 1 / n_samples) * np.sum(y_true * np.log(y_predict) + (1 - y_true) * (np.log(1 - y_predict)))

            # Step 4: Compute the gradients
            dw = (1 / n_samples) * np.dot(X.T, (y_predict - y_true))
            db = (1 / n_samples) * np.sum(y_predict - y_true)

            # Step 5: Update the parameters
            self.weights = self.weights - learning_rate * dw
            self.bias = self.bias - learning_rate * db

            costs.append(cost)
            if i % 100 == 0:
                print(f"Cost after iteration {i}: {cost}")

        return self.weights, self.bias, costs

    def predict(self, X):
        """
        Predicts binary labels for a set of examples X.
        """
        y_predict = self.sigmoid(np.dot(X, self.weights) + self.bias)
        # Threshold the predicted probabilities at 0.5
        y_predict_labels = [1 if elem > 0.5 else 0 for elem in y_predict]

        return np.array(y_predict_labels)[:, np.newaxis]

Initializing and training the model

In [29]:

regressor = LogisticRegression()

w_trained, b_trained, costs = regressor.train(X_train, y_train, n_iters=600, learning_rate=0.009)

fig = plt.figure(figsize=(8,6))

plt.plot(np.arange(600), costs)

plt.title("Development of cost over training")

plt.xlabel("Number of iterations")

plt.ylabel("Cost")

plt.show()

Cost after iteration 0: 0.6931471805599453

Cost after iteration 100: 0.046514002935609956

Cost after iteration 200: 0.02405337743999163

Cost after iteration 300: 0.016354408151412207

Cost after iteration 400: 0.012445770521974634

Cost after iteration 500: 0.010073981792906512

Testing the model

In [31]:

y_p_train = regressor.predict(X_train)

y_p_test = regressor.predict(X_test)

print(f"train accuracy: {100 - np.mean(np.abs(y_p_train - y_train)) * 100}%")

# The mean absolute error is a fraction, so it is scaled by 100 here as well
print(f"test accuracy: {100 - np.mean(np.abs(y_p_test - y_test)) * 100}%")

train accuracy: 100.0%

test accuracy: 100.0%

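03 Perceptron

The perceptron is a simple supervised machine learning algorithm and one of the earliest neural network architectures. It was introduced by Rosenblatt in the late 1950s. A perceptron is a binary linear classifier that uses a (d-1)-dimensional hyperplane to map a set of training examples (d-dimensional input vectors) onto binary output values. It works as follows:

Given:

a dataset $\{(x^{(1)}, y^{(1)}), \dots, (x^{(m)}, y^{(m)})\}$

where each $x^{(i)}$ is a d-dimensional vector $x^{(i)} = (x^{(i)}_1, \dots, x^{(i)}_d)$

and each $y^{(i)} \in \{0, 1\}$ is a scalar (binary) target variable

A perceptron can be viewed as a very simple neural network:

it has a real-valued weight vector $w = (w_1, \dots, w_d)$

it has a real-valued bias b

it uses the Heaviside step function as its activation function

A perceptron is trained iteratively with the perceptron learning rule, an update that resembles gradient descent. The training algorithm involves several steps. First (in step 0) the model parameters are initialized. The remaining steps are repeated for a specified number of training iterations or until the parameters have converged.

Step 0: Initialize the weight vector and bias with zeros (or small random values)

Step 1: Compute a linear combination of the input features and weights. This can be done in one step for all training examples, using vectorization and broadcasting:

$$a = X \cdot w + b$$

where X is a matrix of shape $(n_{samples}, n_{features})$ that holds all training examples, and · denotes the dot product.

Step 2: Apply the Heaviside step function, which returns a binary value:

$$\hat{y}^{(i)} = \begin{cases} 1 & \text{if } a^{(i)} \geq 0 \\ 0 & \text{otherwise} \end{cases}$$

Step 3: Compute the updates for the weight vector and bias using the perceptron learning rule.

$$\Delta w = \eta \, X^T \cdot (y - \hat{y}), \qquad \Delta b = \eta \sum_{i=1}^{m} \left( y^{(i)} - \hat{y}^{(i)} \right)$$

where $\eta$ is the learning rate.

Step 4: Update the weight vector and bias.

$$w = w + \Delta w, \qquad b = b + \Delta b$$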
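To see the learning rule at work on something small enough to trace by hand, here is a hedged sketch that trains a perceptron on the logical AND function. Two things are assumptions of this example rather than the article's method: the data (`X_and`, `y_and`) is invented for illustration, and the update is applied one example at a time (the classic online variant), whereas the steps above update on the whole training set at once.

```python
import numpy as np

def step(a):
    # Heaviside step function: 1 if the activation is non-negative, else 0
    return 1 if a >= 0 else 0

# Logical AND: linearly separable, so the perceptron rule can fit it
X_and = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y_and = np.array([0, 0, 0, 1])

w = np.zeros(2)   # step 0: initialize weights
b = 0.0           # step 0: initialize bias
eta = 1.0         # learning rate (1.0 keeps the arithmetic exact)

for epoch in range(20):
    n_errors = 0
    for x_i, y_i in zip(X_and, y_and):
        y_hat = step(np.dot(w, x_i) + b)   # steps 1 and 2
        error = y_i - y_hat                # 0 when the prediction is right
        w += eta * error * x_i             # steps 3 and 4 for the weights
        b += eta * error                   # steps 3 and 4 for the bias
        n_errors += int(error != 0)
    if n_errors == 0:                      # a full pass without mistakes
        break

predictions = [step(np.dot(w, x_i) + b) for x_i in X_and]
print("weights:", w, "bias:", b)
print("predictions:", predictions)
```

Because the parameters only change when an example is misclassified, training stops changing them as soon as a separating hyperplane has been found.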

In [1]:

import numpy as np

import matplotlib.pyplot as plt

from sklearn.datasets import make_blobs

from sklearn.model_selection import train_test_split

np.random.seed(123)

%matplotlib inline

Dataset

In [2]:

X, y = make_blobs(n_samples=1000, centers=2)

fig = plt.figure(figsize=(8,6))

plt.scatter(X[:,0], X[:,1], c=y)

plt.title("Dataset")

plt.xlabel("First feature")

plt.ylabel("Second feature")

plt.show()

In [3]:

y_true = y[:, np.newaxis]

X_train, X_test, y_train, y_test = train_test_split(X, y_true)

print(f'Shape X_train: {X_train.shape}')

print(f'Shape y_train: {y_train.shape}')

print(f'Shape X_test: {X_test.shape}')

print(f'Shape y_test: {y_test.shape}')

Shape X_train: (750, 2)

Shape y_train: (750, 1)

Shape X_test: (250, 2)

Shape y_test: (250, 1)

Perceptron class

In [6]:

class Perceptron():

    def __init__(self):
        pass

    def train(self, X, y, learning_rate=0.05, n_iters=100):
        n_samples, n_features = X.shape

        # Step 0: Initialize the parameters
        self.weights = np.zeros((n_features,1))
        self.bias = 0

        for i in range(n_iters):
            # Step 1: Compute the activation
            a = np.dot(X, self.weights) + self.bias

            # Step 2: Compute the output
            y_predict = self.step_function(a)

            # Step 3: Compute weight updates
            delta_w = learning_rate * np.dot(X.T, (y - y_predict))
            delta_b = learning_rate * np.sum(y - y_predict)

            # Step 4: Update the parameters
            self.weights += delta_w
            self.bias += delta_b

        return self.weights, self.bias

    def step_function(self, x):
        # Heaviside step function applied element-wise, returned as a column vector
        return np.array([1 if elem >= 0 else 0 for elem in x])[:, np.newaxis]

    def predict(self, X):
        a = np.dot(X, self.weights) + self.bias
        return self.step_function(a)

Initializing and training the model

In [7]:

p = Perceptron()

w_trained, b_trained = p.train(X_train, y_train,learning_rate=0.05, n_iters=500)

Testing

In [10]:

y_p_train = p.predict(X_train)

y_p_test = p.predict(X_test)

print(f"training accuracy: {100 - np.mean(np.abs(y_p_train - y_train)) * 100}%")

print(f"test accuracy: {100 - np.mean(np.abs(y_p_test - y_test)) * 100}%")

training accuracy: 100.0%

test accuracy: 100.0%

Visualizing the decision boundary

In [13]:

def plot_hyperplane(X, y, weights, bias):
    """
    Plots the dataset and the estimated decision hyperplane
    """
    # The boundary w_0 * x_0 + w_1 * x_1 + b = 0, solved for x_1
    slope = - weights[0]/weights[1]
    intercept = - bias/weights[1]
    x_hyperplane = np.linspace(-10,10,10)
    y_hyperplane = slope * x_hyperplane + intercept

    fig = plt.figure(figsize=(8,6))
    plt.scatter(X[:,0], X[:,1], c=y)
    plt.plot(x_hyperplane, y_hyperplane, '-')
    plt.title("Dataset and fitted decision hyperplane")
    plt.xlabel("First feature")
    plt.ylabel("Second feature")
    plt.show()

In [14]:

plot_hyperplane(X, y, w_trained, b_trained)

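04 K-Nearest Neighbors

The k-nn algorithm is a simple supervised machine learning algorithm that can be used for both classification and regression. It is an instance-based algorithm: instead of estimating a model, it stores all training examples in memory and makes predictions using a similarity measure.

Given an input example, the k-nn algorithm retrieves the k most similar instances from memory. Similarity is defined in terms of distance, that is, the training examples with the smallest (Euclidean) distance to the input example are considered the most similar.

The target value of the input example is computed as follows:

Classification:

a) unweighted: output the most common class among the k nearest neighbors

b) weighted: for each class, sum up the weights of the k nearest neighbors belonging to it, and output the class with the highest total weight

Regression:

a) unweighted: output the average of the target values of the k nearest neighbors

b) weighted: sum up the weighted target values of the k nearest neighbors and divide the result by the sum of all weights

The weighted k-nn version is a refined version of the algorithm in which the contribution of each neighbor is weighted according to its distance to the query point. Below, we implement the basic (unweighted) version of the k-nn algorithm and use it to classify the digits dataset from sklearn.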
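Since the implementation in this section covers only the unweighted variant, a short sketch of the weighted rules may help. It is an illustration under assumptions, not the article's code: the weighting function (inverse distance, `1 / distance`) is one common choice that the text does not specify, and the function names and toy arrays are hypothetical.

```python
import numpy as np

def nearest_with_weights(X_train, x_query, k, eps=1e-12):
    """Return indices of the k nearest neighbors and inverse-distance weights."""
    dists = np.sqrt(np.sum((X_train - x_query) ** 2, axis=1))
    nearest = np.argsort(dists)[:k]
    # eps avoids a division by zero when the query coincides with a training point
    weights = 1.0 / (dists[nearest] + eps)
    return nearest, weights

def weighted_knn_classify(X_train, y_train, x_query, k=3):
    nearest, weights = nearest_with_weights(X_train, x_query, k)
    votes = {}
    for label, weight in zip(y_train[nearest], weights):
        votes[label] = votes.get(label, 0.0) + weight
    return max(votes, key=votes.get)   # class with the highest total weight

def weighted_knn_regress(X_train, y_train, x_query, k=3):
    nearest, weights = nearest_with_weights(X_train, x_query, k)
    return np.sum(weights * y_train[nearest]) / np.sum(weights)

# Hypothetical 1-d toy data
X_toy = np.array([[0.0], [1.0], [2.0], [10.0]])
y_class = np.array([0, 0, 1, 1])
y_reg = np.array([0.0, 1.0, 2.0, 10.0])

query = np.array([2.1])
nearest, _ = nearest_with_weights(X_toy, query, k=3)
unweighted_label = np.bincount(y_class[nearest]).argmax()   # plain majority vote
weighted_label = weighted_knn_classify(X_toy, y_class, query, k=3)
weighted_value = weighted_knn_regress(X_toy, y_reg, np.array([1.5]), k=2)

print("unweighted vote:", unweighted_label)
print("weighted vote:  ", weighted_label)
print("weighted regression:", weighted_value)
```

In this toy case the two rules disagree: two of the three neighbors carry label 0, but the single neighbor with label 1 is much closer to the query, so it dominates the weighted vote.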

In [1]:

import numpy as np

import matplotlib.pyplot as plt

from sklearn.datasets import load_digits

from sklearn.model_selection import train_test_split

np.random.seed(123)

%matplotlib inline

Dataset

In [2]:

# We will use the digits dataset as an example. It consists of the 1797 images of hand-written digits. Each digit is

# represented by a 64-dimensional vector of pixel values.

digits = load_digits()

X, y = digits.data, digits.target

X_train, X_test, y_train, y_test = train_test_split(X, y)

print(f'X_train shape: {X_train.shape}')

print(f'y_train shape: {y_train.shape}')

print(f'X_test shape: {X_test.shape}')

print(f'y_test shape: {y_test.shape}')

# Example digits

fig = plt.figure(figsize=(10,8))

for i in range(10):
    ax = fig.add_subplot(2, 5, i+1)
    plt.imshow(X[i].reshape((8,8)), cmap='gray')

X_train shape: (1347, 64)

y_train shape: (1347,)

X_test shape: (450, 64)

y_test shape: (450,)

kNN class

In [3]:

class kNN():

    def __init__(self):
        pass

    def fit(self, X, y):
        # k-nn is instance-based: "training" only stores the data
        self.data = X
        self.targets = y

    def euclidean_distance(self, X):
        """
        Computes the euclidean distance between the training data and
        a new input example or matrix of input examples X
        """
        # input: single data point
        if X.ndim == 1:
            l2 = np.sqrt(np.sum((self.data - X)**2, axis=1))

        # input: matrix of data points
        if X.ndim == 2:
            n_samples, _ = X.shape
            l2 = [np.sqrt(np.sum((self.data - X[i])**2, axis=1)) for i in range(n_samples)]

        return np.array(l2)

    def predict(self, X, k=1):
        """
        Predicts the classification for an input example or matrix of input examples X
        """
        # step 1: compute distance between input and training data
        dists = self.euclidean_distance(X)

        # step 2: find the k nearest neighbors and their classifications
        if X.ndim == 1:
            if k == 1:
                nn = np.argmin(dists)
                return self.targets[nn]
            else:
                knn = np.argsort(dists)[:k]
                y_knn = self.targets[knn]
                max_vote = max(y_knn, key=list(y_knn).count)
                return max_vote

        if X.ndim == 2:
            knn = np.argsort(dists)[:, :k]
            y_knn = self.targets[knn]
            if k == 1:
                # shape (1, n_samples): one row holding the label of each query
                return y_knn.T
            else:
                n_samples, _ = X.shape
                max_votes = [max(y_knn[i], key=list(y_knn[i]).count) for i in range(n_samples)]
                return max_votes

Initializing and training the model

In [11]:

knn = kNN()

knn.fit(X_train, y_train)

print("Testing one datapoint, k=1")

print(f"Predicted label: {knn.predict(X_test[0], k=1)}")

print(f"True label: {y_test[0]}")

print()

print("Testing one datapoint, k=5")

print(f"Predicted label: {knn.predict(X_test[20], k=5)}")

print(f"True label: {y_test[20]}")

print()

print("Testing 10 datapoint, k=1")

print(f"Predicted labels: {knn.predict(X_test[5:15], k=1)}")

print(f"True labels: {y_test[5:15]}")

print()

print("Testing 10 datapoint, k=4")

print(f"Predicted labels: {knn.predict(X_test[5:15], k=4)}")

print(f"True labels: {y_test[5:15]}")

print()

Testing one datapoint, k=1

Predicted label: 3

True label: 3

Testing one datapoint, k=5

Predicted label: 9

True label: 9

Testing 10 datapoint, k=1

Predicted labels: [[3 1 0 7 4 0 0 5 1 6]]

True labels: [3 1 0 7 4 0 0 5 1 6]

Testing 10 datapoint, k=4

Predicted labels: [3, 1, 0, 7, 4, 0, 0, 5, 1, 6]

True labels: [3 1 0 7 4 0 0 5 1 6]

Accuracy on the test set

In [12]:

# Compute accuracy on test set

y_p_test1 = knn.predict(X_test, k=1)

test_acc1 = np.sum(y_p_test1[0] == y_test)/len(y_p_test1[0]) * 100

print(f"Test accuracy with k = 1: {format(test_acc1)}")

# The second run uses k=5, so the label reports k = 5
y_p_test5 = knn.predict(X_test, k=5)

test_acc5 = np.sum(y_p_test5 == y_test)/len(y_p_test5) * 100

print(f"Test accuracy with k = 5: {format(test_acc5)}")

Test accuracy with k = 1: 97.77777777777777

Test accuracy with k = 5: 97.55555555555556

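05 K-Means Clustering

K-Means is a very simple clustering algorithm (clustering belongs to unsupervised learning). Given a fixed number of clusters and an input dataset, the algorithm tries to partition the data into clusters such that similarity is high within each cluster and low between clusters.

Algorithm

1. Initialize the cluster centers, either randomly within the range of the input data or (recommended) with some of the existing training examples

2. Repeat until convergence:

Assign each datapoint to the closest cluster. The distance between a point and a cluster center is measured using the Euclidean distance.

Update the current estimate of each cluster center by setting it to the mean of all instances belonging to that cluster.

Objective function

The underlying objective function tries to find cluster centers such that, when the data are partitioned into the corresponding clusters, the distances between the data points and their closest cluster centers become as small as possible.

Given a set of datapoints $x_1, \dots, x_n$ and a positive number k, find k clusters $C_1, \dots, C_k$ with centers $\mu_1, \dots, \mu_k$ that minimize the objective function:

$$J = \sum_{i=1}^{n} \sum_{j=1}^{k} z_{ij} \, \lVert x_i - \mu_j \rVert_2^2$$

where:

$z_{ij} \in \{0, 1\}$ defines whether datapoint $x_i$ belongs to cluster $C_j$

$\mu_j$ denotes the cluster center of cluster $C_j$

$\lVert \cdot \rVert_2$ denotes the Euclidean distance

Disadvantages of K-Means:

The number of clusters has to be set in the beginning

The results depend on the initial cluster centers

It is sensitive to outliers

It is not suitable for finding non-convex clusters

It is not guaranteed to find the global optimum, so it can get stuck in a local minimum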
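The objective function can be evaluated directly, which makes it easy to watch one assignment-and-update round reduce it. The sketch below is only an illustration: the four toy points, the starting centers, and the helper names `assign_to_nearest` and `kmeans_objective` are assumptions for this example, and the objective is taken to be the sum of squared Euclidean distances.

```python
import numpy as np

def assign_to_nearest(data, centers):
    # Distance of every point to every center, shape (n_points, n_centers)
    dists = np.linalg.norm(data[:, np.newaxis, :] - centers[np.newaxis, :, :], axis=2)
    return np.argmin(dists, axis=1)

def kmeans_objective(data, centers, assignments):
    # Sum of squared distances between each point and its own cluster center
    diffs = data - centers[assignments]
    return np.sum(diffs ** 2)

# Hypothetical toy data: two well-separated pairs of points
points_demo = np.array([[0.0, 0.0], [0.0, 2.0], [10.0, 0.0], [10.0, 2.0]])
centers_start = np.array([[0.0, 0.0], [10.0, 0.0]])  # initialized from data points

# Assignment step
assign_demo = assign_to_nearest(points_demo, centers_start)
j_before = kmeans_objective(points_demo, centers_start, assign_demo)

# Update step: move each center to the mean of its assigned points
centers_updated = np.array(
    [points_demo[assign_demo == j].mean(axis=0) for j in range(len(centers_start))]
)
j_after = kmeans_objective(points_demo, centers_updated, assign_demo)

print("assignments:", assign_demo)
print("objective before update:", j_before)   # 8.0
print("objective after update: ", j_after)    # 4.0
```

Neither step can increase the objective, which is why the procedure converges; it may still settle in a local minimum, depending on the initial centers.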

In [21]:

import numpy as np

import matplotlib.pyplot as plt

import random

from sklearn.datasets import make_blobs

np.random.seed(123)

%matplotlib inline

Dataset

In [22]:

X, y = make_blobs(centers=4, n_samples=1000)

print(f'Shape of dataset: {X.shape}')

fig = plt.figure(figsize=(8,6))

plt.scatter(X[:,0], X[:,1], c=y)

plt.title("Dataset with 4 clusters")

plt.xlabel("First feature")

plt.ylabel("Second feature")

plt.show()

Shape of dataset: (1000, 2)

K-Means class

In [23]:

class KMeans():

    def __init__(self, n_clusters=4):
        self.k = n_clusters

    def fit(self, data):
        """
        Fits the k-means model to the given dataset
        """
        n_samples, _ = data.shape
        # initialize cluster centers
        self.centers = np.array(random.sample(list(data), self.k))
        self.initial_centers = np.copy(self.centers)

        # We will keep track of whether the assignment of data points
        # to the clusters has changed. If it stops changing, we are
        # done fitting the model
        old_assigns = None
        n_iters = 0

        while True:
            new_assigns = [self.classify(datapoint) for datapoint in data]

            if new_assigns == old_assigns:
                print(f"Training finished after {n_iters} iterations!")
                return

            old_assigns = new_assigns
            n_iters += 1

            # recalculate centers
            for id_ in range(self.k):
                points_idx = np.where(np.array(new_assigns) == id_)
                datapoints = data[points_idx]
                self.centers[id_] = datapoints.mean(axis=0)

    def l2_distance(self, datapoint):
        dists = np.sqrt(np.sum((self.centers - datapoint)**2, axis=1))
        return dists

    def classify(self, datapoint):
        """
        Given a datapoint, compute the cluster closest to the
        datapoint. Return the cluster ID of that cluster.
        """
        dists = self.l2_distance(datapoint)
        return np.argmin(dists)

    def plot_clusters(self, data):
        plt.figure(figsize=(12,10))
        plt.title("Initial centers in black, final centers in red")
        # Note: the colors come from the global label array y created with the dataset
        plt.scatter(data[:, 0], data[:, 1], marker='.', c=y)
        plt.scatter(self.centers[:, 0], self.centers[:,1], c='r')
        plt.scatter(self.initial_centers[:, 0], self.initial_centers[:,1], c='k')
        plt.show()

Initializing and fitting the model

kmeans = KMeans(n_clusters=4)

kmeans.fit(X)

Training finished after 4 iterations!

Plotting the initial and final cluster centers

kmeans.plot_clusters(X)

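06 A Simple Neural Network

In this section we implement a simple neural network architecture that maps 2-dimensional input vectors onto binary output values. Our network has 2 input neurons, one hidden layer with 6 hidden neurons, and 1 output neuron.

We represent the architecture by means of the weight matrices between the layers. In our example, the weight matrix between the input and hidden layer is denoted $W_h$, and the weight matrix between the hidden and output layer is denoted $W_o$. In addition to the weights connecting the neurons, each hidden and output neuron has a bias weight with a constant input of 1.

Our training set consists of m = 750 examples. Therefore, we have the following matrix shapes:

Training set shape: X = (750, 2)

Targets shape: Y = (750, 1)

$W_h$ shape: $(n_{features}, n_{hidden})$ = (2, 6)

$b_h$ shape (bias vector): $(1, n_{hidden})$ = (1, 6)

$W_o$ shape: $(n_{hidden}, n_{outputs})$ = (6, 1)

$b_o$ shape (bias vector): $(1, n_{outputs})$ = (1, 1)

Loss function

We use the same loss function as in logistic regression:

$$J(W, b) = -\frac{1}{m} \sum_{i=1}^{m} \left[ y^{(i)} \log \hat{y}^{(i)} + \left(1 - y^{(i)}\right) \log \left(1 - \hat{y}^{(i)}\right) \right]$$

For a classification task with more than two classes, we would use a generalization of this function, namely the categorical cross-entropy.

Training

We train the network with gradient descent and use backpropagation to compute the required partial derivatives. The training procedure has the following steps:

1. Initialize the parameters (the weights and biases)

2. Repeat until convergence:

Propagate the current input batch forward through the network, computing the activations and outputs of all hidden and output units.

Compute the partial derivatives of the loss function with respect to each parameter.

Update the parameters.

Forward pass

We start by computing the activation and output of each unit in the network. To speed up the implementation, we do not do this for each input example individually but for all examples at once, using vectorization. We use the following notation:

$A_h$: matrix with the activations of all hidden units for all training examples

$O_h$: matrix with the outputs of all hidden units for all training examples

The hidden neurons use tanh as their activation function:

$$\tanh(x) = \frac{e^{x} - e^{-x}}{e^{x} + e^{-x}}, \qquad \tanh'(x) = 1 - \tanh^2(x)$$

The output neuron uses the sigmoid function as its activation function:

$$\sigma(x) = \frac{1}{1 + e^{-x}}$$

The activations and outputs are computed as follows (· denotes the dot product):

$$A_h = X \cdot W_h + b_h, \qquad O_h = \tanh(A_h), \qquad A_o = O_h \cdot W_o + b_o, \qquad O_o = \sigma(A_o)$$

Here $A_h$ and $O_h$ have shape (750, 6), and $A_o$ and $O_o$ have shape (750, 1).

Backward pass

To compute the weight updates we need the partial derivatives of the loss function with respect to each unit. We do not derive these equations here; many good explanations can be found on other websites (for example https://mattmazur.com/2015/03/17/a-step-by-step-backpropagation-example/).

For the output neuron, the gradients are computed as follows (matrix notation):

$$dA_o = O_o - Y, \qquad dW_o = \frac{1}{m} \, O_h^T \cdot dA_o, \qquad db_o = \frac{1}{m} \sum_{i=1}^{m} dA_o^{(i)}$$

For the weight matrix between the input and hidden layer, the gradients are computed as follows (∗ denotes element-wise multiplication):

$$dA_h = \left( dA_o \cdot W_o^T \right) * \left( 1 - \tanh^2(A_h) \right), \qquad dW_h = \frac{1}{m} \, X^T \cdot dA_h, \qquad db_h = \frac{1}{m} \sum_{i=1}^{m} dA_h^{(i)}$$

Weight update

$$W_h = W_h - \eta \, dW_h, \qquad b_h = b_h - \eta \, db_h, \qquad W_o = W_o - \eta \, dW_o, \qquad b_o = b_o - \eta \, db_o$$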
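The forward and backward passes can be traced on a handful of examples to confirm the matrix shapes and to check one backpropagated gradient against a finite-difference estimate. This is a minimal sketch under stated assumptions: the random toy batch (8 examples instead of 750) and the free functions `forward` and `cost` are hypothetical stand-ins for illustration, not the class developed in this section.

```python
import numpy as np

def sigmoid(a):
    return 1 / (1 + np.exp(-a))

def forward(X, W_h, b_h, W_o, b_o):
    O_h = np.tanh(X @ W_h + b_h)        # hidden outputs, shape (m, n_hidden)
    O_o = sigmoid(O_h @ W_o + b_o)      # network outputs, shape (m, 1)
    return O_h, O_o

def cost(Y, O_o):
    # Binary cross-entropy averaged over the batch
    return -np.mean(Y * np.log(O_o) + (1 - Y) * np.log(1 - O_o))

# Hypothetical toy batch: 8 examples, 2 inputs, 6 hidden units, 1 output
rng = np.random.default_rng(0)
m, n_in, n_hid = 8, 2, 6
X_demo = rng.normal(size=(m, n_in))
Y_demo = rng.integers(0, 2, size=(m, 1)).astype(float)
W_h = rng.normal(size=(n_in, n_hid))
b_h = np.zeros((1, n_hid))
W_o = rng.normal(size=(n_hid, 1))
b_o = np.zeros((1, 1))

# Backward pass, following the equations for the output and hidden layers
O_h, O_o = forward(X_demo, W_h, b_h, W_o, b_o)
dA_o = O_o - Y_demo
dW_o = O_h.T @ dA_o / m
dA_h = (dA_o @ W_o.T) * (1 - O_h ** 2)
dW_h = X_demo.T @ dA_h / m

# Finite-difference estimate for a single entry of W_h
eps = 1e-6
W_plus, W_minus = W_h.copy(), W_h.copy()
W_plus[0, 0] += eps
W_minus[0, 0] -= eps
numeric = (cost(Y_demo, forward(X_demo, W_plus, b_h, W_o, b_o)[1])
           - cost(Y_demo, forward(X_demo, W_minus, b_h, W_o, b_o)[1])) / (2 * eps)

print("shapes:", O_h.shape, O_o.shape, dW_h.shape, dW_o.shape)
print("backprop:", dW_h[0, 0], "numeric:", numeric)
```

Each gradient has the same shape as the parameter it updates, which is what allows the weight update to be written as a simple element-wise subtraction.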

In [3]:

import numpy as np

import pandas as pd

import matplotlib.pyplot as plt

from sklearn.datasets import make_circles

from sklearn.model_selection import train_test_split

np.random.seed(123)

%matplotlib inline

Dataset

In [4]:

X, y = make_circles(n_samples=1000, factor=0.5, noise=.1)

fig = plt.figure(figsize=(8,6))

plt.scatter(X[:,0], X[:,1], c=y)

plt.xlim([-1.5, 1.5])

plt.ylim([-1.5, 1.5])

plt.title("Dataset")

plt.xlabel("First feature")

plt.ylabel("Second feature")

plt.show()

In [5]:

# reshape targets to get column vector with shape (n_samples, 1)

y_true = y[:, np.newaxis]

# Split the data into a training and test set

X_train, X_test, y_train, y_test = train_test_split(X, y_true)

print(f'Shape X_train: {X_train.shape}')

print(f'Shape y_train: {y_train.shape}')

print(f'Shape X_test: {X_test.shape}')

print(f'Shape y_test: {y_test.shape}')

Shape X_train: (750, 2)

Shape y_train: (750, 1)

Shape X_test: (250, 2)

Shape y_test: (250, 1)

Neural Network Class

Parts of the implementation below are inspired by Andrew Ng's course

https://www.coursera.org/learn/neural-networks-deep-learning

class NeuralNet():

    def __init__(self, n_inputs, n_outputs, n_hidden):
        self.n_inputs = n_inputs
        self.n_outputs = n_outputs
        self.hidden = n_hidden

        # Initialize weight matrices and bias vectors
        self.W_h = np.random.randn(self.n_inputs, self.hidden)
        self.b_h = np.zeros((1, self.hidden))
        self.W_o = np.random.randn(self.hidden, self.n_outputs)
        self.b_o = np.zeros((1, self.n_outputs))

    def sigmoid(self, a):
        return 1 / (1 + np.exp(-a))

    def forward_pass(self, X):
        """
        Propagates the given input X forward through the net.

        Returns:
            A_h: matrix with activations of all hidden neurons for all input examples
            O_h: matrix with outputs of all hidden neurons for all input examples
            A_o: matrix with activations of all output neurons for all input examples
            O_o: matrix with outputs of all output neurons for all input examples
        """
        # Compute activations and outputs of hidden units
        A_h = np.dot(X, self.W_h) + self.b_h
        O_h = np.tanh(A_h)

        # Compute activations and outputs of output units
        A_o = np.dot(O_h, self.W_o) + self.b_o
        O_o = self.sigmoid(A_o)

        outputs = {
            "A_h": A_h,
            "A_o": A_o,
            "O_h": O_h,
            "O_o": O_o,
        }

        return outputs

    def cost(self, y_true, y_predict, n_samples):
        """
        Computes and returns the cost over all examples
        """
        # same cost function as in logistic regression
        cost = (- 1 / n_samples) * np.sum(y_true * np.log(y_predict) + (1 - y_true) * (np.log(1 - y_predict)))
        cost = np.squeeze(cost)
        assert isinstance(cost, float)

        return cost

    def backward_pass(self, X, Y, n_samples, outputs):
        """
        Propagates the errors backward through the net.

        Returns:
            dW_h: partial derivatives of loss function w.r.t hidden weights
            db_h: partial derivatives of loss function w.r.t hidden bias
            dW_o: partial derivatives of loss function w.r.t output weights
            db_o: partial derivatives of loss function w.r.t output bias
        """
        dA_o = (outputs["O_o"] - Y)
        dW_o = (1 / n_samples) * np.dot(outputs["O_h"].T, dA_o)
        db_o = (1 / n_samples) * np.sum(dA_o)

        dA_h = (np.dot(dA_o, self.W_o.T)) * (1 - np.power(outputs["O_h"], 2))
        dW_h = (1 / n_samples) * np.dot(X.T, dA_h)
        # Note: this sums over examples AND hidden units, so every hidden bias
        # receives the same scalar update. A per-unit gradient would use
        # np.sum(dA_h, axis=0, keepdims=True). The original behavior is kept
        # here so that the training log shown below still applies.
        db_h = (1 / n_samples) * np.sum(dA_h)

        gradients = {
            "dW_o": dW_o,
            "db_o": db_o,
            "dW_h": dW_h,
            "db_h": db_h,
        }

        return gradients

    def update_weights(self, gradients, eta):
        """
        Updates the model parameters using a fixed learning rate
        """
        self.W_o = self.W_o - eta * gradients["dW_o"]
        self.W_h = self.W_h - eta * gradients["dW_h"]
        self.b_o = self.b_o - eta * gradients["db_o"]
        self.b_h = self.b_h - eta * gradients["db_h"]

    def train(self, X, y, n_iters=500, eta=0.3):
        """
        Trains the neural net on the given input data
        """
        n_samples, _ = X.shape

        for i in range(n_iters):
            outputs = self.forward_pass(X)
            cost = self.cost(y, outputs["O_o"], n_samples=n_samples)
            gradients = self.backward_pass(X, y, n_samples, outputs)

            if i % 100 == 0:
                print(f'Cost at iteration {i}: {np.round(cost, 4)}')

            self.update_weights(gradients, eta)

    def predict(self, X):
        """
        Computes and returns network predictions for given dataset
        """
        outputs = self.forward_pass(X)
        # Threshold the output probabilities at 0.5
        y_pred = [1 if elem >= 0.5 else 0 for elem in outputs["O_o"]]

        return np.array(y_pred)[:, np.newaxis]

Initializing and training the neural network

nn = NeuralNet(n_inputs=2, n_hidden=6, n_outputs=1)

print("Shape of weight matrices and bias vectors:")

print(f'W_h shape: {nn.W_h.shape}')

print(f'b_h shape: {nn.b_h.shape}')

print(f'W_o shape: {nn.W_o.shape}')

print(f'b_o shape: {nn.b_o.shape}')

print()

print("Training:")

nn.train(X_train, y_train, n_iters=2000, eta=0.7)

Shape of weight matrices and bias vectors:

W_h shape: (2, 6)

b_h shape: (1, 6)

W_o shape: (6, 1)

b_o shape: (1, 1)

Training:

Cost at iteration 0: 1.0872

Cost at iteration 100: 0.2723

Cost at iteration 200: 0.1712

Cost at iteration 300: 0.1386

Cost at iteration 400: 0.1208

Cost at iteration 500: 0.1084

Cost at iteration 600: 0.0986

Cost at iteration 700: 0.0907

Cost at iteration 800: 0.0841

Cost at iteration 900: 0.0785

Cost at iteration 1000: 0.0739

Cost at iteration 1100: 0.0699

Cost at iteration 1200: 0.0665

Cost at iteration 1300: 0.0635

Cost at iteration 1400: 0.061

Cost at iteration 1500: 0.0587

Cost at iteration 1600: 0.0566

Cost at iteration 1700: 0.0547

Cost at iteration 1800: 0.0531

Cost at iteration 1900: 0.0515

Testing the neural network

n_test_samples, _ = X_test.shape

y_predict = nn.predict(X_test)

print(f"Classification accuracy on test set: {(np.sum(y_predict == y_test)/n_test_samples)*100} %")

Classification accuracy on test set: 98.4 %

Visualizing the decision boundary

X_temp, y_temp = make_circles(n_samples=60000, noise=.5)

y_predict_temp = nn.predict(X_temp)

y_predict_temp = np.ravel(y_predict_temp)

fig = plt.figure(figsize=(8,12))

ax = fig.add_subplot(2,1,1)

plt.scatter(X[:,0], X[:,1], c=y)

plt.xlim([-1.5, 1.5])

plt.ylim([-1.5, 1.5])

plt.xlabel("First feature")

plt.ylabel("Second feature")

plt.title("Training and test set")

ax = fig.add_subplot(2,1,2)

plt.scatter(X_temp[:,0], X_temp[:,1], c=y_predict_temp)

plt.xlim([-1.5, 1.5])

plt.ylim([-1.5, 1.5])

plt.xlabel("First feature")

plt.ylabel("Second feature")

plt.title("Decision boundary")


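07 Softmax Regression

Softmax regression is also known as multinomial (or multi-class) logistic regression.

Given:

a dataset $\{(x^{(1)}, y^{(1)}), \dots, (x^{(m)}, y^{(m)})\}$

where each $x^{(i)}$ is a d-dimensional vector $x^{(i)} = (x^{(i)}_1, \dots, x^{(i)}_d)$

and each $y^{(i)}$ is the target variable for $x^{(i)}$; for example, with K = 3 classes we might have $y^{(i)} \in \{0, 1, 2\}$

A softmax regression model has the following features:

a separate real-valued weight vector $w_k = (w_{k1}, \dots, w_{kd})$ for each class; the weight vectors are typically stored as the rows of a weight matrix

a separate real-valued bias b for each class

the softmax function as its activation function

the cross-entropy as its loss function

Training a softmax regression model involves several steps. First (in step 0) the model parameters are initialized. The remaining steps are repeated for a specified number of training iterations or until the parameters have converged.

Step 0: Initialize the weights and biases with zeros (or small random values)

Step 1: For each class k, compute a linear combination of the input features and weights, that is, compute a score for each training example and each class. For class k and input vector $x^{(i)}$, the score is computed as follows:

$$score_k(x^{(i)}) = w_k^T \cdot x^{(i)} + b_k$$

where $w_k$ is the weight vector of class k, $b_k$ is its bias, and · denotes the dot product.

Using vectorization and broadcasting, we can compute the scores for all classes and training examples in one step:

$$scores = X \cdot W^T + b$$

where X is a matrix of shape $(n_{samples}, n_{features})$ that holds all training examples, and W is a matrix of shape $(n_{classes}, n_{features})$ that holds the weight vector of each class.

Step 2: Apply the softmax activation function to transform the scores into probabilities.

The probability that input vector $x^{(i)}$ belongs to class k is given by:

$$\hat{p}_k(x^{(i)}) = \frac{\exp\left(score_k(x^{(i)})\right)}{\sum_{j=1}^{K} \exp\left(score_j(x^{(i)})\right)}$$

Again, we can perform this step for all classes and training examples at once using vectorization. The class predicted by the model for $x^{(i)}$ is the one with the highest probability.

Step 3: Compute the cost over the whole training set.

We want the model to predict a high probability for the target class and a low probability for the other classes. This can be achieved with the cross-entropy loss function:

$$J(W, b) = -\frac{1}{m} \sum_{i=1}^{m} \sum_{k=1}^{K} y^{(i)}_k \log \hat{p}^{(i)}_k$$

In this formula, the target labels are one-hot encoded. So $y^{(i)}_k$ is 1 if the target class of $x^{(i)}$ is k, and 0 otherwise.

Step 4: Compute the gradient of the cost function with respect to each weight vector and bias.

A detailed explanation of this derivation can be found here (http://ufldl.stanford.edu/tutorial/supervised/SoftmaxRegression/).

The general form is:

$$\nabla_{w_k} J = \frac{1}{m} \sum_{i=1}^{m} x^{(i)} \left( \hat{p}^{(i)}_k - y^{(i)}_k \right)$$

For the derivative with respect to the bias, the input $x^{(i)}$ is 1.

Step 5: Update the weights and bias of each class k.

$$w_k = w_k - \eta \, \nabla_{w_k} J, \qquad b_k = b_k - \eta \, \nabla_{b_k} J$$

where $\eta$ is the learning rate.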
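Steps 2 and 3 can be tried on a couple of hand-written score vectors. The sketch below is an illustration under assumptions: it subtracts the row-wise maximum before exponentiating, a common numerical-stability trick that leaves the probabilities unchanged but is not part of the steps described above, and the names `stable_softmax`, `one_hot`, and the demo arrays are hypothetical.

```python
import numpy as np

def stable_softmax(scores):
    # Subtracting the row maximum does not change the result (it cancels in the
    # ratio) but keeps np.exp from overflowing for large scores
    shifted = scores - np.max(scores, axis=1, keepdims=True)
    exp = np.exp(shifted)
    return exp / np.sum(exp, axis=1, keepdims=True)

def one_hot(labels, n_classes):
    # Row i gets a 1 in the column given by labels[i]
    encoded = np.zeros((labels.shape[0], n_classes))
    encoded[np.arange(labels.shape[0]), labels] = 1
    return encoded

# Two score vectors that differ only by a constant offset of 999
scores_demo = np.array([[1.0, 2.0, 3.0],
                        [1000.0, 1001.0, 1002.0]])
labels_demo = np.array([2, 0])

probs_demo = stable_softmax(scores_demo)                     # step 2
y_one_hot = one_hot(labels_demo, n_classes=3)
loss_demo = -np.mean(np.sum(y_one_hot * np.log(probs_demo), axis=1))  # step 3

print(probs_demo)
print("predicted classes:", np.argmax(probs_demo, axis=1))
print("cross-entropy:", loss_demo)
```

Both rows produce the same probabilities, because softmax depends only on the differences between scores. Without the shift, `np.exp(1000.0)` would overflow to infinity and the second row would become `nan`.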

In [1]:

from sklearn.datasets import load_iris

import numpy as np

from sklearn.model_selection import train_test_split

from sklearn.datasets import make_blobs

import matplotlib.pyplot as plt

np.random.seed(13)

Dataset

In [2]:

X, y_true = make_blobs(centers=4, n_samples = 5000)

fig = plt.figure(figsize=(8,6))

plt.scatter(X[:,0], X[:,1], c=y_true)

plt.title("Dataset")

plt.xlabel("First feature")

plt.ylabel("Second feature")

plt.show()

In [3]:

# reshape targets to get column vector with shape (n_samples, 1)

y_true = y_true[:, np.newaxis]

# Split the data into a training and test set

X_train, X_test, y_train, y_test = train_test_split(X, y_true)

print(f'Shape X_train: {X_train.shape}')

print(f'Shape y_train: {y_train.shape}')

print(f'Shape X_test: {X_test.shape}')

print(f'Shape y_test: {y_test.shape}')

Shape X_train: (3750, 2)

Shape y_train: (3750, 1)

Shape X_test: (1250, 2)

Shape y_test: (1250, 1)

Softmax regression class

class SoftmaxRegressor:

    def __init__(self):
        pass

    def train(self, X, y_true, n_classes, n_iters=10, learning_rate=0.1):
        """
        Trains a multinomial logistic regression model on given set of training data
        """
        self.n_samples, n_features = X.shape
        self.n_classes = n_classes

        self.weights = np.random.rand(self.n_classes, n_features)
        self.bias = np.zeros((1, self.n_classes))
        all_losses = []

        for i in range(n_iters):
            scores = self.compute_scores(X)
            probs = self.softmax(scores)
            y_predict = np.argmax(probs, axis=1)[:, np.newaxis]
            y_one_hot = self.one_hot(y_true)

            loss = self.cross_entropy(y_one_hot, probs)
            all_losses.append(loss)

            dw = (1 / self.n_samples) * np.dot(X.T, (probs - y_one_hot))
            db = (1 / self.n_samples) * np.sum(probs - y_one_hot, axis=0)

            # dw has shape (n_features, n_classes), the weights are stored transposed
            self.weights = self.weights - learning_rate * dw.T
            self.bias = self.bias - learning_rate * db

            if i % 100 == 0:
                print(f'Iteration number: {i}, loss: {np.round(loss, 4)}')

        return self.weights, self.bias, all_losses

    def predict(self, X):
        """
        Predict class labels for samples in X.

        Args:
            X: numpy array of shape (n_samples, n_features)
        Returns:
            numpy array of shape (n_samples, 1) with predicted classes
        """
        scores = self.compute_scores(X)
        probs = self.softmax(scores)
        return np.argmax(probs, axis=1)[:, np.newaxis]

    def softmax(self, scores):
        """
        Transforms matrix of predicted scores to matrix of probabilities

        Args:
            scores: numpy array of shape (n_samples, n_classes)
            with unnormalized scores
        Returns:
            softmax: numpy array of shape (n_samples, n_classes)
            with probabilities
        """
        exp = np.exp(scores)
        sum_exp = np.sum(exp, axis=1, keepdims=True)
        softmax = exp / sum_exp

        return softmax

    def compute_scores(self, X):
        """
        Computes class-scores for samples in X

        Args:
            X: numpy array of shape (n_samples, n_features)
        Returns:
            scores: numpy array of shape (n_samples, n_classes)
        """
        return np.dot(X, self.weights.T) + self.bias

    def cross_entropy(self, y_true, scores):
        # Here `scores` holds the predicted probabilities, not the raw scores
        loss = - (1 / self.n_samples) * np.sum(y_true * np.log(scores))
        return loss

    def one_hot(self, y):
        """
        Transforms vector y of labels to one-hot encoded matrix
        """
        one_hot = np.zeros((self.n_samples, self.n_classes))
        one_hot[np.arange(self.n_samples), y.T] = 1
        return one_hot

Initializing and training the model

regressor = SoftmaxRegressor()

w_trained, b_trained, loss = regressor.train(X_train, y_train, learning_rate=0.1, n_iters=800, n_classes=4)

fig = plt.figure(figsize=(8,6))

plt.plot(np.arange(800), loss)

plt.title("Development of loss during training")

plt.xlabel("Number of iterations")

plt.ylabel("Loss")

plt.show()

Iteration number: 0, loss: 1.393

Iteration number: 100, loss: 0.2051

Iteration number: 200, loss: 0.1605

Iteration number: 300, loss: 0.1371

Iteration number: 400, loss: 0.121

Iteration number: 500, loss: 0.1087

Iteration number: 600, loss: 0.0989

Iteration number: 700, loss: 0.0909

Testing the model

n_test_samples, _ = X_test.shape

y_predict = regressor.predict(X_test)

print(f"Classification accuracy on test set: {(np.sum(y_predict == y_test)/n_test_samples) * 100}%")

Classification accuracy on test set: 99.03999999999999%

TidyBot++：用于机器人学习开源的完整移动机械手三谷秋水计算机视觉智能体人工智能机器人开源人工智能机器学习深度学习
24年12月来自普林斯顿、斯坦福和dexterity.ai的论文“TidyBot++:AnOpen-SourceHolonomicMobileManipulatorforRobotLearning”。要充分利用模仿学习在移动机械操作方面的最新进展，需要收集大量人工引导的演示。本文提出一种开源设计，用于设计一种廉价、坚固、灵活的移动机械手，该机械手可支撑任意臂，从而实现各种现实世界的家用移动机械操作
程序员必看！DeepSeek全栈开发指南：从代码生成到分布式训练的黑科技解析 AI创享派后端
一、DeepSeek技术新突破：程序员必须掌握的MoE架构实战2025年2月25日，DeepSeek开源了专为MoE模型设计的DeepEP通信库，这项技术革新直接影响了分布式训练和推理效率。该库支持FP8精度与NVLink/RDMA技术，吞吐量提升3倍以上，特别适合处理千亿级参数的分布式任务。对于后端工程师而言，DeepEP的以下特性值得关注：计算-通信重叠机制：通过回调函数实现GPU资源动态分配
还不会构建MindIE镜像？一篇文章搞定 Zain Lau vim 编辑器 linux MindIE 昇腾
MindIE镜像构建工程项目简介用于构建多平台/架构的MindiE镜像的脚本。用户可以根据需要准备好所需的软件包，修改相关配置并构建镜像。前提条件网络连接在整个构建过程中，必须保持稳定的网络连接。此构建工程依赖于在线下载多个资源，包括但不限于Python源码、编译工具以及各种依赖，无法离线构建。Docker推荐版本：Docker20.10.x及以上最低版本要求：Docker19.03.x安装方式：
elasticsearch analyzer 学习笔记 weixin_40455124 elasticsearch 代码分析及扩展 elasticsearch analyzer token
基本定义analyzer执行将输入字符流分解为token的过程使用场景在indexing的时候，也即在建立索引的时候在searching的时候，也即在搜索时，分析需要搜索的词语analysisCharacterfiltering(字符过滤器):使用字符过滤器转换字符Breakingtextintotokens(把文字转化为标记):将文本分成一组一个或多个标记Tokenfiltering：使用标记过
Android StrictMode 使用与原理深度解析伟江.Zeng Android基础 android StrictMode 性能优化内存泄漏代码规范耗时检测 kotlin
AndroidStrictMode是Android系统提供的一种开发者工具，用于检测应用主线程中不合理的耗时操作（如磁盘I/O、网络请求等）和内存泄漏问题。通过配置策略和惩罚机制，它帮助开发者在早期发现潜在性能问题，提升应用流畅性。以下从使用方式和实现原理两方面进行深度解析。一、StrictMode使用详解1.基础配置在Application或Activity的onCreate()中初始化Stri
adb shell input text 完美支持中文输入 hzm326 python android windows linux adb
adb默认是不支持Unicode编码的，无法通过adbshellinputtext命令输入中文到手机或模拟器解决中文输入还得感谢老外写了一个输入法，源码地址：https://github.com/senzhk/ADBKeyBoard第一步：安装ADBKeyBoard.apk文件打开手机或模拟器，adbinstallADBKeyBoard.apk安装该输入法或者直接安装即可第二步：设置默认输入法默认
【Android】adb shell基本使用教程 Vesper63 android adb
adbshell是AndroidDebugBridge(ADB)工具中的一个命令，用于在连接的Android设备或模拟器上执行shell命令。通过adbshell，你可以直接与设备的Linux内核交互，执行各种操作。基本用法启动adbshell：在终端或命令提示符中输入以下命令：adbshell这将进入设备的shell环境，提示符通常会变为$或#（#表示root权限）。执行单个命令：如果你只想执行
MATLAB算法实战应用案例精讲-【深度学习】归一化林聪木 matlab 算法深度学习
目录为什么要做特征归一化/标准化？常用featurescaling方法计算方式上对比分析featurescaling需要还是不需要什么时候需要featurescaling？什么时候不需要FeatureScaling？归一化基础知识点1.什么是归一化2.为什么要归一化3.为什么归一化能提高求解最优解的速度4.归一化有哪些类型5.不同归一化的使用条件6.归一化和标准化的联系与区别层归一化综述提出背景概
SSL证书自动续签(解决泛域名续签问题) 月会 ssl自动续签
文章目录SSL证书自动生成并自动续期Let’sEncryptCertbot介绍申请ssl证书下载certbot申请证书非泛域名申请证书nginx使用证书证书续期脚本linux定时执行脚本泛域名SSL证书自动生成并自动续期自动续期使用Let’sEncrypt证书颁发机构和certbot客户端共同完成Let’sEncryptLet’sEncrypt是一家免费、开放、自动化的证书颁发机构（CA），为公众
顺序表和链表的比较数九天有一个秘密链表数据结构算法
这两个结构各有优势，相辅相成。顺序表：优点：1.支持随机访问。2.CPU高速缓存命中率更高。(物理空间连续)缺点：1.头部和中部插入和删除时间效率低(O(n))。2.连续的物理空间，空间不够后需要增容：a.增容有一定程度的消耗。b.为了避免频繁的进行增容，我们一般都按照倍数去增容，用不完会有一定的空间浪费。链表(带头循环双链表)优点：1.任意位置插入删除效率高(O(n))。2.按需申请和释放空间。
AtCoder Beginner Contest 275 A-D题解 Gowilli AtCoder c++算法数据结构
比赛名称：AtCoderBeginnerContest275A-FindTakahashi找出最大的元素并输出下标使用两个变量一个存储当前找到的最大值一个存储找到的最大值对应的下标，若当前数大于最大值更新最大值和下标AC代码//Problem:A-FindTakahashi//Contest:AtCoder-AtCoderBeginnerContest275//URL:https://atcode
Redis7——进阶篇（四）啥也不会的小神龙· Redis系列 redis 缓存学习 redis经典面试题
前言：此篇文章系本人学习过程中记录下来的笔记，里面难免会有不少欠缺的地方，诚心期待大家多多给予指教。基础篇：Redis（一）Redis（二）Redis（三）Redis（四）Redis（五）Redis（六）Redis（七）Redis（八）进阶篇：Redis（九）Redis（十）Redis（十一）接上期内容：上期完成了缓存双写一致性方面的学习。下面学习HyperLogLog/Geo/Bitmap实际案
【大模型UI\多模型回复UI】 Ai君臣 LLMS 微调 ui 大LLMS UI
文章目录1、开源大模型用户界面（UI）2、同时让多个模型回复UI1、开源大模型用户界面（UI）LobeChatOpenWebUI：这是一款功能丰富且用户友好的开源自托管AI界面，旨在完全离线运行。它支持多种大型语言模型（LLM），包括Ollama和兼容OpenAI的API。OpenWebUI提供直观的界面，支持多模型和多模态交互，具有全面的Markdown和LaTeX支持，以及本地RAG集成等功能
记一次联想ThinkBook 16P G5 IRX ，麦克风无声音问题的解决花花鱼 Windows windows 音频
1、微信语音麦克风无声音在电脑上微信电话，麦克风的功能没有，或者说你要录个屏给客户，发现讲不了话，也是比较的麻烦。2、联系客服建议升级声卡驱动，然后更新了以后，一个样没什么区别。各种设置，发现还是不行。3、声音设置当然，图片上的是静音麦克风了，按一下键就可以去掉。4、专家给了工具解决旧版驱动残留文件清除工具.zip链接:https://pan.baidu.com/s/1eVjT_QjYk_vz10
使用 certbot 在centos7 搭建ssl证书自动并且续约 TwoSs110 ssl https
第一步，确定服务器适合安装的certbot版本sudoyuminstallpython27如果上述方法不起作用，你可以尝试编译安装。首先，你需要安装编译Python所需的依赖包。sudoyuminstallgccmakeopenssl-develsqlite-develreadline-develzlib-develbzip2-devel接下来，下载Python2.7.5的源代码，并进行编译安装。
PCIe信号传输的幕后：HCSL与LP-HCSL深度解析赛卡单片机嵌入式硬件服务器人工智能硬件架构 fpga开发
在数字化浪潮席卷的当下，PCIe（PeripheralComponentInterconnectExpress）作为高速串行计算机扩展总线标准，已然成为计算机内部硬件设备连接领域的中流砥柱。其信号传输的质量与完整性，恰似计算机系统运行的“命门”，对系统整体性能起着决定性作用。在PCIe体系架构里，HCSL（High-speedCurrentSteeringLogic）与LP-HCSL（Low-Po
半导体可靠性测试解析：HTOL、LTOL与Burn-In 赛卡硬件架构汽车车载系统
引言在半导体器件复杂度与可靠性要求同步提升的今天，高温工作寿命测试（HTOL）、低温寿命测试（LTOL）和老化筛选测试（Burn-In）构成了芯片可靠性验证的三大支柱。这些测试通过模拟极端环境下的失效机制，帮助制造商提前发现潜在缺陷，优化设计并满足汽车、工业等领域的严苛标准。本文将从测试原理、标准要求及报告解读维度展开深度解析。一、核心测试方法的技术边界与协同逻辑1.HTOL（高温工作寿命测试）测
DeepSeek：全栈开发者视角下的AI革命者大富大贵7 程序员知识储备1 程序员知识储备2 程序员知识储备3 人工智能
DeepSeek：全栈开发者视角下的AI革命者写在前面随着人工智能（AI）技术的不断进步，AI已经成为各行各业创新的核心动力。从自动驾驶到智能制造，再到自然语言处理和图像识别，AI正在逐渐渗透并改变着我们的生活和工作方式。DeepSeek，作为AI领域的新兴技术，凭借其独特的技术架构和颠覆性的创新理念，成为了全栈开发者关注的焦点。本文将从全栈开发者的角度出发，详细解析DeepSeek的诞生、技术架
设计空间探索：乘法器设计的面积、延时、功耗优化赛卡人工智能前端算法
复杂压缩器可压缩更多高度,减少层数(外层while循环次数),但延迟较高。使用哪些压缩器以何种方案进行压缩,是一个设计空间探索问题。1.压缩器种类的选择4-2压缩器：由两个全加器（FA）组成，能够将4个输入压缩为2个输出（和与进位）。适用于中等规模的压缩需求，可以有效减少部分积的位宽。6-2压缩器：能够将6个输入压缩为2个输出，适用于较大规模的压缩需求，尤其在多列压缩时可以减少层次数量。9-2压缩
Enum用法不懂事的小屁孩 enum
以前的时候知道enum，但是真心不怎么用，在实际开发中，经常会用到以下代码: protected final static String XJ = "XJ"; protected final static String YHK = "YHK"; protected final static String PQ = "PQ";
【Spark九十七】RDD API之aggregateByKey bit1129 spark
1. aggregateByKey的运行机制 /** * Aggregate the values of each key, using given combine functions and a neutral "zero value". * This function can return a different result type
hive创建表是报错： Specified key was too long; max key length is 767 bytes daizj hive
今天在hive客户端创建表时报错，具体操作如下 hive> create table test2(id string); FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask. MetaException(message:javax.jdo.JDODataSto
Map 与 JavaBean之间的转换周凡杨 java 自省转换反射
最近项目里需要一个工具类，它的功能是传入一个Map后可以返回一个JavaBean对象。很喜欢写这样的Java服务，首先我想到的是要通过Java 的反射去实现匿名类的方法调用，这样才可以把Map里的值set 到JavaBean里。其实这里用Java的自省会更方便，下面两个方法就是一个通过反射，一个通过自省来实现本功能。 1：JavaBean类 1 &nb
java连接ftp下载 g21121 java
有的时候需要用到java连接ftp服务器下载，上传一些操作，下面写了一个小例子。 /** ftp服务器地址 */ private String ftpHost; /** ftp服务器用户名 */ private String ftpName; /** ftp服务器密码 */ private String ftpPass; /** ftp根目录 */ private String f
web报表工具FineReport使用中遇到的常见报错及解决办法（二）老A不折腾 finereport web报表 java报表总结
抛砖引玉，希望大家能把自己整理的问题及解决方法晾出来，Mark一下，利人利己。出现问题先搜一下文档上有没有，再看看度娘有没有，再看看论坛有没有。有报错要看日志。下面简单罗列下常见的问题，大多文档上都有提到的。 1、没有返回数据集：在存储过程中的操作语句之前加上set nocount on 或者在数据集exec调用存储过程的前面加上这句。当S
linux 系统cpu 内存等信息查看墙头上一根草 cpu 内存 liunx
1 查看CPU 　　1.1 查看CPU个数　　# cat /proc/cpuinfo | grep "physical id" | uniq | wc -l 　　2 　　**uniq命令：删除重复行;wc –l命令：统计行数** 　　1.2 查看CPU核数　　# cat /proc/cpuinfo | grep "cpu cores" | u
Spring中的AOP aijuans spring AOP
Spring中的AOP Written by Tony Jiang @ 2012-1-18 （转）何为AOP AOP，面向切面编程。在不改动代码的前提下，灵活的在现有代码的执行顺序前后，添加进新规机能。来一个简单的Sample: 目标类： [java] view plain copy print ? package&nb
placeholder(HTML 5) IE 兼容插件 alxw4616 JavaScript jquery jQuery插件
placeholder 这个属性被越来越频繁的使用. 但为做HTML 5 特性IE没能实现这东西. 以下的jQuery插件就是用来在IE上实现该属性的. /** * [placeholder(HTML 5) IE 实现.IE9以下通过测试.] * v 1.0 by oTwo 2014年7月31日 11:45:29 */ $.fn.placeholder = function
Object类,值域,泛型等总结(适合有基础的人看) 百合不是茶泛型的继承和通配符变量的值域 Object类转换
java的作用域在编程的时候经常会遇到,而我经常会搞不清楚这个问题,所以在家的这几天回忆一下过去不知道的每个小知识点变量的值域; package 基础; /** * 作用域的范围 * * @author Administrator * */ public class zuoyongyu { public static vo
JDK1.5 Condition接口 bijian1013 java thread Condition java多线程
Condition 将 Object 监视器方法（wait、notify和 notifyAll）分解成截然不同的对象，以便通过将这些对象与任意 Lock 实现组合使用，为每个对象提供多个等待 set （wait-set）。其中，Lock 替代了 synchronized 方法和语句的使用，Condition 替代了 Object 监视器方法的使用。条件（也称为条件队列或条件变量）为线程提供了一
开源中国OSC源创会记录 bijian1013 hadoop spark MemSQL
一.Strata+Hadoop World（SHW）大会是全世界最大的大数据大会之一。SHW大会为各种技术提供了深度交流的机会，还会看到最领先的大数据技术、最广泛的应用场景、最有趣的用例教学以及最全面的大数据行业和趋势探讨。二.Hadoop &nbs
【Java范型七】范型消除 bit1129 java
范型是Java1.5引入的语言特性，它是编译时的一个语法现象，也就是说，对于一个类，不管是范型类还是非范型类，编译得到的字节码是一样的，差别仅在于通过范型这种语法来进行编译时的类型检查，在运行时是没有范型或者类型参数这个说法的。范型跟反射刚好相反，反射是一种运行时行为，所以编译时不能访问的变量或者方法(比如private)，在运行时通过反射是可以访问的，也就是说，可见性也是一种编译时的行为，在
【Spark九十四】spark-sql工具的使用 bit1129 spark
spark-sql是Spark bin目录下的一个可执行脚本，它的目的是通过这个脚本执行Hive的命令，即原来通过 hive>输入的指令可以通过spark-sql>输入的指令来完成。 spark-sql可以使用内置的Hive metadata-store，也可以使用已经独立安装的Hive的metadata store 关于Hive build into Spark
js做的各种倒计时 ronin47 js 倒计时
第一种：精确到秒的javascript倒计时代码 HTML代码: <form name="form1"> <div align="center" align="middle"
java-37.有n 个长为m+1 的字符串，如果某个字符串的最后m 个字符与某个字符串的前m 个字符匹配，则两个字符串可以联接 bylijinnan java
public class MaxCatenate { /* * Q.37 有n 个长为m+1 的字符串，如果某个字符串的最后m 个字符与某个字符串的前m 个字符匹配，则两个字符串可以联接， * 问这n 个字符串最多可以连成一个多长的字符串，如果出现循环，则返回错误。 */ public static void main(String[] args){
mongoDB安装开窍的石头 mongodb安装基本操作
mongoDB的安装 1:mongoDB下载 https://www.mongodb.org/downloads 2:下载mongoDB下载后解压
[开源项目]引擎的关键意义 comsci 开源项目
一个系统，最核心的东西就是引擎。。。。。而要设计和制造出引擎，最关键的是要坚持。。。。。。现在最先进的引擎技术，也是从莱特兄弟那里出现的，但是中间一直没有断过研发的
软件度量的一些方法 cuiyadll 方法
软件度量的一些方法http://cuiyingfeng.blog.51cto.com/43841/6775/在前面我们已介绍了组成软件度量的几个方面。在这里我们将先给出关于这几个方面的一个纲要介绍。在后面我们还会作进一步具体的阐述。当我们不从高层次的概念级来看软件度量及其目标的时候，我们很容易把这些活动看成是不同而且毫不相干的。我们现在希望表明他们是怎样恰如其分地嵌入我们的框架的。也就是我们度量的
XSD中的targetNameSpace解释 darrenzhu xml namespace xsd targetnamespace
参考链接: http://blog.csdn.net/colin1014/article/details/357694 xsd文件中定义了一个targetNameSpace后，其内部定义的元素，属性，类型等都属于该targetNameSpace,其自身或外部xsd文件使用这些元素，属性等都必须从定义的targetNameSpace中找：例如：以下xsd文件，就出现了该错误，即便是在一
什么是RAID0、RAID1、RAID0+1、RAID5，等磁盘阵列模式? dcj3sjt126com raid
RAID 1又称为Mirror或Mirroring，它的宗旨是最大限度的保证用户数据的可用性和可修复性。 RAID 1的操作方式是把用户写入硬盘的数据百分之百地自动复制到另外一个硬盘上。由于对存储的数据进行百分之百的备份，在所有RAID级别中，RAID 1提供最高的数据安全保障。同样，由于数据的百分之百备份，备份数据占了总存储空间的一半，因而，Mirror的磁盘空间利用率低，存储成本高。 Mir
yii2 restful web服务快速入门 dcj3sjt126com PHP yii2
快速入门 Yii 提供了一整套用来简化实现 RESTful 风格的 Web Service 服务的 API。特别是，Yii 支持以下关于 RESTful 风格的 API：支持 Active Record 类的通用API的快速原型涉及的响应格式（在默认情况下支持 JSON 和 XML) 支持可选输出字段的定制对象序列化适当的格式的数据采集和验证错误
MongoDB查询(3)——内嵌文档查询（七） eksliang MongoDB查询内嵌文档 MongoDB查询内嵌数组
MongoDB查询内嵌文档转载请出自出处：http://eksliang.iteye.com/blog/2177301 一、概述有两种方法可以查询内嵌文档：查询整个文档；针对键值对进行查询。这两种方式是不同的，下面我通过例子进行分别说明。二、查询整个文档例如:有如下文档 db.emp.insert({ &qu
android4.4从系统图库无法加载图片的问题 gundumw100 android
典型的使用场景就是要设置一个头像，头像需要从系统图库或者拍照获得，在android4.4之前，我用的代码没问题，但是今天使用android4.4的时候突然发现不灵了。baidu了一圈，终于解决了。下面是解决方案： private String[] items = new String[] { "图库","拍照" }; /* 头像名称 */
网页特效大全 jQuery等 ini JavaScript jquery css html5 ini
HTML5和CSS3知识和特效 asp.net ajax jquery实例分享一个下雪的特效 jQuery倾斜的动画导航菜单选美大赛示例你会选谁 jQuery实现HTML5时钟功能强大的滚动播放插件JQ-Slide 万圣节快乐！！！向上弹出菜单jQuery插件 htm5视差动画 jquery将列表倒转顺序推荐一个jQuery分页插件 jquery animate
swift objc_setAssociatedObject block(version1.2 xcode6.4) 啸笑天 version
import UIKit class LSObjectWrapper: NSObject { let value: ((barButton: UIButton?) -> Void)? init(value: (barButton: UIButton?) -> Void) { self.value = value
Aegis 默认的 Xfire 绑定方式，将 XML 映射为 POJO MagicMa_007 java POJO xml Aegis xfire
Aegis 是一个默认的 Xfire 绑定方式，它将 XML 映射为 POJO, 支持代码先行的开发.你开发服务类与 POJO,它为你生成 XML schema/wsdl XML 和注解映射概览默认情况下，你的 POJO 类被是基于他们的名字与命名空间被序列化。如果
js get max value in (json) Array qiaolevip 每天进步一点点学习永无止境 max 纵观千象
// Max value in Array var arr = [1,2,3,5,3,2];Math.max.apply(null, arr); // 5 // Max value in Jaon Array var arr = [{"x":"8/11/2009","y":0.026572007},{"x"
XMLhttpRequest 请求 XML,JSON ,POJO 数据 Luob. POJO json Ajax xml XMLhttpREquest
在使用XMlhttpRequest对象发送请求和响应之前，必须首先使用javaScript对象创建一个XMLHttpRquest对象。 var xmlhttp； function getXMLHttpRequest(){ if(window.ActiveXObject){ xmlhttp:new ActiveXObject("Microsoft.XMLHTTP
jquery wuai jquery
以下防止文档在完全加载之前运行Jquery代码，否则会出现试图隐藏一个不存在的元素、获得未完全加载的图像的大小等等 $(document).ready(function(){ jquery代码; }); <script type="text/javascript" src="c:/scripts/jquery-1.4.2.min.js&quo