Berlin.cpp คิดถึง

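PaddlePaddle Pilot Team Deep Learning Crash Camp: Study Notes

Lesson 1 Assignment - Simple Image Augmentation
- Requirements
- 1. Image resizing
- 2. Image flipping
- 3. Image rotation
- 4. Brightness adjustment
- 5. Random cropping
Lesson 2 Assignment - Handwritten Digit Recognition
- Requirements
- 1. Prepare the data
- 2. Network configuration
- 3. Complete the network code
Lesson 3 Assignment - Butterfly Image Classification
- Requirements
- 1. Create the project and mount the data
- 2. A first look at the butterfly dataset
- 3. Prepare the data
- 4. Build the model
- 5. Train the model with the high-level API
- 6. Predict with the trained model
Lesson 4 Assignment - Lemon Classification in Practice
- Requirements
- 1. Tuning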

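Lesson 1 Assignment - Simple Image Augmentation

Common image augmentation methods include horizontal flipping (vertical flipping is rarely useful for many kinds of objects), random cropping, and color transforms (brightness, contrast, saturation, hue). Here we implement some of them with opencv-python.

Each augmentation is written as a callable class with the following structure:

class FunctionClass:
    def __init__(self, parameter):
        self.parameter=parameter

    def __call__(self, img):
        # transform img here and return the result
        return img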

Requirements

1. Complete the code
2. Verify the augmentation results
3. Optionally implement other augmentations of your choice
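
Because every augmentation in this lesson follows the same callable-class structure, several of them can be chained into one pipeline. The following is a minimal sketch of that idea in plain Python. Compose, AddOffset and FlipRows are hypothetical names used only for illustration (they are not part of the assignment), and the image is a nested list rather than an OpenCV array.

```python
class Compose:
    """Apply a list of callables to an image, one after another."""

    def __init__(self, transforms):
        self.transforms = list(transforms)

    def __call__(self, img):
        for transform in self.transforms:
            img = transform(img)
        return img


class AddOffset:
    """Toy transform (hypothetical): add a constant to every pixel value."""

    def __init__(self, offset):
        self.offset = offset

    def __call__(self, img):
        return [[pixel + self.offset for pixel in row] for row in img]


class FlipRows:
    """Toy transform (hypothetical): reverse the order of the rows."""

    def __call__(self, img):
        return img[::-1]


pipeline = Compose([AddOffset(10), FlipRows()])
tiny = [[1, 2], [3, 4]]  # a 2x2 single-channel "image" as nested lists
augmented = pipeline(tiny)
print(augmented)
```

The same pattern should work with the OpenCV-based classes defined in this lesson, since each of them takes an image and returns an image.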

import cv2
import numpy as np
from matplotlib import pyplot as plt
%matplotlib inline

filename = '1.jpg'
## [Load an image from a file]
img = cv2.imread(filename)
img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)
plt.imshow(img)

print(img.shape)

(350, 350, 3)

1. Image resizing

class Resize:
    def __init__(self, size):
        self.size=size

    def __call__(self, img):
        # Inserted code: size is a (width, height) tuple
        return cv2.resize(img,self.size)


resize=Resize( (600, 600))
img2=resize(img)
plt.imshow(img2)

2. Image flipping

class Flip:
    def __init__(self, mode):
        self.mode=mode

    def __call__(self, img):
        # Inserted code: mode is passed to cv2.flip as its flip code
        return cv2.flip(img, self.mode)


flip=Flip(mode=0)
img2=flip(img)
plt.imshow(img2)
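
The mode value decides the flip direction: in cv2.flip, 0 flips around the x-axis (top and bottom swap), a positive value flips around the y-axis (left and right swap), and a negative value does both. The sketch below reproduces those three cases on a nested list so the index arithmetic is visible. The function name flip_like_cv2 is made up for this illustration, and it is only meant to mirror the documented behavior, not to replace cv2.flip.

```python
def flip_like_cv2(img, mode):
    """Flip a row-major nested-list image following cv2.flip's flip-code convention."""
    if mode == 0:
        return img[::-1]                    # reverse row order: vertical flip
    if mode > 0:
        return [row[::-1] for row in img]   # reverse each row: horizontal flip
    return [row[::-1] for row in img[::-1]] # both at once


tiny = [[1, 2],
        [3, 4]]
vertical = flip_like_cv2(tiny, 0)
horizontal = flip_like_cv2(tiny, 1)
both = flip_like_cv2(tiny, -1)
print(vertical, horizontal, both)
```

So Flip(mode=0) above turns the image upside down; for the usual left-right augmentation, a positive mode such as 1 would be the one to use.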

3. Image rotation

class Rotate:
    def __init__(self, degree,size):
        self.degree=degree
        self.size=size  # scale factor passed to cv2.getRotationMatrix2D

    def __call__(self, img):

        # Inserted code: rotate around the image center and keep the output size unchanged
        rows, cols,x = img.shape
        M = cv2.getRotationMatrix2D(((cols - 1) / 2.0, (rows - 1) / 2.0), self.degree, self.size)
        return cv2.warpAffine(img, M, (cols, rows))


rotate=Rotate( 45, 0.7)
img2=rotate(img)
plt.imshow(img2)
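
To see what cv2.getRotationMatrix2D produces, the 2x3 matrix can be built by hand from the formula in the OpenCV documentation: with alpha = scale * cos(angle) and beta = scale * sin(angle), the rows are [alpha, beta, (1 - alpha) * cx - beta * cy] and [-beta, alpha, beta * cx + (1 - alpha) * cy]. The sketch below is a plain-Python illustration of that formula; the helper names rotation_matrix_2d and apply_affine are hypothetical.

```python
import math


def rotation_matrix_2d(center, angle_deg, scale):
    """Build the 2x3 affine matrix documented for cv2.getRotationMatrix2D."""
    cx, cy = center
    alpha = scale * math.cos(math.radians(angle_deg))
    beta = scale * math.sin(math.radians(angle_deg))
    return [
        [alpha, beta, (1 - alpha) * cx - beta * cy],
        [-beta, alpha, beta * cx + (1 - alpha) * cy],
    ]


def apply_affine(M, point):
    """Map an (x, y) point through a 2x3 affine matrix."""
    x, y = point
    return (M[0][0] * x + M[0][1] * y + M[0][2],
            M[1][0] * x + M[1][1] * y + M[1][2])


# Same settings as Rotate(45, 0.7) on the 350x350 image used in this lesson.
center = ((350 - 1) / 2.0, (350 - 1) / 2.0)
M = rotation_matrix_2d(center, 45, 0.7)
mapped_center = apply_affine(M, center)
mapped_neighbor = apply_affine(M, (center[0] + 10, center[1]))
distance = math.hypot(mapped_neighbor[0] - center[0], mapped_neighbor[1] - center[1])
print(mapped_center, distance)
```

The center stays fixed and a point 10 pixels away ends up about 7 pixels away, which is why the second argument of Rotate behaves as a scale factor rather than an output size.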

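4. Brightness adjustment

class Brightness:
    def __init__(self,brightness_factor):
        self.brightness_factor=brightness_factor

    def __call__(self, img):

        # Inserted code: with a = 1 the blank image gets weight 0, so this adds
        # brightness_factor * 100 to every pixel (saturating at the dtype limits)
        rows, cols, x = img.shape
        a = 1
        blank = np.zeros([rows, cols, x], img.dtype)
        return cv2.addWeighted(img, a, blank, 1-a, self.brightness_factor * 100)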
        

brightness=Brightness(0.6)
img2=brightness(img)
plt.imshow(img2)
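
Since cv2.addWeighted computes src1 * alpha + src2 * beta + gamma with saturation, the Brightness class above reduces to adding a constant offset to each pixel and clipping the result. The sketch below shows that arithmetic for 8-bit pixels in plain Python. The function name add_brightness is hypothetical, and the 0..255 range is an assumption that holds for uint8 images such as the one loaded with cv2.imread.

```python
def add_brightness(pixels, brightness_factor):
    """Add brightness_factor * 100 to each 8-bit pixel value and clip to 0..255."""
    offset = brightness_factor * 100
    return [min(255, max(0, int(round(p + offset)))) for p in pixels]


brighter = add_brightness([0, 100, 200, 250], 0.6)
darker = add_brightness([30, 100], -0.6)
print(brighter, darker)
```

Note that this is an additive shift: bright regions saturate at 255, and a negative factor darkens the image instead.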

5. Random cropping

import random
import math

class RandomErasing(object):
    def __init__(self, EPSILON=0.5, sl=0.02, sh=0.4, r1=0.3,
                 mean=[0., 0., 0.]):
        self.EPSILON = EPSILON  # probability of erasing a region
        self.mean = mean        # fill value for each channel
        self.sl = sl            # minimum erased area, as a fraction of the image
        self.sh = sh            # maximum erased area, as a fraction of the image
        self.r1 = r1            # minimum aspect ratio of the erased rectangle

    def __call__(self, img):
        if random.uniform(0, 1) > self.EPSILON:
            return img

        for attempt in range(100):
            area = img.shape[0] * img.shape[1]

            target_area = random.uniform(self.sl, self.sh) * area
            aspect_ratio = random.uniform(self.r1, 1 / self.r1)

            h = int(round(math.sqrt(target_area * aspect_ratio)))
            w = int(round(math.sqrt(target_area / aspect_ratio)))

            # Inserted code: accept the rectangle only if it fits inside the HWC image
            if h < img.shape[0] and w < img.shape[1]:
                x1 = random.randint(0, img.shape[0] - h)  # top row
                y1 = random.randint(0, img.shape[1] - w)  # left column
                if img.shape[2] == 3:
                    img[x1:x1 + h, y1:y1 + w, 0] = self.mean[0]
                    img[x1:x1 + h, y1:y1 + w, 1] = self.mean[1]
                    img[x1:x1 + h, y1:y1 + w, 2] = self.mean[2]
                else:
                    img[x1:x1 + h, y1:y1 + w, 0] = self.mean[0]
                return img

        return img


erase = RandomErasing()
img2=erase(img)
plt.imshow(img2)

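Lesson 2 Assignment - Handwritten Digit Recognition

Requirements

1. Complete the network code and run the handwritten digit recognition project. The final image and prediction result must appear. (65 points)
2. Keep the original multilayer_perceptron definition (as completed by you), define a convolutional network of your own, and run it successfully. The final image and prediction result must appear. (45 points)

First, import the required packages
numpy----------> third-party Python library for scientific computing
PIL------------> Python Image Library, a third-party Python image-processing library
matplotlib-----> Python plotting library; pyplot is matplotlib's plotting framework
os-------------> provides a rich set of functions for working with files and directories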

# Import the required packages
import numpy as np
from PIL import Image
import matplotlib.pyplot as plt
import os
import paddle
print("This tutorial is based on Paddle version: "+paddle.__version__)

This tutorial is based on Paddle version: 2.0.0

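1. Prepare the data

(1) About the dataset

The MNIST dataset contains 60,000 training images and 10,000 test images. Each sample consists of an image and a label: the image is a 28*28 pixel matrix, and the label is one of the 10 digits 0~9.

(2) The transform defines how the images are normalized

(3) train_dataset and test_dataset

In paddle.vision.datasets.MNIST(), mode='train' and mode='test' select the MNIST training set and test set respectively.

The transform=transform argument applies the normalization defined by the transform.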
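
The normalization used in this lesson subtracts a mean of 127.5 and divides by a standard deviation of 127.5, which maps 8-bit pixel values from 0..255 to -1..1. A minimal sketch of that arithmetic follows; the function name normalize is used only for illustration and is not the Paddle API.

```python
def normalize(pixel, mean=127.5, std=127.5):
    """Map a raw pixel value to (pixel - mean) / std."""
    return (pixel - mean) / std


for raw in (0, 127.5, 255):
    print(raw, "->", normalize(raw))
```

This is why the background pixels of the digit image printed later in this lesson show up as -1.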

# Compose combines the preprocessing transforms for a dataset, given as a list.
# Normalize normalizes the image and supports two modes: 1. one mean and standard deviation shared by every channel; 2. a separate mean and standard deviation for each channel.
from paddle.vision.transforms import Compose, Normalize
transform = Compose([Normalize(mean=[127.5],std=[127.5],data_format='CHW')])
# Normalize the datasets with the transform
print('Downloading and loading the training data')
train_dataset = paddle.vision.datasets.MNIST(mode='train', transform=transform)
test_dataset = paddle.vision.datasets.MNIST(mode='test', transform=transform)
print('Loading finished')

Downloading and loading the training data


Cache file /home/aistudio/.cache/paddle/dataset/mnist/train-images-idx3-ubyte.gz not found, downloading https://dataset.bj.bcebos.com/mnist/train-images-idx3-ubyte.gz 
Begin to download

Download finished
Cache file /home/aistudio/.cache/paddle/dataset/mnist/train-labels-idx1-ubyte.gz not found, downloading https://dataset.bj.bcebos.com/mnist/train-labels-idx1-ubyte.gz 
Begin to download
........
Download finished
Cache file /home/aistudio/.cache/paddle/dataset/mnist/t10k-images-idx3-ubyte.gz not found, downloading https://dataset.bj.bcebos.com/mnist/t10k-images-idx3-ubyte.gz 
Begin to download

Download finished
Cache file /home/aistudio/.cache/paddle/dataset/mnist/t10k-labels-idx1-ubyte.gz not found, downloading https://dataset.bj.bcebos.com/mnist/t10k-labels-idx1-ubyte.gz 
Begin to download
..
Download finished


Loading finished

# Let's see what an image in the dataset looks like
train_data0, train_label_0 = train_dataset[0][0],train_dataset[0][1]
train_data0 = train_data0.reshape([28,28])
plt.figure(figsize=(2,2))
print(plt.imshow(train_data0, cmap=plt.cm.binary))
print('Label of train_data0: ' + str(train_label_0))

AxesImage(18,18;111.6x108.72)
Label of train_data0: [5]


/opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages/matplotlib/cbook/__init__.py:2349: DeprecationWarning: Using or importing the ABCs from 'collections' instead of from 'collections.abc' is deprecated, and in 3.8 it will stop working
  if isinstance(obj, collections.Iterator):
/opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages/matplotlib/cbook/__init__.py:2366: DeprecationWarning: Using or importing the ABCs from 'collections' instead of from 'collections.abc' is deprecated, and in 3.8 it will stop working
  return list(data) if isinstance(data, collections.MappingView) else data
/opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages/numpy/lib/type_check.py:546: DeprecationWarning: np.asscalar(a) is deprecated since NumPy v1.16, use a.item() instead
  'a.item() instead', DeprecationWarning, stacklevel=1)

# Now let's look at the raw values
print(train_data0)

[[-1.         -1.         -1.         -1.         -1.         -1.
  -1.         -1.         -1.         -1.         -1.         -1.
  -1.         -1.         -1.         -1.         -1.         -1.
  -1.         -1.         -1.         -1.         -1.         -1.
  -1.         -1.         -1.         -1.        ]

 ...... (omitted)
 
 [-1.         -1.         -1.         -1.         -1.         -1.
  -1.         -1.         -1.         -1.         -1.         -1.
  -1.         -1.         -1.         -1.         -1.         -1.
  -1.         -1.         -1.         -1.         -1.         -1.
  -1.         -1.         -1.         -1.        ]]

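2. Network configuration

The assignment describes a simple multilayer perceptron with three layers: two hidden layers of size 100 and one output layer of size 10. MNIST consists of grayscale images of the handwritten digits 0 to 9, so there are 10 classes and the final output size is 10. The output layer uses Softmax as its activation, so it acts as a classifier. Counting the input layer, the structure is: input layer -> hidden layer -> hidden layer -> output layer. Note that the completed code below uses a single hidden layer of size 512 with ReLU and dropout instead, and returns raw scores that are passed to paddle.nn.CrossEntropyLoss during training.

3. Complete the network code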

# Define the multilayer perceptron
# (dynamic-graph definition)
class multilayer_perceptron(paddle.nn.Layer):
    def __init__(self):
        super(multilayer_perceptron,self).__init__()
        # Network layers completed here
        self.flatten = paddle.nn.Flatten()
        self.linear_1 = paddle.nn.Linear(784, 512)
        self.linear_2 = paddle.nn.Linear(512, 10)
        self.relu = paddle.nn.ReLU()
        self.dropout = paddle.nn.Dropout(0.2)

    def forward(self, x):
        # Forward pass completed here
        y = self.flatten(x)
        y = self.linear_1(y)
        y = self.relu(y)
        y = self.dropout(y)
        y = self.linear_2(y)

        return y

# Define the convolutional network here
# Note: once the convolutional network is defined, the code that follows needs to be modified!
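
The notes do not include the convolutional network itself. When writing one, a useful first step is to track how the 28*28 input shrinks through the convolution and pooling layers, because the first Linear layer must match the flattened size. The sketch below does that bookkeeping for a hypothetical LeNet-style stack. The kernel sizes, paddings and the 16 output channels are assumptions chosen for illustration, not the author's network.

```python
def conv_out(size, kernel, stride=1, padding=0):
    """Output width/height of a convolution or pooling layer on a square input."""
    return (size + 2 * padding - kernel) // stride + 1


size = 28                                   # MNIST images are 28x28
size = conv_out(size, kernel=5, padding=2)  # conv 5x5, padding 2 -> 28
size = conv_out(size, kernel=2, stride=2)   # max-pool 2x2          -> 14
size = conv_out(size, kernel=5)             # conv 5x5, no padding  -> 10
size = conv_out(size, kernel=2, stride=2)   # max-pool 2x2          -> 5

channels = 16                               # assumed channel count of the last conv
flat_features = channels * size * size
print(size, flat_features)
```

With these assumed settings the first fully connected layer would take 400 input features; different layer choices give a different number, which is the part of the later code that has to be adjusted.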

from paddle.metric import Accuracy

LeNet = multilayer_perceptron()

# Wrap the network with Model
model = paddle.Model(LeNet)

# Define the optimizer
optim = paddle.optimizer.Adam(learning_rate=0.001, parameters=model.parameters())

# Configure the model
model.prepare(optim,paddle.nn.CrossEntropyLoss(),Accuracy())

# Train, save, and evaluate the model
model.fit(train_dataset,test_dataset,epochs=2,batch_size=64,save_dir='multilayer_perceptron',verbose=1)

The loss value printed in the log is the current step, and the metric is the average value of previous step.
Epoch 1/2
step  30/938 [..............................] - loss: 0.4647 - acc: 0.6240 - ETA: 10s - 12ms/st

/opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages/paddle/fluid/layers/utils.py:77: DeprecationWarning: Using or importing the ABCs from 'collections' instead of from 'collections.abc' is deprecated, and in 3.8 it will stop working
  return (isinstance(seq, collections.Sequence) and


step  60/938 [>.............................] - loss: 0.3397 - acc: 0.7237 - ETA: 8s - 9ms/step step 680/938 [====================>.........] - loss: 0.1142 - acc: 0.8893 - ETA: 1s - 7ms/s
step 938/938 [==============================] - loss: 0.3313 - acc: 0.9031 - 7ms/step         
save checkpoint at /home/aistudio/multilayer_perceptron/0
Eval begin...
The loss value printed in the log is the current batch, and the metric is the average value of previous step.
step 157/157 [==============================] - loss: 0.0225 - acc: 0.9564 - 6ms/step         
Eval samples: 10000
Epoch 2/2
step 938/938 [==============================] - loss: 0.1209 - acc: 0.9503 - 7ms/step         
save checkpoint at /home/aistudio/multilayer_perceptron/1
Eval begin...
The loss value printed in the log is the current batch, and the metric is the average value of previous step.
step 157/157 [==============================] - loss: 0.0044 - acc: 0.9683 - 6ms/step         
Eval samples: 10000
save checkpoint at /home/aistudio/multilayer_perceptron/final

# Train, save, and evaluate the model
model.fit(train_dataset,test_dataset,epochs=2,batch_size=64,save_dir='multilayer_perceptron',verbose=1)

The loss value printed in the log is the current step, and the metric is the average value of previous step.
Epoch 1/2
step 938/938 [==============================] - loss: 0.2485 - acc: 0.9600 - 7ms/step         
save checkpoint at /home/aistudio/multilayer_perceptron/0
Eval begin...
The loss value printed in the log is the current batch, and the metric is the average value of previous step.
step 157/157 [==============================] - loss: 0.0070 - acc: 0.9663 - 6ms/step        
Eval samples: 10000
Epoch 2/2
step 938/938 [==============================] - loss: 0.0874 - acc: 0.9647 - 7ms/step         
save checkpoint at /home/aistudio/multilayer_perceptron/1
Eval begin...
The loss value printed in the log is the current batch, and the metric is the average value of previous step.
step 157/157 [==============================] - loss: 7.2327e-04 - acc: 0.9687 - 6ms/step     
Eval samples: 10000
save checkpoint at /home/aistudio/multilayer_perceptron/final

# Get the first image in the test set
test_data0, test_label_0 = test_dataset[0][0],test_dataset[0][1]
test_data0 = test_data0.reshape([28,28])
plt.figure(figsize=(2,2))
# Show the first image in the test set
print(plt.imshow(test_data0, cmap=plt.cm.binary))
print('Label of test_data0: ' + str(test_label_0))
# Run the model on the test set
result = model.predict(test_dataset, batch_size=1)
# Print the model's prediction
print('Predicted digit for test_data0: %d' % np.argsort(result[0][0])[0][-1])

AxesImage(18,18;111.6x108.72)
Label of test_data0: [7]
Predict begin...
step 10000/10000 [==============================] - 1ms/step        
Predict samples: 10000
Predicted digit for test_data0: 7
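
The prediction line sorts the 10 class scores in ascending order and takes the last index, which is the index of the largest score. The sketch below shows the same idea in plain Python on made-up scores; sorting indices by score plays the role of np.argsort here, and the score values are purely illustrative.

```python
scores = [0.1, 2.5, -1.0, 0.7]  # made-up class scores for one sample

# Indices ordered from the lowest to the highest score (what np.argsort returns).
order = sorted(range(len(scores)), key=lambda i: scores[i])
predicted_by_sort = order[-1]

# Taking the index of the maximum directly gives the same class.
predicted_by_max = max(range(len(scores)), key=lambda i: scores[i])
print(order, predicted_by_sort, predicted_by_max)
```

With NumPy, np.argmax would express the second form directly and avoids sorting all scores when only the top class is needed.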

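Lesson 3 Assignment - Butterfly Image Classification

Requirements

Artificial intelligence is being applied in an ever wider range of fields, and new intelligent applications keep appearing. This project uses AI to classify butterfly images, which requires fine-grained recognition of a butterfly's category and attributes. With it, researchers can quickly identify the species of a butterfly from a collected photo. The hope is to improve the efficiency and accuracy of butterfly identification work.

1. Create the project and mount the data

The dataset comes entirely from public data on the web (the Heywhale community). The images cover 9 genera and 20 species in total; the file genus.txt lists the 9 genus names and species.txt lists the 20 species names.

When creating the project, you can mount the Butterfly20 dataset, so that it is not cleared automatically even when the project restarts. To do so, create the project in notebook mode; at the bottom of the project creation dialog there is a dataset option, where you choose "+ Add dataset". In the search box that pops up, enter "Butterfly20" as the keyword to find the dataset. Finally, select the dataset and it is mounted into the project automatically.

Note that every time the project is reopened, everything under the data folder except the mounted datasets is deleted.

Mounted datasets appear automatically under the data directory, usually as archives. The data/data63004 directory contains two archives, Butterfly20.zip and Butterfly20_test.zip. You can also use the download function to fetch the dataset and train locally.

2. A first look at the butterfly dataset

What do the butterfly images look like?

First, unzip the data. This takes the following steps:

Step 1: change the current path to the data directory with the command !cd data. In an AI Studio notebook, Linux commands can be run by prefixing them with an exclamation mark (!). && chains two commands, and \ continues a command on the next line. Note that every time the project is reopened, everything under the data folder except the mounted datasets is cleared. So if the data is kept in the data directory, it has to be unzipped again after every restart. To keep it persistently, save the data under the work directory instead.

In fact, writing ! followed by a command is equivalent to get_ipython().system('command') in Python.

Step 2: use the unzip command to extract the archives into the current path. The -q option makes unzip run without printing anything, and the -o option makes it overwrite existing files without asking first. The two can be combined as -qo.

Step 3: use the rm command to delete unneeded folders, such as the __MACOSX folder.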

!cd data &&\
unzip -qo data63004/Butterfly20_test.zip &&\
unzip -qo data63004/Butterfly20.zip &&\
rm -r __MACOSX
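
The same steps can also be written in pure Python, which helps when the code has to run outside a notebook. The sketch below uses the standard zipfile and shutil modules to extract quietly, overwrite existing files, and remove a __MACOSX folder if present. The function name unzip_quietly and the demo archive are hypothetical; the demo builds a throwaway zip so the snippet runs on its own.

```python
import os
import shutil
import tempfile
import zipfile


def unzip_quietly(zip_path, dest_dir):
    """Extract an archive without printing anything, overwriting existing files."""
    with zipfile.ZipFile(zip_path) as archive:
        archive.extractall(dest_dir)
    # Drop the macOS metadata folder if the archive contained one.
    shutil.rmtree(os.path.join(dest_dir, "__MACOSX"), ignore_errors=True)


# Self-contained demo on a throwaway archive (all names here are made up).
work_dir = tempfile.mkdtemp()
demo_zip = os.path.join(work_dir, "demo.zip")
with zipfile.ZipFile(demo_zip, "w") as archive:
    archive.writestr("Demo/001.jpg", b"fake image bytes")
    archive.writestr("__MACOSX/Demo/._001.jpg", b"metadata")

out_dir = os.path.join(work_dir, "out")
unzip_quietly(demo_zip, out_dir)
extracted = sorted(os.listdir(out_dir))
print(extracted)
```

For the dataset in this lesson, the call would take the path of Butterfly20.zip and the data directory as arguments.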

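Next, we examine the dataset. The Butterfly20 folder contains many subfolders, each holding many images, and each subfolder is named after a butterfly species. From this we can infer that the images in each subfolder are the samples and that the subfolder name is their label.

We plot the image 006.jpg from the data/Butterfly20/001.Atrophaneura_horishanus folder. According to Baidu Baike, Atrophaneura horishanus is a species of the genus Atrophaneura in the family Papilionidae.

We then plot the image 006.jpg from the data/Butterfly20/002.Atrophaneura_varuna folder. According to Baidu Baike, the Chinese name of Atrophaneura varuna is "瓦曙凤蝶"; it is another species of the genus Atrophaneura in the family Papilionidae.

Although butterflies look alike at first glance, different species differ considerably in details such as shape and color.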

import matplotlib.pyplot as plt
import PIL.Image as Image

path='/home/aistudio/data/Butterfly20/001.Atrophaneura_horishanus/006.jpg'
img = Image.open(path)
plt.imshow(img)          # draw the image from the array
plt.show()               # display the image

path='/home/aistudio/data/Butterfly20/002.Atrophaneura_varuna/006.jpg'
img = Image.open(path)
plt.imshow(img)          # draw the image from the array
plt.show()               # display the image

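More challenging still, even within the same species the images differ noticeably in viewing angle, lighting, background, pose, and color. Some images even contain several butterflies. The two images below both belong to the same species, Atrophaneura horishanus.

path1='/home/aistudio/data/Butterfly20/001.Atrophaneura_horishanus/006.jpg'
path2='/home/aistudio/data/Butterfly20/001.Atrophaneura_horishanus/150.jpg'

img1 = Image.open(path1)
plt.imshow(img1)          # draw the image from the array
plt.show()

img2 = Image.open(path2)
plt.imshow(img2)          # draw the image from the array
plt.show()               # display the image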

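3. Prepare the data

Data preparation involves two key steps:

First, build the mapping between each sample's file path and its label.

Second, build the reader and the preprocessing. You can write a custom data reader that inherits from the Dataset class of PaddlePaddle 2.0 and applies your own preprocessing inside its __getitem__ method.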

# The following code builds the mapping between sample file paths and sample labels
import os
import random

data_list = [] # a list holding the file path and label of every sample

# Species names are strings, but the model takes numbers, so we build a dictionary that maps each species name to an integer. Keys are species names, values are integers.
label_list=[]
with open("/home/aistudio/data/species.txt") as f:
    for line in f:
        a,b = line.strip("\n").split(" ")
        label_list.append([b, int(a)-1])
label_dic = dict(label_list)

# Collect the names of all subdirectories under Butterfly20 into a list
class_list = os.listdir("/home/aistudio/data/Butterfly20")
class_list.remove('.DS_Store') # remove the .DS_Store entry from the list, because it contains no samples

for each in class_list:
    for f in os.listdir("/home/aistudio/data/Butterfly20/"+each):
        data_list.append(["/home/aistudio/data/Butterfly20/"+each+'/'+f,label_dic[each]])

# Reading in file order would leave images of the same species next to each other, so shuffle the samples thoroughly with random.shuffle.
random.shuffle(data_list)

# Print the first ten entries; each element of data_list is [sample path, sample label].
print(data_list[0:10])

# Print the number of samples; there are 1866 in total.
print("Number of samples: {}".format(len(data_list)))

[['/home/aistudio/data/Butterfly20/003.Byasa_alcinous/031.jpg', 2], ['/home/aistudio/data/Butterfly20/015.Pachliopta_aristolochiae/042.jpg', 14], ['/home/aistudio/data/Butterfly20/005.Byasa_polyeuctes/056.jpg', 4], ['/home/aistudio/data/Butterfly20/003.Byasa_alcinous/083.jpg', 2], ['/home/aistudio/data/Butterfly20/001.Atrophaneura_horishanus/021.jpg', 0], ['/home/aistudio/data/Butterfly20/018.Papilio_bianor/048.jpg', 17], ['/home/aistudio/data/Butterfly20/009.Iphiclides_podalirius/024.jpg', 8], ['/home/aistudio/data/Butterfly20/003.Byasa_alcinous/038.jpg', 2], ['/home/aistudio/data/Butterfly20/011.Lamproptera_meges/042.jpg', 10], ['/home/aistudio/data/Butterfly20/006.Graphium_agamemnon/050.jpg', 5]]
Number of samples: 1866
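
The label dictionary is the core of this step, so here is a small standalone sketch of it. The three lines of species text are a hypothetical excerpt: the "<index> <folder name>" layout is inferred from how the code above splits each line and looks up folder names, not copied from the real species.txt. The sketch also builds the reverse mapping, which is what the prediction step needs later.

```python
import io

# Hypothetical excerpt in the "<index> <folder name>" layout the code above assumes.
species_txt = io.StringIO(
    "1 001.Atrophaneura_horishanus\n"
    "2 002.Atrophaneura_varuna\n"
    "3 003.Byasa_alcinous\n"
)

label_dic = {}
for line in species_txt:
    line = line.strip()
    if not line:
        continue  # tolerate blank lines, e.g. at the end of the file
    index, name = line.split(" ", 1)
    label_dic[name] = int(index) - 1  # labels start at 0

# Reverse mapping from integer label back to the species folder name.
id_to_name = {label: name for name, label in label_dic.items()}
print(label_dic)
print(id_to_name)
```

Skipping blank lines is a small robustness addition over the loop above, which would raise an error on an empty trailing line.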

# The following code builds the reader and the preprocessing
# First import the required modules
import paddle
from paddle.vision.transforms import Compose, ColorJitter, Resize,Transpose, Normalize, RandomHorizontalFlip, RandomRotation
import cv2
import numpy as np
from PIL import Image
from paddle.io import Dataset

# Custom preprocessing function: takes the raw image and returns the processed image, reusing the transforms in paddle.vision.transforms
def preprocess(img, is_val):
    if is_val:
        transform = Compose([
            Resize(size=(224, 224)), # resize the image to 224*224 pixels
            Normalize(mean=[127.5, 127.5, 127.5], std=[127.5, 127.5, 127.5], data_format='HWC'), # normalize
            Transpose(), # the raw data is in HWC layout; Transpose converts it to CHW
            ])
    else:
        transform = Compose([
            Resize(size=(224, 224)), # resize the image to 224*224 pixels
            #ColorJitter(0.4, 0.4, 0.4, 0.4),
            RandomHorizontalFlip(),
            RandomRotation(90),
            Normalize(mean=[127.5, 127.5, 127.5], std=[127.5, 127.5, 127.5], data_format='HWC'), # normalize
            Transpose(), # the raw data is in HWC layout; Transpose converts it to CHW
            ])
    img = transform(img).astype("float32")
    return img

# Custom data reader
class Reader(Dataset):
    def __init__(self, data, is_val=False):
        super(Reader, self).__init__()
        self.is_val = is_val
        # Split the data into training and test sets at initialization. The samples were shuffled before reading, so the last 20% are used as the test set and the first 80% as the training set.
        self.samples = data[-int(len(data)*0.2):] if self.is_val else data[:-int(len(data)*0.2)]

    def __getitem__(self, idx):
        # Process the image
        img_path = self.samples[idx][0] # path of the sample
        img = Image.open(img_path)
        if img.mode != 'RGB':
            img = img.convert('RGB')
        img = preprocess(img, self.is_val) # preprocessing; random flip and rotation are applied only when is_val is False

        # Process the label
        label = self.samples[idx][1] # label of the sample
        label = np.array([label], dtype="int64") # convert the label to int64
        return img, label

    def __len__(self):
        # Return the number of images in each epoch
        return len(self.samples)

# Create the training dataset instance
train_dataset = Reader(data_list, is_val=False)

# Create the test dataset instance
eval_dataset = Reader(data_list, is_val=True)

# Print one training sample
#print(train_dataset[1136][0])
print(train_dataset[1136][0].shape)
print(train_dataset[1136][1])

(3, 224, 224)
[4]
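
The slicing in Reader splits the shuffled list by position, so the sizes of the two sets follow directly from the sample count. The short sketch below reproduces that arithmetic for the 1866 samples and a batch size of 64 (the value used in the training step later in this lesson); the function name split_sizes is hypothetical.

```python
import math


def split_sizes(n_samples, eval_fraction=0.2):
    """Sizes of the (train, eval) parts produced by data[:-k] and data[-k:]."""
    n_eval = int(n_samples * eval_fraction)
    return n_samples - n_eval, n_eval


n_train, n_eval = split_sizes(1866)
steps_train = math.ceil(n_train / 64)
steps_eval = math.ceil(n_eval / 64)
print(n_train, n_eval, steps_train, steps_eval)
```

These numbers agree with the training log further down, which reports 24 training steps, 6 evaluation steps, and 373 evaluation samples per epoch. One caveat with this slicing style: if the list were so short that the evaluation count rounded down to 0, data[-0:] would return the whole list rather than an empty one.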

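4. Build the model

To explore faster, it is best to start with a mature baseline model and see what accuracy it reaches, then try model ensembling to see whether accuracy improves, and finally try a model of your own design.

For simplicity, we directly use a residual network (ResNet) in pretrained mode. The code below loads paddle.vision.models.resnet50, the 50-layer variant; the name res101 used for the checkpoint directory is only a label. Why use a pretrained model? Model parameters are normally initialized at random, whereas a pretrained model starts from fairly well-determined values that were learned on a large task such as ImageNet classification. Butterfly recognition is a different task, but the two may share some general visual features, so a pretrained model can often reach good accuracy faster.

In PaddlePaddle 2.0, using a pretrained model only requires setting the argument pretrained=True. Note that the pretrained model outputs 1000 classes, so a linear transformation is needed to map its output to 20 classes.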

# Define the model
class MyNet(paddle.nn.Layer):
    def __init__(self):
        super(MyNet,self).__init__()
        self.layer=paddle.vision.models.resnet50(pretrained=True)
        self.fc1 = paddle.nn.Linear(1000, 512)
        # modified part of the model
        self.fc2 = paddle.nn.Linear(512, 20)
        self.flatten = paddle.nn.Flatten()

    # Forward computation of the network
    def forward(self,x):
        x=self.layer(x)
        x=self.flatten(x)
        x=self.fc1(x)
        # modified part of the model
        x=self.flatten(x)
        x=self.fc2(x)
        return x

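5. Train the model with the high-level API

First, define the shape and data type of the input.

Second, instantiate the model. To use the high-level API, wrap the network with paddle.Model(), e.g. model = paddle.Model(model,inputs=input_define,labels=label_define).

Third, define the optimizer. We use the Adam optimizer with the learning rate set to 0.00005. The learning_rate parameter matters a lot: if the accuracy oscillates up and down during training, try lowering the learning rate further.

Fourth, prepare the model with the high-level API model.prepare().

Fifth, train the model with the high-level API model.fit(). The meaning of each argument is given in the code comments below.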

total_images = len(train_dataset)
batch_size = 64
EPOCHS = 100

# Define the inputs
input_define = paddle.static.InputSpec(shape=[-1,3,224,224], dtype="float32", name="img")
label_define = paddle.static.InputSpec(shape=[-1,1], dtype="int64", name="label")

# Instantiate the network and define the optimizer and other training logic
model = MyNet()
model = paddle.Model(model,inputs=input_define,labels=label_define) # wrap the network with paddle.Model()
optimizer = paddle.optimizer.Adam(learning_rate=0.00005, parameters=model.parameters())
# The learning_rate parameter above matters a lot. If the accuracy oscillates up and down during training, try lowering it further.

model.prepare(optimizer=optimizer, # optimizer
              loss=paddle.nn.CrossEntropyLoss(), # loss function
              metrics=paddle.metric.Accuracy()) # evaluation metric

# For VisualDL visualization
visualdl = paddle.callbacks.VisualDL(log_dir='visualdl_log')
# Early stopping: stop training once the eval acc has not improved for 10 epochs, and save the best model
early_stop = paddle.callbacks.EarlyStopping(
                                            'acc',
                                            mode='max',
                                            patience=10,
                                            verbose=1,
                                            min_delta=0,
                                            baseline=None,
                                            save_best_model=True)

model.fit(train_data=train_dataset,     # training dataset
          eval_data=eval_dataset,         # test dataset
          batch_size=batch_size,                  # number of samples per batch
          epochs=EPOCHS,                      # number of epochs
          save_dir="/home/aistudio/res101", # save model and optimizer parameters to this folder
          save_freq=10,                    # save model and optimizer parameters every this many epochs
          shuffle=True,
          verbose=1,
          callbacks=[visualdl, early_stop]
)

2021-03-10 14:33:34,745 - INFO - unique_endpoints {''}
2021-03-10 14:33:34,747 - INFO - File /home/aistudio/.cache/paddle/hapi/weights/resnet50.pdparams md5 checking...
2021-03-10 14:33:35,106 - INFO - Found /home/aistudio/.cache/paddle/hapi/weights/resnet50.pdparams


The loss value printed in the log is the current step, and the metric is the average value of previous step.
Epoch 1/100


/opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages/paddle/fluid/layers/utils.py:77: DeprecationWarning: Using or importing the ABCs from 'collections' instead of from 'collections.abc' is deprecated, and in 3.8 it will stop working
  return (isinstance(seq, collections.Sequence) and
/opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages/paddle/nn/layer/norm.py:648: UserWarning: When training, we now always track global mean and variance.
  "When training, we now always track global mean and variance.") 
  
  step 24/24 [==============================] - loss: 1.6762 - acc: 0.3362 - 497ms/step        
save checkpoint at /home/aistudio/res101/0
Eval begin...
The loss value printed in the log is the current batch, and the metric is the average value of previous step.
step 6/6 [==============================] - loss: 2.0643 - acc: 0.5201 - 497ms/step
Eval samples: 373
Epoch 2/100
step 24/24 [==============================] - loss: 0.8775 - acc: 0.7220 - 488ms/step        
Eval begin...
The loss value printed in the log is the current batch, and the metric is the average value of previous step.
step 6/6 [==============================] - loss: 1.5468 - acc: 0.6595 - 494ms/step
Eval samples: 373

...... (omitted)

Eval begin...
The loss value printed in the log is the current batch, and the metric is the average value of previous step.
step 6/6 [==============================] - loss: 1.2660 - acc: 0.8525 - 498ms/step
Eval samples: 373
Epoch 58: Early stopping.
Best checkpoint has been saved at /home/aistudio/res101/best_model
save checkpoint at /home/aistudio/res101/final
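
The log above ends with "Epoch 58: Early stopping.", which comes from the EarlyStopping callback configured with patience=10. The logic behind such a callback can be sketched in a few lines of plain Python. This is a simplified illustration under the assumption that "improvement" means a strictly higher accuracy; it is not Paddle's implementation, and the function name and the accuracy history are made up.

```python
def early_stop_epoch(accuracies, patience):
    """Return (stop_epoch, best_epoch) as 0-based indices.

    stop_epoch is None if training would run through all given epochs.
    """
    best_acc = float("-inf")
    best_epoch = None
    waited = 0  # epochs since the last improvement
    for epoch, acc in enumerate(accuracies):
        if acc > best_acc:
            best_acc, best_epoch, waited = acc, epoch, 0
        else:
            waited += 1
            if waited >= patience:
                return epoch, best_epoch
    return None, best_epoch


history = [0.52, 0.66, 0.70, 0.69, 0.70, 0.68]  # made-up eval accuracies
stopped = early_stop_epoch(history, patience=3)
print(stopped)
```

In this toy history the best epoch is index 2 and training stops at index 5, after three epochs without a strictly better accuracy. The checkpoint worth keeping is the one from the best epoch, which is what save_best_model=True is for.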

# Load the best model
model.load('./res101/best_model.pdparams')

result = model.evaluate(eval_dataset, verbose=1)

print(result)

Eval begin...
The loss value printed in the log is the current batch, and the metric is the average value of previous step.
step 373/373 [==============================] - 30ms/step         
Eval samples: 373
{}

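6. Predict with the trained model

In a modeling competition, the organizers usually provide a dataset to predict on, and we use our own model to predict the labels of that data. In other words, we do not know the true labels; only the organizers do. The closer our predictions are to the true labels, the higher the score.

The prediction workflow has the following steps:

First, build the data reader. Because the prediction dataset has no labels, this reader differs from the training reader, so it is best to write a new class that inherits from the Dataset base class.

Second, instantiate the model. To use the high-level API, wrap the network with paddle.Model(), e.g. paddle.Model(MyNet(),inputs=input_define). Since this model is only used for prediction, specifying the input format is enough.

Third, load the parameters that were just trained with the high-level API model.load(). In these notes they were saved under the save_dir passed to model.fit (/home/aistudio/res101): best_model is the checkpoint kept by early stopping, and final is the result after the last training epoch. A checkpoint saved at another epoch can be loaded the same way.

Fourth, prepare the model with the high-level API model.prepare().

Fifth, read the data to predict and run the trained model on it.

Sixth, save the results.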

class InferDataset(Dataset):
    def __init__(self, img_path=None):
        """
        Data reader (inference)
        :param img_path: path of the single image to run inference on
        """
        super().__init__()
        if img_path:
            self.img_paths = [img_path]
        else:
            raise Exception("Please specify the path of the image to predict")

    def __getitem__(self, index):
        # Get the image path
        img_path = self.img_paths[index]
        # Read the image with Pillow; preprocess converts it to a NumPy array
        img = Image.open(img_path)
        if img.mode != 'RGB': 
            img = img.convert('RGB') 
        img = preprocess(img, True) # validation-style preprocessing only, no data augmentation
        return img

    def __len__(self):
        return len(self.img_paths)

# Instantiate the inference model
model = paddle.Model(MyNet(),inputs=input_define)

# Load the parameters that were just trained
model.load('./res101/best_model.pdparams')

# Prepare the model
model.prepare()

# Collect the file path of every image in the dataset to predict
infer_list=[]
with open("/home/aistudio/data/testpath.txt") as file_pred:
    for line in file_pred:
        infer_list.append("/home/aistudio/data/"+line.strip())

# The model predicts a number, so we need a dictionary to look up the corresponding text label.
def get_label_dict2():
    label_list2=[]
    with open("/home/aistudio/data/species.txt") as filess:
        for line in filess:
            a,b = line.strip("\n").split(" ")
            label_list2.append([int(a)-1, b])
    label_dic2 = dict(label_list2)
    return label_dic2

label_dict2 = get_label_dict2()
#print(label_dict2)

# Predict with the trained model
results=[]
for infer_path in infer_list:
    infer_data = InferDataset(infer_path)
    result = model.predict(test_data=infer_data)[0] # key line: runs the prediction
    result = paddle.to_tensor(result)
    result = np.argmax(result.numpy()) # index of the largest value
    results.append("{}".format(label_dict2[result])) # look up the label name for that index

# Save the results
with open("work/result.txt", "w") as f:
    for r in results:
        f.write("{}\n".format(r))

2021-03-10 14:54:17,492 - INFO - unique_endpoints {''}
[INFO 2021-03-10 14:54:17,492 download.py:154] unique_endpoints {''}
2021-03-10 14:54:17,493 - INFO - File /home/aistudio/.cache/paddle/hapi/weights/resnet50.pdparams md5 checking...
[INFO 2021-03-10 14:54:17,493 download.py:251] File /home/aistudio/.cache/paddle/hapi/weights/resnet50.pdparams md5 checking...
2021-03-10 14:54:17,852 - INFO - Found /home/aistudio/.cache/paddle/hapi/weights/resnet50.pdparams
[INFO 2021-03-10 14:54:17,852 download.py:184] Found /home/aistudio/.cache/paddle/hapi/weights/resnet50.pdparams


Predict begin...
step 1/1 [==============================] - 43ms/step
Predict samples: 1
Predict begin...
step 1/1 [==============================] - 37ms/step
Predict samples: 1

...... (omitted)

Predict begin...
step 1/1 [==============================] - 38ms/step
Predict samples: 1

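Lesson 4 Assignment - Lemon Classification in Practice

Requirements

The goal of image classification is to assign a semantic category to an image based on its visual content. It is also the basis of image retrieval, image content analysis, object recognition, and related problems.
This exercise uses a food-classification example to help you understand and practice building a convolutional neural network with PaddlePaddle 2.0.
Note: the datasets used in this exercise all come from the internet; do not use them for commercial purposes.

Unzip the files, train with train.csv, and test with val.csv. The accuracy on val is the final score.

1. Tuning

Think about how to tune the model and try it yourself. The evaluation metric is the accuracy on the validation set: the higher the validation accuracy, the higher the score. You may swap the model and use any tuning techniques you like, but you need to get the code running yourself.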

# Import the required libraries
from sklearn.utils import shuffle
import os
import pandas as pd
import numpy as np
from PIL import Image

import paddle
import paddle.nn as nn
from paddle.io import Dataset
import paddle.vision.transforms as T
import paddle.nn.functional as F
from paddle.metric import Accuracy

import warnings
warnings.filterwarnings("ignore")

# Read the data
train_images = pd.read_csv('lemon_lesson/train_images.csv', usecols=['id','class_num'])

# labelshuffling

def labelShuffling(dataFrame, groupByName = 'class_num'):

    groupDataFrame = dataFrame.groupby(by=[groupByName])
    labels = groupDataFrame.size()
    print("length of label is ", len(labels))
    maxNum = max(labels)
    lst = pd.DataFrame()
    for i in range(len(labels)):
        print("Processing label  :", i)
        tmpGroupBy = groupDataFrame.get_group(i)
        createdShuffleLabels = np.random.permutation(np.array(range(maxNum))) % labels[i]
        print("Num of the label is : ", labels[i])
        lst=lst.append(tmpGroupBy.iloc[createdShuffleLabels], ignore_index=True)
        print("Done")
    # lst.to_csv('test1.csv', index=False)
    return lst

# Split into training and validation sets
all_size = len(train_images)
# print(all_size)
train_size = int(all_size * 0.8)
train_image_list = train_images[:train_size]
val_image_list = train_images[train_size:]

df = labelShuffling(train_image_list)
df = shuffle(df)

train_image_path_list = df['id'].values
label_list = df['class_num'].values
label_list = paddle.to_tensor(label_list, dtype='int64')
train_label_list = paddle.nn.functional.one_hot(label_list, num_classes=4)

val_image_path_list = val_image_list['id'].values
val_label_list = val_image_list['class_num'].values
val_label_list = paddle.to_tensor(val_label_list, dtype='int64')
val_label_list = paddle.nn.functional.one_hot(val_label_list, num_classes=4)

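# Define the data preprocessing
data_transforms = T.Compose([
    T.Resize(size=(224, 224)),
    T.RandomHorizontalFlip(0.5),  # flip probability; the original passed 224 here, which flips every image
    T.RandomVerticalFlip(0.5),    # flip probability; the original passed 224 here, which flips every image
    T.Transpose(),    # HWC -> CHW
    T.Normalize(
        mean=[0, 0, 0],        # scale pixel values to [0, 1]
        std=[255, 255, 255],
        to_rgb=True)    
])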

length of label is  4
Processing label  : 0
Num of the label is :  321
Done
Processing label  : 1
Num of the label is :  207
Done
Processing label  : 2
Num of the label is :  181
Done
Processing label  : 3
Num of the label is :  172
Done
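
The log shows four classes with 321, 207, 181 and 172 samples. labelShuffling balances them by drawing, for each class, a random permutation of 0..max-1 and taking it modulo the class size, so every class ends up with as many rows as the largest one. The sketch below isolates that index trick in plain Python for the smallest class; the function name oversample_indices is hypothetical, and random.Random stands in for np.random.permutation.

```python
import random
from collections import Counter


def oversample_indices(class_size, max_size, rng):
    """Indices into a class of class_size rows, repeated up to max_size entries."""
    order = list(range(max_size))
    rng.shuffle(order)                      # random permutation of 0..max_size-1
    return [i % class_size for i in order]  # wrap around into the valid range


rng = random.Random(0)
indices = oversample_indices(class_size=172, max_size=321, rng=rng)
counts = Counter(indices)
print(len(indices), len(counts), max(counts.values()))  # 321 172 2
```

Every one of the 172 samples is used at least once and none more than twice, so the repetition is spread evenly. With four classes the balanced training set has 4 * 321 = 1284 rows, the sample count that appears in the training log later in this lesson.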

# Build the Dataset
class MyDataset(paddle.io.Dataset):
    """
    Step 1: inherit from paddle.io.Dataset
    """
    def __init__(self, train_img_list, val_img_list,train_label_list,val_label_list, mode='train'):
        """
        Step 2: implement the constructor, define how data is read, and split training and test data
        """
        super(MyDataset, self).__init__()
        self.img = []
        self.label = []
        # Image names and labels come from the csv file read with pandas
        self.train_images = train_img_list
        self.test_images = val_img_list
        self.train_label = train_label_list
        self.test_label = val_label_list
        if mode == 'train':
            # Use the training split
            for img,la in zip(self.train_images, self.train_label):
                self.img.append('train_images/'+img)
                self.label.append(la)
        else:
            # Use the validation split (the original looped over the training lists here)
            for img,la in zip(self.test_images, self.test_label):
                self.img.append('train_images/'+img)
                self.label.append(la)

    def load_img(self, image_path):
        # Read the image with Pillow and convert it to RGB
        image = Image.open(image_path).convert('RGB')
        return image

    def __getitem__(self, index):
        """
        Step 3: implement __getitem__, which defines how to fetch the sample at a given index and returns one item (image data, label)
        """
        image = self.load_img(self.img[index])
        label = self.label[index]
        # label = paddle.to_tensor(label)
        
        return data_transforms(image), paddle.nn.functional.label_smooth(label)

    def __len__(self):
        """
        Step 4: implement __len__, which returns the total number of samples
        """
        return len(self.img)

#train_loader
train_dataset = MyDataset(train_img_list=train_image_path_list, val_img_list=val_image_path_list, train_label_list=train_label_list, val_label_list=val_label_list, mode='train')
train_loader = paddle.io.DataLoader(train_dataset, places=paddle.CPUPlace(), batch_size=32, shuffle=True, num_workers=0)

#val_loader
# Note: the original notebook built val_loader from train_dataset, so the eval metrics
# logged below (1284 samples) were measured on the training data.
val_dataset = MyDataset(train_img_list=train_image_path_list, val_img_list=val_image_path_list, train_label_list=train_label_list, val_label_list=val_label_list, mode='test')
val_loader = paddle.io.DataLoader(val_dataset, places=paddle.CPUPlace(), batch_size=32, shuffle=False, num_workers=0)
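
The dataset returns label-smoothed one-hot vectors, which is why the loss is later configured with soft_label=True. With a uniform prior, label smoothing replaces each one-hot entry y by (1 - epsilon) * y + epsilon / K for K classes. The sketch below works through that formula in plain Python with epsilon = 0.1, the documented default of paddle.nn.functional.label_smooth; the two helper functions are illustrative stand-ins, not the Paddle functions themselves.

```python
def one_hot(label, num_classes):
    """One-hot encode an integer class label as a list of floats."""
    return [1.0 if i == label else 0.0 for i in range(num_classes)]


def label_smooth(one_hot_label, epsilon=0.1):
    """Uniform label smoothing: (1 - epsilon) * y + epsilon / K."""
    k = len(one_hot_label)
    return [(1 - epsilon) * value + epsilon / k for value in one_hot_label]


smoothed = label_smooth(one_hot(1, 4))
print(smoothed)
```

For the four lemon classes the true class gets a target of about 0.925 and every other class about 0.025, so the targets still sum to 1 while discouraging over-confident predictions.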

import paddle
from paddle.vision import models

# 模型封装
model_res = models.mobilenet_v2(num_classes=4,pretrained=True)
model = paddle.Model(model_res)

# 定义优化器

#scheduler = paddle.optimizer.lr.LinearWarmup(
        #learning_rate=0.5, warmup_steps=20, start_lr=0, end_lr=0.5, verbose=True)
#optim = paddle.optimizer.SGD(learning_rate=scheduler, parameters=model.parameters())
optim = paddle.optimizer.Adam(learning_rate=0.0001, parameters=model.parameters(),weight_decay=0.0001)

# Configure the model (optimizer, loss, metric)
model.prepare(
    optim,
    paddle.nn.CrossEntropyLoss(soft_label=True),
    Accuracy()
    )

# Train and evaluate the model
model.fit(train_loader,
        val_loader,
        log_freq=1,
        epochs=10,
        # callbacks=Callbk(write=write, iters=iters),
        verbose=1,
        )

2021-03-10 13:10:11,146 - INFO - unique_endpoints {''}
2021-03-10 13:10:11,147 - INFO - File /home/aistudio/.cache/paddle/hapi/weights/mobilenet_v2_x1.0.pdparams md5 checking...
2021-03-10 13:10:11,195 - INFO - Found /home/aistudio/.cache/paddle/hapi/weights/mobilenet_v2_x1.0.pdparams


The loss value printed in the log is the current step, and the metric is the average value of previous step.
Epoch 1/10
step 41/41 [==============================] - loss: 0.5766 - acc: 0.9221 - 305ms/step        
Eval begin...
The loss value printed in the log is the current batch, and the metric is the average value of previous step.
step 41/41 [==============================] - loss: 0.3831 - acc: 1.0000 - 289ms/step        
Eval samples: 1284
Epoch 2/10
step 41/41 [==============================] - loss: 0.7965 - acc: 0.9977 - 299ms/step        

...... (omitted)

Eval begin...
The loss value printed in the log is the current batch, and the metric is the average value of previous step.
step 41/41 [==============================] - loss: 0.3626 - acc: 1.0000 - 292ms/step        
Eval samples: 1284
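
The log above shows the loss settling around 0.36 even when accuracy reaches 1.0000. One plausible reading, sketched below, is that a soft-label cross-entropy cannot drop below the entropy of the smoothed target itself. The sketch assumes one-hot labels over 4 classes smoothed with a factor of 0.1, so the target is `[0.925, 0.025, 0.025, 0.025]`; `soft_cross_entropy` is an illustrative NumPy helper, not Paddle's implementation.

```python
import numpy as np

def soft_cross_entropy(logits, soft_target):
    """Cross-entropy between softmax(logits) and a soft (probability) target."""
    logits = np.asarray(logits, dtype=np.float64)
    soft_target = np.asarray(soft_target, dtype=np.float64)
    # Numerically stable log-softmax
    shifted = logits - logits.max(axis=-1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=-1, keepdims=True))
    return float(-(soft_target * log_probs).sum(axis=-1).mean())

# Smoothed target for a 4-class problem (assumed: epsilon = 0.1, uniform prior)
target = np.array([0.925, 0.025, 0.025, 0.025])

# The loss is smallest when the prediction equals the target distribution
floor = soft_cross_entropy(np.log(target), target)

# An over-confident prediction is penalised, although its argmax is still correct
confident = soft_cross_entropy([20.0, 0.0, 0.0, 0.0], target)

print(f"lowest reachable loss: {floor:.4f}")
print(f"over-confident loss:   {confident:.4f}")
```

Under these assumptions the floor comes out close to 0.35, which is in line with the evaluation losses of 0.36 to 0.38 in the log; a loss near that value together with perfect accuracy does not by itself indicate under-training.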
