前途似海_来日方长

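Deep Learning: Basic Data Usage

Data Usage

Contents

Data Usage
- I. Data Acquisition
  - 1. Image crawler tools
  - 2. Video crawler tools
  - 3. A more complex crawler (Flickr)
  - 4. Crawling images by user ID
  - 5. Crawling specific sites (photography sites such as Tuchong, 500px, and Huaban)
  - 6. Crawler collections
- II. Data Organization
  - 1. Data checking and normalization
  - 2. Deduplication
  - 3. Splitting into training, validation, and test sets
- III. Data Annotation
  - 1. labelme
  - 2. Other annotation tools
  - 3. Intelligent annotation tools
- IV. Data Augmentation Methods
  - 1. Basic data augmentation methods
  - 2. Automatic data augmentation methods
  - 3. Generating new data from scratch
- V. Data Augmentation in Practice with PyTorch (for image classification)
  - 1. PyTorch data augmentation interfaces
  - 2. PyTorch data augmentation in practice (object detection)
  - 3. Introduction to the open-source augmentation library imgaug
  - 4. Concrete imgaug examples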

I. Data Acquisition

1. Image crawler tools

https://github.com/sczhengyabin/Image-Downloader

2. Video crawler tools

https://github.com/iawia002/annie

3. A more complex crawler (Flickr)

https://github.com/chenusc11/flickr-crawler

4. Crawling images by user ID

https://github.com/hellock/icrawler

5. Crawling specific sites (photography sites such as Tuchong, 500px, and Huaban)

https://github.com/darrenfantasy/image_crawler

6. Crawler collections

https://github.com/facert/awesome-spider

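II. Data Organization

1. Data checking and normalization

Remove corrupted images and images with abnormal dimensions

Format normalization

File type normalization (jpg, png)
File name normalization

The code below removes corrupted images and normalizes file names: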

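import os

import cv2


# Warning: this works in place. Unreadable files are deleted and valid images are re-saved as .jpg.
def listfiles(rootDir, ifrename=True):
    num = 0
    # os.walk recursively visits every subfolder under rootDir, down to the deepest level.
    for root, dirs, files in os.walk(rootDir):
        files.sort()
        for d in dirs:
            print(os.path.join(root, d))
        for f in files:
            fileid = os.path.splitext(f)[0]
            filepath = os.path.join(root, f)
            # cv2.imread returns None (it does not raise) when a file cannot be decoded.
            src = cv2.imread(filepath, 1)
            # Remove the original file; valid images are written back as .jpg below.
            os.remove(filepath)
            if src is None:
                print("removed unreadable file:", filepath)
                continue
            print("src=", filepath, src.shape)
            if ifrename:
                # Zero-pad the running index to 5 digits; the counter is shared across subfolders.
                cv2.imwrite(os.path.join(root, str(num).zfill(5) + ".jpg"), src)
                num = num + 1
            else:
                cv2.imwrite(os.path.join(root, fileid + ".jpg"), src)


if __name__ == "__main__":
    # The folder may itself contain multiple subfolders.
    listfiles("/home/wl/linshi/linshi2")

    # The lines below only show ways to control the number of printed decimal places; use as needed.
    #a = 3.1415926
    #print(round(a, 4))
    #print("%.2f" % a)
    #print("{:.3f}".format(a))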
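The listing above handles corrupted files and naming, but not the "abnormal dimensions" check mentioned earlier. A minimal sketch of such a check follows. The function name `is_size_abnormal` and the thresholds `min_side` and `max_aspect_ratio` are illustrative assumptions, not values from the original post; suitable limits depend on the dataset.

```python
def is_size_abnormal(height, width, min_side=32, max_aspect_ratio=4.0):
    """Return True if an image of this size should be discarded.

    min_side and max_aspect_ratio are hypothetical thresholds chosen for
    illustration only.
    """
    if height <= 0 or width <= 0:
        return True
    # Too small to be useful.
    if min(height, width) < min_side:
        return True
    # Extremely elongated images are often banners or broken crops.
    aspect_ratio = max(height, width) / min(height, width)
    return aspect_ratio > max_aspect_ratio


# Example: sizes given as (height, width), e.g. taken from img.shape[:2].
sizes = [(480, 640), (10, 640), (100, 1000), (224, 224)]
kept = [s for s in sizes if not is_size_abnormal(*s)]
print(kept)
```

In a cleaning script this check could run right after the image has been decoded, before it is written back.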

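2. Deduplication

Identical images (exactly the same content, differing only in resolution)
Similar images (consecutive video frames, perturbed or contaminated copies such as watermarked images, and so on)

The code below removes identical images based on MD5. It deletes duplicates directly in the given folder and works for any file type, not only images; a single-folder and a multi-folder variant are both provided. Note that MD5 only catches byte-identical files; copies saved at a different resolution need the content-based comparison shown further below.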

import os
import hashlib
import sys


def get_md5(file):
    # Hash the raw bytes of the file; the with-block guarantees the handle is closed.
    with open(file, 'rb') as f:
        return hashlib.md5(f.read()).hexdigest()


def remove_by_md5_singledir(file_dir):
    file_list = os.listdir(file_dir)
    md5_set = set()  # set lookups are O(1); a list makes the loop quadratic
    print("Number of images before deduplication: " + str(len(file_list)))
    for filename in file_list:
        filepath = os.path.join(file_dir, filename)
        if not os.path.isfile(filepath):
            continue  # skip subdirectories
        filemd5 = get_md5(filepath)
        if filemd5 not in md5_set:
            md5_set.add(filemd5)
        else:
            os.remove(filepath)
    print("Number of images after deduplication: " + str(len(os.listdir(file_dir))))


def remove_by_md5_multidir(file_list):
    md5_set = set()
    print("Number of images before deduplication: " + str(len(file_list)))
    for filepath in file_list:
        if not os.path.isfile(filepath):
            continue  # skip subdirectories
        filemd5 = get_md5(filepath)
        if filemd5 not in md5_set:
            md5_set.add(filemd5)
        else:
            os.remove(filepath)
    print("Number of images after deduplication: " + str(len(md5_set)))


if __name__ == "__main__":
    # Pass one folder for single-folder mode, or two folders for multi-folder mode.
    file_dir1 = sys.argv[1]
    if len(sys.argv) < 3:
        remove_by_md5_singledir(file_dir1)
    else:
        file_dir2 = sys.argv[2]
        file_list1 = [os.path.join(file_dir1, x) for x in os.listdir(file_dir1)]
        file_list2 = [os.path.join(file_dir2, x) for x in os.listdir(file_dir2)]
        remove_by_md5_multidir(file_list1 + file_list2)
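Because the script above deletes files immediately, it can help to first list the duplicates without touching anything. The sketch below is one way to do that, under the assumption that a "dry run" is wanted; the helper names `md5_of_file` and `group_duplicates` are hypothetical and not part of the original code. It also hashes in chunks, so large files are not read into memory at once.

```python
import hashlib
import os
from collections import defaultdict


def md5_of_file(path, chunk_size=1 << 20):
    """Compute the MD5 of a file, reading it in 1 MiB chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def group_duplicates(paths):
    """Return lists of paths that share the same MD5 (nothing is deleted)."""
    groups = defaultdict(list)
    for path in sorted(paths):
        if os.path.isfile(path):
            groups[md5_of_file(path)].append(path)
    # Only groups with more than one member contain duplicates.
    return [group for group in groups.values() if len(group) > 1]
```

Each returned group could then be reviewed, keeping the first path and removing the rest.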

The code below removes identical or similar images by comparing image content (single-folder mode).

import numpy as np
import cv2
import os


def compare_image(image1, image2, mode='same'):
    assert image1.shape == image2.shape
    # 'same': a strict check that requires every pixel to be identical.
    if mode == 'same':
        return 1.0 if np.array_equal(image1, image2) else 0.0
    # 'abs': mean absolute difference per pixel; smaller means more similar.
    elif mode == 'abs':
        # np.int and np.float were removed in NumPy 1.24, so use an explicit dtype.
        diff = np.sum(np.abs(image1.astype(np.float64) - image2.astype(np.float64)))
        return diff / (image1.shape[0] * image1.shape[1])
    return 0


def load_thumbnail(path):
    # Read as grayscale and shrink to 128x128 so that images of different sizes can be compared.
    image = cv2.imread(path, 0)
    if image is None:
        return None  # unreadable file
    return cv2.resize(image, (128, 128), interpolation=cv2.INTER_NEAREST)


def remove_by_pixel_singledir(file_dir, mode, th=5.0):
    # Sort so that likely duplicates (e.g. consecutive video frames) are neighbours.
    file_list = sorted(os.listdir(file_dir))
    print('Number of images before deduplication: ' + str(len(file_list)))
    i = 0
    while i < len(file_list) - 1:
        imagei = load_thumbnail(os.path.join(file_dir, file_list[i]))
        print('testing image ' + os.path.join(file_dir, file_list[i]))
        j = i + 1
        # Compare with the following files and stop at the first one that is not a duplicate.
        while imagei is not None and j < len(file_list):
            pathj = os.path.join(file_dir, file_list[j])
            imagej = load_thumbnail(pathj)
            if imagej is None:
                break
            similarity = compare_image(imagei, imagej, mode=mode)
            print("simi=" + str(similarity))
            is_dup = (mode == 'same' and similarity >= 1.0) or (mode == 'abs' and similarity < th)
            if not is_dup:
                break
            os.remove(pathj)
            print('removed ' + pathj)
            # After pop(j) the next candidate moves into index j, so j is not incremented.
            file_list.pop(j)
        i += 1
    print("Number of images after deduplication: " + str(len(os.listdir(file_dir))))


if __name__ == "__main__":
    mode = "same"
    file_dir = "/home/wl/linshi"
    remove_by_pixel_singledir(file_dir, mode)

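Possible follow-up improvements:

Improving the image similarity computation:

More similarity criteria: MSE distance, Levenshtein distance, DNN feature similarity

More traversal strategies: pre-sort the files (by physical file size, image dimensions, or file name), then search only to a limited depth or among the nearest neighbours.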
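Two of these ideas, the MSE criterion and a bounded search depth over a pre-sorted list, can be sketched as follows. This is an illustration under stated assumptions rather than the author's implementation: the names `mse` and `find_near_duplicates`, the threshold of 25.0, and the default depth of 3 are all hypothetical, and the images are assumed to be already loaded, resized to a common shape, and sorted so that likely duplicates sit close together.

```python
import numpy as np


def mse(image1, image2):
    """Mean squared error between two equally sized images."""
    a = image1.astype(np.float64)
    b = image2.astype(np.float64)
    return float(np.mean((a - b) ** 2))


def find_near_duplicates(images, threshold=25.0, depth=3):
    """Return sorted indices of images judged to be near-duplicates.

    Each image is compared only with the next `depth` images, which keeps
    the cost linear in the number of images instead of quadratic.
    """
    to_drop = set()
    for i in range(len(images)):
        if i in to_drop:
            continue
        for j in range(i + 1, min(i + 1 + depth, len(images))):
            if j not in to_drop and mse(images[i], images[j]) < threshold:
                to_drop.add(j)
    return sorted(to_drop)


base = np.zeros((8, 8), dtype=np.uint8)
images = [base, base + 1, np.full((8, 8), 200, dtype=np.uint8), base + 2]
print(find_near_duplicates(images, depth=3))  # indices 1 and 3 are close to image 0
```

A larger depth finds more duplicates at a higher cost; with `depth=1` only direct neighbours are compared.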

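3. Splitting into training, validation, and test sets

The code below provides two functions: one shuffles the sample list randomly, and the other splits the samples at a uniform interval into a training set and a validation set.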

import os
import random


def shuffle(file_in, file_out):
    # Read all lines, shuffle them in memory, and write them back out.
    with open(file_in, 'r') as fin:
        lines = fin.readlines()
    random.shuffle(lines)
    with open(file_out, 'w') as fout:
        fout.writelines(lines)


def splittrain_val(fileall, valratio=0.1):
    # Strip only the final extension, so dots elsewhere in the path are harmless.
    fileid = os.path.splitext(fileall)[0]
    # Fall back to the default when the ratio is outside (0, 1).
    if valratio <= 0 or valratio >= 1:
        valratio = 0.1
    # Every interval-th line goes to the validation set, the rest to the training set.
    # Note: ratios above 0.5 give interval 1, which sends every line to validation.
    interval = int(1.0 / valratio)
    with open(fileall) as f, \
            open(fileid + "_train.txt", 'w') as ftrain, \
            open(fileid + "_val.txt", 'w') as fval:
        for count, line in enumerate(f, start=1):
            if count % interval == 0:
                fval.write(line)
            else:
                ftrain.write(line)


if __name__ == "__main__":
    splittrain_val("/home/wl/linshi/test_files.txt", 0.5)
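The heading mentions a test set as well, while the listing above only produces training and validation files. A three-way split could look like the sketch below. The function name `split_train_val_test`, the default ratios, and the fixed seed are assumptions for illustration; it works on an in-memory list of lines rather than on files.

```python
import random


def split_train_val_test(items, val_ratio=0.1, test_ratio=0.1, seed=0):
    """Shuffle items reproducibly and split them into train/val/test lists."""
    if val_ratio < 0 or test_ratio < 0 or val_ratio + test_ratio >= 1:
        raise ValueError("ratios must be non-negative and sum to less than 1")
    shuffled = list(items)
    # A dedicated Random instance keeps the split reproducible
    # without touching the global random state.
    random.Random(seed).shuffle(shuffled)
    n_val = int(len(shuffled) * val_ratio)
    n_test = int(len(shuffled) * test_ratio)
    val = shuffled[:n_val]
    test = shuffled[n_val:n_val + n_test]
    train = shuffled[n_val + n_test:]
    return train, val, test


train, val, test = split_train_val_test(range(100))
print(len(train), len(val), len(test))
```

Fixing the seed means the same samples land in the same subset on every run, which keeps validation results comparable across experiments.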

III. Data Annotation

1. labelme

Online version (fairly old): http://labelme.csail.mit.edu/Release3.0

Offline version: https://github.com/wkentaro/labelme

2. Other annotation tools

[Figure: overview of other annotation tools; the image is not available in this copy]

3. Intelligent annotation tools

EISeg, from Baidu's Paddle ecosystem
RNN-based semi-supervised interactive tool: https://github.com/fidler-lab/polyrnn-pp-pytorch
GCN-based semi-supervised interactive tool

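IV. Data Augmentation Methods

1. Basic data augmentation methods

Why use data augmentation: it is one way to improve a model's generalization ability

Explicit regularization (model ensembling, parameter regularization, etc.)
Implicit regularization (data augmentation, stochastic gradient descent, etc.)

What data augmentation methods are there?

Single-sample augmentation: geometric operations and color operations
Multi-sample augmentation: interpolating between discrete sample points to make the sample space continuous

Geometric transforms on a single sample: flipping (not usable for orientation-sensitive tasks), rotation (not usable for angle-sensitive tasks), scaling, affine transforms, and cropping. For example, randomly cropping 224×224 patches from a 256×256 image gives roughly 32 × 32 ≈ 1000 distinct crop positions per image, so the effective number of samples grows by about three orders of magnitude.

Color operations on a single sample: noise, blur, color jitter, contrast jitter, erasing, etc.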
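To make the single-sample operations concrete, here is a minimal NumPy sketch of one geometric operation (random crop plus horizontal flip) and one color operation (brightness jitter). The function names and the default values `flip_prob=0.5` and `max_delta=0.2` are illustrative assumptions; images are assumed to be uint8 arrays of shape (H, W, C).

```python
import numpy as np


def random_crop_flip(image, crop_size=224, flip_prob=0.5, rng=None):
    """Randomly crop a square patch and flip it horizontally with flip_prob."""
    if rng is None:
        rng = np.random.default_rng()
    h, w = image.shape[:2]
    if h < crop_size or w < crop_size:
        raise ValueError("image is smaller than the crop size")
    # The top-left corner can take (h - crop_size + 1) * (w - crop_size + 1) positions.
    top = int(rng.integers(0, h - crop_size + 1))
    left = int(rng.integers(0, w - crop_size + 1))
    patch = image[top:top + crop_size, left:left + crop_size]
    if rng.random() < flip_prob:
        patch = patch[:, ::-1]  # reverse the width axis
    return patch


def jitter_brightness(image, max_delta=0.2, rng=None):
    """Scale pixel intensities by a random factor in [1 - max_delta, 1 + max_delta]."""
    if rng is None:
        rng = np.random.default_rng()
    factor = 1.0 + rng.uniform(-max_delta, max_delta)
    out = image.astype(np.float32) * factor
    return np.clip(out, 0, 255).astype(np.uint8)


rng = np.random.default_rng(0)
image = rng.integers(0, 256, size=(256, 256, 3), dtype=np.uint8)
patch = jitter_brightness(random_crop_flip(image, rng=rng), rng=rng)
print(patch.shape, patch.dtype)
```

For a 256×256 input and a 224×224 crop, the exact number of crop positions is 33 × 33 = 1089, which matches the order-of-magnitude estimate in the text.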

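Multi-sample augmentation, SamplePairing: two randomly drawn images are each passed through basic augmentation (such as random flipping) and then averaged pixel by pixel into a new sample; the label is that of one of the two original samples.

Multi-sample augmentation, Mixup: linearly interpolate both the images and the labels.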
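Both multi-sample methods fit in a few lines of NumPy. The sketch below assumes equally sized images and, for Mixup, one-hot labels stored as float arrays. Drawing the mixing weight from a Beta(alpha, alpha) distribution follows the usual Mixup formulation; `alpha=0.2` and the function names are assumptions for illustration. For SamplePairing, the label of the first image is kept.

```python
import numpy as np


def sample_pairing(image_a, image_b, label_a):
    """Average two images pixel-wise and keep the first image's label."""
    mixed = (image_a.astype(np.float32) + image_b.astype(np.float32)) / 2.0
    return mixed, label_a


def mixup(image_a, label_a, image_b, label_b, alpha=0.2, rng=None):
    """Linearly interpolate both the images and their one-hot labels."""
    if rng is None:
        rng = np.random.default_rng()
    lam = float(rng.beta(alpha, alpha))  # mixing weight in [0, 1]
    image = lam * image_a.astype(np.float32) + (1.0 - lam) * image_b.astype(np.float32)
    label = lam * label_a + (1.0 - lam) * label_b
    return image, label, lam


rng = np.random.default_rng(0)
img_a = np.zeros((4, 4, 3), dtype=np.uint8)
img_b = np.full((4, 4, 3), 100, dtype=np.uint8)
cat = np.array([1.0, 0.0])  # one-hot label of img_a
dog = np.array([0.0, 1.0])  # one-hot label of img_b
mixed_img, mixed_label, lam = mixup(img_a, cat, img_b, dog, rng=rng)
paired_img, paired_label = sample_pairing(img_a, img_b, cat)
```

The mixed label is a soft label, so the loss must accept probability vectors rather than class indices.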

A library that bundles many of these transforms: https://github.com/aleju/imgaug

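2. Automatic data augmentation methods

AutoAugment (evaluated mainly on image classification tasks)

It learns combinations of existing augmentation operations, since different tasks call for different augmentation operations

Prepare 16 commonly used image operations
Build sub-policies from these 16 operations: each sub-policy applies a small number of operations in sequence (two in the original AutoAugment paper), each with a randomly generated probability of being applied and a magnitude; a policy consists of 5 sub-policies
For each image in a training batch, apply one of the 5 sub-policies chosen at random.
Use the model's generalization performance on the validation set as feedback; the search is optimized with reinforcement learning.
After about 80 to 100 epochs the network starts to learn effective sub-policies.
Finally, the learned sub-policies are concatenated and used for the final training run.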
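The way a policy is applied during training can be sketched as below. This only illustrates the sampling step, not the reinforcement-learning search: the three operations, the hand-written `POLICY`, and all probabilities and magnitudes are made-up assumptions, whereas a real AutoAugment policy is learned.

```python
import random

import numpy as np


def flip_lr(image, magnitude):
    return image[:, ::-1]  # magnitude is unused for this operation


def brightness(image, magnitude):
    out = image.astype(np.float32) * (1.0 + magnitude)
    return np.clip(out, 0, 255).astype(np.uint8)


def invert(image, magnitude):
    return 255 - image  # magnitude is unused for this operation


OPS = {"flip_lr": flip_lr, "brightness": brightness, "invert": invert}

# Each sub-policy is a sequence of (operation name, probability, magnitude).
POLICY = [
    [("flip_lr", 0.5, 0.0), ("brightness", 0.8, 0.3)],
    [("invert", 0.1, 0.0), ("brightness", 0.6, -0.2)],
    [("brightness", 0.9, 0.1), ("flip_lr", 0.3, 0.0)],
]


def apply_random_sub_policy(image, policy, rng):
    """Pick one sub-policy uniformly and apply each of its operations
    with the operation's own probability."""
    sub_policy = rng.choice(policy)
    for name, prob, magnitude in sub_policy:
        if rng.random() < prob:
            image = OPS[name](image, magnitude)
    return image


rng = random.Random(0)
image = np.arange(48, dtype=np.uint8).reshape(4, 4, 3)
augmented = apply_random_sub_policy(image, POLICY, rng)
```

Recent torchvision releases also ship a ready-made `transforms.AutoAugment` with published policies, which is usually preferable to a hand-rolled version.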

3. Generating new data from scratch

This is usually done with generative adversarial networks (GANs) and is not covered here.

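V. Data Augmentation in Practice with PyTorch (for image classification)

1. PyTorch data augmentation interfaces

The most common augmentation setup: in every training iteration, crop each image to the same size before feeding it to the network

(1) Data preprocessing first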

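from torchvision import transforms

norm_mean = [0.485, 0.456, 0.406]
norm_std = [0.229, 0.224, 0.225]
train_transform = transforms.Compose([
    # Resize(256) with a single int resizes the shorter side to 256 and scales the
    # longer side proportionally, so the longer side is not necessarily 256.
    # Resize((256, 256)) would instead force both sides to 256.
    transforms.Resize(256),
    # Then take the central 256x256 square.
    transforms.CenterCrop(256),
    # Randomly crop a 224x224 patch out of it.
    transforms.RandomCrop(224),
    transforms.RandomHorizontalFlip(p=0.5),
    transforms.ToTensor(),
    transforms.Normalize(norm_mean, norm_std),
])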

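(2) Data augmentation interfaces

See the PyTorch documentation (Chinese translation):

Official PyTorch site: https://pytorch.org/

Community Chinese translation: http://pytorch.apachecn.org/

GitHub repository: https://github.com/apachecn/pytorch-doc-zh

An example follows (data augmentation for semantic segmentation):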

import torchvision.transforms.functional as TF
import random

def my_seg_transforms(image, mask):
    # Apply the same random rotation to the image and its mask so they stay aligned.
    if random.random() > 0.5:
        angle = random.randint(-30, 30)
        image = TF.rotate(image, angle)
        mask = TF.rotate(mask, angle)
    return image, mask

2. PyTorch data augmentation in practice (object detection)

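# -*- coding=utf-8 -*-

# Includes:
#     1. Cropping (bboxes must be updated)
#     2. Translation (bboxes must be updated)
#     3. Brightness change
#     4. Adding noise
#     5. Rotation (bboxes must be updated)
#     6. Mirroring (bboxes must be updated)
#     7. Cutout
# Note:
#     random.seed(): the same seed produces the same sequence of random numbers!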
import sys

ros_path = '/opt/ros/kinetic/lib/python2.7/dist-packages'

if ros_path in sys.path:
    sys.path.remove(ros_path)

import cv2
import time
import random
import os
import math
import numpy as np
from skimage.util import random_noise
from skimage import exposure


# Display an image together with its labelled bounding boxes
def show_pic(img, bboxes=None, labels=None):
    '''
    Input:
        img: image array
        bboxes: list of all bounding boxes of the image, in the format [[x_min, y_min, x_max, y_max], ...]
        labels: the name corresponding to each box
    '''
    #     cv2.imwrite('./1.jpg', img)
    #     img = cv2.imread('./1.jpg')
    if bboxes is None:
        bboxes = []
    # Scale to [0, 1] floats for display; this also makes a copy, so the input is not drawn on.
    img = img / 255
    for i in range(len(bboxes)):
        bbox = bboxes[i]
        x_min = bbox[0]
        y_min = bbox[1]
        x_max = bbox[2]
        y_max = bbox[3]
        cv2.rectangle(img, (int(x_min), int(y_min)), (int(x_max), int(y_max)), (0, 255, 0), 3)
        if labels is not None:
            cv2.putText(img, labels[i], (int(x_min), int(y_min)), cv2.FONT_HERSHEY_SIMPLEX, 0.8, (0, 0, 255), 2)
    cv2.namedWindow('pic', 0)  # 0 gives a resizable window; 1 shows the image at its original size
    cv2.moveWindow('pic', 0, 0)
    cv2.resizeWindow('pic', 1200, 800)  # size of the display window
    cv2.imshow('pic', img)
    if cv2.waitKey(1) == ord('q'):
        cv2.destroyAllWindows()
        sys.exit()


#     cv2.destroyAllWindows()
#     os.remove('./1.jpg')

# All images are read with cv2
class DataAugmentForObjectDetection():
    def __init__(self, rotation_rate=0.5, max_rotation_angle=30,
                 crop_rate=0.5, shift_rate=0.5, change_light_rate=0.5,
                 add_noise_rate=0.5, flip_rate=0.5,
                 cutout_rate=0.5, cut_out_length=50, cut_out_holes=1, cut_out_threshold=0.5):
        self.rotation_rate = rotation_rate
        self.max_rotation_angle = max_rotation_angle
        self.crop_rate = crop_rate
        self.shift_rate = shift_rate
        self.change_light_rate = change_light_rate
        self.add_noise_rate = add_noise_rate
        self.flip_rate = flip_rate
        self.cutout_rate = cutout_rate

        self.cut_out_length = cut_out_length
        self.cut_out_holes = cut_out_holes
        self.cut_out_threshold = cut_out_threshold

    # Add noise
    def _addNoise(self, img):
        '''
        Input:
            img: image array
        Output:
            image array with noise added; random_noise returns pixels in [0, 1], so the result is multiplied by 255
        '''
        # random.seed(int(time.time()))
        # return random_noise(img, mode='gaussian', seed=int(time.time()), clip=True)*255
        return random_noise(img, mode='gaussian', clip=True) * 255

    # Adjust brightness
    def _changeLight(self, img):
        # random.seed(int(time.time()))
        flag = random.uniform(0.5, 1.5)  # gamma > 1 darkens the image, gamma < 1 brightens it
        return exposure.adjust_gamma(img, flag)

    # cutout
    def _cutout(self, img, bboxes, length=100, n_holes=1, threshold=0.5):
        '''
        Original version: https://github.com/uoguelph-mlrg/Cutout/blob/master/util/cutout.py
        Randomly mask out one or more patches from an image.
        Args:
            img : a 3D numpy array,(h,w,c)
            bboxes : coordinates of the boxes
            n_holes (int): Number of patches to cut out of each image.
            length (int): The length (in pixels) of each square patch.
        '''

        def cal_iou(boxA, boxB):
            '''
            boxA and boxB are two boxes; boxB is the bounding box.
            Returns the intersection area divided by the area of boxB,
            i.e. the covered fraction of the bounding box, not the standard IoU.
            '''

            # determine the (x, y)-coordinates of the intersection rectangle
            xA = max(boxA[0], boxB[0])
            yA = max(boxA[1], boxB[1])
            xB = min(boxA[2], boxB[2])
            yB = min(boxA[3], boxB[3])

            if xB <= xA or yB <= yA:
                return 0.0

            # compute the area of intersection rectangle
            interArea = (xB - xA + 1) * (yB - yA + 1)

            # compute the area of both the prediction and ground-truth
            # rectangles
            boxAArea = (boxA[2] - boxA[0] + 1) * (boxA[3] - boxA[1] + 1)
            boxBArea = (boxB[2] - boxB[0] + 1) * (boxB[3] - boxB[1] + 1)

            # compute the intersection over union by taking the intersection
            # area and dividing it by the sum of prediction + ground-truth
            # areas - the interesection area
            # iou = interArea / float(boxAArea + boxBArea - interArea)
            iou = interArea / float(boxBArea)

            # return the intersection over union value
            return iou

        # Get h and w
        if img.ndim == 3:
            h, w, c = img.shape
        else:
            _, h, w, c = img.shape

        mask = np.ones((h, w, c), np.float32)

        for n in range(n_holes):

            chongdie = True  # whether the cut-out region overlaps a box too much ("chongdie" means "overlap")

            while chongdie:
                y = np.random.randint(h)
                x = np.random.randint(w)

                # np.clip(a, a_min, a_max) limits values to [a_min, a_max]:
                # values above a_max become a_max, values below a_min become a_min
                y1 = np.clip(y - length // 2, 0, h)
                y2 = np.clip(y + length // 2, 0, h)
                x1 = np.clip(x - length // 2, 0, w)
                x2 = np.clip(x + length // 2, 0, w)

                chongdie = False
                for box in bboxes:
                    if cal_iou([x1, y1, x2, y2], box) > threshold:
                        chongdie = True
                        break

            mask[y1: y2, x1: x2, :] = 0.

        # mask = np.expand_dims(mask, axis=0)
        img = img * mask

        return img

    # Rotation
    def _rotate_img_bbox(self, img, bboxes, angle=5, scale=1.):
        '''
        Reference: https://blog.csdn.net/u014540717/article/details/53301195
        Input:
            img: image array, (h,w,c)
            bboxes: all bounding boxes of the image, a list whose elements are [x_min, y_min, x_max, y_max]; make sure the values are numeric
            angle: rotation angle in degrees
            scale: defaults to 1
        Output:
            rot_img: rotated image array
            rot_bboxes: list of rotated bounding box coordinates
        '''
        # ---------------------- rotate the image ----------------------
        w = img.shape[1]
        h = img.shape[0]
        # Convert degrees to radians
        rangle = np.deg2rad(angle)  # angle in radians
        # now calculate new image width and height
        nw = (abs(np.sin(rangle) * h) + abs(np.cos(rangle) * w)) * scale
        nh = (abs(np.cos(rangle) * h) + abs(np.sin(rangle) * w)) * scale
        # ask OpenCV for the rotation matrix
        rot_mat = cv2.getRotationMatrix2D((nw * 0.5, nh * 0.5), angle, scale)
        # calculate the move from the old center to the new center combined
        # with the rotation
        rot_move = np.dot(rot_mat, np.array([(nw - w) * 0.5, (nh - h) * 0.5, 0]))
        # the move only affects the translation, so update the translation
        # part of the transform
        rot_mat[0, 2] += rot_move[0]
        rot_mat[1, 2] += rot_move[1]
        # Affine transform
        rot_img = cv2.warpAffine(img, rot_mat, (int(math.ceil(nw)), int(math.ceil(nh))), flags=cv2.INTER_LANCZOS4)

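        # ---------------------- correct the bbox coordinates ----------------------
        # rot_mat is the final rotation matrix
        # Take the midpoints of the four edges of each original bbox and map them into the rotated coordinate system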
        rot_bboxes = list()
        for bbox in bboxes:
            xmin = bbox[0]
            ymin = bbox[1]
            xmax = bbox[2]
            ymax = bbox[3]
            point1 = np.dot(rot_mat, np.array([(xmin + xmax) / 2, ymin, 1]))
            point2 = np.dot(rot_mat, np.array([xmax, (ymin + ymax) / 2, 1]))
            point3 = np.dot(rot_mat, np.array([(xmin + xmax) / 2, ymax, 1]))
            point4 = np.dot(rot_mat, np.array([xmin, (ymin + ymax) / 2, 1]))
            # Stack the points into one array
            concat = np.vstack((point1, point2, point3, point4))
            # Convert the array to integers
            concat = concat.astype(np.int32)
            # Bounding rectangle of the rotated points
            rx, ry, rw, rh = cv2.boundingRect(concat)
            rx_min = rx
            ry_min = ry
            rx_max = rx + rw
            ry_max = ry + rh
            # Append to the list
            rot_bboxes.append([rx_min, ry_min, rx_max, ry_max])

        return rot_img, rot_bboxes

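    # Cropping
    def _crop_img_bboxes(self, img, bboxes):
        '''
        The cropped image must contain all boxes
        Input:
            img: image array
            bboxes: all bounding boxes of the image, a list whose elements are [x_min, y_min, x_max, y_max]; make sure the values are numeric
        Output:
            crop_img: cropped image array
            crop_bboxes: list of bounding box coordinates after cropping
        '''
        # ---------------------- crop the image ----------------------
        w = img.shape[1]
        h = img.shape[0]
        x_min = w  # smallest box that contains all target boxes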
        x_max = 0
        y_min = h
        y_max = 0
        for bbox in bboxes:
            x_min = min(x_min, bbox[0])
            y_min = min(y_min, bbox[1])
            x_max = max(x_max, bbox[2])
            y_max = max(y_max, bbox[3])

        d_to_left = x_min  # distance from the enclosing box to the left edge
        d_to_right = w - x_max  # distance from the enclosing box to the right edge
        d_to_top = y_min  # distance from the enclosing box to the top edge
        d_to_bottom = h - y_max  # distance from the enclosing box to the bottom edge

        # Randomly expand the enclosing box
        crop_x_min = int(x_min - random.uniform(0, d_to_left))
        crop_y_min = int(y_min - random.uniform(0, d_to_top))
        crop_x_max = int(x_max + random.uniform(0, d_to_right))
        crop_y_max = int(y_max + random.uniform(0, d_to_bottom))

        # Alternative: expand by at least half of each margin so the crop does not get too small
        # crop_x_min = int(x_min - random.uniform(d_to_left//2, d_to_left))
        # crop_y_min = int(y_min - random.uniform(d_to_top//2, d_to_top))
        # crop_x_max = int(x_max + random.uniform(d_to_right//2, d_to_right))
        # crop_y_max = int(y_max + random.uniform(d_to_bottom//2, d_to_bottom))

        # Make sure the crop stays within the image
        crop_x_min = max(0, crop_x_min)
        crop_y_min = max(0, crop_y_min)
        crop_x_max = min(w, crop_x_max)
        crop_y_max = min(h, crop_y_max)

        crop_img = img[crop_y_min:crop_y_max, crop_x_min:crop_x_max]

        # ---------------------- crop the bounding boxes ----------------------
        # Compute the bounding box coordinates after cropping
        crop_bboxes = list()
        for bbox in bboxes:
            crop_bboxes.append([bbox[0] - crop_x_min, bbox[1] - crop_y_min, bbox[2] - crop_x_min, bbox[3] - crop_y_min])

        return crop_img, crop_bboxes

    # Translation
    def _shift_pic_bboxes(self, img, bboxes):
        '''
        Reference: https://blog.csdn.net/sty945/article/details/79387054
        The shifted image must still contain all boxes
        Input:
            img: image array
            bboxes: all bounding boxes of the image, a list whose elements are [x_min, y_min, x_max, y_max]; make sure the values are numeric
        Output:
            shift_img: shifted image array
            shift_bboxes: list of bounding box coordinates after shifting
        '''
        # ---------------------- shift the image ----------------------
        w = img.shape[1]
        h = img.shape[0]
        x_min = w  # smallest box that contains all target boxes
        x_max = 0
        y_min = h
        y_max = 0
        for bbox in bboxes:
            x_min = min(x_min, bbox[0])
            y_min = min(y_min, bbox[1])
            x_max = max(x_max, bbox[2])
            y_max = max(y_max, bbox[3])

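        d_to_left = x_min  # maximum leftward shift that keeps all target boxes inside
        d_to_right = w - x_max  # maximum rightward shift that keeps all target boxes inside
        d_to_top = y_min  # maximum upward shift that keeps all target boxes inside
        d_to_bottom = h - y_max  # maximum downward shift that keeps all target boxes inside

        x = random.uniform(-(d_to_left - 1) / 3, (d_to_right - 1) / 3)
        y = random.uniform(-(d_to_top - 1) / 3, (d_to_bottom - 1) / 3)

        M = np.float32([[1, 0, x], [0, 1, y]])
        # x is the horizontal shift in pixels (positive moves right, negative moves left);
        # y is the vertical shift in pixels (positive moves down, negative moves up).
        # Errors are not swallowed here: a failed warp would otherwise leave shift_img undefined.
        shift_img = cv2.warpAffine(img, M, (img.shape[1], img.shape[0]))

        # ---------------------- shift the bounding boxes ----------------------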
        shift_bboxes = list()
        for bbox in bboxes:
            shift_bboxes.append([bbox[0] + x, bbox[1] + y, bbox[2] + x, bbox[3] + y])

        return shift_img, shift_bboxes

    # Mirroring
    def _filp_pic_bboxes(self, img, bboxes):
        '''
            Reference: https://blog.csdn.net/jningwei/article/details/78753607
            The flipped image still contains all boxes
            Input:
                img: image array
                bboxes: all bounding boxes of the image, a list whose elements are [x_min, y_min, x_max, y_max]; make sure the values are numeric
            Output:
                flip_img: flipped image array
                flip_bboxes: list of bounding box coordinates after flipping
        '''
        # ---------------------- flip the image ----------------------
        import copy
        flip_img = copy.deepcopy(img)
        #         if random.random() < 0.5:    # flip horizontally with probability 0.5, otherwise vertically
        horizon = True
        #         else:
        #             horizon = False
        h, w, _ = img.shape
        if horizon:  # horizontal flip
            flip_img = cv2.flip(flip_img, 1)  # 1 flips horizontally, 0 vertically, -1 both
        else:
            flip_img = cv2.flip(flip_img, 0)

        # ---------------------- adjust the bounding boxes ----------------------
        flip_bboxes = list()
        for box in bboxes:
            x_min = box[0]
            y_min = box[1]
            x_max = box[2]
            y_max = box[3]
            if horizon:
                flip_bboxes.append([w - x_max, y_min, w - x_min, y_max])
            else:
                flip_bboxes.append([x_min, h - y_max, x_max, h - y_min])

        return flip_img, flip_bboxes

    def dataAugment(self, img, bboxes):
        '''
        Image augmentation
        Input:
            img: image array
            bboxes: all box coordinates of the image
        Output:
            img: augmented image
            bboxes: boxes corresponding to the augmented image
        '''
        change_num = 0  # number of augmentations applied
        print('------')
        while change_num < 1:  # repeat until at least one augmentation has been applied
            # Each *_rate is the probability of applying that step.
            if random.random() < self.crop_rate:  # crop
                print('crop')
                change_num += 1
                img, bboxes = self._crop_img_bboxes(img, bboxes)

            if random.random() < self.rotation_rate:  # rotate
                print('rotate')
                change_num += 1
                angle = random.uniform(-self.max_rotation_angle, self.max_rotation_angle)
                #                 angle = random.sample([90, 180, 270],1)[0]
                scale = random.uniform(0.7, 0.8)
                img, bboxes = self._rotate_img_bbox(img, bboxes, angle, scale)

            if random.random() < self.shift_rate:  # shift
                print('shift')
                change_num += 1
                img, bboxes = self._shift_pic_bboxes(img, bboxes)

            if random.random() < self.change_light_rate:  # change brightness
                print('brightness')
                change_num += 1
                img = self._changeLight(img)

            if random.random() < self.add_noise_rate:  # add noise
                print('noise')
                change_num += 1
                img = self._addNoise(img)

            #             if random.random() < self.cutout_rate:  #cutout
            #                 print('cutout')
            #                 change_num += 1
            #                 img = self._cutout(img, bboxes, length=self.cut_out_length, n_holes=self.cut_out_holes, threshold=self.cut_out_threshold)

            #             if random.random() < self.flip_rate:    # flip
            #                 print('flip')
            #                 change_num += 1
            #                 img, bboxes = self._filp_pic_bboxes(img, bboxes)
            print('\n')
        # print('------')
        return img, bboxes



# -*- coding=utf-8 -*-
import xml.etree.ElementTree as ET
import xml.dom.minidom as DOC

# Extract bounding box information from an xml file, in the format [[x_min, y_min, x_max, y_max, name]]
def parse_xml(xml_path):
    '''
    Input:
        xml_path: path of the xml file
    Output:
        bounding box information extracted from the xml file, in the format [[x_min, y_min, x_max, y_max, name]]
    '''
    tree = ET.parse(xml_path)
    root = tree.getroot()
    objs = root.findall('object')
    coords = list()
    for ix, obj in enumerate(objs):
        name = obj.find('name').text
        box = obj.find('bndbox')
        x_min = int(float(box[0].text))
        y_min = int(float(box[1].text))
        x_max = int(float(box[2].text))
        y_max = int(float(box[3].text))
        coords.append([x_min, y_min, x_max, y_max, name])
    return coords


import os
from lxml.etree import Element, SubElement, tostring
from xml.dom.minidom import parseString
from PIL import Image


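# Core routine for saving an xml file. Inputs: image_name (image file name), category (list of class names, aligned with bbox), bbox (list of boxes, aligned with category), save_dir (output directory), channel (number of image channels)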
def save_xml(image_name, category, bbox, file_dir='/home/xbw/wurenting/dataset_3/',
             save_dir='/home/xxx/voc_dataset/Annotations/', channel=3):
    file_path = file_dir
    img = Image.open(file_path + image_name)
    width = img.size[0]
    height = img.size[1]

    node_root = Element('annotation')

    node_folder = SubElement(node_root, 'folder')
    node_folder.text = 'VOC2007'

    node_filename = SubElement(node_root, 'filename')
    node_filename.text = image_name

    node_size = SubElement(node_root, 'size')
    node_width = SubElement(node_size, 'width')
    node_width.text = '%s' % width

    node_height = SubElement(node_size, 'height')
    node_height.text = '%s' % height

    node_depth = SubElement(node_size, 'depth')
    node_depth.text = '%s' % channel

    for i in range(len(bbox)):
        left, top, right, bottom = bbox[i][0], bbox[i][1], bbox[i][2], bbox[i][3]
        node_object = SubElement(node_root, 'object')
        node_name = SubElement(node_object, 'name')
        node_name.text = category[i]
        node_difficult = SubElement(node_object, 'difficult')
        node_difficult.text = '0'
        node_bndbox = SubElement(node_object, 'bndbox')
        node_xmin = SubElement(node_bndbox, 'xmin')
        node_xmin.text = '%s' % left
        node_ymin = SubElement(node_bndbox, 'ymin')
        node_ymin.text = '%s' % top
        node_xmax = SubElement(node_bndbox, 'xmax')
        node_xmax.text = '%s' % right
        node_ymax = SubElement(node_bndbox, 'ymax')
        node_ymax.text = '%s' % bottom

    xml = tostring(node_root, pretty_print=True)
    dom = parseString(xml)

    save_xml = os.path.join(save_dir, image_name.replace('jpg', 'xml'))
    with open(save_xml, 'wb') as f:
        f.write(xml)

    return


import shutil

need_aug_num = 1

dataAug = DataAugmentForObjectDetection()

source_pic_root_path = '/home/wl/import/last_data/VOCdevkit/VOC2007/JPEGImages/'
source_xml_root_path = '/home/wl/import/last_data/VOCdevkit/VOC2007/Annotations/'
img_save_path = '/home/wl/import/last_data/VOCdevkit/VOC2007/aug_img/'
save_dir = '/home/wl/import/last_data/VOCdevkit/VOC2007/aug_label/'

for parent, _, files in os.walk(source_pic_root_path):
    for file in files:
        cnt = 0
        while cnt < need_aug_num:
            pic_path = os.path.join(parent, file)
            xml_path = os.path.join(source_xml_root_path, file[:-4]+'.xml')
            coords = parse_xml(xml_path)        # parse the box info, format [[x_min,y_min,x_max,y_max,name]]
            coordss = [coord[:4] for coord in coords]
            labels = [coord[4] for coord in coords]
            img = cv2.imread(pic_path)
            show_pic(img, coordss,labels)    # original image

            auged_img, auged_bboxes = dataAug.dataAugment(img, coordss)
            cnt += 1
            # Number the outputs by cnt so that several augmentations of one image do not overwrite each other
            aug_name = file[:-4] + '_%d.jpg' % cnt
            cv2.imwrite(img_save_path + aug_name, auged_img)
            save_xml(aug_name, labels, auged_bboxes, file_dir=img_save_path, save_dir=save_dir)
            show_pic(auged_img, auged_bboxes,labels)  # augmented image
cv2.destroyAllWindows()






# Check whether the labels are correct
import shutil

# need_aug_num = 1
#
# dataAug = DataAugmentForObjectDetection()
#
# source_pic_root_path = '/home/xbw/darknet_boat/darknet/scripts/VOCdevkit/VOC2007/add_990/990_add/'
# source_xml_root_path = '/home/xbw/darknet_boat/darknet/scripts/VOCdevkit/VOC2007/add_990/990_xml/'
#
# for parent, _, files in os.walk(source_pic_root_path):
#     for file in files:
#         cnt = 0
#         while cnt < need_aug_num:
#             pic_path = os.path.join(parent, file)
#             xml_path = os.path.join(source_xml_root_path, file[:-4]+'.xml')
#             coords = parse_xml(xml_path)        # parse the box info, format [[x_min,y_min,x_max,y_max,name]]
#             coordss = [coord[:4] for coord in coords]
#             labels = [coord[4] for coord in coords]
#             img = cv2.imread(pic_path)
#             show_pic(img, coordss,labels)    # original image
#             cnt += 1
# cv2.destroyAllWindows()
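After rotation or translation, box coordinates can become fractional or fall partly outside the image. A small post-processing step can clip them and drop boxes that have become degenerate. The sketch below is an assumption about how this might be done and is not part of the original script; `clip_bboxes` and `min_size` are hypothetical names. It returns the indices of the surviving boxes so the label list can be filtered in the same way.

```python
def clip_bboxes(bboxes, width, height, min_size=1):
    """Clip [x_min, y_min, x_max, y_max] boxes to the image and drop tiny ones.

    Returns (kept_boxes, kept_indices); kept_indices refers to positions in
    the input list, so labels can be filtered consistently.
    """
    kept_boxes = []
    kept_indices = []
    for index, (x_min, y_min, x_max, y_max) in enumerate(bboxes):
        # Clamp every coordinate into the image rectangle.
        x_min = min(max(x_min, 0), width)
        y_min = min(max(y_min, 0), height)
        x_max = min(max(x_max, 0), width)
        y_max = min(max(y_max, 0), height)
        # Keep the box only if it still has a usable extent.
        if x_max - x_min >= min_size and y_max - y_min >= min_size:
            kept_boxes.append([int(round(x_min)), int(round(y_min)),
                               int(round(x_max)), int(round(y_max))])
            kept_indices.append(index)
    return kept_boxes, kept_indices


boxes = [[-10, 5, 50, 60], [90.4, 90, 150, 150], [200, 200, 220, 220]]
labels = ["boat", "buoy", "bird"]
kept_boxes, kept_indices = clip_bboxes(boxes, width=100, height=100, min_size=2)
kept_labels = [labels[i] for i in kept_indices]
print(kept_boxes, kept_labels)
```

Rounding to integers also keeps the coordinates written to the VOC xml files in the usual integer form.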

3. Introduction to the open-source augmentation library imgaug

Installation

pip install imgaug

Project page: https://github.com/aleju/imgaug

It provides a wide range of augmentation operations

Supported augmentation operations
- affine transformations, perspective transformations, contrast changes, Gaussian noise, dropout of regions, hue/saturation changes, cropping/padding, blurring, …
Supported data types for vision tasks
- Images (uint8), Heatmaps (float32), Segmentation Maps (int), Masks (bool), Keypoints/Landmarks (int/float coordinates), Bounding Boxes (int/float coordinates), Polygons (int/float coordinates), Line Strings (int/float coordinates)

How to use it

1. Compose a sequence of augmenters (a standard usage example follows)

augmenters.Sequential

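import imgaug.augmenters as iaa
aug_seq = iaa.Sequential([
    # Affine transform: shift 40 px to the left
    iaa.Affine(translate_px={"x": -40}),
    # Additive Gaussian noise
    iaa.AdditiveGaussianNoise(scale=0.1*255),
], random_order=True)  # apply the augmenters in random order
# Standard usage; load_batch and train_on_images are placeholders to be supplied by the user
for batch_idx in range(100):
    images = load_batch(batch_idx)
    # imgaug expects a uint8 array of shape (N, H, W, C) or a list of (H, W, C) arrays
    images_aug = aug_seq(images=images)
    train_on_images(images_aug)


# Some further examples
# JPEG compression
aug = iaa.JpegCompression(compression=(70,99))
# Horizontal flip with probability 0.5
aug = iaa.Fliplr(0.5)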

4. Concrete imgaug examples:

(1) A simple augmentation example

#coding:utf8
import numpy as np
import imgaug as ia
import imgaug.augmenters as iaa

ia.seed(1)

## Create an array of shape (16, 64, 64, 3).
images = np.array(
    [ia.quokka(size=(64, 64)) for _ in range(16)],
    dtype=np.uint8
)

seq = iaa.Sequential([
    iaa.Fliplr(0.5), ## horizontal flip with probability 0.5
    iaa.Crop(percent=(0, 0.1)), ## random crop of 0 to 10% from each side
    ## Apply Gaussian blur to 50% of the images, with sigma sampled from 0 to 0.5.
    iaa.Sometimes(
        0.5,
        iaa.GaussianBlur(sigma=(0, 0.5))
    ),
    ## Add Gaussian noise; for 50% of the images the noise is sampled per channel
    iaa.AdditiveGaussianNoise(loc=0, scale=(0.0, 0.05*255), per_channel=0.5),
], random_order=True) ## apply all of the above in random order

images_aug = seq(images=images) ## apply the augmentation
grid_image = ia.draw_grid(images_aug,4)

import imageio
imageio.imwrite("example.jpg", grid_image)

(2) Keypoint augmentation

#coding:utf8
import imgaug as ia 
import imgaug.augmenters as iaa 
from imgaug.augmentables import Keypoint, KeypointsOnImage
ia.seed(1)

## Create the image and the keypoints
image = ia.quokka(size=(256, 256))
kps = KeypointsOnImage([
    Keypoint(x=65, y=100),
    Keypoint(x=75, y=200),
    Keypoint(x=100, y=100),
    Keypoint(x=200, y=80)
], shape=image.shape)

seq = iaa.Sequential([
    iaa.Multiply((1.2, 1.5)), ## change brightness
    iaa.Affine(
        rotate=10,
        scale=(0.5, 0.7)
    )
])

## Augment the image and the keypoints together
image_aug, kps_aug = seq(image=image, keypoints=kps)

for i in range(len(kps.keypoints)):
    before = kps.keypoints[i]
    after = kps_aug.keypoints[i]
    print("Keypoint %d: (%.8f, %.8f) -> (%.8f, %.8f)" % (
        i, before.x, before.y, after.x, after.y)
)

image_before = kps.draw_on_image(image, size=7)
image_after = kps_aug.draw_on_image(image_aug, size=7)

import imageio
imageio.imwrite("before_keypoint.jpg", image_before)
imageio.imwrite("after_keypoint.jpg", image_after)

(3) Object detection

#coding:utf8
import imgaug as ia
import imgaug.augmenters as iaa
from imgaug.augmentables.bbs import BoundingBox, BoundingBoxesOnImage

ia.seed(1)

image = ia.quokka(size=(256, 256))

bbs = BoundingBoxesOnImage([
    BoundingBox(x1=65, y1=100, x2=200, y2=150),
    BoundingBox(x1=150, y1=80, x2=200, y2=130)
], shape=image.shape)

seq = iaa.Sequential([
    iaa.Multiply((1.2, 1.5)),  ## change brightness
    iaa.Affine(
        translate_px={"x": 40, "y": 60},
        scale=(0.5, 0.7)
    )  ## translate by 40 px in x and 60 px in y; scale to 50-70% of the original size
])

# Augment the image and the bounding boxes together
image_aug, bbs_aug = seq(image=image, bounding_boxes=bbs)

for i in range(len(bbs.bounding_boxes)):
    before = bbs.bounding_boxes[i]
    after = bbs_aug.bounding_boxes[i]
    print("BB %d: (%.4f, %.4f, %.4f, %.4f) -> (%.4f, %.4f, %.4f, %.4f)" % (
        i,
        before.x1, before.y1, before.x2, before.y2,
        after.x1, after.y1, after.x2, after.y2)
    )

# Draw the boxes before and after augmentation
image_before = bbs.draw_on_image(image, size=2)
image_after = bbs_aug.draw_on_image(image_aug, size=2, color=[0, 0, 255])

import imageio
imageio.imwrite("before_boundingbox00.jpg", image_before)
imageio.imwrite("after_boundingbox00.jpg", image_after)
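The box coordinates printed above follow from applying the same scale and translation to the box corners. Here is a rough sketch of that arithmetic under stated assumptions: `transform_box` is a hypothetical helper, scaling is assumed to happen about the image center (width/2, height/2) and before the translation, and the scale is fixed at 0.5 instead of being sampled from (0.5, 0.7).

```python
def transform_box(box, width, height, scale=1.0, tx=0.0, ty=0.0):
    """Scale a box (x1, y1, x2, y2) about the image center, then translate it."""
    cx, cy = width / 2.0, height / 2.0
    x1, y1, x2, y2 = box
    xs = [cx + (x - cx) * scale + tx for x in (x1, x2)]
    ys = [cy + (y - cy) * scale + ty for y in (y1, y2)]
    # Re-derive the corners with min/max so the result is always a valid box
    return (min(xs), min(ys), max(xs), max(ys))


boxes = [(65, 100, 200, 150), (150, 80, 200, 130)]
for box in boxes:
    print(box, "->", transform_box(box, 256, 256, scale=0.5, tx=40, ty=60))
```

With a shrinking scale and a shift toward the bottom right, both boxes get smaller and move down and to the right, which matches what the saved after-image should show qualitatively.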

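Note also that some boxes can end up partly or fully outside the image after augmentation, so we need to keep only the valid ones.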

#coding:utf8
import imgaug as ia
import imgaug.augmenters as iaa
from imgaug.augmentables.bbs import BoundingBox, BoundingBoxesOnImage

ia.seed(1)

image = ia.quokka(size=(256, 256))
bbs = BoundingBoxesOnImage([
    BoundingBox(x1=25, x2=75, y1=25, y2=75),
    BoundingBox(x1=100, x2=150, y1=25, y2=75),
    BoundingBox(x1=175, x2=225, y1=25, y2=75)
], shape=image.shape)

seq = iaa.Affine(translate_px={"x": 120})
image_aug, bbs_aug = seq(image=image, bounding_boxes=bbs)

## Pad the border: 1 white pixel, then (by-1) black pixels
def pad(image, by):
    image_border1 = ia.pad(image, top=1, right=1, bottom=1, left=1,
                           mode="constant", cval=255)
    image_border2 = ia.pad(image_border1, top=by-1, right=by-1,
                           bottom=by-1, left=by-1,
                           mode="constant", cval=0)
    return image_border2

## Helper that draws the boxes: green = fully inside, orange = partly inside, red = fully outside
GREEN = [0, 255, 0]
ORANGE = [255, 140, 0]
RED = [255, 0, 0]
def draw_bbs(image, bbs, border):
    image_border = pad(image, border)
    for bb in bbs.bounding_boxes:
        if bb.is_fully_within_image(image.shape):
            color = GREEN
        elif bb.is_partly_within_image(image.shape):
            color = ORANGE
        else:
            color = RED
        image_border = bb.shift(left=border, top=border)\
                         .draw_on_image(image_border, size=2, color=color)

    return image_border

image_before = draw_bbs(image, bbs, 100)
image_after1 = draw_bbs(image_aug, bbs_aug, 100)
image_after2 = draw_bbs(image_aug, bbs_aug.remove_out_of_image(), 100)
image_after3 = draw_bbs(image_aug, bbs_aug.remove_out_of_image().clip_out_of_image(), 100)

import imageio
imageio.imwrite("normal_boundingbox.jpg", image_before)
imageio.imwrite("after1_boundingbox.jpg", image_after1)
imageio.imwrite("after2_boundingbox.jpg", image_after2)
imageio.imwrite("after3_boundingbox.jpg", image_after3)
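The two steps used above, dropping boxes that lie fully outside the image and clipping the ones that stick out, can be written out by hand. The following is an illustrative sketch only: `box_status`, `clip_box` and `keep_valid` are hypothetical helpers, and the exact boundary convention (whether a box touching the edge counts as inside) is an assumption that may differ from imgaug's.

```python
def box_status(box, width, height):
    """Classify a box (x1, y1, x2, y2) as 'inside', 'partial' or 'outside'."""
    x1, y1, x2, y2 = box
    if x1 >= 0 and y1 >= 0 and x2 <= width and y2 <= height:
        return "inside"
    # The box overlaps the image rectangle if it intersects it on both axes
    if x1 < width and x2 > 0 and y1 < height and y2 > 0:
        return "partial"
    return "outside"


def clip_box(box, width, height):
    """Cut a box down to the part that lies within the image."""
    x1, y1, x2, y2 = box
    return (max(0, x1), max(0, y1), min(width, x2), min(height, y2))


def keep_valid(boxes, width, height):
    """Drop boxes that are fully outside, then clip the remaining ones."""
    return [clip_box(b, width, height)
            for b in boxes if box_status(b, width, height) != "outside"]


# The three example boxes after shifting them 120 px to the right
shifted = [(145, 25, 195, 75), (220, 25, 270, 75), (295, 25, 345, 75)]
for b in shifted:
    print(b, box_status(b, 256, 256))
print(keep_valid(shifted, 256, 256))
```

In a training pipeline it is often worth adding a minimum-area check after clipping as well, since a box reduced to a thin sliver is rarely a useful label.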

(4) Segmentation map augmentation

#coding:utf8
import imageio
import numpy as np
import imgaug as ia
import imgaug.augmenters as iaa
from imgaug.augmentables.segmaps import SegmentationMapsOnImage

ia.seed(1)

image = ia.quokka(size=(128, 128), extract="square")

segmap = np.zeros((128, 128, 1), dtype=np.int32)
segmap[28:71, 35:85, 0] = 1
segmap[10:25, 30:45, 0] = 2
segmap[10:25, 70:85, 0] = 3
segmap[10:110, 5:10, 0] = 4
segmap[118:123, 10:110, 0] = 5
segmap = SegmentationMapsOnImage(segmap, shape=image.shape)

# Augmentation pipeline
seq = iaa.Sequential([
    iaa.Dropout([0.05, 0.2]),      # drop either 5% or 20% of all pixels
    iaa.Sharpen((0.0, 1.0)),       # sharpen the image
    iaa.Affine(rotate=(-45, 45)),  # rotate by -45 to 45 degrees
    iaa.ElasticTransformation(alpha=50, sigma=5)  # apply a local elastic distortion
], random_order=True)

# Augment the image and the segmentation map together
images_aug = []
segmaps_aug = []
for _ in range(5):
    images_aug_i, segmaps_aug_i = seq(image=image, segmentation_maps=segmap)
    images_aug.append(images_aug_i)
    segmaps_aug.append(segmaps_aug_i)

cells = []
for image_aug, segmap_aug in zip(images_aug, segmaps_aug):
    cells.append(image)                                         # column 1
    cells.append(segmap.draw_on_image(image)[0])                # column 2
    cells.append(image_aug)                                     # column 3
    cells.append(segmap_aug.draw_on_image(image_aug)[0])        # column 4
    cells.append(segmap_aug.draw(size=image_aug.shape[:2])[0])  # column 5

grid_image = ia.draw_grid(cells, cols=5)
imageio.imwrite("example_segmaps.jpg", grid_image)
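In the pipeline above, only the geometric augmenters (Affine, ElasticTransformation) change the segmentation map; pixel-value augmenters such as Dropout and Sharpen alter the image alone, because class labels must not be blurred or zeroed. The sketch below illustrates that split in pure Python; `hflip`, `dropout` and `augment_pair` are hypothetical helpers, and a horizontal flip stands in for the geometric step.

```python
import random


def hflip(grid):
    """Mirror a 2-D grid (list of rows) left to right."""
    return [list(reversed(row)) for row in grid]


def dropout(image, p, rng):
    """Set each pixel to 0 with probability p (a pixel-value change only)."""
    return [[0 if rng.random() < p else px for px in row] for row in image]


def augment_pair(image, mask, rng, p_drop=0.2):
    """Augment an image and its label mask consistently."""
    # Geometric step: applied identically to the image and the mask
    image, mask = hflip(image), hflip(mask)
    # Pixel-value step: applied to the image only, labels stay intact
    image = dropout(image, p_drop, rng)
    return image, mask


image = [[10, 20, 30], [40, 50, 60]]
mask = [[0, 1, 1], [0, 0, 2]]
image_aug, mask_aug = augment_pair(image, mask, random.Random(1))
print(image_aug)
print(mask_aug)
```

If the mask were passed through the pixel-value step too, class ids would be overwritten with zeros and the labels would silently drift away from the image content.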

(5) Computing the IoU of two bounding boxes

import numpy as np
import imgaug as ia
from imgaug.augmentables.bbs import BoundingBox


ia.seed(1)

# Define image with two bounding boxes.
image = ia.quokka(size=(256, 256))
bb1 = BoundingBox(x1=50, x2=100, y1=25, y2=75)
bb2 = BoundingBox(x1=75, x2=125, y1=50, y2=100)

# Compute intersection, union and IoU value
# Intersection and union are both bounding boxes. They are here
# decreased/increased in size purely for better visualization.
bb_inters = bb1.intersection(bb2).extend(all_sides=-1)
bb_union = bb1.union(bb2).extend(all_sides=2)
iou = bb1.iou(bb2)

# Draw bounding boxes, intersection, union and IoU value on image.
image_bbs = np.copy(image)
image_bbs = bb1.draw_on_image(image_bbs, size=2, color=[0, 255, 0])
image_bbs = bb2.draw_on_image(image_bbs, size=2, color=[0, 255, 0])
image_bbs = bb_inters.draw_on_image(image_bbs, size=2, color=[255, 0, 0])
image_bbs = bb_union.draw_on_image(image_bbs, size=2, color=[0, 0, 255])
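image_bbs = ia.draw_text(
    image_bbs, text="IoU=%.2f" % (iou,),
    x=bb_union.x2+10, y=bb_union.y1+bb_union.height//2,
    color=[255, 255, 255], size=13
)

# Save the visualization (the file name is arbitrary)
import imageio
imageio.imwrite("iou_boundingbox.jpg", image_bbs)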
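For reference, IoU is the area of the intersection divided by the area of the union. The short sketch below computes it by hand for the same two boxes; it is a plain re-derivation of the formula rather than imgaug's code, and `box_area` and `iou` are hypothetical helper names operating on (x1, y1, x2, y2) tuples.

```python
def box_area(box):
    """Area of a box (x1, y1, x2, y2); empty or inverted boxes have area 0."""
    x1, y1, x2, y2 = box
    return max(0, x2 - x1) * max(0, y2 - y1)


def iou(a, b):
    """Intersection over union of two axis-aligned boxes."""
    inter_box = (max(a[0], b[0]), max(a[1], b[1]),
                 min(a[2], b[2]), min(a[3], b[3]))
    inter = box_area(inter_box)
    union = box_area(a) + box_area(b) - inter
    return inter / union if union > 0 else 0.0


bb1 = (50, 25, 100, 75)
bb2 = (75, 50, 125, 100)
print("IoU=%.2f" % iou(bb1, bb2))  # prints IoU=0.14
```

Here the intersection is a 25 x 25 square (area 625) and the union is 2500 + 2500 - 625 = 4375, giving 625 / 4375 = 1/7.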
