pan_jinquan

Tips for Using Common Python Modules

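1. Python configuration notes

(1) Python file header comments

(2) Function docstrings

(3) Converting .ipynb files to .py files

(4) Measuring run time in Python

(5) Speeding up downloads with mirrors

(6) Pylint code analysis tool: installation and PyCharm configuration

(7) Adding environment paths and module search paths in Python

(8) Common conda commands

2. Commonly used modules

2.1 The numpy module:

(1) Concatenating and splitting matrices, splitting data into odd and even items

(2) Sorting by column

(3) Extracting rows and columns that meet a condition

(4) Finding vectors that meet a condition

(5) Shuffling

2.2 The pickle module

2.3 random.shuffle with a fixed seed

2.4 The zip() and zip(*) functions:

2.5 Fast iteration with map and for:

2.6 The glob module

2.7 The os module

2.8 Checking for empty, missing, or too-small image files

2.9 Saving multi-dimensional arrays

2.10 Reading txt data

2.11 The pandas module

(1) Concatenating data from files

(2) DataFrame

Adding, deleting, modifying, and querying Pandas DataFrame data

2.12 The csv module

2.13 The logging module

3. Data preprocessing

3.1 Processing data (images) in blocks

3.2 Reading and displaying images

(1) Image reading modules: matplotlib.image, PIL.Image, cv2

(2) Converting a numpy array to a PIL image:

(3) Converting between PIL.Image and OpenCV image formats in Python

(4) The matplotlib display blocking problem

(5) Drawing rectangles with matplotlib

3.3 One-hot encoding

3.4 Generating batch data in a loop:

3.5 Counting elements and their distinct values

3.6 Sorting a Python dict by key and by value

3.7 Custom sorting with sorted

3.8 Loading a yml configuration file

3.9 Moving, copying, and renaming files

3.10 Generating batch_size data

4. Common image preprocessing and file handling packages

4.1 image_processing.py

4.2 file_processing.py

4.3 Debug files

4.5 NMS on GPU and CPU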

1. Python configuration notes

(1) Python file header comments

In PyCharm, open File -> Settings -> Editor -> File and Code Templates -> Python Script and enter the following template:

# -*-coding: utf-8 -*-
"""
    @Project: ${PROJECT_NAME}
    @File   : ${NAME}.py
    @Author : panjq
    @E-mail : [email protected]
    @Date   : ${YEAR}-${MONTH}-${DAY} ${HOUR}:${MINUTE}:${SECOND}
"""

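(2) Function docstrings

def my_fun(para1, para2):
    '''
    Brief description of what the function does
    :param para1: description of the input parameter, and its type
    :param para2: description of the input parameter, and its type
    :return: description of the return value, and its type
    '''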

(3) Converting .ipynb files to .py files

 jupyter nbconvert --to script demo.ipynb

(4) Measuring run time in Python

import datetime

def RUN_TIME(delta_time):
    '''
    Convert a timedelta to milliseconds.
    total_seconds() covers the days, seconds and microseconds fields,
    so intervals longer than one day are also handled correctly.
    :param delta_time: datetime.timedelta, the difference between two datetimes
    :return: elapsed time in milliseconds (float)
    '''
    return delta_time.total_seconds() * 1000.0

T0 = datetime.datetime.now()
# do something
T1 = datetime.datetime.now()

print("run time:{}".format(RUN_TIME(T1 - T0)))
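For short intervals, a monotonic clock is usually a safer choice than wall-clock datetimes, because it is not affected by system clock adjustments. The following is a minimal sketch using time.perf_counter from the standard library; the helper name elapsed_ms is hypothetical and not part of the original notes.

```python
import time

def elapsed_ms(func, *args, **kwargs):
    """Hypothetical helper: run func once and return (result, elapsed milliseconds)."""
    start = time.perf_counter()          # monotonic, high-resolution clock
    result = func(*args, **kwargs)
    elapsed = (time.perf_counter() - start) * 1000.0
    return result, elapsed

# Usage: time a single call to sum()
result, ms = elapsed_ms(sum, range(1000))
print("result:{}, run time:{:.3f} ms".format(result, ms))
```

Wrapping the call in a helper keeps the timing code out of the function being measured.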

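(5) Speeding up downloads with mirrors

TUNA also provides a mirror of the Anaconda repository. Run the following commands:

conda config --add channels https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/free/

conda config --set show_channel_urls yes

Downloads speed up noticeably once the mirror is set. conda config writes these settings to the .condarc file in the user's home directory, so they persist across command windows.

On Windows, create a pip folder under your user directory, create a pip.ini file in it, and paste in the Aliyun source:

[global]
trusted-host=mirrors.aliyun.com
index-url = http://mirrors.aliyun.com/pypi/simple/

On Linux, edit ~/.pip/pip.conf (create the folder and the file if they do not exist; the folder name starts with "." because it is a hidden folder).

Its content is:

[global]
index-url = https://pypi.tuna.tsinghua.edu.cn/simple
[install]
trusted-host=pypi.tuna.tsinghua.edu.cn

On Windows, create a pip directory directly under the user directory, for example C:\Users\xx\pip, and create a new file pip.ini in it with the same content as above.

Temporary method: add "-i https://mirrors.aliyun.com/pypi/simple/" to the pip command, for example: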

pip install opencv-python -i https://mirrors.aliyun.com/pypi/simple/

(6) Pylint code analysis tool: installation and PyCharm configuration

https://www.cnblogs.com/yaoliping/archive/2018/10/10/9767834.html

(7) Adding environment paths and module search paths in Python

Adding an environment path:

# Add the graphviz directory to the PATH environment variable
import os
os.environ["PATH"] += os.pathsep + 'D:/ProgramData/Anaconda3/envs/pytorch-py36/Library/bin/graphviz/'

Module search path:

import sys
import os

# Print the list of paths Python searches for modules
print(sys.path)
# Print the directory that contains the current file
print("os.path.dirname(__file__):", os.path.dirname(__file__))
print("os.getcwd():              ", os.getcwd())  # cwd: the current working directory

'''Adding a search path
sys.path.append('directory that contains your module')
sys.path.insert(0, 'directory that contains your module')
'''
# First add the directory that contains image_processing
sys.path.append("F:/project/python-learning-notes/utils")
# sys.path.append(os.getcwd())
# Then import the module
import image_processing

# Add the graphviz directory to PATH
os.environ["PATH"] += os.pathsep + 'D:/ProgramData/Anaconda3/envs/pytorch-py36/Library/bin/graphviz/'

image_path = "F:/project/python-learning-notes/dataset/test_image/1.jpg"
image = image_processing.read_image(image_path)
image_processing.cv_show_image("image", image)

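(8) Common conda commands

List all environments: conda info --envs or conda env list

Export an environment.yml file: conda env export > environment.yml

Create the environment from an environment.yml file: conda env create -f environment.yml

List all packages in the currently active environment: conda list

Remove an environment: conda remove --name your_env_name --all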

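2. Commonly used modules

2.1 The numpy module:

(1) Concatenating and splitting matrices, splitting data into odd and even items

import numpy as np

# Create a 5*2 matrix
data1=np.arange(0,10)
data1=data1.reshape([5,2])
data2=data1+10  # a second 5*2 matrix, defined here only so the example runs

# Concatenate along axis 0 (rows), giving a 10*2 matrix
y = np.concatenate([data1, data2], 0)

# Concatenate along axis 1 (columns)
def cat_labels_indexMat(labels,indexMat):
    indexMat_labels = np.concatenate([labels,indexMat], axis=1)
    return indexMat_labels

# Split a matrix by columns
def split_labels_indexMat(indexMat_labels,label_index=0):
    labels = indexMat_labels[:, 0:label_index+1]     # columns up to label_index are the labels
    indexMat = indexMat_labels[:, label_index+1:]  # the remaining columns are indexMat
    return labels, indexMat


def split_data(data):
    '''
    Split the data into even-indexed and odd-indexed items
    :param data: sequence or ndarray, split along the first axis
    :return: (items at index 0, 2, 4, ..., items at index 1, 3, 5, ...)
    '''
    data1 = data[0::2]
    data2 = data[1::2]
    return data1,data2


if __name__=='__main__':
    data = np.arange(0, 20)
    data = data.reshape([10, 2])
    data1,data2=split_data(data)
    print("embeddings:{}".format(data))
    print("embeddings1:{}".format(data1))
    print("embeddings2:{}".format(data2))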

(2) Sorting by column

pair_issame = pair_issame[np.lexsort(pair_issame.T)]  # sort the rows by the last column
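The one-liner above works because np.lexsort treats the last key it receives as the primary sort key. The sketch below illustrates this on a small made-up array (the values are hypothetical and only chosen to make the ordering visible), and shows np.argsort as an alternative when only one column should decide the order.

```python
import numpy as np

# Hypothetical data: the last column plays the role of the "issame" flag
pairs = np.array([[3, 9, 1],
                  [1, 7, 0],
                  [2, 8, 1],
                  [0, 6, 0]])

# pairs.T passes the columns as keys; the LAST key (last column) is the
# primary key, earlier columns only break ties (from right to left)
order = np.lexsort(pairs.T)
sorted_pairs = pairs[order]
print(sorted_pairs)

# Sort by one column only (here column 0) with a stable argsort
by_first = pairs[np.argsort(pairs[:, 0], kind="stable")]
print(by_first)
```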

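(3) Extracting rows and columns that meet a condition

Suppose we have the data pair_issame.

To extract the rows whose third (last) column equals "1", use:

pair_issame_1 = pair_issame[pair_issame[:, -1] == "1", :]  # filter the array with a boolean mask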
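Since the original pair_issame data is not shown, here is a small self-contained sketch of the same boolean-mask filtering. The array contents are made up for illustration; only the indexing pattern comes from the line above.

```python
import numpy as np

# Hypothetical pair list: two file names and a string flag per row
pair_issame = np.array([["1.jpg", "2.jpg", "1"],
                        ["1.jpg", "3.jpg", "0"],
                        ["4.jpg", "5.jpg", "1"]])

mask = pair_issame[:, -1] == "1"      # boolean vector, one entry per row
pair_issame_1 = pair_issame[mask, :]  # rows whose last column is "1"
pair_issame_0 = pair_issame[~mask, :] # the remaining rows
print(pair_issame_1)
print(pair_issame_0)
```

Keeping the mask in a variable makes it easy to obtain the complementary rows with ~mask.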

(4) Finding vectors that meet a condition

import numpy as np


def matching_data_vecror(data, vector):
    '''
    Match vector against the rows of data and return a boolean index of the rows equal to vector, e.g.:
    data = [[1., 0., 0.],[0., 0., 0.],[2., 0., 0.],
            [0., 0., 0.],[0., 3., 0.],[0., 0., 4.]]
    # Find the rows of data that equal [0, 0, 0]
    data = np.asarray(data)
    vector=[0, 0, 0]
    index = matching_data_vecror(data, vector)
    print(index)
    >>[False  True False  True False False]
    # Remove the rows of data that equal [0, 0, 0]
    pair_issame_1 = data[~index, :]  # filter the array
    :param data: 2-D ndarray
    :param vector: the row vector to look for
    :return: boolean ndarray, True where the row equals vector
    '''
    # index = (data[:, 0] == 0) & (data[:, 1] == 0) & (data[:, 2] == 0)
    row_nums = len(data)
    clo_nums = len(vector)
    index = np.asarray([True] * row_nums)
    for i in range(clo_nums):
        index = index & (data[:, i] == vector[i])
    return index


def set_mat_vecror(data, index, vector):
    '''
    Set the rows of data at the positions given by index to vector
    # Example: set every point whose score exceeds the threshold to vector = [10, 10]
    point = [[0., 0.], [1., 1.], [2., 2.],
             [3., 3.], [4., 4.], [5., 5.]]
    point = np.asarray(point) # the data points
    score = np.array([0.7, 0.2, 0.3, 0.4, 0.5, 0.6])# the score of each data point
    score_th=0.5
    index = np.where(score > score_th) # indices of all scores above the threshold
    vector = [10, 10]                  # value assigned to the points above the threshold
    out = set_mat_vecror(point, index, vector)
    :param data: 2-D ndarray, modified in place
    :param index: row indices or boolean mask
    :param vector: the row vector to assign
    :return: data
    '''
    data[index, :] = vector
    return data

(5) Shuffling

https://blog.csdn.net/Song_Lynn/article/details/82817647

    per = np.random.permutation(pair_issame_1.shape[0])  # shuffled row indices
    pair_issame_1 = pair_issame_1[per, :]  # the data in shuffled order
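The same permutation can be applied to several arrays so that rows and labels stay aligned. A minimal sketch, assuming a made-up data array and a seeded RandomState for reproducibility (neither is part of the original snippet):

```python
import numpy as np

rng = np.random.RandomState(100)         # fixed seed, so the order is reproducible
data = np.arange(12).reshape(6, 2)       # hypothetical samples, row i starts with 2*i
labels = np.arange(6)                    # hypothetical labels, one per row

per = rng.permutation(data.shape[0])     # shuffled row indices
data_shuffled = data[per, :]             # rows in shuffled order
labels_shuffled = labels[per]            # labels reordered with the same indices
print(data_shuffled)
print(labels_shuffled)
```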

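2.2 The pickle module

What kinds of data can pickle store?

All native Python types: booleans, integers, floats, complex numbers, strings, bytes, None.

Lists, tuples, dicts and sets built from any of the native types.

Functions, classes, and class instances (functions and classes are pickled by reference to their name, so their definitions must be importable when the data is loaded).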

import pickle
import numpy as np

def save_data(data, file):
    with open(file, 'wb') as f:
        pickle.dump(data, f)


def load_data(file):
    with open(file, 'rb') as f:
        data = pickle.load(f)
    return data
if __name__ == "__main__":
    data1 = ['aa', 'bb', 'cc'] # list
    data1=np.asarray(data1)       # ndarray

    data_path = "data.pk"
    save_data(data1, data_path)
    data2 = load_data(data_path)
    print(data1)
    print(data2)

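2.3 random.shuffle with a fixed seed

    import random

    files_list=...
    labels_list=...
    shuffle=True
    if shuffle:
        # seeds = random.randint(0,len(files_list)) # generate a random seed
        seeds = 100 # fixed seed: with the same seed, the generated random sequence is the same
        random.seed(seeds)
        random.shuffle(files_list)
        random.seed(seeds)  # reset the seed so labels_list gets the same permutation as files_list
        random.shuffle(labels_list)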
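An alternative that avoids reseeding twice is to shuffle the (file, label) pairs together. This is only a sketch under assumptions: the list contents are made up, and a private random.Random instance is used so the global random state is left alone.

```python
import random

# Hypothetical data: file i has label i, which makes the pairing easy to check
files_list = ["0.jpg", "1.jpg", "2.jpg", "3.jpg", "4.jpg"]
labels_list = [0, 1, 2, 3, 4]

rng = random.Random(100)                 # private generator with a fixed seed
pairs = list(zip(files_list, labels_list))
rng.shuffle(pairs)                       # shuffle files and labels as one unit

# Unzip the shuffled pairs back into two lists
files_shuffled, labels_shuffled = (list(t) for t in zip(*pairs))
print(files_shuffled)
print(labels_shuffled)
```

Because each pair is moved as a whole, the two lists cannot get out of step even if the shuffling code changes later.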

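2.4 The zip() and zip(*) functions:

zip() takes iterables as arguments and packs their corresponding elements into tuples. If the iterables have different lengths, the result is as long as the shortest one. With the * operator, zip(*zipped) reverses the packing ("unzips") and yields one tuple per original sequence.

zip differs between Python 2 and Python 3: in Python 2 it returns a list of tuples, while in Python 3.x, to save memory, it returns an iterator. Call list() on it to see the contents, and note that the iterator can only be consumed once.

a = [1,2,3]
b = [4,5,6]
c = [4,5,6,7,8]
zipped = list(zip(a,b))     # pack into a list of tuples
# result: [(1, 4), (2, 5), (3, 6)]
list(zip(a,c))              # the length matches the shortest list
# result: [(1, 4), (2, 5), (3, 6)]
list(zip(*zipped))          # the opposite of zip: *zipped can be read as unpacking, giving the rows back
# result: [(1, 2, 3), (4, 5, 6)]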

2.5 Fast iteration with map and for:

import os

# Suppose files_list is:
files_list=['../training_data/test\\0.txt', '../training_data/test\\1.txt', '../training_data/test\\2.txt', '../training_data/test\\3.txt', '../training_data/test\\4.txt', '../training_data/test\\5.txt', '../training_data/test\\6.txt']

# The three approaches below all get the file names from files_list
# (these paths contain Windows separators, so basename splits them as intended only on Windows)
files_names1=list(map(lambda s: os.path.basename(s),files_list))
files_names2=list(os.path.basename(i) for i in files_list)
files_names3=[os.path.basename(i) for i in files_list]

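2.6 The glob module

glob is one of the simplest modules and has very little in it. It finds path names that match a given pattern, much like file search on Windows. Only three wildcards are used: "*", "?" and "[]". "*" matches zero or more characters, "?" matches a single character, and "[]" matches one character from the given range, e.g. [0-9] matches a digit.

import glob
# Get all jpg images in the subfolders of the given directory
print(glob.glob(r"E:\Picture\*\*.jpg"))
# Get all .py files in the parent directory
print(glob.glob(r'../*.py'))  # relative path

To iterate over the jpg images in a given directory:

# -*- coding:utf-8 -*-
import glob
# Iterate over the jpg images in the given directory
image_path="/home/ubuntu/TFProject/view-finding-network/test_images/*.jpg"
for per_path in glob.glob(image_path):
    print(per_path)

To iterate over files of several formats:

# Collect the 'jpg', 'png' and 'jpeg' images
image_format=['jpg','png','jpeg']  # image formats
image_dir='./test_image'           # image directory
image_list=[]
for fmt in image_format:
    path=image_dir+'/*.'+fmt
    image_list.extend(glob.glob(path))
print(image_list)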
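The patterns above only look at one directory level. Since Python 3.5, glob.glob also accepts recursive=True, in which case "**" matches the directory itself and any depth of subdirectories. The sketch below builds a throwaway directory tree (the file names are made up) so that it can run anywhere:

```python
import glob
import os
import tempfile

with tempfile.TemporaryDirectory() as image_dir:
    # Create a small hypothetical tree: two files at the top level, one in a subfolder
    os.makedirs(os.path.join(image_dir, "dog"))
    for name in ["a.jpg", "b.png", os.path.join("dog", "c.jpg"), "notes.txt"]:
        open(os.path.join(image_dir, name), "w").close()

    # "**" with recursive=True also descends into subdirectories
    pattern = os.path.join(image_dir, "**", "*.jpg")
    jpg_list = sorted(glob.glob(pattern, recursive=True))
    jpg_names = [os.path.relpath(p, image_dir) for p in jpg_list]
    print(jpg_names)
```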

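2.7 The os module

import os
import shutil

os.getcwd()  # get the current working directory
os.path.abspath('.')  # absolute path of the current working directory
os.path.abspath('..')  # absolute path of the parent of the current working directory
os.path.abspath(os.curdir)  # absolute path of the current working directory
os.path.join(os.getcwd(),'filename')  # join the current directory and a name into a new path
os.path.exists(path)  # check whether the path exists
os.path.isfile(path)  # True if path is an existing file, otherwise False
os.path.basename('path/to/test.jpg')  # file name part of the path: test.jpg
os.path.getsize(path)  # file size in bytes; raises OSError if the file does not exist
path=os.path.dirname('path/to/test.jpg')  # directory part of the path: path/to
os.sep  # path separator of the current OS: '/' on Linux/UNIX, '\\' on Windows
# Splitting on os.sep only works when the path uses the separator of the current OS
filename='path/to/test.jpg'.split(os.sep)[-1]  # last component, the file name: test.jpg
dirname='path/to/test.jpg'.split(os.sep)[-2]  # second-to-last component, the containing folder: to


# Delete all files under the directory (recursively; the subfolders themselves are kept)
def delete_dir_file(dir_path):
    ls = os.listdir(dir_path)
    for i in ls:
        c_path = os.path.join(dir_path, i)
        if os.path.isdir(c_path):
            delete_dir_file(c_path)
        else:
            os.remove(c_path)

# Create the directory if it does not exist (os.mkdir creates a single level only)
if not os.path.exists(out_dir):
   os.mkdir(out_dir)

# Create nested directories
if not os.path.exists(segment_out_dir):
   os.makedirs(segment_out_dir)

# Delete all files under the directory
delete_dir_file(out_dir)
# Or remove the whole folder, including out_dir itself:
shutil.rmtree(out_dir)  # delete output folder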

The code below implements: [1] getFilePathList: get the paths of all files under file_dir, including files in subdirectories; [2] get_files_list: get the list of all files under file_dir with the suffix postfix, including subdirectories; [3] gen_files_labels: get all file paths under files_dir together with their labels, where the label is the name of the file's parent folder.

# coding: utf-8
import os
import os.path
import pandas as pd

def getFilePathList(file_dir):
    '''
    Get the paths of all files under file_dir, including files in subdirectories
    :param file_dir: root directory
    :return: list of file paths
    '''
    filePath_list = []
    for walk in os.walk(file_dir):
        part_filePath_list = [os.path.join(walk[0], file) for file in walk[2]]
        filePath_list.extend(part_filePath_list)
    return filePath_list

def get_files_list(file_dir,postfix='ALL'):
    '''
    Get the list of all files under file_dir with the suffix postfix, including subdirectories
    :param file_dir: root directory
    :param postfix: file suffix such as 'jpg' or '.jpg'; 'ALL' returns every file
    :return: sorted list of file paths
    '''
    postfix=postfix.split('.')[-1]
    file_list=[]
    filePath_list = getFilePathList(file_dir)
    if postfix=='ALL':
        file_list=filePath_list
    else:
        for file in filePath_list:
            basename=os.path.basename(file)  # file name part of the path
            postfix_name=basename.split('.')[-1]
            if postfix_name==postfix:
                file_list.append(file)
    file_list.sort()
    return file_list

def gen_files_labels(files_dir):
    '''
    Get all file paths under files_dir together with their labels, where the label is the name of the parent folder.
    Under files_dir, files of the same class are kept in one folder, and the folder name is their label.
    :param files_dir: root directory
    :return: filePath_list: paths of all files, label_list: the corresponding labels
    '''
    filePath_list = getFilePathList(files_dir)
    print("files nums:{}".format(len(filePath_list)))
    # Get the label of every sample
    label_list = []
    for filePath in filePath_list:
        label = filePath.split(os.sep)[-2]
        label_list.append(label)

    labels_set=list(set(label_list))
    print("labels:{}".format(labels_set))

    # Count the samples per label
    print(pd.Series(label_list).value_counts())
    return filePath_list,label_list

if __name__=='__main__':
    file_dir='JPEGImages'
    file_list=get_files_list(file_dir)
    for file in file_list:
        print(file)

Walking all files under the directory dir (including files in subfolders):

# coding: utf-8
import os
import os.path

def get_files_list(dir):
    '''
    Walk all files under the directory dir (including files in subfolders)
    :param dir: the directory to walk
    :return: list containing the paths of all files -> list
    '''
    # parent: the current directory, dirnames: the folders in it, filenames: the files in it
    files_list=[]
    for parent, dirnames, filenames in os.walk(dir):
        for filename in filenames:
            # print("parent is: " + parent)
            # print("filename is: " + filename)
            # print(os.path.join(parent, filename))  # print every file under the root directory (including subfolders)
            files_list.append(os.path.join(parent, filename))
    return files_list

if __name__=='__main__':
    dir = 'images'
    files_list=get_files_list(dir)
    print(files_list)

Below is a ready-made get_input_list() function. If path is a folder, it collects all image files in it (png, jpg, jpeg, etc.); if path is a txt file, it reads the file list stored in the txt (the txt must not contain extra blank lines); path can also be a single image file such as path/to/1.png.

# -*-coding: utf-8 -*-
"""
    @Project: hdrnet
    @File   : my_test.py
    @Author : panjq
    @E-mail : [email protected]
    @Date   : 2018-08-28 14:30:51
"""
import os
import logging
import re

logging.basicConfig(format="[%(process)d] %(levelname)s %(filename)s:%(lineno)s | %(message)s")
log = logging.getLogger("train")
log.setLevel(logging.INFO)

def get_input_list(path):
  '''
  Return the paths of all images
  :param path: path of a single image, a folder, or a txt file
  :return: list of image paths
  '''
  # Escape the dot and anchor the end so that only real image suffixes match
  regex = re.compile(r".*\.(png|jpeg|jpg|tif|tiff)$")
  # path is a folder: collect all png, jpg, jpeg, ... image files in it
  # path/to
  if os.path.isdir(path):
    inputs = os.listdir(path)
    inputs = [os.path.join(path, f) for f in inputs if regex.match(f)]
    log.info("Directory input {}, with {} images".format(path, len(inputs)))

  # path is a txt file: read the file list stored in it (no extra blank lines allowed)
  # path/to/filelist.txt
  elif os.path.splitext(path)[-1] == ".txt":
    dirname = os.path.dirname(path)
    with open(path, 'r') as fid:
      inputs = [l.strip() for l in fid.readlines()]
    inputs = [os.path.join(dirname, im) for im in inputs]
    log.info("Filelist input {}, with {} images".format(path, len(inputs)))
  # path is a single image file: path/to/1.png
  elif regex.match(path):
    inputs = [path]
    log.info("Single input {}".format(path))
  else:
    raise ValueError("Unsupported input path: {}".format(path))
  return inputs

if __name__ == '__main__':
    path='dataset/filelist.txt'
    result=get_input_list(path)
    print(result)

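2.8 Checking for empty, missing, or too-small image files

def isValidImage(images_list,sizeTh=1000,isRemove=False):
    ''' Remove missing files and files that are too small from the list
    :param images_list: list of image file paths
    :param sizeTh: file size threshold in bytes (B), default 1000 B
    :param isRemove: whether to delete the damaged original files from disk
    :return:
    '''
    i=0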
    while i
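
The isValidImage listing above breaks off after "while i" in the source. As a rough stand-in, here is a minimal sketch of the same idea under stated assumptions: the function name filter_valid_images and its parameter names are hypothetical, and it only checks existence and file size, as the docstring describes. It is not the author's original implementation.

```python
import os

def filter_valid_images(images_list, size_th=1000, is_remove=False):
    """Hypothetical helper: keep only paths that exist and are at least size_th bytes."""
    valid_list = []
    for path in images_list:
        if not os.path.isfile(path):
            print("missing file:{}".format(path))
            continue
        if os.path.getsize(path) < size_th:
            print("file too small:{}".format(path))
            if is_remove:
                os.remove(path)  # delete the undersized file from disk
            continue
        valid_list.append(path)
    return valid_list
```

Building a new list instead of deleting items from images_list while iterating avoids the index bookkeeping that a while loop over a shrinking list needs.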

 
   
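2.9 Saving multi-dimensional arrays

np.savetxt() can only save 1-D or 2-D arrays, so an array with three or more dimensions has to be flattened into a vector first:

import numpy as np

arr1 = np.zeros((3,4,5), dtype='int16')     # create a 3*4*5 array of zeros
print("shape:",np.shape(arr1))
arr1[0,:,:]=0
arr1[1,:,:]=1
arr1[2,:,:]=2
print("arr1=",arr1)
# savetxt cannot save arrays with three or more dimensions, so reshape to a column vector
vector=arr1.reshape((-1,1))
np.savetxt("data.txt", vector)

data= np.loadtxt("data.txt")  # loadtxt returns float64 values
print("data=",data)
arr2=data.reshape(arr1.shape)  # restore the original shape (the dtype stays float64)
print("arr2=",arr2)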
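
If a human-readable text file is not required, NumPy's binary .npy format keeps both the shape and the dtype, so no reshaping is needed. A minimal sketch, assuming a temporary directory for the output file (the path and array contents are made up):

```python
import os
import tempfile
import numpy as np

arr1 = np.arange(60, dtype="int16").reshape(3, 4, 5)  # hypothetical 3*4*5 array

with tempfile.TemporaryDirectory() as tmp_dir:
    npy_path = os.path.join(tmp_dir, "data.npy")
    np.save(npy_path, arr1)   # stores shape and dtype together with the data
    arr2 = np.load(npy_path)  # comes back as a 3*4*5 int16 array

print(arr2.shape, arr2.dtype)
```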
   
2.10 Reading txt data

This is a ready-made module for reading and writing txt files. Both its input and its output are lists:

# -*-coding: utf-8 -*-
"""
 @Project: TxtStorage
 @File   : TxtStorage.py
 @Author : panjq
 @E-mail : [email protected]
 @Date   : 2018-07-12 17:32:47
"""

class TxtStorage:
    # def __init__(self):
    def write_txt(self, content, filename, mode='w'):
        """Save data to a txt file
        :param content: the data to save, type->list (one inner list per line)
        :param filename: file name
        :param mode: write mode: 'w' or 'a'
        :return: void
        """
        with open(filename, mode) as f:
            for line in content:
                str_line=""
                for col,data in enumerate(line):
                    if not col == len(line) - 1:
                        # use a space as the separator
                        str_line=str_line+str(data)+" "
                    else:
                        # the last item of each line is followed by a newline "\n"
                        str_line=str_line+str(data)+"\n"
                f.write(str_line)


    def read_txt(self, fileName):
        """Read data from a txt file
        :param fileName: file name
        :return: the data of the txt file as a list
        :rtype: list
        Python has three functions for removing leading and trailing characters or whitespace:
        strip:  removes leading and trailing characters or whitespace (\n, \r, \t, ' ': newline, carriage return, tab, space)
        lstrip: removes leading characters or whitespace (\n, \r, \t, ' ': newline, carriage return, tab, space)
        rstrip: removes trailing characters or whitespace (\n, \r, \t, ' ': newline, carriage return, tab, space)
        Note: these functions only remove characters at the ends; characters in the middle are kept.
        """
        txtData=[]
        with open(fileName, 'r') as f:
            lines = f.readlines()
            for line in lines:
                lineData = line.rstrip().split(" ")
                data=[]
                for l in lineData:
                    if self.is_int(l): # integer check; str.isdigit() only accepts unsigned digit strings, so int() is used instead
                        data.append(int(l))
                    elif self.is_float(l): # check for a decimal number
                        data.append(float(l))
                    else:
                        data.append(l)
                txtData.append(data)
        return txtData

    def is_int(self,s):
        # check whether the string is an integer
        try:
            x = int(s)
            return isinstance(x, int)
        except ValueError:
            return False

    def is_float(self,s):
        # check whether the string is a number (integer or decimal)
        try:
            x = float(s)
            return isinstance(x, float)
        except ValueError:
            return False


if __name__ == '__main__':
    txt_filename = 'test.txt'
    w_data = [['1.jpg', 'dog', 200, 300,1.0], ['2.jpg', 'dog', 20, 30,-2]]
    print("w_data=",w_data)
    txt_str = TxtStorage()
    txt_str.write_txt(w_data, txt_filename, mode='w')
    r_data = txt_str.read_txt(txt_filename)
    print('r_data=',r_data)

 
The same txt reading and writing operations as plain functions, plus helpers for merging and splitting lists:

# -*-coding: utf-8 -*-
"""
 @Project: TxtStorage
 @File   : TxtStorage.py
 @Author : panjq
 @E-mail : [email protected]
 @Date   : 2018-07-12 17:32:47
"""

def write_txt(content, filename, mode='w'):
    """Save data to a txt file
    :param content: the data to save, type->list (one inner list per line)
    :param filename: file name
    :param mode: write mode: 'w' or 'a'
    :return: void
    """
    with open(filename, mode) as f:
        for line in content:
            str_line = ""
            for col, data in enumerate(line):
                if not col == len(line) - 1:
                    # use a space as the separator
                    str_line = str_line + str(data) + " "
                else:
                    # the last item of each line is followed by a newline "\n"
                    str_line = str_line + str(data) + "\n"
            f.write(str_line)

def read_txt(fileName):
    """Read data from a txt file
    :param fileName: file name
    :return: the data of the txt file as a list
    :rtype: list
    Python has three functions for removing leading and trailing characters or whitespace:
    strip:  removes leading and trailing characters or whitespace (\n, \r, \t, ' ': newline, carriage return, tab, space)
    lstrip: removes leading characters or whitespace (\n, \r, \t, ' ': newline, carriage return, tab, space)
    rstrip: removes trailing characters or whitespace (\n, \r, \t, ' ': newline, carriage return, tab, space)
    Note: these functions only remove characters at the ends; characters in the middle are kept.
    """
    txtData = []
    with open(fileName, 'r') as f:
        lines = f.readlines()
        for line in lines:
            lineData = line.rstrip().split(" ")
            data = []
            for l in lineData:
                if is_int(l):  # integer check; str.isdigit() only accepts unsigned digit strings, so int() is used instead
                    data.append(int(l))
                elif is_float(l):  # check for a decimal number
                    data.append(float(l))
                else:
                    data.append(l)
            txtData.append(data)
    return txtData

def is_int(s):
    # check whether the string is an integer
    try:
        x = int(s)
        return isinstance(x, int)
    except ValueError:
        return False

def is_float(s):
    # check whether the string is a number (integer or decimal)
    try:
        x = float(s)
        return isinstance(x, float)
    except ValueError:
        return False

def merge_list(data1,data2):
    '''
    Merge two lists element by element
    :param data1: list of lists
    :param data2: list of lists with the same length as data1
    :return: the merged list, or None if the lengths differ
    '''
    if not len(data1)==len(data2):
        return
    all_data=[]
    for d1,d2 in zip(data1,data2):
        all_data.append(d1+d2)
    return all_data

def split_list(data,split_index=1):
    '''
    Split every item of data into two parts
    :param data: list
    :param split_index: the position at which to split
    :return: (list of first parts, list of second parts)
    '''
    data1=[]
    data2=[]
    for d in data:
        d1=d[0:split_index]
        d2=d[split_index:]
        data1.append(d1)
        data2.append(d2)
    return data1,data2

if __name__ == '__main__':
    txt_filename = 'test.txt'
    w_data = [['1.jpg', 'dog', 200, 300, 1.0], ['2.jpg', 'dog', 20, 30, -2]]
    print("w_data=", w_data)
    write_txt(w_data, txt_filename, mode='w')
    r_data = read_txt(txt_filename)
    print('r_data=', r_data)
    data1,data2=split_list(w_data)
    mer_data=merge_list(data1,data2)
    print('mer_data=', mer_data)



 
To read a txt file like the following:

test_image/dog/1.jpg 0 11
test_image/dog/2.jpg 0 12
test_image/dog/3.jpg 0 13
test_image/dog/4.jpg 0 14
test_image/cat/1.jpg 1 15
test_image/cat/2.jpg 1 16
test_image/cat/3.jpg 1 17
test_image/cat/4.jpg 1 18

you can use this function:

def load_image_labels(test_files):
    '''
    Load a txt file in which each line describes one image, with fields separated by spaces: image path, label 1, label 2, e.g. test_image/1.jpg 0 2
    :param test_files: path of the txt file
    :return: images_list: image paths, labels_list: list of label lists (floats)
    '''
    images_list=[]
    labels_list=[]
    with open(test_files) as f:
        lines = f.readlines()
        for line in lines:
            # rstrip removes trailing whitespace (\n, \r, \t, ' ': newline, carriage return, tab, space)
            content=line.rstrip().split(' ')
            name=content[0]
            labels=[]
            for value in content[1:]:
                labels.append(float(value))
            images_list.append(name)
            labels_list.append(labels)
    return images_list,labels_list
   
2.11 The pandas module

(1) Concatenating data from files

Suppose we have the files 'data1.txt', 'data2.txt' and 'data3.txt':

#'data1.txt'
1.jpg 11
2.jpg 12
3.jpg 13
#'data2.txt'
1.jpg 110
2.jpg 120
3.jpg 130
#'data3.txt'
1.jpg 1100
2.jpg 1200
3.jpg 1300

They need to be concatenated into:

1.jpg 11 110 1100
2.jpg 12 120 1200
3.jpg 13 130 1300

Implementation:

# coding: utf-8
import pandas as pd

def concat_data(page,save_path):
    pd_data=[]
    for i in range(len(page)):
        # sep=r'\s+' splits on whitespace (delim_whitespace is deprecated in recent pandas versions)
        content=pd.read_csv(page[i], dtype=str, sep=r'\s+', header=None)
        if i==0:
            pd_data=pd.concat([content], axis=1)
        else:  # append the second column of each further file
            pd_data=pd.concat([pd_data,content.iloc[:,1]], axis=1)
    pd_data.to_csv(save_path, index=False, sep=' ', header=None)

if __name__=='__main__':
    txt_path = ['data1.txt', 'data2.txt', 'data3.txt']
    out_path = 'all_data.txt'
    concat_data(txt_path,out_path)
(2) DataFrame

import pandas as pd
import numpy as np

def print_info(class_name,labels):
    # index =range(len(class_name))+1
    index=np.arange(0,len(class_name))+1
    columns = ['class_name', 'labels']
    content = np.array([class_name, labels]).T
    df = pd.DataFrame(content, index=index, columns=columns)  # one row per class, two columns
    print(df)  # prints a table with len(class_name) rows and 2 columns


class_name=['C1','C2','C3']
labels=[100,200,300]
print_info(class_name,labels)
 
   
Adding, deleting, modifying, and querying Pandas DataFrame data

https://blog.csdn.net/zhangchuang601/article/details/79583551

import pandas as pd
import numpy as np

df = pd.DataFrame(data = [['tom1','f',22],['tom2','f',22],['tom3','m',21]],index = [1,2,3],columns = ['name','sex','age'])  # test data

   name sex  age
1  tom1   f   22
2  tom2   f   22
3  tom3   m   21

citys = ['shenzhen1','shenzhen2','shenzhen3']
df.insert(2,'city',citys)  # insert a column named city with the values of citys at column position 2 (0-based)
jobs = ['student','teacher','teacher']
df['job'] = jobs  # append a column named job with the values of jobs as the last column of df
df.loc[:,'salary'] = ['1k','2k','2k']  # append a column named salary with the values on the right as the last column of df
df

   name sex       city  age      job salary
1  tom1   f  shenzhen1   22  student     1k
2  tom2   f  shenzhen2   22  teacher     2k
3  tom3   m  shenzhen3   21  teacher     2k

# If df has no row with index 4, this line adds a row with index 4 and the values on the right.
# If df already has a row with index 4, this line overwrites that row with the values on the right.
df.loc[4] = ['tom4','m','shenzhen4',24,"engineer",'3k']
df

   name sex       city  age       job salary
1  tom1   f  shenzhen1   22   student     1k
2  tom2   f  shenzhen2   22   teacher     2k
3  tom3   m  shenzhen3   21   teacher     2k
4  tom4   m  shenzhen4   24  engineer     3k

# Sort by the value of age, in descending order
df=df.sort_values(by=["age"],ascending=False)
df

   name sex       city  age       job salary
4  tom4   m  shenzhen4   24  engineer     3k
1  tom1   f  shenzhen1   22   student     1k
2  tom2   f  shenzhen2   22   teacher     2k
3  tom3   m  shenzhen3   21   teacher     2k
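
The steps above cover adding rows and columns, overwriting a row, and sorting. Querying and deleting can be sketched along the same lines. This is a minimal illustration that rebuilds the small test DataFrame so it runs on its own; the variable names females, age_of_3, df_no_row and df_no_col are made up for the example.

```python
import pandas as pd

df = pd.DataFrame(data=[['tom1', 'f', 22], ['tom2', 'f', 22], ['tom3', 'm', 21]],
                  index=[1, 2, 3], columns=['name', 'sex', 'age'])

# Query: select rows with a boolean condition, and a single cell by label
females = df[df['sex'] == 'f']
age_of_3 = df.loc[3, 'age']

# Modify: change a single cell in place
df.loc[1, 'age'] = 23

# Delete: drop() returns a new DataFrame and leaves df unchanged
df_no_row = df.drop(index=2)          # remove the row with index label 2
df_no_col = df.drop(columns=['sex'])  # remove the 'sex' column
print(df_no_row)
print(df_no_col)
```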
  
     
    
   
   
2.12 The csv module

Reading data from a csv file with the csv module:

# -*- coding:utf-8 -*-
import csv
csv_path='test.csv'
with open(csv_path,'r') as csvfile:
    reader = csv.DictReader(csvfile)
    for item in reader:  # iterate over all rows
        print(item)

with open(csv_path, 'r') as csvfile:
    reader = csv.DictReader(csvfile)
    for item in reader:  # iterate over all rows
        print(item['filename'],item['class'],item.get('height'),item.get('width'))

Output:

{'filename': 'test01.jpg', 'height': '638', 'class': 'dog', 'width': '486'}
{'filename': 'test02.jpg', 'height': '954', 'class': 'person', 'width': '726'}
test01.jpg dog 638 486
test02.jpg person 954 726

Writing and reading:

import csv

csv_path = 'test.csv'
# write the csv file
data=["1.jpg",200,300,'dog']
with open(csv_path, 'w+',newline='') as csv_file:
    # headers = [k for k in dictionaries[0]]
    headers=['filename','width','height', 'class']
    print(headers)
    writer = csv.DictWriter(csv_file, fieldnames=headers)
    writer.writeheader()
    dictionary={'filename': data[0],
                'width': data[1],
                'height': data[2],
                'class': data[3],
                }
    writer.writerow(dictionary)
    print(dictionary)

# read the csv file
with open(csv_path, 'r') as csvfile:
    reader = csv.DictReader(csvfile)
    for item in reader:  # iterate over all rows
        print(item)

with open(csv_path, 'r') as csvfile:
    reader = csv.DictReader(csvfile)
    for item in reader:  # iterate over all rows
        print(item['filename'], item['class'], item.get('height'), item.get('width'))
 
2.13 The logging module

import sys
import logging

# levels: debug, info, warning, error and critical
# logging.basicConfig(level=logging.DEBUG, format='%(asctime)s - %(name)s - %(levelname)s - %(message)s')
logging.basicConfig(stream=sys.stdout, level=logging.DEBUG,format='%(asctime)s - %(name)s - %(levelname)s - %(message)s')
logger = logging.getLogger(__name__)
logger.debug("----1----")
logger.info("----2----")
logger.warning("----3----")
logger.error("----4----")
   
3. Data preprocessing

3.1 Processing data (images) in blocks

import numpy as np

def split_cell(mat,cell=(3,3),stepsize=(1,1)):
    '''
    :param mat: input single-channel image data (may be incorrect, needs verification)
    :param cell: block size
    :param stepsize: step size
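
The split_cell listing is cut off in the source after its parameter description. The following is only a sketch of what block splitting with those parameters could look like, under the assumption that cell is the (rows, columns) size of each block and stepsize the (row, column) stride. The function name split_cells is hypothetical and this is not the author's implementation.

```python
import numpy as np

def split_cells(mat, cell=(3, 3), stepsize=(1, 1)):
    """Hypothetical sketch: slide a cell[0] x cell[1] window over a 2-D array
    with the given (row, column) step and stack the resulting blocks."""
    mat = np.asarray(mat)
    rows, cols = mat.shape
    blocks = []
    for r in range(0, rows - cell[0] + 1, stepsize[0]):
        for c in range(0, cols - cell[1] + 1, stepsize[1]):
            blocks.append(mat[r:r + cell[0], c:c + cell[1]])
    return np.stack(blocks)  # shape: (number of blocks, cell[0], cell[1])

# Usage: cut a 4*4 single-channel "image" into four non-overlapping 2*2 blocks
image = np.arange(16).reshape(4, 4)
blocks = split_cells(image, cell=(2, 2), stepsize=(2, 2))
print(blocks.shape)
```

With stepsize equal to cell the blocks do not overlap; a smaller stepsize gives overlapping sliding windows.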
 
   
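3.2 Reading and displaying images

There are many ways to read and display images in Python. Most image processing modules read images in RGB channel order; only the opencv-python module reads them as BGR. To display an image read by OpenCV with another module, convert the channel order first, which is simple:

import cv2
import matplotlib.pyplot as plt

image_path = 'test_image/C0.jpg'
temp_img=cv2.imread(image_path)  # default: BGR (not RGB), uint8, [0,255], ndarray
cv2.imshow("opencv-python",temp_img)
cv2.waitKey(0)
# b, g, r = cv2.split(temp_img)  # convert BGR to RGB
# img = cv2.merge([r, g, b])
# Recommended: cv2.COLOR_BGR2RGB converts BGR to RGB
img = cv2.cvtColor(temp_img, cv2.COLOR_BGR2RGB)

plt.imshow(img)  # show the image
plt.axis('off')  # hide the axes
plt.show()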
 
(1) Image reading modules: matplotlib.image, PIL.Image, cv2

# coding: utf-8
'''
  Caffe expects color images in BGR channel order; input data is float32 in the range [0,255],
  and each layer has shape=(batch_size, channel_dim, height, width).
  [1] Caffe train/test prototxt files usually set scale: 0.00392156885937 in the data layer, i.e. 1/255.0, which normalizes the data to [0,1]
  [2] When the input is an RGB image, float32, [0,1], it has to be converted:
    --transformer.set_raw_scale('data',255)       # scale to 0~255
    --transformer.set_channel_swap('data',(2,1,0))# swap RGB to BGR
  [3] When the input is an RGB image, uint8, [0,255], it has to be converted to float first (for example by multiplying by 1.0)
    --transformer.set_raw_scale('data',1.0)       # no scaling needed
    --transformer.set_channel_swap('data',(2,1,0))# swap RGB to BGR
    --channels: img = img.transpose(2, 0, 1) # axes change from [h,w,c] to [c,h,w]
  [4] All Python image reading modules return images with shape=[height, width, channels].
     The odd one out is opencv-python, which reads images as BGR (the order Caffe expects), while the other modules use RGB
'''

import numpy as np
import matplotlib.pyplot as plt

image_path = 'test_image/C0.jpg'  # C0.jpg has height h=400 and width w=200
# 1.caffe
import caffe

img1 = caffe.io.load_image(image_path)  # default: RGB, float32, [0-1], ndarray, shape=[400,200,3]

# 2.skimage
import skimage.io

img2 = skimage.io.imread(image_path)  # default: RGB, uint8, [0,255], ndarray, shape=[400,200,3]
# img2=img2/255.0

# 3.matplotlib
import matplotlib.image

img3 = matplotlib.image.imread(image_path)  # default: RGB, uint8, [0,255], ndarray, shape=[400,200,3]

# 4.PIL
from PIL import Image

temp_img4 = Image.open(image_path)  # default: RGB, uint8, [0,255]
# temp_img4.show()  # opens the image in the system's default image viewer
img4 = np.array(temp_img4)  # convert to ndarray, shape=[400,200,3]

# 5.opencv
import cv2

temp_img5 = cv2.imread(image_path)  # default: BGR (not RGB), uint8, [0,255], ndarray, shape=[400,200,3]
# cv2.imshow("opencv-python",temp_img5)
# cv2.waitKey(0)
# b, g, r = cv2.split(temp_img5)  # convert BGR to RGB
# img5 = cv2.merge([r, g, b])
# Recommended: cv2.COLOR_BGR2RGB converts BGR to RGB
img5 = cv2.cvtColor(temp_img5, cv2.COLOR_BGR2RGB)
img6 = img5.transpose(2, 0, 1)  # axes change from [h,w,c] to [c,h,w]

# All of the ndarray images above can be displayed directly like this
plt.imshow(img5)  # show the image
plt.axis('off')  # hide the axes
plt.show()
 
A ready-made module for reading, displaying and saving images:

import numpy as np
import matplotlib.pyplot as plt
import cv2

def show_image(title, image):
    '''
    Display an image
    :param title: image title
    :param image: image data
    :return:
    '''
    # plt.figure("show_image")
    # print(image.dtype)
    plt.imshow(image)
    plt.axis('on')  # use 'off' to hide the axes
    plt.title(title)  # image title
    plt.show()

def show_image_rect(win_name, image, rect):
    plt.figure()
    plt.title(win_name)
    plt.imshow(image)
    rect =plt.Rectangle((rect[0], rect[1]), rect[2], rect[3], linewidth=2, edgecolor='r', facecolor='none')
    plt.gca().add_patch(rect)
    plt.show()

def read_image(filename, resize_height, resize_width,normalization=False):
    '''
    Read image data; by default the result is uint8 in [0,255]
    :param filename: image path
    :param resize_height: target height, no resizing if <= 0
    :param resize_width: target width, no resizing if <= 0
    :param normalization: whether to normalize to [0.,1.0]
    :return: the image data in RGB order, or None if the image cannot be read
    '''

    bgr_image = cv2.imread(filename)
    if bgr_image is None:  # cv2.imread returns None when the file is missing or cannot be decoded
        print("Warning:cannot read image",filename)
        return None
    if len(bgr_image.shape)==2:  # convert a grayscale image to three channels
        print("Warning:gray image",filename)
        bgr_image = cv2.cvtColor(bgr_image, cv2.COLOR_GRAY2BGR)

    rgb_image = cv2.cvtColor(bgr_image, cv2.COLOR_BGR2RGB)  # convert BGR to RGB
    # show_image(filename,rgb_image)
    # rgb_image=Image.open(filename)
    if resize_height>0 and resize_width>0:
        rgb_image=cv2.resize(rgb_image,(resize_width,resize_height))
    rgb_image=np.asanyarray(rgb_image)
    if normalization:
        # do not write rgb_image=rgb_image/255 (integer division under Python 2)
        rgb_image=rgb_image/255.0
    # show_image("src resize image",image)
    return rgb_image

def save_image(image_path,image):
    plt.imsave(image_path,image)
 
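(2) Converting a numpy array to a PIL image:

Here matplotlib.image is used to read the image into an array. Note that for a PNG file the array is float32 in the range 0-1, while PIL.Image data is uint8 in the range 0-255, so a conversion is needed:

import numpy as np
import matplotlib.image as mpimg
from PIL import Image
lena = mpimg.imread('lena.png')  # the data read here is float32 in the range 0-1
im = Image.fromarray(np.uint8(lena*255))
im.show()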
  （3）python中PIL.Image和OpenCV图像格式相互转换 
  PIL.Image转换成OpenCV格式： 
  import cv2
from PIL import Image
import numpy
 
image = Image.open("plane.jpg")
image.show()
img = cv2.cvtColor(numpy.asarray(image),cv2.COLOR_RGB2BGR)
cv2.imshow("OpenCV",img)
cv2.waitKey() 
  OpenCV转换成PIL.Image格式： 
  import cv2
from PIL import Image
import numpy
 
img = cv2.imread("plane.jpg")
cv2.imshow("OpenCV",img)
image = Image.fromarray(cv2.cvtColor(img,cv2.COLOR_BGR2RGB))
image.show()
cv2.waitKey() 
  Check whether image data is in OpenCV format (a numpy array):
  isinstance(img, np.ndarray) 
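  （4）matplotlib display blocking
  https://blog.csdn.net/wonengguwozai/article/details/79686062
      The example below shows how to open several windows at once, as in MATLAB, to compare images or plots. It also fixes figures that flash and disappear when interactive mode is turned on in a script:
import matplotlib.pyplot as plt
plt.ion()    # turn interactive mode on

# open two windows at once, each showing an image
plt.figure()
plt.imshow(image1)

plt.figure()
plt.imshow(image2)

plt.ioff()   # turn interactive mode off before show() so the windows do not vanish at once
plt.show()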
 
  （5）Drawing a rectangle with matplotlib
  import matplotlib.pyplot as plt

def show_image(win_name, image, rect):
    plt.figure()
    plt.title(win_name)
    plt.imshow(image)
    rect =plt.Rectangle((rect[0], rect[1]), rect[2], rect[3], linewidth=2, edgecolor='r', facecolor='none')
    plt.gca().add_patch(rect)
    plt.show() 
   
  3.3 One-hot encoding
import os
import numpy as np
from sklearn import preprocessing

def gen_data_labels(label_list, one_hot=True):
    '''
    label_list: input labels -> list
    '''

    # Convert the labels to integer codes
    # labels_set=list(set(label_list))
    # labels=[]
    # for label in  label_list:
    #     for k in range(len(labels_set)):
    #         if label==labels_set[k]:
    #             labels+=[k]
    #             break
    # labels = np.asarray(labels)

    # Alternatively, convert the labels to integer codes with LabelEncoder
    labelEncoder = preprocessing.LabelEncoder()
    labels = labelEncoder.fit_transform(label_list)
    labels_set = labelEncoder.classes_

    for i in range(len(labels_set)):
        print("labels:{}->{}".format(labels_set[i],i))

    # Optionally apply one-hot encoding
    if one_hot:
        labels = labels.reshape(len(labels), 1)
        # To fix the class list explicitly, pass categories=[range(len(labels_set))] instead of 'auto'
        onehot_encoder = preprocessing.OneHotEncoder(categories='auto')
        # toarray() gives a dense array without relying on the version-dependent sparse flag
        labels = onehot_encoder.fit_transform(labels).toarray()
    return labels
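  For readers who want to see what the encoding produces without installing scikit-learn, here is a minimal dependency-free sketch. The function name one_hot_encode and the sample labels are assumptions for illustration, not part of the original code. Classes are sorted so that the label-to-index mapping is reproducible.

```python
def one_hot_encode(label_list):
    """Return (classes, rows): sorted class names and one 0/1 row per label."""
    classes = sorted(set(label_list))                        # reproducible label -> index mapping
    index_of = {name: i for i, name in enumerate(classes)}
    rows = []
    for label in label_list:
        row = [0] * len(classes)
        row[index_of[label]] = 1                             # exactly one position is set per label
        rows.append(row)
    return classes, rows

classes, rows = one_hot_encode(["cat", "dog", "cat", "bird"])
print(classes)
print(rows)
```

  Each row has as many columns as there are distinct labels, and the column that holds the 1 is the integer code of that label.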
  3.4 Generating batches in a loop:
  Text file:
  1.jpg 1 11
2.jpg 2 12
3.jpg 3 13
4.jpg 4 14
5.jpg 5 15
6.jpg 6 16
7.jpg 7 17
8.jpg 8 18 
  # -*-coding: utf-8 -*-
"""
    @Project: LSTM
    @File   : create_batch_data.py
    @Author : panjq
    @E-mail : [email protected]
    @Date   : 2018-10-27 18:20:15
"""
import math
import random
import os
import glob
import numpy as np

def get_list_batch(inputs, batch_size=None, shuffle=False):
    '''
    Generate batches in an endless loop
    :param inputs: list data
    :param batch_size: batch size
    :param shuffle: whether to shuffle inputs (done in place)
    :return: one batch of data per iteration
    '''
    if shuffle:
        random.shuffle(inputs)
    while True:
        batch_inputs = inputs[0:batch_size]
        inputs=inputs[batch_size:] + inputs[:batch_size]  # rotate the list so the next batch starts where this one ended
        yield batch_inputs

def get_data_batch(inputs, batch_size=None, shuffle=False):
    '''
    Generate batches in an endless loop
    :param inputs: list data
    :param batch_size: batch size
    :param shuffle: whether to shuffle the order in which inputs are visited
    :return: one batch of data per iteration
    '''
    # rows,cols=inputs.shape
    rows=len(inputs)
    indices =list(range(rows))
    if shuffle:
        random.shuffle(indices )
    while True:
        batch_indices = indices[0:batch_size]
        indices= indices [batch_size:] + indices[:batch_size]  # rotate the list so the next batch starts where this one ended
        batch_data=find_list(batch_indices,inputs)
        # batch_data=find_array(batch_indices,inputs)
        yield batch_data

def find_list(indices,data):
    out=[]
    for i in indices:
        out=out+[data[i]]
    return out

def find_array(indices,data):
    rows,cols=data.shape
    out = np.zeros((len(indices), cols))
    for i,index in enumerate(indices):
        out[i]=data[index]
    return out

def load_file_list(text_dir):
    text_dir = os.path.join(text_dir, '*.txt')
    text_list = glob.glob(text_dir)
    return text_list

def get_next_batch(batch):
    return next(batch)

def load_image_labels(test_files):
    '''
    Load a txt file in which each line describes one image, with space-separated fields: image path, label 1, label 2, e.g. test_image/1.jpg 0 2
    :param test_files:
    :return:
    '''
    images_list=[]
    labels_list=[]
    with open(test_files) as f:
        lines = f.readlines()
        for line in lines:
            # rstrip removes trailing whitespace (\n, \r, \t and ' ': newline, carriage return, tab and space)
            content=line.rstrip().split(' ')
            name=content[0]
            labels=[]
            for value in content[1:]:
                labels.append(float(value))
            images_list.append(name)
            labels_list.append(labels)
    return images_list,labels_list

if __name__ == '__main__':
    filename='./training_data/train.txt'
    images_list, labels_list=load_image_labels(filename)
    # inputs = np.reshape(np.arange(8*3), (8,3))
    num_iters = 10  # iterate 10 times, with 3 items per batch
    batch = get_data_batch(images_list, batch_size=3, shuffle=False)
    for i in range(num_iters):
        print('**************************')
        # train_batch=batch.__next__()
        batch_images=get_next_batch(batch)
        print(batch_images)
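
  The wrap-around behaviour of the generators above can also be had from the standard library. The following is a minimal sketch, not the author's implementation; the name cyclic_batches and the five sample file names are assumptions for illustration.

```python
from itertools import cycle, islice

def cyclic_batches(items, batch_size):
    """Yield lists of batch_size items forever, wrapping around at the end."""
    stream = cycle(items)                        # endless iterator over the items
    while True:
        yield list(islice(stream, batch_size))   # take the next batch_size items

batches = cyclic_batches(["1.jpg", "2.jpg", "3.jpg", "4.jpg", "5.jpg"], batch_size=3)
first, second = next(batches), next(batches)
print(first)
print(second)
```

  With five items and a batch size of three, the second batch wraps around to the first item, which is what the list rotation in get_list_batch does as well.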


 
   
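  3.5 Counting elements and distinct values
import numpy as np
label_list=['horoscope', 'horoscope', 'finance', 'finance', 'finance', 'education', 'education', 'education', ]
set1 = set(label_list) # set1 ={'finance', 'education', 'horoscope'}; a set holds no duplicate elements
set2 = np.unique(label_list)# set2=['education' 'finance' 'horoscope']

# To get the number of occurrences of each element:
from collections import Counter
arr = [1, 2, 3, 3, 2, 1, 0, 2]
result = {}
for i in set(arr):
    result[i] = arr.count(i)
print(result)  # Counter(arr) holds the same counts


# A simpler way:
import pandas as pd
print(pd.Series(label_list).value_counts())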
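  The Counter class imported above covers both questions, how many distinct values there are and how often each occurs, without pandas. A minimal sketch with illustrative English labels:

```python
from collections import Counter

label_list = ["horoscope", "horoscope", "finance", "finance", "finance", "education"]
counts = Counter(label_list)

print(len(counts))             # number of distinct labels
print(counts["finance"])       # occurrences of one label
print(counts.most_common(1))   # the most frequent label together with its count
```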
   
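  3.6 Sorting a Python dict by key and by value
  A Python dict looks values up by key and keeps no sorted order of its own (since Python 3.7 it preserves insertion order). To get the items sorted by value, use sorted:
1. Sort by value, from largest to smallest:
dic = {'a':31, 'bc':5, 'c':3, 'asd':4, 'aa':74, 'd':0}
sorted_items = sorted(dic.items(), key=lambda d:d[1], reverse = True)
print(sorted_items)

   Output:
 [('aa', 74), ('a', 31), ('bc', 5), ('asd', 4), ('c', 3), ('d', 0)]

  Breaking the code down:
 dic.items() yields the (key, value) pairs.
 sorted then takes the key argument, which here selects the value, i.e. the second element d[1] of each pair, as the sort key. reverse = True reverses the result: the default order is ascending, so reversing it gives descending order.
2. Sort by key:
dic = {'a':31, 'bc':5, 'c':3, 'asd':4, 'aa':74, 'd':0}
sorted_items = sorted(dic.items(), key=lambda d:d[0])  # d[0] is the dict key
print(sorted_items)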
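  sorted returns a list of (key, value) tuples. When a dict is wanted back, the sorted pairs can be passed to dict(), which keeps their order on Python 3.7 and later. A minimal sketch using operator.itemgetter in place of the lambda (the variable names are illustrative):

```python
from operator import itemgetter

dic = {'a': 31, 'bc': 5, 'c': 3, 'asd': 4, 'aa': 74, 'd': 0}

# itemgetter(1) picks the value of each (key, value) pair, itemgetter(0) the key
by_value_desc = dict(sorted(dic.items(), key=itemgetter(1), reverse=True))
by_key = dict(sorted(dic.items(), key=itemgetter(0)))

print(list(by_value_desc))
print(list(by_key))
```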
   
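  3.7 Custom sorting with sorted
      The sort_cluster_container function below sorts samples by how often their label occurs, so that samples whose label is more frequent come first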
  # -*-coding: utf-8 -*-
"""
    @Project: IntelligentManufacture
    @File   : statistic_analysis.py
    @Author : panjq
    @E-mail : [email protected]
    @Date   : 2019-02-15 13:47:58
"""
import pandas as pd
import numpy as np
import functools

def print_cluster_info(title,labels_id, labels,columns = ['labels_id', 'labels']):
    index= np.arange(0, len(labels_id)) + 1
    content = np.array([labels_id, labels]).T
    df = pd.DataFrame(content, index=index, columns=columns)  # one row per sample, one column per entry in columns
    print('*************************************************')
    print("{}{}".format(title,df))

def print_cluster_container(title,cluster_container,columns = ['labels_id', 'labels']):
    '''
    :param cluster_container: type: list[tuple()]
    :param columns:
    :return:
    '''
    labels_id, labels=zip(*cluster_container)
    labels_id=list(labels_id)
    labels=list(labels)
    print_cluster_info(title,labels_id, labels, columns=columns)


def sort_cluster_container(cluster_container):
    '''
    Custom sort: order the samples by how often their label occurs, so that samples whose label is more frequent come first
    :param cluster_container: list of (labels_id, labels) tuples
    :return:
    '''
    # labels_id=list(cluster_container.keys())
    # labels=list(cluster_container.values())
    labels_id, labels=zip(*cluster_container)
    labels_id=list(labels_id)
    labels=list(labels)
    # count the samples of each label in value_counts_dict
    value_counts_dict = {}
    labels_set = set(labels)
    for i in labels_set:
        value_counts_dict[i] = labels.count(i)

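    def cmp(a, b):
        # descending order
        a_key, a_value = a
        b_key, b_value = b
        a_count = value_counts_dict[a_value]
        b_count = value_counts_dict[b_value]
        if a_count > b_count:  # the label with more samples goes first
            return -1
        elif (a_count == b_count) and (a_value > b_value):  # on equal counts, the larger value goes first
            return -1
        elif (a_count == b_count) and (a_value == b_value):  # equal items keep their relative order
            return 0
        else:
            return 1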
    out = sorted(cluster_container, key=functools.cmp_to_key(cmp))
    return out

if __name__=='__main__':
    labels_id=["image0",'image1',"image2","image3","image4","image5","image6"]
    labels=[0.0,1.0,2.0,1.0,1.0,2.0,3.0]
    # labels=['L0','L1','L2','L1','L1','L2',"L3"]
    cluster_container=list(zip(labels_id, labels))
    print("cluster_container:{}".format(cluster_container))
    print_cluster_container("Before sorting:\n",cluster_container, columns=['labels_id', 'labels'])
    out=sort_cluster_container(cluster_container)
    print_cluster_container("After sorting:\n",out, columns=['labels_id', 'labels'])
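
  The same ordering can be expressed with a key function instead of functools.cmp_to_key. This is an alternative sketch rather than the original method: the name sort_by_label_frequency is hypothetical, and negating the label only works when labels are numeric, as in the sample data.

```python
from collections import Counter

def sort_by_label_frequency(cluster_container):
    """Sort (id, label) pairs: most frequent label first, ties broken by larger label."""
    counts = Counter(label for _, label in cluster_container)
    # Negating both parts turns the default ascending sort into a descending one
    return sorted(cluster_container, key=lambda pair: (-counts[pair[1]], -pair[1]))

labels_id = ["image0", "image1", "image2", "image3", "image4", "image5", "image6"]
labels = [0.0, 1.0, 2.0, 1.0, 1.0, 2.0, 3.0]
out = sort_by_label_frequency(list(zip(labels_id, labels)))
print([name for name, _ in out])
```

  Label 1.0 occurs three times and comes first, followed by 2.0 (twice), then the two singletons 3.0 and 0.0 in descending value order.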

 
   
   
  3.8 Loading a yml configuration file
      Suppose the configuration file config.yml looks like this:
   
   ## Basic config
 batch_size: 2
 learning_rate: 0.001
 epoch: 1000 
   ## reset image size
 height: 128
 width: 128 
   
  It can be loaded in Python as follows:
  import yaml

class Dict2Obj:
    '''
    Convert a dict into an object whose attributes are the dict keys
    '''

    def __init__(self, bokeyuan):
        self.__dict__.update(bokeyuan)


def load_config_file(file):
    with open(file, 'r') as f:
        data_dict = yaml.load(f,Loader=yaml.FullLoader)
        data_dict = Dict2Obj(data_dict)
    return data_dict


if __name__=="__main__":
    config_file='../config/config.yml'
    para=load_config_file(config_file)
    print("batch_size:{}".format(para.batch_size))
    print("learning_rate:{}".format(para.learning_rate))
    print("epoch:{}".format(para.epoch)) 
   Output:
   
   batch_size:2
 learning_rate:0.001
 epoch:1000 
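
  The standard library already provides a class that does the job of Dict2Obj: types.SimpleNamespace. The sketch below uses a literal dict as a stand-in for the result of parsing config.yml, so it runs without PyYAML; the variable names are illustrative.

```python
from types import SimpleNamespace

# Stand-in for the dict that yaml.load returns for the config.yml shown above
config_dict = {"batch_size": 2, "learning_rate": 0.001, "epoch": 1000}

para = SimpleNamespace(**config_dict)   # each key becomes an attribute
print(para.batch_size, para.learning_rate, para.epoch)
print(vars(para))                       # vars() gives the dict back
```

  Like Dict2Obj, this only converts the top level: a nested mapping stays a plain dict.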
   
  3.9 Moving, copying and renaming files
#!/usr/bin/python
# -*- coding: utf-8 -*-
# test_copyfile.py

import os,shutil
def rename(image_list):
    for name in  image_list:
        cut_len=len('_cropped.jpg')
        newName = name[:-cut_len]+'.jpg'
        print(name)
        print(newName)
        os.rename(name, newName)

def mymovefile(srcfile,dstfile):
    if not os.path.isfile(srcfile):
        print("%s not exist!"%(srcfile))
    else:
        fpath,fname=os.path.split(dstfile)    # split the path into directory and file name
        if fpath and not os.path.exists(fpath):
            os.makedirs(fpath)                # create the directory
        shutil.move(srcfile,dstfile)          # move the file
        print("move %s -> %s"%( srcfile,dstfile))

def mycopyfile(srcfile,dstfile):
    if not os.path.isfile(srcfile):
        print("%s not exist!"%(srcfile))
    else:
        fpath,fname=os.path.split(dstfile)    # split the path into directory and file name
        if fpath and not os.path.exists(fpath):
            os.makedirs(fpath)                # create the directory
        shutil.copyfile(srcfile,dstfile)      # copy the file
        print("copy %s -> %s"%( srcfile,dstfile))

srcfile='/Users/xxx/git/project1/test.sh'
dstfile='/Users/xxx/tmp/tmp/1/test.sh'

mymovefile(srcfile,dstfile)
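  The same copy-with-missing-directories pattern can be written with pathlib. The sketch below is an illustration under assumptions: copy_file is a hypothetical helper, and it works inside a temporary directory so that it can be run safely anywhere.

```python
import shutil
import tempfile
from pathlib import Path

def copy_file(src, dst):
    """Copy src to dst, creating missing parent directories (hypothetical helper)."""
    src, dst = Path(src), Path(dst)
    if not src.is_file():
        raise FileNotFoundError(src)
    dst.parent.mkdir(parents=True, exist_ok=True)   # no error if the directory already exists
    shutil.copyfile(src, dst)
    return dst

with tempfile.TemporaryDirectory() as tmp:
    source = Path(tmp) / "test.sh"
    source.write_text("echo hello\n")
    target = copy_file(source, Path(tmp) / "tmp" / "1" / "test.sh")
    copied_text = target.read_text()

print(copied_text)
```

  Raising FileNotFoundError instead of printing a message lets the caller decide how to handle a missing source file.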
  3.10 Splitting data into batches of batch_size
import math

def get_batch(image_list, batch_size):
    sample_num = len(image_list)
    batch_num = math.ceil(sample_num / batch_size)

    for i in range(batch_num):
        start = i * batch_size
        end = min((i + 1) * batch_size, sample_num)
        batch_image = image_list[start:end]
        print("batch_image:{}".format(batch_image))


if __name__ == "__main__":
    image_list = []
    batch_size = 7
    for i in range(10):
        image_list.append(str(i) + ".jpg")
    get_batch(image_list, batch_size)
 
   
   batch_image:['0.jpg', '1.jpg', '2.jpg', '3.jpg', '4.jpg', '5.jpg', '6.jpg']
 batch_image:['7.jpg', '8.jpg', '9.jpg']  
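
  get_batch prints each batch. When the batches are to be consumed by other code, a generator is often more convenient. A minimal sketch (iter_batches is a hypothetical name) that steps through the list with range instead of computing the batch count with math.ceil:

```python
def iter_batches(items, batch_size):
    """Yield consecutive slices of at most batch_size items."""
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]   # the last slice may be shorter

image_list = [str(i) + ".jpg" for i in range(10)]
batches = list(iter_batches(image_list, batch_size=7))
print([len(b) for b in batches])
```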
   
   
  4. Common image preprocessing and file handling modules
  4.1 image_processing.py 
  # -*-coding: utf-8 -*-
"""
    @Project: IntelligentManufacture
    @File   : image_processing.py
    @Author : panjq
    @E-mail : [email protected]
    @Date   : 2019-02-14 15:34:50
"""

import os
import glob
import cv2
import numpy as np
import matplotlib.pyplot as plt
import copy

def show_batch_image(title,batch_imgs,index=0):
    image = batch_imgs[index, :]
    # image = image.numpy()  #
    image = np.array(image, dtype=np.float32)
    image=np.squeeze(image)
    if len(image.shape)==3:
        image = image.transpose(1, 2, 0)  # reorder the axes from [c,h,w] to [h,w,c]
    else:
        image = image.transpose(1,0)
    cv_show_image(title,image)
def show_image(title, rgb_image):
    '''
    Display an RGB image with matplotlib
    :param title: figure title
    :param rgb_image: image data
    :return:
    '''
    # plt.figure("show_image")
    # print(image.dtype)
    channel=len(rgb_image.shape)
    if channel==3:
        plt.imshow(rgb_image)
    else :
        plt.imshow(rgb_image, cmap='gray')
    plt.axis('on')  # use 'off' to hide the axes
    plt.title(title)  # figure title
    plt.show()

def cv_show_image(title, image, type='rgb'):
    '''
    Display an image with OpenCV
    :param title: window title
    :param image: input image
    :param type: channel order of the input, 'rgb' or 'bgr'
    :return:
    '''
    channels=image.shape[-1]
    if channels==3 and type=='rgb':
        image = cv2.cvtColor(image, cv2.COLOR_RGB2BGR)  # convert RGB to BGR, the order cv2.imshow expects
    cv2.imshow(title, image)
    cv2.waitKey(0)

def show_batch_image(title,batch_imgs,index=0):
    image = batch_imgs[index, :]
    # image = image.numpy()  #
    image = np.array(image, dtype=np.float32)
    if len(image.shape)==3:
        image = image.transpose(1, 2, 0)  # reorder the axes from [c,h,w] to [h,w,c]
    else:
        image = image.transpose(1,0)
    cv_show_image(title,image)

def get_prewhiten_image(x):
    mean = np.mean(x)
    std = np.std(x)
    std_adj = np.maximum(std, 1.0 / np.sqrt(x.size))
    y = np.multiply(np.subtract(x, mean), 1 / std_adj)
    return y

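def image_normalization(image,mean=None,std=None):
    # divide by 255.0 rather than 255 (integer division on uint8 data under Python 2)
    image = np.array(image, dtype=np.float32)
    image = image / 255.0
    if mean is not None:
        image=np.subtract(image, mean)
    if std is not None:
        image=np.multiply(image, 1 / std)  # keep the result; np.multiply does not modify image in place
    return image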
def get_prewhiten_images(images_list,normalization=False):
    out_images=[]
    for image in images_list:
        if normalization:
            image=image_normalization(image)
        image=get_prewhiten_image(image)
        out_images.append(image)
    return out_images

def read_image(filename, resize_height=None, resize_width=None, normalization=False,colorSpace='RGB'):
    '''
    Read an image file. By default the result is uint8 in [0, 255].
    :param filename: path of the image file
    :param resize_height: target height, or None
    :param resize_width: target width, or None
    :param normalization: whether to scale the pixel values to [0.0, 1.0]
    :param colorSpace: output channel order, 'RGB' or 'BGR'
    :return: the image data, or None if the file cannot be read
    '''

    bgr_image = cv2.imread(filename)
    # bgr_image = cv2.imread(filename,cv2.IMREAD_IGNORE_ORIENTATION|cv2.IMREAD_COLOR)
    if bgr_image is None:
        print("Warning: cannot read image: {}".format(filename))
        return None
    if len(bgr_image.shape) == 2:  # convert a grayscale image to three channels
        print("Warning:gray image", filename)
        bgr_image = cv2.cvtColor(bgr_image, cv2.COLOR_GRAY2BGR)

    if colorSpace=='RGB':
        image = cv2.cvtColor(bgr_image, cv2.COLOR_BGR2RGB)  # convert BGR to RGB
    elif colorSpace=="BGR":
        image=bgr_image
    else:
        raise ValueError("colorSpace must be 'RGB' or 'BGR'")
    # show_image(filename,image)
    # image=Image.open(filename)
    image = resize_image(image,resize_height,resize_width)
    image = np.asanyarray(image)
    if normalization:
        image=image_normalization(image)
    # show_image("src resize image",image)
    return image
def read_image_gbk(filename, resize_height=None, resize_width=None, normalization=False,colorSpace='RGB'):
    '''
    Works around cv2.imread failing on paths that contain Chinese characters.
    Reads an image file; by default the result is uint8 in [0, 255].
    :param filename: path of the image file
    :param resize_height: target height, or None
    :param resize_width: target width, or None
    :param normalization: whether to scale the pixel values to [0.0, 1.0]
    :param colorSpace: output channel order, 'RGB' or 'BGR'
    :return: the image data, or None if the file cannot be decoded
    '''
    with open(filename, 'rb') as f:
        data = f.read()
        data = np.asarray(bytearray(data), dtype="uint8")
        bgr_image = cv2.imdecode(data, cv2.IMREAD_COLOR)
    # Alternatively:
    # bgr_image=cv2.imdecode(np.fromfile(filename,dtype=np.uint8),cv2.IMREAD_COLOR)
    if bgr_image is None:
        print("Warning: cannot decode image: {}".format(filename))
        return None
    if len(bgr_image.shape) == 2:  # convert a grayscale image to three channels
        print("Warning:gray image", filename)
        bgr_image = cv2.cvtColor(bgr_image, cv2.COLOR_GRAY2BGR)
    if colorSpace=='RGB':
        image = cv2.cvtColor(bgr_image, cv2.COLOR_BGR2RGB)  # convert BGR to RGB
    elif colorSpace=="BGR":
        image=bgr_image
    else:
        raise ValueError("colorSpace must be 'RGB' or 'BGR'")
    # show_image(filename,image)
    # image=Image.open(filename)
    image = resize_image(image,resize_height,resize_width)
    image = np.asanyarray(image)
    if normalization:
        image=image_normalization(image)
    # show_image("src resize image",image)
    return image




def fast_read_image_roi(filename, orig_rect, ImreadModes=cv2.IMREAD_COLOR, normalization=False,colorSpace='RGB'):
    '''
    Fast way to read a region of an image
    :param filename: image path
    :param orig_rect: region of interest (rect) in the original image
    :param ImreadModes: IMREAD_UNCHANGED
                        IMREAD_GRAYSCALE
                        IMREAD_COLOR
                        IMREAD_ANYDEPTH
                        IMREAD_ANYCOLOR
                        IMREAD_LOAD_GDAL
                        IMREAD_REDUCED_GRAYSCALE_2
                        IMREAD_REDUCED_COLOR_2
                        IMREAD_REDUCED_GRAYSCALE_4
                        IMREAD_REDUCED_COLOR_4
                        IMREAD_REDUCED_GRAYSCALE_8
                        IMREAD_REDUCED_COLOR_8
                        IMREAD_IGNORE_ORIENTATION
    :param normalization: whether to normalize
    :param colorSpace: output channel order, 'RGB' or 'BGR'
    :return: the region of interest (ROI)
    '''
    # with an IMREAD_REDUCED mode the rect has to be scaled as well
    scale=1
    if ImreadModes == cv2.IMREAD_REDUCED_GRAYSCALE_2 or ImreadModes == cv2.IMREAD_REDUCED_COLOR_2:
        scale=1/2
    elif ImreadModes == cv2.IMREAD_REDUCED_GRAYSCALE_4 or ImreadModes == cv2.IMREAD_REDUCED_COLOR_4:
        scale=1/4
    elif ImreadModes == cv2.IMREAD_REDUCED_GRAYSCALE_8 or ImreadModes == cv2.IMREAD_REDUCED_COLOR_8:
        scale=1/8
    rect = np.array(orig_rect)*scale
    rect = rect.astype(int).tolist()
    bgr_image = cv2.imread(filename,flags=ImreadModes)

    if bgr_image is None:
        print("Warning: cannot read image: {}".format(filename))
        return None
    if len(bgr_image.shape) == 2:  # convert a grayscale image to three channels
        print("Warning:gray image", filename)
        bgr_image = cv2.cvtColor(bgr_image, cv2.COLOR_GRAY2BGR)
    if colorSpace == 'RGB':
        image = cv2.cvtColor(bgr_image, cv2.COLOR_BGR2RGB)  # convert BGR to RGB
    elif colorSpace == "BGR":
        image = bgr_image
    else:
        raise ValueError("colorSpace must be 'RGB' or 'BGR'")
    image = np.asanyarray(image)
    if normalization:
        image=image_normalization(image)
    roi_image=get_rect_image(image , rect)
    # show_image_rect("src resize image",rgb_image,rect)
    # cv_show_image("reROI",roi_image)
    return roi_image

def resize_image(image,resize_height, resize_width):
    '''
    :param image:
    :param resize_height:
    :param resize_width:
    :return:
    '''
    image_shape=np.shape(image)
    height=image_shape[0]
    width=image_shape[1]
    if (resize_height is None) and (resize_width is None):  # wrong form: resize_height and resize_width is None
        return image
    if resize_height is None:
        resize_height=int(height*resize_width/width)
    elif resize_width is None:
        resize_width=int(width*resize_height/height)
    image = cv2.resize(image, dsize=(resize_width, resize_height))
    return image
def scale_image(image,scale):
    '''
    :param image:
    :param scale: (scale_w,scale_h)
    :return:
    '''
    image = cv2.resize(image,dsize=None, fx=scale[0],fy=scale[1])
    return image

def get_rect_image(image,rect):
    '''
    :param image:
    :param rect: [x,y,w,h]
    :return:
    '''
    shape=image.shape#h,w
    height=shape[0]
    width=shape[1]
    image_rect=(0,0,width,height)
    rect=get_rect_intersection(rect, image_rect)
    x, y, w, h=rect
    cut_img = image[y:(y+ h),x:(x+w)]
    return cut_img



def get_rects_image(image,rects_list,resize_height=None, resize_width=None):
    rect_images = []
    for rect in rects_list:
        roi=get_rect_image(image, rect)
        roi=resize_image(roi, resize_height, resize_width)
        rect_images.append(roi)
    return rect_images

def get_bboxes_image(image,bboxes_list,resize_height=None, resize_width=None):
    rects_list=bboxes2rects(bboxes_list)
    rect_images = get_rects_image(image,rects_list,resize_height, resize_width)
    return rect_images

def bboxes2rects(bboxes_list):
    '''
    Convert bboxes=[x1,y1,x2,y2] to rect=[x1,y1,w,h]
    :param bboxes_list:
    :return:
    '''
    rects_list=[]
    for bbox in bboxes_list:
        x1, y1, x2, y2=bbox
        rect=[ x1, y1,(x2-x1),(y2-y1)]
        rects_list.append(rect)
    return rects_list

def rects2bboxes(rects_list):
    '''
    Convert rect=[x1,y1,w,h] to bboxes=[x1,y1,x2,y2]
    :param rects_list:
    :return:
    '''
    bboxes_list=[]
    for rect in rects_list:
        x1, y1, w, h = rect
        x2=x1+w
        y2=y1+h
        b=(x1,y1,x2,y2)
        bboxes_list.append(b)
    return bboxes_list

def scale_rect(orig_rect,orig_shape,dest_shape):
    '''
    When an image is resized, the corresponding rectangle has to be scaled too
    :param orig_rect: rect=[x,y,w,h] in the original image
    :param orig_shape: shape=[h,w] of the original image
    :param dest_shape: shape=[h,w] of the resized image
    :return: the scaled rectangle
    '''
    new_x=int(orig_rect[0]*dest_shape[1]/orig_shape[1])
    new_y=int(orig_rect[1]*dest_shape[0]/orig_shape[0])
    new_w=int(orig_rect[2]*dest_shape[1]/orig_shape[1])
    new_h=int(orig_rect[3]*dest_shape[0]/orig_shape[0])
    dest_rect=[new_x,new_y,new_w,new_h]
    return dest_rect
def get_rect_intersection(rec1,rec2):
    '''
    Compute the intersection of two rects, returned as (x, y, w, h)
    :param rec1:
    :param rec2:
    :return:
    '''
    cx1, cy1, cx2, cy2 = rects2bboxes([rec1])[0]
    gx1, gy1, gx2, gy2 = rects2bboxes([rec2])[0]
    x1 = max(cx1, gx1)
    y1 = max(cy1, gy1)
    x2 = min(cx2, gx2)
    y2 = min(cy2, gy2)
    w = max(0, x2 - x1)
    h = max(0, y2 - y1)
    return (x1,y1,w,h)

def show_image_bboxes_text(title, rgb_image, boxes, boxes_name):
    '''
    :param boxes_name:
    :param bgr_image: bgr image
    :param boxes: [[x1,y1,x2,y2],[x1,y1,x2,y2]]
    :return:
    '''
    bgr_image = cv2.cvtColor(rgb_image, cv2.COLOR_RGB2BGR)
    for name ,box in zip(boxes_name,boxes):
        box=[int(b) for b in box]
        cv2.rectangle(bgr_image, (box[0],box[1]),(box[2],box[3]), (0, 255, 0), 2, 8, 0)
        cv2.putText(bgr_image,name, (box[0],box[1]), cv2.FONT_HERSHEY_COMPLEX_SMALL, 0.8, (0, 0, 255), thickness=2)
    # cv2.imshow(title, bgr_image)
    # cv2.waitKey(0)
    rgb_image = cv2.cvtColor(bgr_image, cv2.COLOR_BGR2RGB)
    show_image(title, rgb_image)

def show_image_rects_text(title, rgb_image, rects_list, boxes_name):
    '''
    :param boxes_name:
    :param bgr_image: bgr image
    :param boxes: [[x1,y1,w,h],[x1,y1,w,h]]
    :return:
    '''
    bbox_list = rects2bboxes(rects_list)
    show_image_bboxes_text(title, rgb_image, bbox_list, boxes_name)

def show_image_rects(win_name,image,rect_list):
    '''
    :param win_name:
    :param image:
    :param rect_list:[[ x, y, w, h],[ x, y, w, h]]
    :return:
    '''
    for rect in rect_list:
        x, y, w, h=rect
        point1=(int(x),int(y))
        point2=(int(x+w),int(y+h))
        cv2.rectangle(image, point1, point2, (0, 0, 255), thickness=2)
    cv_show_image(win_name, image)

def show_landmark_boxex(win_name,img,landmarks_list,boxes):
    '''
    Display landmarks and boxes
    :param win_name:
    :param image:
    :param landmarks_list: [[x1, y1], [x2, y2]]
    :param boxes:     [[ x1, y1, x2, y2],[ x1, y1, x2, y2]]
    :return:
    '''
    image=copy.copy(img)
    point_size = 1
    point_color = (0, 0, 255)  # BGR
    thickness = 4  # outline thickness in pixels
    for landmarks in landmarks_list:
        for landmark in landmarks:
            # coordinates of the point to draw
            point = (landmark[0],landmark[1])
            cv2.circle(image, point, point_size, point_color, thickness)
    show_image_boxes(win_name, image, boxes)

def show_image_boxes(win_name,image,boxes_list):
    '''
    :param win_name:
    :param image:
    :param boxes_list:[[ x1, y1, x2, y2],[ x1, y1, x2, y2]]
    :return:
    '''
    for box in boxes_list:
        x1, y1, x2, y2=box
        point1=(int(x1),int(y1))
        point2=(int(x2),int(y2))
        cv2.rectangle(image, point1, point2, (0, 0, 255), thickness=2)
    show_image(win_name, image)

def rgb_to_gray(image):
    image = cv2.cvtColor(image, cv2.COLOR_RGB2GRAY)
    return image

def save_image(image_path, rgb_image,toUINT8=True):
    if toUINT8:
        rgb_image = np.asanyarray(rgb_image * 255, dtype=np.uint8)
    if len(rgb_image.shape) == 2:  # convert a grayscale image to three channels
        bgr_image = cv2.cvtColor(rgb_image, cv2.COLOR_GRAY2BGR)
    else:
        bgr_image = cv2.cvtColor(rgb_image, cv2.COLOR_RGB2BGR)
    cv2.imwrite(image_path, bgr_image)

def combime_save_image(orig_image, dest_image, out_dir,name,prefix):
    '''
    Naming convention: out_dir/name_prefix.jpg
    :param orig_image:
    :param dest_image:
    :param image_path:
    :param out_dir:
    :param prefix:
    :return:
    '''
    dest_path = os.path.join(out_dir, name + "_"+prefix+".jpg")
    save_image(dest_path, dest_image)

    dest_image = np.hstack((orig_image, dest_image))
    save_image(os.path.join(out_dir, "{}_src_{}.jpg".format(name,prefix)), dest_image)

if __name__=="__main__":
    # image_path="../dataset/test_images/lena1.jpg"
    # image_path="E:/git/dataset/tgs-salt-identification-challenge/train/my_masks/4.png"
    image_path = 'E:/Face/dataset/bzl/test3/test_dataset/陈思远_716/8205_0.936223.jpg'

    # target_rect=main.select_user_roi(target_path)#rectangle=[x,y,w,h]
    # orig_rect = [50, 50, 100000, 10000]

    image = read_image_gbk(image_path, resize_height=None, resize_width=None)
    # orig_image=get_rect_image(image,orig_rect)

    # show_image_rects("image",image,[orig_rect])
    show_image("orig_image",image)
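
  The clipping that get_rect_image performs through rects2bboxes and get_rect_intersection can be checked without OpenCV. The following is a standalone sketch with hypothetical helper names (rect_to_bbox, clip_rect) and an assumed 480x640 image; the first rect is the oversized orig_rect from the commented-out line in the test code above.

```python
def rect_to_bbox(rect):
    """Convert rect=[x, y, w, h] to bbox=(x1, y1, x2, y2)."""
    x, y, w, h = rect
    return x, y, x + w, y + h

def clip_rect(rect, image_shape):
    """Clip rect=[x, y, w, h] to an image whose shape starts with (height, width)."""
    height, width = image_shape[:2]
    x1, y1, x2, y2 = rect_to_bbox(rect)
    x1, y1 = max(x1, 0), max(y1, 0)              # intersection with the image rect (0, 0, width, height)
    x2, y2 = min(x2, width), min(y2, height)
    return x1, y1, max(0, x2 - x1), max(0, y2 - y1)

print(clip_rect([50, 50, 100000, 10000], (480, 640)))   # oversized rect is cut back to the image
print(clip_rect([700, 10, 20, 20], (480, 640)))         # rect right of the image gets zero width
```

  A zero width or height means the rect lies outside the image, in which case the slice in get_rect_image yields an empty array.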


 
   
  4.2 file_processing.py 
  # -*-coding: utf-8 -*-
"""
    @Project: IntelligentManufacture
    @File   : file_processing.py
    @Author : panjq
    @E-mail : [email protected]
    @Date   : 2019-02-14 15:08:19
"""
import glob
import os
import os,shutil
import numpy as np

import pandas as pd

def write_data(filename, content_list,mode='w'):
    """Save data to a txt file
    :param filename: file name
    :param content_list: data to save, type->list
    :param mode: file mode: 'w' or 'a'
    :return: void
    """
    with open(filename, mode=mode, encoding='utf-8') as f:
        for line_list in content_list:
            # join the list items into a single string
            line=" ".join('%s' % id for id in line_list)
            f.write(line+"\n")

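def read_data(filename,split=" ",convertNum=True):
    """
    Read data from a txt file
    :param filename: file name
    :param split   : separator
    :param convertNum : whether to convert numeric strings in each line to int/float
    :return: list of the txt data
    Python has three functions that strip characters (whitespace by default) from the ends of a string:
    strip:  removes leading and trailing characters (whitespace such as \n, \r, \t and ' ': newline, carriage return, tab and space)
    lstrip: removes leading characters (the same whitespace characters by default)
    rstrip: removes trailing characters (the same whitespace characters by default)
    Note: these functions only remove characters at the two ends; anything in the middle is left alone.
    """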
    with open(filename, mode="r",encoding='utf-8') as f:
        content_list = f.readlines()
        if split is None:
            content_list = [content.rstrip() for content in content_list]
            return content_list
        else:
            content_list = [content.rstrip().split(split) for content in content_list]
        if convertNum:
            for i,line in enumerate(content_list):
                line_data=[]
                for l in line:
                    if is_int(l):  # integer check (str.isdigit() only tests whether a string consists solely of digits)
                        line_data.append(int(l))
                    elif is_float(l):  # check for a decimal number
                        line_data.append(float(l))
                    else:
                        line_data.append(l)
                content_list[i]=line_data
    return content_list


def is_int(str):
    # check whether the string is an integer
    try:
        x = int(str)
        return isinstance(x, int)
    except ValueError:
        return False


def is_float(str):
    # check whether the string is a number (integer or decimal)
    try:
        x = float(str)
        return isinstance(x, float)
    except ValueError:
        return False


def list2str(content_list):
    content_str_list=[]
    for line_list in content_list:
        line_str = " ".join('%s' % id for id in line_list)
        content_str_list.append(line_str)
    return content_str_list

def get_images_list(image_dir,postfix=['*.jpg'],basename=False):
    '''
    Get a list of image files
    :param image_dir: image directory
    :param postfix: glob patterns for the extensions; several are allowed, e.g. ['*.jpg','*.png']
    :param basename: return file names only (True) or full paths (False)
    :return:
    '''
    images_list=[]
    for format in postfix:
        image_format=os.path.join(image_dir,format)
        image_list=glob.glob(image_format)
        if not image_list==[]:
            images_list+=image_list
    images_list=sorted(images_list)
    if basename:
        images_list=get_basename(images_list)
    return images_list

def get_basename(file_list):
    dest_list=[]
    for file_path in file_list:
        basename=os.path.basename(file_path)
        dest_list.append(basename)
    return dest_list

def copyfile(srcfile,dstfile):
    if not os.path.isfile(srcfile):
        print("%s not exist!"%(srcfile))
    else:
        fpath,fname=os.path.split(dstfile)    # split the path into directory and file name
        if fpath and not os.path.exists(fpath):
            os.makedirs(fpath)                # create the directory
        shutil.copyfile(srcfile,dstfile)      # copy the file
        # print("copy %s -> %s"%( srcfile,dstfile))


def merge_list(data1, data2):
    '''
    Merge two lists element by element
    :param data1:
    :param data2:
    :return: the merged list, or None if the lengths differ
    '''
    if not len(data1) == len(data2):
        return
    all_data = []
    for d1, d2 in zip(data1, data2):
        all_data.append(d1 + d2)
    return all_data


def split_list(data, split_index=1):
    '''
    Split every item of data into two parts
    :param data: list
    :param split_index: position at which to split
    :return:
    '''
    data1 = []
    data2 = []
    for d in data:
        d1 = d[0:split_index]
        d2 = d[split_index:]
        data1.append(d1)
        data2.append(d2)
    return data1, data2


def getFilePathList(file_dir):
    '''
    Get the paths of all files under file_dir, including files in subdirectories
    :param file_dir:
    :return:
    '''
    filePath_list = []
    for walk in os.walk(file_dir):
        part_filePath_list = [os.path.join(walk[0], file) for file in walk[2]]
        filePath_list.extend(part_filePath_list)
    return filePath_list


def get_files_list(file_dir, postfix='ALL'):
    '''
    Get the list of all files under file_dir with the extension postfix, including subdirectories
    :param file_dir:
    :param postfix: e.g. jpg or png; 'ALL' returns every file
    :return:
    '''
    postfix = postfix.split('.')[-1]
    file_list = []
    filePath_list = getFilePathList(file_dir)
    if postfix == 'ALL':
        file_list = filePath_list
    else:
        for file in filePath_list:
            basename = os.path.basename(file)  # 获得路径下的文件名
            postfix_name = basename.split('.')[-1]
            if postfix_name == postfix:
                file_list.append(file)
    file_list.sort()
    return file_list


def gen_files_labels(files_dir,postfix='ALL'):
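    '''
    Get the paths of all files under files_dir together with their labels, where the label is the name of the parent folder.
    Under files_dir, files of the same class are placed in one folder, and the folder name is their label.
    :param files_dir:
    :param postfix: file extension
    :return: filePath_list, the paths of all files; label_list, the corresponding labels
    '''
    # filePath_list = getFilePathList(files_dir)
    filePath_list=get_files_list(files_dir, postfix=postfix)
    print("files nums:{}".format(len(filePath_list)))
    # collect the label of every sample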
    label_list = []
    for filePath in filePath_list:
        label = filePath.split(os.sep)[-2]
        label_list.append(label)

    labels_set = list(set(label_list))
    print("labels:{}".format(labels_set))

    # count the samples per label
    # print(pd.value_counts(label_list))
    return filePath_list, label_list

def decode_label(label_list,name_table):
    '''
    Decode labels into names using name_table
    :param label_list:
    :param name_table:
    :return:
    '''
    name_list=[]
    for label in label_list:
        name = name_table[label]
        name_list.append(name)
    return name_list

def encode_label(name_list,name_table):
    '''
    Encode names into labels using name_table
    :param name_list:
    :param name_table:
    :return:
    '''
    label_list=[]
    for name in name_list:
        index = name_table.index(name)
        label_list.append(index)
    return label_list

if __name__=='__main__':
    filename = 'test.txt'
    w_data = [['1.jpg', 'dog', 200, 300, 1.0], ['2.jpg', 'dog', 20, 30, -2]]
    print("w_data=", w_data)
    write_data(filename,w_data, mode='w')
    r_data = read_data(filename)
    print('r_data=', r_data)
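
The label helpers above can be tried end to end with a small sketch. It assumes a throwaway directory tree with one folder per class; `labels_from_paths` is a hypothetical stand-in for the parent-folder lookup done in `gen_files_labels`, and the class names and file names are made up for illustration.

```python
import os
import tempfile


def labels_from_paths(file_paths):
    """Hypothetical helper: use each file's parent folder name as its label."""
    return [os.path.basename(os.path.dirname(p)) for p in file_paths]


def encode_label(name_list, name_table):
    """Map each class name to its index in name_table."""
    return [name_table.index(name) for name in name_list]


def decode_label(label_list, name_table):
    """Map each integer label back to its class name."""
    return [name_table[label] for label in label_list]


# Build a temporary tree: <root>/cat/1.jpg, <root>/dog/2.jpg, <root>/dog/3.jpg
with tempfile.TemporaryDirectory() as root:
    paths = []
    for cls, fname in [("cat", "1.jpg"), ("dog", "2.jpg"), ("dog", "3.jpg")]:
        os.makedirs(os.path.join(root, cls), exist_ok=True)
        path = os.path.join(root, cls, fname)
        open(path, "w").close()  # create an empty placeholder file
        paths.append(path)
    names = labels_from_paths(sorted(paths))

name_table = sorted(set(names))         # fixed class order: ['cat', 'dog']
ids = encode_label(names, name_table)   # names -> integer labels
restored = decode_label(ids, name_table)
print(names, ids, restored)
```

Sorting the set of names before building `name_table` keeps the label ids stable between runs, which a bare `list(set(...))` does not guarantee.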
 
  4.3 Debug文件 
  import datetime
import logging
import sys
import time

'''
    url:https://cuiqingcai.com/6080.html
    Log levels: debug, info, warning, error, and critical
'''
# logging.basicConfig(level=logging.DEBUG,
#                     filename='output.log',
#                     datefmt='%Y/%m/%d %H:%M:%S',
#                     format='%(asctime)s - %(name)s - %(levelname)s - %(lineno)d - %(module)s - %(message)s')
logging.basicConfig(level=logging.INFO, format='%(asctime)s - %(filename)s - %(funcName)s - %(levelname)s - %(message)s')
logger = logging.getLogger(__name__)

def RUN_TIME(deta_time):
    '''
    Return the elapsed time in milliseconds.
    total_seconds() covers the whole interval (including days), unlike the .seconds and .microseconds attributes.
    :param deta_time: datetime.timedelta
    :return: elapsed time in ms (float)
    '''
    time_ = deta_time.total_seconds() * 1000.0
    return time_

def TIME():
    return datetime.datetime.now()


if __name__=='__main__':
    T0 = TIME()
    # do something
    time.sleep(5)
    T1 = TIME()
    print("run time:{}ms".format(RUN_TIME(T1 - T0)))

    logger.info('This is a log info')
    logger.debug('Debugging')
    logger.warning('Warning exists')
    logger.error('Finish')
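
One detail worth keeping in mind when converting a `datetime.timedelta` to milliseconds: the `.seconds` and `.microseconds` attributes hold only the part of the interval below one day, while `total_seconds()` covers the whole interval. A minimal sketch, using a made-up interval of just over one day:

```python
import datetime

# A hypothetical interval longer than one day, chosen to expose the difference.
delta = datetime.timedelta(days=1, seconds=2, microseconds=500000)

# .seconds and .microseconds only hold the sub-day remainder of the interval.
partial_ms = delta.seconds * 1000 + delta.microseconds / 1000.0

# total_seconds() includes the days component as well.
full_ms = delta.total_seconds() * 1000.0

print(partial_ms)  # 2500.0
print(full_ms)     # 86402500.0
```

For short measurements the two agree; they diverge only once the interval reaches a full day.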
 
  4.5 NMS-GPU和CPU 
   
   https://www.cnblogs.com/king-lps/p/9031568.html  
   https://github.com/SirLPS/NMS
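
For orientation, here is a minimal CPU-only sketch of greedy non-maximum suppression. It is not the code from the pages linked above. It is written in plain Python so that it runs without dependencies, and it assumes boxes are given as `(x1, y1, x2, y2)`. The names `iou` and `nms_cpu` and the 0.5 threshold are illustrative choices.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def nms_cpu(boxes, scores, iou_threshold=0.5):
    """Greedy NMS: visit boxes by descending score and keep a box only if it
    does not overlap any already-kept box by more than iou_threshold."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        if all(iou(boxes[i], boxes[k]) <= iou_threshold for k in keep):
            keep.append(i)
    return keep


boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (20, 20, 30, 30)]
scores = [0.9, 0.8, 0.7]
kept = nms_cpu(boxes, scores)
print(kept)  # [0, 2]: box 1 overlaps box 0 with IoU 81/119 and is suppressed
```

This version costs O(N^2) IoU evaluations in the worst case; vectorised NumPy or GPU implementations follow the same greedy rule but compute the overlaps in batches.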

	name	sex	city	age	job	salary
1	tom1	f	shenzhen1	22	student	1k
2	tom2	f	shenzhen2	22	teacher	2k
3	tom3	m	shenzhen3	21	teacher	2k

	name	sex	city	age	job	salary
4	tom4	m	shenzhen4	24	engineer	3k
1	tom1	f	shenzhen1	22	student	1k
2	tom2	f	shenzhen2	22	teacher	2k
3	tom3	m	shenzhen3	21	teacher	2k
