Contents
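Tips for Using Common Python Modules
1. Python configuration notes
(1) Python file header comment template
(2) Function docstrings
(3) Converting .ipynb files to .py files
(4) Measuring run time in Python
(5) Speeding up installs with mirrors
(6) Code analysis tool: installing Pylint and configuring it in PyCharm
(7) Adding environment paths and module search paths in Python
(8) Common conda commands
2. Common modules
2.1 The numpy module
(1) Concatenating and splitting matrices; splitting data into even and odd rows
(2) Sorting by column
(3) Extracting rows/columns that meet a condition
(4) Finding vectors that match a condition
(5) Shuffling
2.2 The pickle module
2.3 random.shuffle with a fixed seed
2.4 zip() and zip(*)
2.5 Fast iteration with map and for
2.6 The glob module
2.7 The os module
2.8 Detecting empty, missing, or too-small image files
2.9 Saving multidimensional arrays
2.10 Reading txt data
2.11 The pandas module
(1) Concatenating data from files
(2) DataFrame
Adding, deleting, modifying, and querying Pandas DataFrame data
2.12 The csv module
2.13 The logging module
3. Data preprocessing
3.1 Splitting data (images) into blocks
3.2 Reading and displaying images
(1) Reading images with matplotlib.image, PIL.Image, and cv2
(2) Converting a numpy array to a PIL image
(3) Converting between PIL.Image and OpenCV image formats in Python
(4) matplotlib display blocking issue
(5) Drawing rectangles with matplotlib
3.3 One-hot encoding
3.4 Generating batches cyclically
3.5 Counting elements and distinct values
3.6 Sorting a Python dict by key and by value
3.7 Custom sorting with sorted
3.8 Loading a yml config file
3.9 Moving, copying, and renaming files
3.10 Producing data in batch_size chunks
4. Common image preprocessing and file processing packages
4.1 image_processing.py
4.2 file_processing.py
4.3 Debug file
4.5 NMS on GPU and CPU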
In PyCharm, open File->Settings->Editor->File and Code Templates->Python Script and set the template to:
# -*-coding: utf-8 -*-
"""
@Project: ${PROJECT_NAME}
@File : ${NAME}.py
@Author : panjq
@E-mail : [email protected]
@Date : ${YEAR}-${MONTH}-${DAY} ${HOUR}:${MINUTE}:${SECOND}
"""
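def my_fun(para1, para2):
    '''
    Brief description of what the function does
    :param para1: description of the input parameter, type
    :param para2: description of the input parameter, type
    :return: what is returned, type
    '''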
jupyter nbconvert --to script demo.ipynb
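import datetime
def RUN_TIME(deta_time):
    '''
    Return the elapsed time in milliseconds: deta_time.seconds gives whole seconds (1 s = 1000 ms)
    and deta_time.microseconds gives the remaining microseconds (1 µs = 1/1000 ms); days are ignored
    :param deta_time: a datetime.timedelta, e.g. T1 - T0
    :return: elapsed time in ms, float
    '''
    time_ = deta_time.seconds * 1000 + deta_time.microseconds / 1000.0
    return time_
T0 = datetime.datetime.now()
# do something
T1 = datetime.datetime.now()
print("run time:{}".format(RUN_TIME(T1 - T0)))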
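For comparison, here is a minimal sketch using time.perf_counter(), a monotonic clock intended for measuring short durations, so it does not jump when the wall clock is adjusted. elapsed_ms() and timedelta_ms() are hypothetical helper names; timedelta_ms() uses timedelta.total_seconds(), which, unlike .seconds in RUN_TIME, also counts the days component.

```python
import time
from datetime import timedelta


def elapsed_ms(start: float, end: float) -> float:
    """Convert two time.perf_counter() readings into milliseconds."""
    return (end - start) * 1000.0


def timedelta_ms(delta: timedelta) -> float:
    """Milliseconds in a timedelta, including its days component."""
    return delta.total_seconds() * 1000.0


t0 = time.perf_counter()
total = sum(range(100_000))  # placeholder workload
t1 = time.perf_counter()
print("run time: {:.3f} ms".format(elapsed_ms(t0, t1)))
```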
TUNA also provides a mirror of the Anaconda repository. Run the following commands:
conda config --add channels https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/free/
conda config --set show_channel_urls yes
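After setting the mirror above, downloads become much faster. Note that conda config writes these settings to the user's .condarc file, so they persist across sessions rather than applying only to the current command window.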
On Windows, create a pip folder under the user directory, create a pip.ini file in it, and paste in the Aliyun mirror:
[global]
trusted-host=mirrors.aliyun.com
index-url = http://mirrors.aliyun.com/pypi/simple/
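On Linux, edit ~/.pip/pip.conf (create the folder and the file if they do not exist; the folder name starts with "." because it is a hidden folder).
Its content is as follows: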
[global]
index-url = https://pypi.tuna.tsinghua.edu.cn/simple
[install]
trusted-host = pypi.tuna.tsinghua.edu.cn
On Windows, create a pip directory directly in the user directory, e.g. C:\Users\xx\pip, and create a new file pip.ini in it. The content is the same as above.
Temporary method: add "-i https://mirrors.aliyun.com/pypi/simple/" to the pip command, for example:
pip install opencv-python -i https://mirrors.aliyun.com/pypi/simple/
https://www.cnblogs.com/yaoliping/archive/2018/10/10/9767834.html
Adding a directory to the PATH environment variable:
# Add the graphviz directory to PATH
import os
os.environ["PATH"] += os.pathsep + 'D:/ProgramData/Anaconda3/envs/pytorch-py36/Library/bin/graphviz/'
Module search path:
import sys
import os
# Print the list of paths Python searches for modules
print(sys.path)
# Print the directory of the current file
print("os.path.dirname(__file__):", os.path.dirname(__file__))
print("os.getcwd(): ", os.getcwd())  # cwd: current working directory
'''Ways to add a path:
sys.path.append('path/to/your/module_dir')
sys.path.insert(0, 'path/to/your/module_dir')  # searched before all other paths
'''
# First add the directory that contains image_processing
sys.path.append("F:/project/python-learning-notes/utils")
# sys.path.append(os.getcwd())
# Then import the module
import image_processing
# Add graphviz to PATH as above
os.environ["PATH"] += os.pathsep + 'D:/ProgramData/Anaconda3/envs/pytorch-py36/Library/bin/graphviz/'
image_path = "F:/project/python-learning-notes/dataset/test_image/1.jpg"
image = image_processing.read_image(image_path)
image_processing.cv_show_image("image", image)
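- List all environments: conda info --envs or conda env list
- Export the current environment to an environment.yml file: conda env export > environment.yml
- Create an environment from an environment.yml file: conda env create -f environment.yml
- List all packages in the active environment: conda list
- Remove an environment: conda remove --name your_env_name --all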
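import numpy as np
# Create a 5x2 matrix
data1 = np.arange(0, 10)
data1 = data1.reshape([5, 2])
data2 = np.arange(10, 20).reshape([5, 2])
# Concatenate along rows (axis 0) -> shape (10, 2)
y = np.concatenate([data1, data2], 0)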
# Concatenate labels and indexMat column-wise
def cat_labels_indexMat(labels, indexMat):
    indexMat_labels = np.concatenate([labels, indexMat], axis=1)
    return indexMat_labels
# Split the matrix back into labels and indexMat
def split_labels_indexMat(indexMat_labels, label_index=0):
    labels = indexMat_labels[:, 0:label_index+1]  # the first label_index+1 columns are labels
    indexMat = indexMat_labels[:, label_index+1:]  # the remaining columns are indexMat
    return labels, indexMat
def split_data(data):
    '''
    Split data into its even and odd rows
    :param data: array to split
    :return: data1 (rows 0, 2, 4, ...), data2 (rows 1, 3, 5, ...)
    '''
    data1 = data[0::2]
    data2 = data[1::2]
    return data1, data2
if __name__ == '__main__':
    data = np.arange(0, 20)
    data = data.reshape([10, 2])
    data1, data2 = split_data(data)
    print("embeddings:{}".format(data))
    print("embeddings1:{}".format(data1))
    print("embeddings2:{}".format(data2))
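A small sketch of the reverse operations, assuming the same 10x2 example array: interleaving the even/odd halves back into the original order, and using np.split as a one-call alternative to the slicing in split_labels_indexMat (with label_index=0).

```python
import numpy as np

data = np.arange(20).reshape(10, 2)

# Even/odd rows, as in split_data() above
even_rows, odd_rows = data[0::2], data[1::2]

# Interleave them back into the original order
restored = np.empty_like(data)
restored[0::2] = even_rows
restored[1::2] = odd_rows

# Split off the first column (the labels) from the rest at column index 1
labels, rest = np.split(data, [1], axis=1)
rejoined = np.concatenate([labels, rest], axis=1)
print(labels.shape, rest.shape)  # (10, 1) (10, 1)
```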
pair_issame = pair_issame[np.lexsort(pair_issame.T)]  # sort rows by the last column; ties are broken by the preceding columns, right to left
Suppose we have the data pair_issame.
To extract the rows whose third (last) column is "1":
pair_issame_1 = pair_issame[pair_issame[:, -1] == "1", :]  # filter the array with a boolean mask
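To sort by one column only (instead of using the other columns as tie-breakers, as np.lexsort does), argsort that column with a stable sort so equal keys keep their original order. The 3-column string array below is made-up example data standing in for pair_issame.

```python
import numpy as np

pair_issame = np.array([["a", "b", "1"],
                        ["c", "d", "0"],
                        ["e", "f", "1"]])  # hypothetical example data

# Sort rows by the last column only; kind="stable" keeps ties in their original order
by_last = pair_issame[np.argsort(pair_issame[:, -1], kind="stable")]

# Keep the rows whose last column equals "1"
rows_1 = pair_issame[pair_issame[:, -1] == "1", :]
print(rows_1)
```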
import numpy as np
def matching_data_vecror(data, vector):
    '''
    Match vector against the rows of data and return a boolean index marking the rows equal to vector, e.g.:
    data = [[1., 0., 0.],[0., 0., 0.],[2., 0., 0.],
            [0., 0., 0.],[0., 3., 0.],[0., 0., 4.]]
    # Find the rows of data equal to [0, 0, 0]
    data = np.asarray(data)
    vector = [0, 0, 0]
    index = matching_data_vecror(data, vector)
    print(index)
    >> [False True False True False False]
    # Remove the rows of data equal to [0, 0, 0]
    pair_issame_1 = data[~index, :]  # filter the array
    :param data: 2-D array
    :param vector: row vector to match
    :return: boolean index, one entry per row
    '''
    # index = (data[:, 0] == 0) & (data[:, 1] == 0) & (data[:, 2] == 0)
    row_nums = len(data)
    clo_nums = len(vector)
    index = np.asarray([True] * row_nums)
    for i in range(clo_nums):
        index = index & (data[:, i] == vector[i])
    return index
def set_mat_vecror(data, index, vector):
    '''
    Set the rows of data at the given index to vector, e.g.:
    # Set every point whose score is above the threshold to vector = [10, 10]
    point = [[0., 0.], [1., 1.], [2., 2.],
             [3., 3.], [4., 4.], [5., 5.]]
    point = np.asarray(point)  # the data points
    score = np.array([0.7, 0.2, 0.3, 0.4, 0.5, 0.6])  # the score of each data point
    score_th = 0.5
    index = np.where(score > score_th)[0]  # indices of all scores above the threshold
    vector = [10, 10]  # value assigned to those points
    out = set_mat_vecror(point, index, vector)
    :param data: 2-D array (modified in place)
    :param index: row indices
    :param vector: new row value
    :return: data
    '''
    data[index, :] = vector
    return data
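The per-column loop in matching_data_vecror can also be written with broadcasting: compare every row with the vector at once and require all columns to match. match_rows is a hypothetical name; the data is the docstring example above. A boolean mask likewise replaces np.where when assigning to the selected rows (the point array here is made up).

```python
import numpy as np


def match_rows(data, vector):
    """Boolean mask of the rows in `data` that equal `vector` (vectorized version of the loop)."""
    data = np.asarray(data)
    return np.all(data == np.asarray(vector), axis=1)


data = np.array([[1., 0., 0.], [0., 0., 0.], [2., 0., 0.],
                 [0., 0., 0.], [0., 3., 0.], [0., 0., 4.]])
mask = match_rows(data, [0, 0, 0])
print(mask)             # [False  True False  True False False]
non_zero = data[~mask]  # drop the all-zero rows

# Assign through a boolean mask instead of np.where indices
point = np.arange(12, dtype=float).reshape(6, 2)
score = np.array([0.7, 0.2, 0.3, 0.4, 0.5, 0.6])
point[score > 0.5] = [10, 10]  # rows 0 and 5; 0.5 itself is not above the threshold
```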
https://blog.csdn.net/Song_Lynn/article/details/82817647
per = np.random.permutation(pair_issame_1.shape[0])  # shuffled row indices
pair_issame_1 = pair_issame_1[per, :]  # the rows in shuffled order
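When data and labels live in separate arrays, drawing one permutation and indexing both with it keeps them aligned. A sketch, assuming NumPy's Generator API (np.random.default_rng, NumPy 1.17+) with an arbitrary seed; the small arrays are made up so that row i of data corresponds to label i.

```python
import numpy as np

rng = np.random.default_rng(seed=0)  # seeded so the shuffle is reproducible
data = np.arange(10).reshape(5, 2)   # row i is [2i, 2i + 1]
labels = np.array([0, 1, 2, 3, 4])   # label i belongs to row i

per = rng.permutation(data.shape[0])  # one permutation of the row indices
data_shuffled = data[per]
labels_shuffled = labels[per]         # same permutation keeps rows and labels aligned
```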
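What kinds of data can pickle store?
- All of Python's native types: booleans, integers, floats, complex numbers, strings, bytes, and None.
- Lists, tuples, dicts, and sets composed of any native types.
- Functions, classes, and class instances (functions and classes are stored by reference to their name, so they must be importable, e.g. defined at module top level, when the data is loaded).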
import pickle
import numpy as np
def save_data(data, file):
    with open(file, 'wb') as f:
        pickle.dump(data, f)
def load_data(file):
    with open(file, 'rb') as f:
        data = pickle.load(f)
    return data
if __name__ == "__main__":
    data1 = ['aa', 'bb', 'cc']  # list
    data1 = np.asarray(data1)  # ndarray
    data_path = "data.pk"
    save_data(data1, data_path)
    data2 = load_data(data_path)
    print(data1)
    print(data2)
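A variant of the round trip above, sketched under two assumptions: the file goes into a temporary directory, and pickle.HIGHEST_PROTOCOL is used for a more compact, faster format. The dict payload is made-up data showing that nested containers and NumPy arrays come back unchanged. Only unpickle files from trusted sources, since loading a pickle can execute arbitrary code.

```python
import os
import pickle
import tempfile

import numpy as np


def save_data(data, file):
    with open(file, "wb") as f:
        pickle.dump(data, f, protocol=pickle.HIGHEST_PROTOCOL)


def load_data(file):
    with open(file, "rb") as f:
        return pickle.load(f)


payload = {"names": ["aa", "bb"], "arr": np.arange(6).reshape(2, 3)}
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "data.pk")
    save_data(payload, path)
    restored = load_data(path)
print(restored["names"], restored["arr"].shape)
```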
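import random
files_list=...
labels_list=...
shuffle=True
if shuffle:
    # seeds = random.randint(0, len(files_list))  # draw a random seed
    seeds = 100  # fixed seed: re-seeding with the same value reproduces the same random sequence,
                 # so two lists of equal length are shuffled into the same order
    random.seed(seeds)
    random.shuffle(files_list)
    random.seed(seeds)
    random.shuffle(labels_list)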
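Re-seeding before each shuffle works because both lists have the same length. An alternative that cannot drift out of sync is to zip the lists, shuffle the pairs once, and unzip them. This sketch uses a local random.Random(100) instance so the global random state is left untouched; the lists are made-up examples.

```python
import random

files_list = ["0.txt", "1.txt", "2.txt", "3.txt"]  # hypothetical example data
labels_list = [0, 1, 2, 3]

pairs = list(zip(files_list, labels_list))
random.Random(100).shuffle(pairs)                  # local RNG with a fixed seed
files_list, labels_list = map(list, zip(*pairs))   # unzip back into two lists
print(files_list, labels_list)
```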
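zip() takes iterables as arguments, packs their corresponding elements into tuples, and returns the sequence of those tuples. If the iterables have different lengths, the result is as long as the shortest one. The * operator unpacks the tuples again ("unzipping").
zip in Python 2 vs. Python 3: in Python 2, zip() returns a list; in Python 3.x it returns an iterator object to save memory, so call list() on it to see the result.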
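a = [1,2,3]
b = [4,5,6]
c = [4,5,6,7,8]
zipped = list(zip(a, b))  # pack into a list of tuples
# result: [(1, 4), (2, 5), (3, 6)]
list(zip(a, c))  # the length matches the shortest list
# result: [(1, 4), (2, 5), (3, 6)]
list(zip(*zipped))  # the inverse of zip: *zipped unpacks ("unzips") into matrix-like rows
# result: [(1, 2, 3), (4, 5, 6)]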
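Two Python 3 details worth seeing in code: a zip object is a one-shot iterator, so a second pass yields nothing, and itertools.zip_longest pads the shorter input instead of truncating it. A minimal sketch with the lists above:

```python
from itertools import zip_longest

a = [1, 2, 3]
c = [4, 5, 6, 7, 8]

z = zip(a, c)
first = list(z)    # [(1, 4), (2, 5), (3, 6)]
second = list(z)   # [] because the iterator is already exhausted

padded = list(zip_longest(a, c, fillvalue=None))
# [(1, 4), (2, 5), (3, 6), (None, 7), (None, 8)]
```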
import os
# Suppose files_list is:
files_list=['../training_data/test\\0.txt', '../training_data/test\\1.txt', '../training_data/test\\2.txt', '../training_data/test\\3.txt', '../training_data/test\\4.txt', '../training_data/test\\5.txt', '../training_data/test\\6.txt']
# All three methods below get the file names in files_list
# (os.path.basename only splits on backslashes like these on Windows)
files_nemes1 = list(map(lambda s: os.path.basename(s), files_list))
files_nemes2 = list(os.path.basename(i) for i in files_list)
files_nemes3 = [os.path.basename(i) for i in files_list]
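The glob module is one of the simplest modules, with very little in it. It finds path names that match a given pattern, much like file search on Windows. Only three wildcards are used: "*" matches zero or more characters; "?" matches a single character; "[]" matches one character from the given range, e.g. [0-9] matches a digit.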
import glob
# Get all jpg images in the sub-folders of a directory
print(glob.glob(r"E:\Picture\*\*.jpg"))
# Get all .py files in the parent directory
print(glob.glob(r'../*.py'))  # relative path
To iterate over the jpg images in a given directory:
# -*- coding:utf-8 -*-
import glob
# Iterate over the jpg images in the given directory
image_path = "/home/ubuntu/TFProject/view-finding-network/test_images/*.jpg"
for per_path in glob.glob(image_path):
    print(per_path)
To iterate over files of several formats:
import glob
# Iterate over 'jpg', 'png', and 'jpeg' images
image_format = ['jpg', 'png', 'jpeg']  # image formats
image_dir = './test_image'  # image directory
image_list = []
for fmt in image_format:
    path = image_dir + '/*.' + fmt
    image_list.extend(glob.glob(path))
print(image_list)
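The loop above can be wrapped in a function. This sketch (list_images is a hypothetical name) also uses the '**' pattern with recursive=True (Python 3.5+), which matches zero or more sub-directories, so images in nested folders are found too. The temporary directory and file names exist only for the demonstration.

```python
import glob
import os
import tempfile


def list_images(image_dir, formats=("jpg", "png", "jpeg"), recursive=False):
    """Return sorted paths of files in image_dir with the given extensions.

    With recursive=True the '**' component also matches every sub-folder.
    """
    search_dir = os.path.join(image_dir, "**") if recursive else image_dir
    paths = []
    for fmt in formats:
        pattern = os.path.join(search_dir, "*." + fmt)
        paths.extend(glob.glob(pattern, recursive=recursive))
    return sorted(paths)


with tempfile.TemporaryDirectory() as tmp:
    os.makedirs(os.path.join(tmp, "sub"))
    for name in ["a.jpg", "b.png", "c.txt", os.path.join("sub", "d.jpeg")]:
        open(os.path.join(tmp, name), "w").close()
    top_level = [os.path.basename(p) for p in list_images(tmp)]
    everything = [os.path.basename(p) for p in list_images(tmp, recursive=True)]
print(top_level, everything)
```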
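import os
os.getcwd()  # current working directory
os.path.abspath('.')  # current working directory (absolute path)
os.path.abspath('..')  # parent of the current working directory
os.path.abspath(os.curdir)  # current working directory
os.path.join(os.getcwd(), 'filename')  # join the current directory with a name to build a new path
os.path.exists(path)  # whether the path exists (file or directory)
os.path.isfile(path)  # True if path is an existing file, otherwise False
os.path.basename('path/to/test.jpg')  # the file name of the path: test.jpg
os.path.getsize(path)  # file size in bytes; raises OSError if the file does not exist
path = os.path.dirname('path/to/test.jpg')  # the directory part: path/to
os.sep  # path separator of the current OS: '/' on Linux/UNIX, '\\' on Windows
dirname = 'path/to/test.jpg'.split(os.sep)[-1]  # last component "test.jpg" (only when os.sep is '/')
dirname = 'path/to/test.jpg'.split(os.sep)[-2]  # name of the containing folder "to" (only when os.sep is '/')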
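import os
import shutil
# Delete all files under the directory (sub-directories are emptied but kept)
def delete_dir_file(dir_path):
    ls = os.listdir(dir_path)
    for i in ls:
        c_path = os.path.join(dir_path, i)
        if os.path.isdir(c_path):
            delete_dir_file(c_path)
        else:
            os.remove(c_path)
# Create the directory if it does not exist (os.mkdir creates one level only)
if not os.path.exists(out_dir):
    os.mkdir(out_dir)
# Create nested directories
if not os.path.exists(segment_out_dir):
    os.makedirs(segment_out_dir)
# Delete all files under the directory
delete_dir_file(out_dir)
# Or remove the directory itself together with everything in it:
shutil.rmtree(out_dir)  # delete output folder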
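Since Python 3.2, os.makedirs accepts exist_ok=True, which replaces the exists-check and creates all missing levels. A sketch of a hypothetical reset_dir() helper that empties an output folder by removing and recreating it, exercised on a temporary directory:

```python
import os
import shutil
import tempfile


def reset_dir(dir_path):
    """Delete dir_path with everything in it, then recreate it empty."""
    if os.path.exists(dir_path):
        shutil.rmtree(dir_path)
    os.makedirs(dir_path, exist_ok=True)


with tempfile.TemporaryDirectory() as tmp:
    out_dir = os.path.join(tmp, "a", "b", "c")
    os.makedirs(out_dir, exist_ok=True)  # creates all levels at once
    os.makedirs(out_dir, exist_ok=True)  # second call is a no-op instead of an error
    open(os.path.join(out_dir, "x.txt"), "w").close()
    reset_dir(out_dir)
    remaining = os.listdir(out_dir)
```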
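The following code implements: [1] getFilePathList: get the paths of all files under file_dir, including files in sub-directories; [2] get_files_list: get the list of all files under file_dir whose suffix is postfix, including sub-directories; [3] gen_files_labels: get all file paths under files_dir together with their labels, where each label is the name of the file's parent folder.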
# coding: utf-8
import os
import os.path
import pandas as pd
def getFilePathList(file_dir):
    '''
    Get the paths of all files under file_dir, including files in sub-directories
    :param file_dir: root directory
    :return: list of file paths
    '''
    filePath_list = []
    for walk in os.walk(file_dir):
        part_filePath_list = [os.path.join(walk[0], file) for file in walk[2]]
        filePath_list.extend(part_filePath_list)
    return filePath_list
def get_files_list(file_dir, postfix='ALL'):
    '''
    Get the list of all files under file_dir whose suffix is postfix, including sub-directories
    :param file_dir: root directory
    :param postfix: file suffix such as 'jpg' or '.jpg'; 'ALL' returns every file
    :return: sorted list of file paths
    '''
    postfix = postfix.split('.')[-1]
    file_list = []
    filePath_list = getFilePathList(file_dir)
    if postfix == 'ALL':
        file_list = filePath_list
    else:
        for file in filePath_list:
            basename = os.path.basename(file)  # the file name part of the path
            postfix_name = basename.split('.')[-1]
            if postfix_name == postfix:
                file_list.append(file)
    file_list.sort()
    return file_list
def gen_files_labels(files_dir):
    '''
    Get all file paths under files_dir together with their labels; each label is the name of the file's parent folder.
    Under files_dir, files of the same class go into one folder, and the folder name is the label.
    :param files_dir: root directory
    :return: filePath_list, the paths of all files; label_list, the corresponding labels
    '''
    filePath_list = getFilePathList(files_dir)
    print("files nums:{}".format(len(filePath_list)))
    # Collect the label of every sample
    label_list = []
    for filePath in filePath_list:
        label = filePath.split(os.sep)[-2]
        label_list.append(label)
    labels_set = list(set(label_list))
    print("labels:{}".format(labels_set))
    # Count the samples per label
    print(pd.Series(label_list).value_counts())
    return filePath_list, label_list
if __name__ == '__main__':
    file_dir = 'JPEGImages'
    file_list = get_files_list(file_dir)
    for file in file_list:
        print(file)
Iterate over all files under dir (including files in sub-folders):
# coding: utf-8
import os
import os.path
def get_files_list(dir):
    '''
    Iterate over all files under dir (including files in sub-folders)
    :param dir: the directory to search
    :return: list of all file paths -> list
    '''
    # parent: the current directory, dirnames: its sub-folders, filenames: the files in it
    files_list = []
    for parent, dirnames, filenames in os.walk(dir):
        for filename in filenames:
            # print("parent is: " + parent)
            # print("filename is: " + filename)
            # print(os.path.join(parent, filename))  # every file under dir, including sub-folders
            files_list.append(os.path.join(parent, filename))
    return files_list
if __name__ == '__main__':
    dir = 'images'
    files_list = get_files_list(dir)
    print(files_list)
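With pathlib (Python 3.4+), Path.rglob('*') walks sub-directories the same way os.walk does. A sketch of a suffix filter plus the parent-folder-as-label convention from gen_files_labels; get_files_by_suffix is a hypothetical name, the suffix comparison is made case-insensitive as a design choice, and the folder layout is created only for the demo.

```python
import tempfile
from pathlib import Path


def get_files_by_suffix(file_dir, postfix=None):
    """All files under file_dir (searched recursively), optionally filtered by a suffix such as '.jpg'."""
    files = [p for p in Path(file_dir).rglob("*") if p.is_file()]
    if postfix is not None:
        files = [p for p in files if p.suffix.lower() == postfix.lower()]
    return sorted(str(p) for p in files)


with tempfile.TemporaryDirectory() as tmp:
    for rel in ["dog/1.jpg", "cat/2.JPG", "notes.txt"]:
        path = Path(tmp, rel)
        path.parent.mkdir(parents=True, exist_ok=True)
        path.touch()
    jpgs = get_files_by_suffix(tmp, ".jpg")
    labels = [Path(p).parent.name for p in jpgs]  # parent folder name as the label
    all_files = get_files_by_suffix(tmp)
```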
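Below is a wrapped get_input_list() function: if path is a folder, it collects all png, jpg, jpeg, and similar image files in it; if path is a txt file, it reads the file list stored in it (avoid extra blank lines); if path is a single image file such as path/to/1.png, it returns just that file.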
# -*-coding: utf-8 -*-
"""
@Project: hdrnet
@File : my_test.py
@Author : panjq
@E-mail : [email protected]
@Date : 2018-08-28 14:30:51
"""
import os
import logging
import re
logging.basicConfig(format="[%(process)d] %(levelname)s %(filename)s:%(lineno)s | %(message)s")
log = logging.getLogger("train")
log.setLevel(logging.INFO)
def get_input_list(path):
    '''
    Return the paths of all images
    :param path: path of a single image, a folder, or a txt file
    :return: list of image paths (empty if nothing matched)
    '''
    regex = re.compile(r".*\.(png|jpeg|jpg|tif|tiff)$")
    inputs = []
    # path is a folder: collect all png, jpg, jpeg, ... image files in it
    # path/to
    if os.path.isdir(path):
        inputs = os.listdir(path)
        inputs = [os.path.join(path, f) for f in inputs if regex.match(f)]
        log.info("Directory input {}, with {} images".format(path, len(inputs)))
    # path is a txt file: read the file list stored in it (blank lines are skipped)
    # path/to/filelist.txt
    elif os.path.splitext(path)[-1] == ".txt":
        dirname = os.path.dirname(path)
        with open(path, 'r') as fid:
            inputs = [l.strip() for l in fid.readlines() if l.strip()]
        inputs = [os.path.join(dirname, im) for im in inputs]
        log.info("Filelist input {}, with {} images".format(path, len(inputs)))
    # path is a single image file: path/to/1.png
    elif regex.match(path):
        inputs = [path]
        log.info("Single input {}".format(path))
    return inputs
if __name__ == '__main__':
    path = 'dataset/filelist.txt'
    result = get_input_list(path)
    print(result)
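def isValidImage(images_list, sizeTh=1000, isRemove=False):
    ''' Remove missing files and files that are too small from the list
    :param images_list: list of image paths
    :param sizeTh: file size threshold in bytes (B), default 1000 B
    :param isRemove: whether to also delete the damaged original files from disk
    :return:
    '''
    i = 0
    while i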
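A minimal sketch of the check described in isValidImage's docstring, assuming the threshold is in bytes as stated there. filter_valid_files is a hypothetical name; it only checks existence and size, not whether the file decodes as an image, and it leaves any deletion (isRemove) to the caller.

```python
import os
import tempfile


def filter_valid_files(paths, size_th=1000):
    """Split paths into (valid, invalid): valid files exist and are at least size_th bytes."""
    valid, invalid = [], []
    for p in paths:
        if os.path.isfile(p) and os.path.getsize(p) >= size_th:
            valid.append(p)
        else:
            invalid.append(p)
    return valid, invalid


with tempfile.TemporaryDirectory() as tmp:
    big = os.path.join(tmp, "big.jpg")
    small = os.path.join(tmp, "small.jpg")
    missing = os.path.join(tmp, "missing.jpg")
    with open(big, "wb") as f:
        f.write(b"\0" * 2000)
    with open(small, "wb") as f:
        f.write(b"\0" * 10)
    valid, invalid = filter_valid_files([big, small, missing])
```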
Because np.savetxt() only handles 1-D and 2-D arrays, an array with three or more dimensions has to be reshaped into a vector before saving:
import numpy as np
arr1 = np.zeros((3,4,5), dtype='int16')  # create a 3x4x5 array of zeros
print("shape:", np.shape(arr1))
arr1[0,:,:] = 0
arr1[1,:,:] = 1
arr1[2,:,:] = 2
print("arr1=", arr1)
# savetxt cannot save arrays with three or more dimensions, so flatten to a column vector first
vector = arr1.reshape((-1,1))
np.savetxt("data.txt", vector)
data = np.loadtxt("data.txt")
print("data=", data)
arr2 = data.reshape(arr1.shape)  # restore the original shape (loadtxt returns float64)
print("arr2=", arr2)
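If the file does not need to be human-readable, NumPy's binary .npy format avoids the reshape entirely, because np.save stores the shape and dtype along with the data. A sketch with the same 3x4x5 int16 array, written to a temporary directory:

```python
import os
import tempfile

import numpy as np

arr1 = np.zeros((3, 4, 5), dtype="int16")
arr1[1] = 1
arr1[2] = 2

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "data.npy")
    np.save(path, arr1)   # binary .npy keeps shape and dtype
    arr2 = np.load(path)
print(arr2.shape, arr2.dtype)  # (3, 4, 5) int16
```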
This is a wrapped txt read/write module; both the input and the output data are lists:
# -*-coding: utf-8 -*-
"""
@Project: TxtStorage
@File : TxtStorage.py
@Author : panjq
@E-mail : [email protected]
@Date : 2018-07-12 17:32:47
"""
class TxtStorage:
    # def __init__(self):
    def write_txt(self, content, filename, mode='w'):
        """Save data to a txt file
        :param content: data to save, type -> list of rows
        :param filename: file name
        :param mode: write mode: 'w' or 'a'
        :return: void
        """
        with open(filename, mode) as f:
            for line in content:
                str_line = ""
                for col, data in enumerate(line):
                    if not col == len(line) - 1:
                        # separate fields with a space
                        str_line = str_line + str(data) + " "
                    else:
                        # end the last field of each row with a newline "\n"
                        str_line = str_line + str(data) + "\n"
                f.write(str_line)
    def read_txt(self, fileName):
        """Read data from a txt file
        :param fileName: file name
        :return: list of rows from the txt file
        :rtype: list
        Python has three functions that strip characters/whitespace from the ends of a string:
        strip: removes leading and trailing characters/whitespace (including \n, \r, \t, ' ', i.e. newline, carriage return, tab, space)
        lstrip: removes leading characters/whitespace (same set)
        rstrip: removes trailing characters/whitespace (same set)
        Note: these functions only remove characters at the start/end, never in the middle.
        """
        txtData = []
        with open(fileName, 'r') as f:
            lines = f.readlines()
            for line in lines:
                lineData = line.rstrip().split(" ")
                data = []
                for l in lineData:
                    if self.is_int(l):  # unlike str.isdigit(), this also accepts negative integers
                        data.append(int(l))
                    elif self.is_float(l):  # decimal number
                        data.append(float(l))
                    else:
                        data.append(l)
                txtData.append(data)
        return txtData
    def is_int(self, s):
        # Check whether s is an integer
        try:
            int(s)
            return True
        except ValueError:
            return False
    def is_float(self, s):
        # Check whether s is an integer or a decimal number
        try:
            float(s)
            return True
        except ValueError:
            return False
if __name__ == '__main__':
    txt_filename = 'test.txt'
    w_data = [['1.jpg', 'dog', 200, 300, 1.0], ['2.jpg', 'dog', 20, 30, -2]]
    print("w_data=", w_data)
    txt_str = TxtStorage()
    txt_str.write_txt(w_data, txt_filename, mode='w')
    r_data = txt_str.read_txt(txt_filename)
    print('r_data=', r_data)
A common set of functions for reading and writing TXT data:
# -*-coding: utf-8 -*-
"""
@Project: TxtStorage
@File : TxtStorage.py
@Author : panjq
@E-mail : [email protected]
@Date : 2018-07-12 17:32:47
"""
def write_txt(content, filename, mode='w'):
    """Save data to a txt file
    :param content: data to save, type -> list of rows
    :param filename: file name
    :param mode: write mode: 'w' or 'a'
    :return: void
    """
    with open(filename, mode) as f:
        for line in content:
            str_line = ""
            for col, data in enumerate(line):
                if not col == len(line) - 1:
                    # separate fields with a space
                    str_line = str_line + str(data) + " "
                else:
                    # end the last field of each row with a newline "\n"
                    str_line = str_line + str(data) + "\n"
            f.write(str_line)
def read_txt(fileName):
    """Read data from a txt file
    :param fileName: file name
    :return: list of rows from the txt file
    :rtype: list
    Python has three functions that strip characters/whitespace from the ends of a string:
    strip: removes leading and trailing characters/whitespace (including \n, \r, \t, ' ', i.e. newline, carriage return, tab, space)
    lstrip: removes leading characters/whitespace (same set)
    rstrip: removes trailing characters/whitespace (same set)
    Note: these functions only remove characters at the start/end, never in the middle.
    """
    txtData = []
    with open(fileName, 'r') as f:
        lines = f.readlines()
        for line in lines:
            lineData = line.rstrip().split(" ")
            data = []
            for l in lineData:
                if is_int(l):  # unlike str.isdigit(), this also accepts negative integers
                    data.append(int(l))
                elif is_float(l):  # decimal number
                    data.append(float(l))
                else:
                    data.append(l)
            txtData.append(data)
    return txtData
def is_int(s):
    # Check whether s is an integer
    try:
        int(s)
        return True
    except ValueError:
        return False
def is_float(s):
    # Check whether s is an integer or a decimal number
    try:
        float(s)
        return True
    except ValueError:
        return False
def merge_list(data1, data2):
    '''
    Merge two lists row by row
    :param data1: list of rows
    :param data2: list of rows, same length as data1
    :return: the merged list (None if the lengths differ)
    '''
    if not len(data1) == len(data2):
        return
    all_data = []
    for d1, d2 in zip(data1, data2):
        all_data.append(d1 + d2)
    return all_data
def split_list(data, split_index=1):
    '''
    Split every row of data into two parts
    :param data: list of rows
    :param split_index: position at which to split
    :return: data1, data2
    '''
    data1 = []
    data2 = []
    for d in data:
        d1 = d[0:split_index]
        d2 = d[split_index:]
        data1.append(d1)
        data2.append(d2)
    return data1, data2
if __name__ == '__main__':
    txt_filename = 'test.txt'
    w_data = [['1.jpg', 'dog', 200, 300, 1.0], ['2.jpg', 'dog', 20, 30, -2]]
    print("w_data=", w_data)
    write_txt(w_data, txt_filename, mode='w')
    r_data = read_txt(txt_filename)
    print('r_data=', r_data)
    data1, data2 = split_list(w_data)
    mer_data = merge_list(data1, data2)
    print('mer_data=', mer_data)
To read a txt file like the following, use the method below:
test_image/dog/1.jpg 0 11
test_image/dog/2.jpg 0 12
test_image/dog/3.jpg 0 13
test_image/dog/4.jpg 0 14
test_image/cat/1.jpg 1 15
test_image/cat/2.jpg 1 16
test_image/cat/3.jpg 1 17
test_image/cat/4.jpg 1 18
def load_image_labels(test_files):
    '''
    Load an image list txt file: each line describes one image, with fields separated by spaces:
    image_path label1 label2, e.g. test_image/1.jpg 0 2
    :param test_files: path of the txt file
    :return: images_list, labels_list
    '''
    images_list = []
    labels_list = []
    with open(test_files) as f:
        lines = f.readlines()
        for line in lines:
            # rstrip: remove trailing characters/whitespace (\n, \r, \t, ' ', i.e. newline, carriage return, tab, space)
            content = line.rstrip().split(' ')
            name = content[0]
            labels = []
            for value in content[1:]:
                labels.append(float(value))
            images_list.append(name)
            labels_list.append(labels)
    return images_list, labels_list
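A variant of load_image_labels, sketched under the assumption that image paths contain no spaces: str.split() with no argument splits on any run of whitespace and ignores the trailing newline, so doubled spaces, tabs, and blank lines no longer produce empty fields. parse_image_labels is a hypothetical name; it accepts any iterable of lines, such as an open file.

```python
import io


def parse_image_labels(lines):
    """Parse 'path label1 label2 ...' lines into (images, labels)."""
    images, labels = [], []
    for line in lines:
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        images.append(fields[0])
        labels.append([float(v) for v in fields[1:]])
    return images, labels


text = "test_image/dog/1.jpg 0 11\ntest_image/cat/1.jpg  1 15\n\n"
images, labels = parse_image_labels(io.StringIO(text))
```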
Suppose we have the files 'data1.txt', 'data2.txt', and 'data3.txt':
#'data1.txt'
1.jpg 11
2.jpg 12
3.jpg 13
#'data2.txt'
1.jpg 110
2.jpg 120
3.jpg 130
#'data3.txt'
1.jpg 1100
2.jpg 1200
3.jpg 1300
They need to be concatenated into:
1.jpg 11 110 1100
2.jpg 12 120 1200
3.jpg 13 130 1300
Implementation:
# coding: utf-8
import pandas as pd
def concat_data(page, save_path):
    # Assumes every file lists the images in the same order (rows are aligned by position)
    pd_data = None
    for i in range(len(page)):
        content = pd.read_csv(page[i], dtype=str, sep=r'\s+', header=None)
        if i == 0:
            pd_data = content
        else:  # append the second column of each further file as a new column
            pd_data = pd.concat([pd_data, content.iloc[:, 1]], axis=1)
    pd_data.to_csv(save_path, index=False, sep=' ', header=None)
if __name__ == '__main__':
    txt_path = ['data1.txt', 'data2.txt', 'data3.txt']
    out_path = 'all_data.txt'
    concat_data(txt_path, out_path)
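concat_data() aligns rows by position, so it assumes every file lists the images in the same order. Joining on the file name removes that assumption. A sketch using pd.merge (an inner join, so names missing from any file are dropped), with in-memory copies of the three example files; the second file is deliberately reordered.

```python
import io
from functools import reduce

import pandas as pd

texts = ["1.jpg 11\n2.jpg 12\n3.jpg 13\n",
         "1.jpg 110\n3.jpg 130\n2.jpg 120\n",  # rows in a different order
         "1.jpg 1100\n2.jpg 1200\n3.jpg 1300\n"]
frames = [pd.read_csv(io.StringIO(t), sep=r"\s+", header=None,
                      names=["name", "v{}".format(i)], dtype=str)
          for i, t in enumerate(texts)]

# Join all frames on the image name; an inner merge keeps the left frame's row order
merged = reduce(lambda left, right: pd.merge(left, right, on="name"), frames)
out = merged.to_csv(index=False, sep=" ", header=False)
print(out)
```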
import pandas as pd
import numpy as np
def print_info(class_name, labels):
    index = np.arange(0, len(class_name)) + 1  # 1-based row index
    columns = ['class_name', 'labels']
    content = np.array([class_name, labels]).T
    df = pd.DataFrame(content, index=index, columns=columns)  # len(class_name) rows x 2 columns
    print(df)  # print the table
class_name = ['C1', 'C2', 'C3']
labels = [100, 200, 300]
print_info(class_name, labels)
https://blog.csdn.net/zhangchuang601/article/details/79583551
import pandas as pd
import numpy as np
df = pd.DataFrame(data = [['tom1','f',22],['tom2','f',22],['tom3','m',21]],index = [1,2,3],columns = ['name','sex','age'])  # test data
|   | name | sex | age |
|---|---|---|---|
| 1 | tom1 | f | 22 |
| 2 | tom2 | f | 22 |
| 3 | tom3 | m | 21 |
citys = ['shenzhen1','shenzhen2','shenzhen3']
df.insert(2,'city',citys)  # insert a column named city with values citys at position 2 (0-based, i.e. as the third column)
jobs = ['student','teacher','teacher']
df['job'] = jobs  # append a column named job with values jobs as the last column
df.loc[:,'salary'] = ['1k','2k','2k']  # append a column named salary with the values on the right
df
|   | name | sex | city | age | job | salary |
|---|---|---|---|---|---|---|
| 1 | tom1 | f | shenzhen1 | 22 | student | 1k |
| 2 | tom2 | f | shenzhen2 | 22 | teacher | 2k |
| 3 | tom3 | m | shenzhen3 | 21 | teacher | 2k |
# If df has no row with index 4, this line appends a row with index 4 holding the values on the right.
# If df already has a row with index 4, this line overwrites that row with the values on the right.
df.loc[4] = ['tom4','m','shenzhen4',24,"engineer",'3k']
df
|   | name | sex | city | age | job | salary |
|---|---|---|---|---|---|---|
| 1 | tom1 | f | shenzhen1 | 22 | student | 1k |
| 2 | tom2 | f | shenzhen2 | 22 | teacher | 2k |
| 3 | tom3 | m | shenzhen3 | 21 | teacher | 2k |
| 4 | tom4 | m | shenzhen4 | 24 | engineer | 3k |
# Sort by age in descending order
df=df.sort_values(by=["age"],ascending=False)
df
|   | name | sex | city | age | job | salary |
|---|---|---|---|---|---|---|
| 4 | tom4 | m | shenzhen4 | 24 | engineer | 3k |
| 1 | tom1 | f | shenzhen1 | 22 | student | 1k |
| 2 | tom2 | f | shenzhen2 | 22 | teacher | 2k |
| 3 | tom3 | m | shenzhen3 | 21 | teacher | 2k |
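The steps above cover adding, modifying, and sorting; the delete and query parts of the heading can be sketched with standard DataFrame calls (boolean-mask selection, .loc, and drop) on the same test data:

```python
import pandas as pd

df = pd.DataFrame(data=[['tom1', 'f', 22], ['tom2', 'f', 22], ['tom3', 'm', 21]],
                  index=[1, 2, 3], columns=['name', 'sex', 'age'])

older = df[df['age'] > 21]                 # query rows by a condition
tom3_age = df.loc[3, 'age']                # read one cell by index label and column name
df.loc[df['name'] == 'tom2', 'age'] = 23   # modify the cells that match a condition
df = df.drop(columns=['sex'])              # delete a column
df = df.drop(index=[1])                    # delete a row by its index label
print(df)
```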
Reading data from a csv file with the csv module:
# -*- coding:utf-8 -*-
import csv
csv_path = 'test.csv'
with open(csv_path, 'r', newline='') as csvfile:
    reader = csv.DictReader(csvfile)
    for item in reader:  # iterate over all rows
        print(item)
with open(csv_path, 'r', newline='') as csvfile:
    reader = csv.DictReader(csvfile)
    for item in reader:  # iterate over all rows
        print(item['filename'], item['class'], item.get('height'), item.get('width'))
Output:
{'filename': 'test01.jpg', 'height': '638', 'class': 'dog', 'width': '486'}
{'filename': 'test02.jpg', 'height': '954', 'class': 'person', 'width': '726'}
test01.jpg dog 638 486
test02.jpg person 954 726
Writing and then reading:
import csv
csv_path = 'test.csv'
# Write the csv
data = ["1.jpg", 200, 300, 'dog']
with open(csv_path, 'w+', newline='') as csv_file:
    # headers = [k for k in dictionaries[0]]
    headers = ['filename', 'width', 'height', 'class']
    print(headers)
    writer = csv.DictWriter(csv_file, fieldnames=headers)
    writer.writeheader()
    dictionary = {'filename': data[0],
                  'width': data[1],
                  'height': data[2],
                  'class': data[3],
                  }
    writer.writerow(dictionary)
    print(dictionary)
# Read the csv
with open(csv_path, 'r', newline='') as csvfile:
    reader = csv.DictReader(csvfile)
    for item in reader:  # iterate over all rows
        print(item)
with open(csv_path, 'r', newline='') as csvfile:
    reader = csv.DictReader(csvfile)
    for item in reader:  # iterate over all rows
        print(item['filename'], item['class'], item.get('height'), item.get('width'))
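For several rows, DictWriter.writerows() writes them in one call. A sketch that round-trips two made-up rows through an in-memory io.StringIO buffer instead of test.csv; note that DictReader returns every value as a string, so numeric columns need converting back:

```python
import csv
import io

rows = [{'filename': '1.jpg', 'width': 200, 'height': 300, 'class': 'dog'},
        {'filename': '2.jpg', 'width': 20, 'height': 30, 'class': 'cat'}]

buf = io.StringIO(newline='')
writer = csv.DictWriter(buf, fieldnames=['filename', 'width', 'height', 'class'])
writer.writeheader()
writer.writerows(rows)  # write all rows at once

buf.seek(0)
read_back = list(csv.DictReader(buf))
print(read_back[0]['filename'], int(read_back[0]['width']))
```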
import sys
import logging
# Levels: debug, info, warning, error, and critical
# logging.basicConfig(level=logging.DEBUG, format='%(asctime)s - %(name)s - %(levelname)s - %(message)s')
logging.basicConfig(stream=sys.stdout, level=logging.DEBUG, format='%(asctime)s - %(name)s - %(levelname)s - %(message)s')
logger = logging.getLogger(__name__)
logger.debug("----1----")
logger.info("----2----")
logger.warning("----3----")
logger.error("----4----")
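logging.basicConfig() only configures the root logger, and only on its first call. A sketch of attaching a handler to a named logger instead; an io.StringIO buffer stands in for sys.stdout or a log file so the output can be checked, and the logger name "demo" is arbitrary:

```python
import io
import logging

stream = io.StringIO()  # stands in for sys.stdout or a log file
logger = logging.getLogger("demo")
logger.setLevel(logging.INFO)
handler = logging.StreamHandler(stream)
handler.setFormatter(logging.Formatter("%(name)s - %(levelname)s - %(message)s"))
logger.addHandler(handler)
logger.propagate = False  # do not also pass records to the root logger's handlers

logger.debug("hidden")               # below INFO, filtered out
logger.info("loaded %d images", 8)   # arguments are formatted lazily
output = stream.getvalue()
```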
import numpy as np
def split_cell(mat, cell=(3, 3), stepsize=(1, 1)):
    '''
    :param mat: input single-channel image data (may be incorrect; needs verification)
    :param cell: block size
    :param stepsize: step size
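There are many ways to read and display images in Python. Most image modules return images in RGB channel order; only opencv-python returns BGR. To display an image read by OpenCV with another module, convert the channel order first, which is simple: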
import cv2
import matplotlib.pyplot as plt
image_path = 'test_image/C0.jpg'  # example image path
temp_img = cv2.imread(image_path)  # default: BGR (not RGB), uint8, [0,255], ndarray
cv2.imshow("opencv-python", temp_img)
cv2.waitKey(0)
# b, g, r = cv2.split(temp_img)  # convert BGR to RGB
# img = cv2.merge([r, g, b])
# Recommended: cv2.COLOR_BGR2RGB converts BGR to RGB
img = cv2.cvtColor(temp_img, cv2.COLOR_BGR2RGB)
plt.imshow(img)  # display the image
plt.axis('off')  # hide the axes
plt.show()
# coding: utf-8
'''
In Caffe, color images must be in BGR channel order, the input data is float32 in the range [0,255],
and each blob has shape=(batch_size, channel_dim, height, width).
[1] In Caffe's train/test prototxt files, the data layer usually sets scale: 0.00392156885937, i.e. 1/255.0, which normalizes the data to [0,1]
[2] When the input is an RGB image, float32, [0,1], it must be converted:
    --transformer.set_raw_scale('data',255)  # scale to 0~255
    --transformer.set_channel_swap('data',(2,1,0))  # RGB -> BGR
[3] When the input is an RGB image, uint8, [0,255], it must first be converted to float32 (e.g. img.astype(np.float32))
    --transformer.set_raw_scale('data',1.0)  # no scaling needed
    --transformer.set_channel_swap('data',(2,1,0))  # RGB -> BGR
    --channels: img = img.transpose(2, 0, 1)  # [h,w,c] -> [c,h,w]
[4] Every Python module that reads images returns shape=[height, width, channels];
    the odd one out is opencv-python, which returns BGR (the order Caffe expects), while the other modules return RGB
'''
import numpy as np
import matplotlib.pyplot as plt
image_path = 'test_image/C0.jpg'  # C0.jpg has height h=400 and width w=200
# 1.caffe
import caffe
img1 = caffe.io.load_image(image_path)  # default: RGB, float32, [0-1], ndarray, shape=[400,200,3]
# 2.skimage
import skimage.io
img2 = skimage.io.imread(image_path)  # default: RGB, uint8, [0,255], ndarray, shape=[400,200,3]
# img2=img2/255.0
# 3.matplotlib
import matplotlib.image
img3 = matplotlib.image.imread(image_path)  # JPEG: RGB, uint8, [0,255], ndarray, shape=[400,200,3] (PNG files come back as float32 in [0,1])
# 4.PIL
from PIL import Image
temp_img4 = Image.open(image_path)  # default: RGB, uint8, [0,255]
# temp_img4.show()  # opens the image in the system's default image viewer
img4 = np.array(temp_img4)  # convert to ndarray, shape=[400,200,3]
# 5.opencv
import cv2
temp_img5 = cv2.imread(image_path)  # default: BGR (not RGB), uint8, [0,255], ndarray, shape=[400,200,3]
# cv2.imshow("opencv-python",temp_img5)
# cv2.waitKey(0)
# b, g, r = cv2.split(temp_img5)  # convert BGR to RGB
# img5 = cv2.merge([r, g, b])
# Recommended: cv2.COLOR_BGR2RGB converts BGR to RGB
img5 = cv2.cvtColor(temp_img5, cv2.COLOR_BGR2RGB)
img6 = img5.transpose(2, 0, 1)  # channels [h,w,c] -> [c,h,w]
# Any of the [h,w,c] ndarray images above can be displayed directly like this
plt.imshow(img5)  # display the image
plt.axis('off')  # hide the axes
plt.show()
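Because a BGR image is a NumPy array in H x W x C layout, reversing the last axis converts it to RGB without OpenCV, and a transpose gives the [c, h, w] layout Caffe expects. A sketch on a synthetic 400x200 image (the pixel values are made up). The reversed slice is a view with negative strides; np.ascontiguousarray makes a contiguous copy for functions that require one.

```python
import numpy as np

bgr = np.zeros((400, 200, 3), dtype=np.uint8)  # hypothetical H x W x C image in BGR order
bgr[..., 0] = 255                               # pure blue in BGR

rgb = bgr[..., ::-1]                                 # reverse the channel axis: BGR -> RGB
chw = np.ascontiguousarray(rgb.transpose(2, 0, 1))   # [h, w, c] -> [c, h, w]
print(rgb.shape, chw.shape)  # (400, 200, 3) (3, 400, 200)
```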
A wrapped module for reading, displaying, and saving images:
import numpy as np
import matplotlib.pyplot as plt
import cv2
def show_image(title, image):
    '''
    Display an image
    :param title: image title
    :param image: image data
    :return:
    '''
    # plt.figure("show_image")
    # print(image.dtype)
    plt.imshow(image)
    plt.axis('on')  # use 'off' to hide the axes
    plt.title(title)  # image title
    plt.show()
def show_image_rect(win_name, image, rect):
    plt.figure()
    plt.title(win_name)
    plt.imshow(image)
    # rect = [x, y, width, height]
    rect = plt.Rectangle((rect[0], rect[1]), rect[2], rect[3], linewidth=2, edgecolor='r', facecolor='none')
    plt.gca().add_patch(rect)
    plt.show()
def read_image(filename, resize_height, resize_width, normalization=False):
    '''
    Read an image; by default returns uint8 data in [0,255]
    :param filename: image path
    :param resize_height: target height (no resizing if <= 0)
    :param resize_width: target width (no resizing if <= 0)
    :param normalization: whether to normalize to [0.,1.0]
    :return: the image data (RGB)
    '''
    bgr_image = cv2.imread(filename)
    if bgr_image is None:
        raise FileNotFoundError("cannot read image: {}".format(filename))
    if len(bgr_image.shape) == 2:  # convert a grayscale image to three channels
        print("Warning:gray image", filename)
        bgr_image = cv2.cvtColor(bgr_image, cv2.COLOR_GRAY2BGR)
    rgb_image = cv2.cvtColor(bgr_image, cv2.COLOR_BGR2RGB)  # BGR -> RGB
    # show_image(filename,rgb_image)
    # rgb_image=Image.open(filename)
    if resize_height > 0 and resize_width > 0:
        rgb_image = cv2.resize(rgb_image, (resize_width, resize_height))  # cv2.resize takes (width, height)
    rgb_image = np.asanyarray(rgb_image)
    if normalization:
        # divide by 255.0: in Python 2, rgb_image/255 would be integer division
        rgb_image = rgb_image / 255.0
    # show_image("src resize image",image)
    return rgb_image
def save_image(image_path, image):
    plt.imsave(image_path, image)
Here matplotlib.image reads the image into an array. Note that for a PNG file the array is float32 in the range 0-1, whereas PIL.Image data is uint8 in the range 0-255, so a conversion is needed:
import numpy as np
import matplotlib.image as mpimg
from PIL import Image
lena = mpimg.imread('lena.png')  # for a PNG, the data is float32 in the range 0-1
im = Image.fromarray(np.uint8(lena * 255))
im.show()
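Multiplying by 255 and casting truncates toward zero, and values outside [0, 1] wrap around in uint8. A slightly safer conversion clips and rounds first; float_to_uint8 is a hypothetical helper, and NumPy rounds halves to the nearest even integer:

```python
import numpy as np


def float_to_uint8(img):
    """Map a float image in [0, 1] to uint8 [0, 255], clipping out-of-range values."""
    return (np.clip(img, 0.0, 1.0) * 255.0).round().astype(np.uint8)


pixels = float_to_uint8(np.array([-0.1, 0.0, 0.5, 1.0, 1.2]))
print(pixels)  # [  0   0 128 255 255]
```

With PIL this would be used as Image.fromarray(float_to_uint8(lena)).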
Converting PIL.Image to OpenCV format:
import cv2
from PIL import Image
import numpy
image = Image.open("plane.jpg")
image.show()
img = cv2.cvtColor(numpy.asarray(image),cv2.COLOR_RGB2BGR)
cv2.imshow("OpenCV",img)
cv2.waitKey()
Converting OpenCV format to PIL.Image:
import cv2
from PIL import Image
import numpy
img = cv2.imread("plane.jpg")
cv2.imshow("OpenCV",img)
image = Image.fromarray(cv2.cvtColor(img,cv2.COLOR_BGR2RGB))
image.show()
cv2.waitKey()
Checking whether image data is in OpenCV format (a numpy array):
isinstance(img, np.ndarray)
https://blog.csdn.net/wonengguwozai/article/details/79686062
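The following example shows how to open several windows at once to compare images or plots, as in MATLAB; it is also the fix for figures that flash and disappear when interactive mode is turned on in a script: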
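import matplotlib.pyplot as plt
plt.ion()  # turn on interactive mode
# Open two windows at the same time (image1 and image2 are images loaded earlier)
plt.figure()
plt.imshow(image1)
plt.figure()
plt.imshow(image2)
plt.ioff()  # turn interactive mode off before showing, so the windows don't flash and close
plt.show()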
import matplotlib.pyplot as plt
def show_image(win_name, image, rect):
plt.figure()
plt.title(win_name)
plt.imshow(image)
rect =plt.Rectangle((rect[0], rect[1]), rect[2], rect[3], linewidth=2, edgecolor='r', facecolor='none')
plt.gca().add_patch(rect)
plt.show()
import os
import numpy as np
from sklearn import preprocessing
def gen_data_labels(label_list, ont_hot=True):
    '''
    label_list: input labels -> list
    '''
    # Convert the labels to integer codes
    # labels_set=list(set(label_list))
    # labels=[]
    # for label in label_list:
    #     for k in range(len(labels_set)):
    #         if label==labels_set[k]:
    #             labels+=[k]
    #             break
    # labels = np.asarray(labels)
    # Alternatively, convert the labels to integer codes with LabelEncoder:
    labelEncoder = preprocessing.LabelEncoder()
    labels = labelEncoder.fit_transform(label_list)
    labels_set = labelEncoder.classes_
    for i in range(len(labels_set)):
        print("labels:{}->{}".format(labels_set[i], i))
    # Optionally apply one-hot encoding
    if ont_hot:
        labels = labels.reshape(len(labels), 1)
        # sparse_output=False returns a dense array (the argument was named sparse before scikit-learn 1.2)
        onehot_encoder = preprocessing.OneHotEncoder(sparse_output=False, categories='auto')
        labels = onehot_encoder.fit_transform(labels)
    return labels
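Without scikit-learn, the same encoding can be sketched in NumPy: np.unique(..., return_inverse=True) yields the sorted classes and the integer codes (the same order LabelEncoder uses), and indexing an identity matrix with the codes gives the one-hot rows. one_hot is a hypothetical name and the labels are made up.

```python
import numpy as np


def one_hot(label_list):
    """Return (classes, integer codes, one-hot matrix) for a list of labels."""
    classes, ids = np.unique(label_list, return_inverse=True)
    return classes, ids, np.eye(len(classes), dtype=np.float32)[ids]


classes, ids, onehot = one_hot(['dog', 'cat', 'dog', 'bird'])
```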
Sample TXT file:
1.jpg 1 11
2.jpg 2 12
3.jpg 3 13
4.jpg 4 14
5.jpg 5 15
6.jpg 6 16
7.jpg 7 17
8.jpg 8 18
# -*-coding: utf-8 -*-
"""
@Project: LSTM
@File : create_batch_data.py
@Author : panjq
@E-mail : [email protected]
@Date : 2018-10-27 18:20:15
"""
import math
import random
import os
import glob
import numpy as np
def get_list_batch(inputs, batch_size=None, shuffle=False):
    '''
    Generate batches cyclically
    :param inputs: list data
    :param batch_size: batch size
    :param shuffle: whether to shuffle inputs
    :return: yields one batch at a time
    '''
    if shuffle:
        random.shuffle(inputs)
    while True:
        batch_inouts = inputs[0:batch_size]
        inputs = inputs[batch_size:] + inputs[:batch_size]  # rotate so the next call yields the next batch
        yield batch_inouts
def get_data_batch(inputs, batch_size=None, shuffle=False):
    '''
    Generate batches cyclically
    :param inputs: list data
    :param batch_size: batch size
    :param shuffle: whether to shuffle inputs
    :return: yields one batch at a time
    '''
    # rows,cols=inputs.shape
    rows = len(inputs)
    indices = list(range(rows))
    if shuffle:
        random.shuffle(indices)
    while True:
        batch_indices = indices[0:batch_size]
        indices = indices[batch_size:] + indices[:batch_size]  # rotate so the next call yields the next batch
        batch_data = find_list(batch_indices, inputs)
        # batch_data=find_array(batch_indices,inputs)
        yield batch_data
def find_list(indices, data):
    out = []
    for i in indices:
        out = out + [data[i]]
    return out
def find_array(indices, data):
    rows, cols = data.shape
    out = np.zeros((len(indices), cols))
    for i, index in enumerate(indices):
        out[i] = data[index]
    return out
def load_file_list(text_dir):
    text_dir = os.path.join(text_dir, '*.txt')
    text_list = glob.glob(text_dir)
    return text_list
def get_next_batch(batch):
    return next(batch)
def load_image_labels(test_files):
    '''
    Load an image list txt file: each line describes one image, with fields separated by spaces:
    image_path label1 label2, e.g. test_image/1.jpg 0 2
    :param test_files: path of the txt file
    :return: images_list, labels_list
    '''
    images_list = []
    labels_list = []
    with open(test_files) as f:
        lines = f.readlines()
        for line in lines:
            # rstrip: remove trailing characters/whitespace (\n, \r, \t, ' ', i.e. newline, carriage return, tab, space)
            content = line.rstrip().split(' ')
            name = content[0]
            labels = []
            for value in content[1:]:
                labels.append(float(value))
            images_list.append(name)
            labels_list.append(labels)
    return images_list, labels_list
if __name__ == '__main__':
    filename = './training_data/train.txt'
    images_list, labels_list = load_image_labels(filename)
    # inputs = np.reshape(np.arange(8*3), (8,3))
    iter = 10  # iterate 10 times, 3 items per batch
    batch = get_data_batch(images_list, batch_size=3, shuffle=False)
    for i in range(iter):
        print('**************************')
        # train_batch=batch.__next__()
        batch_images = get_next_batch(batch)
        print(batch_images)
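get_data_batch rotates the index list, so a batch can straddle the end of one pass over the data and the start of the next. If each pass (epoch) should cover every item exactly once, a generator can loop over epochs instead and emit a shorter last batch. A sketch; batch_generator and its seed parameter are hypothetical:

```python
import random


def batch_generator(inputs, batch_size, shuffle=False, seed=None):
    """Yield batches forever; each epoch covers every item once, reshuffling per epoch if asked."""
    rng = random.Random(seed)
    items = list(inputs)
    while True:
        if shuffle:
            rng.shuffle(items)
        for start in range(0, len(items), batch_size):
            yield items[start:start + batch_size]  # the last batch of an epoch may be smaller


gen = batch_generator(range(8), batch_size=3)
batches = [next(gen) for _ in range(4)]
```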
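import numpy as np
import pandas as pd
from collections import Counter
label_list = ['astrology', 'astrology', 'finance', 'finance', 'finance', 'education', 'education', 'education']
set1 = set(label_list)  # set1 = {'finance', 'education', 'astrology'}; a set holds no duplicates and has no fixed order
set2 = np.unique(label_list)  # set2 = ['astrology' 'education' 'finance'] (sorted)
# To get the count of each element:
arr = [1, 2, 3, 3, 2, 1, 0, 2]
result = {}
for i in set(arr):
    result[i] = arr.count(i)
print(result)
# Simpler methods:
print(Counter(arr))
print(pd.Series(label_list).value_counts())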
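collections.Counter does the counting in one step; most_common() returns (element, count) pairs sorted by count, with ties kept in first-seen order (Python 3.7+). A sketch with the same labels:

```python
from collections import Counter

label_list = ['astrology', 'astrology', 'finance', 'finance', 'finance',
              'education', 'education', 'education']
counts = Counter(label_list)
print(counts.most_common())
# [('finance', 3), ('education', 3), ('astrology', 2)]
```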
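A Python dict is not ordered by key or by value (since Python 3.7 it preserves insertion order), and values are looked up by key. To sort a dict by value, do the following:
1. Sort by value, from largest to smallest: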
dic = {'a':31, 'bc':5, 'c':3, 'asd':4, 'aa':74, 'd':0}
sorted_items = sorted(dic.items(), key=lambda d: d[1], reverse=True)
print(sorted_items)
Output:
[('aa', 74), ('a', 31), ('bc', 5), ('asd', 4), ('c', 3), ('d', 0)]
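Breaking the code down:
dic.items() yields the (key, value) pairs.
sorted() then uses the key parameter to sort by the value, i.e. element d[1] of each pair. reverse=True reverses the order: the default is ascending, so reversing gives descending.
2. Sort a dict by key: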
dic = {'a':31, 'bc':5, 'c':3, 'asd':4, 'aa':74, 'd':0}
dict= sorted(dic.items(), key=lambda d:d[0]) d[0]表示字典的键
print dict
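A few related idioms may help here: operator.itemgetter as a drop-in for the lambda, rebuilding a dict in sorted order, a two-level sort (value descending, then key ascending for ties), and sorting only the keys by their values. The scores dict is a made-up example.

```python
from operator import itemgetter

dic = {'a': 31, 'bc': 5, 'c': 3, 'asd': 4, 'aa': 74, 'd': 0}

# itemgetter(1) is equivalent to lambda d: d[1]
by_value_desc = sorted(dic.items(), key=itemgetter(1), reverse=True)

# Since Python 3.7 a dict keeps insertion order, so the sorted pairs can be
# turned back into a dict that iterates in sorted order.
sorted_dic = dict(by_value_desc)

# Sort by value descending, then by key ascending when values are equal.
scores = {'b': 2, 'a': 2, 'c': 5}  # made-up example data
by_value_then_key = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))

# Only the keys, ordered by their values
top_keys = sorted(dic, key=dic.get, reverse=True)

print(by_value_desc)
print(list(sorted_dic))
print(by_value_then_key)
print(top_keys[:3])
```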
The sort_cluster_container function below sorts samples by how many samples share the same label, so that samples whose label occurs most often come first.
# -*-coding: utf-8 -*-
"""
@Project: IntelligentManufacture
@File : statistic_analysis.py
@Author : panjq
@E-mail : [email protected]
@Date : 2019-02-15 13:47:58
"""
import pandas as pd
import numpy as np
import functools
def print_cluster_info(title, labels_id, labels, columns=('labels_id', 'labels')):
    index = np.arange(0, len(labels_id)) + 1  # 1-based row index
    content = np.array([labels_id, labels]).T  # one row per sample
    df = pd.DataFrame(content, index=index, columns=list(columns))
    print('*************************************************')
    print("{}{}".format(title, df))
def print_cluster_container(title, cluster_container, columns=('labels_id', 'labels')):
    '''
    :param cluster_container: list of (labels_id, label) tuples
    :param columns: DataFrame column names
    :return:
    '''
    labels_id, labels = zip(*cluster_container)
    labels_id = list(labels_id)
    labels = list(labels)
    print_cluster_info(title, labels_id, labels, columns=columns)
def sort_cluster_container(cluster_container):
    '''
    Custom sort: order samples by how many samples share their label, so samples whose label occurs most often come first
    :param cluster_container: list of (labels_id, label) tuples
    :return: the sorted list
    '''
    # labels_id = list(cluster_container.keys())
    # labels = list(cluster_container.values())
    labels_id, labels = zip(*cluster_container)
    labels_id = list(labels_id)
    labels = list(labels)
    # Count the samples for each label: value_counts_dict
    value_counts_dict = {}
    labels_set = set(labels)
    for i in labels_set:
        value_counts_dict[i] = labels.count(i)
    def cmp(a, b):
        # Descending order
        a_key, a_value = a
        b_key, b_value = b
        a_count = value_counts_dict[a_value]
        b_count = value_counts_dict[b_value]
        if a_count != b_count:  # labels with more samples go first
            return -1 if a_count > b_count else 1
        if a_value != b_value:  # same count: the larger label value goes first
            return -1 if a_value > b_value else 1
        return 0  # same label: keep the original order (sorted() is stable)
    out = sorted(cluster_container, key=functools.cmp_to_key(cmp))
    return out
if __name__ == '__main__':
    labels_id = ["image0", 'image1', "image2", "image3", "image4", "image5", "image6"]
    labels = [0.0, 1.0, 2.0, 1.0, 1.0, 2.0, 3.0]
    # labels = ['L0', 'L1', 'L2', 'L1', 'L1', 'L2', "L3"]
    cluster_container = list(zip(labels_id, labels))
    print("cluster_container:{}".format(cluster_container))
    print_cluster_container("Before sorting:\n", cluster_container, columns=['labels_id', 'labels'])
    out = sort_cluster_container(cluster_container)
    print_cluster_container("After sorting:\n", out, columns=['labels_id', 'labels'])
The script prints the samples as a table before and after sorting.
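The same ordering can also be expressed without functools.cmp_to_key. The sketch below is a hypothetical alternative, not the author's function. It counts labels with collections.Counter, then uses two stable sorted() passes: first by the secondary key (label, descending), then by the primary key (count, descending). Unlike a single tuple key such as (-count, -label), this also works for string labels like 'L0', 'L1', which cannot be negated.

```python
from collections import Counter


def sort_by_label_frequency(cluster_container):
    """Sort (labels_id, label) tuples: most frequent label first,
    larger label first on equal counts, original order within one label."""
    counts = Counter(label for _, label in cluster_container)
    # Pass 1: secondary key, label descending (sorted() stays stable with reverse=True)
    by_label = sorted(cluster_container, key=lambda item: item[1], reverse=True)
    # Pass 2: primary key, count descending; ties keep the pass-1 order
    return sorted(by_label, key=lambda item: counts[item[1]], reverse=True)


if __name__ == '__main__':
    labels_id = ["image0", "image1", "image2", "image3", "image4", "image5", "image6"]
    labels = [0.0, 1.0, 2.0, 1.0, 1.0, 2.0, 3.0]
    print(sort_by_label_frequency(list(zip(labels_id, labels))))
```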
Suppose the configuration file config.yml contains:
## Basic config
batch_size: 2
learning_rate: 0.001
epoch: 1000
## reset image size
height: 128
width: 128
It can be loaded in Python as follows:
import yaml
class Dict2Obj:
    '''
    Convert a dict into an object whose attributes are the dict keys
    '''
    def __init__(self, dict_data):
        self.__dict__.update(dict_data)
def load_config_file(file):
    with open(file, 'r', encoding='utf-8') as f:
        data_dict = yaml.load(f, Loader=yaml.FullLoader)
    data_dict = Dict2Obj(data_dict)
    return data_dict
if __name__ == "__main__":
    config_file = '../config/config.yml'
    para = load_config_file(config_file)
    print("batch_size:{}".format(para.batch_size))
    print("learning_rate:{}".format(para.learning_rate))
    print("epoch:{}".format(para.epoch))
Output:
batch_size:2
learning_rate:0.001
epoch:1000
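Dict2Obj only converts the top level, so a nested section would still be a plain dict. As a sketch, assuming the YAML has already been parsed into a dict with string keys (which is what a YAML loader returns for a file like the one above), the hypothetical to_namespace helper below converts nested sections recursively using the standard library's types.SimpleNamespace. The nested train/image layout is an assumed example, not the config above.

```python
from types import SimpleNamespace


def to_namespace(obj):
    """Recursively turn dicts (and dicts inside lists) into SimpleNamespace objects.

    Assumes string keys, so they can become attributes, e.g. cfg.train.batch_size.
    """
    if isinstance(obj, dict):
        return SimpleNamespace(**{k: to_namespace(v) for k, v in obj.items()})
    if isinstance(obj, list):
        return [to_namespace(v) for v in obj]
    return obj


# Assumed example: a dict like the one a YAML loader returns for a nested config
data_dict = {
    'train': {'batch_size': 2, 'learning_rate': 0.001, 'epoch': 1000},
    'image': {'height': 128, 'width': 128},
}
cfg = to_namespace(data_dict)
print(cfg.train.batch_size, cfg.image.height)
# getattr with a default covers optional keys that are absent from the file
print(getattr(cfg.train, 'momentum', 0.9))
```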
#!/usr/bin/python
# -*- coding: utf-8 -*-
# test_copyfile.py
import os, shutil
def rename(image_list):
    # Strip the '_cropped' suffix: xxx_cropped.jpg -> xxx.jpg
    cut_len = len('_cropped.jpg')
    for name in image_list:
        if not name.endswith('_cropped.jpg'):
            continue  # leave other files untouched
        newName = name[:-cut_len] + '.jpg'
        print(name)
        print(newName)
        os.rename(name, newName)
def mymovefile(srcfile, dstfile):
    if not os.path.isfile(srcfile):
        print("%s does not exist!" % (srcfile))
    else:
        fpath, fname = os.path.split(dstfile)  # split into directory and file name
        if fpath and not os.path.exists(fpath):
            os.makedirs(fpath)  # create the destination directory
        shutil.move(srcfile, dstfile)  # move the file
        print("move %s -> %s" % (srcfile, dstfile))
def mycopyfile(srcfile, dstfile):
    if not os.path.isfile(srcfile):
        print("%s does not exist!" % (srcfile))
    else:
        fpath, fname = os.path.split(dstfile)  # split into directory and file name
        if fpath and not os.path.exists(fpath):
            os.makedirs(fpath)  # create the destination directory
        shutil.copyfile(srcfile, dstfile)  # copy the file
        print("copy %s -> %s" % (srcfile, dstfile))
srcfile = '/Users/xxx/git/project1/test.sh'
dstfile = '/Users/xxx/tmp/tmp/1/test.sh'
mymovefile(srcfile, dstfile)
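The same three operations can be written with pathlib. The sketch below uses hypothetical helper names. Path.mkdir(parents=True, exist_ok=True) replaces the exists/makedirs check, shutil.copy2 copies metadata (timestamps, permissions) as well as contents, and Path.with_name builds the renamed path.

```python
import shutil
from pathlib import Path


def move_file(src, dst):
    """Move src to dst, creating dst's parent directories if needed."""
    src, dst = Path(src), Path(dst)
    if not src.is_file():
        print("%s does not exist!" % src)
        return False
    dst.parent.mkdir(parents=True, exist_ok=True)
    shutil.move(str(src), str(dst))
    return True


def copy_file(src, dst):
    """Copy src to dst (contents and metadata), creating parent directories."""
    src, dst = Path(src), Path(dst)
    if not src.is_file():
        print("%s does not exist!" % src)
        return False
    dst.parent.mkdir(parents=True, exist_ok=True)
    shutil.copy2(src, dst)
    return True


def strip_suffix(paths, suffix='_cropped'):
    """Rename e.g. a_cropped.jpg -> a.jpg; files without the suffix are skipped."""
    for p in map(Path, paths):
        if p.stem.endswith(suffix):
            p.rename(p.with_name(p.stem[:-len(suffix)] + p.suffix))
```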
import math
def get_batch(image_list, batch_size):
    sample_num = len(image_list)
    batch_num = math.ceil(sample_num / batch_size)  # the last batch may be smaller
    for i in range(batch_num):
        start = i * batch_size
        end = min((i + 1) * batch_size, sample_num)
        batch_image = image_list[start:end]
        print("batch_image:{}".format(batch_image))
if __name__ == "__main__":
    image_list = []
    batch_size = 7
    for i in range(10):
        image_list.append(str(i) + ".jpg")
    get_batch(image_list, batch_size)
batch_image:['0.jpg', '1.jpg', '2.jpg', '3.jpg', '4.jpg', '5.jpg', '6.jpg']
batch_image:['7.jpg', '8.jpg', '9.jpg']
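get_batch only prints the batches. To pass them on to a training loop, the same slicing can be written as a generator. The hypothetical iter_batches below steps through range(0, n, batch_size), which produces exactly ceil(n / batch_size) batches without needing math.ceil or min().

```python
from typing import Iterator, List, Sequence


def iter_batches(items: Sequence, batch_size: int) -> Iterator[List]:
    """Yield consecutive slices of batch_size items; the last one may be shorter."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    for start in range(0, len(items), batch_size):
        # slicing past the end is safe, so no min() is needed
        yield list(items[start:start + batch_size])


image_list = [str(i) + ".jpg" for i in range(10)]
for batch_image in iter_batches(image_list, batch_size=7):
    print("batch_image:{}".format(batch_image))
```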
# -*-coding: utf-8 -*-
"""
@Project: IntelligentManufacture
@File : image_processing.py
@Author : panjq
@E-mail : [email protected]
@Date : 2019-02-14 15:34:50
"""
import os
import glob
import cv2
import numpy as np
import matplotlib.pyplot as plt
import copy
def show_batch_image(title, batch_imgs, index=0):
    image = batch_imgs[index, :]
    # image = image.numpy()
    image = np.array(image, dtype=np.float32)
    image = np.squeeze(image)
    if len(image.shape) == 3:
        image = image.transpose(1, 2, 0)  # channels [c,h,w] -> [h,w,c]
    else:
        image = image.transpose(1, 0)
    cv_show_image(title, image)
def show_image(title, rgb_image):
    '''
    Display an RGB image with matplotlib
    :param title: image title
    :param rgb_image: image data
    :return:
    '''
    # plt.figure("show_image")
    # print(image.dtype)
    channel = len(rgb_image.shape)
    if channel == 3:
        plt.imshow(rgb_image)
    else:
        plt.imshow(rgb_image, cmap='gray')
    plt.axis('on')  # use 'off' to hide the axes
    plt.title(title)  # image title
    plt.show()
def cv_show_image(title, image, type='rgb'):
    '''
    Display an image with OpenCV
    :param title: window title
    :param image: input image (RGB by default)
    :param type: 'rgb' or 'bgr'
    :return:
    '''
    if image.ndim == 3 and image.shape[-1] == 3 and type == 'rgb':
        image = cv2.cvtColor(image, cv2.COLOR_RGB2BGR)  # OpenCV expects BGR, so convert RGB to BGR
    cv2.imshow(title, image)
    cv2.waitKey(0)
def get_prewhiten_image(x):
    # Standardize to zero mean and unit variance; std is clamped from below to avoid dividing by a tiny value
    mean = np.mean(x)
    std = np.std(x)
    std_adj = np.maximum(std, 1.0 / np.sqrt(x.size))
    y = np.multiply(np.subtract(x, mean), 1 / std_adj)
    return y
def image_normalization(image, mean=None, std=None):
    # Convert to float32 first instead of dividing the uint8 array directly (image = image / 255)
    image = np.array(image, dtype=np.float32)
    image = image / 255.0
    if mean is not None:
        image = np.subtract(image, mean)
    if std is not None:
        image = np.multiply(image, 1 / std)  # the result must be assigned back
    return image
def get_prewhiten_images(images_list, normalization=False):
    out_images = []
    for image in images_list:
        if normalization:
            image = image_normalization(image)
        image = get_prewhiten_image(image)
        out_images.append(image)
    return out_images
def read_image(filename, resize_height=None, resize_width=None, normalization=False, colorSpace='RGB'):
    '''
    Read an image; by default returns uint8 data in [0,255]
    :param filename:
    :param resize_height:
    :param resize_width:
    :param normalization: whether to normalize to [0.0, 1.0]
    :param colorSpace: output format, RGB or BGR
    :return: the image data
    '''
    bgr_image = cv2.imread(filename)
    # bgr_image = cv2.imread(filename, cv2.IMREAD_IGNORE_ORIENTATION | cv2.IMREAD_COLOR)
    if bgr_image is None:
        print("Warning: cannot read (missing or unreadable): {}".format(filename))
        return None
    if len(bgr_image.shape) == 2:  # convert a grayscale image to 3 channels
        print("Warning: gray image", filename)
        bgr_image = cv2.cvtColor(bgr_image, cv2.COLOR_GRAY2BGR)
    if colorSpace == 'RGB':
        image = cv2.cvtColor(bgr_image, cv2.COLOR_BGR2RGB)  # convert BGR to RGB
    elif colorSpace == "BGR":
        image = bgr_image
    else:
        raise ValueError("colorSpace must be 'RGB' or 'BGR', got {}".format(colorSpace))
    # show_image(filename, image)
    # image = Image.open(filename)
    image = resize_image(image, resize_height, resize_width)
    image = np.asanyarray(image)
    if normalization:
        image = image_normalization(image)
    # show_image("src resize image", image)
    return image
def read_image_gbk(filename, resize_height=None, resize_width=None, normalization=False, colorSpace='RGB'):
    '''
    Like read_image, but works around cv2.imread failing on paths that contain Chinese (non-ASCII) characters;
    returns uint8 data in [0,255] by default
    :param filename:
    :param resize_height:
    :param resize_width:
    :param normalization: whether to normalize to [0.0, 1.0]
    :param colorSpace: output format, RGB or BGR
    :return: the RGB image data
    '''
    with open(filename, 'rb') as f:
        data = f.read()
    data = np.asarray(bytearray(data), dtype="uint8")
    bgr_image = cv2.imdecode(data, cv2.IMREAD_COLOR)
    # Alternatively:
    # bgr_image = cv2.imdecode(np.fromfile(filename, dtype=np.uint8), cv2.IMREAD_COLOR)
    if bgr_image is None:
        print("Warning: cannot decode image: {}".format(filename))
        return None
    if len(bgr_image.shape) == 2:  # convert a grayscale image to 3 channels
        print("Warning: gray image", filename)
        bgr_image = cv2.cvtColor(bgr_image, cv2.COLOR_GRAY2BGR)
    if colorSpace == 'RGB':
        image = cv2.cvtColor(bgr_image, cv2.COLOR_BGR2RGB)  # convert BGR to RGB
    elif colorSpace == "BGR":
        image = bgr_image
    else:
        raise ValueError("colorSpace must be 'RGB' or 'BGR', got {}".format(colorSpace))
    # show_image(filename, image)
    # image = Image.open(filename)
    image = resize_image(image, resize_height, resize_width)
    image = np.asanyarray(image)
    if normalization:
        image = image_normalization(image)
    # show_image("src resize image", image)
    return image
def fast_read_image_roi(filename, orig_rect, ImreadModes=cv2.IMREAD_COLOR, normalization=False, colorSpace='RGB'):
    '''
    Fast image reading: optionally decode at reduced resolution (IMREAD_REDUCED_*) and return only the ROI
    :param filename: image path
    :param orig_rect: region of interest rect=[x,y,w,h] in the original image
    :param ImreadModes: IMREAD_UNCHANGED
                        IMREAD_GRAYSCALE
                        IMREAD_COLOR
                        IMREAD_ANYDEPTH
                        IMREAD_ANYCOLOR
                        IMREAD_LOAD_GDAL
                        IMREAD_REDUCED_GRAYSCALE_2
                        IMREAD_REDUCED_COLOR_2
                        IMREAD_REDUCED_GRAYSCALE_4
                        IMREAD_REDUCED_COLOR_4
                        IMREAD_REDUCED_GRAYSCALE_8
                        IMREAD_REDUCED_COLOR_8
                        IMREAD_IGNORE_ORIENTATION
    :param normalization: whether to normalize
    :param colorSpace: output format, RGB or BGR
    :return: the region of interest (ROI)
    '''
    # An IMREAD_REDUCED mode downscales the image, so the rect must be scaled too
    scale = 1
    if ImreadModes == cv2.IMREAD_REDUCED_GRAYSCALE_2 or ImreadModes == cv2.IMREAD_REDUCED_COLOR_2:
        scale = 1 / 2
    elif ImreadModes == cv2.IMREAD_REDUCED_GRAYSCALE_4 or ImreadModes == cv2.IMREAD_REDUCED_COLOR_4:
        scale = 1 / 4
    elif ImreadModes == cv2.IMREAD_REDUCED_GRAYSCALE_8 or ImreadModes == cv2.IMREAD_REDUCED_COLOR_8:
        scale = 1 / 8
    rect = np.array(orig_rect) * scale
    rect = rect.astype(int).tolist()
    bgr_image = cv2.imread(filename, flags=ImreadModes)
    if bgr_image is None:
        print("Warning: cannot read (missing or unreadable): {}".format(filename))
        return None
    if len(bgr_image.shape) == 2:  # convert a grayscale image to 3 channels
        print("Warning: gray image", filename)
        bgr_image = cv2.cvtColor(bgr_image, cv2.COLOR_GRAY2BGR)
    if colorSpace == 'RGB':
        image = cv2.cvtColor(bgr_image, cv2.COLOR_BGR2RGB)  # convert BGR to RGB
    elif colorSpace == "BGR":
        image = bgr_image
    else:
        raise ValueError("colorSpace must be 'RGB' or 'BGR', got {}".format(colorSpace))
    image = np.asanyarray(image)
    if normalization:
        image = image_normalization(image)
    roi_image = get_rect_image(image, rect)
    # show_image_rect("src resize image", rgb_image, rect)
    # cv_show_image("reROI", roi_image)
    return roi_image
def resize_image(image, resize_height, resize_width):
    '''
    Resize an image; if only one of resize_height/resize_width is given, the other is computed to keep the aspect ratio
    :param image:
    :param resize_height:
    :param resize_width:
    :return:
    '''
    image_shape = np.shape(image)
    height = image_shape[0]
    width = image_shape[1]
    if (resize_height is None) and (resize_width is None):  # wrong: resize_height and resize_width is None
        return image
    if resize_height is None:
        resize_height = int(height * resize_width / width)
    elif resize_width is None:
        resize_width = int(width * resize_height / height)
    image = cv2.resize(image, dsize=(resize_width, resize_height))  # dsize is (width, height)
    return image
def scale_image(image, scale):
    '''
    :param image:
    :param scale: (scale_w, scale_h)
    :return:
    '''
    image = cv2.resize(image, dsize=None, fx=scale[0], fy=scale[1])
    return image
def get_rect_image(image, rect):
    '''
    Crop rect from image; the rect is first clipped to the image bounds
    :param image:
    :param rect: [x,y,w,h]
    :return:
    '''
    shape = image.shape  # h,w
    height = shape[0]
    width = shape[1]
    image_rect = (0, 0, width, height)
    rect = get_rect_intersection(rect, image_rect)
    x, y, w, h = [int(v) for v in rect]  # slicing needs integers
    cut_img = image[y:(y + h), x:(x + w)]
    return cut_img
def get_rects_image(image, rects_list, resize_height=None, resize_width=None):
    rect_images = []
    for rect in rects_list:
        roi = get_rect_image(image, rect)
        roi = resize_image(roi, resize_height, resize_width)
        rect_images.append(roi)
    return rect_images
def get_bboxes_image(image, bboxes_list, resize_height=None, resize_width=None):
    rects_list = bboxes2rects(bboxes_list)
    rect_images = get_rects_image(image, rects_list, resize_height, resize_width)
    return rect_images
def bboxes2rects(bboxes_list):
    '''
    Convert bboxes=[x1,y1,x2,y2] to rects=[x1,y1,w,h]
    :param bboxes_list:
    :return:
    '''
    rects_list = []
    for bbox in bboxes_list:
        x1, y1, x2, y2 = bbox
        rect = [x1, y1, (x2 - x1), (y2 - y1)]
        rects_list.append(rect)
    return rects_list
def rects2bboxes(rects_list):
    '''
    Convert rects=[x1,y1,w,h] to bboxes=[x1,y1,x2,y2]
    :param rects_list:
    :return:
    '''
    bboxes_list = []
    for rect in rects_list:
        x1, y1, w, h = rect
        x2 = x1 + w
        y2 = y1 + h
        b = (x1, y1, x2, y2)
        bboxes_list.append(b)
    return bboxes_list
def scale_rect(orig_rect, orig_shape, dest_shape):
    '''
    When an image is resized, its rectangle must be scaled by the same factors
    :param orig_rect: rect=[x,y,w,h] in the original image
    :param orig_shape: shape of the original image [h,w]
    :param dest_shape: shape of the resized image [h,w]
    :return: the scaled rectangle
    '''
    new_x = int(orig_rect[0] * dest_shape[1] / orig_shape[1])
    new_y = int(orig_rect[1] * dest_shape[0] / orig_shape[0])
    new_w = int(orig_rect[2] * dest_shape[1] / orig_shape[1])
    new_h = int(orig_rect[3] * dest_shape[0] / orig_shape[0])
    dest_rect = [new_x, new_y, new_w, new_h]
    return dest_rect
def get_rect_intersection(rec1, rec2):
    '''
    Compute the intersection of two rects
    :param rec1: [x,y,w,h]
    :param rec2: [x,y,w,h]
    :return: intersection (x,y,w,h); w and h are 0 when the rects do not overlap
    '''
    cx1, cy1, cx2, cy2 = rects2bboxes([rec1])[0]
    gx1, gy1, gx2, gy2 = rects2bboxes([rec2])[0]
    x1 = max(cx1, gx1)
    y1 = max(cy1, gy1)
    x2 = min(cx2, gx2)
    y2 = min(cy2, gy2)
    w = max(0, x2 - x1)
    h = max(0, y2 - y1)
    return (x1, y1, w, h)
def show_image_bboxes_text(title, rgb_image, boxes, boxes_name):
    '''
    Draw boxes and their names on an RGB image and display it
    :param title: window title
    :param rgb_image: RGB image
    :param boxes: [[x1,y1,x2,y2],[x1,y1,x2,y2]]
    :param boxes_name: one text label per box
    :return:
    '''
    bgr_image = cv2.cvtColor(rgb_image, cv2.COLOR_RGB2BGR)
    for name, box in zip(boxes_name, boxes):
        box = [int(b) for b in box]
        cv2.rectangle(bgr_image, (box[0], box[1]), (box[2], box[3]), (0, 255, 0), 2, 8, 0)
        cv2.putText(bgr_image, name, (box[0], box[1]), cv2.FONT_HERSHEY_COMPLEX_SMALL, 0.8, (0, 0, 255), thickness=2)
    # cv2.imshow(title, bgr_image)
    # cv2.waitKey(0)
    rgb_image = cv2.cvtColor(bgr_image, cv2.COLOR_BGR2RGB)
    show_image(title, rgb_image)
def show_image_rects_text(title, rgb_image, rects_list, boxes_name):
    '''
    :param title: window title
    :param rgb_image: RGB image
    :param rects_list: [[x1,y1,w,h],[x1,y1,w,h]]
    :param boxes_name: one text label per rect
    :return:
    '''
    bbox_list = rects2bboxes(rects_list)
    show_image_bboxes_text(title, rgb_image, bbox_list, boxes_name)
def show_image_rects(win_name, image, rect_list):
    '''
    :param win_name:
    :param image:
    :param rect_list: [[x, y, w, h],[x, y, w, h]]
    :return:
    '''
    for rect in rect_list:
        x, y, w, h = rect
        point1 = (int(x), int(y))
        point2 = (int(x + w), int(y + h))
        cv2.rectangle(image, point1, point2, (0, 0, 255), thickness=2)
    cv_show_image(win_name, image)
def show_landmark_boxex(win_name, img, landmarks_list, boxes):
    '''
    Draw landmarks and boxes
    :param win_name:
    :param img:
    :param landmarks_list: one list of points per object, e.g. [[[x1, y1], [x2, y2]], ...]
    :param boxes: [[x1, y1, x2, y2],[x1, y1, x2, y2]]
    :return:
    '''
    image = copy.copy(img)
    point_size = 1
    point_color = (0, 0, 255)  # BGR
    thickness = 4  # e.g. 0, 4 or 8
    for landmarks in landmarks_list:
        for landmark in landmarks:
            # coordinates of the point to draw (cv2.circle needs integers)
            point = (int(landmark[0]), int(landmark[1]))
            cv2.circle(image, point, point_size, point_color, thickness)
    show_image_boxes(win_name, image, boxes)
def show_image_boxes(win_name, image, boxes_list):
    '''
    :param win_name:
    :param image:
    :param boxes_list: [[x1, y1, x2, y2],[x1, y1, x2, y2]]
    :return:
    '''
    for box in boxes_list:
        x1, y1, x2, y2 = box
        point1 = (int(x1), int(y1))
        point2 = (int(x2), int(y2))
        cv2.rectangle(image, point1, point2, (0, 0, 255), thickness=2)
    show_image(win_name, image)
def rgb_to_gray(image):
    image = cv2.cvtColor(image, cv2.COLOR_RGB2GRAY)
    return image
def save_image(image_path, rgb_image, toUINT8=True):
    # toUINT8: the input is assumed to be float in [0,1] and is scaled to uint8 [0,255]
    if toUINT8:
        rgb_image = np.asanyarray(rgb_image * 255, dtype=np.uint8)
    if len(rgb_image.shape) == 2:  # convert a grayscale image to 3 channels
        bgr_image = cv2.cvtColor(rgb_image, cv2.COLOR_GRAY2BGR)
    else:
        bgr_image = cv2.cvtColor(rgb_image, cv2.COLOR_RGB2BGR)  # cv2.imwrite expects BGR
    cv2.imwrite(image_path, bgr_image)
def combime_save_image(orig_image, dest_image, out_dir, name, prefix):
    '''
    Naming convention: out_dir/name_prefix.jpg, plus out_dir/name_src_prefix.jpg with the original and result side by side
    :param orig_image:
    :param dest_image:
    :param out_dir:
    :param name:
    :param prefix:
    :return:
    '''
    dest_path = os.path.join(out_dir, name + "_" + prefix + ".jpg")
    save_image(dest_path, dest_image)
    dest_image = np.hstack((orig_image, dest_image))
    save_image(os.path.join(out_dir, "{}_src_{}.jpg".format(name, prefix)), dest_image)
if __name__ == "__main__":
    # image_path = "../dataset/test_images/lena1.jpg"
    # image_path = "E:/git/dataset/tgs-salt-identification-challenge/train/my_masks/4.png"
    image_path = 'E:/Face/dataset/bzl/test3/test_dataset/陈思远_716/8205_0.936223.jpg'
    # target_rect = main.select_user_roi(target_path)  # rectangle=[x,y,w,h]
    # orig_rect = [50, 50, 100000, 10000]
    image = read_image_gbk(image_path, resize_height=None, resize_width=None)
    # orig_image = get_rect_image(image, orig_rect)
    # show_image_rects("image", image, [orig_rect])
    show_image("orig_image", image)
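get_rect_intersection returns only the overlapping rectangle. Comparing two detections usually needs the intersection-over-union (IoU) ratio, which is also the quantity NMS thresholds on. The hypothetical rect_iou below is a minimal pure-Python sketch using the same [x, y, w, h] convention as the functions above. It assumes no "+1 pixel" correction.

```python
def rect_iou(rect1, rect2):
    """IoU of two rects given as [x, y, w, h]; returns 0.0 if they do not overlap."""
    # Intersection, computed the same way as get_rect_intersection
    x1 = max(rect1[0], rect2[0])
    y1 = max(rect1[1], rect2[1])
    x2 = min(rect1[0] + rect1[2], rect2[0] + rect2[2])
    y2 = min(rect1[1] + rect1[3], rect2[1] + rect2[3])
    inter = max(0, x2 - x1) * max(0, y2 - y1)
    # Union = area1 + area2 - intersection
    union = rect1[2] * rect1[3] + rect2[2] * rect2[3] - inter
    return inter / union if union > 0 else 0.0


print(rect_iou([0, 0, 10, 10], [5, 5, 10, 10]))  # overlap 5x5 = 25, union 175
```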
# -*-coding: utf-8 -*-
"""
@Project: IntelligentManufacture
@File : file_processing.py
@Author : panjq
@E-mail : [email protected]
@Date : 2019-02-14 15:08:19
"""
import glob
import os
import shutil
import numpy as np
import pandas as pd
def write_data(filename, content_list, mode='w'):
    """Save data to a txt file
    :param filename: file name
    :param content_list: data to save, type -> list
    :param mode: write mode: 'w' or 'a'
    :return: void
    """
    with open(filename, mode=mode, encoding='utf-8') as f:
        for line_list in content_list:
            # convert the list to a space-separated string
            line = " ".join('%s' % item for item in line_list)
            f.write(line + "\n")
def read_data(filename, split=" ", convertNum=True):
    r"""
    Read data from a txt file
    :param filename: file name
    :param split: separator
    :param convertNum: whether to convert numeric strings in each list to int/float
    :return: list of the txt data
    Python has three functions that remove leading/trailing characters or whitespace:
    strip: removes leading and trailing characters/whitespace (including \n, \r, \t, ' ': newline, carriage return, tab, space)
    lstrip: removes leading characters/whitespace (same set)
    rstrip: removes trailing characters/whitespace (same set)
    Note: these functions only remove characters at the ends; characters in the middle are kept.
    """
    with open(filename, mode="r", encoding='utf-8') as f:
        content_list = f.readlines()
    if split is None:
        return [content.rstrip() for content in content_list]
    content_list = [content.rstrip().split(split) for content in content_list]
    if convertNum:
        for i, line in enumerate(content_list):
            line_data = []
            for l in line:
                if is_int(l):  # str.isdigit() would only detect non-negative integers; is_int also accepts '-2'
                    line_data.append(int(l))
                elif is_float(l):  # check whether it is a decimal number
                    line_data.append(float(l))
                else:
                    line_data.append(l)
            content_list[i] = line_data
    return content_list
def is_int(s):
    # check whether the string is an integer
    try:
        int(s)
        return True
    except ValueError:
        return False
def is_float(s):
    # check whether the string is a number (integer or decimal)
    try:
        float(s)
        return True
    except ValueError:
        return False
def list2str(content_list):
    # join each inner list into a space-separated string
    content_str_list = []
    for line_list in content_list:
        line_str = " ".join('%s' % item for item in line_list)
        content_str_list.append(line_str)
    return content_str_list
def get_images_list(image_dir, postfix=['*.jpg'], basename=False):
    '''
    Get a list of files
    :param image_dir: image directory
    :param postfix: file patterns, may be several, e.g. ['*.jpg','*.png']
    :param basename: return file names (True) or full file paths (False)
    :return:
    '''
    images_list = []
    for pattern in postfix:
        image_format = os.path.join(image_dir, pattern)
        image_list = glob.glob(image_format)
        if image_list:
            images_list += image_list
    images_list = sorted(images_list)
    if basename:
        images_list = get_basename(images_list)
    return images_list
def get_basename(file_list):
    dest_list = []
    for file_path in file_list:
        basename = os.path.basename(file_path)
        dest_list.append(basename)
    return dest_list
def copyfile(srcfile, dstfile):
    if not os.path.isfile(srcfile):
        print("%s does not exist!" % (srcfile))
    else:
        fpath, fname = os.path.split(dstfile)  # split into directory and file name
        if fpath and not os.path.exists(fpath):
            os.makedirs(fpath)  # create the destination directory
        shutil.copyfile(srcfile, dstfile)  # copy the file
        # print("copy %s -> %s" % (srcfile, dstfile))
def merge_list(data1, data2):
    '''
    Merge two lists element-wise: each pair d1 + d2 is concatenated
    :param data1:
    :param data2:
    :return: the merged list, or None if the lengths differ
    '''
    if not len(data1) == len(data2):
        return
    all_data = []
    for d1, d2 in zip(data1, data2):
        all_data.append(d1 + d2)
    return all_data
def split_list(data, split_index=1):
    '''
    Split each element of data into two parts at split_index
    :param data: list
    :param split_index: split position
    :return:
    '''
    data1 = []
    data2 = []
    for d in data:
        d1 = d[0:split_index]
        d2 = d[split_index:]
        data1.append(d1)
        data2.append(d2)
    return data1, data2
def getFilePathList(file_dir):
    '''
    Get the paths of all files under file_dir, including files in subdirectories
    :param file_dir:
    :return:
    '''
    filePath_list = []
    for walk in os.walk(file_dir):
        # walk = (current directory, subdirectory names, file names)
        part_filePath_list = [os.path.join(walk[0], file) for file in walk[2]]
        filePath_list.extend(part_filePath_list)
    return filePath_list
def get_files_list(file_dir, postfix='ALL'):
    '''
    Get all files under file_dir (including subdirectories) whose extension is postfix
    :param file_dir:
    :param postfix: extension such as 'jpg' or '.png' (only the part after the last '.' is used); 'ALL' returns every file
    :return: sorted list of file paths
    '''
    postfix = postfix.split('.')[-1]
    file_list = []
    filePath_list = getFilePathList(file_dir)
    if postfix == 'ALL':
        file_list = filePath_list
    else:
        for file in filePath_list:
            basename = os.path.basename(file)  # file name without the directory
            postfix_name = basename.split('.')[-1]
            if postfix_name == postfix:
                file_list.append(file)
    file_list.sort()
    return file_list
def gen_files_labels(files_dir, postfix='ALL'):
    '''
    Get the paths of all files under files_dir and their labels, where a file's label is the name of its parent folder.
    Files of the same class go in one folder under files_dir, and that folder's name is their label
    :param files_dir:
    :param postfix: file extension
    :return: filePath_list, the paths of all files; label_list, the corresponding labels
    '''
    # filePath_list = getFilePathList(files_dir)
    filePath_list = get_files_list(files_dir, postfix=postfix)
    print("files nums:{}".format(len(filePath_list)))
    # collect the labels of all samples
    label_list = []
    for filePath in filePath_list:
        label = os.path.basename(os.path.dirname(filePath))  # name of the parent folder
        label_list.append(label)
    labels_set = list(set(label_list))
    print("labels:{}".format(labels_set))
    # count the samples per label
    # print(pd.Series(label_list).value_counts())
    return filePath_list, label_list
def decode_label(label_list, name_table):
    '''
    Decode labels into names using name_table (label index -> name)
    :param label_list:
    :param name_table:
    :return:
    '''
    name_list = []
    for label in label_list:
        name = name_table[label]
        name_list.append(name)
    return name_list
def encode_label(name_list, name_table):
    '''
    Encode names into labels, i.e. their index in name_table
    :param name_list:
    :param name_table:
    :return:
    '''
    label_list = []
    for name in name_list:
        index = name_table.index(name)
        label_list.append(index)
    return label_list
if __name__ == '__main__':
    filename = 'test.txt'
    w_data = [['1.jpg', 'dog', 200, 300, 1.0], ['2.jpg', 'dog', 20, 30, -2]]
    print("w_data=", w_data)
    write_data(filename, w_data, mode='w')
    r_data = read_data(filename)
    print('r_data=', r_data)
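get_files_list and gen_files_labels can be combined into one pathlib-based helper. The hypothetical list_files_with_labels below is a sketch that walks the tree with Path.rglob, filters extensions case-insensitively (so '.PNG' also matches '.png'), and takes each file's parent folder name as its label.

```python
from pathlib import Path


def list_files_with_labels(root, suffixes=None):
    """Recursively list files under root; the label is each file's parent folder name.

    suffixes: e.g. {'.jpg', '.png'} (case-insensitive); None keeps every file.
    Returns (paths, labels), sorted by path.
    """
    wanted = {s.lower() for s in suffixes} if suffixes else None
    paths = sorted(
        p for p in Path(root).rglob('*')
        if p.is_file() and (wanted is None or p.suffix.lower() in wanted)
    )
    labels = [p.parent.name for p in paths]
    return [str(p) for p in paths], labels
```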
import datetime
import logging
import sys
import time
'''
url: https://cuiqingcai.com/6080.html
Log levels: debug, info, warning, error and critical
'''
# logging.basicConfig(level=logging.DEBUG,
#                     filename='output.log',
#                     datefmt='%Y/%m/%d %H:%M:%S',
#                     format='%(asctime)s - %(name)s - %(levelname)s - %(lineno)d - %(module)s - %(message)s')
logging.basicConfig(level=logging.INFO, format='%(asctime)s - %(filename)s - %(funcName)s - %(levelname)s - %(message)s')
logger = logging.getLogger(__name__)
def RUN_TIME(deta_time):
    '''
    Convert a datetime.timedelta to milliseconds
    :param deta_time: datetime.timedelta
    :return: elapsed time in ms
    '''
    # total_seconds() also counts .days, which .seconds * 1000 + .microseconds / 1000 would ignore
    time_ = deta_time.total_seconds() * 1000
    return time_
def TIME():
    return datetime.datetime.now()
if __name__ == '__main__':
    T0 = TIME()
    # do something
    time.sleep(5)
    T1 = TIME()
    print("run time:{}ms".format(RUN_TIME(T1 - T0)))
    logger.info('This is a log info')
    logger.debug('Debugging')  # not shown: the level is INFO
    logger.warning('Warning exists')
    logger.error('Finish')
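Timing and logging can be combined. The hypothetical log_run_time context manager below is a sketch built on time.perf_counter. Unlike datetime.now(), perf_counter is monotonic and meant for measuring durations. The elapsed time is logged even if the timed block raises. The logger level is set explicitly because logging.basicConfig does nothing when the root logger is already configured.

```python
import logging
import time
from contextlib import contextmanager

logging.basicConfig(level=logging.INFO,
                    format='%(asctime)s - %(levelname)s - %(message)s')
logger = logging.getLogger(__name__)
logger.setLevel(logging.INFO)  # do not rely on basicConfig having taken effect


@contextmanager
def log_run_time(name, log=logger):
    """Log how long the with-block took, in milliseconds."""
    start = time.perf_counter()
    try:
        yield
    finally:
        elapsed_ms = (time.perf_counter() - start) * 1000
        # lazy %-formatting: the message is only built if INFO is enabled
        log.info("%s run time: %.1f ms", name, elapsed_ms)


with log_run_time("sleep"):
    time.sleep(0.1)
```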
https://www.cnblogs.com/king-lps/p/9031568.html
https://github.com/SirLPS/NMS
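For reference, here is a minimal CPU-only sketch of greedy non-maximum suppression in pure Python. It is not the code from the linked repository, which also covers a GPU version. Boxes are [x1, y1, x2, y2] with areas computed as (x2 - x1) * (y2 - y1), without the "+1 pixel" convention some implementations use. The algorithm repeatedly keeps the highest-scoring box and drops the remaining boxes whose IoU with it exceeds iou_threshold.

```python
def nms(boxes, scores, iou_threshold=0.5):
    """Greedy NMS. boxes: list of [x1, y1, x2, y2]; scores: one per box.

    Returns the indices of the kept boxes, highest score first.
    """
    def area(b):
        return max(0, b[2] - b[0]) * max(0, b[3] - b[1])

    # candidate indices sorted by score, best first
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    while order:
        best = order.pop(0)
        keep.append(best)
        remaining = []
        for i in order:
            xx1 = max(boxes[best][0], boxes[i][0])
            yy1 = max(boxes[best][1], boxes[i][1])
            xx2 = min(boxes[best][2], boxes[i][2])
            yy2 = min(boxes[best][3], boxes[i][3])
            inter = max(0, xx2 - xx1) * max(0, yy2 - yy1)
            union = area(boxes[best]) + area(boxes[i]) - inter
            iou = inter / union if union > 0 else 0.0
            if iou <= iou_threshold:  # not suppressed by the current best box
                remaining.append(i)
        order = remaining
    return keep


boxes = [[0, 0, 10, 10], [1, 1, 11, 11], [20, 20, 30, 30]]
scores = [0.9, 0.8, 0.7]
print(nms(boxes, scores, 0.5))  # box 1 overlaps box 0 with IoU 81/119, so it is dropped
```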