Most papers build datasets from the CWRU data in roughly the same way, so here is the most common approach: slide a window along the signal to cut out samples, e.g. 2048 sampling points per sample. (A quick gripe: I originally assumed this experiment had been run on many bearings, but it turns out each condition has only a single bearing, and the samples are cut with partial overlap. Can models trained on samples like these really be trusted?)
Since I have been experimenting with LSTMs lately, the final dataset is also split into time steps; if you don't need that, just skip this step.
Official link: https://csegroups.case.edu/bearingdatacenter/pages/download-data-file
Or my re-upload: https://pan.baidu.com/s/1Faygebmjw3kEPli6ikM0eA
Extraction code: gdsk
My requirement was ten condition classes, 120 samples per class, and 2048 sampling points per sample, so I chose the 12 kHz drive-end data, did some simple preprocessing in MATLAB, and ended up with a 10×121048 .mat file (please don't ask how 121048 was computed...).
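In case 121048 looks mysterious: with a 2048-point window and a 1000-point hop, 120 overlapping windows need exactly 2048 + 119 × 1000 points per signal. A minimal sketch of that arithmetic (the window and hop values are the ones used in the code below; nothing else is assumed):

```python
# Window arithmetic behind the 10 x 121048 matrix:
# 120 windows of 2048 points, hop of 1000 points.
n_windows, win, hop = 120, 2048, 1000
needed = win + (n_windows - 1) * hop
print(needed)  # 121048
```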
import numpy as np
import scipy.io as scio
from random import shuffle


def normalize(data):
    '''(0, 1) normalization
    :param data: a 1*2048 vector to be normalized
    '''
    s = (data - min(data)) / (max(data) - min(data))
    return s


def cut_samples(org_signals):
    '''Cut the original signals into 10*120*2048 samples and normalize them.
    :param org_signals: a 10*121048 matrix holding the ten original signals
    '''
    results = np.zeros(shape=(10, 120, 2048))
    temporary_s = np.zeros(shape=(120, 2048))
    for i in range(10):
        s = org_signals[i]
        for x in range(120):
            temporary_s[x] = s[1000 * x:2048 + 1000 * x]
            temporary_s[x] = normalize(temporary_s[x])  # normalize each sample along the way
        results[i] = temporary_s
    return results
def make_datasets(org_samples):
    '''Take the 10*120*2048 raw samples and return a labelled training set (75%)
    and test set (25%).'''
    train_x = np.zeros(shape=(10, 90, 2048))
    train_y = np.zeros(shape=(10, 90, 10))
    test_x = np.zeros(shape=(10, 30, 2048))
    test_y = np.zeros(shape=(10, 30, 10))
    for i in range(10):
        s = org_samples[i]
        # shuffle the samples of this class
        index_s = [a for a in range(len(s))]
        shuffle(index_s)
        s = s[index_s]
        # split each class into a training set and a test set
        train_x[i] = s[:90]
        test_x[i] = s[90:120]
        # fill in the one-hot labels
        label = np.zeros(shape=(10,))
        label[i] = 1
        train_y[i, :] = label
        test_y[i, :] = label

    # merge the training/test sets of the ten classes, then shuffle each
    x1 = train_x[0]
    y1 = train_y[0]
    x2 = test_x[0]
    y2 = test_y[0]
    for i in range(9):
        x1 = np.row_stack((x1, train_x[i + 1]))
        x2 = np.row_stack((x2, test_x[i + 1]))
        y1 = np.row_stack((y1, train_y[i + 1]))
        y2 = np.row_stack((y2, test_y[i + 1]))
    index_x1 = [i for i in range(len(x1))]
    index_x2 = [i for i in range(len(x2))]
    shuffle(index_x1)
    shuffle(index_x2)
    x1 = x1[index_x1]
    y1 = y1[index_x1]
    x2 = x2[index_x2]
    y2 = y2[index_x2]
    return x1, y1, x2, y2  # training samples, training labels, test samples, test labels
def get_timesteps(samples):
    '''Cut each 2048-point sample of train_x / test_x into time steps:
    31 windows of 128 points with a hop of 64 (128 + 30*64 = 2048).
    :param samples: an n*2048 matrix; returns an n*31*128 matrix
    '''
    s1 = np.zeros(shape=(31, 128))
    s2 = np.zeros(shape=(len(samples), 31, 128))
    for i in range(len(samples)):
        sample = samples[i]
        for a in range(31):
            s1[a] = sample[64 * a:128 + 64 * a]
        s2[i] = s1
    return s2
# Read the raw data, process it, and save the result
dataFile = 'G://study of machine learing//deep learning//LSTM//datasets//十个原始信号.mat'
data = scio.loadmat(dataFile)
org_signals = data['signals']
org_samples = cut_samples(org_signals)
train_x, train_y, test_x, test_y = make_datasets(org_samples)
train_x = get_timesteps(train_x)
test_x = get_timesteps(test_x)
saveFile = 'G://study of machine learing//deep learning//LSTM//datasets//datasets.mat'
scio.savemat(saveFile, {'train_x': train_x, 'train_y': train_y, 'test_x': test_x, 'test_y': test_y})
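For what it's worth, the per-sample loop in get_timesteps can also be written without loops using NumPy's sliding_window_view (available since NumPy 1.20). A sketch on dummy data, producing the same windows of 128 points with a hop of 64:

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

samples = np.random.default_rng(0).random((5, 2048))  # 5 dummy 2048-point samples
# all length-128 windows along axis 1, then keep every 64th starting point
windows = sliding_window_view(samples, 128, axis=1)[:, ::64, :]
print(windows.shape)  # (5, 31, 128)
```

This is purely a vectorized alternative; the output matches the loop version window for window.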
If you don't need to split into time steps, just skip that function; in the end you get a shuffled training set (900×2048) and test set (300×2048).
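A side note on the paired shuffle in make_datasets: building an index list and calling shuffle works, but np.random.permutation expresses the same idea a bit more directly, applying one permutation to samples and labels so they stay aligned. A small sketch on made-up arrays (the names x and y here are just for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)                 # seeded for reproducibility
x = np.arange(12, dtype=float).reshape(6, 2)   # 6 dummy "samples"
y = np.eye(6)                                  # matching one-hot "labels"
perm = rng.permutation(len(x))                 # one permutation for both arrays
x, y = x[perm], y[perm]
# after the shuffle, row i of x still carries the label in row i of y
print(np.argmax(y, axis=1))
```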
Writing this up took effort; if you repost it, please credit the source: https://blog.csdn.net/weixin_44620044/article/details/106877805
I'm just a beginner, so corrections from the experts are very welcome.