Organizing the Case Western Reserve University (CWRU) bearing data with Python to build a dataset

  • 1 Introduction
  • 2 Building the dataset
    • 2.1 Downloading and preprocessing the data
    • 2.2 The code

1 Introduction

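Most papers build a dataset from the CWRU data in much the same way, so I'll describe only the most common approach: cut samples out of each signal with a sliding window, e.g. every 2048 consecutive sampling points form one sample. (A small gripe: I had assumed the experiments covered many bearings, but it turns out there is only one bearing per condition, and the samples are cut so that they partially overlap. Can results trained on samples like these really be trusted?)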
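
The sliding-window idea can be sketched in a few lines of NumPy. This is an illustration, not the author's code: the helper name sliding_window_samples is hypothetical, and it relies on numpy.lib.stride_tricks.sliding_window_view (NumPy 1.20 or later). The defaults use a 2048-point window and a 1000-point stride, the values in cut_samples in section 2.2; whenever the stride is smaller than the window, consecutive samples share window - stride points (1048 here), which is the partial overlap mentioned above.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def sliding_window_samples(signal, window=2048, stride=1000, n_samples=None):
    """Cut a 1-D signal into (possibly overlapping) windows.

    Hypothetical helper: windows start at 0, stride, 2*stride, ...
    Consecutive windows overlap by window - stride points when stride < window.
    """
    signal = np.asarray(signal)
    # Every length-`window` window, one per start index (a read-only view).
    all_windows = sliding_window_view(signal, window)
    # Keep one window every `stride` points.
    samples = all_windows[::stride]
    if n_samples is not None:
        samples = samples[:n_samples]
    # Copy so the result is an ordinary, writable array.
    return samples.copy()


signal = np.arange(121048, dtype=float)
samples = sliding_window_samples(signal)
print(samples.shape)   # (120, 2048)
print(samples[1, 0])   # 1000.0
```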

2 Building the dataset

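Since I have been experimenting with LSTMs lately, the final dataset is additionally split into time steps; if you don't need that, just skip this step.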
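
Splitting into time steps turns each 2048-point sample into a sequence of shorter frames that the LSTM reads one by one. With 128-point frames and a hop of 64 points (the values get_timesteps uses), each sample yields (2048 - 128) / 64 + 1 = 31 frames. Below is a vectorized sketch of the same framing; the name to_timesteps is hypothetical.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def to_timesteps(samples, frame=128, hop=64):
    """Split each row of an (N, L) array into overlapping frames.

    Returns shape (N, (L - frame) // hop + 1, frame);
    frame k of a sample covers points [k*hop, k*hop + frame).
    """
    samples = np.asarray(samples)
    # (N, L - frame + 1, frame): one frame starting at every position.
    frames = sliding_window_view(samples, frame, axis=1)
    # Keep only frames starting at multiples of `hop`, then copy out of the view.
    return frames[:, ::hop, :].copy()


n_frames = (2048 - 128) // 64 + 1
print(n_frames)                        # 31
x = np.random.default_rng(0).random((4, 2048))
print(to_timesteps(x).shape)           # (4, 31, 128)
```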

2.1 Downloading and preprocessing the data

Official link: https://csegroups.case.edu/bearingdatacenter/pages/download-data-file
Or the copy I uploaded: https://pan.baidu.com/s/1Faygebmjw3kEPli6ikM0eA
Extraction code: gdsk

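I needed ten conditions with 120 samples each and 2048 sampling points per sample, so I chose the 12 kHz drive-end data and did some simple processing in MATLAB, ending up with a .mat file holding a 10×121048 matrix named signals (one row of 121048 points per condition). (Please don't ask me how I came up with 121048...)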
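
The MATLAB step itself isn't shown, so here is a rough Python equivalent. It rests on assumptions that are not from the original: each downloaded file has already been loaded with scipy.io.loadmat into a dict, and the drive-end signal is stored under a variable whose name ends in _DE_time (such as X097_DE_time); check your own files, because this sketch simply takes the first matching key. The helper build_signal_matrix is hypothetical. It also shows where 121048 comes from: with 120 windows of 2048 points whose start moves by 1000 points each time (the stride in cut_samples), the last window starts at 119 × 1000 and ends at 119 × 1000 + 2048 = 121048.

```python
import numpy as np

WINDOW = 2048      # points per sample
STRIDE = 1000      # shift between the starts of consecutive samples
N_SAMPLES = 120    # samples per condition

# The last window starts at (N_SAMPLES - 1) * STRIDE and must still fit.
SIGNAL_LEN = (N_SAMPLES - 1) * STRIDE + WINDOW   # 119 * 1000 + 2048 = 121048


def build_signal_matrix(mat_dicts, suffix="_DE_time", length=SIGNAL_LEN):
    """Stack one truncated signal per condition into a (conditions, length) array.

    mat_dicts: one dict per condition, as returned by scipy.io.loadmat.
    Assumption: the drive-end signal is under the first key ending in `suffix`.
    """
    rows = []
    for d in mat_dicts:
        key = next((k for k in d if k.endswith(suffix)), None)
        if key is None:
            raise KeyError(f"no variable ending in {suffix!r}")
        signal = np.asarray(d[key]).ravel()   # loadmat returns (n, 1) column vectors
        if signal.size < length:
            raise ValueError(f"{key} has {signal.size} points, need {length}")
        rows.append(signal[:length])
    return np.vstack(rows)


# Stand-in dicts with made-up keys; real ones would come from scio.loadmat(path).
fake = [{f"X{i:03d}_DE_time": np.zeros((125000, 1))} for i in range(10)]
signals = build_signal_matrix(fake)
print(SIGNAL_LEN, signals.shape)   # 121048 (10, 121048)
```

Because the code in section 2.2 reads data['signals'], the matrix would then be saved with scio.savemat(path, {'signals': signals}).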

2.2 The code

import numpy as np
import scipy.io as scio
from random import shuffle

def normalize(data):
    '''Min-max normalization to the range [0, 1].
    :param data : a 1-D array (one 2048-point sample) to be normalized
    '''
    s = (data - np.min(data)) / (np.max(data) - np.min(data))

    return s


def cut_samples(org_signals):
    '''Cut the ten original signals into 10*120*2048 samples and normalize each sample.
    Window length 2048, stride 1000, so consecutive samples overlap by 1048 points.
    :param org_signals : a 10*121048 matrix holding the ten original signals
    '''

    results=np.zeros(shape=(10,120,2048))
    temporary_s=np.zeros(shape=(120,2048))

    for i in range(10):
        s=org_signals[i]
        for x in range(120):
            temporary_s[x]=s[1000*x:2048+1000*x]
            temporary_s[x]=normalize(temporary_s[x])     # normalize each sample along the way
        results[i]=temporary_s

    return results


def make_datasets(org_samples):
    '''Take the 10*120*2048 raw samples and return labeled training (75%) and test (25%) sets.'''

    train_x=np.zeros(shape=(10,90,2048))
    train_y=np.zeros(shape=(10,90,10))
    test_x=np.zeros(shape=(10,30,2048))
    test_y=np.zeros(shape=(10,30,10))
    for i in range(10):
        s=org_samples[i]
        # Shuffle the order of this class's samples
        index_s = [a for a in range(len(s))]
        shuffle(index_s)
        s=s[index_s]
        # Split each class into a training set and a test set
        train_x[i]=s[:90]
        test_x[i]=s[90:120]
        # Fill in the one-hot label
        label = np.zeros(shape=(10,))
        label[i] = 1
        train_y[i, :] = label
        test_y[i, :] = label

    # Merge the ten classes' training and test sets, then shuffle each.
    # reshape stacks the classes one after another in class order.
    x1 = train_x.reshape(-1, 2048)   # (900, 2048)
    y1 = train_y.reshape(-1, 10)     # (900, 10)
    x2 = test_x.reshape(-1, 2048)    # (300, 2048)
    y2 = test_y.reshape(-1, 10)      # (300, 10)

    index_x1= [i for i in range(len(x1))]
    index_x2= [i for i in range(len(x2))]
    shuffle(index_x1)
    shuffle(index_x2)
    x1=x1[index_x1]
    y1=y1[index_x1]
    x2=x2[index_x2]
    y2=y2[index_x2]

    return x1, y1, x2, y2    # training samples, training labels, test samples, test labels

def get_timesteps(samples):
    '''Split each 2048-point sample into 31 time steps of 128 points (hop 64).
    :param samples : an N*2048 matrix of samples; returns an N*31*128 array
    '''

    s1 = np.zeros(shape=(31, 128))
    s2 = np.zeros(shape=(len(samples), 31, 128))
    for i in range(len(samples)):
        sample = samples[i]
        for a in range(31):
            s1[a]= sample[64*a:128+64*a]
        s2[i]=s1

    return s2


# Load the raw data, process it, and save the result
dataFile= 'G://study of machine learing//deep learning//LSTM//datasets//十个原始信号.mat'
data=scio.loadmat(dataFile)
org_signals=data['signals']
org_samples=cut_samples(org_signals)
train_x, train_y, test_x, test_y=make_datasets(org_samples)
train_x= get_timesteps(train_x)
test_x= get_timesteps(test_x)

saveFile = 'G://study of machine learing//deep learning//LSTM//datasets//datasets.mat'
scio.savemat(saveFile, {'train_x':train_x, 'train_y':train_y, 'test_x':test_x, 'test_y':test_y})

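If you don't need time steps, skip get_timesteps. The result is then a shuffled training set (900×2048) and test set (300×2048), with one-hot labels of shape 900×10 and 300×10; with get_timesteps, the samples become 900×31×128 and 300×31×128.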
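
To check the output, the saved file can be read back with scipy.io.loadmat and the one-hot labels turned into class indices with argmax. The sketch below round-trips small random stand-in arrays through a temporary file so it runs without the real data; with the real datasets.mat, train_x should come back as 900×31×128 (or 900×2048 without time steps) and train_y as 900×10.

```python
import os
import tempfile

import numpy as np
import scipy.io as scio

# Small stand-in arrays with the same layout as the real datasets.mat.
rng = np.random.default_rng(0)
train_x = rng.random((5, 31, 128))
train_y = np.eye(10)[[0, 3, 3, 7, 9]]      # one-hot labels

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "datasets.mat")
    scio.savemat(path, {"train_x": train_x, "train_y": train_y})
    data = scio.loadmat(path)

x = data["train_x"]
labels = data["train_y"].argmax(axis=1)    # one-hot -> class index 0..9
print(x.shape, labels.tolist())            # (5, 31, 128) [0, 3, 3, 7, 9]
```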

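Writing this took real effort; if you repost it, please credit the source: https://blog.csdn.net/weixin_44620044/article/details/106877805
I'm a beginner, so corrections from more experienced readers are very welcome.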
