Some time ago I ran a pilot project that used data mining to tune process parameters and raise the output of a lead-powder machine. This post is a retrospective of the main steps.
1. Client: a manufacturer of automotive start-stop batteries.
2. Process: the first stage of lead-acid battery production is making lead powder. In short, lead ingots are cut into pellets, the pellets are fed into a rotating drum, and the drum turns at a fixed speed so the pellets rub against each other and heat up; the pellet surfaces oxidize into crystals, which flake off as lead powder.
3. Main production equipment: the lead-pellet making and storage system, the lead-powder making and storage system, and the lead-powder conveying and storage system.
4. Process parameters: main-motor power, pellet-silo weight, inlet air pressure, drum weight, powder-outlet temperature, bag-filter temperature, bag-filter pressure differential, absolute-filter pressure differential, system air pressure, ambient temperature, and ambient humidity.
5. Data collection: PLC data was logged to a database at one record per minute, for three months in total.
6. Learning the process: to understand how lead-powder production works and where its problems lie, I first observed the operators on the shop floor and asked questions, then bought two books on the subject for reference.
7. Modeling approach:
1) First build a model from one month of process data, with yield as the label and the other variables (main-motor power, pellet-silo weight, inlet air pressure, drum weight, powder-outlet temperature, bag-filter temperature, bag-filter pressure differential, absolute-filter pressure differential, system air pressure, ambient temperature, ambient humidity) as inputs; then have a genetic algorithm output optimal values for 'inlet air pressure' and 'drum weight', which the operators apply on the machine.
2) The equipment does not log yield directly, so it has to be computed, e.g. one minute's yield = drum weight this minute minus drum weight the next minute; the drum gets lighter because lead turns into powder and is conveyed off to the silo.
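As a quick illustration of that calculation (the weight values below are made up), the per-minute yield is just the backward difference of the drum-weight series:

```python
import numpy as np

# Hypothetical drum-weight readings, one per minute (kg)
drum_weight = np.array([6030.0, 6024.5, 6019.0, 6014.2, 6008.9])

# Yield in minute t = drum weight at t minus drum weight at t+1;
# the drum loses mass as lead oxidizes to powder and is conveyed away.
# (In practice, minutes where the drum is being refilled must be masked out.)
yield_per_min = drum_weight[:-1] - drum_weight[1:]
print(yield_per_min)
```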
3) Feature engineering: the parameters listed in section 4 are all related to yield and there are only a few of them, so dimensionality reduction and similar feature engineering are unnecessary.
4) Process characteristic, time lag: lead-powder production has an inherent time lag (typical of process industries), and even the operators could not say exactly how long it is, so the value was validated repeatedly during modeling until the best one was found (repeated tests showed the model performed best with the lag set to 30 minutes).
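One way to narrow down such a lag before training full models (my own illustration, not the project's actual code) is to shift the yield label by candidate delays and check which shift correlates best with a driving parameter:

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
n = 200
df = pd.DataFrame({"inlet_pressure": rng.normal(size=n)})
# Synthetic yield that responds to inlet pressure 30 minutes later
df["yield"] = np.roll(df["inlet_pressure"].to_numpy(), 30) + rng.normal(scale=0.1, size=n)

def lag_correlation(df, feature, label, lags):
    """Correlate the feature with the label shifted back by each candidate lag."""
    return {lag: df[feature].corr(df[label].shift(-lag)) for lag in lags}

scores = lag_correlation(df, "inlet_pressure", "yield", range(0, 61, 10))
best_lag = max(scores, key=lambda k: abs(scores[k]))
print(best_lag)
```

On this synthetic series the strongest correlation appears at the built-in 30-minute delay; on real plant data the peak is blurrier, which matches having to test lag values repeatedly.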
5) Data cleaning: the data was exported from the database to CSV, and null values and outliers were deleted directly in the spreadsheet.
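Deleting rows by hand works for a one-off, but the same cleaning is reproducible in pandas; a minimal sketch (the column names and physical limits here are hypothetical):

```python
import numpy as np
import pandas as pd

# Hypothetical minute-level log; in the project this came from the PLC database
df = pd.DataFrame({
    "main_power": [87.2, np.nan, 87.7, 250.0, 87.8],
    "drum_weight": [133.9, 132.1, np.nan, 133.5, 132.9],
})

df = df.dropna()  # drop rows with missing readings

# Drop outliers by keeping only rows inside known physical operating ranges
limits = {"main_power": (80, 95), "drum_weight": (120, 140)}
for col, (lo, hi) in limits.items():
    df = df[df[col].between(lo, hi)]

print(len(df))
```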
6) Product quality: the client required that yield be raised without compromising quality, so a quality model was needed as well; the key quality characteristic of lead powder is its degree of oxidation, which affects battery performance.
7) Algorithm plan: first, model the relationship between the input variables and yield; since the data is a time series, recurrent neural networks are a good fit, so try RNN, LSTM and GRU in turn. Second, search for the optimal parameter values with a genetic algorithm, using the neural network's prediction as the fitness function (optimize [inlet air pressure, drum weight] to maximize yield).
Below is the Python modeling code:
1. Building the recurrent neural network directly in Keras. A fair amount of preprocessing code precedes this; it is omitted for brevity.
from keras.models import Sequential
from keras.layers import Dense, LSTM, Dropout

model = Sequential()
# input_shape = (time steps, features): a 30-minute window of the 11 process parameters
model.add(LSTM(5, activation='relu', return_sequences=True, input_shape=(30, 11)))
model.add(Dropout(0.01))
model.add(LSTM(5, return_sequences=True))
model.add(LSTM(5, return_sequences=True))
model.add(LSTM(5))
model.add(Dense(1))
model.compile(optimizer='adam', loss='mse')
model.fit(trainset_scalex, trainset_scaley, epochs=50, batch_size=64, verbose=1)
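For reference, the omitted preprocessing essentially has to produce the scaled 3-D tensors that model.fit consumes: scale each column to [0, 1] and slice the series into sliding windows. A minimal numpy sketch (in the project the scaling was done with fitted scaler objects that were pickled for reuse; all names and shapes here are my own assumptions):

```python
import numpy as np

def make_windows(features, labels, window):
    """Slice a (T, n_features) array into (T-window+1, window, n_features) sequences,
    pairing each window with the label of its last minute."""
    xs = [features[t - window:t] for t in range(window, len(features) + 1)]
    ys = [labels[t - 1] for t in range(window, len(features) + 1)]
    return np.array(xs), np.array(ys)

rng = np.random.default_rng(0)
raw_x = rng.random((100, 11))  # 100 minutes x 11 process parameters
raw_y = rng.random((100, 1))   # computed per-minute yield

# Min-max scale every column to [0, 1] (what sklearn's MinMaxScaler does)
scaled_x = (raw_x - raw_x.min(axis=0)) / (raw_x.max(axis=0) - raw_x.min(axis=0))
scaled_y = (raw_y - raw_y.min(axis=0)) / (raw_y.max(axis=0) - raw_y.min(axis=0))

trainset_scalex, trainset_scaley = make_windows(scaled_x, scaled_y, window=30)
print(trainset_scalex.shape, trainset_scaley.shape)
```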
After a hundred-odd experiments, some patterns emerged:
1) GRU models were the most accurate and converged fastest.
2) The adam optimizer worked best.
3) Dropout regularization was unstable: sometimes accuracy was high, sometimes it drifted badly.
4) 3 or 4 layers worked well; models deeper than 4 layers performed worse.
5) 3 to 5 cells per layer worked best.
6) relu gave the best accuracy and training speed. I will stick to relu or its variants in future projects; my guess is that sigmoid will gradually be marginalized as an activation function in deep learning and relu will become (or already is) the mainstream choice.
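Putting findings 1, 4, 5 and 6 together, the winning configuration looks roughly like the following (a sketch combining those findings, not the exact production model):

```python
from keras.models import Sequential
from keras.layers import Dense, GRU

model = Sequential()
# 3 GRU layers of 5 cells each with relu, fed a 30-minute window of 11 parameters
model.add(GRU(5, activation='relu', return_sequences=True, input_shape=(30, 11)))
model.add(GRU(5, activation='relu', return_sequences=True))
model.add(GRU(5, activation='relu'))
model.add(Dense(1))
model.compile(optimizer='adam', loss='mse')
```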
On tuning: the two most troublesome tasks in this project were data cleaning and hyperparameter tuning. Tuning was complex and time-consuming, and despite much reading I found no direct, reliable recipe for tuning neural networks. It is a direction I need to keep working on.
2. Using scikit-opt (sko), a third-party open-source genetic algorithm library (reportedly written by an engineer at Alibaba).
The GA solves for [inlet air pressure] and [drum weight]; the objective function is the neural network model (maximizing yield).
import numpy as np
import time
import datetime
import pickle
import json as js
import os
from sko.GA import GA
from sko.tools import set_run_mode
from keras.models import load_model
os.environ['TF_CPP_MIN_LOG_LEVEL'] = '2'  # silence TensorFlow's info logs
# The latest 30 minutes of readings (11 parameters per row), pasted in as a JSON string
y="[[87.2648,48.1824,6016,133.984,133.648,107.258,15.752,31.9629,341.914,318,525],[87.2227,47.184,6020,132.344,132.086,107.211,15.5957,31.7969,335.664,318,526],[87.1945,43.0821,6007,131.406,131.328,107.414,14.1895,31.4258,328.594,318,526],[87.1523,48.1906,6005,133.484,133.258,107.266,16.2793,31.9336,341.602,318,527],[87.6656,42.1265,6030,132.383,132.156,107.375,14.6582,31.8164,331.953,318,526],[87.8133,40.5205,5989,133.273,133.016,107.195,16.1816,32.0996,340.078,317,528],[87.8836,41.905,6027,133.969,133.773,107.258,15.8301,32.1289,339.258,318,528],[87.7992,41.0334,6019,132.914,132.672,107.195,16.1523,31.7871,334.258,318,527],[87.9398,40.4968,6032,132.938,132.641,107.188,15.498,31.7676,333.359,318,527],[87.7852,39.8991,6028,132.188,131.719,107.133,14.7754,31.5234,326.914,318,527],[87.6094,39.1766,5997,131.391,130.625,107.133,14.4141,31.6992,325.508,318,527],[87.7922,40.8723,6030,134.398,133.914,106.82,15.8105,32.041,338.906,318,527],[87.9328,41.2216,6033,133.984,133.508,106.984,16.0352,31.8164,336.836,318,528],[87.8203,40.5352,5993,133.078,132.82,106.953,16.5918,31.582,335.586,318,529],[87.6516,40.4907,6008,133.359,132.922,107.023,14.6191,32.0215,331.836,318,527],[87.7711,41.8055,6016,133.469,133.148,106.938,15.7715,32.0605,336.523,317,529],[87.743,41.651,6027,133.984,133.836,107,15.9473,32.1094,340.625,318,529],[87.8484,39.9414,6045,132.516,132.523,107.188,15.625,31.5234,329.336,318,531],[87.9117,41.3693,6017,133.203,133.148,107.125,15.0488,31.9434,333.164,318,528],[87.6867,39.8677,6024,131.484,131.328,107.414,14.7656,31.7188,327.969,318,528],[87.8695,41.3021,6040,133.844,133.742,107.188,15.625,32.2363,337.773,318,527],[87.7922,39.7916,6029,132.75,132.57,107.523,15.1074,31.5527,328.008,318,528],[87.7008,40.4896,6010,131.633,131.273,107.383,14.7266,31.7773,329.57,318,528],[87.7781,41.11,6034,133.766,133.438,107.219,15.3906,32.2754,336.016,318,528],[87.7711,41.9102,6009,133.711,133.422,107.289,16.2988,32.0996,339.766,318,527],[87.7711,39.5726,6012,131.289,130.523,107.453,14.3359,31.2695,322.539,318,528],[87.6727,41.3325,6006,134.359,133.781,107.281,16.3184,32.1777,339.141,318,529],[87.8766,39.9349,6021,131.773,131.305,107.203,16.9434,31.2598,332.695,318,531],[87.8133,41.179,6005,133.336,133.047,106.953,15.4102,32.0996,335.156,318,529],[87.6797,40.2437,6001,132.914,132.57,107.094,15.2148,32.4121,333.789,319,529]]"
def dtcs():
    # Parse the most recent 30 minutes of process data
    y1 = np.array(js.loads(y))
    data = y1
    x1 = y1[29, 1]  # current inlet air pressure
    x2 = y1[29, 3]  # current drum weight
    # Load the pickled scalers, used to (de)normalize model inputs and outputs
    scalerfilex = 'c:\\sc0906ax.sav'
    scalerx = pickle.load(open(scalerfilex, 'rb'))
    scalerx.clip = False
    scalerfiley = 'c:\\sc0906ay.sav'
    scalery = pickle.load(open(scalerfiley, 'rb'))
    scalery.clip = False
    # Load the trained yield-prediction model
    modelcl = load_model("c:\\modelchanliang13-0906a.h5")

    # A deliberately expensive function, used to exercise the library's run modes;
    # task_type can be 'io_costly' or 'cpu_costly'
    def generate_costly_function(task_type='io_costly'):
        if task_type == 'io_costly':
            def costly_function():
                time.sleep(0.1)
                return 1
        else:
            def costly_function():
                n = 10000
                step1 = [np.log(i + 1) for i in range(n)]
                step2 = [np.power(i, 1.1) for i in range(n)]
                return sum(step1) + sum(step2)
        return costly_function

    costly_function = generate_costly_function(task_type='cpu_costly')

    # Vectorized objective function: p is the whole population, shape (pop_size, 2)
    def vtarget(p):
        costly_function()
        p = np.array(p)
        # Replicate the 30-row window once per chromosome in the population
        d = np.tile(data, (p.shape[0], 1))
        for i in range(p.shape[0]):
            # Overwrite inlet air pressure (column 1) and drum weight (column 3)
            # in rows 14-28 of window i with the chromosome's candidate values
            d[i * 30 + 14:i * 30 + 29, 1] = p[i, 0]
            d[i * 30 + 14:i * 30 + 29, 3] = p[i, 1]
        p1 = scalerx.transform(d)
        p2cl = p1.reshape(-1, 30, 11)  # reshape to the LSTM model's input shape
        predict_t1 = modelcl.predict(p2cl)
        predict_data = scalery.inverse_transform(predict_t1)
        # Distance between the predicted yield and the 500 target; the GA minimizes this
        v = np.abs(predict_data[:, 0] - 500)
        return v

    # Enable the library's vectorized run mode for vtarget
    set_run_mode(vtarget, 'vectorization')
    # Run the GA: 2 decision variables, bounded near the current operating point
    start_time = datetime.datetime.now()
    ga = GA(func=vtarget, n_dim=2, size_pop=150, max_iter=100,
            lb=[x1 - 3, x2 - 2], ub=[x1 + 3, x2 + 2], precision=1e-3)
    best_x, best_y = ga.run()
    best = best_x.tolist()
    print([best[0], best[1]])
    return [best[0], best[1]]
The library runs very fast in vectorization mode (mode = 'vectorization').
The overall flow is fairly clear: build a recurrent neural network model first, then let the genetic algorithm call it to score each chromosome.
3. Application
A Java front end calls the Python model so users can see the recommended process parameters at a glance.
That is a brief account of the technical implementation; the next post will cover the project's results and lessons learned.