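Part 3 of this genetic algorithm series showed how to solve the OneMax problem from scratch without any framework, and Part 4 introduced the basics of the DEAP framework. If any definition or term below is unfamiliar, see the earlier articles in the series. This article shows how to solve the OneMax problem with DEAP.
OneMax is the introductory problem for genetic algorithms: given a binary string of fixed length, find the string whose digits have the largest possible sum.
Take a binary string of length 5 as an example.
To a person it is obvious that the sum is largest when every bit is 1 (11111, with a sum of 5), but the genetic algorithm itself has no such knowledge when it tackles the problem.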
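To make the objective concrete, here is a minimal sketch that scores a few length-5 bit strings by summing their bits. The helper name `onemax_fitness` and the sample strings are made up for illustration; they are not part of this series' code.

```python
from itertools import product


def onemax_fitness(bits):
    """Return the OneMax score: the number of 1s in the bit sequence."""
    return sum(bits)


# A few hypothetical candidates of length 5.
candidates = [[0, 0, 0, 0, 0], [1, 0, 1, 0, 0], [1, 1, 0, 1, 1], [1, 1, 1, 1, 1]]
scores = [onemax_fitness(c) for c in candidates]

# Brute force over all 2**5 = 32 strings: the all-ones string scores highest.
best = max(product([0, 1], repeat=5), key=onemax_fitness)

print(scores)  # [0, 2, 4, 5]
print(best)    # (1, 1, 1, 1, 1)
```

Brute force is fine for 5 bits, but a string of length 100 has 2**100 candidates, which is why a search heuristic such as a genetic algorithm is used instead.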
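First, we have to cast this as a genetic algorithm problem; that is, we must define the individual, the population, the selection, crossover, and mutation methods, the fitness function, and so on. For a string of length 100, the definitions used in the code below are: an individual is a list of 100 bits; the population holds 200 individuals; selection is a tournament of size 2; crossover is single-point; mutation flips each bit with probability 1/100; and the fitness of an individual is the sum of its bits.
If these definitions are unfamiliar, revisit Part 2 of this series.
The code is explained step by step below; the complete code appears at the end of the article.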
# 1.load modules
from deap import base,creator,tools,algorithms
import random
import numpy as np
import matplotlib.pyplot as plt
toolbox = base.Toolbox()
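First, we import the modules the genetic algorithm needs and create the toolbox that will hold our functions. Next come the parameters: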
# 2.parameters:
INDIVIDUAL_LENGTH = 100 # length of bit string to be optimized
POPULATION_SIZE = 200
P_CROSSOVER = 0.9 # probability for crossover
P_MUTATION = 0.1 # probability for mutating an individual
MAX_GENERATIONS = 50
random.seed(39)
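The call `random.seed(39)` fixes the state of Python's `random` module, so repeated runs draw the same sequence of random numbers. The sketch below illustrates only that effect; the helper `random_bits` is hypothetical. As far as I know, the DEAP operators used in this article draw from the same `random` module, but exact results may still differ between Python or DEAP versions.

```python
import random


def random_bits(n, seed):
    """Seed the generator, then draw n random bits (hypothetical helper)."""
    random.seed(seed)
    return [random.randint(0, 1) for _ in range(n)]


first = random_bits(10, seed=39)
second = random_bits(10, seed=39)

# Same seed, same sequence of bits.
print(first == second)  # True
```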
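# 3.create individual and population
# (1) genBinary: returns a random bit, 0 or 1
toolbox.register("genBinary", random.randint, 0, 1)
# (2) FitnessMax: single-objective fitness; the positive weight means "maximize"
creator.create("FitnessMax", base.Fitness, weights=(1.0,))
# (3) Individual: a list that carries a FitnessMax instance as its fitness attribute
creator.create("Individual", list, fitness=creator.FitnessMax)
# (4) createIndividual: calls genBinary INDIVIDUAL_LENGTH times and stores the bits in an Individual
toolbox.register("createIndividual", tools.initRepeat, creator.Individual, toolbox.genBinary, INDIVIDUAL_LENGTH)
# (5) createPopulation: builds a list of individuals; the size n is supplied at call time
toolbox.register("createPopulation", tools.initRepeat, list, toolbox.createIndividual)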
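If the DEAP functions used above still feel unclear, I recommend reading the previous two articles in this series or looking up their definitions in the official DEAP documentation.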
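For intuition, the registrations above can be imitated with the standard library alone: `toolbox.register` stores a partially applied function under an alias, and `tools.initRepeat(container, func, n)` calls `func` n times and collects the results in `container`. The following is a rough sketch of that behavior, not DEAP's own code; the names `init_repeat`, `gen_binary`, `create_individual`, and `create_population` are stand-ins, and the plain lists here carry no `fitness` attribute.

```python
import random
from functools import partial


def init_repeat(container, func, n):
    """Call func() n times and collect the results in container."""
    return container(func() for _ in range(n))


INDIVIDUAL_LENGTH = 100

# Stand-ins for the three registered toolbox functions.
gen_binary = partial(random.randint, 0, 1)
create_individual = partial(init_repeat, list, gen_binary, INDIVIDUAL_LENGTH)
create_population = partial(init_repeat, list, create_individual)

random.seed(39)
population = create_population(n=5)  # n is supplied at call time

print(len(population), len(population[0]))  # 5 100
```

This also shows why `createPopulation` is called with `n=POPULATION_SIZE` later on: the population size is the one argument that was left unbound at registration time.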
# 4. define evaluation function
toolbox.register("evaluate", lambda ind: (sum(ind),))
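In the OneMax problem, the fitness of an individual (represented as a binary string) is the sum of all its digits, so we define the fitness function as (sum(ind),).
Note in particular that the function returns a one-element tuple, not a bare number: the trailing comma matters. DEAP fitness values are always sequences with one entry per objective, matching the one-element weights=(1.0,) given to FitnessMax.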
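The difference between a number in parentheses and a one-element tuple is easy to miss, so here is a small sketch. The `weighted` line is only my illustration of pairing each objective value with its weight; it is not a copy of DEAP's internals.

```python
weights = (1.0,)  # same shape as the FitnessMax weights


def evaluate(ind):
    """Return the fitness as a one-element tuple."""
    return (sum(ind),)  # the trailing comma makes this a tuple


individual = [1, 0, 1, 1, 0]
values = evaluate(individual)

# One weight per objective value.
weighted = tuple(v * w for v, w in zip(values, weights))

print(values)                           # (3,)
print(weighted)                         # (3.0,)
print(type((sum(individual))).__name__)  # int: parentheses alone do not make a tuple
```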
# 5. define operators
toolbox.register("select", tools.selTournament, tournsize=2)
toolbox.register("mate", tools.cxOnePoint)
toolbox.register("mutate", tools.mutFlipBit, indpb=1.0/INDIVIDUAL_LENGTH)
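Note in particular:
select, mate, and mutate (like evaluate above) are the names that DEAP's built-in algorithms look up on the toolbox; for example, algorithms.eaSimple calls toolbox.select, toolbox.mate, toolbox.mutate, and toolbox.evaluate. The operators must therefore be registered under exactly these names.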
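To show what the three registered operators do, here is a simplified pure-Python sketch of tournament selection, one-point crossover, and bit-flip mutation. These functions are my own illustrations with made-up names; they follow the same ideas as `tools.selTournament`, `tools.cxOnePoint`, and `tools.mutFlipBit` but do not reproduce their exact signatures or return values.

```python
import random


def tournament_select(population, k, tournsize, fitness):
    """Pick k individuals; each is the fittest of tournsize random entrants."""
    return [max(random.choices(population, k=tournsize), key=fitness)
            for _ in range(k)]


def one_point_crossover(a, b):
    """Swap the tails of a and b after a random cut point (in place)."""
    cut = random.randint(1, min(len(a), len(b)) - 1)
    a[cut:], b[cut:] = b[cut:], a[cut:]
    return a, b


def flip_bit_mutation(ind, indpb):
    """Flip each bit independently with probability indpb (in place)."""
    for i in range(len(ind)):
        if random.random() < indpb:
            ind[i] = 1 - ind[i]
    return ind


random.seed(0)

# Crossover of all-ones with all-zeros: every position keeps exactly one 1.
child1, child2 = one_point_crossover([1] * 8, [0] * 8)
print(child1, child2)

# With indpb=1.0 every bit is flipped.
print(flip_bit_mutation([1, 0, 1], indpb=1.0))  # [0, 1, 0]

# Binary tournament over a small hypothetical pool.
pool = [[0, 0, 0], [1, 0, 0], [1, 1, 0], [1, 1, 1]]
winners = tournament_select(pool, k=4, tournsize=2, fitness=sum)
print(winners)
```

Two probabilities are in play here: P_MUTATION (passed to eaSimple as mutpb) is the chance that an individual is mutated at all, while indpb is the chance that each bit of a chosen individual flips. With indpb = 1.0/INDIVIDUAL_LENGTH, a mutated individual has one flipped bit on average.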
# Genetic Algorithm flow:
def main():
    # create population
    population = toolbox.createPopulation(n=POPULATION_SIZE)
    # initialize statistics
    stats = tools.Statistics(lambda ind: ind.fitness.values)
    stats.register("max", np.max)
    stats.register("avg", np.mean)
    # Genetic Algorithm
    population, logbook = algorithms.eaSimple(population, toolbox, cxpb=P_CROSSOVER,
                                              mutpb=P_MUTATION, ngen=MAX_GENERATIONS,
                                              stats=stats, verbose=True)
    # gather statistics
    maxFitnessValues, meanFitnessValues = logbook.select("max", "avg")
    # plot statistics:
    plt.plot(maxFitnessValues, color='red', label="Max Fitness")
    plt.plot(meanFitnessValues, color='green', label="Average Fitness")
    plt.legend()
    plt.xlabel('Generation')
    plt.ylabel('Fitness')
    plt.title('Max and Average Fitness over Generations')
    plt.show()
    # print the best individual of the final population
    print(max(population, key=lambda ind: sum(ind)))

if __name__ == "__main__":
    main()
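The code above defines the main() function; running it executes the genetic algorithm and produces the corresponding statistics and plot.
The function carries out these steps: it creates the population, sets up the statistics object, runs the genetic algorithm with algorithms.eaSimple, collects the per-generation statistics from the logbook, plots them, and prints the best individual of the final population.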
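The generational loop that runs inside main() can be sketched without DEAP. The code below reflects my understanding of the overall flow of a simple evolutionary algorithm (select, clone, vary, replace); it is not DEAP's implementation and omits fitness caching, the statistics object, and the logbook. All function names and the smaller default sizes are made up to keep the example short.

```python
import random


def evaluate(ind):
    """OneMax fitness: the number of 1s."""
    return sum(ind)


def select(pop, k):
    """Binary tournament: the fitter of two random individuals, k times."""
    return [max(random.choice(pop), random.choice(pop), key=evaluate)
            for _ in range(k)]


def vary(offspring, cxpb, mutpb, indpb):
    """Apply one-point crossover to neighboring pairs, then bit-flip mutation."""
    for i in range(1, len(offspring), 2):
        if random.random() < cxpb:
            a, b = offspring[i - 1], offspring[i]
            cut = random.randint(1, len(a) - 1)
            a[cut:], b[cut:] = b[cut:], a[cut:]
    for ind in offspring:
        if random.random() < mutpb:
            for j in range(len(ind)):
                if random.random() < indpb:
                    ind[j] = 1 - ind[j]
    return offspring


def simple_ea(length=20, pop_size=30, cxpb=0.9, mutpb=0.1, ngen=10):
    """Run the loop and return the final population and the best fitness per generation."""
    pop = [[random.randint(0, 1) for _ in range(length)] for _ in range(pop_size)]
    history = [max(map(evaluate, pop))]
    for _ in range(ngen):
        # Copy the selected parents so that variation does not alter them.
        offspring = [ind[:] for ind in select(pop, len(pop))]
        # The varied offspring replace the whole population (no elitism).
        pop = vary(offspring, cxpb, mutpb, indpb=1.0 / length)
        history.append(max(map(evaluate, pop)))
    return pop, history


random.seed(39)
final_pop, history = simple_ea()
print(history)
```

Because the offspring replace the whole population, the best fitness is not guaranteed to rise in every generation.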
由下图所示,在第38代算法已经产生了最优解,即个体所有位置上都为1:[1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1]
当我们设置verbose=True时,algorithms.eaSimple会在迭代过程中实时生成每一代的数据并输出至标准输出:
gen nevals max avg
0 200 65 50.35
1 191 62 52.905
2 175 65 55.225
3 189 67 57.345
4 169 68 59.25
5 176 72 61.285
6 170 73 63.345
7 182 73 65.165
8 179 74 66.835
9 181 78 68.475
10 188 80 70.13
11 178 82 71.95
12 189 81 73.755
13 188 82 75.145
14 194 83 76.755
15 176 83 78.29
16 182 83 79.35
17 169 85 80.34
18 177 86 81.37
19 178 86 82.435
20 184 87 83.31
21 179 88 84.175
22 179 89 84.835
23 179 90 85.575
24 188 91 86.525
25 189 93 87.55
26 181 94 88.44
27 190 94 89.31
28 185 95 89.94
29 174 96 90.7
30 188 98 91.41
31 180 98 92.195
32 177 98 93.005
33 179 98 93.83
34 175 98 94.555
35 185 99 95.165
36 179 99 95.725
37 188 99 96.345
38 182 100 97.04
39 184 100 97.63
40 183 100 98.03
41 177 100 98.37
42 192 100 98.685
43 186 100 98.985
44 175 100 99.325
45 187 100 99.565
46 171 100 99.73
47 180 100 99.89
48 186 100 99.87
49 189 100 99.91
50 187 100 99.9
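A few observations can be checked directly against this log. The sketch below copies the max column by hand and computes the first generation that reaches the optimum and the generations where the best fitness dropped. It also estimates the expected nevals, under my reading that an offspring is re-evaluated only if crossover or mutation touched it; that formula is my assumption, not something stated in the DEAP documentation quoted in this series.

```python
# "max" column copied from the log above, generations 0 through 50.
max_per_gen = [65, 62, 65, 67, 68, 72, 73, 73, 74, 78, 80, 82, 81, 82, 83, 83, 83,
               85, 86, 86, 87, 88, 89, 90, 91, 93, 94, 94, 95, 96, 98, 98, 98, 98,
               98, 99, 99, 99] + [100] * 13

# First generation whose best individual is all ones.
first_optimum = max_per_gen.index(100)

# Generations where the best fitness fell compared with the previous one.
drops = [g for g in range(1, len(max_per_gen))
         if max_per_gen[g] < max_per_gen[g - 1]]

# Expected evaluations per generation: an individual needs re-evaluation
# unless it escaped both crossover and mutation (assumed model).
POPULATION_SIZE, P_CROSSOVER, P_MUTATION = 200, 0.9, 0.1
expected_nevals = POPULATION_SIZE * (1 - (1 - P_CROSSOVER) * (1 - P_MUTATION))

print(first_optimum)              # 38
print(drops)                      # [1, 12]
print(round(expected_nevals, 1))  # 182.0
```

The drops at generations 1 and 12 are consistent with a run in which the best individual is not automatically carried over to the next generation, and the estimate of about 182 evaluations is close to the nevals values in the log.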
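This article showed how to solve the OneMax problem with the DEAP framework. In the next few articles of this series, I will explain in detail how algorithms.eaSimple works and how to customize tools.Statistics and the logbook. The complete code follows.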
# 1.load modules
from deap import base,creator,tools,algorithms
import random
import numpy as np
import matplotlib.pyplot as plt
toolbox = base.Toolbox()
# 2.parameters:
INDIVIDUAL_LENGTH = 100 # length of bit string to be optimized
POPULATION_SIZE = 200
P_CROSSOVER = 0.9 # probability for crossover
P_MUTATION = 0.1 # probability for mutating an individual
MAX_GENERATIONS = 50
random.seed(39)
# 3.create individual and population
toolbox.register("genBinary", random.randint, 0, 1)
creator.create("FitnessMax", base.Fitness, weights=(1.0,))
creator.create("Individual", list, fitness=creator.FitnessMax)
toolbox.register("createIndividual", tools.initRepeat, creator.Individual, toolbox.genBinary, INDIVIDUAL_LENGTH)
toolbox.register("createPopulation", tools.initRepeat, list, toolbox.createIndividual)
# 4. define evaluation function
toolbox.register("evaluate", lambda ind: (sum(ind),))
# 5. define operators
toolbox.register("select", tools.selTournament, tournsize=2)
toolbox.register("mate", tools.cxOnePoint)
toolbox.register("mutate", tools.mutFlipBit, indpb=1.0/INDIVIDUAL_LENGTH)
# Genetic Algorithm flow:
def main():
    # create population
    population = toolbox.createPopulation(n=POPULATION_SIZE)
    # initialize statistics
    stats = tools.Statistics(lambda ind: ind.fitness.values)
    stats.register("max", np.max)
    stats.register("avg", np.mean)
    # Genetic Algorithm
    population, logbook = algorithms.eaSimple(population, toolbox, cxpb=P_CROSSOVER,
                                              mutpb=P_MUTATION, ngen=MAX_GENERATIONS,
                                              stats=stats, verbose=True)
    # gather statistics
    maxFitnessValues, meanFitnessValues = logbook.select("max", "avg")
    # plot statistics:
    plt.plot(maxFitnessValues, color='red', label="Max Fitness")
    plt.plot(meanFitnessValues, color='green', label="Average Fitness")
    plt.legend()
    plt.xlabel('Generation')
    plt.ylabel('Fitness')
    plt.title('Max and Average Fitness over Generations')
    plt.show()

if __name__ == "__main__":
    main()