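The OneMax problem is the classic introductory problem for genetic algorithms. It asks: how do we maximize the sum of the digits over all positions of a fixed-length binary string?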
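Take a binary string of length 5 as an example: 10110 has a digit sum of 3, and 11111 has a digit sum of 5.
To a person it is obvious that the sum is largest when every bit is 1, but when we solve the problem with a genetic algorithm, the algorithm itself has no such knowledge. Below, we solve OneMax with a genetic algorithm from scratch, without relying on any genetic algorithm package.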
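For a string of length 5 we can simply check every possibility. The sketch below enumerates all 2^5 = 32 binary strings with `itertools.product` and scores each with the OneMax fitness (the digit sum); only `11111` reaches the maximum of 5. For a length of 100 there are 2^100 (about 1.27 × 10^30) strings, which is why exhaustive search is impractical and a heuristic such as a genetic algorithm is used instead. The helper name `onemax` is illustrative.

```python
from itertools import product

def onemax(bits):
    # OneMax fitness: the sum of all bits
    return sum(bits)

# Enumerate every binary string of length 5 as a tuple of 0s and 1s.
all_strings = list(product([0, 1], repeat=5))
best = max(all_strings, key=onemax)

print(len(all_strings))    # 32
print(best, onemax(best))  # (1, 1, 1, 1, 1) 5
```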
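First, we have to cast the problem as a genetic algorithm problem, i.e., define the individual, the population, the selection, crossover and mutation operators, and the fitness function. For a string of length 100, the code below uses the following definitions:
- Individual: a list of 100 bits, each 0 or 1.
- Population: a list of individuals (100 by default).
- Selection: tournament selection with 3 individuals per tournament.
- Crossover (mate): single-point crossover, applied to each pair with probability 0.9.
- Mutation: flip one randomly chosen gene, applied to each individual with probability 0.1.
- Fitness function: the sum of all bits of the individual.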
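If these definitions are unfamiliar, see the second installment of this genetic algorithm series.
The code is explained step by step below; the complete code is given at the end of this article.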
Import all the required packages:

```python
import random
import matplotlib.pyplot as plt

random.seed(39)  # fix the random seed so runs are reproducible
```
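```python
# 1. define individual and population
def CreateIndividual():
    # an individual is a list of 100 random bits (0 or 1)
    return [random.randint(0, 1) for _ in range(100)]

def CreatePopulation(size):
    # a population is a list of `size` individuals
    return [CreateIndividual() for _ in range(size)]
```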
```python
# 2.1. define select function
def tournament(population, size):
    # draw `size` distinct individuals at random and keep the fittest
    participants = random.sample(population, size)
    # evaluate function is defined in 2.4
    winner = max(participants, key=evaluate)
    return winner.copy()

def select(population, size):
    # one tournament per slot keeps the population size constant
    return [tournament(population, size) for _ in range(len(population))]
```
```python
# 2.2. define mate function
def SinglePointCrossover(ind1, ind2):
    # pick a cut point and swap the tails of the two parents
    loc = random.randint(0, len(ind1) - 1)
    genes1 = ind1[loc:]
    genes2 = ind2[loc:]
    ind1[loc:] = genes2
    ind2[loc:] = genes1
    return [ind1.copy(), ind2.copy()]

def mate(population, probability):
    # pair up neighbours (0,1), (2,3), ...; assumes an even population size
    new_population = []
    for i in range(0, len(population), 2):
        ind1 = population[i].copy()
        ind2 = population[i + 1].copy()
        if random.random() < probability:
            new_population.extend(SinglePointCrossover(ind1, ind2))
        else:
            new_population.extend([ind1, ind2])
    return new_population
```
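The effect of single-point crossover is easiest to see with a fixed cut point. The helper below is a hypothetical, deterministic variant of `SinglePointCrossover` that takes the cut point `loc` as an argument and builds new lists instead of modifying the parents. Crossover only exchanges genes between the two parents, so the total number of 1s in a pair is unchanged and it cannot create a gene value that neither parent has at that position; that is mutation's job. (In the article's version, `loc = 0` swaps the two parents entirely, which amounts to no crossover.)

```python
def single_point_crossover_at(ind1, ind2, loc):
    # Swap the tails of two parents starting at index `loc`.
    # Slicing builds new lists, so the parents are left unchanged.
    child1 = ind1[:loc] + ind2[loc:]
    child2 = ind2[:loc] + ind1[loc:]
    return child1, child2

parent1 = [0, 0, 0, 0, 0]
parent2 = [1, 1, 1, 1, 1]
child1, child2 = single_point_crossover_at(parent1, parent2, 2)
print(child1)  # [0, 0, 1, 1, 1]
print(child2)  # [1, 1, 0, 0, 0]
```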
```python
# 2.3. define mutate function
def flipOneGene(ind):
    # flip one randomly chosen gene
    loc = random.randint(0, len(ind) - 1)
    ind[loc] = 1 - ind[loc]  # 0->1 or 1->0
    return ind.copy()

def mutate(population, probability):
    # each individual is mutated with the given probability
    new_population = []
    for ind in population:
        if random.random() < probability:
            new_population.append(flipOneGene(ind))
        else:
            new_population.append(ind.copy())
    return new_population
```
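The article's `flipOneGene` flips exactly one gene of each mutated individual. A common alternative, similar in spirit to DEAP's `tools.mutFlipBit`, flips every gene independently with a small probability `indpb`. The function name `flip_bit_mutation` and the value `indpb=0.01` below are illustrative assumptions, not part of the original code.

```python
import random

def flip_bit_mutation(ind, indpb):
    # Hypothetical variant: flip each gene independently with probability `indpb`.
    # Returns a new list; the input individual is not modified.
    return [1 - g if random.random() < indpb else g for g in ind]

random.seed(1)
ind = [0] * 100
mutant = flip_bit_mutation(ind, indpb=0.01)
print(sum(mutant))  # number of flipped genes, 1 on average for indpb=0.01
```

With `indpb = 1 / len(ind)`, the expected number of flips per individual is one, but the actual number varies from individual to individual.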
The fitness function of OneMax is simply the sum of all numbers in the list:

```python
# 2.4. define evaluate function
def evaluate(individual):
    return sum(individual)
```
To track the algorithm's progress and catch possible bugs, we record the maximum and the mean fitness of the population in every generation:

```python
# 2.5. define statistical metrics to monitor algorithm performance
def population_score_max(population):
    return max(evaluate(ind) for ind in population)

def population_score_mean(population):
    return sum(evaluate(ind) for ind in population) / len(population)
```
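Before running the algorithm, we first define its parameters, given here as the default arguments of `main`: the population size, the tournament size, the crossover probability, the mutation probability and the maximum number of generations.

```python
# 3. Run genetic algorithm
def main(
    POPULATION_SIZE=100,  # number of individuals (must be even for mate)
    TOURNAMENT_SIZE=3,    # individuals drawn per tournament
    CROSSOVER_PROB=0.9,   # probability that a pair is crossed over
    MUTATE_PROB=0.1,      # probability that an individual is mutated
    MAX_GENERATIONS=100,  # number of generations to run
):
    generation = 0
    population = CreatePopulation(POPULATION_SIZE)
    max_scores = [population_score_max(population)]
    mean_scores = [population_score_mean(population)]
    best_individual = []
    while generation < MAX_GENERATIONS:
        population = select(population, TOURNAMENT_SIZE)
        population = mate(population, CROSSOVER_PROB)
        population = mutate(population, MUTATE_PROB)
        # collect statistics
        max_scores.append(population_score_max(population))
        mean_scores.append(population_score_mean(population))
        # keep the fittest individual seen so far; key=evaluate compares by
        # fitness (without it, max() compares the lists lexicographically)
        best_individual = max(
            best_individual,
            max(population, key=evaluate),
            key=evaluate,
        ).copy()
        generation += 1
    print("Best Solution:")
    print(best_individual)
    plt.plot(max_scores, color='red', label="Max Score")
    plt.plot(mean_scores, color='green', label="Mean Score")
    plt.legend()
    plt.xlabel("Generations")
    plt.ylabel("Fitness Score")
    plt.grid()
    plt.show()

if __name__ == "__main__":
    main()
```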
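In every generation we apply selection (select), crossover (mate) and mutation (mutate), and collect the population's maximum and mean fitness to track the algorithm's progress or spot problems in it. Depending on the problem, we can also record the fittest individual found so far (best individual), so that it is not lost if crossover and mutation later remove it from the population.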
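Recording `best_individual` keeps a copy of the best solution found, but it does not put that individual back into the population, so crossover and mutation can still remove it from later generations. A common remedy is elitism: copy the top few individuals unchanged into the next generation. The following is a minimal sketch under that assumption, not what the article's code does; `next_generation_with_elitism`, the `breed` callback (standing in for select, mate and mutate) and `destructive_breed` are hypothetical names for illustration only.

```python
def evaluate(ind):
    # OneMax fitness: the sum of all bits
    return sum(ind)

def next_generation_with_elitism(population, breed, n_elite=1):
    # Keep the n_elite fittest individuals unchanged and fill the remaining
    # slots with offspring produced by `breed`, so the size stays constant.
    elites = [ind.copy() for ind in sorted(population, key=evaluate, reverse=True)[:n_elite]]
    offspring = breed(population)[: len(population) - n_elite]
    return elites + offspring

def destructive_breed(population):
    # Toy breeding step that wipes out all progress, to show that elitism keeps the best.
    return [[0] * len(ind) for ind in population]

pop = [[1, 1, 1], [1, 0, 0], [0, 0, 0], [0, 1, 0]]
new_pop = next_generation_with_elitism(pop, destructive_breed, n_elite=1)
print(new_pop)  # [[1, 1, 1], [0, 0, 0], [0, 0, 0], [0, 0, 0]]
```

A small `n_elite` (often 1 or 2) protects the best solutions while leaving most of the population free to explore.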
Finally, we inspect the best solution and the population statistics.
我们成功获得了OneMax的最优解(所有位置上都是1):[1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1]
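The fitness plot produced by `main()` shows how the population evolves: it improves quickly at first and then more slowly, stops improving at around generation 61, and had already produced the optimal solution by generation 58.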
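To read the generation at which the optimum first appears directly from the recorded statistics rather than from the plot, a small helper can scan `max_scores`. Because `max_scores[0]` holds the initial random population, index g corresponds to the population after g generations. The helper name `first_generation_reaching` is hypothetical; inside `main` it could be called as `first_generation_reaching(max_scores, 100)` after the loop.

```python
def first_generation_reaching(scores, target):
    # Return the index (generation number) of the first score >= target,
    # or None if the target is never reached.
    # Index 0 is the initial population, matching how main() fills max_scores.
    for generation, score in enumerate(scores):
        if score >= target:
            return generation
    return None

# Illustrative score history, not data from the run described above.
print(first_generation_reaching([60, 75, 90, 100, 100], 100))  # 3
```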
The complete code:

```python
import random
import matplotlib.pyplot as plt

random.seed(39)  # fix the random seed so runs are reproducible

# 1. define individual and population
def CreateIndividual():
    # an individual is a list of 100 random bits (0 or 1)
    return [random.randint(0, 1) for _ in range(100)]

def CreatePopulation(size):
    # a population is a list of `size` individuals
    return [CreateIndividual() for _ in range(size)]

# 2. define select, mate, mutate, evaluate function ----
# 2.1. define select function
def tournament(population, size):
    # draw `size` distinct individuals at random and keep the fittest
    participants = random.sample(population, size)
    # evaluate function is defined in 2.4
    winner = max(participants, key=evaluate)
    return winner.copy()

def select(population, size):
    # one tournament per slot keeps the population size constant
    return [tournament(population, size) for _ in range(len(population))]

# 2.2. define mate function
def SinglePointCrossover(ind1, ind2):
    # pick a cut point and swap the tails of the two parents
    loc = random.randint(0, len(ind1) - 1)
    genes1 = ind1[loc:]
    genes2 = ind2[loc:]
    ind1[loc:] = genes2
    ind2[loc:] = genes1
    return [ind1.copy(), ind2.copy()]

def mate(population, probability):
    # pair up neighbours (0,1), (2,3), ...; assumes an even population size
    new_population = []
    for i in range(0, len(population), 2):
        ind1 = population[i].copy()
        ind2 = population[i + 1].copy()
        if random.random() < probability:
            new_population.extend(SinglePointCrossover(ind1, ind2))
        else:
            new_population.extend([ind1, ind2])
    return new_population

# 2.3. define mutate function
def flipOneGene(ind):
    # flip one randomly chosen gene
    loc = random.randint(0, len(ind) - 1)
    ind[loc] = 1 - ind[loc]  # 0->1 or 1->0
    return ind.copy()

def mutate(population, probability):
    # each individual is mutated with the given probability
    new_population = []
    for ind in population:
        if random.random() < probability:
            new_population.append(flipOneGene(ind))
        else:
            new_population.append(ind.copy())
    return new_population

# 2.4. define evaluate function
def evaluate(individual):
    return sum(individual)

# 2.5. define statistical metrics to monitor algorithm performance
def population_score_max(population):
    return max(evaluate(ind) for ind in population)

def population_score_mean(population):
    return sum(evaluate(ind) for ind in population) / len(population)

# 3. Run genetic algorithm
def main(
    POPULATION_SIZE=100,  # number of individuals (must be even for mate)
    TOURNAMENT_SIZE=3,    # individuals drawn per tournament
    CROSSOVER_PROB=0.9,   # probability that a pair is crossed over
    MUTATE_PROB=0.1,      # probability that an individual is mutated
    MAX_GENERATIONS=100,  # number of generations to run
):
    generation = 0
    population = CreatePopulation(POPULATION_SIZE)
    max_scores = [population_score_max(population)]
    mean_scores = [population_score_mean(population)]
    best_individual = []
    while generation < MAX_GENERATIONS:
        population = select(population, TOURNAMENT_SIZE)
        population = mate(population, CROSSOVER_PROB)
        population = mutate(population, MUTATE_PROB)
        # collect statistics
        max_scores.append(population_score_max(population))
        mean_scores.append(population_score_mean(population))
        # keep the fittest individual seen so far; key=evaluate compares by
        # fitness (without it, max() compares the lists lexicographically)
        best_individual = max(
            best_individual,
            max(population, key=evaluate),
            key=evaluate,
        ).copy()
        generation += 1
    print("Best Solution:")
    print(best_individual)
    plt.plot(max_scores, color='red', label="Max Score")
    plt.plot(mean_scores, color='green', label="Mean Score")
    plt.legend()
    plt.xlabel("Generations")
    plt.ylabel("Fitness Score")
    plt.grid()
    plt.show()

if __name__ == "__main__":
    main()
```
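To explain genetic algorithms in depth, this article solved OneMax without any genetic algorithm package. In the next installment we will introduce the DEAP (Distributed Evolutionary Algorithms in Python) package, and in later articles we will use the DEAP framework to solve genetic algorithm problems.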
I have only recently started writing articles, so feedback and discussion are very welcome!