原文 演化博弈方法用于多智能体系统最优资源分配 -CSDN博客
https://ieeexplore.ieee.org/document/8243778/
有2种资源分配给6个个体,2种资源的总量分别为 y 1 = 545 , y 2 = 467 y_1=545,y_2=467 y1=545,y2=467,不同的个体收到不同的资源会产生不同的收益,分配的资源量均大于等于0。以第3个个体为例,它接受的资源为 x ⃗ 3 = [ x 31 , x 32 ] T \vec{x}_3=[x_{31},x_{32}]^\text{T} x3=[x31,x32]T,两种资源的上限分别为 x 31 max = 66 , x 32 max = 120 x_{31}^\text{max}=66,x_{32}^\text{max}=120 x31max=66,x32max=120。可行资源组合 R 3 = x ⃗ 3 \mathcal{R}_3=\vec{x}_3 R3=x3(这我没看懂 R 3 \mathcal{R}_3 R3 有什么用),收到2种资源的收益为
u 3 ( x ⃗ 3 ) = x 31 ( 2 x 31 max − x 31 ) c 31 x 31 max + x 32 ( 2 x 32 max − x 32 ) c 32 x 32 max = x 31 ( 2 ⋅ 66 − x 31 ) 0.4 ⋅ 66 + x 32 ( 2 ⋅ 120 − x 32 ) 0.5 ⋅ 120 \begin{aligned} u_3(\vec{x}_3) =& \frac{x_{31}(2x_{31}^\text{max}-x_{31})} {c_{31}x_{31}^\text{max}} +\frac{x_{32}(2x_{32}^\text{max}-x_{32})} {c_{32}x_{32}^\text{max}} \\ =& \frac{x_{31}(2\cdot 66-x_{31})}{0.4\cdot 66} +\frac{x_{32}(2\cdot 120-x_{32})}{0.5\cdot 120} \\ \end{aligned} u3(x3)==c31x31maxx31(2x31max−x31)+c32x32maxx32(2x32max−x32)0.4⋅66x31(2⋅66−x31)+0.5⋅120x32(2⋅120−x32)
需要求的是在
x 11 + x 21 + x 31 + x 41 + x 51 + x 61 = 545 x 12 + x 22 + x 32 + x 42 + x 52 + x 62 = 467 x_{11}+x_{21}+x_{31}+x_{41}+x_{51}+x_{61}=545 \\ x_{12}+x_{22}+x_{32}+x_{42}+x_{52}+x_{62}=467 x11+x21+x31+x41+x51+x61=545x12+x22+x32+x42+x52+x62=467
的约束条件下,求
u 1 + u 2 + u 3 + u 4 + u 5 + u 6 u_1+u_2+u_3+u_4+u_5+u_6 u1+u2+u3+u4+u5+u6
的最大值。
x ˙ 31 = x 31 ( p 31 x 21 + x 41 y 1 − x 21 p 21 + x 41 p 41 y 1 ) p 31 = ∂ u 3 ∂ x 31 = 2 ( x 31 max − x 31 ) c 31 x 31 max \begin{aligned} \dot{x}_{31} =& x_{31}\left(p_{31}\frac{x_{21}+x_{41}}{y_1} -\frac{x_{21}p_{21}+x_{41}p_{41}}{y_1}\right) \\ p_{31} =& \frac{\partial u_3}{\partial x_{31}} =\frac{2(x_{31}^\text{max}-x_{31})}{c_{31}x_{31}^\text{max}} \end{aligned} x˙31=p31=x31(p31y1x21+x41−y1x21p21+x41p41)∂x31∂u3=c31x31max2(x31max−x31)
原文中求和不变性的证明式(16)中,假设对个体3和它的邻居2,包含这两个个体的式子为
x ˙ 31 = x 31 ( p 31 x 21 + x 41 y 1 − x 21 p 21 + x 41 p 41 y 1 ) x ˙ 21 = x 21 ( p 21 x 11 + x 31 y 1 − x 11 p 11 + x 31 p 31 y 1 ) \dot{x}_{31} = x_{31}\left(p_{31}\frac{x_{21}+x_{41}}{y_1} -\frac{x_{21}p_{21}+x_{41}p_{41}}{y_1}\right) \\ \dot{x}_{21} = x_{21}\left(p_{21}\frac{x_{11}+x_{31}}{y_1} -\frac{x_{11}p_{11}+x_{31}p_{31}}{y_1}\right) \\ x˙31=x31(p31y1x21+x41−y1x21p21+x41p41)x˙21=x21(p21y1x11+x31−y1x11p11+x31p31)
取出所有包含 x 21 x_{21} x21 和 x 31 x_{31} x31 的项就是式(16)中
( x 31 p 31 x 21 − x 31 x 21 p 21 ) + ( x 21 p 21 x 31 − x 21 x 31 p 31 ) = 0 (x_{31}p_{31}x_{21}-x_{31}x_{21}p_{21}) + (x_{21}p_{21}x_{31}-x_{21}x_{31}p_{31})=0 (x31p31x21−x31x21p21)+(x21p21x31−x21x31p31)=0
原论文中给出的例子的收益函数里两个资源的收益相互独立,因此本文只分析第一种资源的收益。除了原文介绍的演化博弈法以外,注意到这个问题就是一个约束优化问题。可以试一下拉格朗日乘子法。为便于说明只写3个变量,代码里完整给出。
U = x 1 ( 2 x m 1 − x 1 ) c 1 x m 1 + x 2 ( 2 x m 2 − x 2 ) c 2 x m 2 + x 3 ( 2 x m 3 − x 3 ) c 3 x m 3 U=\frac{x_1(2x_{m1}-x_1)}{c_1x_{m1}} +\frac{x_2(2x_{m2}-x_2)}{c_2x_{m2}} +\frac{x_3(2x_{m3}-x_3)}{c_3x_{m3}} U=c1xm1x1(2xm1−x1)+c2xm2x2(2xm2−x2)+c3xm3x3(2xm3−x3)
∂ L ∂ x i = x m i − x i k i + λ = 0 x i − k i λ = x m i \frac{\partial L}{\partial x_i}=\frac{x_{mi}-x_i}{k_i} + \lambda = 0 \\ x_i-k_i\lambda = x_{mi} ∂xi∂L=kixmi−xi+λ=0xi−kiλ=xmi
其中 k i = c i x m i 2 k_i=\displaystyle\frac{c_ix_{mi}}{2} ki=2cixmi。写成矩阵形式
[ 1 k 1 1 k 2 1 k 3 1 1 1 0 ] [ x 1 x 2 x 3 λ ] = [ x m 1 x m 2 x m 3 545 ] \left[\begin{matrix} 1 &&& k_1 \\ & 1 && k_2 \\ && 1 & k_3 \\ 1 & 1 & 1 & 0 \\ \end{matrix}\right] \left[\begin{matrix} x_1 \\ x_2 \\ x_3 \\ \lambda \end{matrix}\right] =\left[\begin{matrix} x_{m1} \\ x_{m2} \\ x_{m3} \\ 545 \end{matrix}\right] 111111k1k2k30 x1x2x3λ = xm1xm2xm3545
解得
X = [ 167.604914004914 45.1985257985260 62.6270270270270 104.645700245700 93.6117936117936 71.3120393120392 0.255528255528253 ] X=\left[\begin{matrix} 167.604914004914 \\ 45.1985257985260 \\ 62.6270270270270 \\ 104.645700245700 \\ 93.6117936117936 \\ 71.3120393120392 \\ 0.255528255528253 \end{matrix}\right] X= 167.60491400491445.198525798526062.6270270270270104.64570024570093.611793611793671.31203931203920.255528255528253
演化博弈法的仿真结果为
X = [ 167.75139 45.21687 62.42745 104.58382 93.432434 71.588005 ] X=\left[\begin{matrix} 167.75139 \\ 45.21687 \\ 62.42745 \\ 104.58382 \\ 93.432434 \\ 71.588005 \\ \end{matrix}\right] X= 167.7513945.2168762.42745104.5838293.43243471.588005
可见结果相同。
计算出的最大值是 3615.688 3615.688 3615.688,论文里没给出最大值的具体值但结果差不多。
拉格朗日乘子法的matlab代码:
clear;clc;
Cij = [0.2;0.3;0.4;0.1;0.5;0.85];
Xmaxij = [172;47;66;106;100;80];
K = zeros(6,1);
for n = 1:6
K(n) = Cij(n)*Xmaxij(n)*0.5;
end
A = [eye(6), K; ones(1, 6), 0];
B = [Xmaxij; 545];
X = pinv(A)*B
演化博弈法的python代码:
import matplotlib.pyplot as plt
import numpy as np
def Marginal_Utility(i, j):
return 2*(Xmaxij[i, j] - Xij[i, j]) / (Cij[i, j]*Xmaxij[i, j])
def StateSum():
sum = 0
for n in range(Xij.shape[0]):
sum += Xij[n, 0]
for n in range(Xij.shape[0]):
sum += Xij[n, 1]
return sum
def Utility():
u = 0
for j in range(2):
for i in range(6):
u += Xij[i, j]*(2*Xmaxij[i, j] - Xij[i, j]) / Cij[i, j] / Xmaxij[i, j]
return u
Cij = np.array([
[0.2 , 0.85],
[0.3 , 0.4 ],
[0.4 , 0.5 ],
[0.1 , 0.2 ],
[0.5 , 0.4 ],
[0.85, 0.8 ],
], dtype=np.float32)
Xmaxij = np.array([
[172, 86 ],
[47 , 70 ],
[66 , 120],
[106, 45 ],
[100, 90 ],
[80 , 100],
], dtype=np.float32)
Xij = np.array([
[10, 10],
[10, 10],
[10, 10],
[10, 10],
[10, 10],
[10, 10],
], dtype=np.float32)
Y = np.array([545, 467])
sum = 0
for n in range(Xij.shape[0] - 1):
sum += Xij[n, 0]
Xij[-1, 0] = Y[0] - sum
sum = 0
for n in range(Xij.shape[0] - 1):
sum += Xij[n, 1]
Xij[-1, 1] = Y[1] - sum
Xnew = Xij.copy()
nvec = []
Uvec = []
Xvec = []
for i in range(6):
Xvec.append([])
for n in range(100):
for j in range(2):
for i in range(6):
Ineigh0 = (i+5) % 6
Ineigh1 = (i+1) % 6
f = Marginal_Utility(i, j) * (Xij[Ineigh0, j] + Xij[Ineigh1, j])
fbar = Xij[Ineigh0, j] * Marginal_Utility(Ineigh0, j)
fbar += Xij[Ineigh1, j] * Marginal_Utility(Ineigh1, j)
deltax = Xij[i, j] * (f - fbar) / Y[j]
Xnew[i, j] += deltax*0.1
Xij = Xnew.copy()
for i in range(6):
Xvec[i].append([Xij[i, 0]])
# print(StateSum())
nvec.append(n)
Uvec.append(Utility())
print(Uvec[-1])
# plt.plot(nvec, Uvec)
for i in range(6):
plt.plot(nvec, Xvec[i])
plt.legend([str(i) for i in range(6)])
plt.show()