Learning Combinatorial Optimization Algorithms over Graphs 2020-05-06


Abstract

Solving NP-hard graph optimization problems usually requires a great deal of domain expertise and trial-and-error; the goal here is to learn algorithms that automate this challenging, tedious process. The learned heuristic exploits the structure of the problem: instances of a given optimization problem come from the same distribution, so they share the same structure but differ only in the data.

Approach: RL + graph embedding, with a learned greedy policy that incrementally constructs a solution.

How well the graph embedding network captures the current state of the partial solution determines the quality of the model's actions.
The framework can be applied to a diverse range of optimization problems over graphs.

Introduction

Traditional approaches to NP-hard graph optimization problems come in three flavors:

  1. Exact algorithms (enumeration or branch-and-bound). Drawback: intractable for large instances.
  2. Polynomial-time approximation algorithms. Drawback: weak optimality guarantees, and some problems are inapproximable.
  3. Heuristics, which are often fast, effective algorithms. Drawback: no theoretical guarantees, and they require substantial problem-specific research.

None of the three exploits a key feature of real-world optimization problems: the same type of problem is solved again and again, and the values of the data (coefficients in the objective function or constraints) can be regarded as samples from the same underlying distribution.

The model differs from previous work in three respects:
1. Algorithm design pattern: a greedy meta-algorithm that constructs a solution by repeatedly adding nodes.
2. Algorithm representation: structure2vec, a graph embedding network, is used to represent the policy; it "featurizes" each node, capturing its properties in the context of its graph neighborhood.
3. Algorithm training: there is no need to wait until the end of an episode to receive a reward. At every step of the greedy algorithm, the graph embeddings are updated according to the partial solution, reflecting new knowledge about each node's benefit to the final objective value.



Common Formulation for Greedy Algorithms on Graphs


A solution is constructed by sequentially adding nodes to a partial solution S: an evaluation function Q scores every remaining node, and the best-scoring node is added to the current partial solution (see the sketch below).


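A minimal sketch of this greedy meta-algorithm in Python, assuming a networkx-style graph object and two hypothetical callables: a learned scoring function `q_hat(graph, partial_solution, node)` and a problem-specific `is_terminal(graph, partial_solution)` check:

```python
def greedy_construct(graph, q_hat, is_terminal):
    """Build a solution by repeatedly adding the node with the best Q score."""
    partial_solution = []                  # ordered list of selected nodes
    remaining = set(graph.nodes)           # candidate nodes not yet selected
    while remaining and not is_terminal(graph, partial_solution):
        # Score every remaining node with the learned evaluation function Q.
        best = max(remaining, key=lambda v: q_hat(graph, partial_solution, v))
        partial_solution.append(best)      # greedily commit to the best node
        remaining.remove(best)
    return partial_solution
```

Only the termination check and the mapping h from the node sequence to an actual solution are problem-specific; the loop itself is shared across problems.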

Embedding for each node:

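For reference, the structure2vec update used by the paper has roughly this form, where x_v indicates whether v is in the partial solution, w(v, u) is the edge weight, N(v) is the neighborhood of v, and θ1–θ4 are learned parameters:

$$
\mu_v^{(t+1)} \leftarrow \mathrm{relu}\Big(\theta_1 x_v + \theta_2 \sum_{u \in N(v)} \mu_u^{(t)} + \theta_3 \sum_{u \in N(v)} \mathrm{relu}\big(\theta_4\, w(v, u)\big)\Big)
$$

The update is iterated a small fixed number of times T, and the final embeddings µv(T) are used downstream.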

This work has no labels, so no softmax layer is used.


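The evaluation function is then parameterized directly on top of these embeddings; roughly as in the paper,

$$
\hat{Q}(h(S), v; \Theta) = \theta_5^{\top}\, \mathrm{relu}\Big(\big[\theta_6 \textstyle\sum_{u \in V} \mu_u^{(T)},\ \theta_7\, \mu_v^{(T)}\big]\Big)
$$

where [·, ·] denotes concatenation: the pooled sum over all nodes summarizes the state, while µv(T) represents the candidate action v.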

Use RL to train the evaluation function Q


state:

The state is the current partial solution S on graph G, represented as a vector in p-dimensional space (the sum of the embeddings of the nodes in S).

action:

An action is selecting a node v that is not yet in the current partial solution; we represent actions by their corresponding p-dimensional node embeddings µv.

reward:

The reward r(S, v) at state S is defined as the change in the cost function after taking action v and transitioning to the new state S' := (S, v).
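In symbols, with c(h(S), G) the objective value of partial solution S,

$$
r(S, v) = c\big(h(S'), G\big) - c\big(h(S), G\big), \qquad S' := (S, v),
$$

so the cumulative reward of a terminal state Ŝ telescopes to the final objective value c(h(Ŝ), G), taking c(h(∅), G) = 0.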



Terminal state: Ŝ, the completed solution reached once the termination criterion is met.



At each step, the policy selects a node v of G, adds it to the current partial solution, and collects the reward; the policy is trained to maximize the cumulative reward.

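The greedy policy is deterministic with respect to Q̂:

$$
\pi(v \mid S) := \arg\max_{v' \notin S} \hat{Q}\big(h(S), v'\big)
$$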

The parameters are updated at each step of an episode by performing a gradient step to minimize the squared loss:


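Up to notation, the n-step Q-learning objective is

$$
\big(y - \hat{Q}(h(S_t), v_t; \Theta)\big)^2, \qquad y = \sum_{i=0}^{n-1} r(S_{t+i}, v_{t+i}) + \gamma \max_{v'} \hat{Q}\big(h(S_{t+n}), v'; \Theta\big),
$$

i.e., the target accumulates n actual rewards before bootstrapping from Q̂, which handles delayed rewards better than a 1-step update.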

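A minimal sketch of the training loop, n-step fitted Q-learning with experience replay, with all problem- and model-specific pieces injected as hypothetical callables (sample_graph, q_hat, reward_fn, is_terminal, fit_batch are assumptions, not the authors' code):

```python
import random
from collections import deque

def train_s2v_dqn(sample_graph, q_hat, reward_fn, is_terminal, fit_batch,
                  num_episodes=1000, n_step=2, eps=0.1, batch_size=32):
    """n-step Q-learning with experience replay (schematic).

    sample_graph() draws a training instance, q_hat(graph, partial, v) scores a
    candidate node, reward_fn(graph, partial, v) returns r(S, v),
    is_terminal(graph, partial) ends an episode, and fit_batch(transitions)
    performs one gradient step on the squared loss over sampled transitions.
    """
    replay = deque(maxlen=10_000)                     # experience replay memory
    for _ in range(num_episodes):
        graph = sample_graph()
        partial, rewards = [], []
        remaining = set(graph.nodes)
        while remaining and not is_terminal(graph, partial):
            # epsilon-greedy selection of the next node
            if random.random() < eps:
                v = random.choice(list(remaining))
            else:
                v = max(remaining, key=lambda u: q_hat(graph, partial, u))
            rewards.append(reward_fn(graph, partial, v))
            partial = partial + [v]                   # S_{t+1} := (S_t, v)
            remaining.remove(v)
            tau = len(partial) - n_step               # index of the action n steps back
            if tau >= 0:
                # store the n-step transition (S_tau, v_tau, n-step return, S_{tau+n})
                replay.append((graph, partial[:tau], partial[tau],
                               sum(rewards[tau:]), list(partial)))
            if len(replay) >= batch_size:
                fit_batch(random.sample(replay, batch_size))
```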

Word & Phrase

trial-and-error: repeated experimentation

tedious process: a dull, laborious process

on a regular basis

prohibitive: (of a price or charge) excessively high; difficult or impossible to pay.

seminal work: pioneering, groundbreaking work

explain / explanation: explanation, interpretation, construction, understanding, exposition, explication
