Learning Combinatorial Optimization Algorithms over Graphs 2020-05-06


Abstract

Solving NP-hard graph optimization problems usually requires a great deal of domain expertise and trial-and-error; the goal here is to learn algorithms that automate this challenging, tedious process. The learned heuristic exploits the structure of the problem: instances of a given optimization problem come from the same distribution, so they share the same structure but differ only in the data.

Approach: RL + graph embedding, with a learned greedy policy that incrementally constructs a solution.

How well the graph embedding network captures the current state of the partial solution determines the quality of the model's actions.
The framework can be applied to a diverse range of optimization problems over graphs.

Introduction

Traditional approaches to NP-hard graph optimization problems come in three flavors:

  1. Exact algorithms (enumeration or branch-and-bound). Drawback: intractable for large instances.
  2. Polynomial-time approximation algorithms. Drawback: weak optimality guarantees, and some problems are inapproximable.
  3. Heuristics, which are often fast, effective algorithms. Drawback: no theoretical guarantees, and they require substantial problem-specific research.

None of the three exploits a key feature of real-world optimization problems: the same type of problem is solved again and again, and the values of the data (coefficients in the objective function or constraints) can be regarded as samples from the same underlying distribution.

The model differs from previous work in three respects:
1. Algorithm design pattern: a greedy meta-algorithm that constructs a solution by repeatedly adding nodes.
2. Algorithm representation: structure2vec, a graph embedding network, is used to represent the policy; it "featurizes" each node, capturing its properties in the context of its graph neighborhood.
3. Algorithm training: there is no need to wait until the end of an episode to receive a reward. At every step of the greedy algorithm, the graph embeddings are updated according to the partial solution, reflecting new knowledge about each node's benefit to the final objective value.



Common Formulation for Greedy Algorithms on Graphs


A solution is constructed by sequentially adding nodes to a partial solution S: an evaluation function Q scores every remaining node, and the best-scoring node is added to the current partial solution (see the sketch below).


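A minimal sketch of this greedy meta-algorithm in Python, assuming a networkx-style graph object and two hypothetical callables: a learned scoring function `q_hat(graph, partial_solution, node)` and a problem-specific `is_terminal(graph, partial_solution)` check:

```python
def greedy_construct(graph, q_hat, is_terminal):
    """Build a solution by repeatedly adding the node with the best Q score."""
    partial_solution = []                  # ordered list of selected nodes
    remaining = set(graph.nodes)           # candidate nodes not yet selected
    while remaining and not is_terminal(graph, partial_solution):
        # Score every remaining node with the learned evaluation function Q.
        best = max(remaining, key=lambda v: q_hat(graph, partial_solution, v))
        partial_solution.append(best)      # greedily commit to the best node
        remaining.remove(best)
    return partial_solution
```

Only the termination check and the mapping h from the node sequence to an actual solution are problem-specific; the loop itself is shared across problems.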

Embedding for each node:

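For reference, the structure2vec update used by the paper has roughly this form, where x_v indicates whether v is in the partial solution, w(v, u) is the edge weight, N(v) is the neighborhood of v, and θ1–θ4 are learned parameters:

$$
\mu_v^{(t+1)} \leftarrow \mathrm{relu}\Big(\theta_1 x_v + \theta_2 \sum_{u \in N(v)} \mu_u^{(t)} + \theta_3 \sum_{u \in N(v)} \mathrm{relu}\big(\theta_4\, w(v, u)\big)\Big)
$$

The update is iterated a small fixed number of times T, and the final embeddings µv(T) are used downstream.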

This work has no labels, so no softmax layer is used.


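The evaluation function is then parameterized directly on top of these embeddings; roughly as in the paper,

$$
\hat{Q}(h(S), v; \Theta) = \theta_5^{\top}\, \mathrm{relu}\Big(\big[\theta_6 \textstyle\sum_{u \in V} \mu_u^{(T)},\ \theta_7\, \mu_v^{(T)}\big]\Big)
$$

where [·, ·] denotes concatenation: the pooled sum over all nodes summarizes the state, while µv(T) represents the candidate action v.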

Use RL to train the evaluation function Q


state:

The state is the current partial solution S on graph G, represented as a vector in p-dimensional space (the sum of the embeddings of the nodes in S).

action:

An action is selecting a node v that is not yet in the current partial solution; we represent actions by their corresponding p-dimensional node embeddings µv.

reward:

The reward r(S, v) at state S is defined as the change in the cost function after taking action v and transitioning to the new state S' := (S, v).
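In symbols, with c(h(S), G) the objective value of partial solution S,

$$
r(S, v) = c\big(h(S'), G\big) - c\big(h(S), G\big), \qquad S' := (S, v),
$$

so the cumulative reward of a terminal state Ŝ telescopes to the final objective value c(h(Ŝ), G), taking c(h(∅), G) = 0.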



Terminal state: Ŝ, the completed solution reached once the termination criterion is met.



At each step, the policy selects a node v of G, adds it to the current partial solution, and collects the reward; the policy is trained to maximize the cumulative reward.

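The greedy policy is deterministic with respect to Q̂:

$$
\pi(v \mid S) := \arg\max_{v' \notin S} \hat{Q}\big(h(S), v'\big)
$$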

The parameters are updated at each step of an episode by performing a gradient step to minimize the squared loss:


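Up to notation, the n-step Q-learning objective is

$$
\big(y - \hat{Q}(h(S_t), v_t; \Theta)\big)^2, \qquad y = \sum_{i=0}^{n-1} r(S_{t+i}, v_{t+i}) + \gamma \max_{v'} \hat{Q}\big(h(S_{t+n}), v'; \Theta\big),
$$

i.e., the target accumulates n actual rewards before bootstrapping from Q̂, which handles delayed rewards better than a 1-step update.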

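A minimal sketch of the training loop, n-step fitted Q-learning with experience replay, with all problem- and model-specific pieces injected as hypothetical callables (sample_graph, q_hat, reward_fn, is_terminal, fit_batch are assumptions, not the authors' code):

```python
import random
from collections import deque

def train_s2v_dqn(sample_graph, q_hat, reward_fn, is_terminal, fit_batch,
                  num_episodes=1000, n_step=2, eps=0.1, batch_size=32):
    """n-step Q-learning with experience replay (schematic).

    sample_graph() draws a training instance, q_hat(graph, partial, v) scores a
    candidate node, reward_fn(graph, partial, v) returns r(S, v),
    is_terminal(graph, partial) ends an episode, and fit_batch(transitions)
    performs one gradient step on the squared loss over sampled transitions.
    """
    replay = deque(maxlen=10_000)                     # experience replay memory
    for _ in range(num_episodes):
        graph = sample_graph()
        partial, rewards = [], []
        remaining = set(graph.nodes)
        while remaining and not is_terminal(graph, partial):
            # epsilon-greedy selection of the next node
            if random.random() < eps:
                v = random.choice(list(remaining))
            else:
                v = max(remaining, key=lambda u: q_hat(graph, partial, u))
            rewards.append(reward_fn(graph, partial, v))
            partial = partial + [v]                   # S_{t+1} := (S_t, v)
            remaining.remove(v)
            tau = len(partial) - n_step               # index of the action n steps back
            if tau >= 0:
                # store the n-step transition (S_tau, v_tau, n-step return, S_{tau+n})
                replay.append((graph, partial[:tau], partial[tau],
                               sum(rewards[tau:]), list(partial)))
            if len(replay) >= batch_size:
                fit_batch(random.sample(replay, batch_size))
```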

Word & Phrase

trial-and-error: repeated experimentation

tedious process: a dull, laborious process

on a regular basis

prohibitive: (of a price or charge) excessively high; difficult or impossible to pay.

seminal work: pioneering, groundbreaking work

explain / explanation: explanation, interpretation, construction, understanding, exposition, explication
