Optimization | Applications of Operations Research: A Digest of Papers from the Top Journal Operations Research (Vol. 68, No. 6)

Authors: 陈宇文 (PhD student, University of Oxford), 翁欣 (PhD student, Tsinghua University)

Simple Bayesian Algorithms for Best-Arm Identification

In “Simple Bayesian Algorithms for Best-Arm Identification,” Russo considers the optimal adaptive allocation of measurement effort for identifying the best among a finite set of options or designs. An experimenter sequentially chooses designs to measure and observes noisy signals of their quality with the goal of confidently identifying the best design after a small number of measurements. Just as the multiarmed bandit problem crystallizes the tradeoff between exploration and exploitation, this “pure exploration” variant crystallizes the challenge of rapidly gathering information before committing to a final decision. The author proposes several simple Bayesian algorithms for allocating measurement effort and, by characterizing fundamental asymptotic limits on the performance of any algorithm, formalizes a sense in which these seemingly naive algorithms are the best possible.

01 Simple Bayesian Algorithms for Best-Arm Identification

Author: Daniel Russo

Original paper:

https://pubsonline.informs.org/doi/10.1287/opre.2019.1911

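In this paper, the author studies how to adaptively allocate measurement effort so as to identify the best among a finite set of options or designs. The experimenter chooses designs to measure one at a time and observes noisy signals of their quality, aiming to identify the best design with confidence after only a small number of measurements. Just as the multiarmed bandit problem captures the tradeoff between exploration and exploitation, this "pure exploration" variant of the problem captures the challenge of gathering information quickly before committing to a final decision. The author proposes several simple Bayesian algorithms for allocating measurement effort and, by characterizing fundamental asymptotic limits on the performance of any algorithm, shows that these seemingly naive algorithms are in a formal sense the best possible.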
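To make "adaptively allocating measurement effort" concrete, here is a minimal Python sketch in the spirit of top-two Thompson sampling, a Bayesian allocation rule of the kind this paper studies. The summary above does not spell out the algorithms, so the specifics are assumptions for illustration only: Gaussian noise with known standard deviation, independent normal priors, the tuning parameter `beta`, the cap on resampling attempts, and all function and parameter names.

```python
import math
import random


def top_two_thompson_sampling(true_means, n_rounds=2000, beta=0.5,
                              noise_sd=1.0, prior_var=100.0, seed=0):
    """Illustrative sketch only; not the paper's reference implementation."""
    rng = random.Random(seed)
    k = len(true_means)
    counts = [0] * k    # number of measurements taken of each design
    sums = [0.0] * k    # sum of observed signals for each design

    def posterior(i):
        # Normal prior N(0, prior_var) with known-variance Gaussian noise
        # gives a normal posterior (conjugate update).
        precision = 1.0 / prior_var + counts[i] / noise_sd ** 2
        var = 1.0 / precision
        mean = var * sums[i] / noise_sd ** 2
        return mean, var

    def sampled_best():
        # Draw one value per design from its posterior; return the argmax.
        draws = []
        for i in range(k):
            mean, var = posterior(i)
            draws.append(rng.gauss(mean, math.sqrt(var)))
        return max(range(k), key=draws.__getitem__)

    for _ in range(n_rounds):
        arm = sampled_best()
        if rng.random() >= beta:
            # With probability 1 - beta, measure a "challenger": resample
            # until a different design comes out on top. The cap of 100
            # tries is a simplification; if it is hit, keep the leader.
            for _ in range(100):
                challenger = sampled_best()
                if challenger != arm:
                    arm = challenger
                    break
        reward = rng.gauss(true_means[arm], noise_sd)  # noisy measurement
        counts[arm] += 1
        sums[arm] += reward

    post_means = [posterior(i)[0] for i in range(k)]
    best = max(range(k), key=post_means.__getitem__)
    return best, counts


if __name__ == "__main__":
    best, counts = top_two_thompson_sampling([0.0, 0.2, 1.0])
    print("recommended design:", best)
    print("measurements per design:", counts)
```

The point of the challenger step is that effort is not spent only on the current favorite: a fixed fraction of measurements goes to whichever alternative still plausibly competes with it, which is what separates pure exploration from ordinary Thompson sampling.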

Reducing Delay in Retrial Queues by Simultaneously Differentiating Service and Retrial Rates

Customer retrials commonly occur in many service systems, such as healthcare, call centers, mobile networks, computer systems, and inventory systems. However, because of their complex nature, retrial queues are often more difficult to analyze than queues without retrials. In “Reducing Delay in Retrial Queues by Simultaneously Differentiating Service and Retrial Rates,” J. Wang, Z. Wang, and Liu develop a service grade differentiation policy for queueing models with customer retrials. They show that the average waiting time can be reduced through strategically allocating the rates of service and retrial times without needing additional service capacity. Counter to the intuition that higher service variability usually yields a larger delay, the authors show that the benefits of this simultaneous service-and-retrial differentiation (SSRD) policy outweigh the impact of the increased service variability. To validate the effectiveness of the new SSRD policy, the authors provide (i) conditions under which SSRD is more beneficial, (ii) closed-form expressions of the optimal policy, (iii) asymptotic reduction of customer delays when the system is in heavy traffic, and (iv) insightful observations/discussions and numerical results.

02 Reducing Delay in Retrial Queues by Simultaneously Differentiating Service and Retrial Rates

Authors: Jinting Wang, Zhongbin Wang, Yunan Liu

Original paper:

https://pubsonline.informs.org/doi/10.1287/opre.2019.1933

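Customer retrials are common in many service systems, such as healthcare, call centers, mobile networks, computer systems, and inventory systems. Because of their complexity, however, retrial queues are usually harder to analyze than queues without retrials. In this paper, the authors propose a service grade differentiation policy for queueing models with customer retrials. They show that the average waiting time can be reduced by strategically allocating service rates and retrial rates, without adding any service capacity. Contrary to the intuition that higher service variability usually leads to longer delays, they also show that the benefits of this simultaneous service-and-retrial differentiation (SSRD) policy outweigh the effect of the increased service variability. To validate the new SSRD policy, the authors provide (i) conditions under which SSRD is more beneficial, (ii) closed-form expressions for the optimal policy, (iii) the asymptotic reduction in customer delay when the system is in heavy traffic, and (iv) insightful observations, discussion, and numerical results.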

Technical Note—On the Optimality of Reflection Control

The so-called reflection control is easy to implement and widely applied in many applications such as inventory management and financial systems. To apply reflection control in a production-inventory system, for example, production will stop when the finished-goods inventory reaches a certain level. What is the best level for this control? In what sense is it optimal? In “Technical Note—On the Optimality of Reflection Control,” Yang, Yao, and Ye have established the optimality of reflection control under an exponential holding cost in three settings, namely, a Brownian motion model, a single-server system, and a birth–death queue model. The study provides a thorough understanding of the control and extends significantly its domain of applications.

03 On the Optimality of Reflection Control

Authors: Jiankui Yang, David D. Yao, Heng-Qing Ye

Original paper:

https://pubsonline.informs.org/doi/10.1287/opre.2019.1935

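Because it is easy to implement, reflection control is widely used in applications such as inventory management and financial systems. For example, when reflection control is applied to a production-inventory system, production stops once the finished-goods inventory reaches a certain level. What, then, is the best level for this control, and in what sense is it optimal? The authors establish the optimality of reflection control under an exponential holding cost in three settings: a Brownian motion model, a single-server system, and a birth–death queue model. The study gives a thorough account of reflection control and significantly extends its domain of application.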
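The question "what is the best level?" can be illustrated with a small Python sketch. It is not the paper's model or proof; it assumes a textbook make-to-stock setting in which demand arrives at rate `lam`, production runs at rate `mu` whenever inventory is below the reflection level and stops at that level, and unmet demand is backlogged. Under these assumptions the shortfall below the level is geometrically distributed with parameter `lam / mu`, so the long-run average cost can be summed directly. The cost function `example_cost` (exponential holding cost, linear backlog penalty) and all parameter values are hypothetical.

```python
import math


def average_cost(level, cost, lam=0.8, mu=1.0, n_terms=2000):
    """Long-run average cost when production stops at `level`.

    Assumes the shortfall N = level - inventory has the M/M/1 stationary
    law P(N = n) = (1 - rho) * rho**n with rho = lam / mu < 1.
    The infinite sum is truncated after `n_terms` terms.
    """
    rho = lam / mu
    if not 0.0 < rho < 1.0:
        raise ValueError("need 0 < lam < mu for a stationary distribution")
    total = 0.0
    weight = 1.0 - rho  # P(N = 0)
    for n in range(n_terms):
        total += weight * cost(level - n)
        weight *= rho   # P(N = n + 1)
    return total


def example_cost(x, theta=0.1, backlog_penalty=5.0):
    """Hypothetical cost rate: exponential in inventory held, linear in backlog."""
    if x >= 0:
        return math.exp(theta * x) - 1.0
    return backlog_penalty * (-x)


def best_level(cost, levels=range(0, 41), **kwargs):
    """Brute-force search for the reflection level with the lowest average cost."""
    return min(levels, key=lambda b: average_cost(b, cost, **kwargs))


if __name__ == "__main__":
    b = best_level(example_cost)
    print("best reflection level:", b)
    print("average cost at that level:", round(average_cost(b, example_cost), 4))
```

A low level keeps holding cost small but leaves the system backlogged often; a high level does the opposite. The search simply locates the balance point for the assumed cost function, whereas the paper's contribution is to show that a policy of this reflection form is optimal in its three settings.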

Optimistic Monte Carlo Tree Search with Sampled Information Relaxation Dual Bounds

In “Optimistic Monte Carlo Tree Search with Sampled Information Relaxation Dual Bounds,” Jiang, Al-Kanj, and Powell propose an extension to Monte Carlo tree search that uses the idea of “sampling the future” to produce noisy upper bounds on nodes in the decision tree. These upper bounds can help guide the tree expansion process and produce decision trees that are deeper rather than wider, in effect concentrating computation toward more useful parts of the state space. The algorithm’s effectiveness is illustrated in a ride-sharing setting, where a driver/vehicle needs to make dynamic decisions regarding trip acceptance and re
