DevOps for Finance, Chapter 1: System Complexity and Interdependency (Case Study)

The Knight Capital Accident
On August 1, 2012, Knight Capital, a leading market maker in the US equities market, updated its SMARS high-speed automated order routing system to support new trading rules at the New York Stock Exchange. The order routing system took parent orders, broke them up, and routed one or more child orders to different execution points, such as the NYSE.
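The parent/child relationship described above can be illustrated with a minimal sketch. This is not how SMARS worked internally; the `ChildOrder` type, the `split_parent_order` function, the round-robin venue choice, and the `max_child_qty` parameter are all assumptions made for illustration. The only property it is meant to show is that the child quantities add up to exactly the parent quantity.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass(frozen=True)
class ChildOrder:
    """Hypothetical child order: a slice of a parent order sent to one venue."""
    symbol: str
    quantity: int
    venue: str


def split_parent_order(symbol: str, quantity: int,
                       venues: Sequence[str],
                       max_child_qty: int) -> List[ChildOrder]:
    """Split a parent order into child orders, round-robin across venues.

    The child quantities always sum to exactly the parent quantity.
    """
    if quantity <= 0 or max_child_qty <= 0 or not venues:
        raise ValueError("quantity, max_child_qty and venues must be non-empty/positive")

    children: List[ChildOrder] = []
    remaining = quantity
    index = 0
    while remaining > 0:
        # Never send more than what is left of the parent order.
        child_qty = min(max_child_qty, remaining)
        venue = venues[index % len(venues)]
        children.append(ChildOrder(symbol, child_qty, venue))
        remaining -= child_qty
        index += 1
    return children


children = split_parent_order("XYZ", 250, ["NYSE", "NASDAQ"], 100)
```

In this sketch the loop ends as soon as the parent quantity is exhausted, which is the invariant a router of this kind depends on.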
The new code was rolled out manually in steps prior to August 1. Unfortunately, an operator missed deploying the changes to one server. That was all it took to cause one of the largest financial systems failures in history.[5]
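A missed server of this kind is the sort of thing an automated post-deployment check can catch. The following is a minimal sketch under stated assumptions: the host names, the version strings, and the `find_stale_servers` helper are hypothetical, and how the deployed version of each server is collected is left out entirely.

```python
from typing import Dict, List


def find_stale_servers(deployed_versions: Dict[str, str],
                       expected_version: str) -> List[str]:
    """Return the hosts whose deployed version differs from the expected one."""
    return sorted(
        host for host, version in deployed_versions.items()
        if version != expected_version
    )


# Hypothetical inventory: one server was skipped during the manual rollout.
deployed = {
    "router-01": "2.4.0",
    "router-02": "2.4.0",
    "router-03": "2.3.7",  # still running the old code
    "router-04": "2.4.0",
}

stale = find_stale_servers(deployed, expected_version="2.4.0")
if stale:
    print("Rollout incomplete, do not enable the new feature:", stale)
```

The check is deliberately simple: a release step could refuse to proceed unless the returned list is empty.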
Prior to the market open on August 1, Knight’s system alerted operations about some problems with an old order routing feature called “Power Peg.” The alerts were sent by email to operations staff, who didn’t understand what they meant or how important they were. As a result, they missed their last chance to stop very bad things from happening.
In implementing the new order routing rules, developers had repurposed an old flag used for a Power Peg function that had been dormant for several years and had not been tested for a long time. When the new rule was turned on, this “dead code” was resurrected accidentally on the one server that had not been correctly updated.
When the market opened, everything went to hell quickly. The server that was still running the old code rapidly fired off millions of child orders into the markets, far more orders than should have been created. This wasn’t stopped by checks in Knight’s system, because the limit checks in the dead code had been removed years before. Unfortunately, many of these child orders matched with counterparty orders at the exchanges, resulting in millions of trade executions in only a few minutes.
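The kind of limit check that was missing can be sketched as a guard that refuses any child order that would push the total sent beyond the parent order's quantity. This is an illustration only, not a reconstruction of Knight's code; the `ParentOrderGuard` class and its `approve` method are hypothetical names.

```python
class ParentOrderGuard:
    """Hypothetical guard: caps the cumulative child quantity at the parent quantity."""

    def __init__(self, parent_quantity: int) -> None:
        if parent_quantity <= 0:
            raise ValueError("parent_quantity must be positive")
        self.parent_quantity = parent_quantity
        self.sent_quantity = 0

    def approve(self, child_quantity: int) -> bool:
        """Return True and record the child order only if it fits within the parent."""
        if child_quantity <= 0:
            return False
        if self.sent_quantity + child_quantity > self.parent_quantity:
            # Sending this child would exceed what the parent order asked for.
            return False
        self.sent_quantity += child_quantity
        return True


guard = ParentOrderGuard(parent_quantity=100)
results = [guard.approve(60), guard.approve(60), guard.approve(40), guard.approve(1)]
```

With a guard like this in the routing path, a runaway loop would be rejected once the parent quantity is reached instead of continuing to send orders to the market.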
Once they realized that something had gone badly wrong, operations staff at Knight rolled back the update, which meant that all of the servers were now running the old code. This made the problem temporarily much worse before the system was finally shut down.
The incident lasted a total of around 45 minutes. Knight ended up with a portfolio of stock worth billions of dollars, and a shortfall of $460 million. The company needed an emergency financial bailout from investors to remain operational, and four months later the financially weakened company was acquired by a competitor. The SEC fined Knight $12 million for several securities law violations, and the company also paid out $13 million in a lawsuit.
In response to this incident (and other recent high-profile system failures in the financial industry), the SEC, FINRA, and ESMA have all introduced new guidelines and regulations requiring additional oversight of how financial market systems are designed and tested, and how changes to these systems are managed.
