Data services for artificial intelligence (AI)
Supermarkets are big business, and they use data on a big scale. Originating in the US in the 1930s, supermarkets have gradually taken a bigger and bigger share of the retail and grocery market. Giants like Wal-Mart, Aldi and Carrefour are among the largest retailers in the world, with revenues approaching the hundreds of billions of dollars. As such, many have invested heavily in big data, with analytics and data science forming a core part of their decision making.
Every product purchased, along with its price, is recorded in gargantuan databases, with tables exceeding hundreds of billions of rows. Loyalty schemes, where customers accumulate points by scanning their loyalty card at each purchase, allow the company to stitch together a customer’s entire history of transactions, gaining more insight than through looking at baskets in isolation. The richness of this data provides value in many ways across the organisation, a few examples of which are explained below.
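As a rough illustration of how a loyalty card lets baskets be stitched into a customer history, here is a minimal Python sketch. The row layout, the card identifiers and the prices are all made up for the example; a real supermarket database will look different.

```python
from collections import defaultdict

# Hypothetical line items: (basket_id, loyalty_card_id or None, product, price)
transactions = [
    ("b1", "card_17", "milk", 1.50),
    ("b1", "card_17", "bananas", 2.00),
    ("b2", None, "tortillas", 3.00),  # no card scanned: basket stays anonymous
    ("b3", "card_17", "tuna", 1.20),
]


def history_by_customer(rows):
    """Group line items by loyalty card, skipping baskets with no card."""
    history = defaultdict(list)
    for basket_id, card_id, product, price in rows:
        if card_id is not None:
            history[card_id].append((basket_id, product, price))
    return dict(history)


history = history_by_customer(transactions)

# Total spend per identified customer across all of their baskets.
total_spend = {
    card: sum(price for _, _, price in items)
    for card, items in history.items()
}
```

Baskets b1 and b3 end up in one history because they share a card, while b2 cannot be linked to anyone.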
Ranging
Supermarket shelves are hot property. Every square inch of each aisle is potentially worth thousands of dollars per year, and supermarkets go to great lengths to make sure none of it is wasted on products that don’t perform well. But “performing well” is rarely as straightforward as picking the highest selling products or those with the highest margins. If it were, the whole store would just be milk and bananas. You have to cater to all the different customers who come into your store and the different meals and “missions” that drive them through the door.
For example, a particular condiment might not be a superstar seller, but if it’s important to the older demographic, then removing it from the shelves might force them to shop elsewhere. Also, imagine someone is planning to make burritos this evening. If they can get most of the ingredients in your store, but you don’t sell tortillas, they may end up taking all of that potential revenue to a competitor.
At the same time, a diverse range costs money. Aside from the aforementioned shelf real-estate, there’s the complex logistics of managing a large range of different products. You have to be able to move products from supplier to distribution centre to supermarket to aisle, arriving “just in time” so that the available stock on shelf neither overflows, nor runs out. More products mean more supply lines to manage and less shelf-stock to act as a buffer. Each product range also adds further work to the expensive contractual negotiations between supermarket and supplier, where things like price, promotions, levels of supply and advertising spend are agreed.
If you’ve been in a budget supermarket like Aldi, you’ll have noticed that they often offer fewer choices within each product type but hold higher stock levels in store. This is precisely to cut down on the costs above, enabling lower prices at the expense of choice rather than quality.
This all makes for a very complex optimisation problem in which data science plays a pivotal role. Products are regularly assessed against a number of criteria, such as sales, profitability, the number of customers who purchased them and the loyalty of those customers to the product when it is not on promotion. Machine learning models trained on past examples of range changes can be used to predict how customers will react to proposed changes. By taking store characteristics into account, such as size, local demographics and proximity to competitors, ranges can be optimised on a store-by-store basis.
For example, imagine a humble tin of canned tuna. It sits on a shelf with many other cans of tuna, with different flavours, brands, price points and pack sizes. If you removed it from the store, most people who were looking for it might simply switch to another canned tuna product. A small minority might postpone their purchase or look to buy elsewhere. For a product with a highly loyal customer base, such as cans of Coca Cola, this would play out differently.
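The tuna versus Coca Cola contrast can be put into numbers with a small Python sketch. The function name, the sales figures and the transfer rates are illustrative assumptions, not measured values; in practice the transfer rate would have to be estimated from past range changes.

```python
def sales_lost_if_delisted(weekly_sales, transfer_rate):
    """Weekly sales the store loses if a product is removed from the range.

    transfer_rate is the assumed share of the product's sales that
    customers switch to a substitute in the same store (0.0 to 1.0).
    """
    if not 0.0 <= transfer_rate <= 1.0:
        raise ValueError("transfer_rate must be between 0 and 1")
    return weekly_sales * (1.0 - transfer_rate)


# Assumed: most tuna shoppers happily pick another can of tuna.
tuna_loss = sales_lost_if_delisted(1000.0, transfer_rate=0.9)

# Assumed: loyal cola shoppers are much less willing to substitute.
cola_loss = sales_lost_if_delisted(1000.0, transfer_rate=0.4)
```

With the same weekly sales, the product with the loyal customer base costs the store far more when it is removed.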
All of this data analysis and modelling helps store category managers regularly assess the effectiveness of their range, optimising it for efficiency while trying to keep customers satisfied.
Pricing
Price elasticity is a measure of how demand for a product changes as its price changes. Put simply, the cheaper something is, the more people will want to buy it (with some exceptions). Pricing a product well means finding the point on the elasticity curve that delivers the most profit, by balancing the margin on each pack against the number sold.
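To make the margin-versus-volume trade-off concrete, here is a minimal Python sketch. It assumes a constant-elasticity demand curve, which is one common simplification rather than anything the article prescribes, and every number in it is invented for illustration.

```python
def demand(price, base_price, base_units, elasticity):
    """Units sold at `price` under an assumed constant-elasticity curve."""
    return base_units * (price / base_price) ** elasticity


def profit(price, unit_cost, base_price, base_units, elasticity):
    """Margin per pack multiplied by the number of packs sold."""
    return (price - unit_cost) * demand(price, base_price, base_units, elasticity)


def best_price_on_grid(unit_cost, base_price, base_units, elasticity, candidates):
    """Pick the candidate price with the highest profit."""
    return max(
        candidates,
        key=lambda p: profit(p, unit_cost, base_price, base_units, elasticity),
    )


def closed_form_price(unit_cost, elasticity):
    """For constant elasticity e < -1 the optimum is cost * e / (1 + e)."""
    if elasticity >= -1.0:
        raise ValueError("closed form needs elasticity below -1")
    return unit_cost * elasticity / (1.0 + elasticity)


# Hypothetical product: costs 1.00, sells 1000 units at 1.50, elasticity -2.
candidates = [round(1.00 + 0.05 * i, 2) for i in range(61)]  # 1.00 to 4.00
best = best_price_on_grid(1.00, 1.50, 1000, -2.0, candidates)
analytic = closed_form_price(1.00, -2.0)
```

The grid search and the closed form agree here; with a demand curve fitted from real sales data only the grid search (or a numerical optimiser) would generally apply.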
Things get a little more complicated when you introduce competing products into the equation. If you drop the price of Coca Cola, its sales will go up, but the sales of Pepsi will go down. This introduces the idea of cross-price elasticity, which models customers’ choices within a price “landscape”. Prices must be calibrated effectively within the context of the product category, both to maximise overall profit and to give the customer a clear delineation of value. Goldilocks pricing, where there is a good option, a better option and a best option, is very common, with the precise price gaps between the products decided through careful modelling.
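The Coca Cola and Pepsi example can be sketched with a small cross-price elasticity table. The log-linear form below is one common modelling choice, and the elasticity values, prices and volumes are invented for illustration; none of them come from real sales data.

```python
# elasticity[i][j]: assumed % change in units of product i
# for a 1% change in the price of product j.
elasticity = {
    "coca_cola": {"coca_cola": -2.0, "pepsi": 0.5},
    "pepsi": {"coca_cola": 0.8, "pepsi": -1.5},
}


def predicted_units(base_units, base_prices, new_prices, elasticity):
    """Scale each product's volume by every price change in the category."""
    result = {}
    for product, units in base_units.items():
        factor = 1.0
        for other, e in elasticity[product].items():
            factor *= (new_prices[other] / base_prices[other]) ** e
        result[product] = units * factor
    return result


base_units = {"coca_cola": 1000.0, "pepsi": 1000.0}
base_prices = {"coca_cola": 2.00, "pepsi": 2.00}

# Cut the Coca Cola price by 10% and leave Pepsi unchanged.
new_prices = {"coca_cola": 1.80, "pepsi": 2.00}
units = predicted_units(base_units, base_prices, new_prices, elasticity)
```

Because the off-diagonal entries are positive, the price cut lifts Coca Cola volume and pulls Pepsi volume down, which is the substitution effect described above.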
With fresh produce, the equation is a little different. Fruit and vegetables are planted months in advance, and cannot be harvested to order. Crops are picked according to the time of year and the weather. As soon as they leave the farmer’s field, it’s a race against time to get them onto the shelves and out through the checkouts before they go off. For the supermarket, this means predicting the best price to shift that week’s harvest, whilst maximising margin. If 2 million tomatoes are plucked from the vines this week, then we need to sell 2 million tomatoes. Price it too low and it’s a missed opportunity followed by empty shelves. Price it too high and you’ll end up with an aisle of rotten tomatoes (or at least you’ll have to heavily discount towards the end of the week, wiping out all of your profit).
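For a fixed harvest the question flips from "which price maximises profit" to "which price sells exactly what we have". A minimal Python sketch, again assuming a constant-elasticity demand curve and entirely made-up numbers for the base price, base volume and elasticity:

```python
def clearing_price(supply_units, base_price, base_units, elasticity):
    """Price at which assumed constant-elasticity demand equals supply."""
    if elasticity >= 0:
        raise ValueError("elasticity must be negative")
    return base_price * (supply_units / base_units) ** (1.0 / elasticity)


def demand(price, base_price, base_units, elasticity):
    """Units sold at `price` under the same assumed demand curve."""
    return base_units * (price / base_price) ** elasticity


# Hypothetical: 1.5 million tomatoes usually sell at 0.50 each,
# but this week 2 million were picked.
price = clearing_price(2_000_000, 0.50, 1_500_000, -1.5)
units_sold = demand(price, 0.50, 1_500_000, -1.5)
```

Pricing above `price` leaves unsold stock to rot; pricing below it empties the shelves early and gives away margin.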
Promotions
There are typically three main types of promotion:
- X% off: generally intended to encourage people to try something new or switch to a typically more expensive product. It drives up sales in the short term, but the hope is that some of those customers will switch their behaviour over the long term, making them more valuable customers.
- 3 for the price of 2 (or X for $Y): “multi-buys” are designed to increase the basket size and the value of existing customers to a product or category. By sending the customer home with more stock, you are trying to bring forward future purchases and potentially increase the customer’s rate of consumption. For example, suppose a customer who buys a $3 block of chocolate once a fortnight takes advantage of a “two for $5” promotion. They might wait four weeks before the next purchase (pantry stocking), but they might also just eat twice as much chocolate, thus changing their behaviour and potentially becoming more valuable over time.
- Every day low price: designed to compare favourably with prices in competing supermarkets and so drive people into the store. For example, if nappies are always cheaper in your supermarket, then many parents of young children will do their entire shop in your store, bringing in hundreds of dollars in associated revenue.
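The chocolate example in the multi-buy bullet can be worked through over a four-week window. The prices and purchase intervals come from the example itself; the function name is just a convenience for this sketch.

```python
def revenue(weeks, weeks_between_purchases, spend_per_purchase):
    """Revenue from one customer over a window of `weeks`."""
    return (weeks / weeks_between_purchases) * spend_per_purchase


WEEKS = 4

# No promotion: one $3 block every fortnight.
baseline = revenue(WEEKS, 2, 3.00)

# Pantry stocking: two for $5, then no purchase for four weeks.
pantry_stocking = revenue(WEEKS, 4, 5.00)

# Increased consumption: two for $5 every fortnight.
doubled_consumption = revenue(WEEKS, 2, 5.00)
```

Pantry stocking brings in less than the baseline ($5 against $6), so the promotion only pays off if enough customers end up eating more.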
When choosing a strategy, cross-price elasticity models fitted to historical promotions can be used both to select the type of promotion and to set benchmarks for how well we expect it to perform. They can also inform promotional depth, frequency, and which products should not be promoted together.
Measuring the effects of promotions against their original objectives is important, to make sure you’re not just giving money away to people who would have bought those products anyway.
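One simple way to check whether a promotion is "giving money away" is to compare it with an estimate of what would have sold anyway. The Python sketch below assumes that such a baseline estimate is available (for instance from a forecast or from comparable stores); the function, field names and figures are hypothetical.

```python
def promotion_summary(baseline_units, promo_units,
                      regular_price, promo_price, unit_cost):
    """Compare a promotion against estimated no-promotion sales."""
    baseline_profit = baseline_units * (regular_price - unit_cost)
    promo_profit = promo_units * (promo_price - unit_cost)
    return {
        "incremental_units": promo_units - baseline_units,
        "incremental_profit": promo_profit - baseline_profit,
        # Discount handed to customers assumed to have bought anyway.
        "discount_given_to_baseline_buyers":
            baseline_units * (regular_price - promo_price),
    }


# Hypothetical week: 1000 units expected at 3.00, 1500 sold at 2.50,
# with a unit cost of 2.00.
summary = promotion_summary(1000, 1500, 3.00, 2.50, 2.00)
```

In this made-up case the promotion sells 500 extra units but still reduces profit, because the discount on the units that would have sold anyway outweighs the margin on the extra ones.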
Personalisation
A few years ago, promotions were advertised solely through broadcast media or weekly catalogues. Both strategies are expensive and broad-brush: despite the huge diversity of customers who come into store, you can only send one message. Data science has changed all this with personalised communications sent to subscribers’ inboxes.
Woolworths in Australia sends out weekly marketing emails to its few million loyalty card members. Each one is personalised based on a huge model containing millions of features per customer. Instead of just highlighting that week’s biggest promotions, the model takes into account the customer’s previous buying behaviour, including the length of time since they bought specific items. This means the information is not only relevant to the customer’s tastes, but its recommendations are likely to relate to the things they’re running out of this week. By giving such personalised information, the likelihood of customers paying attention, and then going into store and making a purchase, is greatly increased.
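The idea of using the time since a customer last bought an item can be illustrated with a single toy feature. This is not the Woolworths model, which the article describes as having millions of features; the scoring rule, item names and dates below are assumptions made up for the sketch.

```python
from datetime import date


def due_score(purchase_dates, today):
    """Days since the last purchase divided by the average gap between
    purchases. A score near or above 1.0 suggests the item may be
    running out. Returns None when there is too little history."""
    dates = sorted(purchase_dates)
    if len(dates) < 2:
        return None
    gaps = [(later - earlier).days for earlier, later in zip(dates, dates[1:])]
    average_gap = sum(gaps) / len(gaps)
    if average_gap == 0:
        return None
    return (today - dates[-1]).days / average_gap


# Hypothetical purchase history for one loyalty card member.
history = {
    "coffee": [date(2021, 1, 1), date(2021, 1, 15), date(2021, 1, 29)],
    "nappies": [date(2021, 2, 1), date(2021, 2, 8)],
}
today = date(2021, 2, 12)

scores = {item: due_score(dates, today) for item, dates in history.items()}

# Items most likely to be due come first in this week's email.
ranked = sorted(
    (item for item in scores if scores[item] is not None),
    key=lambda item: scores[item],
    reverse=True,
)
```

Coffee is bought every 14 days and was last bought 14 days ago, so it ranks ahead of nappies, which were bought only 4 days ago.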
New products
It’s difficult enough to determine how to price, promote and position products that have been on the shelves for years, but the challenge is greater still when a product is new. New products require lots of upfront investment in R&D, testing, certifications, production capacity and marketing. If they sell well, that investment will be paid back many times over. If they flop, that’s a lot of money down the drain.
Understanding the market and trying to find a gap is within the scope of Market Research. Qualitative data gathered through surveys and focus groups is combined with quantitative data from other markets to identify potential new products and estimate (with some broad assumptions) the size of the opportunity. Data from the supermarket itself can be used to fill out the picture, by looking at how loyal customers are to competitor products, how sensitive to promotions the category is and how new product launches have performed in the past. Once the product has been launched it is benchmarked against the progress of other new products, adjusting for its own particular attributes and promotion schedule. This helps suppliers and supermarkets make earlier decisions as to whether to continue production or to cut their losses.
The future?
Whilst supermarkets are already big users of data science and AI, there are many interesting concepts that may become more mainstream in the years to come.
Customer tracking is imperfect in its current form. Using loyalty cards to identify customers will bias your “identified customers” towards those with a more frugal mindset. Using credit card information helps to fill this gap, but even then, people often use multiple cards and may use their partner’s card. There is also the problem that if someone enters the store but leaves without buying, their visit is unrecorded (this is more of an issue for clothing retailers than for supermarkets). Facial recognition technology and Bluetooth beacons attempt to plug these gaps and even provide data on how people move around the store. Having data on what a person walked past, what they paused at and how long they spent in store will bring further refinements to the layout of the store and the effectiveness of in-store promotions. Obviously, there are major ethical implications with this technology, which may slow its roll-out, at least in the West.
Checkout-free stores have been a major talking point since the first Amazon Go opened in Seattle a few years ago. Customers simply take the items they want from the shelf and walk out the door. Items are tracked as they leave the shelf using cameras and sensors, and payment occurs automatically via an app on the customer’s mobile phone. Reducing the time it takes to make a purchase enables more frequent and impulsive shopping, at a lower running cost.
Customised pricing and promotions are also an interesting possibility. Different customers have different budgets and place different values on products. Being able to give specific customers money off specific products allows the supermarket to promote products very efficiently without simply giving money away to those who would have bought anyway. Some supermarkets do this already, but rather than money off, they reward their customers with loyalty points. This has the double benefit of giving the customer a discount while making sure they spend what they “saved” within the supermarket’s loyalty ecosystem.
Structurally clean but practically messy
Working with supermarket data is a dream compared to data from many other sources. It’s incredibly high volume, with thousands to millions of transactions per week, so you can measure even very small effects with a high degree of statistical significance. And most supermarkets have teams of data engineers doing all the technical integration work, so by the time the data reaches the hands of a data scientist, it’s clean, concise and comprehensive.
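The link between volume and statistical significance can be seen with a standard two-proportion z-test. The sketch below uses only the Python standard library; the purchase rates and sample sizes are invented to show the effect of scale and do not come from any real store.

```python
import math


def two_proportion_z(successes_a, n_a, successes_b, n_b):
    """z statistic for the difference between two purchase rates,
    using the pooled standard error."""
    p_a = successes_a / n_a
    p_b = successes_b / n_b
    pooled = (successes_a + successes_b) / (n_a + n_b)
    std_err = math.sqrt(pooled * (1.0 - pooled) * (1.0 / n_a + 1.0 / n_b))
    return (p_b - p_a) / std_err


# The same small uplift (10.0% to 10.2% of customers buying)
# measured on 10,000 customers per group and on 1,000,000 per group.
z_small = two_proportion_z(1_000, 10_000, 1_020, 10_000)
z_large = two_proportion_z(100_000, 1_000_000, 102_000, 1_000_000)
```

With the smaller groups the z statistic sits well below the usual 1.96 threshold; with a hundred times more customers it is ten times larger and comfortably above it, even though the uplift itself has not changed.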
But the complexity of studying a dataset involving an ever changing landscape of tens of thousands of products, in hundreds of stores, bought by millions of customers billions of times over can be overwhelming. No two weeks are ever the same: think Easter (which moves every year), what day of the week Christmas falls on, public holidays, changes to the range, product shortages, seasonality, weather and broader economic conditions. Never mind global pandemics! Despite this, supermarket data is a rich and diverse window into the lives of people from across society and is something I’ve very much enjoyed working with. The list above is far from exhaustive, and different retailers use data to a greater or lesser degree. But in the 21st century, being data savvy is essential for supermarkets to compete.
Translated from: https://towardsdatascience.com/how-data-science-and-ai-are-changing-supermarket-shopping-e47f63f4b53f