Amazon’s Just Walk Out (JWO) system, newly upgraded with a multi-modal foundation model, can now simultaneously process data from multiple sources, including overhead cameras, weight sensors on shelves and RFID, to more accurately connect products to the customers purchasing them. The model is built on the same kind of machine learning architecture that underlies generative AI applications, but applies it to the physical store environment.
“We can combine all these different signals into a single model in order to teach a machine learning system to operate on the entirety of information all at once, meaning we can generate receipts faster, more efficiently and more accurately than the existing system, which was already pretty good,” said Jon Jenkins, VP of Just Walk Out, during a July 30 briefing on the technology.
The JWO technology, launched in 2018 and now in 170 third-party retail locations, previously analyzed shopper behavior (movement and location in the store, which items were picked up and in what quantity) sequentially. In unusual shopping scenarios, such as a camera view obscured by bad lighting or by a nearby shopper, this sequential method could take longer to determine purchases with confidence, and in some cases required manual retraining of the model.
The new JWO looks at multiple data inputs simultaneously and prioritizes the most important ones to more accurately determine the variety and quantity of items customers select. To support continuous learning, JWO systems are trained on a 3D planogram map of the store to understand the placement of fixtures and products, along with an image catalog of the store’s merchandise so they can visually recognize items more accurately. This enables the new AI-powered system to recognize shopper behavior even if the store is remerchandised or items are rearranged or misplaced.
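As a rough illustration of how a planogram can anchor the variety-and-quantity question, the sketch below maps shelf locations to the product expected there and infers a quantity from a weight-sensor drop. Everything in it (the `Product` type, a `PLANOGRAM` keyed by aisle, shelf and slot, the `estimate_pickup` function and its 15% tolerance) is a hypothetical assumption for illustration, not Amazon’s implementation; the production system is a learned model rather than a lookup table.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Product:
    sku: str
    name: str
    unit_weight_oz: float  # hypothetical: treats the labeled size as shelf weight


# Hypothetical 3D planogram: (aisle, shelf, slot) -> product expected at that spot.
PLANOGRAM: dict[tuple[int, int, int], Product] = {
    (3, 2, 5): Product("RB16", "Red Bull 16 oz", 16.0),
    (3, 2, 6): Product("COKE12", "Coca-Cola 12 oz", 12.0),
}


def estimate_pickup(
    location: tuple[int, int, int],
    weight_drop_oz: float,
    tolerance: float = 0.15,
) -> tuple[Product, int] | None:
    """Return (expected product, quantity) if the weight drop fits a whole
    number of that product's units; return None if the signals disagree."""
    product = PLANOGRAM.get(location)
    if product is None:
        return None  # location not in the (hypothetical) planogram
    quantity = round(weight_drop_oz / product.unit_weight_oz)
    if quantity < 1:
        return None
    # Reject when the leftover weight is too large to be measurement noise.
    residual = abs(weight_drop_oz - quantity * product.unit_weight_oz)
    if residual > tolerance * product.unit_weight_oz:
        return None
    return product, quantity


if __name__ == "__main__":
    print(estimate_pickup((3, 2, 5), 32.0))  # two Red Bull cans
    print(estimate_pickup((3, 2, 5), 12.0))  # conflict: not a whole Red Bull can
```

Here a 32-ounce drop at the Red Bull slot reads as two cans, while a 12-ounce drop does not fit a whole number of 16-ounce cans and is left for the other signals to resolve.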
Taming the Challenge of Conflicting Data Points
Jenkins described how the new AI-powered system operates: “With the 3D planogram, the system ‘knows’ what should be in every location, and cameras on the ceiling can see what your hand brings off of a shelf,” he said. “Does it match? If so, all is well. There’s also a weight sensor on the shelf that confirms that, for example, a 16-ounce can of Red Bull just left the shelf. But if someone had replaced it with a can of Coke, the weight sensor might show 12 ounces and the camera would see the red and white of a Coke can.
“Traditionally, the system has had a hard time dealing with these conflicts, but when it’s all brought together, it can make decisions,” Jenkins added, noting that in some cases the system places more confidence in certain data inputs than in others. “That’s a huge advantage in terms of accuracy and efficiency,” he said.
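The conflict Jenkins describes, where the planogram expects Red Bull but the camera and weight sensor both point to Coke, can be sketched as a weighted logarithmic opinion pool: each signal contributes a probability over candidate products, scaled by how much it is trusted. The modality names, probabilities and weights below are hypothetical and hand-set for illustration. Amazon’s model learns how much to trust each input rather than using fixed weights, and this pooling rule is a stand-in technique, not its actual method.

```python
from __future__ import annotations

import math

# Hypothetical per-modality beliefs about which product left the shelf.
EVIDENCE: dict[str, dict[str, float]] = {
    "planogram": {"RB16": 0.95, "COKE12": 0.05},  # slot is mapped to Red Bull
    "camera": {"RB16": 0.15, "COKE12": 0.85},     # sees red-and-white can
    "weight": {"RB16": 0.10, "COKE12": 0.90},     # shelf lost about 12 oz
}

# Hypothetical trust weights; a real system would learn these.
WEIGHTS: dict[str, float] = {"planogram": 0.5, "camera": 1.0, "weight": 1.0}


def fuse(
    evidence: dict[str, dict[str, float]],
    weights: dict[str, float],
    eps: float = 1e-6,
) -> dict[str, float]:
    """Weighted logarithmic opinion pool over candidate SKUs."""
    candidates: set[str] = set()
    for dist in evidence.values():
        candidates.update(dist)
    # Sum of trust-weighted log-probabilities; eps avoids log(0).
    scores = {
        sku: sum(
            weights[modality] * math.log(dist.get(sku, 0.0) + eps)
            for modality, dist in evidence.items()
        )
        for sku in candidates
    }
    # Softmax with max-subtraction for numerical stability.
    top = max(scores.values())
    exps = {sku: math.exp(score - top) for sku, score in scores.items()}
    total = sum(exps.values())
    return {sku: value / total for sku, value in exps.items()}


if __name__ == "__main__":
    for sku, prob in sorted(fuse(EVIDENCE, WEIGHTS).items()):
        print(f"{sku}: {prob:.3f}")
```

With these numbers the camera and weight evidence outweigh the planogram, so Coke comes out at about 92%. Setting the camera and weight-sensor weights to zero would leave the planogram’s Red Bull expectation in charge.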
While many JWO deployments have been in smaller venues such as convenience stores, stadium shops and hospital cafés, Jenkins said that the tech had been employed in stores as large as “40,000 square feet, with the full diversity of product that you’d see in a grocery store. We’ve trained a model that is really amazing at capturing the diversity of ways people interact.”