Much has been made of the “AI arms race” between countries (primarily the U.S. and China), but in America a similarly heated contest is unfolding among the major tech companies. The latter competition could have broad implications for any business using these platforms to connect with consumers…so, just about everyone.
Last month, Google, Meta and TikTok all debuted more AI-based capabilities for their customers (that is, advertisers, creators and merchants). The announcements were just the latest in a rolling slate of gen AI tools for advertisers and sellers that have been introduced in the past year across these platforms and many others, including Amazon, Wix, eBay, BigCommerce, Shopify and Cart.com.
While there are differences among the capabilities each platform is creating for its customers, what’s perhaps most important is their similarity: they all use generative AI models to speed up and streamline the creation of advertising and product listing assets. The purported end result is more effective content, created more quickly and economically, which hypothetically will make everyone more money: both the businesses that create these more varied, personalized, relevant listings and the platforms where they spend their money to list them.
But before you dive in, there are a few nagging questions that all brands should be asking about these “magical” new AI tools.
What content is being used to train these gen AI models (and where did it come from)?
Many of the gen AI tools being made available to advertisers and sellers on these tech platforms offer to generate images or text, such as varied backgrounds for product ads or copy for product listings (jobs that typically try the patience of the humans tasked with them). But while much of this content seems mundane (generic seasonal or landscape images, for example), that’s the very reason brands need to question its provenance.
While getting some help beefing up creative for the countless ads or product listings that companies must create to stand out online might seem fairly innocuous, it should be clear by now that this can be a slippery slope. Many people may not care that their casual beach or food snaps may be among the thousands of similar images used to train AI models so that they can create replicas, but ask Scarlett Johansson how it feels when one appears to use your voice (a charge OpenAI has disputed, but there’s no denying the resemblance was eerie). What about videos of your kids or pets, your face, your body?
Based on what the leaders in gen AI have shared publicly so far, it’s a fair assumption that this is exactly what is happening in some cases, especially for any content that has been “shared publicly.” And it goes beyond just copyrighted work or the likenesses of professionals and celebrities; if individuals think their own personal content is immune from the AI “scrape,” they should think again.
At a tech summit last month, Meta’s Chief Product Officer Chris Cox said that Instagram and Facebook have an advantage in gen AI because of all the “public” photos available to them, PetaPixel reported. “We don’t train on private stuff, we don’t train on stuff that people share with their friends; we do train on things that are public,” Cox explained, but how is that distinction being made in a forum as public as social media?
Cox also said that Meta’s text-to-image AI model Emu is able to make such high-quality images because of “Instagram being the data set that was used to train it,” describing the social media platform as “one of the great repositories of incredible imagery.” That is certainly true, but it’s also true that almost no one foresaw this use case when they added their images to that repository.
Google has also confirmed that it uses “publicly available” content from across the web to train its various AI models, including its AI chatbot Google Bard. “Our privacy policy has long been transparent that Google uses publicly available information from the open web to train language models for services like Google Translate,” Google spokesperson Christa Muldoon told The Verge after Gizmodo reported an update in the company’s privacy policy to include the same disclosure for Bard and Cloud AI. “This latest update simply clarifies that newer services like Bard are also included. We incorporate privacy principles and safeguards into the development of our AI technologies, in line with our AI Principles,” the spokesperson added.
Could potential copyright violations open up brands to litigation?
To be sure, this can all feel a bit uncomfortable on a personal level, but why is it a problem for businesses? Because if you are using AI-generated content from these platforms for your advertising and brand assets, you could unknowingly be drawing on copyrighted material. What if a generated background on one of your ads ends up strikingly similar to the copyrighted work of a nature photographer, and that photographer is feeling litigious? Could your brand be open to infringement action? Most of the platforms creating these tools, including Google and Meta, are very clear that brands own the imagery they create within their platforms. Does that mean the brand also owns responsibility if issues arise over how the images were created?
There are no clear answers yet to these questions, and as FTC Chair Lina Khan recently pointed out, regulation is often slow to catch up, especially given the rapid pace of technological innovation today. One of the first major copyright lawsuits involving generative AI highlights the complications: a trio of visual artists sued several text-to-image AI platforms for copyright infringement, but the case was dismissed in part for lack of specificity. One of the biggest issues, which the claimants are now attempting to clarify in an amended lawsuit, was the differing levels of responsibility between the original AI software (Stability AI’s open-source Stable Diffusion model) and the AI tools from DeviantArt and Midjourney that allegedly rely on it. The question of the culpability of businesses and individuals using these platforms goes a layer further and likely won’t be addressed by this lawsuit.
It will probably be years before legislators and the justice system wade through the murky, interconnected layers of these questions to reach a point where users have clear guidelines around fair use. That’s why “caution” and “slow” should be the watchwords of the moment, admittedly hard ideas to act on at a time when many brands are feeling frenzied pressure not to be left behind.
Companies offering these gen AI tools to their customers are quick to point out the murkiness of the situation — while absolutely not slowing down themselves.
“The question of how to navigate copyright and licensing issues with these models is an ongoing conversation taking place in the industry,” said a Google spokesperson in comments shared with Retail TouchPoints regarding its latest gen AI tools for advertisers and merchants, announced last month. The company added that, “This is another reason why we’re being thoughtful, responsible and intentional as we think about sharing generative models more widely. All images created through Product Studio still need to pass through Google’s existing policy reviews required for listing products on Google, so will be held to the same set of content standards for things like inappropriate content or misrepresentation, as any other product images on Google. Like many large models, this one was trained on a variety of sources, including some publicly available content.”
“It’s obviously a complicated area where there’s a lot of ongoing discussion just generally in the industry about exactly how that should work,” said John Hegeman, Head of Monetization at Meta, in response to a similar question posed at a recent press briefing attended by Retail TouchPoints. “For our situation we have a number of protections in place and different layers of how these models work to ensure that we’re not using data or generating results that would run afoul of [things like copyright law]. Both in terms of the data that we’re training these foundation models on, the inputs to a specific query and the outputs, we have checks in each of these different steps to try to ensure that they’re not generating things that would be inappropriately using somebody’s likeness or copyrighted material.”
As Charlie Warzel pointed out in a recent article for The Atlantic, the biggest name in the AI game, OpenAI, also resorts to similarly broad, often vague responses when asked about its training data. “At the core of these deflections is an implication: The hypothetical superintelligence they are building is too big, too world-changing, too important for prosaic concerns such as copyright and attribution,” Warzel wrote. “The Johansson scandal is merely a reminder of AI’s manifest-destiny philosophy: This is happening whether you like it or not.”
Anyone using these technologies must ask themselves if they too believe in the manifest destiny of the AI machine. If not, it may be worth slowing down on adoption until these important issues can be more clearly defined.
Are you okay with your brand content being used to train AI?
On the flip side of the equation is whether businesses are okay with their own content being used to train these AI models. One of Google’s newest innovations lets brands upload a reference image to generate similar backgrounds for their product imagery, helping ensure that generated images are in line with brand guidelines and aesthetics. But when asked, executives did not provide a clear answer as to whether those uploaded images might then be used for future training of the model.
At the very least, companies should ask themselves whether they are comfortable with the idea of assets they upload into these tools (imagery, campaigns, style guides, lookbooks, language that represents the brand’s voice and the like) being used to potentially help other companies generate their own content at scale. If that idea makes you uncomfortable, perhaps hold off, or at the very least take a long, hard look at the specific platform’s privacy policy to see if there is any clarifying language that offers safeguards.
Of course, the sticking point comes from the oft-used fallback of “publicly available content,” which very likely means that ads you are running online or content you are posting on your public social media feeds are already being drawn on to train AI models.
Why are platforms offering this transformative tech for free?
Adages become adages because they carry an inherent, timeless truth. In the case of these “free” gen AI tools, two immediately spring to mind: “There’s no such thing as a free lunch” and “Follow the money.”
If the revelations around data privacy and the extent to which our personal lives were being mined by big tech over the past few decades have taught us anything, it should be that nothing online is truly “free.” We’ve all learned on a personal level that the cost of all these free tools and services we now have access to online is our data and our attention, both of which digital platforms have been able to translate into real monetary value.
The online exchange is not that different for advertisers and sellers, although it is perhaps more transparent. When companies like Meta, Google and TikTok help customers sell more on their platforms with AI-powered tools, they also make more money in the form of fees and commissions. That’s a clear win-win.
But in light of this historical precedent, one has to ask: what else might these platforms be getting in exchange? One potential answer is a rich repository of brand creative, inspirational imagery, product details and language variations, all potentially being used to train AI for future replication.
To be clear, there is no hard evidence that this is happening across the board, and as the lawsuit against Stability AI and the OpenAI-Johansson debacle make clear, it’s very hard to prove. But one thing we do know is that for years, many of these companies engaged in dubious data collection and tracking practices simply because they could; either no one realized they were doing it, or the mechanisms for monitoring and regulating such actions were not yet available. That alone should encourage a healthy dose of skepticism and wariness in this new arena, where the stakes could be even higher.
One thing these platforms certainly do get is information about what performs well and what doesn’t in their ecosystems, which is why each of them is offering up its own version of tools that all do essentially the same thing. The more they can get people to use their tools on their platforms, the more information they gather to improve their standing in the AI race.
Not all of this is necessarily a bad trade-off; we do, after all, live in a capitalist system, and companies deserve to be compensated for the services they provide. The problem is how opaque these exchanges are: we should know what the “cost” of those services is, even if it doesn’t come in the form of dollars and cents.
Why should I care?
One response to all of these abstruse, circuitous questions can be a sense of fatalism: it’s already happening, so why bother trying to change it?
But the fact that the tech is advancing so rapidly is all the more reason for businesses and individuals to keep researching and probing when it comes to AI. The entities created to oversee the industry are struggling to keep up, so slowing down and asking questions about AI’s direction is necessary to ensure we don’t look back with regret at the choices being made today.