
AI Training and the Slow Poison of Opt-Out

Asking users to opt out of AI training is a deceptive pattern. Governments and regulators must step in to enforce opt-in as the mandated international standard. In my opinion.

In May 2024, European users of Instagram and Facebook got a new system message informing them that all their public posts would be used for training AI starting June 26th. To exclude their content from this program, each user (and each business account) would have to actively opt out – a process that requires knowing where to go and what to do. Additionally, even if you do opt out, and even if you don’t have a Facebook account, Meta grants itself generous rights to use any content it can get its hands on for AI training. From their How Meta uses information for generative AI models and features page:

“Even if you don’t use our Products and services or have an account, we may still process information about you to develop and improve AI at Meta. For example, this could happen if you appear anywhere in an image shared on our Products or services by someone who does use them or if someone mentions information about you in posts or captions that they share on our Products and services.”

Bottom Trawling the Internet

Meta is not alone in this. The established standard for acquiring AI training data has been to scrape the internet for any publicly available data and use it as each AI company sees fit. And as with bottom trawling, the consequences to privacy, copyright, and the livelihoods of many creators are severe.

Historically, AI scraping has been done by default, without warning or even acknowledgement, often as part of general web scraping to support search indexes. As awareness of this practice has grown, some companies like Automattic (WordPress.com, Tumblr, etc.) and now Meta offer opt-out features so users can exclude their content from AI scraping, but this often comes with direct consequences to visibility and functionality. My cynical hunch is that the platform companies are aware of the public pushback around these practices and are now covering themselves legally. My hope is that platforms offering an explicit opt-out option means they have realized the wholesale scraping of the web is ethically problematic and are at least trying to do something about it.
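For people who publish on their own sites rather than on a platform, the most common opt-out mechanism today is a robots.txt directive. As a sketch: the user agents below are crawlers their operators have publicly documented at the time of writing (OpenAI’s GPTBot, Common Crawl’s CCBot, and Google-Extended for AI training), though honoring robots.txt is entirely voluntary – which is itself part of the problem this article describes.

```text
# robots.txt – request that known AI training crawlers skip this site.
# These directives are requests, not enforcement; compliance is voluntary.

User-agent: GPTBot
Disallow: /

User-agent: CCBot
Disallow: /

User-agent: Google-Extended
Disallow: /
```

Note that even a fully compliant crawler list only covers the bots you know about today, and it does nothing about content already scraped – another way the burden of opting out lands on the individual.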

Here’s the thing: The opt-out is part of the problem!

Power and the Principle of Least Privilege

A few years ago I attended a conference where each attendee was given a choice to attach a black or red lanyard to their badges. Black meant the event had permission to take photos and videos of the attendee, red meant they did not. If you didn’t choose (or like me didn’t listen when it was explained) they gave you a red lanyard.

This is a real-world implementation of the Principle of Least Privilege: Photographers were only allowed to create images of people who gave explicit permission; the attendees who opted in.

At a different conference that same year I saw the reverse of this approach: Scattered around the venue were posters reading as follows:

“The [Conference] reserves the right to photograph any attendee for use in promotional materials. If you do not wish to be in the pictures, please notify the roaming photographers.”

Here, the attendees were opted in by default, and it was up to each attendee to actively opt out at each interaction with a photographer. Needless to say this is not feasible, and as a result everyone at the conference either relented to having their pictures taken or left.

I think most will agree the first conference acted ethically towards the attendees, the second did not. In fact, the second conference experienced a major backlash after the event, and the following year they handed out “NO PHOTO” stickers for attendees to put on their badges if they so desired.

There are two important takeaways here:

First, when it’s a real-world situation, most people immediately see the ethical missteps of the second conference. And second, even so, most attendees stayed at the conference knowing they might be photographed against their will.

The conference created a power dynamic where people who didn’t want to be photographed were left with bad options: constantly be on guard for photographers to tell them they did not want their picture taken, or leave a conference they had paid for and probably travelled to attend. It’s unethical, but it’s not explicitly illegal, and in the end it means the conference gets more promo shots to use. So be it if some attendees are uncomfortable.

The current opt-out strategy for AI scraping falls squarely in the same category as the second conference. While the obvious ethical choice is to let people opt in to AI scraping, an opt-out option provides just enough cover to not get sued while ensuring broad access to content, because most users won’t go through the trouble of opting out – especially if you make the feature hard to find and hard to use.

My Content, My Choice

Platforms have long argued they can do what they will with user content. In fact, using user content to meet business needs is the economic basis for most platforms, and this is the bargain we’ve collectively agreed to.

Building on this premise, platforms and AI companies now want to extend this principle to AI training, claiming both that they have a right to use the data without explicit permission because it’s public, and that not being able to use it without explicit permission would make it impossible for them to operate at all.

I think it’s high time we question both of these stances:

Letting platforms do what they wish with our content was always a Devil’s bargain, and we’re now acutely aware of how bad of a deal it really was. The negative effects of surveillance capitalism, filter bubbles, and ad-driven online radicalization engines (née “recommendation algorithms”) are plain to see and play a significant part in the erosion of everything from privacy to democracy.

The claim that an entire business category can’t be competitive unless it has free access to raw materials is one we’ve heard before, and again we know the consequences. Bottom trawling and overfishing have depleted our oceans, pollution chokes our air and waters, and the exploitation of cheap labour in the global south keeps billions of people in chronic poverty. To say these are false equivalences is to ignore the reality of what we’re talking about. While the actual bits and bytes collected during an AI scrape are not a finite resource, the creative energy that went into creating them is. And the purpose of scraping data from any source is to train a machine to mimic and otherwise use that data in place of a human mind.

Opt-out is a slow poison because it puts choice just far enough away that it becomes out of reach for most people. It makes a choice on our behalf and then forces us to negate it. It’s exactly the opposite of how it should be.

The Choice is Ours

We are at the very beginning of a new era of technology, and we’re still figuring it all out. This means right now we have the power to make decisions, and the responsibility of making the right decisions.

This is the moment for us to learn from our mistakes with surveillance capitalism and take bold steps to build a more just and equitable world for everyone who interacts with technology.

One of the first, and most straightforward steps we can take right now is to make a simple regulation for all tech companies dealing with user data:

Users must opt-in to any change in how their data is handled.

And to protect users:

Choosing not to opt in must not impact the user experience of existing features.

This puts the onus on the AI companies to get consent when collecting data to train their models, and gives users agency to choose what, if any, AI training they want their data included in.
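In software terms, the difference between opt-in and opt-out is a default value. A minimal sketch of what the proposed rule looks like in practice – all names and fields here are hypothetical, invented purely for illustration:

```python
from dataclasses import dataclass

@dataclass
class UserSettings:
    # Hypothetical consent flag. Opt-IN means the default is False:
    # silence or inaction never grants permission. Under today's
    # opt-out regimes this would effectively default to True.
    allow_ai_training: bool = False

def trainable_posts(posts, settings_by_user):
    """Return only the posts whose authors actively opted in to AI training."""
    return [
        post for post in posts
        if settings_by_user.get(post["user"], UserSettings()).allow_ai_training
    ]

posts = [
    {"user": "ada", "text": "a public post"},
    {"user": "ben", "text": "another public post"},
]
# ada actively opted in; ben never made a choice, so his default applies.
settings = {"ada": UserSettings(allow_ai_training=True)}

print(trainable_posts(posts, settings))  # only ada's post is included
```

The second proposed rule – that declining must not degrade existing features – is the guard against the conference-lanyard failure mode: a default of `False` is meaningless if the platform punishes you for leaving it there.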

If I wanted to make a name for myself in the political realm, this is where I’d start: With a self-evident regulation protecting the rights of every person to own their own work.

We shall see.


Originally published in my newsletter.

By Morten Rand-Hendriksen

Morten Rand-Hendriksen is a Senior Staff Instructor at LinkedIn Learning (formerly lynda.com), specializing in AI, bleeding edge web technologies, and the intersection between technology and humanity. He also occasionally teaches at Emily Carr University of Art and Design. He is a popular conference and workshop speaker on all things tech ethics, AI, web technologies, and open source.