Analysis | Did a Robot Write This? We Need Watermarks to Spot AI - The Washington Post

A talented scribe with stunning creative abilities is having a sensational debut. ChatGPT, a text-generation system from San Francisco-based OpenAI, has been writing essays, screenplays and limericks after its recent release to the public, usually in seconds and often to a high standard. Even its jokes can be funny. Many scientists in the field of artificial intelligence have marveled at how humanlike it sounds.

And remarkably, it will soon get better. OpenAI is widely expected to release its next iteration known as GPT-4 in the coming months, and early testers say it is better than anything that came before.(1)

But all these improvements come with a price. The better the AI gets, the harder it will be to distinguish between human and machine-made text. OpenAI needs to prioritize its efforts to label the work of machines or we could soon be overwhelmed with a confusing mishmash of real and fake information online.

For now, it’s putting the onus on people to be honest. OpenAI’s policy for ChatGPT states that when sharing content from its system, users should clearly indicate that it is generated by AI “in a way that no reader could possibly miss” or misunderstand.

To that I say, good luck.

AI will almost certainly help kill the college essay. (A student in New Zealand has already admitted that they used it to help boost their grades.) Governments will use it to flood social networks with propaganda, spammers to write fake Amazon reviews and ransomware gangs to write more convincing phishing emails. None will point to the machine behind the curtain.

And you will just have to take my word for it that this column was fully drafted by a human, too.

AI-generated text desperately needs some kind of watermark, similar to how stock photo companies protect their images and movie studios deter piracy. OpenAI already has a method for flagging another content-generating tool called DALL-E with an embedded signature in each image it generates. But it is much harder to track the provenance of text. How do you put a secret, hard-to-remove label on words?

The most promising approach is cryptography. In a guest lecture last month at the University of Texas at Austin, OpenAI research scientist Scott Aaronson gave a rare glimpse into how the company might distinguish text generated by the even more humanlike GPT-4 tool.

Aaronson, who was hired by OpenAI this year to tackle the provenance challenge, explained that words could be converted into a string of tokens, representing punctuation marks, letters or parts of words, making up about 100,000 tokens in total. The GPT system would then decide the arrangement of those tokens (reflecting the text itself) in such a way that they could be detected using a cryptographic key known only to OpenAI. “This won’t make any detectable difference to the end user,” Aaronson said.

In fact, anyone who uses a GPT tool would find it hard to scrub off the watermarking signal, even by rearranging the words or taking out punctuation marks, he said. The best way to defeat it would be to use another AI system to paraphrase the GPT tool’s output. But that takes effort, and not everyone would do that. In his lecture, Aaronson said he had a working prototype.(2)

But even assuming his method works outside of a lab setting, OpenAI still has a quandary. Does it release the watermark keys to the public, or hold them privately?

If the keys are made public, professors everywhere could run their students’ essays through special software to make sure they aren’t machine-generated, in the same way that many do now to check for plagiarism. But that would also make it possible for bad actors to detect the watermark and remove it.

Keeping the keys private, meanwhile, creates a potentially powerful business model for OpenAI: charging people for access. IT administrators could pay a subscription to scan incoming email for phishing attacks, while colleges could pay a group fee for their professors — and the price to use the tool would have to be high enough to put off ransomware gangs and propaganda writers. OpenAI would essentially make money from halting the misuse of its own creation.

We also should bear in mind that technology companies don’t have the best track record for preventing their systems from being misused, especially when they are unregulated and profit-driven. (OpenAI says it’s a hybrid profit and nonprofit company that will cap its future income.) But the strict filters that OpenAI has already put place to stop its text and image tools from generating offensive content are a good start.

Now OpenAI needs to prioritize a watermarking system for its text. Our future looks set to become awash with machine-generated information, not just from OpenAI’s increasingly popular tools, but from a broader rise in fake, “synthetic” data used to train AI models and replace human-made data. Images, videos, music and more will increasingly be artificially generated to suit our hyper-personalized tastes.(3)

It’s possible of course that our future selves won’t care if a catchy song or cartoon originated from AI. Human values change over time; we care much less now about memorizing facts and driving directions than we did 20 years ago, for instance. So at some point, watermarks might not seem so necessary.

But for now, with tangible value placed on human ingenuity that others pay for, or grade, and with the near certainty that OpenAI’s tool will be misused, we need to know where the human brain stops and machines begin. A watermark would be a good start.