OpenAI’s state-of-the-art machine vision AI is fooled by handwritten notes

Need to trick OpenAI’s latest vision system? Simply add a handwritten label to your target.
Image: OpenAI

Researchers from machine learning lab OpenAI have discovered that their state-of-the-art computer vision system can be deceived by tools no more sophisticated than a pen and a pad. As illustrated in the image above, simply writing down the name of an object and sticking it on another can be enough to trick the software into misidentifying what it sees.

“We refer to these attacks as typographic attacks,” write OpenAI’s researchers in a blog post. “By exploiting the model’s ability to read text robustly, we find that even photographs of hand-written text can often fool the model.” They note that such attacks are similar to “adversarial images” that can fool commercial machine vision systems, but far simpler to produce.
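Because OpenAI has released CLIP’s code and model weights, this kind of attack is easy to probe for yourself. Below is a minimal sketch using the open-source clip package from github.com/openai/CLIP; the image file (a photo of an apple with a handwritten “iPod” note) and the candidate labels are placeholders for illustration.

```python
# Minimal sketch of a zero-shot "typographic attack" probe against CLIP,
# using the open-source clip package (pip install git+https://github.com/openai/CLIP.git).
# The image path and label list are placeholders.
import torch
import clip
from PIL import Image

device = "cuda" if torch.cuda.is_available() else "cpu"
model, preprocess = clip.load("ViT-B/32", device=device)

# Hypothetical file: an apple with a handwritten "iPod" note stuck to it.
image = preprocess(Image.open("apple_with_ipod_note.jpg")).unsqueeze(0).to(device)
labels = ["an apple", "an iPod", "a piece of paper"]
text = clip.tokenize([f"a photo of {label}" for label in labels]).to(device)

with torch.no_grad():
    logits_per_image, _ = model(image, text)
    probs = logits_per_image.softmax(dim=-1).squeeze(0)

for label, p in zip(labels, probs.tolist()):
    print(f"{label}: {p:.1%}")
# If the typographic attack works, "an iPod" soaks up most of the probability mass.
```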

Adversarial images present a real danger for systems that rely on machine vision. Researchers have shown, for example, that they can trick the software in Tesla’s self-driving cars into changing lanes without warning simply by placing certain stickers on the road. Such attacks are a serious threat for a variety of AI applications, from the medical to the military.

But the danger posed by this specific attack is, at least for now, nothing to worry about. The OpenAI software in question is an experimental system named CLIP that isn’t deployed in any commercial product. Indeed, the very nature of CLIP’s unusual machine learning architecture created the weakness that enables this attack to succeed.

“Multimodal neurons” in CLIP respond to photos of an object as well as sketches and text.
Image: OpenAI

CLIP is intended to explore how AI systems might learn to identify objects without close supervision by training on huge databases of image and text pairs. In this case, OpenAI used some 400 million image-text pairs scraped from the internet to train CLIP, which was unveiled in January.
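The paper accompanying CLIP’s release describes a contrastive objective: the image encoder and text encoder are trained so that, within a batch, each image scores highest against its own caption and vice versa. Here is a schematic sketch of that symmetric loss, with random tensors standing in for the encoders’ outputs.

```python
# Schematic sketch of a CLIP-style contrastive objective: each image should
# score highest against its own caption, and each caption against its own image.
# image_features / text_features stand in for the outputs of the two encoders.
import torch
import torch.nn.functional as F

def clip_style_loss(image_features, text_features, temperature=0.07):
    # L2-normalize so the dot products are cosine similarities
    image_features = F.normalize(image_features, dim=-1)
    text_features = F.normalize(text_features, dim=-1)

    # (batch, batch) similarity matrix: image i vs. caption j
    logits = image_features @ text_features.t() / temperature

    # the matching pair for each row sits on the diagonal
    targets = torch.arange(logits.size(0), device=logits.device)
    loss_images = F.cross_entropy(logits, targets)      # image -> correct caption
    loss_texts = F.cross_entropy(logits.t(), targets)   # caption -> correct image
    return (loss_images + loss_texts) / 2

# Example with random stand-in embeddings for a batch of 8 image-text pairs
print(clip_style_loss(torch.randn(8, 512), torch.randn(8, 512)).item())
```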

This month, OpenAI researchers published a new paper describing how they’d opened up CLIP to see how it performs. They discovered what they’re calling “multimodal neurons” — individual components in the machine learning network that respond not only to images of objects but also sketches, cartoons, and associated text. One of the reasons this is exciting is that it seems to mirror how the human brain reacts to stimuli, where single brain cells have been observed responding to abstract concepts rather than specific examples. OpenAI’s research suggests it may be possible for AI systems to internalize such knowledge the same way humans do.
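One rough way to look for this behavior yourself is to hook an internal layer of the released image encoder and compare which units light up for a photo of a concept versus a photo of the written word. The sketch below assumes the ViT-B/32 variant from the open-source clip package; the module path is an internal implementation detail that may change, and the image files are placeholders.

```python
# Rough sketch: compare which units in one transformer block of CLIP's image
# tower respond to a photo of a concept vs. an image of the written word.
# The module path is an internal detail of the openai/CLIP ViT models and may
# change between versions; the image files are placeholders.
import torch
import clip
from PIL import Image

model, preprocess = clip.load("ViT-B/32", device="cpu")

captured = {}
def save_output(_module, _inputs, output):
    captured["acts"] = output.detach()  # shape: (seq_len, batch, width)

# Hook the last residual block of the image tower.
model.visual.transformer.resblocks[-1].register_forward_hook(save_output)

def class_token_activations(path):
    image = preprocess(Image.open(path)).unsqueeze(0)
    with torch.no_grad():
        model.encode_image(image)
    return captured["acts"][0, 0]  # class-token position, first batch element

photo_acts = class_token_activations("spider_photo.jpg")     # hypothetical file
word_acts = class_token_activations("spider_word.jpg")       # photo of the word "spider"

top_photo = set(photo_acts.topk(20).indices.tolist())
top_word = set(word_acts.topk(20).indices.tolist())
print("units strongly activated by both inputs:", top_photo & top_word)
```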

In the future, this could lead to more sophisticated vision systems, but right now, such approaches are in their infancy. While any human being can tell you the difference between an apple and a piece of paper with the word “apple” written on it, software like CLIP can’t. The same ability that allows the program to link words and images at an abstract level creates this unique weakness, which OpenAI describes as the “fallacy of abstraction.”

Another example of a typographic attack. Do not trust the AI to put your money in the piggy bank.
Image: OpenAI

Another example given by the lab is the neuron in CLIP that identifies piggy banks. This component responds not only to pictures of piggy banks but also to strings of dollar signs. As in the example above, that means you can fool CLIP into identifying a chainsaw as a piggy bank if you overlay it with “$$$” strings, as if it were half-price at your local hardware store.
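The same kind of zero-shot probe sketched earlier covers this case; a compact version follows, again with a placeholder file name and label list.

```python
# Same zero-shot probe, aimed at the piggy-bank example: a chainsaw photo
# overlaid with "$$$" strings. File name and labels are placeholders.
import torch
import clip
from PIL import Image

model, preprocess = clip.load("ViT-B/32", device="cpu")
image = preprocess(Image.open("chainsaw_with_dollar_signs.jpg")).unsqueeze(0)
labels = ["a piggy bank", "a chainsaw"]
text = clip.tokenize([f"a photo of {label}" for label in labels])

with torch.no_grad():
    probs = model(image, text)[0].softmax(dim=-1).squeeze(0)
print(dict(zip(labels, probs.tolist())))
# If the "$$$" overlay works, "a piggy bank" wins despite the chainsaw.
```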

The researchers also found that CLIP’s multimodal neurons encoded exactly the sort of biases you might expect to find when sourcing your data from the internet. They note that the neuron for “Middle East” is also associated with terrorism and discovered “a neuron that fires for both dark-skinned people and gorillas.” This replicates an infamous error in Google’s image recognition system, which tagged Black people as gorillas. It’s yet another example of just how different machine intelligence is from our own — and why pulling apart the former to understand how it works is necessary before we trust our lives to AI.

Comments

This trips up humans too. Same trick as asking someone to state the color of the word "white" when the text is actually printed in black.

If you asked me to pick a piece of fruit, I would still know it was fruit even if it had a Post-It saying "iPod".

The first image is actually a feature and not a bug.
OpenAI wrote about their algorithm CLIP: "In particular, learning OCR is an example of an exciting behavior that does not occur in standard ImageNet models."

…Is it not an iPod?

It’s 99.7% sure, so yeah, it’s an iPod.

This is not a pipe.

Considering that it’s a photo of a label, centered and taking up half the image, I’d say it got it right. At this point the apple is just the background. Ask a human to create tags for that image and I’d bet "iPod" would be pretty close to the top.

When you see the second image, the first thing your brain sees is iPod.
The AI didn’t get it wrong; it just picked up what it deemed the most important part of the image, and I would be inclined to agree with it.

…how do you go grocery shopping?

What matters is not what is most prominent; what matters is whether the AI chose correctly.

If the goal was for the AI to select a piece of fruit, then it was 100% incorrect.

Free iPod?!?

Not only is it an iPod… it’s an "apple" iPod.

This reminds me of the Rick and Morty episode where Rick tricks malicious robots into accepting him as one of them by wearing a QR code on his head.

that’s a stonk chainsaw

Adversarial images present a real danger for systems that rely on machine vision. Researchers have shown, for example, that they can trick the software in Tesla’s self-driving cars into changing lanes without warning simply by placing certain stickers on the road.

This works on people too, and probably in far less predictable ways. If you paint something that looks like a pit, or toss a "danger, road closed" sign in the middle of the road? The driver will almost certainly change lanes without warning. If you’re lucky, they’ll do it without driving into a house or colliding with oncoming traffic.

That’s why throwing random signage and crap on roadways is a crime. This is just a new way to do the crime because there are new eyes looking at it.

Does this work the same on algorithms that are trained to recognize more than a single element of an image?

why pulling apart the former to understand how it works is necessary before we trust our lives to AI.

Meh. If it’s demonstrated to work well, then it’ll work well. Trusting our lives to a driving AI that can drive millions of miles without incident is good enough for me. Can someone make it crash? Yes. Can someone shoot me while I’m driving? Yes. If someone wants to cause a crash or kill people, it’s not difficult: just start tossing bricks (or water balloons… which actually happened) off an overpass.
