Researchers have produced a collision in iOS’s built-in hash function, raising new concerns about Apple’s CSAM-scanning system — but Apple says the finding does not threaten the integrity of the system.
The flaw affects the hashing algorithm, called NeuralHash, which allows Apple to check for exact matches of known child-abuse imagery without possessing any of the images or gleaning any information about non-matching pictures.
On Tuesday, a GitHub user called Asuhariet Ygvar posted code for a reconstructed Python version of NeuralHash, which he claimed to have reverse-engineered from previous versions of iOS. The GitHub post also includes instructions on how to extract the NeuralHash files from a current macOS or iOS build. The resulting algorithm is a generic version of NeuralHash rather than the specific algorithm that will be used once the proposed CSAM system is deployed, but it still gives a general idea of the strengths and weaknesses of the algorithm.
“Early tests show that it can tolerate image resizing and compression, but not cropping or rotations,” Ygvar wrote on Reddit, sharing the new code. “Hope this will help us understand NeuralHash algorithm better and know its potential issues before it’s enabled on all iOS devices.”
Shortly afterward, a user called Cory Cornelius produced a collision in the algorithm: two images that generate the same hash. It’s a significant finding, although Apple says additional protections in its CSAM system will prevent it from being exploited.
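To make the idea of a perceptual-hash collision concrete, here is a minimal sketch in Python. It does not use NeuralHash, which relies on a neural network; it swaps in a much simpler "average hash" purely for illustration, and the function name and the two tiny 4x4 grayscale images are hypothetical.

```python
def average_hash(pixels):
    """Toy perceptual hash (NOT NeuralHash): emit one bit per pixel,
    set to 1 when the pixel is brighter than the image's mean."""
    flat = [p for row in pixels for p in row]
    mean = sum(flat) / len(flat)
    return "".join("1" if p > mean else "0" for p in flat)


# Two different 4x4 grayscale images (hypothetical values, 0-255).
image_a = [
    [200, 210, 30, 20],
    [190, 220, 40, 10],
    [205, 215, 25, 35],
    [195, 225, 15, 45],
]
image_b = [
    [255, 130, 0, 100],
    [140, 250, 90, 5],
    [135, 245, 110, 60],
    [240, 150, 80, 20],
]

hash_a = average_hash(image_a)
hash_b = average_hash(image_b)

# The pixel data differs, yet both images reduce to the same bit string.
print("images identical:", image_a == image_b)
print("hashes identical:", hash_a == hash_b)
```

Because a perceptual hash deliberately discards detail so that resized or recompressed copies still match, many distinct images map to the same output. That design goal is why collisions are far easier to find than in a cryptographic hash.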
On August 5th, Apple introduced a new system for stopping child-abuse imagery on iOS devices. Under the new system, iOS will check locally stored files against hashes of child abuse imagery, as generated and maintained by the National Center for Missing and Exploited Children (NCMEC). The system contains numerous privacy safeguards, limiting scans to iCloud photos and setting a threshold of as many as 30 matches found before an alert is generated. Still, privacy advocates remain concerned about the implications of scanning local storage for illegal material, and the new finding has heightened concerns about how the system could be exploited.
In a call with reporters regarding the new findings, Apple said its CSAM-scanning system had been built with collisions in mind, given the known limitations of perceptual hashing algorithms. In particular, the company emphasized a secondary server-side hashing algorithm, separate from NeuralHash, the specifics of which are not public. If an image that produced a NeuralHash collision were flagged by the system, it would be checked against the secondary system and identified as an error before reaching human moderators.
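The two-stage check Apple described can be sketched roughly as follows. This is an illustration under stated assumptions, not Apple's implementation: the function name, the dictionary fields, the placeholder hash strings, and the exact comparison against the threshold are all hypothetical, and the real secondary algorithm is not public.

```python
MATCH_THRESHOLD = 30  # assumption: threshold of about 30 matches, per the article


def flagged_for_review(photos, device_db, server_db, threshold=MATCH_THRESHOLD):
    """Return the photos that would reach human review in this toy model."""
    # Stage 1: photos whose on-device perceptual hash matches the database.
    stage1 = [p for p in photos if p["device_hash"] in device_db]
    if len(stage1) <= threshold:
        return []  # threshold not exceeded, so nothing is examined further
    # Stage 2: an independent server-side hash must also match.
    return [p for p in stage1 if p["server_hash"] in server_db]


# Hypothetical databases of known hashes (placeholder strings).
device_db = {f"d{i}" for i in range(100)}
server_db = {f"s{i}" for i in range(100)}

# 31 crafted images that collide on the device hash but not the server hash.
colliding = [
    {"device_hash": f"d{i}", "server_hash": f"benign{i}"} for i in range(31)
]
# 31 images that match under both hashes.
genuine = [{"device_hash": f"d{i}", "server_hash": f"s{i}"} for i in range(31)]

print(len(flagged_for_review(colliding, device_db, server_db)))
print(len(flagged_for_review(genuine, device_db, server_db)))
```

In this model, images engineered to collide only under the first hash are filtered out by the second, independent hash, which is the property Apple says protects the system.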
Even without that additional check, it would require extraordinary efforts to exploit the collision in practice. Generally, collision attacks allow researchers to find different inputs that produce the same hash. In Apple’s system, this would mean generating an image that sets off the CSAM alerts even though it is not a CSAM image, since it produces the same hash as an image in the database. But actually generating that alert would require access to the NCMEC hash database, generating more than 30 colliding images, and then smuggling all of them onto the target’s phone. Even then, it would only generate an alert to Apple and NCMEC, which would easily identify the images as false positives.
A proof-of-concept collision is often disastrous for cryptographic hashes, as in the case of the SHA-1 collision in 2017, but perceptual hashes like NeuralHash are known to be more collision-prone. And while Apple expects to make changes from the generic NeuralHash algorithm currently present in iOS, the broad system is likely to remain in place.
Still, the finding is unlikely to quiet calls for Apple to abandon its plans for on-device scans, which have continued to escalate in the weeks following the announcement. On Tuesday, the Electronic Frontier Foundation launched a petition calling on Apple to drop the system, under the title “Tell Apple: Don’t Scan Our Phones.” As of press time, it has garnered more than 1,700 signatures.
Updated 10:53AM ET: Changed headline and copy to more accurately reflect known weaknesses of perceptual hash systems in general.
Updated 1:20PM ET: Added significant details throughout after receiving further information from Apple.