
If Apple is to keep its word about guaranteeing the privacy of non-CSAM photos (and this whole discussion is about it not doing a very good job of that), then it can only do so for photos stored in iCloud, because of how the identification process is technically specified. That said, other photos across your device are still monitored in a different way. For example, Apple will scan photos that you send or receive via iMessage to automatically detect whether they're nudes, and if you're underage, it will block them and/or send a notification to your parents.


> Apple will scan photos that you send or receive via iMessage to automatically detect if they're nudes

Only if they're being sent to or from a minor, I thought?


Yes, this ^^^^^^

> The proposed attack on Apple's protocol doesn't work.

With all due respect, I think you may have misunderstood the proposed attack @jonathanmayer, as what @jobigoud said is correct.


The question doesn't presume that, as the secret for blinding the CSAM database would only be helpful if a third party were also looking to see which accounts contained CSAM.

In this case, the question assumes that an attacker would more or less be creating their own database of hashes and derived keys (to search for and decrypt known photos and associate them with user accounts, or to bruteforce unknown photos), and would therefore have no need to worry about acquiring the key used for blinding the CSAM hash database.
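To make the attack described above concrete, here's a toy sketch in Python. Everything in it is hypothetical: the key derivation, the XOR-with-keystream "cipher", and the tag check are stand-ins for whatever Apple actually uses; the sketch only assumes the premise under debate, namely that a voucher's key is derived solely from the photo's NeuralHash.

```python
# Toy illustration of the proposed brute-force attack: derive a key from each
# candidate NeuralHash and test it against a leaked "voucher". All primitives
# here are stand-ins, not Apple's actual scheme.
import hashlib
import hmac

def derive_key(neural_hash: bytes) -> bytes:
    # Hypothetical KDF from a NeuralHash to a voucher key.
    return hashlib.sha256(b"voucher-key" + neural_hash).digest()

def seal(key: bytes, payload: bytes) -> bytes:
    # Toy authenticated encryption: XOR keystream plus an HMAC tag.
    stream = hashlib.sha256(b"stream" + key).digest()
    ct = bytes(p ^ s for p, s in zip(payload, stream))
    tag = hmac.new(key, ct, hashlib.sha256).digest()[:8]
    return tag + ct

def try_open(key: bytes, voucher: bytes):
    # Returns the payload on a key match, None otherwise.
    tag, ct = voucher[:8], voucher[8:]
    if not hmac.compare_digest(tag, hmac.new(key, ct, hashlib.sha256).digest()[:8]):
        return None  # wrong key: the tag check fails
    stream = hashlib.sha256(b"stream" + key).digest()
    return bytes(c ^ s for c, s in zip(ct, stream))

# A leaked voucher, keyed only by the photo's NeuralHash (the premise in question).
meme_hash = hashlib.sha256(b"popular meme").digest()[:16]
voucher = seal(derive_key(meme_hash), b"metadata")

# The attacker precomputes hashes of popular images and tries each one.
candidates = [hashlib.sha256(s).digest()[:16] for s in (b"cat pic", b"popular meme")]
hits = [h for h in candidates if try_open(derive_key(h), voucher) is not None]
assert hits == [meme_hash]  # the meme's hash opens the voucher
```

If the premise holds, a hit both decrypts the entry and confirms that the user's library contains that exact image; the comments below dispute exactly this premise.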


> What's to stop an attacker from generating a NeuralHash of popular memes, deriving a key, then bruteforcing the leaked data until it successfully decrypts an entry, thus verifying the contents within a specific user's cloud photo library, and degrading their level of privacy?

Decrypting vouchers requires the server blinding key and the NeuralHash derived metadata of the input image (technical summary page 10, Bellare Fig. 1 line 18). This attacker only has the latter.


> For CSAM matches, the cryptographic header in the voucher combines with the server-side blinding secret (that was used to blind the known CSAM database at setup time) to successfully decrypt the outer layer of encryption.

In the text you referenced, it specifically says that the blinding key would be needed to decrypt vouchers which are CSAM matches. This is because Apple set up their CSAM database in a blinded manner. Therefore to access a hash from the database from which to derive a decryption key, Apple would need the blinding key to first decrypt that hash value.

However, an attacker would be generating their own (presumably unblinded) database, and therefore wouldn't need access to Apple's blinding key.


I’m a little confused. The vouchers you are trying to decrypt have already been generated. How does it matter if the attacker can decrypt vouchers from a database they created, when that database was not used to generate the vouchers in the breached data?


It is my understanding that the vouchers are only encrypted with a key derived from the NeuralHash of the photo. Therefore an attacker would only need to find a matching NeuralHash to decrypt the voucher.

Apple needs the blinding key because they encrypt their list of NeuralHashes first, so that others cannot see exactly which CSAM hashes they're testing against. Therefore they first need to decrypt their own database in order to get the corresponding hash value from which to derive the decryption key.


That’s wrong. In Boneh’s notation for the PSI system, Dec(H′(Ŝ_j), ct_j) requires the server secret α to determine the decryption key. Or, looking from the other direction, the encryption uses both w (the NeuralHash) and L (= αG, for server secret α).
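For intuition, here's a toy sketch of the DH-style relation this notation refers to, using integers mod a prime in place of elliptic-curve points (all constants and names here are hypothetical, not the real scheme's parameters). The point is that the key the client derives from w and the published value L = αG is the same one the server derives from its secret α and the client's header wG:

```python
# Toy model of the DH-style key derivation in the PSI scheme discussed above.
# Scalars and "points" are integers mod a prime; the real scheme uses EC points.
import hashlib

P = 2**61 - 1          # toy group modulus (a Mersenne prime)
G = 5                  # toy generator

def point(scalar: int) -> int:
    # Stands in for scalar multiplication "scalar * G" on a curve.
    return (scalar * G) % P

def derive_key(shared: int) -> bytes:
    # KDF over the shared group element.
    return hashlib.sha256(str(shared).encode()).digest()

alpha = 123456789      # server blinding secret
L = point(alpha)       # published value, L = alpha*G

w = 987654321          # stand-in for a NeuralHash scalar
header = point(w)      # client's cryptographic header, w*G

# Client side: key from w and the public L (no knowledge of alpha itself).
client_key = derive_key((w * L) % P)

# Server side: key from its secret alpha and the client's header.
server_key = derive_key((alpha * header) % P)

assert client_key == server_key  # both reach w*alpha*G
```

So α enters the client's encryption key via L, even though the client never sees α directly; the disagreement in this thread is over what an attacker who holds only leaked vouchers and candidate NeuralHashes can reconstruct.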


I believe the math you outline above refers to this step, located on page 7 of the Technical Summary:

> Next, the client creates a cryptographic safety voucher that has the following properties: If the user image hash matches the entry in the known CSAM hash list, then the NeuralHash of the user image exactly transforms to the blinded hash if it went through the series of transformations done at database setup time. Based on this property, the server will be able to use the cryptographic header (derived from the NeuralHash) and using the server-side secret, can compute the derived encryption key and successfully decrypt the associated payload data.

Is that correct?

If so, then I agree that in the PSI system the server secret is necessary as part of the decryption process, in order to decrypt the matching hash at the pointed-to location in the table. That being said, looking only at the information encrypted by the client, I don't think the server secret comes into play, right?

If I'm misunderstanding, and you're confident that an attacker would have to have the server secret to decrypt a photo (even if they already knew that photo's NeuralHash and were able to defeat the internal layer of encryption), then I definitely recommend posting a well-outlined answer to the Cryptography Stack Exchange, as that would be super helpful!


Yes, the server secret does come into play in the encryption key on the client. This is the value L, which you can think of as a “public key” in usual ECC schemes.

It’s not useful to talk about defeating the “internal” layer of encryption separately from the “outer” layer, because vouchers should be stored at rest exactly as generated by clients. The database leak should not include any unwrapped vouchers (that would be like using a password hash but storing the plaintext passwords anyway).


Sure!

- Both

- Both

- Yes. I wish that weren't the case, but considering that I can't find a single provider so far who respects end user privacy, I would expect for one who does so to charge more.

- No. Ideally, the provider wouldn't keep any logs, so they wouldn't be aware that the same client was making a subsequent request.

- I guess it's completely up to the provider. As this would be the first privacy-respecting provider, they'll probably have to go all-in on privacy if they wish to gain traction and popularity within the community. So no, I'd personally hope that they wouldn't do that. However, if this were an existing provider hoping to become more private while keeping current customers for whom these features matter, then workarounds like this are better than not being able to transition to better privacy at all. Or, even better: offer features like this for customers who need them, but allow them to be disabled from account settings for those who don't.

- Personally, I don't care about metrics at all. If a client is querying DNS, it's because they're about to connect to one of my services (leaving cyberattacks out of the picture for the moment), at which point I could collect metrics if I wanted to (which I don't). That being said, I don't think that, for those who want it, collecting generalized metrics at the country level, for example, would be unreasonable. And other metrics, such as DNS routing based on server "health checks" or the number of resolution errors, etc., aren't bad either. It's just imperative that when the company collects these generalized metrics, it has a clear and reliable process for purging them of all PII, saving only, for example, the country from which the request originated.
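The country-level aggregation described above can be sketched in a few lines. This is a minimal illustration, not any provider's actual pipeline; the request fields and their shapes are hypothetical. The key property is that client IPs and query names are dropped before anything is stored:

```python
# Toy sketch of privacy-preserving metrics: aggregate DNS requests down to
# per-country counts, discarding client IPs and query names entirely.
from collections import Counter

def aggregate(requests):
    # Each request is (client_ip, query_name, country); only country survives.
    return dict(Counter(country for _ip, _name, country in requests))

stats = aggregate([
    ("203.0.113.7", "example.com", "DE"),
    ("198.51.100.2", "example.org", "DE"),
    ("192.0.2.9", "example.net", "US"),
])
assert stats == {"DE": 2, "US": 1}
```

The design choice worth noting is that the PII never reaches durable storage at all, rather than being scrubbed later; purge-after-the-fact schemes are exactly where "no logs" claims tend to fail.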

No problem!


I really appreciate you taking the time to give feedback. It's great to hear from people that have a clear sense of their priorities.


Yeah, that's what I currently do. However, as traffic grows in both volume and geographic spread, it can be hard (and expensive) to keep up. That's why a privacy-respecting provider who already has the infrastructure would be ideal.


Unfortunately, every time you use a hosted service it's basically guaranteed you're not going to get any privacy (even if they claim otherwise; see the number of people who get v& using "no-log" VPN providers). Running it yourself is the only way to have certain guarantees about logs and whatnot.


That's a good point. It wouldn't be the first time that providers (most notably VPN providers) have lied about their logging policies, with devastating consequences for the end user while they get off scot-free. Getting something in writing would be ideal; however, I'm not sure I'm big enough yet to work out a custom deal like that with a major provider. Thank you for pointing that out, though.


For logging details in a contract, the size of your company does not matter. It's worth the 2 or 3 hours of time paid to a lawyer to get that right if it is important to you or the needs of your business. Where size matters is the discounted pricing. Just avoid getting locked into 3+ year contracts and you should be able to adjust the pricing as you grow your business.


Yeah, currently I run my own DNS server. However, as traffic, your customer base, and your server locations grow, it would be nice to use a dedicated DNS provider, as they'll already have the infrastructure set up to handle a significant volume of DNS resolutions quickly, as well as servers around the globe to do so efficiently. In addition to the speed, it's usually also less expensive than setting up and maintaining multiple DNS servers of your own around the world.


A pricing scheme that isn't too far off of what you'd find from most other managed DNS providers. Obviously I wouldn't mind paying more for the "privacy" aspect, as long as the price isn't ridiculous. I don't have any set numbers, however.



