Mozilla to integrate tracking protection in future Firefox versions

MrBrian · Aug 14, 2014

From http://www.ghacks.net/2014/08/14/mo...acking-protection-in-future-firefox-versions/:

Mozilla is working on a tracking protection feature in Firefox currently that goes a step further than this. It basically enforces DNT on the user side of things by blocking known tracking scripts on the Internet.

elapsed · Aug 14, 2014

Built-in EasyPrivacy list, basically.

TheWindBringeth · Aug 14, 2014

For those interested in the related bugs:
https://bugzilla.mozilla.org/showdependencytree.cgi?id=1029886&hide_resolved=0

Although the concept is in some ways similar to an ABP/EasyPrivacy setup, the Mozilla approach appears significantly different. Mozilla seems to be going with the SafeBrowsing hash-based code and model: periodically retrieve, and update a local list of, [partial] hashes, when a partial hash matches send that to the server to get a list of corresponding full hashes, then check against those full hashes.
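For readers who want to see that flow concretely, here is a minimal Python sketch of the model described above. It is not Mozilla's code: HypotheticalListServer, Client, and the 4-byte prefix length are assumptions for illustration, and a real client also canonicalizes URLs and caches results, which this skips.

```python
import hashlib

PREFIX_LEN = 4  # assumed prefix length in bytes (illustrative)


def url_hash(expr: str) -> bytes:
    """Full 32-byte SHA-256 hash of an (already canonicalized) URL expression."""
    return hashlib.sha256(expr.encode("utf-8")).digest()


class HypotheticalListServer:
    """In-process stand-in for the remote list server (illustrative only)."""

    def __init__(self, listed_exprs):
        self.full_hashes = {url_hash(e) for e in listed_exprs}
        self.queries = 0  # number of full-hash requests received

    def update_payload(self):
        # A periodic client update carries only the short prefixes.
        return {h[:PREFIX_LEN] for h in self.full_hashes}

    def get_full_hashes(self, prefix: bytes):
        # Reply with every full hash that shares the requested prefix.
        self.queries += 1
        return [h for h in self.full_hashes if h[:PREFIX_LEN] == prefix]


class Client:
    def __init__(self, server: HypotheticalListServer):
        self.server = server
        self.local_prefixes = server.update_payload()

    def is_listed(self, expr: str) -> bool:
        h = url_hash(expr)
        if h[:PREFIX_LEN] not in self.local_prefixes:
            return False  # no local prefix match: no network traffic at all
        # Partial match: fetch the full hashes for that prefix and compare.
        return h in self.server.get_full_hashes(h[:PREFIX_LEN])


server = HypotheticalListServer(["tracker.example/"])
client = Client(server)
print(client.is_listed("tracker.example/"))  # True
print(client.is_listed("news.example/"))     # False
```

The point of the design shows in is_listed: most lookups end at the local prefix check and never touch the network; only a prefix hit triggers a full-hash request.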

Looks like the server communications are, for now, via https://tracking.services.mozilla.com/..., which is hosted on Amazon AWS. At this point I'm not sure whether one might run into another of the Safe Browsing "separate cookie jar" cookie issues. It looks like they intend never to block top-level requests. I think it is safe to assume it will be ultra-conservative. The model doesn't lend itself to powerful user-defined rules that can fill in gaps in coverage.

ATM I don't see it replacing anything, but it would seem to create a leverage point over Google and potentially one that Mozilla could use to effectively seek pay-for-exclusion.

BTW: looks like SafeBrowsing v3 was rolled out:

https://developers.google.com/safe-browsing/

The Wayback Machine suggests the page was changed between June 25 and August 3.

mirimir · Aug 14, 2014

TheWindBringeth said:

BTW: looks like SafeBrowsing v3 was rolled out:

https://developers.google.com/safe-browsing/

The Wayback Machine suggests the page was changed between June 25 and August 3.

Under "Safe Browsing API v3 advantages", I see "Privacy: API users exchange data with the server using hashed URLs, so the server never knows the actual URLs queried by the clients."

I don't understand that. Does the server report entirely generic stuff that can't be related to the naked URL, perhaps in some source database? I'm concerned that, although "the server [being queried] never knows the actual URLs queried by the clients", some Google server may in fact know that. Anyone know?

TheWindBringeth · Aug 16, 2014

mirimir said:

I don't understand that. Does the server report entirely generic stuff that can't be related to the naked URL, perhaps in some source database? I'm concerned that, although "the server [being queried] never knows the actual URLs queried by the clients", some Google server may in fact know that. Anyone know?

I expect that Google would know the URLs that correspond to the hashes found in its Safe Browsing database. I imagine Google to be the one that is computing those very hashes, and even if it is getting hash only data from somewhere, it could still compute hashes for any URLs it wishes to (maybe all those URLs it indexes for search engine purposes). I think we must assume that Google is in a position to perform some hashed URL -> actual URL lookups.

However, as I understand the system to work (mostly from trying to understand released Firefox behavior), a SafeBrowsing API client should never send full (SHA-256, 32-byte) hashes to the query server. I think the Safe Browsing API v3 description is closest to what Firefox is doing. Mozilla abandoned the older "separate host and URL hashing" approach back in 2012 or so. In short, I believe that [when the local database doesn't contain enough information] the client will perform a query and send the 4 most significant bytes (prefix) of the URL hash to the server. The server replies with a list of full hashes that match the hash prefix it received. The client then examines the response list to determine whether it contains the full hash of interest. Note: Firefox also adds "noise" to the get full hash query. It adds a number (preference: urlclassifier.gethashnoise) of randomly generated hash prefixes to the query, with the intent that the server won't know which of the (few) prefixes is the "correct" one.
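A minimal Python sketch of the noisy get full hash query described above. The function names are hypothetical, noise_count merely plays the role of urlclassifier.gethashnoise (4 is an assumed value, not a documented default), and the real wire format is not modeled.

```python
import hashlib
import os
import random


def build_gethash_query(url_expr: str, noise_count: int = 4) -> list[bytes]:
    """Prefixes a client might send in one get-full-hash request.

    noise_count stands in for the urlclassifier.gethashnoise preference;
    the default of 4 is an assumption made for this sketch.
    """
    real_prefix = hashlib.sha256(url_expr.encode("utf-8")).digest()[:4]
    noise = [os.urandom(4) for _ in range(noise_count)]  # random decoy prefixes
    query = noise + [real_prefix]
    random.SystemRandom().shuffle(query)  # don't leak the real one by position
    return query


def reply_contains_match(url_expr: str, server_reply: list[bytes]) -> bool:
    """Client side: look for the one full hash it actually cares about."""
    full = hashlib.sha256(url_expr.encode("utf-8")).digest()
    return full in server_reply


query = build_gethash_query("tracker.example/")
print(len(query))  # 5
```

Only short prefixes leave the client; the full hash stays local and is compared against whatever the server returns.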

The general intent seems to be that the server only learns of (some, short) URL hash prefixes that match client activity. Given a large hashable URL space, the likelihood that many URLs share any given short prefix, and the fact that only small partial hashes are sent, I don't think the server would know, with significant certainty, which URL the client was checking. Put another way, I think the server operator could say "what I've received is consistent with a visit to X", but they would have to acknowledge that it might not in fact have been X that was being checked, and that there are many other URLs it could have been.
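To put rough numbers on the collision point: if SHA-256 outputs behave uniformly, N distinct URL expressions spread over 2^32 possible 4-byte prefixes, so each observed prefix corresponds on average to N / 2^32 of them. A small Python sketch; the 10^12 figure is purely hypothetical, and the second half shrinks the prefix to 2 bytes so collisions show up in a quick run.

```python
import hashlib
from collections import Counter


def expected_candidates(num_urls: int, prefix_bytes: int = 4) -> float:
    """Average number of URLs sharing one prefix, assuming uniform hashes."""
    return num_urls / 2 ** (8 * prefix_bytes)


def prefix_bucket_sizes(urls, prefix_bytes: int) -> Counter:
    """Count how many of the given URLs fall under each observed prefix."""
    return Counter(
        hashlib.sha256(u.encode("utf-8")).digest()[:prefix_bytes] for u in urls
    )


# Hypothetical URL-space size, chosen only for illustration.
print(round(expected_candidates(10**12), 1))  # 232.8

# Small-scale demo with 2-byte prefixes (65,536 buckets) so collisions appear.
urls = [f"site{i}.example/page" for i in range(200_000)]
buckets = prefix_bucket_sizes(urls, 2)
print(max(buckets.values()))  # largest number of URLs sharing one prefix
```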

If a Safe Browsing API server operator wanted to try to zero in on the actual URLs being visited by the client, I suppose they might do things like:

- Compute full and partial hashes for large numbers of known URLs, particularly those most likely to be visited by clients. I don't know what the hash distribution would be like and how many URLs would match a given hash prefix, but perhaps this could be used to assign probabilities based on URL popularity.
- Be selective about which URLs and hashes are databased, and/or selective about whether the hashes distributed via client updates contain hash prefixes or full hashes, in order to try to make a get full hash query happen for certain URLs. Depending on how client-side caching works (whether the client would, for a while at least, stop requesting a full hash when it has already seen it returned by the server), perhaps the server could try to sniff out the full hash the client is interested in by varying its responses to the get full hash requests.
- Assuming clients can deal with a local database that contains something other than full hashes and 4-byte prefixes, distribute client updates that contain longer prefixes, so that get full hash queries communicate longer prefixes and thus more of the full hash of interest.
- Combine information from other sources. Google, for example, would have considerable visibility into the behavior of many clients (based on IP address, account logins, third-party advertising hooks, and so forth). The combined information (knowledge of where clients are, what types of URLs clients tend to request, etc.) might help to assign probabilities.
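The first and third points can be illustrated together. A minimal Python sketch, assuming a hypothetical operator who has precomputed hashes for a corpus of "likely" URLs: a received prefix maps to a set of candidate URLs, and longer prefixes shrink that set. It only ever finds URLs that are already in the operator's corpus; the corpus here is synthetic.

```python
import hashlib
from collections import defaultdict


def build_reverse_index(known_urls, prefix_bytes: int):
    """Operator-side table mapping each hash prefix to the known URLs behind it."""
    index = defaultdict(list)
    for url in known_urls:
        digest = hashlib.sha256(url.encode("utf-8")).digest()
        index[digest[:prefix_bytes]].append(url)
    return index


def candidates(index, received_prefix: bytes):
    """Known URLs consistent with a prefix received in a query."""
    return index.get(received_prefix, [])


# Hypothetical corpus standing in for "URLs most likely to be visited".
known = [f"site{i}.example/" for i in range(100_000)]
target = "site42.example/"
target_hash = hashlib.sha256(target.encode("utf-8")).digest()

for n in (1, 2, 4):
    idx = build_reverse_index(known, n)
    print(n, len(candidates(idx, target_hash[:n])))  # prefix length, candidates
```

Because any URL matching a 2-byte prefix also matches the 1-byte prefix, the candidate sets are nested and can only shrink as the prefix grows.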

I like this particular form of Safe Browsing usage and URL checking much more than other approaches I've seen. However, I can't say I'm entirely comfortable with it. I'd much prefer a zero query system.

Note: this is a complete rewrite of my original message, as mentioned below.

mirimir · Aug 15, 2014

@TheWindBringeth

Thanks

I'll continue to forgo Safe Browsing protection

Nebulus · Aug 15, 2014

mirimir said:

Under "Safe Browsing API v3 advantages", I see "Privacy: API users exchange data with the server using hashed URLs, so the server never knows the actual URLs queried by the clients."

Hashing a URL is not a guarantee that it is private. There is nothing stopping the server from keeping a list of hashes for certain URLs and then comparing them with the ones coming from the client(s). It's the same problem as the so-called "anonymization" of private data by hashing.
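The point can be shown in a few lines: hashing is deterministic, so anyone who can guess an input can precompute its hash and reverse it by lookup. A minimal Python sketch with a made-up watch list; per the description above, the v3-style client sends only short prefixes rather than full hashes, which weakens this kind of lookup but doesn't eliminate it.

```python
import hashlib


def sha256_hex(url: str) -> str:
    return hashlib.sha256(url.encode("utf-8")).hexdigest()


# Hypothetical list of URLs a server operator is interested in.
watch_list = ["example.com/", "example.org/private/", "example.net/login"]
lookup = {sha256_hex(u): u for u in watch_list}


def reverse_lookup(received_hash: str):
    """Return the plaintext URL if we precomputed this hash, else None."""
    return lookup.get(received_hash)


print(reverse_lookup(sha256_hex("example.org/private/")))  # example.org/private/
print(reverse_lookup(sha256_hex("unlisted.example/")))     # None
```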

mirimir · Aug 15, 2014

Nebulus said:

Hashing a URL is not a guarantee that it is private. There is nothing stopping the server from keeping a list of hashes for certain URLs and then comparing them with the ones coming from the client(s). It's the same problem as the so-called "anonymization" of private data by hashing.

Yes, it seems like hand waving to me.

TheWindBringeth · Aug 16, 2014

@mirimir: I wasn't happy with my earlier reply to you. I think it misrepresented what the server would know, or at least know it knows. I decided to edit and rewrite my message.

mirimir · Aug 16, 2014

TheWindBringeth said:

@mirimir: I wasn't happy with my earlier reply to you. I think it misrepresented what the server would know, or at least know it knows. I decided to edit and rewrite my message.

Thanks

Why doesn't the browser client just download the entire database, with periodic updates, and do all searching locally, without logging?

Or would that use too much bandwidth to be practical?

TheWindBringeth · Aug 18, 2014

I don't know the answers to those questions. A full SHA-256 hash is only 32 bytes, so you could stuff very many entries into a moderately-sized download and local table. I think we'd need various numbers (total entries, entries added/removed each period, etc) to get a better sense of it though.
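A back-of-the-envelope sketch of the raw table size, in Python. The entry counts are made up, since the real numbers are exactly what we don't have, and the calculation ignores per-entry metadata, list bookkeeping, and encoding overhead.

```python
FULL_HASH_BYTES = 32  # SHA-256 digest size
PREFIX_BYTES = 4      # short-prefix size discussed in this thread


def table_size_mib(entries: int, bytes_per_entry: int) -> float:
    """Raw size of a flat table of hashes, ignoring any overhead."""
    return entries * bytes_per_entry / 2**20


# Hypothetical entry counts, for illustration only.
for n in (100_000, 1_000_000, 10_000_000):
    full = round(table_size_mib(n, FULL_HASH_BYTES), 1)
    prefixes = round(table_size_mib(n, PREFIX_BYTES), 1)
    print(f"{n:>10} entries: {full} MiB full hashes, {prefixes} MiB prefixes")
```

For a hypothetical 1,000,000 entries that works out to about 30.5 MiB as full hashes versus about 3.8 MiB as 4-byte prefixes, before counting how much changes in each update period.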

IIRC, Mozilla changed the way the Safe Browsing client worked because of mobile device constraints. If one is intent on using the same approach on both resource-rich desktops and resource-poor mobile devices, that could limit one's options.

mirimir · Aug 18, 2014

Thanks. Naively, I'm guessing that there's considerable redundancy in a Safe Browsing database. Normalization would remove much of that, and compression would remove much of the rest, I suspect. That adds CPU load, but that shouldn't be an issue, even on modern smartphones.

But hey, what do I know

TheWindBringeth · Aug 19, 2014

If we were to require that all remote server lookups cause an annoying delay and/or result in a developer [financial] penalty of some sort, that might provide the motivation necessary to have purer client-side approaches explored.

However, even if all concerned were OK with this hash-based tracking protection being done without server lookups, would that be the best approach? IMO, the ability to look over human-readable rules and immediately see what will/won't be blocked is extremely helpful. If the rules were just a table of hashes, you'd have to actually test URLs, and portions of them, to see if there is a match. That said, if the master URL+hash table were published somewhere, that could help.
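To make "test URLs and portions of them" concrete: with a hash-only list, a client has to enumerate candidate expressions for a URL (host suffixes combined with path prefixes) and hash each one. A simplified Python sketch loosely modeled on the Safe Browsing lookup rules; it skips canonicalization, IP-address hosts, and other details, and the function names and example list are hypothetical.

```python
import hashlib
from urllib.parse import urlsplit


def lookup_expressions(url: str) -> list[str]:
    """Simplified host-suffix/path-prefix expressions for one URL."""
    parts = urlsplit(url)
    host = parts.hostname or ""
    labels = host.split(".")

    # Exact host, plus suffixes taken from the last 5 labels (never the bare TLD).
    hosts = [host]
    tail = labels[-5:]
    for i in range(len(tail) - 1):
        suffix = ".".join(tail[i:])
        if suffix not in hosts:
            hosts.append(suffix)

    # Exact path with and without query, then up to 4 directory prefixes from "/".
    path = parts.path or "/"
    paths = [f"{path}?{parts.query}"] if parts.query else []
    paths.append(path)
    dir_prefixes = ["/"]
    for segment in path.split("/")[1:-1][:3]:
        dir_prefixes.append(dir_prefixes[-1] + segment + "/")
    for p in dir_prefixes:
        if p not in paths:
            paths.append(p)

    return [h + p for h in hosts for p in paths]


def matching_expressions(url: str, listed_hashes: set[bytes]) -> list[str]:
    """Which expressions for this URL hit the (opaque) hash list."""
    return [
        e for e in lookup_expressions(url)
        if hashlib.sha256(e.encode("utf-8")).digest() in listed_hashes
    ]


listed = {hashlib.sha256(b"tracker.example/js/").digest()}
print(matching_expressions("https://cdn.tracker.example/js/t.js?id=1", listed))
# ['tracker.example/js/']
```

Nothing in `listed` tells a reader that "tracker.example/js/" is blocked; you only find out by generating and hashing expressions, which is the readability cost described above.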

mirimir · Aug 19, 2014

It's a tough issue.

It's come up on tor-talk recently in a scaling context. As the network grows, clients must download ever more information about relays.

Nekomaou · Aug 20, 2014

elapsed said:

Built-in EasyPrivacy list, basically.

Disconnect + Adblock Edge
