Mozilla to integrate tracking protection in future Firefox versions

Discussion in 'privacy technology' started by MrBrian, Aug 14, 2014.

Thread Status:
Not open for further replies.
  1. MrBrian

    MrBrian Registered Member

    Joined:
    Feb 24, 2008
    Posts:
    6,032
    Location:
    USA
  2. funkydude

    funkydude Registered Member

    Joined:
    Apr 5, 2004
    Posts:
    6,853
    Built in EasyPrivacy list, basically.
     
  3. TheWindBringeth

    TheWindBringeth Registered Member

    Joined:
    Feb 29, 2012
    Posts:
    2,086
    For those interested in the related bugs:
    https://bugzilla.mozilla.org/showdependencytree.cgi?id=1029886&hide_resolved=0

    Although the concept is in some ways similar to an ABP/EasyPrivacy setup, the Mozilla approach appears significantly different. Mozilla seems to be going with the SafeBrowsing hash-based code and model: periodically retrieve and update a local list of [partial] hashes; when a partial hash matches, send it to the server to get a list of corresponding full hashes; then check against those full hashes.
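
The lookup flow described above can be sketched roughly as follows. This is a toy model, not Mozilla's or Google's code: `HashListClient`, the `fetch_full_hashes` callback and the in-memory "server" are hypothetical, and real clients canonicalize URLs and check several host/path expressions per URL rather than one string.

```python
import hashlib
from typing import Callable, Iterable, Set

PREFIX_LEN = 4  # 4-byte prefixes, as described later in the thread

def url_hash(expression: str) -> bytes:
    """Full 32-byte SHA-256 hash of an (already canonicalized) URL expression."""
    return hashlib.sha256(expression.encode("utf-8")).digest()

class HashListClient:
    """Toy model: a local prefix list plus a callback standing in for the server."""

    def __init__(self, local_prefixes: Iterable[bytes],
                 fetch_full_hashes: Callable[[bytes], Set[bytes]]):
        self.local_prefixes = set(local_prefixes)
        self.fetch_full_hashes = fetch_full_hashes  # hypothetical "get full hash" request

    def is_listed(self, expression: str) -> bool:
        full = url_hash(expression)
        prefix = full[:PREFIX_LEN]
        if prefix not in self.local_prefixes:
            return False  # no prefix match: decided locally, nothing is sent
        # Prefix matched: ask the server for every full hash sharing this prefix,
        # then compare locally so the full hash itself is never sent.
        return full in self.fetch_full_hashes(prefix)

# Hypothetical server-side list and the prefix list a client would download.
SERVER_LIST = {url_hash("tracker.example/")}
SERVER_BY_PREFIX = {}
for h in SERVER_LIST:
    SERVER_BY_PREFIX.setdefault(h[:PREFIX_LEN], set()).add(h)

queries = []  # records which prefixes the "server" gets to see

def fake_fetch(prefix: bytes) -> Set[bytes]:
    queries.append(prefix)
    return SERVER_BY_PREFIX.get(prefix, set())

client = HashListClient(SERVER_BY_PREFIX.keys(), fake_fetch)
```

Checking `client.is_listed("benign.example/")` returns False without any query being recorded; only a prefix hit makes the client talk to the server.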

    Looks like the server communications are, for now, via https://tracking.services.mozilla.com/..., which is hosted on Amazon AWS. At this point I'm not sure whether one might run into another cookie issue like the one with SafeBrowsing's separate cookie jar. Looks like they intend never to block top-level requests. I think it is safe to assume it will be ultra-conservative. The model doesn't lend itself to powerful user-defined rules that can fill in the gaps in coverage.

    ATM I don't see it replacing anything, but it would seem to create a leverage point over Google and potentially one that Mozilla could use to effectively seek pay-for-exclusion.

    BTW: looks like SafeBrowsing v3 was rolled out:

    https://developers.google.com/safe-browsing/

    The Wayback Machine suggests the page was changed between June 25 and August 3.
     
  4. mirimir

    mirimir Registered Member

    Joined:
    Oct 1, 2011
    Posts:
    6,029
    Under "Safe Browsing API v3 advantages", I see "Privacy: API users exchange data with the server using hashed URLs, so the server never knows the actual URLs queried by the clients."

    I don't understand that. Does the server report entirely generic stuff that can't be related to the naked URL, perhaps via some source database? I'm concerned that, although "the server [being queried] never knows the actual URLs queried by the clients", some Google server may in fact know them. Does anyone know?
     
  5. TheWindBringeth

    TheWindBringeth Registered Member

    Joined:
    Feb 29, 2012
    Posts:
    2,086
    I expect that Google would know the URLs that correspond to the hashes found in its Safe Browsing database. I imagine Google to be the one that is computing those very hashes, and even if it is getting hash only data from somewhere, it could still compute hashes for any URLs it wishes to (maybe all those URLs it indexes for search engine purposes). I think we must assume that Google is in a position to perform some hashed URL -> actual URL lookups.

    However, as I understand the system to work (mostly from trying to understand released Firefox behavior), a SafeBrowsing API client should never send full (SHA-256, 32-byte) hashes to the query server. I think the Safe Browsing API v3 description is closest to what Firefox is doing. Mozilla abandoned the older "separate host and URL hashing" approach back in 2012 or whatever. In short, I believe that [when the local database doesn't contain enough information] the client will perform a query and send the 4 most significant bytes (the prefix) of the URL hash to the server. The server replies with a list of full hashes that match the hash prefix it received. The client then examines the response list to determine whether it contains the full hash of interest. Note: Firefox also adds "noise" to the get-full-hash query. It adds a number (preference: urlclassifier.gethashnoise) of randomly generated hash prefixes to the query, with the intent that the server won't know which of the (few) prefixes is the "correct" one.
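
A minimal sketch of that noise step, where `noise_count` plays the role of the `urlclassifier.gethashnoise` preference. One assumption on my part: the decoys here are drawn at random from prefixes the client already holds rather than generated from scratch, since the server knows which prefixes it distributed and could spot a decoy it never sent out. The function name is hypothetical, not Firefox's.

```python
import secrets
from typing import Iterable, List

def build_gethash_query(real_prefix: bytes, local_prefixes: Iterable[bytes],
                        noise_count: int) -> List[bytes]:
    """Mix the real prefix with up to `noise_count` decoys taken from the local list.

    Decoys come from prefixes the client already holds (an assumption of this
    sketch), so each one is something the server could plausibly be asked about.
    """
    rng = secrets.SystemRandom()
    pool = sorted(set(local_prefixes) - {real_prefix})
    decoys = rng.sample(pool, min(noise_count, len(pool)))
    query = [real_prefix] + decoys
    rng.shuffle(query)  # list order must not reveal which prefix is the real one
    return query
```

The server still learns that one of the few prefixes was of interest; the noise only spreads the uncertainty across them.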

    The general intent seems to be that the server only learns of (some, short) URL hash prefixes that match client activity. Given a large hashable URL space, the potential for full hash collisions at that level, and the sending of small partial hashes, I don't think the server would know... with significant certainty... which URL the client was checking. Put another way, I think the server operator could say "what I've received is consistent with a visit to X", but they would have to acknowledge that it might in fact not have been X being checked, and that there are many other URLs it could have been.
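
A back-of-the-envelope version of that argument, assuming SHA-256 values behave like uniform random strings (so the ambiguity comes from the short 4-byte prefix, not from collisions of the full 32 bytes) and taking N = 10^12 purely as an illustrative count of hashable URLs:

```latex
\mathbb{E}\left[\#\{\text{URLs sharing a given 4-byte prefix}\}\right] = \frac{N}{2^{32}},
\qquad N = 10^{12} \;\Rightarrow\; \frac{10^{12}}{4\,294\,967\,296} \approx 233
```

That is an average over all hashable URLs; as the list that follows points out, weighting candidates by popularity or other side information can narrow it considerably.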

    If a Safe Browsing API server operator wanted to try to zero in on the actual URLs being visited by the client, I suppose they might do things like:

    - Compute full and partial hashes for large numbers of known URLs, particularly those most likely to be visited by clients. I don't know what the hash distribution would be like and how many URLs would match a given hash prefix, but perhaps this could be used to assign probabilities based on URL popularity.
    - Be selective about which URLs and hashes are databased, and/or selective about whether the hashes distributed via client updates contain hash prefixes or full hashes, in order to try to make a get-full-hash query happen for certain URLs. Depending on how client-side caching works (whether the client would, for a while at least, stop requesting a full hash when it has already seen it returned by the server), perhaps the server could try to sniff out the full hash the client is interested in by varying its responses to the get-full-hash requests.
    - Assuming clients can deal with a local database that contains something other than full hashes and 4 byte prefixes, distribute client updates that contain larger prefixes so that get full hash queries communicate larger prefixes and thus more of the full hash of interest.
    - Combine information from other sources. Google, for example, would have considerable visibility into the behavior of many clients (based on IP Address, account logins, third-party advertising hooks, so forth). The combined information... knowledge of where clients are, what types of URLs clients tend to request, etc... might help to assign probabilities.
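
The first and last bullets could be combined into something like the sketch below: precompute prefixes for URLs the operator already knows, then weight the candidates for a received prefix by popularity. The popularity table and all function names are hypothetical; nothing here is known to be what any operator actually does.

```python
import hashlib
from collections import defaultdict
from typing import Dict, List, Tuple

def prefix_of(expression: str, prefix_len: int = 4) -> bytes:
    return hashlib.sha256(expression.encode("utf-8")).digest()[:prefix_len]

def build_reverse_index(popularity: Dict[str, float],
                        prefix_len: int = 4) -> Dict[bytes, List[Tuple[str, float]]]:
    """Map each prefix to the known URLs (with popularity weights) that produce it."""
    index: Dict[bytes, List[Tuple[str, float]]] = defaultdict(list)
    for url, weight in popularity.items():
        index[prefix_of(url, prefix_len)].append((url, weight))
    return index

def candidate_probabilities(index, received_prefix: bytes) -> Dict[str, float]:
    """Normalize popularity over the known URLs consistent with one received prefix."""
    candidates = index.get(received_prefix, [])
    total = sum(weight for _, weight in candidates)
    if total == 0:
        return {}
    return {url: weight / total for url, weight in candidates}
```

With real 4-byte prefixes and a small URL list, most prefixes map to a single candidate; shortening the prefix (e.g. `prefix_len=1`) is a quick way to see collisions, and how popularity skews the result, in a small experiment.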

    I like this particular form of Safe Browsing usage and URL checking much more than other approaches I've seen. However, I can't say I'm entirely comfortable with it. I'd much prefer a zero query system.

    Note: this is a complete rewrite of my original message, as mentioned below.
     
    Last edited: Aug 16, 2014
  6. mirimir

    mirimir Registered Member

    Joined:
    Oct 1, 2011
    Posts:
    6,029
  7. Nebulus

    Nebulus Registered Member

    Joined:
    Jan 20, 2007
    Posts:
    1,582
    Location:
    European Union
    Hashing a URL is no guarantee that it stays private. There is nothing stopping the server from keeping a list of hashes for certain URLs and then comparing them with the ones coming from the client(s). It's the same problem as the so-called "anonymization" of private data by hashing.
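
That point is easy to demonstrate: a hash is only as private as the set of plausible inputs is large. If a client sent full hashes, an operator who had hashed a list of URLs it cares about would recover them with a plain dictionary lookup. The URLs and function names below are made up for illustration.

```python
import hashlib
from typing import Dict, List, Optional

def sha256_hex(text: str) -> str:
    return hashlib.sha256(text.encode("utf-8")).hexdigest()

def build_lookup(known_urls: List[str]) -> Dict[str, str]:
    """Precompute hash -> URL for every URL the operator wants to recognize."""
    return {sha256_hex(u): u for u in known_urls}

def reverse_hashes(lookup: Dict[str, str], received: List[str]) -> List[Optional[str]]:
    """Return the URL behind each received hash, or None if it isn't in the table."""
    return [lookup.get(h) for h in received]
```

This is why sending only short prefixes matters, although prefixes widen the candidate set rather than removing it.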
     
  8. mirimir

    mirimir Registered Member

    Joined:
    Oct 1, 2011
    Posts:
    6,029
    Yes, it seems like hand waving to me.
     
  9. TheWindBringeth

    TheWindBringeth Registered Member

    Joined:
    Feb 29, 2012
    Posts:
    2,086
    @mirimir: I wasn't happy with my earlier reply to you. I think it misrepresented what the server would know, or at least know it knows. I decided to edit and rewrite my message.
     
  10. mirimir

    mirimir Registered Member

    Joined:
    Oct 1, 2011
    Posts:
    6,029
    Thanks :)

    Why doesn't the browser client just download the entire database, with periodic updates, and do all searching locally, without logging?

    Or would that use too much bandwidth to be practical?
     
  11. TheWindBringeth

    TheWindBringeth Registered Member

    Joined:
    Feb 29, 2012
    Posts:
    2,086
    I don't know the answers to those questions. A full SHA-256 hash is only 32 bytes, so you could stuff very many entries into a moderately sized download and local table. I think we'd need various numbers (total entries, entries added/removed each period, etc.) to get a better sense of it, though.
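
The raw arithmetic is simple; the entry count below is a hypothetical figure, since the real list size isn't given here, and it ignores update traffic and index overhead.

```python
def table_size_bytes(entries: int, bytes_per_entry: int) -> int:
    """Uncompressed size of a flat table of fixed-width hash entries."""
    return entries * bytes_per_entry

def mib(n_bytes: int) -> float:
    return n_bytes / (1024 * 1024)

ENTRIES = 1_000_000  # hypothetical entry count, for illustration only
full = table_size_bytes(ENTRIES, 32)     # full SHA-256 hashes
prefixes = table_size_bytes(ENTRIES, 4)  # 4-byte prefixes only
print(f"full hashes: {mib(full):.1f} MiB, prefixes: {mib(prefixes):.1f} MiB")
# prints: full hashes: 30.5 MiB, prefixes: 3.8 MiB
```

The 8x gap between the two is one plausible reason to ship prefixes and fetch full hashes on demand, especially on constrained devices.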

    IIRC, Mozilla changed the way the Safe Browsing client worked because of mobile device constraints. If one is intent on using the same approach on both resource-rich desktops and resource-poor mobile devices, that could limit one's options.
     
  12. mirimir

    mirimir Registered Member

    Joined:
    Oct 1, 2011
    Posts:
    6,029
    Thanks. Naively, I'm guessing that there's considerable redundancy in a Safe Browsing database. Normalization would remove much of that, and compression would remove much of the rest, I suspect. That adds CPU load, but that shouldn't be an issue, even on modern smartphones.
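
One way to probe the compression guess is a quick experiment on synthetic data (hashes of made-up URLs, not the real list). SHA-256 output looks random, so a general-purpose compressor should gain essentially nothing on raw hashes, whereas sorting 4-byte prefixes and storing the gaps between them does shrink them. The varint gap encoding is a generic choice for illustration; the later Safe Browsing v4 API offers a Rice-coded delta encoding for a similar purpose.

```python
import hashlib
import zlib

def synthetic_hashes(n):
    """Deterministic stand-in data: SHA-256 of made-up URLs."""
    return [hashlib.sha256(f"http://host{i}.example/path".encode()).digest() for i in range(n)]

def encode_varint(value):
    """LEB128-style unsigned varint: 7 bits per byte, high bit means 'more follows'."""
    out = bytearray()
    while True:
        byte = value & 0x7F
        value >>= 7
        if value:
            out.append(byte | 0x80)
        else:
            out.append(byte)
            return bytes(out)

def delta_encoded_prefixes(hashes):
    """Sort 4-byte prefixes and store each as a varint gap from the previous one."""
    prefixes = sorted(int.from_bytes(h[:4], "big") for h in hashes)
    out = bytearray()
    previous = 0
    for p in prefixes:
        out += encode_varint(p - previous)
        previous = p
    return bytes(out)

N = 10_000
hashes = synthetic_hashes(N)
raw = b"".join(hashes)
print("raw full hashes:      ", len(raw))
print("zlib(full hashes):    ", len(zlib.compress(raw, 9)))
print("raw 4-byte prefixes:  ", 4 * N)
print("delta+varint prefixes:", len(delta_encoded_prefixes(hashes)))
```

The redundancy mirimir suspects would mostly live in the URL text, which a hash list doesn't carry; gap encoding exploits the ordering of the prefixes instead.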

    But hey, what do I know ;)
     
  13. TheWindBringeth

    TheWindBringeth Registered Member

    Joined:
    Feb 29, 2012
    Posts:
    2,086
    If we were to require that all remote server lookups cause an annoying delay and/or result in a developer [financial] penalty of some sort, that might provide the motivation necessary to have purer client-side approaches explored :)

    However, even if all concerned were OK with this hash-based tracking protection being done without server lookups, would that be the best approach? IMO, the ability to look over human-readable rules and immediately see what will/won't be blocked is extremely helpful. If the rules were just a table of hashes, you'd have to actually test URLs and portions of them to see if there is a match. That said, if the master URL+hash table were published somewhere, that could be helpful.
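
To make the "test URLs and portions of them" point concrete: with a hash-only table, the only way to know whether something is covered is to generate the host/path combinations a client would check and hash each one. The expression rules below are a simplified version of those in Google's Safe Browsing documentation (no URL canonicalization, no IP-address special case), and the table contents are made up.

```python
import hashlib
from urllib.parse import urlsplit

def host_suffixes(host):
    """The full host plus up to 4 shorter suffixes built from its last 5 labels."""
    labels = host.split(".")
    suffixes = [host]
    tail = labels[-5:]
    for i in range(len(tail) - 1):  # stop before the bare top-level label
        candidate = ".".join(tail[i:])
        if candidate not in suffixes:
            suffixes.append(candidate)
    return suffixes[:5]

def path_prefixes(path, query):
    """Exact path with query, exact path, then '/' and up to 3 directory prefixes."""
    path = path or "/"
    candidates = [f"{path}?{query}"] if query else []
    candidates.append(path)
    components = path.split("/")[1:-1]  # directories only, final segment excluded
    prefix = "/"
    dirs = [prefix]
    for comp in components[:3]:
        prefix += comp + "/"
        dirs.append(prefix)
    candidates += [d for d in dirs if d not in candidates]
    return candidates

def url_expressions(url):
    parts = urlsplit(url)
    host = (parts.hostname or "").lower()
    return [h + p for h in host_suffixes(host) for p in path_prefixes(parts.path, parts.query)]

def matches(url, hash_table):
    """Hash every host/path combination and report which ones are in the table."""
    return [e for e in url_expressions(url)
            if hashlib.sha256(e.encode("utf-8")).digest() in hash_table]
```

With human-readable rules, a reader sees at a glance that `tracker.example.com/` covers every path on that host and its subdomains; with hashes, that only becomes visible by running checks like this one.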
     
  14. mirimir

    mirimir Registered Member

    Joined:
    Oct 1, 2011
    Posts:
    6,029
    It's a tough issue.

    It's come up on tor-talk recently in a scaling context. As the network grows, clients must download ever more information about relays.
     
  15. Nekomaou

    Nekomaou Registered Member

    Joined:
    Aug 19, 2014
    Posts:
    5
    Disconnect + Adblock Edge :D
     