Google Safebrowsing

Discussion in 'privacy technology' started by funkydude, Apr 12, 2016.

  1. funkydude

    funkydude Registered Member

    Joined:
    Apr 5, 2004
    Posts:
    6,852
    Originally I had disabled this feature based on claims that, in Chrome, it works by submitting a hashed version of the URLs you visit to Google to check whether they are malicious.

    However, I then stumbled across this: https://developers.google.com/safe-browsing/developers_guide_v3#Audience

    It appears that, although everything is hashed, the lists are still downloaded and checked locally. So to me it seems that using Google Safebrowsing is perfectly fine from a privacy perspective. I'm not sure how the idea that URLs are sent to Google came about (maybe because other companies do that?).
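A minimal Python sketch of the local check described above. It assumes the URL has already been canonicalized into a single lookup expression (the real protocol canonicalizes URLs and checks several host-suffix/path-prefix combinations), and the list contents are made up for illustration. It only shows that a URL whose 32-bit hash prefix is absent from the locally downloaded list is decided without contacting the server.

```python
from __future__ import annotations

import hashlib


def url_prefix(expression: str, prefix_len: int = 4) -> bytes:
    """First prefix_len bytes (32 bits by default) of SHA-256(expression)."""
    return hashlib.sha256(expression.encode("utf-8")).digest()[:prefix_len]


def locally_suspicious(expression: str, local_prefixes: set[bytes]) -> bool:
    """True if the expression's prefix is in the locally downloaded list.

    A miss means the URL is treated as safe without any network request.
    """
    return url_prefix(expression) in local_prefixes


# Hypothetical local list built from a made-up "bad" expression.
local_prefixes = {url_prefix("malware.example.test/")}

print(locally_suspicious("malware.example.test/", local_prefixes))  # True
print(locally_suspicious("news.example.org/", local_prefixes))
```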

    Have others disabled this feature due to this false information? It looks like I'll be re-enabling it.
     
  2. mirimir

    mirimir Registered Member

    Joined:
    Oct 1, 2011
    Posts:
    6,029
    I always disable it.
     
  3. Minimalist

    Minimalist Registered Member

    Joined:
    Jan 6, 2014
    Posts:
    5,066
  4. TheWindBringeth

    TheWindBringeth Registered Member

    Joined:
    Feb 29, 2012
    Posts:
    2,085
    Also back up and review https://developers.google.com/safe-browsing/. Maybe take a look at the code and observe traffic while you exercise the feature in purposeful ways, both to get a better feel for it and in case builds differ from the documentation. Watch out for unique persistent identifiers in cookies/URLs/etc. sent during SafeBrowsing requests if you care about the SafeBrowsing feature making you vulnerable to cross-context tracking. All URLs may now be HTTPS, but check that to be sure.

    Even if an implementation always avoided full URLs and persistent identifiers, I'm not sure this type of hashing approach would be considered perfectly safe from a privacy POV. IIRC... it has been a while... the client will contact (inform) the SafeBrowsing provider when there is a prefix match (to request the corresponding full hashes). Technically, the provider won't know for sure what URL the client was attempting to load. However, knowing which prefixes a client hits may be enough to give the provider a "browsing appears consistent with X, but we don't know for sure" sense of what the user is doing. For example, if the prefix tables contained prefixes corresponding to URLs that a person interested in X would be highly likely to visit, and a particular client hit enough of those prefixes, the SafeBrowsing provider could infer with TBD probability that the user is interested in X. Or, if the SafeBrowsing provider wanted to rule in that someone visits a specific URL, it could seed a prefix table with that URL's prefix. I'm not sure how practical/concerning this is given the characteristics of the system in question, but I figured it was worth mentioning.
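To make the inference concern concrete, here is a hypothetical provider-side sketch in Python. The topic URLs, the client's request log and the `topic_hit_ratio` helper are all invented for illustration and do not describe anything Google is known to do. It only shows that counting which distributed prefixes a client asks about can suggest interest in a topic without revealing any full URL.

```python
from __future__ import annotations

import hashlib


def prefix(expression: str) -> bytes:
    """32-bit SHA-256 prefix, as a client would send in a full-hash request."""
    return hashlib.sha256(expression.encode("utf-8")).digest()[:4]


# Hypothetical URLs "a person interested in X" might visit, seeded into a table.
TOPIC_X_URLS = [
    "clinic.example/appointments",
    "support-group.example/forum",
    "pharmacy.example/refills",
]
topic_x_prefixes = {prefix(u) for u in TOPIC_X_URLS}


def topic_hit_ratio(observed: list[bytes], topic: set[bytes]) -> float:
    """Fraction of a topic's seeded prefixes that one client has asked about."""
    if not topic:
        return 0.0
    return len(topic.intersection(observed)) / len(topic)


# Prefixes one hypothetical client sent over some period; the provider sees
# only these 4-byte values, never the full URLs.
client_requests = [prefix("clinic.example/appointments"), prefix("pharmacy.example/refills")]
print(topic_hit_ratio(client_requests, topic_x_prefixes))  # about 0.667 (2 of 3 hit)
```

A high ratio is only suggestive: a 32-bit prefix can also be produced by unrelated URLs, which is where the "TBD probability" caveat comes in.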
     
  5. funkydude

    funkydude Registered Member

    Joined:
    Apr 5, 2004
    Posts:
    6,852
    Thanks. That basically confirms what I said in my first post. The only time data is sent to Google is if "usage statistics" is turned on or "automatically report details of possible security incidents to Google".

    If you only have Safe Browsing enabled, it seems the only time you will contact Google is 1) To download the list and 2) If you browse to a bad website on your local list, it will send a partial URL hash to Google to 100% confirm it is malicious. Nothing is sent when browsing harmless websites.

    Some good points made, good information to make your own choices.
     
  6. TheWindBringeth

    TheWindBringeth Registered Member

    Joined:
    Feb 29, 2012
    Posts:
    2,085
    The system sends requests to help determine whether something is or isn't harmless, so the statement that nothing is sent when browsing harmless websites is technically incorrect. It appears that at least the following may be sent in harmless scenarios:
    1. Potential malicious URL check: URL fingerprint (the first 32 bits of a SHA-256 hash of the URL)
    2. Potential phish check: a subset of likely phishing and social engineering terms found on the page
    3. Potential dangerous file check: information about the full URL of the site or file download, all related referrers and redirects, code signing certificates, file hashes, and file header information.
    4. Suspected settings tampering check: the URL of the last downloaded potentially dangerous file and information about the nature of the unexpected changes to the Safe Browsing service
    The periodic computer-scanning check might also phone home file metadata for files the user considers harmless, but how that check works and how definitive its rules are isn't immediately clear to me.

    How often information is sent in harmless scenarios, and how much a user might object to that, would be a fair question.

    Reference: the whitepaper section on Safe Browsing.
     
    Last edited: Apr 12, 2016
  7. funkydude

    funkydude Registered Member

    Joined:
    Apr 5, 2004
    Posts:
    6,852
    No information is sent in harmless scenarios as long as you don't have the options enabled that I referenced in my earlier post. To repeat: A partial hash of the website you're on will be sent to Google only if that website is on your local list of bad websites. Basically, this is a failsafe in case your local list is out of date, to prevent flagging a website that may have "cleaned up" their issues since you downloaded the list.
     
  8. TheWindBringeth

    TheWindBringeth Registered Member

    Joined:
    Feb 29, 2012
    Posts:
    2,085
    Which options? Do you mean "usage statistics" and "automatically report details of possible security incidents to Google"? If so, a careful parsing of the descriptions combined with my understanding of how this works leads me to believe that those are like the "community feedback" option in cloud AVs. When enabled, they send additional information... over and above any that is normally sent... but disabling them doesn't eliminate what is normally sent. However, we can come back to these later. For the remainder of my message, let's assume they are disabled and out of the way.
    Yes, the client does send the partial hash (prefix) in order to verify that a full hash entry is still valid. However, there is another case. Several passages from developers_guide_v3 to get oriented:
    IOW, the 256-bit hash that specifies a malicious URL is normally not available on the client... just the first 32 bits are. If the client wants to load a URL that matches a prefix, but it doesn't yet have the full hash(es) for that prefix to check against, it has to send the prefix to the server in order to get the full-length hashes. It caches the full-length hashes and compares the full hash of the URL it is loading to those it received. One may match... or NONE may match. In the latter case, it has sent a prefix to the server in a harmless scenario. It would also send a prefix to the server in a harmless scenario if a locally available full hash was no longer considered valid (expired).
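A simplified Python model of the lookup flow described above, assuming the v3-style design of local 32-bit prefixes plus a gethash-style request for full-length hashes. `SafeBrowsingClientSketch`, the `fetch_full_hashes` callback and the `ttl_seconds` default are hypothetical stand-ins chosen for illustration, not names or values from Google's code or documentation. The usage at the end shows the case where no returned full hash matches, so a prefix leaves the machine even though the page is treated as harmless.

```python
import hashlib
import time


def full_hash(expression: str) -> bytes:
    """Full-length (256-bit) SHA-256 hash of a lookup expression."""
    return hashlib.sha256(expression.encode("utf-8")).digest()


class SafeBrowsingClientSketch:
    """Hypothetical, simplified client model; not Google's implementation."""

    def __init__(self, local_prefixes, fetch_full_hashes, ttl_seconds=45 * 60):
        self.local_prefixes = set(local_prefixes)  # 32-bit prefixes from the downloaded list
        self.fetch_full_hashes = fetch_full_hashes  # stand-in for the network gethash request
        self.ttl = ttl_seconds                      # illustrative cache lifetime (assumption)
        self.cache = {}                             # prefix -> (set of full hashes, fetch time)
        self.sent_prefixes = []                     # every prefix that left the machine

    def is_malicious(self, expression, now=None):
        now = time.time() if now is None else now
        h = full_hash(expression)
        p = h[:4]
        if p not in self.local_prefixes:
            return False  # no prefix match: decided locally, nothing sent
        cached = self.cache.get(p)
        if cached is None or now - cached[1] > self.ttl:
            # No full hashes cached, or they expired: the prefix goes to the server.
            self.sent_prefixes.append(p)
            self.cache[p] = (set(self.fetch_full_hashes(p)), now)
        # Malicious only if the URL's own full hash is among those returned.
        return h in self.cache[p][0]


# Hypothetical: the local list still carries this site's prefix, but the server
# no longer has a matching full hash (e.g. the site was cleaned up).
client = SafeBrowsingClientSketch(
    local_prefixes={full_hash("shop.example/")[:4]},
    fetch_full_hashes=lambda prefix: [],
)
print(client.is_malicious("shop.example/", now=0))  # False
print(len(client.sent_prefixes))                    # 1: the prefix was sent anyway
```

Calling `is_malicious` again within the TTL reuses the cached full hashes, while a call after expiry sends the prefix again, which is the expiry case mentioned above.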

    Google has control of this because it decides where/when to distribute prefixes vs. full hashes, and it can change the behavior if/when it wants to. Unless you can point to an official "we're only distributing full-length hashes from now on", we should assume the developer's guide remains correct. However, if someone has the time (I won't for a week or two, and then would only test Firefox), they can take a look at the lists and/or run some sniffing experiments looking for gethash requests. Back when I was playing around with FF Safe Browsing, I'd clear the local Safe Browsing database to eliminate cached full hashes, then give it some time to repopulate, then do some browsing and look for them. One of the things that creeped me out a bit was seeing gethash requests at financial institutions.
     
    Last edited: Apr 13, 2016
  9. Minimalist

    Minimalist Registered Member

    Joined:
    Jan 6, 2014
    Posts:
    5,066
    Don't forget this:
    "In addition to the URL check described above, Chrome also conducts client-side checks. If a website looks suspicious, it sends a subset of likely phishing and social engineering terms found on the page to Google to obtain additional information available from Google's servers on whether the website should be considered malicious."
     
  10. TheWindBringeth

    TheWindBringeth Registered Member

    Joined:
    Feb 29, 2012
    Posts:
    2,085