Google Safebrowsing

elapsed · Apr 12, 2016

Originally I had disabled this feature on some claims that, in Chrome, the way it works is by submitting a hashed version of the URLs you visit to Google to check if they are malicious.

However, I then stumbled across this: https://developers.google.com/safe-browsing/developers_guide_v3#Audience

It appears that while everything is hashed, lists are still downloaded and checked locally. So to me it seems like using Google Safebrowsing is perfectly fine from a privacy perspective. I'm not sure how the idea that URLs were sent to Google came about (maybe because other companies do that?).

Have others disabled this feature due to this false information? It looks like I'll be re-enabling it.

mirimir · Apr 12, 2016

I always disable it.

Minimalist · Apr 12, 2016

I have it disabled. You can check Chrome's privacy whitepaper here: https://www.google.com/chrome/browser/privacy/whitepaper.html. Check Safe browsing protection section. Too much data is transferred to Google for my liking.

TheWindBringeth · Apr 12, 2016

Also back up and review https://developers.google.com/safe-browsing/. Maybe take a look at the code and observe traffic while you exercise the feature in purposeful ways, both to get a better feel for it and in case builds differ from the documentation. Watch out for unique persistent identifiers in cookies, URLs, etc. sent during SafeBrowsing requests if you care about the SafeBrowsing feature making you vulnerable to cross-context tracking. All URLs may now be HTTPS, but check that to be sure.

Even if an implementation always avoided full URLs and persistent identifiers, I'm not sure this type of hashing approach would be considered perfectly safe from a privacy POV. IIRC... it has been a while... the client will contact (inform) the SafeBrowsing provider when there is a prefix match (to request corresponding full hashes). Technically, the provider won't know for sure what URL the client was attempting to load. However, knowing which prefixes a client hits may be enough to give the provider a "browsing appears consistent with X, but we don't know for sure" sense of what the user is doing. For example, if the prefix tables contained prefixes corresponding to URLs that a person interested in X would be highly likely to visit, and a particular client hit enough of those prefixes, the SafeBrowsing provider could infer with TBD probability that the user is interested in X. Or if the SafeBrowsing provider wanted to rule in that someone visits a specific URL, they could seed a prefix table with that URL's prefix. I'm not sure how practical/concerning this is given the characteristics of the system in question, but figured it worth mentioning.
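To make the inference concern concrete, here is a minimal Python sketch of how a provider that only ever sees 32-bit prefixes could still map them back to a list of URLs it cares about. The function names (`prefix32`, `candidates_for`) and the example URLs are made up for illustration, and the real protocol canonicalizes URLs and hashes several host/path combinations, which is skipped here.

```python
import hashlib


def prefix32(url_expr: str) -> bytes:
    """First 32 bits (4 bytes) of the SHA-256 hash of a URL expression."""
    return hashlib.sha256(url_expr.encode("utf-8")).digest()[:4]


def candidates_for(observed_prefixes, urls_of_interest):
    """Map each observed prefix to the URLs of interest that share it.

    The provider cannot be certain which URL the client loaded, but a
    match narrows things down to "consistent with" these candidates.
    """
    index = {}
    for url in urls_of_interest:
        index.setdefault(prefix32(url), []).append(url)
    return {p.hex(): index[p] for p in observed_prefixes if p in index}


# Hypothetical URLs a provider might be curious about.
watchlist = ["forum.example/topic-x/", "shop.example/item-y/"]

# Prefixes a hypothetical client sent in full-hash requests.
observed = [prefix32("forum.example/topic-x/")]

result = candidates_for(observed, watchlist)
print(result)
```

The point is only that a prefix is not anonymous to someone who already holds a list of candidate URLs; how practical this is against the real system is a separate question.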

elapsed · Apr 12, 2016

Minimalist said: ↑

I have it disabled. You can check Chrome's privacy whitepaper here: https://www.google.com/chrome/browser/privacy/whitepaper.html. Check Safe browsing protection section. Too much data is transferred to Google for my liking.

Thanks. That basically confirms what I said in my first post. The only time data is sent to Google is if "usage statistics" is turned on or "automatically report details of possible security incidents to Google".

If you only have Safe Browsing enabled, it seems the only time you will contact Google is 1) To download the list and 2) If you browse to a bad website on your local list, it will send a partial URL hash to Google to 100% confirm it is malicious. Nothing is sent when browsing harmless websites.

Some good points made, good information to make your own choices.

TheWindBringeth · Apr 12, 2016

elapsed said: ↑

If you only have Safe Browsing enabled, it seems the only time you will contact Google is 1) To download the list and 2) If you browse to a bad website on your local list, it will send a partial URL hash to Google to 100% confirm it is malicious. Nothing is sent when browsing harmless websites.

The system sends requests to help determine whether something is or isn't harmless. So the last sentence of your post ("Nothing is sent when browsing harmless websites") is technically incorrect. It appears that at least the following may be sent in harmless scenarios:

Potential malicious URL check: URL fingerprint (the first 32 bits of a SHA-256 hash of the URL)

Potential phish check: a subset of likely phishing and social engineering terms found on the page

Potential dangerous file check: information about the full URL of the site or file download, all related referrers and redirects, code signing certificates, file hashes, and file header information.

Suspected settings tampering check: the URL of the last downloaded potentially dangerous file and information about the nature of the unexpected changes to the Safe Browsing service

The periodical scanning of computer check might also phone home file metadata for files the user considers harmless, but how that check works and how definitive the rules are isn't immediately clear to me.

How often information is sent in harmless scenarios, and how much a user might object to that, would be a fair question.

Reference: the whitepaper section on Safe Browsing.

elapsed · Apr 12, 2016

No information is sent in harmless scenarios as long as you don't have the options enabled that I referenced in my earlier post. To repeat: A partial hash of the website you're on will be sent to Google only if that website is on your local list of bad websites. Basically, this is a failsafe in case your local list is out of date, to prevent flagging a website that may have "cleaned up" their issues since you downloaded the list.

TheWindBringeth · Apr 13, 2016

elapsed said: ↑

No information is sent in harmless scenarios as long as you don't have the options enabled that I referenced in my earlier post.

Which options? Do you mean "usage statistics" and "automatically report details of possible security incidents to Google"? If so, a careful parsing of the descriptions combined with my understanding of how this works leads me to believe that those are like the "community feedback" option in cloud AVs. When enabled, they send additional information... over and above any that is normally sent... but disabling them doesn't eliminate what is normally sent. However, we can come back to these later. For the remainder of my message, let's assume they are disabled and out of the way.

elapsed said: ↑

To repeat: A partial hash of the website you're on will be sent to Google only if that website is on your local list of bad websites. Basically, this is a failsafe in case your local list is out of date, to prevent flagging a website that may have "cleaned up" their issues since you downloaded the list.

Yes, you are to send the partial hash (prefix) in order to verify that a full hash entry is still valid. However, there is another case. Several passages from developers_guide_v3 to get oriented:

"Google publishes phishing, malware, and unwanted software data in three separate blacklists (googpub-phish-shavar, goog-malware-shavar, and goog-unwanted-shavar). Each is a list of SHA-256 hash values that are usually truncated to a 4-byte hash prefix."

"Chunks that contain hash values do not necessarily contain the full hash; they are often only a prefix for that hash. A second request (a gethash request) can be issued to get the list of full-length hashes that start with the prefix."

"For the 'shavar' list format, hash prefixes are used to reduce bandwidth."

IOW, the 256-bit hash that specifies a malicious URL is normally not available on the client... just the first 32 bits are. If the client wants to load a URL that matches a prefix, but it doesn't yet have the full hash(es) for that prefix to check against, it has to send the prefix to the server in order to get the full-length hashes. It caches the full-length hashes and compares the full hash of the URL it is loading to those full-length hashes it received. One may match... or NONE may match. In the latter case, it has sent a prefix to the server in a harmless scenario. It would also send a prefix to the server in a harmless scenario if a locally available full hash was no longer considered valid (expired).
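A rough Python sketch of the flow described above, to show where the "harmless" gethash request comes from. This is a toy model under my own assumptions, not Chrome's or Firefox's actual code: `FakeServer` and `Client` are invented names, the prefix collision is faked by hand, and URL canonicalization and cache expiry are left out.

```python
import hashlib

PREFIX_LEN = 4  # 32 bits, as in the developer's guide


def full_hash(url_expr: str) -> bytes:
    """Full 256-bit SHA-256 hash of a URL expression."""
    return hashlib.sha256(url_expr.encode("utf-8")).digest()


class FakeServer:
    """Stand-in for the provider; records every prefix it is asked about."""

    def __init__(self, bad_full_hashes):
        self.bad = list(bad_full_hashes)
        self.seen_prefixes = []

    def gethash(self, prefix: bytes):
        self.seen_prefixes.append(prefix)
        return [h for h in self.bad if h.startswith(prefix)]


class Client:
    def __init__(self, local_prefixes, server):
        self.local_prefixes = set(local_prefixes)
        self.server = server
        self.cache = {}  # prefix -> full hashes received from the server

    def is_listed(self, url_expr: str) -> bool:
        h = full_hash(url_expr)
        prefix = h[:PREFIX_LEN]
        if prefix not in self.local_prefixes:
            return False  # no local prefix match: nothing is sent
        if prefix not in self.cache:
            # Prefix match without cached full hashes: prefix goes to the server.
            self.cache[prefix] = self.server.gethash(prefix)
        return h in self.cache[prefix]


harmless = "example.org/"
h = full_hash(harmless)
# Pretend some other, malicious URL happens to share the first 4 bytes.
colliding_bad = h[:PREFIX_LEN] + bytes(28)

server = FakeServer([colliding_bad])
client = Client([h[:PREFIX_LEN]], server)

print(client.is_listed(harmless))   # False: no full hash matched
print(len(server.seen_prefixes))    # 1: the prefix was sent anyway
```

The URL is judged harmless, yet the server has still learned that this client hit that prefix.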

Google has control of this because it decides where/when to distribute prefixes vs full hashes, and it can change the behavior if/when it wants to. Unless you can point to an official "we're only distributing full-length hashes from now on", we should assume the developer's guide remains correct. However, if someone has the time (I won't for a week or two, and then would only test Firefox), they can take a look at the lists and/or run some sniffing experiments looking for gethash requests. Back when I was playing around with FF Safe Browsing, I'd clear the local Safe Browsing database to eliminate cached full hashes, then give it some time to repopulate, then do some browsing and look for them. One of the things that creeped me out a bit was seeing gethash requests at financial institutions.
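For anyone trying the experiment, a small Python sketch for picking gethash requests out of a list of logged request URLs. It assumes you already have the URLs as plain strings (for example exported from a local proxy or the browser's network log, since the paths won't be visible in raw HTTPS captures), and that the request path ends in `/gethash`. The hostnames in the sample are placeholders, not the real endpoints.

```python
from urllib.parse import urlsplit


def gethash_requests(logged_urls):
    """Return (host, path) for every logged URL whose path ends in /gethash."""
    hits = []
    for raw in logged_urls:
        parts = urlsplit(raw.strip())
        if parts.path.rstrip("/").endswith("/gethash"):
            hits.append((parts.hostname, parts.path))
    return hits


# Placeholder log entries for illustration only.
sample_log = [
    "https://safebrowsing.example/safebrowsing/downloads?client=x",
    "https://safebrowsing.example/safebrowsing/gethash?client=x&pver=3.0",
    "https://bank.example/login",
]

for host, path in gethash_requests(sample_log):
    print(host, path)
```

Pairing each hit with the page being loaded at that moment is what tells you whether the request fired in a harmless scenario.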

Minimalist · Apr 13, 2016

Don't forget this:
"In addition to the URL check described above, Chrome also conducts client-side checks. If a website looks suspicious, it sends a subset of likely phishing and social engineering terms found on the page to Google to obtain additional information available from Google's servers on whether the website should be considered malicious."

TheWindBringeth · Apr 15, 2016

Stumbled across this:

A Privacy Analysis of Google and Yandex Safe Browsing
https://hal.inria.fr/hal-01120186
