URL checkers (separating the good from the bad, privacy-wise)

Discussion in 'privacy technology' started by TheWindBringeth, Mar 30, 2012.

Thread Status:
Not open for further replies.
  1. TheWindBringeth

    TheWindBringeth Registered Member

    Joined:
    Feb 29, 2012
    Posts:
    2,171
    I think URL checkers (Malicious URL Blocking, Safe Browsing, SmartScreen Filter, etc) have the potential to be a severe threat to individual and corporate privacy. The degree to which a checker presents such a threat would depend on how it operates, what information it sends and to whom, who can read that information, etc. I feel like there is much to consider, and as I re-evaluate my own approaches this year, I think I should spend a fair amount of time on this issue.

    I think there is just one approach that doesn't create a privacy issue: blocking things based on "definitions" that are pulled via secure connection and without passing a user/instance unique identifier to the definitions provider. Given the potential for large numbers of threatening URLs, I feel this approach would likely have limitations in terms of coverage or granularity. Which, given multiple overlapping lines of defense and an ability to cope with false positives, might not be a show stopper. So I consider this ideal approach still on the table for me at least.
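    To make that ideal concrete, here is a rough sketch (Python, with a made-up definitions URL and a one-host-per-line format; not any real vendor's service) of what I mean: the list is pulled over TLS with no cookies or instance ID attached, and the actual lookup never leaves the machine.

        # Hypothetical sketch: pull a blocklist over HTTPS and match locally.
        # The URL and the file format are placeholders, not a real service.
        import urllib.request
        from urllib.parse import urlparse

        DEFINITIONS_URL = "https://definitions.example.net/blocked-hosts.txt"

        def fetch_definitions():
            # Plain GET over TLS; no cookies, no user/instance identifier attached.
            with urllib.request.urlopen(DEFINITIONS_URL) as resp:
                lines = resp.read().decode("utf-8", "replace").splitlines()
            return {line.strip().lower() for line in lines if line.strip()}

        def is_blocked(url, blocked_hosts):
            # All matching happens locally, so the visited URL is never reported.
            host = (urlparse(url).hostname or "").lower()
            return host in blocked_hosts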

    At the other end of the spectrum I think there are a number of behaviors which when combined would create the most privacy issues in gross terms (I'll consider net terms later). That is what I'm trying to think through now. I've created the preliminary list below. Is there anything I'm missing?

    a) Sends full URLs (scheme, hostname, port, path, and query string)
    b) Fails to strip username:password@ if present
    c) Performs query via non-secured connection with no or weak encryption/authentication
    d) Is proxy based and thus has visibility into all URLs regardless of application being used
    e) Sends a user/instance unique ID with each query
    f) Sends regular cookies with the query
    g) Query response includes active content or otherwise presents a dynamic threat
    h) Sends referrer or other information about recently visited sites
    i) Performs queries on many schemes (not just HTTP, but others such as ftp, mailto, ...)
    j) Performs queries on URLs associated with sites visited via HTTPS
    k) Performs queries on local filesystem URLs
    l) Performs queries on local private network URLs (e.g. no private IP address checking)
    m) Queries ahead (checks URLs you haven't actually clicked on)
    n) Doesn't utilize its own exclusion list to reduce reported URLs (checks everything)
    o) Doesn't utilize caching (queries each and every time or very frequently)
    p) No user control over what is checked (can't create exclusions or set it to ask)
    q) No logging to make it easy for a user to review what has been sent
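    To illustrate a few of the items above, here is a rough sketch (Python again; the helper name is my own invention) of the opposite posture, i.e. minimizing what a checker would even consider sending. It covers (a)/(b) by reducing the URL to a bare hostname, (i) by ignoring non-web schemes, and (l) by refusing to report private addresses.

        # Hypothetical sketch of query minimization; not any particular product's behavior.
        import ipaddress
        from urllib.parse import urlparse

        def query_candidate(url):
            # Return the hostname to check, or None if it should never be reported.
            parsed = urlparse(url)
            if parsed.scheme not in ("http", "https"):    # (i) web schemes only
                return None
            host = parsed.hostname or ""                   # (a)(b) hostname only, user:pass@ dropped
            try:
                if ipaddress.ip_address(host).is_private:  # (l) never report LAN/localhost addresses
                    return None
            except ValueError:
                pass                                       # not an IP literal, keep going
            return host

        print(query_candidate("http://user:secret@192.168.1.10/admin"))         # None
        print(query_candidate("https://example.com/some/path?session=abc123"))  # example.com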
     
    Last edited: Mar 30, 2012
  2. m00nbl00d

    m00nbl00d Registered Member

    Joined:
    Jan 4, 2009
    Posts:
    6,623
    n) Doesn't utilize a whitelist (queries everything)

    What exactly do you mean by whitelist?
     
  3. TheWindBringeth

    TheWindBringeth Registered Member

    Joined:
    Feb 29, 2012
    Posts:
    2,171
    I just edited that to hopefully better communicate the idea. Theoretically speaking, a URL checker could utilize its own <edit> exclusion list of "known safe" and/or "too sensitive to report"</edit> domains, hosts, whatever to cut back on the number of URLs that are checked/reported. Although one might not actually want that, cutting back on the URLs checked theoretically reduces the information that is sent to the provider and thus is grossly better for privacy. Or so my thinking is.
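    For the sake of illustration, something like this is what I have in mind (Python; the file name and matching rule are just my own invention): the checker consults a local exclusion file first and simply never issues a remote query for anything under a listed domain.

        # Hypothetical sketch of a local exclusion list that suppresses remote lookups.
        from urllib.parse import urlparse

        def load_exclusions(path="exclusions.txt"):
            # One domain per line; lines starting with '#' are comments.
            with open(path) as f:
                return {line.strip().lower() for line in f
                        if line.strip() and not line.startswith("#")}

        def should_query(url, exclusions):
            host = (urlparse(url).hostname or "").lower()
            # Match the host and every parent domain, e.g. mail.example.org -> example.org.
            parts = host.split(".")
            return not any(".".join(parts[i:]) in exclusions for i in range(len(parts)))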
     
    Last edited: Mar 30, 2012
  4. vasa1

    vasa1 Registered Member

    Joined:
    May 1, 2010
    Posts:
    4,417
    Palant had to face a lot of criticism when he introduced an AdBlock Plus white-list (with an opt-out rather than an opt-in); people questioned his motive for going there. I don't know whether the ABP team has managed to come up with a useful white-list so far.

    I feel it would be less of a problem for these URL checkers to look at a local, user-generated file for exemptions, if that's feasible.
     
  5. m00nbl00d

    m00nbl00d Registered Member

    Joined:
    Jan 4, 2009
    Posts:
    6,623
    Well, the deal with ABP was whitelisting ads and trackers. Whitelisting "safe" domains is a bit more problematic, I'd say...
     
  6. TheWindBringeth

    TheWindBringeth Registered Member

    Joined:
    Feb 29, 2012
    Posts:
    2,171
    I think you might be looking at things backwards from a privacy point of view. ABP's whitelist effectively UNblocks certain things and causes a WORSE situation from a privacy point of view. The whitelist I mentioned, which I'll now call an exclusion list in the hopes that it makes things more clear, is designed to cut back on what gets reported and thus causes a BETTER situation from a privacy point of view.

    The problem, of course, is that it is theoretically possible that something excluded from the URL checker (a whistleblower website, whatever) might actually turn out to have malicious content, which could in turn bite you and create its own privacy issue. Weighing those things is difficult. I personally would prefer less coverage, more privacy, and try to cover that gap with another layer of protection. Others may feel differently. Theoretically, the URL checker could give you some ability to adjust that risk vs privacy tradeoff. Worst case I think would be where it doesn't.
     
    Last edited: Mar 30, 2012