scrubber.py, a basic web filter for Apache

Discussion in 'other software & services' started by Gullible Jones, Jan 18, 2014.

Thread Status:
Not open for further replies.
  1. Gullible Jones

    Gullible Jones Registered Member

    Joined:
    May 16, 2013
    Posts:
    1,466
    My simple (err, more like simplistic) HTML filter script:

    https://gitorious.org/scrubber

    It's designed to be run with Apache as a proxy. If you do not know anything about Apache, do not attempt that setup. I am not responsible for you opening your desktop up to the Internet and getting hacked.

    What it does do
    Processes HTML from unencrypted HTTP connections only, allowing categories of content based on a simple whitelist. You pass it the categories as CLI parameters, and it blocks everything else. Note that you'll need the 'required' category unless you want blank white pages in your browser.
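To make the whitelist idea concrete, here is a minimal sketch using only the standard library's `html.parser`. It is not the actual scrubber.py code: the category names other than 'required', the category-to-tag mapping, and the names `CATEGORIES`, `WhitelistFilter` and `scrub` are all assumptions made up for illustration.

```python
from html import escape
from html.parser import HTMLParser

# Hypothetical mapping of categories to tags; the real script's
# categories (apart from 'required') may look quite different.
CATEGORIES = {
    "required": {"html", "head", "title", "body"},
    "text": {"p", "br", "h1", "h2", "h3", "ul", "ol", "li", "em", "strong", "b"},
    "links": {"a"},
    "images": {"img"},
}

# Elements whose body is raw text; this sketch always drops them entirely.
RAW_TEXT_TAGS = {"script", "style"}


class WhitelistFilter(HTMLParser):
    """Re-emit only tags from allowed categories; ordinary text is kept."""

    def __init__(self, allowed_categories):
        super().__init__(convert_charrefs=True)
        self.allowed_tags = set()
        for name in allowed_categories:
            self.allowed_tags |= CATEGORIES.get(name, set())
        self.skip_depth = 0  # > 0 while inside <script> or <style>
        self.out = []

    def _emit_start(self, tag, attrs):
        # Note: attributes are passed through unfiltered in this sketch.
        parts = [tag]
        for name, value in attrs:
            if value is None:
                parts.append(name)
            else:
                parts.append('{}="{}"'.format(name, escape(value, quote=True)))
        self.out.append("<" + " ".join(parts) + ">")

    def handle_starttag(self, tag, attrs):
        if tag in RAW_TEXT_TAGS:
            self.skip_depth += 1
        elif tag in self.allowed_tags:
            self._emit_start(tag, attrs)

    def handle_startendtag(self, tag, attrs):
        # Self-closing form such as <br/>: emit the start tag only.
        if tag in self.allowed_tags:
            self._emit_start(tag, attrs)

    def handle_endtag(self, tag):
        if tag in RAW_TEXT_TAGS:
            self.skip_depth = max(0, self.skip_depth - 1)
        elif tag in self.allowed_tags:
            self.out.append("</" + tag + ">")

    def handle_data(self, data):
        if not self.skip_depth:
            self.out.append(escape(data, quote=False))


def scrub(html_text, allowed_categories):
    """Return html_text with every tag outside the allowed categories removed."""
    parser = WhitelistFilter(allowed_categories)
    parser.feed(html_text)
    parser.close()
    return "".join(parser.out)
```

Under these assumptions, `scrub(page, ["required", "text"])` would keep paragraphs and headings but drop link and image tags. Comments and doctype declarations are silently discarded because the parser's default handlers ignore them. A command-line wrapper could pass `sys.argv[1:]` as the category list.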

    What it does NOT do
    - Provide guaranteed protection from malware or attacks of any sort
    - Remove every single ad you might see online
    - Work perfectly with every HTTP website
    - Work at all with any HTTPS website (i.e. HTTPS will not be filtered)
    - Allow per-site preferences
    - Deal with cookies of any sort
    - Allow Java, Flash, etc. from unencrypted sites (seriously, who needs that?)

    Current limitations
    - No per-site preferences
    - No specific blocking of third-party embedded content

    Since these things are highly desirable, I do have...

    Future plans

    I'm tentatively planning a rewrite as a pure Python application. My hope is to implement a simple HTTP proxy and filter as Python libraries, with a config file also in Python; that way I could use dicts and lists for per-site settings.
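As a rough illustration of what a Python config file with dicts and lists for per-site settings could look like, here is a sketch. Nothing here comes from the planned rewrite: `DEFAULT_ALLOWED`, `SITE_RULES`, the `allow`/`deny` keys and `allowed_categories_for` are hypothetical names, and the hosts are placeholders.

```python
# Hypothetical config module; every name and value here is illustrative.

# Categories allowed on sites that have no rule of their own.
DEFAULT_ALLOWED = ["required", "text", "links"]

# Per-site overrides, keyed by domain. A rule for "example.org" is assumed
# to cover its subdomains as well.
SITE_RULES = {
    "example.org": {"allow": ["images"]},
    "example.com": {"deny": ["links"]},
}


def allowed_categories_for(host):
    """Return the set of allowed categories for a Host header value."""
    allowed = set(DEFAULT_ALLOWED)
    # Drop any port and normalise case (IPv6 literals are not handled here).
    labels = host.lower().split(":")[0].split(".")
    # Try the full host first, then each parent domain; first match wins.
    for i in range(len(labels)):
        rule = SITE_RULES.get(".".join(labels[i:]))
        if rule is not None:
            allowed |= set(rule.get("allow", []))
            allowed -= set(rule.get("deny", []))
            break
    return allowed
```

Because the config is plain Python, the proxy could simply import it and look up the rule for each request, with no separate parser for the config format.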

    Ultimate goal

    To create a Proxomitron-like filtering proxy that is
    - versatile
    - multiplatform
    - open source
    - easily configured (via text configuration files)

    That's a bit long-term, but I don't think it's out of reach; AFAIK everything necessary is already in the default CPython install. I will continue working on it in my spare time, such as it is.

    (Don't hold your breath though. I'll be tied up for most of the next two weeks, and after that who knows. Still, I hope I can do something about the lack of software in this niche.)
     
    Last edited: Jan 18, 2014
  2. Gullible Jones

    Okay, heads-up: I have switched scrubber from tag whitelisting to tag blacklisting. The script is much more maintainable (and much less painful to use) this way.

    Note that you now specify the content you want to block, NOT the content you want to allow.

    Also note that, while the per-tag approach is now blacklisting, scrubber still blocks the prohibited elements on all HTTP sites (unless they're obfuscated enough to somehow slip past it).
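For comparison, a blacklist filter can pass the original markup through untouched and drop only the listed elements along with their contents. The following is a sketch under that assumption, not the actual scrubber code; `BlacklistFilter`, `strip_blocked` and `VOID_TAGS` are names invented for this example.

```python
from html.parser import HTMLParser

# Elements that never have a closing tag, so blocking one must not
# open a "skip everything until the end tag" region.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img",
             "input", "link", "meta", "source", "track", "wbr"}


class BlacklistFilter(HTMLParser):
    """Copy markup through unchanged, minus blocked elements and their contents."""

    def __init__(self, blocked_tags):
        # convert_charrefs=False so entities are passed through verbatim.
        super().__init__(convert_charrefs=False)
        self.blocked = set(blocked_tags)
        self.depth = 0  # > 0 while inside a blocked element
        self.out = []

    def _emit(self, text):
        if not self.depth:
            self.out.append(text)

    def handle_starttag(self, tag, attrs):
        if tag in self.blocked:
            if tag not in VOID_TAGS:
                self.depth += 1
        else:
            self._emit(self.get_starttag_text())

    def handle_startendtag(self, tag, attrs):
        if tag not in self.blocked:
            self._emit(self.get_starttag_text())

    def handle_endtag(self, tag):
        if tag in self.blocked:
            if tag not in VOID_TAGS and self.depth:
                self.depth -= 1
        else:
            self._emit("</" + tag + ">")

    def handle_data(self, data):
        self._emit(data)

    def handle_entityref(self, name):
        self._emit("&" + name + ";")

    def handle_charref(self, name):
        self._emit("&#" + name + ";")

    def handle_comment(self, data):
        self._emit("<!--" + data + "-->")

    def handle_decl(self, decl):
        self._emit("<!" + decl + ">")


def strip_blocked(html_text, blocked_tags):
    """Return html_text with every blocked element removed."""
    parser = BlacklistFilter(blocked_tags)
    parser.feed(html_text)
    parser.close()
    return "".join(parser.out)
```

A parser-based filter like this only catches tags that are present in the HTML as delivered. Elements that are built by script at runtime, or markup malformed enough to confuse the parser, can still slip past, which matches the obfuscation caveat above.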
     