Removing HOSTS File Duplicate Entries

Discussion in 'privacy general' started by subhrobhandari, Nov 6, 2009.

Thread Status:
Not open for further replies.
  1. subhrobhandari

    subhrobhandari Registered Member

    Joined:
    Nov 6, 2009
    Posts:
    708
    How can duplicate entries in a HOSTS file be removed? ;) It's really impossible to manually edit a 30 MB text file.
     
  2. funkydude

    funkydude Registered Member

    Joined:
    Apr 5, 2004
    Posts:
    6,852
  3. subhrobhandari

    subhrobhandari Registered Member

    Joined:
    Nov 6, 2009
    Posts:
    708
    I have already tested it, but it crashes every time I try to find duplicate entries, though it does remove comments successfully. :(
     
  4. Keyboard_Commando

    Keyboard_Commando Registered Member

    Joined:
    Mar 6, 2009
    Posts:
    690
    30 MB HOSTS file, yikes!

    Just out of interest ... do you have any performance slowdowns with a HOSTS file that large? And how did you get to have one that size? You must be subscribed to a lot of blocking definitions. :eek:
     
  5. subhrobhandari

    subhrobhandari Registered Member

    Joined:
    Nov 6, 2009
    Posts:
    708
    Yes, I am subscribed to 8 or more lists, and among those, the one list that filters adult sites is around 20 MB on its own. :p There is no noticeable slowdown. The file did shrink to 28.4 MB after the comments were removed. The only problem I face is that it takes around 3-4 minutes to merge an additional list, but that's expected.
     
  6. vroom23

    vroom23 Registered Member

    Joined:
    Nov 28, 2009
    Posts:
    1
    I use the Boxer text editor to change everything to lowercase, remove trailing spaces, and then remove duplicate lines under Edit > Delete > Duplicate Lines. I have around 850k entries.
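
For anyone who would rather script those steps than rely on an editor, here is a minimal Python sketch of the same idea: lowercase each entry, drop trailing spaces, and keep only the first copy of each line. The function names `dedupe_hosts` and `dedupe_file` are made up for illustration, and the handling of comments and extra whitespace is an assumption about what a typical merged HOSTS file looks like, not something taken from any of the tools mentioned in this thread.

```python
def dedupe_hosts(lines):
    """Return HOSTS lines with duplicate entries removed (first copy kept).

    Hypothetical helper: entries are lowercased, inline comments are
    dropped, and runs of whitespace are collapsed before comparing.
    """
    seen = set()
    kept = []
    for raw in lines:
        line = raw.rstrip()
        # Blank lines and full-line comments pass through untouched.
        if not line or line.lstrip().startswith("#"):
            kept.append(line)
            continue
        # Drop any inline comment, then normalize case and spacing so that
        # "127.0.0.1  Example.com" and "127.0.0.1 example.com" compare equal.
        entry = line.split("#", 1)[0]
        key = " ".join(entry.lower().split())
        if key not in seen:
            seen.add(key)
            kept.append(key)
    return kept


def dedupe_file(src_path, dst_path):
    """Read a HOSTS file, deduplicate it, and write the result elsewhere."""
    with open(src_path, encoding="utf-8", errors="replace") as src:
        cleaned = dedupe_hosts(src)
    with open(dst_path, "w", encoding="utf-8") as dst:
        dst.write("\n".join(cleaned) + "\n")
```

A set of already-seen entries keeps the original order, unlike a sort-based approach, and a file of a few tens of MB should fit in memory on most machines. Writing to a separate output path rather than overwriting the live HOSTS file leaves room to check the result first.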
     
  7. subhrobhandari

    subhrobhandari Registered Member

    Joined:
    Nov 6, 2009
    Posts:
    708
    I have now removed the 20 MB adult-sites subscription, so the HOSTS file is around 6 MB and HostsMan can remove the dupes.
     
  8. siljaline

    siljaline Former Poster

    Joined:
    Jun 29, 2003
    Posts:
    6,619
  9. inka

    inka Registered Member

    Joined:
    Oct 21, 2009
    Posts:
    406
    TextPad32 handles files of unlimited size.
    Tools -> Sort (and checkmark 'remove duplicates')
    www.textpad.com

    How many total lines (hostnames) does your HOSTS file currently contain?

    Do yourself a favor.
    Come to grips with the reality that you can't win by attempting to (collect and) block by hostname.
    -=-
    After you block smuttylefthandedpilgrims. com
    and
    www .smuttylefthandedpilgrims. com
    next week you'll be right back, chasing your tail and adding
    ns1.smuttylefthandedpilgrims. com
    and
    ns2.smuttylefthandedpilgrims. com
    and
    download.smuttylefthandedpilgrims. com
    ad nauseam

    and nowadays yer chasing skeeters, playing Bop-the-Gopher, across 100+ country code TLDs
    smuttylefthandedpilgrims. com.my
    and
    www .smuttylefthandedpilgrims. com.my
    and
    smuttylefthandedpilgrims. com.kr
    and
    www .smuttylefthandedpilgrims. com.kr

    Instead, you can block based on the (one) pattern:
    smuttylefthandedpilgrims
    and be done with it using the freeware DNSKong from
    http://pyrenean.com

    FWIW, I use an older version of DNSKong.exe which has a GUI; its file size is 260 KB.
    The author ditched the GUI for the current version; dnskong.exe is now about 28 KB
    (tiny. no frills. terse documentation.)
    My current 'named.txt' (blocklist) contains 176,000+ patterns & occupies 2.8 MB

    I've tested DNSKong using a blocklist containing 500K+ lines. No problem, no slowdown.
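
To illustrate why one pattern can stand in for many hostname entries, here is a minimal Python sketch of plain substring matching. It is only an assumption about the general idea: the thread does not describe DNSKong's actual matching rules or the exact format of 'named.txt', and the names `load_patterns` and `is_blocked` are hypothetical.

```python
def load_patterns(lines):
    """Build a pattern list, one pattern per line (assumed format).

    Blank lines and lines starting with '#' are skipped.
    """
    patterns = []
    for raw in lines:
        text = raw.strip().lower()
        if text and not text.startswith("#"):
            patterns.append(text)
    return patterns


def is_blocked(hostname, patterns):
    """Return True if any pattern occurs anywhere in the hostname."""
    host = hostname.lower().rstrip(".")
    return any(pattern in host for pattern in patterns)


patterns = load_patterns(["smuttylefthandedpilgrims\n"])
for name in (
    "ns1.smuttylefthandedpilgrims.com",
    "www.smuttylefthandedpilgrims.com.kr",
    "example.org",
):
    print(name, is_blocked(name, patterns))
```

This sketch scans every pattern on each lookup, which is fine for a demonstration but would likely need a smarter index for a list with hundreds of thousands of patterns. Substring matching can also catch unrelated hostnames that happen to contain a short pattern, so patterns need to be chosen with care.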
     
  10. siljaline

    siljaline Former Poster

    Joined:
    Jun 29, 2003
    Posts:
    6,619
    In English, please ? o_O
     
  11. inka

    inka Registered Member

    Joined:
    Oct 21, 2009
    Posts:
    406
    First part of my post explains "how to" remove duplicate entries
    by loading the giant HOSTS file into TextPad32 and using its Sort command.
    The balance of the post presents an argument for using a blocklist based on patterns (vs. full hostnames) and recommends an app (DNSKong) that enables you to do so.
     