A script for updating your hosts file

Discussion in 'all things UNIX' started by tlu, Mar 21, 2013.

Thread Status:
Not open for further replies.
  1. tlu

    tlu Guest

    Yesterday I stumbled upon a nice and relatively simple script on kubuntuforums.net which downloads several well-known hosts files, merges them and adds them to your system hosts file. This is my version:

    Code:
    #!/bin/bash
    
    # If this is our first run, save a copy of the system's original hosts file and set to read-only for safety
    if [ ! -f ~/hosts-system ]
    then
     echo "Saving copy of system's original hosts file..."
     cp /etc/hosts ~/hosts-system
     chmod 444 ~/hosts-system
    fi
    
    # Perform work in temporary files
    temphosts1=`mktemp`
    temphosts2=`mktemp`
    
    # Obtain various hosts files and merge into one
    echo "Downloading ad-blocking hosts files..."
    wget -nv -O - http://winhelp2002.mvps.org/hosts.txt >> $temphosts1
    wget -nv -O - http://hosts-file.net/download/hosts.txt >> $temphosts1
    wget -nv -O - http://someonewhocares.org/hosts/hosts >> $temphosts1
    wget -nv -O - "http://pgl.yoyo.org/as/serverlist.php?hostformat=hosts&showintro=1&mimetype=plaintext" >> $temphosts1
    
    # Do some work on the file:
    # 1. Remove MS-DOS carriage returns
    # 2. Delete all lines that don't begin with 127.0.0.1
    # 3. Delete any lines containing the word localhost because we'll obtain that from the original hosts file
    # 4. Replace 127.0.0.1 with 0.0.0.0 because then we don't have to wait for the resolver to fail
    # 5. Scrunch extraneous spaces separating address from name into a single tab
    # 6. Delete any comments on lines
    # 7. Clean up leftover trailing blanks
    # Pass all this through sort with the unique flag to remove duplicates and save the result
    echo "Parsing, cleaning, de-duplicating, sorting..."
    sed -e 's/\r//' -e '/^127\.0\.0\.1/!d' -e '/localhost/d' -e 's/^127\.0\.0\.1/0.0.0.0/' -e 's/ \+/\t/' -e 's/#.*$//' -e 's/[ \t]*$//' < "$temphosts1" | sort -u > "$temphosts2"
    
    
    # Combine system hosts with adblocks
    echo Merging with original system hosts...
    echo -e "\n# Ad blocking hosts generated "`date` | cat ~/hosts-system - $temphosts2 > ~/hosts-block
    
    # Clean up temp files and remind user to copy new file
    echo "Cleaning up..."
    rm $temphosts1 $temphosts2
    echo "Done."
    echo
    echo "Copying ad-blocking hosts file to /etc/hosts (requires root)..."
    cp ~/hosts-block /etc/hosts
    echo
    echo "You can always restore your original hosts file with this command:"
    echo " sudo cp ~/hosts-system /etc/hosts"
    echo "so don't delete that file! (It's saved read-only for your protection.)"
    
    127.0.0.1 is replaced with 0.0.0.0 in $temphosts2 because this seems to speed things up considerably. I saved the script as gethosts in /root, made it executable and ran

    Code:
    ln -s /root/gethosts /etc/cron.daily
    in order to update it daily. You might prefer another schedule, though.

    Seems to work very well.
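
    For anyone who wants to check the cleaning step before touching /etc/hosts, here is a minimal sketch that wraps the same sequence of sed steps (with the dots in the address escaped) in a function and runs it on a few sample lines. The function name clean_hosts and the sample domain are made up for illustration, and it assumes GNU sed (for \r, \t and \+).

```shell
#!/bin/bash
# Hypothetical helper (not part of the script above): applies the
# cleaning steps to stdin and prints the sorted, de-duplicated result.
clean_hosts() {
  sed -e 's/\r//' \
      -e '/^127\.0\.0\.1/!d' \
      -e '/localhost/d' \
      -e 's/^127\.0\.0\.1/0.0.0.0/' \
      -e 's/ \+/\t/' \
      -e 's/#.*$//' \
      -e 's/[ \t]*$//' | sort -u
}

# Sample input: a CRLF line with a trailing comment, a localhost line,
# a comment-only line and a duplicate entry.
printf '127.0.0.1  ads.example.com  # tracker\r\n127.0.0.1 localhost\n# a comment\n127.0.0.1 ads.example.com\n' | clean_hosts
```

    Only one line should come out of this: 0.0.0.0, a tab, then ads.example.com.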
     
    Last edited by a moderator: Mar 21, 2013
  2. m00nbl00d

    m00nbl00d Registered Member

    Joined: Jan 4, 2009
    Posts: 6,623
    You could also try to add 9 entries in each line, instead of 1 entry = 1 line.

    Example:

    0.0.0.0 domain1 domain2 domain3 domain4 domain5 domain6 domain7 domain8 domain9
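
    A minimal sketch of how that grouping could be done with awk on the cleaned list, assuming every input line has the form address, whitespace, name, and that all lines share the same address (0.0.0.0). The name group_hosts is hypothetical, and whether grouping actually makes lookups faster is not tested here.

```shell
#!/bin/bash
# Hypothetical helper: reads "address name" lines on stdin and prints
# them grouped as "address name1 name2 ...", with $1 names per line
# (default 9, as suggested above).
group_hosts() {
  awk -v n="${1:-9}" '
    # Start a new output line with the address, or extend the current one.
    { line = (count == 0 ? $1 : line) " " $2; count++ }
    # Flush once the line holds n names.
    count == n { print line; count = 0 }
    # Flush whatever is left over at the end.
    END { if (count > 0) print line }
  '
}

# Example with 2 names per line so the effect is easy to see.
printf '0.0.0.0\tdomain1\n0.0.0.0\tdomain2\n0.0.0.0\tdomain3\n' | group_hosts 2
```

    In the script, this would sit between sort -u and the redirect to the second temp file.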
     
  3. tlu

    tlu Guest

    Yes, but why? What benefits would that have? Besides, it would make the script more complicated, and I'm not a scripting expert at all.
     
  4. m00nbl00d

    m00nbl00d Registered Member

    Joined: Jan 4, 2009
    Posts: 6,623
    Sorry for the late reply, but I really forgot about this thread.

    It should have two benefits, one or both depending on the size of the hosts file: it would reduce both the size of the file and the time it takes to read through it. :)
     
  5. tlu

    tlu Guest

    Here's a slightly updated version of the script. It now pulls from 8 hosts files and produces a system hosts file with nearly 240,000 entries.

    Code:
    #!/bin/bash
    
    # If this is our first run, save a copy of the system's original hosts file and set to read-only for safety
    if [ ! -f ~/hosts-system ]
    then
     echo "Saving copy of system's original hosts file..."
     cp /etc/hosts ~/hosts-system
     chmod 444 ~/hosts-system
    fi
    
    # Perform work in temporary files
    temphosts1=`mktemp`
    temphosts2=`mktemp`
    
    # Obtain various hosts files and merge into one
    echo "Downloading ad-blocking hosts files..."
    wget -nv -O - http://winhelp2002.mvps.org/hosts.txt >> $temphosts1
    wget -nv -O - http://hosts-file.net/download/hosts.txt >> $temphosts1
    wget -nv -O - http://someonewhocares.org/hosts/hosts >> $temphosts1
    wget -nv -O - "http://pgl.yoyo.org/as/serverlist.php?hostformat=hosts&showintro=1&mimetype=plaintext" >> $temphosts1
    wget -nv -O - "https://spyeyetracker.abuse.ch/blocklist.php?download=hostfile" >> $temphosts1
    wget -nv -O - "https://zeustracker.abuse.ch/blocklist.php?download=hostfile" >> $temphosts1
    wget -nv -O - "http://www.malware.com.br/cgi/submit?action=list_hosts_win_127001" >> $temphosts1
    wget -nv -O - http://www.malwaredomainlist.com/hostslist/hosts.txt >> $temphosts1
    
    # Do some work on the file:
    # 1. Remove MS-DOS carriage returns
    # 2. Delete all lines that don't begin with 127.0.0.1
    # 3. Delete any lines containing the word localhost because we'll obtain that from the original hosts file
    # 4. Delete any lines containing the word dropbox.com.
    # 5. Replace 127.0.0.1 with 0.0.0.0 because then we don't have to wait for the resolver to fail
    # 6. Scrunch extraneous spaces separating address from name into a single tab
    # 7. Delete any comments on lines
    # 8. Clean up leftover trailing blanks
    # Pass all this through sort with the unique flag to remove duplicates and save the result
    echo "Parsing, cleaning, de-duplicating, sorting..."
    sed -e 's/\r//' -e '/^127\.0\.0\.1/!d' -e '/localhost/d' -e '/dropbox\.com/d' -e 's/^127\.0\.0\.1/0.0.0.0/' -e 's/ \+/\t/' -e 's/#.*$//' -e 's/[ \t]*$//' < "$temphosts1" | sort -u > "$temphosts2"
    
    
    # Combine system hosts with adblocks
    echo Merging with original system hosts...
    echo -e "\n# Ad blocking hosts generated "`date` | cat ~/hosts-system - $temphosts2 > ~/hosts-block
    
    # Clean up temp files and remind user to copy new file
    echo "Cleaning up..."
    rm $temphosts1 $temphosts2
    echo "Done."
    echo
    echo "Copying ad-blocking hosts file to /etc/hosts (requires root)..."
    cp ~/hosts-block /etc/hosts
    echo
    echo "You can always restore your original hosts file with this command:"
    echo " sudo cp ~/hosts-system /etc/hosts"
    echo "so don't delete that file! (It's saved read-only for your protection.)"
    

    For those of you using dnsmasq I've attached a file with 838 domains of adservers and trackers. Just copy those entries to your /etc/dnsmasq.conf file and restart dnsmasq with

    sudo service dnsmasq restart
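
    The attachment is not reproduced here. As a rough sketch of how entries like that could be generated from a cleaned hosts list, the function below turns "address name" lines into dnsmasq address= lines. It assumes the address=/domain/ip form that dnsmasq accepts, which also covers subdomains of the given domain. The name hosts_to_dnsmasq and the sample domains are hypothetical, and the format of the attached file may differ.

```shell
#!/bin/bash
# Hypothetical helper: reads "address name" lines on stdin and prints
# one dnsmasq "address=/name/address" line per entry.
hosts_to_dnsmasq() {
  awk 'NF >= 2 { print "address=/" $2 "/" $1 }'
}

# Example: two blocked names become two dnsmasq config lines.
printf '0.0.0.0\tads.example.com\n0.0.0.0\ttracker.example.net\n' | hosts_to_dnsmasq
```

    The output can be appended to /etc/dnsmasq.conf before restarting dnsmasq.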

    EDIT: New version of the attachment uploaded because of a few errors.
     

    Last edited by a moderator: Aug 31, 2013
  6. tlu

    tlu Guest

  7. tlu

    tlu Guest

    Just in case someone is interested: I've updated the gethosts script considerably.

    There are now more than 565,000 entries in the hosts file. But above all, bandwidth use is dramatically reduced: the script now downloads several hosts files as zip and 7z archives and compares the timestamps of the remote and local files, so that only newer versions are downloaded again. That's why it's a lot faster now.
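
    The updated script is not shown in this thread, so the following is only a sketch of one way to get that behaviour, not the actual gethosts code. It assumes wget's -N (timestamping) option, which skips a download when the local copy is not older than the remote file, and assumes unzip and 7z are installed for the archives. The function names, cache directory and URL are hypothetical.

```shell
#!/bin/bash
# Hypothetical helper: download $2 into directory $1, but only if the
# remote file is newer than the copy already stored there (wget -N).
fetch_if_newer() {
  local cache_dir=$1 url=$2
  mkdir -p "$cache_dir"
  wget -nv -N -P "$cache_dir" "$url"
}

# Hypothetical helper: print the contents of a downloaded file to
# stdout, unpacking zip and 7z archives on the way.
unpack_to_stdout() {
  case $1 in
    *.zip) unzip -p "$1" ;;
    *.7z)  7z e -so "$1" ;;
    *)     cat "$1" ;;
  esac
}

# Usage sketch (placeholder URL, left commented out):
# fetch_if_newer /var/cache/gethosts "http://example.invalid/hosts.zip"
# unpack_to_stdout /var/cache/gethosts/hosts.zip >> "$temphosts1"
```

    Note that -N relies on keeping the downloaded files between runs, so they have to live in a persistent directory rather than in mktemp files.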
     