Which Website Downloader?

Discussion in 'other software & services' started by pvsurfer, Feb 23, 2006.

Thread Status:
Not open for further replies.
  1. pvsurfer

    pvsurfer Registered Member

    Joined:
    Sep 1, 2004
    Posts:
    1,400
    Location:
    California - USA
    I'm looking to get a website downloader (and offline browser).

    I'm aware of the likes of Netgrabber, Offline Explorer, Teleport, etc. but don't know much about them other than their own boilerplate.

    I've also come across a freeware product, HTTrack, but if the 'payware' products are better, I wouldn't mind buying one.

    Any advice or feedback would be appreciated.
     
  2. dog

    dog Guest

    I'd suggest you forget those tools altogether... they do more harm than good. The resource spikes they cause sites are tremendous as they pull page after page, multiple pages at a time, aggressively downloading information - the bandwidth and server load is very taxing ... they'll only serve to cripple servers and raise costs, which means more ads etc. as sites try to gain more revenue to cover the increased costs, and maybe ultimately sites/domains going subscription based.

    Why don't you try an RSS reader instead and then view the pages you fancy from the descriptions.

    Steve
     
  3. pvsurfer

    pvsurfer Registered Member

    Joined:
    Sep 1, 2004
    Posts:
    1,400
    Location:
    California - USA
    Steve~

    I didn't realize that they put that kind of load on a website! :eek:

    Do all of them do that to about the same extent (don't they have settings to prevent that)?

    ~pv
     
  4. NGRhodes

    NGRhodes Registered Member

    Joined:
    Jun 23, 2003
    Posts:
    2,331
    Location:
    West Yorkshire, UK
    I use WinHTTrack. I've never explored the commercial products, but this is the only one I found that worked flawlessly and had good configuration options.

    http://www.httrack.com/

    Steve, I agree, but sometimes we need to grab offline copies of sites.
     
  5. dog

    dog Guest

    Hi pv, :)

    Unfortunately ... they all have the same effect. :doubt: Some of the tools mentioned can be throttled down to some extent, but even then they cause unnecessary load. They work much, much faster than a real person: a person reads as they go, and a person also differentiates between what does and doesn't interest them. These tools do not; they just instantly download everything, following every link. They're also designed to pull using multiple connections simultaneously (to reduce the time needed to accomplish their goal). Truthfully I don't see the merit of these tools for offline browsing. Yes, they could be useful if someone wanted/needed to mirror a site that was experiencing difficulties, but not much beyond that.

    You also have to remember search engine spiders/crawlers use similar technology, and there are hundreds of thousands of those indexing every piece of content available. Even though they are scripted to index things at a relatively reasonable rate, they too can cause heavy loads, especially when multiple spiders (10-100) from the same search engine are all indexing one site ... not to mention the spiders/crawlers of other search engines indexing at the same time. Search engine spiders really are a catch-22 for site operators: yes, they are wanted for their indexing/search results ... but the kind of bunch-ups mentioned are very taxing. If you add several individuals doing the same thing, you can really begin to imagine the damage it could inflict, as well as the cause and effect I mentioned in my previous post.

    I hope anyone using this type of technology for personal use, really reconsiders doing so. It'll only serve to harm us all in the end. :'(

    HTH,

    Steve
     
  6. NGRhodes

    NGRhodes Registered Member

    Joined:
    Jun 23, 2003
    Posts:
    2,331
    Location:
    West Yorkshire, UK
    dog, yes, and a lot of these crawlers go against the HTTP/1.1 standard, which says a client should not keep more than 2 connections open to a server.

    The best thing you can do is make sure that you only use 1 or 2 connections to a website, to prevent excessive strain on the servers.
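As a rough illustration of the "1 or 2 connections" advice, here is a minimal Python sketch that fetches pages strictly one at a time and waits between requests. The function name `fetch_politely`, the `min_delay` parameter, and the 2-second default are assumptions made for this example; they are not settings taken from HTTrack or any other tool mentioned in this thread.

```python
import time
from typing import Callable, Iterable, List


def fetch_politely(
    urls: Iterable[str],
    fetch: Callable[[str], bytes],
    min_delay: float = 2.0,  # assumed value; pick whatever the site can tolerate
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> List[bytes]:
    """Fetch URLs sequentially (one connection at a time), leaving at
    least min_delay seconds between the start of consecutive requests."""
    pages: List[bytes] = []
    last_start = None
    for url in urls:
        if last_start is not None:
            # Wait out whatever part of the delay has not already elapsed.
            wait = min_delay - (clock() - last_start)
            if wait > 0:
                sleep(wait)
        last_start = clock()
        pages.append(fetch(url))  # blocking call, so requests never overlap
    return pages
```

The `fetch` argument is whatever download function you prefer, for example `lambda u: urllib.request.urlopen(u).read()` from the standard library. Because each call blocks until the page has arrived, the loop never has more than one request in flight.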

    Good point about dynamic, searchable sites, I always stop crawlers from using any search facilities (make sure they have access to an index/sitemap instead).
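One common way for a site to keep crawlers out of its search pages is a robots.txt rule, and a well-behaved downloader can check it before fetching. The sketch below uses Python's standard `urllib.robotparser`; the robots.txt contents, the `/search` path, the `example.com` URLs, and the `MyMirrorBot` user agent are all made up for illustration and are not taken from any site discussed here.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt a site operator might publish to keep
# crawlers away from the search facility.
ROBOTS_TXT = """\
User-agent: *
Disallow: /search
Crawl-delay: 10
"""


def build_rules(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text into a rule set without touching the network."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


rules = build_rules(ROBOTS_TXT)

# Ask before fetching: search URLs are refused, ordinary pages are allowed.
search_allowed = rules.can_fetch("MyMirrorBot", "http://example.com/search?q=httrack")
index_allowed = rules.can_fetch("MyMirrorBot", "http://example.com/index.html")

# Crawl-delay is a non-standard directive, but robotparser exposes it when present.
delay_seconds = rules.crawl_delay("MyMirrorBot")
```

A downloader would skip any URL where `can_fetch` returns `False`, and could use `delay_seconds` as its pause between requests when the site provides one.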
     