
Looking for some picture downloaders

Posted: Sat Dec 17, 2005 6:17 pm
by Weresmilodon
Ok, I'm looking for some picture downloaders that can download pictures either from a page with multiple links, following all the links from that page, or from subfolders. It should also be able to determine whether a picture has been downloaded before, and if so, have an option not to download it again. (Sorting through hundreds of pictures again and again is not my idea of fun.)

Additionally, it should be able to check the page/subfolders at certain times to see if there is new material, preferably on some form of schedule, or with a 'check every * minutes' option. It should also keep roughly the same pace as a browser, so as not to overload the site.

Anyone have any ideas for one (or several)? None of the ones I have found can check on their own or determine whether there is new material. They just download everything when you tell them to, regardless of whether it has already been downloaded before, overwriting the old pictures.

Re: Looking for some picture downloaders

Posted: Sat Dec 17, 2005 7:33 pm
by Scuttle
Hm, httrack is good. Also, I use wget when mirroring sites; wget checks the timestamps when it retrieves files and only gets new ones.
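
For example, a call along these lines (just a sketch; the URL is a placeholder) only re-fetches files that have changed since the last run:

Code:

# --mirror turns on recursion plus timestamp checking (-r -N -l inf),
# so a second run only downloads files that are new or have changed
wget --mirror --accept=jpg,jpeg,gif,png --wait=1 http://example.com/pics/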

Re: Looking for some picture downloaders

Posted: Sat Dec 17, 2005 8:35 pm
by Weresmilodon
Thanks, I'll try looking them up.

Re: Looking for some picture downloaders

Posted: Sat Dec 17, 2005 9:31 pm
by Shadowhawk
Scuttle wrote:(...) I use wget when mirroring sites, wget checks the timestamps when it retrieves files and only gets new ones
If the files don't change and only new files are added, wget in recursive mode (or with page requisites), with a proper depth limit, type limit, directory limit, etc., can probably do what you want. If files do change, you can use time-stamping. I have even used wget successfully on sites with simple leech protection, such as checking the referer or the time between successive downloads.
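
As a rough sketch of such a call (untested, with a placeholder URL):

Code:

# limited recursion, stay below the starting directory, images only,
# and never re-download a file that is already on disk
wget --recursive --level=2 --no-parent \
     --accept=jpg,jpeg,gif,png \
     --no-clobber \
     --directory-prefix=downloads \
     http://example.com/board/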

You might also (if you use MS Windows) try Nici (not tested; it's trial shareware).

Re: Looking for some picture downloaders

Posted: Sat Dec 17, 2005 10:16 pm
by afrigeek
I use sitecopy and it works beautifully for me with regard to all the aforementioned features.

Re: Looking for some picture downloaders

Posted: Sat Dec 17, 2005 11:10 pm
by Weresmilodon
I tried several of them, but I'm having trouble. Perhaps I should explain and see if there's been a misunderstanding, or if some of you can help me get it right.

Let's pull up 4chan as an example. Say I wanted all the pictures in the /wg (Wallpapers/General) directory. The link there is http://cgi.4chan.org/wg/imgboard.html . Now, I want the program to get all the pictures in /wg, keep checking whether there are new ones, and download those as well. This causes several problems for me; the biggest one is actually getting all the pictures. It needs to go into the threads, check all the replies and all the pages, and get everything that way. At the same time, I don't want it to download everything again and again just because I deleted the junk. That's why I mentioned checking the subfolders. If it monitors http://cgi.4chan.org/wg/ and downloads the files that are added, but doesn't touch the files already downloaded, then that's all I need.

It should also be able to monitor several addresses at the same time and, if possible, download them into different folders.

Now, I tried wget, got wgetGUI, and worked on that for some time, but I couldn't get anything. And I mean nothing with it. I couldn't even connect to anything. Then I tried Nici. It seems closer to what I want, but I can't really get anything with that one either. It seems like I have no real control over what it gets, and if I want something, I have to monitor it all the time and add everything it should get manually. At that point it's easier to just go to the page and use Linky to open all the images in tabs, then Magpie to save them all to a folder. That wastes a lot of time, which is why I'm looking for this kind of program in the first place, so I don't have to do that.

Anyway, I appreciate any help you can give me, and thanks for the help that’s already been given.

Re: Looking for some picture downloaders

Posted: Sat Dec 17, 2005 11:17 pm
by Wiz
I think that Quadsucker Web from Scott Baker may be what you need.
http://www.quadsucker.com/quadweb/
It is multithreaded and rejects duplicates.

Re: Looking for some picture downloaders

Posted: Sat Dec 17, 2005 11:37 pm
by Weresmilodon
Ok, I'm trying Quadsucker and httrack right now. Both seem good, but neither seems to have a schedule. Do they automatically keep checking, or am I required to start them every time to check for new content?

Re: Looking for some picture downloaders

Posted: Sun Dec 18, 2005 12:24 am
by Shadowhawk
Weresmilodon wrote:Now, I tried wget, got wgetGUI, and worked on that for some time, but I couldn't get anything. And I mean nothing with it. I couldn't even connect to anything.
Well, that is because this site (http://cgi.4chan.org/wg/imgboard.html) has some protection against leechers, a.k.a. web robots. I haven't tried to download the images from the page (using either the '<tt>--page-requisites</tt>' option, or limited recursive downloading with the '<tt>--recursive --level=1 --accept="html,php,gif,jpeg,jpg"</tt>' options, of course together with '<tt>--timestamping</tt>' (download a file only if there is a newer version) and/or '<tt>--continue</tt>' (resume partial downloads instead of starting over)), but the main page itself can be downloaded with wget... by pretending to be a web browser, e.g. with '<tt>--user-agent="Mozilla/5.0"</tt>'. To check the site every so often, use a command scheduler; for example, on Linux put an appropriate script that calls wget in cron.

It is possible that you might also have to set the referer to the main page in order to download the images from it.
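
Putting the pieces together, something along these lines might work (untested against that particular site; the output folder and script path are just examples):

Code:

#!/bin/sh
# pretend to be a browser and send a referer, since the site seems to
# block obvious robots; go one level deep, html and images only, and
# skip files that have already been downloaded
wget --recursive --level=1 \
     --accept="html,php,gif,jpeg,jpg" \
     --user-agent="Mozilla/5.0" \
     --referer="http://cgi.4chan.org/wg/imgboard.html" \
     --no-clobber --wait=1 \
     --directory-prefix=4chan-wg \
     http://cgi.4chan.org/wg/imgboard.html

and a cron entry (crontab -e) to run it at the top of every hour could be:

Code:

0 * * * * /home/user/bin/grab-wg.sh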

Re: Looking for some picture downloaders

Posted: Sun Dec 18, 2005 12:40 am
by Weresmilodon
Yeah, well, there's a reason for me getting wgetGUI. I can't really work with command lines, so I don't really have any idea what you said there. I did try the 'pretend to be a browser' option, and also the 'ignore robots.txt' option, but neither one helped. I didn't see any way to set a referer in the GUI either, so I have no idea how to do that.

Moreover, is it possible for wget to get everything and keep checking for new stuff? That's the function I'm really looking for, because a few of the places I want to monitor delete everything about 1-2 times per day. I don't have a chance of getting the pictures myself, so I really need some way for the program to manage itself.

Re: Looking for some picture downloaders

Posted: Sun Dec 18, 2005 12:50 am
by Wiz
I read through the user manual (http://www.quadsucker.com/quadweb/quadweb.html) and it looks like Quadsucker will not automatically check for new content. You may be able to use a tool like the Windows Task Scheduler, but I have never tried it.

Re: Looking for some picture downloaders

Posted: Sun Dec 18, 2005 12:56 am
by Weresmilodon
Yeah, I came to the same conclusion once it had run once. I doubt the Task Scheduler will work, as you have to press a button to start it all, provided it even saves the settings. I couldn't find any command-line switches for it either, so it's pretty much a bust. I read something about a more advanced version (Supersucker or some such) that I will check into next. It might be the same thing, except bigger, but it's worth a try.

Edit: Never mind. Ultrasucker is the same, except it shows no thumbnails and opens 12 connections rather than 4.

Re: Looking for some picture downloaders

Posted: Sun Dec 18, 2005 1:53 am
by Shadowhawk
Weresmilodon wrote:Yeah, well, there's a reason for me getting wgetGUI. I can't really work with command lines, so I don't really have any idea what you said there. I did try the 'pretend to be a browser' option, and also the 'ignore robots.txt' option, but neither one helped. I didn't see any way to set a referer in the GUI either, so I have no idea how to do that.

Moreover, is it possible for wget to get everything and keep checking for new stuff? That's the function I'm really looking for, because a few of the places I want to monitor delete everything about 1-2 times per day. I don't have a chance of getting the pictures myself, so I really need some way for the program to manage itself.
First, you can put any command-line option in the Additional Parameters field in wgetGUI (or use the 'I am a pro' button to modify the parameters passed to wget). Second, I don't know why the 'Identify as browser' option doesn't work for you... Do you have a problem downloading the main page, or only the links (images)?

I haven't used wget for mirroring, so the answers might be wrong, but I think that between Timestamping (download only if the file is newer or does not exist), No clobber, and Continue file download, you should get what you want.
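
For example (again, not tested by me, just a guess at sensible values), the Additional Parameters field could contain something like:

Code:

--recursive --level=1 --accept=gif,jpeg,jpg,png --no-clobber --user-agent="Mozilla/5.0" --wait=1

One thing to keep in mind: wget will not accept Timestamping and No clobber at the same time, so pick whichever fits better (No clobber never touches files that already exist; Timestamping re-fetches a file only when the server copy is newer).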

Re: Looking for some picture downloaders

Posted: Sun Dec 18, 2005 1:59 am
by Weresmilodon
Shadowhawk wrote:I haven't used wget for mirroring, so the answers might be wrong, but I think that between Timestamping (download only if the file is newer or does not exist), No clobber, and Continue file download, you should get what you want.
Except that I have to start it myself. And that's the problem. I need something that checks for new stuff at least every hour, without me having to tell it to.

Re: Looking for some picture downloaders

Posted: Sun Dec 18, 2005 2:24 am
by Shadowhawk
Weresmilodon wrote:Except that I have to start it myself. And that's the problem. I need something that checks for new stuff at least every hour, without me having to tell it to.
Can't you use the *.bat file generated by wgetGUI in Task Scheduler?

Or, equivalently, make a batch file/script/program which calls wget (or the batch file generated by wgetGUI) every hour, e.g. in the bash shell:

Code:

while [ "1" = "1" ]; do
   wget [parameters]
   sleep 3600   # 3600 seconds = 1 hour
done
There is probably an equivalent in the MS Windows DOS shell/batch file syntax... if there is an equivalent of the <tt>sleep</tt> command (<tt>delay</tt>, perhaps?).
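
A rough Windows equivalent (untested, and the path to the generated batch file is just an example) could use the old ping trick in place of <tt>sleep</tt>:

Code:

@echo off
:loop
rem call the batch file that wgetGUI generated (example path)
call C:\wget\get-images.bat
rem there is no built-in sleep on older Windows; ~3600 pings of
rem localhost take roughly an hour (about one ping per second)
ping -n 3600 127.0.0.1 > nul
goto loop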