__/ [ SteveN ] on Wednesday 22 February 2006 10:35 \__
> Roy Schestowitz <newsgroups@xxxxxxxxxxxxxxx> wrote in
> news:dthbe5$k6v$4@xxxxxxxxxxxxxxxxx:
>>> Are there any other tools that I can use to mirror a website without
>>> using wget (which doesn't seem to work).
>>
>> scp, ftp, among more. Are you *mirroring* a site or just *scraping*
>> it? Is it /your/ site?
>
> No, it's not my site, and it doesn't have ftp access. I want to copy
> images from it, and although I could browse it manually, and copy the
> images from my browser's cache, it is just so time-consuming.
It doesn't sound as though what you are trying to do is legitimate or
ethical.
>>> The site needs a username and password (which I have) and I copied
>>> the cookies from Firefox to the wget directory after properly logging
>>> in. Firefox, Opera, and for that matter Internet Explorer have no
>>> problems once I have logged in, but it seems wget is getting confused
>>> by some javascript nastiness that sends it off-site.
>>
>> Maybe grabbers are denied as a matter of principle. Maybe some
>> user-agent sniffing is involved, in which case you must spoof.
>
> Yes, I am spoofing, as Mozilla - most of the pages work, but when it comes
> to a 'protected' page, it seems to ignore the cookies I already have, and
> then tries to force me to log in again (I think!). I'll have to examine
> Ethereal to see if I can see anything.
Page authentication is not something which I know how to deal with, not
without exploring the intricacies of wget. It might not offer that facility
at all.
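That said, wget does ship cookie options; whether they survive this site's
JavaScript redirects is another matter. A rough sketch, assuming the Firefox
cookies were exported in the Netscape cookies.txt format that wget expects
(the path, URL, and user-agent string below are placeholders, not taken from
the site in question):

```shell
# Untested sketch: load browser cookies and present a browser-like user-agent.
# --load-cookies reads a Netscape-format cookies.txt file;
# --keep-session-cookies also carries session (non-persistent) cookies along.
wget --load-cookies=cookies.txt \
     --keep-session-cookies \
     --user-agent="Mozilla/5.0 (X11; U; Linux i686) Gecko/20060111 Firefox/1.5" \
     -r -l1 -np -A.jpg,.jpeg,.gif,.png \
     http://example.com/protected/
```

If the site only grants access after a form-based login within the same
session, you may have to perform the login with wget itself (e.g. via
--post-data against the login form) before recursing, but that depends
entirely on how the site is put together.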
>>> I *think* what I am asking is if there is an extension to Firefox
>>> which allows it to be used as a mirroring tool? Googling just seems
>>> to give me lots of 'mirrors of firefox' rather than what I am after.
>>
>>
>> There are mirroring tools for Web sites that are owned by the
>> 'mirrorer'. I syndicate Firefox plug-ins on a daily basis and I have
>> not come across such an extension.
>
> Oh, well ...
>
>> There are Google scrapers in the wild, so you might be able to re-use
>> them. They should be easy to identify on the Net.
>
> I don't think I need the scrapers yet.
>
>> Hope it helps,
>
> Thanks
wget -r -l3 -H -t3 -nd -N -np -A.jpg,.jpeg,.gif,.png,.bmp -erobots=off
-i list_of_sites.txt
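Flag by flag, that is (my reading of the options; see wget's manual for the
authoritative descriptions):

```shell
# -r            recurse into links
# -l3           ...but at most 3 levels deep
# -H            span hosts (follow links leading off the starting site)
# -t3           retry each file up to 3 times
# -nd           no directory hierarchy; put all files in the current directory
# -N            timestamping: re-fetch only files newer than the local copy
# -A...         accept only files with these image suffixes
# -np           never ascend to the parent directory
# -erobots=off  ignore robots.txt (use with care on sites you don't own)
# -i FILE       read the start URLs from FILE, one per line
wget -r -l3 -H -t3 -nd -N -np -A.jpg,.jpeg,.gif,.png,.bmp \
     -erobots=off -i list_of_sites.txt
```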
Hope it helps,
Roy
--
Roy S. Schestowitz | "I regularly SSH to God's brain and reboot"
http://Schestowitz.com | SuSE Linux | PGP-Key: 0x74572E8E
11:50am up 5 days 0:09, 8 users, load average: 1.06, 0.75, 0.59
http://iuron.com - Open Source knowledge engine project