
Re: [News] Linux Reciprocity is a Major Merit

Roy Schestowitz <newsgroups@xxxxxxxxxxxxxxx> espoused:
> __/ [ Mark Kent ] on Friday 16 March 2007 08:23 \__
> 
>> Roy Schestowitz <newsgroups@xxxxxxxxxxxxxxx> espoused:
>>> __/ [ Mark Kent ] on Thursday 15 March 2007 16:28 \__
>>> 
>>>> Roy Schestowitz <newsgroups@xxxxxxxxxxxxxxx> espoused:
>>>>> Giving Back [to Linux]
>>>>> 
>>>>> ,----[ Quote ]
>>>>>| My contribution to Vector Linux is quite small even when compared to
>>>>>| some of the other volunteer packagers. It's minuscule compared to
>>>>>| the core developers. The point is that the strength of the Open
>>>>>| Source community is that lots of people give back and contribute
>>>>>| what they can. Lots of little contributions make a huge impact.
>>>>> `----
>>>>> 
>>>>> http://www.oreillynet.com/linux/blog/2007/03/post_3.html
>>>>> 
>>>>> [H]omer, for example, builds RPMs for Fedora. And look what we have as a
>>>>> result:
>>>>> 
>>>>> Four good reasons to switch to RHEL 5
>>>>> 
>>>>> ,----[ Quote ]
>>>>>| Sometimes you don't want the hassle of the big upgrade. For example,
>>>>>| there is no good reason to "upgrade" Windows to Vista. On the other
>>>>>| hand, there are upgrades like Red Hat Enterprise Linux 5 (RHEL) that
>>>>>| give you some darn good reasons to make the jump.
>>>>> `----
>>>>> 
>>>>> http://www.linux-watch.com/news/NS6991009676.html
>>>> 
>>>> I would suggest that the [News] postings provide a useful service to
>>>> the Community, too.  In fact, I suspect that if you were to google all
>>>> the digests, you'd find more linux-related news URLs than any amount of
>>>> manual googling would bring up.  I have noticed, though, that some of
>>>> the older references seem to disappear in the end.  Perhaps we should
>>>> be saving whole articles somewhere?
>>> 
>>> If the URL merely changes, then doing a Web search with the title/snippet
>>> should bring up a mirror/identical article. Some time ago OpenAddict moved
>>> from one CMS to another and many URLs broke. I sent an E-mail to the
>>> Webmaster and this will be corrected, but I agree that we can't rely on
>>> the Internet Archive (the Wayback Machine) and search engine caches. One
>>> option is to save every page before I post it (CTRL+S+ENTER), but that
>>> doesn't make the previous links live, nor does it make the copy public. It
>>> does, on the other hand, allow me to grep back to life any article which I
>>> post here. Those who volunteered at Groklaw and Slashdot did a huge favour
>>> to society, IMHO. The deposition tapes, for example, were immortalised by
>>> a Groklawian who chose to remain anonymous.
>> 
>> I suppose I could start storing/hosting stuff here.  These are only text
>> articles anyway, so the storage requirement would not be vast.
>> 
>>> 
>>> Maybe one day we'll have 'monopoly deniers' (now they have 'climate
>>> deniers', with the negative connotation), so all this evidence is very
>>> important. It will help write history properly, going past the scope of
>>> Gates' 'Museum of Computing' and charitable work (READ: investment).
>>> 
>> 
>> Indeed, and I remain somewhat concerned that although we get to keep the
>> short snippets of articles in google and elsewhere, we might be losing
>> the originals.
> 
> True. We should learn from history that when evidence goes away, people
> conveniently ignore the past. I suggest we make use of a tool that parses
> [News]-tagged posts, extracts the URLs, and then curls/wgets them
> systematically, maybe putting them under a directory that's named after
> the msg-id. I would have gladly implemented this if I were any good with
> Perl parsing, but it sounds like you could reuse a lot from your current
> digest-generating script. You already pick up the tags to isolate some
> posts from the rest and then extract values from the message headers.
> Since information excess is not a major issue (it's just archiving), just
> wgetting everything (even tinyurls and related posts) which begins with
> "http" might actually work. If the formatting of the posts needs to
> change, that'll be a non-issue, but I also try to keep it convenient for
> Ed to parse as he creates local copies on his BSD server.
> 

It's an interesting possibility, to be honest.  You could do this in a
simple sense quite easily from a bash script, since wget is able to read
URLs from a file anyway; there are also already some good regexes for
getting URLs out of files in "urlview", so it would be a case of
launching wget from something like urlview, perhaps.  The issue that
causes me a little concern is that many articles are split across
multiple pages, and deliberately designed so that you cannot easily do
this trick, so in practice you're looking at something more like
webcrawler technology.
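
Just to make the idea concrete, here is a rough sketch of the sort of
thing I have in mind.  It assumes the [News] posts are sitting around as
individual mbox-style files, and the "archive/" layout, the grep pattern
and the wget options are only illustrative assumptions, not a finished
tool:

    #!/bin/sh
    # Sketch: pull every http(s) URL out of a saved [News] post and
    # mirror the pages under a directory named after the Message-ID.

    post="$1"

    # Take the Message-ID from the headers and strip the angle brackets;
    # a real script would also sanitise it for use as a directory name.
    msgid=$(grep -i '^Message-ID:' "$post" | head -n 1 | \
            sed -e 's/^[^<]*<//' -e 's/>.*$//')
    mkdir -p "archive/$msgid"

    # Crude URL extraction -- urlview's regexes would be more thorough.
    grep -Eo 'https?://[^][ <>"]+' "$post" | sort -u \
        > "archive/$msgid/urls.txt"

    # -p pulls in page requisites, -k rewrites links so the local copy
    # still reads properly, -E adds .html extensions where needed.
    wget -p -k -E -nv -P "archive/$msgid" -i "archive/$msgid/urls.txt"

The multi-page problem is the bit that doesn't fit this neatly; something
like "wget -r -l1 --no-parent" on each article URL would catch the
"page 2" links on the same site, at the cost of pulling in a fair bit of
junk.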

Has anyone tried to do anything like this already and perhaps has
solutions for these issues?

-- 
| Mark Kent   --   mark at ellandroad dot demon dot co dot uk          |
| Cola faq:  http://www.faqs.org/faqs/linux/advocacy/faq-and-primer/   |
| Cola trolls:  http://colatrolls.blogspot.com/                        |
