
Re: harvesting addresses from kmail folders

  • Subject: Re: harvesting addresses from kmail folders
  • From: Roy Schestowitz <newsgroups@xxxxxxxxxxxxxxx>
  • Date: Sun, 05 Mar 2006 08:46:39 +0000
  • Newsgroups: uk.comp.os.linux
  • Organization: schestowitz.com / MCC / Manchester University
  • References: <2175462.7YWmVdkXkE@ale.cx> <ccmpd3-04m.ln1@ID-107770.user.individual.net>
  • Reply-to: newsgroups@xxxxxxxxxxxxxxx
  • User-agent: KNode/0.7.2
__/ [ Whiskers ] on Sunday 05 March 2006 00:01 \__

> On 2006-03-04, alexd <look@xxxxxx> wrote:
>> I've just moved someone from Thunderbird on Windoze to KMail on Linux.
>> I've managed to import the ~1gb[!] of emails into KMail, but I deleted the
>> Windows partition before remembering the contacts would need importing
>> too :-/ There may be a ray of hope however, as pretty much everyone this
>> person needs to email is a sender or receiver of said gig of mails.
>>
>> So is there a tool that can parse a shitload of KMail mail directories [or
>> possibly mbox files], and spit out a .vcf or some other easily-importable
>> contact book?
>>
>> Amusingly enough, searching google for various permutations of 'kmail'
>> 'address' 'email address' 'harvest' and 'address book' caused google to
>> accuse me of being a virus-infected computer. I even had to use A9.com at
>> one point.
> 
> Crude and off-the-top-of-my-head, but if you have KMail set up to use
> 'maildir' storage, each email is a separate file inside a directory with
> the name of the 'folder' that appears in KMail, and you can get a list of
> all the 'From' headers with
> 
> grep -h '^From:' *
> 
> (from within the appropriate directory) which can be saved to a file like
> this
> 
> grep -h '^From:' * >/home/mark/tmp/Family_from
> 
> (KMail puts its stuff in a 'hidden' directory so the maildir files for
> mark's Family folder will be in /home/mark/.Mail/Family/cur/).
> 
> If you used the mbox format instead, find the directory with the mbox
> files in it, and use
> 
> grep '^From:' <filename>
> 
> instead.
> 
> I expect a more 'refined' method could be devised, but grep is the basic
> tool.  See man grep  ;))
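
For the mbox case mentioned above, a plain match on "From" also catches the
"From " separator line that starts each message, plus any body text that
happens to contain the word. The following is only a rough sketch of one way
to restrict the match to header lines; the function name mbox_from is made
up for illustration, and it assumes body lines starting with "From " have
been escaped as ">From ", as most mbox writers do.

```shell
#!/bin/sh
# Hypothetical helper: print the From: header of every message in the
# mbox files given as arguments.
mbox_from() {
    awk '
        /^From / { in_header = 1; next }  # separator line: a new message starts
        /^$/     { in_header = 0 }        # first blank line ends the header
        in_header && /^From:/ { print }   # keep From: lines from headers only
    ' "$@"
}
```

Something like `mbox_from <filename> | sort -u` would then give one line per
distinct sender. Folded (multi-line) headers are not handled here.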
 
The task at hand makes it appear as though liaising with a spammer would be
a good idea. You cannot reliably extract addresses by matching on the "@"
symbol, nor can you pull them out reliably with a regex on "From:". Either
way you are still left with issues such as non-RFC-compliant messages.

I'd imagine that the best use of time would be to echo or concatenate all
lines that contain "From: ", remove the duplicate lines, and manually copy
the results into KMail. If you add some commas in accordance with the CSV
conventions (if any exist), then you should be able to import the result as
CSV. I think you get to assign the column names (and thus their meaning)
when importing files that are CSV or TSV. You could use KSpread to help you
with that.
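
As a rough illustration of the steps above (collect the "From: " lines, drop
duplicates, add commas), something along these lines might do for a maildir
folder. It is a sketch, not a tested recipe: the function name harvest_from
is invented here, the directory is whatever 'cur' directory holds the
messages, and the output is a simple "name,address" pair per line.

```shell
#!/bin/sh
# Hypothetical helper: print unique "name,address" lines taken from the
# From: header of each message file in the directory given as $1.
harvest_from() {
    for f in "$1"/*; do
        [ -f "$f" ] || continue
        # Only look at the header, i.e. everything up to the first blank line.
        sed -n '1,/^$/{/^From:/p;}' "$f"
    done |
        sed -e 's/^From:[[:space:]]*//' \
            -e 's/"//g' \
            -e 's/^\(.*[^ ]\) *<\([^>]*\)>.*$/\1,\2/' \
            -e '/,/!s/^<\{0,1\}\([^<>]*\)>\{0,1\}$/,\1/' |
        LC_ALL=C sort -u    # remove duplicate lines
}
```

For example, `harvest_from /home/mark/.Mail/Family/cur >Family.csv` would
write the list to a file for importing. Names that themselves contain a
comma, MIME-encoded names and folded headers are not handled, so the result
would still want a look over by hand (or in KSpread) before importing.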

Best wishes,

Roy
