[codeface] Re: Fwd: Mailing list analysis mbox input file

  • From: Wolfgang Mauerer <wm@xxxxxxxxxxxxxxxx>
  • To: codeface@xxxxxxxxxxxxx
  • Date: Mon, 24 Mar 2014 23:05:35 +0100

Hi Mitchell,

On 24/03/2014 15:01, Mitchell Joblin wrote:
Hello all,

I am having problems with getting the mailing list analysis to run
correctly. I have narrowed the problem down to the dispatch.all
function in R/ml/analysis.r and the error message is "Mailing list
does not cover any release range." The problem seems to be that not
all the dates from the emails in the .mbox file are identified. For
example, when I load an archive of one month of emails from qemu I
only get a corpus with 1 document and 1 date. The date that is
identified is the date of the first email in the mbox file.

Does each individual email need to be in its own file for this to work
correctly?

There are two ways in which the ML analysis can handle .mbox input
files:

- split them into individual mail files and read them one after
  another
- process the complete mailbox in one go, without creating intermediate
  individual emails

The latter approach is clearly more intelligent, but only works with
the most recent versions of tm and tm.plugin.mail. Which versions are
you working with?
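
For illustration, the first alternative (one file per email) can be
sketched outside of R. This is only a minimal sketch using Python's
standard mailbox module, not the code Codeface itself runs; the
function name split_mbox and the numbered .eml file names are
assumptions made for the example.

```python
import mailbox
from pathlib import Path


def split_mbox(mbox_path, out_dir):
    """Write every message of an mbox file to its own file.

    Returns the number of messages written. The file naming scheme
    (000001.eml, 000002.eml, ...) is an arbitrary choice for this sketch.
    """
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)

    # create=False: fail loudly instead of silently creating an empty mbox
    box = mailbox.mbox(mbox_path, create=False)
    count = 0
    try:
        for count, msg in enumerate(box, start=1):
            (out / f"{count:06d}.eml").write_bytes(msg.as_bytes())
    finally:
        box.close()
    return count
```

If the returned count is 1 for an archive that should hold a month of
mail, the "From " separator lines of the mbox file are probably not
being recognised, which would be consistent with a one-document corpus.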


Should the gen.corpus function return a corpus with more than 1 document
when a single archive with multiple emails is loaded?

Absolutely. It should return a corpus with as many documents as there
are emails.
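
To know what document count and which dates to expect from gen.corpus,
the archive can be inspected independently of tm and tm.plugin.mail.
The following is a hedged sketch using only Python's standard library;
the helper name mbox_dates is made up for this example.

```python
import mailbox
from email.utils import parsedate_to_datetime


def mbox_dates(mbox_path):
    """Return one entry per message in the mbox file.

    Each entry is a datetime parsed from the Date header, or None when
    the header is missing or cannot be parsed.
    """
    box = mailbox.mbox(mbox_path, create=False)
    dates = []
    try:
        for msg in box:
            raw = msg.get("Date")
            try:
                dates.append(parsedate_to_datetime(str(raw)) if raw else None)
            except (TypeError, ValueError):
                # Older Python versions raise TypeError, newer ones ValueError
                dates.append(None)
    finally:
        box.close()
    return dates
```

The length of the returned list is the number of emails, and hence the
number of documents the corpus should contain; None entries point to
messages whose dates could not be identified.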

Best regards, Wolfgang

Kind regards,

Mitchell

