On 26/03/2014 19:11, Mitchell Joblin wrote:
On Tue, Mar 25, 2014 at 9:39 AM, Mitchell Joblin <joblin.m@xxxxxxxxx> wrote:
I have attached a corpus and .mbox file with a small number of emails. I am able to see more than one email per file, but the dates that are returned are all NAs, and the analysis fails shortly after gen.corpus is called.

I have found a solution to the problem. The dates were not being parsed because my system's LC_TIME locale was set to a German locale. The dates in the mailing list are in English, so parsing them returned NA. The missing dates then caused a later failure when the dates are used to find the overlap with the VCS revision dates. The dates are parsed by the readmail function generator in the snatm package, which calls strptime, and strptime relies on the LC_TIME locale. I suggest that we manually set the LC_TIME locale in the R
great, thanks! Looking forward to the patch ;)
environment using Sys.setlocale(category = "LC_TIME", locale = "en_US.UTF-8"). Is it safe to assume that all mbox files will be in English, or will it be necessary to make this user-configurable?
That should not be necessary. As per RFC 2822, Section 3.3, the date/time specification uses English terms, so the locale en_US.UTF-8 should be safe. However, we should test whether locales other than C and en_XX cause unexpected problems in other places.
Best regards, Wolfgang
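For illustration, the fix discussed above could look roughly like the following. This is only a minimal sketch, not the actual patch: parse_mail_date is a hypothetical helper that does not exist in snatm, the format string is an assumption about what the Date header looks like, and the sample date and its +0100 offset are made up. It uses the "C" locale rather than en_US.UTF-8 because "C" is always available and also yields English day and month names.

```r
# Hypothetical helper (not part of snatm): parse an RFC 2822 style date
# independently of the user's LC_TIME setting.
parse_mail_date <- function(x, locale = "C") {
  # Remember the current setting and restore it when the function exits.
  old <- Sys.getlocale("LC_TIME")
  on.exit(Sys.setlocale("LC_TIME", old), add = TRUE)

  # "C" gives English day/month abbreviations, which %a and %b need here.
  Sys.setlocale("LC_TIME", locale)

  # Assumed header layout: "Tue, 25 Mar 2014 09:39:00 +0100"
  as.POSIXct(strptime(x, format = "%a, %d %b %Y %H:%M:%S %z", tz = "UTC"))
}

# Made-up sample date; with a German LC_TIME and no locale override,
# the same strptime call would return NA.
d <- parse_mail_date("Tue, 25 Mar 2014 09:39:00 +0100")
print(d)
```

Restoring the previous locale on exit keeps the change local to date parsing, which may help with the concern about unexpected side effects in other places.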
Kind regards, Mitchell