In some mbox files downloadable from Gmane, a single mail document contains
more than one Date header. To avoid problems in later stages of the ML
analysis, only the first Date header is considered.
Small indentation fix in code nearby.
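The selection logic added below can be tried in isolation with the
following minimal sketch. It assumes the Date header values of a document
arrive as a character vector; the helper name first.date.header and the
sample header strings are hypothetical and not part of codeface. Without
such a guard, a scalar check like if (is.na(date.header)) on a length-2
vector is an error in R 4.2 and later (a warning in earlier versions).

```r
## Hypothetical helper (not part of codeface): mirrors the check added to
## analysis.r, keeping only the first Date header value if several exist.
first.date.header <- function(date.header) {
  if (length(date.header) > 1) {
    date.header = date.header[1]
  }
  date.header
}

## Illustrative input: a document carrying two Date headers
headers <- c("Fri, 20 Feb 2009 20:24:54 +0100",
             "Sat, 21 Feb 2009 08:00:00 +0000")

print(first.date.header(headers))  # only the first value is kept
```

Guarding with length(...) > 1 leaves single and empty header vectors
untouched; head(date.header, 1) would behave the same way.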
Signed-off-by: Claus Hunsen <hunsen@xxxxxxxxxxxxxxxxx>
---
codeface/R/ml/analysis.r | 5 +++++
1 file changed, 5 insertions(+)
diff --git a/codeface/R/ml/analysis.r b/codeface/R/ml/analysis.r
index 93d5425..9704853 100644
--- a/codeface/R/ml/analysis.r
+++ b/codeface/R/ml/analysis.r
@@ -360,6 +360,11 @@ check.corpus.precon <- function(corp.base) {
return(NA)
}
+ ## only consider first date header in document if more are given
+ if (length(date.header) > 1) {
+ date.header = date.header[1]
+ }
+
## patterns without time-zone pattern
date.formats.without.tz = c(
"%a, %d %b %Y %H:%M:%S", # initially used format; e.g., "Date: Tue, 20 Feb 2009 20:24:54 +0100"
--
2.10.2