[codeface] [PATCH v2 14/24] Only consider first date header in ML analysis

  • From: Claus Hunsen <hunsen@xxxxxxxxxxxxxxxxx>
  • To: codeface@xxxxxxxxxxxxx
  • Date: Thu, 1 Dec 2016 17:11:08 +0100

Some mbox files downloadable from GMane contain more than one Date
header per mail document. To avoid problems in later stages of the ML
analysis, only the first Date header is considered.

Small indentation fix in code nearby.

Signed-off-by: Claus Hunsen <hunsen@xxxxxxxxxxxxxxxxx>
---
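Note: the selection step can be tried in isolation with the minimal
sketch below. It assumes date.header is a character vector holding one
element per Date header found in the mail. The helper name
first.date.header and the sample header values are hypothetical and are
not part of codeface.

```r
## Hypothetical helper (not part of codeface): keep only the first
## Date header if a mail document carries more than one.
first.date.header <- function(date.header) {
  if (length(date.header) > 1) {
    date.header <- date.header[1]
  }
  return(date.header)
}

## Made-up example: one mail document with two Date headers
headers <- c("Fri, 20 Feb 2009 20:24:54 +0100",
             "Fri, 20 Feb 2009 19:24:54 +0000")
date.header <- first.date.header(headers)

## Exactly one value is left for the later parsing steps
print(length(date.header))
print(date.header)
```

Vectors of length zero or one pass through unchanged, so the missing
header case is still left to the existing checks.
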
 codeface/R/ml/analysis.r | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/codeface/R/ml/analysis.r b/codeface/R/ml/analysis.r
index 93d5425..9704853 100644
--- a/codeface/R/ml/analysis.r
+++ b/codeface/R/ml/analysis.r
@@ -360,6 +360,11 @@ check.corpus.precon <- function(corp.base) {
       return(NA)
     }
 
+    ## only consider first date header in document if more are given
+    if (length(date.header) > 1) {
+      date.header = date.header[1]
+    }
+
     ## patterns without time-zone pattern
     date.formats.without.tz = c(
       "%a, %d %b %Y %H:%M:%S",  # initially used format; e.g., "Date: Tue, 20 Feb 2009 20:24:54 +0100"
-- 
2.10.2

