[codeface] [PATCH 1/2] Remove more problematic characters from the email authors

  • From: Mitchell Joblin <mitchell.joblin.ext@xxxxxxxxxxx>
  • To: codeface@xxxxxxxxxxxxx
  • Date: Fri, 11 Dec 2015 18:32:42 +0100

- Parentheses in the author name cause the id service to return
NA for the person ids

Signed-off-by: Mitchell Joblin <mitchell.joblin.ext@xxxxxxxxxxx>
---
codeface/R/ml/analysis.r | 6 ++++--
1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/codeface/R/ml/analysis.r b/codeface/R/ml/analysis.r
index 9f13d29..9c10403 100644
--- a/codeface/R/ml/analysis.r
+++ b/codeface/R/ml/analysis.r
@@ -227,8 +227,10 @@ check.corpus.precon <- function(corp.base) {
}

## Remove problematic punctuation characters
- author <- gsub("\"", " ", author)
- author <- gsub(",", " ", author)
+ problem.characters <- c("\"", ",", "\\(", "\\)")
+ for (p.char in problem.characters) {
+ author <- gsub(p.char, " ", author)
+ }

## Trim trailing and leading whitespace
author <- str_trim(author)
--
2.1.4
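
For readers who want to try the cleanup outside of codeface, the following is a minimal sketch of the same idea, assuming only base R. The helper name clean.author is hypothetical and does not exist in codeface; it uses a single character class instead of the loop in the patch, and trimws() from base R in place of str_trim().

```r
## Hypothetical helper for illustration only (not part of codeface).
## Replaces the same characters as the patch: double quote, comma
## and both parentheses.
clean.author <- function(author) {
  ## Inside a character class the parentheses need no escaping
  author <- gsub("[\",()]", " ", author)

  ## Trim trailing and leading whitespace (base R equivalent of str_trim)
  return(trimws(author))
}

cleaned <- clean.author("Doe, John (ext)")
print(cleaned)
```

Like the patch, this sketch replaces each character with a space, so runs of inner whitespace are left untouched; only the ends of the string are trimmed.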

