On 24.11.2016 at 11:06, Wolfgang Mauerer wrote:
If an author's name looks like an email address, then without the fix
the name is treated as the email address and the actual email address
is never parsed. That is the "parsing problem" I meant. As a result,
the DB ends up with a NULL value for the name, and the name itself is
stored as the email address.
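
To make the affected input concrete, here is a minimal sketch of the
detection step, assuming the same regular expression as in the patch.
The helper name name.looks.like.email and the example.org addresses are
made up for illustration and are not part of Codeface.

```r
## Hypothetical helper (not in Codeface): TRUE if the author string
## starts with a whitespace-free token containing "@", i.e. the name
## part looks like an email address. Same regex as used in the patch.
name.looks.like.email <- function(author) {
    regexpr("\\S+@\\S+", author, ignore.case=TRUE)[1] == 1
}

## Name part is itself an address: this is the problematic input
problematic <- "hans.huber@example.org <hans.huber@example.org>"
## Regular "Name <address>" form: the first match starts at "<", not at 1
regular <- "Hans Huber <hans.huber@example.org>"

name.looks.like.email(problematic)  # TRUE
name.looks.like.email(regular)      # FALSE
```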
On 13/10/2016 at 17:29, Claus Hunsen wrote:
From: Thomas Bock <bockthom@xxxxxxxxxxxxxxxxx>
Can you please limit the length of these lines to 75-80 chars?
Within mailing list analysis some persons with name 'NULL' could occur in the
database due to some parsing problems. The fix considers two cases:
- only email address in angle brackets is provided. Then just use the first
part of the email as name.
- name looks like an email address. In that case also use only the first part
of that as name in order to avoid parsing problems.
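
The two cases can be sketched in isolation roughly as follows. This is
only an illustration under assumptions, not the code from the patch:
the function name derive.author is hypothetical, base R trimws() stands
in for the str_trim() and fix.name() calls of the real code, and the
example.org addresses are invented.

```r
## Hypothetical stand-alone sketch: normalise an author string to
## "name <email>" according to the two cases described above.
derive.author <- function(author) {
    r <- regexpr("<.+>", author)
    start <- r[[1]]
    if (start < 1) {
        ## No address in angle brackets: leave the string untouched
        return(author)
    }
    match.len <- attr(r, "match.length")
    email <- substr(author, start, start + match.len - 1)

    if (start == 1 && match.len == nchar(author)) {
        ## Case 1: only "<local@domain>" is given.
        ## Strip the brackets and use the local part as name.
        address <- substr(email, 2, nchar(email) - 1)
        name <- gsub(".", " ", sub("@.*", "", address), fixed=TRUE)
    } else {
        name <- trimws(sub(email, "", author, fixed=TRUE))
        if (grepl("^\\S+@\\S+$", name)) {
            ## Case 2: the name itself looks like an email address.
            ## Again use only the local part as name.
            name <- gsub(".", " ", sub("@.*", "", name), fixed=TRUE)
        }
    }
    return(paste(name, email))
}

derive.author("<hans.huber@example.org>")
## "hans huber <hans.huber@example.org>"
derive.author("hans.huber@example.org <hans.huber@example.org>")
## "hans huber <hans.huber@example.org>"
derive.author("Hans Huber <hans.huber@example.org>")
## "Hans Huber <hans.huber@example.org>"
```

In both special cases the name is never empty, so no NULL name should
reach the DB for these inputs.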
Depending on the commit history browser, such long lines can lead
to an inconvenient display of the message.
I'm not sure about the statement "in order to avoid parsing
problems." -- the commit makes sure that in this case no
NULL value is written into the DB, but would the parser otherwise
cause issues, or are you referring to an error that is caused
later on by the NULL value?
Signed-off-by: Thomas Bock <bockthom@xxxxxxxxxxxxxxxxx>
see below...
---
codeface/R/ml/analysis.r | 34 +++++++++++++++++++++++++++++++---
1 file changed, 31 insertions(+), 3 deletions(-)
diff --git a/codeface/R/ml/analysis.r b/codeface/R/ml/analysis.r
index 3218864..51c3ac9 100644
--- a/codeface/R/ml/analysis.r
+++ b/codeface/R/ml/analysis.r
@@ -299,14 +299,42 @@ check.corpus.precon <- function(corp.base) {
## Get email and name parts
r <- regexpr("<.+>", author, TRUE)
if(r[[1]] == 1) {
- email <- substr(author, r, r + attr(r,"match.length")-1)
see below...
- name <- sub(email, "", author, fixed=TRUE)
- name <- fix.name(name)
+
+ ## Check if only an email is provided
+ if(attr(r, "match.length") == nchar(author)) {
+ ## Only an email like "<hans.huber@xxxxxxxxxxxxx>" is provided
... here: Can you please use only one of attr(r, "match.length")
+ email <- substr(author, r+1, r + nchar(author)-2)
+ name <- gsub("\\.", " ",gsub("@.*", "", email))
+ } else {
+ ## email and name both are provided
+ email <- substr(author, r, r + attr(r,"match.length")-1)
+ name <- sub(email, "", author, fixed=TRUE)
+ name <- fix.name(name)
+ }
+
email <- str_trim(email)
author <- paste(name,email)
}
}
+ ## Check if name looks like an email address.
+ ## Since that causes parsing problems, use only the local part of an
+ ## email address as name.
+
+ ## Get email and name parts first
+ r <- regexpr("<.+>", author, TRUE)
+ if(r[[1]] >= 1) {
or r[[1]] (the former preferred)? Otherwise, it is not easy to see
at first glance that the statements compute identical things.
+ email <- substr(author, r, r + attr(r,"match.length")-1)
nitpick: ,<space> (admittedly, there are tons of this in the source
base, but let's try to keep new code clean)
+ name <- sub(email, "", author, fixed=TRUE)
Reviewed-by: Wolfgang Mauerer <wolfgang.mauerer@xxxxxxxxxxxxxxxxx>
+ name <- fix.name(name)
+
+ if(regexpr("\\S+@\\S+", author, TRUE)[1]==1) {
+ ## Name looks like an email address. Use only local part as name.
+ name <- gsub("\\.", " ",gsub("@.*", "", name))
+ }
+ author <- paste(name,email)
+ }
+
return(author)
}