[codeface] Re: LLVM email format

  • From: Wolfgang Mauerer <wolfgang.mauerer@xxxxxxxxxxxxxxxxx>
  • To: <codeface@xxxxxxxxxxxxx>
  • Date: Thu, 12 Nov 2015 19:54:21 +0100



On 12/11/2015 at 14:15, Mitchell Joblin wrote:

On Thu, Nov 12, 2015 at 1:25 PM, Mitchell Joblin <joblin.m@xxxxxxxxx> wrote:
On Thu, Nov 12, 2015 at 12:06 PM, Andreas Ringlstetter
<andreas.ringlstetter@xxxxxxxxxxxxxxxxxxxx> wrote:


On 12.11.2015 at 11:39, Mitchell Joblin wrote:
On Thu, Nov 12, 2015 at 11:17 AM, Andreas Ringlstetter
<andreas.ringlstetter@xxxxxxxxxxxxxxxxxxxx> wrote:


On 12.11.2015 at 10:49, Mitchell Joblin wrote:
Hi all,

I see that llvm has an odd format to specify who the email is from. An
example of this is "Adrian Prantl via llvm-dev
<llvm-dev@xxxxxxxxxxxxxx>". Unfortunately this breaks the email
analysis because the "from" line does not get parsed properly and
decomposed into a (name, email) pair and the id service is returning
NAs. I propose that we:

Back to the original problem. I found out there is a regular
expression in fixup.authors that removes not just the "via" part of
the string, but also the entire email part.

authors <- gsub(pattern=" via [[:print:]]*>?|\\]?", x=authors,
replacement="")

@Wolfgang, can you please add a few comments on the rationale behind
this pattern? Unfortunately there are no comments, and the commit
message didn't help much either. Perhaps you can mention the cases you
tried to handle here, so that when we alter the pattern all the cases
are still covered. An alternative would be to leave the above
expression unchanged and instead add the email back separately in a
different expression.
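
A minimal sketch of the "add the email back separately" idea, written in
Python rather than R. It assumes that `[\x20-\x7e]` is a close enough
stand-in for the POSIX class `[[:print:]]`; the helper name `fixup_author`,
the `EMAIL` expression, and the example.org addresses are hypothetical and
not part of codeface.

```python
import re

# Assumed Python equivalent of the R pattern quoted above:
#   " via [[:print:]]*>?|\\]?"
# [[:print:]] is approximated by the printable ASCII range.
PRINTABLE = r"[\x20-\x7e]"
STRIP_VIA = re.compile(r" via " + PRINTABLE + r"*>?|\]?")

# Hypothetical separate expression: capture an address enclosed in
# <...> or [...] before the stripping pattern discards it.
EMAIL = re.compile(r"[<\[]([^<>\[\]\s]+@[^<>\[\]\s]+)[>\]]")


def fixup_author(author):
    """Return (name, email); email is None if no address is found."""
    match = EMAIL.search(author)
    email = match.group(1) if match else None
    # The greedy printable-character run swallows everything after
    # " via ", which is why the original pattern also drops the email.
    name = STRIP_VIA.sub("", author).strip()
    return name, email


if __name__ == "__main__":
    print(fixup_author("Adrian Prantl via llvm-dev <llvm-dev@lists.example.org>"))
    print(fixup_author("Au Thor via xyz [relay@example.org]"))
```

Note that the address recovered this way is the relay (list) address, not
the author's own, so it may still not be enough for the id service to tell
authors apart.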

IIRC, this was intended to capture things like you describe below,
"Au Thor via xyz <...>", including square brackets "[..]" for the relay
email. I should indeed have documented my intention. Everyone who
submits a regexp without proper documentation should receive -5 mojo
points. I exempt myself for this because the -5 rule was not yet
invented back when I made the commit ;)

Thanks, Wolfgang


Well, I know what is causing the id service to return NAs: the
format isn't "Adrian Prantl via llvm-dev <llvm-dev@xxxxxxxxxxxxxx>",
it's "<llvm-dev@xxxxxxxxxxxxxx> (Adrian Prantl via llvm-dev)".

It's the brackets that trip up the id service: they cause the
addressparser library to interpret the string in brackets as a group,
so it no longer returns a flat string for the name but an array of
names instead. I think I warned about this anomaly before.

No. That is not the problem. The example I gave is not what is
provided to the id service; it shows what llvm provides as the sender
of the email (i.e., the "From" line). This line gets parsed
differently from the standard format, and only a name gets passed to
the id service. I didn't look deeply enough to figure out why the
email is dropped entirely. The brackets I used to describe the
(name, email) pair have nothing to do with what is passed to the id
service. The point was that the From line gets decomposed, and at the
moment that decomposition is not done properly.
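
The two "From" layouts quoted in this thread can each be decomposed into a
(name, email) pair with one expression per layout. The Python sketch below
is an illustration only: `split_from` and all three patterns are
hypothetical, the addresses are placeholders, and this is not how codeface
or the id service currently parse the header.

```python
import re

# Layout 1: 'Name via list <address>'
NAME_ADDR = re.compile(
    r'^\s*"?(?P<name>[^"<>]*?)"?\s*<(?P<email>[^<>\s]+)>\s*$'
)
# Layout 2: '<address> (Name via list)'
ADDR_COMMENT = re.compile(
    r'^\s*<?(?P<email>[^<>\s()]+)>?\s*\((?P<name>[^()]*)\)\s*$'
)
# Trailing ' via <list-name>' suffix appended to the display name.
VIA = re.compile(r"\s+via\s+\S+\s*$")


def split_from(header):
    """Return (name, email) for a From value, or None if neither layout fits."""
    for pattern in (NAME_ADDR, ADDR_COMMENT):
        match = pattern.match(header)
        if match:
            name = VIA.sub("", match.group("name")).strip()
            return name, match.group("email")
    return None


if __name__ == "__main__":
    for value in (
        "Adrian Prantl via llvm-dev <llvm-dev@lists.example.org>",
        "<llvm-dev@lists.example.org> (Adrian Prantl via llvm-dev)",
    ):
        print(split_from(value))
```

Both layouts yield the same pair here, so whatever consumes the result
does not need to know which source the mbox came from.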

Was the mbox crawled via gmane or downloaded from the archives at
http://lists.llvm.org/mailman/listinfo?

It's from gmane. That's why the "From" line shown in the first email
looks like the one retrieved from gmane.

--Mitchell


They are using different, incompatible formats for the "From" line.

-- Andreas


