[yunqa.de] Matched TOKEN COUNT for YuPcre2

  • From: "George Spears" <george@xxxxxxxxxx>
  • To: <yunqa@xxxxxxxxxxxxx>
  • Date: Sat, 3 Oct 2015 12:22:10 -0400

Hello,



I am looking at YuPcre2 for some work I am starting. I need to determine
the likelihood that a given business name is really business name 'X'.

For example, does 'University of Chicago' really match 'Chicago University'?
As such, I am looking to see HOW MANY tokens in PATTERN match STRING.

For example, assume I want to match as many as possible of the tokens 'one',
'two' and 'three'.



I can use '^.*\b(one|two|three).*$' to match any of the tokens.

I can use a different pattern to match ALL the tokens.



What I really want to know, of the tokens 'one', 'two' and 'three', HOW MANY
were matched?

I can be a little more discriminating and say a MINIMUM of two must be
matched.

(.*\b(one|two|three)\b){2,}



But again, I don't know if this matched 2 or 3 of the tokens. Is there a
way to get this information?
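
To make the goal concrete, the count I am after can be sketched outside of
YuPcre2. This is only an illustration in Python, with its re module standing
in for PCRE2. count_matched_tokens is a hypothetical helper, not part of
YuPcre2, and it tests each token with its own pattern rather than with one
combined pattern.

```python
import re

def count_matched_tokens(tokens, text):
    """Return how many distinct tokens occur in text as whole words."""
    matched = set()
    for token in tokens:
        # re.escape keeps any regex metacharacters in a token literal;
        # \b on both sides restricts the match to whole words.
        pattern = r"\b" + re.escape(token) + r"\b"
        if re.search(pattern, text, re.IGNORECASE):
            matched.add(token.lower())
    return len(matched)

# Two of the three tokens appear in this string.
print(count_matched_tokens(["one", "two", "three"], "three blind mice and one cat"))
```

A repeated token is counted only once here, so 'one one' yields 1, whereas
the {2,} pattern above would accept it as two matches.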



Thanks,

George S
