Chris Smith wrote:
Maybe ... though I have forgotten my original reasoning. I guess it may have been flawed, but here goes... It's worse than that. Because I only have a byte offset, I need to convert it into a character offset somehow in order to work out the position in the string at which the match occurs. That probably means using preg_split rather than preg_match, and running utf8_strlen on the first portion of the split. All very, very messy.
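For illustration only, here is a rough sketch of the byte-to-character conversion described above. It is not necessarily what was done in the end: instead of preg_split it measures the prefix in front of the match, which gives the same number. The helpers char_length() and byte_to_char_offset() are hypothetical names, and char_length() is only a stand-in for utf8_strlen(); it assumes the input is valid UTF-8.

```php
<?php
// Hypothetical stand-in for utf8_strlen(): counts UTF-8 characters by
// counting every byte that is NOT a continuation byte (0x80-0xBF).
// Assumes $s is valid UTF-8.
function char_length(string $s): int
{
    return (int) preg_match_all('/[^\x80-\xBF]/', $s);
}

// Hypothetical helper: turn a byte offset reported by preg_match() into a
// character offset by measuring the prefix that precedes the match.
function byte_to_char_offset(string $subject, int $byteOffset): int
{
    return char_length(substr($subject, 0, $byteOffset));
}

// "naïve café search", with the two-byte characters written as escapes.
$subject = "na\xC3\xAFve caf\xC3\xA9 search";

preg_match('/search/', $subject, $m, PREG_OFFSET_CAPTURE);
$byteOffset = $m[0][1];   // PREG_OFFSET_CAPTURE always reports bytes
$charOffset = byte_to_char_offset($subject, $byteOffset);
```

In this sample the two offsets differ by two, one for each two-byte character in front of the match.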
The context selection amounts are only 50 bytes if I use substr(). If I use utf8_substr() they would be 50 UTF-8 characters. But then when I come to plug the offset back into preg_match, I don't know the byte amount. That would mean using two utf8_substr() calls, one for the "pre" snippet and one for the "post" snippet, so that I could then run strlen on the match + post snippet to ascertain the new amount for the offset.
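A minimal sketch of that pre/post idea, under some assumptions: first_chars(), last_chars() and snippet() are hypothetical helpers, the first two standing in for utf8_substr(), and the subject is assumed to be valid UTF-8. The context is cut in characters, while the offset handed back for the next preg_match call is computed with strlen, so it stays in bytes.

```php
<?php
// Hypothetical stand-ins for utf8_substr(): both count in UTF-8
// characters (via the /u modifier), not bytes. Assumes valid UTF-8.
function first_chars(string $s, int $n): string
{
    preg_match('/^.{0,' . $n . '}/su', $s, $m);
    return $m[0];
}

function last_chars(string $s, int $n): string
{
    preg_match('/.{0,' . $n . '}\z/su', $s, $m);
    return $m[0];
}

// Hypothetical helper: build a snippet around the next match and report
// the byte offset at which a following preg_match() call could resume.
function snippet(string $subject, string $pattern, int $byteOffset, int $context): ?array
{
    if (!preg_match($pattern, $subject, $m, PREG_OFFSET_CAPTURE, $byteOffset)) {
        return null;
    }
    [$match, $matchStart] = $m[0];   // $matchStart is counted in bytes

    $pre  = last_chars(substr($subject, 0, $matchStart), $context);
    $post = first_chars(substr($subject, $matchStart + strlen($match)), $context);

    return [
        'text' => $pre . $match . $post,
        // strlen() counts bytes, which is the unit preg_match() expects.
        'next' => $matchStart + strlen($match) + strlen($post),
    ];
}

// "café au lait, café noir", with é written as a two-byte escape.
$subject = "caf\xC3\xA9 au lait, caf\xC3\xA9 noir";
$result  = snippet($subject, '/lait/u', 0, 7);
```

Here the "post" part is 7 characters but 8 bytes long, which is exactly the mismatch described above; using strlen on the match plus the "post" part keeps the next offset on a character boundary.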