[dokuwiki] Re: preg_match() Compilation failed: regular expression too large...

  • From: Danjer <Danjer@xxxxxxxxxxxx>
  • To: dokuwiki@xxxxxxxxxxxxx
  • Date: Fri, 01 Jun 2007 10:58:24 +0200

Hi,

I am still working on it, but I haven't found the best way to replace p_get_instructions. Each time I call p_get_instructions again, $this->_getCompoundedRegex() grows. I think p_get_instructions is not designed to be called the way dokutexit calls it.
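
To illustrate, here is a simplified sketch of the symptom (not DokuWiki's
actual lexer code): if the same modes get registered again on every call
instead of once, the compounded regex grows each time:

class GrowingLexer {
    var $_patterns = array();
    function addPattern($p) { $this->_patterns[] = $p; }
    function _getCompoundedRegex() {
        return '/(' . implode(')|(', $this->_patterns) . ')/';
    }
}

$lexer = new GrowingLexer();
for ($i = 0; $i < 3; $i++) {
    // the same patterns get added again on each "parse"...
    $lexer->addPattern('\[\[.*?\]\]');
    $lexer->addPattern('[A-Z]{2,}');
    // ...so the compounded regex keeps getting longer
    echo strlen($lexer->_getCompoundedRegex()) . "\n";
}

Call that often enough and the pattern eventually blows past PCRE's
compiled-size limit.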

Jaume, I'll try to provide you with a new version next week.

Regards,
Danjer

Harry Fuecks wrote:
No confessions ;)

Think your best bet is to help yourself. In
/var/www/lledoner/dokuwiki/inc/parser/lexer.php on line 115,
_temporarily_ replace this

if (! preg_match($this->_getCompoundedRegex(), $subject, $matches)) {

with

$regex = $this->_getCompoundedRegex();
if (! preg_match($regex, $subject, $matches)) {
   // log the pattern so we can inspect the one that failed to compile
   file_put_contents('/tmp/regexes.txt', $regex, FILE_APPEND);

That should hopefully catch the problem regex and provide a clue to
where it's coming from.
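
For what it's worth, a quick way to see how big the caught pattern is
(assuming a single regex ended up in the file - FILE_APPEND will
concatenate if that branch fires more than once):

echo strlen(file_get_contents('/tmp/regexes.txt')) . " bytes\n";
// the source length is only a rough proxy for the ~64K compiled limit,
// but anything in that neighbourhood is the likely culprit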

Drop results here but make sure you restore the old version of the script.

On 5/31/07, Jaume Obrador <obrador@xxxxxxxxxxxx> wrote:
Any news about that? I'm still blind.

Thanks in advance,
Jaume.


On Tue, 22 May 2007 at 14:34 +0200, Danjer wrote:
> Hi all,
>
> I don't remember doing anything weird with the lexer. Anyway, the
> lexer is always called through the function p_get_instructions($text). In
> Jaume's case, $text is always filled by rawWiki($id).
> But I guess the stinky thing is that for each link found in the main
> document, p_get_instructions(rawWiki($id)) is called with the link id.
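>
> To make that concrete, a simplified sketch (the $linkIds array is just
> for illustration, it's not the actual dokutexit code):
>
> // every internal link in the main page gets fetched and pushed
> // through the parser again:
> foreach ($linkIds as $id) {
>     $instructions = p_get_instructions(rawWiki($id));
>     // each call grows the lexer's compounded regex a little more
> }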
>
> I believe I didn't clean up the instructions result set properly.
>
> A few months ago I asked about that:
> //www.freelists.org/archives/dokuwiki/09-2006/msg00299.html
>
> Harry, what do you think about that?
>
> Regards,
> Danjer
>
> Harry Fuecks wrote:
> > From a brief scan of the code (found at
> > http://danjer.doudouke.org/tech/dokutexit), in particular
> > class.texitrender.php, it's invoking the parser directly and _may_ have
> > passed something to the lexer which depends on the input document. I'd
> > discuss this with the author...
> >
> > On 5/22/07, Jaume Obrador <obrador@xxxxxxxxxxxx> wrote:
> >> The only plugin I have installed, and the one in use when this error
> >> occurs, is dokutexit, which parses dokuwiki syntax into LaTeX to generate a PDF
> >> file.
> >>
> >> Thanks.
> >> Jaume Obrador.
> >>
> >>
> >> On Mon, 21 May 2007 at 23:47 +0200, Harry Fuecks wrote:
> >> > On 5/21/07, Jaume Obrador <obrador@xxxxxxxxxxxx> wrote:
> >> > > Hi, I'm trying to use dokuwiki together with the dokutexit plugin to
> >> > > generate a PDF file from some pages. I can say that it works great
> >> > > with a few pages, but the problem comes when I try to generate a PDF
> >> > > from a larger number of pages; the total size of those pages together
> >> > > is 209K. I get the following error 22 times:
> >> > >
> >> > > Warning: preg_match() [function.preg-match]: Compilation failed:
> >> > >         regular expression too large at offset 0
> >> > >         in /var/www/lledoner/dokuwiki/inc/parser/lexer.php on line 115
> >> > >
> >> > > I increased the default php memory_limit from 8M to 32M, with no
> >> > > good results.
> >> >
> >> > That's a limitation of PCRE, not PHP - see http://www.pcre.org/pcre.txt,
> >> > section "LIMITATIONS": "The maximum length of a compiled pattern is
> >> > 65539 (sic) bytes if PCRE is compiled with the default internal
> >> > linkage size of 2."
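> >> >
> >> > You can reproduce that in isolation with any pattern whose compiled
> >> > form goes past the limit - a throwaway sketch, where the pattern is
> >> > meaningless and only there for size:
> >> >
> >> > // ~80K of alternatives; the compiled form should exceed PCRE's
> >> > // default limit and fail exactly like the lexer's compound regex
> >> > $huge = '/' . substr(str_repeat('foo|', 20000), 0, -1) . '/';
> >> > $result = preg_match($huge, 'some subject');
> >> > // Warning: preg_match(): Compilation failed: regular expression too large
> >> > var_dump($result); // bool(false)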
> >> >
> >> > It goes on to say you can compile PCRE differently to get round this,
> >> > but that will probably get interesting with PHP, where PCRE is part of
> >> > the core distribution.
> >> >
> >> > Without knowing more I'd say this problem _isn't_ directly caused by
> >> > trying to generate a PDF from many pages, because the size of the
> >> > parser regex should not be dependent on the size of the document(s)
> >> > you are parsing.
> >> >
> >> > Possible causes of this error might be:
> >> >
> >> > - a very large smileys or acronyms file - last time I checked, these
> >> > get turned into a single regular expression, so a big input file could
> >> > lead to a regex > 65539 bytes (see the sketch after this list). Normally
> >> > though, if the problem was here you'd see it on every page, not just
> >> > the PDF version
> >> >
> >> > - a plugin which _IS_ using the document to build further regular
> >> > expressions, in some way: list your plugins...
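> >> >
> >> > On the first point, this is roughly how such a list ends up as one
> >> > pattern (illustrative only, not DokuWiki's exact code):
> >> >
> >> > $acronyms = array('HTML', 'PCRE', 'PDF' /* ... hundreds more ... */);
> >> > $quoted = array();
> >> > foreach ($acronyms as $a) {
> >> >     $quoted[] = preg_quote($a, '/');
> >> > }
> >> > // every entry adds to the compiled size of this single pattern
> >> > $pattern = '/\b(?:' . implode('|', $quoted) . ')\b/';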
> >> >
> >> > Hope that helps
> >>
> >> --
> >> DokuWiki mailing list - more info at
> >> http://wiki.splitbrain.org/wiki:mailinglist
> >>

--
DokuWiki mailing list - more info at
http://wiki.splitbrain.org/wiki:mailinglist
