[recoll-user] Re: Getting recoll working via python module behind apache

  • From: Jean-Francois Dockes <jfd@xxxxxxxxxx>
  • To: recoll-user@xxxxxxxxxxxxx
  • Date: Wed, 23 Apr 2014 17:11:33 +0200

Ciaran Farrell writes:
 > On Wed, 23 Apr 2014 16:02:49 +0200
 > Jean-Francois Dockes <jfd@xxxxxxxxxx> wrote:
 > 
 > > Ciaran Farrell writes:
 > >  > Hi again,
 > >  > 
 > >  > I have recoll working fine when called from a web.py script (i.e.
 > >  > local user starts script which uses ~/.recoll.conf from local
 > >  > user's home directory).
 > >  > 
 > >  > I now want my web.py script working as wsgi behind apache. I have
 > >  > this working. However, when my server script tries to connect to
 > >  > recoll (recoll.connect(confdir='/home/localuser/.recoll/')) I get:
 > >  > 
 > >  > EnvironmentError: Configuration could not be built:, referer:
 > >  > https://myscript.com/mypage.html
 > >  > Explicitly specified configuration directory must exist (won't be
 > >  > automatically created). Use mkdir first, referer:
 > >  > https://myscript.com/mypage.html
 > >  > 
 > >  > I also tried copying the entire .recoll folder to /tmp and
 > >  > chmodding it to 777 to make sure it wasn't (obviously) a
 > >  > permissions problem. Still doesn't work. The apache user is wwwrun
 > >  > (no login according to /etc/passwd).
 > >  > 
 > >  > Given that I can't create a .recoll directory in wwwrun's home (as
 > >  > there isn't any), what could I do to get it working?
 > >  > 
 > >  > Ciaran
 > > 
 > > I am really not sure of what is happening here. As a test, I'd first
 > > try to access a file inside the /home/localuser/.recoll directory
 > > from the python program (independently of recoll), to check that this
 > > is really Recoll having trouble.
 > 
 > From the python console it seems to work. That is python running as
 > user cfarrell though. The python web.py script is located in my home
 > directory too. If I run it locally (python mywebpyscript.py) it starts
 > a webserver on port 8080. I make ajax queries back to that script.
 > Those queries are turned into recoll queries. This works perfectly.
 > 
 > However, instead of having the python script running as a local user on
 > port 8080 I want to have it running on port 80. Typically, using wsgi
 > (apache's mod_wsgi) you can write a vhost config to have apache pass
 > requests coming in on port 80 to the web.py script in the home
 > directory. However, the user for apache is wwwrun with group www (on
 > suse). I assume that it looks for recoll in /home/wwwrun/.recoll and
 > doesn't find it (as wwwrun has no login shell). That's why I put the
 > recoll config directory in /tmp instead and called it with
 > recoll.connect(confdir='/tmp/.recoll').
 > 
 > Incidentally, when I use recoll.connect(confdir='/tmp') the error
 > message ('failed to open index') is different from the one I get with
 > recoll.connect(confdir='/tmp/.recoll'). There, the error says something
 > like: the directory doesn't exist (it does), use mkdir before connecting.
 > 
 > Where exactly is the index that recoll wants to open? Is it the
 > xapiandb/ directory inside .recoll? What permissions is it expecting on
 > the relevant files?

Recoll is expecting to be able to read/write anything inside the
configuration directory (the Python "confdir").

The message you are getting indicates that the directory itself does not
exist. This is the result of an access(2) test with mode 0, so it is not
a permission issue but really an existence one.
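
To see what that existence test looks like from the process that
actually fails, something like the sketch below could be run from the
wsgi side. It is only an illustration: visibility_report is a made-up
helper name, not part of Recoll. It uses os.access with os.F_OK (which
is 0, i.e. the same kind of test as access(2) with mode 0) and also
lists parent directories that the current user cannot search, because
access(2) fails as well when a directory higher up in the path cannot
be traversed.

```python
import os

def visibility_report(path):
    """Describe how the *current* process sees `path` and its parents.

    Hypothetical diagnostic helper, not part of Recoll.
    """
    path = os.path.abspath(path)
    report = {
        # os.F_OK is 0: a pure existence test, like access(2) with mode 0.
        "exists": os.access(path, os.F_OK),
        "readable": os.access(path, os.R_OK),
        "writable": os.access(path, os.W_OK),
        # Parents that are missing or not searchable by this process.
        "blocked_parents": [],
    }
    parent = os.path.dirname(path)
    while True:
        if not os.access(parent, os.X_OK):
            report["blocked_parents"].append(parent)
        up = os.path.dirname(parent)
        if up == parent:  # reached the filesystem root
            break
        parent = up
    return report
```

Calling visibility_report('/tmp/.recoll') under Apache and again from a
login shell, then comparing the two results, should show whether both
processes really see the same directory.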

Once the directory is tested to exist, recoll will try to open the config
files (I think that it's not an error for these to not exist, as
everything has default values), and then the index (normally the xapiandb
subdirectory inside the configuration directory, but this may be located
elsewhere if the dbdir parameter is set in the config).
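
As a rough illustration of that lookup, the sketch below guesses which
index directory would be used for a given confdir. It is not Recoll's
real configuration parser: guess_index_dir is a made-up name, it only
understands plain "dbdir = value" lines in recoll.conf, and it assumes
that a relative dbdir is taken relative to the configuration directory.

```python
import os

def guess_index_dir(confdir):
    """Guess where the Xapian index lives for `confdir`.

    Hypothetical helper: a simplified stand-in for Recoll's own
    configuration handling, for diagnostic purposes only.
    """
    dbdir = "xapiandb"  # default subdirectory mentioned in the text
    conf = os.path.join(confdir, "recoll.conf")
    try:
        with open(conf) as f:
            for line in f:
                line = line.strip()
                if line.startswith("#"):
                    continue  # comment line
                name, sep, value = line.partition("=")
                if sep and name.strip() == "dbdir" and value.strip():
                    dbdir = value.strip()
    except (IOError, OSError):
        # A missing or unreadable recoll.conf just means defaults apply.
        pass
    # os.path.join returns dbdir unchanged when it is an absolute path.
    return os.path.join(confdir, os.path.expanduser(dbdir))
```

The returned path is the one whose existence and permissions are worth
checking for the user that the web server runs as.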

The fact that you get an index open error with /tmp and a config open
error with /tmp/.recoll fits my idea that apache may be seeing a
different file tree (my chroot idea). But you write that Apache runs
normally, so this must be something else.

 > >>> fd = open('/home/cfarrell/.recoll/recoll.conf','r')
 > >>> x = fd.read()
 > >>> x
 > '# The system-wide configuration files for recoll are located in:\n
 > #   /usr/share/recoll/examples\n
 > # The default configuration files are commented, you should take a look\n
 > # at them for an explanation of what can be set (you could also take a look\n
 > # at the manual instead).\n
 > # Values set in this file will override the system-wide values for the file\n
 > # with the same name in the central directory. The syntax for setting\n
 > # values is identical.\n
 > \n
 > topdirs = /home/cfarrell/comema/processed\n
 > idxflushmb = 50\n
 > loglevel = 1\n
 > indexedmimetypes = application/pdf  \n
 > idxmetastoredlen = 1500 # see http://is.gd/lxxu2P\n'
 > >>> fd.close()
 > >>>


Did you try to run this code inside a wsgi script?
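
A minimal way to do that, sketched under some assumptions: the file
below is a bare WSGI application (no web.py, no recoll) that tries the
same read as the console session above and returns the outcome as
plain text, together with the uid and working directory it runs with.
The probe function and the CONF constant are names made up for this
example; the path is the one from the console session.

```python
import os

# Path taken from the console session above; adjust as needed.
CONF = "/home/cfarrell/.recoll/recoll.conf"

def probe(path):
    """Try to read `path` and describe what happened."""
    try:
        with open(path) as f:
            return "read %d characters from %s" % (len(f.read()), path)
    except (IOError, OSError) as exc:
        return "cannot read %s: %s" % (path, exc)

def application(environ, start_response):
    """Plain WSGI entry point reporting what this process can see."""
    text = "uid=%d cwd=%s\n%s\n" % (os.getuid(), os.getcwd(), probe(CONF))
    body = text.encode("utf-8")
    start_response("200 OK", [
        ("Content-Type", "text/plain"),
        ("Content-Length", str(len(body))),
    ])
    return [body]
```

If the page served through Apache reports "cannot read" while the same
open() works from a login shell, the problem is in what the Apache
process can see, independently of Recoll.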

 > > I don't know much either about running Apache. Is there any chance
 > > that httpd might run chroot'ed ?
 > 
 > Not in this case...

What a pity :)

jf
