[recoll-user] Re: Indexing python Files
- From: Linos <info@xxxxxxxx>
- To: recoll-user@xxxxxxxxxxxxx
- Date: Sun, 14 Sep 2008 14:44:49 +0200
Hi Jean-Francois,
to convert the source to nicely colored HTML output I am using the script I
found at "http://chrisarndt.de/en/software/python/colorize.html". It is pure
Python. I found other tricks to do the same with vim or enscript, but I
suppose the one program you can be nearly sure is present on a machine that
wants to index Python files is Python itself. I have renamed the file to
rclpython in the ~/.recoll directory, and now the settings in mimemap and
mimeconf work well: I can preview and edit the .py files and the contents are
correctly indexed. I am using the icon from the Oxygen KDE package; I have
copied it to /usr/share/recoll/images, and I am using these configs in ~/.recoll:
mimeconf:
[index]
text/x-python = exec rclpython
[icons]
text/x-python = text-x-python
[categories]
text = \
text/x-python
mimemap:
.py = text/x-python
mimeview:
[view]
text/x-python = idle %f
I am using idle as the editor because it is the only editor that ships with
Python. You can find the icon and rclpython in the attached files. The icon set
and the conversion script are under the GPL, so you should have no problems
including them in the official distribution. Thanks for the hint.
Regards,
Miguel Angel.
Jean-Francois Dockes wrote:
Hi Linos,
You can't just say "text/x-python = internal"; this would suppose that the
C++ code knows what to do with text/x-python, which it doesn't.
You have 2 possibilities:
1- Either add something like the following to mimemap:
.py = text/plain
Then python files will just be indexed as text, but you lose the ability
to have a specific viewer/icons etc.
2- Or add ".py = text/x-python" to mimemap, but then you need to add a
filter script for python files. Add something like the following to
mimeconf:
text/x-python = exec rclpython
- The rclpython script (which might be written in python by the way...)
would need to turn the python program into html. Minimally, this means
emitting a charset meta tag, and just emitting the python text after
escaping characters like "<", "&". For a simple example, take a look at
rclman. In fact, I should write a script that would generically do this
to any text file, maybe I'll do it for the next release.
Alternatively, maybe someone already wrote a program to turn a python
program into nice html, then you could just call this from the script.
The regular rcl... scripts also do other stuff like checking for external
programs and emitting specific errors etc., but this is not strictly
needed, you just need to spit out html.
Don't hesitate to come back to me if anything is unclear. If you go the
script way and you like the results, I'd be glad to add it to the
distribution so that it will be there for you in the next release...
Regards,
jf
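For reference, the bare-minimum filter described above (a charset meta tag plus the escaped source text) could be sketched roughly as follows. This is an illustration only, not the script that ships with recoll: the function name to_html is made up here, UTF-8 input is assumed, and the file path is assumed to arrive as the first argument, the same way the attached colorize script reads it.

```python
#!/usr/bin/env python3
"""Bare-bones text-to-HTML filter sketch (hypothetical rclpython stand-in)."""
import html
import sys


def to_html(source, title="python source", charset="utf-8"):
    """Wrap source text in a minimal HTML page with a charset meta tag."""
    return (
        "<html><head>\n"
        "<title>%s</title>\n"
        '<meta http-equiv="Content-Type" content="text/html; charset=%s">\n'
        "</head><body>\n<pre>%s</pre>\n</body></html>\n"
        % (html.escape(title), charset, html.escape(source))
    )


if __name__ == "__main__" and len(sys.argv) > 1:
    # Assumption: the file to convert is given as the first argument.
    with open(sys.argv[1], encoding="utf-8", errors="replace") as handle:
        sys.stdout.write(to_html(handle.read(), title=sys.argv[1]))
```

Escaping with html.escape takes care of "<", ">" and "&", which is all the indexer strictly needs to see plain text inside the page.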
Linos writes:
>
> Hello,
> I am trying to get recoll to index my Python source files, but I am doing
> something wrong because I can't get it to work. I have added these files to my
> ~/.recoll directory.
>
> mimeconf
> [index]
> text/x-python = internal
>
> [icons]
> text/x-python = txt
>
> [categories]
> text = \
> text/x-python
>
> mimemap
> .py = text/x-python
>
> mimeview
> text/x-python = kwrite %f
>
> In the GUI I can select the type to filter on in advanced search if I
> want, but the files are not really indexed, only their names; I can't search
> the content. Obviously the viewer and icon are only there to test the indexing;
> later I will switch to a better editor/icon. I have tried recreating the
> complete Xapian db with recollindex -z. One other question: if I add new types
> (once you help me with the correct way to do it, hehe) do I have to recreate
> the complete index if the files have not been changed and are in a
> subdirectory that was previously indexed?
>
#!/usr/bin/python
# -*- coding: iso-8859-1 -*-
"""
MoinMoin - Python source parser and colorizer
"""
# Based on the code from Jürgen Hermann, the following changes were made:
#
# Mike Brown <http://skew.org/~mike/>:
# - make script callable as a CGI and an Apache handler for .py files.
#
# Christopher Arndt <http://chrisarndt.de>:
# - make script usable as a module
# - use class tags and style sheet instead of <style> tags
# - when called as a script, add HTML header and footer
#
# TODO:
#
# - parse script encoding and allow output in any encoding by using unicode
# as intermediate
__version__ = '0.3'
__date__ = '2005-07-04'
__license__ = 'GPL'
__author__ = 'Jürgen Hermann, Mike Brown, Christopher Arndt'
# Imports
import cgi, string, sys, cStringIO
import keyword, token, tokenize
#############################################################################
### Python Source Parser (does Highlighting)
#############################################################################
_KEYWORD = token.NT_OFFSET + 1
_TEXT = token.NT_OFFSET + 2
_css_classes = {
    token.NUMBER: 'number',
    token.OP: 'operator',
    token.STRING: 'string',
    tokenize.COMMENT: 'comment',
    token.NAME: 'name',
    token.ERRORTOKEN: 'error',
    _KEYWORD: 'keyword',
    _TEXT: 'text',
}
_HTML_HEADER = """\
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN"
"http://www.w3.org/TR/html4/loose.dtd">
<html>
<head>
<title>%%(title)s</title>
<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1">
<meta name="Generator" content="colorize.py (version %s)">
</head>
<body>
""" % __version__
_HTML_FOOTER = """\
</body>
</html>
"""
_STYLESHEET = """\
<style type="text/css">
pre.code {
font-family: Lucida,"Courier New";
}
.number {
color: #0080C0;
}
.operator {
color: #000000;
}
.string {
color: #008000;
}
.comment {
color: #808080;
}
.name {
color: #000000;
}
.error {
color: #FF8080;
border: solid 1.5pt #FF0000;
}
.keyword {
color: #0000FF;
font-weight: bold;
}
.text {
color: #000000;
}
</style>
"""
class Parser:
    """ Send colored python source.
    """

    stylesheet = _STYLESHEET

    def __init__(self, raw, out=sys.stdout):
        """ Store the source text.
        """
        self.raw = string.strip(string.expandtabs(raw))
        self.out = out

    def format(self):
        """ Parse and send the colored source.
        """
        # store line offsets in self.lines
        self.lines = [0, 0]
        pos = 0
        while 1:
            pos = string.find(self.raw, '\n', pos) + 1
            if not pos: break
            self.lines.append(pos)
        self.lines.append(len(self.raw))

        # parse the source and write it
        self.pos = 0
        text = cStringIO.StringIO(self.raw)
        self.out.write(self.stylesheet)
        self.out.write('<pre class="code">\n')
        try:
            tokenize.tokenize(text.readline, self)
        except tokenize.TokenError, ex:
            msg = ex[0]
            line = ex[1][0]
            self.out.write("<h3>ERROR: %s</h3>%s\n" % (
                msg, self.raw[self.lines[line]:]))
        self.out.write('\n</pre>')

    def __call__(self, toktype, toktext, (srow,scol), (erow,ecol), line):
        """ Token handler.
        """
        if 0:
            print "type", toktype, token.tok_name[toktype], "text", toktext,
            print "start", srow,scol, "end", erow,ecol, "<br>"

        # calculate new positions
        oldpos = self.pos
        newpos = self.lines[srow] + scol
        self.pos = newpos + len(toktext)

        # handle newlines
        if toktype in [token.NEWLINE, tokenize.NL]:
            self.out.write('\n')
            return

        # send the original whitespace, if needed
        if newpos > oldpos:
            self.out.write(self.raw[oldpos:newpos])

        # skip indenting tokens
        if toktype in [token.INDENT, token.DEDENT]:
            self.pos = newpos
            return

        # map token type to a color group
        if token.LPAR <= toktype and toktype <= token.OP:
            toktype = token.OP
        elif toktype == token.NAME and keyword.iskeyword(toktext):
            toktype = _KEYWORD
        css_class = _css_classes.get(toktype, 'text')

        # send text
        self.out.write('<span class="%s">' % (css_class,))
        self.out.write(cgi.escape(toktext))
        self.out.write('</span>')

def colorize_file(file=None, outstream=sys.stdout, standalone=True):
    """Convert a python source file into colorized HTML.

    Reads file and writes to outstream (default sys.stdout). file can be a
    filename or a file-like object (only the read method is used). If file is
    None, act as a filter and read from sys.stdin. If standalone is True
    (default), send a complete HTML document with header and footer. Otherwise
    only a stylesheet and a <pre> section are written.
    """
    from os.path import basename
    if hasattr(file, 'read'):
        sourcefile = file
        # look up the name before clearing 'file'; doing it afterwards
        # always failed and left the title as 'STREAM'
        try:
            filename = basename(file.name)
        except (AttributeError, TypeError):
            filename = 'STREAM'
        # a caller-supplied stream is not closed at the end
        file = None
    elif file is not None:
        try:
            sourcefile = open(file)
            filename = basename(file)
        except IOError:
            raise SystemExit("File %s unknown." % file)
    else:
        sourcefile = sys.stdin
        filename = 'STDIN'
    source = sourcefile.read()

    if standalone:
        outstream.write(_HTML_HEADER % {'title': filename})
    Parser(source, out=outstream).format()
    if standalone:
        outstream.write(_HTML_FOOTER)

    # only close files that were opened here
    if file:
        sourcefile.close()

if __name__ == "__main__":
    import os
    if os.environ.get('PATH_TRANSLATED'):
        filepath = os.environ.get('PATH_TRANSLATED')
        print 'Content-Type: text/html; charset="iso-8859-1"\n'
        colorize_file(filepath)
    elif len(sys.argv) > 1:
        filepath = sys.argv[1]
        colorize_file(filepath)
    else:
        colorize_file()
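
The attached script is written for Python 2 (print statements, cStringIO, tuple parameters in __call__). As a rough sketch only, the same token-to-span idea might look like this on Python 3. The function name colorize and the CSS_CLASSES table are illustrative and not part of the attached file, and tokenizer errors on incomplete source are deliberately left unhandled here.

```python
import html
import io
import keyword
import token
import tokenize

# Illustrative mapping from token type to CSS class (mirrors _css_classes).
CSS_CLASSES = {
    token.NUMBER: "number",
    token.OP: "operator",
    token.STRING: "string",
    token.COMMENT: "comment",
    token.NAME: "name",
}


def colorize(source):
    """Return source wrapped in <span> tags, one CSS class per token kind."""
    # offsets[row] is the character offset where (1-based) row starts.
    offsets = [0, 0]
    for line in io.StringIO(source):
        offsets.append(offsets[-1] + len(line))

    out = ['<pre class="code">']
    pos = 0
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        # Indentation is re-emitted below as the gap before the next token.
        if tok.type in (token.INDENT, token.DEDENT, token.ENDMARKER):
            continue
        start = offsets[tok.start[0]] + tok.start[1]
        # Copy the original whitespace (or continuation) between tokens.
        out.append(html.escape(source[pos:start]))
        pos = start + len(tok.string)
        if tok.type in (token.NEWLINE, token.NL):
            out.append(tok.string)
            continue
        if tok.type == token.NAME and keyword.iskeyword(tok.string):
            css = "keyword"
        else:
            css = CSS_CLASSES.get(tok.type, "text")
        out.append('<span class="%s">%s</span>' % (css, html.escape(tok.string)))
    out.append(html.escape(source[pos:]))
    out.append("</pre>")
    return "".join(out)
```

Calling colorize(open(path).read()) returns only the <pre> section, so a header with the charset meta tag and the stylesheet would still have to be written around it, as colorize_file does in the attached script.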
