Hi Jean-Francois,

To convert it to nicely colored HTML output I am using the script I found at "http://chrisarndt.de/en/software/python/colorize.html". It is pure Python. I found some other tricks to do the same with vim or enscript, but I suppose the one program you can be nearly sure is installed on a machine that wants to index Python files is Python itself. I renamed the file to rclpython in the ~/.recoll directory, and now the settings in mimemap and mimeconf work well: I can preview and edit the .py files, and the contents are correctly indexed. I am using the icon from the Oxygen KDE package; I copied it to /usr/share/recoll/images, and I am using these configs in ~/.recoll:

mimeconf:

[index]
text/x-python = exec rclpython

[icons]
text/x-python = text-x-python

[categories]
text = \
      text/x-python

mimemap:

.py = text/x-python

mimeview:

[view]
text/x-python = idle %f

I am using idle as the editor because it is the only editor that ships with Python. You can find the icon and the rclpython script in the attached files. The icon set and the conversion script are both under the GPL, so you should have no problems including them in the official distribution. Thanks for the hint.

Regards,
Miguel Angel.

Jean-Francois Dockes wrote:

Hi Linos,

You can't just say "text/x-python = internal"; this would suppose that the C++ code knows what to do with text/x-python, which it doesn't.

You have 2 possibilities:

1- Either add something like the following to mimemap:

.py = text/plain

Then python files will just be indexed as text, but you lose the ability to have a specific viewer/icons etc.

2- Or add ".py = text/x-python" to mimemap, but then you need to add a filter script for python files. Add something like the following to mimeconf:

text/x-python = exec rclpython

The rclpython script (which might be written in python by the way...) would need to turn the python program into html. Minimally, this means emitting a charset meta tag, and just emitting the python text after escaping characters like "<", "&". For a simple example, take a look at rclman. In fact, I should write a script that would generically do this to any text file, maybe I'll do it for the next release.

Alternatively, maybe someone already wrote a program to turn a python program into nice html, then you could just call this from the script.

The regular rcl... scripts also do other stuff like checking for external programs and emitting specific errors etc., but this is not strictly needed, you just need to spit html.

Don't hesitate to come back to me if anything is unclear. If you go the script way and you like the results, I'd be glad to add it to the distribution so that it will be there for you next release...

Regards,
jf

Linos writes:
> Hello,
> I am trying to get recoll to index my python source files, but I am doing
> something wrong because I can't get it to work. I have added these files to
> my ~/.recoll directory.
>
> mimeconf
> [index]
> text/x-python = internal
>
> [icons]
> text/x-python = txt
>
> [categories]
> text = \
>       text/x-python
>
> mimemap
> .py = text/x-python
>
> mimeview
> text/x-python = kwrite %f
>
> In the gui interface I can select the type to filter on it in advanced
> search if I want, but the files don't really get indexed, only their names;
> I can't search the content. Obviously the viewer and icon are only there to
> test the indexing function, later I will make it use a better editor/icon.
> I have tried recreating the complete xapiandb with recollindex -z. One
> other question: if I add new types (when you help me with the correct way
> to do it hehehe), do I have to recreate the complete index if the files
> have not been changed and are in a subdirectory previously indexed?
>
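
For reference, the minimal filter described in the reply above (a charset meta tag, then the source text with "<" and "&" escaped) could be sketched roughly as follows. This is only an illustration, not the attached script: it assumes a Python 3 interpreter and UTF-8 source files, and the names source_to_html and convert_file are made up for this example.

```python
import html
import sys


def source_to_html(source, charset="utf-8"):
    """Wrap plain source text in a bare HTML page."""
    # Escape "&", "<" and ">" so the code is never parsed as markup.
    body = html.escape(source, quote=False)
    return (
        "<html><head>\n"
        '<meta http-equiv="Content-Type" '
        'content="text/html; charset=%s">\n'
        "</head><body>\n"
        "<pre>%s</pre>\n"
        "</body></html>\n" % (charset, body)
    )


def convert_file(path, out=sys.stdout, charset="utf-8"):
    """Read the file at path and write its HTML form to out."""
    # Assumption: the file is encoded in `charset`; undecodable bytes
    # are replaced so the filter does not die on a stray character.
    with open(path, encoding=charset, errors="replace") as handle:
        out.write(source_to_html(handle.read(), charset))
```

Saved as rclpython, such a script would end with a call like convert_file(sys.argv[1]), assuming the file to convert is passed as the first argument, which is one of the ways the attached script accepts its input.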

#!/usr/bin/python
# -*- coding: iso-8859-1 -*-
"""
    MoinMoin - Python source parser and colorizer
"""

# Based on the code from Jürgen Herman, the following changes where made:
#
# Mike Brown <http://skew.org/~mike/>:
# - make script callable as a CGI and a Apache handler for .py files.
#
# Christopher Arndt <http://chrisarndt.de>:
# - make script usable as a module
# - use class tags and style sheet instead of <style> tags
# - when called as a script, add HTML header and footer
#
# TODO:
#
# - parse script encoding and allow output in any encoding by using unicode
#   as intermediate

__version__ = '0.3'
__date__ = '2005-07-04'
__license__ = 'GPL'
__author__ = 'Jürgen Hermann, Mike Brown, Christopher Arndt'


# Imports
import cgi, string, sys, cStringIO
import keyword, token, tokenize


#############################################################################
### Python Source Parser (does Hilighting)
#############################################################################

_KEYWORD = token.NT_OFFSET + 1
_TEXT    = token.NT_OFFSET + 2

_css_classes = {
    token.NUMBER:       'number',
    token.OP:           'operator',
    token.STRING:       'string',
    tokenize.COMMENT:   'comment',
    token.NAME:         'name',
    token.ERRORTOKEN:   'error',
    _KEYWORD:           'keyword',
    _TEXT:              'text',
}

_HTML_HEADER = """\
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN"
  "http://www.w3.org/TR/html4/loose.dtd">
<html>
<head>
  <title>%%(title)s</title>
  <meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1">
  <meta name="Generator" content="colorize.py (version %s)">
</head>
<body>
""" % __version__

_HTML_FOOTER = """\
</body>
</html>
"""

_STYLESHEET = """\
<style type="text/css">
pre.code {
    font-style: Lucida,"Courier New";
}

.number {
    color: #0080C0;
}
.operator {
    color: #000000;
}
.string {
    color: #008000;
}
.comment {
    color: #808080;
}
.name {
    color: #000000;
}
.error {
    color: #FF8080;
    border: solid 1.5pt #FF0000;
}
.keyword {
    color: #0000FF;
    font-weight: bold;
}
.text {
    color: #000000;
}
</style>

"""

class Parser:
    """ Send colored python source.
    """

    stylesheet = _STYLESHEET

    def __init__(self, raw, out=sys.stdout):
        """ Store the source text.
        """
        self.raw = string.strip(string.expandtabs(raw))
        self.out = out

    def format(self):
        """ Parse and send the colored source.
        """
        # store line offsets in self.lines
        self.lines = [0, 0]
        pos = 0
        while 1:
            pos = string.find(self.raw, '\n', pos) + 1
            if not pos: break
            self.lines.append(pos)
        self.lines.append(len(self.raw))

        # parse the source and write it
        self.pos = 0
        text = cStringIO.StringIO(self.raw)
        self.out.write(self.stylesheet)
        self.out.write('<pre class="code">\n')
        try:
            tokenize.tokenize(text.readline, self)
        except tokenize.TokenError, ex:
            msg = ex[0]
            line = ex[1][0]
            self.out.write("<h3>ERROR: %s</h3>%s\n" % (
                msg, self.raw[self.lines[line]:]))
        self.out.write('\n</pre>')

    def __call__(self, toktype, toktext, (srow,scol), (erow,ecol), line):
        """ Token handler.
        """
        if 0:
            print "type", toktype, token.tok_name[toktype], "text", toktext,
            print "start", srow,scol, "end", erow,ecol, "<br>"

        # calculate new positions
        oldpos = self.pos
        newpos = self.lines[srow] + scol
        self.pos = newpos + len(toktext)

        # handle newlines
        if toktype in [token.NEWLINE, tokenize.NL]:
            self.out.write('\n')
            return

        # send the original whitespace, if needed
        if newpos > oldpos:
            self.out.write(self.raw[oldpos:newpos])

        # skip indenting tokens
        if toktype in [token.INDENT, token.DEDENT]:
            self.pos = newpos
            return

        # map token type to a color group
        if token.LPAR <= toktype and toktype <= token.OP:
            toktype = token.OP
        elif toktype == token.NAME and keyword.iskeyword(toktext):
            toktype = _KEYWORD
        css_class = _css_classes.get(toktype, 'text')

        # send text
        self.out.write('<span class="%s">' % (css_class,))
        self.out.write(cgi.escape(toktext))
        self.out.write('</span>')


def colorize_file(file=None, outstream=sys.stdout, standalone=True):
    """Convert a python source file into colorized HTML.

    Reads file and writes to outstream (default sys.stdout). file can be a
    filename or a file-like object (only the read method is used). If file is
    None, act as a filter and read from sys.stdin. If standalone is True
    (default), send a complete HTML document with header and footer. Otherwise
    only a stylesheet and a <pre> section are written.
    """

    from os.path import basename
    if hasattr(file, 'read'):
        sourcefile = file
        file = None
        try:
            filename = basename(file.name)
        except:
            filename = 'STREAM'
    elif file is not None:
        try:
            sourcefile = open(file)
            filename = basename(file)
        except IOError:
            raise SystemExit("File %s unknown." % file)
    else:
        sourcefile = sys.stdin
        filename = 'STDIN'
    source = sourcefile.read()

    if standalone:
        outstream.write(_HTML_HEADER % {'title': filename})
    Parser(source, out=outstream).format()
    if standalone:
        outstream.write(_HTML_FOOTER)

    if file:
        sourcefile.close()

if __name__ == "__main__":
    import os
    if os.environ.get('PATH_TRANSLATED'):
        filepath = os.environ.get('PATH_TRANSLATED')
        print 'Content-Type: text/html; charset="iso-8859-1"\n'
        colorize_file(filepath)
    elif len(sys.argv) > 1:
        filepath = sys.argv[1]
        colorize_file(filepath)
    else:
        colorize_file()
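
The attached script is written for Python 2 (print statements, cgi.escape, cStringIO, and the callback form of tokenize.tokenize). Where only Python 3 is available, the same idea of wrapping each token in a span with a CSS class might look roughly like the sketch below. It is not part of the attached file: colorize and CSS_CLASSES are names chosen for this example, and it relies only on tokenize.generate_tokens and html.escape from the standard library.

```python
import html
import io
import keyword
import token
import tokenize

# Token type -> CSS class, mirroring the _css_classes table of the script.
CSS_CLASSES = {
    token.NUMBER: "number",
    token.OP: "operator",
    token.STRING: "string",
    tokenize.COMMENT: "comment",
    token.NAME: "name",
    token.ERRORTOKEN: "error",
}


def colorize(source):
    """Return source as a <pre> block with one <span> per token."""
    # offsets[row] is the index in source where 1-based line `row` starts.
    offsets = [0, 0]
    for line in io.StringIO(source):
        offsets.append(offsets[-1] + len(line))

    out = ['<pre class="code">']
    pos = 0  # index just past the last character already emitted
    try:
        for tok in tokenize.generate_tokens(io.StringIO(source).readline):
            if tok.type in (token.INDENT, token.DEDENT, token.ENDMARKER):
                continue
            start = offsets[tok.start[0]] + tok.start[1]
            end = offsets[tok.end[0]] + tok.end[1]
            # Copy through the whitespace the tokenizer skipped over.
            out.append(html.escape(source[pos:start], quote=False))
            text = source[start:end]
            pos = max(pos, end)
            if tok.type in (token.NEWLINE, tokenize.NL) or not text:
                out.append(text)
                continue
            if tok.type == token.NAME and keyword.iskeyword(text):
                css_class = "keyword"
            else:
                css_class = CSS_CLASSES.get(tok.type, "text")
            out.append('<span class="%s">%s</span>'
                       % (css_class, html.escape(text, quote=False)))
    except (tokenize.TokenError, SyntaxError):
        pass  # the part that could not be tokenized is emitted below
    # Whatever was not consumed goes out as plain escaped text.
    out.append(html.escape(source[pos:], quote=False))
    out.append("</pre>")
    return "".join(out)
```

The result is only the pre section; a stylesheet and an HTML header with a charset meta tag, like the ones in the attached script, would still have to be written around it.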