If Elasticsearch costs a lot to develop with, I think one can use YaCy
http://www.yacy.net/en/ for a simple search. It can easily deal with even a
TB of data, as it has good indexing, and the cost of setting it up will be
minimal too. But if some customized search is needed, then I think one
has to go for a customized solution.
On Sat, Oct 8, 2016 at 2:37 AM, Shrinivasan T <tshrinivasan@xxxxxxxxx>
wrote:
Hi,
Many Tamil scholars are looking for a search engine for Tamil literature.
They often look for the following things:
1. Search for any word across all the literature and highlight the line of
occurrence, if possible with one line above and below.
2. The frequency of any given word.
3. The most used and least used words of any given author.
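For a corpus that fits in memory, the three requests above can be sketched in a few lines of Python. This is a minimal sketch, assuming words are separated by whitespace and that stripping surrounding punctuation is enough tokenization; the function names are hypothetical. Python's built-in re module does not count Tamil vowel signs as \w characters, so a regex like r"\w+" would break Tamil words apart. That is why the sketch splits on whitespace instead.

```python
import string
from collections import Counter
from typing import Iterable, Iterator

# Assumption: words are separated by whitespace and only surrounding punctuation
# needs stripping. Python's re \w does not match Tamil vowel signs, so r"\w+"
# would split Tamil words apart; plain whitespace splitting avoids that.
PUNCT = string.punctuation + "“”‘’"


def tokenize(line: str) -> list[str]:
    """Split a line on whitespace and strip surrounding punctuation from each word."""
    words = (w.strip(PUNCT) for w in line.split())
    return [w for w in words if w]


def search_with_context(lines: list[str], word: str, context: int = 1) -> Iterator[list[str]]:
    """Yield each hit with `context` lines above and below; the hit line is marked '>> '."""
    for i, line in enumerate(lines):
        if word in tokenize(line):
            start, end = max(0, i - context), min(len(lines), i + context + 1)
            yield [(">> " if j == i else "   ") + lines[j] for j in range(start, end)]


def word_frequency(lines: Iterable[str]) -> Counter:
    """Count how often each word occurs across all lines."""
    counts: Counter = Counter()
    for line in lines:
        counts.update(tokenize(line))
    return counts


def most_and_least_used(lines: Iterable[str], n: int = 10) -> tuple[list, list]:
    """Return the n most used and the n least used (word, count) pairs."""
    ranked = word_frequency(lines).most_common()
    # ranked[:-n - 1:-1] takes the last n entries in reverse, rarest first
    return ranked[:n], ranked[:-n - 1:-1]
```

For the per-author question, one would pass only that author's lines to most_and_least_used. Each call rescans the text, so this only illustrates the logic and is not a fix for the 180 GB case.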
The literature is available in plain text format here:
http://www.projectmadurai.org/pmworks.html
There are people who scrape Tamil websites regularly.
They have around 180 GB of Tamil in plain text format.
When they grep for any word, it takes 8-10 hours on a normal desktop.
I think we can use big data tools for this.
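grep is slow here because it rereads all 180 GB for every query. Search engines such as Elasticsearch, which is built on Lucene, avoid this with an inverted index. The corpus is read once to map each word to where it occurs, and after that each lookup is a dictionary access. Below is a minimal in-memory sketch, assuming a directory of UTF-8 .txt files. The function names and directory layout are hypothetical. At 180 GB the index itself would have to live on disk, and the dedicated tools handle that part.

```python
import string
from collections import defaultdict
from pathlib import Path
from typing import Iterable, Iterator

# Assumption: whitespace-separated words with surrounding punctuation stripped.
PUNCT = string.punctuation + "“”‘’"


def tokenize(line: str) -> list[str]:
    """Split a line on whitespace and strip surrounding punctuation from each word."""
    words = (w.strip(PUNCT) for w in line.split())
    return [w for w in words if w]


def read_corpus(directory: str) -> Iterator[tuple[str, Iterable[str]]]:
    """Yield (file name, line iterator) for each .txt file (hypothetical layout).

    Each file stays open only while the consumer iterates over its lines.
    """
    for path in sorted(Path(directory).glob("*.txt")):
        with path.open(encoding="utf-8") as f:
            yield path.name, f


def build_index(docs: Iterable[tuple[str, Iterable[str]]]) -> dict[str, list[tuple[str, int]]]:
    """Read every document once and map each word to its (document, line number) postings."""
    index = defaultdict(list)
    for name, lines in docs:
        for line_no, line in enumerate(lines, start=1):
            for word in set(tokenize(line)):  # record a line once per word
                index[word].append((name, line_no))
    return dict(index)


def lookup(index: dict[str, list[tuple[str, int]]], word: str) -> list[tuple[str, int]]:
    """Return all (document, line number) hits for a word, or [] if it never occurs."""
    return index.get(word, [])
```

Usage would be along the lines of index = build_index(read_corpus("pm_texts")), where "pm_texts" is a placeholder directory name. The expensive pass happens once, and every later lookup(index, word) avoids rescanning the text.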
Can we use Elasticsearch or Druid for their purpose?
How do we import the plain text into these tools?
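For Elasticsearch, one way to import plain text is the _bulk API, which accepts newline-delimited JSON: an action line followed by the document itself. The sketch below turns each non-blank line of a text file into one document. The index name tamil-literature and the fields author, title, line_no and text are hypothetical choices, not a schema that Project Madurai or Elasticsearch prescribes.

```python
import json
from typing import Iterable, Iterator


def to_bulk_ndjson(index_name: str, author: str, title: str,
                   lines: Iterable[str]) -> Iterator[str]:
    """Yield Elasticsearch _bulk NDJSON: an action line, then a document line, per text line.

    The field names author/title/line_no/text are hypothetical, not a required schema.
    """
    action = json.dumps({"index": {"_index": index_name}}) + "\n"
    for line_no, raw in enumerate(lines, start=1):
        text = raw.strip()
        if not text:
            continue  # skip blank lines but keep the original line numbering
        doc = {"author": author, "title": title, "line_no": line_no, "text": text}
        yield action
        # ensure_ascii=False keeps Tamil readable instead of \uXXXX escapes
        yield json.dumps(doc, ensure_ascii=False) + "\n"


def write_bulk_file(in_path: str, out_path: str, index_name: str,
                    author: str, title: str) -> None:
    """Convert one UTF-8 plain-text file into a bulk request body on disk."""
    with open(in_path, encoding="utf-8") as src, \
            open(out_path, "w", encoding="utf-8") as dst:
        dst.writelines(to_bulk_ndjson(index_name, author, title, src))
```

The resulting file could then be posted with something like curl -H "Content-Type: application/x-ndjson" -XPOST localhost:9200/_bulk --data-binary @bulk.ndjson, assuming a local node on the default port. --data-binary matters because it keeps the newlines intact. A corpus this large would need to be split into many smaller bulk requests.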
Please share your thoughts on this.
--
Regards,
T.Shrinivasan
My Life with GNU/Linux : http://goinggnu.wordpress.com
Free E-Magazine on Free Open Source Software in Tamil : http://kaniyam.com
Get Free Tamil Ebooks for Android, iOS, Kindle, Computer :
http://FreeTamilEbooks.com
_____________________________________
ILUGC List: //www.freelists.org/list/ilugc
ILUGC Web: http://ilugc.in/