Whoosh is touted as the easy-to-install backend for django-haystack. It really is easy to get going, and the django-haystack documentation is great. Today you can basically just:
(env)$ pip install whoosh
(env)$ pip install -e git+git://github.com/toastdriven/django-haystack.git#egg=django-haystack
And, off you go.
But with a multi-threaded WSGIDaemonProcess, you might run into problems like this:
http://groups.google.com/group/django-haystack/browse_thread/thread/40882b1b6d89b66a
So, this site has some serious traffic (not skyl.org; I mean the one that I'm working on ;) ). First, let's switch to xapian. You can find the official haystack docs for installing the xapian backend here:
http://haystacksearch.org/docs/installing_search_engines.html#xapian
If you're on a great OS like Ubuntu, however, you can get away with just installing what you need from the package manager:
$ sudo aptitude install python-xapian
This should get xapian and its Python bindings for you. But you will need xapian-haystack too:
http://github.com/notanumber/xapian-haystack
The README there says that you can use pip or easy_install, but I had no luck with those commands (we are working on the PyPI issue in #haystack on IRC right now...), so I resorted to the good old way:
git clone git://github.com/notanumber/xapian-haystack.git
cd xapian-haystack/
(env)$ python setup.py install # being in a virtualenv is good!
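Before touching the settings, it can help to confirm that all three pieces are visible to the virtualenv's Python; for instance, a virtualenv created with --no-site-packages will not see the apt-installed python-xapian. The helper below is a hypothetical sketch, not part of haystack or xapian-haystack, and the module names it checks (xapian, xapian_backend, haystack) are assumptions based on the packages installed above. It only asks the import system whether each module can be found, without executing it.

```python
import importlib.util


def missing_modules(names):
    """Return the top-level module names the current interpreter cannot find.

    find_spec only locates a module; it does not import it, so an error
    raised inside the module itself will not show up here.
    """
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    # Assumed module names: the xapian bindings, the xapian-haystack
    # backend module and django-haystack itself.
    missing = missing_modules(["xapian", "xapian_backend", "haystack"])
    if missing:
        print("not findable from this interpreter:", ", ".join(missing))
    else:
        print("xapian, xapian_backend and haystack are all findable")
```

Run it with the virtualenv's python; an empty result only means the modules exist, not that they import cleanly.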
In my settings, I point to xapian as my backend instead of whoosh:
import os
here = os.path.dirname(os.path.abspath(__file__))
HAYSTACK_SEARCH_ENGINE = 'xapian'
HAYSTACK_XAPIAN_PATH = os.path.join(here, 'search_index')
#HAYSTACK_SEARCH_ENGINE = 'whoosh'
#HAYSTACK_WHOOSH_PATH = os.path.join(here, 'search_index')
HAYSTACK_SITECONF = 'myproject.search_sites'
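Since switching engines only touches two settings, one way to keep the xapian/whoosh toggle in a single place is a small helper like the hypothetical search_settings() below. It assumes the setting names used above (HAYSTACK_SEARCH_ENGINE plus HAYSTACK_XAPIAN_PATH or HAYSTACK_WHOOSH_PATH) and the same search_index directory; it is a sketch, not something haystack provides.

```python
import os


def search_settings(engine, base_dir):
    """Build the engine-specific haystack settings used in this post.

    Hypothetical helper: only the 'xapian' and 'whoosh' names from the
    settings above are supported, and the index lives in base_dir/search_index.
    """
    if engine not in ("xapian", "whoosh"):
        raise ValueError("unsupported search engine: %r" % (engine,))
    return {
        "HAYSTACK_SEARCH_ENGINE": engine,
        # Produces HAYSTACK_XAPIAN_PATH or HAYSTACK_WHOOSH_PATH.
        "HAYSTACK_%s_PATH" % engine.upper(): os.path.join(base_dir, "search_index"),
    }
```

In settings.py you could then write globals().update(search_settings('xapian', here)) in place of the two hand-written lines, and flip the string to go back to whoosh.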
Let's try something simple for search_sites.py to see if we have lift-off:
from haystack import site
from myproject.pages.models import Page
site.register(Page)
Now, run something like ./manage.py shell to see if everything can get imported and is working correctly. If you see something like:
django.core.exceptions.ImproperlyConfigured: 'xapian' isn't an available search backend. Available options are: 'dummy', 'solr', 'whoosh'
Well, then you might be trying this today or too close to when I published this. The xapian-haystack and django-haystack trunks are not playing nicely today: on Oct 18th, django-haystack added the new SQ objects, which xapian-haystack does not yet support.
EDIT:
skyl: Good post. Just one thing; the log_query ImportError isn't related to the SQ change. The log_query method was added as part of another change to Haystack and is supported by xapian-haystack.
So, this is rocket science slightly beyond my understanding. How can I mark this up so that I can <strike> that last sentence? Anyhoo, rolling back works for today.
Remove the offending settings so that you can get into ./manage.py shell, then try to import xapian_backend. Oops:
In [1]: import xapian_backend
---------------------------------------------------------------------------
ImportError Traceback (most recent call last)
/home/skylar/project/singapore/Oktosys-CMS/myproject/<ipython console> in <module>()
/home/skylar/project/singapore/env/lib/python2.6/site-packages/xapian_backend.py in <module>()
30 from django.utils.encoding import smart_unicode, force_unicode
31
---> 32 from haystack.backends import BaseSearchBackend, BaseSearchQuery, log_query
33 from haystack.exceptions import MissingDependency
34 from haystack.fields import DateField, DateTimeField, IntegerField, FloatField, BooleanField, MultiValueField
ImportError: cannot import name log_query
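Why did haystack claim 'xapian' isn't an available backend when the real problem was an ImportError inside xapian_backend? A plausible explanation is that the backend loader catches ImportError while trying candidate module names, so a module that exists but fails to import looks exactly like a missing one. The loader below is a simplified, hypothetical sketch written to illustrate that failure mode; it is not haystack's actual code, and its two name templates are assumptions based on the module names seen in this post.

```python
import importlib

# Hypothetical name templates: a bundled backend first, then a top-level
# module such as the xapian_backend.py that xapian-haystack installs.
DEFAULT_TEMPLATES = ("haystack.backends.%s_backend", "%s_backend")


def load_backend(name, templates=DEFAULT_TEMPLATES):
    """Import the first module matching one of the templates.

    Unlike a loader that swallows ImportError, this keeps every failure so
    that an error raised inside a module (e.g. "cannot import name
    log_query") is reported instead of looking like a missing backend.
    """
    failures = []
    for template in templates:
        module_name = template % name
        try:
            return importlib.import_module(module_name)
        except ImportError as exc:
            failures.append("%s: %s" % (module_name, exc))
    raise LookupError(
        "could not load search backend %r:\n  %s" % (name, "\n  ".join(failures))
    )
```

With a loader like this, the broken trunk would have surfaced the log_query ImportError directly instead of needing the manual import above.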
Okay, we can roll back django-haystack. First, find out where it is being imported from:
>>> import haystack
>>> haystack.__file__
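If haystack.__file__ points into a pip -e checkout (something like env/src/django-haystack/haystack/__init__.py), the repository root is the nearest parent directory containing .git. The hypothetical helper below walks up from a module's file to find it; it assumes the checkout has an ordinary .git directory (or .git file) at its root.

```python
import os


def find_git_root(path):
    """Return the nearest ancestor directory of path that contains .git, or None."""
    current = os.path.dirname(os.path.abspath(path))
    while True:
        if os.path.exists(os.path.join(current, ".git")):
            return current
        parent = os.path.dirname(current)
        if parent == current:  # reached the filesystem root without finding .git
            return None
        current = parent
```

Usage: import haystack, then print(find_git_root(haystack.__file__)) to get the directory to run git commands in.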
Provided that you installed it from git, you can git reset --hard to a working revision:
(env)skylar@ABC255:~/env/src/django-haystack/haystack$ git pull
(env)skylar@ABC255:~/env/src/django-haystack/haystack$ git reset --hard 2389ad090c9e4bfb069c4cfd9c94b5de84a6d38d
All ready to go? Activate xapian in the settings again and rock out!
Oh wait: under load, with a multi-threaded, multi-process server, we can run into LockError exceptions because several processes try to update and reindex the index simultaneously (xapian allows only one writer on a database at a time). Bummer. There is a ticket to make the xapian index handle this situation more gracefully, but I've been told it is not a high priority because the workaround is not that hard.
The workaround: stop haystack from updating the index on every post_save and post_delete signal by subclassing haystack.indexes.SearchIndex (I would have thought "indices", but what do I know?); check the haystack tutorial for the basics. Then, since we are no longer reindexing with every update, we can reindex with a cronjob. Your search_sites.py might look something like this:
from haystack import site
from haystack import indexes
from myproject.pages.models import Page
class NoSignalSearchIndex(indexes.SearchIndex):
    """
    A subclass of haystack's default SearchIndex that overrides the hooks
    which connect and disconnect the post_save and post_delete handlers,
    so saving or deleting a model no longer touches the index
    """
    def _setup_save(self, model):
        pass
    def _setup_delete(self, model):
        pass
    def _teardown_save(self, model):
        pass
    def _teardown_delete(self, model):
        pass
class MyIndex(NoSignalSearchIndex):
    text = indexes.CharField(document=True, use_template=True)
site.register(Page, MyIndex)
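The override above relies on haystack's SearchIndex calling hook methods such as _setup_save() to connect its signal handlers; turning those hooks into no-ops means nothing gets connected. The toy classes below mimic that hook pattern with a stand-in signal object. ToySignal, BaseIndex and NoSignalIndex are hypothetical and much simpler than haystack's real SearchIndex or Django's signals.

```python
class ToySignal:
    """Stand-in for a Django signal: just a list of receivers."""

    def __init__(self):
        self.receivers = []

    def connect(self, receiver):
        self.receivers.append(receiver)


post_save = ToySignal()


class BaseIndex:
    """Toy index that wires up its signal handler from __init__ via a hook."""

    def __init__(self, model):
        self.model = model
        self._setup_save(model)

    def _setup_save(self, model):
        # Default hook: reindex the object every time it is saved.
        post_save.connect(self.update_object)

    def update_object(self, instance):
        print("reindexing", instance)


class NoSignalIndex(BaseIndex):
    def _setup_save(self, model):
        # Overridden hook: connect nothing and leave indexing to a cron job.
        pass
```

Creating a BaseIndex adds one receiver to post_save; creating a NoSignalIndex adds none, which is the whole trick.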
And, of course, create the template in templates/search/indexes/pages/page_text.txt, something like:
{{ object.slug }}
{{ object.title }}
{{ object.html }}
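Haystack finds that template by convention rather than configuration: for a field declared with use_template=True, it is assumed to look under search/indexes/ using the model's app label, the lowercased model name and the field name. The function below is a hypothetical sketch of that naming rule; it shows why a Page model in the myproject.pages app ends up in a pages/ directory.

```python
def index_template_name(app_label, model_name, field_name):
    """Template path haystack is assumed to look up for a use_template field."""
    return "search/indexes/%s/%s_%s.txt" % (app_label, model_name.lower(), field_name)


# A Page model in the myproject.pages app, indexed through its 'text' field.
print(index_template_name("pages", "Page", "text"))  # → search/indexes/pages/page_text.txt
```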
Hopefully now you can run ./manage.py reindex with impunity. Let's create a cronjob to run it regularly:
#!/bin/bash
# activate virtual environment
source /home/skyl/envs/foo-env/bin/activate
cd /path/to/proj
python manage.py reindex
Let's call this reindex.sh and make it executable (chmod +x reindex.sh). Then we can run crontab -e from the command line and insert:
# m h dom mon dow command
* * * * * /path/to/reindex.sh
That entry runs the script every minute, if that is how often we want to reindex.
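One caveat with an every-minute schedule: if a reindex ever takes longer than a minute, cron will start a second one, and two writers on the same xapian database bring back the very LockError we were avoiding. A minimal guard, sketched here in Python under the assumption that only this job writes the index, is to take an exclusive, non-blocking flock on a lock file and skip the run if a previous one still holds it. The run_exclusive() name and the lock file path are hypothetical.

```python
import errno
import fcntl
import subprocess


def run_exclusive(lock_path, command):
    """Run command only if no other process holds an flock on lock_path.

    Returns the command's exit status, or None if the run was skipped.
    """
    with open(lock_path, "w") as lock_file:
        try:
            # Non-blocking exclusive lock: fail immediately if already held.
            fcntl.flock(lock_file, fcntl.LOCK_EX | fcntl.LOCK_NB)
        except OSError as exc:
            if exc.errno in (errno.EAGAIN, errno.EWOULDBLOCK, errno.EACCES):
                return None  # a previous reindex is still running
            raise
        # The lock is released when lock_file is closed at the end of the with-block.
        return subprocess.call(command)
```

The crontab line would then point at a small Python script that calls something like run_exclusive("/tmp/reindex.lock", ["/path/to/reindex.sh"]) instead of calling reindex.sh directly.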
Whoosh! That was longer and harder than I anticipated. Check back in a month or two; maybe it will all be pip and easy_install with no configuration required!
By far the easiest way to install xapian on OS X, especially with 64-bit Python, is simply:
sudo brew install xapian --python