
Django-haystack with whoosh or xapian, get around the LockErrors for a site with significant traffic

Posted by: skyl on Oct. 20, 2009

Whoosh is touted as the easy-to-install backend for django-haystack. It is really easy to get going, and the django-haystack documentation is great. Today you can basically just:

(env)$ pip install whoosh
(env)$ pip install -e git+git://github.com/toastdriven/django-haystack.git#egg=django-haystack

And, off you go.

But, for a multi-threaded WSGIDaemonProcess you might find problems like:

http://groups.google.com/group/django-haystack/browse_thread/thread/40882b1b6d89b66a

So, this site has some serious traffic (not skyl.org, I mean the one that I'm working on ;) ). First, let's switch to xapian. You can find the official haystack docs for installing the xapian backend:

http://haystacksearch.org/docs/installing_search_engines.html#xapian

If you're on a great OS like Ubuntu, however, you could get away with just installing what you need from the package manager:

$ sudo aptitude install python-xapian

This should get xapian and the python bindings for you. But, you will need xapian-haystack too.

http://github.com/notanumber/xapian-haystack

The README there says that you can use pip or easy_install, but I had no luck running those commands (we are working on the PyPI issue in IRC #haystack right now). Instead, I resorted to the good old:

git clone git://github.com/notanumber/xapian-haystack.git
cd xapian-haystack/
(env)$ python setup.py install # being in a virtualenv is good!

In my settings, I point to xapian as my backend instead of whoosh:

here = os.path.dirname(os.path.abspath(__file__))
HAYSTACK_SEARCH_ENGINE = 'xapian'
HAYSTACK_XAPIAN_PATH = here + '/search_index'
#HAYSTACK_SEARCH_ENGINE = 'whoosh'
#HAYSTACK_WHOOSH_PATH = here + '/search_index'

HAYSTACK_SITECONF = 'myproject.search_sites'
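
Gluing the path together with '/' works fine on Linux. As a small sketch of a slightly more portable variant, the index directory can be built with os.path.join instead. The helper name index_path is made up for this example and is not part of haystack:

```python
import os


def index_path(settings_file, dirname="search_index"):
    """Return the absolute path of `dirname` next to the given settings file."""
    here = os.path.dirname(os.path.abspath(settings_file))
    return os.path.join(here, dirname)


# In settings.py this could be used as (hypothetical usage):
# HAYSTACK_XAPIAN_PATH = index_path(__file__)
```

The resulting value is the same as here + '/search_index' on a POSIX system, so it is only a matter of taste on a typical server.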

Let's try something simple for search_sites.py to see if we have lift-off:

from haystack import site
from myproject.pages.models import Page

site.register(Page)

Now, run something like ./manage.py shell to see if everything can get imported and is working correctly. If you see something like:

django.core.exceptions.ImproperlyConfigured: 'xapian' isn't an available search backend. Available options are: 'dummy', 'solr', 'whoosh'

Well, then you might be trying this today, or too close to when I published this. The xapian-haystack and django-haystack trunks are not playing nice today, as django-haystack added the new SQ objects on Oct 18th, which xapian-haystack does not yet support. EDIT:

skyl: Good post. Just one thing; the log_query ImportError isn't related to the SQ change. The log_query method was added as part of another change to Haystack and is supported by xapian-haystack.

So, this is rocket science slightly beyond my understanding. How can I mark this up so that I can <strike> the last line? Anyhoo, rolling back works for today.

Remove the offending settings so that you can get into a ./manage.py shell and try to import xapian_backend, oops:

In [1]: import xapian_backend
---------------------------------------------------------------------------
ImportError                               Traceback (most recent call last)

/home/skylar/project/singapore/Oktosys-CMS/myproject/<ipython console> in <module>()

/home/skylar/project/singapore/env/lib/python2.6/site-packages/xapian_backend.py in <module>()
     30 from django.utils.encoding import smart_unicode, force_unicode
     31
---> 32 from haystack.backends import BaseSearchBackend, BaseSearchQuery, log_query
     33 from haystack.exceptions import MissingDependency
     34 from haystack.fields import DateField, DateTimeField, IntegerField, FloatField, BooleanField, MultiValueField

ImportError: cannot import name log_query

Okay, we can roll back django-haystack. First, find out where it is being imported from:

>>> import haystack
>>> haystack.__file__

Provided that you got your copy from git, you can git reset --hard to a working revision:

(env)skylar@ABC255:~/env/src/django-haystack/haystack$\
> git pull
(env)skylar@ABC255:~/env/src/django-haystack/haystack$\
> git reset --hard 2389ad090c9e4bfb069c4cfd9c94b5de84a6d38d
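
After the reset, a quick check from ./manage.py shell can confirm which copy of a module is being imported and whether it exposes the name that the traceback complained about. This is only a sketch: module_report is a made-up helper, and it relies on importlib, so it assumes Python 2.7 or newer rather than the 2.6 shown in the traceback.

```python
import importlib


def module_report(module_name, attr):
    """Return (path, has_attr) for a module, or (None, False) if it will not import."""
    try:
        module = importlib.import_module(module_name)
    except ImportError:
        return None, False
    # __file__ tells us which checkout or site-packages copy was picked up.
    return getattr(module, "__file__", None), hasattr(module, attr)


# Hypothetical usage inside the project's shell:
# module_report("haystack.backends", "log_query")
```

If the second value comes back False, importing xapian_backend will fail the same way as above.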

All ready to go? Activate xapian in the settings again and rock out!

Oh wait, under load with a multi-threaded, multi-process server we can run into LockError exceptions, as several workers try to update and reindex simultaneously. Bummer. There is a ticket so that the xapian index DB will handle this situation more gracefully. I've been told that this ticket is not a high priority because the workaround is not that hard.

We can stop the post_save and post_delete signals by subclassing haystack.indexes.SearchIndex (I would have thought indices, but what do I know?); the haystack tutorial covers the basics of SearchIndex. Since we are then no longer reindexing on every update, we can reindex with a cronjob instead. Your search_sites.py might look something like this:

from haystack import site
from haystack import indexes

from myproject.pages.models import Page

class NoSignalSearchIndex(indexes.SearchIndex):
    """
    A subclass of haystack's default SearchIndex that overrides the save
    and delete signals to prevent them from firing
    """

    def _setup_save(self, model):
        pass

    def _setup_delete(self, model):
        pass

    def _teardown_save(self, model):
        pass

    def _teardown_delete(self, model):
        pass

class MyIndex(NoSignalSearchIndex):
    text = indexes.CharField(document=True, use_template=True)

site.register(Page, MyIndex)
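
To see the idea behind the override in isolation, here is a toy sketch that runs without Django or haystack. FakeSignal, BaseIndex and NoSignalIndex are made-up stand-ins, not haystack's actual classes; the only point is that a hook which never connects a receiver means saves never reach the index.

```python
class FakeSignal(object):
    """Hypothetical stand-in for a Django signal: it just tracks receivers."""

    def __init__(self):
        self.receivers = []

    def connect(self, receiver):
        self.receivers.append(receiver)

    def send(self, instance):
        for receiver in self.receivers:
            receiver(instance)


post_save = FakeSignal()


class BaseIndex(object):
    """Stand-in for an index that updates itself on every save."""

    def __init__(self):
        self.updated = []
        self._setup_save()

    def _setup_save(self):
        # Hook: wire the index up to the save signal.
        post_save.connect(self.update_object)

    def update_object(self, instance):
        self.updated.append(instance)


class NoSignalIndex(BaseIndex):
    def _setup_save(self):
        pass  # never connect, so saves do not touch the index


live = BaseIndex()
quiet = NoSignalIndex()
post_save.send("page-1")  # live.updated gets the page, quiet.updated stays empty
```

The price is that the quiet index goes stale until something rebuilds it, which is what the cronjob is for.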

And, of course, create the template at templates/search/indexes/pages/page_text.txt (haystack looks for search/indexes/<app_label>/<model>_<field>.txt), with something like:

{{ object.slug }}
{{ object.title }}
{{ object.html }}

Hopefully now you can run ./manage.py reindex with impunity. Let's create a cronjob to run it regularly:

#!/bin/bash

# activate virtual environment
source /home/skyl/envs/foo-env/bin/activate

cd /path/to/proj
python manage.py reindex

Let's call this reindex.sh. Then we can run crontab -e from the command line and insert:

# m h  dom mon dow   command
* * * * * /path/to/reindex.sh

That is, if we want to reindex every minute.
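
One thing to watch with an every-minute schedule: if a reindex takes longer than a minute, cron will start a second one on top of it, and two writers on one index is exactly the situation we were trying to avoid. A minimal sketch of a guard, assuming a Unix system (it uses fcntl.flock), is below. The lock file location and the names try_lock and run_reindex are made up for this example.

```python
import fcntl
import os
import subprocess
import tempfile

# Hypothetical lock file location; any path writable by the cron user works.
LOCK_PATH = os.path.join(tempfile.gettempdir(), "reindex.lock")


def try_lock(path=LOCK_PATH):
    """Return an open, exclusively locked file, or None if someone holds the lock."""
    handle = open(path, "w")
    try:
        # LOCK_NB makes flock fail immediately instead of waiting.
        fcntl.flock(handle, fcntl.LOCK_EX | fcntl.LOCK_NB)
    except (IOError, OSError):
        handle.close()
        return None
    return handle


def run_reindex(command=("python", "manage.py", "reindex"), lock_path=LOCK_PATH):
    """Run the command unless a previous run still holds the lock.

    Returns the command's exit code, or None if the run was skipped.
    """
    lock = try_lock(lock_path)
    if lock is None:
        return None
    try:
        return subprocess.call(list(command))
    finally:
        lock.close()  # closing the file releases the lock
```

Saved next to manage.py, run_reindex() could be called from a small script that reindex.sh runs in place of the bare python manage.py reindex line.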

Whoosh! That was longer and harder than I was anticipating. Check back in a month or two and maybe it is all pip and easy_install and requiring no configuration!
