Using Zend Search Lucene in a symfony app

by Dave Dash 25Aug06


If you're like me, you've probably followed the Askeet tutorial on Search in order to create a decent search engine for your web app. It's fairly straightforward, but they hinted that when Zend Search Lucene (ZSL) is released, that might be the way to go. Well, we are in luck: ZSL is available, so let's dive right in.

If you aren't using symfony, have a look at this article from the Zend Developer Zone. It covers just enough to get you started. If you are using symfony, just follow along and we'll get you where you need to go.

Obtaining Zend Search Lucene

First download the Zend Framework (ZF). The Zend Framework is supposed to be fairly "easy" in terms of installation, so let's put that to the test. Open your ZF archive and copy Zend.php, Zend/Search and Zend/Exception.php to your symfony project's library folder:

cp Zend.php $SF_PROJECT/lib
mkdir $SF_PROJECT/lib/Zend
cp -r Zend/Search $SF_PROJECT/lib/Zend
cp Zend/Exception.php $SF_PROJECT/lib/Zend
chmod -R a+r $SF_PROJECT/lib/Zend*

Index Something

We'll deviate slightly from food-themed tutorials and do something generic. Let's try a user search where we can find a user by their name or email address. It's fairly simple to accomplish, and hardly requires the use of ZSL, but by using ZSL we can easily extend it to do a full-text search of a user's profile or any other textual data.

Each "thing" stored in the index is a "document" in ZSL, specifically a Zend_Search_Lucene_Document. Each document then consists of several "fields" (Zend_Search_Lucene_Field objects). In our example, our document will be an individual user and the fields will be relevant attributes of the user (username, first name, last name, email, the text of their profile).

We're going to write a general re-indexing tool, something that will index all users.

In our userActions class let's add the following action:
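The action's code did not survive in this copy of the post. Here is a minimal sketch of what such a re-indexing routine could look like, written as a plain function so it can be read on its own. The function name, the getter names and the `user_id` field name are assumptions, not the original code; `user_id` is used instead of `id` because, as the comments below point out, `$hit->id` is reserved by ZSL.

```php
<?php
// Sketch only: reindexAllUsers() and the User getters are assumed names.
function reindexAllUsers(array $users, $indexPath)
{
    // Load the Lucene classes unless something else already did.
    if (!class_exists('Zend_Search_Lucene')) {
        require_once 'Zend/Search/Lucene.php';
    }

    // Second argument true: create a new index at $indexPath.
    $index = new Zend_Search_Lucene($indexPath, true);

    foreach ($users as $user) {
        $doc = new Zend_Search_Lucene_Document();

        // Keyword: stored and indexed as-is, not tokenized.
        $doc->addField(Zend_Search_Lucene_Field::Keyword('user_id', $user->getId()));

        // Text: stored, indexed and tokenized.
        $doc->addField(Zend_Search_Lucene_Field::Text('username', $user->getUsername()));
        $doc->addField(Zend_Search_Lucene_Field::Text('firstname', $user->getFirstname()));
        $doc->addField(Zend_Search_Lucene_Field::Text('lastname', $user->getLastname()));
        $doc->addField(Zend_Search_Lucene_Field::Text('email', $user->getEmail()));

        // UnStored: indexed and tokenized, but not stored in the index.
        $contents = implode(' ', array(
            $user->getUsername(),
            $user->getFirstname(),
            $user->getLastname(),
            $user->getEmail(),
        ));
        $doc->addField(Zend_Search_Lucene_Field::UnStored('contents', $contents));

        $index->addDocument($doc);
    }

    // Write the added documents to the index.
    $index->commit();

    return count($users);
}
```

Inside a symfony action this might be called as `reindexAllUsers(UserPeer::doSelect(new Criteria()), sfConfig::get('app_search_user_index_file'))`, assuming `UserPeer` is the Propel peer class for your users.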

The code should be fairly easy to follow. First, we require the necessary libraries for Lucene. On the next line, we create the index:
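The index-creation line is missing from this copy, so here is a hedged sketch of it. It assumes the constructor-based API of the early Zend Framework previews this post was written against (later releases changed how indexes are created and opened, see the comments below). `createUserIndex()` is a hypothetical wrapper, and the app.yml layout in the comment is an assumption about how the setting would be declared.

```php
<?php
// Sketch only: createUserIndex() is a hypothetical wrapper.
// Assumed app.yml entry behind app_search_user_index_file:
//   all:
//     search:
//       user_index_file: /tmp/lucene.user.index
function createUserIndex($indexPath)
{
    // Load the Lucene classes unless something else already did.
    if (!class_exists('Zend_Search_Lucene')) {
        require_once 'Zend/Search/Lucene.php';
    }

    // Passing true as the second argument creates a new index.
    return new Zend_Search_Lucene($indexPath, true);
}

// Inside a symfony action this would be called roughly as:
//   $index = createUserIndex(sfConfig::get('app_search_user_index_file'));
```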

app_search_user_index_file is a symfony configuration setting that you define in your app.yml. It defines which file you want to use for your index; /tmp/lucene.user.index works for our purposes. The second parameter tells Lucene we are creating a new index.

We then loop through all the users and create a document for each one. For every search-relevant attribute that a user might have, we add a field to the document. Note the last field:
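The field in question is a catch-all "contents" field. As a rough sketch (the helper names are hypothetical, and which attributes go into the field is your call), it can be built by joining the searchable values into one string and adding that string as an UnStored field, which is indexed but not kept in the index:

```php
<?php
// Sketch only: buildContents() and addContentsField() are hypothetical helpers.
function buildContents(array $values)
{
    $parts = array();
    foreach ($values as $value) {
        // Skip empty or missing attributes so they leave no gaps.
        $value = trim((string) $value);
        if ($value !== '') {
            $parts[] = $value;
        }
    }

    // One space-separated string holding everything searchable.
    return implode(' ', $parts);
}

function addContentsField($doc, array $values)
{
    // Assumes the Zend_Search_Lucene classes are already loaded.
    $doc->addField(Zend_Search_Lucene_Field::UnStored('contents', buildContents($values)));
}
```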

By default, a search runs against the "contents" field. So in this example we want people to be able to type in someone's name, email or username without having to specify which field we're searching.

Find those users

Finding the users is equally straightforward. We make a new action called search:
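The search action's code is also missing here. A minimal sketch of the same idea as plain functions follows; `normalizeQuery()` and `searchUsers()` are hypothetical names, and in a symfony action the raw query would come from the request and the path from `sfConfig`.

```php
<?php
// Sketch only: both function names are hypothetical.
function normalizeQuery($rawQuery)
{
    // Lowercase the query, because matching is case sensitive.
    return strtolower(trim($rawQuery));
}

function searchUsers($rawQuery, $indexPath)
{
    $query = normalizeQuery($rawQuery);
    if ($query === '') {
        // Nothing to search for, so don't touch the index at all.
        return array();
    }

    if (!class_exists('Zend_Search_Lucene')) {
        require_once 'Zend/Search/Lucene.php';
    }

    // Only one argument this time: open the existing index.
    $index = new Zend_Search_Lucene($indexPath);

    // Returns an array of hit objects.
    return $index->find($query);
}
```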

If we have a query, open the ZSL index (note that we only have one parameter here). Run the find method on our query and store the results in the $hits array. Note that our query was cleaned with strtolower, since ZSL is case sensitive.

The template takes care of the rest:
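The template itself is not preserved in this copy. As a rough stand-in, here is a sketch of a function that turns the hits into an HTML list. `renderHits()` is a hypothetical name, and it assumes each hit exposes the stored `username`, `firstname` and `lastname` fields as properties; a real symfony template would more likely loop over `$hits` directly and use the link helpers.

```php
<?php
// Sketch only: renderHits() is a hypothetical helper, not a symfony template.
function renderHits(array $hits)
{
    if (count($hits) === 0) {
        return "<p>No users found.</p>\n";
    }

    $html = "<ul>\n";
    foreach ($hits as $hit) {
        // Stored fields are read as properties of the hit; escape them for HTML.
        $name = htmlspecialchars($hit->firstname . ' ' . $hit->lastname);
        $html .= '  <li>' . htmlspecialchars($hit->username) . ' (' . $name . ")</li>\n";
    }

    return $html . "</ul>\n";
}
```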

Fairly simple... but it could use some cleaning up (enjoy).

What about new users?

Regularly reindexing might be nice in terms of having an optimized search index, but it's lousy if you want to be able to search the network immediately when new people join. So why not automatically re-index each user every time they are created or every time one of their indexed attributes changes?

This should be fairly simple by adding to the User class:
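The additions to the User class are missing from this copy. The sketch below shows only the flag handling, in a standalone class so it can be read in isolation. In the real Propel model the setters would override the generated ones and call `parent::setFirstname()` and so on; the class name, `needsReindex()` and the choice to compare old and new values are all assumptions.

```php
<?php
// Sketch only: a stand-in for the Propel User class.
class ReindexFlagExample
{
    protected $firstname = '';
    protected $email = '';

    // True once an indexed attribute has changed since the last save.
    protected $reindex = false;

    public function setFirstname($value)
    {
        if ($value !== $this->firstname) {
            $this->reindex = true;
        }
        $this->firstname = $value;
    }

    public function setEmail($value)
    {
        if ($value !== $this->email) {
            $this->reindex = true;
        }
        $this->email = $value;
    }

    public function needsReindex()
    {
        return $this->reindex;
    }
}
```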

We have an attribute called $reindex. When it is false we don't need to worry about indexes. When something significant changes, like an update to your name or email address, then we set $reindex to true. Then when we save:
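The save method is not preserved here either. A hedged sketch of the index update it would perform is below: remove any document already stored for this user, then add a fresh one. `syncUserToIndex()` is a hypothetical helper; the term query on `user_id` follows a suggestion from the comments below rather than the original post, and `$hit->id` is ZSL's internal document number, not the user's id.

```php
<?php
// Sketch only: syncUserToIndex() is a hypothetical helper. It assumes the
// Zend_Search_Lucene classes are loaded and $index is an already opened index.
function syncUserToIndex($user, $index)
{
    // Look up stale documents for this user with a term query on user_id.
    $term  = new Zend_Search_Lucene_Index_Term((string) $user->getId(), 'user_id');
    $query = new Zend_Search_Lucene_Search_Query_Term($term);

    foreach ($index->find($query) as $hit) {
        // $hit->id is the index's internal document number.
        $index->delete($hit->id);
    }

    // Add the current version of the user and write the changes.
    $index->addDocument($user->generateZSLDocument());
    $index->commit();
}

// In User::save() this could be used roughly like so:
//   parent::save($con);
//   if ($this->reindex) {
//       syncUserToIndex($this, $index);
//       $this->reindex = false;
//   }
```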

We're calling a new function called generateZSLDocument. It might look familiar:
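The method body is missing from this copy. In the post it is a method on User that works on `$this`; the sketch below writes the same idea as a free function that takes the user as an argument. The function name, the getters and the `user_id` field name are assumptions.

```php
<?php
// Sketch only: a free-function version of the generateZSLDocument() idea.
// Assumes the Zend_Search_Lucene classes are already loaded.
function generateUserDocument($user)
{
    $doc = new Zend_Search_Lucene_Document();

    // The user's own id, stored so a hit can be mapped back to the user.
    $doc->addField(Zend_Search_Lucene_Field::Keyword('user_id', $user->getId()));

    // Searchable, stored attributes.
    $searchable = array(
        'username'  => $user->getUsername(),
        'firstname' => $user->getFirstname(),
        'lastname'  => $user->getLastname(),
        'email'     => $user->getEmail(),
    );
    foreach ($searchable as $name => $value) {
        $doc->addField(Zend_Search_Lucene_Field::Text($name, $value));
    }

    // Catch-all field for queries that don't name a field.
    $doc->addField(Zend_Search_Lucene_Field::UnStored('contents', implode(' ', $searchable)));

    return $doc;
}
```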

Now, whenever a user is updated, so is our index. Additionally we can modify our reindex action:
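The modified action is missing from this copy, but with the document building moved into the model the loop shrinks to a few lines. A sketch, assuming each user object offers `generateZSLDocument()` and that `$index` was freshly created for the rebuild; `reindexWithDocuments()` is a hypothetical name.

```php
<?php
// Sketch only: reindexWithDocuments() is a hypothetical helper.
function reindexWithDocuments(array $users, $index)
{
    foreach ($users as $user) {
        // Each user now knows how to describe itself to the index.
        $index->addDocument($user->generateZSLDocument());
    }

    // Write all added documents in one go.
    $index->commit();

    return count($users);
}
```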

That's a lot easier to deal with.

...and beyond

Hope this article helps some of you jumpstart your symfony apps. Really cool, easy to implement search is here. We no longer have to stick with shoddy solutions like HT://Dig or spend time rolling our own full text search, as the symfony team diligently showed us we could. But there is a lot more ground to cover, including optimization techniques and best practices.

Let me know what you think, and if you use this in any of your apps.


"Using Zend Search Lucene in a symfony app" is filed under programming, reviewsby.us and symfony. It was published in August 2006.

25 Responses to “Using Zend Search Lucene in a symfony app”


  1. 1 Hector Posted August 30th, 2006 - 10:20 am

    In public function save I had to change: $index->find('id:' . $this->getId()); to $hits = $index->find('id:' . $this->getId());

    Worked great otherwise – thanks for the tutorial!

  2. 2 Dave Dash Posted August 30th, 2006 - 11:47 am

    Oops, you are correct hector, I think I fixed my own code, but not the tutorial, I’ll fix that now.

  3. 3 joesimms Posted August 31st, 2006 - 4:43 pm

    Great article many thanks.

    is it possible to combine this with ranges such as date from and date to, and price less than, price over etc. Also can you sort the result other than by relevancy. I know these are advanced topics, but they are the reasons why i am sticking to writing my own full text search functionality from a database as db query can do this.

    I am not sure if such features are available in ZSL yet, but if they are it would be great if you could add an example here.

    Thanks again

    Joe

  4. 4 Dave Dash Posted August 31st, 2006 - 5:23 pm

    Joe,

    Excellent question. First of all range queries are doable, of course they aren’t documented in Zend Search Lucene as near as I can tell. However, looking at the Lucene documentation from Apache, in our example above:

    firstname:{chris TO dave}

    will match the following (scores included):

    * 0.39374274034618  chrish
    * 0.14888978641659  davedash2
    * 0.14888978641659  davedash3
    * 0.14888978641659  davedash
    

    I honestly must say, that this is news to me, I didn’t think it would work, but it does, and that’s awesome!

    As far as sorting goes, I personally would handle the sorting within the application… since I use propel I could take the $hits array and do something quick and dirty like:

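    $ids = array();
    foreach ($hits AS $hit) 
    {
      // Cast to int so only numeric ids end up in the SQL string.
      $ids[] = (int) $hit->user_id;
    }
    
    // Assumes $ids is not empty; "IN ()" is not valid SQL.
    $sql = "SELECT user.* FROM user WHERE id IN (" . join(',', $ids) . ") ORDER BY user.LASTNAME";
    
    $con = Propel::getConnection();
    $stmt = $con->createStatement();
    $rs = $stmt->executeQuery($sql, ResultSet::FETCHMODE_NUM);
    
    // populateObjects() lives on the Propel peer class.
    return UserPeer::populateObjects($rs);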
    

    That’s a bit rough around the edges, and kinda clunky (It’s using join it must be clunky), but it will do you right. Let me know if that works.

  5. 5 PeterVG Posted September 8th, 2006 - 7:13 pm

    Dave. Thanks for a really useful tutorial. It helped me off to a great start on integrating Zend Search in my Symfony app.

    PLEASE NOTE: I tried to use the $hit->id value in my search results (e.g. to create a link to the detailed User record: ‘/user/show/id/1′) but I discovered that $hit->id appears to be a reserved property within Zend Search (used to uniquely identify each search result hit within Zend Search itself).

    See Zend Search documentation here.

    Therefore, I just had to rename this field when creating the Zend Lucene Document, e.g. 'user_id' instead of 'id':

    $doc->addField(Zend_Search_Lucene_Field::Keyword('user_id', $this->getId()));

    This will be something to keep in mind for Symfony users as 'id' is recommended as the default unique identifier name for all Propel data model objects.

  6. 6 Dave Dash Posted September 8th, 2006 - 8:22 pm

    Hey Peter… I noticed this mistake myself. You’re absolutely right, $hit->id is reserved. I changed my own code to be $hit->user_id.

    -d

  7. 7 PeterVG Posted September 19th, 2006 - 6:52 pm

    Just to follow up on the 'id' issue with something that both Dave and I discovered subsequently. Don't use 'id' to name your Symfony id field. Zend Search won't be able to search on any fields that explicitly have 'id' in their name. Therefore, you have to rename the field to something like 'uid'.

    I ran into this problem when I tried to delete a ‘document’ from the Zend Search index using the ‘user_id’ field. It wouldn’t generate a hit until I renamed it to ‘uid’.

  8. 8 Peter Van Garderen Posted March 8th, 2007 - 9:00 pm

    Dave,

    I recently upgraded Zend Search from version 0.1.5 to 0.8.0. There are a number of changes to make note of if you are using the example in your tutorial here. I’ve posted them here.

    Cheers,

    –peterVG

  9. 9 spinner Posted April 26th, 2007 - 6:04 am

    First of all thank you for this great tutorial. It was really a piece of cake to install this search.

    But I've got a question about the part where you delete the index entries for users that already exist:

    $hits = $index->find('id:' . $this->getId());
    foreach ($hits AS $hit) { $index->delete($hit->id); }

    Since id is a reserved term in Lucene, and shouldn't be used for the user id, the command "$hits = $index->find('id:' . $this->getId());" doesn't really work (id is the internal id of Lucene). So the code should be something like this:

    $term = new Zend_Search_Lucene_Index_Term($this->getId(), 'userid');
    $query = new Zend_Search_Lucene_Search_Query_Term($term);
    $hits = $index->find($query);
    foreach ($hits AS $hit) { $index->delete($hit->id); }

    Hope this helps…

  10. 10 Martin Poirier Théorêt Posted May 10th, 2007 - 9:20 pm

    Great tutorial, I’m building a great plugin that will offer all common feature that a website’s need like a search engine. I’m already able to change symfony builders and generators… It will be no trouble for me to set the symfony default id to uid to be sure that the Zend Search Lucene is working properly.

    Before I make this kind of modification, can someone confirm that using uid instead of id will fix the problem? That's the kind of thing that needs to stay unchanged once the project has started…

    Thanks for your help

  11. 11 Dave Dash Posted May 10th, 2007 - 9:24 pm

    Martin,

    I’m not fully sure why you’d want to do that, but first look at some of my later posts tagged as “lucene” before you get started.

  12. 12 Benjamin Luby Posted July 5th, 2007 - 2:42 am

    Sorry if this question is too stupid, but is Zend Search Lucene also a good choice for searching in a MySQL table (with or without fulltext fields)? I am looking for a fast and robust search application for an article database (approx. 800000 rows), but the terminology of Lucene is a bit irritating: is it only possible to search in documents? But who still has a lot of static text or HTML files these days?

    Thanks for hopefully enlightening me ;-)

    Ben

  13. 13 Martin Poirier Théorêt Posted July 14th, 2007 - 4:48 pm

    Hi dave,

    I'm working on the integration of Zend Search Lucene and everything is working fine except for two points (which are obviously really important).

    A number search is not working:

    Example

    $query = 'contents:999'

    The contents field is text (and is the default field anyway), but the query doesn't return anything even though 999 is present.

    The other thing: I need to specify a limit/offset like in a SQL query. Since in my case a search can return 10,000 results, I would need a paginator, and obviously I cannot load everything into memory on every search to do my own pagination. I didn't find anything on the web, so I'm planning to build my own search system in the database. That will work, but it will be a lot of work, since everything else is already working well (apart from these two exceptions).

    I have another issue, which is about searching between two dates. Do you have a suggestion? Or is it still a dead end?

  14. 14 Greg Herrington Posted September 4th, 2007 - 4:27 pm

    In version 1.0.1 of the Zend Framework the Zend.php file is no longer used. So… do this:

    ini_set('include_path', ini_get('include_path') . ":/path/to/lib:");
    ini_set('include_path', ini_get('include_path') . ":/path/to/lib/Zend:");

    include 'Loader.php';

    Zend_Loader::loadClass('Zend_Search_Lucene');
    Zend_Loader::loadClass('Zend_Search_Exception');

    I think. :)

    cheers,

  15. 15 Marcel Posted October 22nd, 2007 - 11:44 am

    This tutorial is correct and I do the same, but I hit the same issue every time. I start with some init, then the following statements:

    public function indexAction()
    {
        $sql='SELECT c_name, tKnownledgeID, c_descript FROM tKnownledge ';
        $where='';
        $res = $this->connector->fetchAll($sql, $where);            
        foreach ($res as $index => $row){
            $doc = new Zend_Search_Lucene_Document();
            $doc->addField(Zend_Search_Lucene_Field::UnIndexed('url',$row['tKnownledgeID']));
            $doc->addField(Zend_Search_Lucene_Field::UnIndexed('created', '2007-10-23'));
            $doc->addField(Zend_Search_Lucene_Field::Text('title', $row['c_name']));
            $doc->addField(Zend_Search_Lucene_Field::UnIndexed('author', "Eurordis"));
            $doc->addField(Zend_Search_Lucene_Field::Text('comments', $row['c_descript']));         
            $this->view->indexer->addDocument($doc);
        }
        $this->view->indexer->commit(); 
    }
    

    I always get an exception: Index compound file doesn't contain _0.del file.

    Does somebody have an idea?

    Thanks for your help

    Marcel

  16. 16 ruzz Posted February 22nd, 2008 - 2:36 pm

    as always, you rock my world mr. dash.

  17. 17 Dave Dash Posted February 22nd, 2008 - 4:03 pm

    Hehe… I saw you twittering about Lucene ;) It’s good stuff.

  18. 18 Senthil Kumar R Posted May 26th, 2009 - 6:30 am

    Hi,

    I didn't find Zend.php in the ZF archive. Will it work without Zend.php? Can anyone please help me?

    Thanks Senthil

  19. 19 Rodd Posted August 3rd, 2009 - 3:37 pm

    @Senthil I believe all you need is to copy:

    - the /Zend/Search folder
    - /Zend/Exception.php

Who's linking?

  1. 1 PHPDeveloper.org Trackback on Aug 28th, 2006
    "SpinDrop.us: Using Zend Search Lucene in a symfony app... ... "
  2. 2 archivematica » Zend Search Lucene, Symfony and the ICA-AtoM application Pingback on Mar 19th, 2007
    "[...] About six months ago Dave Dash posted a great little tutorial demonstrating how to integrate the Zend Framework’s Search ... "
  3. 3 Full-text search using Apache Lucene search engine | my-whiteboard Pingback on Apr 6th, 2007
    "[...] 4, Using Zend Search Lucene in a symfony app [...] "
  4. 4 sfZendPlugin at Spindrop Pingback on Apr 10th, 2007
    "[...] originally intended to rewrite my Zend Search Lucene tutorial, but Peter Van Garderen covered the bulk of what’s changed ... "
  5. 5 Medieval Programming » Blog Archive » Integrating Lucene into Symfony - a wrap up Pingback on Sep 15th, 2007
    "[...] Dave Dash provided the initial tutorial, based on some old ZSL implementation [...] "
  6. 6 Symfony - Eine Volltextsuche muss her! | ausgebloggt.de Pingback on Oct 12th, 2007
    "[...] der Klasse aus dem Zend Framework. Den Weg dazu beschreibt Johannes Schmidt hier im t8d-Blog oder auch hier Dave ... "
