The Lucene Search Index and symfony

by Dave Dash, April 23, 2007

Tags: Zend, Zend Search Lucene, Search, Lucene, php, symfony, zsl, index

This article follows up on sfZendPlugin, where we learned a newer way of obtaining the Zend Framework.

In this tutorial we're going to delve into the Lucene index. Zend Search Lucene (ZSL) relies on building a Lucene index: a directory of files that can be written and queried by Lucene or any of its ports. In our example we'll be creating a search for user profiles.

We'll store the precise location of this index directory in our app.yml so we can refer to it throughout our app[1].

Here's an example:

all:
  search:
    user_index: /tmp/myapp.user.lucene.index

Now, whenever we need to refer to the index, we can call sfConfig::get('app_search_user_index').
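As a small illustration, here is a hypothetical helper, openUserIndex() (our own name, not part of symfony or Zend), that opens the index at the configured path and falls back to creating an empty one if nothing is there yet. It assumes Zend Framework 1's Zend_Search_Lucene is on the include path, for example via sfZendPlugin:

```php
<?php
require_once 'Zend/Search/Lucene.php';

/**
 * Hypothetical helper (not part of symfony or Zend): open the Lucene index
 * stored in $path, or create an empty one there if none exists yet.
 */
function openUserIndex($path)
{
  try
  {
    return Zend_Search_Lucene::open($path);
  }
  catch (Zend_Search_Lucene_Exception $e)
  {
    // open() throws when there is no readable index in the directory,
    // so start a fresh, empty one instead.
    return Zend_Search_Lucene::create($path);
  }
}

// In the symfony app the path comes from app.yml:
// $index = openUserIndex(sfConfig::get('app_search_user_index'));
```

Keep in mind that create() wipes whatever is in the directory, so if an existing index is merely unreadable this helper will quietly replace it; you may prefer to let that exception propagate instead.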

Index Something

Let's build a user search that finds a user by name or email address. It's fairly simple to accomplish and hardly requires ZSL, but by using ZSL we can easily extend it later to a full-text search of a user's profile or any other textual data.

Each "thing" stored in the index is a Lucene "document". Each document then consists of several "fields" (Zend_Search_Lucene_Field objects). In our example, each document will be an individual user and the fields will be relevant attributes of the user (username, first name, last name, email, and the text of their profile).

First we'll want to populate our index. We may also want to regularly reindex all the users at once to keep search performance optimal. Since reindexing involves all of our users rather than a single one, it makes sense to put a static reindex() method in our UserPeer class[2].
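A minimal sketch of such a method, assuming symfony 1.0 with Propel, Zend_Search_Lucene loadable (for example through sfZendPlugin), and the app_search_user_index setting from above:

```php
<?php
// lib/model/UserPeer.php (sketch)
require_once 'Zend/Search/Lucene.php';

class UserPeer extends BaseUserPeer
{
  /**
   * Rebuild the user search index from scratch.
   */
  public static function reindex()
  {
    // create() starts a brand-new, empty index, replacing any existing one.
    $index = Zend_Search_Lucene::create(sfConfig::get('app_search_user_index'));

    // Fetch every user and add one Lucene document per user.
    foreach (self::doSelect(new Criteria()) as $user)
    {
      $index->addDocument($user->generateZSLDocument());
    }

    // Flush the buffered documents to disk.
    $index->commit();
  }
}
```

doSelect() hydrates every user in one go, which is fine for modest tables; for very large ones, see note 2.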

Very simply, we create a new index, fetch all the users, add a document to the index for each user, and then commit the index to disk. You might have noticed a strange method, User::generateZSLDocument(). This method contains all the magic. To avoid repeating ourselves, we keep the internals of building a document for the Lucene index in the User class itself. Let's look at it:
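Here is one possible version. The accessor names (getUsername(), getFirstName(), getLastName(), getEmail(), getProfile()) are assumptions about the user table's schema, so adjust them to match your own columns:

```php
<?php
// lib/model/User.php (sketch)
require_once 'Zend/Search/Lucene.php';

class User extends BaseUser
{
  /**
   * Build a Lucene document describing this user.
   */
  public function generateZSLDocument()
  {
    $doc = new Zend_Search_Lucene_Document();

    // Keyword: stored and indexed as one untokenized term, so the primary
    // key can be matched exactly and read back from search hits.
    // Named uid because id clashes with the hit's internal document number.
    $doc->addField(Zend_Search_Lucene_Field::Keyword('uid', (string) $this->getId()));

    // Text: tokenized, indexed and stored, so these are searchable and can
    // be displayed straight from a hit without touching the database.
    $doc->addField(Zend_Search_Lucene_Field::Text('username', $this->getUsername()));
    $doc->addField(Zend_Search_Lucene_Field::Text('first_name', $this->getFirstName()));
    $doc->addField(Zend_Search_Lucene_Field::Text('last_name', $this->getLastName()));
    $doc->addField(Zend_Search_Lucene_Field::Text('email', $this->getEmail()));

    // UnStored: tokenized and indexed but not stored, which keeps long
    // profile text searchable without keeping a copy of it in the index.
    $doc->addField(Zend_Search_Lucene_Field::UnStored('profile', $this->getProfile()));

    return $doc;
  }
}
```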

We're really just dumping the relevant search terms into this document. The beauty of keeping this code inside the User class is that we can reuse it later when we need to index a single user at a time.
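For instance, refreshing one user after an edit could look like the following. Zend Search Lucene has no in-place update, so the sketch deletes any existing document for the user and adds a fresh one. reindexUser() is a hypothetical helper name, and the same symfony, Propel and Zend assumptions apply:

```php
<?php
require_once 'Zend/Search/Lucene.php';

/**
 * Hypothetical helper: replace one user's document in the search index.
 */
function reindexUser(User $user)
{
  $index = Zend_Search_Lucene::open(sfConfig::get('app_search_user_index'));

  // Match the user's existing document(s) on the exact uid term.
  $term = new Zend_Search_Lucene_Index_Term((string) $user->getId(), 'uid');
  foreach ($index->find(new Zend_Search_Lucene_Search_Query_Term($term)) as $hit)
  {
    // $hit->id is Lucene's internal document number, which delete() expects.
    $index->delete($hit->id);
  }

  // Same document builder as the bulk reindex, so both stay in sync.
  $index->addDocument($user->generateZSLDocument());
  $index->commit();
}
```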

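A couple of things to note. Zend_Search_Lucene_Field::Keyword stores a value that is indexed as a single, untokenized term, so we can both look it up exactly and read it back later. We store User::id in a field called uid because id is effectively reserved: on a search hit, $hit->id is Lucene's internal document number, so a field named id can't be reached that way from Zend Search Lucene.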
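Because uid is stored, a search hit can hand back each user's primary key. The sketch below (searchUsers() is a hypothetical name) runs a query, collects the uid values and loads the matching User objects through the Propel-generated UserPeer::retrieveByPKs(). In Zend Search Lucene a query with no field prefix, such as smith, is matched against all indexed fields by default, while last_name:smith restricts it to one field:

```php
<?php
require_once 'Zend/Search/Lucene.php';

/**
 * Hypothetical helper: return the User objects matching $queryString,
 * best match first.
 */
function searchUsers($queryString)
{
  $index = Zend_Search_Lucene::open(sfConfig::get('app_search_user_index'));

  // Hits come back sorted by score; each exposes our stored fields as
  // properties, so $hit->uid is the primary key we indexed.
  $ids = array();
  foreach ($index->find($queryString) as $hit)
  {
    $ids[] = $hit->uid;
  }
  if (empty($ids))
  {
    return array();
  }

  // retrieveByPKs() doesn't preserve the order of $ids, so re-sort by score.
  $byId = array();
  foreach (UserPeer::retrieveByPKs($ids) as $user)
  {
    $byId[$user->getId()] = $user;
  }
  $users = array();
  foreach ($ids as $id)
  {
    if (isset($byId[$id]))
    {
      $users[] = $byId[$id];
    }
  }

  return $users;
}
```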

In a batch script or a reindex action we can now just call UserPeer::reindex() and have a working search index for our users.
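A batch script might look like the following sketch, built on the standard symfony 1.0 batch skeleton. The file name batch/reindex_users.php, the application name frontend and the prod environment are assumptions. CLI PHP normally has no execution time limit, but set_time_limit(0) makes that explicit for long reindexes:

```php
<?php
// batch/reindex_users.php (sketch): rebuild the user search index from the CLI.
define('SF_ROOT_DIR',    realpath(dirname(__FILE__).'/..'));
define('SF_APP',         'frontend');
define('SF_ENVIRONMENT', 'prod');
define('SF_DEBUG',       false);

require_once(SF_ROOT_DIR.DIRECTORY_SEPARATOR.'apps'.DIRECTORY_SEPARATOR.SF_APP.DIRECTORY_SEPARATOR.'config'.DIRECTORY_SEPARATOR.'config.php');

// Connect Propel to the database configured in databases.yml.
$databaseManager = new sfDatabaseManager();
$databaseManager->initialize();

// Large reindexes can run long; lift PHP's execution time limit.
set_time_limit(0);

UserPeer::reindex();
echo "User index rebuilt.\n";
```

Run it with php batch/reindex_users.php, by hand or from cron.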


Notes

  1. Storing things in app.yml is fine for indexes that don't need to be searched from multiple applications.
  2. Since we're using a Lucene index, which has an open, documented file format, we aren't limited to Zend Search Lucene or Apache Lucene (Java): we can mix and match, reading and writing the same index from either. For very large indexes (65,000+ documents), I rewrote the indexer as a Java application that indexes all the documents at once, since PHP would time out during such a task.


"The Lucene Search Index and symfony" is filed under programming and symfony. It was published in April 2007.


11 Responses to “The Lucene Search Index and symfony”


  1. Mark Quezada Posted April 23rd, 2007 - 7:48 pm

    One thing I've found that differs between ZSL and, say, Solr is that in ZSL the search can cover all fields by default (without creating a “contents” field as in your example).

  2. Dave Dash Posted April 23rd, 2007 - 8:17 pm

    You’re right, that’s absolutely unnecessary thanks to that change. I was a little annoyed because ZSL wasn’t a perfect port of Lucene in that regard, but I actually think it is a smart move, and hope that Apache Lucene will support that in the future (and SOLR which I imagine depends on it).

  3. Jf Posted August 1st, 2007 - 8:48 am

    Hello,

    I'm interested in “I rewrote a Java application to index all the documents”.

    I've made a document index with Java, but I can't use this index with PHP.

    Can you share your script?

    Thanks a lot

  4. Dave Dash Posted August 1st, 2007 - 8:52 am

    Jf,

    I don’t have the Java app available at the moment. As far as I recall, I had to NOT use the latest version of the Java Lucene libraries. Unfortunately I don’t remember the version number off hand. If you still have trouble with it let me know and I’ll try to find out those details.

    -d

  5. Jf Posted August 2nd, 2007 - 11:06 am

    Great, it works with Lucene 2.0 :-) Before, I was using Lucene 2.1.

    But I'm still interested in your script ;-)

    Thanks a lot

  6. Jf Posted August 16th, 2007 - 7:13 am

    In the end I used php-java-bridge.sourceforge.net, so I can use all of Lucene's functions.

    If anyone is interested, they can contact me at jf@marche.be

Who's linking?

  1. PHPDeveloper.org Trackback on Apr 24th, 2007
    "Spindrop.us: The Lucene Search Index and symfony... ... "
  2. developercast.com » Spindrop.us: The Lucene Search Index and symfony Pingback on Apr 24th, 2007
    "[...] about implementing the entire Zend Framework inside a module for Symfony, Dave Dash is back with this new post ... "
  3. Creating, Updating, Deleting documents in a Lucene Index with symfony at Spindrop Pingback on Apr 24th, 2007
    "[...] Contact Us « The Lucene Search Index and symfony ... "
  4. developercast.com » Spindrop.us: Creating, Updating, Deleting documents in a Lucene Index with symfony Pingback on Apr 24th, 2007
    "[...] we covered an all-at-once approach to indexing objects in your symfony app. But for some reason, people find the ... "
  5. Angel’s Test Blog » Blog Archive » Zend Newsletter May 2007 Pingback on Jul 13th, 2008
    "[...] Dash delves into the Lucene Search Index and Zend Platform in this blog post on Spindrop. Click here to ... "
