This article is a follow-up to sfZendPlugin, where we learned a newer way of obtaining the Zend Framework.
In this tutorial we're going to delve into the Lucene index. Zend Search Lucene relies on building a Lucene index: a directory of files that Lucene, or any of its ports, can write to and query. In our example we'll be creating a search for user profiles.
We'll want to store the precise location of this index in our app.yml so we can refer to it in our app.[1]
Here's an example:
    all:
      search:
        user_index: /tmp/myapp.user.lucene.index
Now when we need to refer to the index we can do sfConfig::get('app_search_user_index').
Index Something
Let's try a user search where we can find a user by their name or email address. It's fairly simple to accomplish, and hardly requires the use of ZSL, but by using ZSL we can easily extend it to do a full-text search of a user's profile or any other textual data.
Each "thing" stored in the index is a Lucene "document". Each document then consists of several "fields" (Zend_Search_Lucene_Field objects). In our example, each document will be an individual user and the fields will be relevant attributes of the user (username, first name, last name, email, the text of their profile).
Initially we'll want to populate our index. We may also want to regularly reindex all the users at once to optimize the search performance. Since reindexing involves multiple users, it makes sense to have a static reindex method in our UserPeer class.[2]
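A minimal sketch of what such a method might look like. It assumes a Propel-generated BaseUserPeer, that the Zend Framework classes are already loadable (for example through sfZendPlugin), and the app.yml key shown earlier. Treat it as an illustration, not a definitive implementation:

```php
<?php
// Sketch only: assumes Propel's generated BaseUserPeer and that
// Zend_Search_Lucene can already be loaded.
class UserPeer extends BaseUserPeer
{
    /**
     * Rebuild the user search index.
     */
    public static function reindex()
    {
        // Start a new index at the path configured in app.yml
        $index = Zend_Search_Lucene::create(sfConfig::get('app_search_user_index'));

        // An empty Criteria selects every user
        $users = self::doSelect(new Criteria());

        foreach ($users as $user) {
            // Each user knows how to describe itself as a Lucene document
            $index->addDocument($user->generateZSLDocument());
        }

        // Write pending changes to disk
        $index->commit();
    }
}
```

Loading every user in one doSelect() call keeps the sketch short; for a large table you would likely want to page through the users instead.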
Very simply, we're creating a new index, getting all the users, adding a document to the index for each one, and then committing the index (to disk). You might have noticed a strange function, User::generateZSLDocument(). This function contains all the magic. To avoid repeating ourselves, we keep the internals of building a document for the Lucene index in the User class itself. Let's look at it:
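A sketch of how such a method could be written. The getter names (getUsername(), getFirstName(), getLastName(), getEmail(), getProfile()) are assumptions based on the fields listed above and would need to match your own schema:

```php
<?php
// Sketch only: getter names are hypothetical and depend on your schema.
class User extends BaseUser
{
    /**
     * Build the Lucene document that represents this user.
     */
    public function generateZSLDocument()
    {
        $doc = new Zend_Search_Lucene_Document();

        // Keyword: stored and indexed as a single term, not tokenized
        $doc->addField(Zend_Search_Lucene_Field::Keyword('uid', $this->getId()));

        // Text: tokenized, indexed, and stored
        $doc->addField(Zend_Search_Lucene_Field::Text('username', $this->getUsername()));
        $doc->addField(Zend_Search_Lucene_Field::Text('firstname', $this->getFirstName()));
        $doc->addField(Zend_Search_Lucene_Field::Text('lastname', $this->getLastName()));
        $doc->addField(Zend_Search_Lucene_Field::Text('email', $this->getEmail()));

        // UnStored: tokenized and indexed, but the text itself is not kept in the index
        $doc->addField(Zend_Search_Lucene_Field::UnStored('profile', $this->getProfile()));

        return $doc;
    }
}
```

Using UnStored for the profile text is a design choice in this sketch: the profile stays searchable without copying its full text into the index.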
We're really just dumping the relevant search terms into this document. The beauty of keeping this code internalized in the User class is we can reuse it later if we need to index a single User at a time.
A couple of things to note. Zend_Search_Lucene_Field::Keyword lets us store data that we can look up later. We store the User::id in a field called uid because id is reserved: on a search hit, id refers to the document's internal number in the index, so a field with that name can't be read directly from the hit.
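To illustrate the difference between id and uid, here is a short sketch of reading the stored field back from a search hit. It assumes the index has already been built, the query term 'jane' is only a placeholder, and retrieveByPK() is the Propel-generated lookup:

```php
<?php
// Sketch only: assumes the index already exists at the configured path.
$index = Zend_Search_Lucene::open(sfConfig::get('app_search_user_index'));

foreach ($index->find('jane') as $hit) {
    // $hit->id is the document's internal number in the index,
    // not the primary key of our user.
    // $hit->uid is the Keyword field we stored for that purpose.
    $user = UserPeer::retrieveByPK($hit->uid);
}
```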
In a batch script or a reindex action we can now just call UserPeer::reindex() and have a working search index for our users.
[1] Storing things in app.yml is great for indexes that don't need to be searched in multiple applications.
[2] Since we're using a Lucene index, which has an open documented structure, we aren't limited to just using Zend Search Lucene or Apache Lucene (Java). We can mix and match and read and write to the same index file. For very large indexes (65,000+ documents), I rewrote a Java application to index all the documents at once as PHP would time out during such a task.



One thing of note that I found is different in ZSL than in, say, Solr, is that in ZSL you can have the search cover all fields by default (without creating a “contents” field as in your example).
You’re right, that’s absolutely unnecessary thanks to that change. I was a little annoyed because ZSL wasn’t a perfect port of Lucene in that regard, but I actually think it is a smart move, and hope that Apache Lucene will support that in the future (and SOLR which I imagine depends on it).
Hello,
I'm interested in “I rewrote a Java application to index all the documents”.
I've built a document index with Java, but I can't use this index with PHP.
Can you share your script?
Thanks a lot
Jf,
I don’t have the Java app available at the moment. As far as I recall, I had to NOT use the latest version of the Java Lucene libraries. Unfortunately I don’t remember the version number off hand. If you still have trouble with it let me know and I’ll try to find out those details.
-d
Great, it works with Lucene 2.0.
Before, I was using Lucene 2.1.
But I am still interested in your script.
Thanks a lot
In the end I used php-java-bridge.sourceforge.net, so I can use all of Lucene's functions.
If anyone is interested, they can contact me at jf@marche.be