[tags]zend, search, lucene, zend search lucene, zsl, symfony,php[/tags]
If you're like me, you've probably followed the Askeet tutorial on Search in order to create a decent search engine for your web app. It's fairly straightforward, but they hinted that when Zend Search Lucene (ZSL) is released, that might be the way to go. Well, we are in luck: ZSL is available, so let's dive right in.
If you aren't using [symfony] have a look at this article from the Zend Developer Zone. It covers just enough to get you started. If you are using [symfony], just follow along and we'll get you where you need to go.
Obtaining Zend Search Lucene
First download the Zend Framework (ZF). The Zend Framework is supposed to be fairly "easy" in terms of installation. So let's put that to the test. Open your ZF archive. Copy Zend.php and Zend/Search to your [symfony] project's library folder:
cp Zend.php $SF_PROJECT/lib
mkdir $SF_PROJECT/lib/Zend
cp -r Zend/Search $SF_PROJECT/lib/Zend
cp Zend/Exception.php $SF_PROJECT/lib/Zend
chmod -R a+r $SF_PROJECT/lib/Zend*
Index Something
We'll deviate slightly from food-themed tutorials and do something generic. Let's try a user search where we can find a user by their name or email address. It's fairly simple to accomplish and hardly requires ZSL, but by using ZSL we can easily extend it to do a full-text search of a user's profile or any other textual data.
Each "thing" stored in the index is a "document" in ZSL, specifically a Zend_Search_Lucene_Document. Each document then consists of several "fields" (Zend_Search_Lucene_Field objects). In our example, our document will be an individual user and the fields will be relevant attributes of the user (username, first name, last name, email, the text of their profile).
We're going to write a general re-indexing tool: something that will index all users.
In our userActions class let's add the following action:
The code should be fairly easy to follow. First of all, we require the necessary libraries for Lucene. On the next line we create the index:
app_search_user_index_file is a symfony configuration that you define in your app.yml. It defines which file you want to use for your index. /tmp/lucene.user.index works for our purposes. The second parameter tells Lucene we are creating a new index.
We then loop through all the users and for each user create a document. For all the search relevant attributes that a user might have we add a field into the document. Note the last field:
By default, a search runs against the "contents" field. So in this example we want people to be able to type in someone's name, email or username without having to specify which field we're searching.
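Here's a rough sketch of what such a re-indexing routine could look like. It is not the original listing: it's written as plain functions rather than an action in userActions so it can be read on its own, and it assumes the preview-era ZSL calls this tutorial relies on (the two-argument Zend_Search_Lucene constructor and the Keyword, Text and UnStored field types). The function names, the array-shaped $user, the uid key field (readers in the comments report that id clashes with ZSL's own hit id) and the lowercasing of the contents text are assumptions for illustration.

```php
<?php
// Sketch only. Assumes $SF_PROJECT/lib is on the include_path so that
// Zend/Search/Lucene.php can be found.

/**
 * Build the catch-all text for the default "contents" field.
 * Lowercasing here is an assumption, made to match the lowercased query.
 */
function buildSearchContents(array $user)
{
    $parts = array();
    foreach (array('username', 'firstname', 'lastname', 'email') as $key) {
        if (isset($user[$key]) && $user[$key] !== '') {
            $parts[] = $user[$key];
        }
    }
    return strtolower(implode(' ', $parts));
}

/**
 * Turn one user (a plain array here, a Propel object in a real app)
 * into a ZSL document made of fields.
 */
function buildUserDocument(array $user)
{
    $doc = new Zend_Search_Lucene_Document();
    // Keyword: stored and indexed as-is, handy for looking the user up later.
    $doc->addField(Zend_Search_Lucene_Field::Keyword('uid', $user['id']));
    // Text: stored and indexed, so it can be searched and shown in results.
    $doc->addField(Zend_Search_Lucene_Field::Text('username', $user['username']));
    $doc->addField(Zend_Search_Lucene_Field::Text('firstname', $user['firstname']));
    $doc->addField(Zend_Search_Lucene_Field::Text('lastname', $user['lastname']));
    $doc->addField(Zend_Search_Lucene_Field::Text('email', $user['email']));
    // UnStored: indexed but not kept, which suits the catch-all field.
    $doc->addField(Zend_Search_Lucene_Field::UnStored('contents', buildSearchContents($user)));
    return $doc;
}

/**
 * Create a fresh index at $indexFile and add every user to it.
 * In the tutorial, $indexFile comes from app_search_user_index_file.
 */
function reindexUsers($indexFile, array $users)
{
    require_once 'Zend/Search/Lucene.php';
    $index = new Zend_Search_Lucene($indexFile, true); // true = create a new index
    foreach ($users as $user) {
        $index->addDocument(buildUserDocument($user));
    }
    $index->commit();
    return $index;
}
```

Only buildSearchContents runs without the Zend Framework installed; the other two functions need the library and a writable index location.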
Find those users
Finding the users is equally straightforward. We make a new action called search:
If we have a query, open the [ZSL] index (note that we only have one parameter here). Run the find method to find our query and store it to the $hits array. Note that our query was cleaned with strtolower, since [ZSL] is case sensitive.
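A minimal sketch of that search step, again as plain functions rather than the actual action. It assumes the same preview-era ZSL API, where the constructor with a single argument opens an existing index; normalizeQuery and searchUsers are hypothetical names, and in a symfony action the raw query would come from the request instead of a function argument.

```php
<?php
/**
 * Clean up the user's input before handing it to ZSL.
 * The tutorial lowercases the query because the index is case sensitive.
 */
function normalizeQuery($raw)
{
    return strtolower(trim((string) $raw));
}

/**
 * Return the hits for a query, or an empty array when there is no query.
 */
function searchUsers($indexFile, $rawQuery)
{
    $query = normalizeQuery($rawQuery);
    if ($query === '') {
        return array(); // nothing to search for, so don't open the index
    }
    require_once 'Zend/Search/Lucene.php';
    $index = new Zend_Search_Lucene($indexFile); // one argument: open an existing index
    return $index->find($query);
}
```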
The template takes care of the rest:
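As a stand-in for the template, here is a small sketch that turns hits into an HTML list. It relies only on the fact that a ZSL hit exposes its score and its stored fields as properties; the renderHits name, the firstname and lastname field names and the markup are assumptions, and a real symfony template would more likely use inline PHP with link helpers.

```php
<?php
/**
 * Render search hits as an unordered list with their scores.
 * Any iterable of objects with firstname, lastname and score will do.
 */
function renderHits($hits)
{
    $html = "<ul>\n";
    foreach ($hits as $hit) {
        // Escape stored values before echoing them into HTML.
        $name = htmlspecialchars($hit->firstname . ' ' . $hit->lastname, ENT_QUOTES, 'UTF-8');
        $score = number_format($hit->score, 2);
        $html .= "  <li>{$name} ({$score})</li>\n";
    }
    return $html . "</ul>\n";
}
```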
Fairly simple... but it could use some cleaning up (enjoy).
What about new users?
Regularly reindexing might be nice in terms of having an optimized search index, but it's lousy if you want to be able to search the network immediately when new people join. So why not automatically re-index each user every time they are created or every time one of their indexed attributes is changed?
This should be fairly simple by adding to the User class:
We have an attribute called $reindex. When it is false we don't need to worry about indexes. When something significant changes, like an update to your name or email address, then we set $reindex to true. Then when we save:
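A standalone sketch of that flag-and-save logic follows. In the tutorial it lives in the Propel User class; here SketchUser, its constructor and the injected document builder are hypothetical, so the example runs without Propel or the Zend Framework. The index can be any object offering find(), delete(), addDocument() and commit(), which is how a ZSL index is used in this article. The uid field name follows the advice in the comments, where readers found that id is reserved for ZSL's internal hit id.

```php
<?php
class SketchUser
{
    private $id;
    private $email;
    private $reindex = false; // true once an indexed attribute has changed
    private $index;           // object with find(), delete(), addDocument(), commit()
    private $builder;         // callable standing in for the real document code

    public function __construct($id, $index, $builder)
    {
        $this->id = $id;
        $this->index = $index;
        $this->builder = $builder;
    }

    public function getId()
    {
        return $this->id;
    }

    public function needsReindex()
    {
        return $this->reindex;
    }

    public function setEmail($email)
    {
        if ($email !== $this->email) {
            $this->email = $email;
            $this->reindex = true; // email is searchable, so the index is stale
        }
    }

    public function generateZSLDocument()
    {
        return call_user_func($this->builder, $this);
    }

    public function save()
    {
        // A real model would persist the row first (parent::save()).
        if (!$this->reindex) {
            return;
        }
        // Remove stale copies of this user, then add the fresh document.
        foreach ($this->index->find('uid:' . $this->id) as $hit) {
            $this->index->delete($hit->id); // $hit->id is the index's own document id
        }
        $this->index->addDocument($this->generateZSLDocument());
        $this->index->commit();
        $this->reindex = false;
    }
}
```

Setting the same value twice leaves the flag alone, so a save() with no real change never touches the index.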
We're calling a new function called generateZSLDocument. It might look familiar:
Now, whenever a user is updated, so is our index. Additionally we can modify our reindex action:
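With the document building moved into the model, the reindex loop shrinks to a few lines. This is a sketch under assumptions: rebuildUserIndex is a hypothetical helper, $index is expected to be a freshly created ZSL index (or anything with addDocument() and commit()), and $users would typically come from a Propel query in a real action.

```php
<?php
/**
 * Add every user's document to the index and return how many were added.
 */
function rebuildUserIndex($index, $users)
{
    $count = 0;
    foreach ($users as $user) {
        // Note the parentheses: generateZSLDocument is a method call.
        $index->addDocument($user->generateZSLDocument());
        $count++;
    }
    $index->commit();
    return $count;
}
```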
That's a lot easier to deal with.
...and beyond
Hope this article helps some of you jumpstart your [symfony] apps. Really cool, easy to implement search is here. We no longer have to stick with shoddy solutions like HT://Dig or spend time rolling our own full text search, as the symfony team diligently showed us we could. But there is a lot more ground to cover, including optimization techniques and best practices.
Let me know what you think, and if you use this in any of your apps.



In public function save I had to change: $index->find('id:' . $this->getId()); to $hits = $index->find('id:' . $this->getId());
Worked great otherwise – thanks for the tutorial!
Oops, you are correct, Hector. I think I fixed my own code but not the tutorial. I’ll fix that now.
Great article many thanks.
Is it possible to combine this with ranges such as date from and date to, price less than, price over, etc.? Also, can you sort the results other than by relevancy? I know these are advanced topics, but they are the reasons why I am sticking to writing my own full text search functionality from a database, as a db query can do this.
I am not sure if such features are available in ZSL yet, but if they are it would be great if you could add an example here.
Thanks again
Joe
Joe,
Excellent question. First of all, range queries are doable, though they aren’t documented in Zend Search Lucene as near as I can tell. However, going by the Lucene documentation from Apache, in our example above:
firstname:{chris TO dave}
will match the following (scores included):
I honestly must say, that this is news to me, I didn’t think it would work, but it does, and that’s awesome!
As far as sorting goes, I personally would handle the sorting within the application… since I use Propel I could take the $hits array and do something quick and dirty like:
That’s a bit rough around the edges, and kinda clunky (it’s using join, it must be clunky), but it will do you right. Let me know if that works.
Dave. Thanks for a really useful tutorial. It helped me off to a great start on integrating Zend Search in my Symfony app.
PLEASE NOTE: I tried to use the $hit->id value in my search results (e.g. to create a link to the detailed User record: '/user/show/id/1') but I discovered that $hit->id appears to be a reserved property within Zend Search (used to uniquely identify each search result hit within Zend Search itself).
See Zend Search documentation here.
Therefore, I just had to rename this field when creating the Zend Lucene Document, e.g. ‘user_id’ instead of ‘id’:
$doc->addField(Zend_Search_Lucene_Field::Keyword('user_id', $this->getId()));
This is something to keep in mind for Symfony users, as ‘id’ is recommended as the default unique identifier name for all Propel data model objects.
Hey Peter… I noticed this mistake myself. You’re absolutely right, $hit->id is reserved. I changed my own code to be $hit->user_id.
-d
Just to follow up on the ‘id’ issue with something that both Dave and I discovered subsequently. Don’t use ‘id’ to name your Symfony id field. Zend Search won’t be able to search on any fields that explicitly have ‘id’ in their name. Therefore, you have to rename the field to something like ‘uid’.
I ran into this problem when I tried to delete a ‘document’ from the Zend Search index using the ‘user_id’ field. It wouldn’t generate a hit until I renamed it to ‘uid’.
Dave,
I recently upgraded Zend Search from version 0.1.5 to 0.8.0. There are a number of changes to make note of if you are using the example in your tutorial here. I’ve posted them here.
Cheers,
–peterVG
First of all, thank you for this great tutorial. It was really a piece of cake to install this search.
But I’ve got a question about the part where you delete the index entries for users that already exist:
$hits = $index->find('id:' . $this->getId());
foreach ($hits as $hit) {
    $index->delete($hit->id);
}
Since id is a reserved term in Lucene and shouldn’t be used for the user id, the command “$hits = $index->find('id:' . $this->getId());” doesn’t really work (id is Lucene’s internal id). So the code should be something like this:
$term = new Zend_Search_Lucene_Index_Term($this->getId(), 'userid');
$query = new Zend_Search_Lucene_Search_Query_Term($term);
$hits = $index->find($query);
foreach ($hits as $hit) {
    $index->delete($hit->id);
}
Hope this helps…
Great tutorial. I’m building a plugin that will offer all the common features a website needs, like a search engine. I’m already able to change the symfony builders and generators… It will be no trouble for me to change the symfony default id to uid to be sure that Zend Search Lucene works properly.
Before I make this kind of modification, can someone confirm that using uid instead of id will fix the problem? That’s the kind of thing that needs to stay unchanged once the project has started…
Thanks for your help
Martin,
I’m not fully sure why you’d want to do that, but first look at some of my later posts tagged as “lucene” before you get started.
Sorry if this question is too stupid – but is Zend Search Lucene also a good choice for searching in a MySQL table (with or without fulltext fields)? I am looking for a fast and robust search application for an article database (approx. 800000 rows), but the terminology of Lucene is a bit irritating – is it only possible to search in documents? But who still has a lot of static text or html files these days?
Thanks for hopefully enlightening me
Ben
Hi dave,
I’m working on the integration of Zend_Search_Lucene and everything is working fine except for two points (which are obviously really important).
A number search is not working.
Example:
$query = 'contents:999';
The contents field is text (and is the default anyway), but the query doesn’t return anything even though 999 is present.
The other thing: I need to specify a limit/offset like in a SQL query… Since a search could return 10,000 results in my case, I would need a paginator… and obviously I cannot load everything into memory on every search to do my own pagination. I didn’t find anything on the web, so I’m planning to build my own search system in the database. That will work, but it will be a lot of work, since this is already working well (apart from these 2 exceptions).
I have another issue, about searching between two dates. Do you have a suggestion? Or is it still a dead end?
In version 1.0.1 of the Zend Framework the Zend.php file is no longer used. So… do this:
ini_set('include_path', ini_get('include_path') . ":/path/to/lib:");
ini_set('include_path', ini_get('include_path') . ":/path/to/lib/Zend:");
include 'Loader.php';
Zend_Loader::loadClass('Zend_Search_Lucene');
Zend_Loader::loadClass('Zend_Search_Exception');
I think.
cheers,
This tutorial is correct. I do the same, but I hit the same issue every time. I start with some init … the following statements =>
I always get an exception => Index compound file doesn’t contain _0.del file.
Does anybody have an idea?
Thanks for your help
Marcel
as always, you rock my world mr. dash.
Hehe… I saw you twittering about Lucene
It’s good stuff.
Hi,
I didn’t find Zend.php in the ZF archive. Does it work without Zend.php? Can anyone please help me?
Thanks Senthil
@Senthil I believe all you need is to copy:
- /Zend/Search folder
- /Zend/Exception.php
In your last code snippet (the reindex function) $user->generateZSLDocument need a pair of parenthesis.
Great tutorial.