Solr - Reindex All Items

I've installed and configured the Solr plugin and so far I'm loving the great search capabilities. However, when I click the Reindex button nothing appears to happen. I've watched the processes on the server and Tomcat is not doing anything (as opposed to during a CSV import, when it uses the most CPU because it is indexing).

To confirm this I manually deleted items from the database and attempted to Reindex. The deleted items would still show up in search results.

My concern is that if the index somehow gets corrupted or outdated down the road, I would be unable to rebuild/reindex it.

I'm new to Solr so any help or suggestions are very much appreciated.

Thanks.

After some further testing, it seems that the ability to reindex depends on the total number of items in Omeka.

Test case #1: 1,000 public items in Omeka. When I install the SolrSearch plugin all items are automatically added to the index. Also, the Reindex button works and I am able to reindex all the data.

Test case #2: 80,000 public items in Omeka. When I install the SolrSearch plugin none of the items are added to the index. The Reindex button does not work and I am unable to get the data from my items added to the index. Adding new items (either by adding a single item or multiple items with CSV Import) results in the data for those items correctly being added to the index.

So it looks like somewhere between 1,000 and 80,000 items there is a problem adding all public items to the index.

I've taken a quick look at the IndexAll.php file in the plugin but don't see anything that would obviously cause a problem with a larger document set.

Any thoughts?

Taking a look into SolrSearch myself, I think I see the problem.

In IndexAll.php, there is the line:

$items = $db->getTable('Item')->findAll();

So, with 80,000 items in your database, this line is going to load 80,000 Item records into memory at once, which probably pushes the script past PHP's memory limit.

The IndexAll (and possibly DeleteAll) scripts likely just need to page through the items instead of getting them all up front.
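Something along these lines is what I have in mind. It's untested: the findBy() limit/page arguments and the batch size are assumptions about what the Omeka table API will accept, and solr_index_item() is just a placeholder for whatever SolrSearch already does to add a single item to the index.

$table = $db->getTable('Item');
$pageSize = 100; // items per query; worth tuning against the payload size
$page = 1;
do {
    // Fetch one page of items at a time so only $pageSize records
    // are ever in memory, instead of the whole table.
    $items = $table->findBy(array(), $pageSize, $page);
    foreach ($items as $item) {
        solr_index_item($item); // placeholder for the plugin's per-item indexing
    }
    $page++;
} while (count($items) == $pageSize);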

Yeah, you're right, John. I increased the memory_limit in /etc/php.ini and it fixes the problem, although it uses a lot of memory. I'll try your suggestion about paging through the items.
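For reference, the only change was raising the limit in /etc/php.ini; the 512M value below is just what I picked and would need to be sized to the server and the item count:

; /etc/php.ini
; Raise the per-request memory ceiling so the full result set from
; findAll() fits during reindexing.
memory_limit = 512M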

Thanks

I had meant to weigh in on this earlier but have been out of town at a conference. We had refactored this plugin a bit to handle larger collections, but it looks like we need to start paging through the record results to get around the memory allocation issue. We may need to do a bit of testing with the data payloads to find an optimal batch size, but we'll get this in the queue to be looked at.

Wayne