Keyword search not returning expected results

Hi--

We're having an odd problem with Keyword searches in Omeka 2.3 (and 2.3.1). A keyword search returns many results perfectly, but there are several words (extremely common words-- names of correspondents in a letter collection that appear both in the Title fields and the Transcription [item type metadata] field, for instance) that return zero results, even though they are clearly there.

We've done several rounds of troubleshooting, including adding the same words via csv upload and individually, in multiple fields. New entries seem to be indexed right away, so that doesn't seem to be the issue, although we've tried reindexing several times as well. These search results are not returned from the admin side, from the public view, or from the Neatline admin panel search. Some proper names are returned and others aren't. The word "received" works but the word "letter" doesn't (even though they appear in the same sentence of the transcription field). These names/words ARE returned, however, in both the Advanced search and the "exact match" option found on the admin side.

We're at the end of our tether here-- does anyone have any experience with this sort of behavior, or any solutions for us to try?

Thanks!

JenB

How common is "extremely common"? In its default natural-language full-text search (on MyISAM tables), MySQL treats terms that appear in 50% or more of the indexed rows as so common that it excludes them from keyword searches.
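To get a feel for whether a word is likely to hit that threshold, here is a rough Python sketch that counts how many records each word appears in. It is only a simplified model: the tokenizer is a plain regex rather than MySQL's, and the sample records and the function name too_common_terms are made up for illustration.

```python
import re
from collections import Counter


def too_common_terms(records, threshold=0.5):
    """Return the terms that appear in at least `threshold` of the records."""
    doc_freq = Counter()
    for text in records:
        # Count each term once per record (document frequency, not raw count).
        doc_freq.update(set(re.findall(r"[a-z']+", text.lower())))
    cutoff = threshold * len(records)
    return {term for term, count in doc_freq.items() if count >= cutoff}


# Hypothetical sample records, not taken from the actual collection.
records = [
    "letter received from Smith",
    "letter to Jones",
    "letter about the harvest",
    "received a parcel",
    "a note from Jones",
]
print(sorted(too_common_terms(records)))  # prints ['letter']
```

In this toy data "letter" is in 3 of 5 records and gets flagged, while "received" is in only 2 of 5 and does not, which is the same pattern as in the original report.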

If you don't have a lot of records (and it is indeed the 50% index problem described above) then I believe you can trick MySQL by creating a bunch of dummy, private records with lots of text (say, in the Description field). This in turn will drop the proportion of your missing keywords in the index relative to all records.
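As a back-of-the-envelope check on how many dummy records that would take, here is a small Python sketch. It assumes a term is dropped when it appears in 50% or more of the records and that the dummy records do not contain the term themselves. The function name and the figures (450 records, 400 containing the term) are illustrative only.

```python
def dummy_records_needed(total_records, records_with_term):
    """Smallest number of term-free dummy records that puts the term under 50%."""
    # We need records_with_term / (total_records + d) < 0.5,
    # which is the same as total_records + d > 2 * records_with_term.
    return max(0, 2 * records_with_term - total_records + 1)


# Illustrative figures: 450 records in total, 400 of them containing the term.
print(dummy_records_needed(450, 400))  # prints 351
```

With those numbers, 351 dummy records give 400 out of 801, which is just under half. A term that is already under 50% needs no dummy records at all.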

For the Omeka folks: would it be useful to highlight these MySQL 'gotchas' more prominently in the docs? Perhaps as part of a README included with the release? This and related issues, such as English-language stopwords on a US-hosted site (with an Omeka site consisting almost entirely of German records) and minimum character lengths (by default nothing under 4 characters gets indexed), have tripped me up in the past.
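For anyone who wants to screen their search terms for the other two gotchas before importing, here is a minimal Python sketch. The 4-character minimum is the default mentioned above. The stopword set is only a small illustrative sample, not the server's full list, and explain_term is a hypothetical helper name.

```python
MIN_WORD_LEN = 4  # default minimum length mentioned above; servers can change it

# Small illustrative sample only; the real stopword list is much longer.
SAMPLE_STOPWORDS = {"about", "could", "their", "there", "which", "would"}


def explain_term(term, min_len=MIN_WORD_LEN, stopwords=SAMPLE_STOPWORDS):
    """Give a rough reason why a keyword search for `term` might find nothing."""
    word = term.lower()
    if len(word) < min_len:
        return "too short"
    if word in stopwords:
        return "stopword"
    return "ok"


for term in ["Tag", "Brief", "would"]:
    print(term, "->", explain_term(term))
```

Here the German word "Tag" is reported as too short, "Brief" passes, and "would" is caught by the sample stopword set.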

Bingo! That must be the case. We have two letter collections (one approx 400 items and one approx 50) and the main correspondent names in the bigger collection aren't returned, but they are for the smaller collection.

Thanks so much for the help, and YES I'd be in favor of more clearly highlighted 'gotchas'!

I've added a Troubleshooting section to the documentation on Search Settings - where else would you look for these issues?

How about in the Site Planning Tips?

The thinking being that it's all too easy (speaking for myself, but I don't think I'm alone) to import a bunch of records and then realize afterwards that you need to change all your titles because they use words that are too short. Or, you will need to self-host your database because you can't modify a stop word list. The earlier one becomes aware of these issues, the better.

The easy way to tell if it's the 50% threshold that's at fault is to choose the "boolean" search. MySQL doesn't apply the 50% limitation to boolean searches.
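If you have direct database access, the same comparison can be made by hand. The Python sketch below only builds the two SQL strings. The table name omeka_search_texts and the column name text are assumptions about a default Omeka 2.x install, so check them against your own schema (your table prefix may differ). The function name is hypothetical.

```python
def fulltext_queries(table="omeka_search_texts", column="text"):
    """Build natural-language and boolean full-text count queries.

    The table and column defaults are assumptions, not confirmed schema.
    The %s placeholder is meant to be filled in by a database driver.
    """
    base = f"SELECT COUNT(*) FROM `{table}` WHERE MATCH (`{column}`) AGAINST (%s"
    return {
        "natural": base + " IN NATURAL LANGUAGE MODE)",
        "boolean": base + " IN BOOLEAN MODE)",
    }


queries = fulltext_queries()
print(queries["natural"])
print(queries["boolean"])
```

If the natural-language query counts zero rows for a word and the boolean query counts some, the 50% threshold is the likely cause.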