Search : PDFText + Solr - Improve User Experience

Hi there,

This might be a somewhat general request and/or question about a standard situation we encounter quite often.

The typical scenario is as follows: a client provides us with a set of manuscripts and associated metadata. We digitize and OCR the collection, create PDFs, and batch import the items into an Omeka install. The PDF Text plugin extracts the text and stores it with the associated files. This process is simple enough.

However, the user experience when exploring collections is, to my mind, somewhat hampered, largely because of the way searches are executed.

* Basic Search searches across Items, Files and Collections
* Advanced Search, however, searches only across Items, and therefore ignores any of the text extracted from the file itself
* Solr Search appears to ignore metadata related to Files, and therefore never returns results containing the extracted text

To me, this is not very intuitive at all. As a user, I would simply like to get relevant results for all my searches, whether they relate to Collections, Files or Items, because that distinction may not be important to me.

As an administrator on the other hand, I simply want to be able to batch import the items, map the metadata fields and not have to worry whether they get put in the right fields so that they are searchable as well. I know that I can extract the text and add it to the CSV file upon import, but that workflow would take way more time.
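For reference, the CSV workaround amounts to something like the following. This is only a minimal sketch in Python, assuming the text has already been extracted into a dictionary keyed by file name; the column names `filename` and `text` are placeholders I made up, not anything the import plugin requires.

```python
import csv
import io


def add_text_column(csv_text, extracted, filename_column="filename", text_column="text"):
    """Append a column of extracted text to CSV data before import.

    `extracted` maps a file name to its extracted text. Rows whose file
    has no extracted text get an empty value. Column names are assumptions.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    fieldnames = list(reader.fieldnames or []) + [text_column]
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    for row in reader:
        # Look up the text by the row's file name; default to empty.
        row[text_column] = extracted.get(row[filename_column], "")
        writer.writerow(row)
    return out.getvalue()
```

Even with a script like this, the extraction still has to be run and checked separately for every batch, which is the extra step I would like to avoid.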

I think the most important improvement would be to make the PDF Text plugin configurable, so that the admin can choose whether to store the extracted text in the Item Type metadata, with the File, or both. This might require a mapping that can be configured per Item Type, since, for example, Text, Lesson Plan and Email are all Item Types that may have related PDF files but use different fields for that information.
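To illustrate the kind of mapping I have in mind, here is a rough sketch. It is written in Python purely for illustration (Omeka itself is PHP), it is not an existing plugin option, and the field names and the `route_extracted_text` helper are my own assumptions.

```python
# Hypothetical mapping: Item Type name -> where extracted PDF text should go.
# The field names are illustrative assumptions, not actual plugin settings.
TEXT_TARGETS = {
    "Text": {"item_field": "Text", "store_on_file": True},
    "Lesson Plan": {"item_field": "Lesson Plan Text", "store_on_file": True},
    "Email": {"item_field": "Email Body", "store_on_file": False},
}

# Item Types without an explicit mapping keep the text on the File only.
DEFAULT_TARGET = {"item_field": None, "store_on_file": True}


def route_extracted_text(item_type, text):
    """Return (record, field, text) tuples saying where the text is stored."""
    target = TEXT_TARGETS.get(item_type, DEFAULT_TARGET)
    destinations = []
    if target["store_on_file"]:
        destinations.append(("file", "Text", text))
    if target["item_field"] is not None:
        destinations.append(("item", target["item_field"], text))
    return destinations
```

With a table like this, an Email item would get the text in an item-level field only, while an unmapped Item Type would fall back to storing the text on the File.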

I would also like to see Advanced Search expanded to cover all record types: Items, Files and Collections. I am not sure why the decision was made to limit it to Items only, so maybe there is a good reason behind it.

Thanks for any input and consideration.


Hey, I'm running into the same issue. It's frustrating that the advanced search is not actually as advanced as we need it to be, and it's surprising that no one has responded in this thread at all. For my collection, this functionality will probably make or break my institution's decision to use Omeka at all.

- Darrin