OCR'ed PDF search results display file names

Hi all,

Recently, I've ingested a number of OCR'd PDFs into Omeka. If I search for a specific item, the search results display the appropriate PDFs by their titles and descriptions. When I click on one, I'm able to go to the item page and view it as normal. However, if I search for text within an OCR'd PDF, the search results display the actual file name of the PDF instead of its title. And when I click on it, the page displays info about the file, such as its file type, size, OCR text, etc. This page is, to be honest, useless to me and to my users. How can I make the search results use the normal style and content display from the first case when searching by text in an OCR'd PDF?


- Darrin

This might vary based on the theme you are using, but am I right in guessing that, in the second case, the record type shown in the search results is "File" rather than "Item"?

If so, is the OCR text part of the File metadata, not the Item metadata?

You're right in that the record type is File and not Item. It looks like the OCR text is part of the File metadata, but I need it to be associated with the Item instead. That way, when a user searches for text within the PDF, items (with their associated titles) are displayed in the search results and not the files (and associated file names). Does that make sense?

You have two courses of action, probably depending on how many PDFs you have.

One is to manually copy the text out of the File metadata and put it in the Item metadata, probably using the 'Text' element in the Text Item Type.

From then on, keep following whatever process you use to add the text. If you're using the PDFText plugin, though, note that it will continue to attach the text to the File.

If you have lots of items, though, the other approach is to hack a change in your theme.

You would want to copy the 'search/index.php' file into your theme as described here.

Then, just before the line

<?php if ($recordImage = record_image($recordType, 'square_thumbnail')): ?>

add this code:

            <?php if ($recordType == 'File') {
                $record = $record->getItem(); // use the File's parent Item for the result link
            } ?>

I've only done very limited testing on that approach, but I suspect it will work okay.
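
Building on the theme change above, here is a slightly fuller sketch that also swaps the displayed file name for the parent Item's title. It is untested and rests on a few assumptions: that your copy of 'search/index.php' follows the stock Omeka loop, where $recordType, $record and $searchText are already set for each result, and that the result title is printed from $searchText['title']. The $resultTitle variable is a name made up for this example. If your theme differs, treat it as a starting point only.

```php
<?php
// Sketch only: assumes the stock search/index.php loop variables
// ($recordType, $record, $searchText) exist at this point in the template.
$resultTitle = $searchText['title'];
if ($recordType == 'File') {
    // Swap the File record for its parent Item so the link goes to the item page.
    $record = $record->getItem();
    // Assumption: prefer the Item's Dublin Core Title over the file name.
    $itemTitle = metadata($record, array('Dublin Core', 'Title'));
    if ($itemTitle) {
        $resultTitle = $itemTitle;
    }
}
?>
<a href="<?php echo record_url($record, 'show'); ?>"><?php echo $resultTitle ? $resultTitle : '[Unknown]'; ?></a>
```

The link line would replace the existing title link in the template rather than sit alongside it. One thing to watch for: if a search matches both an Item and one of its Files, the same item may appear twice in the results.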