Plugins/PdfText

PdfText makes PDF files searchable by extracting their text and saving it to their file records. The extraction strips out the images and layout of the original file, leaving only searchable text, which is viewable from the site admin.
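As a rough illustration of what this kind of extraction involves, the following Python sketch calls the pdftotext command-line tool (see Dependencies) and returns a PDF's plain text. It is not the plugin's own code: the function names `build_command` and `extract_text` are hypothetical, and the choice of the `-enc UTF-8` and `-nopgbrk` options is an assumption made for the example; the plugin's actual invocation may differ.

```python
import subprocess


def build_command(pdf_path, executable="pdftotext"):
    """Build the pdftotext argument list (hypothetical helper).

    Passing "-" as the output file makes pdftotext write the text to
    standard output instead of creating a .txt file next to the PDF.
    """
    # -enc UTF-8 : emit UTF-8 text (an assumption for this sketch)
    # -nopgbrk   : do not insert form-feed characters between pages
    return [executable, "-enc", "UTF-8", "-nopgbrk", pdf_path, "-"]


def extract_text(pdf_path, executable="pdftotext"):
    """Return the plain text of a PDF, without images or layout."""
    result = subprocess.run(
        build_command(pdf_path, executable),
        capture_output=True,  # collect stdout/stderr instead of printing them
        check=True,           # raise CalledProcessError if pdftotext fails
    )
    return result.stdout.decode("utf-8", errors="replace")


# Example use (requires pdftotext and an existing PDF file):
#   text = extract_text("report.pdf")
#   print("omeka" in text.lower())
```

Without the `-layout` option, pdftotext does not try to preserve the physical layout of the page, which matches the plain, searchable text described above.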

Dependencies

PdfText requires the pdftotext command-line program to be installed on your server. The plugin will refuse to install if pdftotext is not found on the server. If you control the server yourself and need to install pdftotext, it is provided by the poppler-utils package, which should be available from your distribution's package manager. If you don't control the server, ask your host to install it for you.
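Before installing the plugin, you may want to confirm that pdftotext is available on the server. The Python sketch below is one way to do that, assuming you have shell access and a Python 3 interpreter; the function names are hypothetical, and this is not the check the plugin itself performs.

```python
import shutil
import subprocess


def find_pdftotext(command="pdftotext"):
    """Return the full path of the executable, or None if it is not on PATH."""
    return shutil.which(command)


def pdftotext_version(path):
    """Return whatever `pdftotext -v` prints (its version banner)."""
    result = subprocess.run([path, "-v"], capture_output=True, text=True)
    # The banner may arrive on stderr or stdout, so check both streams.
    return (result.stderr or result.stdout).strip()


if __name__ == "__main__":
    path = find_pdftotext()
    if path is None:
        print("pdftotext not found; install the poppler-utils package")
    else:
        print(f"pdftotext found at {path}")
        print(pdftotext_version(path))
```

Note that this checks the PATH of the user running the script. The web server may run as a different user with a different PATH, so a successful result here does not guarantee that the plugin will find pdftotext.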

Using the PdfText plugin

  1. Upload and install the PdfText plugin (see Installing a Plugin).

  2. Configure PdfText. If PDF files are already in your Omeka database when you install the plugin, you can configure PdfText to run the text extraction process on those existing PDFs. Check the box and remember to save your changes.

     Pdftxtconfig.png

  3. To locate the extracted text, select the item to which the PDF is attached, select File from the Item navigation, and click the name of the file. The searchable, extracted text opens in a new window.

     Pdftxtview.png