Creating a resource center for texts


I'm currently benchmarking different systems to create an archive of my employer's documents.
We are a center for ethical reflection on healthcare, social issues, etc.

The material will be mostly, if not only, text, in PDF form.

I really like the ability, in Omeka, to have detailed items (authors, description...), but I'm not entirely sure it was created for purely textual documents (apart from scanned documents, such as ancient texts). As I understand it, the search engine cannot index the content of a PDF file, for example.

My choice would be to use Omeka, because of its "out of the box" capabilities, but I'm still wondering whether it's the right option.

Has anyone used Omeka, or does anyone know of a case of it being used, to organize non-historical texts?

Thank you

We are evaluating Omeka for a collection of language teaching resources that includes many texts, but also audio and video. We will have PDFs, .doc and .docx files, as well as Google Docs. We probably don't need to index every word in our documents (unlike you, perhaps) and will use tags liberally, as well as a selection of item and Dublin Core fields.
I'm not sure organized would be the right word, but searchable and findable are important.
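
To make "searchable and findable" a bit more concrete, here is a rough Python sketch of what metadata-only search amounts to. This is not Omeka code or Omeka's API: the `Item` class, its fields, and the `search` helper are all made up for illustration, loosely modeled on Dublin Core elements such as Title, Creator, and Description.

```python
from dataclasses import dataclass, field


@dataclass
class Item:
    """Hypothetical archive item: a few Dublin Core-style fields plus tags."""
    title: str
    creator: str
    description: str = ""
    tags: set[str] = field(default_factory=set)
    files: list[str] = field(default_factory=list)  # attached files, not searched


def search(items, query):
    """Return items whose metadata (not file contents) contains every query word."""
    words = query.lower().split()
    hits = []
    for item in items:
        # Only the descriptive fields and tags are searched, never the files.
        haystack = " ".join(
            [item.title, item.creator, item.description, *sorted(item.tags)]
        ).lower()
        if all(word in haystack for word in words):
            hits.append(item)
    return hits


# Invented sample data, for illustration only.
items = [
    Item("Beginner French dialogues", "A. Teacher", "Short dialogues with audio",
         {"french", "audio"}, ["dialogues.pdf", "dialogues.mp3"]),
    Item("Spanish verb worksheet", "B. Tutor", "Printable practice sheet",
         {"spanish", "grammar"}, ["verbs.docx"]),
]

print([item.title for item in search(items, "french audio")])
```

The point is that anything living only inside an attached file stays invisible to this kind of search, so findability depends on how carefully the fields and tags are filled in.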


I don't think we need to index every word.
I think my question comes down to defining what we want the user to access: a PDF file they will have to download, knowing only the author, subject, etc., or the text itself directly, in which case Omeka may not be the perfect tool.

It also means that with Omeka, the user will be able to download the content (.doc, .pdf), if I'm correct...

My personal sense, after working with Omeka 1.4 fairly extensively a few months ago for a relatively small, text- and PDF-heavy archive, is that it's best suited for image-centric archives. That is, for example, collections where a scanned image is at the center, surrounded by metadata of various kinds, which in turn can be represented in maps, timelines, etc. The images can very easily be commented on, tagged, and otherwise organized in permanent or temporary groupings via the exhibits plugin.

What Omeka does less well is working with plain text itself as the central item, and even less well (not at all, afaik) in cases where a scanned image (say, a manuscript) is displayed side by side with its transcription or other representations. Omeka can't search for text in exhibits (so text can't be the "item" in an exhibit or it won't be searchable), nor does the exhibit plugin offer a way to view image and text(s) side by side. Search in general (out of the box) also has some limitations (more on this on the forums). As far as searchable PDFs are concerned, I believe this is currently being worked on (search the forum for more), but for 1.4 at least you need to handle this via the somewhat clumsy workaround of a separate Google custom search.

While I'd love to be able to use Omeka to create a text archive that looks something like this for example: (note the possibility for comparing different representations of the text side by side) it currently can't do it, and so I would not recommend it for a text-centric project. This is a real pity, since there is so, so much that "just works" in Omeka and effectively "comes for free" with either the standard install or the addition of a handful of plugins.

At the same time, I don't know of any other easy-to-use / out-of-the-box options, which is why I decided to use Omeka all the same. Frankly though, with WordPress being so simple to use, customize (massive add-on community), and keep updated, it may be worth putting up with its more generic CMS paradigm and extending that via available plugins to meet your needs.

Worth mentioning on this thread are two recent efforts to improve text (edition) handling in Omeka. One is the TEI Boilerplate project, and the other is a project to update the nascent TEI Display plugin for Omeka. More info on this effort and opportunity for feedback & requests here: