scanned books - multiple files on one item or multiple items in one collection?

I have about 70 high school yearbooks. I have scanned and OCR'ed every page, which comes to about 6,000 individual page scans.

I'm new to Omeka and was playing with how to best represent these yearbooks.

I would like to:
- display all pages in a single yearbook
- display all yearbooks (covers only)
- search for keywords on a given page
- navigate easily from page to page within a given yearbook
- crowdsource transcription for each individual page (OCR was not very good and early yearbooks are handwritten)
- tag individual pages with things like "class of 1952" and "football"

My initial thought was to make each yearbook an item, since each one is a single physical object, and then have one collection of items that would be "all yearbooks". But going down this path, I see that each yearbook ends up with one big transcription and only one set of tags.

So then I played with making each page an item, with a collection for each yearbook. That seems to work too, and the advantage is that I can tag each page individually.

Eventually I'd like to extract people's names from the yearbooks and link them to yearbook pages (maybe using Item Relations?), so that a user could easily see all the yearbook pages for a given person.

Before I get too far along, has anyone done this type of thing before? My gut says that one item per yearbook would be better, but maybe tagging doesn't work well that way. Any suggestions?


There is no perfect solution; it is a matter of choice. For one approach, see the Scripto and Bookreader plugins, which I use, for example, here


Daniel Berthereau
Infodoc & Knowledge management

With the set of things you would like to do, that sounds very much like 1 page = 1 item as in your second foray. That will work for the tagging you mention, and for the search, display, and probably transcription needs.

You might even make the covers items, probably with a new custom item type. Make those the first items in the collection for each yearbook, and Omeka's out-of-the-box collections browse pages should then display all yearbooks with their covers. (Yearbook pages would also have their own item type.)
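To make that layout concrete, here is a minimal sketch in plain Python. It does not use Omeka's API at all; the class and field names (Page, Yearbook, is_cover, and so on) are hypothetical and only model the idea of pages as items, yearbooks as collections, and the cover as the first item in each collection.

```python
from dataclasses import dataclass, field


@dataclass
class Page:
    """One scanned page, modelled as one item (hypothetical fields)."""
    number: int
    text: str = ""                           # OCR or crowdsourced transcription
    tags: set = field(default_factory=set)   # per-page tags
    is_cover: bool = False


@dataclass
class Yearbook:
    """One yearbook, modelled as a collection of page items."""
    title: str
    pages: list = field(default_factory=list)

    def cover(self):
        # Assumption: the cover is stored as the first item in the collection.
        return self.pages[0] if self.pages else None

    def neighbours(self, number):
        # Previous and next page numbers, for page-to-page navigation.
        numbers = [p.number for p in self.pages]
        i = numbers.index(number)
        prev_no = numbers[i - 1] if i > 0 else None
        next_no = numbers[i + 1] if i + 1 < len(numbers) else None
        return prev_no, next_no


def pages_tagged(yearbooks, tag):
    # Every (yearbook title, page number) pair that carries the given tag.
    return [(y.title, p.number)
            for y in yearbooks for p in y.pages if tag in p.tags]


def pages_matching(yearbooks, keyword):
    # Case-insensitive keyword search over each page's own text.
    kw = keyword.lower()
    return [(y.title, p.number)
            for y in yearbooks for p in y.pages if kw in p.text.lower()]
```

Because tags and text live on the page rather than on the yearbook, both the tag lookup and the keyword search can point at an individual page, which is what the one-item-per-yearbook layout cannot do.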

As for the (I'm guessing later?) step of associating people with yearbook pages, the Item Relations route would work. In that case, each person would also be an item, with a "person" item type. The Item Relations plugin would then let you relate the item for a person to the items for the pages they appear on. The nice thing about that is that it would be easy to put together an index by name, and it would leave more structural options open for later. It would be somewhat labor intensive to produce all those relations, though.
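As a rough illustration of why explicit relations make a name index easy, here is a small Python sketch. It is not the plugin's data format or API: the relation pairs, identifiers, and the function name index_by_name are all made up for the example.

```python
from collections import defaultdict

# Hypothetical person-to-page relations; every identifier here is a placeholder.
relations = [
    ("Person B", "yearbook-1952/page-014"),
    ("Person A", "yearbook-1952/page-014"),
    ("Person A", "yearbook-1951/page-009"),
]


def index_by_name(pairs):
    """Group related pages under each person, sorted for display."""
    index = defaultdict(list)
    for person, page in pairs:
        index[person].append(page)
    return {person: sorted(pages) for person, pages in sorted(index.items())}


name_index = index_by_name(relations)
```

Once each relation is recorded, the index is a simple grouping step. The labor is in creating the relations in the first place.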

A cheaper way to get similar functionality would be to add a metadata field to the Yearbook Page item type for the names of the people on the page. Then the Search By Metadata plugin would let you automatically search for pages where that person appears. It would be harder to build an index, but the connections would be built automatically.
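The metadata-field alternative can be sketched the same way. Again this is plain Python rather than anything from Omeka or the plugin; the "names" field, the page identifiers, and the function name pages_naming are assumptions for illustration.

```python
# Hypothetical page items, each with a free-text "names" metadata field.
pages = [
    {"id": "yearbook-1952/page-014", "names": ["Person A", "Person B"]},
    {"id": "yearbook-1951/page-009", "names": ["person a "]},
    {"id": "yearbook-1951/page-010", "names": []},
]


def pages_naming(items, name):
    """Return ids of pages whose names field contains the given name."""
    wanted = name.strip().lower()
    return [
        item["id"]
        for item in items
        if any(wanted == n.strip().lower() for n in item.get("names", []))
    ]


hits = pages_naming(pages, "Person A")
```

No separate person items are needed, so there is nothing to link by hand. The trade-off is that matching relies on the names being typed consistently, which is part of why a full index is harder to build this way.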

So, you have a couple of options for the approach. To me, they all seem to involve working with custom item types, with pages (and maybe people) as items, and a collection for each yearbook.