I need to set up a digital library at a university in the Democratic Republic of Congo. I'd like to use Omeka instead of something like DSpace because of its simplicity, since most people here don't have much experience with IT. The library will be used locally for the students to consult university documents.
We have an external hard disk here with over 2TB of documents (mostly PDFs), without metadata. I think the easiest thing to do would be to use Zotero to find metadata for the PDFs and then use the Zotero Import plugin to batch import them into Omeka. However, the internet connection here is really, really, really bad. It won't be possible to first upload the data to the Zotero server and then download it again into Omeka. I need to do this offline, directly. Is there a way I could achieve this?
Alternatively, if I were able to easily batch import only the PDFs (without metadata) into Omeka, and then use the OAI Harvester, would the end result be the same?
The Zotero Import plugin only works when talking to the Zotero servers, so if you can't count on that connection it probably won't work.
OAI Harvester is similarly designed to gather data over the internet from a repository, so it sounds like that would be just as risky. In any case, if the PDFs are added to Omeka separately, there probably wouldn't be a good way to line up harvested data with the PDFs.
It sounds like your best bet would be to use the CSVImport plugin. You'd need to somehow gather the metadata, then put it all in a CSV file that you could import into Omeka. (Actually, with 2TB of PDFs, probably several different CSV files.)
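As a rough illustration of the "several different CSV files" idea, here is a minimal Python sketch that writes a list of metadata rows into numbered batch files. The function name write_batches, the batch_NNN.csv naming, and the default batch size of 500 are all assumptions made for the example, not anything Omeka or CSVImport requires.

```python
import csv
from pathlib import Path


def write_batches(rows, fieldnames, out_dir, batch_size=500):
    """Write rows (a list of dicts) to numbered CSV files of at most batch_size rows."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    paths = []
    for start in range(0, len(rows), batch_size):
        # Hypothetical naming scheme: batch_001.csv, batch_002.csv, ...
        path = out_dir / f"batch_{start // batch_size + 1:03d}.csv"
        with path.open("w", newline="", encoding="utf-8") as fh:
            writer = csv.DictWriter(fh, fieldnames=fieldnames)
            writer.writeheader()  # every batch file gets its own header row
            writer.writerows(rows[start:start + batch_size])
        paths.append(path)
    return paths
```

Smaller batches should make it easier to retry a single import if one fails partway through; the right batch size would need some experimenting on the actual server.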
You could probably still use Zotero for gathering the data, and then the hard part would be converting one of Zotero's export formats. My Zotero developer colleague across the table suggests exporting CSL-JSON or one of the XML-based formats, then working on converting that to CSV. With luck, some tools might already be available for that.
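To give an idea of what the CSL-JSON route might look like, here is a minimal Python sketch, not an existing tool. It assumes the Zotero export is a JSON array of items and uses only a handful of common CSL fields (title, author, issued, publisher, type). The column names, the "; " separator between creators, and the function names are choices made for this example; you would map the columns to Dublin Core elements during the import.

```python
import csv
import json

# Column names chosen for this example; they are mapped to elements at import time.
FIELDS = ["Title", "Creator", "Date", "Publisher", "Type"]


def format_name(name):
    """Return 'Family, Given' for a CSL name, or the literal form (e.g. institutions)."""
    if "literal" in name:
        return name["literal"]
    parts = [name.get("family", ""), name.get("given", "")]
    return ", ".join(p for p in parts if p)


def format_date(item):
    """Turn CSL 'issued' date-parts such as [[2010, 5]] into '2010-05'."""
    date_parts = item.get("issued", {}).get("date-parts", [])
    first = date_parts[0] if date_parts else []
    if not first:
        return ""
    # Year as-is, month and day zero-padded; parts may be ints or strings.
    return "-".join([str(first[0])] + [str(p).zfill(2) for p in first[1:]])


def item_to_row(item):
    """Map one CSL-JSON item (a dict) to a flat row for the CSV file."""
    return {
        "Title": item.get("title", ""),
        "Creator": "; ".join(format_name(n) for n in item.get("author", [])),
        "Date": format_date(item),
        "Publisher": item.get("publisher", ""),
        "Type": item.get("type", ""),
    }


def csl_json_to_csv(json_path, csv_path):
    """Read a CSL-JSON export and write one CSV row per item."""
    with open(json_path, encoding="utf-8") as fh:
        items = json.load(fh)
    with open(csv_path, "w", newline="", encoding="utf-8") as fh:
        writer = csv.DictWriter(fh, fieldnames=FIELDS)
        writer.writeheader()
        writer.writerows(item_to_row(i) for i in items)
```

Calling csl_json_to_csv("export.json", "items.csv") (both file names hypothetical) would produce a file with one row per Zotero item. Fields not listed in FIELDS are silently dropped, so anything else you need would have to be added to the mapping.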
Thanks for the timely reply. Unfortunately, this forces me to use DSpace, as it already has tools available for this kind of batch import.
If you place the PDFs somewhere that your Omeka server can access by URL, and you export the Zotero Library as MODS, it isn't that difficult to use CSVImport to achieve your goals in the way you stated.
I'd wanted to make these a bit more generic (and better documented) before releasing to the wild, but here's what I use to convert MODS to CSV.
config.rb needs to be modified to reflect the location of your PDFs:
Most fields/tags should be covered. If it's missing something, let me know and I can help modify the script to pull the data from the XML.
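Separately from those scripts, the "accessible by URL" part can be sketched in a few lines of Python. This is only an illustration under assumptions: the base URL (something like http://localhost/pdfs) and the function name pdf_urls are hypothetical, and it presumes the PDF folder is served as-is by a web server the Omeka machine can reach.

```python
from pathlib import Path
from urllib.parse import quote


def pdf_urls(pdf_root, base_url):
    """Yield (relative_path, url) for every PDF found under pdf_root, recursively."""
    root = Path(pdf_root)
    for path in sorted(root.rglob("*")):
        # Match the extension case-insensitively so .PDF files are not missed.
        if path.is_file() and path.suffix.lower() == ".pdf":
            rel = path.relative_to(root).as_posix()
            # quote() percent-encodes spaces and accented characters but keeps "/".
            yield rel, base_url.rstrip("/") + "/" + quote(rel)
```

Each resulting URL could then go into the file column of the CSV next to the matching metadata row. Matching a PDF to its metadata record is the part this sketch does not solve.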