Harvesting Qualified Dublin Core · Legacy Forums

Liz Woolcott October 1, 2014

Hi there,
Are there any plans to add the capability to both harvest and be harvested in qualified Dublin Core? We currently have 2 Omeka sites (soon to be 3) that we are developing, but are running into the problem of getting our qualified Dublin Core out into an aggregated site and also being able to pull in content from our CONTENTdm website with all of the glorious dcterms:spatial data separated from the dcterms:temporal. :)

Thanks for your help!
Liz

John Flatness October 1, 2014

Are you talking about OAI?

As you've seen, you can't do that right now with the harvester or repository. The repository only outputs unqualified Dublin Core in the oai_dc metadata format.

I think it would be very nice to wire up the repository and/or harvester along with the Dublin Core Extended plugin that adds the qualified elements so they could send and receive them. Part of the problem is deciding which output format to try implementing. Last I checked, unlike for oai_dc, the situation for qualified Dublin Core through OAI was a little unsettled as to what metadata prefix and specific format to use.

I know of no current plans along those lines, but it's definitely an area of interest.
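
To make the qualified/unqualified distinction concrete, here is a minimal Python sketch of what gets lost when qualified terms are collapsed into the 15 simple elements that oai_dc carries. The refinement-to-element pairs follow the DCMI Metadata Terms definitions (dcterms:spatial and dcterms:temporal both refine coverage, for example). The function name `dumb_down` and the sample record values are made up for illustration; this is not how the Omeka plugins do it.

```python
# Qualified term -> the simple DC element it refines (per DCMI Metadata Terms).
# Only a handful of refinements are listed; the real vocabulary has many more.
REFINES = {
    "spatial": "coverage",
    "temporal": "coverage",
    "created": "date",
    "issued": "date",
    "alternative": "title",
    "abstract": "description",
}


def dumb_down(qualified):
    """Collapse (term, value) pairs from qualified DC into simple DC elements."""
    simple = {}
    for term, value in qualified:
        # Terms that are not refinements (e.g. "title") pass through unchanged.
        element = REFINES.get(term, term)
        simple.setdefault(element, []).append(value)
    return simple


# Hypothetical record, invented for this example.
record = [
    ("title", "Logan Canyon, Utah"),
    ("spatial", "Cache County, Utah"),
    ("temporal", "1900-1910"),
]

print(dumb_down(record))
```

After the collapse, the place and the date range both sit under coverage with nothing to tell them apart, which is the problem described in the original question.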

John Flatness October 14, 2014

After a closer look at this, I think RDF (which does have a well-defined schema and namespace within OAI) would be a suitable format to expose both qualified and unqualified Dublin Core data.

As for harvesting into Omeka, that depends to some degree on the format your CONTENTdm site is exposing its own data in. Omeka's harvester works by matching up the schema the repository uses with the formats it understands (though it should probably be using the namespace instead). This is one of those areas where it's tough to nail down a single schema that qualified Dublin Core would be exposed as.

Liz Woolcott October 21, 2014

Hi John,
Thank you for looking into this! And I apologize for the time delay in responding.
I can see the dilemma for exposing the qualified Dublin Core. It is rather a thorny nest to jump into. Our CONTENTdm site uses the oai_qdc metadata prefix (http://cdm16944.contentdm.oclc.org/oai/oai.php?verb=ListMetadataFormats). But I know of other CONTENTdm repositories that use qdc.
For our own purposes, we simply use the harvester to pull collections from CONTENTdm into Omeka for exhibits (and the CSV Import offers a great workaround for that - for the initial upload, that is).
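
For anyone checking which prefix a given repository advertises, a ListMetadataFormats response can be read with a few lines of Python. This is only a sketch: it parses a hand-written sample response rather than calling a live server, and the oai_qdc schema and namespace URLs in the sample are example.org placeholders, not what CONTENTdm actually returns. The element names and the OAI namespace come from the OAI-PMH 2.0 protocol.

```python
import xml.etree.ElementTree as ET

OAI_NS = {"oai": "http://www.openarchives.org/OAI/2.0/"}

# Hand-written sample; the oai_qdc schema/namespace values are placeholders.
SAMPLE_RESPONSE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListMetadataFormats>
    <metadataFormat>
      <metadataPrefix>oai_dc</metadataPrefix>
      <schema>http://www.openarchives.org/OAI/2.0/oai_dc.xsd</schema>
      <metadataNamespace>http://www.openarchives.org/OAI/2.0/oai_dc/</metadataNamespace>
    </metadataFormat>
    <metadataFormat>
      <metadataPrefix>oai_qdc</metadataPrefix>
      <schema>http://example.org/qdc.xsd</schema>
      <metadataNamespace>http://example.org/qdc/</metadataNamespace>
    </metadataFormat>
  </ListMetadataFormats>
</OAI-PMH>"""


def list_formats(xml_text):
    """Return prefix, schema and namespace for each advertised format."""
    root = ET.fromstring(xml_text)
    formats = []
    for fmt in root.iterfind(".//oai:metadataFormat", OAI_NS):
        formats.append({
            "prefix": fmt.findtext("oai:metadataPrefix", namespaces=OAI_NS),
            "schema": fmt.findtext("oai:schema", namespaces=OAI_NS),
            "namespace": fmt.findtext("oai:metadataNamespace", namespaces=OAI_NS),
        })
    return formats


for fmt in list_formats(SAMPLE_RESPONSE):
    print(fmt["prefix"], fmt["schema"], fmt["namespace"])
```

The same prefix can be paired with different schema and namespace URLs on different repositories, which is why comparing all three values is more informative than looking at the prefix alone.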

Our real concern is getting one of our Omeka sites harvested by MWDL. We are hoping to host an Omeka site where libraries and archives we partner with (who don't have an IT infrastructure) can put their content on our hosted site and have it harvested up into the MWDL aggregator and from there into the DPLA. So, the harvestability is a big concern for us, as well. Luckily our MWDL metadata folks can harvest in oai_qdc or qdc. But I know that is just the tip of the iceberg.

Thank you again for looking into this!

Liz

John Flatness October 22, 2014

Can MWDL harvest/ingest qualified DC in RDF? That's the avenue I'm currently considering and most comfortable with for adding qualified DC support to the repository plugin.

Liz Woolcott October 22, 2014

Hi John,
Thanks again for your response. I talked to our metadata guru at MWDL and here is her response:

"I think that you would find more people right now interested in harvesting qualified dublin core directly with their current systems that are set up to handle that, instead of in RDF. We would recommend this schema: http://dublincore.org/schemas/xmls/qdc/2008/02/11/dcterms.xsd

I think that for the scenario you've described, we’d have to transform the RDF into XML for that to work, or we would have to research and set up a different set of norm rules just for Omeka based repositories, while currently all the repositories we harvest go through a fairly standard but complex set of norm rules that are universally applied. I think that Primo can handle ingesting RDF/XML based on some of the documentation that I’ve found, but we haven’t had to deal with that as part of the regular ingestion work for MWDL."

Thanks,
Liz

mjlassila October 23, 2014

If you want to extend the current capabilities of OAI-PMH for site specific use, it is pretty straightforward to do. For example, I have created mappings for various internal Item Type metadata fields to standard DC by extending the default DC implementation

(https://github.com/mjlassila/plugin-OaiPmhRepository/blob/master/metadata/OaiKdk.php)

John Flatness October 24, 2014

Yes, the repository is designed to allow you to add new formats by just adding a class. That's definitely an option.

As far as what it supports out of the box, I just pushed the RDF format I mentioned earlier. Right now it just includes the unqualified and qualified or "extended" Dublin Core terms, though I'll probably add some mechanism for other plugins (either ones that add their own elements or something like Item Relations) to add properties of their own to the RDF output.

Obviously, this is pre-release code, but you can take a look at https://github.com/zerocrates/OaiPmhRepository (the specific added file is metadata/Rdf.php).

Sacha November 3, 2015

Hi,

I'm running Omeka v2.3.1 and have added the Rdf.php file to the metadata directory of the OAI repository plugin
(.../plugins/OaiPmhRepository/metadata).

When trying to invoke the format like so :

http://bamu.bib.uqam.ca/omeka/oai-pmh-repository/request?verb=ListMetadataFormats&identifier=oai:bamu.bib.uqam.ca:56809

http://bamu.bib.uqam.ca/omeka/oai-pmh-repository/request?verb=GetRecord&metadataPrefix=rdf&identifier=oai:bamu.bib.uqam.ca:56809

I get this error message:

Fatal error: Interface 'OaiPmhRepository_Metadata_FormatInterface' not found in /data/www/html/omeka/plugins/OaiPmhRepository/metadata/Rdf.php on line 16

Does anyone know of a simple fix? I need to harvest all the available DC elements.

Thanks,
Sacha
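
As a small aid for building test requests like the two above, here is a Python sketch that assembles OAI-PMH request URLs from a base URL, a verb, and its arguments. The helper name `oai_request_url` is invented for this example; the verb and argument names (GetRecord, metadataPrefix, identifier) are the standard OAI-PMH ones used in the URLs above. Nothing here contacts the server.

```python
from urllib.parse import urlencode


def oai_request_url(base_url, verb, **params):
    """Build an OAI-PMH request URL; arguments set to None are left out."""
    query = {"verb": verb}
    query.update({key: value for key, value in params.items() if value is not None})
    # urlencode percent-encodes reserved characters such as ":" in identifiers.
    return base_url + "?" + urlencode(query)


BASE = "http://bamu.bib.uqam.ca/omeka/oai-pmh-repository/request"

url = oai_request_url(
    BASE,
    "GetRecord",
    metadataPrefix="rdf",
    identifier="oai:bamu.bib.uqam.ca:56809",
)
print(url)
```

The colons in the identifier come out as %3A, which is equivalent to the unencoded form once the server decodes the query string.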

John Flatness November 3, 2015

You can't just take the Rdf format out from there and use it in the old plugin: it depends on other changed plumbing elsewhere in the plugin.

I'll see if I can get going on actually finishing up a release of the Repository that will include the RDF format.

Sacha November 3, 2015

Ok, thanks John! I installed the entire plugin and it works fine now.

John Flatness November 7, 2015

I went ahead and officially released the version of the plugin with the RDF format, and lots of other nice improvements, as Version 2.1.

Corfromleuven November 7, 2015

I've got a result from my latest harvest, but it won't complete (still running after two days).

And it gives a ton of errors in this style:

ID 4
Set Spec
Set Name
Metadata Prefix oai_dc
Base URL http://data.fitzmuseum.cam.ac.uk/oai/
Status In Progress
Initiated 2015-11-05 09:13:46
Completed [not completed]
Status Messages

Notice: Received resumption token: KnwqfG9haV9kY3wxMDA= (2015-11-05 09:15:32)

Notice: Received resumption token: KnwqfG9haV9kY3wyMDA= (2015-11-05 09:17:36)

Notice: Received resumption token: KnwqfG9haV9kY3wzMDA= (2015-11-05 09:19:29)

Notice: Received resumption token: KnwqfG9haV9kY3w0MDA= (2015-11-05 09:21:59)

Notice: Received resumption token: KnwqfG9haV9kY3w1MDA= (2015-11-05 09:24:09)

Notice: Received resumption token: KnwqfG9haV9kY3w2MDA= (2015-11-05 09:26:15)

Notice: Received resumption token: KnwqfG9haV9kY3w3MDA= (2015-11-05 09:28:14)

Goes on for uncountable lines. So I guess it is not working still. Could you tell me what to do now? Can I delete this? Can I just give it a second try?

Thanks in advance, Cor
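
A side note on the status messages above: they are logged as notices, and in OAI-PMH a resumption token simply means the repository has more pages of results to hand over. The protocol treats tokens as opaque, but the ones from this particular repository happen to be base64 text, so they can be decoded out of curiosity. The sketch below does that in Python; reading the trailing number as a record offset is a guess based on the pattern, not something the repository documents.

```python
import base64

# First three tokens copied from the status messages above.
tokens = [
    "KnwqfG9haV9kY3wxMDA=",
    "KnwqfG9haV9kY3wyMDA=",
    "KnwqfG9haV9kY3wzMDA=",
]


def decode_token(token):
    """Base64-decode a token; only meaningful for repositories that encode this way."""
    return base64.b64decode(token).decode("ascii")


for token in tokens:
    print(decode_token(token))
# *|*|oai_dc|100
# *|*|oai_dc|200
# *|*|oai_dc|300
```

If the offset reading is right, each request is advancing by 100 records roughly every two minutes, which would suggest a slow harvest of a large repository rather than a failing one.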

John Flatness November 7, 2015

I think you might have posted this in the wrong thread.

Corfromleuven November 9, 2015

True. Posted it again in the right one now. Question remains identical though.