OAI-PMH Harvester: fields missing from import · Legacy Forums

ebellempire January 12, 2015

I'm trying to import the following OAI content from a Bepress-based repository:

http://engagedscholarship.csuohio.edu/do/oai/?verb=ListRecords&metadataPrefix=dcq&set=publication:crohc200

It looks like all my data is available, but several fields are missing from the resulting Omeka items after the harvest, including both qualified and unqualified Dublin Core fields (dc:creator, dc:creator.interviewee, one of two dc:identifier fields, dc:format.extent, and one of two dc:source fields).

Any ideas why and/or how to fix?

Thanks -- E

John Flatness January 12, 2015

They're doing something weird there. You're using the metadata prefix "dcq", which the server doesn't even advertise as being available, and its output refers to the old 15-element DC namespace while using terms that aren't in that namespace. Basically, it looks like a bit of a hack.

Beyond that, though, the reason you're not seeing any of that stuff is that the harvester doesn't know about that "dcq" metadata format. What it does know about, and what you're presumably harvesting, is plain ol' oai_dc. As you can see if you run your same query with oai_dc instead, that server doesn't include any of the qualified terms in that output, and for some reason it also leaves out some of the regular ones (like dc:creator, as you mentioned).
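
For anyone wanting to check which metadata prefixes a repository actually advertises, OAI-PMH defines a ListMetadataFormats verb for that. The following is only a rough sketch using the Python standard library; the function names are made up for illustration, and the base URL you pass in is assumed to be the part of the harvest URL before the "?".

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode
from urllib.request import urlopen

# Namespace used by OAI-PMH 2.0 response envelopes.
OAI_NS = {"oai": "http://www.openarchives.org/OAI/2.0/"}


def list_formats_url(base_url):
    """Build a ListMetadataFormats request URL (hypothetical helper)."""
    return base_url + "?" + urlencode({"verb": "ListMetadataFormats"})


def advertised_prefixes(xml_text):
    """Return the metadataPrefix values found in a ListMetadataFormats response."""
    root = ET.fromstring(xml_text)
    path = ".//oai:metadataFormat/oai:metadataPrefix"
    return [el.text.strip() for el in root.iterfind(path, OAI_NS) if el.text]


def fetch_prefixes(base_url):
    """Fetch the response over HTTP and parse it (needs network access)."""
    with urlopen(list_formats_url(base_url)) as resp:
        return advertised_prefixes(resp.read())
```

Calling fetch_prefixes("http://engagedscholarship.csuohio.edu/do/oai/") and looking for "dcq" in the result would show whether the prefix is advertised; a prefix the server answers to but does not list is a hint that harvesters may not support it.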

It would be possible to extend the harvester to read their "dcq" output or some other format, but otherwise there's not much you can do about the underinclusive oai_dc output, other than seeing whether you (or whoever runs the server) can fix whatever's going wrong there and/or set up some crosswalking from qualified to unqualified terms.
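
To illustrate what crosswalking from qualified to unqualified could mean, here is a minimal sketch in Python. It assumes the qualified terms are written as "element.qualifier" (as in dc:creator.interviewee) and simply drops the qualifier, which is the usual Dublin Core "dumb-down" approach. The function names and the sample values are hypothetical, and this is not how the harvester itself works.

```python
# The 15 elements of unqualified (simple) Dublin Core.
DC_ELEMENTS = {
    "title", "creator", "subject", "description", "publisher",
    "contributor", "date", "type", "format", "identifier",
    "source", "language", "relation", "coverage", "rights",
}


def dumb_down(term):
    """Map 'creator.interviewee' to 'creator'; return None for unknown elements."""
    base = term.split(".", 1)[0]
    return base if base in DC_ELEMENTS else None


def crosswalk(fields):
    """Collapse (term, value) pairs into a dict of unqualified element -> values."""
    result = {}
    for term, value in fields:
        base = dumb_down(term)
        if base is None:
            continue  # not a Dublin Core element, so skip it
        result.setdefault(base, []).append(value)
    return result


# Made-up sample data, only to show the shape of the input and output.
record = [
    ("title", "Sample interview"),
    ("creator.interviewee", "Jane Doe"),
    ("format.extent", "45 minutes"),
]
print(crosswalk(record))
```

The qualifier is lost in the process, so "interviewee" becomes a plain creator. Whether that loss is acceptable depends on how the items will be used after import.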

ebellempire January 12, 2015

Thanks John, I'll consult with Bepress and see what they say.

ebellempire January 12, 2015

After looking into it some more, I think I'm just going to take in their output and reformat it on my server before running the harvester. In other words, see you on the dev group forum (particularly when I get around to the resumption token part).
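
For reference, the resumption token part of OAI-PMH works roughly like this: a ListRecords response may end with a resumptionToken element, and the next request sends only the verb plus that token, until the token comes back empty or absent. Below is a small Python sketch of that loop. The function names and the injectable fetch parameter are illustrative assumptions, not part of any existing plugin.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode
from urllib.request import urlopen

# Clark-notation prefix for the OAI-PMH 2.0 namespace.
OAI = "{http://www.openarchives.org/OAI/2.0/}"


def http_get(url):
    """Default fetcher: return the raw response body (needs network access)."""
    with urlopen(url) as resp:
        return resp.read()


def next_token(xml_bytes):
    """Return the resumption token in a response, or None if the list is complete."""
    root = ET.fromstring(xml_bytes)
    el = root.find(f".//{OAI}resumptionToken")
    if el is None or not (el.text or "").strip():
        return None
    return el.text.strip()


def harvest_pages(base_url, metadata_prefix, set_spec=None, fetch=http_get):
    """Yield each ListRecords response page, following resumption tokens."""
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    if set_spec:
        params["set"] = set_spec
    while True:
        page = fetch(base_url + "?" + urlencode(params))
        yield page
        token = next_token(page)
        if token is None:
            break
        # Follow-up requests carry only the verb and the token.
        params = {"verb": "ListRecords", "resumptionToken": token}
```

Something like harvest_pages("http://engagedscholarship.csuohio.edu/do/oai/", "dcq", "publication:crohc200") would then yield one page at a time, each of which could be reformatted before being handed to the harvester.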