OAI-PMH Harvester stuck on "In Progress"

Hi,

I am testing Omeka before we use it as the public-facing front end for our repository system.

I am having some difficulty getting Omeka to harvest from an ePrints repo.

I originally set up Omeka on my local machine using MAMP, and it harvested the DC items successfully.

However, I then needed to test this on our server, so I set up Omeka again. It finds the repo and its items and lets me select the DC metadata to harvest, but then it hangs on the "In progress" status.

As far as I can tell, everything is set up properly, and it is pointing to the correct PHP directory, etc.

It has been stuck like this for quite some time now, but the ePrints repo is also a test setup with fewer than 10 items, so I wouldn't expect it to take long at all.

When I harvested the same repo on my local setup, it took only a few seconds to complete.

Any help would be really appreciated.

Thanks, Patrick.

That sounds like some error caused the harvesting process to quit abnormally. Can you tell if any of the items were harvested? That would at least tell us that the process got started okay.

It's also possible that the background PHP path is incorrect. In application/config/config.php, there is a setting for background.php.path around line 152. Finding the correct PHP path on the server and putting it there might help.
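If you aren't sure which binary to point it at, a small script can look for likely candidates. This is a hypothetical helper, not part of Omeka: it checks the current user's PATH and a few common install locations (the CANDIDATE_PATHS list is an assumption, adjust it for your server), then runs `php -v` so you can confirm the first line mentions `(cli)` rather than `(cgi-fcgi)`, on the assumption that a background job needs the command-line build. Run it as the web server user if you can, because that user's PATH may differ from yours.

```python
import os
import shutil
import subprocess
from typing import Optional

# Hypothetical fallback locations; adjust for your server.
CANDIDATE_PATHS = ["/usr/bin/php", "/usr/local/bin/php", "/opt/local/bin/php"]


def find_php_cli() -> Optional[str]:
    """Return the path of an executable PHP binary, or None if none is found."""
    # First look on the PATH of the user running this script.
    found = shutil.which("php")
    if found:
        return found
    # Fall back to a few common install locations.
    for path in CANDIDATE_PATHS:
        if os.path.isfile(path) and os.access(path, os.X_OK):
            return path
    return None


def php_version(path: str) -> str:
    """Run `php -v` and return its first line (it should mention 'cli')."""
    result = subprocess.run([path, "-v"], capture_output=True, text=True)
    lines = result.stdout.splitlines()
    return lines[0] if lines else ""


if __name__ == "__main__":
    php = find_php_cli()
    if php is None:
        print("No PHP binary found; check with your server admin.")
    else:
        print(php)
        print(php_version(php))
```

Whatever path this prints is only a candidate for background.php.path; it is still worth confirming with whoever administers the server.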

Hi Patrick,

I have already set background.php.path to the correct path for PHP on the server, and I have also enabled SetEnv APPLICATION_ENV development in the .htaccess file, but no error is shown when I try to harvest.

It's strange because it looks as though it will do the harvest successfully but as yet none of the items have been brought through from the ePrints repo.

I have just checked again, and the harvester does seem to be doing something, as it has created some new collections:

  • Type = Image (Private): no contributors, added Jan 23, 2014, 0 items
  • Type = Other (Private): no contributors, added Jan 23, 2014, 0 items
  • [Untitled] (Private): no contributors, added Jan 23, 2014, 0 items

As yet it hasn't pulled in any items.

Is it normal for it to take this long, though? The harvest has been running for a few hours now.

Thanks.

No, it shouldn't take that long at all. Most likely, whatever is stopping the process stops it so abruptly that the status is never updated correctly.

Is anything appearing in application/logs/errors.log? It sounds like no, but thought I'd double-check.

Also, could I try harvesting from the endpoint myself?

Sorry for the late reply, I checked the error log but that is empty.

The setup is currently on a testing server that isn't public, so access would be difficult to arrange, as I'm not the network admin. But if there is anything you would suggest I try, I can do it and post the results here.

Thanks for the help.

Hi Patrick,

I got it working by doing a fresh install of Omeka on the server we are using for testing.

Harvesting Dublin Core metadata works fine; however, trying to do a METS import gives an error.

The status message says Error: OaipmhHarvester_Harvest_Mets::_dmdSecToArray(): Node no longer exists (2014-01-24 15:38:29)

Is this because ePrints doesn't support METS harvesting, or is there something wrong with our setup?

An example repo you could use is http://eprints.ulster.ac.uk/cgi/oai2, as it is similar to our setup.
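One way to check whether an endpoint advertises METS at all is the standard OAI-PMH ListMetadataFormats verb. The sketch below parses such a response and returns the metadataPrefix values. The SAMPLE response is made up and trimmed (real responses also include schema and metadataNamespace elements), and the helper names are hypothetical. Note that a prefix being advertised says nothing about whether the records served under it are valid.

```python
import urllib.request
import xml.etree.ElementTree as ET
from typing import List, Union

OAI_NS = {"oai": "http://www.openarchives.org/OAI/2.0/"}


def metadata_prefixes(xml_doc: Union[str, bytes]) -> List[str]:
    """Return the metadataPrefix values from a ListMetadataFormats response."""
    root = ET.fromstring(xml_doc)
    return [
        el.text.strip()
        for el in root.findall(".//oai:metadataFormat/oai:metadataPrefix", OAI_NS)
        if el.text
    ]


def fetch_metadata_formats(base_url: str) -> List[str]:
    """Ask an OAI-PMH endpoint which metadata formats it advertises."""
    url = base_url + "?verb=ListMetadataFormats"
    with urllib.request.urlopen(url, timeout=30) as resp:
        # Pass raw bytes so ElementTree honours the XML encoding declaration.
        return metadata_prefixes(resp.read())


# Illustrative, trimmed response (not fetched from any real repository).
SAMPLE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListMetadataFormats>
    <metadataFormat><metadataPrefix>oai_dc</metadataPrefix></metadataFormat>
    <metadataFormat><metadataPrefix>mets</metadataPrefix></metadataFormat>
  </ListMetadataFormats>
</OAI-PMH>"""
```

Against a live endpoint this would be called as `fetch_metadata_formats("http://eprints.ulster.ac.uk/cgi/oai2")`, which needs network access from wherever the script runs.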

Thanks for the help.

It's hard to be sure right away, because I don't know how forgiving PHP's SimpleXML is, but when I dropped a page of the METS XML returned from http://eprints.ulster.ac.uk/cgi/oai2?verb=ListRecords&metadataPrefix=mets into the oXygen XML editor, it failed to validate, with over 1800 errors.

This one looks particularly suspicious, given that it is the _dmdSecToArray() method that is complaining.

On this element:

<mets:dmdSec ID="DMD_oai:generic.eprints.org:1_mods">

oXygen says:

cvc-datatype-valid.1.2.1: 'DMD_oai:generic.eprints.org:1_mods' is not a valid value for 'NCName'.

That might be it, but I'm not entirely sure at this stage.
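To spot this class of problem without oXygen, a short script can pull every dmdSec ID out of a METS document and flag the ones that are not NCNames (the main violation here is the colons). This is a hypothetical check, not what Omeka does: the regex is a simplified, ASCII-only approximation of the NCName rule, and it only flags IDs a validator would reject; it does not prove that this is what makes _dmdSecToArray() fail. Python's ElementTree, like many non-validating parsers, accepts these IDs without complaint.

```python
import re
import xml.etree.ElementTree as ET
from typing import List

DMDSEC_TAG = "{http://www.loc.gov/METS/}dmdSec"

# Simplified, ASCII-only approximation of the XML NCName production:
# a letter or underscore, then letters, digits, '.', '-' or '_'.
# Colons are not allowed. The full rule also permits many non-ASCII
# characters, which this pattern does not cover.
NCNAME_ASCII = re.compile(r"[A-Za-z_][A-Za-z0-9._-]*")


def is_ncname(value: str) -> bool:
    return NCNAME_ASCII.fullmatch(value) is not None


def bad_dmdsec_ids(mets_xml: str) -> List[str]:
    """Return the dmdSec ID attributes that are not valid NCNames."""
    root = ET.fromstring(mets_xml)
    ids = [el.get("ID", "") for el in root.iter(DMDSEC_TAG)]
    return [i for i in ids if not is_ncname(i)]
```

For example, `is_ncname("DMD_oai:generic.eprints.org:1_mods")` is False because of the colons, while an ID such as `DMD_1_mods` would pass.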