OAI-PMH Harvester : implementing an automated harvest

Hello everybody,

I've checked with the research but I don't think that this subject has already been discussed.

I'm currently setting up a platform where 4 different institutions are going de describe their archive into Atom (from ICA). We've installed Omeka and the OAI-PMH plugin which are working perfectly.

Since we have 4 different institutions, which are going to add

However, our database will be evolving since every institution will add some archive description. This is why it would be interesting to set up an automatization of the OAI-PMH harvesting.

I've been looking but couldn't find anything to do that. Is it possible ?

Thanks for your help.

Pauline.

I don't think that there's a way to have the harvest run automatically after a specified time limit.

It's not as easy as scheduling a cron job (which is what you were probably hoping for), but you could write a Ruby or Python script using Mechanize to login/click the appropriate buttons and then run that script at regular intervals.

Ruby: http://mechanize.rubyforge.org/
Python: https://pypi.python.org/pypi/mechanize/

Oooh! Shiney! Thanks!

Hello,

I've done it with PhantomJS, my code :


//
// Javascript script that automates the harvest of a repository in OMEKA
//
// You need a PhantomJS instance installed on your server (http://phantomjs.org)
// Call the script with the --cookies-file param (to store persistents cookies), for example :
// phantomjs --cookies-file=cookies.txt harvest.js
//
// @author Franck Dupont <kyfr59@gmail.com>
//

var page = require('webpage').create(),
url = 'http://localhost/admin/users/login',
data = 'username=admin&password=yourpassword';

page.open(url, 'post', data, function (status) { // Login on OMEKA backoffice

if (status !== 'success') {

console.log('Unable to post!');

} else {

// page.render('connexion.png'); // DEBUG
var url = 'http://localhost/admin/oaipmh-harvester/index/harvest';
var data = 'base_url=http://localhost/atom/index.php/;oai?verb=ListRecords&metadata_spec=oai_dc';

page.open(url, 'post', data, function (status) { // Call the harvest page

if (status !== 'success') {
console.log('Unable to post!');

} else {

// page.render('harvest.png'); // DEBUG
}
phantom.exit();
});
}
});

Franck