Importing from Flickr · Legacy Forums

JamesMorley October 11, 2013

Hi, just installed Omeka on a test site last night and have had just a few hours to configure, look around, and play. Very impressed so far, having worked with other open-source (but not specifically CollMS) tools like WordPress and Drupal.

One thing I am specifically looking to do is a bulk import. As a test I want to take a large batch of images from Flickr - I have a lot of my personal photographic collection there, I have also worked a lot with Flickr Commons images, and I have worked extensively with the Flickr API.

I can see two methods - one would be a full-blown use of both APIs, which would need me to get up to speed with the Omeka API, or I could simply download all the data into csv and do an import. The latter appeals to me as I already have code to export Flickr images into csv for another project that I worked on. For now I'm being stopped in my tracks by an error "The configured PHP path (/usr/bin/php) does not point to a PHP-CLI binary" but I think that's a hosting issue.

Anyway, I was just checking in to say hi, and also wondering if anyone has done anything like this with Flickr collections before?

I'm happy to share any code I come up with if anyone is interested.

James

patrickmj October 11, 2013

While I'd be really excited to see something that works with both APIs, the CSV import method is likely the more straightforward. The path to PHP-CLI is indeed something that your hosting provider should be able to give you. Then you enter it in Omeka's config.ini file (application/config/config.ini), in the background.php.path value around line 152.
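If you have shell access, one way to check a candidate path before asking the host is to run it with -v and look at the banner. The sketch below is only a rough check, not how Omeka itself validates the path: it assumes a CLI build prints "(cli)" on the first line of its version output (CGI builds typically print something like "(cgi-fcgi)" instead). The function names are made up for illustration.

```python
import subprocess


def looks_like_cli(version_banner: str) -> bool:
    """Return True if the first line of `php -v` output names the CLI build.

    Assumption: CLI builds print a banner such as 'PHP 5.4.20 (cli) (built: ...)'.
    """
    lines = version_banner.strip().splitlines()
    first_line = lines[0] if lines else ""
    return first_line.startswith("PHP ") and "(cli)" in first_line


def php_is_cli(path: str) -> bool:
    """Run `<path> -v` and report whether the binary looks like PHP-CLI."""
    try:
        result = subprocess.run(
            [path, "-v"], capture_output=True, text=True, timeout=10
        )
    except (OSError, subprocess.TimeoutExpired):
        # Missing binary, permission problem, or a hung process.
        return False
    return looks_like_cli(result.stdout)


if __name__ == "__main__":
    # /usr/bin/php is the path from the error message in this thread.
    print(php_is_cli("/usr/bin/php"))
```

Whatever path passes this check is the one to put in the background.php.path setting.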

Good luck!

JamesMorley October 11, 2013

Thanks Patrick - always encouraging when your first post in a new forum gets a quick and helpful response! It looks like /omeka/application/libraries/Omeka/Job/Process/Dispatcher.php is auto-detecting the path as /usr/bin/php which looks like it should be fine, but I've asked my hosts to confirm.

A couple more questions ...

It talks about being able to take csv export files from Omeka. Are those ones from omeka.net or can I export from my self-hosted Omeka? I thought the easiest way to get the correct template csv file to match my requirements would be to set up a manual item and do an export, then just add records to the exported csv file to match the format.

And the second question (which might be answered if I could do an export) is, if I use the Geolocation plugin, can I import latlng values in the csv to ensure my imported items keep their geotag information?

Thanks, James

patrickmj October 11, 2013

James,

The CSV export files it talks about are from omeka.net. Since omeka.net is still on 1.5, the plugin that generates the export won't work with a 2.x site.

That said, you don't have to worry too much about figuring out the correct template. The CSV Import plugin lets you map the columns in your CSV to the Dublin Core fields in Omeka when you import. Or, there is an auto-map structure that follows a predetermined template. There's an example of that in the csv_files folder in the plugin. It would require a little modification based on your particular data, but the structure is pretty regular.

Second question leads to the API route. CSV Import can't import any Geolocation data (or data for any other plugin, for that matter). Geolocation does, however, tap into the API, so you could use the API to import that data.

I'm not sure if a hybrid approach -- CSV for most data, then API for latlng -- would be easier or harder than trying to do it all with the API. If you went with the hybrid approach, it might be possible to put the Flickr image ID info into the Dublin Core Identifier field in Omeka, then have the script that talks between Omeka and Flickr look up latlng data in Flickr based on that and attach it to the correct item id in Omeka.
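A minimal sketch of the hybrid idea, for anyone who wants to try it. It only builds the Flickr request URL, parses a response, and prepares a body for Omeka; it makes no network calls. flickr.photos.geo.getLocation is a real Flickr API method, but the shape of the Omeka payload (field names such as "item", "latitude", "longitude", "zoom_level") is an assumption about what the Geolocation plugin's API resource expects, so check it against your own install. The function names and the default zoom level are hypothetical.

```python
import json
from urllib.parse import urlencode

FLICKR_REST = "https://api.flickr.com/services/rest/"


def flickr_geo_url(photo_id: str, api_key: str) -> str:
    """Build the URL for a flickr.photos.geo.getLocation call (JSON response)."""
    params = {
        "method": "flickr.photos.geo.getLocation",
        "api_key": api_key,
        "photo_id": photo_id,
        "format": "json",
        "nojsoncallback": 1,
    }
    return FLICKR_REST + "?" + urlencode(params)


def parse_flickr_location(response_text: str):
    """Return (latitude, longitude) as floats, or None if Flickr reports a failure."""
    data = json.loads(response_text)
    if data.get("stat") != "ok":
        return None  # e.g. the photo has no location information
    location = data["photo"]["location"]
    return float(location["latitude"]), float(location["longitude"])


def geolocation_payload(omeka_item_id: int, latitude: float, longitude: float) -> dict:
    """Body for creating a geolocation on an Omeka item.

    ASSUMPTION: these field names are a guess at the Geolocation plugin's API
    representation; they are not confirmed anywhere in this thread.
    """
    return {
        "item": {"id": omeka_item_id},
        "latitude": latitude,
        "longitude": longitude,
        "zoom_level": 12,  # hypothetical default
        "map_type": "",
        "address": "",
    }
```

The remaining piece is a lookup from Flickr image ID (stored in Dublin Core Identifier) to Omeka item id, which you would build by reading your items back out of Omeka.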

If you go the API route either for all of the data or just for part, we'd love to hear back about how it goes. To my knowledge you'd be the first to implement this kind of Flickr to Omeka transfer, especially involving the Geolocation plugin.

JamesMorley October 11, 2013

Hi Patrick

Thanks for the clear explanation. I'd got as far as the sample csv files, in fact I can use those and see the mapping options, it's just failing at the actual import stage due to that CLI issue. I'm hoping my hosts can advise.

I might have a look at the Omeka API and see if it's going to be possible to throw something together quickly, but it's going to be hard to give it too much time right now. Also, knowing the csv-only route will give me everything I need except the geo, I think I'll have to settle for that for now. It helps, though, knowing that it might be possible to add the geo using the API at a later date, so anything I do via CSV won't be wasted.

I'm happy to share whatever code I come up with. One thing I do see as an issue is that each user's scenario will be different - each will potentially want different bits of Flickr content to go into differently named fields - but a fairly vanilla version using standard DC field names might work for the majority of cases and anyone with some basic coding knowledge could hack it around a bit if not.

Thanks again

James

patrickmj October 11, 2013

Makes sense to me! As long as you import something that'll let you keep track of which Omeka item corresponds to which Flickr image, it should be easy to go back and add more data when you have a lot of spare time. Because we all know that spare time comes around so frequently! :)

Bespoke solutions that others can hack on are to be expected, and are all super helpful.

Good luck!

JamesMorley October 13, 2013

Success! My host gave me the CLI path (we're not sure why it didn't get automatically detected) and a test import using the supplied csv files worked. I've then hacked a really old script I had for exporting Flickr image data to a csv file (it was for importing into Historypin) to fit the Omeka format, including a bunch of DC fields.
For reference here's my csv header row:

$csvdata = array('Dublin Core:Title', 'Dublin Core:Description', 'Dublin Core:Source', 'Dublin Core:Date', 'Dublin Core:Contributor', 'Dublin Core:Rights', 'Dublin Core:Identifier', 'tags', 'Item Type Metadata:URL');
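For anyone not working in PHP, here is a rough Python equivalent that writes a CSV with the same header row. It is a sketch only: the sample record is invented, and any key a record lacks is written as an empty cell.

```python
import csv
import io

# Same column names as the header row above.
HEADER = [
    "Dublin Core:Title",
    "Dublin Core:Description",
    "Dublin Core:Source",
    "Dublin Core:Date",
    "Dublin Core:Contributor",
    "Dublin Core:Rights",
    "Dublin Core:Identifier",
    "tags",
    "Item Type Metadata:URL",
]


def records_to_csv(records):
    """Serialise a list of dicts to CSV text, one row per record."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer, fieldnames=HEADER, restval="", extrasaction="ignore"
    )
    writer.writeheader()
    for record in records:
        writer.writerow(record)
    return buffer.getvalue()


if __name__ == "__main__":
    # Hypothetical record, for illustration only.
    sample = [{
        "Dublin Core:Title": "Beach huts",
        "Dublin Core:Date": "ca 1890",
        "Dublin Core:Identifier": "1234567890",
        "tags": "beach,seaside",
    }]
    print(records_to_csv(sample))
```

Values containing commas (such as a tag list) are quoted automatically by the csv module, so they survive the round trip.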

To be honest my knowledge of correct usage of DC is limited but:
- I've got the source and contributor set the same, to the username and ID of the Flickr user.
- The date takes the stored date in Flickr but applies some changes based on what they call 'datetaken granularity'. For example, if an image is recorded as 'ca 1890', they will store '01-01-1890' and a granularity of '8', so my script translates this to the text 'ca 1890' for storing in Omeka. Not sure if that is the correct approach, as it then stops it being used in timelines etc. How does DC handle imprecise dates?
- Rights is based again on a value stored in Flickr that can be translated into a Creative Commons annotation.
- As you suggest, I have set the identifier as the image ID from Flickr.

All these are easily changed.
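The date translation described in the list above could look something like this in Python. It is a sketch under assumptions, not the actual script: it assumes the date arrives as a 'YYYY-MM-DD HH:MM:SS' string, and that the granularity codes are 0 (exact), 4 (year and month), 6 (year only) and 8 (circa). It slices the string rather than parsing it as a date, so partial dates with zero month or day fields do not raise errors.

```python
def describe_date_taken(date_taken: str, granularity) -> str:
    """Turn a Flickr date plus granularity code into text for Dublin Core:Date.

    ASSUMPTIONS: date_taken looks like '1890-01-01 00:00:00'; granularity is
    0 (exact), 4 (year-month), 6 (year) or 8 (circa), as an int or string.
    """
    year = date_taken[0:4]
    month = date_taken[5:7]
    day = date_taken[8:10]
    level = int(granularity)

    if level == 8:
        return f"ca {year}"
    if level == 6:
        return year
    if level == 4:
        return f"{year}-{month}"
    # Granularity 0 (or anything unrecognised): keep the full calendar date.
    return f"{year}-{month}-{day}"


if __name__ == "__main__":
    print(describe_date_taken("1890-01-01 00:00:00", 8))  # ca 1890
```

Keeping the year-only and year-month forms in a sortable layout leaves some chance of using them in timelines later; only the circa case becomes free text.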

So, you can see this in action at http://www.whatsthatpicture.com/flickr/omeka/ (very crude and untested - not even an alpha really!)

A test export/import of about 50 images from my personal collection can be seen on my testing site at http://omeka.catchingtherain.com/

To suit my own needs this just works manually by adding single images, but it wouldn't be hard to change it to do bulk exports, for example of sets. I think it's fairly obvious that it only pulls in images from the authenticated user's account, although you could change it for things like Flickr Commons I guess.

Ideally there should be some mechanism to identify records that have already been exported and prevent duplicates. Currently this is just implemented crudely by adding a tag 'omeka' to the Flickr image.
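A small sketch of how that duplicate check might be tightened up, combining the marker tag with a list of identifiers already in Omeka. It assumes each photo is a dict with an "id" and a space-separated "tags" string (the form Flickr returns when tags are requested as an extra); the function name and the already_imported_ids argument are hypothetical.

```python
def photos_to_export(photos, already_imported_ids=(), marker_tag="omeka"):
    """Return the photos that still need exporting.

    A photo is skipped if it carries the marker tag on Flickr, or if its id is
    in already_imported_ids (e.g. collected from Dublin Core:Identifier values).
    ASSUMPTION: photo["tags"] is a space-separated string.
    """
    seen = {str(photo_id) for photo_id in already_imported_ids}
    pending = []
    for photo in photos:
        tags = photo.get("tags", "").split()
        if marker_tag in tags or str(photo["id"]) in seen:
            continue
        pending.append(photo)
    return pending


if __name__ == "__main__":
    # Invented sample data, for illustration only.
    sample = [
        {"id": "1", "tags": "beach omeka"},
        {"id": "2", "tags": "beach"},
        {"id": "3", "tags": ""},
    ]
    print([p["id"] for p in photos_to_export(sample, already_imported_ids=["3"])])
```

Checking against identifiers already stored in Omeka means the export no longer depends on the Flickr tag alone, which can be removed or mistyped.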

Sorry, long post, but contact me if anyone wants further details. I'd want to tidy up and document the code before sharing too widely! I do think though that any further development should head down the full API route rather than just csv import.

patrickmj October 14, 2013

In general, there's not a great solution for imprecise dates in the DC field. I think the Neatline Time plugin does some fancier things, though.

Glad it's working overall, though!