Extract Metadata

By Omeka Team Extract embedded metadata from files
Download 1.0.0

Extract Metadata

Extract embedded metadata from files. Once installed and active, this module has the following features:

  • When configuring the module, the user can:
    • View and enable/diable extractors;
    • View and enable/diable mappers;
    • Configure the metadata crosswalk for the JSON Pointer mapper (if enabled).
  • When adding a media, the module will automatically:
    • Extract metadata from the file using enabled;
    • Save the metadata alongside the media;
    • Map metadata to resource values.
  • When editing a media/item or batch editing media/item, the user can choose to perform a number of actions:
    • Refresh metadata: (re)extract metadata from files;
    • Refresh and map metadata: (re)extract metadata from files and map metadata to resource values;
    • Map metadata: Map extracted metadata to resource values;
    • Delete metadata: Delete extracted metadata.
  • When viewing and editing a media, the user can see the extracted metadata in the "Extract metadata" section.

Extractors:

Extractors extract metadata from files. Note that extractors must be enabled on the module configuration page. This module comes with four extractors, but more can be added depending on your need.

ExifTool

Used to extract many types of metadata from many types of files. Requires the ExifTool command-line application.

Exif

Used to extract EXIF metadata that is commonly found in JPEG and TIFF files. Requires PHP's exif extension.

getID3

Used to extract many types of metadata from many types of files. Uses the getID3 PHP library, which comes with this module.

Tika

Used to extract many types of metadata from many types of files. Requires the Apache Tika content analysis toolkit. Java must be installed and the path to the tika-app-*.jar file must be configured in config/module.config.php under [extract_metadata_extractor_config][tika][jar_path].

Mappers

Mappers map extracted metadata to resource values. Note that a mapper must be enabled on the module configuration page. This module comes with one mapper, but more can be added depending on your need.

JSON Pointer

Used to map metadata to resource values using JSON pointers. You must define your own metadata crosswalk in the module configuration page under "JSON Pointer crosswalk".

One common example is to map a JPEG file's creation date to Dublin Core's "Date Created" property:

  • Resource: [Media or Item]
  • Extractor: "Exif"
  • Pointer: /EXIF/DateTimeOriginal
  • Property: "Dublin Core : Date Created"
  • Replace values: [checked or unchecked]

Note that the pointer points to the DateTimeOriginal value in the Exif metadata output, which you can view in a JPEG media's "Extract metadata" section. Once you've saved this map, perform the "Map metadata" action as described above and, if your JPEG file includes DateTimeOriginal, the media/item should now have a "Date Created" value.

Installing Dependencies

If installing directly from the Git repository, you will need to install dependencies using Composer. Within the ExtractMetadata/ directory, run the following:

$ composer install

This will install dependencies in a vendor/ directory.

Copyright

ExtractMetadata is Copyright © 2019-present Corporation for Digital Scholarship, Vienna, Virginia, USA http://digitalscholar.org

The Corporation for Digital Scholarship distributes the Omeka source code under the GNU General Public License, version 3 (GPLv3). The full text of this license is given in the license file.

The Omeka name is a registered trademark of the Corporation for Digital Scholarship.

Third-party copyright in this distribution is noted where applicable.

All rights not expressly granted are reserved.

Version Released Minimum Omeka version
1.0.0November 22, 2021 [info]^3.1.0