Bulk Edit

By Daniel Berthereau. Adds batch processes to replace, remove, order, and fill values (including Value Suggest data), etc., to help curators update and clean metadata across many resources quickly and easily.

Bulk Edit (module for Omeka S)

New versions of this module and support for Omeka S version 3.0 and above are available on GitLab, which seems to respect users and privacy better than the previous repository.

Bulk Edit is a module for Omeka S that adds tools to bulk edit resources in order to modify or to clean them.

Current processes are:

  • Modify language codes
  • Remove duplicate values
  • Remove all leading and trailing whitespace
  • Replace the value of a property (directly or via regex)
  • Remove the literal value of a property
  • Prepend or append a string to a value of a property
  • Set or remove the language of a property
  • Order values by language (in particular for the title)
  • Set the visibility of a property to public or private
  • Displace values from one property to another
  • Explode a value into multiple ones
  • Merge two values into one
  • Convert a value to another data type
  • Update or remove the owner
  • Update Value Suggest labels
  • Apply visibility to values
  • Fill labels for Value Suggest values
  • Remove specified values
  • Remove specified media
  • Add or remove a thumbnail
  • Update the order of media
  • Explode an item into multiple items by media
  • Explode a PDF into multiple images (mainly for quick display via IIIF)
  • Update the media HTML via item or media
  • Update the media source
  • Update the media type (MIME type) of a media
  • Update media visibility

Furthermore, values are automatically trimmed and deduplicated when a resource is saved.

Installation

Module

See the general end-user documentation for installing a module.

This module requires the module Common, which should be installed first.

  • From the zip

Download the latest release BulkEdit.zip from the list of releases, and uncompress it into the modules directory.

  • From the source and for development

If the module was installed from the source, rename the module folder to BulkEdit.

Then install it like any other Omeka module and follow the config instructions.

Libraries

To explode a PDF, you need either the command pdftoppm (from Poppler, recommended) or gs (Ghostscript).

Usage

The tool is available via the standard bulk process in Admin > Items, Admin > Item Sets, and Admin > Media. Select specific resources or all of them, click Go, then select and configure the process to run.

To refine the selection of resources, you can use the module Advanced Search, which allows searching by creation date, modification date, or visibility.

The job is launched directly when specific resources are selected, and in the background when all resources are selected.

Cleaning metadata

Trim property values

Remove leading and trailing whitespace preventively on any resource creation or update, or curatively via the batch edit, so values are easier to find and to compare exactly (see omeka/omeka-s#1258). Note that the curative trimming uses a regex when possible (with MySQL ≥ 8.0.4 or MariaDB ≥ 10.0.5). In most cases the result is the same, except when multiple kinds of whitespace are mixed (space, tab, new line, end of line, etc.).
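As an illustration only (the module itself runs in PHP and SQL), this Python sketch contrasts a space-only trim, comparable to SQL TRIM() with its default arguments, with a regex trim that removes every kind of leading and trailing whitespace. The function names are hypothetical.

```python
import re

# Any run of whitespace (space, tab, newline, carriage return, etc.)
# at the very start (\A) or the very end (\Z) of the string.
EDGE_WHITESPACE = re.compile(r"\A\s+|\s+\Z")


def trim_spaces_only(value: str) -> str:
    """Strip only the space character, like a plain TRIM() with default arguments."""
    return value.strip(" ")


def trim_all_whitespace(value: str) -> str:
    """Strip every kind of leading and trailing whitespace with a regex."""
    return EDGE_WHITESPACE.sub("", value)
```

With the value " \t Poincaré \r\n", the space-only trim keeps the tab and the line break, while the regex trim returns "Poincaré". Whitespace inside the value is left untouched by both.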

Specify data type of linked resources

In some cases, in particular when using resource templates with the data type "resource", linked resources are saved in the database with the generic data type "resource" instead of a specific one such as "resource:item" or "resource:media". This process sets the specific data type, which is needed to clarify the output of facets in the module Advanced Search, lists in the module Reference, and some other places.
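A minimal Python sketch of the idea, not the module's code. It assumes each value is a dict with a "type" key and that the kind of the linked resource is already known; the dict layout and the kind names "item", "item_set" and "media" are hypothetical, while "resource:item", "resource:itemset" and "resource:media" are the specific data types used by Omeka S.

```python
# Specific Omeka S data types for links to other resources,
# keyed by a hypothetical name for the kind of linked resource.
SPECIFIC_TYPES = {
    "item": "resource:item",
    "item_set": "resource:itemset",
    "media": "resource:media",
}


def specify_data_type(value: dict, linked_kind: str) -> dict:
    """Return a copy of a linked-resource value with a specific data type."""
    if value.get("type") != "resource":
        return value  # already specific, or not a linked resource at all
    specific = SPECIFIC_TYPES.get(linked_kind)
    if specific is None:
        return value  # unknown kind: keep the generic type
    return {**value, "type": specific}
```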

Clean languages

Sometimes an empty language for a value is stored as an empty string. This option sets it to null.

Modify language codes

This option allows replacing all "fr" or "fre" with "fra", or with any other language code, or with an empty code. It can be limited to specific properties.
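The replacement can be sketched in Python as a mapping from old to new codes, optionally restricted to some properties. This is a hypothetical illustration: values are assumed to be dicts with "property" and "lang" keys, and mapping a code to "" is assumed to clear the language (stored as None, in the spirit of "Clean languages").

```python
from collections.abc import Iterable
from typing import Optional


def modify_language_codes(
    values: list[dict],
    mapping: dict[str, str],
    properties: Optional[Iterable[str]] = None,
) -> int:
    """Replace language codes in place and return the number of updated values."""
    allowed = set(properties) if properties is not None else None
    updated = 0
    for value in values:
        # Skip values of properties outside the optional limit.
        if allowed is not None and value["property"] not in allowed:
            continue
        old = value.get("lang")
        if old in mapping:
            # An empty target code clears the language instead of storing "".
            value["lang"] = mapping[old] or None
            updated += 1
    return updated
```

For example, modify_language_codes(values, {"fr": "fra", "fre": "fra"}, ["dcterms:title"]) normalizes French codes in titles only.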

Deduplicate property values

Remove exact duplicate values preventively on any new or updated resource. Note: preventive deduplication is case sensitive, but curative deduplication is case insensitive, because it uses a direct query and the Omeka database is case insensitive by default.
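The difference between the two kinds of deduplication can be sketched in Python, keeping the first occurrence and the original order. The function is hypothetical, and casefold() only approximates a case-insensitive database collation, which may also treat some accented characters as equal.

```python
def deduplicate(values: list[str], case_sensitive: bool = True) -> list[str]:
    """Drop repeated values, keeping the first occurrence and the original order."""
    seen: set[str] = set()
    result: list[str] = []
    for value in values:
        # Compare on the raw string, or on a case-folded key when insensitive.
        key = value if case_sensitive else value.casefold()
        if key not in seen:
            seen.add(key)
            result.append(value)
    return result
```

With ["Paris", "paris", "Paris"], the case-sensitive mode keeps "Paris" and "paris", while the case-insensitive mode keeps only "Paris".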

Replace values

Replace value of a property directly or via regex

Fill in the fields "Replace" and "By", choose the type of replacement (simple, html or regex), then select the properties to update.

The mode "html" means that the original string is checked both as a raw string and as an HTML-encoded string. For example, the string café is checked as café and as caf&eacute;. Be careful when plain characters are mixed with entities: it may be difficult to replace all occurrences. The replacement string is used unchanged, so it is recommended to use entities for it too.
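A Python sketch of this "html" matching, under the assumption that encoding means named entities such as caf&eacute; (numeric entities like &#233; are not covered). Python's html.escape() does not encode "é", so the sketch uses the standard html.entities.codepoint2name table; the function names are hypothetical.

```python
import re
from html.entities import codepoint2name


def encode_entities(text: str) -> str:
    """Encode characters that have a named HTML entity, e.g. "é" -> "&eacute;"."""
    return "".join(
        f"&{codepoint2name[ord(char)]};" if ord(char) in codepoint2name else char
        for char in text
    )


def replace_html(value: str, search: str, replacement: str) -> str:
    """Replace `search` both as raw text and as entity-encoded text.

    The replacement is inserted unchanged, so encode it beforehand if needed.
    """
    variants = {search, encode_entities(search)}
    # Longest variant first, so a shorter one cannot match inside a longer one.
    pattern = "|".join(re.escape(v) for v in sorted(variants, key=len, reverse=True))
    # A function replacement keeps "\1" or "\g<0>" in the text literal.
    return re.sub(pattern, lambda match: replacement, value)
```

For example, replace_html("Le caf&eacute; et le café", "café", "th&eacute;") replaces both spellings with "th&eacute;".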

For regex, use standard patterns. Examples:

  • Convert spaces into "-" for identifiers: pattern = ~\s+~, replacement = -.
  • Convert D:20220908101423 into a normalized date: pattern = ~^D:(\d{4})(\d{2})(\d{2}).*~, replacement = $1-$2-$3.
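The same kinds of replacement can be tried in Python before running them on many resources. Patterns of this form are written in PCRE/PHP style, with "~" delimiters and $1 backreferences; Python's re module takes the bare pattern and uses \1 instead. The function names are hypothetical. Note that the time digits of the date are only dropped when the pattern also consumes them (here with a trailing ".*"); otherwise only the matched prefix is replaced and the result would be 2022-09-08101423.

```python
import re


def normalize_identifier(value: str) -> str:
    """Convert each run of whitespace into "-"."""
    return re.sub(r"\s+", "-", value)


def normalize_pdf_date(value: str) -> str:
    """Turn a PDF-style date such as D:20220908101423 into 2022-09-08."""
    # ".*" consumes the time part so that it is removed along with the prefix.
    return re.sub(r"^D:(\d{4})(\d{2})(\d{2}).*", r"\1-\2-\3", value)
```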

Remove the literal value of a property

Check the box "Remove string", then select the properties to update. The string is removed, but another string can be prepended or appended in its place: in that case the value is kept, otherwise the value itself is removed.

Prepend or append a string

Fill in the fields "Prepend" and/or "Append" and select the properties to update.
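A hypothetical Python sketch of how prepending and appending combine with the option "Remove string" described in the previous section: the function returns the new literal, or None when nothing is left and the value should be deleted. The name and signature are illustrative, not the module's API.

```python
from typing import Optional


def edit_literal(
    value: str,
    prepend: str = "",
    append: str = "",
    remove_string: bool = False,
) -> Optional[str]:
    """Return the new literal value, or None when the value should be deleted."""
    if remove_string:
        if not prepend and not append:
            return None  # nothing is left, so the value itself is removed
        value = ""  # drop the original string but keep the value
    return f"{prepend}{value}{append}"
```

For example, edit_literal("1890", prepend="ca. ") returns "ca. 1890".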

Set or remove language of a property

Select the properties whose language should be set or removed. Note: all values of the selected properties are updated, so be careful when their values already have different languages.

Remove specified media

You can remove media of items by media type or by extension.
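The selection can be sketched in Python as a filter on the media type or the file extension. This is an illustration under stated assumptions: each media is a hypothetical dict with "media_type" and "filename" keys, and extensions are compared without the dot and case-insensitively.

```python
from pathlib import PurePosixPath


def media_to_remove(
    media: list[dict], media_types: set[str], extensions: set[str]
) -> list[dict]:
    """Select the media whose media type or file extension is listed."""
    wanted_ext = {ext.lower().lstrip(".") for ext in extensions}
    selected = []
    for one_media in media:
        # ".TIF" -> "tif"; a missing filename gives an empty extension.
        ext = PurePosixPath(one_media.get("filename") or "").suffix.lower().lstrip(".")
        if one_media.get("media_type") in media_types or (ext and ext in wanted_ext):
            selected.append(one_media)
    return selected
```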

Add or remove a thumbnail

Add a thumbnail to resources or remove it, whether or not they already have one.

Order values

Order values by language

Sometimes the title and the description should be displayed in a specific language, but the value in this language is not always the first one in the metadata. This matters most for the title and the description, which are displayed in many places.

Write the languages in the desired order and select the properties to reorder. Values in other languages or without a language are kept after them.
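In Python, this ordering can be sketched as a stable sort on the position of each value's language in the list given by the user. The dict layout ("text", "lang") is a hypothetical assumption. Because Python's sort is stable, values in other languages or without a language keep their original relative order after the listed ones.

```python
def order_by_language(values: list[dict], languages: list[str]) -> list[dict]:
    """Sort values so that the listed languages come first, in the given order."""
    rank = {lang: position for position, lang in enumerate(languages)}
    # Unlisted languages (and None) all share the last rank.
    return sorted(values, key=lambda value: rank.get(value.get("lang"), len(languages)))
```

For example, with languages ["fr", "en"], a French title comes first, then the English one, then all other values in their current order.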

Set or unset visibility of a property

Select the properties to set or unset visibility.

Displace values from a property to another one

Select the source properties and the destination property, then run the edit. Some filters (data type, language, string, visibility) allow moving only the matching values.
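A hypothetical Python sketch of moving values with optional filters; a filter left to None is ignored. Values are assumed to be dicts with "property", "type", "lang", "text" and "is_public" keys, and the string filter is assumed to be a substring match, which may differ from the module's actual behavior.

```python
from typing import Optional


def displace_values(
    values: list[dict],
    sources: set[str],
    destination: str,
    data_type: Optional[str] = None,
    lang: Optional[str] = None,
    contains: Optional[str] = None,
    is_public: Optional[bool] = None,
) -> int:
    """Move matching values from the source properties to the destination one."""
    moved = 0
    for value in values:
        if value["property"] not in sources:
            continue
        if data_type is not None and value.get("type") != data_type:
            continue
        if lang is not None and value.get("lang") != lang:
            continue
        if contains is not None and contains not in (value.get("text") or ""):
            continue
        if is_public is not None and value.get("is_public") != is_public:
            continue
        value["property"] = destination
        moved += 1
    return moved
```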

Explode a value into multiple values

This tool is useful after importing a CSV file when the "multivalue" checkbox was not checked. Select the properties and the separator, which may contain multiple characters.
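Splitting on a multi-character separator can be sketched in Python as follows. Trimming the parts and dropping empty ones is an assumption of this hypothetical helper (values are trimmed on save anyway); a value without the separator is returned unchanged as a single part.

```python
def explode_value(value: str, separator: str) -> list[str]:
    """Split one value on a (possibly multi-character) separator."""
    if not separator:
        raise ValueError("separator must not be empty")
    parts = (part.strip() for part in value.split(separator))
    # Empty parts come from doubled or trailing separators.
    return [part for part in parts if part]
```

For example, explode_value("Paris || Lyon ||  Nice", "||") gives three values.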

Merge two values as uri

Select the properties; their values will be merged two by two. This tool can only be used when the number of values is even. When a value already has the data type "uri" with a label, it is not changed and all values of the property are skipped to avoid merge issues. When two URIs follow each other, the property is skipped too. At least one value of each pair should be a URL. When the label and the value are different URLs, the property is skipped. It is recommended to check the order of values first.
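The rules above can be sketched in Python as a function that returns the merged values, or None when the whole property is skipped. This is an illustration, not the module's implementation: values are hypothetical dicts with "type", "text" and optional "label" keys, and a value is assumed to be a URL when it starts with http:// or https://.

```python
from typing import Optional


def looks_like_url(text: str) -> bool:
    # Assumption: only http and https URLs are recognized.
    return text.startswith(("http://", "https://"))


def merge_pairs_as_uri(values: list[dict]) -> Optional[list[dict]]:
    """Merge a property's values two by two into uri values with a label."""
    if len(values) % 2:
        return None  # odd number of values
    if any(v["type"] == "uri" and v.get("label") for v in values):
        return None  # already a labelled uri: skip the whole property
    merged = []
    for first, second in zip(values[0::2], values[1::2]):
        first_url = looks_like_url(first["text"])
        second_url = looks_like_url(second["text"])
        if first_url == second_url:
            return None  # two URLs in a row, or no URL at all in the pair
        uri, label = (first, second) if first_url else (second, first)
        merged.append({"type": "uri", "text": uri["text"], "label": label["text"]})
    return merged
```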

Convert datatype

Select the source data type and the new data type. Only some data types are currently supported.

Fill data

Update or remove owner

Select the new owner, or select "Remove user" to remove the current one.

Fill labels

Fill or update the labels of Value Suggest values. Select the source data type and the new data type. Only some data types are currently supported.

Explode pdf into images, mainly for iiif

To explode a PDF, you need either the command pdftoppm (from Poppler, recommended) or gs (Ghostscript).
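A Python sketch of how such a tool could be chosen and invoked, preferring pdftoppm and falling back to Ghostscript. The options shown (-jpeg and -r for pdftoppm; -sDEVICE=jpeg, -r and -sOutputFile for gs) are standard options of these commands, but the exact options, format and resolution used by the module are not documented here, so the function name, the JPEG output and the 150 dpi default are assumptions. The which parameter is only there to make the choice testable.

```python
import shutil
from typing import Callable, Optional


def pdf_to_images_command(
    pdf_path: str,
    output_prefix: str,
    dpi: int = 150,
    which: Callable[[str], Optional[str]] = shutil.which,
) -> list[str]:
    """Build a command that renders each page of a PDF as a JPEG image."""
    if which("pdftoppm"):
        # pdftoppm writes one file per page, named from the prefix and page number.
        return ["pdftoppm", "-jpeg", "-r", str(dpi), pdf_path, output_prefix]
    if which("gs"):
        # Ghostscript replaces %d in the output file name with the page number.
        return [
            "gs", "-dSAFER", "-dBATCH", "-dNOPAUSE", "-sDEVICE=jpeg",
            f"-r{dpi}", f"-sOutputFile={output_prefix}-%d.jpg", pdf_path,
        ]
    raise RuntimeError("Neither pdftoppm nor gs is installed")
```

The resulting list can be passed to subprocess.run(command, check=True).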

Update media html from item or media

Select the items or media, choose to update the media HTML, then edit it the same way as an item value.

Update media source

Update or clear the media source, for example to remove the directory path of files imported with their full path. The source may be processed via regex, or a string may be prepended or appended. The settings are the same as the ones used to replace values above.
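Removing the directory path amounts to deleting everything up to the last slash or backslash. A minimal Python sketch, with a hypothetical function name; in the module, the equivalent would be a regex replacement with an empty replacement string.

```python
import re

# Everything up to and including the last "/" or "\" (Unix or Windows path).
DIRECTORY_PART = re.compile(r"^.*[/\\]")


def strip_directory(source: str) -> str:
    """Keep only the file name of a media source imported with its full path."""
    return DIRECTORY_PART.sub("", source)
```

For example, "/var/import/2020/photo 01.jpg" becomes "photo 01.jpg", and a source without any separator is left unchanged.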

Update media types (mime-types)

Select the items or media and set the current and the new media type. Media types should be standard ones and usually as precise as possible, like application/tei+xml instead of application/xml.

Update media visibility

Update media visibility according to some metadata.

TODO

  • [x] Add conversion for custom vocab.
  • [ ] Move hard-coded xml xpath into form.

Warning

Use it at your own risk.

It’s always recommended to back up your files and your databases and to check your archives regularly so you can roll back if needed.

Troubleshooting

See online issues on the module issues page.

License

This module is published under the CeCILL v2.1 license, compatible with GNU/GPL and approved by FSF and OSI.

This software is governed by the CeCILL license under French law and abiding by the rules of distribution of free software. You can use, modify and/ or redistribute the software under the terms of the CeCILL license as circulated by CEA, CNRS and INRIA at the following URL "http://www.cecill.info".

As a counterpart to the access to the source code and rights to copy, modify and redistribute granted by the license, users are provided only with a limited warranty and the software’s author, the holder of the economic rights, and the successive licensors have only limited liability.

In this respect, the user’s attention is drawn to the risks associated with loading, using, modifying and/or developing or reproducing the software by the user in light of its specific status of free software, that may mean that it is complicated to manipulate, and that also therefore means that it is reserved for developers and experienced professionals having in-depth computer knowledge. Users are therefore encouraged to load and test the software’s suitability as regards their requirements in conditions enabling the security of their systems and/or data to be ensured and, more generally, to use and operate it in the same conditions as regards security.

The fact that you are presently reading this means that you have had knowledge of the CeCILL license and that you accept its terms.

Copyright

  • Copyright Daniel Berthereau, 2018-2024 (see Daniel-KM)

First developed for the Archives Henri Poincaré of Université de Lorraine, then improved for various projects.

Version   Released            Minimum Omeka version
3.4.31    July 01, 2024       ^4.0.0
3.4.30    June 03, 2024       ^4.0.0
3.4.29    May 27, 2024        ^4.0.0
3.4.28    May 13, 2024        ^4.0.0
3.4.27    March 25, 2024      ^4.0.0
3.4.26    March 18, 2024      ^4.0.0
3.4.25    February 26, 2024   ^4.0.0
3.4.24    February 05, 2024   ^4.0.0
3.4.23    January 29, 2024    ^4.0.0
3.4.22    October 30, 2023    ^4.0.0
3.4.21    June 19, 2023       ^4.0.0
3.4.20    May 15, 2023        ^4.0.0
3.4.19    May 01, 2023        ^4.0.0
3.4.18    January 09, 2023    ^3.1.0 || ^4.0.0
3.3.18    December 19, 2022   ^3.1.0
3.3.16    October 17, 2022    ^3.1.0
3.3.15    September 26, 2022  ^3.1.0
3.3.14    July 18, 2022       ^3.1.0
3.3.13.4  January 31, 2022    ^3.1.0
3.3.13.3  November 29, 2021   ^3.1.0
3.3.13.2  November 15, 2021   ^3.1.0
3.3.13.1  September 13, 2021  ^3.1.0
3.3.13    September 06, 2021  ^3.1.0
3.3.12.4  February 08, 2021   ^3.0.0
3.3.12.3  January 04, 2021    ^3.0.0
3.3.12.2  November 09, 2020   ^3.0.0
3.3.12.1  October 26, 2020    ^3.0.0
3.3.12    October 19, 2020    ^3.0.0
3.0.12    August 24, 2020     ^1.3.0 || ^2.0.0
3.0.11    August 17, 2020     ^1.3.0 || ^2.0.0