Punctuation Marks

I have uploaded seven CSV files. In one upload of 2800 records, the apostrophe (') sometimes, but not always, appears as a symbol, and the ' and " marks used for the dimensions of a work of art do not appear at all. Here are examples:

Côte d'Ivoire
Guinea Coast Baule
Côte d’Ivoire
Still Image Item Type Metadata
Original Format
wood, H 19 1/4”

G. O’Keeffe
O’Keeffe, Georgia. Georgia O’Keeffe.

Vladimir, Kievan Rus’

Before uploading the csv file I opened the file in OpenOffice.org and the data looked fine. Text was set to Unicode (UTF-8) and Language Default was English USA.

Any idea why this would happen with this one csv file?

Thank you.

Art History Slide Library

Can you share a row or a few rows of your CSV file that include the offending punctuation?

OpenOffice.org should be able to produce a CSV file that handles quotes and apostrophes correctly, but seeing how your file is currently formatted should help resolve matters.

Here are two rows from the CSV file. When opened in OpenOffice.org the punctuation appeared as it should. The Character Set was Unicode (UTF-8) and the Language was Default - English (USA).

http://slidelibrary.gmu.edu/images/26660.jpg 26660 Paris, Musée d’Orsay Getlein, Mark. Living with Art. 8th ed. 9/15/2009 SC Paintings Painting Dutch Netherlands V. Van Gogh Self Portrait 1889 oil on canvas, 25.5” x 21.5” Digital image Dutch painting self portraits, Vincent van Gogh figures, male, seated beards, red color, warm, cool

http://slidelibrary.gmu.edu/images/25930.jpg 25930 Washington, D.C., National Gallery of Art Corn, Wanda M. The Great American Thing: Modern Art and National Identity, 1915-1935. 6/1/2009 SC Photographs Photography American United States A. Stieglitz Georgia O’Keeffe: A Portrait - Head 1918 Digital image American photography, paintings, painters portraits, female, Georgia O’Keeffe women artists composite portraits

I'm not sure what happened with this, but the piece of the CSV file you posted doesn't seem to contain apostrophes at all.

If that's how it actually looks when you open it in a text editor, then that would seem to indicate some problem on the OpenOffice side of things.

If you can get OpenOffice to display your data correctly, then choosing "Save As" and saving a new CSV from OpenOffice should get you the correct format that the CSV Import plugin is expecting.
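One way to check what is really in the file, independent of how a spreadsheet program displays it, is to look for lines that are not valid UTF-8. Here is a minimal Python sketch of that check. The function name `find_invalid_utf8_lines` and the sample data are made up for illustration, and it assumes the importer expects UTF-8.

```python
def find_invalid_utf8_lines(data):
    """Return (line_number, raw_bytes) for every line that is not valid UTF-8."""
    problems = []
    # bytes.splitlines() handles \n, \r\n, and bare \r line endings.
    for number, line in enumerate(data.splitlines(), start=1):
        try:
            line.decode("utf-8")
        except UnicodeDecodeError:
            problems.append((number, line))
    return problems


# Hypothetical sample: line 1 is UTF-8, line 2 was written in another encoding.
sample = (
    "Paris, Musée d'Orsay\n".encode("utf-8")
    + "G. O’Keeffe\n".encode("mac_roman")
)

for number, line in find_invalid_utf8_lines(sample):
    print(number, line)
```

For a real file, read it in binary mode (`open(path, "rb").read()`) and pass the bytes to the function. Any line it reports is one that a UTF-8 reader is likely to display incorrectly.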

Thank you for your replies.

In the two records above that I sent there should be an apostrophe in Musée d'Orsay in the first record and an apostrophe in Georgia O'Keeffe which appears twice in the second record. In the first record there should be quotation marks following 25.5 and 21.5 to show dimensions. Should be 25.5" x 21.5".

Is it possible to upload a small test file of records that were already uploaded in the large file of 2800 records by giving the small file a different name? I would re-save the small file as you suggest. Will Omeka replace records with the same file number or add a second copy of those records? Or must I remove the large file and do a test upload of the small file?

Omeka would add a second copy of any items that were duplicated between the two files.

You'd have to undo the first import before doing any extra tests if you want to avoid having duplicate items.

I took several records with problem punctuation marks from the CSV file and retyped the apostrophes and quotation marks. When uploaded to Omeka, all the data came out perfectly. I then spent over a week going over all 2800 records, retyping every ' and ". I removed the old CSV file from Omeka and uploaded the corrected file. Everything now looks as it should. Could there have been a problem with the export of records from FileMaker? This is the only file with this problem. The problem punctuation marks were not easily visible in OpenOffice on a Mac, but when the file was opened in OpenOffice on a PC, the ' and " appeared as vertical rectangles. This made it a little easier to do corrections.

FileMaker may have used a character set other than UTF-8 when exporting your data, which could easily have caused problems in other programs that assume the file is UTF-8 (like OpenOffice or Omeka).

Mac programs, particularly older ones, often use a Mac-specific character set called MacRoman that's not compatible with UTF-8. In particular, MacRoman stores "fancy" (curly) apostrophes and quotation marks as single bytes, whereas UTF-8 uses three-byte sequences for the same characters, so a program expecting UTF-8 will not read them correctly.
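To make the mismatch concrete, here is a small Python sketch that compares how the curly marks are stored in each encoding. It uses Python's built-in `mac_roman` codec. The helper name `describe` is made up for this example, and whether FileMaker actually exported MacRoman is still only a guess.

```python
# The four curly marks: left/right single quote, left/right double quote.
CURLY = ["\u2018", "\u2019", "\u201c", "\u201d"]


def describe(char):
    """Return (MacRoman bytes, UTF-8 bytes, MacRoman bytes misread as UTF-8)."""
    mac = char.encode("mac_roman")
    utf8 = char.encode("utf-8")
    # errors="replace" substitutes U+FFFD for bytes that are not valid UTF-8.
    misread = mac.decode("utf-8", errors="replace")
    return mac, utf8, misread


for char in CURLY:
    mac, utf8, misread = describe(char)
    print(repr(char), mac.hex(), utf8.hex(), repr(misread))
```

Each mark is one byte in MacRoman and three bytes in UTF-8. Read as UTF-8, the lone MacRoman byte becomes a replacement character, which different programs may show as a symbol, a rectangle, or nothing at all.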

It's likely that some combination of picking the right encoding when telling OpenOffice to open the file and then re-saving it in UTF-8 would have "fixed" the file without needing manual work, but it's tough to say.
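The same re-encoding can also be done outside OpenOffice. This is a minimal Python sketch, assuming the export really was MacRoman (which is not confirmed here). The function name `reencode_to_utf8`, the file names, and the sample row are all hypothetical.

```python
import os
import tempfile


def reencode_to_utf8(src_path, dst_path, source_encoding="mac_roman"):
    """Read src_path as source_encoding and write the same text as UTF-8."""
    # newline="" keeps the original line endings untouched on read and write.
    with open(src_path, "r", encoding=source_encoding, newline="") as src:
        text = src.read()
    with open(dst_path, "w", encoding="utf-8", newline="") as dst:
        dst.write(text)
    return text


# Demonstration with a made-up row written out as MacRoman bytes.
row = "Paris, Musée d’Orsay,oil on canvas, 25.5” x 21.5”\r\n"
with tempfile.TemporaryDirectory() as folder:
    original = os.path.join(folder, "export.csv")
    fixed = os.path.join(folder, "export-utf8.csv")
    with open(original, "wb") as handle:
        handle.write(row.encode("mac_roman"))
    reencode_to_utf8(original, fixed)
    with open(fixed, "rb") as handle:
        converted = handle.read().decode("utf-8")
```

If the guess about the source encoding is wrong, the output will still contain odd characters, so it is worth spot-checking a few known records (such as the O’Keeffe entries) before importing the converted file.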