CSV import Unicode issues

Hi,

I'm new to Omeka and I'm trying to import a CSV file with 2200 records.

I have some Unicode issues and I don't know what else to do to resolve them.

More specifically, I'm trying to import records that contain Greek characters. After various unsuccessful attempts (the Greek characters always display as ?????? in the mapping screen), I decided to test with the test.csv file that comes with the CSV Import plugin installation.

I downloaded test.csv, opened it with Notepad++, and edited a record to include some Greek characters. When I try to import it, the ????? are there again.

I tried using Notepad++ to change the encoding to UTF-8 without BOM and to UTF-8, but then the file does not upload.

If someone can help, I'd appreciate it. I have spent the last 10 hours on this issue.

Thanks

Is your database collated with utf8_unicode_ci?

Thanks,

Yes. When I enter records by hand, one by one, the Greek characters display without issues.

When I try to import via CSV, the Greek characters display as ?????? even before the records are imported into the database.

I'm sure that the issue concerns the ANSI (Windows-1252) encoding that the CSV file has when I open it in Notepad++. But when I convert it to UTF-8, it simply does not import. (The same happens when I convert the test.csv that comes with the CSV Import plugin installation to UTF-8.)
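A minimal Python sketch of why Greek text can end up as question marks: Windows-1252 has no Greek letters, so a lossy conversion substitutes ? for each one. The sample string is made up for illustration, and this only shows one plausible source of the ??????, not a confirmed diagnosis of what the plugin does.

```python
# Illustration only: Windows-1252 has no Greek letters, so a lossy
# conversion replaces each one with "?" and keeps ASCII as-is.
# Escapes spell a 5-letter Greek word followed by " 2012" (sample data).
sample = "\u0391\u03b8\u03b7\u03bd\u03b1 2012"

lossy = sample.encode("cp1252", errors="replace")
print(lossy)  # b'????? 2012'

# UTF-8 can represent every character, so the round trip is lossless.
utf8 = sample.encode("utf-8")
print(utf8.decode("utf-8") == sample)  # True
```

Note that only the digits and the space survive the lossy conversion, which would be consistent with a title keeping just its dots and numbers.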

Have you tried importing the spreadsheet, with the Greek characters, but without converting the csv to UTF-8 when saving? You shouldn't need to since the database will be able to read the characters.

Thanks for your help.

Yes, I did this. Unfortunately, in fields where the CSV import shows the Greek characters as ?????? (for example, in the title), the database does not store the whole string, only parts of it such as dots or numbers.

It drives me crazy; I don't know what else to do. I tried to convert the CSV to UTF-8 via Google Docs, MS Excel, OpenOffice, LibreOffice, Notepad++, and plain Notepad. No luck.
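When several tools have been tried, it can help to check what a file's bytes actually are instead of trusting the editor's label. The following is a rough Python sketch; the helper names `describe_encoding` and `to_utf8_no_bom` are hypothetical, and `cp1253` (the Greek Windows code page) is only an assumption about what "ANSI" means on a Greek system.

```python
import codecs

def describe_encoding(data: bytes) -> str:
    """Roughly classify raw file bytes (hypothetical helper)."""
    if data.startswith(codecs.BOM_UTF8):
        return "utf-8 with BOM"
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return "not utf-8"
    # Pure ASCII also lands here, since ASCII is valid UTF-8.
    return "utf-8 without BOM"

def to_utf8_no_bom(data: bytes, source_encoding: str) -> bytes:
    """Re-encode bytes from source_encoding to UTF-8 with no BOM."""
    text = data.decode(source_encoding)
    # Drop a leading BOM character if the source carried one.
    text = text.lstrip("\ufeff")
    return text.encode("utf-8")

# Sample row: two unaccented Greek words, written as escapes (made-up data).
row = "\u03a4\u03b9\u03c4\u03bb\u03bf\u03c2,\u0391\u03b8\u03b7\u03bd\u03b1\n"
greek_ansi = row.encode("cp1253")  # assumed "ANSI" bytes on a Greek system

print(describe_encoding(greek_ansi))
converted = to_utf8_no_bom(greek_ansi, "cp1253")
print(describe_encoding(converted))
```

For a real file, the bytes would come from `open(path, "rb").read()`. If the source encoding is guessed wrong, the decode step either fails or produces the wrong letters, so the result is worth checking by eye.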

It's interesting... when I tried to test this with an old Omeka 1.4 test instance and an older version of the CSV plugin, I was able to specify a UTF-8 CSV file (with Unix line breaks) with Japanese and Greek text in it but unable to load it. The import stalled at 0 items imported. When I then tried it with Omeka 1.5.1 and the current version of the CSV plugin, it wouldn't even let me select the CSV file in the plugin. It just keeps returning me to the import dialogue. I'm using a new UTF-8 collated MySQL database. In each case I was using the test.csv file in the plugin and changing a few words to Japanese and Greek.

FWIW I'm using BBEdit on OSX as my text editor.

OK, I quit. The conclusion after many, many attempts is that the CSV Import plugin is unable to load a UTF-8 CSV file. When the CSV file is encoded as ANSI it works perfectly, but of course it cannot import non-Latin characters correctly.

I decided to try omeka.net hosting to see whether the UTF-8 CSV file loads successfully there.

I set up a basic account, installed the CSV Import plugin, and tried. Voilà! The UTF-8 CSV file uploaded successfully and all non-Latin characters (Greek, accents, etc.) are correct.

So the issue is on our server. But what could be the issue with UTF-8 CSV files on a simple cPanel server?

I posted this issue on the plugin's GitHub page and received a note back from the developer with a possible answer: it depends on the locale settings used by the PHP scripts that import the data. https://github.com/omeka/plugin-CsvImport/issues/7
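If the locale is the culprit, one quick thing to check is whether the server has any UTF-8 locale installed at all. Here is a small Python sketch of that check; the function name and the candidate list are assumptions for illustration, and whether the PHP scripts pick up the same locale is a separate question this does not answer.

```python
import locale

def first_available_locale(candidates):
    """Return the first locale name the OS accepts, or None (hypothetical helper)."""
    saved = locale.setlocale(locale.LC_CTYPE)  # query the current setting
    try:
        for name in candidates:
            try:
                locale.setlocale(locale.LC_CTYPE, name)
                return name
            except locale.Error:
                continue  # not installed on this system, try the next one
        return None
    finally:
        # Always restore the original setting, even after an early return.
        locale.setlocale(locale.LC_CTYPE, saved)

# Candidate names are guesses; installed locale names vary by system.
candidates = ["en_US.UTF-8", "en_US.utf8", "el_GR.UTF-8", "C.UTF-8"]
found = first_available_locale(candidates)
print(found if found else "no UTF-8 locale from the list is installed")
```

On many Linux servers, `locale -a` in a shell lists the installed locales and gives the same information.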

Since the conversation is moving to the GitHub discussion, I'm closing the issue here so we can keep all the conversation in one place.