CSVImport stays cool and does nothing · Legacy Forums

Pavel Kats June 28, 2012

Greetings,

I am testing CSVImport on a very fresh Omeka installation. After specifying the file and pressing "Next", I get the same screen back without anything happening in the system. I have checked that test files are imported normally. Where should I start looking for problems? Are there any debug settings one can enable in order to get the error?

Disclaimer: the file contains Unicode characters. Is this officially a problem? (I can see some other threads about Unicode problems, but I cannot tell whether it should or should not work at the end of the day.)

Thanks a lot,
Pavel

patrickmj June 28, 2012

Pavel,

You can turn on error messages with these instructions. I'm not sure if Unicode characters would cause a problem. Hopefully we'll get more info from the error logs.

Patrick

Pavel Kats June 28, 2012

Hello Patrick,

Thank you for the reply. Error messages have been turned on, but no error appears.
CSVImport still does nothing and returns to the Step1 page.

Pavel

John Flatness June 28, 2012

Is this a pretty big CSV file you're working with?

You can get this kind of problem if the file you're uploading is bigger than PHP's post_max_size setting.
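A rough way to rule this out from the shell, as a sketch only: the helper names php_size_to_bytes and fits_in_post are made up for illustration, and the limit has to be read from your own PHP configuration. PHP accepts shorthand sizes with a K, M or G suffix, so the first helper converts those to bytes before comparing.

```shell
#!/bin/sh
# Hypothetical helper: convert a PHP shorthand size ("8M", "512K", "1G", "2048") to bytes.
php_size_to_bytes() {
    value=$1
    number=${value%[KkMmGg]}   # strip a trailing K/M/G suffix, if any
    case $value in
        *[Kk]) echo $((number * 1024)) ;;
        *[Mm]) echo $((number * 1024 * 1024)) ;;
        *[Gg]) echo $((number * 1024 * 1024 * 1024)) ;;
        *)     echo "$number" ;;
    esac
}

# Hypothetical helper: succeeds when file "$1" is strictly smaller than limit "$2".
fits_in_post() {
    file=$1
    limit=$(php_size_to_bytes "$2")
    size=$(($(wc -c < "$file")))   # arithmetic expansion trims any padding from wc
    [ "$size" -lt "$limit" ]
}
```

Something like fits_in_post myfile.csv "$(php -r 'echo ini_get("post_max_size");')" would then do the comparison, with the caveat that the command-line PHP may read a different php.ini than the web server does. The same comparison applies to upload_max_filesize.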

Pavel Kats June 28, 2012

Hi John,

Thanks for the reply. No, the file contains only two rows: the header and one line with data. I removed on purpose all the rest to rule out size issues.

Regards,
Pavel

Pavel Kats June 29, 2012

Hello,

I think I am experiencing the same issue that was raised here in some other threads. The issue is related to having proper UTF-8 settings on the server machine so that CSVImport works with non-English characters.

Is there any guidance from the Omeka team on the proper configuration we should apply on the server?

Thanks a lot,
Pavel

patrickmj June 29, 2012

If you could post a link directly to the file, that will let us check whether it appears to be something on your server.

Pavel Kats June 29, 2012

Hello Patrick,

There is no need to check, I can tell for sure that it is something on my server. I tried it on Omeka.NET and it worked.

The question is: what is it on my server? What should I check?

Regards,
Pavel

SheilaBrennan June 29, 2012

Just to double check, is the server configured with all of these requirements? http://omeka.org/codex/Preparing_to_Install

And is the database collation set to 'utf8_unicode_ci' and the charset to 'utf8'?

Pavel Kats June 29, 2012

Hello Sheila,

Indeed, everything is according to the requirements and the collation and character set parameters are set right. I double checked it.

I have stripped the problematic file down to two lines with non-English chars. Still, my own installation does nothing and returns to Step 1 and my Omeka.NET installation proceeds to Step 2. I can send in private all the URLs for your testing.

Thank you for the help.
Pavel

Paul Buchanan July 2, 2012

I don't know if it has anything to do with your problem, but I experienced the same CSV import behavior (clicking "Next" does nothing except return to the same form with no errors) when my import file (which I had obtained from harvesting another server) contained decomposed Unicode. When I ran the file through iconv to turn everything into precomposed characters, things started working. Just a thought.
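For anyone who wants to check a file for decomposed characters before importing, here is a rough sketch. It only looks for the UTF-8 byte patterns of the Combining Diacritical Marks block (U+0300 to U+036F, encoded as CC 80 through CD AF), so it will miss combining marks from other ranges, and it assumes the file is valid UTF-8. The function name has_combining_marks is made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper: succeeds if file "$1" contains a character from
# U+0300..U+036F, i.e. the UTF-8 byte pairs CC 80..CC BF or CD 80..CD AF.
has_combining_marks() {
    # Dump the file as hex bytes on one line, each byte preceded by a single space.
    od -An -v -tx1 "$1" | tr '\n' ' ' | tr -s ' ' |
        grep -Eq ' (cc [89ab][0-9a-f]|cd [89a][0-9a-f])'
}
```

Usage would be along the lines of: has_combining_marks import.csv && echo "decomposed characters found".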

Pavel Kats July 10, 2012

Thank you, Paul.

After some looking at the code I have reached a perhaps trivial enlightenment: CSVImport uses the fgetcsv function which is officially known not to support UTF out of the box.

There are solutions to overcome this obstacle in case you know the encoding of your text in advance (by tweaking the locale). However, this solution is not good for us because (1) we do not know the encoding of the text in advance, and (2) we have files with several languages inside, so we have to use Unicode.

Apparently, the sad news is that I can't use CSVImport for the kind of data I have. Is there any workaround that the esteemed team or the users can think of?

Thanks,
Pavel

John Flatness July 10, 2012

Pavel,

You're correct that CSV Import uses PHP's fgetcsv, and that fgetcsv has behavior that depends on the system's current locale setting.

UTF-8 text, supporting all of Unicode, should work fine, as long as your system is set to a UTF-8 locale. I believe you've said your CSV file is UTF-8 encoded, so you do know the encoding ahead of time.

We've suspected for a long while that it's the locale setting that's the culprit here, but we've thus far been unable to reproduce this problem even by trying to switch to a non-UTF-8 locale.

If you could share both the stripped-down problematic file, and your server's current locale settings (the output of the shell command locale, and/or the PHP code <?php echo setlocale(LC_ALL, 0); ?>), that would be very helpful.
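When reading that output, the category to look at first is LC_CTYPE, which, as far as I understand it, is the one that governs how characters are interpreted. A minimal sketch of that check follows; the function name ctype_is_utf8 is hypothetical, and it only inspects text in the format the locale command prints.

```shell
#!/bin/sh
# Hypothetical helper: reads the output of `locale` on stdin and succeeds
# if the LC_CTYPE line names a UTF-8 charset (quoted or unquoted value).
ctype_is_utf8() {
    grep -Eiq '^LC_CTYPE="?[^"]*utf-?8"?$'
}
```

Usage: locale | ctype_is_utf8 || echo "LC_CTYPE is not UTF-8". Keep in mind that the web server process may run with a different environment than a login shell, so this describes the shell only.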

Pavel Kats July 11, 2012

Hello John,

Thank you for the response. By (1) I meant that, when UTF does not work, my only option is to save the text in a specific encoding and use the locale to read it. But since I do not know the encoding, I have to use UTF, which is not good...

Anyway, here are all the details. (The text is Hebrew).

The problematic file: http://107.21.219.6/omeka/UTF_CSV.txt

You can see the behavior of CSVImport for yourself if you go to

http://107.21.219.6/omeka/admin/csv-import (login with [removed])

and try this file. For some reason, now it does jump to the next step but the characters are still not read.

Here is the output of the shell command locale:

ip-10-126-45-121:~# locale
LANG=
LC_CTYPE="POSIX"
LC_NUMERIC="POSIX"
LC_TIME="POSIX"
LC_COLLATE="POSIX"
LC_MONETARY="POSIX"
LC_MESSAGES="POSIX"
LC_PAPER="POSIX"
LC_NAME="POSIX"
LC_ADDRESS="POSIX"
LC_TELEPHONE="POSIX"
LC_MEASUREMENT="POSIX"
LC_IDENTIFICATION="POSIX"
LC_ALL=

Thanks

John Flatness July 11, 2012

Okay, once again (of course), when trying your CSV file on a few of my installations, I get what I assume is the intended result: three Hebrew aleph characters.

And, once again, trying to manually set my locale to POSIX seems to succeed, but doesn't affect the operation of the CSV Import plugin: the text still imports fine.

Can you additionally post the results of the shell command locale -a, which should list all the available locale settings for that server?
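What to look for in that list is any entry with a UTF-8 charset. As a small sketch (the function name has_utf8_locale is made up, and the pattern assumes the two spellings I have seen, "UTF-8" and "utf8"):

```shell
#!/bin/sh
# Hypothetical helper: reads locale names on stdin, one per line as printed by
# `locale -a`, and succeeds if at least one has a UTF-8 charset suffix.
has_utf8_locale() {
    grep -Eiq '\.utf-?8'
}
```

Usage: locale -a | has_utf8_locale && echo "a UTF-8 locale is available".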

Pavel Kats July 11, 2012

Hello John,

Here is the result of the command:

ip-10-126-45-121:~# locale -a
C
POSIX

Pavel

John Flatness July 11, 2012

That's interesting; normally I'd expect there to be more locale options to choose from.

If you have root or sudo access you should be able to add additional locales yourself. What you're looking for is a locale with UTF-8 as the charset. Running this command (as root or with sudo) should create such a locale for you (this example uses US English as the language, but it's the UTF-8 parts that are important).

localedef -c -f UTF-8 -i en_US en_US.UTF-8

Pavel Kats July 11, 2012

Thanks, John!
I am running into some problems with this localedef command, but they are probably due to a bug with locales in my version of Linux (Debian).

As soon as I overcome them, I shall let you know.
(But if you have experience with locales on Debian, I could use some help!)

Thanks,
Pavel

John Flatness July 11, 2012

For Debian, if I remember correctly, you want to use their locale-gen tool that tries to take care of locale generation for you.

There's a config file at /etc/locale.gen, which lists the locales you want available on your system. You'd want to have a line in there like this (at least):

en_US.UTF-8 UTF-8

After editing that file, you should just be able to run locale-gen and verify that you have new locales available with locale -a.
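Those steps could be scripted roughly like this. It is a sketch under assumptions: ensure_locale_line is a made-up helper, it only appends the line when an identical uncommented line is missing, and it does not touch any commented-out entries already in the file.

```shell
#!/bin/sh
# Hypothetical helper: append line "$2" to file "$1" unless an identical,
# uncommented line is already present (so running it twice is harmless).
ensure_locale_line() {
    file=$1
    line=$2
    grep -qxF -e "$line" "$file" 2>/dev/null || printf '%s\n' "$line" >> "$file"
}
```

As root, that would be used as: ensure_locale_line /etc/locale.gen 'en_US.UTF-8 UTF-8', followed by locale-gen and then locale -a to confirm the new locale is listed.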

John Flatness July 11, 2012

This Debian wiki page seems to indicate that Debian might have a more user-friendly way to handle this, with the command:

dpkg-reconfigure locales

Pavel Kats July 11, 2012

Hi John,

Thanks a lot! I have installed the locale properly and now it shows UTF.

But I am not there yet. For some reason (I had to restart Apache) I am back to square one, where the form does not move forward to the mapping stage, but redirects me again to Step 1. I do not know what it was last time that moved me further to Step 2.

Any ideas?

Thanks,
Pavel

Pavel Kats July 11, 2012

(You can see it yourself using the link I posted earlier).

John Flatness July 12, 2012

It is bizarre that it would work (a little) better for a while for seemingly no reason (I can confirm that it did).

What I would have asked you to try way back when I asked you to run locale -a is setting the locale in PHP to see if that improved matters. We got sidetracked for a while making sure you actually had a UTF-8 locale to work with, but we can go back to that now.

What I'd like you to try is: add one line to the paths.php file in your Omeka install, right at the top (right after the <?php at the top):

setlocale(LC_ALL, 'en_US.UTF-8');

That line (finally) should be what would change how fgetcsv interprets the input file, and make it expect UTF-8. Fingers crossed...

Pavel Kats July 12, 2012

Alas... it did not help. I restarted Apache just in case. I also tried saving the file with and without a BOM; that did not help either.

John Flatness July 12, 2012

Well, that's disappointing.

Everything I can look up, even digging into PHP's code a little, seems to indicate that this should be a locale problem.

There are other things you could try, like doing that dpkg-reconfigure and also picking UTF-8 as your default locale (assuming you didn't already do that), or adding putenv("LANG=en_US.UTF-8") also to paths.php, but I'm running a little low on ideas here.

Updating to Debian 6.0 "squeeze" might well fix things also, but that could easily be too much of a hassle.

Pavel Kats July 12, 2012

Hi John,

Neither of the two helped, unfortunately.

Can you point me to the place in the code where the switch to UTF should have changed the situation? In other words, what is the place (roughly) where this thing matters?

After some digging in the code I can see that there is the following code at the beginning of the function mapColumnsAction of the class CsvImport_IndexController:

if (!$this->_sessionIsValid()) {
    return $this->_helper->redirector->goto('index');
}

This is actually where the function fails and enters into the block. But what does it have to do with UTF?

Regards,
Pavel

John Flatness July 13, 2012

It's really just in fgetcsv that the locale would be expected to alter things. Several parts of the code use fgetcsv, but I'd point the finger at the part of the controller that tries to parse the CSV file (just read the first few rows to get column names and sample data).

The part of the code you point out fails when all the data about the CSV file that should be saved in the session isn't there. This really shouldn't depend on the locale, but that, combined with the encoding of the culprit files, certainly seems to be the common denominator in these problems.

Pavel Kats July 13, 2012

I can see the code using fgetcsv in the class RowIterator of the controller (the function _getNextRow). Is that what you mean?

If that is the case, would it work if I locally replace this code with code that does not use fgetcsv?

John Flatness July 17, 2012

Substituting something equivalent for fgetcsv might very well solve your issue, but it could be quite the undertaking.

There is a new CSV Import version, but I don't really expect it to change the behavior you're seeing, with the exception that it should be much better about showing some error or warning message instead of silently kicking you back to the plugin's start page.