MySQL and charset problem

Hello,

I'm having a problem with translating element names into Serbian. When I type a new element name in Serbian Cyrillic script into the omeka_elements table of the Omeka MySQL database, I get ????? instead of letters on the web page. I've searched the Internet for a solution, and one proposal suggests adding several lines of code to the file where the connection to the database is established, provided that the MySQL database and all of its tables are set to utf8_unicode_ci collation and that the web page is also set to the utf-8 charset, which is the case here.

These are the suggested queries to be executed:
SET NAMES utf8;
SET CHARACTER SET utf8;
SET COLLATION_CONNECTION='utf8_general_ci';

I've noticed that one programmer said that these queries are obligatory because "no one could tell what kind of conversions can happen during a query". I don't know if this is true, but I would like to try. I'm just a historian learning/loving IT, and that's why I need help identifying which file or files make the connection to the database and extract the data. I would also appreciate any other idea or suggestion for how this problem could be solved.
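To illustrate the kind of conversion that quote refers to, here is a minimal Python sketch. It is only a simulation: Python's codecs stand in for MySQL's charset conversion, and it assumes the connection charset is latin1 (a common default on older MySQL servers, but not confirmed for this server). It is not Omeka code.

```python
# Simulation only: Python codecs stand in for MySQL's charset conversion.
# Assumption: the connection charset is latin1 (not confirmed for this server).
name = "Датум"  # "Date" in Serbian Cyrillic

# A utf8 column stores the text as UTF-8: two bytes per Cyrillic letter.
stored = name.encode("utf-8")

# With a latin1 connection, the server converts the text to latin1 before
# sending it. Cyrillic letters have no latin1 equivalent, so each one is
# replaced with "?".
sent_latin1 = stored.decode("utf-8").encode("latin-1", errors="replace")

# With a utf8 connection (what SET NAMES utf8 requests), nothing is lost.
sent_utf8 = stored.decode("utf-8").encode("utf-8")

print(sent_latin1.decode("latin-1"))  # ?????
print(sent_utf8.decode("utf-8"))      # Датум
```

If this is what happens here, the five question marks correspond to the five letters of Датум, which would match what the page shows.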

Thanks

Hi Bogdan,

Could you share links to the different sources of information you've come across? I'm not entirely sure what this problem stems from in your particular case, so any additional information that you could share would be helpful. It appears that your Omeka site is running properly -- could you share a particular page or example of what's affected? Perhaps some text that I could paste into my own installation for testing purposes?

You may also want to double check that the collation of your database is set to 'utf8_unicode_ci', and that you're using the 'utf8' charset.

Dave

Hi Dave,

Please check this page http://www.cacak-dis.rs/kolekcije/items/show/4

Notice that there is ????? instead of Датум (Date in Serbian) below the image in the metadata. I changed that element name in the omeka_elements table while testing how it works. I'm not sure this Cyrillic text will appear correctly on your computer, but you can try it with your database.

My Omeka 1.0beta is working, although there is still a problem with image conversion, described in http://omeka.org/forums/topic/path-to-convert
I have manually resized some of the images and placed them in the proper folders, and after that I got thumbnails on the pages.

You mentioned that 1.0beta is working -- which version of Omeka is the problem w/ Cyrillic text occurring in, 1.0beta or the latest version, 1.1?

The problem is with the 1.0beta version. I've just installed 1.1 in parallel, but there is that exec problem, which prevents me from seeing any of the added items. So I don't know whether Cyrillic text will appear or not. I'll contact the server administrator, and then I can tell you whether the 1.1 version works with Cyrillic.

Interesting. Seems like it works fine for the headings of additional fields you've added to the end of that item, but not the Date field, which is part of the Dublin Core element set. How did you edit the Date name? Directly in the database, through PHPMyAdmin, or through some other method?

Hi Jeremy,

The additional element fields were edited inside their respective PHP files. When I located the correct PHP file, I added the following lines at the top of the document (in front of <?php ):

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
</head>

Notice that the <html> tag is not closed with </html>, simply because the system works this way. I applied this procedure to every PHP file I edited. Without it (actually, without the meta tag for the utf-8 charset), if I change the text from English to Serbian, the result is question marks instead of letters. I also tried editing the index.php file in the root folder of the Omeka installation in the same way (thinking that it must also be changed for proper charset encoding), but then the site does not work. Nevertheless, the source code of my homepage says that it uses the utf-8 charset.

It was a struggle to find where I have to change the Dublin Core element set names; finally I looked inside the database. I'm using PHPMyAdmin, and the Date element name was changed directly inside the database (by editing the omeka_elements table). Everything inside the database (tables, fields, the database itself) uses utf8_unicode_ci collation.

I hope this will help.

It looks like, on that page you link to above, you declare the DOCTYPE and the <head> tag twice. Perhaps getting rid of one of those would solve the problem. I'm skeptical it will, since your other headings are working fine, but it might.

You're right; actually, there are more duplicates towards the end of the code. I assume that's because the page is generated from several files. I doubt that erasing any of them will help; I'll probably have problems with the text somewhere else on that page. But I'll try, thanks for the suggestion.

This problem with the Cyrillic text also appears in the 1.1 version, see http://www.cacak-dis.rs/test/items/show/2
I managed to view an added item when only the metadata is defined, without uploading the image file itself (which causes the exec function to report an error, as described in the other thread). So this is a general issue for all recent versions of Omeka, and it must be related to how the software displays/converts data from the database (or something like that; I don't know the proper terminology).

Hello again,

I've managed to display Cyrillic names for some of the Dublin Core elements. I noticed that there are strange-looking symbols in the element_texts table in MySQL, which represent the Cyrillic text added for metadata. I entered the Cyrillic names of the elements into their value fields and then traced, by element id, which text is which (because it's unreadable). Then I copied those symbols into the proper name field of the elements table, and it works.
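One plausible explanation for why copying those symbols works is that the text is being stored "double-encoded". The Python sketch below only simulates that idea; it assumes the application sends UTF-8 bytes over a connection that the server treats as latin1, which has not been confirmed for this installation.

```python
# Simulation only; assumes a latin1 connection carrying UTF-8 bytes.
name = "Датум"  # "Date" in Serbian Cyrillic

# Writing: the UTF-8 bytes are taken to be latin1 text, so every byte is
# stored as its own character. This is the unreadable form in the table.
stored_as_seen = name.encode("utf-8").decode("latin-1")

# Reading: the server hands the same characters back as latin1 bytes,
# which are the original UTF-8 bytes, so a utf-8 page shows Cyrillic again.
shown_on_page = stored_as_seen.encode("latin-1").decode("utf-8")

print(len(name), len(stored_as_seen))  # 5 10
print(shown_on_page)                   # Датум
```

Under that assumption, a name typed directly in PHPMyAdmin is stored as real Cyrillic and cannot survive the latin1 round trip, while the copied symbols can.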

See http://www.cacak-dis.rs/kolekcije/items/show/4

But the elements Title, Creator and Description can't be edited this way. Translating Title causes an overall error on the site, translating Creator causes an error in the citation part (Error:There is no element named 'Creator'!), while Description is needed for the Featured and Recently added items on the homepage.

I suppose that the code must also be edited with the new element names to overcome these errors. I'll try that now, but I would appreciate any help.

Hi,

After some time of inactivity, I have started working again on translating Omeka's front end into Serbian. The problem of displaying DC element names in Cyrillic characters was solved quite simply: by translating the element names in the install.sql.php file in the install folder before the installation process. With a fresh install of Omeka (I'm still working with the 1.0 beta version), the names are changed and properly displayed.

Now I have to deal with the Title, Creator and Description elements inside the pages, so the next step will be translating those pages where these elements occur. I hope that more good news will follow soon.

Which Elements files do you have to change once you've changed the field names using PHPMyAdmin? Thanks in advance!