Talk:UTF-8 support

Please help compare this UTF-8 article with the current TYPO3 4.0.x. Thanks, DocTeam

forceCharset, iconv extension, setDBinit directive, default_character_set, multiplyDBfieldSize directive

This article appears pretty out of date to me. There is no need to modify the Apache configuration when forceCharset is set. Additionally, this option sets config.metaCharset and config.renderCharset automatically.

The iconv extension is probably not needed because TYPO3 contains its own implementation of it. However, the extension is used when it exists on the system, so I would change the text to say it is recommended but not required.

I don't know any details about the email encoding problem either, but I think it should be fixed completely in TYPO3 4.0. If there are any problems left, please get in touch with me and let me know.

The setDBinit directive is also not needed if default_character_set is specified in my.cnf.

Last but not least, the multiplyDBfieldSize directive also seems unnecessary if default_character_set is being used. Please check.

max key length is 1000 bytes -- 333 bytes

You might encounter this error:

SQL=Specified key was too long; max key length is 1000 bytes:

This particular problem occurs when you are using UTF-8 encoding. MySQL's utf8 character set reserves 3 bytes per character, and the maximum index length is 1000 bytes, so the effective maximum index length is 1000/3 = 333 characters. Some indexed fields are longer than this, hence the error (many other packages are bitten by this issue too).
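The arithmetic behind the limit can be sketched in a few lines of PHP. The function name maxIndexChars is made up for this illustration; the 1000-byte limit is the one quoted in the error message, and the per-character widths are the worst-case sizes MySQL reserves for the respective character sets.

```php
<?php
// Illustrative helper (not part of TYPO3 or MySQL): how many characters fit
// into an index when the key length is capped in bytes.
function maxIndexChars(int $maxKeyBytes, int $bytesPerChar): int
{
    // The worst-case width is reserved for every character in the key.
    return intdiv($maxKeyBytes, $bytesPerChar);
}

echo maxIndexChars(1000, 3), "\n"; // utf8, up to 3 bytes per character: prints 333
echo maxIndexChars(1000, 1), "\n"; // latin1, 1 byte per character: prints 1000
```

Any indexed field declared longer than the first number triggers the error under utf8, while the same field is fine under latin1.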

To solve this, simply remove the index from that field and reload. 

And how does one "remove the index from that field"?
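A possible answer, as a sketch only: in MySQL an index is removed with ALTER TABLE ... DROP INDEX ..., and an index can alternatively be limited to a prefix of the field so that it stays under the byte limit. The helper functions and the table, index and column names below are hypothetical; only the generated SQL is MySQL syntax.

```php
<?php
// Hypothetical helpers that only build SQL strings; nothing is sent to a server.
function quoteIdentifier(string $name): string
{
    // Backtick-quote an identifier, doubling any embedded backticks.
    return '`' . str_replace('`', '``', $name) . '`';
}

function dropIndexSql(string $table, string $index): string
{
    return 'ALTER TABLE ' . quoteIdentifier($table) . ' DROP INDEX ' . quoteIdentifier($index);
}

function prefixIndexSql(string $table, string $index, string $column, int $chars): string
{
    // Index only the first $chars characters of the column.
    return 'ALTER TABLE ' . quoteIdentifier($table) . ' ADD INDEX ' . quoteIdentifier($index)
        . ' (' . quoteIdentifier($column) . '(' . $chars . '))';
}

// Existing index names can be listed with: SHOW INDEX FROM `some_table`
echo dropIndexSql('some_table', 'some_index'), ";\n";
echo prefixIndexSql('some_table', 'some_index', 'some_column', 333), ";\n";
```

The printed statements can be run in phpMyAdmin or the mysql client after replacing the placeholder names with the ones reported in the error.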

Another solution

I had the same problem while installing MediaWiki. One way around it is to declare individual tables as latin1 (or whatever). You can do this with

CREATE TABLE tablename (... ) DEFAULT CHARSET=latin1

-- mati

more things to check or to change

.

TYPO3 sets MySQL table collations incorrectly at installation

I have installed TYPO3 (ver. 4.0.4, with PHP 5.1.6-5 [Debian] and MySQL 5.0.30-Debian_1-log), and only after some time did I realize that, although UTF-8 seems to work correctly everywhere, the installation has incorrectly set the collation of every database field to the default latin1_swedish_ci.

The result is that the database cannot be used from PHP connections that are correctly set up [as utf-8]. The settings mentioned on this page [particularly the [setDBinit] string setting] set the connection right, but then all international characters on the front page are garbled.

The situation is that TYPO3 stores all text values as UTF-8, but in latin1 fields, so if you look at them in phpMyAdmin, for example, they come out incorrectly. If you want to set up TYPO3 correctly according to this page, you should also convert the whole database to utf8 somehow. The CONVERT TO SQL command is useless here, because it tries to convert the characters from latin1 to UTF-8, so they end up "double-encoded". The correct way is to first convert every field manually to BINARY [which causes no conversion] and then back to utf8. You can use this script for this:

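<?php
// Usage: php convert.php dbase collation
// Re-labels text columns with the given collation without transcoding the
// stored bytes, by passing each column through the binary character set first.
if ($argc != 3) {
    exit("Usage: php $argv[0] db collation\n");
}
list(, $db, $collation) = $argv;

// Adjust host, user name and password for your server.
$mysqli = new mysqli("localhost", "username", "password", $db);
if ($mysqli->connect_error) {
    exit("Connection failed: " . $mysqli->connect_error . "\n");
}

function mysql_convert($mysqli, $query)
{
    //~ echo "$query;\n";
    return $mysqli->query($query);
}

mysql_convert($mysqli, "ALTER DATABASE `$db` COLLATE $collation");
$result = $mysqli->query("SHOW TABLES");
while ($row = $result->fetch_row()) {
    mysql_convert($mysqli, "ALTER TABLE `$row[0]` COLLATE $collation");
    $result1 = $mysqli->query("SHOW COLUMNS FROM `$row[0]`");
    while ($row1 = $result1->fetch_assoc()) {
        if (preg_match('~char|text|enum|set~', $row1["Type"])) {
            // Step 1: switch to binary so that MySQL performs no character conversion.
            mysql_convert($mysqli, "ALTER TABLE `$row[0]` MODIFY `$row1[Field]` $row1[Type] CHARACTER SET binary");
            // Step 2: switch back to the original type with the new collation.
            // SHOW COLUMNS reports "YES" in the Null field for nullable columns;
            // a missing default is reported as NULL.
            mysql_convert($mysqli, "ALTER TABLE `$row[0]` MODIFY `$row1[Field]` $row1[Type] COLLATE $collation" .
                ($row1["Null"] == "YES" ? "" : " NOT NULL") .
                ($row1["Default"] !== null && $row1["Default"] !== ""
                    ? " DEFAULT '" . $mysqli->real_escape_string($row1["Default"]) . "'" : ""));
        }
    }
    $result1->free();
}
$result->free();
?>

Run it from the command line like this:

 php convert.php dbase collation

You will also need to adjust the host, user name and password passed to mysqli at the beginning of the script.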

This solution is translated from the Czech page at http://php.vrana.cz/prevod-kodovani-mysql.php and I hope it helps.
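The "double encoding" mentioned above, and why the detour through BINARY avoids it, can be illustrated on a single character. This is only a sketch: it needs PHP's mbstring extension, and ISO-8859-1 stands in for MySQL's latin1 here, which is an assumption that holds for this character but not for every byte value.

```php
<?php
// "é" stored as UTF-8 is the two bytes C3 A9.
$original = "\xC3\xA9";

// Wrong: the bytes are already UTF-8, but they are treated as latin1 and
// encoded to UTF-8 a second time. This is what a latin1-to-utf8 conversion
// does to UTF-8 data that sits in latin1 fields.
$doubleEncoded = mb_convert_encoding($original, 'UTF-8', 'ISO-8859-1');

// Right: leave the bytes untouched and only change the label on the field,
// which is what the BINARY detour achieves.
$relabelled = $original;

// Already double-encoded text can be repaired by reversing the extra step.
$repaired = mb_convert_encoding($doubleEncoded, 'ISO-8859-1', 'UTF-8');

echo bin2hex($original), "\n";      // c3a9
echo bin2hex($doubleEncoded), "\n"; // c383c2a9
echo bin2hex($relabelled), "\n";    // c3a9
echo bin2hex($repaired), "\n";      // c3a9
```

The four-byte result is what shows up as "Ã©" instead of "é" in the browser.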


Wiki-Question:
My question to TYPO3 experts is: how can TYPO3 be installed so that all this is not necessary? User:Gorn at 2007-02-06

You might want to skip the sed/chgrep part

I ran into trouble whenever I changed my dump file (replaced latin1 with utf8): I couldn't get the special characters to display correctly afterwards while using "forceCharset = utf-8".

So in the end it seemed to work out for me following these steps:

1. make the sql-dump (with phpmyadmin, include drop tables)

2. create a new database (keep the old one as a backup;)

3. if for some reason this new database is not yet set to utf-8 (e.g. because you had to create it with an admin tool like Plesk), try to get shell access and execute something like the following (replace the bracketed parts):

 mysql --user=(user) --password=(password) -e "ALTER DATABASE (new-db) DEFAULT CHARACTER SET utf8 DEFAULT COLLATE utf8_general_ci;"

If you can't do this, don't give up; you might be able to pull it off nevertheless.

4. import the dump file with the TYPO3 Install Tool (or for bigger dumps use bigdump.php) without replacing latin1 with utf8! This means your old tables will still be latin1, though. As I only migrated in order to display Cyrillic and Arabic correctly, this didn't bother me. Note that this might go wrong when exporting from a MySQL version earlier than 4.1.

5. go to the install-tool (basic configuration), to set the correct user and (new) database

6. still in the install-tool (all configuration), set: forceCharset = utf-8

7. in the main template of your site, put: config.renderCharset = utf-8

8. if you're a religious person, this might be the time for a quick prayer.


Should the special characters still be displayed wrongly at this point, the extension "convert2utf8" might be worth a shot. mike --80.219.160.158 04:16, 12 October 2007 (CEST)
