UTF-8 support
From TYPO3Wiki
Introduction
On this page we collect information about the good old UTF-8 topic. There are many options to set and check. A good start is to make sure that everything in the chain is set to UTF-8: Apache, php.ini and my.cnf, through to your TYPO3 settings. In some cases not all of these settings are necessary, and everything will run fine even without certain changes. But at least you will find a checklist here of what could be responsible for garbled characters or loads of question marks on your website.
General Settings
php.ini
Settings in php.ini
PHP extensions that should be enabled
extension=php_iconv.so
This also needs a setting in localconf.php (see below).
You may enable the mbstring extension (php_mbstring.so), but don't enable its mbstring.func_overload setting. Function overloading is generally useful in UTF-8 setups, but it conflicts with TYPO3's internal character set handling (t3lib_cs).
Mail encoding problems
This hint was found on the German web page http://www.exanto.de/typo3-und-utf-8.html. It says that the following php.ini setting helps with problems you might encounter when sending mails via direct_mail:
mbstring.internal_encoding = UTF-8
If you can't edit php.ini yourself, you can also set this option in a PHP script (e.g. in the two index.php files of the TYPO3 dummy package and the TYPO3 source):
mb_internal_encoding("UTF-8");
Further information about the mbstring functions can be found in the PHP manual: http://us.php.net/manual/en/ref.mbstring.php
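A minimal sketch, assuming PHP 7 or later, that applies the runtime equivalent of mbstring.internal_encoding and reports whether mbstring function overloading is switched on. The function name mbstringReport() is made up for this example. It is not part of TYPO3 or PHP.

```php
<?php
// Hypothetical helper (not part of TYPO3 or PHP): check the mbstring settings
// discussed above.
function mbstringReport(): array
{
    $loaded = extension_loaded('mbstring');
    if ($loaded) {
        // Runtime equivalent of "mbstring.internal_encoding = UTF-8" in php.ini.
        mb_internal_encoding('UTF-8');
    }
    // ini_get() returns false for unknown settings; mbstring.func_overload
    // no longer exists in PHP 8, and it is unknown when mbstring is not loaded.
    $overload = ini_get('mbstring.func_overload');
    return [
        'loaded'            => $loaded,
        'internal_encoding' => $loaded ? mb_internal_encoding() : null,
        // Must be false for TYPO3, which does its own character handling.
        'func_overload_on'  => $overload !== false && (int) $overload !== 0,
    ];
}

print_r(mbstringReport());
```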
Apache vhost.conf
AddDefaultCharset UTF-8
my.cnf
Be careful with this setting! It will of course also affect existing latin1 databases. So only set it when you are sure of what you are doing and when only UTF-8 databases are supposed to be on the server. You don't need it at all if you set the TYPO3 options in the Install Tool (see setDBinit below).
[mysqld]
default_character_set = utf8
On newer MySQL versions (5.5 and later), this server option is named character-set-server instead.
TYPO3 settings
TypoScript setup
For the correct rendering of your frontend, set these two options in the Setup field of your TypoScript root template:
config.renderCharset = utf-8
config.metaCharset = utf-8
Note: When you set config.renderCharset, config.metaCharset defaults to the same value. When you set both values, TYPO3 uses renderCharset internally and converts the generated page to metaCharset right before delivering it to the browser.
Note: If you set forceCharset to utf-8 (see localconf.php below), it also becomes the default for renderCharset, so setting renderCharset explicitly is not strictly needed.
localconf.php
// For backend charset
$TYPO3_CONF_VARS['BE']['forceCharset'] = 'utf-8';
// For GIFBUILDER support
// Set it to 'iconv' or 'mbstring'
$TYPO3_CONF_VARS['SYS']['t3lib_cs_convMethod'] = 'iconv';
// For 'iconv' support you need PHP 5!
$TYPO3_CONF_VARS['SYS']['t3lib_cs_utils'] = 'iconv';
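You can run a small sketch like the following on the server to check that the iconv extension is available and converts as expected. The settings above rely on it. The sketch assumes PHP 7 or later. latin1ToUtf8() is a hypothetical helper, not a TYPO3 function.

```php
<?php
// Hypothetical helper (not part of TYPO3): convert ISO-8859-1 text to UTF-8
// with iconv, failing loudly if the extension is missing.
function latin1ToUtf8(string $latin1): string
{
    if (!function_exists('iconv')) {
        throw new RuntimeException('The iconv extension is not loaded.');
    }
    $utf8 = iconv('ISO-8859-1', 'UTF-8', $latin1);
    if ($utf8 === false) {
        throw new RuntimeException('iconv conversion failed.');
    }
    return $utf8;
}

$latin1 = "M\xFCller";            // "Müller" in ISO-8859-1: ü is the single byte 0xFC
$utf8   = latin1ToUtf8($latin1);  // in UTF-8, ü becomes the two bytes 0xC3 0xBC
echo strlen($latin1), ' ', strlen($utf8), "\n"; // 6 7
```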
TYPO3 Install Tool Options
[setDBinit] String (textarea): Commands to send to database right after connecting, separated by newline. Ignored by the DBAL extension except for the 'native' type!
SET CHARACTER SET utf8;
SET NAMES utf8;
SET SESSION character_set_server=utf8;
In most cases it is sufficient to add this to the localconf.php:
$TYPO3_CONF_VARS['SYS']['setDBinit'] = 'SET NAMES utf8;';
For more information, see http://bugs.typo3.org/view.php?id=3547. Please note that each command in setDBinit is an SQL statement, so each one needs a semicolon at the end.
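Missing semicolons are easy to overlook. The following sketch splits a setDBinit value into its newline-separated commands and lists any command that lacks a trailing semicolon. It assumes PHP 7 or later. findUnterminatedDbInitCommands() is a hypothetical helper for illustration, not something the Install Tool provides.

```php
<?php
// Hypothetical helper (not part of TYPO3): return every non-empty command in a
// setDBinit value that does not end with a semicolon.
function findUnterminatedDbInitCommands(string $setDBinit): array
{
    $missing = [];
    // The Install Tool separates commands by newline.
    foreach (preg_split('/\r\n|\r|\n/', $setDBinit) as $line) {
        $command = trim($line);
        if ($command !== '' && substr($command, -1) !== ';') {
            $missing[] = $command;
        }
    }
    return $missing;
}

$ok  = "SET CHARACTER SET utf8;\nSET NAMES utf8;\nSET SESSION character_set_server=utf8;";
$bad = "SET NAMES utf8\nSET CHARACTER SET utf8;";
print_r(findUnterminatedDbInitCommands($ok));  // empty array
print_r(findUnterminatedDbInitCommands($bad)); // contains "SET NAMES utf8"
```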
Extensions
Collect information related to extensions here.
RealURL
One problem is that RealURL might not be able to interpret a page title written in unusual (i.e. non-Roman) characters. For example, with a page title in Japanese, I found that the title was not interpreted and the page was rendered as jp.html. Setting a Navigation title solves this problem (continuing the example, with "home" as the Navigation title, the page was rendered as jp/home.html).
Extensions that use strlen instead of t3lib_cs
Info: strlen() doesn't know about UTF-8. It counts bytes, not characters, and in UTF-8 one character takes 1 to 4 bytes.
Which extensions need a fix?
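This short PHP sketch shows the difference between bytes and characters. It assumes PHP 7 or later, because it uses \u{...} escapes so the example does not depend on the file's own encoding. utf8Length() is a hypothetical helper. It uses mb_strlen() when the mbstring extension is loaded. Otherwise it counts matches of a PCRE pattern in UTF-8 mode. Inside TYPO3 you would use t3lib_cs instead.

```php
<?php
// Hypothetical helper (not part of TYPO3): count characters, not bytes.
function utf8Length(string $s): int
{
    if (function_exists('mb_strlen')) {
        return mb_strlen($s, 'UTF-8');
    }
    // Fallback: with the /u modifier, "." matches one whole UTF-8 character.
    return preg_match_all('/./su', $s);
}

// One character each, encoded with 1, 2, 3 and 4 bytes respectively.
foreach (['A', "\u{00E9}", "\u{20AC}", "\u{1F600}"] as $char) {
    printf("%s: %d byte(s), %d character(s)\n", $char, strlen($char), utf8Length($char));
}

$word = "Gr\u{00FC}\u{00DF}e"; // "Grüße": ü and ß take 2 bytes each
echo strlen($word), ' ', utf8Length($word), "\n"; // 7 5
```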
Further information
Database issues
While it is not strictly necessary to use UTF-8 in the database (for example MySQL), it is highly recommended. Otherwise database sorting functions will not work correctly.
Describe problems with UTF-8 in MySQL (which version numbers are affected?).
MySQL has problems with some of its more advanced functions.
You might encounter this error:
SQL=Specified key was too long; max key length is 1000 bytes:
This problem occurs when you use UTF-8 encoding. MySQL's utf8 character set uses up to 3 bytes per character, and the maximum index length is 1000 bytes, so the effective maximum index length is 1000/3 = 333 characters. Some indexed fields in the TYPO3 tables are longer than this, hence the error (many other packages are affected by this issue too).
To solve this, simply remove the index from that field and reload.
Note: Indexes that big are not recommended anyway and are a sign of bad DB design.
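Here is the arithmetic above as a small PHP sketch. It assumes PHP 7 or later for intdiv(). It is simplified: it multiplies characters by 3 bytes and ignores any per-column overhead MySQL adds to a key. The helper names are hypothetical.

```php
<?php
// Simplified key-length arithmetic for MySQL's 3-byte utf8 character set.
const MAX_KEY_BYTES = 1000;      // limit from the error message
const UTF8_BYTES_PER_CHAR = 3;   // MySQL reserves the maximum bytes per character

// Hypothetical helper: how many characters fit into one index.
function maxIndexedChars(int $maxKeyBytes, int $bytesPerChar): int
{
    return intdiv($maxKeyBytes, $bytesPerChar);
}

// Hypothetical helper: does an index over VARCHAR columns of these lengths fit?
function indexFits(array $varcharLengths, int $bytesPerChar, int $maxKeyBytes): bool
{
    return array_sum($varcharLengths) * $bytesPerChar <= $maxKeyBytes;
}

echo maxIndexedChars(MAX_KEY_BYTES, UTF8_BYTES_PER_CHAR), "\n";      // 333
var_dump(indexFits([255], UTF8_BYTES_PER_CHAR, MAX_KEY_BYTES));      // 765 bytes: fits
var_dump(indexFits([255, 255], UTF8_BYTES_PER_CHAR, MAX_KEY_BYTES)); // 1530 bytes: too long
```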
Convert an already existing database to UTF-8
Some links to the conversion topic:
- http://dev.mysql.com/doc/refman/4.1/en/charset-convert.html (MySQL based conversion)
- http://www.typo3-media.com/blog/article/utf8-and-typo3-updated.html
- http://m.tacker.org/blog/64.script-to-convert-wordpress-content-encoding.html (useful PHP script to convert charsets)
- PHP script to convert the database can be found on Talk:UTF-8 support
- 6098: BE should check Mysql charset settings [closed to Michael Stucki]
Example:
Convert a TYPO3 database to the UTF-8 character set
Source: http://tlug.dnho.net/?q=node/276
Requirements:
- Shell access to your Unix-based server
- The sed package installed on the server
For this example we assume:
- hostname: myhost.freedomson.com
- database: typo3
# Connect to the server via ssh (this example is for Linux users; Windows users can use putty.exe)
ssh -l (user) myhost.freedomson.com
# Back up the database for safety (to its own file, so the next dump does not overwrite it)
mysqldump -u (user) -p(pass) --max_allowed_packet=10000000 typo3 > typo3_backup.sql
# Dump the database (without table typo3.sys_refindex*)
mysqldump -u (user) -p(pass) --max_allowed_packet=10000000 --ignore-table=typo3.sys_refindex typo3 > typo3_utf8.sql
# Convert all instances of latin1 (or your own character set) in typo3_utf8.sql to utf8
sed -e 's/latin1/utf8/g' -i typo3_utf8.sql
# Import the database
mysql -u (user) -p(pass) --default-character-set=utf8 typo3 < typo3_utf8.sql
# Alter the database default character set and collation
mysql -u (user) -p(pass) -e "ALTER DATABASE typo3 DEFAULT CHARACTER SET utf8 DEFAULT COLLATE utf8_bin"
(*) Excluding sys_refindex prevents the error "SQL=Specified key was too long; max key length is 1000 bytes".
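After the import, it can help to spot-check some converted field values. The sketch below relies on the fact that preg_match() with the /u modifier fails on byte sequences that are not valid UTF-8, so leftover latin1 bytes show up. It assumes PHP 7 or later. isValidUtf8() is a hypothetical helper name. Fetching the values from the database is left out.

```php
<?php
// Hypothetical helper (not part of TYPO3): true if $s is valid UTF-8.
// preg_match() returns false (not 0 or 1) when the subject is invalid UTF-8
// and the pattern uses the /u modifier.
function isValidUtf8(string $s): bool
{
    return preg_match('//u', $s) === 1;
}

var_dump(isValidUtf8("M\u{00FC}ller")); // ü as the UTF-8 bytes 0xC3 0xBC: valid
var_dump(isValidUtf8("M\xFCller"));     // a lone latin1 byte 0xFC: invalid
```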
t3lib_cs
Developers: use these functions, e.g. to get the length of a string. strlen() doesn't return the correct string length, because a UTF-8 character can take 1 to 4 bytes.
With PHP 5.3, the PECL/intl extension becomes available as part of PHP, so the TYPO3 core developers might switch to it.
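Cutting strings by bytes is just as risky as counting them. The sketch below uses plain PHP functions rather than t3lib_cs, because t3lib_cs needs a running TYPO3 environment. It shows how substr() can split a multi-byte character and how a character-aware cut avoids this. It assumes PHP 7 or later. utf8Truncate() is a hypothetical helper that uses mb_substr() when mbstring is loaded and a PCRE fallback otherwise.

```php
<?php
// Hypothetical helper (not part of TYPO3): keep at most $maxChars characters.
// Assumes $maxChars >= 0.
function utf8Truncate(string $s, int $maxChars): string
{
    if (function_exists('mb_substr')) {
        return mb_substr($s, 0, $maxChars, 'UTF-8');
    }
    // Fallback: match at most $maxChars whole characters in UTF-8 mode.
    preg_match('/^.{0,' . $maxChars . '}/su', $s, $m);
    return $m[0] ?? '';
}

$word = "\u{00C4}rger";            // "Ärger": Ä is the two bytes 0xC3 0x84
var_dump(substr($word, 0, 1));     // only the byte 0xC3: broken UTF-8
var_dump(utf8Truncate($word, 1));  // "Ä" with both bytes kept
```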
bugtracker items
- 7869: <br /> and <link> tags not properly converted but instead escaped and displayed literally in (in conjunction with UTF-8, umlaut) [closed]: Don't use the function overload feature of mbstring. TYPO3 doesn't work with it, as it does its own character handling. The installer should check this setting and issue a warning.
- 7882: Cannot import previously exported t3d file [feedback to Oliver Hader]: It seems that this is again caused by function overloading. A fixed byte length is only used for the beginning of the data structure.
Fonts
Info about what fonts are available.
HTML Tidy
If HTML entities are shown as ? in the browser, add the -utf8 option to the HTML Tidy tidy_path variable in the Install Tool, e.g.:
$TYPO3_CONF_VARS['FE']['tidy_path'] = 'tidy -i --quiet true --tidy-mark true -wrap 0 -raw --output-xhtml true -utf8';
External links
- http://dev.mysql.com/doc/refman/4.1/en/charset.html
- http://en.opensuse.org/SDB%3AConverting_Files_or_File_Names_to_UTF-8_Encoding
- How to change the encoding of files (e.g. iso-8859-1, iso-8859-15, utf-8): http://linuxwiki.de/tcs (German only at the moment)
- GMENU & GIFBUILDER and fonts: http://people.merea.se/david/2007/03/08/gmenugifbuilder-broke-my-swedish-characters/
