UTF-8 support

From TYPO3Wiki








Introduction

This page collects information about UTF-8 support in TYPO3. There are many options to set and check. A good start is to make sure that everything in the chain is set to UTF-8: Apache, php.ini, my.cnf and finally your TYPO3 settings. In some cases not all settings are necessary, and everything will run fine without certain changes. But at least you will find a checklist here of what could be responsible for garbled characters or loads of question marks on your website.

General Settings

php.ini

Settings in php.ini

PHP extensions that should be enabled
extension=php_iconv.so

If you're using PHP 5, iconv is enabled by default and you don't need to load the extension.

This also needs a setting in localconf.php (see below).

Don't enable the mbstring.func_overload setting of the mbstring extension ("php_mbstring.so"). While it's generally useful in UTF-8 setups, it conflicts with TYPO3's internal character set handling in t3lib_cs.

Mail encoding problems

This hint was found on the German web page http://www.exanto.de/typo3-und-utf-8.html. It says that the following php.ini setting helps with problems you might encounter when sending mails via direct_mail:

mbstring.internal_encoding = UTF-8

If you can't edit php.ini yourself, you can also set this option inside your PHP script (e.g. in the two index.php files of the TYPO3 dummy and the TYPO3 source):

mb_internal_encoding("UTF-8");

Further information about the mbstring functions can be found in the PHP manual: http://us.php.net/manual/en/ref.mbstring.php
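
As a rough sketch of the script-level variant, the snippet below sets the internal encoding and reads it back to confirm that the change took effect. It assumes the mbstring extension is loaded; the variable names are only illustrative.

```php
<?php
// Sketch only: assumes the mbstring extension is loaded.
if (!function_exists('mb_internal_encoding')) {
    die('mbstring extension is not available' . PHP_EOL);
}

// Remember the previous value (illustrative variable name).
$previousEncoding = mb_internal_encoding();

// Same effect as mbstring.internal_encoding = UTF-8 in php.ini,
// but only for the current request.
mb_internal_encoding('UTF-8');

// Called without arguments, the function returns the current setting.
$currentEncoding = mb_internal_encoding();
echo 'internal encoding was ' . $previousEncoding . ', is now ' . $currentEncoding . PHP_EOL;
```

Because the call only affects the running request, it has to be executed before any code that relies on the encoding.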

Apache vhost.conf

AddDefaultCharset UTF-8

my.cnf

Be careful with this setting! It will of course also affect existing LATIN1 databases. Only set it when you are sure of what you are doing and only UTF-8 databases are supposed to be on the server. You also don't need it when you set the corresponding TYPO3 options in the install tool (see below).

[mysqld]
default_character_set = utf8

TYPO3 settings

TypoScript setup

For the correct rendering of your frontend you should set the following option in the setup field of your TypoScript root template:

config.renderCharset = utf-8

Note: When you set config.renderCharset, config.metaCharset defaults to the same value. When you set both to different values, TYPO3 uses renderCharset internally and converts the generated page to metaCharset right before delivering it to the browser.

Note: If you set forceCharset to utf-8 (see localconf.php below), it also becomes the default for renderCharset, so the TypoScript setting is not strictly needed.

localconf.php

// For backend charset
$TYPO3_CONF_VARS['BE']['forceCharset'] = 'utf-8';
 
// For GIFBUILDER support
// Set it to 'iconv' or 'mbstring'
$TYPO3_CONF_VARS['SYS']['t3lib_cs_convMethod'] = 'iconv';
// For 'iconv' support you need PHP 5!
$TYPO3_CONF_VARS['SYS']['t3lib_cs_utils'] = 'iconv';

Note: If you set your database to UTF-8, do not use the setting $TYPO3_CONF_VARS['SYS']['multiplyDBfieldSize'] = 3 for Asian languages. It is not needed and only wastes space!


TYPO3 Install Tool Options

[setDBinit] String (textarea): Commands to send to database right after connecting, separated by newline. Ignored by the DBAL extension except for the 'native' type!

SET CHARACTER SET utf8;
SET NAMES utf8;
SET SESSION character_set_server=utf8;

In most cases it is sufficient to add this to the localconf.php:

$TYPO3_CONF_VARS['SYS']['setDBinit'] = 'SET NAMES utf8;';

For more information see http://bugs.typo3.org/view.php?id=3547

Please note that each command in setDBinit is an SQL statement and therefore needs a terminating semicolon.
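
If more than one statement is needed in localconf.php, the newline-separated format described above could be written roughly as follows. This is a sketch, not TYPO3's own code: the array is initialised here only so the snippet runs on its own, and the explode() call merely mimics the newline splitting to show which commands result.

```php
<?php
// Sketch only: in a real localconf.php, $TYPO3_CONF_VARS already exists.
$TYPO3_CONF_VARS = array('SYS' => array());

// Statements from the list above, one per line, each ending with a semicolon.
// chr(10) is the newline character that separates the commands.
$TYPO3_CONF_VARS['SYS']['setDBinit'] = implode(chr(10), array(
    'SET CHARACTER SET utf8;',
    'SET NAMES utf8;',
));

// Illustration: split the value on newlines and print each command.
$commands = explode(chr(10), $TYPO3_CONF_VARS['SYS']['setDBinit']);
foreach ($commands as $command) {
    echo $command . PHP_EOL;
}
```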

Extensions

collect information here that is related to extensions

RealURL

One problem is that RealURL might not be able to handle a page title written in non-Latin characters. For example, with a page title in Japanese, the title was not interpreted and the page was rendered as jp.html. Using the Navigation title solves this problem: after setting "home" as the Navigation title, the same page was rendered as jp/home.html.

Extensions that use strlen instead of t3lib_cs

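Info: strlen counts bytes, not characters. UTF-8 uses 1 to 4 bytes per character (1 to 3 for the characters that MySQL's utf8 character set can store), so strlen returns too large a value for any string that contains non-ASCII characters.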
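
A minimal sketch of the difference, assuming the mbstring extension is loaded and mbstring.func_overload is off. PHP's mb_strlen is used here only as a stand-in for a character-aware length function; the sample string is illustrative.

```php
<?php
// "Grüße" written with explicit UTF-8 bytes: 5 characters,
// of which "ü" and "ß" take 2 bytes each.
$text = "Gr\xC3\xBC\xC3\x9Fe";

$byteLength = strlen($text);             // counts bytes
$charLength = mb_strlen($text, 'UTF-8'); // counts characters

echo "bytes: $byteLength, characters: $charLength" . PHP_EOL;
```

An extension that crops or validates text based on the byte count may therefore cut a string too early or split a multi-byte character in the middle.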

Which extensions need a fix?

Further information

Database issues

While it is not strictly necessary to use UTF-8 in the database (for example MySQL), it is highly recommended. Otherwise database sorting functions will not work correctly.

Note: If you set your database to UTF-8, do not use the multiplyDBfieldSize setting. It is not needed and only wastes space!


TODO: Describe problems with UTF-8 in MySQL (which version numbers?). As usual, MySQL has problems when it comes to more advanced functions.

You might encounter this error:

SQL=Specified key was too long; max key length is 1000 bytes:

This particular problem occurs when you are using UTF-8 encoding. MySQL's utf8 character set reserves up to 3 bytes per character, and the maximum index length is 1000 bytes, so the effective maximum index length is 1000/3 = 333 characters. Some indexes are longer than this, hence the error (many other packages are bitten by this issue too).

To solve this, simply remove the index from that field and reload.

Note: Indexes that big are not recommended anyway and point to poor database design.
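
The arithmetic behind the error can be sketched as follows. The helper name maxIndexChars and the two-column example are purely illustrative, and the calculation ignores any per-column overhead the database may add.

```php
<?php
// Hypothetical helper: how many characters fit into an index
// of the given size when each character may need $bytesPerChar bytes.
function maxIndexChars($maxKeyBytes, $bytesPerChar)
{
    return (int) floor($maxKeyBytes / $bytesPerChar);
}

$latin1Chars = maxIndexChars(1000, 1); // single-byte character set
$utf8Chars   = maxIndexChars(1000, 3); // up to 3 bytes per character

// Illustrative case: one index spanning two VARCHAR(255) columns.
$neededBytes = 2 * 255 * 3;
$fits = $neededBytes <= 1000;

echo "latin1: $latin1Chars chars, utf8: $utf8Chars chars" . PHP_EOL;
echo "two VARCHAR(255) columns need $neededBytes bytes, fits: "
    . ($fits ? 'yes' : 'no') . PHP_EOL;
```

The same index that fits comfortably in a single-byte character set (510 bytes in this example) exceeds the limit once every character is counted with 3 bytes.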

Convert an already existing database to UTF-8

Some links to the conversion topic:


Example:

 CONVERT TYPO3 DATABASE TO UTF8 CHARACTER SET
 URL: http://tlug.dnho.net/?q=node/276
 --------------------------------------------
 Requirements:
 - Shell access to your unix based server
 - "sed" package installed on the server
 Attention (for this example we assume):
 hostname: myhost.freedomson.com
 database: typo3
 #Connect to server via ssh (this example is for linux users, windows users use putty.exe)
 ssh -l (user) myhost.freedomson.com
 #backup database for security reasons
 mysqldump -u (user) -p(pass) --max_allowed_packet=10000000 typo3 > typo3_backup.sql
 #Dump database (without table typo3.sys_refindex*)
 mysqldump -u (user) -p(pass) --max_allowed_packet=10000000 --ignore-table=typo3.sys_refindex  typo3  > typo3_utf8.sql
 #Convert all instances of latin1 (or your own character set) in typo3_utf8.sql to utf8
 sed  -e 's/latin1/utf8/g' -i typo3_utf8.sql
 #import database
 mysql -u (user) -p(pass) --default-character-set=utf8  typo3 < typo3_utf8.sql
 #alter database character set and collate 
 mysql -u (user) -p(pass) -e "ALTER DATABASE typo3 DEFAULT CHARACTER SET utf8 DEFAULT COLLATE utf8_bin"

(*) Excluding this table prevents the error "Specified key was too long; max key length is 1000 bytes".

t3lib_cs

Developers: Use the functions of t3lib_cs, e.g. to get the length of a string. strlen doesn't return the correct string length, because a UTF-8 character can take 1 to 4 bytes.

As of PHP 5.3 the intl extension (formerly PECL/intl) is bundled with PHP, so the TYPO3 core developers may switch to it.


bugtracker items

Fonts

Info about what fonts are available.

HTML Tidy

If you are having problems with html entities like &nbsp; shown as ? in the browser, add the -utf8 option to the HTML tidy_path variable in the install tool, e.g.

$TYPO3_CONF_VARS['FE']['tidy_path'] = 'tidy -i --quiet true --tidy-mark true -wrap 0 -raw --output-xhtml true -utf8';
