Every CHAR or VARCHAR field can (or, better: must) have a character set assigned. Firebird uses this information to correctly store the bytes that make up the character string.
In order to be able to sort or compare strings, you also need to define a collation. A collation defines the sort ordering and uppercase conversions for a string.
Firebird is unable to transliterate between character sets. So you must set the correct values on the server and on the client if everything is to work smoothely.
In the German language there are the "Umlauts", special vowels with a double-dot (diaeresis) over them. A common last name in Germany is Müller. (If you don't have umlauts on your keyboard, you could also write "Mueller", but that's not what we want to discuss here ;-)
When you convert Müller to uppercase you get MÜLLER, so there is an uppercase Ü and a lowercase ü.
When you want to perform a lexicographic compare on the name, you have several options:
You can define the default character set for a new database in the CREATE DATABASE statement:
CREATE DATABASE <database> USER <username> PASSWORD <password> PAGE_SIZE <pagesize> DEFAULT CHARACTER SET <charset>
For example:
CREATE DATABASE localhost:meter USER SYSDBA PASSWORD masterkey PAGE_SIZE 4096 DEFAULT CHARACTER SET ISO8859_1;
From now on, any VARCHAR or CHAR field will default to the ISO8859_1 character set. You can, however, specify a special character set for each column:
CREATE TABLE users ( CZECH_NAME VARCHAR(50) CHARACTER SET ISO8859_2, ...
There is no default collation. So you should define a collation for every field that is to be used for sorting (ORDER BY) or comparing (UPPER):
CREATE TABLE users ( NAME VARCHAR(50) COLLATE DE_DE, ...
COLLATE DE_DE means: "use a collation for the German language (the first DE), applying the rules from Germany (the second DE)"
You can also specify the collation with the ORDER BY clause:
ORDER BY LASTNAME COLLATE FR_CA, FIRSTNAME COLLATE FR_CA
or with the WHERE clause:
WHERE LASTNAME COLLATE FR_CA = :lastnametosearch
or when searching:
WHERE UPPER (LAST_NAME COLLATE SV_SV) = 'PAULSEN';
UPPER() only works correctly if there is a collation defined for the parameter field:
WHERE UPPER (NAME COLLATE DE_DE) = 'MÜLLER';
SET NAMES ISO8859_1;
The TIb_Connection class has a string property named CharSet. Assign it the name of the character set to use:
Ib_Connection1.CharSet := 'ISO8859_1';
The TIbDatabase class has a TStrings property named Params. Add a field with the name lc_ctype and specify the character set:
IbDatabase1.Params.Add ('lc_ctype=ISO8859_1');
In PHP you define the Client Character Set when you connect (or pconnect) to the database.
$db = ibase_connect ($Name, $Usr, $Pwd, "ISO8859_1");
Conversions between character sets are always done as: CHARSET1 -> UNICODE -> CHARSET2
With NONE or OCTETS as the connection character set, the bytes are just copied: NONE/OCTETS -> CHARSET2 and CHARSET1 -> NONE/OCTETS.
I have written a separate article about this.
Character Set | Languages | Collation | Comments |
---|---|---|---|
Generic Character Sets | |||
NONE | All | NONE | No character set applied. With this character set setting, Firebird is unable to perform conversion operations like UPPER() correctly on anything other than the standard 26 latin letters. |
OCTETS | BINARY | All | OCTETS | Same as NONE. Cannot be used as client connection character set. Space character is #x00. Will be displayed as hex in ISQL 2.0. |
ASCII | English | ASCII | English |
Unicode based Character Sets | |||
UNICODE_FSS | All | UNICODE_FSS | Unicode UTF-8. An old implementation that accepts malformed strings and does not enforce correct max. string length. All characters 3 bytes, no case mapping. Superseded in Firebird 2.0 with the UTF8 character set. Deprecated. |
UTF8 | All | UCS_BASIC | UCS_BASIC sorts in Unicode code-point order (Firebird 2.0) |
UNICODE | Sorts using the Unicode Collation Algorithm (UCA) (Firebird 2.0) | ||
UTF-8 | Case insensitive collation (Firebird 2.1) | ||
UNICODE_CI_AI | Case insentive, Accent insensitive collation for Unicode (Firebird 2.5) | ||
Current Character Sets | |||
ISO8859_1 | Western Europe | ISO8859_1 | Latin-1 |
DA_DA | Danish/Danmark | ||
DE_DE | German/Germany | ||
DU_NL | Dutch/The Netherlands | ||
EN_UK | English/United Kingdom | ||
EN_US | English/USA | ||
ES_ES | Spanish/Spain | ||
ES_ES_CI_AI | Spanish/Spain, case insensitive, accent insensitive (Firebird 2.0) | ||
FI_FI | Finnish/Finnland | ||
FR_CA | French/Canada | ||
FR_FR | French/France | ||
FR_FR_CI_AI | French/France, case insensitive, accent insensitive (Firebird 2.1) | ||
IS_IS | Icelandic/Iceland | ||
IT_IT | Italian/Italy | ||
NO_NO | Norwegian/Norway | ||
PT_PT | Portuguese/Portugal | ||
PT_BR | Portuguese/Brasil (Firebird 2.0). Case+Accent insensitive | ||
SV_SV | Swedish/Sweden | ||
ISO8859_2 | Central Europe | ISO8859_2 | Central Europe |
CS_CZ | Czech | ||
ISO_HUN | Hungarian | ||
ISO_PLK | Polish (Firebird 2.0) | ||
ISO8859_3 | Southern Europe | ISO8859_3 | Maltese, Esperanto |
ISO8859_4 | North European | ISO8859_4 | Estonian, Latvian, Lithuanian, Greenlandic, Lappish |
ISO8859_5 | Cyrillic | ISO8859_5 | Russian, Ukrainian |
ISO8859_6 | Arabic | ISO8859_6 | |
ISO8859_7 | Modern Greek | ISO8859_7 | |
ISO8859_8 | Hebrew | ISO8859_8 | |
ISO8859_9 | Turkish | ISO8859_9 | |
ISO8859_13 | Baltic | ISO8859_13 | Baltic |
LT_LT | Lithuanian | ||
WIN1250 | Central Europe | WIN1250 | Central Europe |
BS_BA | Bosnian (Firebird 2.0) | ||
WIN_CZ | Czech, case-insensitive (Firebird 2.0) | ||
WIN_CZ_CI_AI | Czech, case-insensitive, accent-insensitive (Firebird 2.0) | ||
PXW_CSY | Czech | ||
PXW_HUN | Hungarian | ||
PXW_HUNDC | Hungarian, Dictionary sort | ||
PXW_PLK | Polish | ||
PXW_SLOV | Slovanian | ||
WIN1251 | Cyrillic | WIN1251 | Cyrillic |
WIN1251_UA | Ukrainian | ||
PXW_CYRL | Cyrillic, Paradox compatibility | ||
WIN1252 | Western Europe, America | WIN1252 | Latin-1 with Windows extensions |
WIN_PTBR | Brasilian Portuguese (Firebird 2.0). Case+Accent insensitive | ||
PXW_INTL | Paradox ANSI International | ||
PXW_INTL850 | Paradox Multi-Lingual Latin-1 | ||
PXW_NORDAN4 | Paradox Norwegian and Danish | ||
PXW_SPAN | Paradox Spanish | ||
PXW_SWEDFIN | Paradox Swedish, Finnish | ||
WIN1253 | Modern Greek | WIN1253 | |
PXW_GREEK | Paradox Greek | ||
WIN1254 | Turkish | WIN1254 | |
PXW_TURK | Paradox Turkish | ||
WIN1255 | Hebrew | WIN1255 | |
WIN1256 | Arabic | WIN1256 | |
WIN1257 | Baltic | WIN1257 | Baltic |
WIN1257_LV | Latvian dictionary collation (Firebird 2.0) | ||
WIN1257_LT | Lithuanian dictionary collation (Firebird 2.0) | ||
WIN1257_EE | Estonian dictionary collation (Firebird 2.0) | ||
WIN1258 | Vietnamese | Vietnamese (Firebird 2.0) | |
MS-DOS, dBASE and Paradox compatibility | |||
DOS437 | Western Europe, America | DOS437 | English/USA |
DB_DEU437 | dBASE German | ||
DB_ESP437 | dBASE Spanish | ||
DB_FIN437 | dBASE Finnish | ||
DB_FRA437 | dBASE French | ||
DB_ITA437 | dBASE Italian | ||
DB_NLD437 | dBASE Dutch | ||
DB_SVE437 | dBASE Swedisch | ||
DB_UK437 | dBASE English/UK | ||
DB_US437 | dBASE English/US | ||
PDOX_ASCII | Paradox ASCII code page | ||
PDOX_INTL | Paradox International English code page | ||
PDOX_SWEDFIN | Paradox Swedish/Finnish code page | ||
DOS737 | Greek | DOS737 | Greek |
DOS775 | Baltic | DOS775 | Baltic |
DOS850 | Western Europe, America | DOS850 | Latin-1 (without Euro € symbol) |
DB_DEU850 | dBASE German | ||
DB_ESP850 | dBASE Spanish | ||
DB_FRA850 | dBASE French/France | ||
DB_FRC850 | dBASE French/Canada | ||
DB_ITA850 | dBASE Italian | ||
DB_NLD850 | dBASE Dutch | ||
DB_PTB850 | dBASE Portuguese/Brasil | ||
DB_SVE850 | dBASE Swedish | ||
DB_UK850 | dBASE English/UK | ||
DB_US850 | dBASE English/USA | ||
DOS852 | Central Europe | DOS852 | Latin-2 (Central Europe) |
DB_CSY | dBASE Czech | ||
DB_PLK | dBASE Polish | ||
DB_SLO | dBASE Slovakian | ||
PDOX_CSY | Paradox Czech | ||
PDOX_HUN | Paradox Hungarian | ||
PDOX_PLK | Paradox Polish | ||
PDOX_SLO | Paradox Slovakian | ||
DOS857 | Turkish | DOS857 | Turkish |
DB_TRK | dBASE Turkish | ||
DOS858 | DOS858 | Latin-1 plus Euro symbol € | |
DOS860 | Portuguese | DOS860 | Portuguese |
DB_PTG860 | dBASE Portuguese | ||
DOS861 | Icelandic | DOS861 | Icelandic |
PDOX_ISL | Paradox Icelandic | ||
DOS862 | Hebrew | DOS862 | Hebrew |
DOS863 | Canadian French | DOS863 | French/Canada |
DB_FRC863 | dBASE French/Canada | ||
DOS864 | Arabic | DOS864 | Arabic |
DOS865 | Scandinavian | DOS865 | Nordic |
DB_NOR865 | dBASE Norwegian | ||
DB_DAN865 | dBASE Danish | ||
PDOX_NORDAN4 | Paradox Norwegian | ||
DOS866 | Russian | DOS866 | Russian |
DOS869 | Greek | DOS869 | Modern Greek |
Others | |||
BIG_5 | Chinese | BIG_5 | Chinese |
KOI8R | Russian | Russian character set and dictionary collation (Firebird 2.0) | |
KOI8U | Ukrainian | Ukrainian character set and dictionary collation (Firebird 2.0) | |
CYRL | Russian/Ukrainian | CYRL | Cyrillic |
DB_RUS | dBASE Russian | ||
PDOX_CYRL | Paradox Cyrillic | ||
KSC_5601 | Korean | KSC_5601 | Unified Korean Hangeul, also known as windows-949 |
KSC_DICTIONARY | Korean dictionary ordering | ||
NEXT | NeXT Computers | NEXT | NeXTSTEP encoding |
NXT_DEU | German | ||
NXT_ESP | Spanish | ||
NXT_FRA | French | ||
NXT_ITA | Italian | ||
NXT_US | US-English | ||
SJIS_0208 | Japanese | SJIS_0208 | Shift-JIS |
EUCJ_0208 | Japanese | EUCJ_0208 | EUC Japanese |
GB_2312 | Chinese | GB_2312 | Simplified Chinese (HongKong, PRC), a subset of GBK/windows-936 |
CP943C | Japanese | CP943C_UNICODE | Japanese character set (Firebird 2.1) |
TIS620 | Thai | TIS620_UNICODE | Thai character set, single byte (Firebird 2.1) |
The question now is: which character set do I choose for my database?
Note: I don't have any experience with Asian scripts (Chinese, Korean, Japanese) so I can't give you any hint on these
The Unicode situation dramatically improved with Firebird 2.0. Now there is the new UTF8 character set that correctly handles Unicode strings in UTF-8 format. The Unicode collation algorithm has been implemented so now you can use UPPER() and the new LOWER() function without the need to specify a collation.
See also: www.firebirdsql.org/index.php?op=doc&id=fb_1_5_charsets
See also: www.collation-charts.org