GNU gettext utilities
This manual documents the GNU gettext tools and the GNU libintl library, version 0.19.8.
• Introduction: Introduction
• Users: The User’s View
• PO Files: The Format of PO Files
• Sources: Preparing Program Sources
• Template: Making the PO Template File
• Creating: Creating a New PO File
• Updating: Updating Existing PO Files
• Editing: Editing PO Files
• Manipulating: Manipulating PO Files
• Binaries: Producing Binary MO Files
• Programmers: The Programmer’s View
• Translators: The Translator’s View
• Maintainers: The Maintainer’s View
• Installers: The Installer’s and Distributor’s View
• Programming Languages: Other Programming Languages
• Conclusion: Concluding Remarks
• Language Codes: ISO 639 language codes
• Country Codes: ISO 3166 country codes
• Licenses: Licenses
• Program Index: Index of Programs
• Option Index: Index of Command-Line Options
• Variable Index: Index of Environment Variables
• PO Mode Index: Index of Emacs PO Mode Commands
• Autoconf Macro Index: Index of Autoconf Macros
• Index: General Index
The Detailed Node Listing
Introduction
• Why: The Purpose of GNU gettext
• Concepts: I18n, L10n, and Such
• Aspects: Aspects in Native Language Support
• Files: Files Conveying Translations
• Overview: Overview of GNU gettext
The User’s View
• System Installation: Questions During Operating System Installation
• Setting the GUI Locale: How to Specify the Locale Used by GUI Programs
• Setting the POSIX Locale: How to Specify the Locale According to POSIX
• Installing Localizations: How to Install Additional Translations
Setting the Locale through Environment Variables
• Locale Names: How a Locale Specification Looks Like
• Locale Environment Variables: Which Environment Variable Specifies What
• The LANGUAGE variable: How to Specify a Priority List of Languages
Preparing Program Sources
• Importing: Importing the gettext declaration
• Triggering: Triggering gettext Operations
• Preparing Strings: Preparing Translatable Strings
• Mark Keywords: How Marks Appear in Sources
• Marking: Marking Translatable Strings
• c-format Flag: Telling something about the following string
• Special cases: Special Cases of Translatable Strings
• Bug Report Address: Letting Users Report Translation Bugs
• Names: Marking Proper Names for Translation
• Libraries: Preparing Library Sources
Making the PO Template File
• xgettext Invocation: Invoking the xgettext Program
Creating a New PO File
• msginit Invocation: Invoking the msginit Program
• Header Entry: Filling in the Header Entry
Updating Existing PO Files
• msgmerge Invocation: Invoking the msgmerge Program
Editing PO Files
• KBabel: KDE’s PO File Editor
• Gtranslator: GNOME’s PO File Editor
• PO Mode: Emacs’s PO File Editor
• Compendium: Using Translation Compendia
Emacs’s PO File Editor
• Installation: Completing GNU gettext Installation
• Main PO Commands: Main Commands
• Entry Positioning: Entry Positioning
• Normalizing: Normalizing Strings in Entries
• Translated Entries: Translated Entries
• Fuzzy Entries: Fuzzy Entries
• Untranslated Entries: Untranslated Entries
• Obsolete Entries: Obsolete Entries
• Modifying Translations: Modifying Translations
• Modifying Comments: Modifying Comments
• Subedit: Mode for Editing Translations
• C Sources Context: C Sources Context
• Auxiliary: Consulting Auxiliary PO Files
Using Translation Compendia
• Creating Compendia: Merging translations for later use
• Using Compendia: Using older translations if they fit
Manipulating PO Files
• msgcat Invocation: Invoking the msgcat Program
• msgconv Invocation: Invoking the msgconv Program
• msggrep Invocation: Invoking the msggrep Program
• msgfilter Invocation: Invoking the msgfilter Program
• msguniq Invocation: Invoking the msguniq Program
• msgcomm Invocation: Invoking the msgcomm Program
• msgcmp Invocation: Invoking the msgcmp Program
• msgattrib Invocation: Invoking the msgattrib Program
• msgen Invocation: Invoking the msgen Program
• msgexec Invocation: Invoking the msgexec Program
• Colorizing: Highlighting parts of PO files
• libgettextpo: Writing your own programs that process PO files
Highlighting parts of PO files
• The --color option: Triggering colorized output
• The TERM variable: The environment variable TERM
• The --style option: The --style option
• Style rules: Style rules for PO files
• Customizing less: Customizing less for viewing PO files
Producing Binary MO Files
• msgfmt Invocation: Invoking the msgfmt Program
• msgunfmt Invocation: Invoking the msgunfmt Program
• MO Files: The Format of GNU MO Files
The Programmer’s View
• catgets: About catgets
• gettext: About gettext
• Comparison: Comparing the two interfaces
• Using libintl.a: Using libintl.a in own programs
• gettext grok: Being a gettext grok
• Temp Programmers: Temporary Notes for the Programmers Chapter
About catgets
• Interface to catgets: The interface
• Problems with catgets: Problems with the catgets interface?!
About gettext
• Interface to gettext: The interface
• Ambiguities: Solving ambiguities
• Locating Catalogs: Locating message catalog files
• Charset conversion: How to request conversion to Unicode
• Contexts: Solving ambiguities in GUI programs
• Plural forms: Additional functions for handling plurals
• Optimized gettext: Optimization of the *gettext functions
Temporary Notes for the Programmers Chapter
• Temp Implementations: Temporary - Two Possible Implementations
• Temp catgets: Temporary - About catgets
• Temp WSI: Temporary - Why a single implementation
• Temp Notes: Temporary - Notes
The Translator’s View
• Trans Intro 0: Introduction 0
• Trans Intro 1: Introduction 1
• Discussions: Discussions
• Organization: Organization
• Information Flow: Information Flow
• Translating plural forms: How to fill in msgstr[0], msgstr[1]
• Prioritizing messages: How to find which messages to translate first
Organization
• Central Coordination: Central Coordination
• National Teams: National Teams
• Mailing Lists: Mailing Lists
National Teams
• Sub-Cultures: Sub-Cultures
• Organizational Ideas: Organizational Ideas
The Maintainer’s View
• Flat and Non-Flat: Flat or Non-Flat Directory Structures
• Prerequisites: Prerequisite Works
• gettextize Invocation: Invoking the gettextize Program
• Adjusting Files: Files You Must Create or Alter
• autoconf macros: Autoconf macros for use in configure.ac
• Version Control Issues: Integrating with Version Control Systems
• Release Management: Creating a Distribution Tarball
Files You Must Create or Alter
• po/POTFILES.in: POTFILES.in in po/
• po/LINGUAS: LINGUAS in po/
• po/Makevars: Makevars in po/
• po/Rules-*: Extending Makefile in po/
• configure.ac: configure.ac at top level
• config.guess: config.guess, config.sub at top level
• mkinstalldirs: mkinstalldirs at top level
• aclocal: aclocal.m4 at top level
• acconfig: acconfig.h at top level
• config.h.in: config.h.in at top level
• Makefile: Makefile.in at top level
• src/Makefile: Makefile.in in src/
• lib/gettext.h: gettext.h in lib/
Autoconf macros for use in configure.ac
• AM_GNU_GETTEXT: AM_GNU_GETTEXT in gettext.m4
• AM_GNU_GETTEXT_VERSION: AM_GNU_GETTEXT_VERSION in gettext.m4
• AM_GNU_GETTEXT_NEED: AM_GNU_GETTEXT_NEED in gettext.m4
• AM_GNU_GETTEXT_INTL_SUBDIR: AM_GNU_GETTEXT_INTL_SUBDIR in intldir.m4
• AM_PO_SUBDIRS: AM_PO_SUBDIRS in po.m4
• AM_XGETTEXT_OPTION: AM_XGETTEXT_OPTION in po.m4
• AM_ICONV: AM_ICONV in iconv.m4
Integrating with Version Control Systems
• Distributed Development: Avoiding version mismatch in distributed development
• Files under Version Control: Files to put under version control
• Translations under Version Control: Put PO Files under Version Control
• autopoint Invocation: Invoking the autopoint Program
Other Programming Languages
• Language Implementors: The Language Implementor’s View
• Programmers for other Languages: The Programmer’s View
• Translators for other Languages: The Translator’s View
• Maintainers for other Languages: The Maintainer’s View
• List of Programming Languages: Individual Programming Languages
• List of Data Formats: Internationalizable Data
The Translator’s View
• c-format: C Format Strings
• objc-format: Objective C Format Strings
• sh-format: Shell Format Strings
• python-format: Python Format Strings
• lisp-format: Lisp Format Strings
• elisp-format: Emacs Lisp Format Strings
• librep-format: librep Format Strings
• scheme-format: Scheme Format Strings
• smalltalk-format: Smalltalk Format Strings
• java-format: Java Format Strings
• csharp-format: C# Format Strings
• awk-format: awk Format Strings
• object-pascal-format: Object Pascal Format Strings
• ycp-format: YCP Format Strings
• tcl-format: Tcl Format Strings
• perl-format: Perl Format Strings
• php-format: PHP Format Strings
• gcc-internal-format: GCC internal Format Strings
• gfc-internal-format: GFC internal Format Strings
• qt-format: Qt Format Strings
• qt-plural-format: Qt Plural Format Strings
• kde-format: KDE Format Strings
• boost-format: Boost Format Strings
• lua-format: Lua Format Strings
• javascript-format: JavaScript Format Strings
Individual Programming Languages
• C: C, C++, Objective C
• sh: sh - Shell Script
• bash: bash - Bourne-Again Shell Script
• Python: Python
• Common Lisp: GNU clisp - Common Lisp
• clisp C: GNU clisp C sources
• Emacs Lisp: Emacs Lisp
• librep: librep
• Scheme: GNU guile - Scheme
• Smalltalk: GNU Smalltalk
• Java: Java
• C#: C#
• gawk: GNU awk
• Pascal: Pascal - Free Pascal Compiler
• wxWidgets: wxWidgets library
• YCP: YCP - YaST2 scripting language
• Tcl: Tcl - Tk’s scripting language
• Perl: Perl
• PHP: PHP Hypertext Preprocessor
• Pike: Pike
• GCC-source: GNU Compiler Collection sources
• Lua: Lua
• JavaScript: JavaScript
• Vala: Vala
sh - Shell Script
• Preparing Shell Scripts: Preparing Shell Scripts for Internationalization
• gettext.sh: Contents of gettext.sh
• gettext Invocation: Invoking the gettext program
• ngettext Invocation: Invoking the ngettext program
• envsubst Invocation: Invoking the envsubst program
• eval_gettext Invocation: Invoking the eval_gettext function
• eval_ngettext Invocation: Invoking the eval_ngettext function
Perl
• General Problems: General Problems Parsing Perl Code
• Default Keywords: Which Keywords Will xgettext Look For?
• Special Keywords: How to Extract Hash Keys
• Quote-like Expressions: What are Strings And Quote-like Expressions?
• Interpolation I: Invalid String Interpolation
• Interpolation II: Valid String Interpolation
• Parentheses: When To Use Parentheses
• Long Lines: How To Grok with Long Lines
• Perl Pitfalls: Bugs, Pitfalls, and Things That Do Not Work
Internationalizable Data
• POT: POT - Portable Object Template
• RST: Resource String Table
• Glade: Glade - GNOME user interface description
• GSettings: GSettings - GNOME user configuration schema
• AppData: AppData - freedesktop.org application description
• Preparing ITS Rules: Preparing Rules for XML Internationalization
Concluding Remarks
• History: History of GNU gettext
• References: Related Readings
Language Codes
• Usual Language Codes: Two-letter ISO 639 language codes
• Rare Language Codes: Three-letter ISO 639 language codes
Licenses
• GNU GPL: GNU General Public License
• GNU LGPL: GNU Lesser General Public License
• GNU FDL: GNU Free Documentation License
Introduction
This chapter explains the goals sought in the creation of GNU gettext and the free Translation Project. Then, it explains a few broad concepts around Native Language Support, and positions message translation with regard to other aspects of national and cultural variance, as they apply to programs. It also surveys those files used to convey the translations. It explains how the various tools interact in the initial generation of these files, and later, how the maintenance cycle should usually operate.
In this manual, we use ‘he’ when speaking of the programmer or maintainer, ‘she’ when speaking of the translator, and ‘they’ when speaking of the installers or end users of the translated program. This is only a convenience for clarifying the documentation. It is absolutely not meant to imply that some roles are more appropriate to males or females. Besides, as you might guess, GNU gettext is meant to be useful for people using computers, whatever their sex, race, religion or nationality!
Please send suggestions and corrections to:
Internet address: [email protected]
Please include the manual’s edition number and update date in your messages.
The Purpose of GNU gettext
Usually, programs are written and documented in English, and use English at execution time to interact with users. This is true not only of GNU software, but also of a great deal of proprietary and free software. Using a common language is quite handy for communication between developers, maintainers and users from all countries. On the other hand, most people are less comfortable with English than with their own native language, and would prefer to use their mother tongue for day-to-day work, as far as possible. Many would simply love to see their computer screen showing a lot less English, and far more of their own language.
However, to many people, this dream might appear so far-fetched that they may believe it is not even worth spending time thinking about it. They have no confidence at all that the dream might ever come true. Yet some have not lost hope, and have organized themselves. The Translation Project is a formalization of this hope into a workable structure, which has a good chance of getting all of us nearer to the achievement of a truly multi-lingual set of programs.
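GNU gettext is an important step for the Translation Project, as it is an asset on which we may build many other steps. This package offers programmers, translators, and even users a well-integrated set of tools and documentation. Specifically, the GNU gettext utilities are a set of tools that provides a framework within which other free packages may produce multi-lingual messages. These tools include:
• A set of conventions about how programs should be written to support message catalogs.
• A directory and file naming organization for the message catalogs themselves.
• A runtime library supporting the retrieval of translated messages.
• A few stand-alone programs to massage in various ways the sets of translatable strings, or already translated strings.
• A library supporting the parsing and creation of files containing translated messages.
• A special mode for Emacs which helps in preparing these sets and bringing them up to date.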
GNU gettext is designed to minimize the impact of internationalization on program sources, keeping this impact as small and hardly noticeable as possible. Internationalization has better chances of succeeding if it is very lightweight, or at least appears to be so, when looking at program sources.
The Translation Project also uses the GNU gettext distribution as a vehicle for documenting its structure and methods. This goes beyond the strict technicalities of documenting GNU gettext proper. By so doing, translators will find in a single place, as far as possible, all they need to know for properly doing their translating work. This supplemental documentation might also help programmers, and even curious users, to understand how GNU gettext is related to the remainder of the Translation Project, and consequently, to have a glimpse at the big picture.
I18n, L10n, and Such
Two long words appear all the time when we discuss support of native language in programs, and these words have a precise meaning, worth being explained here, once and for all in this document. The words are internationalization and localization. Many people, tired of writing these long words over and over again, got into the habit of writing i18n and l10n instead, quoting the first and last letter of each word, and replacing the run of intermediate letters by a number merely telling how many such letters there are. But in this manual, for the sake of clarity, we will patiently write the names in full, each time…
By internationalization, one refers to the operation by which a program, or a set of programs turned into a package, is made aware of and able to support multiple languages. This is a generalization process, by which the programs are untied from calling only English strings or other English-specific habits, and connected to generic ways of doing the same, instead. Program developers may use various techniques to internationalize their programs. Some of these have been standardized. GNU gettext offers one of these standards. See Programmers.
By localization, one means the operation by which, in a set of programs already internationalized, one gives the program all needed information so that it can adapt itself to handle its input and output in a fashion which is correct for some native language and cultural habits. This is a particularization process, by which generic methods already implemented in an internationalized program are used in specific ways. The programming environment puts several functions at the programmers’ disposal which allow this runtime configuration. The formal description of a specific set of cultural habits for some country, together with all associated translations targeted to the same native language, is called the locale for this language or country. Users achieve localization of programs by setting proper values for special environment variables, prior to executing those programs, identifying which locale should be used.
In fact, locale message support is only one component of the cultural data that makes up a particular locale. There is a whole host of routines and functions provided to aid programmers in developing internationalized software and which allow them to access the data stored in a particular locale. When someone presently refers to a particular locale, they are obviously referring to the data stored within that particular locale. Similarly, if a programmer is referring to “accessing the locale routines”, they are referring to the complete suite of routines that access all of the locale’s information.
One uses the expression Native Language Support, or merely NLS, to speak of the overall activity or feature encompassing both internationalization and localization, allowing for multi-lingual interactions in a program. In a nutshell, one could say that internationalization is the operation by which further localizations are made possible.
Also, very roughly said, when it comes to multi-lingual messages, internationalization is usually taken care of by programmers, and localization is usually taken care of by translators.
Aspects in Native Language Support
For a totally multi-lingual distribution, there are many things to translate beyond output messages.
• GNU gettext offers a complete toolset for translating messages output by C programs. Perl scripts and shell scripts will also need to be translated. Even if there are today some hooks by which this can be done, these hooks are not integrated as well as they should be.
• Some programs, like autoconf or bison, are able to produce other programs (or scripts). Even if the generating programs themselves are internationalized, the generated programs they produce may need internationalization on their own, and this indirect internationalization could be automated right from the generating program. In fact, quite usually, generating and generated programs could be internationalized independently, as the effort needed is fairly orthogonal.
• Some programs carry textual tables of their own, such as the character descriptions taken from an RFC, which the recode program is able to reconstruct at execution. Since these descriptions are extracted from the RFC by mechanical means, translating them properly would require a prior translation of the RFC itself.
• Programs whose input is written in a programming language or other special-purpose syntax raise the question of localizing that input too: one might extend gcc to allow diacriticized characters in identifiers or use translated keywords; ‘rm -i’ might accept something else than ‘y’ or ‘n’ for replies, etc. Even if the program will eventually make most of its output in the foreign languages, one has to decide whether the input syntax, option values, etc., are to be localized or not.
As we already stressed, translation is only one aspect of locales. Other internationalization aspects are system services and are handled in GNU libc. There are many attributes that are needed to define a country’s cultural conventions. These attributes include, besides the country’s native language, the formatting of the date and time, the representation of numbers, the symbols for currency, etc. These local rules are termed the country’s locale. The locale represents the knowledge needed to support the country’s native attributes.
There are a few major areas which may vary between countries and hence define what a locale must describe. The following list helps put multi-lingual messages into the proper context of other tasks related to locales. See the GNU libc manual for details.
The codeset most commonly used throughout the USA and most English-speaking parts of the world is the ASCII codeset. However, there are many characters needed by various locales that are not found within this codeset. The 8-bit ISO 8859-1 codeset has most of the special characters needed to handle the major European languages. However, in many cases, choosing ISO 8859-1 is nevertheless not adequate: it doesn’t even handle the major European currency. Hence each locale will need to specify which codeset it uses and will need to have the appropriate character handling routines to cope with that codeset.
Currency symbols vary from country to country, as does the position of the symbol relative to the amount. Software needs to be able to transparently display currency figures in the native mode for each locale.
The date format varies between locales. For example, Christmas Day in 1994 is written as 12/25/94 in the USA and as 25/12/94 in Australia. Other countries might use ISO 8601 dates, etc.
The time of day may be written as hh:mm, hh.mm, or otherwise. Some locales require time to be specified in 24-hour mode rather than with AM or PM. Further, the nature and yearly extent of the Daylight Saving correction vary widely between countries.
Numbers can be represented differently in different locales. For example, the following numbers are all written correctly for their respective locales:
12,345.67       English
12.345,67       German
 12345,67       French
1,2345.67       Asia
Some programs could go further and use different unit systems, like English units or metric units, or even take into account variants in how numbers are spelled out in full.
The most obvious area is the language support within a locale. This is where GNU gettext provides the means for developers and users to easily change the language that the software uses to communicate with the user.
These areas of cultural conventions are called locale categories. It is an unfortunate term; locale aspects or locale feature categories would have been better names, because each “locale category” describes an area or task that requires localization. The concrete data that describes the cultural conventions for such an area and for a particular culture is also called a locale category. In this sense, a locale is composed of several locale categories: the locale category describing the codeset, the locale category describing the formatting of numbers, the locale category containing the translated messages, and so on.
Components of locale outside of message handling are standardized in the ISO C standard and the POSIX:2001 standard (also known as the SUSV3 specification). GNU libc fully implements this, and most other modern systems provide more or less reasonable support for at least some of the missing components.
Files Conveying Translations
The letters PO in .po files mean Portable Object, to distinguish it from .mo files, where MO stands for Machine Object. This paradigm, as well as the PO file format, is inspired by the NLS standard developed by Uniforum, and first implemented by Sun in their Solaris system.
PO files are meant to be read and edited by humans, and associate each original, translatable string of a given package with its translation in a particular target language. A single PO file is dedicated to a single target language. If a package supports many languages, there is one such PO file per language supported, and each package has its own set of PO files. These PO files are best created by the xgettext program, and later updated or refreshed through the msgmerge program. Program xgettext extracts all marked messages from a set of C files and initializes a PO file with empty translations. Program msgmerge takes care of adjusting PO files between releases of the corresponding sources, commenting obsolete entries, initializing new ones, and updating all source line references. Files ending with .pot are a kind of base translation file found in distributions, in PO file format.
MO files are meant to be read by programs, and are binary in nature. A few systems already offer tools for creating and handling MO files as part of the Native Language Support coming with the system, but the format of these MO files is often different from system to system, and non-portable. The tools already provided with these systems don’t support all the features of GNU gettext. Therefore GNU gettext uses its own format for MO files. Files ending with .gmo are really MO files, when it is known that these files use the GNU format.
Overview of GNU gettext
The following diagram summarizes the relation between the files handled by GNU gettext and the tools acting on these files. It is followed by somewhat detailed explanations, which you should read while keeping an eye on the diagram. Having a clear understanding of these interrelations will surely help programmers, translators and maintainers.
Original C Sources ───> Preparation ───> Marked C Sources ───╮
                                                             │
              ╭─────────<─── GNU gettext Library             │
╭─── make <───┤                                              │
│             ╰─────────<────────────────────┬───────────────╯
│                                            │
│   ╭─────<─── PACKAGE.pot <─── xgettext <───╯   ╭───<─── PO Compendium
│   │                                            │              ↑
│   │                                            ╰───╮          │
│   ╰───╮                                            ├───> PO editor ───╮
│       ├────> msgmerge ──────> LANG.po ────>────────╯                  │
│   ╭───╯                                                               │
│   │                                                                   │
│   ╰─────────────<───────────────╮                                     │
│                                 ├─── New LANG.po <────────────────────╯
│   ╭─── LANG.gmo <─── msgfmt <───╯
│   │
│   ╰───> install ───> /.../LANG/PACKAGE.mo ───╮
│                                              ├───> "Hello world!"
╰───────> install ───> /.../bin/PROGRAM ───────╯
As a programmer, the first step to bringing GNU gettext into your package is identifying, right in the C sources, those strings which are meant to be translatable, and those which are untranslatable. This tedious job can be done a little more comfortably using Emacs PO mode, but you can use any means familiar to you for modifying your C sources. Besides this, some other simple, standard changes are needed to properly initialize the translation library. See Sources, for more information about all this.
For newly written software, the strings can and should of course be marked while the code is being written. The gettext approach makes this very easy. Simply put the following lines at the beginning of each file or in a central header file:
#define _(String) (String)
#define N_(String) String
#define textdomain(Domain)
#define bindtextdomain(Package, Directory)
Doing this allows you to prepare the sources for internationalization. Later, when you feel ready to use the gettext library, simply replace these definitions with the following:
#include <libintl.h>
#define _(String) gettext (String)
#define gettext_noop(String) String
#define N_(String) gettext_noop (String)
and link against libintl.a or libintl.so. Note that on GNU systems, you don’t need to link with libintl because the gettext library functions are already contained in GNU libc. That is all you have to change.
Once the C sources have been modified, the xgettext program is used to find and extract all translatable strings, and create a PO template file out of all these. This package.pot file contains all original program strings. It has sets of pointers to exactly where in C sources each string is used. All translations are set to empty. The letter t in .pot marks this as a Template PO file, not yet oriented towards any particular language. See xgettext Invocation, for more details about how one calls the xgettext program. If you are really lazy, you might be interested in working a lot more right away, and preparing the whole distribution setup (see Maintainers). By doing so, you spare yourself typing the xgettext command, as make should now generate the proper things automatically for you!
The first time through, there is no lang.po yet, so the msgmerge step may be skipped and replaced by a mere copy of package.pot to lang.po, where lang represents the target language. See Creating for details.
Then comes the initial translation of messages. Translation in itself is a whole matter, still exclusively meant for humans, and whose complexity far overwhelms the level of this manual. Nevertheless, a few hints are given in some other chapter of this manual (see Translators). You will also find there indications about how to contact translating teams, or become part of them, for sharing your translating concerns with others who target the same native language.
While adding the translated messages into the lang.po PO file, if you are not using one of the dedicated PO file editors (see Editing), you are on your own for ensuring that your efforts fully respect the PO file format and quoting conventions (see PO Files). This is surely not an impossible task, as this is the way many people handled PO files around 1995. On the other hand, by using a PO file editor, most details of the PO file format are taken care of for you, but you have to acquire some familiarity with the PO file editor itself.
If some common translations have already been saved into a compendium PO file, translators may use PO mode for initializing untranslated entries from the compendium, and also save selected translations into the compendium, updating it (see Compendium). Compendium files are meant to be exchanged between members of a given translation team.
Programs, or packages of programs, are dynamic in nature: users write bug reports and suggestions for improvements, and maintainers react by modifying programs in various ways. The fact that a package has already been internationalized should not make maintainers shy of adding new strings, or modifying strings already translated. They just do their job the best they can. For the Translation Project to work smoothly, it is important that maintainers do not carry translation concerns on their already loaded shoulders, and that translators be kept as free as possible of programming concerns.
The only concern maintainers should have is carefully marking new strings as translatable, when they should be, and not otherwise worrying about them being translated, as this will come in proper time. Consequently, when programs and their strings are adjusted in various ways by maintainers, and for matters usually unrelated to translation, xgettext would construct package.pot files which are evolving over time, so the translations carried by lang.po are slowly fading out of date.
It is important for translators (and even maintainers) to understand that package translation is a continuous process in the lifetime of a package, and not something which is done once and for all at the start. After an initial burst of translation activity for a given package, interventions are needed once in a while, because here and there, translated entries become obsolete, and new untranslated entries appear, needing translation.
The msgmerge program has the purpose of refreshing an already existing lang.po file, by comparing it with a newer package.pot template file, extracted by xgettext out of recent C sources. The refreshing operation adjusts all references to C source locations for strings, since these strings move as programs are modified. Also, msgmerge comments out as obsolete, in lang.po, those already translated entries which are no longer used in the program sources (see Obsolete Entries). It finally discovers new strings and inserts them in the resulting PO file as untranslated entries (see Untranslated Entries). See msgmerge Invocation, for more information about what msgmerge really does.
Whatever route or means taken, the goal is to obtain an updated lang.po file offering translations for all strings.
The temporal mobility, or fluidity of PO files, is an integral part of the translation game, and should be well understood, and accepted. People resisting it will have a hard time participating in the Translation Project, or will give a hard time to other participants! In particular, maintainers should relax and include all available official PO files in their distributions, even if these have not recently been updated, without exerting pressure on the translator teams to get the job done. The pressure should rather come from the community of users speaking a particular language, and maintainers should consider themselves fairly relieved of any concern about the adequacy of translation files. On the other hand, translators should reasonably try updating the PO files they are responsible for, while the package is undergoing pretest, prior to an official distribution.
Once the PO file is complete and dependable, the msgfmt program is used for turning the PO file into a machine-oriented format, which may yield efficient retrieval of translations by the programs of the package, whenever needed at runtime (see MO Files). See msgfmt Invocation, for more information about all modes of execution for the msgfmt program.
Finally, the modified and marked C sources are compiled and linked with the GNU gettext library, usually through the operation of make, given a suitable Makefile exists for the project, and the resulting executable is installed somewhere users will find it. The MO files themselves should also be properly installed. Given the appropriate environment variables are set (see Setting the POSIX Locale), the program should localize itself automatically, whenever it executes.
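To make this last step concrete, here is a minimal sketch of a program that localizes itself at startup. It assumes a glibc system, where <libintl.h> declares the gettext functions and no extra library is needed; the text domain "hello-sketch" and the directory /usr/local/share/locale are hypothetical stand-ins for the values a real package would take from its configuration. When no MO file matches the user's locale, gettext simply returns the original English string.

```c
#include <libintl.h>
#include <locale.h>

/* Hypothetical values; real packages usually get these from config.h. */
#define SKETCH_DOMAIN    "hello-sketch"
#define SKETCH_LOCALEDIR "/usr/local/share/locale"

/* Select the locale from the environment (LANG, LC_ALL, ...) and tell
   libintl where the MO files of our text domain are installed. */
static void
init_localization (void)
{
  setlocale (LC_ALL, "");
  bindtextdomain (SKETCH_DOMAIN, SKETCH_LOCALEDIR);
  textdomain (SKETCH_DOMAIN);
}

/* Return the greeting, translated if a matching MO file exists,
   otherwise the English original. */
static const char *
greeting (void)
{
  return gettext ("Hello, world!");
}
```

A real main would call init_localization () once, early, and then print with, for example, puts (greeting ()).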
The remainder of this manual has the purpose of explaining in depth the various steps outlined above.
Nowadays, when users log into a computer, they usually find that all their programs show messages in their native language, at least for users of languages with an active free software community, like French or German; to a lesser extent for languages with a smaller participation in free software and the GNU project, like Hindi and Filipino.
How does this work? How can the user influence the language that is used by the programs? This chapter answers these questions.
• System Installation: Questions During Operating System Installation
• Setting the GUI Locale: How to Specify the Locale Used by GUI Programs
• Setting the POSIX Locale: How to Specify the Locale According to POSIX
• Installing Localizations: How to Install Additional Translations
The default language is often already specified during operating system installation. When the operating system is installed, the installer typically asks for the language used for the installation process and, separately, for the language to use in the installed system. Some OS installers only ask for the language once.
This determines the system-wide default language for all users. But the installers often give the possibility to install extra localizations for additional languages. For example, the localizations of KDE (the K Desktop Environment) and OpenOffice.org are often bundled separately, as one installable package per language.
At this point it is good to consider the intended use of the machine: if it is a machine designated for personal use, additional localizations are probably not necessary. If, however, the machine is in use in an organization or company that has international relationships, one can consider the needs of guest users. If you have a guest from abroad for a week, what could his preferred locales be? It may be worth installing these additional localizations ahead of time, since they cost only a bit of disk space at this point.
The system-wide default language is the locale configuration that is used when a new user account is created. But the user can have his own locale configuration that is different from the one of the other users of the same machine. He can specify it, typically after the first login, as described in the next section.
The immediately available programs in a user’s desktop come from a group of programs called a “desktop environment”; it usually includes the window manager, a web browser, a text editor, and more. The most common free desktop environments are KDE, GNOME, and Xfce.
The locale used by GUI programs of the desktop environment can be specified in a configuration screen called “control center”, “language settings” or “country settings”.
Individual GUI programs that are not part of the desktop environment can have their locale specified either in a settings panel, or through environment variables.
For some programs, it is possible to specify the locale through environment variables, possibly even to a different locale than the desktop’s locale. This means, instead of starting a program through a menu or from the file system, you can start it from the command-line, after having set some environment variables. The environment variables can be those specified in the next section (Setting the POSIX Locale); for some versions of KDE, however, the locale is specified through a variable KDE_LANG, rather than LANG or LC_ALL.
As a user, if your language has been installed for this package, in the simplest case, you only have to set the LANG environment variable to the appropriate ‘ll_CC’ combination. For example, let’s suppose that you speak German and live in Germany. At the shell prompt, merely execute ‘setenv LANG de_DE’ (in csh), ‘export LANG; LANG=de_DE’ (in sh) or ‘export LANG=de_DE’ (in bash). This can be done from your .login or .profile file, once and for all.
• Locale Names: How a Locale Specification Looks Like
• Locale Environment Variables: Which Environment Variable Specifies What
• The LANGUAGE variable: How to Specify a Priority List of Languages
A locale name usually has the form ‘ll_CC’. Here ‘ll’ is an ISO 639 two-letter language code, and ‘CC’ is an ISO 3166 two-letter country code. For example, for German in Germany, ll is de, and CC is DE. You find a list of the language codes in appendix Language Codes and a list of the country codes in appendix Country Codes.
You might think that the country code specification is redundant. But in fact, some languages have dialects in different countries. For example, ‘de_AT’ is used for Austria, and ‘pt_BR’ for Brazil. The country code serves to distinguish the dialects.
Many locale names have an extended syntax ‘ll_CC.encoding’ that also specifies the character encoding. These are in use because between 2000 and 2005, most users switched to locales in UTF-8 encoding. For example, the German locale on glibc systems is nowadays ‘de_DE.UTF-8’. The older name ‘de_DE’ still refers to the German locale as of 2000 that stores characters in ISO-8859-1 encoding, a text encoding that cannot even accommodate the Euro currency sign.
Some locale names use ‘ll_CC@variant’ instead of ‘ll_CC’. The ‘@variant’ can denote any kind of characteristic that is not already implied by the language ll and the country CC. It can denote a particular monetary unit. For example, on glibc systems, ‘de_DE@euro’ denotes the locale that uses the Euro currency, in contrast to the older locale ‘de_DE’ which implies the use of the currency before 2002. It can also denote a dialect of the language, or the script used to write text (for example, ‘sr_RS@latin’ uses the Latin script, whereas ‘sr_RS’ uses the Cyrillic script to write Serbian), or the orthography rules, or similar.
On other systems, some variations of this scheme are used, such as ‘ll’. You can get the list of locales supported by your system for your language by running the command ‘locale -a | grep '^ll'’.
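The parts of such a name can be separated with a few string searches. The following is a minimal sketch, not how glibc parses locale names: the struct and function names are hypothetical, the buffer sizes are arbitrary, it assumes the order ‘ll_CC.encoding@variant’, and it does not check the codes against ISO 639 or ISO 3166.

```c
#include <string.h>

/* Hypothetical container for the parts of a name like "sr_RS.UTF-8@latin".
   Parts that are absent are left as empty strings. */
struct locale_parts
{
  char language[16]; /* ll */
  char country[16];  /* CC */
  char encoding[32]; /* text after '.' */
  char variant[32];  /* text after '@' */
};

/* Copy at most LEN bytes of SRC into DST (of size DSTSIZE), NUL-terminated,
   truncating if the part does not fit. */
static void
copy_part (char *dst, size_t dstsize, const char *src, size_t len)
{
  if (len >= dstsize)
    len = dstsize - 1;
  memcpy (dst, src, len);
  dst[len] = '\0';
}

/* Split NAME into language, country, encoding and variant. */
static void
parse_locale_name (const char *name, struct locale_parts *p)
{
  const char *at = strchr (name, '@');
  const char *end = at ? at : name + strlen (name);     /* end of ll_CC.enc */
  const char *dot = memchr (name, '.', end - name);
  const char *code_end = dot ? dot : end;               /* end of ll_CC */
  const char *underscore = memchr (name, '_', code_end - name);

  copy_part (p->language, sizeof p->language, name,
             (underscore ? underscore : code_end) - name);
  copy_part (p->country, sizeof p->country,
             underscore ? underscore + 1 : code_end,
             underscore ? code_end - (underscore + 1) : 0);
  copy_part (p->encoding, sizeof p->encoding,
             dot ? dot + 1 : end, dot ? end - (dot + 1) : 0);
  copy_part (p->variant, sizeof p->variant,
             at ? at + 1 : end, at ? strlen (at + 1) : 0);
}
```

For ‘de_DE@euro’, this yields language "de", country "DE", an empty encoding and variant "euro"; for a plain ‘de’, only the language is filled in.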
There is also a special locale, called ‘C’. When it is used, it disables all localization: in this locale, all programs standardized by POSIX use English messages and an unspecified character encoding (often US-ASCII, but sometimes also ISO-8859-1 or UTF-8, depending on the operating system).
A locale is composed of several locale categories, see Aspects. When a program looks up locale dependent values, it does this according to the following environment variables, in priority order:
• LANGUAGE
• LC_ALL
• LC_xxx, according to selected locale category: LC_CTYPE, LC_NUMERIC, LC_TIME, LC_COLLATE, LC_MONETARY, LC_MESSAGES, ...
• LANG
Variables whose value is set but is empty are ignored in this lookup.
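The lookup order can be sketched as a small function. This is a simplified model, not the C library's implementation: the names are hypothetical, it covers LC_ALL, the category's own LC_xxx variable and LANG, and it leaves out LANGUAGE, which GNU gettext consults first for messages only (see The LANGUAGE variable). The fallback "C" reflects glibc's behaviour; POSIX leaves the default implementation-defined.

```c
#include <stdlib.h>

/* Return VALUE if it is set and non-empty, otherwise NULL, because
   variables that are set but empty are ignored in the lookup. */
static const char *
nonempty (const char *value)
{
  return (value != NULL && value[0] != '\0') ? value : NULL;
}

/* Pick the locale for one category from the values of LC_ALL, the
   category's own LC_xxx variable and LANG, in that priority order.
   Falls back to "C" (assumption: glibc-like default). */
static const char *
pick_locale (const char *lc_all, const char *lc_category, const char *lang)
{
  const char *result;

  if ((result = nonempty (lc_all)) != NULL)
    return result;
  if ((result = nonempty (lc_category)) != NULL)
    return result;
  if ((result = nonempty (lang)) != NULL)
    return result;
  return "C";
}

/* Same lookup, reading the real environment.  CATEGORY_VAR is the name
   of the category variable, e.g. "LC_MESSAGES". */
static const char *
lookup_locale (const char *category_var)
{
  return pick_locale (getenv ("LC_ALL"), getenv (category_var),
                      getenv ("LANG"));
}
```

With LANG=es_ES.UTF-8 and LC_MESSAGES=sv_SE.UTF-8, as in the Swedish-in-Spain example below, lookup_locale ("LC_MESSAGES") yields "sv_SE.UTF-8" while lookup_locale ("LC_NUMERIC") yields "es_ES.UTF-8", provided LC_ALL and LC_NUMERIC are unset or empty.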
LANG is the normal environment variable for specifying a locale. As a user, you normally set this variable (unless some of the other variables have already been set by the system, in /etc/profile or similar initialization files).
LC_CTYPE, LC_NUMERIC, LC_TIME, LC_COLLATE, LC_MONETARY, LC_MESSAGES, and so on, are the environment variables meant to override LANG and to affect a single locale category only. For example, assume you are a Swedish user in Spain, and you want your programs to handle numbers and dates according to Spanish conventions, and only the messages should be in Swedish. Then you could create a locale named ‘sv_ES’ or ‘sv_ES.UTF-8’ by use of the localedef program. But it is simpler, and achieves the same effect, to set the LANG variable to es_ES.UTF-8 and the LC_MESSAGES variable to sv_SE.UTF-8; these two locales usually come preinstalled with the operating system.
LC_ALL is an environment variable that overrides all of these. It is typically used in scripts that run particular programs. For example, configure scripts generated by GNU autoconf use LC_ALL to make sure that the configuration tests don’t operate in locale dependent ways.
Some systems, unfortunately, set LC_ALL in /etc/profile or in similar initialization files. As a user, you therefore have to unset this variable if you want to set LANG and optionally some of the other LC_xxx variables.
The LANGUAGE variable is described in the next subsection.
Not all programs have translations for all languages. By default, an English message is shown in place of a nonexistent translation. If you understand other languages, you can set up a priority list of languages. This is done through a different environment variable, called LANGUAGE. GNU gettext gives preference to LANGUAGE over LC_ALL and LANG for the purpose of message handling, but you still need to have LANG (or LC_ALL) set to the primary language; this is required by other parts of the system libraries. For example, some Swedish users who would rather read translations in German than English for when Swedish is not available, set LANGUAGE to ‘sv:de’ while leaving LANG set to ‘sv_SE’.
Special advice for Norwegian users: The language code for Norwegian bokmål changed from ‘no’ to ‘nb’ in 2003. During the transition period, while some message catalogs for this language are installed under ‘nb’ and some older ones under ‘no’, it is recommended for Norwegian users to set LANGUAGE to ‘nb:no’ so that both newer and older translations are used.
In the LANGUAGE environment variable, but not in the other environment variables, ‘ll_CC’ combinations can be abbreviated as ‘ll’ to denote the language’s main dialect. For example, ‘de’ is equivalent to ‘de_DE’ (German as spoken in Germany), and ‘pt’ to ‘pt_PT’ (Portuguese as spoken in Portugal) in this context.
Note: The variable LANGUAGE is ignored if the locale is set to ‘C’. In other words, you have to first enable localization, by setting LANG (or LC_ALL) to a value other than ‘C’, before you can use a language priority list through the LANGUAGE variable.
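Splitting the LANGUAGE value into its priority list is a simple string operation. The sketch below is hypothetical: it returns the entries one by one from a colon-separated value such as ‘sv:de’, skips empty entries, and honours the rule that LANGUAGE is ignored in the ‘C’ locale; it does not expand the ‘ll’ abbreviation or check which catalogs are actually installed.

```c
#include <stddef.h>
#include <string.h>

/* Return the INDEX-th non-empty entry of the colon-separated LANGUAGE
   value, or NULL if there is no such entry.  Nothing is returned when
   LOCALE is "C", because LANGUAGE is ignored there.  The result lives in
   a static buffer of arbitrary size, overwritten by the next call. */
static const char *
language_entry (const char *language, const char *locale, size_t index)
{
  static char buf[64];
  const char *p = language;

  if (language == NULL || locale == NULL || strcmp (locale, "C") == 0)
    return NULL;

  while (*p != '\0')
    {
      size_t len = strcspn (p, ":");   /* length of the current entry */
      if (len > 0)
        {
          if (index == 0)
            {
              if (len >= sizeof buf)
                len = sizeof buf - 1;
              memcpy (buf, p, len);
              buf[len] = '\0';
              return buf;
            }
          index--;
        }
      p += len;
      if (*p == ':')
        p++;                            /* step over the separator */
    }
  return NULL;
}
```

With LANGUAGE set to ‘nb:no’ and a Norwegian locale, entry 0 is "nb" and entry 1 is "no", matching the order in which catalogs are preferred.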
Languages are not equally well supported in all packages using GNU gettext, and more translations are added over time. Usually, you use the translations that are shipped with the operating system or with particular packages that you install afterwards. But you can also install newer localizations directly. For doing this, you will need an understanding of where each localization file is stored on the file system.
For programs that participate in the Translation Project, you can start looking for translations here: http://translationproject.org/team/index.html. A snapshot of this information is also found in the ABOUT-NLS file that is shipped with GNU gettext.
For programs that are part of the KDE project, the starting point is: http://i18n.kde.org/.
For programs that are part of the GNOME project, the starting point is: http://www.gnome.org/i18n/.
For other programs, you may check whether the program’s source code package contains some ll.po files; often they are kept together in a directory called po/. Each ll.po file contains the message translations for the language whose abbreviation is ll.
The GNU gettext toolset helps programmers and translators at producing, updating and using translation files, mainly those PO files which are textual, editable files. This chapter explains the format of PO files.
A PO file is made up of many entries, each entry holding the relation between an original untranslated string and its corresponding translation. All entries in a given PO file usually pertain to a single project, and all translations are expressed in a single target language. One PO file entry has the following schematic structure:
white-space
#  translator-comments
#. extracted-comments
#: reference…
#, flag…
#| msgid previous-untranslated-string
msgid untranslated-string
msgstr translated-string
The general structure of a PO file should be well understood by the translator. When using PO mode, very little has to be known about the format details, as PO mode takes care of them for her.
A simple entry can look like this:
#: lib/error.c:116
msgid "Unknown system error"
msgstr "Error desconegut del sistema"
Entries begin with some optional white space. Usually, when generated through GNU gettext tools, there is exactly one blank line between entries. Then comments follow, on lines all starting with the character #. There are two kinds of comments: those which have some white space immediately following the # (the translator comments), which are created and maintained exclusively by the translator, and those which have some non-white character just after the # (the automatic comments), which are created and maintained automatically by GNU gettext tools. Comment lines starting with #. contain comments given by the programmer, directed at the translator; these comments are called extracted comments because the xgettext program extracts them from the program’s source code. Comment lines starting with #: contain references to the program’s source code. Comment lines starting with #, contain flags; more about these below. Comment lines starting with #| contain the previous untranslated string for which the translator gave a translation.
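The comment kinds above are distinguished by the character right after the #. A minimal sketch of that classification follows; the enum and function names are hypothetical, and treating a lone # as an empty translator comment is an assumption.

```c
/* Kinds of PO comment lines, named after the prefixes described above. */
enum po_comment_kind
{
  PO_NOT_A_COMMENT,
  PO_TRANSLATOR_COMMENT, /* "#" followed by white space (or nothing) */
  PO_EXTRACTED_COMMENT,  /* "#." */
  PO_REFERENCE,          /* "#:" */
  PO_FLAGS,              /* "#," */
  PO_PREVIOUS,           /* "#|" */
  PO_OTHER_AUTOMATIC     /* "#" followed by another non-white character */
};

static enum po_comment_kind
classify_po_comment (const char *line)
{
  if (line[0] != '#')
    return PO_NOT_A_COMMENT;
  switch (line[1])       /* safe: line[1] is at least the terminating NUL */
    {
    case '\0': case ' ': case '\t': case '\n':
      return PO_TRANSLATOR_COMMENT;
    case '.':
      return PO_EXTRACTED_COMMENT;
    case ':':
      return PO_REFERENCE;
    case ',':
      return PO_FLAGS;
    case '|':
      return PO_PREVIOUS;
    default:
      return PO_OTHER_AUTOMATIC;
    }
}
```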
All comments, of either kind, are optional.
After white space and comments, entries show two strings, namely first the untranslated string as it appears in the original program sources, and then, the translation of this string. The original string is introduced by the keyword msgid, and the translation, by msgstr. The two strings, untranslated and translated, are quoted in various ways in the PO file, using " delimiters and \ escapes, but the translator does not really have to pay attention to the precise quoting format, as PO mode fully takes care of quoting for her.
The msgid strings, as well as automatic comments, are produced and managed by other GNU gettext tools, and PO mode does not provide means for the translator to alter these. The most she can do is merely deleting them, and only by deleting the whole entry. On the other hand, the msgstr string, as well as translator comments, are really meant for the translator, and PO mode gives her the full control she needs.
The comment lines beginning with #, are special because they are not completely ignored by the programs as comments generally are. The comma separated list of flags is used by the msgfmt program to give the user some better diagnostic messages. Currently there are two forms of flags defined:
fuzzy
This flag can be generated by the msgmerge program or it can be inserted by the translator herself. It shows that the msgstr string might not be a correct translation (anymore). Only the translator can judge if the translation requires further modification, or is acceptable as is. Once satisfied with the translation, she then removes this fuzzy attribute. The msgmerge program inserts this when it has combined the msgid and msgstr entries after fuzzy search only. See Fuzzy Entries.
c-format
no-c-format
These flags should not be added by a human. Instead only the xgettext program adds them. In an automated PO file processing system as proposed here, the user’s changes would be thrown away again as soon as the xgettext program generates a new template file.
The c-format flag indicates that the untranslated string and the translation are supposed to be C format strings. The no-c-format flag indicates that they are not C format strings, even though the untranslated string happens to look like a C format string (with ‘%’ directives).
When the c-format flag is given for a string, the msgfmt program does some more tests to check the validity of the translation. See msgfmt Invocation, c-format Flag and c-format.
objc-format
no-objc-format
Likewise for Objective C, see objc-format.
sh-format
no-sh-format
Likewise for Shell, see sh-format.
python-format
no-python-format
Likewise for Python, see python-format.
python-brace-format
no-python-brace-format
Likewise for Python brace, see python-format.
lisp-format
no-lisp-format
Likewise for Lisp, see lisp-format.
elisp-format
no-elisp-format
Likewise for Emacs Lisp, see elisp-format.
librep-format
no-librep-format
Likewise for librep, see librep-format.
scheme-format
no-scheme-format
Likewise for Scheme, see scheme-format.
smalltalk-format
no-smalltalk-format
Likewise for Smalltalk, see smalltalk-format.
java-format
no-java-format
Likewise for Java, see java-format.
csharp-format
no-csharp-format
Likewise for C#, see csharp-format.
awk-format
no-awk-format
Likewise for awk, see awk-format.
object-pascal-format
no-object-pascal-format
Likewise for Object Pascal, see object-pascal-format.
ycp-format
no-ycp-format
Likewise for YCP, see ycp-format.
tcl-format
no-tcl-format
Likewise for Tcl, see tcl-format.
perl-format
no-perl-format
Likewise for Perl, see perl-format.
perl-brace-format
no-perl-brace-format
Likewise for Perl brace, see perl-format.
php-format
no-php-format
Likewise for PHP, see php-format.
gcc-internal-format
no-gcc-internal-format
Likewise for the GCC sources, see gcc-internal-format.
gfc-internal-format
no-gfc-internal-format
Likewise for the GNU Fortran Compiler sources, see gfc-internal-format.
qt-format
no-qt-format
Likewise for Qt, see qt-format.
qt-plural-format
no-qt-plural-format
Likewise for Qt plural forms, see qt-plural-format.
kde-format
no-kde-format
Likewise for KDE, see kde-format.
boost-format
no-boost-format
Likewise for Boost, see boost-format.
lua-format
no-lua-format
Likewise for Lua, see lua-format.
javascript-format
no-javascript-format
Likewise for JavaScript, see javascript-format.
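Because flags form a comma-separated list, a tool looking for one flag should compare whole items; otherwise c-format would wrongly match no-c-format. A minimal sketch, with a hypothetical function name, assuming the whole #, line is passed in:

```c
#include <stdbool.h>
#include <string.h>

/* Return true if the "#," comment line LINE contains FLAG as one of its
   comma-separated items.  Whole items are compared, so "c-format" does
   not match "no-c-format". */
static bool
has_po_flag (const char *line, const char *flag)
{
  size_t flaglen = strlen (flag);
  const char *p;

  if (strncmp (line, "#,", 2) != 0)
    return false;
  p = line + 2;
  while (*p != '\0')
    {
      size_t len;

      p += strspn (p, " \t");          /* skip leading blanks */
      len = strcspn (p, ",");          /* item runs up to the next comma */
      while (len > 0 && (p[len - 1] == ' ' || p[len - 1] == '\t'
                         || p[len - 1] == '\n'))
        len--;                          /* ignore trailing blanks */
      if (len == flaglen && memcmp (p, flag, len) == 0)
        return true;
      p += strcspn (p, ",");
      if (*p == ',')
        p++;                            /* step over the comma */
    }
  return false;
}
```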
It is also possible to have entries with a context specifier. They look likethis:
white-space
#  translator-comments
#. extracted-comments
#: reference…
#, flag…
#| msgctxt previous-context
#| msgid previous-untranslated-string
msgctxt context
msgid untranslated-string
msgstr translated-string
The context serves to disambiguate messages with the same untranslated-string. It is possible to have several entries with the same untranslated-string in a PO file, provided that they each have a different context. Note that an empty context string and an absent msgctxt line do not mean the same thing.
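The distinction between an empty and an absent context can be modelled by letting a null pointer stand for "no msgctxt line". This is a hypothetical data structure for illustration, not one used by the GNU gettext tools:

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical key of a PO entry: an optional context plus the msgid.
   msgctxt == NULL means "no msgctxt line"; "" means an empty context. */
struct po_key
{
  const char *msgctxt;
  const char *msgid;
};

/* Two entries clash only if both their contexts and msgids agree. */
static bool
same_po_key (const struct po_key *a, const struct po_key *b)
{
  /* An absent context only matches another absent context. */
  if ((a->msgctxt == NULL) != (b->msgctxt == NULL))
    return false;
  if (a->msgctxt != NULL && strcmp (a->msgctxt, b->msgctxt) != 0)
    return false;
  return strcmp (a->msgid, b->msgid) == 0;
}
```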
A different kind of entry is used for translations which involve plural forms.
white-space
#  translator-comments
#. extracted-comments
#: reference…
#, flag…
#| msgid previous-untranslated-string-singular
#| msgid_plural previous-untranslated-string-plural
msgid untranslated-string-singular
msgid_plural untranslated-string-plural
msgstr[0] translated-string-case-0
...
msgstr[N] translated-string-case-n
Such an entry can look like this:
#: src/msgcmp.c:338 src/po-lex.c:699
#, c-format
msgid "found %d fatal error"
msgid_plural "found %d fatal errors"
msgstr[0] "s'ha trobat %d error fatal"
msgstr[1] "s'han trobat %d errors fatals"
Here also, a msgctxt context can be specified before msgid, like above.
Here, additional kinds of flags can be used:
range:
This flag is followed by a range of non-negative numbers, using the syntax range: minimum-value..maximum-value. It designates the possible values that the numeric parameter of the message can take. In some languages, translators may produce slightly better translations if they know that the value can only take on values between 0 and 10, for example.
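Reading such a flag and checking a value against it takes one sscanf call. A minimal sketch with a hypothetical function name, assuming the flag text ‘range: 0..10’ has already been separated from the other flags on its #, line:

```c
#include <stdbool.h>
#include <stdio.h>

/* Parse a flag such as "range: 0..10" and report whether N lies within
   the range, bounds included.  Returns false if FLAG is not a range flag. */
static bool
in_po_range (const char *flag, unsigned long n)
{
  unsigned long min, max;

  /* The space in the format matches any amount of white space,
     and the ".." must appear literally between the two numbers. */
  if (sscanf (flag, "range: %lu..%lu", &min, &max) != 2)
    return false;
  return min <= n && n <= max;
}
```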
The previous-untranslated-string is optionally inserted by the msgmerge program, at the same time when it marks a message fuzzy. It helps the translator to see which changes were done by the developers on the untranslated-string.
It happens that some lines, usually whitespace or comments, follow the very last entry of a PO file. Such lines are not part of any entry, and will be dropped when the PO file is processed by the tools, or may disturb some PO file editors.
The remainder of this section may be safely skipped by those using a PO file editor, yet it may be interesting for everybody to have a better idea of the precise format of a PO file. On the other hand, those wishing to modify PO files by hand should carefully continue reading on.
An empty untranslated-string is reserved to contain the header entry with the meta information (see Header Entry). This header entry should be the first entry of the file. The empty untranslated-string is reserved for this purpose and must not be used anywhere else.
Each of untranslated-string and translated-string respects the C syntax for a character string, including the surrounding quotes and embedded backslashed escape sequences. When the time comes to write multi-line strings, one should not use escaped newlines. Instead, a closing quote should follow the last character on the line to be continued, and an opening quote should resume the string at the beginning of the following PO file line. For example:
msgid ""
"Here is an example of how one might continue a very long string\n"
"for the common case the string represents multi-line output.\n"
In this example, the empty string is used on the first line, to allow better alignment of the H from the word ‘Here’ over the f from the word ‘for’. In this example, the msgid keyword is followed by three strings, which are meant to be concatenated. Concatenating the empty string does not change the resulting overall string, but it is a way for us to comply with the necessity of msgid to be followed by a string on the same line, while keeping the multi-line presentation left-justified, as we find this to be a cleaner disposition. The empty string could have been omitted, but only if the string starting with ‘Here’ was promoted on the first line, right after msgid.(2) It was not really necessary either to switch between the two last quoted strings immediately after the newline ‘\n’; the switch could have occurred after any other character. We just did it this way because it is neater.
One should carefully distinguish between end of lines marked as ‘\n’ inside quotes, which are part of the represented string, and end of lines in the PO file itself, outside string quotes, which have no incidence on the represented string.
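Since PO strings follow the C syntax for character strings, the msgid in the example above corresponds to C's adjacent string literal concatenation, where the compiler likewise joins the pieces and the leading "" contributes nothing. The variable name in this sketch is made up:

```c
/* The same text as the msgid above.  The compiler concatenates adjacent
   string literals; the line breaks between them are not part of the
   string, only the "\n" escapes are. */
static const char long_message[] =
  ""
  "Here is an example of how one might continue a very long string\n"
  "for the common case the string represents multi-line output.\n";
```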
Outside strings, white lines and comments may be used freely. Comments start at the beginning of a line with ‘#’ and extend until the end of the PO file line. Comments written by translators should have the initial ‘#’ immediately followed by some white space. If the ‘#’ is not immediately followed by white space, this comment is most likely generated and managed by specialized GNU tools, and might disappear or be replaced unexpectedly when the PO file is given to msgmerge.
For the programmer, changes to the C source code fall into three categories. First, you have to make the localization functions known to all modules needing message translation. Second, you should properly trigger the operation of GNU gettext when the program initializes, usually from the main function. Last, you should identify, adjust and mark all constant strings in your program needing translation.
• Importing: Importing the gettext declaration
• Triggering: Triggering gettext Operations
• Preparing Strings: Preparing Translatable Strings
• Mark Keywords: How Marks Appear in Sources
• Marking: Marking Translatable Strings
• c-format Flag: Telling something about the following string
• Special cases: Special Cases of Translatable Strings
• Bug Report Address: Letting Users Report Translation Bugs
• Names: Marking Proper Names for Translation
• Libraries: Preparing Library Sources
Importing the gettext declaration
Presuming that your set of programs, or package, has been adjusted so all needed GNU gettext files are available, and your Makefile files are adjusted (see Maintainers), each C module having translated C strings should contain the line:
#include <libintl.h>
Similarly, each C module containing printf()/fprintf()/... calls with a format string that could be a translated C string (even if the C string comes from a different C module) should contain the line:
#include <libintl.h>
Triggering gettext Operations
The initialization of locale data should be done with more or less the same code in every program, as demonstrated below:
#include <locale.h>
#include <libintl.h>
int
main (int argc, char *argv[])
{
  /* … */
  setlocale (LC_ALL, "");
  bindtextdomain (PACKAGE, LOCALEDIR);
  textdomain (PACKAGE);
  /* … */
}
PACKAGE and LOCALEDIR should be provided either by config.h or by the Makefile. For now consult the gettext or hello sources for more information.
The use of LC_ALL might not be appropriate for you. LC_ALL includes all locale categories, and especially LC_CTYPE. This latter category is responsible for determining character classes with the isalnum etc. functions from ctype.h, which could be wrong, especially for programs that process some kind of input language. For example, this would mean that source code using the ç (c-cedilla character) is runnable in France but not in the U.S.
Some systems also have problems parsing numbers with the scanf functions when the whole LC_ALL category, and hence LC_NUMERIC, is taken from the environment. The standards say that formats other than the one known in the "C" locale may additionally be recognized. But some systems seem to reject numbers in the "C" locale format. In some situations, the notation itself may also make it impossible to recognize whether a number is in the "C" locale format or in the local format. This can happen if thousands separator characters are used: some locales, following national conventions, define this character as '.', which is the same character used in the "C" locale to denote the decimal point.
So it is sometimes necessary to replace the LC_ALL line in the code above by a sequence of setlocale lines:
{
  /* … */
  setlocale (LC_CTYPE, "");
  setlocale (LC_MESSAGES, "");
  /* … */
}
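A sketch of this narrower initialization, assuming a glibc system and a hypothetical domain "parser-sketch" under a hypothetical directory. Since every C program starts in the "C" locale and LC_NUMERIC is never changed here, strtod keeps accepting ‘.’ as the decimal point regardless of the user's locale settings:

```c
#include <libintl.h>
#include <locale.h>
#include <stdlib.h>

/* Hypothetical values; normally PACKAGE and LOCALEDIR from config.h. */
#define SKETCH_DOMAIN    "parser-sketch"
#define SKETCH_LOCALEDIR "/usr/local/share/locale"

/* Localize messages and character classification only.  LC_NUMERIC is
   left alone, so it keeps the "C" value every program starts with. */
static void
init_messages_and_ctype (void)
{
  setlocale (LC_CTYPE, "");
  setlocale (LC_MESSAGES, "");
  bindtextdomain (SKETCH_DOMAIN, SKETCH_LOCALEDIR);
  textdomain (SKETCH_DOMAIN);
}

/* Parse a number written in "C" locale notation, e.g. "3.5". */
static double
parse_c_number (const char *text)
{
  return strtod (text, NULL);
}
```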
On all POSIX conformant systems the locale categories LC_CTYPE, LC_MESSAGES, LC_COLLATE, LC_MONETARY, LC_NUMERIC, and LC_TIME are available. On some systems which are only ISO C compliant, LC_MESSAGES is missing, but a substitute for it is defined in GNU gettext’s <libintl.h> and in GNU gnulib’s <locale.h>.
Note that changing the LC_CTYPE also affects the functions declared in the <ctype.h> standard header and some functions declared in the <string.h> and <stdlib.h> standard headers. If this is not desirable in your application (for example in a compiler’s parser), you can use a set of substitute functions which hardwire the C locale, such as found in the modules ‘c-ctype’, ‘c-strcase’, ‘c-strcasestr’, ‘c-strtod’, ‘c-strtold’ in the GNU gnulib source distribution.
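The idea behind such substitutes is simply to test character codes directly instead of consulting the locale. The function below is a hypothetical illustration in the spirit of gnulib's c-ctype module, not its code, and it assumes an ASCII-compatible character set:

```c
#include <stdbool.h>

/* Locale-independent test for ASCII letters and digits.  Unlike isalnum,
   the answer never depends on LC_CTYPE, so e.g. the Latin-1 byte 0xE7
   (c-cedilla) is never accepted as part of an identifier. */
static bool
ascii_isalnum (int c)
{
  return (c >= '0' && c <= '9')
         || (c >= 'A' && c <= 'Z')
         || (c >= 'a' && c <= 'z');
}
```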
It is also possible to switch the locale back and forth between the environment dependent locale and the C locale, but this approach is normally avoided because a setlocale call is expensive, because it is tedious to determine the places where a locale switch is needed in a large program’s source, and because switching a locale is not multithread-safe.
Before strings can be marked for translations, they sometimes need to be adjusted. Usually preparing a string for translation is done right before marking it, during the marking phase which is described in the next sections. What you have to keep in mind while doing that is the following.
Let’s look at some examples of these guidelines.
Translatable strings should be in good English style. If slang language with abbreviations and shortcuts is used, often translators will not understand the message and will produce very inappropriate translations.
"%s: is parameter\n"
This is nearly untranslatable: Is the displayed item a parameter or the parameter?
"No match"
The ambiguity in this message makes it unintelligible: Is the program attempting to set something on fire? Does it mean "The given object does not match the template"? Does it mean "The template does not fit for any of the objects"?
In both cases, adding more words to the message will help both the translator and the English speaking user.
Translatable strings should be entire sentences. It is often not possible to translate single verbs or adjectives in a substitutable way.
printf ("File %s is %s protected", filename, rw ? "write" : "read");
Most translators will not look at the source and will thus only see the string "File %s is %s protected", which is unintelligible. Change this to
printf (rw ? "File %s is write protected" : "File %s is read protected", filename);
This way the translator will not only understand the message, she will also be able to find the appropriate grammatical construction. A French translator, for example, translates "write protected" like "protected against writing".
Entire sentences are also important because in many languages, the declension of some word in a sentence depends on the gender or the number (singular/plural) of another part of the sentence. There are usually more interdependencies between words than in English. The consequence is that asking a translator to translate two half-sentences and then combining these two half-sentences through dumb string concatenation will not work for many languages, even though it would work for English. That’s why translators need to handle entire sentences.
Often sentences don’t fit into a single line. If a sentence is output using two subsequent printf statements, like this
printf ("Locale charset \"%s\" is different from\n", lcharset);
printf ("input file charset \"%s\".\n", fcharset);
the translator would have to translate two half sentences, but nothing in the POT file would tell her that the two half sentences belong together. It is necessary to merge the two printf statements so that the translator can handle the entire sentence at once and decide at which place to insert a line break in the translation (if at all):
printf ("Locale charset \"%s\" is different from\n\
input file charset \"%s\".\n", lcharset, fcharset);
You may now ask: how about two or more adjacent sentences? Like in this case:
puts ("Apollo 13 scenario: Stack overflow handling failed.");
puts ("On the next stack overflow we will crash!!!");
Should these two statements be merged into a single one? I would recommend merging them if the two sentences are related to each other, because then it makes it easier for the translator to understand and translate both. On the other hand, if one of the two messages is a stereotypic one, occurring in other places as well, you will do a favour to the translator by not merging the two. (Identical messages occurring in several places are combined by xgettext, so the translator has to handle them once only.)
Translatable strings should be limited to one paragraph; don’t let a single message be longer than ten lines. The reason is that when the translatable string changes, the translator is faced with the task of updating the entire translated string. Maybe only a single word will have changed in the English string, but the translator doesn’t see that (with the current translation tools), therefore she has to proofread the entire message.
Many GNU programs have a ‘--help’ output that extends over several screen pages. It is a courtesy towards the translators to split such a message into several ones of five to ten lines each. While doing that, you can also attempt to split the documented options into groups, such as the input options, the output options, and the informative output options. This will help every user to find the option he is looking for.
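Here is a minimal sketch of a ‘--help’ text split into three shorter messages, one per option group. The program name ‘frob’ and its options are hypothetical. The sketch assumes gettext comes from glibc’s <libintl.h>, which returns the original string when no translation is available.

```c
#include <libintl.h>
#include <stdio.h>

/* Print the help of a hypothetical program "frob" as three separate
   translatable messages, so that a change in one option group does not
   force translators to re-check the whole help text.
   Returns 0 on success, EOF if writing fails.  */
static int
usage (FILE *out)
{
  if (fputs (gettext ("Usage: frob [OPTION]... FILE...\n"
                      "Frobnicate each FILE.\n"), out) == EOF)
    return EOF;
  if (fputs (gettext ("\n"
                      "Input options:\n"
                      "  -i, --input=FILE    read file names from FILE\n"),
             out) == EOF)
    return EOF;
  if (fputs (gettext ("\n"
                      "Output options:\n"
                      "  -o, --output=FILE   write the result to FILE\n"),
             out) == EOF)
    return EOF;
  return 0;
}
```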
Hardcoded string concatenation is sometimes used to construct English strings:
strcpy (s, "Replace ");
strcat (s, object1);
strcat (s, " with ");
strcat (s, object2);
strcat (s, "?");
In order to present to the translator only entire sentences, and also because in some languages the translator might want to swap the order of object1 and object2, it is necessary to change this to use a format string:
sprintf (s, "Replace %s with %s?", object1, object2);
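A format string also lets a translation reorder the arguments, using the POSIX ‘%n$’ positional notation that glibc’s printf family supports. The minimal sketch below uses snprintf rather than sprintf so that the buffer size is respected. The helper format_replace_prompt is hypothetical. A format such as "Use %2$s instead of %1$s?" merely stands in for a translation whose word order differs from the English one.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: build the "Replace ... with ...?" prompt from a
   (possibly translated) FORMAT.  snprintf never writes more than SIZE
   bytes, unlike the sprintf call above.  Returns the length the complete
   result would have, as snprintf does.  */
static int
format_replace_prompt (char *buf, size_t size, const char *format,
                       const char *object1, const char *object2)
{
  return snprintf (buf, size, format, object1, object2);
}
```

The caller always passes object1 before object2. Only the format string decides in which order they appear.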
A similar case is compile-time concatenation of strings. The ISO C 99 include file <inttypes.h> contains a macro PRId64 that can be used as a formatting directive for outputting an ‘int64_t’ integer through printf. It expands to a constant string, usually "d" or "ld" or "lld" or something like this, depending on the platform. Assume you have code like
printf ("The amount is %0" PRId64 "\n", number);
The gettext tools and library have special support for these <inttypes.h> macros. You can therefore simply write
printf (gettext ("The amount is %0" PRId64 "\n"), number);
The PO file will contain the string "The amount is %0<PRId64>\n". The translations will also contain "<PRId64>". At run time, the gettext function’s result will contain the appropriate constant string, "d" or "ld" or "lld".
This works only for the predefined <inttypes.h> macros. If you have defined your own similar macros, let’s say ‘MYPRId64’, that are not known to xgettext, the solution for this problem is to change the code like this:
char buf1[100];
sprintf (buf1, "%0" MYPRId64, number);
printf (gettext ("The amount is %s\n"), buf1);
This means you put the platform-dependent code in one statement, and the internationalization code in a different statement. Note that a buffer length of 100 is safe, because all available hardware integer types are limited to 128 bits, and to print a 128-bit integer one needs at most 54 characters, regardless of whether in decimal, octal or hexadecimal.
All this applies to other programming languages as well. For example, in Java and C#, string concatenation is very frequently used, because it is a compiler built-in operator. Like in C, in Java, you would change
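Here is a minimal sketch that puts both variants side by side. It assumes glibc’s <libintl.h>, where gettext returns its argument unchanged when no catalog is loaded. MYPRId64 is defined as an alias of PRId64 only so that the example compiles. It stands in for a project-specific macro that xgettext does not know. The function names are hypothetical.

```c
#include <inttypes.h>
#include <libintl.h>
#include <stdio.h>
#include <string.h>

/* Stand-in for a project-specific macro unknown to xgettext.  */
#define MYPRId64 PRId64

/* Variant 1: a predefined <inttypes.h> macro directly in the msgid,
   which the gettext tools and library support.  */
static int
format_amount_direct (char *out, size_t outsize, int64_t number)
{
  return snprintf (out, outsize,
                   gettext ("The amount is %0" PRId64 "\n"), number);
}

/* Variant 2: for macros xgettext does not know, format the number in a
   platform-dependent statement, then insert it with a plain %s.  */
static int
format_amount_two_step (char *out, size_t outsize, int64_t number)
{
  char buf1[100];   /* ample for any integer of up to 128 bits */

  snprintf (buf1, sizeof buf1, "%0" MYPRId64, number);
  return snprintf (out, outsize, gettext ("The amount is %s\n"), buf1);
}
```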
System.out.println("Replace "+object1+" with "+object2+"?");
into a statement involving a format string:
System.out.println(
    MessageFormat.format("Replace {0} with {1}?",
                         new Object[] { object1, object2 }));
Similarly, in C#, you would change
Console.WriteLine("Replace "+object1+" with "+object2+"?");
into a statement involving a format string:
Console.WriteLine(
    String.Format("Replace {0} with {1}?", object1, object2));
Unusual markup or control characters should not be used in translatable strings. Translators will likely not understand the particular meaning of the markup or control characters.
For example, if you have a convention that ‘|’ delimits the left-hand and right-hand part of some GUI elements, translators will often not understand it without specific comments. It might be better to have the translator translate the left-hand and right-hand part separately.
Another example is the ‘argp’ convention to use a single ‘\v’ (vertical tab) control character to delimit two sections inside a string. This is flawed. Some translators may convert it to a simple newline, some to blank lines. With some PO file editors it may not be easy to even enter a vertical tab control character. So, you cannot be sure that the translation will contain a ‘\v’ character at the corresponding position. The solution is, again, to let the translator translate two separate strings and combine at run-time the two translated strings with the ‘\v’ required by the convention.
HTML markup, however, is common enough that it’s probably ok to use in translatable strings. But please bear in mind that the GNU gettext tools don’t verify that the translations are well-formed HTML.
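Here is a minimal sketch of that approach. Each section is its own translatable message, and the program, not the translator, inserts the ‘\v’ separator. The function build_doc and the two texts are hypothetical. gettext is assumed to come from glibc’s <libintl.h>.

```c
#include <libintl.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: build a documentation string following the
   '\v' convention.  The two halves are translated separately, and the
   separator is added at run time, so no translation can lose it.  */
static int
build_doc (char *buf, size_t size)
{
  const char *before = gettext ("Frobnicate the given files.");
  const char *after = gettext ("Without FILE, read standard input.");

  return snprintf (buf, size, "%s\v%s", before, after);
}
```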
All strings requiring translation should be marked in the C sources. Marking is done in such a way that each translatable string appears to be the sole argument of some function or preprocessor macro. There are only a few such possible functions or macros meant for translation, and their names are said to be marking keywords. The marking is attached to strings themselves, rather than to what we do with them. This approach has more uses. A blatant example is an error message produced by formatting. The format string needs translation, as well as some strings inserted through some ‘%s’ specification in the format, while the result from sprintf may have so many different instances that it is impractical to list them all in some ‘error_string_out()’ routine, say.
This marking operation has two goals. The first goal of marking is for triggering the retrieval of the translation, at run time. The keyword is possibly resolved into a routine able to dynamically return the proper translation, as far as possible or wanted, for the argument string. Most localizable strings are found in executable positions, that is, attached to variables or given as parameters to functions. But this is not universal usage, and some translatable strings appear in structured initializations. See Special cases.
The second goal of the marking operation is to help xgettext properly extract all translatable strings when it scans a set of program sources and produces PO file templates.
The canonical keyword for marking translatable strings is ‘gettext’, which gave its name to the whole GNU gettext package. For packages making only light use of the ‘gettext’ keyword, macro or function, it is easily used as is. However, for packages using the gettext interface more heavily, it is usually more convenient to give the main keyword a shorter, less obtrusive name. Indeed, the keyword might appear on a lot of strings all over the package, and programmers usually do not want nor need their program sources to remind them forcefully, all the time, that they are internationalized. Further, a long keyword has the disadvantage of using more horizontal space, forcing more indentation work on sources for those trying to keep them within 79 or 80 columns.
Many packages use ‘_’ (a simple underline) as a keyword, and write ‘_("Translatable string")’ instead of ‘gettext ("Translatable string")’. Further, the GNU coding standards rule requiring a space between the keyword and the opening parenthesis is relaxed, in practice, for this particular usage. So, the textual overhead per translatable string is reduced to only three characters: the underline and the two parentheses. However, even if GNU gettext uses this convention internally, it does not offer it officially. The real, genuine keyword is truly ‘gettext’ indeed. It is fairly easy for those wanting to use ‘_’ instead of ‘gettext’ to declare:
#include <libintl.h>
#define _(String) gettext (String)
instead of merely using ‘#include <libintl.h>’.
The marking keywords ‘gettext’ and ‘_’ take the translatable string as sole argument. It is also possible to define marking functions that take it at another argument position. It is even possible to make the marked argument position depend on the total number of arguments of the function call; this is useful in C++. All this is achieved using xgettext’s ‘--keyword’ option. How to pass such an option to xgettext, assuming that gettextize is used, is described in po/Makevars and AM_XGETTEXT_OPTION.
Note also that long strings can be split across lines, into multiple adjacent string tokens. Automatic string concatenation is performed at compile time according to ISO C and ISO C++; xgettext also supports this syntax.
Later on, the maintenance is relatively easy. If, as a programmer, you add or modify a string, you will have to ask yourself if the new or altered string requires translation, and include it within ‘_()’ if you think it should be translated. For example, ‘"%s"’ does not require translation. But ‘"%s: %d"’ does require translation, because in French, unlike in English, it’s customary to put a space before a colon.
In PO mode, one set of features is meant more for the programmer than for the translator, and allows him to interactively mark which strings, in a set of program sources, are translatable, and which are not. Even if it is a fairly easy job for a programmer to find and mark such strings by other means, using any editor of his choice, PO mode makes this work more comfortable. Further, this gives translators who feel a little like programmers, or programmers who feel a little like translators, a tool letting them work at marking translatable strings in the program sources, while simultaneously producing a set of translations in some language, for the package being internationalized.
The set of program sources targeted by the PO mode commands described here should have an Emacs tags table constructed for your project, prior to using these PO file commands. This is easy to do. In any shell window, change the directory to the root of your project, then execute a command resembling:
etags src/*.[hc] lib/*.[hc]
presuming here you want to process all .h and .c files from the src/ and lib/ directories. This command will explore all said files and create a TAGS file in your root directory, somewhat summarizing the contents using a special file format Emacs can understand.
For packages following the GNU coding standards, there is a make goal tags or TAGS which constructs the tag files in all directories and for all files containing source code.
Once your TAGS file is ready, the following commands assist the programmer at marking translatable strings in his set of sources. But these commands are necessarily driven from within a PO file window, and it is likely that you do not even have such a PO file yet. This is not a problem at all, as you may safely open a new, empty PO file, mainly for using these commands. This empty PO file will slowly fill in while you mark strings as translatable in your program sources.
Search through program sources for a string which looks like a candidate for translation (po-tags-search).
Mark the last string found with ‘_()’ (po-mark-translatable).
Mark the last string found with a keyword taken from a set of possible keywords. This command with a prefix allows some management of these keywords (po-select-mark-and-mark).
The , (po-tags-search) command searches for the next occurrence of a string which looks like a possible candidate for translation, and displays the program source in another Emacs window, positioned in such a way that the string is near the top of this other window. If the string is too big to fit whole in this window, it is positioned so only its end is shown. In any case, the cursor is left in the PO file window. If the shown string would be better presented differently in different native languages, you may mark it using M-, or M-.. Otherwise, you might rather ignore it and skip to the next string by merely repeating the , command.
A string is a good candidate for translation if it contains a sequence of three or more letters. A string containing at most two letters in a row will be considered as a candidate if it has more letters than non-letters. The command disregards strings containing no letters, or isolated letters only. It also disregards strings within comments, or strings already marked with some keyword PO mode knows (see below).
If you have never told Emacs about some TAGS file to use, the command will request that you specify one from the minibuffer, the first time you use the command. You may later change your TAGS file by using the regular Emacs command M-x visit-tags-table, which will ask you to name the precise TAGS file you want to use. See Tag Tables in The Emacs Editor.
Each time you use the , command, the search resumes from where it was left by the previous search, and goes through all program sources, obeying the TAGS file, until all sources have been processed. However, by giving a prefix argument to the command (C-u ,), you may request that the search be restarted all over again from the first program source; but in this case, strings that you recently marked as translatable will be automatically skipped.
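The candidate rule above can be sketched in C. This is one reading of the description, not PO mode’s actual Emacs Lisp code. The function name is hypothetical, letters are taken to be ASCII letters, and the checks for comments and already-marked strings are left out.

```c
#include <stdbool.h>
#include <stddef.h>

static bool
is_ascii_letter (char c)
{
  return (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z');
}

/* Return true if S looks like a candidate for translation:
   - a run of three or more letters makes it a candidate;
   - if the longest run is two letters, it is a candidate only when it
     has more letters than non-letters;
   - strings with no letters, or isolated letters only, are skipped.  */
static bool
looks_translatable (const char *s)
{
  size_t letters = 0, others = 0, run = 0, longest_run = 0;

  for (; *s != '\0'; s++)
    {
      if (is_ascii_letter (*s))
        {
          letters++;
          run++;
          if (run > longest_run)
            longest_run = run;
        }
      else
        {
          others++;
          run = 0;
        }
    }

  if (longest_run >= 3)
    return true;
  if (longest_run == 2)
    return letters > others;
  return false;   /* no letters, or isolated letters only */
}
```

Under this reading, "ok" is a candidate, while "%s" and "ab: %d" are not.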
Using this , command does not prevent using other regular Emacs tags commands. For example, regular tags-search or tags-query-replace commands may be used without disrupting the independent , search sequence. However, as implemented, the initial , command (or the , command when used with a prefix) might also reinitialize the regular Emacs tags searching to the first tags file; this reinitialization might be considered spurious.
The M-, (po-mark-translatable) command will mark the recently found string with the ‘_’ keyword. The M-. (po-select-mark-and-mark) command will request that you type one keyword from the minibuffer and use that keyword for marking the string. Both commands will automatically create a new PO file untranslated entry for the string being marked, and make it the current entry (making it easy for you to immediately proceed to its translation, if you feel like doing it right away). It is possible that the modifications made to the program source by M-, or M-. render some source line longer than 80 columns, forcing you to break and re-indent this line differently. You may use the O command from PO mode, or any other window changing command from Emacs, to break out into the program source window, and do any needed adjustments. You will have to use some regular Emacs command to return the cursor to the PO file window, if you want to use the , command for the next string, say.
The M-. command has a few built-in speedups, so you do not have to explicitly type all keywords all the time. The first such speedup is that you are presented with a preferred keyword, which you may accept by merely typing RET at the prompt. The second speedup is that you may type any non-ambiguous prefix of the keyword you really mean, and the command will complete it automatically for you. This also means that PO mode has to know all your possible keywords, and that it will not accept mistyped keywords.
If you reply ? to the keyword request, the command gives a list of all known keywords, from which you may choose. When the command is prefixed by an argument (C-u M-.), it inhibits updating any program source or PO file buffer, and does some simple keyword management instead. In this case, the command asks for a keyword, written in full, which becomes a new allowed keyword for later M-. commands. Moreover, this new keyword automatically becomes the preferred keyword for later commands. By typing an already known keyword in response to C-u M-., one merely changes the preferred keyword and does nothing more.
All keywords known for M-. are recognized by the , command when scanning for strings, and strings already marked by any of those known keywords are automatically skipped. If many PO files are opened simultaneously, each one has its own independent set of known keywords. There is currently no provision in PO mode for deleting a known keyword; you have to quit the file (maybe using q) and reopen it afresh. When a PO file is newly brought up in an Emacs window, only ‘gettext’ and ‘_’ are known as keywords, and ‘gettext’ is preferred for the M-. command. In fact, there is no point in preferring ‘_’, as it is already built into the M-, command.
In C programs, strings are often used within calls of functions from the printf family. The special thing about these format strings is that they can contain format specifiers introduced with %. Assume we have the code
printf (gettext ("String `%s' has %d characters\n"), s, (int) strlen (s));
A possible German translation for the above string might be:
"%d Zeichen lang ist die Zeichenkette `%s'"
A C programmer, even if he cannot speak German, will recognize that there is something wrong here. The order of the two format specifiers is changed, but of course the arguments in the printf call are not. This will most probably lead to problems because now the length of the string is regarded as the address.
To prevent errors at runtime caused by translations, the msgfmt tool can check statically whether the arguments in the original and the translation string match in type and number. If this is not the case and the ‘-c’ option has been passed to msgfmt, msgfmt will give an error and refuse to produce a MO file. Thus consistent use of ‘msgfmt -c’ will catch the error, so that it cannot cause problems at runtime.
For the word order of the above German translation to be correct, one would have to write
"%2$d Zeichen lang ist die Zeichenkette `%1$s'"
The routines in msgfmt know about this special notation.
Because not all strings in a program will be format strings, it is not useful for msgfmt to test all the strings in the .po file. This might cause problems because the string might contain what looks like a format specifier, but the string is not used in printf.
Therefore xgettext adds a special tag to those messages it thinks might be a format string. There is no absolute rule for this, only a heuristic. In the .po file the entry is marked using the c-format flag in the #, comment line (see PO Files).
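To see the effect, here is a minimal sketch that passes the arguments in source-code order (string first, length second) to snprintf. It runs once with the English format and once with the reordered German format. Positional ‘%n$’ directives are a POSIX extension that glibc’s printf family supports. The function name is hypothetical, and the strlen result is cast to int to match ‘%d’.

```c
#include <stdio.h>
#include <string.h>

/* Format the length message for S with FORMAT, which may be the English
   original or a translation that reorders the arguments with %n$.  The
   argument order below never changes.  */
static int
describe_length (char *buf, size_t size, const char *format, const char *s)
{
  return snprintf (buf, size, format, s, (int) strlen (s));
}
```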
The careful reader now might say that this again can cause problems. The heuristic might guess it wrong. This is true and therefore xgettext knows about a special kind of comment which lets the programmer take over the decision. If the xgettext program finds a comment containing the words xgettext:c-format, either on the same line as the gettext keyword or on the line immediately preceding it, it will mark the string with the c-format flag in any case. This kind of comment should be used when xgettext does not recognize the string as a format string but it really is one and it should be tested. Please note that when the comment is on the same line as the gettext keyword, it must be placed before the string to be translated.
This situation happens quite often. The printf function is often called with strings which do not contain a format specifier. Of course one would normally use fputs, but it does happen. In this case xgettext does not recognize this as a format string, but what happens if the translation introduces a valid format specifier? The printf function will try to access one of the parameters, but none exists because the original code does not pass any parameters.
xgettext of course could make a wrong decision the other way round, i.e. a string marked as a format string is actually not a format string. In this case msgfmt might give too many warnings and would prevent the .po file from being translated. The method to prevent this wrong decision is similar to the one used above, only the comment to use must contain the string xgettext:no-c-format.
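In the minimal sketch below, both kinds of comment are placed on the line immediately preceding the gettext call. The function names and message texts are hypothetical. The first string contains no directive, yet it is used as a printf format. The second contains ‘% s’, a valid conversion (space flag followed by ‘s’) that xgettext’s heuristic would likely treat as a directive, although it is meant literally. gettext is assumed to come from glibc’s <libintl.h>.

```c
#include <libintl.h>
#include <stdio.h>
#include <string.h>

/* Used as a printf format although it contains no directive: force the
   c-format flag so that 'msgfmt -c' rejects translations that add one.
   Returns the number of characters written, as fprintf does.  */
static int
print_done (FILE *out)
{
  /* xgettext:c-format */
  return fprintf (out, gettext ("Done.\n"));
}

/* Contains "% s", which looks like a directive, but is never used as a
   format string: suppress the c-format flag.  */
static const char *
certainty_message (void)
{
  /* xgettext:no-c-format */
  return gettext ("I am 100% sure.");
}
```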
If a string is marked with c-format and this is not correct, the user can find out who is responsible for the decision. See xgettext Invocation to see how the --debug option can be used for solving this problem.
The attentive reader might now point out that it is not always possible to mark translatable strings with gettext or something like this. Consider the following case:
{
  static const char *messages[] = {
    "some very meaningful message",
    "and another one"
  };
  const char *string;
  …
  string
    = index > 1
      ? "a default message"
      : messages[index];

  fputs (string, stdout);
  …
}
While it is no problem to mark the string "a default message", it is not possible to mark the string initializers for messages. What is to be done? We have to fulfill two tasks. First we have to mark the strings so that the xgettext program (see xgettext Invocation) can find them, and second we have to translate the strings at runtime before printing them.
The first task can be fulfilled by creating a new keyword, which names a no-op. For the second we have to mark all access points to a string from the array. So one solution can look like this:
#define gettext_noop(String) String

{
  static const char *messages[] = {
    gettext_noop ("some very meaningful message"),
    gettext_noop ("and another one")
  };
  const char *string;
  …
  string
    = index > 1
      ? gettext ("a default message")
      : gettext (messages[index]);

  fputs (string, stdout);
  …
}
Please convince yourself that the string which is written by fputs is translated in any case. How to make xgettext recognize the additional keyword gettext_noop is explained in xgettext Invocation.
The above is of course not the only solution. You could also come up with the following one:
#define gettext_noop(String) String

{
  static const char *messages[] = {
    gettext_noop ("some very meaningful message"),
    gettext_noop ("and another one")
  };
  const char *string;
  …
  string
    = index > 1
      ? gettext_noop ("a default message")
      : messages[index];

  fputs (gettext (string), stdout);
  …
}
But this has a drawback. The programmer has to take care that he uses gettext_noop for the string "a default message". A use of gettext could, in rare cases, have unpredictable results.
One advantage is that you need not perform a control flow analysis to make sure the output is really translated in any case. But this analysis is generally not very difficult. If it is difficult in some situation, you can use this second method there.
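Both methods can be checked against each other with a compilable sketch that returns the string instead of writing it with fputs. The functions message_first and message_second are hypothetical wrappers around the two fragments above. The index test is written as a bounds check on the array, which for two messages is equivalent to ‘index > 1’. With no translation catalog loaded, glibc’s gettext returns its argument, so both yield the English text. With a catalog, both would yield the translation.

```c
#include <libintl.h>
#include <stddef.h>
#include <string.h>

/* Marks a string for xgettext without translating it here.  */
#define gettext_noop(String) String

static const char *messages[] = {
  gettext_noop ("some very meaningful message"),
  gettext_noop ("and another one")
};

#define N_MESSAGES (sizeof messages / sizeof messages[0])

/* First method: translate at every access point.  */
static const char *
message_first (size_t i)
{
  return i >= N_MESSAGES ? gettext ("a default message")
                         : gettext (messages[i]);
}

/* Second method: mark everything with gettext_noop and translate once,
   at the point of output.  */
static const char *
message_second (size_t i)
{
  const char *string = i >= N_MESSAGES ? gettext_noop ("a default message")
                                       : messages[i];
  return gettext (string);
}
```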
Code sometimes has bugs, but translations sometimes have bugs too. The users need to be able to report them. Reporting translation bugs to the programmer or maintainer of a package is not very useful, since the maintainer must never change a translation, except on behalf of the translator. Hence the translation bugs must be reported to the translators.
Here is a way to organize this so that the maintainer does not need to forward translation bug reports, nor even keep a list of the addresses of the translators or their translation teams.
Every program has a place where it shows the bug report address. For GNU programs, it is the code which handles the ‘--help’ option, typically in a function called ‘usage’. In this place, instruct the translator to add her own bug reporting address. For example, if that code has a statement
printf (_("Report bugs to <%s>.\n"), PACKAGE_BUGREPORT);
you can add some translator instructions like this:
/* TRANSLATORS: The placeholder indicates the bug-reporting address
   for this package. Please add _another line_ saying
   "Report translation bugs to <...>\n" with the address for translation
   bugs (typically your translation team's web or email address). */
printf (_("Report bugs to <%s>.\n"), PACKAGE_BUGREPORT);
These will be extracted by ‘xgettext’, leading to a .pot file that contains this:
#. TRANSLATORS: The placeholder indicates the bug-reporting address
#. for this package. Please add _another line_ saying
#. "Report translation bugs to <...>\n" with the address for translation
#. bugs (typically your translation team's web or email address).
#: src/hello.c:178
#, c-format
msgid "Report bugs to <%s>.\n"
msgstr ""
Should names of persons, cities, locations etc. be marked for translation or not? People who only know languages that can be written with Latin letters (English, Spanish, French, German, etc.) are tempted to say “no”, because names usually do not change when transported between these languages. However, in general when translating from one script to another, names are translated too, usually phonetically or by transliteration. For example, Russian or Greek names are converted to the Latin alphabet when being translated to English, and English or French names are converted to the Katakana script when being translated to Japanese. This is necessary because the speakers of the target language in general cannot read the script the name is originally written in.
As a programmer, you should therefore make sure that names are marked for translation, with a special comment telling the translators that it is a proper name and how to pronounce it. In its simple form, it looks like this:
printf (_("Written by %s.\n"),
        /* TRANSLATORS: This is a proper name. See the gettext
           manual, section Names. Note this is actually a non-ASCII
           name: The first name is (with Unicode escapes)
           "Fran\u00e7ois" or (with HTML entities) "Fran&ccedil;ois".
           Pronunciation is like "fraa-swa pee-nar". */
        _("Francois Pinard"));
The GNU gnulib library offers a module ‘propername’ (http://www.gnu.org/software/gnulib/MODULES.html#module=propername) which takes care to automatically append the original name, in parentheses, to the translated name. For names that cannot be written in ASCII, it also frees the translator from the task of entering the appropriate non-ASCII characters if no script change is needed. In this more comfortable form, it looks like this:
printf (_("Written by %s and %s.\n"),
        proper_name ("Ulrich Drepper"),
        /* TRANSLATORS: This is a proper name. See the gettext
           manual, section Names. Note this is actually a non-ASCII
           name: The first name is (with Unicode escapes)
           "Fran\u00e7ois" or (with HTML entities) "Fran&ccedil;ois".
           Pronunciation is like "fraa-swa pee-nar". */
        proper_name_utf8 ("Francois Pinard", "Fran\303\247ois Pinard"));
You can also write the original name directly in Unicode (rather than with Unicode escapes or HTML entities) and denote the pronunciation using the International Phonetic Alphabet (see http://www.wikipedia.org/wiki/International_Phonetic_Alphabet).
As a translator, you should use some care when translating names, because it is frustrating if people see their names mutilated or distorted.
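The behaviour attributed to ‘propername’ above can be approximated as follows. This is a simplified, hypothetical helper, not gnulib’s implementation. It takes the original name and its translation, and appends the original in parentheses only when the translation does not already contain it.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: combine a translated proper name with its
   original spelling.  If the translation already contains the original
   (which includes the case of an identical string), it is used as is;
   otherwise the original is appended in parentheses.  Writes at most
   SIZE bytes and returns what snprintf returns.  */
static int
combine_proper_name (char *buf, size_t size,
                     const char *original, const char *translation)
{
  if (strstr (translation, original) != NULL)
    return snprintf (buf, size, "%s", translation);
  return snprintf (buf, size, "%s (%s)", translation, original);
}
```

gnulib’s actual functions also handle character set conversion for proper_name_utf8, which this sketch ignores.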
If your language uses the Latin script, all you need to do is to reproduce the name as perfectly as you can within the usual character set of your language. In this particular case, this means to provide a translation containing the c-cedilla character. If your language uses a different script and the people speaking it don’t usually read Latin words, it means transliteration. If the programmer used the simple case, you should still give, in parentheses, the original writing of the name, for the sake of the people that do read the Latin script. If the programmer used the ‘propername’ module mentioned above, you don’t need to give the original writing of the name in parentheses, because the program will already do so. Here is an example, using Greek as the target script:
#. This is a proper name. See the gettext
#. manual, section Names. Note this is actually a non-ASCII
#. name: The first name is (with Unicode escapes)
#. "Fran\u00e7ois" or (with HTML entities) "Fran&ccedil;ois".
#. Pronunciation is like "fraa-swa pee-nar".
msgid "Francois Pinard"
msgstr "\phi\rho\alpha\sigma\omicron\alpha \pi\iota\nu\alpha\rho"
       " (Francois Pinard)"
Because translation of names is such a sensitive domain, it is a good idea to test your translation before submitting it.
When you are preparing a library, not a program, for the use of gettext, only a few details are different. Here we assume that the library has a translation domain and a POT file of its own. (If it uses the translation domain and POT file of the main program, then the previous sections apply without changes.)
The library code doesn’t call setlocale (LC_ALL, ""). It’s the responsibility of the main program to set the locale. The library’s documentation should mention this fact, so that developers of programs using the library are aware of it.
The library code doesn’t call textdomain (PACKAGE), because it would interfere with the text domain set by the main program.
The initialization code for a program was
setlocale (LC_ALL, "");
bindtextdomain (PACKAGE, LOCALEDIR);
textdomain (PACKAGE);
For a library it is reduced to
bindtextdomain (PACKAGE, LOCALEDIR);
If your library’s API doesn’t already have an initialization function, you need to create one, containing at least the bindtextdomain invocation. However, you usually don’t need to export and document this initialization function: It is sufficient that all entry points of the library call the initialization function if it hasn’t been called before. The typical idiom used to achieve this is a static boolean variable that indicates whether the initialization function has been called. Like this:
static bool libfoo_initialized;

static void
libfoo_initialize (void)
{
  bindtextdomain (PACKAGE, LOCALEDIR);
  libfoo_initialized = true;
}

/* This function is part of the exported API. */
struct foo *
create_foo (...)
{
  /* Must ensure the initialization is performed. */
  if (!libfoo_initialized)
    libfoo_initialize ();
  ...
}

/* This function is part of the exported API. The argument must be
   non-NULL and have been created through create_foo(). */
int
foo_refcount (struct foo *argument)
{
  /* No need to invoke the initialization function here, because
     create_foo() must already have been called before. */
  ...
}
The usual declaration of the ‘_’ macro in each source file was
#include <libintl.h>
#define _(String) gettext (String)
for a program. For a library, which has its own translation domain, it reads like this:
#include <libintl.h>
#define _(String) dgettext (PACKAGE, String)
In other words, dgettext is used instead of gettext. Similarly, the dngettext function should be used in place of the ngettext function.
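Putting these points together, a minimal sketch of a library’s internationalization boilerplate could look like this. The domain name "libfoo", the LOCALEDIR path, and the functions foo_greeting and foo_count_files are hypothetical. bindtextdomain, dgettext and dngettext are the <libintl.h> functions named above. Like the static-boolean idiom it follows, this initialization is not thread-safe.

```c
#include <libintl.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

#define PACKAGE "libfoo"                     /* hypothetical text domain */
#define LOCALEDIR "/usr/local/share/locale"  /* hypothetical install path */

/* Library variant of the usual macro: always look up in our own domain.  */
#define _(String) dgettext (PACKAGE, String)

static bool libfoo_initialized;

static void
libfoo_initialize (void)
{
  /* No setlocale () and no textdomain (): those belong to the program.  */
  bindtextdomain (PACKAGE, LOCALEDIR);
  libfoo_initialized = true;
}

/* Hypothetical exported entry point.  */
const char *
foo_greeting (void)
{
  if (!libfoo_initialized)
    libfoo_initialize ();
  return _("Hello from libfoo");
}

/* Hypothetical exported entry point using plural forms: dngettext
   instead of ngettext, again with the library's own domain.  */
int
foo_count_files (char *buf, size_t size, unsigned long n)
{
  if (!libfoo_initialized)
    libfoo_initialize ();
  return snprintf (buf, size,
                   dngettext (PACKAGE, "%lu file", "%lu files", n), n);
}
```

With no catalog installed for "libfoo", dngettext falls back to the first msgid for n == 1 and to the second otherwise.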
After preparing the sources, the programmer creates a PO template file. This section explains how to use xgettext for this purpose.
xgettext creates a file named domainname.po. You should then rename it to domainname.pot. (Why doesn’t xgettext create it under the name domainname.pot right away? The answer is: for historical reasons. When xgettext was specified, the distinction between a PO file and PO file template was fuzzy, and the suffix ‘.pot’ wasn’t in use at that time.)
Invoking the xgettext Program
xgettext [option] [inputfile] …
The xgettext program extracts translatable strings from given input files.
Input files.
Read the names of the input files from file instead of getting them from the command line.
Add directory to the list of directories. Source files are searched relative to this list of directories. The resulting .po file will be written relative to the current directory, though.
If inputfile is ‘-’, standard input is read.
Use name.po for output (instead of messages.po).
Write output to specified file (instead of name.po or messages.po).
Output files will be placed in directory dir.
If the output file is ‘-’ or ‘/dev/stdout’, the output is written to standard output.
Specifies the language of the input files. The supported languages are C, C++, ObjectiveC, PO, Shell, Python, Lisp, EmacsLisp, librep, Scheme, Smalltalk, Java, JavaProperties, C#, awk, YCP, Tcl, Perl, PHP, GCC-source, NXStringTable, RST, Glade, Lua, JavaScript, Vala, GSettings, Desktop.
This is a shorthand for --language=C++.
By default the language is guessed depending on the input file name extension.
Specifies the encoding of the input files. This option is needed only if some untranslated message strings or their corresponding comments contain non-ASCII characters. Note that Tcl and Glade input files are always assumed to be in UTF-8, regardless of this option.
By default the input files are assumed to be in ASCII.
Join messages with existing file.
Entries from file are not extracted. file should be a PO or POT file.
Place comment blocks starting with tag and preceding keyword lines in the output file. Without a tag, the option means to put all comment blocks preceding keyword lines in the output file.
Note that comment blocks supposed to be extracted must be adjacent to keyword lines. For example, in the following C source code:
    /* This is the first comment.  */
    gettext ("foo");

    /* This is the second comment: not extracted  */

    gettext (
      "bar");

    gettext (
      /* This is the third comment.  */
      "baz");
The second comment line will not be extracted, because there is one blank line between the comment line and the keyword.
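A common way to use this is to write notes for translators directly above the keyword line and to extract them with xgettext --add-comments=TRANSLATORS. In the sketch below, the ‘TRANSLATORS:’ tag and both functions are illustrative choices, not something xgettext requires, and GNU libc's <libintl.h> is assumed. Only the first comment is adjacent to its keyword line, so only it would be copied into the POT file as an extracted (‘#.’) comment.

```c
#include <libintl.h>

#define _(String) gettext (String)

const char *
save_button_label (void)
{
  /* TRANSLATORS: This is a verb, the label of a button that saves the
     current document.  Please keep it short.  */
  return _("Save");
}

const char *
quit_button_label (void)
{
  /* TRANSLATORS: This note is NOT extracted, because a blank line
     separates it from the keyword line.  */

  return _("Quit");
}
```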
Perform a syntax check on msgid and msgid_plural. The supported checks are:
Prefer Unicode ellipsis character over ASCII ...
Prohibit whitespace before an ellipsis character
Prefer Unicode quotation marks over ASCII "'`
Prefer Unicode bullet character over ASCII * or -
The option has an effect on all input files. To enable or disable checks for a certain string, you can mark it with an xgettext: special comment in the source file. For example, if you specify the --check=space-ellipsis option, but want to suppress the check on a particular string, add the following comment:
    /* xgettext: no-space-ellipsis-check */
    gettext ("We really want a space before ellipsis here ...");
The xgettext: comment can be followed by flags separated with a comma. The possible flags are of the form ‘[no-]name-check’, where name is the name of a valid syntax check. If a flag is prefixed by no-, the meaning is negated.
Some tests apply the checks to each sentence within the msgid, rather than the whole string. xgettext detects the end of sentence by performing a pattern match, which usually looks for a period followed by a certain number of spaces. The number is specified with the --sentence-end option.
The supported values are:
Expect at least one whitespace after a period
Expect at least two whitespaces after a period
Extract all strings.
This option has an effect with most languages, namely C, C++, ObjectiveC, Shell, Python, Lisp, EmacsLisp, librep, Java, C#, awk, Tcl, Perl, PHP, GCC-source, Glade, Lua, JavaScript, Vala, GSettings.
Specify keywordspec as an additional keyword to be looked for. Without a keywordspec, the option means to not use default keywords.
If keywordspec is a C identifier id, xgettext looks for strings in the first argument of each call to the function or macro id. If keywordspec is of the form ‘id:argnum’, xgettext looks for strings in the argnumth argument of the call. If keywordspec is of the form ‘id:argnum1,argnum2’, xgettext looks for strings in the argnum1st argument and in the argnum2nd argument of the call, and treats them as singular/plural variants for a message with plural handling. Also, if keywordspec is of the form ‘id:contextargnumc,argnum’ or ‘id:argnum,contextargnumc’, xgettext treats strings in the contextargnumth argument as a context specifier. And, as special-purpose support for GNOME, if keywordspec is of the form ‘id:argnumg’, xgettext recognizes the argnumth argument as a string with context, using the GNOME glib syntax ‘"msgctxt|msgid"’.
Furthermore, if keywordspec is of the form ‘id:…,totalnumargst’, xgettext recognizes this argument specification only if the number of actual arguments is equal to totalnumargs. This is useful for disambiguating overloaded function calls in C++.
Finally, if keywordspec is of the form ‘id:argnum...,"xcomment"’, xgettext, when extracting a message from the specified argument strings, adds an extracted comment xcomment to the message. Note that when used through a normal shell command line, the double-quotes around the xcomment need to be escaped.
This option has an effect with most languages, namely C, C++, ObjectiveC, Shell, Python, Lisp, EmacsLisp, librep, Java, C#, awk, Tcl, Perl, PHP, GCC-source, Glade, Lua, JavaScript, Vala, GSettings, Desktop.
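To make the numeric part of the keywordspec grammar concrete, here is a small hypothetical parser for it. It is not xgettext's own code, it ignores the quoted "xcomment" form, and struct keyword_spec and parse_keyword_spec are names invented for this sketch.

```c
#include <ctype.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* Parsed form of a keywordspec; 0 means "not given".  */
struct keyword_spec
{
  char id[64];        /* function or macro name */
  int argnum;         /* argument holding the msgid */
  int argnum_plural;  /* argument holding the msgid_plural */
  int argnum_context; /* 'c' suffix: argument holding the msgctxt */
  bool glib_context;  /* 'g' suffix: "msgctxt|msgid" in one string */
  int total_args;     /* 't' suffix: only calls with this many args */
};

/* Parses "id", "id:argnum", "id:argnum1,argnum2", "id:1c,2", "id:2g",
   "id:1,2t", ...  Returns false on malformed input.  */
bool
parse_keyword_spec (const char *spec, struct keyword_spec *out)
{
  memset (out, 0, sizeof *out);
  const char *colon = strchr (spec, ':');
  size_t idlen = colon != NULL ? (size_t) (colon - spec) : strlen (spec);
  if (idlen == 0 || idlen >= sizeof out->id)
    return false;
  memcpy (out->id, spec, idlen);
  out->id[idlen] = '\0';

  if (colon == NULL)
    {
      out->argnum = 1;  /* plain id: strings in the first argument */
      return true;
    }

  const char *p = colon + 1;
  for (;;)
    {
      if (!isdigit ((unsigned char) *p))
        return false;
      char *end;
      long n = strtol (p, &end, 10);
      if (n <= 0 || n > 1000)
        return false;
      switch (*end)
        {
        case 'c':   /* context argument */
          out->argnum_context = (int) n;
          end++;
          break;
        case 't':   /* total number of arguments */
          out->total_args = (int) n;
          end++;
          break;
        case 'g':   /* GNOME glib "msgctxt|msgid" argument */
          out->argnum = (int) n;
          out->glib_context = true;
          end++;
          break;
        default:    /* msgid first, then msgid_plural */
          if (out->argnum == 0)
            out->argnum = (int) n;
          else if (out->argnum_plural == 0)
            out->argnum_plural = (int) n;
          else
            return false;
          break;
        }
      if (*end == '\0')
        break;
      if (*end != ',')
        return false;
      p = end + 1;
    }
  return out->argnum != 0;
}
```

For example, "npgettext:1c,2,3" yields a context in argument 1, the msgid in argument 2 and the msgid_plural in argument 3.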
The default keyword specifications, which are always looked for if not explicitly disabled, are language dependent. They are:
For C, C++, and GCC-source: gettext, dgettext:2, dcgettext:2, ngettext:1,2, dngettext:2,3, dcngettext:2,3, gettext_noop, and pgettext:1c,2, dpgettext:2c,3, dcpgettext:2c,3, npgettext:1c,2,3, dnpgettext:2c,3,4, dcnpgettext:2c,3,4.
For Objective C: like for C, and also NSLocalizedString, _, NSLocalizedStaticString, __.
For Shell scripts: gettext, ngettext:1,2, eval_gettext, eval_ngettext:1,2.
For Python: gettext, ugettext, dgettext:2, ngettext:1,2, ungettext:1,2, dngettext:2,3, _.
For Lisp: gettext, ngettext:1,2, gettext-noop.
For EmacsLisp: _.
For librep: _.
For Scheme: gettext, ngettext:1,2, gettext-noop.
For Java: GettextResource.gettext:2, GettextResource.ngettext:2,3, GettextResource.pgettext:2c,3, GettextResource.npgettext:2c,3,4, gettext, ngettext:1,2, pgettext:1c,2, npgettext:1c,2,3, getString.
For C#: GetString, GetPluralString:1,2, GetParticularString:1c,2, GetParticularPluralString:1c,2,3.
For awk: dcgettext, dcngettext:1,2.
For Tcl: ::msgcat::mc.
For Perl: gettext, %gettext, $gettext, dgettext:2, dcgettext:2, ngettext:1,2, dngettext:2,3, dcngettext:2,3, gettext_noop.
For PHP: _, gettext, dgettext:2, dcgettext:2, ngettext:1,2, dngettext:2,3, dcngettext:2,3.
For Glade 1: label, title, text, format, copyright, comments, preview_text, tooltip.
For Lua: _, gettext.gettext, gettext.dgettext:2, gettext.dcgettext:2, gettext.ngettext:1,2, gettext.dngettext:2,3, gettext.dcngettext:2,3.
For JavaScript: _, gettext, dgettext:2, dcgettext:2, ngettext:1,2, dngettext:2,3, pgettext:1c,2, dpgettext:2c,3.
For Vala: _, Q_, N_, NC_, dgettext:2, dcgettext:2, ngettext:1,2, dngettext:2,3, dpgettext:2c,3, dpgettext2:2c,3.
For Desktop: Name, GenericName, Comment, Icon, Keywords.
To disable the default keyword specifications, the option ‘-k’ or ‘--keyword’ or ‘--keyword=’, without a keywordspec, can be used.
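Project-specific wrappers are not among these defaults, so they must be announced with --keyword. In the sketch below, plural_msg is a hypothetical wrapper around ngettext; extracting its strings would take something like xgettext --keyword=plural_msg:1,2, so that arguments 1 and 2 are treated as the singular and plural msgids. GNU libc's <libintl.h> is assumed.

```c
#include <libintl.h>
#include <stdio.h>

/* Hypothetical project wrapper.  "plural_msg" is not a default
   keyword, so xgettext needs: --keyword=plural_msg:1,2  */
static const char *
plural_msg (const char *singular, const char *plural, unsigned long n)
{
  return ngettext (singular, plural, n);
}

/* Writes e.g. "2 errors found" into BUF; returns snprintf's result.  */
int
format_error_count (char *buf, size_t size, unsigned long n)
{
  return snprintf (buf, size,
                   plural_msg ("%lu error found", "%lu errors found", n),
                   n);
}
```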
Specifies additional flags for strings occurring as part of the argth argument of the function word. The possible flags are the possible format string indicators, such as ‘c-format’, and their negations, such as ‘no-c-format’, possibly prefixed with ‘pass-’.
The meaning of --flag=function:arg:lang-format is that in language lang, the specified function expects as argth argument a format string. (For those of you familiar with GCC function attributes, --flag=function:arg:c-format is roughly equivalent to the declaration ‘__attribute__ ((__format__ (__printf__, arg, ...)))’ attached to function in a C source file.) For example, if you use the ‘error’ function from GNU libc, you can specify its behaviour through --flag=error:3:c-format. The effect of this specification is that xgettext will mark as format strings all gettext invocations that occur as argth argument of function. This is useful when such strings contain no format string directives: together with the checks done by ‘msgfmt -c’ it will ensure that translators cannot accidentally use format string directives that would lead to a crash at runtime.
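The correspondence between the GCC attribute and the xgettext flag can be sketched as follows. report is a hypothetical error-reporting function whose third argument is a printf-style format (it writes into a caller-supplied buffer so that the sketch is easy to test); the matching xgettext option would be --flag=report:3:c-format. GCC or a compatible compiler and GNU libc's <libintl.h> are assumed.

```c
#include <libintl.h>
#include <stdarg.h>
#include <stdio.h>

#define _(String) gettext (String)

/* Argument 3 is a printf format, arguments 4... are its operands.
   xgettext counterpart: --flag=report:3:c-format  */
int report (char *buf, size_t size, const char *format, ...)
  __attribute__ ((__format__ (__printf__, 3, 4)));

int
report (char *buf, size_t size, const char *format, ...)
{
  va_list args;
  va_start (args, format);
  int n = vsnprintf (buf, size, format, args);
  va_end (args);
  return n;
}

/* "Disk full" contains no directive, but with the flag above xgettext
   marks it c-format, so 'msgfmt -c' rejects a translation that adds
   a directive such as "%s".  */
int
report_disk_full (char *buf, size_t size)
{
  return report (buf, size, _("Disk full"));
}
```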
The meaning of --flag=function:arg:pass-lang-format is that in language lang, if the function call occurs in a position that must yield a format string, then its argth argument must yield a format string of the same type as well. (If you know GCC function attributes, the --flag=function:arg:pass-c-format option is roughly equivalent to the declaration ‘__attribute__ ((__format_arg__ (arg)))’ attached to function in a C source file.) For example, if you use the ‘_’ shortcut for the gettext function, you should use --flag=_:1:pass-c-format. The effect of this specification is that xgettext will propagate a format string requirement for a _("string") call to its first argument, the literal "string", and thus mark it as a format string. This is useful when such strings contain no format string directives: together with the checks done by ‘msgfmt -c’ it will ensure that translators cannot accidentally use format string directives that would lead to a crash at runtime.
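The ‘format_arg’ attribute and the pass- flag can be sketched together in the same way. translate is a hypothetical wrapper around gettext, and the corresponding xgettext option would be --flag=translate:1:pass-c-format. The sketch assumes GCC or a compatible compiler and GNU libc's <libintl.h>, and it assumes that xgettext knows snprintf's third argument to be a format string, so that the requirement can be propagated to the literal.

```c
#include <libintl.h>
#include <stdio.h>

/* If translate ()'s result is used as a format string, its argument 1
   is one too.  xgettext counterpart: --flag=translate:1:pass-c-format  */
const char *translate (const char *msgid)
  __attribute__ ((__format_arg__ (1)));

const char *
translate (const char *msgid)
{
  return gettext (msgid);
}

/* translate ()'s result is the format of snprintf (), so GCC checks
   "%d of %d done" against DONE and TOTAL at compile time.  */
int
format_progress (char *buf, size_t size, int done, int total)
{
  return snprintf (buf, size, translate ("%d of %d done"), done, total);
}
```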
This option has an effect with most languages, namely C, C++, ObjectiveC, Shell, Python, Lisp, EmacsLisp, librep, Scheme, Java, C#, awk, YCP, Tcl, Perl, PHP, GCC-source, Lua, JavaScript, Vala.
Understand ANSI C trigraphs for input.
This option has an effect only with the languages C, C++, ObjectiveC.
Recognize Qt format strings.
This option has an effect only with the language C++.
Recognize KDE 4 format strings.
This option has an effect only with the language C++.
Recognize Boost format strings.
This option has an effect only with the language C++.
Use the flags c-format and possible-c-format to show who was responsible for marking a message as a format string. The latter form is used if the xgettext program decided, the former form is used if the programmer prescribed it.
By default only the c-format form is used. The translator should not have to care about these details.
This implementation of xgettext is able to process a few awkward cases, like strings in preprocessor macros, ANSI concatenation of adjacent strings, and escaped end of lines for continued strings.
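The following fragment, with hypothetical function and macro names and assuming GNU libc's <libintl.h>, shows these three cases. Because the C compiler joins adjacent literals and removes backslash-newline sequences at compile time, the msgid extracted by xgettext is the same string that gettext receives at run time.

```c
#include <libintl.h>

#define _(String) gettext (String)

/* A keyword call inside a macro definition.  */
#define QUIT_LABEL _("Quit")

/* Adjacent literals are concatenated: the msgid is
   "Usage: foo [OPTION]... [FILE]...".  */
const char *
usage_line (void)
{
  return _("Usage: foo [OPTION]... "
           "[FILE]...");
}

/* Backslash-newline continues a literal: the msgid is
   "Try 'foo --help' for more information.".  */
const char *
help_hint (void)
{
  return _("Try 'foo --help' for \
more information.");
}
```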
Specify whether or when to use colors and other text attributes. See The --color option for details.
Specify the CSS style rule file to use for --color. See The --style option for details.
Always write an output file even if no message is defined.
Write the .po file using indented style.
Do not write ‘#: filename:line’ lines. Note that using this option makes it harder for technically skilled translators to understand each message’s context.
Generate ‘#: filename:line’ lines (default).
The optional type can be either ‘full’, ‘file’, or ‘never’. If it is not given or ‘full’, it generates the lines with both file name and line number. If it is ‘file’, the line number part is omitted. If it is ‘never’, it completely suppresses the lines (same as --no-location).
Write out a strict Uniforum conforming PO file. Note that this Uniforum format should be avoided because it doesn’t support the GNU extensions.
Write out a Java ResourceBundle in Java .properties syntax. Note that this file format doesn’t support plural forms and silently drops obsolete messages.
Write out a NeXTstep/GNUstep localized resource file in .strings syntax. Note that this file format doesn’t support plural forms.
Use ITS rules defined in file. Note that this is only effective with XML files.
Write out comments recognized by itstool (http://itstool.org). Note that this is only effective with XML files.
Set the output page width. Long strings in the output files will be split across multiple lines in order to ensure that each line’s width (= number of screen columns) is less than or equal to the given number.
Do not break long message lines. Message lines whose width exceeds the output page width will not be split into several lines. Only file reference lines which are wider than the output page width will be split.
Generate sorted output. Note that using this option makes it much harder for the translator to understand each message’s context.
Sort output by file location.
Don’t write header with ‘msgid ""’ entry.
This is useful for testing purposes because it eliminates a source of variance for generated .gmo files. With --omit-header, two invocations of xgettext on the same files with the same options at different times are guaranteed to produce the same results.
Note that using this option will lead to an error if the resulting filewould not entirely be in ASCII.
Set the copyright holder in the output. string should be the copyright holder of the surrounding package. (Note that the msgid strings, extracted from the package’s sources, belong to the copyright holder of the package.) Translators are expected to transfer or disclaim the copyright for their translations, so that package maintainers can distribute them without legal risk. If string is empty, the output files are marked as being in the public domain; in this case, the translators are expected to disclaim their copyright, again so that package maintainers can distribute them without legal risk.
The default value for string is the Free Software Foundation, Inc., simply because xgettext was first used in the GNU project.
Omit FSF copyright in output. This option is equivalent to ‘--copyright-holder=''’. It can be useful for packages outside the GNU project that want their translations to be in the public domain.
Set the package name in the header of the output.
Set the package version in the header of the output. This option has an effect only if the ‘--package-name’ option is also used.
Set the reporting address for msgid bugs. This is the email address or URL to which the translators shall report bugs in the untranslated strings:
It can be your email address, or a mailing list address where translators can write to without being subscribed, or the URL of a web page through which the translators can contact you.
The default value is empty, which means that translators will be clueless! Don’t forget to specify this option.
Use string (or "" if not specified) as prefix for msgstr values.
Use string (or "" if not specified) as suffix for msgstr values.
Display this help and exit.
Output version information and exit.
When starting a new translation, the translator creates a file called LANG.po, as a copy of the package.pot template file with modifications in the initial comments (at the beginning of the file) and in the header entry (the first entry, near the beginning of the file).
The easiest way to do so is by use of the ‘msginit’ program. For example:
    $ cd PACKAGE-VERSION
    $ cd po
    $ msginit
The alternative way is to do the copy and modifications by hand. To do so, the translator copies package.pot to LANG.po. Then she modifies the initial comments and the header entry of this file.
Invoking the msginit Program
msginit [option]
The msginit program creates a new PO file, initializing the meta information with values from the user’s environment.
Here are more details. The following header fields of a PO file are automatically filled, when possible.
The value is guessed from the configure script or any other files in the current directory.
The value is taken from the POT-Creation-Date in the input POT file, or the current date is used.
The value is taken from the user’s password file entry and the mailer configuration files.
These values are set according to the current locale and the predefined list of translation teams.
These values are set according to the content of the POT file and the current locale. If the POT file contains charset=UTF-8, it means that the POT file contains non-ASCII characters, and we keep the UTF-8 encoding. Otherwise, when the POT file is plain ASCII, we use the locale’s encoding.
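The charset rule just described can be restated as a small C function. choose_po_charset is a hypothetical helper, not msginit's actual code; it assumes a POSIX system providing nl_langinfo, and it simplifies matters by searching the header text case-sensitively.

```c
#include <langinfo.h>
#include <string.h>

/* HEADER is the msgstr of the POT file's header entry.  Returns the
   charset a new PO file would use under the rule described above.
   For the locale branch to be meaningful, the program must have
   called setlocale (LC_ALL, "") beforehand.  */
const char *
choose_po_charset (const char *header)
{
  /* charset=UTF-8 in the POT means it contains non-ASCII text: keep it.  */
  if (strstr (header, "charset=UTF-8") != NULL)
    return "UTF-8";
  /* Otherwise the POT is plain ASCII: use the locale's encoding.  */
  return nl_langinfo (CODESET);
}
```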
The value is first looked up from the embedded table.
As an experimental feature, you can instruct msginit to use the information from Unicode CLDR, by setting the GETTEXTCLDRDIR environment variable.
Input POT file.
If no inputfile is given, the current directory is searched for the POT file. If it is ‘-’, standard input is read.
Write output to specified PO file.
If no output file is given, it depends on the ‘--locale’ option or the user’s locale setting. If it is ‘-’, the results are written to standard output.
Assume the input file is a Java ResourceBundle in Java .properties syntax, not in PO file syntax.
Assume the input file is a NeXTstep/GNUstep localized resource file in .strings syntax, not in PO file syntax.
Set target locale. ll should be a language code, and CC should be a country code. The command ‘locale -a’ can be used to output a list of all installed locales. The default is the user’s locale setting.
Declares that the PO file will not have a human translator and is instead automatically generated.
Specify whether or when to use colors and other text attributes. See The --color option for details.
Specify the CSS style rule file to use for --color. See The --style option for details.
Write out a Java ResourceBundle in Java .properties syntax. Note that this file format doesn’t support plural forms and silently drops obsolete messages.
Write out a NeXTstep/GNUstep localized resource file in .strings syntax. Note that this file format doesn’t support plural forms.
Set the output page width. Long strings in the output files will be split across multiple lines in order to ensure that each line’s width (= number of screen columns) is less than or equal to the given number.
Do not break long message lines. Message lines whose width exceeds the output page width will not be split into several lines. Only file reference lines which are wider than the output page width will be split.
Display this help and exit.
Output version information and exit.
The initial comments "SOME DESCRIPTIVE TITLE", "YEAR" and "FIRST AUTHOR
Modifying the header entry can already be done using PO mode: in Emacs, type M-x po-mode RET and then RET again to start editing the entry. You should fill in the following fields.
This is the name and version of the package. Fill it in if it has not already been filled in by xgettext.
This has already been filled in by xgettext. It contains an email address or URL where you can report bugs in the untranslated strings:
This has already been filled in by xgettext.
You don’t need to fill this in. It will be filled by the PO file editor when you save the file.
Fill in your name and email address (without double quotes).
Fill in the English name of the language, and the email address or homepage URL of the language team you are part of.
Before starting a translation, it is a good idea to get in touch with your translation team, not only to make sure you don’t do duplicated work, but also to coordinate difficult linguistic issues.
In the Free Translation Project, each translation team has its own mailing list. The up-to-date list of teams can be found at the Free Translation Project’s homepage, http://translationproject.org/, in the "Teams" area.
Fill in the language code of the language. This can be in one of three forms:
The naming convention ‘ll_CC’ is also the way locales are named on systems based on GNU libc. But there are three important differences:
So, if your locale name is ‘de_DE.UTF-8’, the language specification in PO files is just ‘de’.
Replace ‘CHARSET’ with the character encoding used for your language, in your locale, or UTF-8. This field is needed for correct operation of the msgmerge and msgfmt programs, as well as for users whose locale’s character encoding differs from yours (see Charset conversion).
You get the character encoding of your locale by running the shell command ‘locale charmap’. If the result is ‘C’ or ‘ANSI_X3.4-1968’, which is equivalent to ‘ASCII’ (= ‘US-ASCII’), it means that your locale is not correctly configured. In this case, ask your translation team which charset to use. ‘ASCII’ is not usable for any language except Latin.
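From C, the corresponding value is returned by nl_langinfo (CODESET) once the LC_CTYPE category has been set from the environment. The two helpers below are hypothetical and assume a POSIX system; note that current_charmap changes the program's LC_CTYPE setting as a side effect.

```c
#include <langinfo.h>
#include <locale.h>
#include <stdbool.h>
#include <string.h>

/* Sets LC_CTYPE from the environment (LC_ALL, LC_CTYPE, LANG) and
   returns the name of its character encoding.  Side effect: the
   program's LC_CTYPE category changes.  */
const char *
current_charmap (void)
{
  setlocale (LC_CTYPE, "");
  return nl_langinfo (CODESET);
}

/* True for the names that indicate a locale that is not correctly
   configured, i.e. plain ASCII.  */
bool
charmap_is_unconfigured (const char *charmap)
{
  return strcmp (charmap, "C") == 0
         || strcmp (charmap, "ANSI_X3.4-1968") == 0
         || strcmp (charmap, "ASCII") == 0
         || strcmp (charmap, "US-ASCII") == 0;
}
```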
Because the PO files must be portable to operating systems with less advanced internationalization facilities, the character encodings that can be used are limited to those supported by both GNU libc and GNU libiconv. These are: ASCII, ISO-8859-1, ISO-8859-2, ISO-8859-3, ISO-8859-4, ISO-8859-5, ISO-8859-6, ISO-8859-7, ISO-8859-8, ISO-8859-9, ISO-8859-13, ISO-8859-14, ISO-8859-15, KOI8-R, KOI8-U, KOI8-T, CP850, CP866, CP874, CP932, CP949, CP950, CP1250, CP1251, CP1252, CP1253, CP1254, CP1255, CP1256, CP1257, GB2312, EUC-JP, EUC-KR, EUC-TW, BIG5, BIG5-HKSCS, GBK, GB18030, SHIFT_JIS, JOHAB, TIS-620, VISCII, GEORGIAN-PS, UTF-8.
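A tool that validates the charset declared in a PO file could check it against this list. is_portable_po_charset is a hypothetical helper; since charset names may be written in upper or lower case, it compares with the POSIX function strcasecmp.

```c
#include <stdbool.h>
#include <stddef.h>
#include <strings.h>

/* The encodings listed above as supported by both GNU libc and
   GNU libiconv.  */
static const char *const portable_charsets[] = {
  "ASCII", "ISO-8859-1", "ISO-8859-2", "ISO-8859-3", "ISO-8859-4",
  "ISO-8859-5", "ISO-8859-6", "ISO-8859-7", "ISO-8859-8", "ISO-8859-9",
  "ISO-8859-13", "ISO-8859-14", "ISO-8859-15", "KOI8-R", "KOI8-U",
  "KOI8-T", "CP850", "CP866", "CP874", "CP932", "CP949", "CP950",
  "CP1250", "CP1251", "CP1252", "CP1253", "CP1254", "CP1255", "CP1256",
  "CP1257", "GB2312", "EUC-JP", "EUC-KR", "EUC-TW", "BIG5",
  "BIG5-HKSCS", "GBK", "GB18030", "SHIFT_JIS", "JOHAB", "TIS-620",
  "VISCII", "GEORGIAN-PS", "UTF-8"
};

/* Returns true if NAME (in any letter case) is one of the above.  */
bool
is_portable_po_charset (const char *name)
{
  size_t count = sizeof portable_charsets / sizeof portable_charsets[0];
  for (size_t i = 0; i < count; i++)
    if (strcasecmp (name, portable_charsets[i]) == 0)
      return true;
  return false;
}
```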
In the GNU system, the following encodings are frequently used for the corresponding languages.
ISO-8859-1 for Afrikaans, Albanian, Basque, Breton, Catalan, Cornish, Danish, Dutch, English, Estonian, Faroese, Finnish, French, Galician, German, Greenlandic, Icelandic, Indonesian, Irish, Italian, Malay, Manx, Norwegian, Occitan, Portuguese, Spanish, Swedish, Tagalog, Uzbek, Walloon
ISO-8859-2 for Bosnian, Croatian, Czech, Hungarian, Polish, Romanian, Serbian, Slovak, Slovenian
ISO-8859-3 for Maltese
ISO-8859-5 for Macedonian, Serbian
ISO-8859-6 for Arabic
ISO-8859-7 for Greek
ISO-8859-8 for Hebrew
ISO-8859-9 for Turkish
ISO-8859-13 for Latvian, Lithuanian, Maori
ISO-8859-14 for Welsh
ISO-8859-15 for Basque, Catalan, Dutch, English, Finnish, French, Galician, German, Irish, Italian, Portuguese, Spanish, Swedish, Walloon
KOI8-R for Russian
KOI8-U for Ukrainian
KOI8-T for Tajik
CP1251 for Bulgarian, Belarusian
GB2312, GBK, GB18030 for simplified writing of Chinese
BIG5, BIG5-HKSCS for traditional writing of Chinese
EUC-JP for Japanese
EUC-KR for Korean
TIS-620 for Thai
GEORGIAN-PS for Georgian
UTF-8 for any language, including those listed above.
When single quote characters or double quote characters are used in translations for your language, and your locale’s encoding is one of the ISO-8859-* charsets, it is best if you create your PO files in UTF-8 encoding, instead of your locale’s encoding. This is because in UTF-8 the real quote characters can be represented (single quote characters: U+2018, U+2019, double quote characters: U+201C, U+201D), whereas none of the ISO-8859-* charsets has them all. Users in UTF-8 locales will see the real quote characters, whereas users in ISO-8859-* locales will see the vertical apostrophe and the vertical double quote instead (because that’s what the character set conversion will transliterate them to).
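To see how UTF-8 carries these characters, here is a minimal sketch of UTF-8 encoding for code points below U+10000. utf8_encode_bmp is a hypothetical name, and surrogates and characters outside the Basic Multilingual Plane are out of its scope. Each of U+2018, U+2019, U+201C and U+201D becomes a three-byte sequence starting with 0xE2 0x80.

```c
#include <stddef.h>

/* Encodes CP (must be below 0x10000 and not a surrogate) into OUT,
   which needs room for 3 bytes.  Returns the number of bytes used.  */
size_t
utf8_encode_bmp (unsigned int cp, unsigned char out[3])
{
  if (cp < 0x80)
    {
      out[0] = (unsigned char) cp;  /* ASCII: one byte */
      return 1;
    }
  if (cp < 0x800)
    {
      out[0] = (unsigned char) (0xC0 | (cp >> 6));
      out[1] = (unsigned char) (0x80 | (cp & 0x3F));
      return 2;
    }
  out[0] = (unsigned char) (0xE0 | (cp >> 12));
  out[1] = (unsigned char) (0x80 | ((cp >> 6) & 0x3F));
  out[2] = (unsigned char) (0x80 | (cp & 0x3F));
  return 3;
}
```

For example, U+201C is encoded as the bytes E2 80 9C and U+201D as E2 80 9D.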
To enter such quote characters under X11, you can change your keyboard mapping using the xmodmap program. The X11 names of the quote characters are "leftsinglequotemark", "rightsinglequotemark", "leftdoublequotemark", "rightdoublequotemark", "singlelowquotemark", "doublelowquotemark".
Note that only recent versions of GNU Emacs support the UTF-8 encoding: Emacs 20 with Mule-UCS, and Emacs 21. As of January 2001, XEmacs doesn’t support the UTF-8 encoding.
The character encoding name can be written in either upper or lower case. Usually upper case is preferred.
Set this to 8bit.
This field is optional. It is only needed if the PO file has plural forms. You can find them by searching for the ‘msgid_plural’ keyword. The format of the plural forms field is described in Plural forms and Translating plural forms.
Invoking the msgmerge Program
msgmerge [option] def.po ref.pot
The msgmerge program merges two Uniforum style .po files together. The def.po file is an existing PO file with translations which will be taken over to the newly created file as long as they still match; comments will be preserved, but extracted comments and file positions will be discarded. The ref.pot file is the last created PO file with up-to-date source references but old translations, or a PO Template file (generally created by xgettext); any translations or comments in the file will be discarded, however dot comments and file positions will be preserved. Where an exact match cannot be found, fuzzy matching is used to produce better results.
Translations referring to old sources.
References to the new sources.
Add directory to the list of directories. Source files are searched relative to this list of directories. The resulting .po file will be written relative to the current directory, though.
Specify an additional library of message translations. See Compendium. This option may be specified more than once.
Update def.po. Do nothing if def.po is already up to date.
Write output to specified file.
The results are written to standard output if no output file is specified or if it is ‘-’.
The result is written back to def.po.
Make a backup of def.po.
Override the usual backup suffix.
The version control method may be selected via the --backup option or through the VERSION_CONTROL environment variable. Here are the values:
Never make backups (even if --backup is given).
Make numbered backups.
Make numbered backups if numbered backups for this file already exist, otherwise make simple backups.
Always make simple backups.
The backup suffix is ‘~’, unless set with --suffix or the SIMPLE_BACKUP_SUFFIX environment variable.
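The suffix rule can be sketched in C. simple_backup_name is a hypothetical helper, not msgmerge's code; it assumes that an explicit --suffix value takes precedence over SIMPLE_BACKUP_SUFFIX, and that an empty value is treated like an absent one.

```c
#include <stdio.h>
#include <stdlib.h>

/* Writes the simple-backup name of FILE into BUF.  SUFFIX_OPTION is
   the --suffix value, or NULL if none was given.  Precedence assumed
   here: --suffix, then SIMPLE_BACKUP_SUFFIX, then "~".  Returns
   snprintf's result.  */
int
simple_backup_name (char *buf, size_t size, const char *file,
                    const char *suffix_option)
{
  const char *suffix = suffix_option;
  if (suffix == NULL || *suffix == '\0')
    suffix = getenv ("SIMPLE_BACKUP_SUFFIX");
  if (suffix == NULL || *suffix == '\0')
    suffix = "~";  /* the documented default */
  return snprintf (buf, size, "%s%s", file, suffix);
}
```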
Apply ref.pot to each of the domains in def.po.
Do not use fuzzy matching when an exact match is not found. This may speed up the operation considerably.
Keep the previous msgids of translated messages, marked with ‘#|’, when adding the fuzzy marker to such messages.
Assume the input files are Java ResourceBundles in Java .properties syntax, not in PO file syntax.
Assume the input files are NeXTstep/GNUstep localized resource files in .strings syntax, not in PO file syntax.
Specify the ‘Language’ field to be used in the header entry. See Header Entry for the meaning of this field. Note: The ‘Language-Team’ and ‘Plural-Forms’ fields are left unchanged. If this option is not specified, the ‘Language’ field is inferred, as best as possible, from the ‘Language-Team’ field.
Specify whether or when to use colors and other text attributes. See The --color option for details.
Specify the CSS style rule file to use for --color. See The --style option for details.
Always write an output file even if it contains no message.
Write the .po file using indented style.
Do not write ‘#: filename:line’ lines.
Generate ‘#: filename:line’ lines (default).
The optional type can be either ‘full’, ‘file’, or ‘never’. If it is not given or ‘full’, it generates the lines with both file name and line number. If it is ‘file’, the line number part is omitted. If it is ‘never’, it completely suppresses the lines (same as --no-location).
Write out a strict Uniforum conforming PO file. Note that this Uniforum format should be avoided because it doesn’t support the GNU extensions.
Write out a Java ResourceBundle in Java .properties syntax. Note that this file format doesn’t support plural forms and silently drops obsolete messages.
Write out a NeXTstep/GNUstep localized resource file in .strings syntax. Note that this file format doesn’t support plural forms.
Set the output page width. Long strings in the output files will be split across multiple lines in order to ensure that each line’s width (= number of screen columns) is less than or equal to the given number.
Do not break long message lines. Message lines whose width exceeds the output page width will not be split into several lines. Only file reference lines which are wider than the output page width will be split.
Generate sorted output. Note that using this option makes it much harder for the translator to understand each message’s context.
Sort output by file location.
Display this help and exit.
Output version information and exit.
Increase verbosity level.
Suppress progress indicators.
For those of you being the lucky users of Emacs, PO mode has been specifically created for providing a cozy environment for editing or modifying PO files. While editing a PO file, PO mode allows for the easy browsing of auxiliary and compendium PO files, as well as for following references into the set of C program sources from which PO files have been derived. It has a few special features, among which are the interactive marking of program strings as translatable, and the validation of PO files with easy repositioning to PO file lines showing errors.
For the beginning, besides main PO mode commands (see Main PO Commands), you should know how to move between entries (see Entry Positioning), and how to handle untranslated entries (see Untranslated Entries).
Completing GNU gettext Installation
Once you have received, unpacked, configured and compiled the GNU gettext distribution, the ‘make install’ command puts in place the programs xgettext, msgfmt, gettext, and msgmerge, as well as their available message catalogs. To top off a comfortable installation, you might also want to make the PO mode available to your Emacs users.
During the installation of the PO mode, you might want to modify your file .emacs, once and for all, so it contains a few lines looking like:
    (setq auto-mode-alist
          (cons '("\\.po\\'\\|\\.po\\." . po-mode) auto-mode-alist))
    (autoload 'po-mode "po-mode"
              "Major mode for translators to edit PO files" t)
Later, whenever you edit some .po file, or any file having the string ‘.po.’ within its name, Emacs loads po-mode.elc (or po-mode.el) as needed, and automatically activates PO mode commands for the associated buffer. The string PO appears in the mode line for any buffer for which PO mode is active. Many PO files may be active at once in a single Emacs session.
If you are using Emacs version 20 or newer, and have already installed the appropriate international fonts on your system, you may also tell Emacs how to determine automatically the coding system of every PO file. This will often (but not always) cause the necessary fonts to be loaded and used for displaying the translations on your Emacs screen. For this to happen, add the lines:
    (modify-coding-system-alist 'file "\\.po\\'\\|\\.po\\."
                                'po-find-file-coding-system)
    (autoload 'po-find-file-coding-system "po-mode")
to your .emacs file. If, with this, you still see boxes instead of international characters, try a different font set (via Shift Mouse button 1).
Main PO Commands
After setting up Emacs with something similar to the lines in Installation, PO mode is activated for a window when Emacs finds a PO file in that window. This makes the window read-only and establishes po-mode-map; PO mode is a genuine Emacs mode, not derived from text mode in any way. Functions found on po-mode-hook, if any, will be executed.
When PO mode is active in a window, the letters ‘PO’ appear in the mode line for that window. The mode line also displays how many entries of each kind are held in the PO file. For example, the string ‘132t+3f+10u+2o’ would tell the translator that the PO file contains 132 translated entries (see Translated Entries), 3 fuzzy entries (see Fuzzy Entries), 10 untranslated entries (see Untranslated Entries) and 2 obsolete entries (see Obsolete Entries). Items with a zero count are not shown. So, in this example, if the fuzzy entries were unfuzzied, the untranslated entries were translated and the obsolete entries were deleted, the mode line would merely display ‘145t’ for the counters.
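The counter string can be pictured as four counts joined by ‘+’, each followed by its letter, with zero counts left out. The Python sketch below only illustrates that display rule; the function name mode_line_counters is hypothetical and is not part of PO mode, which is written in Emacs Lisp.

```python
def mode_line_counters(translated: int, fuzzy: int, untranslated: int, obsolete: int) -> str:
    """Build a counter string such as '132t+3f+10u+2o'.

    Hypothetical helper modelling the display described in the text:
    any item whose count is zero is omitted.
    """
    parts = [(translated, "t"), (fuzzy, "f"), (untranslated, "u"), (obsolete, "o")]
    return "+".join(f"{count}{letter}" for count, letter in parts if count)


print(mode_line_counters(132, 3, 10, 2))  # 132t+3f+10u+2o
# After unfuzzying 3, translating 10 and deleting the 2 obsolete entries:
print(mode_line_counters(145, 0, 0, 0))   # 145t
```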
The main PO commands are those which do not fit into the other categories of subsequent sections. These allow for quitting PO mode or for managing windows in special ways.
_     Undo last modification to the PO file (po-undo).
Q     Quit processing and save the PO file (po-quit).
q     Quit processing, possibly after confirmation (po-confirm-and-quit).
0     Temporarily leave the PO file window (po-other-window).
?, h  Show help about PO mode (po-help).
=     Give some PO file statistics (po-statistics).
V     Batch validate the format of the whole PO file (po-validate).
The command _ (po-undo) interfaces to the Emacs undo facility. See Undoing Changes in The Emacs Editor. Each time _ is typed, modifications which the translator made to the PO file are undone a little more. For the purpose of undoing, each PO mode command is atomic. This is especially true for the RET command: the whole edit made by a single use of this command is undone at once, even if the edit itself involved several actions. However, while in the editing window, one can undo the editing work in much smaller steps.
The commands Q (po-quit) and q (po-confirm-and-quit) are used when the translator is done with the PO file. The former is a bit less verbose than the latter. If the file has been modified, it is saved to disk first. In both cases, and before all this, the commands check whether any untranslated messages remain in the PO file and, if so, the translator is asked whether she really wants to stop working on this PO file. This is the preferred way of getting rid of an Emacs PO file buffer. Merely killing it through the usual command C-x k (kill-buffer) is not the tidiest way to proceed.
The command 0 (po-other-window) is another, softer way to leave PO mode, temporarily. It just moves the cursor to some other Emacs window, and pops one up if necessary. For example, if the translator just got PO mode to show some source context in some other window, she might discover some apparent bug in the program source that needs correction. This command allows the translator to switch roles, become a programmer, and have the cursor right in the window containing the program she wants to modify. By later getting the cursor back in the PO file window, or by asking Emacs to edit this file once again, PO mode is then recovered.
The command h (po-help) displays a summary of all available PO mode commands. The translator should then type any character to resume normal PO mode operations. The command ? has the same effect as h.
The command = (po-statistics) computes the total number of entries in the PO file, the ordinal of the current entry (counted from 1), the number of untranslated entries, the number of obsolete entries, and displays all these numbers.
The command V (po-validate) launches msgfmt in checking and verbose mode over the current PO file. This command first offers to save the current PO file on disk. The msgfmt tool, from GNU gettext, has the purpose of creating a MO file out of a PO file, and PO mode uses the features of this program for checking the overall format of a PO file, as well as all individual entries.
The program msgfmt runs asynchronously with Emacs, so the translator regains control immediately while her PO file is being studied. Error output is collected in the Emacs ‘*compilation*’ buffer, displayed in another window. The regular Emacs command C-x ` (next-error), as well as other usual compile commands, allow the translator to reposition quickly to the offending parts of the PO file. Once the cursor is on the line in error, the translator may decide on any PO mode action which would help correct the error.
Entry Positioning
The cursor in a PO file window is almost always part of an entry. The only exceptions are the special case when the cursor is after the last entry in the file, or when the PO file is empty. The entry where the cursor is found is said to be the current entry. Many PO mode commands operate on the current entry, so moving the cursor does more than allow the translator to browse the PO file; it also selects which entry commands operate on.
Some PO mode commands alter the position of the cursor in a specialized way. A few of these special-purpose positioning commands are described here; the others are described in following sections (for a complete list try C-h m):
.     Redisplay the current entry (po-current-entry).
n     Select the entry after the current one (po-next-entry).
p     Select the entry before the current one (po-previous-entry).
<     Select the first entry in the PO file (po-first-entry).
>     Select the last entry in the PO file (po-last-entry).
m     Record the location of the current entry for later use (po-push-location).
r     Return to a previously saved entry location (po-pop-location).
x     Exchange the current entry location with the previously saved one (po-exchange-location).
Any Emacs command able to reposition the cursor may be used to select the current entry in PO mode, including commands which move by characters, lines, paragraphs, screens or pages, and search commands. However, there is a kind of standard way to display the current entry in PO mode, which usual Emacs commands moving the cursor do not especially try to enforce. The command . (po-current-entry) has the sole purpose of redisplaying the current entry properly, after the current entry has been changed by means external to PO mode, or the Emacs screen otherwise altered.
It is yet to be decided whether PO mode helps the translator, or otherwise irritates her, by forcing a rigid window layout while she is doing her work. We originally had quite precise ideas about how windows should behave, but on the other hand, anyone used to Emacs is often happy to keep full control. Maybe a fixed window layout could be offered as a PO mode option that the translator might activate or deactivate at will, so it could be tried on an experimental basis. If nobody feels a real need for using it, or a compulsion for writing it, we should drop this whole idea. The incentive for doing it should come from translators rather than programmers, as opinions from an experienced translator are surely worth more to me than opinions from programmers thinking about how others should do translation.
The commands n (po-next-entry) and p (po-previous-entry) move the cursor to the entry following, or preceding, the current one. If n is given while the cursor is on the last entry of the PO file, or if p is given while the cursor is on the first entry, no move is done.
The commands < (po-first-entry) and > (po-last-entry) move the cursor to the first entry, or last entry, of the PO file. When the cursor is located past the last entry in a PO file, most PO mode commands will return an error saying ‘After last entry’. Moreover, the commands < and > have the special property of being able to work even when the cursor is not on any PO file entry, and one may use them for nicely correcting this situation. But even these commands will fail on a truly empty PO file. There are development plans for PO mode to interactively fill an empty PO file from sources. See Marking.
The translator may decide, before working on the translation of a particular entry, that she needs to browse the remainder of the PO file, maybe for finding the terminology or phraseology used in related entries. She can of course use the standard Emacs idioms for saving the current cursor location in some register, and use that register for getting back, or else use the location ring.
PO mode offers another approach, by which cursor locations may be saved onto a special stack. The command m (po-push-location) merely adds the location of the current entry to the stack, pushing the already saved locations under the new one. The command r (po-pop-location) consumes the top stack element and repositions the cursor to the entry associated with that top element. This position is then lost, for the next r will move the cursor to the previously saved location, and so on until no locations remain on the stack.
If the translator wants the position to be kept on the location stack, maybe for taking a look at the entry associated with the top element, then going elsewhere with the intent of getting back later, she ought to use m immediately after r.
The command x (po-exchange-location) simultaneously repositions the cursor to the entry associated with the top element of the stack of saved locations, and replaces that top element with the location of the current entry before the move. Consequently, repeating the x command toggles back and forth between two entries. For achieving this, the translator will position the cursor on the first entry, use m, then position to the second entry, and merely use x for making the switch.
Normalizing Strings in Entries
There are many different ways of encoding a particular string into a PO file entry, because there are so many different ways to split and quote multi-line strings, and even to represent special characters by backslash escape sequences. Some features of PO mode rely on the ability of PO mode to scan an already existing PO file for a particular string encoded into the msgid field of some entry. Even if PO mode has internally all the built-in machinery for implementing this recognition easily, doing it fast is technically difficult. To facilitate a solution to this efficiency problem, we decided on a canonical representation for strings.
A conventional representation of strings in a PO file is currently under discussion, and PO mode experiments with a canonical representation. Having both xgettext and PO mode converging towards a uniform way of representing equivalent strings would be useful, as the internal normalization needed by PO mode could be automatically satisfied when using xgettext from GNU gettext. An explicit PO mode normalization should then be necessary only for PO files imported from elsewhere, or for when the convention itself evolves.
So, for achieving normalization of at least the strings of a given PO file needing a canonical representation, the following PO mode command is available:
M-x po-normalize  Tidy the whole PO file by making entries more uniform.
The special command M-x po-normalize, which has no associated keys, revises all entries, ensuring that strings of both original and translated entries use uniform internal quoting in the PO file. It also removes any crumb after the last entry. This command may be useful for PO files freshly imported from elsewhere, or if we ever improve on the canonical quoting format we use. This canonical format is not only meant for getting cleaner PO files, but also for greatly speeding up msgid string lookup for some other PO mode commands.
M-x po-normalize presently makes three passes over the entries. The first implements heuristics for converting PO files for GNU gettext 0.6 and earlier, in which msgid and msgstr fields were using K&R style C string syntax for multi-line strings. These heuristics may fail for comments not related to obsolete entries and ending with a backslash; they also depend on subsequent passes for finalizing the proper commenting of continued lines for obsolete entries. This first pass might disappear once all older PO files have been adjusted. The second and third passes normalize all msgid and msgstr strings respectively. They also clean out those trailing backslashes used by XView’s msgfmt for continued lines.
Having such an explicit normalizing command allows for importing PO files from other sources, but also eases the evolution of the current convention, evolution driven mostly by aesthetic concerns, as of now. It is easy to make suggested adjustments at a later time, as the normalizing command and eventually, other GNU gettext tools should greatly automate conformance. A description of the canonical string format is given below, for the particular benefit of those not having Emacs handy, and who would nevertheless want to handcraft their PO files in nice ways.
Right now, in PO mode, strings are single-line or multi-line. A string goes multi-line if and only if it has embedded newlines, that is, if it matches ‘[^\n]\n+[^\n]’. So, we would have:
msgstr "\n\nHello, world!\n\n\n"
but, replacing the space by a newline, this becomes:
msgstr ""
"\n"
"\n"
"Hello,\n"
"world!\n"
"\n"
"\n"
We are deliberately using an exaggerated example here, to make the point clearer. Usually, multi-lines are not that bad looking. It is probable that we will implement the following suggestion. We might lump together all initial newlines into the empty string, and also all newlines introducing empty lines (that is, in a run of n > 1 newlines, the last n-1 newlines would go together on a separate string), so making the previous example appear:
msgstr "\n\n"
"Hello,\n"
"world!\n"
"\n\n"
There are a few yet undecided little points about string normalization,to be documented in this manual, once these questions settle.
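The current rule and the suggested lumping can be made concrete with a short Python sketch. It assumes that only backslash, double quote, newline and tab need escaping, and the helper names (po_quote, format_msgstr, format_msgstr_lumped) are hypothetical; it illustrates the layouts described above rather than reproducing po-normalize.

```python
import re
from typing import List

# Escapes applied inside a quoted PO string (a simplification: other
# control characters are left as they are).
_ESCAPES = {"\\": "\\\\", '"': '\\"', "\n": "\\n", "\t": "\\t"}


def po_quote(s: str) -> str:
    """Return s as one double-quoted PO string."""
    return '"' + "".join(_ESCAPES.get(ch, ch) for ch in s) + '"'


def is_multiline(s: str) -> bool:
    """A string goes multi-line iff it matches [^\\n]\\n+[^\\n]."""
    return re.search(r"[^\n]\n+[^\n]", s) is not None


def _render(pieces: List[str]) -> str:
    return "msgstr " + "\n".join(po_quote(p) for p in pieces)


def format_msgstr(s: str) -> str:
    """Current rule: an empty first string, then one piece per newline."""
    if not is_multiline(s):
        return _render([s])
    return _render([""] + re.findall(r"[^\n]*\n|[^\n]+", s))


def format_msgstr_lumped(s: str) -> str:
    """Suggested rule: leading newlines, and the extra newlines of each run, go together."""
    if not is_multiline(s):
        return _render([s])
    pieces = re.findall(r"[^\n]+\n?|\n+", s)
    if not s.startswith("\n"):
        pieces.insert(0, "")  # keep the empty first string when there is nothing to lump
    return _render(pieces)


print(format_msgstr("\n\nHello, world!\n\n\n"))
print(format_msgstr("\n\nHello,\nworld!\n\n\n"))
print(format_msgstr_lumped("\n\nHello,\nworld!\n\n\n"))
```

The three calls print, in order, the single-line msgstr, the multi-line layout under the current rule, and the lumped layout shown earlier in this section.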
Translated Entries
Each PO file entry for which the msgstr field has been filled with a translation, and which is not marked as fuzzy (see Fuzzy Entries), is said to be a translated entry. Only translated entries will later be compiled by GNU msgfmt and become usable in programs. Other entry types will be excluded; translation will not occur for them.
Some commands are more specifically related to translated entry processing.
t     Find the next translated entry (po-next-translated-entry).
T     Find the previous translated entry (po-previous-translated-entry).
The commands t (po-next-translated-entry) and T (po-previous-translated-entry) move forwards or backwards, chasing for a translated entry. If none is found, the search is extended and wraps around in the PO file buffer.
Translated entries usually result from the translator having edited in a translation for them (see Modifying Translations). However, if the variable po-auto-fuzzy-on-edit is not nil, the entry having received a new translation first becomes a fuzzy entry, which ought to be later unfuzzied before becoming an official, genuine translated entry. See Fuzzy Entries.
Fuzzy Entries
Each PO file entry may have a set of attributes, which are qualities given a name and explicitly associated with the translation, using a special system comment. One of these attributes has the name fuzzy, and entries having this attribute are said to have a fuzzy translation. They are called fuzzy entries, for short.
Fuzzy entries, even if they count as translated entries for most other purposes, usually call for revision by the translator. They may be produced by applying the program msgmerge to update an older translated PO file according to a new PO template file, when this tool hypothesises that some new msgid has been modified only slightly from an older one, and chooses to pair what it thinks to be the old translation with the new modified entry. The slight alteration in the original string (the msgid string) should often be reflected in the translated string, and this requires the intervention of the translator. For this reason, msgmerge might mark some entries as being fuzzy.
Also, the translator may decide herself to mark an entry as fuzzy for her own convenience, when she wants to remember that the entry has to be revisited later. So, some commands are more specifically related to fuzzy entry processing.
f     Find the next fuzzy entry (po-next-fuzzy-entry).
F     Find the previous fuzzy entry (po-previous-fuzzy-entry).
TAB   Remove the fuzzy attribute of the current entry (po-unfuzzy).
The commands f (po-next-fuzzy-entry) and F (po-previous-fuzzy-entry) move forwards or backwards, chasing for a fuzzy entry. If none is found, the search is extended and wraps around in the PO file buffer.
The command TAB (po-unfuzzy) removes the fuzzy attribute associated with an entry, usually leaving it translated. Further, if the variable po-auto-select-on-unfuzzy is not nil, the TAB command will automatically chase for another interesting entry to work on. The initial value of po-auto-select-on-unfuzzy is nil.
The initial value of po-auto-fuzzy-on-edit is nil. However, if the variable po-auto-fuzzy-on-edit is set to t, any entry edited through the RET command is marked fuzzy, as a way to ensure some kind of double check, later. In this case, the usual paradigm is that an entry becomes fuzzy (if not already) whenever the translator modifies it. If she is satisfied with the translation, she then uses TAB to pick another entry to work on, clearing the fuzzy attribute at the same time. If she is not satisfied yet, she merely uses SPC to chase another entry, leaving the entry fuzzy.
The translator may also use the DEL command (po-fade-out-entry) over any translated entry to mark it as being fuzzy, when she wants to leave herself a reminder to return to this entry later.
Also, when time comes to quit working on a PO file buffer with the q command, the translator is asked for confirmation if fuzzy strings still exist.
Untranslated Entries
When xgettext originally creates a PO file, unless told otherwise, it initializes the msgid field with the untranslated string, and leaves the msgstr string empty. Such entries, having an empty translation, are said to be untranslated entries. Later, when the programmer slightly modifies some string right in the program, this change is reflected in the PO file by the appearance of a new untranslated entry for the modified string.
The usual commands moving from entry to entry consider untranslated entries on the same level as active entries. Untranslated entries are easily recognizable by the fact they end with ‘msgstr ""’.
The work of the translator might be (quite naively) seen as the process of seeking an untranslated entry, editing a translation for it, and repeating these actions until no untranslated entries remain. Some commands are more specifically related to untranslated entry processing.
u     Find the next untranslated entry (po-next-untranslated-entry).
U     Find the previous untranslated entry (po-previous-untranslated-entry).
k     Turn the current entry into an untranslated one (po-kill-msgstr).
The commands u (po-next-untranslated-entry) and U (po-previous-untranslated-entry) move forwards or backwards, chasing for an untranslated entry. If none is found, the search is extended and wraps around in the PO file buffer.
An entry can be turned back into an untranslated entry by merely emptying its translation, using the command k (po-kill-msgstr). See Modifying Translations.
Also, when time comes to quit working on a PO file buffer with the q command, the translator is asked for confirmation if some untranslated string still exists.
Obsolete Entries
By obsolete PO file entries, we mean those entries which are commented out, usually by msgmerge when it found that the translation is not needed anymore by the package being localized.
The usual commands moving from entry to entry consider obsolete entries on the same level as active entries. Obsolete entries are easily recognizable by the fact that all their lines start with #, even those lines containing msgid or msgstr.
Commands exist for emptying the translation or reinitializing it to the original untranslated string. Commands interfacing with the kill ring may force some previously saved text into the translation. The user may interactively edit the translation. All these commands may apply to obsolete entries, carefully leaving the entry obsolete after the fact.
Moreover, some commands are more specifically related to obsolete entry processing.
o     Find the next obsolete entry (po-next-obsolete-entry).
O     Find the previous obsolete entry (po-previous-obsolete-entry).
DEL   Make an active entry obsolete, or zap out an obsolete entry (po-fade-out-entry).
The commands o (po-next-obsolete-entry) and O (po-previous-obsolete-entry) move forwards or backwards, chasing for an obsolete entry. If none is found, the search is extended and wraps around in the PO file buffer.
PO mode does not provide ways for un-commenting an obsolete entry and making it active, because this would reintroduce an original untranslated string which does not correspond to any marked string in the program sources. This goes with the philosophy of never introducing useless msgid values.
However, it is possible to comment out an active entry, so making it obsolete. GNU gettext utilities will later react to the disappearance of a translation by using the untranslated string. The command DEL (po-fade-out-entry) pushes the current entry a little further towards annihilation. If the entry is active (it is a translated entry), then it is first made fuzzy. If it is already fuzzy, then the entry is merely commented out, with confirmation. If the entry is already obsolete, then it is completely deleted from the PO file. It is easy to recycle the translation so deleted into some other PO file entry, usually one which is untranslated. See Modifying Translations.
Here is a quite interesting problem to solve for later development of PO mode, for those nights you are not sleepy. The idea would be that PO mode might become bright enough, one of these days, to make good guesses at retrieving the most probable candidate, among all obsolete entries, for initializing the translation of a newly appeared string. I think it might be quite a hard problem to do this algorithmically, as we have to develop good and efficient measures of string similarity. Right now, PO mode leaves the decision entirely to the translator when the time comes to find the adequate obsolete translation; it merely tries to provide handy tools to help her do so.
Modifying Translations
PO mode prevents direct modification of the PO file by the usual means Emacs gives for altering a buffer’s contents. By doing so, it aims to help the translator avoid little clerical errors about the overall file format, or the proper quoting of strings, as those errors would be easy to make. Other kinds of errors are still possible, but some may be caught and diagnosed by the batch validation process, which the translator may always trigger with the V command. For all other errors, the translator has to rely on her own judgment, and also on the linguistic reports submitted to her by users of the translated package who share her mother tongue.
When the time comes to create a translation, or to correct an error diagnosed mechanically or reported by a user, the translator has to use the following commands for modifying the translations.
RET   Interactively edit the translation (po-edit-msgstr).
LFD   Reinitialize the translation with the original, untranslated string (po-msgid-to-msgstr).
k     Save the translation on the kill ring, and delete it (po-kill-msgstr).
w     Save the translation on the kill ring, without deleting it (po-kill-ring-save-msgstr).
y     Replace the translation, taking the new one from the kill ring (po-yank-msgstr).
The command RET (po-edit-msgstr) opens a new Emacs window meant to edit in a new translation, or to modify an already existing translation. The new window contains a copy of the translation taken from the current PO file entry, ready for editing, expunged of all quoting marks, fully modifiable and with the complete extent of Emacs modifying commands. When the translator is done with her modifications, she may use C-c C-c to close the subedit window with the automatically requoted results, or C-c C-k to abort her modifications. See Subedit, for more information.
The command LFD (po-msgid-to-msgstr) initializes, or reinitializes, the translation with the original string. This command is normally used when the translator wants to redo a fresh translation of the original string, disregarding any previous work.
It is possible to arrange that, whenever an untranslated entry is edited, the LFD command is automatically executed. If you set po-auto-edit-with-msgid to t, the translation gets initialized with the original string, in case none exists already. The default value for po-auto-edit-with-msgid is nil.
In fact, whether it is best to start a translation with an empty string, or rather with a copy of the original string, is a matter of taste or habit. Sometimes, the source language and the target language are so different that it is simply best to start writing on an empty page. At other times, the source and target languages are so close that it would be a waste to retype a number of words already written in the original string. A translator may also like having the original string right under her eyes, as she will progressively overwrite the original text with the translation, even if this requires some extra editing work to get rid of the original.
The command k (po-kill-msgstr) merely empties the translation string, so turning the entry into an untranslated one. But while doing so, its previous contents are set aside in a special place, known as the kill ring. The command w (po-kill-ring-save-msgstr) also has the effect of taking a copy of the translation onto the kill ring, but it otherwise leaves the entry alone, and does not remove the translation from the entry. Both commands use exactly the Emacs kill ring, which is shared between buffers, and which is already well known to Emacs lovers.
The translator may use k or w many times in the course of her work, as the kill ring may hold several saved translations. From the kill ring, strings may later be reinserted in various Emacs buffers. In particular, the kill ring may be used for moving translation strings between different entries of a single PO file buffer, or, if the translator is handling many such buffers at once, even between PO files.
To facilitate exchanges with buffers which are not in PO mode, the translation string put on the kill ring by the k command is fully unquoted before being saved: external quotes are removed, multi-line strings are concatenated, and backslash escape sequences are turned into their corresponding characters. In the special case of obsolete entries, the translation is also uncommented prior to saving.
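The unquoting steps listed above can be approximated in a few lines of Python. This is a sketch under simplifying assumptions, not PO mode's Emacs Lisp: unquote_po is a hypothetical name, only the \n, \t, \" and \\ escapes are decoded (any other escape just loses its backslash), and the input is assumed to be the msgstr lines of a single entry.

```python
import re

# Escapes decoded by this sketch; any other escape just loses its backslash.
_UNESCAPES = {"n": "\n", "t": "\t", '"': '"', "\\": "\\"}
_QUOTED = re.compile(r'"((?:[^"\\]|\\.)*)"')


def _unescape(segment: str) -> str:
    return re.sub(r"\\(.)", lambda m: _UNESCAPES.get(m.group(1), m.group(1)), segment)


def unquote_po(msgstr_lines: str) -> str:
    """Plain text of a msgstr, roughly as k saves it on the kill ring.

    Only the double-quoted segments are kept, so the 'msgstr' keyword
    and an obsolete entry's '#~' prefix are dropped; the segments are
    unescaped and concatenated.
    """
    return "".join(_unescape(seg) for seg in _QUOTED.findall(msgstr_lines))


active = 'msgstr ""\n"Hello,\\n"\n"world!\\n"'
print(repr(unquote_po(active)))  # 'Hello,\nworld!\n'

obsolete = '#~ msgstr "Say \\"hi\\""'
print(unquote_po(obsolete))      # Say "hi"
```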
The command y (po-yank-msgstr) completely replaces the translation of the current entry by a string taken from the kill ring. Following Emacs terminology, we then say that the replacement string is yanked into the PO file buffer. See Yanking in The Emacs Editor. The first time y is used, the translation receives the value of the most recent addition to the kill ring. If y is typed once again, immediately, without intervening keystrokes, the translation just inserted is taken away and replaced by the second most recent addition to the kill ring. By repeating y many times in a row, the translator may travel along the kill ring for saved strings, until she finds the string she really wanted.
When a string is yanked into a PO file entry, it is fully and automatically requoted to comply with the format PO files should have. Further, if the entry is obsolete, PO mode then appropriately pushes the inserted string inside comments. Once again, translators should not burden themselves with quoting considerations, besides, of course, the requirements that the program using the translated string places on the string itself.
Note that k and w are not the only commands pushing strings on the kill ring, as almost any PO mode command replacing translation strings (or the translator comments) automatically saves the old string on the kill ring. The main exceptions to this general rule are the yanking commands themselves.
To better illustrate the operation of killing and yanking, let’s use an actual example, taken from a common situation. When the programmer slightly modifies some string right in the program, his change is later reflected in the PO file by the appearance of a new untranslated entry for the modified string, and the fact that the entry translating the original or unmodified string becomes obsolete. In many cases, the translator might spare herself some work by retrieving the unmodified translation from the obsolete entry, then initializing the untranslated entry msgstr field with this retrieved translation. Once this is done, the obsolete entry is not wanted anymore, and may be safely deleted.
When the translator finds an untranslated entry and suspects that a slight variant of the translation exists, she immediately uses m to mark the current entry location, then starts chasing obsolete entries with o, hoping to find some translation corresponding to the unmodified string. Once found, she uses the DEL command for deleting the obsolete entry, knowing that DEL also kills the translation, that is, pushes the translation on the kill ring. Then, r returns to the initial untranslated entry, and y then yanks the saved translation right into the msgstr field. The translator is then free to use RET for fine tuning the translation contents, and maybe to later use u, then m again, for going on with the next untranslated string.
When some sequence of keys has to be typed over and over again, the translator may find it useful to become better acquainted with the Emacs capability of learning these sequences and playing them back on request. See Keyboard Macros in The Emacs Editor.
Modifying Comments
Any translation work done seriously will raise many linguistic difficulties, for which decisions have to be made, and the choices further documented. These documents may be saved within the PO file in the form of translator comments, which the translator is free to create, delete, or modify at will. These comments may be useful to herself when she returns to this PO file after a while.
Comments not having whitespace after the initial ‘#’, for example, those beginning with ‘#.’ or ‘#:’, are not translator comments; they are exclusively created by other gettext tools. So, the commands below will never alter such system-added comments; they are not meant for the translator to modify. See PO Files.
The following commands are somewhat similar to those modifying translations, so the general indications given for those apply here. See Modifying Translations.
#     Interactively edit the translator comments (po-edit-comment).
K     Save the translator comments on the kill ring, and delete them (po-kill-comment).
W     Save the translator comments on the kill ring, without deleting them (po-kill-ring-save-comment).
Y     Replace the translator comments, taking the new ones from the kill ring (po-yank-comment).
These commands parallel PO mode commands for modifying the translation strings, and behave much the same way as they do, except that they handle the part of PO file comments meant for translator usage, rather than the translation strings. So, if the descriptions given below are slightly succinct, it is because the full details have already been given. See Modifying Translations.
The command # (po-edit-comment) opens a new Emacs window containing a copy of the translator comments on the current PO file entry. If there are no such comments, PO mode understands that the translator wants to add a comment to the entry, and she is presented with an empty screen. Comment marks (#) and the space following them are automatically removed before editing, and reinstated after. For translator comments pertaining to obsolete entries, the uncommenting and recommenting operations are done twice. Once in the editing window, the keys C-c C-c allow the translator to tell she is finished with editing the comment. See Subedit, for further details.
Functions found on po-subedit-mode-hook, if any, are executed after the string has been inserted in the edit buffer.
The command K (po-kill-comment) gets rid of all translator comments, while saving those comments on the kill ring. The command W (po-kill-ring-save-comment) puts a copy of the translator comments on the kill ring, but leaves them undisturbed in the current entry. The command Y (po-yank-comment) completely replaces the translator comments by a string taken at the front of the kill ring. When this command is immediately repeated, the comments just inserted are withdrawn, and replaced by other strings taken along the kill ring.
On the kill ring, all strings have the same nature. There is no distinction between translation strings and translator comment strings. So, for example, let’s presume the translator has just finished editing a translation, and wants to create a new translator comment to document why the previous translation was not good, just to remember what the problem was. Foreseeing that she will do that in her documentation, the translator may want to quote the previous translation in her translator comments. To do so, she may initialize the translator comments with the previous translation, still at the head of the kill ring. Because editing already pushed the previous translation on the kill ring, she merely has to type M-w prior to #, and the previous translation will be right there, all ready for being introduced by some explanatory text.
On the other hand, presume there are some translator comments already and that the translator wants to add to those comments, instead of wholly replacing them. Then, she should edit the comment right away with #. Once inside the editing window, she can use the regular Emacs commands C-y (yank) and M-y (yank-pop) to get the previous translation where she likes.
Details of Sub Edition
The PO subedit minor mode has a few peculiarities worth describing in fuller detail. It installs a few commands over the usual editing set of Emacs, which are described below.
C-c C-c  Complete editing (po-subedit-exit).
C-c C-k  Abort editing (po-subedit-abort).
C-c C-a  Consult auxiliary PO files (po-subedit-cycle-auxiliary).
The window’s contents represent a translation for a given message, or a translator comment. The translator may modify this window to her heart’s content. Once this is done, the command C-c C-c (po-subedit-exit) may be used to return the edited translation into the PO file, replacing the original translation, even if it moved out of sight or if buffers were switched.
If the translator becomes unsatisfied with her translation or comment, to the extent that she prefers keeping what existed prior to the RET or # command, she may use the command C-c C-k (po-subedit-abort) to simply discard the edit, while preserving the original translation or comment. Another way would be for her to exit normally with C-c C-c, then type U once to undo the whole effect of the last edit.
The command C-c C-a (po-subedit-cycle-auxiliary) allows for glancing through translations already achieved in other languages, directly while editing the current translation. This may be quite convenient when the translator is fluent in many languages, but of course, only makes sense when such completed auxiliary PO files are already available to her (see Auxiliary).
Functions found on po-subedit-mode-hook, if any, are executed after the string has been inserted in the edit buffer.
While editing her translation, the translator should take care not to insert unwanted RET (newline) characters at the end of the translated string when they are not meant to be there, and not to remove such characters when they are required. Since these characters are not visible in the editing buffer, they are easily introduced by mistake. To help her, RET automatically puts the character < at the end of the string being edited, but this < is not really part of the string. On exiting the editing window with C-c C-c, PO mode automatically removes such < and all whitespace added after it. If the translator adds characters after the terminating <, it loses its delimiting property and becomes an integral part of the string. If she removes the delimiting <, then the edited string is taken as is, with all trailing newlines, even if invisible. Also, if the translated string ought to end itself with a genuine <, then the delimiting < may not be removed; so the string should appear, in the editing window, as ending with two < in a row.
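The delimiter rule can be written down as a small function. This is a hypothetical Python model of the behavior described above (the name finish_subedit is invented), not PO mode’s Emacs Lisp code:

```python
import re

# A '<' is treated as the delimiter only when nothing but whitespace
# follows it up to the end of the edit buffer.
_DELIMITER = re.compile(r"<\s*\Z")


def finish_subedit(buffer_text: str) -> str:
    """Model of the string stored when a subedit is exited with C-c C-c."""
    match = _DELIMITER.search(buffer_text)
    if match:
        # Drop the delimiting '<' together with any whitespace after it.
        return buffer_text[:match.start()]
    # No delimiter at the end: keep the text as is, trailing newlines included.
    return buffer_text


print(repr(finish_subedit("Fichier introuvable\n<\n")))  # 'Fichier introuvable\n'
print(repr(finish_subedit("a <<")))                       # 'a <'
```

The second call shows why a genuine trailing < needs a second one after it: only the last < acts as the delimiter.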
When a translation (or a comment) is being edited, the translator may move the cursor back into the PO file buffer and freely move to other entries, browsing at will. If, with an edit pending, the translator wanders in the PO file buffer, she may decide to start modifying another entry. Each entry being edited has its own subedit buffer. It is possible to simultaneously edit the translation and the comment of a single entry, or to edit entries in different PO files, all at once. Typing RET on a field already being edited merely resumes that particular edit. Still, the translator had better be comfortable handling many Emacs windows!
Pending subedits may be completed or aborted in any order, regardless of how or when they were started. When many subedits are pending and the translator asks to quit the PO file (with the q command), subedits are automatically resumed one at a time, so she may decide for each of them.
C Sources Context
PO mode is particularly powerful when used with PO files created through GNU gettext utilities, as those utilities insert special comments in the PO files they generate. Some of these special comments relate the PO file entry to exactly where the untranslated string appears in the program sources.
When the translator gets to an untranslated entry, she is fairly often faced with an original string which is not as informative as it normally should be, being succinct, cryptic, or otherwise ambiguous. Before choosing how to translate the string, she needs to understand better what the string really means and how tight the translation has to be. Most of the time, when problems arise, the only way left to make her judgment is to look at the true program sources from which this string originated, searching for surrounding comments the programmer might have put in there, and looking around for helpful clues of any kind.
Surely, when looking at program sources, the translator will receive more help if she is a fluent programmer. However, even if she is not versed in programming and feels a little lost in C code, the translator should not be shy about taking a look, once in a while. It is most probable that she will still be able to find some of the hints she needs. She will quickly learn to feel comfortable in program code, paying more attention to the programmer’s comments, variable and function names (if he took care to choose them well), and overall organization, than to the program code itself.
The following commands are meant to help the translator in getting program source context for a PO file entry.
s  Resume the display of a program source context, or cycle through them (po-cycle-source-reference).
M-s  Display a program source context selected by menu (po-select-source-reference).
S  Add a directory to the search path for source files (po-consider-source-path).
M-S  Delete a directory from the search path for source files (po-ignore-source-path).
The commands s (po-cycle-source-reference) and M-s (po-select-source-reference) both open another window displaying some source program file, already positioned so that it shows an actual use of the string to be translated. By doing so, the command gives source program context for the string. But if the entry has no source context references, or if all references are unresolved along the search path for program sources, then the command diagnoses this as an error.
Even if s (or M-s) opens a new window, the cursor stays in the PO file window. If the translator really wants to get into the program source window, she ought to do it explicitly, maybe by using command O.
When s is typed for the first time, or for a PO file entry which is different from the last one used for getting source context, then the command reacts by giving the first context available for this entry, if any. If some context has already been recently displayed for the current PO file entry, and the translator wandered off to do other things, typing s again will merely resume, in another window, the context last displayed. In particular, if the translator moved the cursor away from the context in the source file, the command will bring the cursor back to the context. By using s many times in a row, with no other commands intervening, PO mode will cycle to the next available contexts for this particular entry, getting back to the first context once the last has been shown.
The command M-s behaves differently. Instead of cycling through references, it lets the translator choose a particular reference among many, and displays that reference. It is best used with completion: if the translator types TAB immediately after M-s, in response to the question, she will be offered a menu of all possible references, as a reminder of which are the acceptable answers. This command is useful only where there are really many contexts available for a single string to translate.
Program source files are usually found relative to where the PO file stands. As a special provision, when this fails, the file is also looked for, but relative to the directory immediately above it. Those two cases take proper care of most PO files. However, it might happen that a PO file has been moved, or is edited in a different place than its normal location. When this happens, the translator should tell PO mode in which directory the genuine PO file normally sits. Many such directories may be specified, and all together, they constitute what is called the search path for program sources. The command S (po-consider-source-path) is used to interactively enter a new directory at the front of the search path, and the command M-S (po-ignore-source-path) is used to select, with completion, one of the directories she does not want anymore on the search path.
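The lookup described above can be sketched as follows. This hypothetical Python model (resolve_reference is an invented name) assumes the PO file’s own directory is tried before the search-path entries, which may not match PO mode’s actual order; the exists parameter only lets the sketch run without a real source tree:

```python
from pathlib import Path
from typing import Callable, Iterable, Optional


def resolve_reference(po_file: str, reference: str,
                      search_path: Iterable[str] = (),
                      exists: Callable[[Path], bool] = Path.is_file) -> Optional[Path]:
    """Locate the source file named by a '#:' reference like 'src/hello.c:200'.

    Each base directory (the PO file's own, then every search-path entry)
    is tried directly and then through its parent directory.
    """
    name, sep, line = reference.rpartition(":")
    file_name = name if sep and line.isdigit() else reference  # drop ':200'
    bases = [Path(po_file).parent] + [Path(d) for d in search_path]
    for base in bases:
        for directory in (base, base.parent):
            candidate = directory / file_name
            if exists(candidate):
                return candidate
    return None
```

With a PO file at /proj/po/fr.po, the reference src/hello.c:200 is found as /proj/src/hello.c through the parent-directory rule; a copy of fr.po edited elsewhere only finds it once /proj/po is on the search path.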
Consulting Auxiliary PO Files
PO mode is able to help the knowledgeable translator, fluent in many languages, take advantage of translations already achieved in other languages she happens to know. It provides these other language translations as additional context for her own work. Moreover, it has features to ease the production of translations for many languages at once, for translators preferring to work in this way.
An auxiliary PO file is an existing PO file meant for the same package the translator is working on, but targeted to a different native language. Commands exist for declaring and handling auxiliary PO files, and also for showing contexts for the entry under work.
Here are the auxiliary file commands available in PO mode.
a  Seek auxiliary files for another translation for the same entry (po-cycle-auxiliary).
C-c C-a  Switch to a particular auxiliary file (po-select-auxiliary).
A  Declare this PO file as an auxiliary file (po-consider-as-auxiliary).
M-A  Remove this PO file from the list of auxiliary files (po-ignore-as-auxiliary).
Command A (po-consider-as-auxiliary) adds the current PO file to the list of auxiliary files, while command M-A (po-ignore-as-auxiliary) just removes it.
The command a (po-cycle-auxiliary) seeks all auxiliary PO files, round-robin, searching for a translated entry in some other language having an msgid field identical to the one for the current entry. The found PO file, if any, takes the place of the current PO file in the display (its window gets on top). Before doing so, the current PO file is also made into an auxiliary file, if not already. So, a in this newly displayed PO file will seek another PO file, and so on, so repeating a will eventually yield back the original PO file.
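The round-robin search can be modeled as follows. In this hypothetical Python sketch, each auxiliary file is reduced to a mapping from msgid to msgstr, and the list includes the file being edited, since PO mode makes it an auxiliary file before switching away:

```python
from typing import Dict, List, Optional, Tuple

Catalog = Dict[str, str]  # msgid -> msgstr; "" means untranslated


def cycle_auxiliary(files: List[Tuple[str, Catalog]], current: int,
                    msgid: str) -> Optional[int]:
    """Index of the next file after `current`, round-robin, that has a
    non-empty translation for `msgid`; None if no file has one."""
    count = len(files)
    for step in range(1, count + 1):
        index = (current + step) % count
        if files[index][1].get(msgid):
            return index
    return None


# Hypothetical auxiliary set; fr.po is the file being edited.
files = [
    ("fr.po", {"Open": "Ouvrir"}),
    ("de.po", {"Open": "Öffnen"}),
    ("es.po", {"Open": ""}),       # untranslated here, so it is skipped
    ("it.po", {"Open": "Apri"}),
]
print(files[cycle_auxiliary(files, 0, "Open")][0])  # de.po
```

Starting from it.po, the next step wraps around to fr.po, which is how repeating a eventually comes back to the original file.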
The command C-c C-a (po-select-auxiliary) asks the translator for her choice of a particular auxiliary file, with completion, and then switches to that selected PO file. The command also checks if the selected file has an msgid field identical to the one for the current entry, and if yes, this entry becomes current. Otherwise, the cursor of the selected file is left undisturbed.
For all this to work fully, auxiliary PO files have to be normalized, in the sense that msgid fields should be written in exactly the same way. The same string can be written as an msgid field in various ways, and differing spellings break the proper behaviour of the auxiliary file commands of PO mode. This is not expected to be much of a problem in practice, as most existing PO files have their msgid entries written by the same GNU gettext tools.
However, PO files initially created by PO mode itself, while marking strings in source files, are normalized differently. So are PO files resulting from the ‘M-x normalize’ command. Until these discrepancies between PO mode and other GNU gettext tools get fully resolved, the translator should stay aware of normalization issues.
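To see why a textual comparison is fragile, recall that a PO field may be split into several quoted segments. This simplified Python sketch (decode_po_field is hypothetical, and it expands only the newline, tab, quote and backslash escapes) shows two spellings that differ as text but denote the same string:

```python
import re

# One quoted PO segment: "..." with backslash escapes inside.
_SEGMENT = re.compile(r'"((?:[^"\\]|\\.)*)"')
_ESCAPES = {"n": "\n", "t": "\t", '"': '"', "\\": "\\"}


def decode_po_field(field: str) -> str:
    """Join the quoted segments of a PO field and expand common escapes.

    Simplified: only the newline, tab, quote and backslash escapes are expanded."""
    raw = "".join(_SEGMENT.findall(field))
    return re.sub(r"\\(.)", lambda m: _ESCAPES.get(m.group(1), m.group(0)), raw)


one_line = r'msgid "Report bugs to <%s>.\n"'
split = 'msgid ""\n' + r'"Report bugs "' + "\n" + r'"to <%s>.\n"'
print(one_line == split)                                    # False
print(decode_po_field(one_line) == decode_po_field(split))  # True
```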
Using Translation Compendia
A compendium is a special PO file containing a set of translations recurring in many different packages. The translator can use gettext tools to build a new compendium, to add entries to her compendium, and to initialize untranslated entries, or to update already translated entries, from translations kept in the compendium.
• Creating Compendia: Merging translations for later use
• Using Compendia: Using older translations if they fit
Creating Compendia
Basically every PO file consisting of translated entries only can be declared as a valid compendium. Often the translator wants to have special compendia; let’s consider two cases: concatenating PO files and extracting a message subset from a PO file.
To concatenate several valid PO files into one compendium file you can use ‘msgcomm’ or ‘msgcat’ (the latter preferred):
msgcat -o compendium.po file1.po file2.po
By default, msgcat will accumulate divergent translations for the same string. Those occurrences will be marked as fuzzy and highly visibly decorated; calling msgcat on file1.po:
#: src/hello.c:200
#, c-format
msgid "Report bugs to <%s>.\n"
msgstr "Comunicar `bugs' a <%s>.\n"
and file2.po:
#: src/bye.c:100
#, c-format
msgid "Report bugs to <%s>.\n"
msgstr "Comunicar \"bugs\" a <%s>.\n"
will result in:
#: src/hello.c:200 src/bye.c:100
#, fuzzy, c-format
msgid "Report bugs to <%s>.\n"
msgstr ""
"#-#-#-#-# file1.po #-#-#-#-#\n"
"Comunicar `bugs' a <%s>.\n"
"#-#-#-#-# file2.po #-#-#-#-#\n"
"Comunicar \"bugs\" a <%s>.\n"
The translator will have to resolve this “conflict” manually; she has to decide whether the first or the second version is appropriate (or provide a new translation), to delete the “marker lines”, and finally to remove the fuzzy mark.
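When resolving such a conflict, it can help to pull the alternatives apart first. A hypothetical Python helper that works on the decoded msgstr (escapes already expanded); it does not depend on the exact whitespace inside the marker lines:

```python
import re
from typing import Dict, Optional

# Marker lines such as "#-#-#-#-# file1.po #-#-#-#-#".
_MARKER = re.compile(r"#-#-#-#-#\s+(.+?)\s+#-#-#-#-#")


def split_alternatives(msgstr: str) -> Dict[str, str]:
    """Map each input file name to the translation msgcat recorded for it."""
    alternatives: Dict[str, str] = {}
    current: Optional[str] = None
    for line in msgstr.splitlines(keepends=True):
        match = _MARKER.fullmatch(line.rstrip("\n"))
        if match:
            current = match.group(1)      # a new alternative starts here
            alternatives[current] = ""
        elif current is not None:
            alternatives[current] += line
    return alternatives


conflict = ("#-#-#-#-# file1.po #-#-#-#-#\n"
            "Comunicar `bugs' a <%s>.\n"
            "#-#-#-#-# file2.po #-#-#-#-#\n"
            'Comunicar "bugs" a <%s>.\n')
```

For the merged entry above, split_alternatives(conflict) yields one translation per input file, ready to be compared side by side.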
If the translator knows in advance that the first translation found for a message is always the best one, she can use the ‘--use-first’ switch:
msgcat --use-first -o compendium.po file1.po file2.po
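The difference between the default behavior and --use-first can be modeled as follows. This is a simplified Python sketch, not msgcat’s code: catalogs are reduced to msgid-to-msgstr mappings, and the comments, flags and file positions that msgcat also accumulates are ignored:

```python
from typing import Dict, List, Tuple

Catalog = Dict[str, str]  # msgid -> msgstr; "" means untranslated


def concatenate(inputs: List[Tuple[str, Catalog]],
                use_first: bool = False) -> Dict[str, Tuple[str, bool]]:
    """Return msgid -> (msgstr, fuzzy) for the combined catalogs."""
    found: Dict[str, List[Tuple[str, str]]] = {}
    for name, catalog in inputs:
        for msgid, msgstr in catalog.items():
            versions = found.setdefault(msgid, [])
            if msgstr:  # untranslated copies contribute nothing
                versions.append((name, msgstr))
    result: Dict[str, Tuple[str, bool]] = {}
    for msgid, versions in found.items():
        if not versions:
            result[msgid] = ("", False)
        elif use_first or len({text for _, text in versions}) == 1:
            result[msgid] = (versions[0][1], False)
        else:
            # Divergent translations: keep them all behind marker lines, fuzzy.
            merged = "".join(f"#-#-#-#-# {name} #-#-#-#-#\n{text}"
                             for name, text in versions)
            result[msgid] = (merged, True)
    return result


key = "Report bugs to <%s>.\n"
file1 = ("file1.po", {key: "Comunicar `bugs' a <%s>.\n"})
file2 = ("file2.po", {key: 'Comunicar "bugs" a <%s>.\n'})
```

Here concatenate([file1, file2]) produces a fuzzy, marker-decorated entry, while concatenate([file1, file2], use_first=True) simply keeps file1.po’s translation.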
A good compendium file must not contain fuzzy or untranslated entries. If input files are “dirty” you must preprocess the input files or postprocess the result using ‘msgattrib --translated --no-fuzzy’.
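The cleanup done by ‘msgattrib --translated --no-fuzzy’ amounts to a simple filter over entries. A hypothetical Python model, ignoring plural forms and obsolete entries:

```python
from dataclasses import dataclass, field
from typing import List, Set


@dataclass
class Entry:
    msgid: str
    msgstr: str
    flags: Set[str] = field(default_factory=set)  # e.g. {"fuzzy", "c-format"}


def compendium_ready(entries: List[Entry]) -> List[Entry]:
    """Keep only translated, non-fuzzy entries."""
    return [e for e in entries if e.msgstr and "fuzzy" not in e.flags]


entries = [
    Entry("Open", "Ouvrir"),
    Entry("Close", ""),                        # untranslated: dropped
    Entry("Save", "Enregistrer", {"fuzzy"}),   # fuzzy: dropped
    Entry("Quit", "Quitter", {"c-format"}),
]
print([e.msgid for e in compendium_ready(entries)])  # ['Open', 'Quit']
```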
Nobody wants to translate the same messages again and again; thus you may wish to have a compendium file containing getopt.c messages.
To extract a message subset (e.g., all getopt.c messages) from an existing PO file into one compendium file you can use ‘msggrep’:
msggrep --location src/getopt.c -o compendium.po file.po
Using Compendia
You can use a compendium file to initialize a translation from scratch or to update an already existing translation.
To initialize a new translation file: since no PO file with translations exists yet, the translator can simply use /dev/null to fake the “old” translation file.
msgmerge --compendium compendium.po -o file.po /dev/null file.pot
To update an existing translation file: concatenate the compendium file(s) and the existing PO file, merge the result with the POT file, and remove the obsolete entries (optional, here done using ‘msgattrib’):
msgcat --use-first -o update.po compendium1.po compendium2.po file.po
msgmerge update.po file.pot | msgattrib --no-obsolete > file.po
Manipulating PO Files
Sometimes it is necessary to manipulate PO files in a way that is better performed automatically than by hand. GNU gettext includes a complete set of tools for this purpose.
When merging two packages into a single package, the resulting POT file will be the concatenation of the two packages’ POT files. Thus the maintainer must concatenate the two existing package translations into a single translation catalog, for each language. This is best performed using ‘msgcat’. It is then the translators’ duty to deal with any possible conflicts that arose during the merge.
When a translator takes over the translation job from another translator but uses a different character encoding in her locale, she will convert the catalog to her character encoding. This is best done through the ‘msgconv’ program.
When a maintainer takes a source file with tagged messages from another package, he should also take the existing translations for this source file (and not let the translators do the same job twice). One way to do this is through ‘msggrep’; another is to create a POT file for that source file and use ‘msgmerge’.
When a translator wants to adjust some translation catalog for a special dialect or orthography (for example, German as written in Switzerland versus German as written in Germany), she needs to apply some text processing to every message in the catalog. The tool for doing this is ‘msgfilter’.
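As an illustration of such processing, here is a hypothetical msgfilter filter in Python for the Swiss case: Swiss Standard German writes ‘ss’ where German in Germany uses ‘ß’. The script name swiss_filter.py is invented, and the sketch assumes a UTF-8 catalog. It reads the whole translation and writes it back without adding a newline, because msgfilter filters must cope with a last line that lacks one:

```python
import os
import sys


def to_swiss_orthography(text: str) -> str:
    """Swiss Standard German does not use 'ß'; it writes 'ss' instead."""
    return text.replace("ß", "ss")


def main() -> None:
    # Read everything and write it back unchanged in structure, so a last
    # line without '\n' is neither dropped nor given a newline.
    data = sys.stdin.buffer.read().decode("utf-8")
    sys.stdout.buffer.write(to_swiss_orthography(data).encode("utf-8"))


# msgfilter binds MSGFILTER_MSGID for every invocation; checking it keeps the
# script from waiting on stdin when it is imported or started by hand.
if __name__ == "__main__" and "MSGFILTER_MSGID" in os.environ:
    main()
```

Under those assumptions it might be run as ‘msgfilter -o de_CH.po python3 swiss_filter.py < de_DE.po’.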
Another use of msgfilter is to produce approximately the POT file for which a given PO file was made. This can be done through a filter command like ‘msgfilter sed -e d | sed -e '/^# /d'’. Note that the original POT file may have had different comments and different plural message counts; that’s why it’s better to use the original POT file if available.
When a translator wants to check her translations, for example according to orthography rules or using a non-interactive spell checker, she can do so using the ‘msgexec’ program.
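A check of this kind could be a small script that msgexec runs once per message. The sketch below, a hypothetical check_doubles.py, reports repeated words such as “le le”. It assumes that msgexec passes the translation on standard input and exports MSGEXEC_MSGID and MSGEXEC_LOCATION (verify this against your msgexec version), and that the catalog’s encoding matches the locale:

```python
import os
import re
import sys
from typing import List

# Two identical words in a row ("le le"), compared case-insensitively.
_DOUBLED = re.compile(r"\b(\w+)\s+\1\b", re.IGNORECASE)


def doubled_words(text: str) -> List[str]:
    return [m.group(1) for m in _DOUBLED.finditer(text)]


def main() -> None:
    translation = sys.stdin.read()
    location = os.environ.get("MSGEXEC_LOCATION", "?")
    for word in doubled_words(translation):
        print(f"{location}: doubled word {word!r}")


# Only act when started by msgexec, which binds MSGEXEC_MSGID.
if __name__ == "__main__" and "MSGEXEC_MSGID" in os.environ:
    main()
```

It might be invoked as ‘msgexec python3 check_doubles.py < fr.po’.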
When third-party tools create PO or POT files, sometimes duplicates cannot be avoided. But the GNU gettext tools give an error when they encounter duplicate msgids in the same file and in the same domain. To merge duplicates, the ‘msguniq’ program can be used.
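Before running msguniq, it can help to see which messages collide. A minimal Python sketch, assuming that within one domain a message is identified by its msgctxt together with its msgid, so the same msgid under different contexts is not a duplicate:

```python
from collections import Counter
from typing import Iterable, List, Optional, Tuple

# A message key: (msgctxt, msgid); msgctxt is None when there is no context.
Key = Tuple[Optional[str], str]


def duplicate_keys(keys: Iterable[Key]) -> List[Key]:
    """Keys occurring more than once, in order of first appearance."""
    counts = Counter(keys)
    return [key for key, n in counts.items() if n > 1]


keys = [(None, "Open"), (None, "Save"), ("Menu", "Open"), (None, "Open")]
print(duplicate_keys(keys))  # [(None, 'Open')]
```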
‘msgcomm’ is a more general tool for keeping or throwing away duplicates occurring in different files.
‘msgcmp’ can be used to check whether a translation catalog is completely translated.
‘msgattrib’ can be used to select and extract only the fuzzy or untranslated messages of a translation catalog.
‘msgen’ is useful as a first step for preparing English translation catalogs. It copies each message’s msgid to its msgstr.
Finally, for those applications where all these various programs are not sufficient, a library ‘libgettextpo’ is provided that can be used to write other specialized programs that process PO files.
• msgcat Invocation: Invoking the msgcat Program
• msgconv Invocation: Invoking the msgconv Program
• msggrep Invocation: Invoking the msggrep Program
• msgfilter Invocation: Invoking the msgfilter Program
• msguniq Invocation: Invoking the msguniq Program
• msgcomm Invocation: Invoking the msgcomm Program
• msgcmp Invocation: Invoking the msgcmp Program
• msgattrib Invocation: Invoking the msgattrib Program
• msgen Invocation: Invoking the msgen Program
• msgexec Invocation: Invoking the msgexec Program
• Colorizing: Highlighting parts of PO files
• libgettextpo: Writing your own programs that process PO files
Invoking the msgcat Program
msgcat [option] [inputfile]...
The msgcat program concatenates and merges the specified PO files. It finds messages which are common to two or more of the specified PO files. By using the --more-than option, greater commonality may be requested before messages are printed. Conversely, the --less-than option may be used to specify less commonality before messages are printed (i.e. ‘--less-than=2’ will only print the unique messages). Translations, comments, extracted comments, and file positions will be accumulated, except that if --use-first is specified, they will be taken from the first PO file to define them.
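The effect of --more-than and --less-than can be modeled by counting, for each msgid, how many input files define it. A minimal Python sketch; the function name select_by_commonality is hypothetical, and each input file is reduced to the set of msgids it defines:

```python
from collections import Counter
from typing import Iterable, List, Optional, Set


def select_by_commonality(files: Iterable[Set[str]], more_than: int = 0,
                          less_than: Optional[int] = None) -> List[str]:
    """Msgids defined in more than `more_than` and fewer than `less_than`
    of the input files; less_than=None means no upper bound."""
    counts: Counter = Counter()
    for msgids in files:
        counts.update(msgids)  # a set, so each file counts a msgid once
    return sorted(m for m, n in counts.items()
                  if n > more_than and (less_than is None or n < less_than))


files = [{"Open", "Save"}, {"Open", "Quit"}, {"Open"}]
print(select_by_commonality(files))               # ['Open', 'Quit', 'Save']
print(select_by_commonality(files, less_than=2))  # ['Quit', 'Save']
print(select_by_commonality(files, more_than=1))  # ['Open']
```

The second call corresponds to ‘--less-than=2’, which keeps only the messages unique to a single file.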
Input files.
Read the names of the input files from file instead of getting them from the command line.
Add directory to the list of directories. Source files are searched relative to this list of directories. The resulting .po file will be written relative to the current directory, though.
If inputfile is ‘-’, standard input is read.
Write output to specified file.
The results are written to standard output if no output file is specified or if it is ‘-’.
Print messages with fewer than number definitions; defaults to infinite if not set.
Print messages with more than number definitions; defaults to 0 if not set.
Shorthand for ‘--less-than=2’. Requests that only unique messages be printed.
Assume the input files are Java ResourceBundles in Java .properties syntax, not in PO file syntax.
Assume the input files are NeXTstep/GNUstep localized resource files in .strings syntax, not in PO file syntax.
Specify encoding for output.
Use the first available translation for each message. Don’t merge several translations into one.
Specify the ‘Language’ field to be used in the header entry. See Header Entry for the meaning of this field. Note: The ‘Language-Team’ and ‘Plural-Forms’ fields are left unchanged.
Specify whether or when to use colors and other text attributes. See The --color option for details.
Specify the CSS style rule file to use for --color. See The --style option for details.
Always write an output file even if it contains no message.
Write the .po file using indented style.
Do not write ‘#: filename:line’ lines.
Generate ‘#: filename:line’ lines (default).
The optional type can be either ‘full’, ‘file’, or ‘never’. If it is not given or ‘full’, it generates the lines with both file name and line number. If it is ‘file’, the line number part is omitted. If it is ‘never’, it completely suppresses the lines (same as --no-location).
Write out a strict Uniforum conforming PO file. Note that this Uniforum format should be avoided because it doesn’t support the GNU extensions.
Write out a Java ResourceBundle in Java .properties syntax. Note that this file format doesn’t support plural forms and silently drops obsolete messages.
Write out a NeXTstep/GNUstep localized resource file in .strings syntax. Note that this file format doesn’t support plural forms.
Set the output page width. Long strings in the output files will be split across multiple lines in order to ensure that each line’s width (= number of screen columns) is less than or equal to the given number.
Do not break long message lines. Message lines whose width exceeds the output page width will not be split into several lines. Only file reference lines which are wider than the output page width will be split.
Generate sorted output. Note that using this option makes it much harder for the translator to understand each message’s context.
Sort output by file location.
Display this help and exit.
Output version information and exit.
Invoking the msgconv Program
msgconv [option] [inputfile]
The msgconv program converts a translation catalog to a different character encoding.
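At its core, such a conversion decodes the catalog from the charset named in its header, rewrites that charset, and encodes the result in the new one. A simplified Python sketch of the idea (convert_catalog is a hypothetical name; msgconv itself covers cases this ignores, and the sketch does not handle, for example, the ‘CHARSET’ placeholder of a fresh POT file):

```python
import re

# The charset parameter inside the header's Content-Type line.
_CHARSET = re.compile(r'charset=([^\s\\"]+)')


def convert_catalog(data: bytes, to_code: str) -> bytes:
    """Re-encode PO file bytes and update the charset the header declares."""
    # latin-1 maps every byte to a character, so the header can be read
    # before the real encoding is known.
    match = _CHARSET.search(data.decode("latin-1"))
    from_code = match.group(1) if match else "ascii"
    text = data.decode(from_code)
    text = _CHARSET.sub(lambda _: "charset=" + to_code, text, count=1)
    return text.encode(to_code)


po = ('msgid ""\nmsgstr ""\n'
      '"Content-Type: text/plain; charset=ISO-8859-1\\n"\n\n'
      'msgid "File created"\nmsgstr "Fichier créé"\n').encode("iso-8859-1")
converted = convert_catalog(po, "UTF-8")
```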
Input PO file.
Add directory to the list of directories. Source files are searched relative to this list of directories. The resulting .po file will be written relative to the current directory, though.
If no inputfile is given or if it is ‘-’, standard input is read.
Write output to specified file.
The results are written to standard output if no output file is specified or if it is ‘-’.
Specify encoding for output.
The default encoding is the current locale’s encoding.
Assume the input file is a Java ResourceBundle in Java .properties syntax, not in PO file syntax.
Assume the input file is a NeXTstep/GNUstep localized resource file in .strings syntax, not in PO file syntax.
Specify whether or when to use colors and other text attributes. See The --color option for details.
Specify the CSS style rule file to use for --color. See The --style option for details.
Always write an output file even if it contains no message.
Write the .po file using indented style.
Do not write ‘#: filename:line’ lines.
Generate ‘#: filename:line’ lines (default).
The optional type can be either ‘full’, ‘file’, or ‘never’. If it is not given or ‘full’, it generates the lines with both file name and line number. If it is ‘file’, the line number part is omitted. If it is ‘never’, it completely suppresses the lines (same as --no-location).
Write out a strict Uniforum conforming PO file. Note that this Uniforum format should be avoided because it doesn’t support the GNU extensions.
Write out a Java ResourceBundle in Java .properties syntax. Note that this file format doesn’t support plural forms and silently drops obsolete messages.
Write out a NeXTstep/GNUstep localized resource file in .strings syntax. Note that this file format doesn’t support plural forms.
Set the output page width. Long strings in the output files will be split across multiple lines in order to ensure that each line’s width (= number of screen columns) is less than or equal to the given number.
Do not break long message lines. Message lines whose width exceeds the output page width will not be split into several lines. Only file reference lines which are wider than the output page width will be split.
Generate sorted output. Note that using this option makes it much harder for the translator to understand each message’s context.
Sort output by file location.
Display this help and exit.
Output version information and exit.
Invoking the msggrep Program
msggrep [option] [inputfile]
The msggrep program extracts all messages of a translation catalog that match a given pattern or belong to some given source files.
Input PO file.
Add directory to the list of directories. Source files are searched relative to this list of directories. The resulting .po file will be written relative to the current directory, though.
If no inputfile is given or if it is ‘-’, standard input is read.
Write output to specified file.
The results are written to standard output if no output file is specified or if it is ‘-’.
[-N sourcefile]... [-M domainname]... [-J msgctxt-pattern] [-K msgid-pattern] [-T msgstr-pattern] [-C comment-pattern]
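A message is selected if
• it comes from one of the specified source files,
• or if -M is given and the message belongs to one of the given domains,
• or if -J is given and its context (msgctxt) matches msgctxt-pattern,
• or if -K is given and its key (msgid or msgid_plural) matches msgid-pattern,
• or if -T is given and its translation (msgstr) matches msgstr-pattern,
• or if -C is given and the translator’s comment matches comment-pattern.
When more than one selection criterion is specified, the set of selected messages is the union of the selected messages of each criterion.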
msgctxt-pattern or msgid-pattern or msgstr-pattern syntax:
[-E | -F] [-e pattern | -f file]...
patterns are basic regular expressions by default, or extended regular expressions if -E is given, or fixed strings if -F is given.
Select messages extracted from sourcefile. sourcefile can be either a literal file name or a wildcard pattern.
Select messages belonging to domain domainname.
Start of patterns for the msgctxt.
Start of patterns for the msgid.
Start of patterns for the msgstr.
Start of patterns for the translator’s comment.
Start of patterns for the extracted comments.
Specify that pattern is an extended regular expression.
Specify that pattern is a set of newline-separated strings.
Use pattern as a regular expression.
Obtain pattern from file.
Ignore case distinctions.
Output only the messages that do not match any selection criterion, instead of the messages that match a selection criterion.
Assume the input file is a Java ResourceBundle in Java .properties syntax, not in PO file syntax.
Assume the input file is a NeXTstep/GNUstep localized resource file in .strings syntax, not in PO file syntax.
Specify whether or when to use colors and other text attributes. See The --color option for details.
Specify the CSS style rule file to use for --color. See The --style option for details.
Always write an output file even if it contains no message.
Write the .po file using indented style.
Do not write ‘#: filename:line’ lines.
Generate ‘#: filename:line’ lines (default).
The optional type can be either ‘full’, ‘file’, or ‘never’. If it is not given or ‘full’, it generates the lines with both file name and line number. If it is ‘file’, the line number part is omitted. If it is ‘never’, it completely suppresses the lines (same as --no-location).
Write out a strict Uniforum conforming PO file. Note that this Uniforum format should be avoided because it doesn’t support the GNU extensions.
Write out a Java ResourceBundle in Java .properties syntax. Note that this file format doesn’t support plural forms and silently drops obsolete messages.
Write out a NeXTstep/GNUstep localized resource file in .strings syntax. Note that this file format doesn’t support plural forms.
Set the output page width. Long strings in the output files will be split across multiple lines in order to ensure that each line’s width (= number of screen columns) is less than or equal to the given number.
Do not break long message lines. Message lines whose width exceeds the output page width will not be split into several lines. Only file reference lines which are wider than the output page width will be split.
Generate sorted output. Note that using this option makes it much harder for the translator to understand each message’s context.
Sort output by file location.
Display this help and exit.
Output version information and exit.
To extract the messages that come from the source files gnulib-lib/error.c and gnulib-lib/getopt.c:
msggrep -N gnulib-lib/error.c -N gnulib-lib/getopt.c input.po
To extract the messages that contain the string “Please specify” in the original string:
msggrep --msgid -F -e 'Please specify' input.po
To extract the messages that have a context specifier of either “Menu>File” or “Menu>Edit” or a submenu of them:
msggrep --msgctxt -E -e '^Menu>(File|Edit)' input.po
To extract the messages whose translation contains one of the strings in the file wordlist.txt:
msggrep --msgstr -F -f wordlist.txt input.po
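The selection rules (a message is kept if any one criterion matches, and -v inverts that) can be modeled in a few lines. A hypothetical Python sketch: it covers only the -N, -J, -K and -T criteria, uses Python regular expressions rather than the POSIX basic or extended syntax msggrep uses, and treats -N patterns as shell wildcards via fnmatchcase:

```python
import re
from dataclasses import dataclass, field
from fnmatch import fnmatchcase
from typing import List, Optional, Sequence


@dataclass
class Message:
    msgid: str
    msgstr: str
    msgctxt: Optional[str] = None
    sources: List[str] = field(default_factory=list)  # file names from '#:' lines


def is_selected(msg: Message, locations: Sequence[str] = (),
                msgctxt_re: Optional[str] = None, msgid_re: Optional[str] = None,
                msgstr_re: Optional[str] = None, invert: bool = False) -> bool:
    """True if any criterion matches (their union); `invert` flips the result."""
    hits = (
        any(fnmatchcase(src, pat) for src in msg.sources for pat in locations),
        msgctxt_re is not None and msg.msgctxt is not None
        and re.search(msgctxt_re, msg.msgctxt) is not None,
        msgid_re is not None and re.search(msgid_re, msg.msgid) is not None,
        msgstr_re is not None and re.search(msgstr_re, msg.msgstr) is not None,
    )
    return any(hits) != invert


menu = Message("Open", "Ouvrir", "Menu>File", ["gnulib-lib/getopt.c"])
prompt = Message("Please specify a file", "Veuillez indiquer un fichier",
                 None, ["src/main.c"])
```

For instance, is_selected(prompt, locations=["gnulib-lib/*.c"], msgid_re="Please specify") is true even though the location criterion fails, because the criteria are combined as a union.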
Invoking the msgfilter Program
msgfilter [option] filter [filter-option]
The msgfilter program applies a filter to all translations of a translation catalog.
During each filter invocation, the environment variable MSGFILTER_MSGID is bound to the message’s msgid, and the environment variable MSGFILTER_LOCATION is bound to the message’s location in the PO file. If the message has a context, the environment variable MSGFILTER_MSGCTXT is bound to the message’s msgctxt; otherwise it is unbound. If the message has a plural form, the environment variable MSGFILTER_MSGID_PLURAL is bound to the message’s msgid_plural and MSGFILTER_PLURAL_FORM is bound to the order number of the plural actually processed (starting with 0); otherwise both are unbound. If the message has a previous msgid (added by msgmerge), the environment variable MSGFILTER_PREV_MSGCTXT is bound to the message’s previous msgctxt, MSGFILTER_PREV_MSGID is bound to the previous msgid, and MSGFILTER_PREV_MSGID_PLURAL is bound to the previous msgid_plural.
Input PO file.
Add directory to the list of directories. Source files are searched relative to this list of directories. The resulting .po file will be written relative to the current directory, though.
If no inputfile is given or if it is ‘-’, standard input is read.
Write output to specified file.
The results are written to standard output if no output file is specified or if it is ‘-’.
The filter can be any program that reads a translation from standard input and writes a modified translation to standard output. A frequently used filter is ‘sed’. A few particular built-in filters are also recognized.
Add newline at the end of each input line and also strip the ending newline from the output line.
Note: If the filter is not a built-in filter, you have to take care of encodings: it is your responsibility to ensure that the filter can cope with input encoded in the translation catalog’s encoding. If the filter wants input in a particular encoding, you can first convert the translation catalog to that encoding using the ‘msgconv’ program, before invoking ‘msgfilter’. If the filter wants input in the locale’s encoding, but you want to avoid the locale’s encoding, then you can first convert the translation catalog to UTF-8 using the ‘msgconv’ program and then make ‘msgfilter’ work in a UTF-8 locale, by using the LC_ALL environment variable.
Note: Most translations in a translation catalog don’t end with a newline character. For this reason, unless the --newline option is used, it is important that the filter recognizes its last input line even if it ends without a newline, and that it doesn’t add an undesired trailing newline at the end. The ‘sed’ program on some platforms is known to ignore the last line of input if it is not terminated with a newline. You can use GNU sed instead; it does not have this limitation.
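You can check whether a given filter meets these requirements before running msgfilter. The following is a minimal sketch using a hypothetical helper, check_filter, which is not part of gettext. It feeds a filter a two-line sample whose last line has no terminating newline, and reports a failure if the output is empty or ends with a newline. Output ending with a newline catches two faults: a filter that appends a newline, and a filter that drops the unterminated last line.

```sh
#!/bin/sh
nl='
'
# check_filter CMD [ARG...]: hypothetical helper (not part of gettext).
check_filter () {
  # Sample translation; the second line is not newline-terminated.
  sample="Zeile eins${nl}Zeile zwei"
  # The sentinel x stops command substitution from stripping trailing newlines.
  out=$(printf '%s' "$sample" | "$@"; printf x)
  out=${out%x}
  if [ -z "$out" ]; then
    echo "FAIL: $*: no output"
    return 1
  fi
  case $out in
    *"$nl")
      echo "FAIL: $*: output ends with a newline"
      return 1
      ;;
  esac
  echo "OK: $*"
}

check_filter cat || true              # OK: copies input byte for byte
check_filter awk '{ print }' || true  # FAIL: print ends every record with a newline
check_filter tr 'a-z' 'A-Z' || true   # OK: tr never adds newlines
```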
Add script to the commands to be executed.
Add the contents of scriptfile to the commands to be executed.
Suppress automatic printing of pattern space.
The filter ‘recode-sr-latin’ is recognized as a built-in filter. The command ‘recode-sr-latin’ converts Serbian text, written in the Cyrillic script, to the Latin script. The command ‘msgfilter recode-sr-latin’ applies this conversion to the translations of a PO file. Thus, it can be used to convert an sr.po file to an sr@latin.po file.
The filter ‘quot’ is recognized as a built-in filter. The command ‘msgfilter quot’ converts any quotations surrounded by a pair of ‘"’, ‘'’, and ‘`’ into typographic Unicode quotation marks.
The filter ‘boldquot’ is recognized as a built-in filter. The command ‘msgfilter boldquot’ converts any quotations surrounded by a pair of ‘"’, ‘'’, and ‘`’ into typographic Unicode quotation marks, and also adds VT100 escape sequences to the quoted text to decorate it as bold.
The use of built-in filters is not sensitive to the current locale’s encoding. Moreover, when used with a built-in filter, ‘msgfilter’ can automatically convert the message catalog to the UTF-8 encoding when needed.
Assume the input file is a Java ResourceBundle in Java .properties syntax, not in PO file syntax.
Assume the input file is a NeXTstep/GNUstep localized resource file in .strings syntax, not in PO file syntax.
Specify whether or when to use colors and other text attributes. See The --color option for details.
Specify the CSS style rule file to use for --color. See The --style option for details.
Always write an output file even if it contains no message.
Write the .po file using indented style.
Keep the header entry, i.e. the message with ‘msgid ""’, unmodified, instead of filtering it. By default, the header entry is subject to filtering like any other message.
Do not write ‘#: filename:line’ lines.
Generate ‘#: filename:line’ lines (default).
The optional type can be either ‘full’, ‘file’, or ‘never’. If it is not given or ‘full’, it generates the lines with both file name and line number. If it is ‘file’, the line number part is omitted. If it is ‘never’, it completely suppresses the lines (same as --no-location).
Write out a strict Uniforum conforming PO file. Note that this Uniforum format should be avoided because it doesn’t support the GNU extensions.
Write out a Java ResourceBundle in Java .properties syntax. Note that this file format doesn’t support plural forms and silently drops obsolete messages.
Write out a NeXTstep/GNUstep localized resource file in .strings syntax. Note that this file format doesn’t support plural forms.
Set the output page width. Long strings in the output files will be split across multiple lines in order to ensure that each line’s width (= number of screen columns) is less than or equal to the given number.
Do not break long message lines. Message lines whose width exceeds the output page width will not be split into several lines. Only file reference lines which are wider than the output page width will be split.
Generate sorted output. Note that using this option makes it much harder for the translator to understand each message’s context.
Sort output by file location.
Display this help and exit.
Output version information and exit.
To convert German translations to Swiss orthography (in a UTF-8 locale):
msgconv -t UTF-8 de.po | msgfilter sed -e 's/ß/ss/g'
To convert Serbian translations in Cyrillic script to Latin script:
msgfilter recode-sr-latin < sr.po
Invoking the msguniq Program
msguniq [option] [inputfile]
The msguniq program unifies duplicate translations in a translation catalog. It finds duplicate translations of the same message ID. Such duplicates are invalid input for other programs like msgfmt, msgmerge or msgcat. By default, duplicates are merged together. When using the ‘--repeated’ option, only duplicates are output, and all other messages are discarded. Comments and extracted comments will be accumulated, except that if ‘--use-first’ is specified, they will be taken from the first translation. File positions will be accumulated. When using the ‘--unique’ option, duplicates are discarded.
Input PO file.
Add directory to the list of directories. Source files are searched relative to this list of directories. The resulting .po file will be written relative to the current directory, though.
If no inputfile is given or if it is ‘-’, standard input is read.
Write output to specified file.
The results are written to standard output if no output file is specified or if it is ‘-’.
Print only duplicates.
Print only unique messages, discard duplicates.
Assume the input file is a Java ResourceBundle in Java .properties syntax, not in PO file syntax.
Assume the input file is a NeXTstep/GNUstep localized resource file in .strings syntax, not in PO file syntax.
Specify encoding for output.
Use first available translation for each message. Don’t merge several translations into one.
Specify whether or when to use colors and other text attributes. See The --color option for details.
Specify the CSS style rule file to use for --color. See The --style option for details.
Always write an output file even if it contains no message.
Write the .po file using indented style.
Do not write ‘#: filename:line’ lines.
Generate ‘#: filename:line’ lines (default).
The optional type can be either ‘full’, ‘file’, or ‘never’. If it is not given or ‘full’, it generates the lines with both file name and line number. If it is ‘file’, the line number part is omitted. If it is ‘never’, it completely suppresses the lines (same as --no-location).
Write out a strict Uniforum conforming PO file. Note that this Uniforum format should be avoided because it doesn’t support the GNU extensions.
Write out a Java ResourceBundle in Java .properties syntax. Note that this file format doesn’t support plural forms and silently drops obsolete messages.
Write out a NeXTstep/GNUstep localized resource file in .strings syntax. Note that this file format doesn’t support plural forms.
Set the output page width. Long strings in the output files will be split across multiple lines in order to ensure that each line’s width (= number of screen columns) is less than or equal to the given number.
Do not break long message lines. Message lines whose width exceeds the output page width will not be split into several lines. Only file reference lines which are wider than the output page width will be split.
Generate sorted output. Note that using this option makes it much harder for the translator to understand each message’s context.
Sort output by file location.
Display this help and exit.
Output version information and exit.
Invoking the msgcomm Program
msgcomm [option] [inputfile]...
The msgcomm program finds messages which are common to two or more of the specified PO files. By using the --more-than option, greater commonality may be requested before messages are printed. Conversely, the --less-than option may be used to specify less commonality before messages are printed (i.e. ‘--less-than=2’ will only print the unique messages). Translations, comments and extracted comments will be preserved, but only from the first PO file to define them. File positions from all PO files will be accumulated.
Input files.
Read the names of the input files from file instead of getting them from the command line.
Add directory to the list of directories. Source files are searched relative to this list of directories. The resulting .po file will be written relative to the current directory, though.
If inputfile is ‘-’, standard input is read.
Write output to specified file.
The results are written to standard output if no output file is specified or if it is ‘-’.
Print messages with fewer than number definitions; defaults to infinity if not set.
Print messages with more than number definitions; defaults to 1 if not set.
Shorthand for ‘--less-than=2’. Requests that only unique messages be printed.
Assume the input files are Java ResourceBundles in Java .properties syntax, not in PO file syntax.
Assume the input files are NeXTstep/GNUstep localized resource files in .strings syntax, not in PO file syntax.
Specify whether or when to use colors and other text attributes. See The --color option for details.
Specify the CSS style rule file to use for --color. See The --style option for details.
Always write an output file even if it contains no message.
Write the .po file using indented style.
Do not write ‘#: filename:line’ lines.
Generate ‘#: filename:line’ lines (default).
The optional type can be either ‘full’, ‘file’, or ‘never’. If it is not given or ‘full’, it generates the lines with both file name and line number. If it is ‘file’, the line number part is omitted. If it is ‘never’, it completely suppresses the lines (same as --no-location).
Write out a strict Uniforum conforming PO file. Note that this Uniforum format should be avoided because it doesn’t support the GNU extensions.
Write out a Java ResourceBundle in Java .properties syntax. Note that this file format doesn’t support plural forms and silently drops obsolete messages.
Write out a NeXTstep/GNUstep localized resource file in .strings syntax. Note that this file format doesn’t support plural forms.
Set the output page width. Long strings in the output files will be split across multiple lines in order to ensure that each line’s width (= number of screen columns) is less than or equal to the given number.
Do not break long message lines. Message lines whose width exceeds the output page width will not be split into several lines. Only file reference lines which are wider than the output page width will be split.
Generate sorted output. Note that using this option makes it much harder for the translator to understand each message’s context.
Sort output by file location.
Don’t write header with ‘msgid ""’ entry.
Display this help and exit.
Output version information and exit.
Invoking the msgcmp Program
msgcmp [option] def.po ref.pot
The msgcmp program compares two Uniforum style .po files to check that both contain the same set of msgid strings. The def.po file is an existing PO file with the translations. The ref.pot file is the last created PO file, or a PO Template file (generally created by xgettext). This is useful for checking that you have translated each and every message in your program. Where an exact match cannot be found, fuzzy matching is used to produce better diagnostics.
Translations.
References to the sources.
Add directory to the list of directories. Source files are searched relative to this list of directories.
Apply ref.pot to each of the domains in def.po.
Do not use fuzzy matching when an exact match is not found. This may speed up the operation considerably.
Consider fuzzy messages in the def.po file like translated messages. Note that using this option is usually wrong, because fuzzy messages are exactly those which have not been validated by a human translator.
Consider untranslated messages in the def.po file like translated messages. Note that using this option is usually wrong.
Assume the input files are Java ResourceBundles in Java .properties syntax, not in PO file syntax.
Assume the input files are NeXTstep/GNUstep localized resource files in .strings syntax, not in PO file syntax.
Display this help and exit.
Output version information and exit.
Invoking the msgattrib Program
msgattrib [option] [inputfile]
The msgattrib program filters the messages of a translation catalog according to their attributes, and manipulates the attributes.
Input PO file.
Add directory to the list of directories. Source files are searched relative to this list of directories. The resulting .po file will be written relative to the current directory, though.
If no inputfile is given or if it is ‘-’, standard input is read.
Write output to specified file.
The results are written to standard output if no output file is specified or if it is ‘-’.
Keep translated messages, remove untranslated messages.
Keep untranslated messages, remove translated messages.
Remove ‘fuzzy’-marked messages.
Keep ‘fuzzy’-marked messages, remove all other messages.
Remove obsolete #~ messages.
Keep obsolete #~ messages, remove all other messages.
Attributes are modified after the message selection/removal has been performed. If the ‘--only-file’ or ‘--ignore-file’ option is specified, the attribute modification is applied only to those messages that are listed in the only-file and not listed in the ignore-file.
Set all messages ‘fuzzy’.
Set all messages non-‘fuzzy’.
Set all messages obsolete.
Set all messages non-obsolete.
When setting the ‘fuzzy’ mark, keep the “previous msgid” of translated messages.
Remove the “previous msgid” (‘#|’) comments from all messages.
When removing the ‘fuzzy’ mark, also set msgstr empty.
Limit the attribute changes to entries that are listed in file, which should be a PO or POT file.
Limit the attribute changes to entries that are not listed in file, which should be a PO or POT file.
Synonym for ‘--only-fuzzy --clear-fuzzy’: it keeps only the fuzzy messages and removes their ‘fuzzy’ mark.
Synonym for ‘--only-obsolete --clear-obsolete’: it keeps only the obsolete messages and makes them non-obsolete.
Assume the input file is a Java ResourceBundle in Java .properties syntax, not in PO file syntax.
Assume the input file is a NeXTstep/GNUstep localized resource file in .strings syntax, not in PO file syntax.
Specify whether or when to use colors and other text attributes. See The --color option for details.
Specify the CSS style rule file to use for --color. See The --style option for details.
Always write an output file even if it contains no message.
Write the .po file using indented style.
Do not write ‘#: filename:line’ lines.
Generate ‘#: filename:line’ lines (default).
The optional type can be either ‘full’, ‘file’, or ‘never’. If it is not given or ‘full’, it generates the lines with both file name and line number. If it is ‘file’, the line number part is omitted. If it is ‘never’, it completely suppresses the lines (same as --no-location).
Write out a strict Uniforum conforming PO file. Note that this Uniforum format should be avoided because it doesn’t support the GNU extensions.
Write out a Java ResourceBundle in Java .properties syntax. Note that this file format doesn’t support plural forms and silently drops obsolete messages.
Write out a NeXTstep/GNUstep localized resource file in .strings syntax. Note that this file format doesn’t support plural forms.
Set the output page width. Long strings in the output files will be split across multiple lines in order to ensure that each line’s width (= number of screen columns) is less than or equal to the given number.
Do not break long message lines. Message lines whose width exceeds the output page width will not be split into several lines. Only file reference lines which are wider than the output page width will be split.
Generate sorted output. Note that using this option makes it much harder for the translator to understand each message’s context.
Sort output by file location.
Display this help and exit.
Output version information and exit.
Invoking the msgen Program
msgen [option] inputfile
The msgen program creates an English translation catalog. The input file is the last created English PO file, or a PO Template file (generally created by xgettext). Untranslated entries are assigned a translation that is identical to the msgid.
Note: ‘msginit --no-translator --locale=en’ performs a very similar task. The main difference is that msginit cares specially about the header entry, whereas msgen doesn’t.
Input PO or POT file.
Add directory to the list of directories. Source files are searched relative to this list of directories. The resulting .po file will be written relative to the current directory, though.
If inputfile is ‘-’, standard input is read.
Write output to specified file.
The results are written to standard output if no output file is specified or if it is ‘-’.
Assume the input file is a Java ResourceBundle in Java .properties syntax, not in PO file syntax.
Assume the input file is a NeXTstep/GNUstep localized resource file in .strings syntax, not in PO file syntax.
Specify the ‘Language’ field to be used in the header entry. See Header Entry for the meaning of this field. Note: The ‘Language-Team’ and ‘Plural-Forms’ fields are not set by this option.
Specify whether or when to use colors and other text attributes. See The --color option for details.
Specify the CSS style rule file to use for --color. See The --style option for details.
Always write an output file even if it contains no message.
Write the .po file using indented style.
Do not write ‘#: filename:line’ lines.
Generate ‘#: filename:line’ lines (default).
The optional type can be either ‘full’, ‘file’, or ‘never’. If it is not given or ‘full’, it generates the lines with both file name and line number. If it is ‘file’, the line number part is omitted. If it is ‘never’, it completely suppresses the lines (same as --no-location).
Write out a strict Uniforum conforming PO file. Note that this Uniforum format should be avoided because it doesn’t support the GNU extensions.
Write out a Java ResourceBundle in Java .properties syntax. Note that this file format doesn’t support plural forms and silently drops obsolete messages.
Write out a NeXTstep/GNUstep localized resource file in .strings syntax. Note that this file format doesn’t support plural forms.
Set the output page width. Long strings in the output files will be split across multiple lines in order to ensure that each line’s width (= number of screen columns) is less than or equal to the given number.
Do not break long message lines. Message lines whose width exceeds the output page width will not be split into several lines. Only file reference lines which are wider than the output page width will be split.
Generate sorted output. Note that using this option makes it much harder for the translator to understand each message’s context.
Sort output by file location.
Display this help and exit.
Output version information and exit.
Invoking the msgexec Program
msgexec [option] command [command-option]
The msgexec program applies a command to all translations of a translation catalog. The command can be any program that reads a translation from standard input. It is invoked once for each translation. Its output becomes msgexec’s output. msgexec’s return code is the maximum return code across all invocations.
A special builtin command called ‘0’ outputs the translation, followed by a null byte. The output of ‘msgexec 0’ is suitable as input for ‘xargs -0’.
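The following minimal sketch shows the consuming side of such a pipeline. A hypothetical shell function, fake_msgexec_0, stands in for the output of ‘msgexec -i de.po 0’ so that the sketch runs without gettext installed. The file name de.po is likewise only an example, and the sample translations are made up. Because xargs -0 splits only at null bytes, a translation that spans several lines still arrives as a single argument.

```sh
#!/bin/sh
# Hypothetical stand-in for `msgexec -i de.po 0`: each translation is
# followed by a null byte, and the second one contains a newline.
fake_msgexec_0 () {
  printf 'Datei\000Zeile eins\nZeile zwei\000Bearbeiten\000'
}

# Run one command per translation; printf brackets each argument so that the
# multi-line translation is visibly kept together.
fake_msgexec_0 | xargs -0 -n 1 printf '<%s>\n'

# Real use, with GNU gettext installed:
#   msgexec -i de.po 0 | xargs -0 -n 1 printf '<%s>\n'
```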
Add newline at the end of each input line.
During each command invocation, the environment variable MSGEXEC_MSGID is bound to the message’s msgid, and the environment variable MSGEXEC_LOCATION is bound to the message’s location in the PO file. If the message has a context, the environment variable MSGEXEC_MSGCTXT is bound to the message’s msgctxt; otherwise it is unbound. If the message has a plural form, the environment variable MSGEXEC_MSGID_PLURAL is bound to the message’s msgid_plural and MSGEXEC_PLURAL_FORM is bound to the order number of the plural actually processed (starting with 0); otherwise both are unbound. If the message has a previous msgid (added by msgmerge), the environment variable MSGEXEC_PREV_MSGCTXT is bound to the message’s previous msgctxt, MSGEXEC_PREV_MSGID is bound to the previous msgid, and MSGEXEC_PREV_MSGID_PLURAL is bound to the previous msgid_plural.
Note: It is your responsibility to ensure that the command can cope with input encoded in the translation catalog’s encoding. If the command wants input in a particular encoding, you can first convert the translation catalog to that encoding using the ‘msgconv’ program, before invoking ‘msgexec’. If the command wants input in the locale’s encoding, but you want to avoid the locale’s encoding, then you can first convert the translation catalog to UTF-8 using the ‘msgconv’ program and then make ‘msgexec’ work in a UTF-8 locale, by using the LC_ALL environment variable.
Input PO file.
Add directory to the list of directories. Source files are searched relative to this list of directories. The resulting .po file will be written relative to the current directory, though.
If no inputfile is given or if it is ‘-’, standard input is read.
Assume the input file is a Java ResourceBundle in Java .properties syntax, not in PO file syntax.
Assume the input file is a NeXTstep/GNUstep localized resource file in .strings syntax, not in PO file syntax.
Display this help and exit.
Output version information and exit.
Translators are usually only interested in seeing the untranslated and fuzzy messages of a PO file. Also, when a message is set fuzzy because the msgid changed, they want to see the differences between the previous msgid and the current one (especially if the msgid is long and only a few words in it have changed). Finally, it’s always welcome to highlight the different sections of a message in a PO file (comments, msgid, msgstr, etc.).
Such highlighting is possible through the msgcat options ‘--color’ and ‘--style’.
The --color option
The ‘--color=when’ option specifies under which conditions colorized output should be generated. The when part can be one of the following:
always
yes
The output will be colorized.
never
no
The output will not be colorized.
auto
tty
The output will be colorized if the output device is a tty, i.e. when the output goes directly to a text screen or terminal emulator window.
html
The output will be colorized and be in HTML format.
‘--color’ is equivalent to ‘--color=yes’. The default is ‘--color=auto’.
Thus, a command like ‘msgcat vi.po’ will produce colorized output when called by itself in a command window. In a pipe, however, such as ‘msgcat vi.po | less -R’, it will not produce colorized output. To get colorized output in this situation nevertheless, use the command ‘msgcat --color vi.po | less -R’.
The ‘--color=html’ option will produce output that can be viewed in a browser. This can be useful, for example, for Indic languages, because the rendering of Indic scripts in browsers is usually better than in terminal emulators.
Note that the output produced with the --color option is not a valid PO file in itself. It contains additional terminal-specific escape sequences or HTML tags. A PO file reader will give a syntax error when confronted with such content. Except for the ‘--color=html’ case, you therefore normally don’t need to save output produced with the --color option in a file.
The environment variable TERM
The environment variable TERM contains an identifier for the text window’s capabilities. You can get a detailed list of these capabilities by using the ‘infocmp’ command, with ‘man 5 terminfo’ as a reference.
When producing text with embedded color directives, msgcat looks at the TERM variable. Text windows today typically support at least 8 colors. Often, however, the text window supports 16 or more colors, even though the TERM variable is set to an identifier denoting only 8 supported colors. It can be worth setting the TERM variable to a different value in these cases:
xterm
xterm is in most cases built with support for 16 colors. It can also be built with support for 88 or 256 colors (but not both). You can try to set TERM to either xterm-16color, xterm-88color, or xterm-256color.
rxvt
rxvt is often built with support for 16 colors. You can try to set TERM to rxvt-16color.
konsole
konsole too is often built with support for 16 colors. You can try to set TERM to konsole-16color or xterm-16color.
After setting TERM, you can verify it by invoking ‘msgcat --color=test’ and seeing whether the output looks like a reasonable color map.
The --style option
The ‘--style=style_file’ option specifies the style file to use when colorizing. It has an effect only when the --color option is effective.
If the --style option is not specified, the environment variable PO_STYLE is considered. It is meant to point to the user’s preferred style for PO files.
The default style file is $prefix/share/gettext/styles/po-default.css, where $prefix is the installation location.
A few style files are predefined:
This style imitates the look used by vim 7.
This style imitates the look used by GNU Emacs 21 and 22 in an X11 window.
This style imitates the look used by GNU Emacs 22 in a terminal of type ‘xterm’ (8 colors), ‘xterm-16color’ (16 colors), or ‘xterm-256color’ (256 colors), respectively.
You can use these styles without specifying a directory. They are actually located in $prefix/share/gettext/styles/, where $prefix is the installation location.
You can also design your own styles. This is described in the next section.
The same style file can be used for styling a PO file, both for terminal output and for HTML output. It is written in CSS (Cascading Style Sheet) syntax. See http://www.w3.org/TR/css2/cover.html for a formal definition of CSS. Many HTML authoring tutorials also contain explanations of CSS.
In the case of HTML output, the style file is embedded in the HTML output. In the case of text output, the style file is interpreted by the msgcat program. This means, in particular, that when @import is used with relative file names, the file names are
• relative to the resulting HTML file, in the case of HTML output,
• relative to the style sheet containing the @import, in the case of text output. (Actually, @imports are not yet supported in this case, due to a limitation in libcroco.)
CSS rules are built up from selectors and declarations. The declarations specify graphical properties; the selectors specify when they apply.
In PO files, the following simple selectors (based on "CSS classes", see the CSS2 spec, section 5.8.3) are supported.
.header
This matches the header entry of a PO file.
.translated
This matches a translated message.
.untranslated
This matches an untranslated message (i.e. a message with empty translation).
.fuzzy
This matches a fuzzy message (i.e. a message which has a translation that needs review by the translator).
.obsolete
This matches an obsolete message (i.e. a message that was translated but is not needed by the current POT file any more).
The following selectors match parts of a message. The parts are laid out in a PO entry like this:
white-space
#  translator-comments
#. extracted-comments
#: reference…
#, flag…
#| msgid previous-untranslated-string
msgid untranslated-string
msgstr translated-string
.comment
This matches all comments (translator comments, extracted comments, source file reference comments, flag comments, previous message comments, as well as the entire obsolete messages).
.translator-comment
This matches the translator comments.
.extracted-comment
This matches the extracted comments, i.e. the comments placed by the programmer for the attention of the translator.
.reference-comment
This matches the source file reference comments (entire lines).
.reference
This matches the individual source file references inside the source file reference comment lines.
.flag-comment
This matches the flag comment lines (entire lines).
.flag
This matches the individual flags inside flag comment lines.
.fuzzy-flag
This matches the ‘fuzzy’ flag inside flag comment lines.
.previous-comment
This matches the comments containing the previous untranslated string (entire lines).
.previous
This matches the previous untranslated string including the string delimiters, the associated keywords (msgid etc.) and the spaces between them.
.msgid
This matches the untranslated string including the string delimiters, the associated keywords (msgid etc.) and the spaces between them.
.msgstr
This matches the translated string including the string delimiters, the associated keywords (msgstr etc.) and the spaces between them.
.keyword
This matches the keywords (msgid, msgstr, etc.).
.string
This matches strings, including the string delimiters (double quotes).
.text
This matches the entire contents of a string (excluding the string delimiters, i.e. the double quotes).
.escape-sequence
This matches an escape sequence (starting with a backslash).
.format-directive
This matches a format string directive (starting with a ‘%’ sign in the case of most programming languages, with a ‘{’ in the case of java-format and csharp-format, with a ‘~’ in the case of lisp-format and scheme-format, or with ‘$’ in the case of sh-format).
.invalid-format-directive
This matches an invalid format string directive.
.added
In an untranslated string, this matches a part of the string that was not present in the previous untranslated string. (Not yet implemented in this release.)
.changed
In an untranslated string or in a previous untranslated string, this matches a part of the string that is changed or replaced. (Not yet implemented in this release.)
.removed
In a previous untranslated string, this matches a part of the string that is not present in the current untranslated string. (Not yet implemented in this release.)
These selectors can be combined into hierarchical selectors. For example,
.msgstr .invalid-format-directive { color: red; }
will highlight the invalid format directives in the translated strings.
In text mode, pseudo-classes (CSS2 spec, section 5.11) and pseudo-elements (CSS2 spec, section 5.12) are not supported.
The declarations in HTML mode are not limited; any graphical attribute supported by the browsers can be used.
The declarations in text mode are limited to the following properties. Otherproperties will be silently ignored.
color (CSS2 spec, section 14.1)
background-color (CSS2 spec, section 14.2.1)
These properties are supported. Colors will be adjusted to match the terminal’s capabilities. Note that many terminals support only 8 colors.
font-weight (CSS2 spec, section 15.2.3)
This property is supported, but most terminals can only render two different weights: normal and bold. Values >= 600 are rendered as bold.
font-style (CSS2 spec, section 15.2.3)
This property is supported. The values italic and oblique are rendered the same way.
text-decoration (CSS2 spec, section 16.3.1)
This property is supported, limited to the values none and underline.
Using less for viewing PO files
The ‘less’ program is a popular text file browser for use in a text screen or terminal emulator. It also supports text with embedded escape sequences for colors and text decorations.
You can use less to view a PO file like this (assuming a UTF-8 environment):
msgcat --to-code=UTF-8 --color xyz.po | less -R
This can be simplified to the following command:
less xyz.po
after these three preparations:
1. Add the options ‘-R’ and ‘-f’ to the LESS environment variable. In sh shells:
   $ LESS="$LESS -R -f"
   $ export LESS
2. Set the LESSOPEN and LESSCLOSE environment variables, as indicated in the manual page (‘man less’).
3. Add to the LESSOPEN input preprocessor script a piece of code that recognizes PO files by their file extension and invokes msgcat on them, producing a temporary file. Like this:
   case "$1" in
     *.po)
       tmpfile=`mktemp "${TMPDIR-/tmp}/less.XXXXXX"`
       msgcat --to-code=UTF-8 --color "$1" > "$tmpfile"
       echo "$tmpfile"
       exit 0
       ;;
   esac
For the tasks for which a combination of ‘msgattrib’, ‘msgcat’ etc. is not sufficient, a set of C functions is provided in a library, to make it possible to process PO files in your own programs. When you use this library, you don’t need to write routines to parse the PO file; instead, you retrieve a pointer in memory to each of the messages contained in the PO file. Functions for writing PO files are not provided at this time.
The functions are declared in the header file ‘
This is a pointer type that refers to the contents of a PO file, after it has been read into memory.
This is a pointer type that refers to an iterator that produces a sequence of messages.
This is a pointer type that refers to a message of a PO file, including its translation.
The po_file_read function reads a PO file into memory. The file name is given as argument. The return value is a handle to the PO file’s contents, valid until po_file_free is called on it. In case of error, the return value is NULL, and errno is set.
The po_file_free function frees a PO file’s contents from memory, including all messages that are only implicitly accessible through iterators.
The po_file_domains function returns the domains for which the given PO file has messages. The return value is a NULL-terminated array which is valid as long as the file handle is valid. For PO files which contain no ‘domain’ directive, the return value contains only one domain, namely the default domain "messages".
The po_message_iterator function returns an iterator that will produce the messages of file that belong to the given domain. If domain is NULL, the default domain is used instead. To list the messages, use the function po_next_message repeatedly.
The po_message_iterator_free function frees an iterator previously allocated through the po_message_iterator function.
The po_next_message function returns the next message from iterator and advances the iterator. It returns NULL when the iterator has reached the end of its message list.
The following functions return details of a po_message_t. Recall that the results are valid as long as the file handle is valid.
The po_message_msgid function returns the msgid (untranslated English string) of a message. This is guaranteed to be non-NULL.
The po_message_msgid_plural function returns the msgid_plural (untranslated English plural string) of a message with plurals, or NULL for a message without plural.
The po_message_msgstr function returns the msgstr (translation) of a message. For an untranslated message, the return value is an empty string.
The po_message_msgstr_plural function returns the msgstr[index] of a message with plurals, or NULL when the index is out of range or for a message without plural.
Here is example code showing how these functions can be used.
#include <errno.h>
#include <error.h>          /* GNU error () */
#include <stdlib.h>         /* EXIT_FAILURE */
#include <gettext-po.h>

const char *filename = …;
po_file_t file = po_file_read (filename);

if (file == NULL)
  error (EXIT_FAILURE, errno, "couldn't open the PO file %s", filename);
{
  const char * const *domains = po_file_domains (file);
  const char * const *domainp;

  /* The domains array is NULL terminated.  */
  for (domainp = domains; *domainp; domainp++)
    {
      const char *domain = *domainp;
      po_message_iterator_t iterator = po_message_iterator (file, domain);

      for (;;)
        {
          /* po_message_t is already a pointer type.  */
          po_message_t message = po_next_message (iterator);

          if (message == NULL)   /* end of this domain's messages */
            break;
          {
            const char *msgid = po_message_msgid (message);
            const char *msgstr = po_message_msgstr (message);

            …
          }
        }
      po_message_iterator_free (iterator);
    }
}
po_file_free (file);
• msgfmt Invocation: | Invoking the msgfmt Program | |
• msgunfmt Invocation: | Invoking the msgunfmt Program | |
• MO Files: | The Format of GNU MO Files | |
Invoking the msgfmt Program
msgfmt [option] filename.po …
The msgfmt program generates a binary message catalog from a textual translation description.
Add directory to the list of directories. Source files are searched relative to this list of directories. The resulting binary file will be written relative to the current directory, though.
If an input file is ‘-’, standard input is read.
Java mode: generate a Java ResourceBundle class.
Like --java, and assume Java2 (JDK 1.2 or higher).
C# mode: generate a .NET .dll file containing a subclass of GettextResourceSet.
C# resources mode: generate a .NET .resources file.
Tcl mode: generate a tcl/msgcat .msg file.
Qt mode: generate a Qt .qm file.
Desktop Entry mode: generate a .desktop file.
XML mode: generate an XML file.
Write output to specified file.
Direct the program to work strictly following the Uniforum/Sun implementation. Currently this only affects the naming of the output file. If this option is not given the name of the output file is the same as the domain name. If the strict Uniforum mode is enabled the suffix .mo is added to the file name if it is not already present.
We find this behaviour of Sun’s implementation rather silly and so by default this mode is not selected.
If the output file is ‘-’, output is written to standard output.
Specify the resource name.
Specify the locale name, either a language specification of the form ll or a combined language and country specification of the form ll_CC.
Specify the base directory of the classes directory hierarchy.
Produce a .java source file, instead of a compiled .class file.
The class name is determined by appending the locale name to the resource name, separated with an underscore. The ‘-d’ option is mandatory. The class is written under the specified directory.
Specify the resource name.
Specify the locale name, either a language specification of the form ll or a combined language and country specification of the form ll_CC.
Specify the base directory for locale dependent .dll files.
The ‘-l’ and ‘-d’ options are mandatory. The .dll file is written in a subdirectory of the specified directory whose name depends on the locale.
Specify the locale name, either a language specification of the form ll or a combined language and country specification of the form ll_CC.
Specify the base directory of .msg message catalogs.
The ‘-l’ and ‘-d’ options are mandatory. The .msg file is written in the specified directory.
Specify a .desktop file used as a template.
Specify keywordspec as an additional keyword to be looked for. Without a keywordspec, the option means to not use default keywords.
Specify the locale name, either a language specification of the form ll or a combined language and country specification of the form ll_CC.
Specify the directory where PO files are read. The directory must contain the ‘LINGUAS’ file.
To generate a ‘.desktop’ file for a single locale, you can use it as follows.
msgfmt --desktop --template=template --locale=locale \
  -o file filename.po …
msgfmt provides a special "bulk" operation mode to process multiple .po files at a time.
msgfmt --desktop --template=template -d directory -o file
msgfmt first reads the ‘LINGUAS’ file under directory, and then processes all ‘.po’ files listed there. You can also limit the locales to a subset, through the ‘LINGUAS’ environment variable.
For either operation mode, the ‘-o’ and ‘--template’ options are mandatory.
Specify an XML file used as a template.
Specify the language of the input files.
Specify the locale name, either a language specification of the form ll or a combined language and country specification of the form ll_CC.
Specify the base directory of .po message catalogs.
To generate an XML file for a single locale, you can use it as follows.
msgfmt --xml --template=template --locale=locale \
  -o file filename.po …
msgfmt provides a special "bulk" operation mode to process multiple .po files at a time.
msgfmt --xml --template=template -d directory -o file
msgfmt first reads the ‘LINGUAS’ file under directory, and then processes all ‘.po’ files listed there. You can also limit the locales to a subset, through the ‘LINGUAS’ environment variable.
For either operation mode, the ‘-o’ and ‘--template’ options are mandatory.
Assume the input files are Java ResourceBundles in Java .properties syntax, not in PO file syntax.
Assume the input files are NeXTstep/GNUstep localized resource files in .strings syntax, not in PO file syntax.
Perform all the checks implied by --check-format, --check-header, --check-domain.
Check language dependent format strings.
If the string represents a format string used in a printf-like function both strings should have the same number of ‘%’ format specifiers, with matching types. If the flag c-format or possible-c-format appears in the special comment #, for this entry a check is performed. For example, the check will diagnose using ‘%.*s’ against ‘%s’, or ‘%d’ against ‘%s’, or ‘%d’ against ‘%x’. It can even handle positional parameters.
Normally the xgettext program automatically decides whether a string is a format string or not. This algorithm is not perfect, though. It might regard a string as a format string though it is not used in a printf-like function and so msgfmt might report errors where there are none.
To solve this problem the programmer can dictate the decision to the xgettext program (see c-format). The translator should not consider removing the flag from the #, line. This "fix" would be reversed again as soon as msgmerge is called the next time.
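To make the idea behind the format check concrete, here is a deliberately simplified C sketch that compares the sequence of printf-style directives in an untranslated string and its translation. The names collect_conversions and formats_compatible are hypothetical, and this is not msgfmt’s actual algorithm: the real check compares argument types and handles positional parameters, as described above, whereas this sketch compares conversion characters literally (so it would even reject ‘%d’ against ‘%i’) and does not understand ‘%1$s’-style directives.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: record, in order, the conversion character of each
   directive in a simple printf-style format ("%5.2f" gives 'f'), plus a
   '*' entry for each '*' width or precision, since that consumes an extra
   int argument.  "%%" is skipped.  Returns the number of entries found,
   even if it exceeds cap (only the first cap entries are stored). */
static size_t collect_conversions(const char *fmt, char *out, size_t cap)
{
  size_t n = 0;

  assert(fmt != NULL && out != NULL);
  for (const char *p = fmt; *p != '\0'; p++)
    {
      if (*p != '%')
        continue;
      p++;
      if (*p == '%')                  /* "%%" is a literal percent sign */
        continue;
      /* Skip flags, field width, precision and length modifiers. */
      while (*p != '\0' && strchr("-+ #0123456789.*hlLqjzt", *p) != NULL)
        {
          if (*p == '*')
            {
              if (n < cap)
                out[n] = '*';
              n++;
            }
          p++;
        }
      if (*p == '\0')                 /* incomplete directive at the end */
        break;
      if (n < cap)
        out[n] = *p;
      n++;
    }
  return n;
}

/* Returns 1 if both strings contain the same directives in the same order. */
int formats_compatible(const char *msgid, const char *msgstr)
{
  char a[64], b[64];
  size_t na = collect_conversions(msgid, a, sizeof a);
  size_t nb = collect_conversions(msgstr, b, sizeof b);

  if (na != nb || na > sizeof a)
    return 0;
  return memcmp(a, b, na) == 0;
}
```

With these definitions, formats_compatible ("%d files", "%d fichiers") returns 1, whereas ‘%.*s’ against ‘%s’ is rejected because the ‘*’ consumes an additional argument.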
Verify presence and contents of the header entry. See Header Entry, for a description of the various fields in the header entry.
Check for conflicts between domain directives and the --output-file option.
Check that GNU msgfmt behaves like X/Open msgfmt. This will give an errorwhen attempting to use the GNU extensions.
Check presence of keyboard accelerators for menu items. This is based on the convention used in some GUIs that a keyboard accelerator in a menu item string is designated by an immediately preceding ‘&’ character. Sometimes a keyboard accelerator is also called "keyboard mnemonic". This check verifies that if the untranslated string has exactly one ‘&’ character, the translated string has exactly one ‘&’ as well. If this option is given with a char argument, this char should be a non-alphanumeric character and is used as keyboard accelerator mark instead of ‘&’.
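The accelerator rule can be sketched in a few lines of C. This is a minimal illustration with hypothetical function names, not msgfmt’s code: it simply counts mark characters and does not treat a doubled mark such as ‘&&’ specially.

```c
#include <assert.h>
#include <stddef.h>

/* Count how often the accelerator mark occurs in s. */
static size_t count_mark(const char *s, char mark)
{
  size_t n = 0;

  assert(s != NULL);
  for (; *s != '\0'; s++)
    if (*s == mark)
      n++;
  return n;
}

/* Hypothetical check: if the untranslated string has exactly one mark,
   the translation must have exactly one as well.  Returns 1 if the pair
   passes.  A doubled mark ("&&") is not treated specially here. */
int accelerators_ok(const char *msgid, const char *msgstr, char mark)
{
  if (count_mark(msgid, mark) != 1)
    return 1;                      /* the rule does not apply */
  return count_mark(msgstr, mark) == 1;
}
```

Passing a different mark character, for example ‘_’, mirrors the option’s char argument.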
Use fuzzy entries in output. Note that using this option is usually wrong, because fuzzy messages are exactly those which have not been validated by a human translator.
Align strings to number bytes (default: 1).
Write out 32-bit numbers in the given byte order. The possible values are big and little. The default is little.
MO files of any endianness can be used on any platform. When a MO file has an endianness other than the platform’s one, the 32-bit numbers from the MO file are swapped at runtime. The performance impact is negligible.
This option can be useful to produce MO files that are optimized for one platform.
Don’t include a hash table in the binary file. Lookup will be more expensive at run time (binary search instead of hash table lookup).
Display this help and exit.
Output version information and exit.
Print statistics about translations. When the option --verbose is used in combination with --statistics, the input file name is printed in front of the statistics line.
Increase verbosity level.
Invoking the msgunfmt Program
msgunfmt [option] [file]...
The msgunfmt program converts a binary message catalog to a Uniforum style .po file.
Java mode: input is a Java ResourceBundle class.
C# mode: input is a .NET .dll file containing a subclass of GettextResourceSet.
C# resources mode: input is a .NET .resources file.
Tcl mode: input is a tcl/msgcat .msg file.
Input .mo files.
If no input file is given or if it is ‘-’, standard input is read.
Specify the resource name.
Specify the locale name, either a language specification of the form ll or a combined language and country specification of the form ll_CC.
The class name is determined by appending the locale name to the resource name, separated with an underscore. The class is located using the CLASSPATH.
Specify the resource name.
Specify the locale name, either a language specification of the form ll or a combined language and country specification of the form ll_CC.
Specify the base directory for locale dependent .dll files.
The ‘-l’ and ‘-d’ options are mandatory. The .dll file is located in a subdirectory of the specified directory whose name depends on the locale.
Specify the locale name, either a language specification of the form ll or a combined language and country specification of the form ll_CC.
Specify the base directory of .msg message catalogs.
The ‘-l’ and ‘-d’ options are mandatory. The .msg file is located in the specified directory.
Write output to specified file.
The results are written to standard output if no output file is specified or if it is ‘-’.
Specify whether or when to use colors and other text attributes. See The --color option for details.
Specify the CSS style rule file to use for --color. See The --style option for details.
Always write an output file even if it contains no message.
Write the .po file using indented style.
Write out a strict Uniforum conforming PO file. Note that this Uniforum format should be avoided because it doesn’t support the GNU extensions.
Write out a Java ResourceBundle in Java .properties syntax. Note that this file format doesn’t support plural forms and silently drops obsolete messages.
Write out a NeXTstep/GNUstep localized resource file in .strings syntax. Note that this file format doesn’t support plural forms.
Set the output page width. Long strings in the output files will be split across multiple lines in order to ensure that each line’s width (= number of screen columns) is less than or equal to the given number.
Do not break long message lines. Message lines whose width exceeds the output page width will not be split into several lines. Only file reference lines which are wider than the output page width will be split.
Generate sorted output. Note that using this option makes it much harder for the translator to understand each message’s context.
Display this help and exit.
Output version information and exit.
Increase verbosity level.
The format of the generated MO files is best described by a picture, which appears below.
The first two words serve to identify the file. The magic number always signals a GNU MO file. It is stored in the byte order used when the MO file was generated, so the magic number really is two numbers: 0x950412de and 0xde120495.
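A reader can tell the byte order of a MO file by decoding the first word both ways and seeing which interpretation yields the magic number. A minimal C sketch, with hypothetical names, that classifies a buffer:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MO_MAGIC 0x950412deU

enum mo_byte_order { MO_NOT_MO = -1, MO_LITTLE_ENDIAN = 0, MO_BIG_ENDIAN = 1 };

/* Decode 4 bytes as a little-endian or a big-endian 32-bit number. */
static uint32_t le32(const unsigned char *p)
{
  return (uint32_t) p[0] | ((uint32_t) p[1] << 8)
         | ((uint32_t) p[2] << 16) | ((uint32_t) p[3] << 24);
}

static uint32_t be32(const unsigned char *p)
{
  return ((uint32_t) p[0] << 24) | ((uint32_t) p[1] << 16)
         | ((uint32_t) p[2] << 8) | (uint32_t) p[3];
}

/* Classify a buffer by its first word.  Decoding it in the wrong byte
   order would give 0xde120495 instead of 0x950412de. */
enum mo_byte_order mo_detect(const unsigned char *buf, size_t len)
{
  assert(buf != NULL || len == 0);
  if (len < 4)
    return MO_NOT_MO;
  if (le32(buf) == MO_MAGIC)
    return MO_LITTLE_ENDIAN;
  if (be32(buf) == MO_MAGIC)
    return MO_BIG_ENDIAN;
  return MO_NOT_MO;
}
```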
The second word describes the current revision of the file format, composed of a major and a minor revision number. The revision numbers ensure that the readers of MO files can distinguish new formats from old ones and handle their contents, as far as possible. For now the major revision is 0 or 1, and the minor revision is also 0 or 1. More revisions might be added in the future. A program seeing an unexpected major revision number should stop reading the MO file entirely; whereas an unexpected minor revision number means that the file can be read but will not reveal its full contents, when parsed by a program that supports only smaller minor revision numbers.
The version is kept separate from the magic number, instead of using different magic numbers for different formats, mainly because /etc/magic is not updated often.
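The revision rule above can be expressed as a small decision function. This C sketch assumes, since the text does not say how the two numbers are packed, that the major revision occupies the high 16 bits of the revision word and the minor revision the low 16 bits; the names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

enum mo_support { MO_REJECT, MO_FULL, MO_PARTIAL };

/* Decide how a reader that knows major revisions 0 and 1, and minor
   revisions up to max_minor, should treat a file.  ASSUMPTION: major
   number in the high 16 bits, minor number in the low 16 bits. */
enum mo_support mo_check_revision(uint32_t revision, uint32_t max_minor)
{
  uint32_t major = revision >> 16;
  uint32_t minor = revision & 0xffffU;

  if (major > 1)
    return MO_REJECT;    /* unexpected major revision: stop reading */
  if (minor > max_minor)
    return MO_PARTIAL;   /* readable, but some contents stay hidden */
  return MO_FULL;
}
```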
Then follow a number of pointers to later tables in the file, allowing for the extension of the prefix part of MO files without having to recompile programs reading them. This might become useful for later inserting a few flag bits, indication about the charset used, new tables, or other things.
Then, at offset O and offset T in the picture, two tables of string descriptors can be found. In both tables, each string descriptor uses two 32-bit integers, one for the string length, another for the offset of the string in the MO file, counting in bytes from the start of the file. The first table contains descriptors for the original strings, and is sorted so the original strings are in increasing lexicographical order. The second table contains descriptors for the translated strings, and is parallel to the first table: to find the corresponding translation one has to access the array slot in the second array with the same index.
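To show how the two parallel tables work together, here is a C sketch of a lookup by binary search over an in-memory MO image, ignoring the hash table entirely. The function mo_lookup is hypothetical, performs only basic bounds checks, and compares strings with strcmp, which assumes the original strings were sorted byte-wise.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define MO_MAGIC 0x950412deU

/* Decode a 32-bit number stored big-endian (big != 0) or little-endian. */
static uint32_t get_u32(const unsigned char *p, int big)
{
  if (big)
    return ((uint32_t) p[0] << 24) | ((uint32_t) p[1] << 16)
           | ((uint32_t) p[2] << 8) | (uint32_t) p[3];
  return ((uint32_t) p[3] << 24) | ((uint32_t) p[2] << 16)
         | ((uint32_t) p[1] << 8) | (uint32_t) p[0];
}

/* Check that a descriptor (length, offset) names a NUL-terminated string
   lying completely inside the buffer; return it, or NULL if not. */
static const char *mo_string(const unsigned char *buf, size_t len,
                             const unsigned char *desc, int big)
{
  uint32_t slen = get_u32(desc, big);
  uint32_t soff = get_u32(desc + 4, big);

  if (soff >= len || slen >= len - soff || buf[soff + slen] != '\0')
    return NULL;
  return (const char *) buf + soff;
}

/* Binary search in the table of original strings at offset O, then return
   the entry with the same index from the translation table at offset T.
   Returns NULL if msgid is absent or the image looks malformed. */
const char *mo_lookup(const unsigned char *buf, size_t len, const char *msgid)
{
  int big;

  assert(buf != NULL && msgid != NULL);
  if (len < 28)
    return NULL;
  if (get_u32(buf, 0) == MO_MAGIC)
    big = 0;
  else if (get_u32(buf, 1) == MO_MAGIC)
    big = 1;
  else
    return NULL;

  uint32_t n = get_u32(buf + 8, big);       /* number of strings, N */
  uint32_t orig = get_u32(buf + 12, big);   /* original table, O */
  uint32_t trans = get_u32(buf + 16, big);  /* translation table, T */
  if (orig > len || trans > len
      || n > (len - orig) / 8 || n > (len - trans) / 8)
    return NULL;

  size_t lo = 0, hi = n;
  while (lo < hi)
    {
      size_t mid = lo + (hi - lo) / 2;
      const char *s = mo_string(buf, len, buf + orig + 8 * mid, big);
      if (s == NULL)
        return NULL;
      int cmp = strcmp(msgid, s);
      if (cmp == 0)
        return mo_string(buf, len, buf + trans + 8 * mid, big);
      if (cmp < 0)
        hi = mid;
      else
        lo = mid + 1;
    }
  return NULL;
}
```

Because strcmp stops at the first NUL, an entry with plural forms is matched by its singular alone, and the returned pointer is the first plural variant of its translation.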
Having the original strings sorted enables the use of simple binary search, for when the MO file does not contain a hashing table, or for when it is not practical to use the hashing table provided in the MO file. This also has another advantage: GNU gettext usually translates the empty string in a PO file into some system information attached to that particular MO file, and the empty string necessarily becomes the first entry in both the original and translated tables, making the system information very easy to find.
The size S of the hash table can be zero. In this case, the hash table itself is not contained in the MO file. Some people might prefer this because a precomputed hashing table takes disk space, and does not win that much speed. The hash table contains indices to the sorted array of strings in the MO file. Conflict resolution is done by double hashing. The precise hashing algorithm used is fairly dependent on GNU gettext code, and is not documented here.
As for the strings themselves, they follow the hash table, and each is terminated with a NUL, and this NUL is not counted in the length which appears in the string descriptor. The msgfmt program has an option selecting the alignment for MO file strings. With this option, each string is separately aligned so it starts at an offset which is a multiple of the alignment value. On some RISC machines, a correct alignment will speed things up.
Contexts are stored by storing the concatenation of the context, an EOT byte, and the original string, instead of the original string.
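For example, the lookup key for a message with a context can be built as follows. This C sketch is illustrative only; mo_context_key is a hypothetical name, and the caller owns the returned buffer.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Build the key under which a message with a context is stored:
   context, an EOT byte (0x04), then the original string.
   Returns a malloc'ed string, or NULL on allocation failure. */
char *mo_context_key(const char *context, const char *msgid)
{
  assert(context != NULL && msgid != NULL);
  size_t clen = strlen(context);
  size_t mlen = strlen(msgid);
  char *key = malloc(clen + 1 + mlen + 1);

  if (key == NULL)
    return NULL;
  memcpy(key, context, clen);
  key[clen] = '\004';                        /* EOT separator */
  memcpy(key + clen + 1, msgid, mlen + 1);   /* includes the final NUL */
  return key;
}
```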
Plural forms are stored by letting the plural of the original string follow the singular of the original string, separated through a NUL byte. The length which appears in the string descriptor includes both. However, only the singular of the original string takes part in the hash table lookup. The plural variants of the translation are all stored consecutively, separated through a NUL byte. Here also, the length in the string descriptor includes all of them.
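Given the length from a translation’s string descriptor, the individual plural variants can be picked out by skipping NUL separators. A C sketch with a hypothetical function name; deciding which index to ask for is the job of the plural formula, which is not covered here.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* s points to a translation whose descriptor length len covers all plural
   variants (separated by NUL bytes, final NUL not counted).  Return the
   variant with the given index, or NULL if there are not that many. */
const char *mo_plural_variant(const char *s, size_t len, unsigned long index)
{
  const char *end = s + len;

  assert(s != NULL);
  while (index > 0)
    {
      const char *nul = memchr(s, '\0', (size_t) (end - s));
      if (nul == NULL)
        return NULL;            /* fewer variants than requested */
      s = nul + 1;
      index--;
    }
  return s;
}
```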
Nothing prevents a MO file from having embedded NULs in strings. However, the program interface currently used already presumes that strings are NUL terminated, so embedded NULs are somewhat useless. But the MO file format is general enough so other interfaces would be later possible, if for example, we ever want to implement wide characters right in MO files, where NUL bytes may accidentally appear. (No, we don’t want to have wide characters in MO files. They would make the file unnecessarily large, and the ‘wchar_t’ type being platform dependent, MO files would be platform dependent as well.)
This particular issue has been strongly debated in the GNU gettext development forum, and it is to be expected that the MO file format will evolve or change over time. It is even possible that many formats may later be supported concurrently. But surely, we have to start somewhere, and the MO file format described here is a good start. Nothing is cast in concrete, and the format may later evolve fairly easily, so we should feel comfortable with the current approach.
         byte
             +------------------------------------------+
          0  | magic number = 0x950412de                |
             |                                          |
          4  | file format revision = 0                 |
             |                                          |
          8  | number of strings                        |  == N
             |                                          |
         12  | offset of table with original strings    |  == O
             |                                          |
         16  | offset of table with translation strings |  == T
             |                                          |
         20  | size of hashing table                    |  == S
             |                                          |
         24  | offset of hashing table                  |  == H
             |                                          |
             .                                          .
             .    (possibly more entries later)         .
             .                                          .
             |                                          |
          O  | length & offset 0th string  ----------------.
      O + 8  | length & offset 1st string  ------------------.
              ...                                    ...   | |
O + ((N-1)*8)| length & offset (N-1)th string           |  | |
             |                                          |  | |
          T  | length & offset 0th translation  ---------------.
      T + 8  | length & offset 1st translation  -----------------.
              ...                                    ...   | | | |
T + ((N-1)*8)| length & offset (N-1)th translation      |  | | | |
             |                                          |  | | | |
          H  | start hash table                         |  | | | |
              ...                                    ...   | | | |
  H + S * 4  | end hash table                           |  | | | |
             |                                          |  | | | |
             | NUL terminated 0th string  <----------------' | | |
             |                                          |    | | |
             | NUL terminated 1st string  <------------------' | |
             |                                          |      | |
              ...                                    ...       | |
             |                                          |      | |
             | NUL terminated 0th translation  <---------------' |
             |                                          |        |
             | NUL terminated 1st translation  <-----------------'
             |                                          |
              ...                                    ...
             |                                          |
             +------------------------------------------+
One aim of the current message catalog implementation provided by GNU gettext was to use the system’s message catalog handling, if the installer wishes to do so. So we perhaps should first take a look at the solutions we know about. The people in the POSIX committee did not manage to agree on one of the semi-official standards which we’ll describe below. In fact they couldn’t agree on anything, so they decided only to include an example of an interface. The major Unix vendors are split in the usage of the two most important specifications: X/Open’s catgets vs. Uniforum’s gettext interface. We’ll describe them both and later explain our solution of this dilemma.
• catgets: | About catgets | |
• gettext: | About gettext | |
• Comparison: | Comparing the two interfaces | |
• Using libintl.a: | Using libintl.a in own programs | |
• gettext grok: | Being a gettext grok | |
• Temp Programmers: | Temporary Notes for the Programmers Chapter | |
About catgets
The catgets implementation is defined in the X/Open Portability Guide, Volume 3, XSI Supplementary Definitions, Chapter 5. But the process of creating this standard seemed to be too slow for some of the Unix vendors so they created their implementations on preliminary versions of the standard. Of course this leads again to problems while writing platform independent programs: even the usage of catgets does not guarantee a unique interface.
Another, personal comment on this: only a bunch of committee members could have made this interface. They never really tried to program using this interface. It is a fast, memory-saving implementation, and a user can happily live with it. But programmers hate it (at least I and some others do…)
But we must not forget one point: after all the trouble with transferring the rights on Unix(tm) they at last came to X/Open, the very same who published this specification. This leads me to making the prediction that this interface will be in future Unix standards (e.g. Spec1170) and therefore part of all Unix implementations (implementations which are allowed to wear this name).
• Interface to catgets: | The interface | |
• Problems with catgets: | Problems with the catgets interface?! |
The interface to the catgets implementation consists of three functions which correspond to those used in file access: catopen to open the catalog for using, catgets for accessing the message tables, and catclose for closing after work is done. Prototypes for the functions and the needed definitions are in the header file.
catopen is used like this:
nl_catd catd = catopen ("catalog_name", 0);
The function takes as the argument the name of the catalog. This usually refers to the name of the program or the package. The second parameter is not further specified in the standard. I don’t even know whether it is implemented consistently among various systems. So the common advice is to use 0 as the value. The return value is a handle to the message catalog, equivalent to handles to file returned by open.
This handle is of course used in the catgets function which can be used like this:
char *translation = catgets (catd, set_no, msg_id, "original string");
The first parameter is this catalog descriptor. The second parameter specifies the set of messages in this catalog, in which the message described by msg_id is obtained. catgets therefore uses a three-stage addressing:
catalog name ⇒ set number ⇒ message ID ⇒ translation
The fourth argument is not used to address the translation. It is given as a default value in case one of the addressing stages fails. One important thing to remember is that although the return type of catgets is char * the resulting string must not be changed. It should better be const char *, but the standard was published in 1988, one year before ANSI C.
The last of these functions is used and behaves as expected:
catclose (catd);
After this no catgets call using the descriptor is legal anymore.
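Putting the three calls together, here is a C sketch of one complete open-use-close cycle. The function fetch_message and its parameters are hypothetical. Because the string returned by catgets may point into the open catalog, it is copied into the caller’s buffer before catclose is called; if the catalog cannot be opened, the default string is used, which is also what catgets itself returns when a lookup fails.

```c
#include <assert.h>
#include <nl_types.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: look up message (set_no, msg_id) in the catalog
   catalog_name and copy it (or fallback) into buf.
   Returns 1 if the whole string fit into buf, 0 otherwise. */
int fetch_message(const char *catalog_name, int set_no, int msg_id,
                  const char *fallback, char *buf, size_t size)
{
  assert(buf != NULL && size > 0);

  nl_catd catd = catopen(catalog_name, 0);   /* 0: the usual advice */
  const char *s = fallback;

  if (catd != (nl_catd) -1)
    s = catgets(catd, set_no, msg_id, fallback);

  /* Copy before catclose: s may point into the catalog's memory. */
  int n = snprintf(buf, size, "%s", s);

  if (catd != (nl_catd) -1)
    catclose(catd);
  return n >= 0 && (size_t) n < size;
}
```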
Problems with the catgets Interface?!
Now that this description seemed to be really easy, where are the problems we speak of? In fact the interface could be used in a reasonable way, but constructing the message catalogs is a pain. The reason for this lies in the third argument of catgets: the unique message ID. This has to be a numeric value for all messages in a single set. Perhaps you could imagine the problems keeping such a list while changing the source code. Add a new message here, remove one there. Of course a lot of tools have been developed to help organize this chaos, but each of them fails in one aspect or another. We don’t want to say that the other approach has no problems but they are far easier to manage.
About gettext
The definition of the gettext interface comes from a Uniforum proposal. It was submitted there by Sun, who had implemented the gettext function in SunOS 4, around 1990. Nowadays, the gettext interface is specified by the OpenI18N standard.
The main point about this solution is that it does not follow the method of normal file handling (open-use-close) and that it does not burden the programmer with so many tasks, especially the unique key handling. Of course here also a unique key is needed, but this key is the message itself (however long or short it is). See Comparison for a more detailed comparison of the two methods.
The following section contains a rather detailed description of the interface. We make it that detailed because this is the interface we chose for the GNU gettext Library. Programmers interested in using this library will be interested in this description.
• Interface to gettext: | The interface | |
• Ambiguities: | Solving ambiguities | |
• Locating Catalogs: | Locating message catalog files | |
• Charset conversion: | How to request conversion to Unicode | |
• Contexts: | Solving ambiguities in GUI programs | |
• Plural forms: | Additional functions for handling plurals | |
• Optimized gettext: | Optimization of the *gettext functions |
The minimal functionality an interface must have is a) to select a domain the strings are coming from (a single domain for all programs is not reasonable because its construction and maintenance is difficult, perhaps impossible) and b) to access a string in a selected domain.
This is principally the description of the gettext interface. It has a global domain which unqualified usages reference. Of course this domain is selectable by the user.
char *textdomain (const char *domain_name);
This provides the possibility to change or query the current status of the current global domain of the LC_MESSAGES category. The argument is a null-terminated string, whose characters must be legal in the use in filenames. If the domain_name argument is NULL, the function returns the current value. If no value has been set before, the name of the default domain is returned: messages. Please note that although the return value of textdomain is of type char * no changing is allowed. It is also important to know that no checks of the availability are made. If the name is not available you will see this by the fact that no translations are provided.
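A short C sketch of querying and setting the global domain, assuming a C library that provides <libintl.h> (glibc does); the helper names are hypothetical, as is the domain name used in the usage note.

```c
#include <assert.h>
#include <libintl.h>
#include <string.h>

/* Query the current global domain without changing it (NULL argument).
   The result must not be modified. */
const char *current_text_domain(void)
{
  return textdomain(NULL);
}

/* Make domain_name the global domain.  Returns 1 if the new setting took
   effect; this says nothing about whether catalogs for it exist. */
int set_text_domain(const char *domain_name)
{
  assert(domain_name != NULL);
  const char *now = textdomain(domain_name);
  return now != NULL && strcmp(now, domain_name) == 0;
}
```

Called before any other textdomain call, current_text_domain () returns "messages"; after set_text_domain ("hypothetical-domain") it returns that new name, even though no catalog for it exists.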
To use a domain set by textdomain the function
char *gettext (const char *msgid);
is to be used. This is the simplest reasonable form one can imagine. The translation of the string msgid is returned if it is available in the current domain. If it is not available, the argument itself is returned. If the argument is NULL the result is undefined.
One thing which should come into mind is that no explicit dependency to the used domain is given. The current value of the domain is used. If this changes between two executions of the same gettext call in the program, both calls reference a different message catalog.
For the easiest case, which is normally used in internationalized packages, once at the beginning of execution a call to textdomain is issued, setting the domain to a unique name, normally the package name. In the following code all strings which have to be translated are filtered through the gettext function. That’s all, the package speaks your language.
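The usual start-up sequence might look like the following C sketch. It assumes <libintl.h> is available and uses a hypothetical package name and catalog directory. It also calls setlocale, which makes the locale from the environment take effect, and bindtextdomain, which is described below; neither is required by the minimal description above.

```c
#include <assert.h>
#include <libintl.h>
#include <locale.h>
#include <string.h>

/* Hypothetical package name and catalog directory. */
#define PACKAGE   "hello-example"
#define LOCALEDIR "/usr/local/share/locale"

/* Conventional shorthand for marking translatable strings. */
#define _(msgid) gettext (msgid)

/* Call once at program start-up. */
void init_i18n(void)
{
  setlocale(LC_ALL, "");               /* take the locale from the environment */
  bindtextdomain(PACKAGE, LOCALEDIR);  /* where this package's catalogs live */
  textdomain(PACKAGE);                 /* make it the global domain */
}

const char *greeting(void)
{
  /* Falls back to the English string if no translation is found. */
  return _("Hello, world!");
}
```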
While this single name domain works well for most applications there might be the need to get translations from more than one domain. Of course one could switch between different domains with calls to textdomain, but this is really not convenient nor is it fast. A possible situation could be one case subject to discussion during this writing: all error messages of functions in the set of common used functions should go into a separate domain error. By this means we would only need to translate them once. Another case are messages from a library, as these have to be independent of the current domain set by the application.
For these reasons there are two more functions to retrieve strings:
char *dgettext (const char *domain_name, const char *msgid);
char *dcgettext (const char *domain_name, const char *msgid,
                 int category);
Both take an additional argument at the first place, which corresponds to the argument of textdomain. The third argument of dcgettext allows using a locale category other than LC_MESSAGES. But I really don’t know where this can be useful. If the domain_name is NULL or category has a value beside the known ones, the result is undefined. It should also be noted that this function is not part of the second known implementation of this function family, the one found in Solaris.
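As an illustration of the library case mentioned above, a library can keep its messages in its own domain regardless of what the application passed to textdomain. In this C sketch the domain name and helper names are hypothetical, as is the use of LC_TIME for a time format string.

```c
#include <assert.h>
#include <libintl.h>
#include <locale.h>
#include <string.h>

/* Hypothetical domain owned by a library, independent of the domain the
   application selected with textdomain(). */
#define MYLIB_DOMAIN "mylib-example"

const char *mylib_error_text(void)
{
  /* Looks in .../LC_MESSAGES/mylib-example.mo */
  return dgettext(MYLIB_DOMAIN, "out of memory");
}

const char *mylib_clock_format(void)
{
  /* Same domain, but the catalog is selected via the LC_TIME category,
     i.e. .../LC_TIME/mylib-example.mo */
  return dcgettext(MYLIB_DOMAIN, "%H:%M", LC_TIME);
}
```

Neither function changes the global domain, so the application’s own gettext calls are unaffected.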
A second ambiguity can arise from the fact that perhaps more than one domain has the same name. This can be solved by specifying where the needed message catalog files can be found.
char *bindtextdomain (const char *domain_name, const char *dir_name);
Calling this function binds the given domain to a file in the specified directory (how this file is determined is explained below). In particular, a file in the system's default place is no longer favored over the specified file (as it would be by solely using textdomain). A NULL pointer for the dir_name parameter returns the binding associated with domain_name. If domain_name itself is NULL, nothing happens and a NULL pointer is returned. Here again, as for all the other functions, the return value must not be changed!
It is important to remember that relative path names for the dir_name parameter can cause trouble. Since the path is always computed relative to the current directory, different results will be achieved when the program calls chdir. Relative paths should always be avoided to prevent such dependencies and unreliability.
Locating Message Catalog Files
Because many different languages for many different packages have to be stored, we need some way to add this information to the message catalog files. The way usually used in Unix environments is to encode it in the file name, and this is also done here. The directory name given in bindtextdomain's second argument (or the default directory) is concatenated with the name of the locale, the locale category, and the domain name:
dir_name/locale/LC_category/domain_name.mo
The default value for dir_name is system specific. For the GNU library, and for packages adhering to its conventions, it's:
/usr/local/share/locale
locale is the name of the locale selected for the locale category designated by LC_category. For gettext and dgettext this LC_category is always LC_MESSAGES.(3) The name of the locale is determined through setlocale (LC_category, NULL).(4) When using the function dcgettext, you can specify the locale category through the third argument.
How to specify the output character set gettext uses
gettext not only looks up a translation in a message catalog. It also converts the translation on the fly to the desired output character set. This is useful if the user is working in a different character set than the translator who created the message catalog, because it avoids distributing variants of message catalogs which differ only in the character set.
The output character set is, by default, the value of nl_langinfo (CODESET), which depends on the LC_CTYPE part of the current locale. But programs which store strings in a locale-independent way (e.g. UTF-8) can request that gettext and related functions return the translations in that encoding, by use of the bind_textdomain_codeset function.
Note that the msgid argument to gettext is not subject to character set conversion. Also, when gettext does not find a translation for msgid, it returns msgid unchanged, independently of the current output character set. It is therefore recommended that all msgids be US-ASCII strings.
char *bind_textdomain_codeset (const char *domainname, const char *codeset);
The bind_textdomain_codeset function can be used to specify the output character set for message catalogs for domain domainname. The codeset argument must be a valid codeset name which can be used for the iconv_open function, or a null pointer.
If the codeset parameter is the null pointer, bind_textdomain_codeset returns the currently selected codeset for the domain with the name domainname. It returns NULL if no codeset has yet been selected.
The bind_textdomain_codeset function can be used several times. If used multiple times with the same domainname argument, the later call overrides the settings made by the earlier one.
The bind_textdomain_codeset function returns a pointer to a string containing the name of the selected codeset. The string is allocated internally in the function and must not be changed by the user. If the system ran out of memory during the execution of bind_textdomain_codeset, the return value is NULL and the global variable errno is set accordingly.
Using contexts for solving ambiguities
One place where the gettext functions, if used normally, have big problems is within programs with graphical user interfaces (GUIs). The problem is that many of the strings which have to be translated are very short. They have to appear in pull-down menus, which restricts their length. But strings which do not contain entire sentences, or at least large fragments of a sentence, may appear in more than one situation in the program and might need different translations. This is especially true for the one-word strings which are frequently used in GUI programs.
As a consequence, many people say that the gettext approach is wrong and that catgets, which indeed does not have this problem, should be used instead. But there is a very simple and powerful method to handle this kind of problem with the gettext functions.
Contexts can be added to strings to be translated. A context-dependent translation lookup searches for the translation of a given string within a given context only. The translation for the same string in a different context can be different. The different translations of the same string in different contexts can be stored in the same MO file, and can be edited by the translator in the same PO file.
The gettext.h include file contains the lookup macros for strings with contexts. They are implemented as thin macros and inline functions over the functions from <libintl.h>.
const char *pgettext (const char *msgctxt, const char *msgid);
In a call of this macro, msgctxt and msgid must be string literals. The macro returns the translation of msgid, restricted to the context given by msgctxt.
The msgctxt string is visible to the translator in the PO file. You should try to make it canonical and never changing, because every time you change a msgctxt, the translator will have to review the translation of msgid.
Finding a canonical msgctxt string that doesn't change over time can be hard. But you shouldn't use the file name or class name containing the pgettext call, because renaming a file or a class is a common development task, and it shouldn't cause translator work. Also you shouldn't use a comment in the form of a complete English sentence as msgctxt, because orthography or grammar changes are often applied to such sentences, and again, it shouldn't force the translator to do a review.
The ‘p’ in ‘pgettext’ stands for “particular”: pgettext fetches a particular translation of the msgid.
const char *dpgettext (const char *domain_name, const char *msgctxt, const char *msgid);
const char *dcpgettext (const char *domain_name, const char *msgctxt, const char *msgid, int category);
These are generalizations of pgettext. They behave similarly to dgettext and dcgettext, respectively. The domain_name argument defines the translation domain. The category argument allows using a locale category other than LC_MESSAGES.
As an example, consider the following fictional situation. A GUI program has a menu bar with the following entries:
+------------+------------+--------------------------------------+
| File       | Printer    |                                      |
+------------+------------+--------------------------------------+
| Open     | | Select   |
| New      | | Open     |
+----------+ | Connect  |
             +----------+
To have the strings File, Printer, Open, New, Select, and Connect translated, there has to be a call to a function of the gettext family at some point in the code. But in two places the string passed into the function would be Open. The translations might not be the same, and therefore we are in the dilemma described above.
What distinguishes the two places is the menu path from the menu root to the particular menu entries:
Menu|File
Menu|Printer
Menu|File|Open
Menu|File|New
Menu|Printer|Select
Menu|Printer|Open
Menu|Printer|Connect
The context is thus the menu path without its last part. So, the calls look like this:
pgettext ("Menu|", "File")
pgettext ("Menu|", "Printer")
pgettext ("Menu|File|", "Open")
pgettext ("Menu|File|", "New")
pgettext ("Menu|Printer|", "Select")
pgettext ("Menu|Printer|", "Open")
pgettext ("Menu|Printer|", "Connect")
Whether or not to use the ‘|’ character at the end of the context is a matter of style.
For more complex cases, where the msgctxt or msgid are not string literals, more general macros are available:
const char *pgettext_expr (const char *msgctxt, const char *msgid);
const char *dpgettext_expr (const char *domain_name, const char *msgctxt, const char *msgid);
const char *dcpgettext_expr (const char *domain_name, const char *msgctxt, const char *msgid, int category);
Here msgctxt and msgid can be arbitrary string-valued expressions. These macros are more general. But in the case that both argument expressions are string literals, the macros without the ‘_expr’ suffix are more efficient.
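To show what such a context lookup does underneath, here is a sketch in the spirit of the _expr macros. An MO file stores a context entry under the key msgctxt, followed by an EOT byte (\004), followed by msgid. The function builds that key at run time, looks it up with dcgettext, and falls back to msgid on a miss. my_dcpgettext is a hypothetical name. The real macros in gettext.h differ in detail (for instance, they avoid heap allocation for short strings).

```c
#include <libintl.h>
#include <locale.h>
#include <stdlib.h>
#include <string.h>

/* Context-dependent lookup: the catalog key is MSGCTXT "\004" MSGID.
   Returns MSGID if there is no translation (or if memory runs out).  */
const char *
my_dcpgettext (const char *domain, const char *msgctxt, const char *msgid,
               int category)
{
  size_t ctxt_len = strlen (msgctxt);
  size_t id_size = strlen (msgid) + 1;        /* include the NUL */
  char *key = malloc (ctxt_len + 1 + id_size);
  const char *translation;

  if (key == NULL)
    return msgid;
  memcpy (key, msgctxt, ctxt_len);
  key[ctxt_len] = '\004';                     /* context separator */
  memcpy (key + ctxt_len + 1, msgid, id_size);

  translation = dcgettext (domain, key, category);
  /* On a miss, dcgettext returns its argument, i.e. our temporary key,
     so substitute the plain msgid before freeing the key.  */
  if (translation == key)
    translation = msgid;
  free (key);
  return translation;
}
```

With the menu example, my_dcpgettext (domain, "Menu|File|", "Open", LC_MESSAGES) and my_dcpgettext (domain, "Menu|Printer|", "Open", LC_MESSAGES) consult two distinct catalog entries.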
Additional functions for plural forms
The functions of the gettext family described so far (and all the catgets functions as well) have one problem in the real world which has been neglected completely in all existing approaches: the handling of plural forms.
Looking through Unix source code from before the time anybody thought about internationalization (and, sadly, even afterwards), one can often find code similar to the following:
printf ("%d file%s deleted", n, n == 1 ? "" : "s");
After the first complaints from people internationalizing the code, people either completely avoided formulations like this or used strings like "file(s)". Both look unnatural and should be avoided. First attempts to solve the problem correctly looked like this:
if (n == 1)
  printf ("%d file deleted", n);
else
  printf ("%d files deleted", n);
But this does not solve the problem. It helps languages where the plural form of a noun is not simply constructed by adding an ‘s’, but that is all. Once again people fell into the trap of believing that the rules their language uses are universal. But the handling of plural forms differs widely between the language families. For example, Rafal Maszkowski reports:
In Polish we use e.g. plik (file) this way:
1 plik
2,3,4 pliki
5-21 pliko'w
22-24 pliki
25-31 pliko'w
and so on (o' means 8859-2 oacute which should be rather okreska, similar to aogonek).
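There are two things which can differ between languages (and even inside language families): the way plural forms are built, and the number of plural forms. The first is a problem for languages with many irregularities. German, for instance, is a drastic case: although English and German belong to the same (Germanic) family, the almost regular formation of plural nouns by appending an ‘s’ is hardly found in German. The second is somewhat surprising for those who only have experience with Romance and Germanic languages, where the number is the same (there are two). But other language families have only one form or many forms. More information on this is given in a separate section.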
The consequence is that application writers should not try to solve the problem in their code. That would be localization, since it is only usable for certain hardcoded language environments. Instead, the extended gettext interface should be used.
Instead of the one key string, these extra functions take two strings and a numerical argument. The idea is that, using the numerical argument and the first string as a key, the implementation can select the right plural form according to rules specified by the translator. The two string arguments are then used to provide a return value in case no message catalog is found (similar to the normal gettext behavior). In this case the rules for Germanic languages are used, and it is assumed that the first string argument is the singular form and the second the plural form.
The consequence is that programs without language catalogs can display the correct strings only if the program itself is written using a Germanic language. This is a limitation, but since the GNU C library (as well as the GNU gettext package) is written as part of the GNU package, and the coding standards for the GNU project require programs to be written in English, this solution nevertheless fulfills its purpose.
char *ngettext (const char *msgid1, const char *msgid2, unsigned long int n);
The ngettext function is similar to the gettext function in that it finds the message catalogs in the same way. But it takes two extra arguments. The msgid1 parameter must contain the singular form of the string to be converted. It is also used as the key for the search in the catalog. The msgid2 parameter is the plural form. The parameter n is used to determine the plural form. If no message catalog is found, msgid1 is returned if n == 1, otherwise msgid2.
An example for the use of this function is:
printf (ngettext ("%d file removed", "%d files removed", n), n);
Please note that the numeric value n has to be passed to the printf function as well. It is not sufficient to pass it only to ngettext.
In the English singular case, the number, which is always 1, can be replaced with "one":
printf (ngettext ("One file removed", "%d files removed", n), n);
This works because the ‘printf’ function discards excess arguments thatare not consumed by the format string.
If this function is meant to yield a format string that takes two or more arguments, you cannot use it like this:
printf (ngettext ("%d file removed from directory %s",
                  "%d files removed from directory %s",
                  n),
        n, dir);
because in many languages the translators want to replace the ‘%d’ with an explicit word in the singular case, just like “one” in English, and C format strings cannot consume the second argument but skip the first argument. Instead, you have to reorder the arguments so that ‘n’ comes last:
printf (ngettext ("%2$d file removed from directory %1$s",
                  "%2$d files removed from directory %1$s",
                  n),
        dir, n);
See c-format for details about this argument reordering syntax.
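A small sketch of the reordered call follows, writing into a buffer. format_removed_from_dir is a hypothetical helper. The numbered "%n$" conversions are a POSIX extension to ISO C printf, supported by glibc. POSIX requires that, when argument N is referenced, arguments 1 through N-1 be referenced as well. A trailing argument can therefore be left out of a translation, but a leading one cannot, which is why n goes last.

```c
#include <libintl.h>
#include <stdio.h>

/* The count is the last argument and is referenced as %2$d, so a
   singular translation may omit it while still using %1$s.  */
int
format_removed_from_dir (char *buf, size_t size, const char *dir, int n)
{
  return snprintf (buf, size,
                   ngettext ("%2$d file removed from directory %1$s",
                             "%2$d files removed from directory %1$s", n),
                   dir, n);
}
```

Without a catalog, format_removed_from_dir (buf, sizeof buf, "/tmp", 2) produces "2 files removed from directory /tmp".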
When you know that the value of n is within a given range, you can specify it as a comment directed to the xgettext tool. This information may help translators to use more adequate translations. Like this:
if (days > 7 && days < 14)
  /* xgettext: range: 1..6 */
  printf (ngettext ("one week and one day", "one week and %d days",
                    days - 7),
          days - 7);
It is also possible to use this function when the strings don't contain a cardinal number:
puts (ngettext ("Delete the selected file?",
                "Delete the selected files?",
                n));
In this case the number n is only used to choose the plural form.
char *dngettext (const char *domain, const char *msgid1, const char *msgid2, unsigned long int n);
The dngettext function is similar to the dgettext function in the way the message catalog is selected. The difference is that it takes two extra parameters to provide the correct plural form. These two parameters are handled in the same way ngettext handles them.
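char *dcngettext (const char *domain, const char *msgid1, const char *msgid2, unsigned long int n, int category);
The dcngettext function is similar to the dcgettext function in the way the message catalog is selected. The difference is that it takes two extra parameters to provide the correct plural form. These two parameters are handled in the same way ngettext handles them.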
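As a sketch of how a library might combine its own domain with plural handling: the domain "mylib" and the wrapper names are hypothetical. Catalog selection works as with dgettext and dcgettext, and form selection as with ngettext.

```c
#include <libintl.h>
#include <locale.h>

/* Hypothetical library domain.  */
#define MYLIB_DOMAIN "mylib"

/* Plural-aware lookup in the library's own domain.  */
const char *
mylib_items_message (unsigned long n)
{
  return dngettext (MYLIB_DOMAIN, "%lu item skipped", "%lu items skipped",
                    n);
}

/* The same lookup with an explicit locale category.  */
const char *
mylib_items_message_cat (unsigned long n, int category)
{
  return dcngettext (MYLIB_DOMAIN, "%lu item skipped", "%lu items skipped",
                     n, category);
}
```

Without a "mylib" catalog, n == 1 selects "%lu item skipped" and every other value selects "%lu items skipped".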
Now, how do these functions solve the problem of the plural forms? Without the input of linguists (which was not available), it was not possible to determine whether there are only a few different ways in which plural forms are formed or whether the number can increase with every newly supported language.
Therefore the implemented solution allows the translator to specify the rules for selecting the plural form. Since the formula varies with every language, this is the only viable solution other than hardcoding the information in the code (which would still require a way to add new rules, so as not to prevent the use of new languages).
The information about the plural form selection has to be stored in the header entry of the PO file (the one with the empty msgid string). The plural form information looks like this:
Plural-Forms: nplurals=2; plural=n == 1 ? 0 : 1;
The nplurals value must be a decimal number which specifies how many different plural forms exist for this language. The string following plural is an expression which uses C language syntax. Exceptions are that no negative numbers are allowed, numbers must be decimal, and the only variable allowed is n. Spaces are allowed in the expression, but backslash-newlines are not; in the examples below the backslash-newlines are present for formatting purposes only. This expression will be evaluated whenever one of the functions ngettext, dngettext, or dcngettext is called. The numeric value passed to these functions is then substituted for all uses of the variable n in the expression. The resulting value must then be greater than or equal to zero and smaller than the value given as the value of nplurals.
The following rules are known at this point. The languages are grouped by language family. But this does not necessarily mean the information can be generalized for the whole family (as can easily be seen in the table below).(5)
Only one form:
Some languages only require one single form. There is no distinction between the singular and plural form. An appropriate header entry would look like this:
Plural-Forms: nplurals=1; plural=0;
Languages with this property include:
Japanese, Vietnamese, Korean
Thai
Two forms, singular used for one only:
This is the form used in most existing programs, since it is what English uses. A header entry would look like this:
Plural-Forms: nplurals=2; plural=n != 1;
(Note: this uses the feature of C expressions that boolean expressions have the value zero or one.)
Languages with this property include:
English, German, Dutch, Swedish, Danish, Norwegian, Faroese
Spanish, Portuguese, Italian, Bulgarian
Greek
Finnish, Estonian
Hebrew
Bahasa Indonesian
Esperanto
Other languages using the same header entry are:
Hungarian
Turkish
Hungarian does not appear to have a plural if you look at sentences involving cardinal numbers. For example, “1 apple” is “1 alma”, and “123 apples” is “123 alma”. But when the number is not explicit, the distinction between singular and plural exists: “the apple” is “az alma”, and “the apples” is “az almák”. Since ngettext has to support both types of sentences, it is classified here, under “two forms”.
The same holds for Turkish: “1 apple” is “1 elma”, and “123 apples” is “123 elma”. But when the number is omitted, the distinction between singular and plural exists: “the apple” is “elma”, and “the apples” is “elmalar”.
Two forms, singular used for zero and one:
Exceptional case in the language family. The header entry would be:
Plural-Forms: nplurals=2; plural=n>1;
Languages with this property include:
Brazilian Portuguese, French
Three forms, special case for zero:
The header entry would be:
Plural-Forms: nplurals=3; plural=n%10==1 && n%100!=11 ? 0 : n != 0 ? 1 : 2;
Languages with this property include:
Latvian
Three forms, special cases for one and two:
The header entry would be:
Plural-Forms: nplurals=3; plural=n==1 ? 0 : n==2 ? 1 : 2;
Languages with this property include:
Gaeilge (Irish)
Three forms, special case for numbers ending in 00 or [2-9][0-9]:
The header entry would be:
Plural-Forms: nplurals=3; \
    plural=n==1 ? 0 : (n==0 || (n%100 > 0 && n%100 < 20)) ? 1 : 2;
Languages with this property include:
Romanian
Three forms, special case for numbers ending in 1[2-9]:
The header entry would look like this:
Plural-Forms: nplurals=3; \
    plural=n%10==1 && n%100!=11 ? 0 : \
           n%10>=2 && (n%100<10 || n%100>=20) ? 1 : 2;
Languages with this property include:
Lithuanian
Three forms, special cases for numbers ending in 1 and 2, 3, 4, except those ending in 1[1-4]:
The header entry would look like this:
Plural-Forms: nplurals=3; \
    plural=n%10==1 && n%100!=11 ? 0 : \
           n%10>=2 && n%10<=4 && (n%100<10 || n%100>=20) ? 1 : 2;
Languages with this property include:
Russian, Ukrainian, Belarusian, Serbian, Croatian
Three forms, special cases for 1 and 2, 3, 4:
The header entry would look like this:
Plural-Forms: nplurals=3; \
    plural=(n==1) ? 0 : (n>=2 && n<=4) ? 1 : 2;
Languages with this property include:
Czech, Slovak
Three forms, special case for one and some numbers ending in 2, 3, or 4:
The header entry would look like this:
Plural-Forms: nplurals=3; \
    plural=n==1 ? 0 : \
           n%10>=2 && n%10<=4 && (n%100<10 || n%100>=20) ? 1 : 2;
Languages with this property include:
Polish
Four forms, special case for one and all numbers ending in 02, 03, or 04:
The header entry would look like this:
Plural-Forms: nplurals=4; \
    plural=n%100==1 ? 0 : n%100==2 ? 1 : n%100==3 || n%100==4 ? 2 : 3;
Languages with this property include:
Slovenian
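Six forms, special cases for one, two, all numbers ending in 02, 03, … 10, all numbers ending in 11 … 99, and others:
The header entry would look like this:
Plural-Forms: nplurals=6; \
    plural=n==0 ? 0 : n==1 ? 1 : n==2 ? 2 : n%100>=3 && n%100<=10 ? 3 \
    : n%100>=11 ? 4 : 5;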
Languages with this property include:
Arabic
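Because the plural expressions are plain C expressions in n, a formula can be copied into an ordinary function and checked against known word forms. The function names below are hypothetical illustrations; libintl itself evaluates the expression taken from the catalog header. For Polish, index 0 corresponds to "plik", 1 to "pliki" and 2 to "pliko'w", so the function should reproduce Rafal Maszkowski's table quoted earlier.

```c
/* nplurals=2; plural=n != 1;  (English, German, ...)  */
unsigned long
plural_two_forms (unsigned long n)
{
  return n != 1;
}

/* nplurals=2; plural=n>1;  (Brazilian Portuguese, French)  */
unsigned long
plural_zero_and_one_singular (unsigned long n)
{
  return n > 1;
}

/* nplurals=3; the Polish formula from the table above.  */
unsigned long
plural_polish (unsigned long n)
{
  return n == 1 ? 0
         : n % 10 >= 2 && n % 10 <= 4 && (n % 100 < 10 || n % 100 >= 20) ? 1
         : 2;
}
```

Evaluating plural_polish for n from 1 to 31 gives 0 for 1, 1 for 2-4 and 22-24, and 2 for 5-21 and 25-31, matching "plik", "pliki" and "pliko'w" in the quoted table. Every result is also within [0, nplurals), as required.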
You might now ask: ngettext handles only numbers n of type ‘unsigned long’. What about larger integer types? What about negative numbers? What about floating-point numbers?
About larger integer types, such as ‘uintmax_t’ or ‘unsigned long long’: they can be handled by reducing the value to a range that fits in an ‘unsigned long’. Simply casting the value to ‘unsigned long’ would not do the right thing, since it would treat ULONG_MAX + 1 like zero, ULONG_MAX + 2 like singular, and the like. Here you can exploit the fact that all the plural form formulas mentioned above eventually become periodic, with a period that is a divisor of 100 (or 1000 or 1000000). So, when you reduce a large value to another one in the range [1000000, 1999999] that ends in the same 6 decimal digits, you can assume that it will lead to the same plural form selection. This code does this:
#include <inttypes.h>
uintmax_t nbytes = ...;
printf (ngettext ("The file has %"PRIuMAX" byte.",
                  "The file has %"PRIuMAX" bytes.",
                  (nbytes > ULONG_MAX
                   ? (nbytes % 1000000) + 1000000
                   : nbytes)),
        nbytes);
Negative and floating-point values usually represent physical entities for which singular and plural don't clearly apply. In such cases, there is no need to use ngettext; a simple gettext call with a form suitable for all values will do. For example:
printf (gettext ("Time elapsed: %.3f seconds"), num_milliseconds * 0.001);
Even if num_milliseconds happens to be a multiple of 1000, the output
Time elapsed: 1.000 seconds
is acceptable in English, and similarly for other languages.
The translators’ perspective regarding plural forms is explained in Translating plural forms.
Optimization of the *gettext functions
At this point in the discussion we should talk about an advantage of the GNU gettext implementation. Some readers might have pointed out that an internationalized program might perform poorly if some string has to be translated in an inner loop. While this is unavoidable when the string varies from one run of the loop to the next, it is simply a waste of time when the string is always the same. Take the following example:
{
  while (…)
    {
      puts (gettext ("Hello world"));
    }
}
When the locale selection does not change between two runs, the resulting string is always the same. One way to use this is:
{
  str = gettext ("Hello world");
  while (…)
    {
      puts (str);
    }
}
But this solution is not usable in all situations (e.g. when the locale selection changes), nor does it lead to legible code.
For this reason, GNU gettext caches previous translation results. When the same translation is requested twice, with no new message catalogs being loaded in between, gettext will find the result the second time through a single cache lookup.
Comparing the Two Interfaces
The following discussion is perhaps a little bit biased. As said above, we implemented GNU gettext following the Uniforum proposal, and this surely has its reasons. But it should show how we came to this decision.
First we take a look at the development process. When we write an application using NLS provided by gettext, we proceed as always. Only when we come to a string which might be seen by the users, and thus has to be translated, do we use gettext("…") instead of "…". At the beginning of each source file (or in a central header file) we define
#define gettext(String) (String)
Even this definition can be avoided when the system supports the gettext function in its C library. When we compile this code, the result is the same as if no NLS code were used. When you take a look at the GNU gettext code, you will see that we use _("…") instead of gettext("…"). This reduces the number of additional characters per translatable string to 3 (in words: three).
When a production version of the program is needed, we simply replace the definition
#define _(String) (String)
by
#include <libintl.h>
#define _(String) gettext (String)
Additionally, we run the program xgettext on all source code files which contain translatable strings, and that's it: we have a running program which does not depend on translations being available, but which can use any that become available.
The same procedure can be applied to the gettext_noop invocations (see Special cases). One usually defines gettext_noop as a no-op macro. So you should consider the following code for your project:
#define gettext_noop(String) String
#define N_(String) gettext_noop (String)
N_ is a short form similar to _. The Makefile in the po/ directory of GNU gettext knows both of the mentioned short forms by default, so you are invited to follow this proposal for your own ease.
Now to catgets. The main problem is the work for the programmer. Every time he comes to a translatable string, he has to define a number (or a symbolic constant), which also has to be defined in the message catalog file. He also has to take care of duplicate entries, duplicate message IDs, etc. If he wants the same quality in the message catalog as the GNU gettext program provides, he also has to put the descriptive comments for the strings and their locations in all source code files into the message catalog. This is nearly a Mission: Impossible.
But there are also some points in favor of catgets that people might call advantages. If you have a single word in a string and this string is used in different contexts, it is likely that in one language or another the word has different translations. Example:
printf ("%s: %d", gettext ("number"), number_of_errors);

printf ("you should see %d %s", number_count,
        number_count == 1 ? gettext ("number") : gettext ("numbers"));
Here we have to translate the string "number" twice. Even if you do not speak a language other than English, it might be possible to recognize that the two words have different meanings. In German, the first occurrence has to be translated as "Anzahl" and the second as "Zahl".
Now you can say that this example is really esoteric. And you are right! This is exactly how we felt about this problem, and we decided that it does not weigh that much. The solution for the above problem could be very easy:
printf ("%s %d", gettext ("number:"), number_of_errors);

printf (number_count == 1 ? gettext ("you should see %d number")
                          : gettext ("you should see %d numbers"),
        number_count);
We believe that we can solve all conflicts with this method. If it is difficult, one can also consider changing one of the conflicting strings a little bit. It is not an impossible problem to overcome.
catgets allows the same original entry to have different translations, but gettext has another, scalable approach for solving ambiguities of this kind: see Ambiguities.
Using libintl.a in own programs
Starting with version 0.9.4 the library libintl.h should be self-contained. That is, you can use it in your own programs without providing additional functions. The Makefile will put the header and the library in directories selected using the $(prefix).
Being a gettext grok
NOTE: This documentation section is outdated and needs to be revised.
To fully exploit the functionality of the GNU gettext library, it is surely helpful to read the source code. But for those who don't want to spend that much time reading the (sometimes complicated) code, here is a list of comments:
• Changing the language at runtime
For interactive programs it might be useful to offer a selection of the used language at runtime. To understand how to do this, one needs to know how the used language is determined while executing the gettext function. The method presented here only works correctly with the GNU implementation of the gettext functions.
In the function dcgettext, at every call the current setting of the highest-priority environment variable is determined and used. Highest priority here means the following list, in decreasing priority:
LANGUAGE
LC_ALL
LC_xxx, according to the selected locale category
LANG
Afterwards the path is constructed using the found value, and the translation file is loaded if available.
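The priority list can be modeled in a few lines. This is a deliberately simplified sketch: effective_language is a hypothetical helper, and the real GNU implementation has additional rules (for instance, LANGUAGE is ignored when the locale is "C"). It is not a description of libintl's internals.

```c
#include <stdlib.h>

/* Return the value of the first variable among LANGUAGE, LC_ALL, the
   category variable (e.g. "LC_MESSAGES") and LANG that is set and
   non-empty, or "C" if none is.  Simplified illustration only.  */
const char *
effective_language (const char *category_var)
{
  const char *names[4] = { "LANGUAGE", "LC_ALL", category_var, "LANG" };

  for (size_t i = 0; i < 4; i++)
    {
      const char *value = getenv (names[i]);
      if (value != NULL && value[0] != '\0')
        return value;
    }
  return "C";
}
```

With only LANG set, the sketch returns LANG's value. Setting LC_MESSAGES takes precedence over LANG, and setting LANGUAGE takes precedence over both.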
What happens now when the value of, say, LANGUAGE changes? According to the process explained above, the new value of this variable is found as soon as the dcgettext function is called. But this also means that the (perhaps) different message catalog file is loaded. In other words: the used language is changed.
But there is one little catch. The code for gcc-2.7.0 and up provides some optimization. This optimization normally prevents the calling of the dcgettext function as long as no new catalog is loaded. But if dcgettext is not called, the program also cannot notice that the LANGUAGE variable has changed (see Optimized gettext). The solution is very easy: include the following code in the language switching function.
/* Change language.  */
setenv ("LANGUAGE", "fr", 1);

/* Make change known.  */
{
  extern int _nl_msg_cat_cntr;
  ++_nl_msg_cat_cntr;
}
The variable _nl_msg_cat_cntr is defined in loadmsgcat.c. You don't need to know what it is for. But it can be used to detect whether a gettext implementation is GNU gettext rather than a non-GNU system's native gettext implementation.
Temporary Notes for the Programmers Chapter
NOTE: This documentation section is outdated and needs to be revised.
• Temp Implementations: | Temporary - Two Possible Implementations | |
• Temp catgets: | Temporary - About catgets | |
• Temp WSI: | Temporary - Why a single implementation | |
• Temp Notes: | Temporary - Notes | |
Temporary - Two Possible Implementations
There are two competing methods for language-independent messages: the X/Open catgets method, and the Uniforum gettext method. The catgets method indexes messages by integers; the gettext method indexes them by their English translations. The catgets method has been around longer and is supported by more vendors. The gettext method is supported by Sun, and it has been heard that the COSE multi-vendor initiative is supporting it. Neither method is a POSIX standard; the POSIX.1 committee had a lot of disagreement in this area.
Neither one is in the POSIX standard. There was much disagreement in the POSIX.1 committee about using the gettext routines vs. catgets (XPG). In the end the committee couldn't agree on anything, so no messaging system was included as part of the standard. I believe the informative annex of the standard includes the XPG3 messaging interfaces, “…as an example of a messaging system that has been implemented…”
They were very careful not to say anywhere that you should use one set of interfaces over the other. For more on this topic please see the Programming for Internationalization FAQ.
Temporary - About catgets
There have been a few discussions of late on the use of catgets as a base. I think it important to present both sides of the argument and hence am opting to play devil's advocate for a little bit.
I’ll not deny the fact that catgets could have been designed a lot better. It currently has quite a number of limitations, and these have already been pointed out.
However, there is a great deal to be said for consistency and standardization. A common recurring problem when writing Unix software is the myriad portability problems across Unix platforms. It seems as if every Unix vendor had a look at the operating system and found parts they could improve upon. Undoubtedly, these modifications are probably innovative and solve real problems. However, software developers have a hard time keeping up with all these changes across so many platforms.
And this has prompted the Unix vendors to begin to standardize their systems. Hence the impetus for Spec1170. Every major Unix vendor has committed to supporting this standard, and every Unix software developer awaits with glee the day they can write software to this standard and simply recompile (without having to use autoconf) across different platforms.
As I understand it, Spec1170 is roughly based upon version 4 of the X/Open Portability Guidelines (XPG4). Because catgets and friends are defined in XPG4, I'm led to believe that catgets is a part of Spec1170 and hence will become a standardized component of all Unix systems.
Temporary - Why a single implementation
Now it seems kind of wasteful to me to have two different systems installed for accessing message catalogs. If we do want to remedy catgets deficiencies, why don't we try to expand catgets (in a compatible manner) rather than implement an entirely new system? Otherwise, we'll end up with two message catalog access systems installed with an operating system: one set of routines for packages using GNU gettext for their internationalization, and another set of routines (catgets) for all other software. Bloated?
Supposing another catalog access system is implemented, which one do we recommend? At least for Linux, we need to attract as many software developers as possible. Hence we need to make it as easy as possible for them to port their software, which means supporting catgets. We will be implementing the libintl code within our libc, but does this mean we also have to incorporate another message catalog access scheme within our libc? And what about people who are going to be using the libintl + non-catgets routines? When they port their software to other platforms, they’re now going to have to include the front-end (libintl) code plus the back-end code (the non-catgets access routines) with their software, instead of just including the libintl code.
Message catalog support is, however, only the tip of the iceberg. What about the data for the other locale categories? They also have a number of deficiencies. Are we going to abandon them as well and develop another duplicate set of routines (should libintl expand beyond message catalog support)?
Like many parts of Unix that can be improved upon, we’re stuck with balancing compatibility with the past against useful improvements and innovations for the future.
X/Open agreed very late on the standard form, so many implementations differ from the final form. Both of my systems (old Linux catgets and Ultrix-4) have a strange variation.
OK. After incorporating the last changes, I have to spend some time on making the GNU/Linux libc gettext functions. So in the future, Solaris will not be the only system having gettext.
• Trans Intro 0: | Introduction 0 | |
• Trans Intro 1: | Introduction 1 | |
• Discussions: | Discussions | |
• Organization: | Organization | |
• Information Flow: | Information Flow | |
• Translating plural forms: | How to fill in msgstr[0], msgstr[1] | |
• Prioritizing messages: | How to find which messages to translate first |
NOTE: This documentation section is outdated and needs to be revised.
Free software is going international! The Translation Project is a way to get maintainers, translators and users all together, so free software will gradually become able to speak many native languages.
The GNU gettext tool set contains everything maintainers need for internationalizing their packages for messages. It also contains quite useful tools for helping translators localize messages into their native language, once a package has already been internationalized.
To achieve the Translation Project, we need many interested people who like their own language and write it well, and who are also able to synergize with other translators speaking the same language. If you’d like to volunteer to work at translating messages, please send mail to your translating team.
Each team has its own mailing list, courtesy of Linux International. You may reach your translating team at the address [email protected], replacing ll by the two-letter ISO 639 code for your language. Language codes are not the same as country codes given in ISO 3166. The following translating teams exist:
Chinese (zh), Czech (cs), Danish (da), Dutch (nl), Esperanto (eo), Finnish (fi), French (fr), Irish (ga), German (de), Greek (el), Italian (it), Japanese (ja), Indonesian (in), Norwegian (no), Polish (pl), Portuguese (pt), Russian (ru), Spanish (es), Swedish (sv) and Turkish (tr).
For example, you may reach the Chinese translating team by writing to [email protected]. When you become a member of the translating team for your own language, you may subscribe to its list. For example, Swedish people can send a message to sv-request@li.org, having this message body:
subscribe
Keep in mind that team members should be interested in working at translations, or at solving translational difficulties, rather than merely lurking around. If your team does not exist yet and you want to start one, please write to [email protected]; you will then reach the coordinator for all translator teams.
A handful of GNU packages have already been adapted and provided with message translations for several languages. Translation teams have begun to organize, using these packages as a starting point. But there are many more packages and many languages for which we have no volunteer translators. If you would like to volunteer to work at translating messages, please send mail to [email protected] indicating what language(s) you can work on.
NOTE: This documentation section is outdated and needs to be revised.
This is now official, GNU is going international! Here is the announcement submitted for the January 1995 GNU Bulletin:
A handful of GNU packages have already been adapted and provided with message translations for several languages. Translation teams have begun to organize, using these packages as a starting point. But there are many more packages and many languages for which we have no volunteer translators. If you’d like to volunteer to work at translating messages, please send mail to ‘[email protected]’ indicating what language(s) you can work on.
This document should answer many questions for those who are curious about the process or would like to contribute. Please at least skim over it, in the hope of cutting down a little on the high volume of e-mail generated by this collective effort towards the internationalization of free software.
Most free programming which is widely shared is done in English, and currently, English is used as the main communicating language between national communities collaborating on free software. This very document is written in English. This will not change in the foreseeable future.
However, there is a strong appetite from national communities for more software able to communicate in their national languages and follow national habits, and there is an ongoing effort to modify free software in such a way that it becomes able to do so. The experiments conducted so far raised an enthusiastic response from pretesters, so we believe that the internationalization of free software is bound to succeed.
For suggestions, clarifications, additions or corrections to this document, please e-mail [email protected].
NOTE: This documentation section is outdated and needs to be revised.
Facing this internationalization effort, a few users expressed their concerns. Some of these doubts are presented and discussed here.
Some languages are not spoken by a very large number of people, so people speaking them sometimes consider that there may not be all that much demand for such versions of free software packages. Moreover, in some countries, many people who are into computers generally seem to prefer English versions of their software.
On the other hand, people might enjoy their own language a lot, and be very motivated to provide themselves the pleasure of having their beloved free software speak their mother tongue. They do themselves a personal favor, and do not pay that much attention to the number of people benefiting from their work.
Other users are shy to push forward their own language, seeing in this some kind of misplaced propaganda. Someone thought there must be some users of the language over the networks pestering other people with it.
But any spoken language is worth localization, because there are people behind the language for whom the language is important and dear to their hearts.
The biggest problem is to find the right translations so that everybody can understand the messages. Translations are usually a little odd. Some people get used to English, to the extent they may find translations into their own language “rather pushy, obnoxious and sometimes even hilarious.” As a French-speaking man, I have the experience of those instruction manuals for goods, so poorly translated into French in Korea or Taiwan…
The fact is that we sometimes have to create a kind of national computer culture, and this is not easy without the collaboration of many people liking their mother tongue. This is why translations are better achieved by people knowing and loving their own language, and ready to work together at improving the results they obtain.
Some people wonder if using GNU gettext necessarily brings their package under the protective wing of the GNU General Public License or the GNU Lesser General Public License, when they do not want to make their program free, or want other kinds of freedom. The simplest answer is “normally not”.
The gettext-runtime part of GNU gettext, i.e. the contents of libintl, is covered by the GNU Lesser General Public License. The gettext-tools part of GNU gettext, i.e. the rest of the GNU gettext package, is covered by the GNU General Public License.
The mere marking of localizable strings in a package, or conditional inclusion of a few lines for initialization, is not really including GPL’ed or LGPL’ed code. However, since the localization routines in libintl are under the LGPL, the LGPL needs to be considered. It gives the right to distribute the complete unmodified source of libintl even with non-free programs. It also gives the right to use libintl as a shared library, even for non-free programs. But it gives the right to use libintl as a static library or to incorporate libintl into another library only to free software.
NOTE: This documentation section is outdated and needs to be revised.
On a larger scale, the true solution would be to organize some kind of fairly precise setup in which volunteers could participate. I gave some thought to this idea lately, and realize there will be some touchy points. I thought of writing to Richard Stallman to launch such a project, but feel it might be good to shake out the ideas between ourselves first. Most probably Linux International already has some experience in the field, or would perhaps like to orchestrate the volunteer work. Food for thought, in any case!
I guess we have to set up something early, somehow, that will help many possible contributors of the same language to interlock and avoid duplicated work, and to be put in contact to solve together the problems particular to their tongue (in most languages, there are many difficulties peculiar to translating technical English). My Swedish contributor acknowledged these difficulties, and I’m well aware of them for French.
This is surely not a technical issue, but we should manage things so that the efforts of locale contributors are maximally useful, despite the national team layer interface between contributors and maintainers.
The Translation Project needs some setup for coordinating language coordinators. Once well started, localizing evolving programs will surely become a permanent and continuous activity in the free software community. The setup should be minimally completed and tested before GNU gettext becomes an official reality. The e-mail address [email protected] has been set up for receiving offers from volunteers and general e-mail on these topics. This address reaches the Translation Project coordinator.
• Central Coordination: | Central Coordination | |
• National Teams: | National Teams | |
• Mailing Lists: | Mailing Lists |
I also think GNU will need, sooner than it thinks, someone to set up a way to organize and coordinate these groups: some kind of group of groups. My opinion is that it would be good for GNU to delegate this task to a small group of collaborating volunteers, shortly. Perhaps a list of these national committees can be published in gnu.announce.
My role as coordinator would simply be to refer to Ulrich any German-speaking volunteer interested in the localization of free software packages, and maybe to help national groups organize initially, while maintaining national registries until national groups are ready to take over. In fact, the coordinator should help volunteers get in contact with one another to create national teams, which should then select one coordinator per language, or per country (regionalized language). If well done, the coordination should be useful without being an overwhelming task, for the time it takes to put delegations in place.
I suggest we look for volunteer coordinators/editors for individual languages. These people will scan contributions of translation files for various programs, for their own languages, and will ensure high and uniform standards of diction.
From my current experience with other people these days, those who provide localizations are very enthusiastic about the process, are more interested in the localization process than in the program they localize, and want to do many programs, not just one. This seems to confirm that having a coordinator/editor for each language is a good idea.
We need to choose someone who is good at writing clear and concise prose in the language in question. That is hard, because we can’t check it ourselves. So we need to ask a few people to judge each other’s writing and select the one who is best.
I announced my prerelease to a few dozen people, and you would not believe all the discussions it has generated already. I shudder to think what will happen when this is launched for real, officially, worldwide. Who am I to arbitrate between two Czechoslovak users contradicting each other, for example?
I assume that your German is not much better than my French, so I would not be able to judge these formulations. What I would suggest is that for each language there be a group of people who maintain the PO files and judge changes. I suspect there will be cultural differences in how such groups of people behave. Some will have relaxed ways, reach consensus easily, and let any member of the group deal with the maintainers, while others will fight to the death, organize heavy administrations up to national standards, and use strict channels.
The German team is putting out a good example. Right now, they are maybe half a dozen people revising each other’s translations and discussing the linguistic issues. I do not even have all the names. Ulrich Drepper is taking care of coordinating the German team. He subscribed to all my pretest lists, so I do not even have to warn him specifically of incoming releases.
I’m sure that it is a good idea to get teams for each language working on translations. That will make the translations better and more consistent.
• Sub-Cultures: | Sub-Cultures | |
• Organizational Ideas: | Organizational Ideas |
Taking French for example, there are a few sub-cultures around computers which developed diverging vocabularies. Picking volunteers here and there without addressing this problem in an organized way, early in the project, might produce a distasteful mix of internationalized programs, and possibly trigger endless quarrels among those who really care.
Keeping some kind of unity in the way French localization of internationalized programs is achieved is a difficult (and delicate) job. Knowing the Latin character of French people (:-), if we take this the wrong way, we could end up nowhere, or waste a lot of energy. Maybe we should begin to address this problem seriously before GNU gettext becomes officially published. And I suspect that this means soon!
I expect the next big changes after the official release. Please note that I use the German translation of the short GPL message. We need to set a few good examples before the localization goes out for real in the free software community. Here are a few points to discuss:
If we get any inquiries about GNU gettext, send them on to:
The *-pretest lists are quite useful to me; maybe the idea could be generalized to many GNU and non-GNU packages. But to each maintainer his/her own way!
François, we have a mechanism in place here at gnu.ai.mit.edu to track teams, support mailing lists for them and log members. We have a slight preference that you use it. If this is OK with you, I can get you clued in.
Things are changing! A few years ago, when Daniel Fekete and I asked for a mailing list for GNU localization, nested at the FSF, we were politely invited to organize it anywhere else, and so we did. For communicating with my pretesters, I later made a handful of mailing lists located at iro.umontreal.ca and administered by majordomo. These lists have been very dependable so far…
I suspect that the German team will organize its own mailing list located in Germany, and so forth for other countries. But before they organize for real, it could surely be useful to offer mailing lists located at the FSF to each national team. So yes, please explain to me how I should proceed to create and handle them.
We should create temporary mailing lists, one per country, to help people organize. Temporary, because once regrouped and structured, it would be fair for the volunteers from each country to bring their list back home and manage it as they want. My feeling is that, in the long run, each team should run its own list, from within its country. There should also be some central list to which all teams could subscribe as they see fit, as long as each team is represented in it.
NOTE: This documentation section is outdated and needs to be revised.
There will surely be some discussion about these messages after the packages are finally released. If people now send you proposals for better messages, how do you proceed? Jim, please note that right now, as I put forward nearly a dozen localizable programs, I receive both the translations and the coordination concerns about them.
If I put one of my things to pretest, Ulrich receives the announcement and passes it on to the German team, who make last-minute revisions. Then he submits the translation files to me as the maintainer. For free packages I do not maintain, I would not even hear about it. This scheme could be made to work for the whole Translation Project, I think. For security reasons, maybe Ulrich (national coordinators, in fact) should update a central registry kept at the Translation Project (Jim, me, or Len’s recruits) once in a while.
In December/January, I was aggressively ready to internationalize all of GNU, giving myself the duty of one small GNU package per week or so, taking many weeks or months for bigger packages. But it does not work this way. I first did all the things I’m responsible for. I’ve nothing against some missionary work on other maintainers, but I’m also losing a lot of energy over it: the same debates over and over again.
And when the first localized packages are released, we’ll get a lot of responses about ugly translations :-). Surely. So we need to have a fairly good idea beforehand about how to handle the information flow between the national teams and the package maintainers.
Please start saving a quick history of each PO file somewhere. I know for sure that the file format will change, allowing for comments. It would be nice if each file had a kind of log, and references for those who want to submit comments or gripes, or otherwise contribute. I sent a proposal for a fast and flexible format, but it has not yet been accepted by the GNU deciders. I’ll tell you when I have more information about this.
Suppose you are translating a PO file, and it contains an entry like this:
#, c-format
msgid "One file removed"
msgid_plural "%d files removed"
msgstr[0] ""
msgstr[1] ""
What does this mean? How do you fill it in?
Such an entry denotes a message with plural forms, that is, a message where the text depends on a cardinal number. The general form of the message, in English, is the msgid_plural line. The msgid line is the English singular form, that is, the form for when the number is equal to 1. More details about plural forms are explained in Plural forms.
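From the program’s side, an entry like this one typically comes from a call to ngettext(), which takes the English singular and plural strings plus the count. The following C sketch assumes the ngettext() and textdomain() functions from <libintl.h>; the domain name "example-domain" and the function files_removed are hypothetical. Since no catalog is installed for that domain, ngettext returns msgid for a count of 1 and msgid_plural otherwise, which is the English behaviour the two lines describe.

```c
#include <assert.h>
#include <libintl.h>
#include <locale.h>
#include <stdio.h>
#include <string.h>

/* Formats the "files removed" message for count n into buf.
   "example-domain" is a hypothetical domain with no installed catalog,
   so ngettext falls back to its English arguments: msgid when n == 1,
   msgid_plural otherwise. With a catalog present, it would instead pick
   msgstr[i], where i is computed by the catalog's plural formula. */
void files_removed(char *buf, size_t size, unsigned long n)
{
    setlocale(LC_ALL, "");
    textdomain("example-domain");
    const char *fmt = ngettext("One file removed", "%d files removed", n);
    /* The singular form has no %d; the extra argument is then ignored. */
    snprintf(buf, size, fmt, (int) n);
}
```

Once a translation exists, the same call selects one of the msgstr[i] strings instead, so the translator never needs to change the program.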
The first thing you need to look at is the Plural-Forms line in the header entry of the PO file. It contains the number of plural forms and a formula. If the PO file does not yet have such a line, you have to add it. It only depends on the language into which you are translating. You can get this information by using the msginit command (see Creating), which contains a database of known plural formulas, or by asking other members of your translation team.
Suppose the line looks as follows:
"Plural-Forms: nplurals=3; plural=n%10==1 && n%100!=11 ? 0 : n%10>=2 && n"
"%10<=4 && (n%100<10 || n%100>=20) ? 1 : 2;\n"
It’s logically one line; recall that the PO file formatting is allowed to break long lines so that each physical line fits in 80 monospaced columns.
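A tool that reads this header needs the nplurals value to know how many msgstr[i] lines an entry should have. The C function below is a hypothetical, simplified sketch of extracting that value from the header text; it is not how GNU gettext itself parses the header, and it does not check the plural= expression at all.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Returns the nplurals value from a Plural-Forms header value such as
   "nplurals=3; plural=...;", or -1 if the field is missing or is not a
   positive number. Simplified sketch: no further validation is done. */
int parse_nplurals(const char *header)
{
    static const char key[] = "nplurals=";
    const char *p = strstr(header, key);
    char *end;
    long n;

    if (p == NULL)
        return -1;
    p += sizeof key - 1;          /* skip past "nplurals=" */
    n = strtol(p, &end, 10);
    if (end == p || n <= 0)       /* no digits, or a meaningless count */
        return -1;
    return (int) n;
}
```

For the header above, the function yields 3, matching the three msgstr lines the entry will need.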
The value of nplurals here tells you that there are three plural forms. The first thing you need to do is to ensure that the entry contains an msgstr line for each of the forms:
#, c-format
msgid "One file removed"
msgid_plural "%d files removed"
msgstr[0] ""
msgstr[1] ""
msgstr[2] ""
Then translate the msgid_plural line and fill it into each msgstr line:
#, c-format
msgid "One file removed"
msgid_plural "%d files removed"
msgstr[0] "%d slika uklonjenih"
msgstr[1] "%d slika uklonjenih"
msgstr[2] "%d slika uklonjenih"
Now you can refine the translation so that it matches the plural form. According to the formula above, msgstr[0] is used when the number ends in 1 but does not end in 11; msgstr[1] is used when the number ends in 2, 3, 4, but not in 12, 13, 14; and msgstr[2] is used in all other cases. With this knowledge, you can refine the translations:
#, c-format
msgid "One file removed"
msgid_plural "%d files removed"
msgstr[0] "%d slika je uklonjena"
msgstr[1] "%d datoteke uklonjenih"
msgstr[2] "%d slika uklonjenih"
You may have noticed that in the English singular form (msgid) the number placeholder could be omitted and replaced by the numeral word “one”. Can you do this in your translation as well?
msgstr[0] "jednom datotekom je uklonjen"
Well, it depends on whether msgstr[0] applies only to the number 1, or to other numbers as well. If, according to the plural formula, msgstr[0] applies only to n == 1, then you can use the specialized translation without the number placeholder. In our case, however, msgstr[0] also applies to the numbers 21, 31, 41, etc., and therefore you cannot omit the placeholder.
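The plural= expression in the header uses C syntax, so one way to check which msgstr index a given count selects is to paste the formula into a small C function. This is a sketch for the formula quoted above only; the function name plural_index is hypothetical, and gettext implementations evaluate the formula read from the catalog at run time rather than compiling it in.

```c
#include <assert.h>

/* Returns the msgstr index (0, 1 or 2) chosen for count n by the
   formula from the Plural-Forms header above, copied verbatim:
     index 0: n ends in 1, but not in 11
     index 1: n ends in 2..4, but not in 12..14
     index 2: everything else */
int plural_index(unsigned long n)
{
    return n%10==1 && n%100!=11 ? 0
         : n%10>=2 && n%10<=4 && (n%100<10 || n%100>=20) ? 1
         : 2;
}
```

For example, plural_index(1), plural_index(21) and plural_index(31) all return 0, which is why msgstr[0] cannot drop the number placeholder here, while plural_index(11) returns 2.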
A translator sometimes has only a limited amount of time per week to spend on a package, and some packages have quite large message catalogs (over 1000 messages). Therefore she wishes to translate first the messages that are most visible to the user, or that occur most frequently. This section describes how to determine these "most urgent" messages. It also applies to determining the "next most urgent" messages after the message catalog has already been partially translated.
In a first step, she uses the programs as a user would. While she does this, the GNU gettext library logs into a file the not yet translated messages for which a translation was requested by the program.
In a second step, she uses the PO mode to translate precisely this set of messages.
Here are more details. The GNU libintl library (but not the corresponding functions in GNU libc) supports an environment variable GETTEXT_LOG_UNTRANSLATED. The GNU libintl library will log into the file named by this variable the messages for which gettext() and related functions couldn’t find the translation. If the file doesn’t exist, it will be created as needed. On systems with GNU libc, a shared library ‘preloadable_libintl.so’ is provided that can be used with the ELF ‘LD_PRELOAD’ mechanism.
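To make "couldn’t find the translation" concrete, here is a C sketch of a program issuing a few lookups and counting the misses. It assumes gettext() and textdomain() from <libintl.h>, and relies on gettext() returning its msgid argument itself when no translation is found. The domain name and message list are hypothetical. These misses are, roughly, the lookups that the preloaded GNU libintl would record in the file named by GETTEXT_LOG_UNTRANSLATED; plain GNU libc does not log them.

```c
#include <assert.h>
#include <libintl.h>
#include <locale.h>
#include <stddef.h>

/* Hypothetical messages standing in for the strings a program would
   display during a user session. */
static const char *const session_messages[] = {
    "File removed",
    "Permission denied",
    "Operation completed",
};

/* Looks up each message in a hypothetical domain and counts the
   lookups that found no translation. On a miss, gettext() returns its
   msgid argument itself, so a pointer comparison detects it. */
size_t count_untranslated(void)
{
    size_t misses = 0;
    size_t count = sizeof session_messages / sizeof session_messages[0];

    setlocale(LC_ALL, "");
    textdomain("example-domain");
    for (size_t i = 0; i < count; i++)
        if (gettext(session_messages[i]) == session_messages[i])
            misses++;
    return misses;
}
```

With no catalog installed for the hypothetical domain, every lookup is a miss, so the function returns 3.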
So, in the first step, the translator uses these commands on systems with GNU libc:
$ LD_PRELOAD=/usr/local/lib/preloadable_libintl.so
$ export LD_PRELOAD
$ GETTEXT_LOG_UNTRANSLATED=$HOME/gettextlogused
$ export GETTEXT_LOG_UNTRANSLATED
and these commands on other systems:
$ GETTEXT_LOG_UNTRANSLATED=$HOME/gettextlogused
$ export GETTEXT_LOG_UNTRANSLATED
Then she uses and peruses the programs. (It is a good and recommended practice to use the programs for which you provide translations: it gives you the needed context.) When done, she removes the environment variables:
$ unset LD_PRELOAD
$ unset GETTEXT_LOG_UNTRANSLATED
The second step starts with removing duplicates:
$ msguniq $HOME/gettextlogused > missing.po
The result is a PO file, but it needs some preprocessing before a PO file editor can be used with it. First, it is a multi-domain PO file, containing messages from many translation domains. Second, it lacks all translator comments and source references. Here is how to get a list of the affected translation domains:
$ sed -n -e 's,^domain "\(.*\)"$,\1,p' < missing.po | sort | uniq
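The sed filter keeps only lines of the form domain "NAME" and prints NAME; sort | uniq then removes duplicates. As an illustration of exactly what that filter matches, here is a C function mirroring the sed expression for a single line. The name extract_domain is hypothetical, and the sketch does not cover the sort | uniq step.

```c
#include <assert.h>
#include <string.h>

/* Mirrors the sed expression s,^domain "\(.*\)"$,\1,p for one line:
   if the line has the form  domain "NAME"  (optionally followed by a
   newline), copy NAME into out and return 1; otherwise return 0. */
int extract_domain(const char *line, char *out, size_t size)
{
    static const char prefix[] = "domain \"";
    size_t plen = sizeof prefix - 1;
    size_t len = strcspn(line, "\n");   /* ignore a trailing newline */

    /* Need the prefix plus a closing quote at the very end. */
    if (len < plen + 1 || strncmp(line, prefix, plen) != 0
        || line[len - 1] != '"')
        return 0;

    size_t name_len = len - plen - 1;
    if (name_len >= size)
        return 0;                        /* output buffer too small */
    memcpy(out, line + plen, name_len);
    out[name_len] = '\0';
    return 1;
}
```

Like the sed pattern, it accepts any text between the first and the final quote, so only whole domain lines are selected.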
Then the translator can handle the domains one by one. For simplicity, let’s use shell variables to denote the language, domain and source package.
$ lang=nl                                # your language
$ domain=coreutils                       # the name of the domain to be handled
$ package=/usr/src/gnu/coreutils-4.5.4   # the package where it comes from
She takes the latest copy of $lang.po from the Translation Project, or from the package (in most cases, $package/po/$lang.po), saving it as $domain.$lang.po, or creates a fresh one if she’s the first translator (see Creating). She then uses the following commands to mark the not urgent messages as "obsolete". (This doesn’t mean that these messages, translated and untranslated ones, will go away. It simply means that the PO file editor will ignore them in the following editing session.)
$ msggrep --domain=$domain missing.po | grep -v '^domain' \
    > $domain-missing.po
$ msgattrib --set-obsolete --ignore-file $domain-missing.po $domain.$lang.po \
    > $domain.$lang-urgent.po
Then she translates $domain.$lang-urgent.po by use of a PO file editor (see Editing). (FIXME: I don’t know whether KBabel and gtranslator also preserve obsolete messages, as they should.) Finally she restores the not urgent messages (with their earlier translations, for those which were already translated) through this command:
$ msgmerge --no-fuzzy-matching $domain.$lang-urgent.po $package/po/$domain.pot \
    > $domain.$lang.po
Then she can submit $domain.$lang.po and proceed to the next domain.
The maintainer of a package has many responsibilities. One of them is ensuring that the package will install easily on many platforms, and that the magic we described earlier (see Users) will work for installers and end users.
Of course, there are many possible ways by which GNU gettext might be integrated in a distribution, and this chapter does not cover them in all generality. Instead, it details one possible approach which is especially adequate for many free software distributions following GNU standards, or even better, Gnits standards, because GNU gettext is purposely meant to help the internationalization of the whole GNU project, and of as many other good free packages as possible. So, the maintainer’s view presented here presumes that the package already has a configure.ac file and uses GNU Autoconf.
Nevertheless, GNU gettext may surely be useful for free packages not following GNU standards and conventions, but the maintainers of such packages might have to show imagination and initiative in organizing their distributions so that gettext works for them in all situations. There are surely many, out there.
Even if gettext methods are now stabilizing, slight adjustments might be needed between successive gettext versions, so you should ideally review this chapter in subsequent releases, looking for changes.
• Flat and Non-Flat: | Flat or Non-Flat Directory Structures | |
• Prerequisites: | Prerequisite Works | |
• gettextize Invocation: | Invoking the gettextize Program | |
• Adjusting Files: | Files You Must Create or Alter | |
• autoconf macros: | Autoconf macros for use in configure.ac | |
• Version Control Issues: | ||
• Release Management: | Creating a Distribution Tarball |
Some free software packages are distributed as tar files which unpack in a single directory; these are said to be flat distributions. Other free software packages have a one-level hierarchy of subdirectories, using for example a subdirectory named doc/ for the Texinfo manual and man pages, another called lib/ for holding functions meant to replace or complement C libraries, and a subdirectory src/ for holding the proper sources for the package. These other distributions are said to be non-flat.
We cannot say much about flat distributions. A flat directory structure has the disadvantage of increasing the difficulty of updating to a new version of GNU gettext. Also, if you have many PO files, this could somewhat pollute your single directory. Also, GNU gettext’s libintl sources consist of C sources, shell scripts, sed scripts and complicated Makefile rules, which don’t fit well into an existing flat structure. For these reasons, we recommend using the non-flat approach in this case as well.
Maybe because GNU gettext itself has a non-flat structure, we have more experience with this approach, and this is what will be described in the remainder of this chapter. Some maintainers might use this as an opportunity to unflatten their package structure.
There are some works which are required for using GNU gettext in one of your packages. These works have some kind of generality that escapes the point-by-point descriptions used in the remainder of this chapter. So, we describe them here.
Before using gettextize, you should install some other packages first. Ensure that recent versions of GNU m4, GNU Autoconf and GNU gettext are already installed at your site, and if not, proceed to do this first. If you get to install these things, beware that GNU m4 must be fully installed before GNU Autoconf is even configured.
To further ease the task of a package maintainer, the automake package was designed and implemented. GNU gettext now uses this tool, and the Makefiles in the intl/ and po/ directories therefore know about all the goals necessary for using automake and libintl in one project.
Those four packages are only needed by you, as a maintainer; the installers of your own package and end users do not really need any of GNU m4, GNU Autoconf, GNU gettext, or GNU automake for successfully installing and running your package, with messages properly translated. But this is not completely true if you provide internationalized shell scripts within your own package: GNU gettext must then be installed at the user site if the end users want to see the translation of shell script messages.
It is worth adding here a few words about how the maintainer should ideally behave with PO file submissions. As a maintainer, your role is to authenticate the origin of the submission as being the representative of the appropriate translating teams of the Translation Project (forward the submission to [email protected] in case of doubt), to ensure that the PO file format is not severely broken and does not prevent successful installation, and for the rest, to merely put these PO files in po/ for distribution.
As a maintainer, you do not have to take on your shoulders the responsibility of checking if the translations are adequate or complete, and should avoid diving into linguistic matters. Translation teams drive themselves and are fully responsible for their linguistic choices for the Translation Project. Keep in mind that translator teams are not driven by maintainers. You can help by carefully redirecting all communications and reports from users about linguistic matters to the appropriate translation team, or by explaining to users how to reach or join their team. The simplest might be to send them the ABOUT-NLS file.
Maintainers should never ever apply PO file bug reports themselves, short-cutting translation teams. If some translator has difficulty getting some of her points through her team, it should not be an option for her to directly negotiate translations with maintainers. Teams ought to settle their problems themselves, if any. If you, as a maintainer, ever think there is a real problem with a team, please never try to solve a team’s problem on your own.
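Invoking the gettextize Program
The gettextize program is an interactive tool that helps the maintainer of a package internationalized through GNU gettext. It is used for two purposes:
• As a wizard, when a package is modified to use GNU gettext for the first time.
• As a migration tool, for upgrading the GNU gettext support in a package from a previous to a newer version of GNU gettext.
This program performs the following tasks:
• It copies into the package some files that are consistently and identically needed in every package internationalized through GNU gettext.
• It performs as many of the tasks mentioned in the next section, Adjusting Files, as can be performed automatically.
• It converts obsolete files and idioms used for previous GNU gettext versions to the form recommended for the current GNU gettext version.
• It prints a summary of the tasks that ought to be done manually and could not be done automatically by gettextize.
It can be invoked as follows: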
gettextize [ option… ] [ directory ]
and accepts the following options:
‘-f’, ‘--force’: Force replacement of files which already exist.
‘--intl’: Install the libintl sources in a subdirectory named intl/. This libintl will be used to provide internationalization on systems that don’t have GNU libintl installed. If this option is omitted, the call to AM_GNU_GETTEXT in configure.ac should read ‘AM_GNU_GETTEXT([external])’, and internationalization will not be enabled on systems lacking GNU gettext.
‘--po-dir=dir’: Specify a directory containing PO files. Such a directory contains the translations into various languages of a particular POT file. This option can be specified multiple times, once for each translation domain. If it is not specified, the directory named po/ is updated.
‘--no-changelog’: Don’t update or create ChangeLog files. By default, gettextize logs all changes (file additions, modifications and removals) in a file called ‘ChangeLog’ in each affected directory.
‘--symlink’: Make symbolic links instead of copying the needed files. This can be useful to save a few kilobytes of disk space, but it requires extra effort to create self-contained tarballs, it may disturb some mechanism the maintainer applies to the sources, and it is likely to introduce bugs when a newer version of gettext is installed on the system.
‘-n’, ‘--dry-run’: Print modifications but don’t perform them. All actions that gettextize would normally execute are inhibited and instead only listed on standard output.
‘--help’: Display this help and exit.
‘--version’: Output version information and exit.
If directory is given, this is the top level directory of a package to prepare for using GNU gettext. If not given, it is assumed that the current directory is the top level directory of such a package.
The program gettextize provides the following files. However, no existing file will be replaced unless the option --force (-f) is specified.
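• The ABOUT-NLS file is copied in the main directory of your package, the one being at the top level. You might elect to use a more recent copy of this ABOUT-NLS file than the one provided through gettextize, if you have one handy. You may also fetch a more recent copy of file ABOUT-NLS from Translation Project sites, and from most GNU archive sites.
• A po/ directory is created for eventually holding all translation files, but initially only containing the file po/Makefile.in.in from the GNU gettext distribution (beware the double ‘.in’ in the file name) and a few auxiliary files. If the po/ directory already exists, it will be preserved along with the files it contains, and only Makefile.in.in and the auxiliary files will be overwritten. If ‘--po-dir’ has been specified, this holds for every directory specified through ‘--po-dir’, instead of po/.
• Only if ‘--intl’ has been specified: an intl/ directory is created and filled with most of the files originally in the intl/ directory of the GNU gettext distribution. Also, if option --force (-f) is given, the intl/ directory is emptied first.
• The file config.rpath is copied into the directory containing configuration support files. It is needed by the AM_GNU_GETTEXT autoconf macro.
• Only if the project is using GNU automake: a set of autoconf macro files is copied into the package’s autoconf macro repository, usually in a directory called m4/.
If your site supports symbolic links and the ‘--symlink’ option is given, gettextize will not actually copy the files into your package, but establish symbolic links instead. This avoids duplicating the disk space needed in all packages. Merely using the ‘-h’ option while creating the tar archive of your distribution will resolve each link by an actual copy in the distribution archive. So, to insist, you really should use the ‘-h’ option with tar within your dist goal of your main Makefile.in.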
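To illustrate what the ‘-h’ advice buys you, here is a small Python sketch (an illustration only, not part of GNU gettext) using the standard tarfile module. Its dereference=True flag plays the role of GNU tar’s ‘-h’: a symbolic link inside the package is archived as a copy of the file it points to. The helper name make_dist_archive and the demo directory layout are hypothetical.

```python
import os
import tarfile
import tempfile


def make_dist_archive(src_dir: str, archive_path: str) -> None:
    """Create a gzip-compressed tarball of src_dir, following symlinks.

    dereference=True is the tarfile counterpart of GNU tar's '-h'
    (--dereference): each symlink is stored as a copy of its target.
    """
    with tarfile.open(archive_path, "w:gz", dereference=True) as tar:
        tar.add(src_dir, arcname=os.path.basename(src_dir))


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        pkg = os.path.join(tmp, "hello-1.0")
        os.makedirs(os.path.join(pkg, "m4"))
        # A shared macro file living outside the package...
        shared = os.path.join(tmp, "gettext.m4")
        with open(shared, "w") as f:
            f.write("dnl shared macro\n")
        # ...and a symlink to it inside the package.
        os.symlink(shared, os.path.join(pkg, "m4", "gettext.m4"))
        archive = os.path.join(tmp, "hello-1.0.tar.gz")
        make_dist_archive(pkg, archive)
        with tarfile.open(archive) as tar:
            member = tar.getmember("hello-1.0/m4/gettext.m4")
            print(member.isfile(), member.issym())  # True False
```

Without dereference=True the member would be stored as a link pointing outside the unpacked tree, which is exactly what makes a tarball not self-contained.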
Furthermore, gettextize will update all Makefile.am files in each affected directory, as well as the top level configure.ac or configure.in file.
It is interesting to understand that most new files for supporting GNU gettext facilities in one package go in the intl/, po/ and m4/ subdirectories. One distinction between intl/ and the two other directories is that intl/ is meant to be completely identical in all packages using GNU gettext, while the other directories will mostly contain package dependent files.
The gettextize program makes backup files for all files it replaces or changes, and also writes ChangeLog entries about these changes. This way, the careful maintainer can check after running gettextize whether its changes are acceptable, and possibly adjust them. An exception to this rule is the intl/ directory, which is added or replaced or removed as a whole.
It is important to understand that gettextize cannot do the entire job of adapting a package for using GNU gettext. The amount of remaining work depends on whether the package uses GNU automake or not. But in any case, the maintainer should still read the section Adjusting Files after invoking gettextize.
In particular, if after using ‘gettextize’ you get an error ‘AC_COMPILE_IFELSE was called before AC_GNU_SOURCE’ or ‘AC_RUN_IFELSE was called before AC_GNU_SOURCE’, you can fix it by modifying configure.ac, as described in the section on configure.ac.
It is also important to understand that gettextize is not part of the GNU build system, in the sense that it should not be invoked automatically, and not be invoked by someone who doesn’t assume the responsibilities of a package maintainer. For the latter purpose, a separate tool is provided, see autopoint Invocation.
Adjusting Files
Besides files which are automatically added through gettextize, there are many files needing revision for properly interacting with GNU gettext. If you are closely following GNU standards for Makefile engineering and auto-configuration, the adaptations should be easier to achieve. Here is a point by point description of the changes needed in each.
So, here comes a list of files, each one followed by a description of all alterations it needs. Many examples are taken from the GNU gettext 0.19.8 distribution itself, or from the GNU hello distribution (http://www.gnu.org/software/hello). You may indeed refer to the source code of the GNU gettext and GNU hello packages, as they are intended to be good examples for using GNU gettext functionality.
• po/POTFILES.in: POTFILES.in in po/
• po/LINGUAS: LINGUAS in po/
• po/Makevars: Makevars in po/
• po/Rules-*: Extending Makefile in po/
• configure.ac: configure.ac at top level
• config.guess: config.guess, config.sub at top level
• mkinstalldirs: mkinstalldirs at top level
• aclocal: aclocal.m4 at top level
• acconfig: acconfig.h at top level
• config.h.in: config.h.in at top level
• Makefile: Makefile.in at top level
• src/Makefile: Makefile.in in src/
• lib/gettext.h: gettext.h in lib/
POTFILES.in in po/
The po/ directory should receive a file named POTFILES.in. This file tells which files, among all program sources, have marked strings needing translation. Here is an example of such a file:
# List of source files containing translatable strings.
# Copyright (C) 1995 Free Software Foundation, Inc.

# Common library files
lib/error.c
lib/getopt.c
lib/xmalloc.c

# Package source files
src/gettext.c
src/msgfmt.c
src/xgettext.c
Hash-marked comments and blank lines are ignored. All other lines list those source files containing strings marked for translation (see Mark Keywords), in a notation relative to the top level of your whole distribution, rather than the location of the POTFILES.in file itself.
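The format is simple enough to read mechanically. The Python sketch below (a hypothetical helper, not something GNU gettext provides) skips comment and blank lines and resolves every remaining entry against the top-level source directory, which is why the entries are written relative to it. Treating any line whose first non-blank character is ‘#’ as a comment is an assumption of this sketch.

```python
import os
from typing import List


def read_potfiles(text: str, top_srcdir: str = ".") -> List[str]:
    """Return the source files listed in POTFILES.in-style text.

    Blank lines and lines starting with '#' (after leading blanks) are
    skipped; every other line is one path relative to top_srcdir.
    """
    files = []
    for line in text.splitlines():
        entry = line.strip()
        if not entry or entry.startswith("#"):
            continue
        files.append(os.path.join(top_srcdir, entry))
    return files


EXAMPLE = """\
# List of source files containing translatable strings.
# Copyright (C) 1995 Free Software Foundation, Inc.

# Common library files
lib/error.c
lib/getopt.c
lib/xmalloc.c

# Package source files
src/gettext.c
src/msgfmt.c
src/xgettext.c
"""

if __name__ == "__main__":
    # Seen from po/, the top-level directory is "..".
    for path in read_potfiles(EXAMPLE, top_srcdir=".."):
        print(path)
```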
When a C file is automatically generated by a tool, like flex or bison, that doesn’t introduce translatable strings by itself, it is recommended to list in po/POTFILES.in the real source file (ending in .l in the case of flex, or in .y in the case of bison), not the generated C file.
LINGUAS in po/
The po/ directory should also receive a file named LINGUAS. This file contains the list of available translations. It is a whitespace separated list. Hash-marked comments and blank lines are ignored. Here is an example file:
# Set of available languages.
de fr
This example means that German and French PO files are available, so that these languages are currently supported by your package. If you want to further restrict, at installation time, the set of installed languages, this should not be done by modifying the LINGUAS file, but rather by using the LINGUAS environment variable (see Installers).
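As a rough model of the two mechanisms just described, the following Python sketch parses a LINGUAS file and then applies an install-time filter taken from the LINGUAS environment variable. Both functions are hypothetical; the comment rule (text after ‘#’ is ignored) and the exact-match filter are simplifying assumptions, and the real gettext Makefile machinery may treat language variants differently.

```python
import os
from typing import List, Optional


def read_linguas(text: str) -> List[str]:
    """Split po/LINGUAS-style text into language codes.

    Assumption: everything from a '#' to the end of its line is a comment;
    the remaining words, separated by any whitespace, are language codes.
    """
    langs: List[str] = []
    for line in text.splitlines():
        langs.extend(line.split("#", 1)[0].split())
    return langs


def languages_to_install(available: List[str],
                         env_linguas: Optional[str]) -> List[str]:
    """Hypothetical, simplified install-time filter.

    With the LINGUAS environment variable unset (None), every available
    language is kept; otherwise only those also named in the variable.
    """
    if env_linguas is None:
        return list(available)
    wanted = set(env_linguas.split())
    return [lang for lang in available if lang in wanted]


if __name__ == "__main__":
    available = read_linguas("# Set of available languages.\nde fr\n")
    print(languages_to_install(available, os.environ.get("LINGUAS")))
```

The point of the split is that po/LINGUAS describes what the package offers, while the environment variable only narrows that set for one installation.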
It is recommended that you add the "languages" ‘en@quot’ and ‘en@boldquot’ to the LINGUAS file. en@quot is a variant of English message catalogs (en) which uses real quotation marks instead of the ugly looking asymmetric ASCII substitutes ‘`’ and ‘'’. en@boldquot is a variant of en@quot that additionally outputs quoted pieces of text in a bold font, when used in a terminal emulator which supports the VT100 escape sequences (such as xterm or the Linux console, but not Emacs in M-x shell mode).
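The kind of rewrite behind these two variants can be approximated in a few lines. The Python sketch below is not the quot.sed or boldquot.sed script shipped with GNU gettext; it only turns each ASCII `...' pair into typographic single quotes and, for the bold variant, wraps the quoted text in the VT100 "bold on" and "attributes off" escape sequences. Placing the escapes inside the quotes is an assumption of this sketch.

```python
import re

# Rough approximation only: not the quot.sed/boldquot.sed shipped with
# GNU gettext, just an illustration of the kind of rewrite they perform.
_BACKQUOTE_PAIR = re.compile(r"`([^`']*)'")

LEFT, RIGHT = "\u2018", "\u2019"            # typographic single quotes
BOLD_ON, BOLD_OFF = "\x1b[1m", "\x1b[0m"    # VT100: bold on, attributes off


def to_en_quot(msgid: str) -> str:
    """Replace each ASCII `...' pair with typographic quotes."""
    return _BACKQUOTE_PAIR.sub(lambda m: LEFT + m.group(1) + RIGHT, msgid)


def to_en_boldquot(msgid: str) -> str:
    """Like to_en_quot, and also wrap the quoted text in bold escapes."""
    return _BACKQUOTE_PAIR.sub(
        lambda m: LEFT + BOLD_ON + m.group(1) + BOLD_OFF + RIGHT, msgid)


if __name__ == "__main__":
    print(to_en_quot("cannot open `foo.txt'"))
    print(to_en_boldquot("cannot open `foo.txt'"))
```

Because the source is the untranslated English msgid itself, such catalogs can be generated mechanically, which is why no translator is involved.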
These extra message catalogs ‘en@quot’ and ‘en@boldquot’ are constructed automatically, not by translators; to support them, you need the files Rules-quot, quot.sed, boldquot.sed, the two en@quot and en@boldquot header files, and insert-header.sin in the po/ directory. You can copy them from GNU gettext’s po/ directory; they are also installed by running gettextize.
Makevars in po/
The po/ directory also has a file named Makevars. It contains variables that are specific to your project. po/Makevars gets inserted into the po/Makefile when the latter is created. The variables thus take effect when the POT file is created or updated, and when the message catalogs get installed.
The first three variables (DOMAIN, subdir and top_builddir) can be left unmodified if your package has a single message domain and, accordingly, a single po/ directory. Only packages which have multiple po/ directories at different locations need to adjust these three variables in Makevars.
As an alternative to the XGETTEXT_OPTIONS variable, it is also possible to specify xgettext options through the AM_XGETTEXT_OPTION autoconf macro. See AM_XGETTEXT_OPTION.
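To make the role of XGETTEXT_OPTIONS concrete, here is a Python sketch that builds an xgettext argument vector from a domain name, the top-level source directory, the POTFILES.in file and a Makevars-style option string. It only uses options that xgettext documents (‘--default-domain’, ‘--directory’, ‘--files-from’, and ‘--keyword’ inside the option string), but it is an illustration under those assumptions, not the exact command that the generated po/Makefile runs; the function name is hypothetical.

```python
import shlex
from typing import List


def xgettext_command(domain: str, top_srcdir: str, potfiles_in: str,
                     xgettext_options: str) -> List[str]:
    """Build an xgettext argument vector (illustrative only).

    The fixed options are documented xgettext options; the extra ones come
    from a Makevars-style XGETTEXT_OPTIONS string, split like a shell would.
    """
    return [
        "xgettext",
        "--default-domain=" + domain,
        "--directory=" + top_srcdir,     # where listed input files are looked up
        "--files-from=" + potfiles_in,   # the list of input files
        *shlex.split(xgettext_options),  # e.g. "--keyword=_ --keyword=N_"
    ]


if __name__ == "__main__":
    argv = xgettext_command("hello", "..", "POTFILES.in",
                            "--keyword=_ --keyword=N_")
    print(" ".join(shlex.quote(a) for a in argv))
    # To actually run it (requires xgettext on PATH):
    #   import subprocess; subprocess.run(argv, check=True)
```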
Extending Makefile in po/
All files called Rules-* in the po/ directory get appended to the po/Makefile when it is created. They present an opportunity to add rules for special PO files to the Makefile, without needing to mess with po/Makefile.in.in.
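The two mechanisms that shape po/Makefile, Makevars being inserted and Rules-* files being appended, can be modelled in a few lines of Python. This is a hypothetical sketch, not GNU gettext’s own code: the marker line ‘# Makevars’ that receives the Makevars contents and the sorted order of the Rules-* files are assumptions made for illustration.

```python
import glob
import os
import tempfile


def assemble_po_makefile(po_dir: str, configured_text: str) -> str:
    """Hypothetical sketch of assembling po/Makefile.

    Assumptions, not taken from GNU gettext's sources: the contents of
    po/Makevars replace a line reading exactly '# Makevars' in the
    already-configured Makefile text, and every po/Rules-* file is then
    appended, in sorted file-name order.
    """
    with open(os.path.join(po_dir, "Makevars")) as f:
        makevars = f.read()
    if makevars and not makevars.endswith("\n"):
        makevars += "\n"
    lines = []
    for line in configured_text.splitlines(keepends=True):
        lines.append(makevars if line.rstrip("\n") == "# Makevars" else line)
    text = "".join(lines)
    for rules_file in sorted(glob.glob(os.path.join(po_dir, "Rules-*"))):
        with open(rules_file) as f:
            text += "\n" + f.read()
    return text


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as po:
        with open(os.path.join(po, "Makevars"), "w") as f:
            f.write("DOMAIN = hello\n")
        with open(os.path.join(po, "Rules-quot"), "w") as f:
            f.write("# rules for en@quot and en@boldquot would go here\n")
        print(assemble_po_makefile(po, "all:\n# Makevars\ninstall:\n"))
```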
GNU gettext comes with a Rules-quot file, containing rules for building the ‘en@quot’ and ‘en@boldquot’ catalogs. The effect of the en@quot catalog is that people who set their LANGUAGE environment variable to ‘en@quot’ will get messages with proper looking symmetric Unicode quotation marks instead of abusing the ASCII grave accent and the ASCII apostrophe for indicating quotations. To enable this catalog, simply add en@quot to the po/LINGUAS file. The effect of the en@boldquot catalog is that people who set LANGUAGE to ‘en@boldquot’ will get not only proper quotation marks, but also the quoted text will be shown in a bold font on terminals and consoles. This catalog is useful only for command-line programs, not GUI programs. To enable it, similarly add en@boldquot to the po/LINGUAS file.
Similarly, you can create rules for building message catalogs for the sr@latin locale (Serbian written with the Latin alphabet) from those for the sr locale (Serbian written with Cyrillic letters). See msgfilter Invocation.
configure.ac at top level
configure.ac or configure.in: this is the source from which autoconf generates the configure script.
The package name and version must be declared. This is done by a set of lines like these:
PACKAGE=gettext
VERSION=0.19.8
AC_DEFINE_UNQUOTED(PACKAGE, "$PACKAGE")
AC_DEFINE_UNQUOTED(VERSION, "$VERSION")
AC_SUBST(PACKAGE)
AC_SUBST(VERSION)
or, if you are using GNU automake, by a line like this:
AM_INIT_AUTOMAKE(gettext, 0.19.8)
Of course, you replace ‘gettext’ with the name of your package, and ‘0.19.8’ with its version number, exactly as they should appear in the packaged tar file name of your distribution (gettext-0.19.8.tar.gz, here).
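Since the package name and version end up verbatim in the tarball name, a quick consistency check can be scripted. The Python helper below is hypothetical: it looks for the two-argument form of AM_INIT_AUTOMAKE shown above, with or without [quoting], and returns the tarball name it implies. Other forms, such as the one-argument options-only call used by newer automake setups, are deliberately not recognized and yield None.

```python
import re
from typing import Optional

# Matches AM_INIT_AUTOMAKE(pkg, version), with or without [quoting].
_AM_INIT = re.compile(
    r"AM_INIT_AUTOMAKE\(\s*\[?([^,\]\s]+)\]?\s*,\s*\[?([^,\])\s]+)\]?")


def dist_tarball_name(configure_ac: str) -> Optional[str]:
    """Hypothetical helper: expected tarball name from AM_INIT_AUTOMAKE."""
    m = _AM_INIT.search(configure_ac)
    if m is None:
        return None
    package, version = m.groups()
    return f"{package}-{version}.tar.gz"


if __name__ == "__main__":
    print(dist_tarball_name("AM_INIT_AUTOMAKE(gettext, 0.19.8)"))
```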
Here is the main m4 macro for triggering internationalization support. Just add this line to configure.ac:
AM_GNU_GETTEXT
This call is purposely simple, even if it generates a lot of configure-time checking and actions.
If you have suppressed the intl/ subdirectory by calling gettextize without the ‘--intl’ option, this call should read
AM_GNU_GETTEXT([external])
The AC_OUTPUT directive, at the end of your configure.ac file, needs to be modified so that it also creates intl/Makefile and po/Makefile.in:
AC_OUTPUT([existing configuration files intl/Makefile po/Makefile.in],
          [existing additional actions])
The modification to the first argument to AC_OUTPUT asks for substitution in the intl/ and po/ directories. Note the ‘.in’ suffix used for po/ only. This is because the distributed file is really po/Makefile.in.in.
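The double ‘.in’ follows from how configure output files are produced: by default each listed output file is generated from a template with the same name plus ‘.in’, with @VAR@ placeholders replaced by the values declared through AC_SUBST. The Python sketch below is a toy model of that step only (the real config.status does considerably more), and its function names are hypothetical.

```python
import re
from typing import Dict

_PLACEHOLDER = re.compile(r"@([A-Za-z_][A-Za-z0-9_]*)@")


def template_for(output_file: str) -> str:
    """By default, an output file OUTPUT is created from OUTPUT.in."""
    return output_file + ".in"


def substitute(template_text: str, values: Dict[str, str]) -> str:
    """Replace @VAR@ with values[VAR]; unknown placeholders are left intact."""
    return _PLACEHOLDER.sub(lambda m: values.get(m.group(1), m.group(0)),
                            template_text)


if __name__ == "__main__":
    print(template_for("po/Makefile.in"))   # the distributed template
    print(substitute("PACKAGE = @PACKAGE@\nVERSION = @VERSION@\n",
                     {"PACKAGE": "gettext", "VERSION": "0.19.8"}))
```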
If you have suppressed the intl/ subdirectory by calling gettextize without the ‘--intl’ option, then you don’t need to add intl/Makefile to the AC_OUTPUT line.
If, after doing the recommended modifications, a command like ‘aclocal -I m4’ or ‘autoconf’ or ‘autoreconf’ fails with a trace similar to this:
configure.ac:44: warning: AC_COMPILE_IFELSE was called before AC_GNU_SOURCE
../../lib/autoconf/specific.m4:335: AC_GNU_SOURCE is expanded from...
m4/lock.m4:224: gl_LOCK is expanded from...
m4/gettext.m4:571: gt_INTL_SUBDIR_CORE is expanded from...
m4/gettext.m4:472: AM_INTL_SUBDIR is expanded from...
m4/gettext.m4:347: AM_GNU_GETTEXT is expanded from...
configure.ac:44: the top level
configure.ac:44: warning: AC_RUN_IFELSE was called before AC_GNU_SOURCE
you need to add an explicit invocation of ‘AC_GNU_SOURCE’ in the configure.ac file, after ‘AC_PROG_CC’ but before ‘AM_GNU_GETTEXT’, most likely very close to the ‘AC_PROG_CC’ invocation. This is necessary because of ordering restrictions imposed by GNU autoconf.
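The required order (AC_PROG_CC, then AC_GNU_SOURCE, then AM_GNU_GETTEXT) can be checked with a crude text scan. The Python lint below is a hypothetical sketch: it matches whole macro names only, so AM_GNU_GETTEXT_VERSION is not mistaken for AM_GNU_GETTEXT, but it does not understand m4 quoting or comments, and it stays silent when either AC_GNU_SOURCE or AM_GNU_GETTEXT is absent.

```python
import re
from typing import List, Optional


def _first_call(text: str, macro: str) -> Optional[int]:
    """Offset of the first whole-word occurrence of macro, or None."""
    m = re.search(r"\b" + re.escape(macro) + r"\b", text)
    return m.start() if m else None


def gnu_source_order_problems(configure_ac: str) -> List[str]:
    """Hypothetical lint for the ordering advice above.

    Plain text search only: occurrences inside comments are not
    distinguished from real macro calls.
    """
    cc = _first_call(configure_ac, "AC_PROG_CC")
    gnu = _first_call(configure_ac, "AC_GNU_SOURCE")
    gt = _first_call(configure_ac, "AM_GNU_GETTEXT")
    problems: List[str] = []
    if gnu is None or gt is None:
        return problems  # nothing to compare
    if gnu > gt:
        problems.append("AC_GNU_SOURCE comes after AM_GNU_GETTEXT")
    if cc is not None and gnu < cc:
        problems.append("AC_GNU_SOURCE comes before AC_PROG_CC")
    return problems


if __name__ == "__main__":
    print(gnu_source_order_problems(
        "AC_PROG_CC\nAM_GNU_GETTEXT([external])\nAC_GNU_SOURCE\n"))
```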
config.guess, config.sub at top level
If you haven’t suppressed the intl/ subdirectory, you need to add the GNU config.guess and config.sub files to your distribution. They are needed because the intl/ directory has platform dependent support for determining the locale’s character encoding and therefore needs to identify the platform.
You can obtain the newest version of config.guess and config.sub from the ‘config’ project at http://savannah.gnu.org/. The commands to fetch them are
$ wget -O config.guess 'http://git.savannah.gnu.org/gitweb/?p=config.git;a=blob_plain;f=config.guess;hb=HEAD'
$ wget -O config.sub 'http://git.savannah.gnu.org/gitweb/?p=config.git;a=blob_plain;f=config.sub;hb=HEAD'
Less recent versions are also contained in the GNU automake and GNU libtool packages.
Normally, config.guess and config.sub are put at the top level of a distribution. But it is also possible to put them in a subdirectory, together with other configuration support files like install-sh, ltconfig, ltmain.sh or missing. All you need to do, other than moving the files, is to add the following line to your configure.ac:
AC_CONFIG_AUX_DIR([subdir])
mkinstalldirs at top level
With earlier versions of GNU gettext, you needed to add the GNU mkinstalldirs script to your distribution. This is not needed any more. You can remove it if you are not also using an automake version older than automake 1.9.
aclocal.m4 at top level
If you do not have an aclocal.m4 file in your distribution, the simplest is to concatenate the files codeset.m4, fcntl-o.m4, gettext.m4, glibc2.m4, glibc21.m4, iconv.m4, intdiv0.m4, intl.m4, intldir.m4, intlmacosx.m4, intmax.m4, inttypes_h.m4, inttypes-pri.m4, lcmessage.m4, lib-ld.m4, lib-link.m4, lib-prefix.m4, lock.m4, longlong.m4, nls.m4, po.m4, printf-posix.m4, progtest.m4, size_max.m4, stdint_h.m4, threadlib.m4, uintmax_t.m4, visibility.m4, wchar_t.m4, wint_t.m4, xsize.m4 from GNU gettext’s m4/ directory into a single file. If you have suppressed the intl/ directory, only gettext.m4, iconv.m4, lib-ld.m4, lib-link.m4, lib-prefix.m4, nls.m4, po.m4, progtest.m4 need to be concatenated.
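The concatenation itself is mechanical. The Python sketch below copies the two file lists from the paragraph above and joins the chosen files into aclocal.m4, adding a newline between files that lack one. The function name and the m4 source path passed to it are hypothetical; point m4_dir at wherever your copy of GNU gettext’s m4/ directory lives.

```python
import os
from typing import Iterable

# File lists as given in the text above (GNU gettext 0.19.8).
FULL_INTL_M4 = [
    "codeset.m4", "fcntl-o.m4", "gettext.m4", "glibc2.m4", "glibc21.m4",
    "iconv.m4", "intdiv0.m4", "intl.m4", "intldir.m4", "intlmacosx.m4",
    "intmax.m4", "inttypes_h.m4", "inttypes-pri.m4", "lcmessage.m4",
    "lib-ld.m4", "lib-link.m4", "lib-prefix.m4", "lock.m4", "longlong.m4",
    "nls.m4", "po.m4", "printf-posix.m4", "progtest.m4", "size_max.m4",
    "stdint_h.m4", "threadlib.m4", "uintmax_t.m4", "visibility.m4",
    "wchar_t.m4", "wint_t.m4", "xsize.m4",
]
# Subset needed when the intl/ directory has been suppressed.
EXTERNAL_ONLY_M4 = [
    "gettext.m4", "iconv.m4", "lib-ld.m4", "lib-link.m4",
    "lib-prefix.m4", "nls.m4", "po.m4", "progtest.m4",
]


def write_aclocal(m4_dir: str, names: Iterable[str],
                  out_path: str = "aclocal.m4") -> None:
    """Concatenate the named macro files from m4_dir into out_path."""
    with open(out_path, "w") as out:
        for name in names:
            with open(os.path.join(m4_dir, name)) as f:
                content = f.read()
            out.write(content)
            if content and not content.endswith("\n"):
                out.write("\n")  # keep files from running together
```

For example, write_aclocal("/path/to/gettext/m4", EXTERNAL_ONLY_M4) would build the smaller aclocal.m4 for a package without intl/.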
If you are not using GNU automake 1.8 or newer, you will need to add a file mkdirp.m4 from a newer automake distribution to the list of files above.
If you already have an aclocal.m4 file, then you will have to merge the said macro files into your aclocal.m4. Note that if you are upgrading from a previous release of GNU gettext, you should most probably replace the macros (AM_GNU_GETTEXT, etc.), as they usually change a little from one release of GNU gettext to the next. Their contents may vary as we get more experience with strange systems out there.
If you are using GNU automake 1.5 or newer, it is enough to put these macro files into a subdirectory named m4/ and add the line
ACLOCAL_AMFLAGS = -I m4
to your top level Makefile.am.
If you are using GNU automake 1.10 or newer, it is even easier: add the line
ACLOCAL_AMFLAGS = --install -I m4
to your top level Makefile.am, and run ‘aclocal --install -I m4’. This will copy the needed files to the m4/ subdirectory automatically, before updating aclocal.m4.
These macros check for the internationalization support functions and related information. Hopefully, once stabilized, these macros might be integrated in the standard Autoconf set, because this piece of m4 code will be the same for all projects using GNU gettext.
acconfig.h at top level
Earlier GNU gettext releases required putting definitions for ENABLE_NLS, HAVE_GETTEXT and HAVE_LC_MESSAGES, HAVE_STPCPY, PACKAGE and VERSION into an acconfig.h file. This is not needed any more; you can remove them from your acconfig.h file unless your package uses them independently from the intl/ directory.
config.h.in at top level
The include file template that holds the C macros to be defined by configure is usually called config.h.in and may be maintained either manually or automatically.
If gettextize has created an intl/ directory, this file must be called config.h.in and must be at the top level. If, however, you have suppressed the intl/ directory by calling gettextize without the ‘--intl’ option, then you can choose the name of this file and its location freely.
If it is maintained automatically, by use of the ‘autoheader’ program, you need to do nothing about it. This is the case in particular if you are using GNU automake.
If it is maintained manually, and if gettextize has created an intl/ directory, you should switch to using ‘autoheader’. The list of C macros to be added for the sake of the intl/ directory is just too long to be maintained manually; it also changes between different versions of GNU gettext.
If it is maintained manually, and if on the other hand you have suppressed the intl/ directory by calling gettextize without the ‘--intl’ option, then you can get away with adding the following lines to config.h.in:
/* Define to 1 if translation of program messages to the user's
   native language is requested. */
#undef ENABLE_NLS
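In C sources, ENABLE_NLS typically decides whether a translation macro such as _() calls gettext() or simply returns its argument. As a language-neutral illustration of that switch, here is a Python analogue built on the standard gettext module; it is an analogy under that assumption, not what configure or libintl do, and make_translator is a hypothetical name.

```python
import gettext
from typing import Callable


def make_translator(domain: str, localedir: str,
                    enable_nls: bool = True) -> Callable[[str], str]:
    """Return a '_' function, mirroring a C build with or without ENABLE_NLS."""
    if not enable_nls:
        # Like a C macro that expands _(String) to (String) when NLS is off.
        return lambda message: message
    # fallback=True yields a NullTranslations object (identity) when no
    # catalog is found, much as C gettext() returns its argument untranslated.
    translation = gettext.translation(domain, localedir=localedir,
                                      fallback=True)
    return translation.gettext


if __name__ == "__main__":
    _ = make_translator("hello", "/usr/share/locale")
    print(_("Hello, world!"))
```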
Makefile.in at top level
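Here are a few modifications you need to make to your main, top-level Makefile.in file.
• Add the following lines near the beginning of your Makefile.in, so the ‘dist:’ goal will work properly:
PACKAGE = @PACKAGE@
VERSION = @VERSION@
• Add file ABOUT-NLS to the DISTFILES definition, so the file gets distributed.
• Wherever you process subdirectories in your Makefile.in, be sure you also process the subdirectories ‘intl’ and ‘po’. If you are using Makefiles, either generated by automake, or hand-written so they carefully follow the GNU coding standards, the affected goals for which the new subdirectories must be handled include ‘installdirs’, ‘install’, ‘uninstall’, ‘clean’, ‘distclean’.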
Here is an example of a canonical order of processing. In this example, we also define SUBDIRS in Makefile.in for it to be further used in the ‘dist:’ goal.
SUBDIRS = doc intl lib src po
Note that you must arrange for ‘make’ to descend into the intl directory before descending into other directories containing code which make use of the libintl.h header file. For this reason, here we mention intl before lib and src.
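The ordering rule is easy to check mechanically. This Python sketch (a hypothetical helper, with lib and src assumed as the libintl users, as in the example) reports which of those directories a SUBDIRS value would visit before intl; an empty result means the order is fine, or that intl is absent because AM_GNU_GETTEXT([external]) is used.

```python
from typing import Iterable, List


def dirs_before_intl(subdirs: str,
                     libintl_users: Iterable[str] = ("lib", "src")) -> List[str]:
    """Return the libintl-using directories that SUBDIRS visits before intl."""
    order = subdirs.split()
    if "intl" not in order:
        return []  # no intl/ directory to build first
    intl_pos = order.index("intl")
    return [d for d in libintl_users
            if d in order and order.index(d) < intl_pos]


if __name__ == "__main__":
    print(dirs_before_intl("doc intl lib src po"))
```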
The ‘dist:’ goal might then look like this:
distdir = $(PACKAGE)-$(VERSION)

dist: Makefile
	rm -fr $(distdir)
	mkdir $(distdir)
	chmod 777 $(distdir)
	for file in $(DISTFILES); do \
	  ln $$file $(distdir) 2>/dev/null || cp -p $$file $(distdir); \
	done
	for subdir in $(SUBDIRS); do \
	  mkdir $(distdir)/$$subdir || exit 1; \
	  chmod 777 $(distdir)/$$subdir; \
	  (cd $$subdir && $(MAKE) $@) || exit 1; \
	done
	tar chozf $(distdir).tar.gz $(distdir)
	rm -fr $(distdir)
Note that if you are using GNU automake, Makefile.in is automatically generated from Makefile.am, and all needed changes to Makefile.am are already made by running ‘gettextize’.
Makefile.in in src/