Plural Form(s) in Translation(s)

by Jan-Arve Sæther

Ever found yourself writing tr("%1 object(s) found").arg(count) in one of your applications? Qt 4.2 will introduce a powerful mechanism to handle plurals in a graceful way that works for all languages and that requires little extra work from the developer.

What's the Problem with Plural Forms?
How Does Qt 4.2 Address This Problem?
How Does It Work Under the Hood?

What's the Problem with Plural Forms?

You have most probably seen programs that use the same string for singular and plural, using parentheses to combine the singular and the plural forms into one string (e.g., "6 occurrence(s) replaced").

A common use of plurals in dialogs

Naturally, it would be preferable to show "6 occurrences replaced" with an 's', and "1 occurrence replaced" with no 's'. Some developers solve this problem through code that looks like this:

tr("%1 item%2 replaced").arg(count)
                        .arg(count == 1 ? "" : "s");

This approach works for languages like English that form their plural by appending 's', but it breaks down as soon as we try to translate the application to languages like Arabic, Chinese, German, Hebrew, or Japanese (to name just a few), because the hard-coded "s" suffix is not translatable and the translator has no way to supply a different plural form.

Developers who are slightly more sympathetic might write code that looks more like this:

QString message;
if (count == 1) {
    message = tr("%1 item replaced").arg(count);
} else {
    message = tr("%1 items replaced").arg(count);
}

This code is definitely more internationalization-friendly, but it still makes two assumptions about the target language:

It assumes that the target language has two grammatical numbers (singular and plural).
It assumes that the plural form should be used in the "n = 0" case (e.g., "0 items").

These assumptions hold for many of the world's languages, including Dutch, English, Finnish, Greek, Hebrew, Hindi, Mongolian, Swahili, Turkish, and Zulu, but there are many languages out there for which they don't.

Case in point: In French and Brazilian Portuguese (but not international Portuguese, interestingly enough), the singular form is used in conjunction with 0 (e.g., "0 maison", not "0 maisons"), breaking assumption 2. In Polish, there are three grammatical numbers:

Singular: n = 1
Paucal: n = 2–4, 22–24, 32–34, 42–44, ...
Plural: n = 0, 5–21, 25–31, 35–41, ...

For example, the Polish word dom ("house") has the paucal form domy and the plural form domów. The table below shows the rendition of "n house(s)" in English, French, and Polish for different values of n.

English	French	Polish
0 houses	0 maison	0 domów
1 house	1 maison	1 dom
2 houses	2 maisons	2 domy
3 houses	3 maisons	3 domy
4 houses	4 maisons	4 domy
5 houses	5 maisons	5 domów
21 houses	21 maisons	21 domów
22 houses	22 maisons	22 domy
24 houses	24 maisons	24 domy
30 houses	30 maisons	30 domów
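
The Polish column can be reproduced with a few lines of C++. The sketch below is only an illustration of the three-way rule described above: polishNumber() and polishHouses() are hypothetical names, not part of Qt, and n is assumed to be non-negative.

```cpp
#include <string>

// Hypothetical helper type for this sketch; not a Qt type.
enum class PolishNumber { Singular, Paucal, Plural };

// Classifies n according to the Polish rules listed in the article.
// Assumes n >= 0.
PolishNumber polishNumber(int n)
{
    if (n == 1)
        return PolishNumber::Singular;
    const int lastDigit = n % 10;
    const int lastTwoDigits = n % 100;
    // 2-4, 22-24, 32-34, ... but not 12-14, which take the plural.
    if (lastDigit >= 2 && lastDigit <= 4
        && (lastTwoDigits < 10 || lastTwoDigits > 20))
        return PolishNumber::Paucal;
    return PolishNumber::Plural;
}

// Renders "n house(s)" in Polish.
std::string polishHouses(int n)
{
    const std::string count = std::to_string(n);
    switch (polishNumber(n)) {
    case PolishNumber::Singular: return count + " dom";
    case PolishNumber::Paucal:   return count + " domy";
    case PolishNumber::Plural:   return count + " domów";
    }
    return count + " domów"; // unreachable; keeps compilers quiet
}
```

Calling polishHouses() with 1, 4, and 30 yields "1 dom", "4 domy", and "30 domów", matching the corresponding rows of the table.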

Other languages have other rules:

Latvian has a specific grammatical number, the nullar, for the "n = 0" case.
Dhivehi, Inuktitut, Irish, Maori, and a few other languages have a dual form for the "n = 2" case.
Czech, Slovak, Lithuanian, and Macedonian have a dual, but they use it according to more complex rules.
Slovenian has a trial in addition to the singular, dual, and plural forms.
Romanian handles the "n >= 20" case differently from the "n < 20" case.
Arabic has six different forms, depending on the value of n.
Chinese, Japanese, Korean, and many other languages don't distinguish between the singular and the plural.

This is just a partial list, but it clearly shows the complexity of the problem.

How Does Qt 4.2 Address This Problem?

Qt 4.2 includes a QObject::tr() overload that will make it very easy to write "plural-aware" internationalized applications. This new overload has the following signature:

QString tr(const char *text, const char *comment, int n);

Depending on the value of n, the tr() function will return a different translation, with the correct grammatical number for the target language. Also, any occurrence of "%n" is replaced with n's value. For example:

tr("%n item(s) replaced", "", count);

If a French translation is loaded, this will expand to "0 item remplacé", "1 item remplacé", "2 items remplacés", etc., depending on n's value. And if no translation is loaded, the original string is used, with "%n" replaced with count's value (e.g., "6 item(s) replaced").
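
To make the two outcomes concrete, here is a minimal stand-alone sketch that mimics this behavior without using Qt. It is not Qt's implementation: substituteCount() and trFrenchSketch() are hypothetical names, the French forms are passed in by hand rather than loaded from a translation file, and an empty list stands in for "no translation loaded".

```cpp
#include <string>
#include <vector>

// Replaces every "%n" in text with the decimal value of n.
std::string substituteCount(std::string text, int n)
{
    const std::string marker = "%n";
    const std::string value = std::to_string(n);
    std::string::size_type pos = 0;
    while ((pos = text.find(marker, pos)) != std::string::npos) {
        text.replace(pos, marker.size(), value);
        pos += value.size(); // continue after the inserted digits
    }
    return text;
}

// Hypothetical stand-in for tr(source, comment, n) with a French translation.
// frenchForms holds the translated strings in form order (singular, plural);
// fewer than two entries models "no translation loaded".
std::string trFrenchSketch(const std::string &source,
                           const std::vector<std::string> &frenchForms,
                           int n)
{
    if (frenchForms.size() < 2)
        return substituteCount(source, n); // fall back to the source text
    // French uses the singular for both 0 and 1.
    const std::string &form = (n < 2) ? frenchForms[0] : frenchForms[1];
    return substituteCount(form, n);
}
```

With the forms "%n item remplacé" and "%n items remplacés", the sketch returns "0 item remplacé" for n = 0 and "2 items remplacés" for n = 2; with an empty list and n = 6 it returns "6 item(s) replaced".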

To obtain a more natural English text, you need to load an English translation.^[1] An English translation offers other advantages, such as the possibility of editing the application's English user interface without touching the source code.

When the application is ready to be translated, the developers must run lupdate as usual to generate one or several .ts files that can be edited using Qt Linguist. In Qt Linguist, the translator can specify the target language by clicking Edit|Translation File Settings. Specifying a target language is necessary so that Qt Linguist knows how many translations are necessary for a source string that contains "%n".

Qt Linguist being used to translate plural forms in Polish

The screenshot above shows how Qt Linguist lets the translator enter three different translations corresponding to the three grammatical numbers (singular, paucal, and plural) in the Polish language.

How Does It Work Under the Hood?

Qt Linguist and its helper tool lrelease know the specific plural rules for all the languages supported by QLocale. These rules are encoded in the binary .qm file that is generated from the .ts file, so that tr() uses the correct form based on n's value. The table below shows the specific rules that are produced by Qt Linguist and lrelease for a selection of languages.

Language	Form 1	Form 2	Form 3
English	`n == 1`	otherwise	N/A
French	`n < 2`	otherwise	N/A
Czech	`n % 100 == 1`	`n % 100 >= 2 && n % 100 <= 4`	otherwise
Irish	`n == 1`	`n == 2`	otherwise
Latvian	`n % 10 == 1 && n % 100 != 11`	`n != 0`	otherwise
Lithuanian	`n % 10 == 1 && n % 100 != 11`	`n % 100 != 12 && n % 10 == 2`	otherwise
Macedonian	`n % 10 == 1`	`n % 10 == 2`	otherwise
Polish	`n == 1`	`n % 10 >= 2 && n % 10 <= 4 && (n % 100 < 10 || n % 100 > 20)`	otherwise
Romanian	`n == 1`	`n == 0 || (n % 100 >= 1 && n % 100 <= 20)`	otherwise
Russian	`n % 10 == 1 && n % 100 != 11`	`n % 10 >= 2 && n % 10 <= 4 && (n % 100 < 10 || n % 100 > 20)`	otherwise
Slovak	`n == 1`	`n >= 2 && n <= 4`	otherwise
Japanese	otherwise	N/A	N/A

These rules are hard-coded in Qt Linguist and lrelease and neither the application developers nor the translators need to understand them.
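
For the curious, the rules in the table translate directly into ordinary C++ conditions. The sketch below transcribes a few rows into a function that returns a zero-based form index (Form 1 is 0, Form 2 is 1, Form 3 is 2). It only illustrates the idea and does not show how the .qm file actually encodes the rules: pluralFormIndex() is a hypothetical name, the two-letter language codes are an assumption of this sketch, and n is assumed to be non-negative.

```cpp
#include <string>

// Returns the zero-based index of the plural form to use for n.
// The conditions are copied from the English, Irish, Latvian, and Slovak
// rows of the table; the language codes are an assumption of this sketch.
int pluralFormIndex(const std::string &language, int n)
{
    if (language == "en") // English: two forms
        return n == 1 ? 0 : 1;
    if (language == "ga") // Irish: singular, dual, plural
        return n == 1 ? 0 : (n == 2 ? 1 : 2);
    if (language == "lv") // Latvian: the third form is the nullar (n == 0)
        return (n % 10 == 1 && n % 100 != 11) ? 0 : (n != 0 ? 1 : 2);
    if (language == "sk") // Slovak: three forms
        return n == 1 ? 0 : ((n >= 2 && n <= 4) ? 1 : 2);
    // Japanese and other single-form languages. Unknown codes also land
    // here, which is a simplification made for this sketch.
    return 0;
}
```

A translation lookup would then pick the string at that index from the list of forms the translator entered, for example index 2 for Latvian when n is 0.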

Considering how easy it is to use the new tr() overload, there should be no excuse(s) anymore for not handling plural forms correctly in Qt applications.

^[1] For simplicity, we assume that the source language is English. It can be any language, even languages that cannot be expressed using the ISO 8859-1 (Latin-1) encoding. See the Release Manager chapter of the Qt Linguist manual for details.
