Different countries and cultures have varying conventions for how to
communicate. These conventions range from very simple ones, such as the
format for representing dates and times, to very complex ones, such as
the language spoken.
Internationalization of software means programming it to be able
to adapt to the user's favorite conventions. In ISO C,
internationalization works by means of locales. Each locale
specifies a collection of conventions, one convention for each purpose.
The user chooses a set of conventions by specifying a locale (via
environment variables).
All programs inherit the chosen locale as part of their environment.
Provided the programs are written to obey the choice of locale, they
will follow the conventions preferred by the user.
Each locale specifies conventions for several purposes, including the
following:
-
What multibyte character sequences are valid, and how they are
interpreted (see section Character Set Handling).
-
Classification of which characters in the local character set are
considered alphabetic, and upper- and lower-case conversion conventions
(see section Character Handling).
-
The collating sequence for the local language and character set
(see section Collation Functions).
-
Formatting of numbers and currency amounts (see section Generic Numeric Formatting Parameters).
-
Formatting of dates and times (see section Formatting Date and Time).
-
What language to use for output, including error messages
(see section Message Translation).
-
What language to use for user answers to yes-or-no questions.
-
What language to use for more complex user input.
(The C library doesn't yet help you implement this.)
Some aspects of adapting to the specified locale are handled
automatically by the library subroutines. For example, all your program
needs to do in order to use the collating sequence of the chosen locale
is to use strcoll or strxfrm to compare strings.
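As a minimal sketch of this, the following helper sorts an array of strings in the order of the current locale's collating sequence by handing strcoll to qsort. The names compare_by_locale and sort_strings are hypothetical and chosen only for this illustration.

```c
#include <stdlib.h>
#include <string.h>

/* Comparison callback for qsort: orders two strings according to the
   collating sequence of the locale selected for LC_COLLATE.  */
static int
compare_by_locale (const void *a, const void *b)
{
  const char *const *sa = a;
  const char *const *sb = b;
  return strcoll (*sa, *sb);
}

/* Sort COUNT strings in place, in locale order.  */
void
sort_strings (const char **strings, size_t count)
{
  qsort (strings, count, sizeof *strings, compare_by_locale);
}
```

In the `C' locale this gives the same order as strcmp; in other locales the order follows whatever the locale defines.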
Other aspects of locales are beyond the comprehension of the library.
For example, the library can't automatically translate your program's
output messages into other languages. The only way you can support
output in the user's favorite language is to program this more or less
by hand. The C library provides functions to handle translations for
multiple languages easily.
This chapter discusses the mechanism by which you can modify the current
locale. The effects of the current locale on specific library functions
are discussed in more detail in the descriptions of those functions.
The simplest way for the user to choose a locale is to set the
environment variable LANG. This specifies a single locale to use
for all purposes. For example, a user could specify a hypothetical
locale named `espana-castellano' to use the standard conventions of
most of Spain.
The set of locales supported depends on the operating system you are
using, and so do their names. We can't make any promises about what
locales will exist, except for one standard locale called `C' or
`POSIX'. Later we will describe how to construct locales XXX.
A user also has the option of specifying different locales for different
purposes--in effect, choosing a mixture of multiple locales.
For example, the user might specify the locale `espana-castellano'
for most purposes, but specify the locale `usa-english' for
currency formatting. This might make sense if the user is a
Spanish-speaking American, working in Spanish, but representing monetary
amounts in US dollars.
Note that both locales `espana-castellano' and `usa-english',
like all locales, would include conventions for all of the purposes to
which locales apply. However, the user can choose to use each locale
for a particular subset of those purposes.
The purposes that locales serve are grouped into categories, so
that a user or a program can choose the locale for each category
independently. Here is a table of categories; each name is both an
environment variable that a user can set, and a macro name that you can
use as an argument to setlocale.
LC_COLLATE
-
This category applies to collation of strings (functions
strcoll
and strxfrm); see section Collation Functions.
LC_CTYPE
-
This category applies to classification and conversion of characters,
and to multibyte and wide characters;
see section Character Handling, and section Character Set Handling.
LC_MONETARY
-
This category applies to formatting monetary values; see section Generic Numeric Formatting Parameters.
LC_NUMERIC
-
This category applies to formatting numeric values that are not
monetary; see section Generic Numeric Formatting Parameters.
LC_TIME
-
This category applies to formatting date and time values; see
section Formatting Date and Time.
LC_MESSAGES
-
This category applies to selecting the language used in the user
interface for message translation (see section The Uniforum approach to Message Translation;
see section X/Open Message Catalog Handling).
LC_ALL
-
This is not a category; it is a macro that you can use
with
setlocale to set a single locale for all purposes. Setting
the environment variable of the same name overrides all selections made
by the other LC_* variables or LANG.
LANG
-
If this environment variable is defined, its value specifies the locale
to use for all purposes except as overridden by the variables above.
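A program can mix categories in the same way a user can. The sketch below adopts the user's choice for every category and then pins LC_NUMERIC to the `C' locale, which a program might want if it reads or writes numbers in a fixed file format. The function name is hypothetical, and the choice of LC_NUMERIC is only an assumption made for the example.

```c
#include <locale.h>
#include <stddef.h>

/* Follow the environment for all categories, then force LC_NUMERIC
   back to "C" so that the decimal point stays ".".
   Returns 1 if LC_NUMERIC could be set, 0 otherwise.  */
int
use_user_locale_with_c_numbers (void)
{
  /* If this call fails, the locale is left as it was.  */
  (void) setlocale (LC_ALL, "");

  /* The "C" locale is always available.  */
  return setlocale (LC_NUMERIC, "C") != NULL;
}
```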
When the message translation functions were developed, it was felt that
the functionality provided by the variables above was not sufficient. For
example, it should be possible to specify more than one locale name.
Consider a Swedish user who speaks German better than English, running
programs whose messages are written in English by default. It should be
possible to specify that the first choice of language is Swedish, the
second choice is German, and English is used only if both fail. This is
possible with the variable LANGUAGE. For further description of
this GNU extension see section User influence on gettext.
A C program inherits its locale environment variables when it starts up.
This happens automatically. However, these variables do not
automatically control the locale used by the library functions, because
ISO C says that all programs start by default in the standard `C'
locale. To use the locales specified by the environment, you must call
setlocale. Call it as follows:
setlocale (LC_ALL, "");
to select a locale based on the user choice of the appropriate
environment variables.
You can also use setlocale to specify a particular locale, for
general use or for a specific category.
The symbols in this section are defined in the header file `locale.h'.
- Function: char * setlocale (int category, const char *locale)
-
The function
setlocale sets the current locale for
category category to locale.
If category is LC_ALL, this specifies the locale for all
purposes. The other possible values of category specify an
individual purpose (see section Categories of Activities that Locales Affect).
You can also use this function to find out the current locale by passing
a null pointer as the locale argument. In this case,
setlocale returns a string that is the name of the locale
currently selected for category category.
The string returned by setlocale can be overwritten by subsequent
calls, so you should make a copy of the string (see section Copying and Concatenation) if you want to save it past any further calls to
setlocale. (The standard library is guaranteed never to call
setlocale itself.)
You should not modify the string returned by setlocale.
It might be the same string that was passed as an argument in a
previous call to setlocale.
When you read the current locale for category LC_ALL, the value
encodes the entire combination of selected locales for all categories.
In this case, the value is not just a single locale name. In fact, we
don't make any promises about what it looks like. But if you specify
the same "locale name" with LC_ALL in a subsequent call to
setlocale, it restores the same combination of locale selections.
To be able to use the string encoding the currently selected locale at
a later time, you must make a copy of it. The returned string is not
guaranteed to stay valid indefinitely.
When the locale argument is not a null pointer, the string returned
by setlocale reflects the newly modified locale.
If you specify an empty string for locale, this means to read the
appropriate environment variable and use its value to select the locale
for category.
If a nonempty string is given for locale, the locale with this name
is used if possible.
If you specify an invalid locale name, setlocale returns a null
pointer and leaves the current locale unchanged.
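A small sketch of start-up code that follows these rules: it asks for the locale named by the environment and, if setlocale returns a null pointer, reports the problem and explicitly selects the `C' locale. The function name init_locale and the wording of the warning are assumptions made for this illustration.

```c
#include <locale.h>
#include <stdio.h>

/* Select the locale named by the environment.  Returns the name of
   the locale now in effect for LC_ALL.  */
const char *
init_locale (void)
{
  const char *name = setlocale (LC_ALL, "");

  if (name == NULL)
    {
      /* The environment names a locale that is not available.  */
      fputs ("warning: requested locale not available, using \"C\"\n",
             stderr);
      name = setlocale (LC_ALL, "C");
    }
  return name;
}
```

Remember that the returned string may be overwritten by later calls to setlocale, so copy it if you need to keep it.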
Here is an example showing how you might use setlocale to
temporarily switch to a new locale.
#include <stddef.h>
#include <locale.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

void
with_other_locale (char *new_locale,
                   void (*subroutine) (int),
                   int argument)
{
  char *old_locale, *saved_locale;

  /* Get the name of the current locale.  */
  old_locale = setlocale (LC_ALL, NULL);

  /* Copy the name so it won't be clobbered by setlocale.  */
  saved_locale = strdup (old_locale);
  if (saved_locale == NULL)
    {
      fputs ("Out of memory\n", stderr);
      exit (EXIT_FAILURE);
    }

  /* Now change the locale and do some stuff with it.  */
  setlocale (LC_ALL, new_locale);
  (*subroutine) (argument);

  /* Restore the original locale.  */
  setlocale (LC_ALL, saved_locale);
  free (saved_locale);
}
Portability Note: Some ISO C systems may define additional
locale categories and future versions of the library will do so. For
portability, assume that any symbol beginning with `LC_' might be
defined in `locale.h'.
The only locale names you can count on finding on all operating systems
are these three standard ones:
"C"
-
This is the standard C locale. The attributes and behavior it provides
are specified in the ISO C standard. When your program starts up, it
initially uses this locale by default.
"POSIX"
-
This is the standard POSIX locale. Currently, it is an alias for the
standard C locale.
""
-
The empty name says to select a locale based on environment variables.
See section Categories of Activities that Locales Affect.
Defining and installing named locales is normally a responsibility of
the system administrator at your site (or the person who installed the
GNU C library). It is also possible for the user to create private
locales. All this will be discussed later when describing the tool to
do so XXX.
If your program needs to use something other than the `C' locale,
it will be more portable if you use whatever locale the user specifies
with the environment, rather than trying to specify some non-standard
locale explicitly by name. Remember, different machines might have
different sets of locales installed.
There are several ways to access the locale information. The simplest
way is to let the C library itself do the work. Several of the
functions in this library implicitly access the locale data and use
whatever information is available in the currently selected locale. This
is how the locale model is meant to work normally.
As an example take the strftime function, which is meant to nicely
format date and time information (see section Formatting Date and Time).
Part of the standard information contained in the LC_TIME
category is, e.g., the names of the weekdays and months. Instead of
requiring the programmer to take care of providing the translations, the
strftime function does this all by itself. A %A
in the format string is replaced by the appropriate weekday
name of the locale currently selected for LC_TIME. This is the
easy part, and wherever possible functions do things automatically as in
this case.
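The following sketch shows this implicit use of the locale. The helper, whose name is hypothetical, contains no locale-specific code at all; the weekday and month names come from whatever locale is selected for LC_TIME when it runs.

```c
#include <stddef.h>
#include <time.h>

/* Write the full weekday and month names for *TP into BUF.
   Returns the number of bytes written, or 0 if BUF is too small.  */
size_t
format_day_and_month (char *buf, size_t size, const struct tm *tp)
{
  return strftime (buf, size, "%A, %B", tp);
}
```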
But quite often there is simply no function
to perform the task, or it is simply not possible to do the work
automatically. For these cases it is necessary to access the
information in the locale directly. To do this the C library provides
two functions: localeconv and nl_langinfo. The former is
part of ISO C and therefore portable, but has a brain-damaged
interface. The latter is part of the Unix interface and is portable
insofar as the system follows the Unix standards.
Together with the setlocale function the ISO C people
invented the localeconv function. It is a masterpiece of misdesign.
It is expensive to use, it is not extensible, and it is not generally
usable as it provides access only to the LC_MONETARY and
LC_NUMERIC related information. If it is applicable to a
certain situation it should nevertheless be used since it is very
portable. In general it is better to use the function strfmon,
which formats monetary amounts correctly according to the
selected locale by implicitly using this information.
- Function: struct lconv * localeconv (void)
-
The
localeconv function returns a pointer to a structure whose
components contain information about how numeric and monetary values
should be formatted in the current locale.
You should not modify the structure or its contents. The structure might
be overwritten by subsequent calls to localeconv, or by calls to
setlocale, but no other function in the library overwrites this
value.
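As a small illustration, the sketch below reads only the decimal_point member and uses it to join a whole part and a two-digit fractional part. The function name and its parameters are hypothetical; the structure is used immediately and not kept across other library calls.

```c
#include <locale.h>
#include <stdio.h>

/* Format WHOLE and HUNDREDTHS (0..99) as a number with two fractional
   digits, using the decimal point of the current LC_NUMERIC locale.
   Returns what snprintf returns.  */
int
format_fixed (char *buf, size_t size, long whole, unsigned hundredths)
{
  const struct lconv *lc = localeconv ();

  return snprintf (buf, size, "%ld%s%02u",
                   whole, lc->decimal_point, hundredths);
}
```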
- Data Type: struct lconv
-
This is the data type of the value returned by
localeconv. Its
elements are described in the following subsections.
If a member of the structure struct lconv has type char,
and the value is CHAR_MAX, it means that the current locale has
no value for that parameter.
These are the standard members of struct lconv; there may be
others.
char *decimal_point
-
char *mon_decimal_point
-
These are the decimal-point separators used in formatting non-monetary
and monetary quantities, respectively. In the `C' locale, the
value of
decimal_point is ".", and the value of
mon_decimal_point is "".
char *thousands_sep
-
char *mon_thousands_sep
-
These are the separators used to delimit groups of digits to the left of
the decimal point in formatting non-monetary and monetary quantities,
respectively. In the `C' locale, both members have a value of
"" (the empty string).
char *grouping
-
char *mon_grouping
-
These are strings that specify how to group the digits to the left of
the decimal point.
grouping applies to non-monetary quantities
and mon_grouping applies to monetary quantities. Use either
thousands_sep or mon_thousands_sep to separate the digit
groups.
Each member of these strings is to be interpreted as an integer value of
type char. Successive numbers (from left to right) give the sizes of
successive groups (from right to left, starting at the decimal point). The
last member is either 0, in which case the previous member is used over
and over for all the remaining groups, or CHAR_MAX, in which case there
is no more grouping--or, put another way, any remaining digits form one
large group without separators.
For example, if grouping is "\04\03\02", the correct grouping
for the number 123456787654321 is `12', `34',
`56', `78', `765', `4321'. This uses a group of 4
digits at the end, preceded by a group of 3 digits, preceded by groups
of 2 digits (as many as needed). With a separator of `,', the
number would be printed as `12,34,56,78,765,4321'.
A value of "\03" indicates repeated groups of three digits, as
normally used in the U.S.
In the standard `C' locale, both grouping and
mon_grouping have a value of "". This value specifies no
grouping at all.
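The grouping rule can be sketched in code. This example assumes the representation found in struct lconv, where each char of the grouping string holds one group size, the terminating zero means "repeat the last size", and CHAR_MAX means "no further grouping". The function name group_digits, its parameters, and the fixed 256-byte work buffer are assumptions made for the illustration.

```c
#include <limits.h>
#include <stddef.h>
#include <string.h>

/* Insert SEP into the digit string DIGITS according to GROUPING and
   store the result in OUT.  Returns 0 on success, -1 if a buffer is
   too small.  */
int
group_digits (const char *digits, const char *grouping,
              const char *sep, char *out, size_t outsize)
{
  char tmp[256];
  size_t pos = sizeof tmp;       /* TMP is filled from its end backwards */
  size_t seplen = strlen (sep);
  size_t len;
  int size = *grouping;          /* current group size; <= 0 means none */
  int in_group = 0;              /* digits already placed in this group */

  if (size == CHAR_MAX)
    size = 0;

  for (size_t i = strlen (digits); i > 0; i--)
    {
      if (size > 0 && in_group == size)
        {
          /* The group is full and more digits remain.  */
          if (pos < seplen)
            return -1;
          pos -= seplen;
          memcpy (tmp + pos, sep, seplen);
          in_group = 0;
          /* Move to the next size; at the end of the string keep
             repeating the last one.  */
          if (grouping[1] != '\0')
            {
              grouping++;
              size = (*grouping == CHAR_MAX) ? 0 : *grouping;
            }
        }
      if (pos == 0)
        return -1;
      tmp[--pos] = digits[i - 1];
      in_group++;
    }

  len = sizeof tmp - pos;
  if (len + 1 > outsize)
    return -1;
  memcpy (out, tmp + pos, len);
  out[len] = '\0';
  return 0;
}
```

A caller would normally pass the grouping and thousands_sep members (or their monetary counterparts) obtained from localeconv.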
char int_frac_digits
-
char frac_digits
-
These are small integers indicating how many fractional digits (to the
right of the decimal point) should be displayed in a monetary value in
international and local formats, respectively. (Most often, both
members have the same value.)
In the standard `C' locale, both of these members have the value
CHAR_MAX, meaning "unspecified". The ISO standard doesn't say
what to do when you find this value; we recommend printing no
fractional digits. (This locale also specifies the empty string for
mon_decimal_point, so printing any fractional digits would be
confusing!)
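The recommendation for the unspecified case can be written as a one-line helper. This is only a sketch of the advice given here, not something the standard prescribes, and the function name is hypothetical.

```c
#include <limits.h>
#include <locale.h>

/* Number of fractional digits to print for a local monetary amount.
   CHAR_MAX means "unspecified"; print no fractional digits then.  */
int
local_frac_digits (const struct lconv *lc)
{
  return lc->frac_digits == CHAR_MAX ? 0 : lc->frac_digits;
}
```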
These members of the struct lconv structure specify how to print
the symbol to identify a monetary value--the international analog of
`$' for US dollars.
Each country has two standard currency symbols. The local currency
symbol is used commonly within the country, while the
international currency symbol is used internationally to refer to
that country's currency when it is necessary to indicate the country
unambiguously.
For example, many countries use the dollar as their monetary unit, and
when dealing with international currencies it's important to specify
that one is dealing with (say) Canadian dollars instead of U.S. dollars
or Australian dollars. But when the context is known to be Canada,
there is no need to make this explicit--dollar amounts are implicitly
assumed to be in Canadian dollars.
char *currency_symbol
-
The local currency symbol for the selected locale.
In the standard `C' locale, this member has a value of
""
(the empty string), meaning "unspecified". The ISO standard doesn't
say what to do when you find this value; we recommend you simply print
the empty string as you would print any other string found in the
appropriate member.
char *int_curr_symbol
-
The international currency symbol for the selected locale.
The value of
int_curr_symbol should normally consist of a
three-letter abbreviation determined by the international standard
ISO 4217 Codes for the Representation of Currency and Funds,
followed by a one-character separator (often a space).
In the standard `C' locale, this member has a value of ""
(the empty string), meaning "unspecified". We recommend you simply
print the empty string as you would print any other string found in the
appropriate member.
char p_cs_precedes
-
char n_cs_precedes
-
These members are
1 if the currency_symbol string should
precede the value of a monetary amount, or 0 if the string should
follow the value. The p_cs_precedes member applies to positive
amounts (or zero), and the n_cs_precedes member applies to
negative amounts.
In the standard `C' locale, both of these members have a value of
CHAR_MAX, meaning "unspecified". The ISO standard doesn't say
what to do when you find this value, but we recommend printing the
currency symbol before the amount. That's right for most countries.
In other words, treat all nonzero values alike in these members.
The POSIX standard says that these two members apply to the
int_curr_symbol as well as the currency_symbol. The ISO
C standard seems to imply that they should apply only to the
currency_symbol---so the int_curr_symbol should always
precede the amount.
We can only guess which of these (if either) matches the usual
conventions for printing international currency symbols. Our guess is
that they should always precede the amount. If we find out a reliable
answer, we will put it here.
char p_sep_by_space
-
char n_sep_by_space
-
These members are
1 if a space should appear between the
currency_symbol string and the amount, or 0 if no space
should appear. The p_sep_by_space member applies to positive
amounts (or zero), and the n_sep_by_space member applies to
negative amounts.
In the standard `C' locale, both of these members have a value of
CHAR_MAX, meaning "unspecified". The ISO standard doesn't say
what you should do when you find this value; we suggest you treat it as
one (print a space). In other words, treat all nonzero values alike in
these members.
These members apply only to currency_symbol. When you use
int_curr_symbol, you never print an additional space, because
int_curr_symbol itself contains the appropriate separator.
The POSIX standard says that these two members apply to the
int_curr_symbol as well as the currency_symbol. But an
example in the ISO C standard clearly implies that they should apply
only to the currency_symbol---that the int_curr_symbol
contains any appropriate separator, so you should never print an
additional space.
Based on what we know now, we recommend you ignore these members when
printing international currency symbols, and print no extra space.
These members of the struct lconv structure specify how to print
the sign (if any) in a monetary value.
char *positive_sign
-
char *negative_sign
-
These are strings used to indicate positive (or zero) and negative
(respectively) monetary quantities.
In the standard `C' locale, both of these members have a value of
"" (the empty string), meaning "unspecified".
The ISO standard doesn't say what to do when you find this value; we
recommend printing positive_sign as you find it, even if it is
empty. For a negative value, print negative_sign as you find it
unless both it and positive_sign are empty, in which case print
`-' instead. (Failing to indicate the sign at all seems rather
unreasonable.)
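A sketch of this recommendation for negative amounts follows. The fallback to `-' is the suggestion made in this manual rather than a requirement of the ISO standard, and the function name is hypothetical.

```c
#include <locale.h>

/* String to print as the sign of a negative monetary amount.  */
const char *
negative_sign_to_print (const struct lconv *lc)
{
  /* If the locale specifies neither sign, fall back to "-" so that
     negative amounts remain recognizable.  */
  if (lc->negative_sign[0] == '\0' && lc->positive_sign[0] == '\0')
    return "-";
  return lc->negative_sign;
}
```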
char p_sign_posn
-
char n_sign_posn
-
These members have values that are small integers indicating how to
position the sign for nonnegative and negative monetary quantities,
respectively. (The string used by the sign is what was specified with
positive_sign or negative_sign.) The possible values are
as follows:
0
-
The currency symbol and quantity should be surrounded by parentheses.
1
-
Print the sign string before the quantity and currency symbol.
2
-
Print the sign string after the quantity and currency symbol.
3
-
Print the sign string right before the currency symbol.
4
-
Print the sign string right after the currency symbol.
CHAR_MAX
-
"Unspecified". Both members have this value in the standard
`C' locale.
The ISO standard doesn't say what you should do when the value is
CHAR_MAX. We recommend you print the sign after the currency
symbol.
It is not clear whether you should let these members apply to the
international currency format or not. POSIX says you should, but
intuition plus the examples in the ISO C standard suggest you should
not. We hope that someone who knows well the conventions for formatting
monetary quantities will tell us what we should recommend.
When writing the X/Open Portability Guide the authors realized that the
localeconv function is not enough to provide reasonable access to
the locale information. The information that was meant to be available
in the locale (as later specified in the POSIX.1 standard) requires more
flexible means of access. Therefore the nl_langinfo function
was introduced.
- Function: char * nl_langinfo (nl_item item)
-
The
nl_langinfo function can be used to access individual
elements of the locale categories. Unlike the localeconv
function, which always returns all the information, nl_langinfo
lets the caller select what information is needed. This is very
fast, and it is no problem to call this function multiple times.
A second advantage is that more than the numeric and monetary
formatting information is available: the information of the
LC_TIME and LC_MESSAGES categories can be retrieved as well.
The type nl_item is defined in `nl_types.h'.
The argument item is a numeric value which must be one of the
values defined in the header `langinfo.h'. The X/Open standard
defines the following values:
ABDAY_1
-
ABDAY_2
-
ABDAY_3
-
ABDAY_4
-
ABDAY_5
-
ABDAY_6
-
ABDAY_7
-
nl_langinfo returns the abbreviated weekday name. ABDAY_1
corresponds to Sunday.
DAY_1
-
DAY_2
-
DAY_3
-
DAY_4
-
DAY_5
-
DAY_6
-
DAY_7
-
Similar to
ABDAY_1 etc, but here the return value is the
unabbreviated weekday name.
ABMON_1
-
ABMON_2
-
ABMON_3
-
ABMON_4
-
ABMON_5
-
ABMON_6
-
ABMON_7
-
ABMON_8
-
ABMON_9
-
ABMON_10
-
ABMON_11
-
ABMON_12
-
The return value is the abbreviated name of the month.
ABMON_1
corresponds to January.
MON_1
-
MON_2
-
MON_3
-
MON_4
-
MON_5
-
MON_6
-
MON_7
-
MON_8
-
MON_9
-
MON_10
-
MON_11
-
MON_12
-
Similar to
ABMON_1 etc but here the month names are not abbreviated.
Here the first value MON_1 also corresponds to January.
AM_STR
-
PM_STR
-
The return values are strings which can be used in the time representation
which uses the American 12-hour clock with am/pm markers.
Please note that in locales which do not know this time representation
these strings might actually be empty, in which case the am/pm format
cannot be used at all.
D_T_FMT
-
The return value can be used as a format string for
strftime to
represent time and date in a locale specific way.
D_FMT
-
The return value can be used as a format string for
strftime to
represent a date in a locale specific way.
T_FMT
-
The return value can be used as a format string for
strftime to
represent time in a locale specific way.
T_FMT_AMPM
-
The return value can be used as a format string for
strftime to
represent time using the American-style am/pm format.
Please note that if the am/pm format does not make any sense for the
selected locale the returned value might be the same as the one for
T_FMT.
ERA
-
The return value represents the eras of time used in the
current locale.
Most locales do not define this value. An example of a locale which
does define this value is the Japanese one. Here the traditional date
representation is based on eras measured by the reigns of the
emperors.
Normally it should not be necessary to use this value directly. The
strftime function can be made to use this information by applying the
E modifier to its formats. The format of the returned string
is not specified, and therefore one should not assume that knowledge
about the representation on one system applies to another.
ERA_YEAR
-
The return value describes the names of the years for the eras of this
locale. As for
ERA it should not be necessary to use this value directly.
ERA_D_T_FMT
-
This return value can be used as a format string for
strftime to
represent time and date using the era representation in a locale
specific way.
ERA_D_FMT
-
This return value can be used as a format string for
strftime to
represent a date using the era representation in a locale specific way.
ERA_T_FMT
-
This return value can be used as a format string for
strftime to
represent time using the era representation in a locale specific way.
ALT_DIGITS
-
The return value is a representation of up to 100 values used to
represent the values 0 to 99. As for
ERA this
value is not intended to be used directly, but instead indirectly
through the strftime function. When the modifier O is
used in a format which would otherwise use numerals to represent hours,
minutes, seconds, weekdays, months, or weeks, the appropriate value for
this locale is used instead of the number.
INT_CURR_SYMBOL
-
This value is the same as returned by
localeconv in the
int_curr_symbol element of the struct lconv.
CURRENCY_SYMBOL
-
CRNCYSTR
-
This value is the same as returned by
localeconv in the
currency_symbol element of the struct lconv.
CRNCYSTR is a deprecated alias, still required by Unix98.
MON_DECIMAL_POINT
-
This value is the same as returned by
localeconv in the
mon_decimal_point element of the struct lconv.
MON_THOUSANDS_SEP
-
This value is the same as returned by
localeconv in the
mon_thousands_sep element of the struct lconv.
MON_GROUPING
-
This value is the same as returned by
localeconv in the
mon_grouping element of the struct lconv.
POSITIVE_SIGN
-
This value is the same as returned by
localeconv in the
positive_sign element of the struct lconv.
NEGATIVE_SIGN
-
This value is the same as returned by
localeconv in the
negative_sign element of the struct lconv.
INT_FRAC_DIGITS
-
This value is the same as returned by
localeconv in the
int_frac_digits element of the struct lconv.
FRAC_DIGITS
-
This value is the same as returned by
localeconv in the
frac_digits element of the struct lconv.
P_CS_PRECEDES
-
This value is the same as returned by
localeconv in the
p_cs_precedes element of the struct lconv.
P_SEP_BY_SPACE
-
This value is the same as returned by
localeconv in the
p_sep_by_space element of the struct lconv.
N_CS_PRECEDES
-
This value is the same as returned by
localeconv in the
n_cs_precedes element of the struct lconv.
N_SEP_BY_SPACE
-
This value is the same as returned by
localeconv in the
n_sep_by_space element of the struct lconv.
P_SIGN_POSN
-
This value is the same as returned by
localeconv in the
p_sign_posn element of the struct lconv.
N_SIGN_POSN
-
This value is the same as returned by
localeconv in the
n_sign_posn element of the struct lconv.
DECIMAL_POINT
-
RADIXCHAR
-
This value is the same as returned by
localeconv in the
decimal_point element of the struct lconv.
The name RADIXCHAR is a deprecated alias still used in Unix98.
THOUSANDS_SEP
-
THOUSEP
-
This value is the same as returned by
localeconv in the
thousands_sep element of the struct lconv.
The name THOUSEP is a deprecated alias still used in Unix98.
GROUPING
-
This value is the same as returned by
localeconv in the
grouping element of the struct lconv.
YESEXPR
-
The return value is a regular expression which can be used with the
regcomp and regexec functions to recognize a positive response to a
yes/no question.
NOEXPR
-
The return value is a regular expression which can be used with the
regcomp and regexec functions to recognize a negative response to a
yes/no question.
YESSTR
-
The return value is a locale specific translation of the positive response
to a yes/no question.
Using this value is deprecated since it is a very special case of
message translation and this better can be handled using the message
translation functions (see section Message Translation).
NOSTR
-
The return value is a locale specific translation of the negative response
to a yes/no question. What is said for
YESSTR is also true here.
The file `langinfo.h' defines a lot more symbols but none of them
is official. Using them is completely unportable, and the format of the
return values might change. Therefore it is strongly recommended not to
use them in any situation.
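As an illustration of one of the standard items, the sketch below uses YESEXPR to decide whether a user's answer is affirmative. It assumes the value is a POSIX extended regular expression and therefore compiles it with regcomp using REG_EXTENDED; the function name is hypothetical.

```c
#include <langinfo.h>
#include <regex.h>
#include <stddef.h>

/* Return 1 if ANSWER is a positive response in the current locale,
   0 if it is not, and -1 if the expression cannot be compiled.  */
int
is_affirmative (const char *answer)
{
  regex_t re;
  int result;

  if (regcomp (&re, nl_langinfo (YESEXPR), REG_EXTENDED | REG_NOSUB) != 0)
    return -1;
  result = regexec (&re, answer, 0, NULL, 0) == 0;
  regfree (&re);
  return result;
}
```

A matching test for negative answers would use NOEXPR in the same way.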
Please note that the return value for any valid argument can be used
in all situations (with the possible exception of the am/pm time format
related values). If the user has not selected any locale for the
appropriate category nl_langinfo returns the information from the
"C" locale. It is therefore possible to use this function as
shown in the example below.
If the argument item is not valid, a pointer to an empty string is
returned.
An example of the use of nl_langinfo is a function which has to
print a given date and time in the locale specific way. At first one
might think that, since strftime internally uses the locale
information, writing something like the following is enough:
size_t
i18n_time_n_data (char *s, size_t len, const struct tm *tp)
{
  return strftime (s, len, "%X %D", tp);
}
The format contains no weekday or month names and therefore is
internationally usable. Wrong! The output produced is something like
"hh:mm:ss MM/DD/YY". This format is only recognizable in the
USA. Other countries use different formats. Therefore the function
should be rewritten like this:
size_t
i18n_time_n_data (char *s, size_t len, const struct tm *tp)
{
  return strftime (s, len, nl_langinfo (D_T_FMT), tp);
}
Now the date and time format of the locale in effect when the program
runs is used. If the user selects the locale
correctly there should never be a misunderstanding over the time and
date format.
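Individual names can be retrieved in the same way. The sketch below maps a weekday number, counted as in the tm_wday member of struct tm, to the full weekday name of the current locale. The function name is hypothetical, and the items are listed one by one because nothing said here promises that DAY_1 through DAY_7 are consecutive values.

```c
#include <langinfo.h>

/* Return the full name of weekday WDAY (0 = Sunday) in the locale
   selected for LC_TIME, or an empty string if WDAY is out of range.  */
const char *
weekday_name (int wday)
{
  static const nl_item items[] =
    { DAY_1, DAY_2, DAY_3, DAY_4, DAY_5, DAY_6, DAY_7 };

  if (wday < 0 || wday > 6)
    return "";
  return nl_langinfo (items[wday]);
}
```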
We have seen that the structure returned by localeconv as well as
the values given to nl_langinfo allow retrieving the various
pieces of locale specific information to format numbers and monetary
amounts. But we have also seen that the rules underlying this
information are quite complex.
Therefore the X/Open standards introduce a function which uses this
information from the locale and so makes it easier for the user to format
numbers according to these rules.
- Function: ssize_t strfmon (char *s, size_t maxsize, const char *format, ...)
-
The
strfmon function is similar to the strftime function
in that it takes a description of a buffer (with size), a format string,
and values, and writes into the buffer a textual representation of the
values according to the format string. As with strftime, the function
also returns the number of bytes written into the buffer.
There are two differences: strfmon can take more than one argument,
and of course the format specification is different. As with strftime,
the format string consists of normal text, which is simply printed,
and format specifiers, which here are also introduced using `%'.
Following the `%' the function allows, similar to printf, a
sequence of flags and other specifications before the format character:
-
Immediately following the `%' there can be one or more of the
following flags:
- `=f'
-
The single byte character f is used for this field as the numeric
fill character. By default this character is a space character.
Filling with this character is only performed if a left precision
is specified. It is not used to pad the output to the given field width.
- `^'
-
The number is printed without grouping the digits using the rules of the
current locale. By default grouping is enabled.
- `+', `('
-
At most one of these flags can be used. They select which format is
used to represent the sign of a currency amount. By default, and if
`+' is used, the locale equivalent of +/- is used. If
`(' is used, negative amounts are enclosed in parentheses. The
exact format is determined by the values of the LC_MONETARY
category of the locale selected at program runtime.
- `!'
-
The output will not contain the currency symbol.
- `-'
-
The output will be formatted left-justified instead of right-justified
if the output does not fill the entire field width.
The next part of a specification is an, again optional, specification of
the field width. The width is given by digits following the flags. If
no width is specified it is assumed to be 0. The width value is
used after it is determined how much space the printed result needs. If
the result needs at least as many characters as the width value
specifies, nothing happens. Otherwise the output is extended to use as
many characters as the width says by filling with spaces. On which side
they are added depends on whether the `-' flag was given or not. If it
was given, the spaces are added at the right, making the output
left-justified, and vice versa.
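The effect of the field width and the `-' flag can be checked with a small sketch such as the following. The helper name format_in_field and the width of eleven characters are chosen only for illustration; the text produced depends on the locale in effect.

```c
#include <monetary.h>
#include <sys/types.h>

/* Hypothetical helper: format AMOUNT into BUF in a field of eleven
   characters.  The output is left-justified if LEFT is nonzero and
   right-justified otherwise.  Returns the value returned by strfmon.  */
ssize_t
format_in_field (char *buf, size_t size, double amount, int left)
{
  return strfmon (buf, size, left ? "%-11n" : "%11n", amount);
}
```

For a short amount such as 123.45 the padding spaces should appear before the amount when LEFT is zero and after it when LEFT is nonzero. An amount that already needs eleven or more characters is not padded at all.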
So far the format looks familiar as it is similar to printf or
strftime formats. But the next two fields introduce something
new. The first one, if available, is introduced by a `#' character
which is followed by a decimal digit string. The value of the digit
string specifies the number of digit positions to the left of the radix
character (the left precision). This does not include the grouping
characters needed if the `^' flag is not given. If the number needs
fewer positions than this, the field is padded at the left
side with the fill character, which can be selected using the `='
flag and which by default is a space. For example, if the left precision
is selected as 6, the number is 123, and the fill character is
`*', the result will be `***123'.
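A minimal sketch of the left precision combined with a fill character follows. The helper name format_filled is hypothetical, and the right precision of 2 is spelled out in the format only to keep the result independent of the locale's default.

```c
#include <monetary.h>
#include <sys/types.h>

/* Hypothetical helper: format AMOUNT with room for six digits to the
   left of the radix character and two digits to the right of it.
   Unused digit positions are filled with `*'.  Returns the value
   returned by strfmon.  */
ssize_t
format_filled (char *buf, size_t size, double amount)
{
  return strfmon (buf, size, "%=*#6.2n", amount);
}
```

With the GNU implementation, formatting 123.45 this way should yield at least three asterisks directly in front of the digits. A locale that groups digits reserves additional positions for the grouping characters, so more fill characters can appear there.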
The next field is introduced by a `.' (period) and consists of
another decimal digit string. Its value describes the number of
characters printed after the radix character (the right precision).
The default is selected from the current locale (frac_digits,
int_frac_digits, see section Generic Numeric Formatting Parameters).
If the exact representation needs more digits than specified by the
right precision, the displayed value is rounded. In case the number of
fractional digits is selected to be zero, no radix character is printed.
As a GNU extension, the strfmon implementation in the GNU libc
allows an optional `L' as a format modifier in the next field. If
this modifier is given, the argument is expected to be a long
double instead of a double value.
Finally, the last component of the format must be a format
specifier. There are three specifiers defined:
- `i'
-
The argument is formatted according to the locale's rules to format an
international currency value.
- `n'
-
The argument is formatted according to the locale's rules to format a
national currency value.
- `%'
-
Creates a `%' in the output. There must be no flag, width
specifier or modifier given, only `%%' is allowed.
As it is done for printf, the function reads the format string
from left to right and uses the values passed to the function following
the format string. The values are expected to be either of type
double or long double, depending on the presence of the
modifier `L'. The result is stored in the buffer pointed to by
s. At most maxsize characters are stored.
The return value of the function is the number of characters stored in
s, not including the terminating NUL byte. If the number of
characters to be stored would exceed maxsize, the function returns
-1 and the content of the buffer s is unspecified. In this
case errno is set to E2BIG.
A few examples should make it clear how to use this function. It is
assumed that all the following pieces of code are executed in a program
which uses the locale valid for the USA (en_US). The simplest
form of the format is this:
strfmon (buf, 100, "@%n@%n@%n@", 123.45, -567.89, 12345.678);
The output produced is
"@$123.45@-$567.89@$12,345.68@"
We can notice several things here. First, the widths of the three
fields differ. This is no wonder, since we have not specified a width
in the format string. Second, the third number is printed using
thousands separators. The thousands separator for the en_US locale is a
comma. In addition, the number is rounded: .678 is rounded
to .68 since the format does not specify a precision and the
default value in the locale is 2. Finally, note that the
national currency symbol is printed since `%n' was used, not
`%i'. The next example shows how we can align the output.
strfmon (buf, 100, "@%=*11n@%=*11n@%=*11n@", 123.45, -567.89, 12345.678);
The output this time is:
"@    $123.45@   -$567.89@ $12,345.68@"
Two things stand out. First, all fields have the same width (eleven
characters) since this is the width given in the format and since no
number required more characters to be printed. The second important
point is that the fill character is not used. This is correct since the
white space was not used to fill the space specified by a left
precision, but instead it is used to fill to the given width. The
difference becomes obvious if we now add a left precision specification.
strfmon (buf, 100, "@%=*11#5n@%=*11#5n@%=*11#5n@",
123.45, -567.89, 12345.678);
The output is
"@ $***123.45@-$***567.89@ $12,345.68@"
Here we can see that all the currency symbols are now aligned and the
space between the currency sign and the number is filled with the
selected fill character. Please note that although the left precision
is selected to be 5 and 123.45 has three digits left
of the radix character, the space is filled with three asterisks. This
is correct since, as explained above, the left precision does not count
the characters used for the thousands separators: five digit positions
plus one position for the separator give six positions, of which the
digits use three. One last example should explain the remaining
functionality.
strfmon (buf, 100, "@%=0(16#5.3i@%=0(16#5.3i@%=0(16#5.3i@",
123.45, -567.89, 12345.678);
This rather complex format string produces the following output:
"@ USD 000123.450 @(USD 000567.890)@ USD 12,345.678 @"
The most noticeable change is the use of the alternative style to
represent negative numbers. In financial circles this is often done using
parentheses and this is what the `(' flag selected. The fill character
is now `0'. Please note that this `0' character is not
regarded as a numeric zero and therefore the first and second numbers are
not printed using a thousands separator. Since we now use the
specifier `i' instead of `n' in the format, the international form of the
currency symbol is used. This is a four letter string, in this case
"USD ". The last point is that since the right precision is
selected to be three, the first and second numbers are printed with an
extra zero at the end and the third number is printed unrounded.