The GNU C Library provides pattern matching facilities for two kinds of
patterns: regular expressions and file-name wildcards. The library also
provides a facility for expanding variable and command references and
parsing text into words in the way the shell does.
This section describes how to match a wildcard pattern against a
particular string. The result is a yes or no answer: does the
string fit the pattern or not. The symbols described here are all
declared in `fnmatch.h'.
- Function: int fnmatch (const char *pattern, const char *string, int flags)
-
This function tests whether the string string matches the pattern
pattern. It returns
0 if they do match; otherwise, it
returns the nonzero value FNM_NOMATCH. The arguments
pattern and string are both strings.
The argument flags is a combination of flag bits that alter the
details of matching. See below for a list of the defined flags.
In the GNU C Library, fnmatch cannot experience an "error"; it
always returns an answer for whether the match succeeds. However, other
implementations of fnmatch might sometimes report "errors".
They would do so by returning nonzero values that are not equal to
FNM_NOMATCH.
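As a minimal sketch of how a caller can tell the three possible outcomes apart, the helper below wraps fnmatch. The name wildcard_match is hypothetical and is not part of the library; the -1 case can only arise with implementations that report errors.

```c
#define _POSIX_C_SOURCE 200809L
#include <fnmatch.h>

/* Hypothetical helper (not part of the library): return 1 if STRING
   matches PATTERN, 0 if it does not, and -1 if fnmatch returns some
   other nonzero value, as non-GNU implementations might.  */
int
wildcard_match (const char *pattern, const char *string)
{
  int result = fnmatch (pattern, string, 0);

  if (result == 0)
    return 1;
  if (result == FNM_NOMATCH)
    return 0;
  return -1;
}
```

For instance, wildcard_match ("*.c", "foo.c") reports a match, while wildcard_match ("*.c", "foo.h") does not.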
These are the available flags for the flags argument:
FNM_FILE_NAME
-
Treat the `/' character specially, for matching file names. If
this flag is set, wildcard constructs in pattern cannot match
`/' in string. Thus, the only way to match `/' is with
an explicit `/' in pattern.
FNM_PATHNAME
-
This is an alias for
FNM_FILE_NAME; it comes from POSIX.2. We
don't recommend this name because we don't use the term "pathname" for
file names.
FNM_PERIOD
-
Treat the `.' character specially if it appears at the beginning of
string. If this flag is set, wildcard constructs in pattern
cannot match `.' as the first character of string.
If you set both
FNM_PERIOD and FNM_FILE_NAME, then the
special treatment applies to `.' following `/' as well as to
`.' at the beginning of string. (The shell uses the
FNM_PERIOD and FNM_FILE_NAME flags together for matching
file names.)
FNM_NOESCAPE
-
Don't treat the `\' character specially in patterns. Normally,
`\' quotes the following character, turning off its special meaning
(if any) so that it matches only itself. When quoting is enabled, the
pattern `\?' matches only the string `?', because the question
mark in the pattern acts like an ordinary character.
If you use
FNM_NOESCAPE, then `\' is an ordinary character.
FNM_LEADING_DIR
-
Ignore a trailing sequence of characters starting with a `/' in
string; that is to say, test whether string starts with a
directory name that pattern matches.
If this flag is set, either `foo*' or `foobar' as a pattern
would match the string `foobar/frobozz'.
FNM_CASEFOLD
-
Ignore case in comparing string to pattern.
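The following sketch illustrates three of the flags above. It assumes only the POSIX names, so it uses FNM_PATHNAME rather than FNM_FILE_NAME; the helper names shell_style_match and literal_backslash_match are hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <fnmatch.h>

/* Hypothetical helper: match NAME against PATTERN roughly the way
   the shell matches file names, with FNM_PATHNAME (the POSIX name
   for FNM_FILE_NAME) and FNM_PERIOD both set.  Returns 1 on a
   match and 0 otherwise.  */
int
shell_style_match (const char *pattern, const char *name)
{
  return fnmatch (pattern, name, FNM_PATHNAME | FNM_PERIOD) == 0;
}

/* Hypothetical helper: match with `\' treated as an ordinary
   character instead of a quoting character.  */
int
literal_backslash_match (const char *pattern, const char *string)
{
  return fnmatch (pattern, string, FNM_NOESCAPE) == 0;
}
```

With these flags, `*.c' does not match `src/foo.c' because `*' cannot match the `/', and `*' does not match `.profile' because of the leading period. With FNM_NOESCAPE, the pattern `\?' matches a backslash followed by any one character.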
The archetypal use of wildcards is for matching against the files in a
directory, and making a list of all the matches. This is called
globbing.
You could do this using fnmatch, by reading the directory entries
one by one and testing each one with fnmatch. But that would be
slow (and complex, since you would have to handle subdirectories by
hand).
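To make the comparison concrete, here is a sketch of the by-hand approach for a single directory. The helper name count_matching_entries is hypothetical; it does not descend into subdirectories, which is part of what makes the manual approach tedious.

```c
#define _POSIX_C_SOURCE 200809L
#include <dirent.h>
#include <fnmatch.h>
#include <stddef.h>

/* Hypothetical helper: count the entries of the directory DIRNAME
   whose names match PATTERN.  Returns -1 if the directory cannot
   be opened.  Subdirectories are not scanned.  */
long
count_matching_entries (const char *dirname, const char *pattern)
{
  DIR *dir = opendir (dirname);
  struct dirent *entry;
  long count = 0;

  if (dir == NULL)
    return -1;

  /* Test each entry name in turn; FNM_PERIOD keeps wildcards from
     matching names that begin with `.'.  */
  while ((entry = readdir (dir)) != NULL)
    if (fnmatch (pattern, entry->d_name, FNM_PERIOD) == 0)
      ++count;

  closedir (dir);
  return count;
}
```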
The library provides a function glob to make this particular use
of wildcards convenient. glob and the other symbols in this
section are declared in `glob.h'.
The result of globbing is a vector of file names (strings). To return
this vector, glob uses a special data type, glob_t, which
is a structure. You pass glob the address of the structure, and
it fills in the structure's fields to tell you about the results.
- Data Type: glob_t
-
This data type holds a pointer to a word vector. More precisely, it
records both the address of the word vector and its size. The GNU
implementation contains some more fields which are non-standard
extensions.
gl_pathc
-
The number of elements in the vector.
gl_pathv
-
The address of the vector. This field has type
char **.
gl_offs
-
The offset of the first real element of the vector, from its nominal
address in the
gl_pathv field. Unlike the other fields, this
is always an input to glob, rather than an output from it.
If you use a nonzero offset, then that many elements at the beginning of
the vector are left empty. (The glob function fills them with
null pointers.)
The gl_offs field is meaningful only if you use the
GLOB_DOOFFS flag. Otherwise, the offset is always zero
regardless of what is in this field, and the first real element comes at
the beginning of the vector.
gl_closedir
-
The address of an alternative implementation of the
closedir
function. It is used if the GLOB_ALTDIRFUNC bit is set in
the flag parameter. The type of this field is
void (*) (void *).
This is a GNU extension.
gl_readdir
-
The address of an alternative implementation of the
readdir
function used to read the contents of a directory. It is used if the
GLOB_ALTDIRFUNC bit is set in the flag parameter. The type of
this field is struct dirent *(*) (void *).
This is a GNU extension.
gl_opendir
-
The address of an alternative implementation of the
opendir
function. It is used if the GLOB_ALTDIRFUNC bit is set in
the flag parameter. The type of this field is
void *(*) (const char *).
This is a GNU extension.
gl_stat
-
The address of an alternative implementation of the
stat function
to get information about an object in the filesystem. It is used if the
GLOB_ALTDIRFUNC bit is set in the flag parameter. The type of
this field is int (*) (const char *, struct stat *).
This is a GNU extension.
gl_lstat
-
The address of an alternative implementation of the
lstat
function to get information about an object in the filesystem, not
following symbolic links. It is used if the GLOB_ALTDIRFUNC bit
is set in the flag parameter. The type of this field is int
(*) (const char *, struct stat *).
This is a GNU extension.
- Function: int glob (const char *pattern, int flags, int (*errfunc) (const char *filename, int error-code), glob_t *vector-ptr)
-
The function
glob does globbing using the pattern pattern
in the current directory. It puts the result in a newly allocated
vector, and stores the size and address of this vector into
*vector-ptr. The argument flags is a combination of
bit flags; see section Flags for Globbing, for details of the flags.
The result of globbing is a sequence of file names. The function
glob allocates a string for each resulting word, then
allocates a vector of type char ** to store the addresses of
these strings. The last element of the vector is a null pointer.
This vector is called the word vector.
To return this vector, glob stores both its address and its
length (number of elements, not counting the terminating null pointer)
into *vector-ptr.
Normally, glob sorts the file names alphabetically before
returning them. You can turn this off with the flag GLOB_NOSORT
if you want to get the information as fast as possible. Usually it's
a good idea to let glob sort them--if you process the files in
alphabetical order, the users will have a feel for the rate of progress
that your application is making.
If glob succeeds, it returns 0. Otherwise, it returns one
of these error codes:
GLOB_ABORTED
-
There was an error opening a directory, and you used the flag
GLOB_ERR or your specified errfunc returned a nonzero
value.
See below
for an explanation of the GLOB_ERR flag and errfunc.
GLOB_NOMATCH
-
The pattern didn't match any existing files. If you use the
GLOB_NOCHECK flag, then you never get this error code, because
that flag tells glob to pretend that the pattern matched
at least one file.
GLOB_NOSPACE
-
It was impossible to allocate memory to hold the result.
In the event of an error, glob stores information in
*vector-ptr about all the matches it has found so far.
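A minimal sketch of a typical call follows. The helper name print_glob_matches is hypothetical. It passes no flags and no error handler, and it zeroes the structure first so that calling globfree is harmless whatever glob returns (a defensive assumption, not a documented requirement).

```c
#define _POSIX_C_SOURCE 200809L
#include <glob.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: print every file name that matches PATTERN,
   one per line.  Returns the number of matches, 0 if nothing
   matched, or -1 for GLOB_ABORTED and GLOB_NOSPACE.  */
long
print_glob_matches (const char *pattern)
{
  glob_t result;
  long count;
  size_t i;
  int status;

  memset (&result, 0, sizeof result);
  status = glob (pattern, 0, NULL, &result);

  switch (status)
    {
    case 0:
      for (i = 0; i < result.gl_pathc; ++i)
        puts (result.gl_pathv[i]);
      count = (long) result.gl_pathc;
      break;
    case GLOB_NOMATCH:
      count = 0;
      break;
    default:                    /* GLOB_ABORTED or GLOB_NOSPACE */
      count = -1;
      break;
    }

  /* The structure may hold partial results even after an error.  */
  globfree (&result);
  return count;
}
```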
This section describes the flags that you can specify in the
flags argument to glob. Choose the flags you want,
and combine them with the C bitwise OR operator |.
GLOB_APPEND
-
Append the words from this expansion to the vector of words produced by
previous calls to
glob. This way you can effectively expand
several words as if they were concatenated with spaces between them.
In order for appending to work, you must not modify the contents of the
word vector structure between calls to glob. And, if you set
GLOB_DOOFFS in the first call to glob, you must also
set it when you append to the results.
Note that the pointer stored in gl_pathv may no longer be valid
after you call glob the second time, because glob might
have relocated the vector. So always fetch gl_pathv from the
glob_t structure after each glob call; never save
the pointer across calls.
GLOB_DOOFFS
-
Leave blank slots at the beginning of the vector of words.
The
gl_offs field says how many slots to leave.
The blank slots contain null pointers.
GLOB_ERR
-
Give up right away and report an error if there is any difficulty
reading the directories that must be read in order to expand pattern
fully. Such difficulties might include a directory in which you don't
have the requisite access. Normally,
glob tries its best to keep
on going despite any errors, reading whatever directories it can.
You can exercise even more control than this by specifying an
error-handler function errfunc when you call glob. If
errfunc is not a null pointer, then glob doesn't give up
right away when it can't read a directory; instead, it calls
errfunc with two arguments, like this:
(*errfunc) (filename, error-code)
The argument filename is the name of the directory that
glob couldn't open or couldn't read, and error-code is the
errno value that was reported to glob.
If the error handler function returns nonzero, then glob gives up
right away. Otherwise, it continues.
GLOB_MARK
-
If the pattern matches the name of a directory, append `/' to the
directory's name when returning it.
GLOB_NOCHECK
-
If the pattern doesn't match any file names, return the pattern itself
as if it were a file name that had been matched. (Normally, when the
pattern doesn't match anything,
glob returns that there were no
matches.)
GLOB_NOSORT
-
Don't sort the file names; return them in no particular order.
(In practice, the order will depend on the order of the entries in
the directory.) The only reason not to sort is to save time.
GLOB_NOESCAPE
-
Don't treat the `\' character specially in patterns. Normally,
`\' quotes the following character, turning off its special meaning
(if any) so that it matches only itself. When quoting is enabled, the
pattern `\?' matches only the string `?', because the question
mark in the pattern acts like an ordinary character.
If you use
GLOB_NOESCAPE, then `\' is an ordinary character.
glob does its work by calling the function fnmatch
repeatedly. It handles the flag GLOB_NOESCAPE by turning on the
FNM_NOESCAPE flag in calls to fnmatch.
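The sketch below combines several of these flags to build an argument vector for a command such as `ls -l'. The names report_and_continue and build_ls_argv are hypothetical. The reserved slots are filled in only after the last call to glob, since the vector may move between calls.

```c
#define _POSIX_C_SOURCE 200809L
#include <glob.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical error handler: report the directory that could not
   be read, then return 0 so that glob keeps going.  */
static int
report_and_continue (const char *filename, int error_code)
{
  fprintf (stderr, "glob: %s: %s\n", filename, strerror (error_code));
  return 0;
}

/* Hypothetical helper: build "ls -l MATCHES1... MATCHES2..." in
   *RESULT.  Returns 0 on success, otherwise the error code from
   glob.  The caller releases *RESULT with globfree.  */
int
build_ls_argv (glob_t *result, const char *pattern1, const char *pattern2)
{
  int flags = GLOB_DOOFFS | GLOB_NOCHECK;
  int status;

  memset (result, 0, sizeof *result);
  result->gl_offs = 2;          /* reserve two leading slots */

  status = glob (pattern1, flags, report_and_continue, result);
  if (status != 0)
    return status;

  /* GLOB_DOOFFS must be repeated when appending.  */
  status = glob (pattern2, flags | GLOB_APPEND, report_and_continue, result);
  if (status != 0)
    return status;

  /* Fetch gl_pathv only now; the second call may have moved it.  */
  result->gl_pathv[0] = (char *) "ls";
  result->gl_pathv[1] = (char *) "-l";
  return 0;
}
```

On success, result->gl_pathv can be passed directly to a function such as execvp. Because of GLOB_NOCHECK, a pattern that matches nothing appears in the vector unchanged.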
Besides the flags described in the last section, the GNU implementation of
glob allows a few more flags which are also defined in the
`glob.h' file. Some of the extensions implement functionality
which is available in modern shell implementations.
GLOB_PERIOD
-
The
. character (period) is treated specially. It cannot be
matched by wildcards. See section Wildcard Matching, FNM_PERIOD.
GLOB_MAGCHAR
-
The
GLOB_MAGCHAR value is not to be given to glob in the
flags parameter. Instead, glob sets this bit in the
gl_flags element of the glob_t structure provided as the
result if the pattern used for matching contains any wildcard character.
GLOB_ALTDIRFUNC
-
Instead of using the normal functions for accessing the
filesystem, the
glob implementation uses the user-supplied
functions specified in the structure pointed to by the vector-ptr
parameter. For more information about these functions, refer to the
sections about directory handling (see section Accessing Directories) and
section Reading the Attributes of a File.
GLOB_BRACE
-
If this flag is given the handling of braces in the pattern is changed.
It is now required that braces appear correctly grouped. I.e., for each
opening brace there must be a closing one. Braces can be used
recursively. So it is possible to define one brace expression in
another one. It is important to note that the range of each brace
expression is completely contained in the outer brace expression (if
there is one).
The string between the matching braces is separated into single
expressions by splitting at
, (comma) characters. The commas
themselves are discarded. Please note what we said above about recursive
brace expressions. The commas used to separate the subexpressions must
be at the same level. Commas in brace subexpressions are not matched.
They are used during expansion of the brace expression of the deeper
level. The example below shows this:
glob ("{foo/{,bar,biz},baz}", GLOB_BRACE, NULL, &result)
is equivalent to the sequence
glob ("foo/", GLOB_BRACE, NULL, &result)
glob ("foo/bar", GLOB_BRACE|GLOB_APPEND, NULL, &result)
glob ("foo/biz", GLOB_BRACE|GLOB_APPEND, NULL, &result)
glob ("baz", GLOB_BRACE|GLOB_APPEND, NULL, &result)
if we leave aside error handling.
GLOB_NOMAGIC
-
If the pattern contains no wildcard constructs (it is a literal file name),
return it as the sole "matching" word, even if no file exists by that name.
GLOB_TILDE
-
If this flag is used the character
~ (tilde) is handled specially
if it appears at the beginning of the pattern. Instead of being taken
verbatim it is used to represent the home directory of a known user.
If ~ is the only character in pattern or it is followed by a
/ (slash), the home directory of the process owner is
substituted. Using getlogin and getpwnam the information
is read from the system databases. As an example take user bart
with his home directory at `/home/bart'. For him a call like
glob ("~/bin/*", GLOB_TILDE, NULL, &result)
would return the contents of the directory `/home/bart/bin'.
Instead of referring to the own home directory it is also possible to
name the home directory of other users. To do so one has to append the
user name after the tilde character. So the contents of user
homer's `bin' directory can be retrieved by
glob ("~homer/bin/*", GLOB_TILDE, NULL, &result)
If the user name is not valid or the home directory cannot be determined
for some reason, the pattern is left untouched and is itself used as the
result. I.e., if in the last example the home directory of homer is not
available, the tilde expansion yields "~homer/bin/*" and glob does not
look for a directory named ~homer.
This functionality is equivalent to what is available in C-shells if the
nonomatch flag is set.
GLOB_TILDE_CHECK
-
If this flag is used
glob behaves as if GLOB_TILDE were
given. The only difference is that if the user name is not available or
the home directory cannot be determined for other reasons, this leads to
an error. glob will return GLOB_NOMATCH instead of using
the pattern itself as the name.
This functionality is equivalent to what is available in C-shells if the
nonomatch flag is not set.
GLOB_ONLYDIR
-
If this flag is used the globbing function takes this as a
hint that the caller is only interested in directories
matching the pattern. If the information about the type of the file
is easily available, non-directories will be rejected, but no extra
work will be done to determine the information for each file. I.e.,
the caller must still be prepared to filter out non-directories itself.
This functionality is only available with the GNU
glob
implementation. It is mainly used internally to increase the
performance but might be useful for a user as well and therefore is
documented here.
Calling glob will in most cases allocate resources which are used
to represent the result of the function call. If the same object of
type glob_t is used in multiple calls to glob, the resources
are freed or reused so that no leaks appear. But this does not cover
the point when all glob calls are done; the resources must then be
freed explicitly.
- Function: void globfree (glob_t *pglob)
-
The
globfree function frees all resources allocated by previous
calls to glob associated with the object pointed to by
pglob. This function should be called whenever the currently used
glob_t typed object isn't used anymore.
The GNU C Library supports two interfaces for matching regular
expressions. One is the standard POSIX.2 interface, and the other is
what the GNU system has had for many years.
Both interfaces are declared in the header file `regex.h'.
If you define _POSIX_C_SOURCE, then only the POSIX.2
functions, structures, and constants are declared.
Before you can actually match a regular expression, you must
compile it. This is not true compilation--it produces a special
data structure, not machine instructions. But it is like ordinary
compilation in that its purpose is to enable you to "execute" the
pattern fast. (See section Matching a Compiled POSIX Regular Expression, for how to use the
compiled regular expression for matching.)
There is a special data type for compiled regular expressions:
- Data Type: regex_t
-
This type of object holds a compiled regular expression.
It is actually a structure. It has just one field that your programs
should look at:
re_nsub
-
This field holds the number of parenthetical subexpressions in the
regular expression that was compiled.
There are several other fields, but we don't describe them here, because
only the functions in the library should use them.
After you create a regex_t object, you can compile a regular
expression into it by calling regcomp.
- Function: int regcomp (regex_t *compiled, const char *pattern, int cflags)
-
The function
regcomp "compiles" a regular expression into a
data structure that you can use with regexec to match against a
string. The compiled regular expression format is designed for
efficient matching. regcomp stores it into *compiled.
It's up to you to allocate an object of type regex_t and pass its
address to regcomp.
The argument cflags lets you specify various options that control
the syntax and semantics of regular expressions. See section Flags for POSIX Regular Expressions.
If you use the flag REG_NOSUB, then regcomp omits from
the compiled regular expression the information necessary to record
how subexpressions actually match. In this case, you might as well
pass 0 for the matchptr and nmatch arguments when
you call regexec.
If you don't use REG_NOSUB, then the compiled regular expression
does have the capacity to record how subexpressions match. Also,
regcomp tells you how many subexpressions pattern has, by
storing the number in compiled->re_nsub. You can use that
value to decide how long an array to allocate to hold information about
subexpression matches.
regcomp returns 0 if it succeeds in compiling the regular
expression; otherwise, it returns a nonzero error code (see the table
below). You can use regerror to produce an error message string
describing the reason for a nonzero value; see section POSIX Regexp Matching Cleanup.
Here are the possible nonzero values that regcomp can return:
REG_BADBR
-
There was an invalid `\{...\}' construct in the regular
expression. A valid `\{...\}' construct must contain either
a single number, or two numbers in increasing order separated by a
comma.
REG_BADPAT
-
There was a syntax error in the regular expression.
REG_BADRPT
-
A repetition operator such as `?' or `*' appeared in a bad
position (with no preceding subexpression to act on).
REG_ECOLLATE
-
The regular expression referred to an invalid collating element (one not
defined in the current locale for string collation). See section Categories of Activities that Locales Affect.
REG_ECTYPE
-
The regular expression referred to an invalid character class name.
REG_EESCAPE
-
The regular expression ended with `\'.
REG_ESUBREG
-
There was an invalid number in the `\digit' construct.
REG_EBRACK
-
There were unbalanced square brackets in the regular expression.
REG_EPAREN
-
An extended regular expression had unbalanced parentheses,
or a basic regular expression had unbalanced `\(' and `\)'.
REG_EBRACE
-
The regular expression had unbalanced `\{' and `\}'.
REG_ERANGE
-
One of the endpoints in a range expression was invalid.
REG_ESPACE
-
regcomp ran out of memory.
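As a sketch of how these codes are usually handled, the helper below compiles a pattern and prints the message from regerror when compilation fails. The name compile_or_report and the 128-byte buffer are assumptions made for illustration; a longer message would simply be truncated.

```c
#define _POSIX_C_SOURCE 200809L
#include <regex.h>
#include <stdio.h>

/* Hypothetical helper: compile PATTERN into *COMPILED.  If that
   fails, print the reason on stderr.  Returns whatever regcomp
   returned; call regfree on *COMPILED only if the result is 0.  */
int
compile_or_report (regex_t *compiled, const char *pattern, int cflags)
{
  int status = regcomp (compiled, pattern, cflags);

  if (status != 0)
    {
      char message[128];

      regerror (status, compiled, message, sizeof message);
      fprintf (stderr, "cannot compile `%s': %s\n", pattern, message);
    }
  return status;
}
```

After a successful call, compiled->re_nsub holds the number of parenthetical subexpressions, for example 1 for the basic regular expression `ba\(na\)*'.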
These are the bit flags that you can use in the cflags operand when
compiling a regular expression with regcomp.
REG_EXTENDED
-
Treat the pattern as an extended regular expression, rather than as a
basic regular expression.
REG_ICASE
-
Ignore case when matching letters.
REG_NOSUB
-
Don't bother storing the contents of the matchptr array.
REG_NEWLINE
-
Treat a newline in string as dividing string into multiple
lines, so that `$' can match before the newline and `^' can
match after. Also, don't permit `.' to match a newline, and don't
permit `[^...]' to match a newline.
Otherwise, newline acts like any other ordinary character.
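The effect of these flags can be tried out with a small helper such as the one sketched here. The name regex_search is hypothetical; the helper always adds REG_NOSUB because it only needs a yes-or-no answer.

```c
#define _POSIX_C_SOURCE 200809L
#include <regex.h>
#include <stddef.h>

/* Hypothetical helper: return 1 if PATTERN, compiled with CFLAGS,
   matches somewhere in STRING, 0 if it does not, and -1 if the
   pattern cannot be compiled or matching fails for another
   reason.  */
int
regex_search (const char *pattern, const char *string, int cflags)
{
  regex_t compiled;
  int status;

  if (regcomp (&compiled, pattern, cflags | REG_NOSUB) != 0)
    return -1;

  status = regexec (&compiled, string, 0, NULL, 0);
  regfree (&compiled);

  if (status == 0)
    return 1;
  return status == REG_NOMATCH ? 0 : -1;
}
```

For example, `ab+c' matches `abbbc' only with REG_EXTENDED, since `+' is an ordinary character in a basic regular expression. Likewise `^b$' matches the middle line of a three-line string only with REG_NEWLINE.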
Once you have compiled a regular expression, as described in section POSIX Regular Expression Compilation, you can match it against strings using
regexec. A match anywhere inside the string counts as success,
unless the regular expression contains anchor characters (`^' or
`$').
- Function: int regexec (const regex_t *compiled, const char *string, size_t nmatch, regmatch_t matchptr [], int eflags)
-
This function tries to match the compiled regular expression
*compiled against string.
regexec returns 0 if the regular expression matches;
otherwise, it returns a nonzero value. See the table below for
what nonzero values mean. You can use regerror to produce an
error message string describing the reason for a nonzero value;
see section POSIX Regexp Matching Cleanup.
The argument eflags is a word of bit flags that enable various
options.
If you want to get information about what part of string actually
matched the regular expression or its subexpressions, use the arguments
matchptr and nmatch. Otherwise, pass 0 for
nmatch, and NULL for matchptr. See section Match Results with Subexpressions.
You must match the regular expression with the same set of current
locales that were in effect when you compiled the regular expression.
The function regexec accepts the following flags in the
eflags argument:
REG_NOTBOL
-
Do not regard the beginning of the specified string as the beginning of
a line; more generally, don't make any assumptions about what text might
precede it.
REG_NOTEOL
-
Do not regard the end of the specified string as the end of a line; more
generally, don't make any assumptions about what text might follow it.
Here are the possible nonzero values that regexec can return:
REG_NOMATCH
-
The pattern didn't match the string. This isn't really an error.
REG_ESPACE
-
regexec ran out of memory.
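One common use of REG_NOTBOL is to search the remainder of a string after an earlier match. The sketch below counts non-overlapping matches that way. The name count_matches is hypothetical, and compiling with REG_EXTENDED is a choice made for this example.

```c
#define _POSIX_C_SOURCE 200809L
#include <regex.h>
#include <stddef.h>

/* Hypothetical helper: count the non-overlapping matches of the
   extended regular expression PATTERN in STRING.  Returns -1 if
   the pattern cannot be compiled.  */
long
count_matches (const char *pattern, const char *string)
{
  regex_t compiled;
  regmatch_t match[1];
  const char *p = string;
  long count = 0;
  int eflags = 0;

  if (regcomp (&compiled, pattern, REG_EXTENDED) != 0)
    return -1;

  while (regexec (&compiled, p, 1, match, eflags) == 0)
    {
      ++count;
      /* Continue after the match.  After an empty match, step one
         character further so that the loop makes progress.  */
      if (match[0].rm_eo > match[0].rm_so)
        p += match[0].rm_eo;
      else if (p[match[0].rm_eo] != '\0')
        p += match[0].rm_eo + 1;
      else
        break;
      /* What remains no longer starts at the beginning of a line.  */
      eflags = REG_NOTBOL;
    }

  regfree (&compiled);
  return count;
}
```

Because of REG_NOTBOL, the pattern `^a' is counted once in `aaa' rather than three times.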
When regexec matches parenthetical subexpressions of
pattern, it records which parts of string they match. It
returns that information by storing the offsets into an array whose
elements are structures of type regmatch_t. The first element of
the array (index 0) records the part of the string that matched
the entire regular expression. Each other element of the array records
the beginning and end of the part that matched a single parenthetical
subexpression.
- Data Type: regmatch_t
-
This is the data type of the matchptr array that you pass to
regexec. It contains two structure fields, as follows:
rm_so
-
The offset in string of the beginning of a substring. Add this
value to string to get the address of that part.
rm_eo
-
The offset in string of the end of the substring.
- Data Type: regoff_t
-
regoff_t is an alias for another signed integer type.
The fields of regmatch_t have type regoff_t.
The regmatch_t elements correspond to subexpressions
positionally; the element at index 1 records where the first
subexpression matched, the element at index 2 records the second
subexpression, and so on. The order of the subexpressions is the order
in which they begin.
When you call regexec, you specify how long the matchptr
array is, with the nmatch argument. This tells regexec how
many elements to store. If the actual regular expression has more than
nmatch subexpressions, then you won't get offset information about
the rest of them. But this doesn't alter whether the pattern matches a
particular string or not.
If you don't want regexec to return any information about where
the subexpressions matched, you can either supply 0 for
nmatch, or use the flag REG_NOSUB when you compile the
pattern with regcomp.
Sometimes a subexpression matches a substring of no characters. This
happens when `f\(o*\)' matches the string `fum'. (It really
matches just the `f'.) In this case, both of the offsets identify
the point in the string where the null substring was found. In this
example, the offsets are both 1.
Sometimes the entire regular expression can match without using some of
its subexpressions at all--for example, when `ba\(na\)*' matches the
string `ba', the parenthetical subexpression is not used. When
this happens, regexec stores -1 in both fields of the
element for that subexpression.
Sometimes matching the entire regular expression can match a particular
subexpression more than once--for example, when `ba\(na\)*'
matches the string `bananana', the parenthetical subexpression
matches three times. When this happens, regexec usually stores
the offsets of the last part of the string that matched the
subexpression. In the case of `bananana', these offsets are
6 and 8.
But the last match is not always the one that is chosen. It's more
accurate to say that the last opportunity to match is the one
that takes precedence. What this means is that when one subexpression
appears within another, then the results reported for the inner
subexpression reflect whatever happened on the last match of the outer
subexpression. For an example, consider `\(ba\(na\)*s \)*' matching
the string `bananas bas '. The last time the inner expression
actually matches is near the end of the first word. But it is
considered again in the second word, and fails to match there.
regexec reports nonuse of the "na" subexpression.
Another place where this rule applies is when the regular expression
\(ba\(na\)*s \|nefer\(ti\)* \)*
matches `bananas nefertiti'. The "na" subexpression does match
in the first word, but it doesn't match in the second word because the
other alternative is used there. Once again, the second repetition of
the outer subexpression overrides the first, and within that second
repetition, the "na" subexpression is not used. So regexec
reports nonuse of the "na" subexpression.
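The simpler cases described above can be checked with a sketch like the following. The name subexpression_offsets and the fixed limit of ten elements are assumptions made for illustration; a real program would more likely size the array from re_nsub.

```c
#define _POSIX_C_SOURCE 200809L
#include <regex.h>
#include <stddef.h>

/* Hypothetical helper: match the basic regular expression PATTERN
   against STRING and store the offsets recorded for subexpression
   number WHICH (0 means the whole match) in *START and *END.
   Returns 0 on success, or -1 if the pattern cannot be compiled,
   does not match, or has no such subexpression.  */
int
subexpression_offsets (const char *pattern, const char *string,
                       size_t which, long *start, long *end)
{
  regex_t compiled;
  regmatch_t matches[10];
  int found = -1;

  if (regcomp (&compiled, pattern, 0) != 0)
    return -1;

  if (which <= compiled.re_nsub && which < 10
      && regexec (&compiled, string, 10, matches, 0) == 0)
    {
      *start = (long) matches[which].rm_so;
      *end = (long) matches[which].rm_eo;
      found = 0;
    }

  regfree (&compiled);
  return found;
}
```

With `ba\(na\)*' and `bananana', subexpression 1 yields 6 and 8; with `f\(o*\)' and `fum' it yields 1 and 1; with `ba\(na\)*' and `ba' both offsets are -1.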
When you are finished using a compiled regular expression, you can
free the storage it uses by calling regfree.
- Function: void regfree (regex_t *compiled)
-
Calling
regfree frees all the storage that *compiled
points to. This includes various internal fields of the regex_t
structure that aren't documented in this manual.
regfree does not free the object *compiled itself.
You should always free the space in a regex_t structure with
regfree before using the structure to compile another regular
expression.
When regcomp or regexec reports an error, you can use
the function regerror to turn it into an error message string.
- Function: size_t regerror (int errcode, const regex_t *compiled, char *buffer, size_t length)
-
This function produces an error message string for the error code
errcode, and stores the string in length bytes of memory
starting at buffer. For the compiled argument, supply the
same compiled regular expression structure that
regcomp or
regexec was working with when it got the error. Alternatively,
you can supply NULL for compiled; you will still get a
meaningful error message, but it might not be as detailed.
If the error message can't fit in length bytes (including a
terminating null character), then regerror truncates it.
The string that regerror stores is always null-terminated
even if it has been truncated.
The return value of regerror is the minimum length needed to
store the entire error message, including the terminating null
character. If this is no greater than length, then
the error message was not truncated, and you can use it. Otherwise, you
should call regerror again with a larger buffer.
Here is a function which uses regerror, but always dynamically
allocates a buffer for the error message:
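#include <regex.h>

/* Return a newly allocated error message for ERRCODE.  xmalloc is
   assumed to be a malloc wrapper that does not return on failure.  */
char *
get_regerror (int errcode, regex_t *compiled)
{
  /* A first call with no buffer yields the length required.  */
  size_t length = regerror (errcode, compiled, NULL, 0);
  char *buffer = xmalloc (length);
  (void) regerror (errcode, compiled, buffer, length);
  return buffer;
}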
Word expansion means the process of splitting a string into
words and substituting for variables, commands, and wildcards
just as the shell does.
For example, when you write `ls -l foo.c', this string is split
into three separate words: `ls', `-l' and `foo.c'.
This is the most basic function of word expansion.
When you write `ls *.c', this can become many words, because
the word `*.c' can be replaced with any number of file names.
This is called wildcard expansion, and it is also a part of
word expansion.
When you use `echo $PATH' to print your path, you are taking
advantage of variable substitution, which is also part of word
expansion.
Ordinary programs can perform word expansion just like the shell by
calling the library function wordexp.
When word expansion is applied to a sequence of words, it performs the
following transformations in the order shown here:
-
Tilde expansion: Replacement of `~foo' with the name of
the home directory of `foo'.
-
Next, three different transformations are applied in the same step,
from left to right:
-
Variable substitution: Environment variables are substituted for
references such as `$foo'.
-
Command substitution: Constructs such as ``cat foo`' and
the equivalent `$(cat foo)' are replaced with the output from
the inner command.
-
Arithmetic expansion: Constructs such as `$(($x-1))' are
replaced with the result of the arithmetic computation.
-
Field splitting: subdivision of the text into words.
-
Wildcard expansion: The replacement of a construct such as `*.c'
with a list of `.c' file names. Wildcard expansion applies to an
entire word at a time, and replaces that word with 0 or more file names
that are themselves words.
-
Quote removal: The deletion of string-quotes, now that they have
done their job by inhibiting the above transformations when appropriate.
For the details of these transformations, and how to write the constructs
that use them, see The BASH Manual (to appear).
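The interplay of variable substitution, field splitting and quote removal can be observed with a sketch like this one, which uses the wordexp function mentioned above. The helper name count_words_with is hypothetical; it sets an environment variable itself so that the result does not depend on the caller's environment.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdlib.h>
#include <string.h>
#include <wordexp.h>

/* Hypothetical helper: set the environment variable NAME to VALUE,
   expand WORDS, and return the number of words produced, or -1 on
   failure.  Command substitution is forbidden with WRDE_NOCMD.  */
long
count_words_with (const char *name, const char *value, const char *words)
{
  wordexp_t result;
  long count = -1;

  if (setenv (name, value, 1) != 0)
    return -1;

  memset (&result, 0, sizeof result);
  if (wordexp (words, &result, WRDE_NOCMD) == 0)
    {
      count = (long) result.we_wordc;
      wordfree (&result);
    }
  return count;
}
```

With the variable set to `a.c b.c', an unquoted reference is split into two fields, so `ls -l $DEMO_FILES' produces four words. Inside double quotes the value stays one word, and inside single quotes no substitution happens at all; both of those cases produce three words.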
All the functions, constants and data types for word expansion are
declared in the header file `wordexp.h'.
Word expansion produces a vector of words (strings). To return this
vector, wordexp uses a special data type, wordexp_t, which
is a structure. You pass wordexp the address of the structure,
and it fills in the structure's fields to tell you about the results.
- Data Type: wordexp_t
-
This data type holds a pointer to a word vector. More precisely, it
records both the address of the word vector and its size.
we_wordc
-
The number of elements in the vector.
we_wordv
-
The address of the vector. This field has type
char **.
we_offs
-
The offset of the first real element of the vector, from its nominal
address in the
we_wordv field. Unlike the other fields, this
is always an input to wordexp, rather than an output from it.
If you use a nonzero offset, then that many elements at the beginning of
the vector are left empty. (The wordexp function fills them with
null pointers.)
The we_offs field is meaningful only if you use the
WRDE_DOOFFS flag. Otherwise, the offset is always zero
regardless of what is in this field, and the first real element comes at
the beginning of the vector.
- Function: int wordexp (const char *words, wordexp_t *word-vector-ptr, int flags)
-
Perform word expansion on the string words, putting the result in
a newly allocated vector, and store the size and address of this vector
into
*word-vector-ptr. The argument flags is a
combination of bit flags; see section Flags for Word Expansion, for details of
the flags.
You shouldn't use any of the characters `|&;<>' in the string
words unless they are quoted; likewise for newline. If you use
these characters unquoted, you will get the WRDE_BADCHAR error
code. Don't use parentheses or braces unless they are quoted or part of
a word expansion construct. If you use quotation characters `'"`',
they should come in pairs that balance.
The results of word expansion are a sequence of words. The function
wordexp allocates a string for each resulting word, then
allocates a vector of type char ** to store the addresses of
these strings. The last element of the vector is a null pointer.
This vector is called the word vector.
To return this vector, wordexp stores both its address and its
length (number of elements, not counting the terminating null pointer)
into *word-vector-ptr.
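As a minimal sketch of how a caller might walk the word vector described above, the following function expands a string and prints each word. The name print_expansion is hypothetical and not part of the library; the sketch assumes that the only failure needing cleanup is WRDE_NOSPACE.

```c
#include <stdio.h>
#include <wordexp.h>

/* Expand WORDS and print each resulting word on its own line.
   Returns the number of words, or -1 if wordexp reports an error.
   (print_expansion is a hypothetical helper, not a library function.)  */
int
print_expansion (const char *words)
{
  wordexp_t result;
  size_t i;
  int count;
  int code = wordexp (words, &result, 0);

  if (code != 0)
    {
      /* After WRDE_NOSPACE part of the result may have been allocated.  */
      if (code == WRDE_NOSPACE)
        wordfree (&result);
      return -1;
    }

  /* we_wordc counts the words; the slot after the last word
     holds the terminating null pointer.  */
  for (i = 0; i < result.we_wordc; i++)
    printf ("%zu: %s\n", i, result.we_wordv[i]);

  count = (int) result.we_wordc;
  wordfree (&result);
  return count;
}
```

Calling print_expansion ("one 'two three' four") prints three words, because the quoted text stays together as a single word.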
If wordexp succeeds, it returns 0. Otherwise, it returns one
of these error codes:
WRDE_BADCHAR
-
The input string words contains an unquoted invalid character such
as `|'.
WRDE_BADVAL
-
The input string refers to an undefined shell variable, and you used the flag
WRDE_UNDEF to forbid such references.
WRDE_CMDSUB
-
The input string uses command substitution, and you used the flag
WRDE_NOCMD to forbid command substitution.
WRDE_NOSPACE
-
It was impossible to allocate memory to hold the result. In this case,
wordexp can store part of the results--as much as it could
allocate room for.
WRDE_SYNTAX
-
There was a syntax error in the input string. For example, an unmatched
quoting character is a syntax error.
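The error codes above can be turned into messages with a small switch. This is only a sketch: wordexp_strerror and try_expand are hypothetical names chosen for illustration, and the message texts are not taken from the library.

```c
#include <wordexp.h>

/* Map a wordexp return value to a short description.
   (Hypothetical helper; the library provides no such function.)  */
const char *
wordexp_strerror (int code)
{
  switch (code)
    {
    case 0:            return "success";
    case WRDE_BADCHAR: return "unquoted invalid character";
    case WRDE_BADVAL:  return "undefined shell variable";
    case WRDE_CMDSUB:  return "command substitution not allowed";
    case WRDE_NOSPACE: return "out of memory";
    case WRDE_SYNTAX:  return "syntax error";
    default:           return "unknown error";
    }
}

/* Expand WORDS with FLAGS, discard the words, and return the
   code that wordexp reported.  (Hypothetical helper.)  */
int
try_expand (const char *words, int flags)
{
  wordexp_t result;
  int code = wordexp (words, &result, flags);

  /* Free on success, and also after WRDE_NOSPACE, when part of
     the result may have been allocated.  */
  if (code == 0 || code == WRDE_NOSPACE)
    wordfree (&result);
  return code;
}
```

For instance, try_expand ("a | b", 0) reports WRDE_BADCHAR, and an unbalanced quote such as "'open" reports WRDE_SYNTAX.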
- Function: void wordfree (wordexp_t *word-vector-ptr)
-
Free the storage used for the word-strings and vector that
*word-vector-ptr points to. This does not free the
structure *word-vector-ptr itself--only the other
data it points to.
This section describes the flags that you can specify in the
flags argument to wordexp. Choose the flags you want,
and combine them with the C operator |.
WRDE_APPEND
-
Append the words from this expansion to the vector of words produced by
previous calls to
wordexp. This way you can effectively expand
several words as if they were concatenated with spaces between them.
In order for appending to work, you must not modify the contents of the
word vector structure between calls to wordexp. And, if you set
WRDE_DOOFFS in the first call to wordexp, you must also
set it when you append to the results.
WRDE_DOOFFS
-
Leave blank slots at the beginning of the vector of words.
The
we_offs field says how many slots to leave.
The blank slots contain null pointers.
WRDE_NOCMD
-
Don't do command substitution; if the input requests command substitution,
report an error.
WRDE_REUSE
-
Reuse a word vector made by a previous call to
wordexp.
Instead of allocating a new vector of words, this call to wordexp
will use the vector that already exists (making it larger if necessary).
Note that the vector may move, so it is not safe to save an old pointer
and use it again after calling wordexp. You must fetch
we_wordv anew after each call.
WRDE_SHOWERR
-
Do show any error messages printed by commands run by command substitution.
More precisely, allow these commands to inherit the standard error output
stream of the current process. By default,
wordexp gives these
commands a standard error stream that discards all output.
WRDE_UNDEF
-
If the input refers to a shell variable that is not defined, report an
error.
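To illustrate combining flags with |, here is a sketch that reserves one leading slot with WRDE_DOOFFS and forbids command substitution with WRDE_NOCMD. The function name build_argv is hypothetical, and storing the program name in the reserved slot is just one possible use of the offset.

```c
#include <stddef.h>
#include <wordexp.h>

/* Expand ARGS into *RESULT, leaving one slot at the front of the
   vector, then store PROGRAM in that slot.  Returns 0 on success or
   the wordexp error code.  (build_argv is a hypothetical helper.)  */
int
build_argv (const char *program, const char *args, wordexp_t *result)
{
  int code;

  /* we_offs is an input: it must be set before the call.  */
  result->we_offs = 1;
  code = wordexp (args, result, WRDE_DOOFFS | WRDE_NOCMD);
  if (code != 0)
    {
      if (code == WRDE_NOSPACE)
        wordfree (result);
      return code;
    }

  /* The reserved slot holds a null pointer until we fill it.
     The cast is needed because the vector has type char **.  */
  result->we_wordv[0] = (char *) program;
  return 0;
}
```

After a successful call, we_wordc still counts only the expanded words, which begin at index we_offs. A cautious caller would reset the reserved slot to a null pointer before calling wordfree, since the string stored there was not allocated by wordexp.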
Here is an example of using wordexp to expand several strings
and use the results to run a shell command. It also shows the use of
WRDE_APPEND to concatenate the expansions and of wordfree
to free the space allocated by wordexp.
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>
#include <wordexp.h>

int
expand_and_execute (const char *program, const char **options)
{
  wordexp_t result;
  pid_t pid;
  int status, i;

  /* Expand the string for the program to run. */
  switch (wordexp (program, &result, 0))
    {
    case 0: /* Successful. */
      break;
    case WRDE_NOSPACE:
      /* If the error was WRDE_NOSPACE,
         then perhaps part of the result was allocated. */
      wordfree (&result);
      /* Fall through. */
    default: /* Some other error. */
      return -1;
    }

  /* Expand the strings specified for the arguments.
     OPTIONS is a null-terminated array of strings. */
  for (i = 0; options[i] != NULL; i++)
    {
      if (wordexp (options[i], &result, WRDE_APPEND))
        {
          wordfree (&result);
          return -1;
        }
    }

  pid = fork ();
  if (pid == 0)
    {
      /* This is the child process. Execute the command. */
      execv (result.we_wordv[0], result.we_wordv);
      exit (EXIT_FAILURE);
    }
  else if (pid < 0)
    /* The fork failed. Report failure. */
    status = -1;
  else
    /* This is the parent process. Wait for the child to complete. */
    if (waitpid (pid, &status, 0) != pid)
      status = -1;

  wordfree (&result);
  return status;
}
It's a standard part of shell syntax that you can use `~' at the
beginning of a file name to stand for your own home directory. You
can use `~user' to stand for user's home directory.
Tilde expansion is the process of converting these abbreviations
to the directory names that they stand for.
Tilde expansion applies to the `~' plus all following characters up
to whitespace or a slash. It takes place only at the beginning of a
word, and only if none of the characters to be transformed is quoted in
any way.
Plain `~' uses the value of the environment variable HOME
as the proper home directory name. `~' followed by a user name
uses getpwnam to look up that user in the user database, and
uses whatever directory is recorded there. Thus, `~' followed
by your own name can give different results from plain `~', if
the value of HOME is not really your home directory.
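The rules above can be observed through wordexp itself. The following sketch returns a copy of the first word of an expansion; expand_first_word is a hypothetical helper, and the file name in the usage note is only an illustration.

```c
#include <stdlib.h>
#include <string.h>
#include <wordexp.h>

/* Return a newly allocated copy of the first word that WORDS expands
   to, or a null pointer on error or if the expansion is empty.
   The caller frees the copy.  (Hypothetical helper.)  */
char *
expand_first_word (const char *words)
{
  wordexp_t result;
  char *copy = NULL;

  if (wordexp (words, &result, WRDE_NOCMD) != 0)
    return NULL;

  if (result.we_wordc > 0)
    {
      copy = malloc (strlen (result.we_wordv[0]) + 1);
      if (copy != NULL)
        strcpy (copy, result.we_wordv[0]);
    }

  wordfree (&result);
  return copy;
}
```

With HOME set, expand_first_word ("~/notes.txt") yields the value of HOME followed by `/notes.txt'. Quoting the tilde, as in "'~/notes.txt'", suppresses the expansion and the word comes back unchanged apart from the removed quotes.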
Part of ordinary shell syntax is the use of `$variable' to
substitute the value of a shell variable into a command. This is called
variable substitution, and it is one part of doing word expansion.
There are two basic ways you can write a variable reference for
substitution:
${variable}
-
If you write braces around the variable name, then it is completely
unambiguous where the variable name ends. You can concatenate
additional letters onto the end of the variable value by writing them
immediately after the close brace. For example, if the value of foo is
`tractor', then `${foo}s' expands into `tractors'.
$variable
-
If you do not put braces around the variable name, then the variable
name consists of all the alphanumeric characters and underscores that
follow the `$'. The next punctuation character ends the variable
name. Thus, `$foo-bar' refers to the variable
foo and expands
into `tractor-bar'.
When you use braces, you can also use various constructs to modify the
value that is substituted, or test it in various ways.
${variable:-default}
-
Substitute the value of variable, but if that is empty or
undefined, use default instead.
${variable:=default}
-
Substitute the value of variable, but if that is empty or
undefined, use default instead and set the variable to
default.
${variable:?message}
-
If variable is defined and not empty, substitute its value.
Otherwise, print message as an error message on the standard error
stream, and consider word expansion a failure.
${variable:+replacement}
-
Substitute replacement, but only if variable is defined and
nonempty. Otherwise, substitute nothing for this construct.
${#variable}
-
Substitute a numeral which expresses in base ten the number of
characters in the value of variable. `${#foo}' stands for
`7', because `tractor' is seven characters.
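The constructs above can be tried out with a short program fragment. This is a sketch under stated assumptions: expand_to_buffer and set_example_variables are hypothetical helpers, the variable foo is given the value `tractor' to match the examples in the text, and bar is deliberately left undefined.

```c
#define _POSIX_C_SOURCE 200809L /* For setenv and unsetenv.  */
#include <stdio.h>
#include <stdlib.h>
#include <wordexp.h>

/* Expand WORDS and copy the first resulting word into BUF, which
   has room for SIZE bytes.  Returns 0 on success, or -1 on error
   or if the expansion produced no words.  (Hypothetical helper.)  */
int
expand_to_buffer (const char *words, char *buf, size_t size)
{
  wordexp_t result;
  int status = -1;

  if (wordexp (words, &result, WRDE_NOCMD) != 0)
    return -1;

  if (result.we_wordc > 0)
    {
      snprintf (buf, size, "%s", result.we_wordv[0]);
      status = 0;
    }

  wordfree (&result);
  return status;
}

/* Set up the variables that the examples assume: foo is `tractor'
   and bar is undefined.  (Hypothetical helper.)  */
void
set_example_variables (void)
{
  setenv ("foo", "tractor", 1);
  unsetenv ("bar");
}
```

After set_example_variables, expanding "${foo}s" gives `tractors', "${bar:-plow}" gives `plow', "${foo:+set}" gives `set', and "${#foo}" gives `7'.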
These variants of variable substitution let you remove part of the
variable's value before substituting it. The prefix and
suffix are not mere strings; they are wildcard patterns, just
like the patterns that you use to match multiple file names. But
in this context, they match against parts of the variable value
rather than against file names.
${variable%%suffix}
-
Substitute the value of variable, but first discard from that
variable any portion at the end that matches the pattern suffix.
If there is more than one alternative for how to match against
suffix, this construct uses the longest possible match.
Thus, `${foo%%r*}' substitutes `t', because the largest
match for `r*' at the end of `tractor' is `ractor'.
${variable%suffix}
-
Substitute the value of variable, but first discard from that
variable any portion at the end that matches the pattern suffix.
If there is more than one alternative for how to match against
suffix, this construct uses the shortest possible alternative.
Thus, `${foo%r*}' substitutes `tracto', because the shortest
match for `r*' at the end of `tractor' is just `r'.
${variable##prefix}
-
Substitute the value of variable, but first discard from that
variable any portion at the beginning that matches the pattern prefix.
If there is more than one alternative for how to match against
prefix, this construct uses the longest possible match.
Thus, `${foo##*t}' substitutes `or', because the largest
match for `*t' at the beginning of `tractor' is `tract'.
${variable#prefix}
-
Substitute the value of variable, but first discard from that
variable any portion at the beginning that matches the pattern prefix.
If there is more than one alternative for how to match against
prefix, this construct uses the shortest possible alternative.
Thus, `${foo#*t}' substitutes `ractor', because the shortest
match for `*t' at the beginning of `tractor' is just `t'.
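The four trimming operators can be compared side by side with a small wrapper around wordexp. This is a sketch only: trim_value and the scratch variable name trim_value_tmp are hypothetical, the wrapper copies just the first resulting word, and it assumes the value contains no whitespace that would split it into several words.

```c
#define _POSIX_C_SOURCE 200809L /* For setenv.  */
#include <stdio.h>
#include <stdlib.h>
#include <wordexp.h>

/* Apply OP (one of "%", "%%", "#" or "##") with the wildcard PATTERN
   to VALUE, and copy the trimmed text into BUF of SIZE bytes.
   Returns 0 on success, -1 on failure.  (Hypothetical helper; it
   works through a scratch environment variable.)  */
int
trim_value (const char *value, const char *op, const char *pattern,
            char *buf, size_t size)
{
  char words[256];
  wordexp_t result;
  int n;

  if (setenv ("trim_value_tmp", value, 1) != 0)
    return -1;

  /* Build a construct such as ${trim_value_tmp%%r*}.  */
  n = snprintf (words, sizeof words, "${trim_value_tmp%s%s}", op, pattern);
  if (n < 0 || (size_t) n >= sizeof words)
    return -1;

  if (wordexp (words, &result, WRDE_NOCMD) != 0)
    return -1;

  /* An empty expansion produces no words at all.  */
  snprintf (buf, size, "%s",
            result.we_wordc > 0 ? result.we_wordv[0] : "");
  wordfree (&result);
  return 0;
}
```

With the value `tractor', the operator "%%" with pattern "r*" leaves `t', "%" with "r*" leaves `tracto', "##" with "*t" leaves `or', and "#" with "*t" leaves `ractor'.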