This is The GNU C Library Reference Manual, for Version 2.16 of the GNU C Library.
Appendices
Indices
--- The Detailed Node Listing ---
Introduction
Standards and Portability
Using the Library
Error Reporting
Memory
Memory Allocation
Unconstrained Allocation
Allocation Debugging
Obstacks
Variable Size Automatic
Locking Pages
Character Handling
String and Array Utilities
Argz and Envz Vectors
Character Set Handling
Restartable multibyte conversion
Non-reentrant Conversion
Generic Charset Conversion
Locales
Locale Information
The Lame Way to Locale Data
Message Translation
Message catalogs a la X/Open
The Uniforum approach
Message catalogs with gettext
Searching and Sorting
Pattern Matching
Globbing
Regular Expressions
Word Expansion
I/O Overview
I/O Concepts
File Names
I/O on Streams
Unreading
Formatted Output
Customizing Printf
Formatted Input
Stream Buffering
Other Kinds of Streams
Custom Streams
Formatted Messages
Low-Level I/O
Stream/Descriptor Precautions
Asynchronous I/O
File Status Flags
File System Interface
Accessing Directories
File Attributes
Pipes and FIFOs
Sockets
Socket Addresses
Local Namespace
Internet Namespace
Host Addresses
Open/Close Sockets
Connections
Transferring Data
Datagrams
Inetd
Socket Options
Low-Level Terminal Interface
Terminal Modes
Special Characters
Pseudo-Terminals
Syslog
Submitting Syslog Messages
Mathematics
Pseudo-Random Numbers
Arithmetic
Floating Point Errors
Arithmetic Functions
Parsing of Numbers
Date and Time
Processor And CPU Time
Calendar Time
Parsing Date and Time
Resource Usage And Limitation
Priority
Traditional Scheduling
Memory Resources
Non-Local Exits
Signal Handling
Concepts of Signals
Standard Signals
Signal Actions
Defining Handlers
Atomic Data Access
Generating Signals
Blocking Signals
Waiting for a Signal
BSD Signal Handling
Program Basics
Program Arguments
Parsing Program Arguments
Environment Variables
Program Termination
Processes
Job Control
Implementing a Shell
Functions for Job Control
Name Service Switch
NSS Configuration File
NSS Module Internals
Extending NSS
Users and Groups
User Accounting Database
User Database
Group Database
Netgroup Database
System Management
Filesystem Handling
Mount Information
System Configuration
Sysconf
Cryptographic Functions
Debugging Support
Language Features
Variadic Functions
How Variadic
Data Type Measurements
Floating Type Macros
Installation
Maintenance
Source Layout
Porting
Platform
The C language provides no built-in facilities for performing such common operations as input/output, memory management, string manipulation, and the like. Instead, these facilities are defined in a standard library, which you compile and link with your programs. The GNU C Library, described in this document, defines all of the library functions that are specified by the ISO C standard, as well as additional features specific to POSIX and other derivatives of the Unix operating system, and extensions specific to GNU systems.
The purpose of this manual is to tell you how to use the facilities of the GNU C Library. We have mentioned which features belong to which standards to help you identify things that are potentially non-portable to other systems. But the emphasis in this manual is not on strict portability.
This manual is written with the assumption that you are at least somewhat familiar with the C programming language and basic programming concepts. Specifically, familiarity with ISO standard C (see ISO C), rather than “traditional” pre-ISO C dialects, is assumed.
The GNU C Library includes several header files, each of which provides definitions and declarations for a group of related facilities; this information is used by the C compiler when processing your program. For example, the header file stdio.h declares facilities for performing input and output, and the header file string.h declares string processing utilities. The organization of this manual generally follows the same division as the header files.
If you are reading this manual for the first time, you should read all of the introductory material and skim the remaining chapters. There are a lot of functions in the GNU C Library and it's not realistic to expect that you will be able to remember exactly how to use each and every one of them. It's more important to become generally familiar with the kinds of facilities that the library provides, so that when you are writing your programs you can recognize when to make use of library functions, and where in this manual you can find more specific information about them.
This section discusses the various standards and other sources that the GNU C Library is based upon. These sources include the ISO C and POSIX standards, and the System V and Berkeley Unix implementations.
The primary focus of this manual is to tell you how to make effective use of the GNU C Library facilities. But if you are concerned about making your programs compatible with these standards, or portable to operating systems other than GNU, this can affect how you use the library. This section gives you an overview of these standards, so that you will know what they are when they are mentioned in other parts of the manual.
See Library Summary, for an alphabetical list of the functions and other symbols provided by the library. This list also states which standards each function or symbol comes from.
The GNU C Library is compatible with the C standard adopted by the American National Standards Institute (ANSI): American National Standard X3.159-1989, “ANSI C”, and later by the International Organization for Standardization (ISO): ISO/IEC 9899:1990, “Programming languages—C”. We here refer to the standard as ISO C since this is the more general standard in respect of ratification. The header files and library facilities that make up the GNU C Library are a superset of those specified by the ISO C standard.
If you are concerned about strict adherence to the ISO C standard, you should use the ‘-ansi’ option when you compile your programs with the GNU C compiler. This tells the compiler to define only ISO standard features from the library header files, unless you explicitly ask for additional features. See Feature Test Macros, for information on how to do this.
Being able to restrict the library to include only ISO C features is important because ISO C puts limitations on what names can be defined by the library implementation, and the GNU extensions don't fit these limitations. See Reserved Names, for more information about these restrictions.
This manual does not attempt to give you complete details on the differences between ISO C and older dialects. It gives advice on how to write programs to work portably under multiple C dialects, but does not aim for completeness.
The GNU C Library is also compatible with the ISO POSIX family of standards, known more formally as the Portable Operating System Interface for Computer Environments (ISO/IEC 9945). They were also published as ANSI/IEEE Std 1003. POSIX is derived mostly from various versions of the Unix operating system.
The library facilities specified by the POSIX standards are a superset of those required by ISO C; POSIX specifies additional features for ISO C functions, as well as specifying new additional functions. In general, the additional requirements and functionality defined by the POSIX standards are aimed at providing lower-level support for a particular kind of operating system environment, rather than general programming language support which can run in many diverse operating system environments.
The GNU C Library implements all of the functions specified in ISO/IEC 9945-1:1996, the POSIX System Application Program Interface, commonly referred to as POSIX.1. The primary extensions to the ISO C facilities specified by this standard include file system interface primitives (see File System Interface), device-specific terminal control functions (see Low-Level Terminal Interface), and process control functions (see Processes).
Some facilities from ISO/IEC 9945-2:1993, the POSIX Shell and Utilities standard (POSIX.2) are also implemented in the GNU C Library. These include utilities for dealing with regular expressions and other pattern matching facilities (see Pattern Matching).
The GNU C Library defines facilities from some versions of Unix which are not formally standardized, specifically from the 4.2 BSD, 4.3 BSD, and 4.4 BSD Unix systems (also known as Berkeley Unix) and from SunOS (a popular 4.2 BSD derivative that includes some Unix System V functionality). These systems support most of the ISO C and POSIX facilities, and 4.4 BSD and newer releases of SunOS in fact support them all.
The BSD facilities include symbolic links (see Symbolic Links), the select function (see Waiting for I/O), the BSD signal functions (see BSD Signal Handling), and sockets (see Sockets).
The System V Interface Description (SVID) is a document describing the AT&T Unix System V operating system. It is to some extent a superset of the POSIX standard (see POSIX).
The GNU C Library defines most of the facilities required by the SVID that are not also required by the ISO C or POSIX standards, for compatibility with System V Unix and other Unix systems (such as SunOS) which include these facilities. However, many of the more obscure and less generally useful facilities required by the SVID are not included. (In fact, Unix System V itself does not provide them all.)
The supported facilities from System V include the methods for inter-process communication and shared memory, the hsearch and drand48 families of functions, fmtmsg and several of the mathematical functions.
The X/Open Portability Guide, published by the X/Open Company, Ltd., is a more general standard than POSIX. X/Open owns the Unix copyright and the XPG specifies the requirements for systems that are intended to be Unix systems.
The GNU C Library complies with the X/Open Portability Guide, Issue 4.2, with all extensions common to XSI (X/Open System Interface) compliant systems and also all X/Open UNIX extensions.
The additions on top of POSIX are mainly derived from functionality available in System V and BSD systems. Some of the really bad mistakes in System V systems were corrected, though. Since fulfilling the XPG standard with the Unix extensions is a precondition for getting the Unix brand, chances are good that the functionality is available on commercial systems.
This section describes some of the practical issues involved in using the GNU C Library.
Libraries for use by C programs really consist of two parts: header files that define types and macros and declare variables and functions; and the actual library or archive that contains the definitions of the variables and functions.
(Recall that in C, a declaration merely provides information that a function or variable exists and gives its type. For a function declaration, information about the types of its arguments might be provided as well. The purpose of declarations is to allow the compiler to correctly process references to the declared variables and functions. A definition, on the other hand, actually allocates storage for a variable or says what a function does.)
In order to use the facilities in the GNU C Library, you should be sure that your program source files include the appropriate header files. This is so that the compiler has declarations of these facilities available and can correctly process references to them. Once your program has been compiled, the linker resolves these references to the actual definitions provided in the archive file.
Header files are included into a program source file by the ‘#include’ preprocessor directive. The C language supports two forms of this directive; the first,
#include "header"
is typically used to include a header file header that you write yourself; this would contain definitions and declarations describing the interfaces between the different parts of your particular application. By contrast,
#include <file.h>
is typically used to include a header file file.h that contains definitions and declarations for a standard library. This file would normally be installed in a standard place by your system administrator. You should use this second form for the C library header files.
Typically, ‘#include’ directives are placed at the top of the C source file, before any other code. If you begin your source files with some comments explaining what the code in the file does (a good idea), put the ‘#include’ directives immediately afterwards, following the feature test macro definition (see Feature Test Macros).
For more information about the use of header files and ‘#include’ directives, see Header Files.
The GNU C Library provides several header files, each of which contains the type and macro definitions and variable and function declarations for a group of related facilities. This means that your programs may need to include several header files, depending on exactly which facilities you are using.
Some library header files include other library header files automatically. However, as a matter of programming style, you should not rely on this; it is better to explicitly include all the header files required for the library facilities you are using. The GNU C Library header files have been written in such a way that it doesn't matter if a header file is accidentally included more than once; including a header file a second time has no effect. Likewise, if your program needs to include multiple header files, the order in which they are included doesn't matter.
Compatibility Note: Inclusion of standard header files in any order and any number of times works in any ISO C implementation. However, this has traditionally not been the case in many older C implementations.
Strictly speaking, you don't have to include a header file to use a function it declares; you could declare the function explicitly yourself, according to the specifications in this manual. But it is usually better to include the header file because it may define types and macros that are not otherwise available and because it may define more efficient macro replacements for some functions. It is also a sure way to have the correct declaration.
If we describe something as a function in this manual, it may have a macro definition as well. This normally has no effect on how your program runs; the macro definition does the same thing as the function would. In particular, macro equivalents for library functions evaluate arguments exactly once, in the same way that a function call would. The main reason for these macro definitions is that sometimes they can produce an inline expansion that is considerably faster than an actual function call.
Taking the address of a library function works even if it is also defined as a macro. This is because, in this context, the name of the function isn't followed by the left parenthesis that is syntactically necessary to recognize a macro call.
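You might occasionally want to avoid using the macro definition of a function, perhaps to make your program easier to debug. There are two ways you can do this: enclose the name of the function in parentheses at the point of use, or remove the macro definition for the rest of the source file with the ‘#undef’ preprocessor directive.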
For example, suppose the header file stdlib.h declares a function named abs with
extern int abs (int);
and also provides a macro definition for abs. Then, in:
#include <stdlib.h>
int f (int *i) { return abs (++*i); }
the reference to abs might refer to either a macro or a function. On the other hand, in each of the following examples the reference is to a function and not a macro.
#include <stdlib.h>
int g (int *i) { return (abs) (++*i); }
#undef abs
int h (int *i) { return abs (++*i); }
Since macro definitions that double for a function behave in exactly the same way as the actual function version, there is usually no need for any of these methods. In fact, removing macro definitions usually just makes your program slower.
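The three ways of referring to a library function discussed above can be put side by side. The following is a minimal sketch; the helper names abs_plain, abs_function and abs_via_pointer are hypothetical and exist only for illustration.

```c
#include <stdlib.h>

/* Plain call: this may expand a macro named abs, if the header
   happens to provide one.  */
int
abs_plain (int x)
{
  return abs (x);
}

/* Parenthesized name: the name is not followed by a left parenthesis,
   so it is never treated as a macro call and the function is used.  */
int
abs_function (int x)
{
  return (abs) (x);
}

/* Taking the address also refers to the function, for the same reason.  */
int
abs_via_pointer (int x)
{
  int (*fp) (int) = abs;
  return fp (x);
}
```

All three helpers should return the same result for the same argument, since a macro that doubles for a function behaves like the function.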
The names of all library types, macros, variables and functions that come from the ISO C standard are reserved unconditionally; your program may not redefine these names. All other library names are reserved if your program explicitly includes the header file that defines or declares them. There are several reasons for these restrictions:
exit to do something completely different from what the standard exit function does, for example. Preventing this situation helps to make your programs easier to understand and contributes to modularity and maintainability.
In addition to the names documented in this manual, reserved names include all external identifiers (global functions and variables) that begin with an underscore (‘_’) and all identifiers regardless of use that begin with either two underscores or an underscore followed by a capital letter. This is so that the library and header files can define functions, variables, and macros for internal purposes without risk of conflict with names in user programs.
Some additional classes of identifier names are reserved for future extensions to the C language or the POSIX.1 environment. While using these names for your own purposes right now might not cause a problem, they do raise the possibility of conflict with future versions of the C or POSIX standards, so you should avoid these names.
float and long double arguments, respectively.
In addition, some individual header files reserve names beyond those that they actually define. You only need to worry about these restrictions if your program includes that particular header file.
The exact set of features available when you compile a source file is controlled by which feature test macros you define.
If you compile your programs using ‘gcc -ansi’, you get only the ISO C library features, unless you explicitly request additional features by defining one or more of the feature macros. See GNU CC Command Options, for more information about GCC options.
You should define these macros by using ‘#define’ preprocessor directives at the top of your source code files. These directives must come before any #include of a system header file. It is best to make them the very first thing in the file, preceded only by comments. You could also use the ‘-D’ option to GCC, but it's better if you make the source files indicate their own meaning in a self-contained way.
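A minimal sketch of this placement rule follows. It assumes that _GNU_SOURCE (described later in this section) is the macro you want and that getline is among the features it makes visible; first_line_length is a hypothetical helper, not part of the library.

```c
/* The feature test macro comes first, preceded only by comments and
   before any system header is included.  */
#define _GNU_SOURCE

#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>

/* Hypothetical helper: write TEXT to a temporary file, read the first
   line back with getline, and return its length including the newline.
   Returns -1 if the file cannot be created or no line can be read.  */
long
first_line_length (const char *text)
{
  FILE *fp = tmpfile ();
  char *line = NULL;
  size_t size = 0;
  ssize_t len;

  if (fp == NULL)
    return -1;
  fputs (text, fp);
  rewind (fp);
  len = getline (&line, &size, fp);
  free (line);
  fclose (fp);
  return (long) len;
}
```

Defining the macro in the source file, rather than with ‘-D’ on the command line, keeps the file's requirements visible to anyone who reads it.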
This system exists to allow the library to conform to multiple standards. Although the different standards are often described as supersets of each other, they are usually incompatible because larger standards require functions with names that smaller ones reserve to the user program. This is not mere pedantry; it has been a problem in practice. For instance, some non-GNU programs define functions named getline that have nothing to do with this library's getline. They would not be compilable if all features were enabled indiscriminately.
This should not be used to verify that a program conforms to a limited standard. It is insufficient for this purpose, as it will not protect you from including header files outside the standard, or relying on semantics undefined within the standard.
If you define this macro, then the functionality from the POSIX.1 standard (IEEE Standard 1003.1) is available, as well as all of the ISO C facilities.
The state of _POSIX_SOURCE is irrelevant if you define the macro _POSIX_C_SOURCE to a positive integer.
Define this macro to a positive integer to control which POSIX functionality is made available. The greater the value of this macro, the more functionality is made available.
If you define this macro to a value greater than or equal to 1, then the functionality from the 1990 edition of the POSIX.1 standard (IEEE Standard 1003.1-1990) is made available.
If you define this macro to a value greater than or equal to 2, then the functionality from the 1992 edition of the POSIX.2 standard (IEEE Standard 1003.2-1992) is made available.
If you define this macro to a value greater than or equal to 199309L, then the functionality from the 1993 edition of the POSIX.1b standard (IEEE Standard 1003.1b-1993) is made available.
Greater values for _POSIX_C_SOURCE will enable future extensions. The POSIX standards process will define these values as necessary, and the GNU C Library should support them some time after they become standardized. The 1996 edition of POSIX.1 (ISO/IEC 9945-1: 1996) states that if you define _POSIX_C_SOURCE to a value greater than or equal to 199506L, then the functionality from the 1996 edition is made available.
If you define this macro, functionality derived from 4.3 BSD Unix is included as well as the ISO C, POSIX.1, and POSIX.2 material.
Some of the features derived from 4.3 BSD Unix conflict with the corresponding features specified by the POSIX.1 standard. If this macro is defined, the 4.3 BSD definitions take precedence over the POSIX definitions.
Due to the nature of some of the conflicts between 4.3 BSD and POSIX.1, you need to use a special BSD compatibility library when linking programs compiled for BSD compatibility. This is because some functions must be defined in two different ways, one of them in the normal C library, and one of them in the compatibility library. If your program defines _BSD_SOURCE, you must give the option ‘-lbsd-compat’ to the compiler or linker when linking the program, to tell it to find functions in this special compatibility library before looking for them in the normal C library.
If you define this macro, functionality derived from SVID is included as well as the ISO C, POSIX.1, POSIX.2, and X/Open material.
— Macro: _XOPEN_SOURCE_EXTENDED
If you define this macro, functionality described in the X/Open Portability Guide is included. This is a superset of the POSIX.1 and POSIX.2 functionality and in fact _POSIX_SOURCE and _POSIX_C_SOURCE are automatically defined.
As the unification of all Unices, functionality only available in BSD and SVID is also included.
If the macro _XOPEN_SOURCE_EXTENDED is also defined, even more functionality is available. The extra functions make available everything that is necessary for the X/Open Unix brand.
If the macro _XOPEN_SOURCE has the value 500, this includes all functionality described so far plus some new definitions from the Single Unix Specification, version 2.
If this macro is defined, some extra functions are available which rectify a few shortcomings in all previous standards. Specifically, the functions fseeko and ftello are available. Without these functions the difference between the ISO C interface (fseek, ftell) and the low-level POSIX interface (lseek) would lead to problems.
This macro was introduced as part of the Large File Support extension (LFS).
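As a rough illustration, the sketch below requests these functions with _LARGEFILE_SOURCE and uses them to measure a temporary file. The helper name temp_file_size is hypothetical, and the example assumes a system where tmpfile succeeds.

```c
/* Request fseeko and ftello before any system header is included.  */
#define _LARGEFILE_SOURCE

#include <stdio.h>
#include <sys/types.h>

/* Hypothetical helper: write N bytes to a temporary file, seek to the
   end with fseeko, and report the position from ftello as an off_t.
   Returns -1 on any failure.  */
off_t
temp_file_size (size_t n)
{
  FILE *fp = tmpfile ();
  off_t pos = -1;
  size_t i;

  if (fp == NULL)
    return -1;
  for (i = 0; i < n; i++)
    if (putc ('x', fp) == EOF)
      {
        fclose (fp);
        return -1;
      }
  if (fseeko (fp, 0, SEEK_END) == 0)
    pos = ftello (fp);
  fclose (fp);
  return pos;
}
```

The position travels as an off_t, the same type lseek uses, rather than the long used by fseek and ftell.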
If you define this macro, an additional set of functions is made available which enables 32 bit systems to use files of sizes beyond the usual limit of 2GB. This interface is not available if the system does not support files that large. On systems where the natural file size limit is greater than 2GB (i.e., on 64 bit systems) the new functions are identical to the replaced functions.
The new functionality is made available by a new set of types and functions which replace the existing ones. The names of these new objects contain 64 to indicate the intention, e.g., off_t vs. off64_t and fseeko vs. fseeko64.
This macro was introduced as part of the Large File Support extension (LFS). It is a transition interface for the period when 64 bit offsets are not generally used (see _FILE_OFFSET_BITS).
This macro determines which file system interface shall be used, one replacing the other. Whereas _LARGEFILE64_SOURCE makes the 64 bit interface available as an additional interface, _FILE_OFFSET_BITS allows the 64 bit interface to replace the old interface.
If _FILE_OFFSET_BITS is undefined, or if it is defined to the value 32, nothing changes. The 32 bit interface is used and types like off_t have a size of 32 bits on 32 bit systems.
If the macro is defined to the value 64, the large file interface replaces the old interface. I.e., the functions are not made available under different names (as they are with _LARGEFILE64_SOURCE). Instead the old function names now reference the new functions, e.g., a call to fseeko now indeed calls fseeko64.
This macro should only be selected if the system provides mechanisms for handling large files. On 64 bit systems this macro has no effect since the *64 functions are identical to the normal functions.
This macro was introduced as part of the Large File Support extension (LFS).
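One way to see the effect is to look at the width of off_t after defining the macro. This is only a sketch: off_t_bits is a hypothetical helper, and the example assumes the system provides large file support.

```c
/* Select the 64 bit file interface; this must precede every
   system header.  */
#define _FILE_OFFSET_BITS 64

#include <limits.h>
#include <sys/types.h>

/* Hypothetical helper: report the width of off_t in bits.  */
int
off_t_bits (void)
{
  return (int) (sizeof (off_t) * CHAR_BIT);
}
```

On a 64 bit system the result is expected to be 64 with or without the definition, which matches the remark that the macro has no effect there.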
Until the revised ISO C standard is widely adopted the new features are not automatically enabled. The GNU C Library nevertheless has a complete implementation of the new standard and to enable the new features the macro _ISOC99_SOURCE should be defined.
If you define this macro, everything is included: ISO C89, ISO C99, POSIX.1, POSIX.2, BSD, SVID, X/Open, LFS, and GNU extensions. In the cases where POSIX.1 conflicts with BSD, the POSIX definitions take precedence.
If you want to get the full effect of _GNU_SOURCE but make the BSD definitions take precedence over the POSIX definitions, use this sequence of definitions:
#define _GNU_SOURCE
#define _BSD_SOURCE
#define _SVID_SOURCE
Note that if you do this, you must link your program with the BSD compatibility library by passing the ‘-lbsd-compat’ option to the compiler or linker. NB: If you forget to do this, you may get very strange errors at run time.
If you define one of these macros, reentrant versions of several functions get declared. Some of the functions are specified in POSIX.1c but many others are only available on a few other systems or are unique to the GNU C Library. The problem is the delay in the standardization of the thread safe C library interface.
Unlike on some other systems, no special version of the C library must be used for linking. There is only one version, but it must have been compiled as thread safe.
We recommend you use _GNU_SOURCE in new programs. If you don't specify the ‘-ansi’ option to GCC and don't define any of these macros explicitly, the effect is the same as defining _POSIX_C_SOURCE to 2 and _POSIX_SOURCE, _SVID_SOURCE, and _BSD_SOURCE to 1.
When you define a feature test macro to request a larger class of features, it is harmless to define in addition a feature test macro for a subset of those features. For example, if you define _POSIX_C_SOURCE, then defining _POSIX_SOURCE as well has no effect. Likewise, if you define _GNU_SOURCE, then defining either _POSIX_SOURCE or _POSIX_C_SOURCE or _SVID_SOURCE as well has no effect.
Note, however, that the features of _BSD_SOURCE are not a subset of any of the other feature test macros supported. This is because it defines BSD features that take precedence over the POSIX features that are requested by the other macros. For this reason, defining _BSD_SOURCE in addition to the other feature test macros does have an effect: it causes the BSD features to take priority over the conflicting POSIX features.
Here is an overview of the contents of the remaining chapters of this manual.
sizeof operator and the symbolic constant NULL, how to write functions accepting variable numbers of arguments, and constants describing the ranges and other properties of the numerical types. There is also a simple debugging mechanism which allows you to put assertions in your code, and have diagnostic messages printed if the tests fail.
isspace) and functions for performing case conversion.
FILE * objects). These are the normal C library functions from stdio.h.
char data type.
setjmp and longjmp functions. These functions provide a facility for goto-like jumps which can jump from one function to another.
If you already know the name of the facility you are interested in, you can look it up in Library Summary. This gives you a summary of its syntax and a pointer to where you can find a more detailed description. This appendix is particularly useful if you just want to verify the order and type of arguments to a function, for example. It also tells you what standard or system each function, variable, or macro is derived from.
Many functions in the GNU C Library detect and report error conditions, and sometimes your programs need to check for these error conditions. For example, when you open an input file, you should verify that the file was actually opened correctly, and print an error message or take other appropriate action if the call to the library function failed.
This chapter describes how the error reporting facility works. Your program should include the header file errno.h to use this facility.
Most library functions return a special value to indicate that they have failed. The special value is typically -1, a null pointer, or a constant such as EOF that is defined for that purpose. But this return value tells you only that an error has occurred. To find out what kind of error it was, you need to look at the error code stored in the variable errno. This variable is declared in the header file errno.h.
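The two-step check, return value first and errno second, might look like the following sketch. The helper try_open is hypothetical, and the example assumes a POSIX system where fopen sets errno when it fails.

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: try to open NAME for reading.  Returns 0 on
   success, or the error code that fopen left in errno on failure.  */
int
try_open (const char *name)
{
  FILE *fp = fopen (name, "r");

  if (fp == NULL)
    {
      /* The null pointer says only that the call failed.  Save errno
         at once, because later library calls might alter it.  */
      int err = errno;
      fprintf (stderr, "%s: %s\n", name, strerror (err));
      return err;
    }
  fclose (fp);
  return 0;
}
```

Calling try_open on a name that does not exist should return the code for "No such file or directory".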
The variable errno contains the system error number. You can change the value of errno.
Since errno is declared volatile, it might be changed asynchronously by a signal handler; see Defining Handlers. However, a properly written signal handler saves and restores the value of errno, so you generally do not need to worry about this possibility except when writing signal handlers.
The initial value of errno at program startup is zero. Many library functions are guaranteed to set it to certain nonzero values when they encounter certain kinds of errors. These error conditions are listed for each function. These functions do not change errno when they succeed; thus, the value of errno after a successful call is not necessarily zero, and you should not use errno to determine whether a call failed. The proper way to do that is documented for each function. If the call failed, you can examine errno.
Many library functions can set errno to a nonzero value as a result of calling other library functions which might fail. You should assume that any library function might alter errno when the function returns an error.
Portability Note: ISO C specifies errno as a “modifiable lvalue” rather than as a variable, permitting it to be implemented as a macro. For example, its expansion might involve a function call, like *__errno_location (). In fact, that is what it is on GNU/Linux and GNU/Hurd systems. The GNU C Library, on each system, does whatever is right for the particular system.
There are a few library functions, like sqrt and atan, that return a perfectly legitimate value in case of an error, but also set errno. For these functions, if you want to check to see whether an error occurred, the recommended method is to set errno to zero before calling the function, and then check its value afterward.
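The set-to-zero-then-check method can be sketched as follows. The example uses strtol rather than sqrt so that it links without the math library; strtol likewise returns an ordinary value (LONG_MAX or LONG_MIN) when the input is out of range and reports the problem through errno. The helper parse_long is hypothetical.

```c
#include <errno.h>
#include <stdlib.h>

/* Hypothetical helper: parse TEXT as a base-10 long and store the value
   in *RESULT.  Returns 0 if strtol reported no error, otherwise the
   error code found in errno afterward.  */
int
parse_long (const char *text, long *result)
{
  errno = 0;                    /* Clear any stale error code first.  */
  *result = strtol (text, NULL, 10);
  return errno;                 /* Nonzero only if this call set it.  */
}
```

Without the initial assignment, a nonzero errno left over from some earlier call could be mistaken for a failure of this one.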
All the error codes have symbolic names; they are macros defined in errno.h. The names start with ‘E’ and an upper-case letter or digit; you should consider names of this form to be reserved names. See Reserved Names.
The error code values are all positive integers and are all distinct, with one exception: EWOULDBLOCK and EAGAIN are the same. Since the values are distinct, you can use them as labels in a switch statement; just don't use both EWOULDBLOCK and EAGAIN. Your program should not make any other assumptions about the specific values of these symbolic constants.
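A switch over error codes might be written as in the sketch below. The enum and the function classify_error are hypothetical names chosen for illustration, and the grouping of codes into actions is only one possible policy.

```c
#include <errno.h>

/* Hypothetical set of reactions a program might have to an error.  */
enum action
{
  ACTION_NONE,
  ACTION_RETRY,
  ACTION_REPORT_MISSING,
  ACTION_FAIL
};

/* Hypothetical helper: map an error code to a reaction.  */
enum action
classify_error (int err)
{
  switch (err)
    {
    case 0:
      return ACTION_NONE;
    case EINTR:
    case EAGAIN:
      /* No separate label for EWOULDBLOCK: it has the same value as
         EAGAIN here, and duplicate case labels do not compile.  */
      return ACTION_RETRY;
    case ENOENT:
      return ACTION_REPORT_MISSING;
    default:
      return ACTION_FAIL;
    }
}
```

Because the two names share one value, passing EWOULDBLOCK reaches the EAGAIN label.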
The value of errno doesn't necessarily have to correspond to any of these macros, since some library functions might return other error codes of their own for other situations. The only values that are guaranteed to be meaningful for a particular library function are the ones that this manual lists for that function.
Except on GNU/Hurd systems, almost any system call can return EFAULT if it is given an invalid pointer as an argument. Since this could only happen as a result of a bug in your program, and since it will not happen on GNU/Hurd systems, we have saved space by not mentioning EFAULT in the descriptions of individual functions.
In some Unix systems, many system calls can also return EFAULT if given as an argument a pointer into the stack, and the kernel for some obscure reason fails in its attempt to extend the stack. If this ever happens, you should probably try using statically or dynamically allocated memory instead of stack memory on that system.
The error code macros are defined in the header file errno.h. All of them expand into integer constant values. Some of these error codes can't occur on GNU systems, but they can occur using the GNU C Library on other systems.
Operation not permitted; only the owner of the file (or other resource) or processes with special privileges can perform the operation.
No such file or directory. This is a “file doesn't exist” error for ordinary files that are referenced in contexts where they are expected to already exist.
No process matches the specified process ID.
Interrupted function call; an asynchronous signal occurred and prevented completion of the call. When this happens, you should try the call again.
You can choose to have functions resume after a signal that is handled, rather than failing with EINTR; see Interrupted Primitives.
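Trying the call again after EINTR is commonly written as a small loop. The following is a minimal sketch; the wrapper name read_retry is hypothetical.

```c
#include <errno.h>
#include <unistd.h>

/* Hypothetical wrapper: like read, but repeats the call for as long
   as it fails only because a signal interrupted it.  */
ssize_t
read_retry (int fd, void *buf, size_t count)
{
  ssize_t n;

  do
    n = read (fd, buf, count);
  while (n == -1 && errno == EINTR);

  return n;
}
```

Any other failure, and any successful result, is passed back to the caller unchanged.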
Input/output error; usually used for physical read or write errors.
No such device or address. The system tried to use the device represented by a file you specified, and it couldn't find the device. This can mean that the device file was installed incorrectly, or that the physical device is missing or not correctly attached to the computer.
Argument list too long; used when the arguments passed to a new program being executed with one of the exec functions (see Executing a File) occupy too much memory space. This condition never arises on GNU/Hurd systems.
Invalid executable file format. This condition is detected by the exec functions; see Executing a File.
Bad file descriptor; for example, I/O on a descriptor that has been closed or reading from a descriptor open only for writing (or vice versa).
There are no child processes. This error happens on operations that are supposed to manipulate child processes, when there aren't any processes to manipulate.
Deadlock avoided; allocating a system resource would have resulted in a deadlock situation. The system does not guarantee that it will notice all such situations. This error means you got lucky and the system noticed; it might just hang. See File Locks, for an example.
No memory available. The system cannot allocate more virtual memory because its capacity is full.
Permission denied; the file permissions do not allow the attempted operation.
Bad address; an invalid pointer was detected. On GNU/Hurd systems, this error never happens; you get a signal instead.
A file that isn't a block special file was given in a situation that requires one. For example, trying to mount an ordinary file as a file system in Unix gives this error.
Resource busy; a system resource that can't be shared is already in use. For example, if you try to delete a file that is the root of a currently mounted filesystem, you get this error.
File exists; an existing file was specified in a context where it only makes sense to specify a new file.
An attempt to make an improper link across file systems was detected. This happens not only when you use link (see Hard Links) but also when you rename a file with rename (see Renaming Files).
The wrong type of device was given to a function that expects a particular sort of device.
A file that isn't a directory was specified when a directory is required.
File is a directory; you cannot open a directory for writing, or create or remove hard links to it.
Invalid argument. This is used to indicate various kinds of problems with passing the wrong argument to a library function.
The current process has too many files open and can't open any more. Duplicate descriptors do count toward this limit.
In BSD and GNU, the number of open files is controlled by a resource limit that can usually be increased. If you get this error, you might want to increase the RLIMIT_NOFILE limit or make it unlimited; see Limits on Resources.
There are too many distinct file openings in the entire system. Note that any number of linked channels count as just one file opening; see Linked Channels. This error never occurs on GNU/Hurd systems.
Inappropriate I/O control operation, such as trying to set terminal modes on an ordinary file.
An attempt to execute a file that is currently open for writing, or write to a file that is currently being executed. Often using a debugger to run a program is considered having it open for writing and will cause this error. (The name stands for “text file busy”.) This is not an error on GNU/Hurd systems; the text is copied as necessary.
File too big; the size of a file would be larger than allowed by the system.
No space left on device; write operation on a file failed because the disk is full.
Invalid seek operation (such as on a pipe).
An attempt was made to modify something on a read-only file system.
Too many links; the link count of a single file would become too large.
rename can cause this error if the file being renamed already has as many links as it can take (see Renaming Files).
Broken pipe; there is no process reading from the other end of a pipe. Every library function that returns this error code also generates a SIGPIPE signal; this signal terminates the program if not handled or blocked. Thus, your program will never actually see EPIPE unless it has handled or blocked SIGPIPE.
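The interaction between SIGPIPE and EPIPE can be observed with a short sketch. The function name write_to_closed_pipe is hypothetical; it ignores SIGPIPE for the duration of the write so that the failure shows up as an error code instead of terminating the program.

```c
#include <errno.h>
#include <signal.h>
#include <unistd.h>

/* Hypothetical demonstration: write into a pipe whose read end has
   been closed and return the resulting error code (0 if the write
   unexpectedly succeeded).  */
int
write_to_closed_pipe (void)
{
  int fds[2];
  int result = 0;
  void (*old_handler) (int);

  if (pipe (fds) != 0)
    return errno;

  old_handler = signal (SIGPIPE, SIG_IGN);  /* Otherwise the signal kills us.  */
  close (fds[0]);                           /* Nobody can read any more.  */

  if (write (fds[1], "x", 1) == -1)
    result = errno;                         /* Save before further calls.  */

  close (fds[1]);
  if (old_handler != SIG_ERR)
    signal (SIGPIPE, old_handler);          /* Restore the previous action.  */
  return result;
}
```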
Domain error; used by mathematical functions when an argument value does not fall into the domain over which the function is defined.
Range error; used by mathematical functions when the result value is not representable because of overflow or underflow.
Resource temporarily unavailable; the call might work if you try again later. The macro EWOULDBLOCK is another name for EAGAIN; they are always the same in the GNU C Library.
This error can happen in a few different situations:
- An operation that would block was attempted on an object that has non-blocking mode selected. Trying the same operation again will block until some external condition makes it possible to read, write, or connect (whatever the operation). You can use select to find out when the operation will be possible; see Waiting for I/O.
  Portability Note: In many older Unix systems, this condition was indicated by EWOULDBLOCK, which was a distinct error code different from EAGAIN. To make your program portable, you should check for both codes and treat them the same.
- A temporary resource shortage made an operation impossible. fork can return this error. It indicates that the shortage is expected to pass, so your program can try the call again later and it may succeed. It is probably a good idea to delay for a few seconds before trying it again, to allow time for other processes to release scarce resources. Such shortages are usually fairly serious and affect the whole system, so usually an interactive program should report the error to the user and return to its command loop.
In the GNU C Library, this is another name for EAGAIN (above). The values are always the same, on every operating system.
C libraries in many older Unix systems have EWOULDBLOCK as a separate error code.
An operation that cannot complete immediately was initiated on an object that has non-blocking mode selected. Some functions that must always block (such as connect; see Connecting) never return EAGAIN. Instead, they return EINPROGRESS to indicate that the operation has begun and will take some time. Attempts to manipulate the object before the call completes return EALREADY. You can use the select function to find out when the pending operation has completed; see Waiting for I/O.
An operation is already in progress on an object that has non-blocking mode selected.
A file that isn't a socket was specified when a socket is required.
The size of a message sent on a socket was larger than the supported maximum size.
The socket type does not support the requested communications protocol.
You specified a socket option that doesn't make sense for the particular protocol being used by the socket. See Socket Options.
The socket domain does not support the requested communications protocol (perhaps because the requested protocol is completely invalid). See Creating a Socket.
The socket type is not supported.
The operation you requested is not supported. Some socket functions don't make sense for all types of sockets, and others may not be implemented for all communications protocols. On GNU/Hurd systems, this error can happen for many calls when the object does not support the particular operation; it is a generic indication that the server knows nothing to do for that call.
The socket communications protocol family you requested is not supported.
The address family specified for a socket is not supported; it is inconsistent with the protocol being used on the socket. See Sockets.
The requested socket address is already in use. See Socket Addresses.
The requested socket address is not available; for example, you tried to give a socket a name that doesn't match the local host name. See Socket Addresses.
A socket operation failed because the network was down.
A socket operation failed because the subnet containing the remote host was unreachable.
A network connection was reset because the remote host crashed.
A network connection was aborted locally.
A network connection was closed for reasons outside the control of the local host, such as by the remote machine rebooting or an unrecoverable protocol violation.
The kernel's buffers for I/O operations are all in use. In GNU, this error is always synonymous with ENOMEM; you may get one or the other from network operations.
You tried to connect a socket that is already connected. See Connecting.
The socket is not connected to anything. You get this error when you try to transmit data over a socket, without first specifying a destination for the data. For a connectionless socket (for datagram protocols, such as UDP), you get EDESTADDRREQ instead.
No default destination address was set for the socket. You get this error when you try to transmit data over a connectionless socket, without first specifying a destination for the data with connect.
The socket has already been shut down.
???
A socket operation with a specified timeout received no response during the timeout period.
A remote host refused to allow the network connection (typically because it is not running the requested service).
Too many levels of symbolic links were encountered in looking up a file name. This often indicates a cycle of symbolic links.
Filename too long (longer than PATH_MAX; see Limits for Files) or host name too long (in gethostname or sethostname; see Host Identification).
The remote host for a requested network connection is down.
The remote host for a requested network connection is not reachable.
Directory not empty, where an empty directory was expected. Typically, this error occurs when you are trying to delete a directory.
This means that the per-user limit on new processes would be exceeded by an attempted fork. See Limits on Resources, for details on the RLIMIT_NPROC limit.
The file quota system is confused because there are too many users.
The user's disk quota was exceeded.
Stale NFS file handle. This indicates an internal confusion in the NFS system which is due to file system rearrangements on the server host. Repairing this condition usually requires unmounting and remounting the NFS file system on the local host.
An attempt was made to NFS-mount a remote file system with a file name that already specifies an NFS-mounted file. (This is an error on some operating systems, but we expect it to work properly on GNU/Hurd systems, making this error code impossible.)
???
???
???
???
???
No locks available. This is used by the file locking facilities; see File Locks. This error is never generated by GNU/Hurd systems, but it can result from an operation to an NFS server running another operating system.
Inappropriate file type or format. The file was the wrong type for the operation, or a data file had the wrong format.
On some systems chmod returns this error if you try to set the sticky bit on a non-directory file; see Setting Permissions.
???
???
Function not implemented. This indicates that the function called is not implemented at all, either in the C library itself or in the operating system. When you get this error, you can be sure that this particular function will always fail with ENOSYS unless you install a new version of the C library or the operating system.
Not supported. A function returns this error when certain parameter values are valid, but the functionality they request is not available. This can mean that the function does not implement a particular command or option value or flag bit at all. For functions that operate on some object given in a parameter, such as a file descriptor or a port, it might instead mean that only that specific object (file descriptor, port, etc.) is unable to support the other parameters given; different file descriptors might support different ranges of parameter values.
If the entire function is not available at all in the implementation, it returns ENOSYS instead.
While decoding a multibyte character the function came across an invalid or an incomplete sequence of bytes, or the given wide character is invalid.
On GNU/Hurd systems, servers supporting the term protocol return this error for certain operations when the caller is not in the foreground process group of the terminal. Users do not usually see this error because functions such as read and write translate it into a SIGTTIN or SIGTTOU signal. See Job Control, for information on process groups and these signals.
On GNU/Hurd systems, opening a file returns this error when the file is translated by a program and the translator program dies while starting up, before it has connected to the file.
The experienced user will know what is wrong.
You did what?
Go home and have a glass of warm, dairy-fresh milk.
This error code has no purpose.
Operation canceled; an asynchronous operation was canceled before it completed. See Asynchronous I/O. When you call aio_cancel, the normal result is for the operations affected to complete with this error; see Cancel AIO Operations.
The following error codes are defined by the Linux/i386 kernel. They are not yet documented.
The library has functions and variables designed to make it easy for your program to report informative error messages in the customary format about the failure of a library call. The functions strerror and perror give you the standard error message for a given error code; the variable program_invocation_short_name gives you convenient access to the name of the program that encountered the error.
The strerror function maps the error code (see Checking for Errors) specified by the errnum argument to a descriptive error message string. The return value is a pointer to this string.
The value errnum normally comes from the variable errno.
You should not modify the string returned by strerror. Also, if you make subsequent calls to strerror, the string might be overwritten. (But it's guaranteed that no library function ever calls strerror behind your back.)
The function strerror is declared in string.h.
The strerror_r function works like strerror but instead of returning the error message in a statically allocated buffer shared by all threads in the process, it returns a private copy for the thread. This might be either some permanent global data or a message string in the user supplied buffer starting at buf with the length of n bytes.
At most n characters are written (including the NUL byte) so it is up to the user to select the buffer large enough.
This function should always be used in multi-threaded programs since there is no way to guarantee the string returned by strerror really belongs to the last call of the current thread.
This function strerror_r is a GNU extension and it is declared in string.h.
This function prints an error message to the stream stderr; see Standard Streams. The orientation of stderr is not changed.
If you call perror with a message that is either a null pointer or an empty string, perror just prints the error message corresponding to errno, adding a trailing newline.
If you supply a non-null message argument, then perror prefixes its output with this string. It adds a colon and a space character to separate the message from the error string corresponding to errno.
The function perror is declared in stdio.h.
strerror and perror produce the exact same message for any given error code; the precise text varies from system to system. With the GNU C Library, the messages are fairly short; there are no multi-line messages or embedded newlines. Each error message begins with a capital letter and does not include any terminating punctuation.
Compatibility Note: The strerror function was introduced in ISO C89. Many older C systems do not support this function yet.
Many programs that don't read input from the terminal are designed to exit if any system call fails. By convention, the error message from such a program should start with the program's name, sans directories. You can find that name in the variable program_invocation_short_name; the full file name is stored in the variable program_invocation_name.
This variable's value is the name that was used to invoke the program running in the current process. It is the same as argv[0]. Note that this is not necessarily a useful file name; often it contains no directory names. See Program Arguments.
This variable's value is the name that was used to invoke the program running in the current process, with directory names removed. (That is to say, it is the same as program_invocation_name minus everything up to the last slash, if any.)
The library initialization code sets up both of these variables before calling main.
Portability Note: These two variables are GNU extensions. If you want your program to work with non-GNU libraries, you must save the value of argv[0] in main, and then strip off the directory names yourself. We added these extensions to make it possible to write self-contained error-reporting subroutines that require no explicit cooperation from main.
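For non-GNU libraries, the save-and-strip step might look like the following sketch. The names saved_program_name, save_program_name and get_program_name are hypothetical; main would call save_program_name (argv[0]) once at startup.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical storage for the program name, minus directories.  */
static const char *saved_program_name = "";

/* Remember ARGV0 without its leading directory names.  ARGV0 must
   stay valid for the life of the program, as argv[0] does.  */
void
save_program_name (const char *argv0)
{
  const char *slash;

  if (argv0 == NULL)             /* argv[0] may be a null pointer.  */
    return;

  slash = strrchr (argv0, '/');  /* Find the last slash, if any.  */
  saved_program_name = (slash != NULL) ? slash + 1 : argv0;
}

const char *
get_program_name (void)
{
  return saved_program_name;
}
```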
Here is an example showing how to handle failure to open a file correctly. The function open_sesame tries to open the named file for reading and returns a stream if successful. The fopen library function returns a null pointer if it couldn't open the file for some reason. In that situation, open_sesame constructs an appropriate error message using the strerror function, and terminates the program. If we were going to make some other library calls before passing the error code to strerror, we'd have to save it in a local variable instead, because those other library functions might overwrite errno in the meantime.
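     /* Needed for program_invocation_short_name, a GNU extension.  */
     #define _GNU_SOURCE
     #include <errno.h>
     #include <stdio.h>
     #include <stdlib.h>
     #include <string.h>

     FILE *
     open_sesame (char *name)
     {
       FILE *stream;

       errno = 0;
       stream = fopen (name, "r");
       if (stream == NULL)
         {
           /* Report the failure, prefixed by the program's name.  */
           fprintf (stderr, "%s: Couldn't open file %s; %s\n",
                    program_invocation_short_name, name, strerror (errno));
           exit (EXIT_FAILURE);
         }
       else
         return stream;
     }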
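The remark about saving the error code in a local variable can be sketched as follows. The function name describe_open_failure and its interface are hypothetical; the call to fflush merely stands in for any library call that might run between the failure and the use of the error code.

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical variant: instead of exiting, format a message into
   MSG (of size MSGLEN) and return the error code, or 0 on success.  */
int
describe_open_failure (const char *name, char *msg, size_t msglen)
{
  FILE *stream = fopen (name, "r");

  if (stream == NULL)
    {
      int saved_errno = errno;   /* Copy errno before anything else runs.  */

      fflush (stdout);           /* An intervening call that may alter errno.  */
      snprintf (msg, msglen, "Couldn't open file %s; %s",
                name, strerror (saved_errno));
      return saved_errno;
    }

  fclose (stream);
  if (msglen > 0)
    msg[0] = '\0';
  return 0;
}
```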
Using perror has the advantage that the function is portable and available on all systems implementing ISO C. But often the text perror generates is not what is wanted and there is no way to extend or change what perror does. The GNU coding standard, for instance, requires error messages to be preceded by the program name and programs which read some input files should provide information about the input file name and the line number in case an error is encountered while reading the file. For these occasions there are two functions available which are widely used throughout the GNU project. These functions are declared in error.h.
The error function can be used to report general problems during program execution. The format argument is a format string just like those given to the printf family of functions. The arguments required for the format can follow the format parameter. Just like perror, error also can report an error code in textual form. But unlike perror the error value is explicitly passed to the function in the errnum parameter. This eliminates the problem mentioned above that the error reporting function must be called immediately after the function causing the error since otherwise errno might have a different value.
The error function first prints the program name. If the application defined a global variable error_print_progname and points it to a function this function will be called to print the program name. Otherwise the string from the global variable program_name is used. The program name is followed by a colon and a space which in turn is followed by the output produced by the format string. If the errnum parameter is non-zero the format string output is followed by a colon and a space, followed by the error message for the error code errnum. In any case the output is terminated with a newline.
The output is directed to the stderr stream. If the stderr wasn't oriented before the call it will be narrow-oriented afterwards.
The function will return unless the status parameter has a non-zero value. In this case the function will call exit with the status value for its parameter and therefore never return. If error returns the global variable error_message_count is incremented by one to keep track of the number of errors reported.
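A minimal sketch of error with a zero status follows, assuming a system whose C library provides error.h. The function name count_unreadable is hypothetical. Because status is zero, each call returns, and error_message_count records how many reports were made.

```c
#include <errno.h>
#include <error.h>
#include <stdio.h>

/* Hypothetical helper: try to open each of the COUNT files named in
   PATHS, report every failure with error, and return the number of
   failures reported.  */
unsigned int
count_unreadable (const char *const *paths, size_t count)
{
  unsigned int before = error_message_count;
  size_t i;

  for (i = 0; i < count; i++)
    {
      FILE *stream = fopen (paths[i], "r");

      if (stream == NULL)
        /* Status 0: print the message and keep going.  */
        error (0, errno, "cannot open %s", paths[i]);
      else
        fclose (stream);
    }

  return error_message_count - before;
}
```

Passing errno as the errnum argument at the point of failure is what frees the caller from having to report immediately after the failing call.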
The error_at_line function is very similar to the error function. The only differences are the additional parameters fname and lineno. The handling of the other parameters is identical to that of error except that between the program name and the string generated by the format string additional text is inserted.
Directly following the program name a colon, followed by the file name pointed to by fname, another colon, and the value of lineno is printed.
This additional output of course is meant to be used to locate an error in an input file (like a programming language source code file etc).
If the global variable error_one_per_line is set to a non-zero value error_at_line will avoid printing consecutive messages for the same file and line. Repetitions which do not directly follow each other are not caught.
Just like error this function returns only if status is zero. Otherwise exit is called with the non-zero value. If error_at_line returns the global variable error_message_count is incremented by one to keep track of the number of errors reported.
As mentioned above the error and error_at_line functions can be customized by defining a variable named error_print_progname.
If the error_print_progname variable is defined to a non-zero value the function pointed to is called by error or error_at_line. It is expected to print the program name or do something similarly useful.
The function is expected to print to the stderr stream and must be able to handle whatever orientation the stream has.
The variable is global and shared by all threads.
The error_message_count variable is incremented whenever one of the functions error or error_at_line returns. The variable is global and shared by all threads.
The error_one_per_line variable influences only error_at_line. Normally the error_at_line function creates output for every invocation. If error_one_per_line is set to a non-zero value error_at_line keeps track of the last file name and line number for which an error was reported and avoids directly following messages for the same file and line. This variable is global and shared by all threads.
A program which reads some input file and reports errors in it could look like this:
     {
       char *line = NULL;
       size_t len = 0;
       unsigned int lineno = 0;

       error_message_count = 0;
       while (! feof_unlocked (fp))
         {
           ssize_t n = getline (&line, &len, fp);
           if (n <= 0)
             /* End of file or error.  */
             break;
           ++lineno;

           /* Process the line.  */
           ...

           if (Detect error in line)
             error_at_line (0, errval, filename, lineno,
                            "some error text %s", some_variable);
         }

       if (error_message_count != 0)
         error (EXIT_FAILURE, 0, "%u errors found", error_message_count);
     }
error and error_at_line are clearly the functions of choice and enable the programmer to write applications which follow the GNU coding standard. The GNU C Library additionally contains functions which are used in BSD for the same purpose. These functions are declared in err.h. It is generally advised to not use these functions. They are included only for compatibility.
The warn function is roughly equivalent to a call like
     error (0, errno, format, the parameters)
except that the global variables error respects and modifies are not used.
The vwarn function is just like warn except that the parameters for the handling of the format string format are passed in as a value of type va_list.
The warnx function is roughly equivalent to a call like
     error (0, 0, format, the parameters)
except that the global variables error respects and modifies are not used. The difference to warn is that no error number string is printed.
The vwarnx function is just like warnx except that the parameters for the handling of the format string format are passed in as a value of type va_list.
The err function is roughly equivalent to a call like
     error (status, errno, format, the parameters)
except that the global variables error respects and modifies are not used and that the program is exited even if status is zero.
The verr function is just like err except that the parameters for the handling of the format string format are passed in as a value of type va_list.
The errx function is roughly equivalent to a call like
     error (status, 0, format, the parameters)
except that the global variables error respects and modifies are not used and that the program is exited even if status is zero. The difference to err is that no error number string is printed.
The verrx function is just like errx except that the parameters for the handling of the format string format are passed in as a value of type va_list.
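The va_list variants exist so that a program can wrap these functions in its own variadic routine. A minimal sketch, assuming err.h is available; the names log_warning, warning_count and warnings_issued are hypothetical. Since warnx does not touch error_message_count, the wrapper keeps its own counter.

```c
#include <err.h>
#include <stdarg.h>

/* Hypothetical counter; these functions do not maintain one.  */
static unsigned int warnings_issued;

/* Hypothetical wrapper: print a warning without an error number
   string and count it.  */
void
log_warning (const char *format, ...)
{
  va_list ap;

  va_start (ap, format);
  vwarnx (format, ap);     /* Program name, colon, formatted text, newline.  */
  va_end (ap);

  ++warnings_issued;
}

unsigned int
warning_count (void)
{
  return warnings_issued;
}
```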
This chapter describes how processes manage and use memory in a system that uses the GNU C Library.
The GNU C Library has several functions for dynamically allocating virtual memory in various ways. They vary in generality and in efficiency. The library also provides functions for controlling paging and allocation of real memory.
Memory mapped I/O is not discussed in this chapter. See Memory-mapped I/O.
One of the most basic resources a process has available to it is memory. There are a lot of different ways systems organize memory, but in a typical one, each process has one linear virtual address space, with addresses running from zero to some huge maximum. It need not be contiguous; i.e., not all of these addresses actually can be used to store data.
The virtual memory is divided into pages (4 kilobytes is typical). Backing each page of virtual memory is a page of real memory (called a frame) or some secondary storage, usually disk space. The disk space might be swap space or just some ordinary disk file. Actually, a page of all zeroes sometimes has nothing at all backing it; there's just a flag saying it is all zeroes.
The same frame of real memory or backing store can back multiple virtual pages belonging to multiple processes. This is normally the case, for example, with virtual memory occupied by GNU C Library code. The same real memory frame containing the printf function backs a virtual memory page in each of the existing processes that has a printf call in its program.
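The page size need not be taken on faith; a program can ask for it. The following is a minimal sketch using sysconf with the _SC_PAGESIZE parameter. The function names page_size and pages_needed are hypothetical.

```c
#include <unistd.h>

/* Hypothetical helper: the size of a virtual memory page in bytes,
   or -1 if the system cannot report it.  */
long
page_size (void)
{
  return sysconf (_SC_PAGESIZE);
}

/* Hypothetical helper: how many pages of PAGESIZE bytes are needed
   to hold BYTES bytes, rounding up to a whole page.  */
long
pages_needed (long bytes, long pagesize)
{
  return (bytes + pagesize - 1) / pagesize;
}
```

With 4-kilobyte pages, for instance, an object of 4097 bytes occupies two pages.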
In order for a program to access any part of a virtual page, the page must at that moment be backed by (“connected to”) a real frame. But because there is usually a lot more virtual memory than real memory, the pages must move back and forth between real memory and backing store regularly, coming into real memory when a process needs to access them and then retreating to backing store when not needed anymore. This movement is called paging.
When a program attempts to access a page which is not at that moment backed by real memory, this is known as a page fault. When a page fault occurs, the kernel suspends the process, places the page into a real page frame (this is called “paging in” or “faulting in”), then resumes the process so that from the process' point of view, the page was in real memory all along. In fact, to the process, all pages always seem to be in real memory. Except for one thing: the elapsed execution time of an instruction that would normally be a few nanoseconds is suddenly much, much, longer (because the kernel normally has to do I/O to complete the page-in). For programs sensitive to that, the functions described in Locking Pages can control it.
Within each virtual address space, a process has to keep track of what is at which addresses, and that process is called memory allocation. Allocation usually brings to mind meting out scarce resources, but in the case of virtual memory, that's not a major goal, because there is generally much more of it than anyone needs. Memory allocation within a process is mainly just a matter of making sure that the same byte of memory isn't used to store two different things.
Processes allocate memory in two major ways: by exec and programmatically. Actually, forking is a third way, but it's not very interesting. See Creating a Process.
Exec is the operation of creating a virtual address space for a process, loading its basic program into it, and executing the program. It is done by the “exec” family of functions (e.g. execl). The operation takes a program file (an executable), it allocates space to load all the data in the executable, loads it, and transfers control to it. That data is most notably the instructions of the program (the text), but also literals and constants in the program and even some variables: C variables with the static storage class (see Memory Allocation and C).
Once that program begins to execute, it uses programmatic allocation to gain additional memory. In a C program with the GNU C Library, there are two kinds of programmatic allocation: automatic and dynamic. See Memory Allocation and C.
Memory-mapped I/O is another form of dynamic virtual memory allocation. Mapping memory to a file means declaring that the contents of a certain range of a process' addresses shall be identical to the contents of a specified regular file. The system makes the virtual memory initially contain the contents of the file, and if you modify the memory, the system writes the same modification to the file. Note that due to the magic of virtual memory and page faults, there is no reason for the system to do I/O to read the file, or allocate real memory for its contents, until the program accesses the virtual memory. See Memory-mapped I/O.
Just as it programmatically allocates memory, the program can programmatically deallocate (free) it. You can't free the memory that was allocated by exec. When the program exits or execs, you might say that all its memory gets freed, but since in both cases the address space ceases to exist, the point is really moot. See Program Termination.
A process' virtual address space is divided into segments. A segment is a contiguous range of virtual addresses. Three important segments are:
This section covers how ordinary programs manage storage for their data, including the famous malloc function and some fancier facilities special to the GNU C Library and GNU Compiler.
The C language supports two kinds of memory allocation through the variables in C programs:
A third important kind of memory allocation, dynamic allocation, is not supported by C variables but is available via GNU C Library functions.
Dynamic memory allocation is a technique in which programs determine as they are running where to store some information. You need dynamic allocation when the amount of memory you need, or how long you continue to need it, depends on factors that are not known before the program runs.
For example, you may need a block to store a line read from an input file; since there is no limit to how long a line can be, you must allocate the memory dynamically and make it dynamically larger as you read more of the line.
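Growing a block while reading a line can be sketched as follows. The function name read_whole_line, the initial size of 16 bytes, and the doubling strategy are assumptions chosen for illustration.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: read one line of any length from STREAM into
   a dynamically allocated, null-terminated block, without the
   newline.  Returns NULL at end of file or if memory runs out; the
   caller frees the result.  */
char *
read_whole_line (FILE *stream)
{
  size_t size = 16;              /* Current size of the block.  */
  size_t used = 0;               /* Characters stored so far.  */
  char *buf = malloc (size);
  int c;

  if (buf == NULL)
    return NULL;

  while ((c = getc (stream)) != EOF && c != '\n')
    {
      if (used + 1 == size)      /* Keep one byte for the terminator.  */
        {
          char *bigger = realloc (buf, size * 2);
          if (bigger == NULL)
            {
              free (buf);
              return NULL;
            }
          buf = bigger;
          size *= 2;
        }
      buf[used++] = (char) c;
    }

  if (c == EOF && used == 0)     /* Nothing left to read.  */
    {
      free (buf);
      return NULL;
    }

  buf[used] = '\0';
  return buf;
}
```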
Or, you may need a block for each record or each definition in the inputdata; since you can't know in advance how many there will be, you mustallocate a new block for each record or definition as you read it.
When you use dynamic allocation, the allocation of a block of memory isan action that the program requests explicitly. You call a function ormacro when you want to allocate space, and specify the size with anargument. If you want to free the space, you do so by calling anotherfunction or macro. You can do these things whenever you want, as oftenas you want.
Dynamic allocation is not supported by C variables; there is no storageclass “dynamic”, and there can never be a C variable whose value isstored in dynamically allocated space. The only way to get dynamicallyallocated memory is via a system call (which is generally via a GNU C Libraryfunction call), and the only way to refer to dynamicallyallocated space is through a pointer. Because it is less convenient,and because the actual process of dynamic allocation requires morecomputation time, programmers generally use dynamic allocation only whenneither static nor automatic allocation will serve.
For example, if you want to allocate dynamically some space to hold a struct foobar, you cannot declare a variable of type struct foobar whose contents are the dynamically allocated space. But you can declare a variable of pointer type struct foobar * and assign it the address of the space. Then you can use the operators ‘*’ and ‘->’ on this pointer variable to refer to the contents of the space:
    {
      struct foobar *ptr
        = (struct foobar *) malloc (sizeof (struct foobar));
      ptr->name = x;
      ptr->next = current_foobar;
      current_foobar = ptr;
    }
The most general dynamic allocation facility is malloc. It allows you to allocate blocks of memory of any size at any time, make them bigger or smaller at any time, and free the blocks individually at any time (or never).
To allocate a block of memory, call malloc. The prototype for this function is in stdlib.h.
Function: void *malloc (size_t size)
This function returns a pointer to a newly allocated block size bytes long, or a null pointer if the block could not be allocated.
The contents of the block are undefined; you must initialize it yourself (or use calloc instead; see Allocating Cleared Space). Normally you would cast the value as a pointer to the kind of object that you want to store in the block. Here we show an example of doing so, and of initializing the space with zeros using the library function memset (see Copying and Concatenation):
    struct foo *ptr;
    ...
    ptr = (struct foo *) malloc (sizeof (struct foo));
    if (ptr == 0) abort ();
    memset (ptr, 0, sizeof (struct foo));
You can store the result of malloc into any pointer variable without a cast, because ISO C automatically converts the type void * to another type of pointer when necessary. But the cast is necessary in contexts other than assignment operators or if you might want your code to run in traditional C.
Remember that when allocating space for a string, the argument to malloc must be one plus the length of the string. This is because a string is terminated with a null character that doesn't count in the “length” of the string but does need space. For example:
    char *ptr;
    ...
    ptr = (char *) malloc (length + 1);
See Representation of Strings, for more information about this.
Examples of malloc
If no more space is available, malloc returns a null pointer. You should check the value of every call to malloc. It is useful to write a subroutine that calls malloc and reports an error if the value is a null pointer, returning only if the value is nonzero. This function is conventionally called xmalloc. Here it is:
    void *
    xmalloc (size_t size)
    {
      register void *value = malloc (size);
      if (value == 0)
        fatal ("virtual memory exhausted");
      return value;
    }
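The xmalloc shown here calls a fatal function that this section does not define. The following is a minimal, self-contained sketch under the assumption that fatal simply prints its message to standard error and exits; that definition of fatal is hypothetical and only for illustration.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical error routine: print MESSAGE and terminate.
   The manual leaves `fatal' undefined; this is one possible shape.  */
static void
fatal (const char *message)
{
  fprintf (stderr, "fatal error: %s\n", message);
  exit (EXIT_FAILURE);
}

/* Allocate SIZE bytes, or terminate the program if that is impossible.
   Callers never see a null pointer.  */
void *
xmalloc (size_t size)
{
  void *value = malloc (size);
  if (value == NULL)
    fatal ("virtual memory exhausted");
  return value;
}
```

A caller can then write `char *buf = xmalloc (32);` and use buf without a null check; the block is still released with free.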
Here is a real example of using malloc (by way of xmalloc). The function savestring will copy a sequence of characters into a newly allocated null-terminated string:
    char *
    savestring (const char *ptr, size_t len)
    {
      register char *value = (char *) xmalloc (len + 1);
      value[len] = '\0';
      return (char *) memcpy (value, ptr, len);
    }
The block that malloc gives you is guaranteed to be aligned so that it can hold any type of data. On GNU systems, the address is always a multiple of eight (or sixteen on 64-bit systems). Only rarely is any higher boundary (such as a page boundary) necessary; for those cases, use memalign, posix_memalign or valloc (see Aligned Memory Blocks).
Note that the memory located after the end of the block is likely to be in use for something else; perhaps a block already allocated by another call to malloc. If you attempt to treat the block as longer than you asked for it to be, you are liable to destroy the data that malloc uses to keep track of its blocks, or you may destroy the contents of another block. If you have already allocated a block and discover you want it to be bigger, use realloc (see Changing Block Size).
Freeing Memory Allocated with malloc
When you no longer need a block that you got with malloc, use the function free to make the block available to be allocated again. The prototype for this function is in stdlib.h.
Function: void free (void *ptr)
The free function deallocates the block of memory pointed at by ptr.
Function: void cfree (void *ptr)
This function does the same thing as free. It's provided for backward compatibility with SunOS; you should use free instead.
Freeing a block alters the contents of the block. Do not expect to find any data (such as a pointer to the next block in a chain of blocks) in the block after freeing it. Copy whatever you need out of the block before freeing it! Here is an example of the proper way to free all the blocks in a chain, and the strings that they point to:
    struct chain
      {
        struct chain *next;
        char *name;
      };

    void
    free_chain (struct chain *chain)
    {
      while (chain != 0)
        {
          /* Save the link before the block that holds it is freed.  */
          struct chain *next = chain->next;
          free (chain->name);
          free (chain);
          chain = next;
        }
    }
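Because a freed block must not be touched again, one common defensive habit is to clear the pointer at the moment the block is freed. The sketch below illustrates this; free_string is a hypothetical helper, not a library function. It relies on the ISO C rule that passing a null pointer to free does nothing.

```c
#include <stdlib.h>

/* Free the string that *SLOT points to and reset *SLOT to a null
   pointer, so that a later accidental use or a second call is
   harmless.  free (NULL) is defined by ISO C to do nothing.  */
void
free_string (char **slot)
{
  free (*slot);
  *slot = NULL;
}
```

Calling free_string twice on the same variable is safe, whereas calling free twice on the same non-null pointer is not.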
Occasionally, free can actually return memory to the operating system and make the process smaller. Usually, all it can do is allow a later call to malloc to reuse the space. In the meantime, the space remains in your program as part of a free-list used internally by malloc.
There is no point in freeing blocks at the end of a program, because all of the program's space is given back to the system when the process terminates.
Often you do not know for certain how big a block you will ultimately need at the time you must begin to use the block. For example, the block might be a buffer that you use to hold a line being read from a file; no matter how long you make the buffer initially, you may encounter a line that is longer.
You can make the block longer by calling realloc. This function is declared in stdlib.h.
Function: void *realloc (void *ptr, size_t newsize)
The realloc function changes the size of the block whose address is ptr to be newsize.
Since the space after the end of the block may be in use, realloc may find it necessary to copy the block to a new address where more free space is available. The value of realloc is the new address of the block. If the block needs to be moved, realloc copies the old contents.
If you pass a null pointer for ptr, realloc behaves just like ‘malloc (newsize)’. This can be convenient, but beware that older implementations (before ISO C) may not support this behavior, and will probably crash when realloc is passed a null pointer.
Like malloc, realloc may return a null pointer if no memory space is available to make the block bigger. When this happens, the original block is untouched; it has not been modified or relocated.
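Since a failed realloc leaves the original block in place, a program that wants to recover must keep the old pointer until it knows the call succeeded. The following sketch applies this to the line-buffer case mentioned earlier. The function name read_line, the starting size of 16 bytes, and the doubling policy are assumptions made for illustration.

```c
#include <stdio.h>
#include <stdlib.h>

/* Read one line from STREAM into a newly allocated, null-terminated
   string (without the newline), growing the buffer with realloc as
   needed.  Returns a null pointer at end of file when nothing was
   read, or when memory runs out.  The caller frees the result.  */
char *
read_line (FILE *stream)
{
  size_t capacity = 16;         /* Arbitrary starting size.  */
  size_t length = 0;
  char *buffer = malloc (capacity);
  int c;

  if (buffer == NULL)
    return NULL;

  while ((c = getc (stream)) != EOF && c != '\n')
    {
      /* Keep one byte in reserve for the terminating null.  */
      if (length + 1 == capacity)
        {
          /* Use a temporary: if realloc fails, BUFFER is still a
             valid block and must be freed, not overwritten.  */
          char *bigger = realloc (buffer, capacity * 2);
          if (bigger == NULL)
            {
              free (buffer);
              return NULL;
            }
          buffer = bigger;
          capacity *= 2;
        }
      buffer[length++] = (char) c;
    }

  if (c == EOF && length == 0)
    {
      free (buffer);
      return NULL;
    }

  buffer[length] = '\0';
  return buffer;
}
```

Writing `buffer = realloc (buffer, ...)` directly would lose the only pointer to the original block on failure, which is why the result goes through a temporary.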
In most cases it makes no difference what happens to the original block when realloc fails, because the application program cannot continue when it is out of memory, and the only thing to do is to give a fatal error message. Often it is convenient to write and use a subroutine, conventionally called xrealloc, that takes care of the error message as xmalloc does for malloc:
    void *
    xrealloc (void *ptr, size_t size)
    {
      register void *value = realloc (ptr, size);
      if (value == 0)
        fatal ("Virtual memory exhausted");
      return value;
    }
You can also use realloc to make a block smaller. The reason you would do this is to avoid tying up a lot of memory space when only a little is needed. In several allocation implementations, making a block smaller sometimes necessitates copying it, so it can fail if no other space is available.
If the new size you specify is the same as the old size, realloc is guaranteed to change nothing and return the same address that you gave.
The function calloc allocates memory and clears it to zero. It is declared in stdlib.h.
Function: void *calloc (size_t count, size_t eltsize)
This function allocates a block long enough to contain a vector of count elements, each of size eltsize. Its contents are cleared to zero before calloc returns.
You could define calloc as follows:
    void *
    calloc (size_t count, size_t eltsize)
    {
      size_t size = count * eltsize;
      void *value = malloc (size);
      if (value != 0)
        memset (value, 0, size);
      return value;
    }
But in general, it is not guaranteed that calloc calls malloc internally. Therefore, if an application provides its own malloc/realloc/free outside the C library, it should always define calloc, too.
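An application that does supply its own calloc should note that the product count * eltsize in the simple definition can wrap around for very large arguments, yielding a block that is too small. A minimal sketch of a guarded version follows; the name checked_array_alloc is hypothetical, and the check shown is one common way to detect the overflow, not necessarily what any particular library does.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Allocate a zero-filled vector of COUNT elements of ELTSIZE bytes
   each.  Returns a null pointer if the total size does not fit in a
   size_t or if the allocation fails.  */
void *
checked_array_alloc (size_t count, size_t eltsize)
{
  void *value;

  /* count * eltsize would wrap around if count > SIZE_MAX / eltsize.  */
  if (eltsize != 0 && count > SIZE_MAX / eltsize)
    return NULL;

  value = malloc (count * eltsize);
  if (value != NULL)
    memset (value, 0, count * eltsize);
  return value;
}
```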
Efficiency Considerations for malloc
As opposed to other versions, the malloc in the GNU C Library does not round up block sizes to powers of two, neither for large nor for small sizes. Neighboring chunks can be coalesced on a free no matter what their size is. This makes the implementation suitable for all kinds of allocation patterns without generally incurring high memory waste through fragmentation.
Very large blocks (much larger than a page) are allocated with mmap (anonymous or via /dev/zero) by this implementation. This has the great advantage that these chunks are returned to the system immediately when they are freed. Therefore, it cannot happen that a large chunk becomes “locked” in between smaller ones and even after calling free wastes memory. The size threshold for mmap to be used can be adjusted with mallopt. The use of mmap can also be disabled completely.
The address of a block returned by malloc or realloc in GNU systems is always a multiple of eight (or sixteen on 64-bit systems). If you need a block whose address is a multiple of a higher power of two than that, use memalign, posix_memalign, or valloc. memalign is declared in malloc.h and posix_memalign is declared in stdlib.h.
With the GNU C Library, you can use free to free the blocks that memalign, posix_memalign, and valloc return. That does not work in BSD, however; BSD does not provide any way to free such blocks.
Function: void *memalign (size_t boundary, size_t size)
The memalign function allocates a block of size bytes whose address is a multiple of boundary. The boundary must be a power of two! The function memalign works by allocating a somewhat larger block, and then returning an address within the block that is on the specified boundary.
Function: int posix_memalign (void **memptr, size_t alignment, size_t size)
The posix_memalign function is similar to the memalign function in that it returns a buffer of size bytes aligned to a multiple of alignment. But it adds one requirement to the parameter alignment: the value must be a power of two multiple of sizeof (void *).
If the function succeeds in allocating memory, a pointer to the allocated memory is returned in *memptr and the return value is zero. Otherwise the function returns an error value indicating the problem.
This function was introduced in POSIX 1003.1d.
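The calling convention of posix_memalign differs from that of malloc: the block comes back through a pointer argument and the return value is an error code. The sketch below wraps it in a malloc-like interface; alloc_aligned and is_aligned are hypothetical helpers introduced only for this illustration.

```c
#define _POSIX_C_SOURCE 200112L
#include <stdint.h>
#include <stdlib.h>

/* Return a block of SIZE bytes whose address is a multiple of
   ALIGNMENT, or a null pointer on failure.  ALIGNMENT must be a power
   of two multiple of sizeof (void *).  Free the result with free.  */
void *
alloc_aligned (size_t alignment, size_t size)
{
  void *block = NULL;

  /* Zero means success; any other value is an error code and BLOCK
     must not be used.  */
  if (posix_memalign (&block, alignment, size) != 0)
    return NULL;
  return block;
}

/* Nonzero if PTR is a multiple of ALIGNMENT.  */
int
is_aligned (const void *ptr, size_t alignment)
{
  return ((uintptr_t) ptr % alignment) == 0;
}
```

For example, `alloc_aligned (64, 100)` requests 100 bytes on a 64-byte boundary, while an alignment such as 3 is rejected and yields a null pointer.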
Function: void *valloc (size_t size)
Using valloc is like using memalign and passing the page size as the value of the first argument. It is implemented like this:
    void *
    valloc (size_t size)
    {
      return memalign (getpagesize (), size);
    }
See Query Memory Parameters for more information about the memory subsystem.
You can adjust some parameters for dynamic memory allocation with the mallopt function. This function is the general SVID/XPG interface, defined in malloc.h.
Function: int mallopt (int param, int value)
When calling mallopt, the param argument specifies the parameter to be set, and value the new value to be set. Possible choices for param, as defined in malloc.h, are:
M_TRIM_THRESHOLD
- This is the minimum size (in bytes) of the top-most, releasable chunk that will cause sbrk to be called with a negative argument in order to return memory to the system.
M_TOP_PAD
- This parameter determines the amount of extra memory to obtain from the system when a call to sbrk is required. It also specifies the number of bytes to retain when shrinking the heap by calling sbrk with a negative argument. This provides the necessary hysteresis in heap size such that excessive amounts of system calls can be avoided.
M_MMAP_THRESHOLD
- All chunks larger than this value are allocated outside the normal heap, using the mmap system call. This way it is guaranteed that the memory for these chunks can be returned to the system on free. Note that requests smaller than this threshold might still be allocated via mmap.
M_MMAP_MAX
- The maximum number of chunks to allocate with mmap. Setting this to zero disables all use of mmap.
M_PERTURB
- If non-zero, memory blocks are filled with values depending on some low order bits of this parameter when they are allocated (except when allocated by calloc) and freed. This can be used to debug the use of uninitialized or freed heap memory.
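As an illustration of how these parameters are set, here is a small sketch for a GNU system. The function name tune_allocator and the particular values are assumptions chosen for the example, not recommendations. The text above does not describe the return value of mallopt; the sketch assumes the GNU behavior of returning nonzero on success and zero on error.

```c
#include <malloc.h>

/* Adjust two allocator parameters.  Returns 1 if both calls were
   accepted and 0 otherwise.  The values are illustrative only.  */
int
tune_allocator (void)
{
  /* Allocate chunks larger than 256 KiB with mmap.  */
  if (mallopt (M_MMAP_THRESHOLD, 256 * 1024) == 0)
    return 0;

  /* Give memory back to the system only once the top-most free
     chunk reaches 1 MiB.  */
  if (mallopt (M_TRIM_THRESHOLD, 1024 * 1024) == 0)
    return 0;

  return 1;
}
```

Such a function would normally be called once, early in the program, before the allocation pattern it is meant to influence begins.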
You can ask malloc to check the consistency of dynamic memory by using the mcheck function. This function is a GNU extension, declared in mcheck.h.
Function: int mcheck (void (*abortfn) (enum mcheck_status status))
Calling mcheck tells malloc to perform occasional consistency checks. These will catch things such as writing past the end of a block that was allocated with malloc.
The abortfn argument is the function to call when an inconsistency is found. If you supply a null pointer, then mcheck uses a default function which prints a message and calls abort (see Aborting a Program). The function you supply is called with one argument, which says what sort of inconsistency was detected; its type is described below.
It is too late to begin allocation checking once you have allocated anything with malloc. So mcheck does nothing in that case. The function returns -1 if you call it too late, and 0 otherwise (when it is successful).
The easiest way to arrange to call mcheck early enough is to use the option ‘-lmcheck’ when you link your program; then you don't need to modify your program source at all. Alternatively you might use a debugger to insert a call to mcheck whenever the program is started. For example, these gdb commands will automatically call mcheck whenever the program starts:
    (gdb) break main
    Breakpoint 1, main (argc=2, argv=0xbffff964) at whatever.c:10
    (gdb) command 1
    Type commands for when breakpoint 1 is hit, one per line.
    End with a line saying just "end".
    >call mcheck(0)
    >continue
    >end
    (gdb) ...
This will however only work if no initialization function of any object involved calls any of the malloc functions, since mcheck must be called before the first such function.
Function: enum mcheck_status mprobe (void *pointer)
The mprobe function lets you explicitly check for inconsistencies in a particular allocated block. You must have already called mcheck at the beginning of the program, to do its occasional checks; calling mprobe requests an additional consistency check to be done at the time of the call.
The argument pointer must be a pointer returned by malloc or realloc. mprobe returns a value that says what inconsistency, if any, was found. The values are described below.
Data Type: enum mcheck_status
This enumerated type describes what kind of inconsistency was detected in an allocated block, if any. Here are the possible values:
MCHECK_DISABLED
- mcheck was not called before the first allocation. No consistency checking can be done.
MCHECK_OK
- No inconsistency detected.
MCHECK_HEAD
- The data immediately before the block was modified. This commonly happens when an array index or pointer is decremented too far.
MCHECK_TAIL
- The data immediately after the block was modified. This commonly happens when an array index or pointer is incremented too far.
MCHECK_FREE
- The block was already freed.
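A custom abortfn usually wants to turn the status value into a readable message. The sketch below shows one way to do this; mcheck_status_name and report_inconsistency are hypothetical names, and the message texts are paraphrases of the descriptions above.

```c
#include <mcheck.h>
#include <stdio.h>
#include <stdlib.h>

/* Map a status value to a short description.  */
const char *
mcheck_status_name (enum mcheck_status status)
{
  switch (status)
    {
    case MCHECK_DISABLED:
      return "consistency checking is not enabled";
    case MCHECK_OK:
      return "no inconsistency detected";
    case MCHECK_HEAD:
      return "memory before the block was modified";
    case MCHECK_TAIL:
      return "memory after the block was modified";
    case MCHECK_FREE:
      return "block was already freed";
    }
  return "unknown status";
}

/* Has the shape required for the abortfn argument of mcheck:
   report the problem, then stop the program.  */
void
report_inconsistency (enum mcheck_status status)
{
  fprintf (stderr, "heap check failed: %s\n",
           mcheck_status_name (status));
  abort ();
}
```

A program would install the handler with `mcheck (report_inconsistency);` before its first allocation, and could pass the result of mprobe to mcheck_status_name for an on-demand check.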
Another possibility to check for and guard against bugs in the use of malloc, realloc and free is to set the environment variable MALLOC_CHECK_. When MALLOC_CHECK_ is set, a special (less efficient) implementation is used which is designed to be tolerant against simple errors, such as double calls of free with the same argument, or overruns of a single byte (off-by-one bugs). Not all such errors can be protected against, however, and memory leaks can result. If MALLOC_CHECK_ is set to 0, any detected heap corruption is silently ignored; if set to 1, a diagnostic is printed on stderr; if set to 2, abort is called immediately. This can be useful because otherwise a crash may happen much later, and the true cause for the problem is then very hard to track down.
There is one problem with MALLOC_CHECK_: in SUID or SGID binaries it could possibly be exploited since, diverging from the normal program's behavior, it now writes something to the standard error descriptor. Therefore the use of MALLOC_CHECK_ is disabled by default for SUID and SGID binaries. It can be enabled again by the system administrator by adding a file /etc/suid-debug (the content is not important; it could be empty).
So, what's the difference between using MALLOC_CHECK_ and linking with ‘-lmcheck’? MALLOC_CHECK_ is orthogonal with respect to ‘-lmcheck’. ‘-lmcheck’ has been added for backward compatibility. Both MALLOC_CHECK_ and ‘-lmcheck’ should uncover the same bugs, but using MALLOC_CHECK_ you don't need to recompile your application.
The GNU C Library lets you modify the behavior of malloc, realloc, and free by specifying appropriate hook functions. You can use these hooks to help you debug programs that use dynamic memory allocation, for example.
The hook variables are declared in malloc.h.
Variable: __malloc_hook
The value of this variable is a pointer to the function that malloc uses whenever it is called. You should define this function to look like malloc; that is, like:
    void *function (size_t size, const void *caller)
The value of caller is the return address found on the stack when the malloc function was called. This value allows you to trace the memory consumption of the program.
Variable: __realloc_hook
The value of this variable is a pointer to the function that realloc uses whenever it is called. You should define this function to look like realloc; that is, like:
    void *function (void *ptr, size_t size, const void *caller)
The value of caller is the return address found on the stack when the realloc function was called. This value allows you to trace the memory consumption of the program.
Variable: __free_hook
The value of this variable is a pointer to the function that free uses whenever it is called. You should define this function to look like free; that is, like:
    void function (void *ptr, const void *caller)
The value of caller is the return address found on the stack when the free function was called. This value allows you to trace the memory consumption of the program.
Variable: __memalign_hook
The value of this variable is a pointer to the function that memalign uses whenever it is called. You should define this function to look like memalign; that is, like:
    void *function (size_t alignment, size_t size, const void *caller)
The value of caller is the return address found on the stack when the memalign function was called. This value allows you to trace the memory consumption of the program.
You must make sure that the function you install as a hook for one of these functions does not call that function recursively without restoring the old value of the hook first! Otherwise, your program will get stuck in an infinite recursion. Before calling the function recursively, one should make sure to restore all the hooks to their previous value. When coming back from the recursive call, all the hooks should be resaved since a hook might modify itself.
Variable: __malloc_initialize_hook
The value of this variable is a pointer to a function that is called once when the malloc implementation is initialized. This is a weak variable, so it can be overridden in the application with a definition like the following:
    void (*__malloc_initialize_hook) (void) = my_init_hook;
An issue to look out for is the time at which the malloc hook functions can be safely installed. If the hook functions call the malloc-related functions recursively, it is necessary that malloc has already properly initialized itself at the time when __malloc_hook etc. is assigned to. On the other hand, if the hook functions provide a complete malloc implementation of their own, it is vital that the hooks are assigned to before the very first malloc call has completed, because otherwise a chunk obtained from the ordinary, un-hooked malloc may later be handed to __free_hook, for example.
In both cases, the problem can be solved by setting up the hooks from within a user-defined function pointed to by __malloc_initialize_hook; then the hooks will be set up safely at the right time.
Here is an example showing how to use __malloc_hook and __free_hook properly. It installs a function that prints out information every time malloc or free is called. We just assume here that realloc and memalign are not used in our program.
    /* Prototypes for __malloc_hook, __free_hook */
    #include <malloc.h>
    #include <stdio.h>

    /* Prototypes for our hooks.  */
    static void my_init_hook (void);
    static void *my_malloc_hook (size_t, const void *);
    static void my_free_hook (void *, const void *);

    /* Saved values of the underlying hooks.  */
    static void *(*old_malloc_hook) (size_t, const void *);
    static void (*old_free_hook) (void *, const void *);

    /* Override initializing hook from the C library. */
    void (*__malloc_initialize_hook) (void) = my_init_hook;

    static void
    my_init_hook (void)
    {
      old_malloc_hook = __malloc_hook;
      old_free_hook = __free_hook;
      __malloc_hook = my_malloc_hook;
      __free_hook = my_free_hook;
    }

    static void *
    my_malloc_hook (size_t size, const void *caller)
    {
      void *result;
      /* Restore all old hooks */
      __malloc_hook = old_malloc_hook;
      __free_hook = old_free_hook;
      /* Call recursively */
      result = malloc (size);
      /* Save underlying hooks */
      old_malloc_hook = __malloc_hook;
      old_free_hook = __free_hook;
      /* printf might call malloc, so protect it too. */
      printf ("malloc (%u) returns %p\n", (unsigned int) size, result);
      /* Restore our own hooks */
      __malloc_hook = my_malloc_hook;
      __free_hook = my_free_hook;
      return result;
    }

    static void
    my_free_hook (void *ptr, const void *caller)
    {
      /* Restore all old hooks */
      __malloc_hook = old_malloc_hook;
      __free_hook = old_free_hook;
      /* Call recursively */
      free (ptr);
      /* Save underlying hooks */
      old_malloc_hook = __malloc_hook;
      old_free_hook = __free_hook;
      /* printf might call free, so protect it too. */
      printf ("freed pointer %p\n", ptr);
      /* Restore our own hooks */
      __malloc_hook = my_malloc_hook;
      __free_hook = my_free_hook;
    }

    main ()
    {
      ...
    }
The mcheck function (see Heap Consistency Checking) works by installing such hooks.
Statistics for Memory Allocation with malloc
You can get information about dynamic memory allocation by calling the mallinfo function. This function and its associated data type are declared in malloc.h; they are an extension of the standard SVID/XPG version.
Data Type: struct mallinfo
This structure type is used to return information about the dynamic memory allocator. It contains the following members:
int arena
- This is the total size of memory allocated with sbrk by malloc, in bytes.
int ordblks
- This is the number of chunks not in use. (The memory allocator internally gets chunks of memory from the operating system, and then carves them up to satisfy individual malloc requests; see Efficiency and Malloc.)
int smblks
- This field is unused.
int hblks
- This is the total number of chunks allocated with mmap.
int hblkhd
- This is the total size of memory allocated with mmap, in bytes.
int usmblks
- This field is unused.
int fsmblks
- This field is unused.
int uordblks
- This is the total size of memory occupied by chunks handed out by malloc.
int fordblks
- This is the total size of memory occupied by free (not in use) chunks.
int keepcost
- This is the size of the top-most releasable chunk that normally borders the end of the heap (i.e., the high end of the virtual address space's data segment).
Function: struct mallinfo mallinfo (void)
This function returns information about the current dynamic memory usage in a structure of type struct mallinfo.
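The following sketch shows how a program might report a few of these members. The helper names print_heap_summary and heap_bytes_in_use are hypothetical. It assumes the mallinfo interface described here; more recent releases of the GNU C Library also offer a mallinfo2 variant with wider fields and may warn that mallinfo is deprecated.

```c
#include <malloc.h>
#include <stdio.h>

/* Write a short summary of the allocator state to OUT.  */
void
print_heap_summary (FILE *out)
{
  struct mallinfo info = mallinfo ();

  fprintf (out, "obtained with sbrk: %d bytes\n", info.arena);
  fprintf (out, "obtained with mmap: %d bytes in %d chunks\n",
           info.hblkhd, info.hblks);
  fprintf (out, "in use: %d bytes, free: %d bytes\n",
           info.uordblks, info.fordblks);
}

/* Total size of the chunks currently handed out by malloc.  */
int
heap_bytes_in_use (void)
{
  return mallinfo ().uordblks;
}
```

Comparing heap_bytes_in_use before and after a phase of the program gives a rough idea of how much memory that phase has left allocated.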
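Summary of malloc-Related Functions
Here is a summary of the functions that work with malloc:
void *malloc (size_t size)
- Allocate a block of size bytes.
void free (void *addr)
- Free a block previously allocated by malloc. See Freeing after Malloc.
void *realloc (void *addr, size_t size)
- Make a block previously allocated by malloc larger or smaller, possibly by copying it to a new location. See Changing Block Size.
void *calloc (size_t count, size_t eltsize)
- Allocate a block of count * eltsize bytes using malloc, and set its contents to zero. See Allocating Cleared Space.
void *valloc (size_t size)
- Allocate a block of size bytes, starting on a page boundary. See Aligned Memory Blocks.
void *memalign (size_t boundary, size_t size)
- Allocate a block of size bytes, starting on an address that is a multiple of boundary. See Aligned Memory Blocks.
int mallopt (int param, int value)
- Adjust a tunable parameter of the allocator.
int mcheck (void (*abortfn) (enum mcheck_status status))
- Tell malloc to perform occasional consistency checks on dynamically allocated memory, and to call abortfn when an inconsistency is found. See Heap Consistency Checking.
void *(*__malloc_hook) (size_t size, const void *caller)
- A pointer to a function that malloc uses whenever it is called.
void *(*__realloc_hook) (void *ptr, size_t size, const void *caller)
- A pointer to a function that realloc uses whenever it is called.
void (*__free_hook) (void *ptr, const void *caller)
- A pointer to a function that free uses whenever it is called.
void *(*__memalign_hook) (size_t alignment, size_t size, const void *caller)
- A pointer to a function that memalign uses whenever it is called.
struct mallinfo mallinfo (void)
- Return information about the current dynamic memory usage.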
A complicated task when programming with languages which do not use garbage collected dynamic memory allocation is to find memory leaks. Long running programs must assure that dynamically allocated objects are freed at the end of their lifetime. If this does not happen the system runs out of memory, sooner or later.
The malloc implementation in the GNU C Library provides some simple means to detect such leaks and obtain some information to find the location. To do this the application must be started in a special mode which is enabled by an environment variable. There are no speed penalties for the program if the debugging mode is not enabled.
Function: void mtrace (void)
When the mtrace function is called it looks for an environment variable named MALLOC_TRACE. This variable is supposed to contain a valid file name. The user must have write access. If the file already exists it is truncated. If the environment variable is not set or it does not name a valid file which can be opened for writing, nothing is done. The behavior of malloc etc. is not changed. For obvious reasons this also happens if the application is installed with the SUID or SGID bit set.
If the named file is successfully opened, mtrace installs special handlers for the functions malloc, realloc, and free (see Hooks for Malloc). From then on, all uses of these functions are traced and logged to the file. There is now of course a speed penalty for all calls to the traced functions so tracing should not be enabled during normal use.
This function is a GNU extension and generally not available on other systems. The prototype can be found in mcheck.h.
Function: void muntrace (void)
The muntrace function can be called after mtrace was used to enable tracing the malloc calls. If no (successful) call of mtrace was made, muntrace does nothing.
Otherwise it deinstalls the handlers for malloc, realloc, and free and then closes the trace file. No calls are logged anymore and the program runs again at full speed.
This function is a GNU extension and generally not available on other systems. The prototype can be found in mcheck.h.
Even though the tracing functionality does not influence the runtime behavior of the program it is not a good idea to call mtrace in all programs. Just imagine that you debug a program using mtrace and all other programs used in the debugging session also trace their malloc calls. The output file would be the same for all programs and thus is unusable. Therefore one should call mtrace only if compiled for debugging. A program could therefore start like this:
    #include <mcheck.h>

    int
    main (int argc, char *argv[])
    {
    #ifdef DEBUGGING
      mtrace ();
    #endif
      ...
    }
This is all that is needed if you want to trace the calls during the whole runtime of the program. Alternatively you can stop the tracing at any time with a call to muntrace. It is even possible to restart the tracing again with a new call to mtrace. But this can cause unreliable results since there may be calls of the functions which are not traced. Please note that not only the application uses the traced functions; libraries (including the C library itself) use these functions too.
This last point is also why it is not a good idea to call muntrace before the program terminates. The libraries are informed about the termination of the program only after the program returns from main or calls exit and so cannot free the memory they use before this time.
So the best thing one can do is to call mtrace as the very first function in the program and never call muntrace. So the program traces almost all uses of the malloc functions (except those calls which are executed by constructors of the program or used libraries).
You know the situation. The program is prepared for debugging and in all debugging sessions it runs well. But once it is started without debugging the error shows up. A typical example is a memory leak that becomes visible only when we turn off the debugging. If you foresee such situations you can still win. Simply use something equivalent to the following little program:
    #include <mcheck.h>
    #include <signal.h>

    static void
    enable (int sig)
    {
      mtrace ();
      signal (SIGUSR1, enable);
    }

    static void
    disable (int sig)
    {
      muntrace ();
      signal (SIGUSR2, disable);
    }

    int
    main (int argc, char *argv[])
    {
      ...

      signal (SIGUSR1, enable);
      signal (SIGUSR2, disable);

      ...
    }
I.e., the user can start the memory debugger any time s/he wants if the program was started with MALLOC_TRACE set in the environment. The output will of course not show the allocations which happened before the first signal but if there is a memory leak this will show up nevertheless.
If you take a look at the output it will look similar to this:
    = Start
     [0x8048209] - 0x8064cc8
     [0x8048209] - 0x8064ce0
     [0x8048209] - 0x8064cf8
     [0x80481eb] + 0x8064c48 0x14
     [0x80481eb] + 0x8064c60 0x14
     [0x80481eb] + 0x8064c78 0x14
     [0x80481eb] + 0x8064c90 0x14
    = End
What this all means is not really important since the trace file is not meant to be read by a human. Therefore no attention is given to readability. Instead there is a program which comes with the GNU C Library which interprets the traces and outputs a summary in a user-friendly way. The program is called mtrace (it is in fact a Perl script) and it takes one or two arguments. In any case the name of the file with the trace output must be specified. If an optional argument precedes the name of the trace file this must be the name of the program which generated the trace.
    drepper$ mtrace tst-mtrace log
    No memory leaks.
In this case the program tst-mtrace was run and it produced a trace file log. The message printed by mtrace shows there are no problems with the code; all allocated memory was freed afterwards.
If we call mtrace on the example trace given above we would get a different output:
    drepper$ mtrace errlog
    - 0x08064cc8 Free 2 was never alloc'd 0x8048209
    - 0x08064ce0 Free 3 was never alloc'd 0x8048209
    - 0x08064cf8 Free 4 was never alloc'd 0x8048209

    Memory not freed:
    -----------------
       Address     Size     Caller
    0x08064c48     0x14  at 0x80481eb
    0x08064c60     0x14  at 0x80481eb
    0x08064c78     0x14  at 0x80481eb
    0x08064c90     0x14  at 0x80481eb
We have called mtrace with only one argument and so the script has no chance to find out what is meant with the addresses given in the trace. We can do better:
    drepper$ mtrace tst errlog
    - 0x08064cc8 Free 2 was never alloc'd /home/drepper/tst.c:39
    - 0x08064ce0 Free 3 was never alloc'd /home/drepper/tst.c:39
    - 0x08064cf8 Free 4 was never alloc'd /home/drepper/tst.c:39

    Memory not freed:
    -----------------
       Address     Size     Caller
    0x08064c48     0x14  at /home/drepper/tst.c:33
    0x08064c60     0x14  at /home/drepper/tst.c:33
    0x08064c78     0x14  at /home/drepper/tst.c:33
    0x08064c90     0x14  at /home/drepper/tst.c:33
Suddenly the output makes much more sense and the user can see immediately where the function calls causing the trouble can be found.
Interpreting this output is not complicated. There are at most two different situations being detected. First, free was called for pointers which were never returned by one of the allocation functions. This is usually a very bad problem and what this looks like is shown in the first three lines of the output. Situations like this are quite rare and if they appear they show up very drastically: the program normally crashes.
The other situation, which is much harder to detect, is a memory leak. As you can see in the output, mtrace collects all this information and so can say that the program calls an allocation function from line 33 in the source file /home/drepper/tst.c four times without freeing this memory before the program terminates. Whether this is a real problem remains to be investigated.
An obstack is a pool of memory containing a stack of objects. You can create any number of separate obstacks, and then allocate objects in specified obstacks. Within each obstack, the last object allocated must always be the first one freed, but distinct obstacks are independent of each other.
Aside from this one constraint of order of freeing, obstacks are totally general: an obstack can contain any number of objects of any size. They are implemented with macros, so allocation is usually very fast as long as the objects are usually small. And the only space overhead per object is the padding needed to start each object on a suitable boundary.
The utilities for manipulating obstacks are declared in the header file obstack.h.
An obstack is represented by a data structure of type struct obstack. This structure has a small fixed size; it records the status of the obstack and how to find the space in which objects are allocated. It does not contain any of the objects themselves. You should not try to access the contents of the structure directly; use only the functions described in this chapter.
You can declare variables of type struct obstack and use them as obstacks, or you can allocate obstacks dynamically like any other kind of object. Dynamic allocation of obstacks allows your program to have a variable number of different stacks. (You can even allocate an obstack structure in another obstack, but this is rarely useful.)
All the functions that work with obstacks require you to specify which obstack to use. You do this with a pointer of type struct obstack *. In the following, we often say “an obstack” when strictly speaking the object at hand is such a pointer.
The objects in the obstack are packed into large blocks called chunks. The struct obstack structure points to a chain of the chunks currently in use.
The obstack library obtains a new chunk whenever you allocate an object that won't fit in the previous chunk. Since the obstack library manages chunks automatically, you don't need to pay much attention to them, but you do need to supply a function which the obstack library should use to get a chunk. Usually you supply a function which uses malloc directly or indirectly. You must also supply a function to free a chunk. These matters are described in the following section.
Each source file in which you plan to use the obstack functions must include the header file obstack.h, like this:
    #include <obstack.h>
Also, if the source file uses the macro obstack_init, it must declare or define two functions or macros that will be called by the obstack library. One, obstack_chunk_alloc, is used to allocate the chunks of memory into which objects are packed. The other, obstack_chunk_free, is used to return chunks when the objects in them are freed. These macros should appear before any use of obstacks in the source file.
Usually these are defined to use malloc via the intermediary xmalloc (see Unconstrained Allocation). This is done with the following pair of macro definitions:
    #define obstack_chunk_alloc xmalloc
    #define obstack_chunk_free free
Though the memory you get using obstacks really comes from malloc, using obstacks is faster because malloc is called less often, for larger blocks of memory. See Obstack Chunks, for full details.
At run time, before the program can use a struct obstack object as an obstack, it must initialize the obstack by calling obstack_init.
Initialize obstack obstack-ptr for allocation of objects. This function calls the obstack's obstack_chunk_alloc function. If allocation of memory fails, the function pointed to by obstack_alloc_failed_handler is called. The obstack_init function always returns 1 (Compatibility notice: Former versions of obstack returned 0 if allocation failed).
Here are two examples of how to allocate the space for an obstack and initialize it. First, an obstack that is a static variable:
    static struct obstack myobstack;
    ...
    obstack_init (&myobstack);
Second, an obstack that is itself dynamically allocated:
    struct obstack *myobstack_ptr
      = (struct obstack *) xmalloc (sizeof (struct obstack));

    obstack_init (myobstack_ptr);
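As a minimal sketch of how these pieces fit together in a single source file, the following defines the two chunk macros, initializes a file-scope obstack on first use, and allocates from it. The names demo_obstack, demo_obstack_ready and demo_save are hypothetical, and plain malloc stands in for xmalloc, which your program would have to provide itself.

```c
#include <obstack.h>
#include <stdlib.h>
#include <string.h>

/* Chunk hooks; they must be visible before obstack_init is used.
   Plain malloc is an assumption made for this sketch.  */
#define obstack_chunk_alloc malloc
#define obstack_chunk_free free

/* Hypothetical file-scope obstack used only by this sketch.  */
static struct obstack demo_obstack;
static int demo_obstack_ready;

/* Return a copy of STRING that lives in demo_obstack, initializing
   the obstack the first time the function is called.  */
char *
demo_save (const char *string)
{
  size_t len = strlen (string) + 1;
  char *copy;

  if (!demo_obstack_ready)
    {
      obstack_init (&demo_obstack);
      demo_obstack_ready = 1;
    }

  copy = obstack_alloc (&demo_obstack, len);
  memcpy (copy, string, len);
  return copy;
}
```

Each call returns a distinct block; none of them is released until the program frees objects in demo_obstack.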
The value of this variable is a pointer to a function that obstack uses when obstack_chunk_alloc fails to allocate memory. The default action is to print a message and abort. You should supply a function that either calls exit (see Program Termination) or longjmp (see Non-Local Exits) and doesn't return.
    void my_obstack_alloc_failed (void)
    ...
    obstack_alloc_failed_handler = &my_obstack_alloc_failed;
The most direct way to allocate an object in an obstack is with obstack_alloc, which is invoked almost like malloc.
This allocates an uninitialized block of size bytes in an obstack and returns its address. Here obstack-ptr specifies which obstack to allocate the block in; it is the address of the struct obstack object which represents the obstack. Each obstack function or macro requires you to specify an obstack-ptr as the first argument.
This function calls the obstack's obstack_chunk_alloc function if it needs to allocate a new chunk of memory; it calls obstack_alloc_failed_handler if allocation of memory by obstack_chunk_alloc failed.
For example, here is a function that allocates a copy of a string str in a specific obstack, which is in the variable string_obstack:
    struct obstack string_obstack;

    char *
    copystring (char *string)
    {
      size_t len = strlen (string) + 1;
      char *s = (char *) obstack_alloc (&string_obstack, len);
      memcpy (s, string, len);
      return s;
    }
To allocate a block with specified contents, use the function obstack_copy, declared like this:
This allocates a block and initializes it by copying size bytes of data starting at address. It calls obstack_alloc_failed_handler if allocation of memory by obstack_chunk_alloc failed.
Like obstack_copy, but appends an extra byte containing a null character. This extra byte is not counted in the argument size.
The obstack_copy0 function is convenient for copying a sequence of characters into an obstack as a null-terminated string. Here is an example of its use:
    char *
    obstack_savestring (char *addr, int size)
    {
      return obstack_copy0 (&myobstack, addr, size);
    }
Contrast this with the previous example of savestring using malloc (see Basic Allocation).
To free an object allocated in an obstack, use the function obstack_free. Since the obstack is a stack of objects, freeing one object automatically frees all other objects allocated more recently in the same obstack.
If object is a null pointer, everything allocated in the obstack is freed. Otherwise, object must be the address of an object allocated in the obstack. Then object is freed, along with everything allocated in obstack since object.
Note that if object is a null pointer, the result is an uninitialized obstack. To free all memory in an obstack but leave it valid for further allocation, call obstack_free with the address of the first object allocated on the obstack:
    obstack_free (obstack_ptr, first_object_allocated_ptr);
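The stack discipline can be illustrated with a short sketch. It allocates three objects, frees the second, and then allocates again. The function name is hypothetical, malloc and free are assumed as the chunk functions, and the address comparison relies on all objects fitting in one chunk, which is how the GNU implementation is expected to behave for such small sizes rather than something the interface promises.

```c
#include <obstack.h>
#include <stdlib.h>

/* Chunk hooks assumed for this sketch.  */
#define obstack_chunk_alloc malloc
#define obstack_chunk_free free

/* Return 1 if freeing the second of three objects also released the
   third, so that the next allocation starts where the second one was.  */
int
demo_free_releases_later_objects (void)
{
  struct obstack ob;
  char *first, *second, *third, *again;
  int reused;

  obstack_init (&ob);
  first = obstack_alloc (&ob, 16);
  second = obstack_alloc (&ob, 16);
  third = obstack_alloc (&ob, 16);
  (void) third;

  /* Frees SECOND and everything allocated after it (THIRD).
     FIRST remains allocated.  */
  obstack_free (&ob, second);

  again = obstack_alloc (&ob, 16);
  reused = (again == second) && (again != first);

  /* A null pointer frees everything; OB is uninitialized afterwards.  */
  obstack_free (&ob, NULL);
  return reused;
}
```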
Recall that the objects in an obstack are grouped into chunks. When all the objects in a chunk become free, the obstack library automatically frees the chunk (see Preparing for Obstacks). Then other obstacks, or non-obstack allocation, can reuse the space of the chunk.
The interfaces for using obstacks may be defined either as functions or as macros, depending on the compiler. The obstack facility works with all C compilers, including both ISO C and traditional C, but there are precautions you must take if you plan to use compilers other than GNU C.
If you are using an old-fashioned non-ISO C compiler, all the obstack “functions” are actually defined only as macros. You can call these macros like functions, but you cannot use them in any other way (for example, you cannot take their address).
Calling the macros requires a special precaution: namely, the first operand (the obstack pointer) may not contain any side effects, because it may be computed more than once. For example, if you write this:
    obstack_alloc (get_obstack (), 4);
you will find that get_obstack may be called several times. If you use *obstack_list_ptr++ as the obstack pointer argument, you will get very strange results since the incrementation may occur several times.
In ISO C, each function has both a macro definition and a function definition. The function definition is used if you take the address of the function without calling it. An ordinary call uses the macro definition by default, but you can request the function definition instead by writing the function name in parentheses, as shown here:
    char *x;
    void *(*funcp) ();
    /* Use the macro.  */
    x = (char *) obstack_alloc (obptr, size);
    /* Call the function.  */
    x = (char *) (obstack_alloc) (obptr, size);
    /* Take the address of the function.  */
    funcp = obstack_alloc;
This is the same situation that exists in ISO C for the standard library functions. See Macro Definitions.
Warning: When you do use the macros, you must observe the precaution of avoiding side effects in the first operand, even in ISO C.
If you use the GNU C compiler, this precaution is not necessary, because various language extensions in GNU C permit defining the macros so as to compute each argument only once.
Because memory in obstack chunks is used sequentially, it is possible to build up an object step by step, adding one or more bytes at a time to the end of the object. With this technique, you do not need to know how much data you will put in the object until you come to the end of it. We call this the technique of growing objects. The special functions for adding data to the growing object are described in this section.
You don't need to do anything special when you start to grow an object. Using one of the functions to add data to the object automatically starts it. However, it is necessary to say explicitly when the object is finished. This is done with the function obstack_finish.
The actual address of the object thus built up is not known until the object is finished. Until then, it always remains possible that you will add so much data that the object must be copied into a new chunk.
While the obstack is in use for a growing object, you cannot use it for ordinary allocation of another object. If you try to do so, the space already added to the growing object will become part of the other object.
The most basic function for adding to a growing object is obstack_blank, which adds space without initializing it.
To add a block of initialized space, use obstack_grow, which is the growing-object analogue of obstack_copy. It adds size bytes of data to the growing object, copying the contents from data.
This is the growing-object analogue of obstack_copy0. It adds size bytes copied from data, followed by an additional null character.
To add one character at a time, use the function obstack_1grow. It adds a single byte containing c to the growing object.
To add the value of a pointer, use the function obstack_ptr_grow. It adds sizeof (void *) bytes containing the value of data.
A single value of type int can be added by using the obstack_int_grow function. It adds sizeof (int) bytes to the growing object and initializes them with the value of data.
When you are finished growing the object, use the function obstack_finish to close it off and return its final address.
Once you have finished the object, the obstack is available for ordinary allocation or for growing another object.
This function can return a null pointer under the same conditions as obstack_alloc (see Allocation in an Obstack).
When you build an object by growing it, you will probably need to know afterward how long it became. You need not keep track of this as you grow the object, because you can find out the length from the obstack just before finishing the object with the function obstack_object_size, declared as follows:
This function returns the current size of the growing object, in bytes. Remember to call this function before finishing the object. After it is finished, obstack_object_size will return zero.
If you have started growing an object and wish to cancel it, you should finish it and then free it, like this:
    obstack_free (obstack_ptr, obstack_finish (obstack_ptr));
This has no effect if no object was growing.
You can use obstack_blank with a negative size argument to make the current object smaller. Just don't try to shrink it beyond zero length; there's no telling what will happen if you do that.
The usual functions for growing objects incur overhead for checking whether there is room for the new growth in the current chunk. If you are frequently constructing objects in small steps of growth, this overhead can be significant.
You can reduce the overhead by using special “fast growth” functions that grow the object without checking. In order to have a robust program, you must do the checking yourself. If you do this checking in the simplest way each time you are about to add data to the object, you have not saved anything, because that is what the ordinary growth functions do. But if you can arrange to check less often, or check more efficiently, then you make the program faster.
The function obstack_room returns the amount of room available in the current chunk. It is declared as follows:
This returns the number of bytes that can be added safely to the current growing object (or to an object about to be started) in obstack obstack using the fast growth functions.
While you know there is room, you can use these fast growth functions for adding data to a growing object:
The function obstack_1grow_fast adds one byte containing the character c to the growing object in obstack obstack-ptr.
The function obstack_ptr_grow_fast adds sizeof (void *) bytes containing the value of data to the growing object in obstack obstack-ptr.
The function obstack_int_grow_fast adds sizeof (int) bytes containing the value of data to the growing object in obstack obstack-ptr.
The function obstack_blank_fast adds size bytes to the growing object in obstack obstack-ptr without initializing them.
When you check for space using obstack_room and there is not enough room for what you want to add, the fast growth functions are not safe. In this case, simply use the corresponding ordinary growth function instead. Very soon this will copy the object to a new chunk; then there will be lots of room available again.
So, each time you use an ordinary growth function, check afterward for sufficient space using obstack_room. Once the object is copied to a new chunk, there will be plenty of space again, so the program will start using the fast growth functions again.
Here is an example:
    void
    add_string (struct obstack *obstack, const char *ptr, int len)
    {
      while (len > 0)
        {
          int room = obstack_room (obstack);
          if (room == 0)
            {
              /* Not enough room.  Add one character slowly,
                 which may copy to a new chunk and make room.  */
              obstack_1grow (obstack, *ptr++);
              len--;
            }
          else
            {
              if (room > len)
                room = len;
              /* Add fast as much as we have room for.  */
              len -= room;
              while (room-- > 0)
                obstack_1grow_fast (obstack, *ptr++);
            }
        }
    }
Here are functions that provide information on the current status of allocation in an obstack. You can use them to learn about an object while still growing it.
This function returns the tentative address of the beginning of the currently growing object in obstack-ptr. If you finish the object immediately, it will have that address. If you make it larger first, it may outgrow the current chunk, and then its address will change!
If no object is growing, this value says where the next object you allocate will start (once again assuming it fits in the current chunk).
This function returns the address of the first free byte in the current chunk of obstack obstack-ptr. This is the end of the currently growing object. If no object is growing, obstack_next_free returns the same value as obstack_base.
This function returns the size in bytes of the currently growing object. This is equivalent to
    obstack_next_free (obstack-ptr) - obstack_base (obstack-ptr)
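A small sketch can check these relationships on a live obstack. It grows a three-byte object, compares the three status values, and then confirms that the tentative base became the final address. The function name is hypothetical, malloc and free are assumed as the chunk functions, and the obstack passed in is assumed to have no object growing.

```c
#include <obstack.h>
#include <stdlib.h>

/* Chunk hooks assumed for this sketch.  */
#define obstack_chunk_alloc malloc
#define obstack_chunk_free free

/* Return 1 if the status macros agree with each other while a small
   object is growing in OB and after it is finished.  */
int
demo_status_consistent (struct obstack *ob)
{
  char *base, *end, *final;
  int size, ok;

  obstack_grow (ob, "abc", 3);

  base = obstack_base (ob);
  end = obstack_next_free (ob);
  size = obstack_object_size (ob);
  ok = (size == 3) && (end - base == size);

  /* Nothing more is added, so the tentative address is final.  */
  final = obstack_finish (ob);
  ok = ok && (final == base);

  /* With no object growing, both macros name the same address.  */
  ok = ok && ((char *) obstack_base (ob)
              == (char *) obstack_next_free (ob));
  return ok;
}
```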
Each obstack has an alignment boundary; each object allocated in the obstack automatically starts on an address that is a multiple of the specified boundary. By default, this boundary is aligned so that the object can hold any type of data.
To access an obstack's alignment boundary, use the macro obstack_alignment_mask, whose function prototype looks like this:
The value is a bit mask; a bit that is 1 indicates that the corresponding bit in the address of an object should be 0. The mask value should be one less than a power of 2; the effect is that all object addresses are multiples of that power of 2. The default value of the mask is a value that allows aligned objects to hold any type of data: for example, if its value is 3, any type of data can be stored at locations whose addresses are multiples of 4. A mask value of 0 means an object can start on any multiple of 1 (that is, no alignment is required).
The expansion of the macro obstack_alignment_mask is an lvalue, so you can alter the mask by assignment. For example, this statement:
    obstack_alignment_mask (obstack_ptr) = 0;
has the effect of turning off alignment processing in the specified obstack.
Note that a change in alignment mask does not take effect until after the next time an object is allocated or finished in the obstack. If you are not growing an object, you can make the new alignment mask take effect immediately by calling obstack_finish. This will finish a zero-length object and then do proper alignment for the next object.
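As a sketch of how the mask and obstack_finish interact, the helper below temporarily raises the alignment, finishes a zero-length object so the new mask applies at once, allocates, and restores the old mask. The names demo_alloc_aligned and demo_is_aligned are hypothetical; the sketch assumes no object is currently growing in the obstack, and it uses malloc and free as the chunk functions.

```c
#include <obstack.h>
#include <stdint.h>
#include <stdlib.h>

/* Chunk hooks assumed for this sketch.  */
#define obstack_chunk_alloc malloc
#define obstack_chunk_free free

/* Return nonzero if P is a multiple of ALIGNMENT (a power of 2).  */
int
demo_is_aligned (const void *p, size_t alignment)
{
  return ((uintptr_t) p & (alignment - 1)) == 0;
}

/* Allocate SIZE bytes from OB at an address that is a multiple of
   ALIGNMENT (a power of 2), then restore the previous mask.
   Assumes no object is growing in OB.  */
void *
demo_alloc_aligned (struct obstack *ob, int size, int alignment)
{
  int saved_mask = obstack_alignment_mask (ob);
  void *p;

  obstack_alignment_mask (ob) = alignment - 1;

  /* Finishing a zero-length object makes the new mask take effect
     for the object allocated next.  */
  (void) obstack_finish (ob);

  p = obstack_alloc (ob, size);
  obstack_alignment_mask (ob) = saved_mask;
  return p;
}
```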
Obstacks work by allocating space for themselves in large chunks, and then parceling out space in the chunks to satisfy your requests. Chunks are normally 4096 bytes long unless you specify a different chunk size. The chunk size includes 8 bytes of overhead that are not actually used for storing objects. Regardless of the specified size, longer chunks will be allocated when necessary for long objects.
The obstack library allocates chunks by calling the function obstack_chunk_alloc, which you must define. When a chunk is no longer needed because you have freed all the objects in it, the obstack library frees the chunk by calling obstack_chunk_free, which you must also define.
These two must be defined (as macros) or declared (as functions) in each source file that uses obstack_init (see Creating Obstacks). Most often they are defined as macros like this:
    #define obstack_chunk_alloc malloc
    #define obstack_chunk_free free
Note that these are simple macros (no arguments). Macro definitions with arguments will not work! It is necessary that obstack_chunk_alloc or obstack_chunk_free, alone, expand into a function name if it is not itself a function name.
If you allocate chunks with malloc, the chunk size should be a power of 2. The default chunk size, 4096, was chosen because it is long enough to satisfy many typical requests on the obstack yet short enough not to waste too much memory in the portion of the last chunk not yet used.
This returns the chunk size of the given obstack.
Since this macro expands to an lvalue, you can specify a new chunk size by assigning it a new value. Doing so does not affect the chunks already allocated, but will change the size of chunks allocated for that particular obstack in the future. It is unlikely to be useful to make the chunk size smaller, but making it larger might improve efficiency if you are allocating many objects whose size is comparable to the chunk size. Here is how to do so cleanly:
    if (obstack_chunk_size (obstack_ptr) < new-chunk-size)
      obstack_chunk_size (obstack_ptr) = new-chunk-size;
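The same idiom can be wrapped in a small helper, shown here as a sketch. The name demo_raise_chunk_size is hypothetical, the use of long for the size is an assumption about how the library stores it, and malloc and free are assumed as the chunk functions.

```c
#include <obstack.h>
#include <stdlib.h>

/* Chunk hooks assumed for this sketch.  */
#define obstack_chunk_alloc malloc
#define obstack_chunk_free free

/* Raise the chunk size of OB to at least WANTED bytes, never
   lowering it.  Returns the chunk size now in effect, which applies
   only to chunks allocated from now on.  */
long
demo_raise_chunk_size (struct obstack *ob, long wanted)
{
  if ((long) obstack_chunk_size (ob) < wanted)
    obstack_chunk_size (ob) = wanted;
  return (long) obstack_chunk_size (ob);
}
```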
Here is a summary of all the functions associated with obstacks. Each takes the address of an obstack (struct obstack *) as its first argument.
void obstack_init (struct obstack *obstack-ptr)
void *obstack_alloc (struct obstack *obstack-ptr, int size)
void *obstack_copy (struct obstack *obstack-ptr, void *address, int size)
void *obstack_copy0 (struct obstack *obstack-ptr, void *address, int size)
void obstack_free (struct obstack *obstack-ptr, void *object)
void obstack_blank (struct obstack *obstack-ptr, int size)
void obstack_grow (struct obstack *obstack-ptr, void *address, int size)
void obstack_grow0 (struct obstack *obstack-ptr, void *address, int size)
void obstack_1grow (struct obstack *obstack-ptr, char data-char)
void *obstack_finish (struct obstack *obstack-ptr)
int obstack_object_size (struct obstack *obstack-ptr)
void obstack_blank_fast (struct obstack *obstack-ptr, int size)
void obstack_1grow_fast (struct obstack *obstack-ptr, char data-char)
int obstack_room (struct obstack *obstack-ptr)
int obstack_alignment_mask (struct obstack *obstack-ptr)
int obstack_chunk_size (struct obstack *obstack-ptr)
void *obstack_base (struct obstack *obstack-ptr)
void *obstack_next_free (struct obstack *obstack-ptr)
The function alloca supports a kind of half-dynamic allocation in which blocks are allocated dynamically but freed automatically.
Allocating a block with alloca is an explicit action; you can allocate as many blocks as you wish, and compute the size at run time. But all the blocks are freed when you exit the function that alloca was called from, just as if they were automatic variables declared in that function. There is no way to free the space explicitly.
The prototype for alloca is in stdlib.h. This function is a BSD extension.
The return value of alloca is the address of a block of size bytes of memory, allocated in the stack frame of the calling function.
Do not use alloca inside the arguments of a function call: you will get unpredictable results, because the stack space for the alloca would appear on the stack in the middle of the space for the function arguments. An example of what to avoid is foo (x, alloca (4), y).
alloca Example
As an example of the use of alloca, here is a function that opens a file name made from concatenating two argument strings, and returns a file descriptor or minus one signifying failure:
    int
    open2 (char *str1, char *str2, int flags, int mode)
    {
      char *name = (char *) alloca (strlen (str1) + strlen (str2) + 1);
      stpcpy (stpcpy (name, str1), str2);
      return open (name, flags, mode);
    }
Here is how you would get the same results with malloc and free:
    int
    open2 (char *str1, char *str2, int flags, int mode)
    {
      char *name = (char *) malloc (strlen (str1) + strlen (str2) + 1);
      int desc;
      if (name == 0)
        fatal ("virtual memory exceeded");
      stpcpy (stpcpy (name, str1), str2);
      desc = open (name, flags, mode);
      free (name);
      return desc;
    }
As you can see, it is simpler with alloca. But alloca has other, more important advantages, and some disadvantages.
Advantages of alloca
Here are the reasons why alloca may be preferable to malloc:
- Using alloca wastes very little space and is very fast. (It is open-coded by the GNU C compiler.)
- Since alloca does not have separate pools for different sizes of block, space used for any size block can be reused for any other size. alloca does not cause memory fragmentation.
- Nonlocal exits done with longjmp (see Non-Local Exits) automatically free the space allocated with alloca when they exit through the function that called alloca. This is the most important reason to use alloca.
To illustrate this, suppose you have a function open_or_report_error which returns a descriptor, like open, if it succeeds, but does not return to its caller if it fails. If the file cannot be opened, it prints an error message and jumps out to the command level of your program using longjmp. Let's change open2 (see Alloca Example) to use this subroutine:
    int
    open2 (char *str1, char *str2, int flags, int mode)
    {
      char *name = (char *) alloca (strlen (str1) + strlen (str2) + 1);
      stpcpy (stpcpy (name, str1), str2);
      return open_or_report_error (name, flags, mode);
    }
Because of the way alloca works, the memory it allocates is freed even when an error occurs, with no special effort required.
By contrast, the previous definition of open2 (which uses malloc and free) would develop a memory leak if it were changed in this way. Even if you are willing to make more changes to fix it, there is no easy way to do so.
Disadvantages of alloca
These are the disadvantages of alloca in comparison with malloc:
- Some non-GNU systems fail to support alloca, so it is less portable. However, a slower emulation of alloca written in C is available for use on systems with this deficiency.
In GNU C, you can replace most uses of alloca with an array of variable size. Here is how open2 would look then:
    int
    open2 (char *str1, char *str2, int flags, int mode)
    {
      char name[strlen (str1) + strlen (str2) + 1];
      stpcpy (stpcpy (name, str1), str2);
      return open (name, flags, mode);
    }
But alloca is not always equivalent to a variable-sized array, for several reasons:
- A variable-sized array's space is freed at the end of the scope of the name of the array. The space allocated with alloca remains until the end of the function.
- It is possible to use alloca within a loop, allocating an additional block on each iteration. This is impossible with variable-sized arrays.
NB: If you mix use of alloca and variable-sized arrays within one function, exiting a scope in which a variable-sized array was declared frees all blocks allocated with alloca during the execution of that scope.
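The loop case can be sketched as follows. Each iteration allocates one more list node with alloca, and every node stays valid until the function returns. The struct and function names are hypothetical, and including alloca.h directly (rather than relying on stdlib.h) is a choice made for this sketch. Because the blocks live on the stack, the count should stay small.

```c
#include <alloca.h>
#include <stddef.h>

/* Hypothetical list node used only by this sketch.  */
struct demo_node
{
  int value;
  struct demo_node *next;
};

/* Build a list holding 1..COUNT with one alloca call per iteration,
   then return the sum of the values.  All nodes are released
   automatically when the function returns.  */
int
demo_sum_with_alloca (int count)
{
  struct demo_node *head = NULL;
  struct demo_node *p;
  int sum = 0;
  int i;

  for (i = 1; i <= count; i++)
    {
      /* A fresh block each time; earlier blocks remain allocated.  */
      struct demo_node *node = alloca (sizeof *node);
      node->value = i;
      node->next = head;
      head = node;
    }

  for (p = head; p != NULL; p = p->next)
    sum += p->value;
  return sum;
}
```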
The symbols in this section are declared in unistd.h.
You will not normally use the functions in this section, because the functions described in Memory Allocation are easier to use. Those are interfaces to a GNU C Library memory allocator that uses the functions below itself. The functions below are simple interfaces to system calls.
brk sets the high end of the calling process' data segment to addr.
The address of the end of a segment is defined to be the address of the last byte in the segment plus 1.
The function has no effect if addr is lower than the low end of the data segment. (This is considered success, by the way.)
The function fails if it would cause the data segment to overlap another segment or exceed the process' data storage limit (see Limits on Resources).
The function is named for a common historical case where data storage and the stack are in the same segment. Data storage allocation grows upward from the bottom of the segment while the stack grows downward toward it from the top of the segment and the curtain between them is called the break.
The return value is zero on success. On failure, the return value is -1 and errno is set accordingly. The following errno values are specific to this function:
ENOMEM
- The request would cause the data segment to overlap another segment or exceed the process' data storage limit.
This function is the same as brk except that you specify the new end of the data segment as an offset delta from the current end, and on success the return value is the address of the previous end of the data segment (that is, the start of any newly added space) instead of zero.
This means you can use ‘sbrk(0)’ to find out what the current end of the data segment is.
You can tell the system to associate a particular virtual memory page with a real page frame and keep it that way; that is, cause the page to be paged in if it isn't already and mark it so it will never be paged out and consequently will never cause a page fault. This is called locking a page.
The functions in this chapter lock and unlock the calling process' pages.
Because page faults cause paged out pages to be paged in transparently, a process rarely needs to be concerned about locking pages. However, there are two reasons people sometimes are:
In some cases, the programmer knows better than the system's demand paging allocator which pages should remain in real memory to optimize system performance. In this case, locking pages can help.
Be aware that when you lock a page, that's one fewer page frame that can be used to back other virtual memory (by the same or other processes), which can mean more page faults, which means the system runs more slowly. In fact, if you lock enough memory, some programs may not be able to run at all for lack of real memory.
A memory lock is associated with a virtual page, not a real frame. The paging rule is: If a frame backs at least one locked page, don't page it out.
Memory locks do not stack. I.e., you can't lock a particular page twice so that it has to be unlocked twice before it is truly unlocked. It is either locked or it isn't.
A memory lock persists until the process that owns the memory explicitly unlocks it. (But process termination and exec cause the virtual memory to cease to exist, which you might say means it isn't locked any more.)
Memory locks are not inherited by child processes. (But note that on a modern Unix system, immediately after a fork, the parent's and the child's virtual address space are backed by the same real page frames, so the child enjoys the parent's locks.) See Creating a Process.
Because of its ability to impact other processes, only the superuser can lock a page. Any process can unlock its own page.
The system sets limits on the amount of memory a process can have locked and the amount of real memory it can have dedicated to it. See Limits on Resources.
In Linux, locked pages aren't as locked as you might think. Two virtual pages that are not shared memory can nonetheless be backed by the same real frame. The kernel does this in the name of efficiency when it knows both virtual pages contain identical data, and does it even if one or both of the virtual pages are locked.
But when a process modifies one of those pages, the kernel must get it a separate frame and fill it with the page's data. This is known as a copy-on-write page fault. It takes a small amount of time and in a pathological case, getting that frame may require I/O.
To make sure this doesn't happen to your program, don't just lock the pages. Write to them as well, unless you know you won't write to them ever. And to make sure you have pre-allocated frames for your stack, enter a scope that declares a C automatic variable larger than the maximum stack size you will need, set it to something, then return from its scope.
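The stack advice can be sketched as a function that declares a large automatic array and writes to one byte in every page of it. The name demo_prefault_stack and the 64 KiB bound are assumptions chosen for illustration; a real program would pick a bound that matches its own stack use and would typically call such a function after locking its memory.

```c
#include <stddef.h>
#include <unistd.h>

/* Assumed upper bound on the stack this program needs; purely
   illustrative.  */
#define DEMO_MAX_STACK (64 * 1024)

/* Write to every page of a DEMO_MAX_STACK-byte automatic array so
   that the kernel assigns frames to that much stack.  Returns the
   number of bytes written.  */
size_t
demo_prefault_stack (void)
{
  /* volatile keeps the compiler from discarding the stores.  */
  volatile unsigned char scratch[DEMO_MAX_STACK];
  long page_size = sysconf (_SC_PAGESIZE);
  size_t step = page_size > 0 ? (size_t) page_size : 4096;
  size_t touched = 0;
  size_t i;

  for (i = 0; i < sizeof scratch; i += step)
    {
      scratch[i] = 1;
      touched++;
    }

  /* Also touch the last byte in case the array ends mid-page.  */
  scratch[sizeof scratch - 1] = 1;
  touched++;
  return touched;
}
```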
The symbols in this section are declared in sys/mman.h. These functions are defined by POSIX.1b, but their availability depends on your kernel. If your kernel doesn't allow these functions, they exist but always fail. They are available with a Linux kernel.
Portability Note: POSIX.1b requires that when the mlock and munlock functions are available, the file unistd.h define the macro _POSIX_MEMLOCK_RANGE and the file limits.h define the macro PAGESIZE to be the size of a memory page in bytes. It requires that when the mlockall and munlockall functions are available, the unistd.h file define the macro _POSIX_MEMLOCK. The GNU C Library conforms to this requirement.
mlock locks a range of the calling process' virtual pages.
The range of memory starts at address addr and is len bytes long. Actually, since you must lock whole pages, it is the range of pages that include any part of the specified range.
When the function returns successfully, each of those pages is backed by (connected to) a real frame (is resident) and is marked to stay that way. This means the function may cause page-ins and have to wait for them.
When the function fails, it does not affect the lock status of any pages.
The return value is zero if the function succeeds. Otherwise, it is -1 and errno is set accordingly. errno values specific to this function are:
ENOMEM
- At least some of the specified address range does not exist in the calling process' virtual address space.
- The locking would cause the process to exceed its locked page limit.
EPERM
- The calling process is not superuser.
EINVAL
- len is not positive.
ENOSYS
- The kernel does not provide mlock capability.
You can lock all a process' memory with mlockall. You unlock memory with munlock or munlockall.
To avoid all page faults in a C program, you have to use mlockall, because some of the memory a program uses is hidden from the C code, e.g. the stack and automatic variables, and you wouldn't know what address to tell mlock.
munlock unlocks a range of the calling process' virtual pages.
munlock is the inverse of mlock and functions completely analogously to mlock, except that there is no EPERM failure.
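Here is a minimal sketch of locking and unlocking a single page. It allocates one page-aligned page, locks it, writes to it (following the advice above about copy-on-write), then unlocks and frees it. The name demo_lock_one_page is hypothetical, and using posix_memalign to obtain the page is a choice made for this sketch. Whether the lock succeeds depends on the process' privileges and limits, so the caller must be prepared for failure.

```c
#include <errno.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/* Lock one freshly allocated page, write to it, and unlock it.
   Returns 0 on success, or -1 with errno set on failure.  */
int
demo_lock_one_page (void)
{
  long page_size = sysconf (_SC_PAGESIZE);
  void *buffer = NULL;
  int err, result, saved_errno;

  if (page_size <= 0)
    {
      errno = EINVAL;
      return -1;
    }

  /* posix_memalign returns an error number instead of setting errno.  */
  err = posix_memalign (&buffer, (size_t) page_size, (size_t) page_size);
  if (err != 0)
    {
      errno = err;
      return -1;
    }

  if (mlock (buffer, (size_t) page_size) != 0)
    {
      saved_errno = errno;
      free (buffer);
      errno = saved_errno;
      return -1;
    }

  /* Write to the page so it has a frame of its own.  */
  memset (buffer, 0, (size_t) page_size);

  result = munlock (buffer, (size_t) page_size);
  saved_errno = errno;
  free (buffer);
  errno = saved_errno;
  return result;
}
```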
mlockall locks all the pages in a process' virtual memory address space, and/or any that are added to it in the future. This includes the pages of the code, data and stack segment, as well as shared libraries, user space kernel data, shared memory, and memory mapped files.
flags is a string of single bit flags represented by the following macros. They tell mlockall which of its functions you want. All other bits must be zero.
MCL_CURRENT
- Lock all pages which currently exist in the calling process' virtual address space.
MCL_FUTURE
- Set a mode such that any pages added to the process' virtual address space in the future will be locked from birth. This mode does not affect future address spaces owned by the same process, so exec, which replaces a process' address space, wipes out MCL_FUTURE. See Executing a File.
When the function returns successfully, and you specified MCL_CURRENT, all of the process' pages are backed by (connected to) real frames (they are resident) and are marked to stay that way. This means the function may cause page-ins and have to wait for them.
When the process is in MCL_FUTURE mode because it successfully executed this function and specified MCL_FUTURE, any system call by the process that requires space be added to its virtual address space fails with errno = ENOMEM if locking the additional space would cause the process to exceed its locked page limit. In the case that the address space addition that can't be accommodated is stack expansion, the stack expansion fails and the kernel sends a SIGSEGV signal to the process.
When the function fails, it does not affect the lock status of any pages or the future locking mode.
The return value is zero if the function succeeds. Otherwise, it is -1 and errno is set accordingly. errno values specific to this function are:
ENOMEM
- At least some of the specified address range does not exist in the calling process' virtual address space.
- The locking would cause the process to exceed its locked page limit.
EPERM
- The calling process is not superuser.
EINVAL
- Undefined bits in flags are not zero.
ENOSYS
- The kernel does not provide mlockall capability.
You can lock just specific pages with mlock. You unlock pages with munlockall and munlock.
munlockall unlocks every page in the calling process' virtual address space and turns off MCL_FUTURE future locking mode.
The return value is zero if the function succeeds. Otherwise, it is -1 and errno is set accordingly. The only way this function can fail is for generic reasons that all functions and system calls can fail, so there are no specific errno values.
Programs that work with characters and strings often need to classify a character (is it alphabetic, is it a digit, is it whitespace, and so on) and perform case conversion operations on characters. The functions in the header file ctype.h are provided for this purpose. Since the choice of locale and character set can alter the classifications of particular character codes, all of these functions are affected by the current locale. (More precisely, they are affected by the locale currently selected for character classification, the LC_CTYPE category; see Locale Categories.)
The ISO C standard specifies two different sets of functions. One set works on char type characters, the other on wchar_t wide characters (see Extended Char Intro).
This section explains the library functions for classifying characters. For example, isalpha is the function to test for an alphabetic character. It takes one argument, the character to test, and returns a nonzero integer if the character is alphabetic, and zero otherwise. You would use it like this:
    if (isalpha (c))
      printf ("The character `%c' is alphabetic.\n", c);
Each of the functions in this section tests for membership in a particular class of characters; each has a name starting with ‘is’. Each of them takes one argument, which is a character to test, and returns an int which is treated as a boolean value. The character argument is passed as an int, and it may be the constant value EOF instead of a real character. Any other argument value must be representable as an unsigned char.
The attributes of any given character can vary between locales. See Locales, for more information on locales.
These functions are declared in the header file ctype.h.
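As a minimal sketch of how the classification functions are typically called on the elements of a string, the following hypothetical function count_alpha (not part of the library) counts the alphabetic characters in a string. The cast to unsigned char is there because plain char may be signed, and the argument has to be either EOF or a value representable as an unsigned char.

```c
#include <ctype.h>
#include <stddef.h>

/* Hypothetical helper: count the characters of S for which isalpha
   is true in the current locale.  */
size_t
count_alpha (const char *s)
{
  size_t n = 0;
  for (; *s != '\0'; s++)
    /* Convert to unsigned char before the implicit promotion to int.  */
    if (isalpha ((unsigned char) *s))
      n++;
  return n;
}
```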
Returns true if c is a lower-case letter. The letter need not be from the Latin alphabet; any representable alphabet is valid.
Returns true if c is an upper-case letter. The letter need not be from the Latin alphabet; any representable alphabet is valid.
Returns true if c is an alphabetic character (a letter). If islower or isupper is true of a character, then isalpha is also true.
In some locales, there may be additional characters for which isalpha is true: letters which are neither upper case nor lower case. But in the standard "C" locale, there are no such additional characters.
Returns true if c is a decimal digit (‘0’ through ‘9’).
Returns true if c is an alphanumeric character (a letter or number); in other words, if either isalpha or isdigit is true of a character, then isalnum is also true.
Returns true if c is a hexadecimal digit. Hexadecimal digits include the normal decimal digits ‘0’ through ‘9’ and the letters ‘A’ through ‘F’ and ‘a’ through ‘f’.
Returns true if c is a punctuation character. This means any printing character that is not alphanumeric or a space character.
Returns true if c is a whitespace character. In the standard "C" locale, isspace returns true for only the standard whitespace characters:
' '
- space
'\f'
- formfeed
'\n'
- newline
'\r'
- carriage return
'\t'
- horizontal tab
'\v'
- vertical tab
Returns true if c is a blank character; that is, a space or a tab. This function was originally a GNU extension, but was added in ISO C99.
Returns true if c is a graphic character; that is, a character that has a glyph associated with it. The whitespace characters are not considered graphic.
Returns true if c is a printing character. Printing characters include all the graphic characters, plus the space (‘ ’) character.
Returns true if c is a control character (that is, a character that is not a printing character).
Returns true if c is a 7-bit unsigned char value that fits into the US/UK ASCII character set. This function is a BSD extension and is also an SVID extension.
This section explains the library functions for performing conversions such as case mappings on characters. For example, toupper converts any character to upper case if possible. If the character can't be converted, toupper returns it unchanged.
These functions take one argument of type int, which is the character to convert, and return the converted character as an int. If the conversion is not applicable to the argument given, the argument is returned unchanged.
Compatibility Note: In pre-ISO C dialects, instead of returning the argument unchanged, these functions may fail when the argument is not suitable for the conversion. Thus for portability, you may need to write islower(c) ? toupper(c) : c rather than just toupper(c).
These functions are declared in the header file ctype.h.
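The following sketch shows one way the case mapping functions might be applied to a whole string. The function upcase_in_place is a hypothetical helper, not part of the library; it relies on the ISO C behavior described above, where toupper returns its argument unchanged if no conversion applies.

```c
#include <ctype.h>

/* Hypothetical helper: convert every lower-case letter of S to upper
   case, leaving all other characters untouched.  */
void
upcase_in_place (char *s)
{
  for (; *s != '\0'; s++)
    /* The cast avoids passing a negative value when char is signed.  */
    *s = (char) toupper ((unsigned char) *s);
}
```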
If c is an upper-case letter, tolower returns the corresponding lower-case letter. If c is not an upper-case letter, c is returned unchanged.
If c is a lower-case letter, toupper returns the corresponding upper-case letter. Otherwise c is returned unchanged.
This function converts c to a 7-bit unsigned char value that fits into the US/UK ASCII character set, by clearing the high-order bits. This function is a BSD extension and is also an SVID extension.
This is identical to tolower, and is provided for compatibility with the SVID. See SVID.
This is identical to toupper, and is provided for compatibility with the SVID.
Amendment 1 to ISO C90 defines functions to classify wide characters. Although the original ISO C90 standard already defined the type wchar_t, no functions operating on it were defined.
The design of the classification functions for wide characters is more general. It allows extensions to the set of available classifications, beyond those which are always available. The POSIX standard specifies how extensions can be made, and this is already implemented in the GNU C Library implementation of the localedef program.
The character class functions are normally implemented with bitsets, with a bitset per character. For a given character, the appropriate bitset is read from a table and a test is performed as to whether a certain bit is set. Which bit is tested for is determined by the class.
For the wide character classification functions this is made visible. A classification type is defined, together with a function to retrieve a value of this type for a given class, and a function to test whether a given character is in this class, using the classification value. On top of this the normal character classification functions as used for char objects can be defined.
The wctype_t type can hold a value which represents a character class. The only defined way to generate such a value is by using the wctype function.
This type is defined in wctype.h.
The wctype function returns a value representing a class of wide characters which is identified by the string property. Besides some standard properties, each locale can define its own. In case no property with the given name is known for the current locale selected for the LC_CTYPE category, the function returns zero.
The properties known in every locale are:
"alnum"
"alpha"
"cntrl"
"digit"
"graph"
"lower"
"print"
"punct"
"space"
"upper"
"xdigit"
This function is declared in wctype.h.
To test the membership of a character to one of the non-standard classes the ISO C standard defines a completely new function.
This function returns a nonzero value if wc is in the character class specified by desc. desc must previously be returned by a successful call to wctype.
This function is declared in wctype.h.
To make it easier to use the commonly-used classification functions, they are defined in the C library. There is no need to use wctype if the property string is one of the known character classes. In some situations it is desirable to construct the property strings, and then it is important that wctype can also handle the standard classes.
This function returns a nonzero value if wc is an alphanumeric character (a letter or number); in other words, if either iswalpha or iswdigit is true of a character, then iswalnum is also true.
This function can be implemented using
    iswctype (wc, wctype ("alnum"))
It is declared in wctype.h.
Returns true if wc is an alphabetic character (a letter). If iswlower or iswupper is true of a character, then iswalpha is also true.
In some locales, there may be additional characters for which iswalpha is true: letters which are neither upper case nor lower case. But in the standard "C" locale, there are no such additional characters.
This function can be implemented using
    iswctype (wc, wctype ("alpha"))
It is declared in wctype.h.
Returns true if wc is a control character (that is, a character that is not a printing character).
This function can be implemented using
    iswctype (wc, wctype ("cntrl"))
It is declared in wctype.h.
Returns true if wc is a digit (e.g., ‘0’ through ‘9’). Please note that this function does not only return a nonzero value for decimal digits, but for all kinds of digits. A consequence is that code like the following will not work unconditionally for wide characters:
    n = 0;
    while (iswdigit (*wc))
      {
        n *= 10;
        n += *wc++ - L'0';
      }
This function can be implemented using
    iswctype (wc, wctype ("digit"))
It is declared in wctype.h.
Returns true if wc is a graphic character; that is, a character that has a glyph associated with it. The whitespace characters are not considered graphic.
This function can be implemented using
    iswctype (wc, wctype ("graph"))
It is declared in wctype.h.
Returns true if wc is a lower-case letter. The letter need not be from the Latin alphabet; any representable alphabet is valid.
This function can be implemented using
    iswctype (wc, wctype ("lower"))
It is declared in wctype.h.
Returns true if wc is a printing character. Printing characters include all the graphic characters, plus the space (‘ ’) character.
This function can be implemented using
    iswctype (wc, wctype ("print"))
It is declared in wctype.h.
Returns true if wc is a punctuation character. This means any printing character that is not alphanumeric or a space character.
This function can be implemented using
    iswctype (wc, wctype ("punct"))
It is declared in wctype.h.
Returns true if wc is a whitespace character. In the standard "C" locale, iswspace returns true for only the standard whitespace characters:
L' '
- space
L'\f'
- formfeed
L'\n'
- newline
L'\r'
- carriage return
L'\t'
- horizontal tab
L'\v'
- vertical tab
This function can be implemented using
    iswctype (wc, wctype ("space"))
It is declared in wctype.h.
Returns true if wc is an upper-case letter. The letter need not be from the Latin alphabet; any representable alphabet is valid.
This function can be implemented using
    iswctype (wc, wctype ("upper"))
It is declared in wctype.h.
Returns true if wc is a hexadecimal digit. Hexadecimal digits include the normal decimal digits ‘0’ through ‘9’ and the letters ‘A’ through ‘F’ and ‘a’ through ‘f’.
This function can be implemented using
    iswctype (wc, wctype ("xdigit"))
It is declared in wctype.h.
The GNU C Library also provides a function which is not defined in the ISO C standard but which is available as a version for single byte characters as well.
Returns true if wc is a blank character; that is, a space or a tab. This function was originally a GNU extension, but was added in ISO C99. It is declared in wctype.h.
The first note is probably not astonishing but still occasionally a cause of problems. The iswXXX functions can be implemented using macros and in fact, the GNU C Library does this. They are still available as real functions but when the wctype.h header is included the macros will be used. This is the same as the char type versions of these functions.
The second note covers something new. It can be best illustrated by a (real-world) example. The first piece of code is an excerpt from the original code. It is truncated a bit but the intention should be clear.
    int
    is_in_class (int c, const char *class)
    {
      if (strcmp (class, "alnum") == 0)
        return isalnum (c);
      if (strcmp (class, "alpha") == 0)
        return isalpha (c);
      if (strcmp (class, "cntrl") == 0)
        return iscntrl (c);
      ...
      return 0;
    }
Now, with the wctype and iswctype functions you can avoid the if cascades, but rewriting the code as follows is wrong:
    int
    is_in_class (int c, const char *class)
    {
      wctype_t desc = wctype (class);
      return desc ? iswctype ((wint_t) c, desc) : 0;
    }
The problem is that it is not guaranteed that the wide character representation of a single-byte character can be found using casting. In fact, usually this fails miserably. The correct solution to this problem is to write the code as follows:
    int
    is_in_class (int c, const char *class)
    {
      wctype_t desc = wctype (class);
      return desc ? iswctype (btowc (c), desc) : 0;
    }
See Converting a Character, for more information on btowc. Note that this change probably does not improve the performance of the program a lot since the wctype function still has to make the string comparisons. It gets really interesting if the is_in_class function is called more than once for the same class name. In this case the variable desc could be computed once and reused for all the calls. Therefore the above form of the function is probably not the final one.
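One possible way to reuse the descriptor is sketched below. The function count_in_class is a hypothetical example, not part of the library: it looks up the class once with wctype and then tests every byte of a string against it, converting each byte with btowc. The sketch assumes a single-byte encoding such as the one used by the "C" locale; bytes for which btowc returns WEOF are simply skipped.

```c
#include <stddef.h>
#include <wchar.h>
#include <wctype.h>

/* Hypothetical helper: count the single-byte characters of S that
   belong to the class named CLASS_NAME in the current locale.  */
size_t
count_in_class (const char *s, const char *class_name)
{
  /* The string lookup happens only once, not once per character.  */
  wctype_t desc = wctype (class_name);
  size_t n = 0;

  if (desc == (wctype_t) 0)
    return 0;                   /* Unknown class in this locale.  */

  for (; *s != '\0'; s++)
    {
      wint_t wc = btowc ((unsigned char) *s);
      if (wc != WEOF && iswctype (wc, desc))
        n++;
    }
  return n;
}
```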
The mapping functions are also generalized by the ISO C standard. Instead of just allowing the two standard mappings, a locale can contain others. Again, the localedef program already supports generating such locale data files.
This data type is defined as a scalar type which can hold a value representing the locale-dependent character mapping. There is no way to construct such a value apart from using the return value of the wctrans function.
This type is defined in wctype.h.
The wctrans function has to be used to find out whether a named mapping is defined in the current locale selected for the LC_CTYPE category. If the returned value is non-zero, you can use it afterwards in calls to towctrans. If the return value is zero no such mapping is known in the current locale.
Besides locale-specific mappings there are two mappings which are guaranteed to be available in every locale:
"tolower"
"toupper"
This function is declared in wctype.h.
towctrans maps the input character wc according to the rules of the mapping for which desc is a descriptor, and returns the value it finds. desc must be obtained by a successful call to wctrans.
This function is declared in wctype.h.
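As a brief sketch of how wctrans and towctrans fit together, the hypothetical function map_wide below (not part of the library) applies a named mapping to a wide character and returns the character unchanged if the current locale does not know the mapping.

```c
#include <wchar.h>
#include <wctype.h>

/* Hypothetical helper: apply the mapping named MAPPING to WC, or
   return WC unchanged if no such mapping exists in the current
   locale.  */
wint_t
map_wide (wint_t wc, const char *mapping)
{
  wctrans_t desc = wctrans (mapping);
  /* A zero descriptor means the mapping is unknown.  */
  return desc != (wctrans_t) 0 ? towctrans (wc, desc) : wc;
}
```

With the two mappings that are guaranteed to exist, map_wide (wc, "toupper") behaves like towupper (wc).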
For the generally available mappings, the ISO C standard defines convenient shortcuts so that it is not necessary to call wctrans for them.
If wc is an upper-case letter, towlower returns the corresponding lower-case letter. If wc is not an upper-case letter, wc is returned unchanged.
towlower can be implemented using
    towctrans (wc, wctrans ("tolower"))
This function is declared in wctype.h.
If wc is a lower-case letter, towupper returns the corresponding upper-case letter. Otherwise wc is returned unchanged.
towupper can be implemented using
    towctrans (wc, wctrans ("toupper"))
This function is declared in wctype.h.
The same warnings given in the last section for the use of the wide character classification functions apply here. It is not possible to simply cast a char type value to a wint_t and use it as an argument to towctrans calls.
Operations on strings (or arrays of characters) are an important part of many programs. The GNU C Library provides an extensive set of string utility functions, including functions for copying, concatenating, comparing, and searching strings. Many of these functions can also operate on arbitrary regions of storage; for example, the memcpy function can be used to copy the contents of any kind of array.
It's fairly common for beginning C programmers to “reinvent the wheel” by duplicating this functionality in their own code, but it pays to become familiar with the library functions and to make use of them, since this offers benefits in maintenance, efficiency, and portability.
For instance, you could easily compare one string to another in two lines of C code, but if you use the built-in strcmp function, you're less likely to make a mistake. And, since these library functions are typically highly optimized, your program may run faster too.
This section is a quick summary of string concepts for beginning C programmers. It describes how character strings are represented in C and some common pitfalls. If you are already familiar with this material, you can skip this section.
A string is an array of char objects. But string-valued variables are usually declared to be pointers of type char *. Such variables do not include space for the text of a string; that has to be stored somewhere else: in an array variable, a string constant, or dynamically allocated memory (see Memory Allocation). It's up to you to store the address of the chosen memory space into the pointer variable. Alternatively you can store a null pointer in the pointer variable. The null pointer does not point anywhere, so attempting to reference the string it points to gets an error.
The term “string” normally refers to multibyte character strings as opposed to wide character strings. Wide character strings are arrays of type wchar_t and, as for multibyte character strings, usually pointers of type wchar_t * are used.
By convention, a null character, '\0', marks the end of a multibyte character string and the null wide character, L'\0', marks the end of a wide character string. For example, in testing to see whether the char * variable p points to a null character marking the end of a string, you can write !*p or *p == '\0'.
A null character is quite different conceptually from a null pointer, although both are represented by the integer 0.
String literals appear in C program source as strings of characters between double-quote characters (‘"’). Wide string literals are written the same way, except that the initial double-quote character is immediately preceded by a capital ‘L’ (ell) character (as in L"foo"). In ISO C, string literals can also be formed by string concatenation: "a" "b" is the same as "ab". For wide character strings one can either use L"a" L"b" or L"a" "b". Modification of string literals is not allowed by the GNU C compiler, because literals are placed in read-only storage.
Character arrays that are declared const cannot be modified either. It's generally good style to declare non-modifiable string pointers to be of type const char *, since this often allows the C compiler to detect accidental modifications as well as providing some amount of documentation about what your program intends to do with the string.
The amount of memory allocated for the character array may extend past the null character that normally marks the end of the string. In this document, the term allocated size is always used to refer to the total amount of memory allocated for the string, while the term length refers to the number of characters up to (but not including) the terminating null character.
A notorious source of program bugs is trying to put more characters in a string than fit in its allocated size. When writing code that extends strings or moves characters into a pre-allocated array, you should be very careful to keep track of the length of the text and make explicit checks for overflowing the array. Many of the library functions do not do this for you! Remember also that you need to allocate an extra byte to hold the null character that marks the end of the string.
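The kind of explicit check meant here can be sketched as follows. The function copy_checked is a hypothetical helper, not part of the library; it refuses to copy when the length of the source string plus the terminating null character exceeds the allocated size of the destination.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: copy SRC into DST, which has DST_SIZE bytes
   allocated.  Return 0 on success, or -1 (leaving DST untouched) if
   the string does not fit.  */
int
copy_checked (char *dst, size_t dst_size, const char *src)
{
  size_t len = strlen (src);
  /* One extra byte is needed for the terminating null character.  */
  if (len + 1 > dst_size)
    return -1;
  memcpy (dst, src, len + 1);
  return 0;
}
```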
Originally strings were sequences of bytes where each byte represents a single character. This is still true today if the strings are encoded using a single-byte character encoding. Things are different if the strings are encoded using a multibyte encoding (for more information on encodings see Extended Char Intro). There is no difference in the programming interface for these two kinds of strings; the programmer has to be aware of this and interpret the byte sequences accordingly.
But since there is no separate interface taking care of these differences, the byte-based string functions are sometimes hard to use. Since the count parameters of these functions specify bytes, a call to strncpy could cut a multibyte character in the middle and put an incomplete (and therefore unusable) byte sequence in the target buffer.
To avoid these problems, later versions of the ISO C standard introduce a second set of functions which operate on wide characters (see Extended Char Intro). These functions don't have the problems the single-byte versions have since every wide character is a legal, interpretable value. This does not mean that cutting wide character strings at arbitrary points is without problems. It normally is for alphabet-based languages (except for non-normalized text) but languages based on syllables still have the problem that more than one wide character is necessary to complete a logical unit. This is a higher level problem which the C library functions are not designed to solve. But it is at least good that no invalid byte sequences can be created. Also, higher level functions can operate much more easily on wide characters than on multibyte characters, so the general advice is to use wide characters internally whenever text is more than simply copied.
The remainder of this chapter will discuss the functions for handling wide character strings in parallel with the discussion of the multibyte character strings since there is almost always an exact equivalent available.
This chapter describes both functions that work on arbitrary arrays or blocks of memory, and functions that are specific to null-terminated arrays of characters and wide characters.
Functions that operate on arbitrary blocks of memory have names beginning with ‘mem’ and ‘wmem’ (such as memcpy and wmemcpy) and invariably take an argument which specifies the size (in bytes and wide characters respectively) of the block of memory to operate on. The array arguments and return values for these functions have type void * or wchar_t *. As a matter of style, the elements of the arrays used with the ‘mem’ functions are referred to as “bytes”. You can pass any kind of pointer to these functions, and the sizeof operator is useful in computing the value for the size argument. Parameters to the ‘wmem’ functions must be of type wchar_t *. These functions are not really usable with anything but arrays of this type.
In contrast, functions that operate specifically on strings and wide character strings have names beginning with ‘str’ and ‘wcs’ respectively (such as strcpy and wcscpy) and look for a null character to terminate the string instead of requiring an explicit size argument to be passed. (Some of these functions accept a specified maximum length, but they also check for premature termination with a null character.) The array arguments and return values for these functions have type char * and wchar_t * respectively, and the array elements are referred to as “characters” and “wide characters”.
In many cases, there are both ‘mem’ and ‘str’/‘wcs’ versions of a function. The one that is more appropriate to use depends on the exact situation. When your program is manipulating arbitrary arrays or blocks of storage, then you should always use the ‘mem’ functions. On the other hand, when you are manipulating null-terminated strings it is usually more convenient to use the ‘str’/‘wcs’ functions, unless you already know the length of the string in advance. The ‘wmem’ functions should be used for wide character arrays with known size.
Some of the memory and string functions take single characters as arguments. Since a value of type char is automatically promoted into a value of type int when used as a parameter, the functions are declared with int as the type of the parameter in question. In the case of the wide character functions the situation is similar: the parameter type for a single wide character is wint_t and not wchar_t. This would for many implementations not be necessary since wchar_t is large enough to not be automatically promoted, but since the ISO C standard does not require such a choice of types the wint_t type is used.
You can get the length of a string using the strlen function. This function is declared in the header file string.h.
The strlen function returns the length of the null-terminated string s in bytes. (In other words, it returns the offset of the terminating null character within the array.)
For example,
    strlen ("hello, world")
        ⇒ 12
When applied to a character array, the strlen function returns the length of the string stored there, not its allocated size. You can get the allocated size of the character array that holds a string using the sizeof operator:
    char string[32] = "hello, world";
    sizeof (string)
        ⇒ 32
    strlen (string)
        ⇒ 12
But beware, this will not work unless string is the character array itself, not a pointer to it. For example:
    char string[32] = "hello, world";
    char *ptr = string;
    sizeof (string)
        ⇒ 32
    sizeof (ptr)
        ⇒ 4  /* (on a machine with 4 byte pointers) */
This is an easy mistake to make when you are working with functions that take string arguments; those arguments are always pointers, not arrays.
It must also be noted that for multibyte encoded strings the return value does not have to correspond to the number of characters in the string. To get this value the string can be converted to wide characters and wcslen can be used, or something like the following code can be used:
    /* The input is in string.
       The length is expected in n.  */
    {
      mbstate_t t;
      const char *scopy = string;
      /* In initial state.  */
      memset (&t, '\0', sizeof (t));
      /* Determine number of characters.  */
      n = mbsrtowcs (NULL, &scopy, strlen (scopy), &t);
    }
This is cumbersome to do, so if the number of characters (as opposed to bytes) is needed often it is better to work with wide characters.
The wide character equivalent is declared in wchar.h.
The wcslen function is the wide character equivalent to strlen. The return value is the number of wide characters in the wide character string pointed to by ws (this is also the offset of the terminating null wide character of ws).
Since there are no multi wide character sequences making up one character the return value is not only the offset in the array, it is also the number of wide characters.
This function was introduced in Amendment 1 to ISO C90.
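Because wcslen counts wide characters rather than bytes, the storage needed for a copy of a wide character string has to be scaled by the size of wchar_t. The following sketch illustrates this; wide_storage_bytes is a hypothetical helper, not part of the library.

```c
#include <stddef.h>
#include <wchar.h>

/* Hypothetical helper: number of bytes needed to store a copy of WS,
   including the terminating null wide character.  */
size_t
wide_storage_bytes (const wchar_t *ws)
{
  return (wcslen (ws) + 1) * sizeof (wchar_t);
}
```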
The strnlen function returns the length of the string s in bytes if this length is smaller than maxlen bytes. Otherwise it returns maxlen. Therefore this function is equivalent to (strlen (s) < maxlen ? strlen (s) : maxlen) but it is more efficient and works even if the string s is not null-terminated.
    char string[32] = "hello, world";
    strnlen (string, 32)
        ⇒ 12
    strnlen (string, 5)
        ⇒ 5
This function is a GNU extension and is declared in string.h.
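The case where the array is not null-terminated is where strnlen is most useful. The sketch below assumes a fixed-size record field that is only null-terminated when the text is shorter than the field; NAME_FIELD and field_length are hypothetical names chosen for the illustration. Since the function is described here as a GNU extension, the sketch requests it with _GNU_SOURCE.

```c
#ifndef _GNU_SOURCE
# define _GNU_SOURCE 1
#endif
#include <stddef.h>
#include <string.h>

/* Hypothetical field width of a fixed-size record.  */
#define NAME_FIELD 8

/* Hypothetical helper: length of the text stored in FIELD, which holds
   NAME_FIELD bytes and has no terminating null character when the
   text fills the whole field.  */
size_t
field_length (const char field[NAME_FIELD])
{
  /* strnlen never reads past the first NAME_FIELD bytes.  */
  return strnlen (field, NAME_FIELD);
}
```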
wcsnlen is the wide character equivalent to strnlen. The maxlen parameter specifies the maximum number of wide characters.
This function is a GNU extension and is declared in wchar.h.
You can use the functions described in this section to copy the contents of strings and arrays, or to append the contents of one string to another. The ‘str’ and ‘mem’ functions are declared in the header file string.h while the ‘wcs’ and ‘wmem’ functions are declared in the file wchar.h.
A helpful way to remember the ordering of the arguments to the functions in this section is that it corresponds to an assignment expression, with the destination array specified to the left of the source array. Most of these functions return the address of the destination array.
Most of these functions do not work properly if the source and destination arrays overlap. For example, if the beginning of the destination array overlaps the end of the source array, the original contents of that part of the source array may get overwritten before it is copied. Even worse, in the case of the string functions, the null character marking the end of the string may be lost, and the copy function might get stuck in a loop trashing all the memory allocated to your program.
All functions that have problems copying between overlapping arrays are explicitly identified in this manual. In addition to functions in this section, there are a few others like sprintf (see Formatted Output Functions) and scanf (see Formatted Input Functions).
The memcpy function copies size bytes from the object beginning at from into the object beginning at to. The behavior of this function is undefined if the two arrays to and from overlap; use memmove instead if overlapping is possible.
The value returned by memcpy is the value of to.
Here is an example of how you might use memcpy to copy the contents of an array:
    struct foo *oldarray, *newarray;
    int arraysize;
    ...
    memcpy (newarray, oldarray, arraysize * sizeof (struct foo));
The wmemcpy function copies size wide characters from the object beginning at wfrom into the object beginning at wto. The behavior of this function is undefined if the two arrays wto and wfrom overlap; use wmemmove instead if overlapping is possible.
The following is a possible implementation of wmemcpy but there are more optimizations possible.
    wchar_t *
    wmemcpy (wchar_t *restrict wto, const wchar_t *restrict wfrom,
             size_t size)
    {
      return (wchar_t *) memcpy (wto, wfrom, size * sizeof (wchar_t));
    }
The value returned by wmemcpy is the value of wto.
This function was introduced in Amendment 1 to ISO C90.
The mempcpy function is nearly identical to the memcpy function. It copies size bytes from the object beginning at from into the object pointed to by to. But instead of returning the value of to it returns a pointer to the byte following the last written byte in the object beginning at to. I.e., the value is ((void *) ((char *) to + size)).
This function is useful in situations where a number of objects shall be copied to consecutive memory positions.
    void *
    combine (void *o1, size_t s1, void *o2, size_t s2)
    {
      void *result = malloc (s1 + s2);
      if (result != NULL)
        mempcpy (mempcpy (result, o1, s1), o2, s2);
      return result;
    }
This function is a GNU extension.
The wmempcpy function is nearly identical to the wmemcpy function. It copies size wide characters from the object beginning at wfrom into the object pointed to by wto. But instead of returning the value of wto it returns a pointer to the wide character following the last written wide character in the object beginning at wto. I.e., the value is wto + size.
This function is useful in situations where a number of objects shall be copied to consecutive memory positions.
The following is a possible implementation of wmempcpy but there are more optimizations possible.
    wchar_t *
    wmempcpy (wchar_t *restrict wto, const wchar_t *restrict wfrom,
              size_t size)
    {
      return (wchar_t *) mempcpy (wto, wfrom, size * sizeof (wchar_t));
    }
This function is a GNU extension.
memmove copies the size bytes at from into the size bytes at to, even if those two blocks of space overlap. In the case of overlap, memmove is careful to copy the original values of the bytes in the block at from, including those bytes which also belong to the block at to.
The value returned by memmove is the value of to.
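As a minimal sketch of the overlapping case, the following helper removes the first n bytes of a string by sliding the remainder, including the terminating null byte, to the front of the same buffer. The name delete_prefix is hypothetical and not part of the library. Source and destination overlap here, so memmove rather than memcpy is required.

```c
#include <string.h>

/* Hypothetical helper: drop the first N bytes of the string S in place.
   N must not exceed strlen (S).  */
char *
delete_prefix (char *s, size_t n)
{
  /* Move the tail, plus its terminating null byte, to the front.
     The two regions overlap whenever the tail is longer than N.  */
  memmove (s, s + n, strlen (s + n) + 1);
  return s;
}
```

Calling delete_prefix on a buffer holding "hello, world" with n equal to 7 leaves "world" in the buffer.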
wmemmove copies the size wide characters at wfrom into the size wide characters at wto, even if those two blocks of space overlap. In the case of overlap, wmemmove is careful to copy the original values of the wide characters in the block at wfrom, including those wide characters which also belong to the block at wto.
The following is a possible implementation of wmemmove, but there are more optimizations possible.
    wchar_t *
    wmemmove (wchar_t *wto, const wchar_t *wfrom, size_t size)
    {
      return (wchar_t *) memmove (wto, wfrom, size * sizeof (wchar_t));
    }
The value returned by wmemmove is the value of wto.
This function was introduced in Amendment 1 to ISO C90.
This function copies no more than size bytes from from to to, stopping if a byte matching c is found. The return value is a pointer into to one byte past where c was copied, or a null pointer if no byte matching c appeared in the first size bytes of from.
This function copies the value of c (converted to an unsigned char) into each of the first size bytes of the object beginning at block. It returns the value of block.
This function copies the value of wc into each of the first size wide characters of the object beginning at block. It returns the value of block.
This copies characters from the string from (up to and including the terminating null character) into the string to. Like memcpy, this function has undefined results if the strings overlap. The return value is the value of to.
This copies wide characters from the string wfrom (up to and including the terminating null wide character) into the string wto. Like wmemcpy, this function has undefined results if the strings overlap. The return value is the value of wto.
This function is similar to strcpy but always copies exactly size characters into to.
If the length of from is more than size, then strncpy copies just the first size characters. Note that in this case there is no null terminator written into to.
If the length of from is less than size, then strncpy copies all of from, followed by enough null characters to add up to size characters in all. This behavior is rarely useful, but it is specified by the ISO C standard.
The behavior of strncpy is undefined if the strings overlap.
Using strncpy as opposed to strcpy is a way to avoid bugs relating to writing past the end of the allocated space for to. However, it can also make your program much slower in one common case: copying a string which is probably small into a potentially large buffer. In this case, size may be large, and when it is, strncpy will waste a considerable amount of time copying null characters.
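Because strncpy writes no terminator when the source is at least size characters long, callers that need a proper string usually terminate the buffer themselves. The following is a minimal sketch of that idiom; the wrapper name copy_truncated is hypothetical.

```c
#include <string.h>

/* Hypothetical wrapper: copy FROM into the SIZE-byte buffer TO and
   guarantee null termination, truncating if necessary.  SIZE must be
   at least 1.  */
char *
copy_truncated (char *to, const char *from, size_t size)
{
  /* Leave the last byte for the terminator strncpy may not write.  */
  strncpy (to, from, size - 1);
  to[size - 1] = '\0';
  return to;
}
```

With a four-byte buffer, copying "hello" this way yields "hel". The padding cost described above still applies when size is large.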
This function is similar to wcscpy but always copies exactly size wide characters into wto.
If the length of wfrom is more than size, then wcsncpy copies just the first size wide characters. Note that in this case there is no null terminator written into wto.
If the length of wfrom is less than size, then wcsncpy copies all of wfrom, followed by enough null wide characters to add up to size wide characters in all. This behavior is rarely useful, but it is specified by the ISO C standard.
The behavior of wcsncpy is undefined if the strings overlap.
Using wcsncpy as opposed to wcscpy is a way to avoid bugs relating to writing past the end of the allocated space for wto. However, it can also make your program much slower in one common case: copying a string which is probably small into a potentially large buffer. In this case, size may be large, and when it is, wcsncpy will waste a considerable amount of time copying null wide characters.
This function copies the null-terminated string s into a newly allocated string. The string is allocated using malloc; see Unconstrained Allocation. If malloc cannot allocate space for the new string, strdup returns a null pointer. Otherwise it returns a pointer to the new string.
This function copies the null-terminated wide character string ws into a newly allocated string. The string is allocated using malloc; see Unconstrained Allocation. If malloc cannot allocate space for the new string, wcsdup returns a null pointer. Otherwise it returns a pointer to the new wide character string.
This function is a GNU extension.
This function is similar to strdup but always copies at most size characters into the newly allocated string.
If the length of s is more than size, then strndup copies just the first size characters and adds a closing null terminator. Otherwise all characters are copied and the string is terminated.
This function differs from strncpy in that it always terminates the destination string.
strndup is a GNU extension.
This function is like strcpy, except that it returns a pointer to the end of the string to (that is, the address of the terminating null character to + strlen (from)) rather than the beginning.
For example, this program uses stpcpy to concatenate ‘foo’ and ‘bar’ to produce ‘foobar’, which it then prints.
    #include <string.h>
    #include <stdio.h>
    int
    main (void)
    {
      char buffer[10];
      char *to = buffer;
      to = stpcpy (to, "foo");
      to = stpcpy (to, "bar");
      puts (buffer);
      return 0;
    }
This function is not part of the ISO or POSIX standards, and is not customary on Unix systems, but we did not invent it either. Perhaps it comes from MS-DOG.
Its behavior is undefined if the strings overlap. The function is declared in string.h.
This function is like wcscpy, except that it returns a pointer to the end of the string wto (that is, the address of the terminating null wide character wto + wcslen (wfrom)) rather than the beginning.
This function is not part of ISO or POSIX but was found useful while developing the GNU C Library itself.
The behavior of wcpcpy is undefined if the strings overlap.
wcpcpy is a GNU extension and is declared in wchar.h.
This function is similar to stpcpy but always copies exactly size characters into to.
If the length of from is more than size, then stpncpy copies just the first size characters and returns a pointer to the character directly following the one which was copied last. Note that in this case there is no null terminator written into to.
If the length of from is less than size, then stpncpy copies all of from, followed by enough null characters to add up to size characters in all. This behavior is rarely useful, but it is implemented to be useful in contexts where this behavior of strncpy is used. stpncpy returns a pointer to the first written null character.
This function is not part of ISO or POSIX but was found useful while developing the GNU C Library itself.
Its behavior is undefined if the strings overlap. The function is declared in string.h.
This function is similar to wcpcpy but always copies exactly size wide characters into wto.
If the length of wfrom is more than size, then wcpncpy copies just the first size wide characters and returns a pointer to the wide character directly following the last non-null wide character which was copied. Note that in this case there is no null terminator written into wto.
If the length of wfrom is less than size, then wcpncpy copies all of wfrom, followed by enough null wide characters to add up to size wide characters in all. This behavior is rarely useful, but it is implemented to be useful in contexts where this behavior of wcsncpy is used. wcpncpy returns a pointer to the first written null wide character.
This function is not part of ISO or POSIX but was found useful while developing the GNU C Library itself.
Its behavior is undefined if the strings overlap.
wcpncpy is a GNU extension and is declared in wchar.h.
This macro is similar to strdup but allocates the new string using alloca instead of malloc (see Variable Size Automatic). This means of course the returned string has the same limitations as any block of memory allocated using alloca.
For obvious reasons strdupa is implemented only as a macro; you cannot get the address of this function. Despite this limitation it is a useful function. The following code shows a situation where using malloc would be a lot more expensive.
    #include <paths.h>
    #include <string.h>
    #include <stdio.h>
    const char path[] = _PATH_STDPATH;
    int
    main (void)
    {
      char *wr_path = strdupa (path);
      char *cp = strtok (wr_path, ":");
      while (cp != NULL)
        {
          puts (cp);
          cp = strtok (NULL, ":");
        }
      return 0;
    }
Please note that calling strtok using path directly is invalid. It is also not allowed to call strdupa in the argument list of strtok, since strdupa uses alloca (see Variable Size Automatic), which can interfere with the parameter passing.
This function is only available if GNU CC is used.
This function is similar to strndup but like strdupa it allocates the new string using alloca; see Variable Size Automatic. The same advantages and limitations of strdupa are valid for strndupa, too.
This function is implemented only as a macro, just like strdupa. Just as strdupa, this macro also must not be used inside the parameter list in a function call.
strndupa is only available if GNU CC is used.
The strcat function is similar to strcpy, except that the characters from from are concatenated or appended to the end of to, instead of overwriting it. That is, the first character from from overwrites the null character marking the end of to.
An equivalent definition for strcat would be:
    char *
    strcat (char *restrict to, const char *restrict from)
    {
      strcpy (to + strlen (to), from);
      return to;
    }
This function has undefined results if the strings overlap.
The wcscat function is similar to wcscpy, except that the characters from wfrom are concatenated or appended to the end of wto, instead of overwriting it. That is, the first character from wfrom overwrites the null character marking the end of wto.
An equivalent definition for wcscat would be:
    wchar_t *
    wcscat (wchar_t *wto, const wchar_t *wfrom)
    {
      wcscpy (wto + wcslen (wto), wfrom);
      return wto;
    }
This function has undefined results if the strings overlap.
Programmers using the strcat or wcscat function (or the following strncat or wcsncat functions for that matter) can easily be recognized as lazy and reckless. In almost all situations the lengths of the participating strings are known (they had better be, since how can one otherwise ensure the allocated size of the buffer is sufficient?). Or at least, one could know them if one keeps track of the results of the various function calls. But then it is very inefficient to use strcat/wcscat. A lot of time is wasted finding the end of the destination string so that the actual copying can start. This is a common example:
    /* This function concatenates arbitrarily many strings. The last parameter must be NULL. */
    char *
    concat (const char *str, ...)
    {
      va_list ap, ap2;
      size_t total = 1;
      const char *s;
      char *result;
      va_start (ap, str);
      /* Actually va_copy, but this is the name more gcc versions understand. */
      __va_copy (ap2, ap);
      /* Determine how much space we need. */
      for (s = str; s != NULL; s = va_arg (ap, const char *))
        total += strlen (s);
      va_end (ap);
      result = (char *) malloc (total);
      if (result != NULL)
        {
          result[0] = '\0';
          /* Copy the strings. */
          for (s = str; s != NULL; s = va_arg (ap2, const char *))
            strcat (result, s);
        }
      va_end (ap2);
      return result;
    }
This looks quite simple, especially the second loop where the strings are actually copied. But these innocent lines hide a major performance penalty. Just imagine that ten strings of 100 bytes each have to be concatenated. For the second string we search the already stored 100 bytes for the end of the string so that we can append the next string. Over all ten strings, the comparisons needed to find the end of the intermediate results add up to about 4500 (100 + 200 + ... + 900)! If we combine the copying with the search for the allocation we can write this function more efficiently:
    char *
    concat (const char *str, ...)
    {
      va_list ap;
      size_t allocated = 100;
      char *result = (char *) malloc (allocated);
      if (result != NULL)
        {
          char *newp;
          char *wp;
          const char *s;
          va_start (ap, str);
          wp = result;
          for (s = str; s != NULL; s = va_arg (ap, const char *))
            {
              size_t len = strlen (s);
              /* Resize the allocated memory if necessary. */
              if (wp + len + 1 > result + allocated)
                {
                  allocated = (allocated + len) * 2;
                  newp = (char *) realloc (result, allocated);
                  if (newp == NULL)
                    {
                      free (result);
                      /* Finish the argument traversal before bailing out. */
                      va_end (ap);
                      return NULL;
                    }
                  wp = newp + (wp - result);
                  result = newp;
                }
              wp = mempcpy (wp, s, len);
            }
          /* Terminate the result string. */
          *wp++ = '\0';
          /* Resize memory to the optimal size. */
          newp = realloc (result, wp - result);
          if (newp != NULL)
            result = newp;
          va_end (ap);
        }
      return result;
    }
With a bit more knowledge about the input strings one could fine-tune the memory allocation. The difference we are pointing to here is that we don't use strcat anymore. We always keep track of the length of the current intermediate result so we can save ourselves the search for the end of the string and use mempcpy. Please note that we also don't use stpcpy, which might seem more natural since we are dealing with strings. But this is not necessary since we already know the length of the string and therefore can use the faster memory copying function. The example would work for wide characters the same way.
Whenever a programmer feels the need to use strcat she or he should think twice and look through the program to see whether the code can be rewritten to take advantage of already calculated results. Again: it is almost always unnecessary to use strcat.
This function is like strcat except that not more than size characters from from are appended to the end of to. A single null character is also always appended to to, so the total allocated size of to must be at least size + 1 bytes longer than its initial length.
The strncat function could be implemented like this:
    char *
    strncat (char *to, const char *from, size_t size)
    {
      to[strlen (to) + size] = '\0';
      strncpy (to + strlen (to), from, size);
      return to;
    }
The behavior of strncat is undefined if the strings overlap.
This function is like wcscat except that not more than size wide characters from wfrom are appended to the end of wto. A single null wide character is also always appended to wto, so the total allocated size of wto must be at least size + 1 wide characters longer than its initial length.
The wcsncat function could be implemented like this:
    wchar_t *
    wcsncat (wchar_t *restrict wto, const wchar_t *restrict wfrom, size_t size)
    {
      wto[wcslen (wto) + size] = L'\0';
      wcsncpy (wto + wcslen (wto), wfrom, size);
      return wto;
    }
The behavior of wcsncat is undefined if the strings overlap.
Here is an example showing the use of strncpy and strncat (the wide character version is equivalent). Notice how, in the call to strncat, the size parameter is computed to avoid overflowing the character array buffer.
    #include <string.h>
    #include <stdio.h>
    #define SIZE 10
    static char buffer[SIZE];
    int
    main (void)
    {
      strncpy (buffer, "hello", SIZE);
      puts (buffer);
      strncat (buffer, ", world", SIZE - strlen (buffer) - 1);
      puts (buffer);
    }
The output produced by this program looks like:
    hello
    hello, wo
This is a partially obsolete alternative for memmove, derived from BSD. Note that it is not quite equivalent to memmove, because the arguments are not in the same order and there is no return value.
This is a partially obsolete alternative for memset, derived from BSD. Note that it is not as general as memset, because the only value it can store is zero.
You can use the functions in this section to perform comparisons on the contents of strings and arrays. As well as checking for equality, these functions can also be used as the ordering functions for sorting operations. See Searching and Sorting, for an example of this.
Unlike most comparison operations in C, the string comparison functions return a nonzero value if the strings are not equivalent rather than if they are. The sign of the value indicates the relative ordering of the first characters in the strings that are not equivalent: a negative value indicates that the first string is “less” than the second, while a positive value indicates that the first string is “greater”.
The most common use of these functions is to check only for equality. This is canonically done with an expression like ‘! strcmp (s1, s2)’.
All of these functions are declared in the header file string.h.
The function memcmp compares the size bytes of memory beginning at a1 against the size bytes of memory beginning at a2. The value returned has the same sign as the difference between the first differing pair of bytes (interpreted as unsigned char objects, then promoted to int).
If the contents of the two blocks are equal, memcmp returns 0.
The function wmemcmp compares the size wide characters beginning at a1 against the size wide characters beginning at a2. The value returned is smaller than or larger than zero depending on whether the first differing wide character in a1 is smaller or larger than the corresponding character in a2.
If the contents of the two blocks are equal, wmemcmp returns 0.
On arbitrary arrays, the memcmp function is mostly useful for testing equality. It usually isn't meaningful to do byte-wise ordering comparisons on arrays of things other than bytes. For example, a byte-wise comparison on the bytes that make up floating-point numbers isn't likely to tell you anything about the relationship between the values of the floating-point numbers.
wmemcmp is really only useful to compare arrays of type wchar_t since the function looks at sizeof (wchar_t) bytes at a time and this number of bytes is system dependent.
You should also be careful about using memcmp to compare objects that can contain “holes”, such as the padding inserted into structure objects to enforce alignment requirements, extra space at the end of unions, and extra characters at the ends of strings whose length is less than their allocated size. The contents of these “holes” are indeterminate and may cause strange behavior when performing byte-wise comparisons. For more predictable results, perform an explicit component-wise comparison.
For example, given a structure type definition like:
    struct foo
    {
      unsigned char tag;
      union
      {
        double f;
        long i;
        char *p;
      } value;
    };
you are better off writing a specialized comparison function to compare struct foo objects instead of comparing them with memcmp.
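A specialized comparison function for such a structure might look like the following sketch. The text does not say what the tag member means, so the FOO_DOUBLE, FOO_LONG and FOO_STRING values, and the name compare_foo, are assumptions made purely for illustration: the tag is taken to select the active union member. Only the tag and that member are inspected, so padding bytes and inactive union members never influence the result.

```c
#include <string.h>

struct foo
{
  unsigned char tag;
  union
  {
    double f;
    long i;
    char *p;
  } value;
};

/* Assumed tag values; the original text does not define them.  */
enum { FOO_DOUBLE, FOO_LONG, FOO_STRING };

/* Return a negative, zero or positive value in the manner of memcmp,
   looking only at the tag and the union member it selects.  */
int
compare_foo (const struct foo *a, const struct foo *b)
{
  if (a->tag != b->tag)
    return a->tag < b->tag ? -1 : 1;

  switch (a->tag)
    {
    case FOO_DOUBLE:
      return (a->value.f > b->value.f) - (a->value.f < b->value.f);
    case FOO_LONG:
      return (a->value.i > b->value.i) - (a->value.i < b->value.i);
    case FOO_STRING:
      /* Compare the strings pointed to, not the pointer values.  */
      return strcmp (a->value.p, b->value.p);
    default:
      return 0;
    }
}
```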
The strcmp function compares the string s1 against s2, returning a value that has the same sign as the difference between the first differing pair of characters (interpreted as unsigned char objects, then promoted to int).
If the two strings are equal, strcmp returns 0.
A consequence of the ordering used by strcmp is that if s1 is an initial substring of s2, then s1 is considered to be “less than” s2.
strcmp does not take sorting conventions of the language the strings are written in into account. To get that one has to use strcoll.
The wcscmp function compares the wide character string ws1 against ws2. The value returned is smaller than or larger than zero depending on whether the first differing wide character in ws1 is smaller or larger than the corresponding character in ws2.
If the two strings are equal, wcscmp returns 0.
A consequence of the ordering used by wcscmp is that if ws1 is an initial substring of ws2, then ws1 is considered to be “less than” ws2.
wcscmp does not take sorting conventions of the language the strings are written in into account. To get that one has to use wcscoll.
This function is like strcmp, except that differences in case are ignored. How uppercase and lowercase characters are related is determined by the currently selected locale. In the standard "C" locale the characters Ä and ä do not match but in a locale which regards these characters as parts of the alphabet they do match.
strcasecmp is derived from BSD.
This function is like wcscmp, except that differences in case are ignored. How uppercase and lowercase characters are related is determined by the currently selected locale. In the standard "C" locale the characters Ä and ä do not match but in a locale which regards these characters as parts of the alphabet they do match.
wcscasecmp is a GNU extension.
This function is similar to strcmp, except that no more than size characters are compared. In other words, if the two strings are the same in their first size characters, the return value is zero.
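One common use of a length-limited comparison is a prefix test. The sketch below uses strncmp for this; the name has_prefix is hypothetical. The comparison stops early if string ends before prefix does, because the terminating null character then differs from the corresponding prefix character.

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical helper: true if STRING begins with PREFIX.  */
bool
has_prefix (const char *string, const char *prefix)
{
  return strncmp (string, prefix, strlen (prefix)) == 0;
}
```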
This function is similar to wcscmp, except that no more than size wide characters are compared. In other words, if the two strings are the same in their first size wide characters, the return value is zero.
This function is like strncmp, except that differences in case are ignored. Like strcasecmp, it is locale dependent how uppercase and lowercase characters are related.
strncasecmp is a GNU extension.
This function is like wcsncmp, except that differences in case are ignored. Like wcscasecmp, it is locale dependent how uppercase and lowercase characters are related.
wcsncasecmp is a GNU extension.
Here are some examples showing the use of strcmp and strncmp (equivalent examples can be constructed for the wide character functions). These examples assume the use of the ASCII character set. (If some other character set, say EBCDIC, is used instead, then the glyphs are associated with different numeric codes, and the return values and ordering may differ.)
    strcmp ("hello", "hello")
        ⇒ 0    /* These two strings are the same. */
    strcmp ("hello", "Hello")
        ⇒ 32   /* Comparisons are case-sensitive. */
    strcmp ("hello", "world")
        ⇒ -15  /* The character 'h' comes before 'w'. */
    strcmp ("hello", "hello, world")
        ⇒ -44  /* Comparing a null character against a comma. */
    strncmp ("hello", "hello, world", 5)
        ⇒ 0    /* The initial 5 characters are the same. */
    strncmp ("hello, world", "hello, stupid world!!!", 5)
        ⇒ 0    /* The initial 5 characters are the same. */
The strverscmp function compares the string s1 against s2, considering them as holding indices/version numbers. The return value follows the same conventions as found in the strcmp function. In fact, if s1 and s2 contain no digits, strverscmp behaves like strcmp.
Basically, we compare strings normally (character by character), until we find a digit in each string; then we enter a special comparison mode, where each sequence of digits is taken as a whole. If we reach the end of these two parts without noticing a difference, we return to the standard comparison mode. There are two types of numeric parts: "integral" and "fractional" (those begin with a '0'). The types of the numeric parts affect the way we sort them:
- integral/integral: we compare values as you would expect.
- fractional/integral: the fractional part is less than the integral one. Again, no surprise.
- fractional/fractional: the things become a bit more complex. If the common prefix contains only leading zeroes, the longest part is less than the other one; else the comparison behaves normally.
    strverscmp ("no digit", "no digit")
        ⇒ 0    /* same behavior as strcmp. */
    strverscmp ("item#99", "item#100")
        ⇒ <0   /* same prefix, but 99 < 100. */
    strverscmp ("alpha1", "alpha001")
        ⇒ >0   /* fractional part inferior to integral one. */
    strverscmp ("part1_f012", "part1_f01")
        ⇒ >0   /* two fractional parts. */
    strverscmp ("foo.009", "foo.0")
        ⇒ <0   /* idem, but with leading zeroes only. */
This function is especially useful when dealing with filename sorting, because filenames frequently hold indices/version numbers.
strverscmp is a GNU extension.
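As a sketch of the filename-sorting use, the following code wraps strverscmp in a comparison function for qsort. The names compare_versions and sort_by_version are hypothetical. Since strverscmp is a GNU extension, the example assumes _GNU_SOURCE is defined before any header is included.

```c
#define _GNU_SOURCE
#include <stdlib.h>
#include <string.h>

/* Comparison function for qsort: the elements are pointers to strings.  */
static int
compare_versions (const void *v1, const void *v2)
{
  const char *const *p1 = v1;
  const char *const *p2 = v2;
  return strverscmp (*p1, *p2);
}

/* Hypothetical helper: sort COUNT names so that embedded numbers are
   ordered by value rather than character by character.  */
void
sort_by_version (const char **names, size_t count)
{
  qsort (names, count, sizeof (const char *), compare_versions);
}
```

Sorting the names file10, file9 and file1 this way puts file9 before file10, whereas strcmp would place file10 first.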
This is an obsolete alias for memcmp, derived from BSD.
In some locales, the conventions for lexicographic ordering differ from the strict numeric ordering of character codes. For example, in Spanish most glyphs with diacritical marks such as accents are not considered distinct letters for the purposes of collation. On the other hand, the two-character sequence ‘ll’ is treated as a single letter that is collated immediately after ‘l’.
You can use the functions strcoll and strxfrm (declared in the header file string.h) and wcscoll and wcsxfrm (declared in the header file wchar.h) to compare strings using a collation ordering appropriate for the current locale. The locale used by these functions in particular can be specified by setting the locale for the LC_COLLATE category; see Locales. In the standard C locale, the collation sequence for strcoll is the same as that for strcmp. Similarly, wcscoll and wcscmp are the same in this situation.
Effectively, the way these functions work is by applying a mapping to transform the characters in a string to a byte sequence that represents the string's position in the collating sequence of the current locale. Comparing two such byte sequences in a simple fashion is equivalent to comparing the strings with the locale's collating sequence.
The functions strcoll and wcscoll perform this translation implicitly, in order to do one comparison. By contrast, strxfrm and wcsxfrm perform the mapping explicitly. If you are making multiple comparisons using the same string or set of strings, it is likely to be more efficient to use strxfrm or wcsxfrm to transform all the strings just once, and subsequently compare the transformed strings with strcmp or wcscmp.
The strcoll function is similar to strcmp but uses the collating sequence of the current locale for collation (the LC_COLLATE locale).
The wcscoll function is similar to wcscmp but uses the collating sequence of the current locale for collation (the LC_COLLATE locale).
Here is an example of sorting an array of strings, using strcoll to compare them. The actual sort algorithm is not written here; it comes from qsort (see Array Sort Function). The job of the code shown here is to say how to compare the strings while sorting them. (Later on in this section, we will show a way to do this more efficiently using strxfrm.)
    /* This is the comparison function used with qsort. */
    int
    compare_elements (const void *v1, const void *v2)
    {
      char * const *p1 = v1;
      char * const *p2 = v2;
      return strcoll (*p1, *p2);
    }
    /* This is the entry point---the function to sort strings using the locale's collating sequence. */
    void
    sort_strings (char **array, int nstrings)
    {
      /* Sort array by comparing the strings. */
      qsort (array, nstrings, sizeof (char *), compare_elements);
    }
The function strxfrm transforms the string from using the collation transformation determined by the locale currently selected for collation, and stores the transformed string in the array to. Up to size characters (including a terminating null character) are stored.
The behavior is undefined if the strings to and from overlap; see Copying and Concatenation.
The return value is the length of the entire transformed string. This value is not affected by the value of size, but if it is greater than or equal to size, it means that the transformed string did not entirely fit in the array to. In this case, only as much of the string as actually fits was stored. To get the whole transformed string, call strxfrm again with a bigger output array.
The transformed string may be longer than the original string, and it may also be shorter.
If size is zero, no characters are stored in to. In this case, strxfrm simply returns the number of characters that would be the length of the transformed string. This is useful for determining what size the allocated array should be. It does not matter what to is if size is zero; to may even be a null pointer.
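The zero-size call described above leads to a simple two-step idiom: ask for the length first, then allocate and transform. The following is a minimal sketch; the helper name xfrm_dup is hypothetical, and the caller is assumed to free the result.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: return a newly allocated copy of the collation
   transformation of S, or a null pointer if memory is exhausted.  */
char *
xfrm_dup (const char *s)
{
  /* With a size of zero nothing is stored, so TO may be a null pointer.  */
  size_t len = strxfrm (NULL, s, 0);

  /* One extra byte for the terminating null character.  */
  char *result = malloc (len + 1);
  if (result != NULL)
    strxfrm (result, s, len + 1);
  return result;
}
```

Two strings transformed this way can then be compared with strcmp, with the same result as comparing the originals with strcoll.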
The function wcsxfrm transforms the wide character string wfrom using the collation transformation determined by the locale currently selected for collation, and stores the transformed string in the array wto. Up to size wide characters (including a terminating null character) are stored.
The behavior is undefined if the strings wto and wfrom overlap; see Copying and Concatenation.
The return value is the length of the entire transformed wide character string. This value is not affected by the value of size, but if it is greater than or equal to size, it means that the transformed wide character string did not entirely fit in the array wto. In this case, only as much of the wide character string as actually fits was stored. To get the whole transformed wide character string, call wcsxfrm again with a bigger output array.
The transformed wide character string may be longer than the original wide character string, and it may also be shorter.
If size is zero, no characters are stored in wto. In this case, wcsxfrm simply returns the number of wide characters that would be the length of the transformed wide character string. This is useful for determining what size the allocated array should be (remember to multiply by sizeof (wchar_t)). It does not matter what wto is if size is zero; wto may even be a null pointer.
Here is an example of how you can use strxfrm when you plan to do many comparisons. It does the same thing as the previous example, but much faster, because it has to transform each string only once, no matter how many times it is compared with other strings. Even the time needed to allocate and free storage is much less than the time we save, when there are many strings.
    struct sorter { char *input; char *transformed; };
    /* This is the comparison function used with qsort to sort an array of struct sorter. */
    int
    compare_elements (const void *v1, const void *v2)
    {
      const struct sorter *p1 = v1;
      const struct sorter *p2 = v2;
      return strcmp (p1->transformed, p2->transformed);
    }
    /* This is the entry point---the function to sort strings using the locale's collating sequence. */
    void
    sort_strings_fast (char **array, int nstrings)
    {
      struct sorter temp_array[nstrings];
      int i;
      /* Set up temp_array. Each element contains one input string and its transformed string. */
      for (i = 0; i < nstrings; i++)
        {
          size_t length = strlen (array[i]) * 2;
          char *transformed;
          size_t transformed_length;
          temp_array[i].input = array[i];
          /* First try a buffer perhaps big enough. */
          transformed = (char *) xmalloc (length);
          /* Transform array[i]. */
          transformed_length = strxfrm (transformed, array[i], length);
          /* If the buffer was not large enough, resize it and try again. */
          if (transformed_length >= length)
            {
              /* Allocate the needed space. +1 for terminating NUL character. */
              transformed = (char *) xrealloc (transformed, transformed_length + 1);
              /* The return value is not interesting because we know how long the transformed string is. */
              (void) strxfrm (transformed, array[i], transformed_length + 1);
            }
          temp_array[i].transformed = transformed;
        }
      /* Sort temp_array by comparing transformed strings. */
      qsort (temp_array, nstrings, sizeof (struct sorter), compare_elements);
      /* Put the elements back in the permanent array in their sorted order. */
      for (i = 0; i < nstrings; i++)
        array[i] = temp_array[i].input;
      /* Free the strings we allocated. */
      for (i = 0; i < nstrings; i++)
        free (temp_array[i].transformed);
    }
The interesting part of this code for the wide character version would look like this:
    void
    sort_strings_fast (wchar_t **array, int nstrings)
    {
      ...
          /* Transform array[i]. */
          transformed_length = wcsxfrm (transformed, array[i], length);
          /* If the buffer was not large enough, resize it and try again. */
          if (transformed_length >= length)
            {
              /* Allocate the needed space. +1 for terminating NUL character. */
              transformed = (wchar_t *) xrealloc (transformed, (transformed_length + 1) * sizeof (wchar_t));
              /* The return value is not interesting because we know how long the transformed string is. */
              (void) wcsxfrm (transformed, array[i], transformed_length + 1);
            }
      ...
Note the additional multiplication by sizeof (wchar_t) in the xrealloc call.
Compatibility Note: The string collation functions are a new feature of ISO C90. Older C dialects have no equivalent feature. The wide character versions were introduced in Amendment 1 to ISO C90.
This section describes library functions which perform various kinds of searching operations on strings and arrays. These functions are declared in the header file string.h.
This function finds the first occurrence of the byte c (converted to an unsigned char) in the initial size bytes of the object beginning at block. The return value is a pointer to the located byte, or a null pointer if no match was found.
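The description above matches the memchr function. As a sketch under that assumption, the following helper uses it to find the end of the first line in a buffer of known length that need not be null-terminated. The name first_line_length is hypothetical.

```c
#include <string.h>

/* Hypothetical helper: length of the first line in the SIZE-byte buffer
   BLOCK, not counting the newline.  Returns SIZE if BLOCK contains no
   newline at all.  */
size_t
first_line_length (const char *block, size_t size)
{
  const char *nl = memchr (block, '\n', size);
  return nl != NULL ? (size_t) (nl - block) : size;
}
```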
The wmemchr function finds the first occurrence of the wide character wc in the initial size wide characters of the object beginning at block. The return value is a pointer to the located wide character, or a null pointer if no match was found.
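As an illustration of the byte-oriented search described above, the following sketch uses memchr to measure the first line of a buffer that may contain embedded null bytes. The helper name line_length is hypothetical and not part of the library.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: return the number of bytes that precede the
   first newline in BUF, or SIZE if the first SIZE bytes contain no
   newline.  Embedded null bytes are ordinary data for memchr.  */
size_t
line_length (const char *buf, size_t size)
{
  const char *newline = memchr (buf, '\n', size);
  return newline != NULL ? (size_t) (newline - buf) : size;
}
```

Because the search is bounded by size, the buffer does not need to be null-terminated.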
Often the memchr function is used with the knowledge that the byte c is available in the memory block specified by the parameters. But this means that the size parameter is not really needed and that the tests performed with it at runtime (to check whether the end of the block is reached) are not needed.

The rawmemchr function exists for just this situation, which is surprisingly frequent. The interface is similar to memchr except that the size parameter is missing. The function will look beyond the end of the block pointed to by block in case the programmer made an error in assuming that the byte c is present in the block. In this case the result is unspecified. Otherwise the return value is a pointer to the located byte.

This function is of special interest when looking for the end of a string. Since all strings are terminated by a null byte, a call like

    rawmemchr (str, '\0')

will never go beyond the end of the string.

This function is a GNU extension.
The function memrchr is like memchr, except that it searches backwards from the end of the block defined by block and size (instead of forwards from the front).

This function is a GNU extension.
The strchr function finds the first occurrence of the character c (converted to a char) in the null-terminated string beginning at string. The return value is a pointer to the located character, or a null pointer if no match was found.

For example,

    strchr ("hello, world", 'l')
        ⇒ "llo, world"
    strchr ("hello, world", '?')
        ⇒ NULL

The terminating null character is considered to be part of the string, so you can use this function to get a pointer to the end of a string by specifying a null character as the value of the c argument.
When strchr returns a null pointer, it does not let you know the position of the terminating null character it has found. If you need that information, it is better (but less portable) to use strchrnul than to search for it a second time.
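A common use of strchr is to split a string at a known separator. The sketch below returns the part of a "key=value" string that follows the first '='. The function name value_part is an assumption made for this illustration.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: return a pointer to the text after the first
   '=' in ASSIGNMENT, or a null pointer if there is no '='.  The
   result points into ASSIGNMENT; nothing is copied.  */
const char *
value_part (const char *assignment)
{
  const char *equals = strchr (assignment, '=');
  return equals != NULL ? equals + 1 : NULL;
}
```

The null-pointer check matters: strchr reports a missing character with a null pointer, so adding 1 to the result unconditionally would be an error.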
The wcschr function finds the first occurrence of the wide character wc in the null-terminated wide character string beginning at wstring. The return value is a pointer to the located wide character, or a null pointer if no match was found.

The terminating null character is considered to be part of the wide character string, so you can use this function to get a pointer to the end of a wide character string by specifying a null wide character as the value of the wc argument. It would be better (but less portable) to use wcschrnul in this case, though.
strchrnul is the same as strchr except that if it does not find the character, it returns a pointer to the string's terminating null character rather than a null pointer.

This function is a GNU extension.

wcschrnul is the same as wcschr except that if it does not find the wide character, it returns a pointer to the wide character string's terminating null wide character rather than a null pointer.

This function is a GNU extension.
One useful, but unusual, use of the strchr function is when one wants to have a pointer pointing to the NUL byte terminating a string. This is often written in this way:

    s += strlen (s);

This is almost optimal, but the addition operation duplicates a bit of the work already done in the strlen function. A better solution is this:

    s = strchr (s, '\0');

There is no restriction on the second parameter of strchr, so it could very well also be the NUL character. Those readers thinking very hard about this might now point out that the strchr function is more expensive than the strlen function since it has two termination criteria. This is right. But in the GNU C Library the implementation of strchr is optimized in a special way so that strchr actually is faster.
The function strrchr is like strchr, except that it searches backwards from the end of the string string (instead of forwards from the front).

For example,

    strrchr ("hello, world", 'l')
        ⇒ "ld"
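Searching from the end is convenient when only the last separator matters. The following sketch extracts a file name extension with strrchr. The helper file_extension is hypothetical, and it assumes its argument is a bare file name without directory components.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: return the text after the last '.' in NAME,
   or an empty string if NAME has no '.' or starts with its only '.'
   (as in ".profile").  NAME is assumed to contain no '/'.  */
const char *
file_extension (const char *name)
{
  const char *dot = strrchr (name, '.');
  return (dot != NULL && dot != name) ? dot + 1 : "";
}
```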
The function wcsrchr is like wcschr, except that it searches backwards from the end of the string wstring (instead of forwards from the front).
The strstr function is like strchr, except that it searches haystack for a substring needle rather than just a single character. It returns a pointer into the string haystack that is the first character of the substring, or a null pointer if no match was found. If needle is an empty string, the function returns haystack.

For example,

    strstr ("hello, world", "l")
        ⇒ "llo, world"
    strstr ("hello, world", "wo")
        ⇒ "world"
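Because strstr returns a pointer into haystack, the search can be resumed just past each match. The sketch below counts non-overlapping occurrences of a substring in this way. The name count_occurrences is an assumption for this illustration.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: count the non-overlapping occurrences of
   NEEDLE in HAYSTACK.  An empty NEEDLE is reported as zero matches,
   since strstr would otherwise match at every position.  */
size_t
count_occurrences (const char *haystack, const char *needle)
{
  size_t needle_len = strlen (needle);
  size_t count = 0;

  if (needle_len == 0)
    return 0;

  /* Resume each search just past the previous match.  */
  for (const char *p = strstr (haystack, needle);
       p != NULL;
       p = strstr (p + needle_len, needle))
    count++;

  return count;
}
```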
The wcsstr function is like wcschr, except that it searches haystack for a substring needle rather than just a single wide character. It returns a pointer into the string haystack that is the first wide character of the substring, or a null pointer if no match was found. If needle is an empty string, the function returns haystack.

wcswcs is a deprecated alias for wcsstr. This is the name originally used in the X/Open Portability Guide before Amendment 1 to ISO C90 was published.
The strcasestr function is like strstr, except that it ignores case in searching for the substring. As with strcasecmp, how uppercase and lowercase characters are related depends on the locale.

For example,

    strcasestr ("hello, world", "L")
        ⇒ "llo, world"
    strcasestr ("hello, World", "wo")
        ⇒ "World"
The memmem function is like strstr, but needle and haystack are byte arrays rather than null-terminated strings. needle-len is the length of needle and haystack-len is the length of haystack.

This function is a GNU extension.
The strspn (“string span”) function returns the length of the initial substring of string that consists entirely of characters that are members of the set specified by the string skipset. The order of the characters in skipset is not important.

For example,

    strspn ("hello, world", "abcdefghijklmnopqrstuvwxyz")
        ⇒ 5

Note that “character” is here used in the sense of byte. In a string using a multibyte character encoding, an (abstract) character consisting of more than one byte is not treated as an entity. Each byte is treated separately. The function is not locale-dependent.
The wcsspn (“wide character string span”) function returns the length of the initial substring of wstring that consists entirely of wide characters that are members of the set specified by the wide character string skipset. The order of the wide characters in skipset is not important.
The strcspn (“string complement span”) function returns the length of the initial substring of string that consists entirely of characters that are not members of the set specified by the string stopset. (In other words, it returns the offset of the first character in string that is a member of the set stopset.)

For example,

    strcspn ("hello, world", " \t\n,.;!?")
        ⇒ 5

Note that “character” is here used in the sense of byte. In a string using a multibyte character encoding, an (abstract) character consisting of more than one byte is not treated as an entity. Each byte is treated separately. The function is not locale-dependent.
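The two span functions complement each other: strspn skips over unwanted leading bytes and strcspn measures what follows. As a minimal sketch, the hypothetical helper first_word below locates the first whitespace-delimited word of a line without modifying it.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: find the first word in LINE, where words are
   separated by spaces, tabs, or newlines.  Returns a pointer to the
   start of the word and stores its length in *LEN.  If LINE holds no
   word, *LEN is 0 and the result points at the terminating null.  */
const char *
first_word (const char *line, size_t *len)
{
  static const char blanks[] = " \t\n";
  const char *start = line + strspn (line, blanks);  /* skip blanks */
  *len = strcspn (start, blanks);                    /* measure word */
  return start;
}
```

Unlike strtok, this approach leaves the input untouched, so it also works on string constants.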
The wcscspn (“wide character string complement span”) function returns the length of the initial substring of wstring that consists entirely of wide characters that are not members of the set specified by the wide character string stopset. (In other words, it returns the offset of the first wide character in wstring that is a member of the set stopset.)
The strpbrk (“string pointer break”) function is related to strcspn, except that it returns a pointer to the first character in string that is a member of the set stopset instead of the length of the initial substring. It returns a null pointer if no such character from stopset is found.

For example,

    strpbrk ("hello, world", " \t\n,.;!?")
        ⇒ ", world"

Note that “character” is here used in the sense of byte. In a string using a multibyte character encoding, an (abstract) character consisting of more than one byte is not treated as an entity. Each byte is treated separately. The function is not locale-dependent.
The wcspbrk (“wide character string pointer break”) function is related to wcscspn, except that it returns a pointer to the first wide character in wstring that is a member of the set stopset instead of the length of the initial substring. It returns a null pointer if no such wide character from stopset is found.
index is another name for strchr; they are exactly the same. New code should always use strchr since this name is defined in ISO C while index is a BSD invention which never was available on System V derived systems.

rindex is another name for strrchr; they are exactly the same. New code should always use strrchr since this name is defined in ISO C while rindex is a BSD invention which never was available on System V derived systems.
It's fairly common for programs to have a need to do some simple kinds of lexical analysis and parsing, such as splitting a command string up into tokens. You can do this with the strtok function, declared in the header file string.h.
A string can be split into tokens by making a series of calls to the function strtok.

The string to be split up is passed as the newstring argument on the first call only. The strtok function uses this to set up some internal state information. Subsequent calls to get additional tokens from the same string are indicated by passing a null pointer as the newstring argument. Calling strtok with another non-null newstring argument reinitializes the state information. It is guaranteed that no other library function ever calls strtok behind your back (which would mess up this internal state information).

The delimiters argument is a string that specifies a set of delimiters that may surround the token being extracted. All the initial characters that are members of this set are discarded. The first character that is not a member of this set of delimiters marks the beginning of the next token. The end of the token is found by looking for the next character that is a member of the delimiter set. This character in the original string newstring is overwritten by a null character, and the pointer to the beginning of the token in newstring is returned.

On the next call to strtok, the searching begins at the next character beyond the one that marked the end of the previous token. Note that the set of delimiters delimiters does not have to be the same on every call in a series of calls to strtok.

If the end of the string newstring is reached, or if the remainder of the string consists only of delimiter characters, strtok returns a null pointer.

Note that “character” is here used in the sense of byte. In a string using a multibyte character encoding, an (abstract) character consisting of more than one byte is not treated as an entity. Each byte is treated separately. The function is not locale-dependent.
A wide character string can be split into tokens by making a series of calls to the function wcstok.

The string to be split up is passed as the newstring argument on the first call only. The wcstok function uses this to set up some internal state information. Subsequent calls to get additional tokens from the same wide character string are indicated by passing a null pointer as the newstring argument. Calling wcstok with another non-null newstring argument reinitializes the state information. It is guaranteed that no other library function ever calls wcstok behind your back (which would mess up this internal state information).

The delimiters argument is a wide character string that specifies a set of delimiters that may surround the token being extracted. All the initial wide characters that are members of this set are discarded. The first wide character that is not a member of this set of delimiters marks the beginning of the next token. The end of the token is found by looking for the next wide character that is a member of the delimiter set. This wide character in the original wide character string newstring is overwritten by a null wide character, and the pointer to the beginning of the token in newstring is returned.

On the next call to wcstok, the searching begins at the next wide character beyond the one that marked the end of the previous token. Note that the set of delimiters delimiters does not have to be the same on every call in a series of calls to wcstok.

If the end of the wide character string newstring is reached, or if the remainder of the string consists only of delimiter wide characters, wcstok returns a null pointer.

The function is not locale-dependent.
Warning: Since strtok and wcstok alter the string they are parsing, you should always copy the string to a temporary buffer before parsing it with strtok/wcstok (see Copying and Concatenation). If you allow strtok or wcstok to modify a string that came from another part of your program, you are asking for trouble; that string might be used for other purposes after strtok or wcstok has modified it, and it would not have the expected value.

The string that you are operating on might even be a constant. Then when strtok or wcstok tries to modify it, your program will get a fatal signal for writing in read-only memory. See Program Error Signals. Even if the operation of strtok or wcstok would not require a modification of the string (e.g., if there is exactly one token), the string can (and in the GNU C Library case will) be modified.
This is a special case of a general principle: if a part of a program does not have as its purpose the modification of a certain data structure, then it is error-prone to modify the data structure temporarily.

The functions strtok and wcstok are not reentrant. See Nonreentrancy, for a discussion of where and why reentrancy is important.

Here is a simple example showing the use of strtok.
    #include <string.h>
    #include <stddef.h>

    ...

    const char string[] = "words separated by spaces -- and, punctuation!";
    const char delimiters[] = " .,;:!-";
    char *token, *cp;

    ...

    cp = strdupa (string);                /* Make writable copy.  */
    token = strtok (cp, delimiters);      /* token => "words" */
    token = strtok (NULL, delimiters);    /* token => "separated" */
    token = strtok (NULL, delimiters);    /* token => "by" */
    token = strtok (NULL, delimiters);    /* token => "spaces" */
    token = strtok (NULL, delimiters);    /* token => "and" */
    token = strtok (NULL, delimiters);    /* token => "punctuation" */
    token = strtok (NULL, delimiters);    /* token => NULL */

The GNU C Library contains two more functions for tokenizing a string which overcome the limitation of non-reentrancy. They are only available for multibyte character strings.
Just like strtok, this function splits the string into several tokens which can be accessed by successive calls to strtok_r. The difference is that the information about the next token is stored in the space pointed to by the third argument, save_ptr, which is a pointer to a string pointer. Calling strtok_r with a null pointer for newstring and leaving save_ptr between the calls unchanged does the job without hindering reentrancy.

This function is defined in POSIX.1 and can be found on many systems which support multi-threading.
The strsep function has similar functionality to strtok_r, with the newstring argument replaced by the save_ptr argument. The initialization of the moving pointer has to be done by the user. Successive calls to strsep move the pointer along the tokens separated by delimiter, returning the address of the next token and updating string_ptr to point to the beginning of the next token.

One difference between strsep and strtok_r is that if the input string contains more than one character from delimiter in a row, strsep returns an empty string for each pair of characters from delimiter. This means that a program normally should test for strsep returning an empty string before processing it.

This function was introduced in 4.3BSD and therefore is widely available.
Here is how the above example looks when strsep is used.

    #include <string.h>
    #include <stddef.h>

    ...

    const char string[] = "words separated by spaces -- and, punctuation!";
    const char delimiters[] = " .,;:!-";
    char *running;
    char *token;

    ...

    running = strdupa (string);
    token = strsep (&running, delimiters);    /* token => "words" */
    token = strsep (&running, delimiters);    /* token => "separated" */
    token = strsep (&running, delimiters);    /* token => "by" */
    token = strsep (&running, delimiters);    /* token => "spaces" */
    token = strsep (&running, delimiters);    /* token => "" */
    token = strsep (&running, delimiters);    /* token => "" */
    token = strsep (&running, delimiters);    /* token => "" */
    token = strsep (&running, delimiters);    /* token => "and" */
    token = strsep (&running, delimiters);    /* token => "" */
    token = strsep (&running, delimiters);    /* token => "punctuation" */
    token = strsep (&running, delimiters);    /* token => "" */
    token = strsep (&running, delimiters);    /* token => NULL */
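In practice strsep is usually called in a loop that discards the empty strings produced by adjacent delimiters. The sketch below counts the non-empty tokens this way. The helper count_tokens is hypothetical, and the feature test macro at the top is an assumption about how the declaration is made visible on a given system.

```c
#define _DEFAULT_SOURCE 1
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: count the non-empty tokens in TEXT, where
   tokens are separated by any byte in DELIMITERS.  TEXT is modified
   in place, so pass a writable copy.  */
size_t
count_tokens (char *text, const char *delimiters)
{
  size_t count = 0;
  char *running = text;
  char *token;

  while ((token = strsep (&running, delimiters)) != NULL)
    if (*token != '\0')     /* Skip empty strings between delimiters.  */
      count++;

  return count;
}
```

Applied to a writable copy of the example string with the same delimiter set, the loop sees the same twelve results as the listing above and counts six of them.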
The GNU version of the basename function returns the last component of the path in filename. This function is the preferred usage, since it does not modify the argument, filename, and respects trailing slashes. The prototype for basename can be found in string.h. Note, this function is overridden by the XPG version, if libgen.h is included.

Example of using GNU basename:

    #define _GNU_SOURCE     /* The GNU basename is a GNU extension.  */
    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>

    int
    main (int argc, char *argv[])
    {
      char *prog = basename (argv[0]);

      if (argc < 2)
        {
          fprintf (stderr, "Usage %s \n", prog);
          exit (1);
        }

      ...
    }

Portability Note: This function may produce different results on different systems.
This is the standard XPG defined basename. It is similar in spirit to the GNU version, but may modify the path by removing trailing '/' characters. If the path is made up entirely of '/' characters, then "/" will be returned. Also, if path is NULL or an empty string, then "." is returned. The prototype for the XPG version can be found in libgen.h.

Example of using XPG basename:

    #define _GNU_SOURCE     /* For strdupa.  */
    #include <libgen.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>

    int
    main (int argc, char *argv[])
    {
      char *prog;
      char *path = strdupa (argv[0]);   /* basename may modify path.  */

      prog = basename (path);

      if (argc < 2)
        {
          fprintf (stderr, "Usage %s \n", prog);
          exit (1);
        }

      ...
    }
The dirname function is the complement to the XPG version of basename. It returns the parent directory of the file specified by path. If path is NULL, an empty string, or contains no '/' characters, then "." is returned. The prototype for this function can be found in libgen.h.
The function below addresses the perennial programming quandary: “How do I take good data in string form and painlessly turn it into garbage?” This is actually a fairly simple task for C programmers who do not use the GNU C Library string functions, but for programs based on the GNU C Library, the strfry function is the preferred method for destroying string data.

The prototype for this function is in string.h.

strfry creates a pseudorandom anagram of a string, replacing the input with the anagram in place. For each position in the string, strfry swaps it with a position in the string selected at random (from a uniform distribution). The two positions may be the same.

The return value of strfry is always string.

Portability Note: This function is unique to the GNU C Library.
The memfrob function converts an array of data to something unrecognizable and back again. It is not encryption in its usual sense since it is easy for someone to convert the encrypted data back to clear text. The transformation is analogous to Usenet's “Rot13” encryption method for obscuring offensive jokes from sensitive eyes and such. Unlike Rot13, memfrob works on arbitrary binary data, not just text.

For true encryption, see Cryptographic Functions.

This function is declared in string.h.

memfrob transforms (frobnicates) each byte of the data structure at mem, which is length bytes long, by bitwise exclusive-ORing it with binary 00101010. It does the transformation in place and its return value is always mem.

Note that applying memfrob a second time to the same data structure returns it to its original state.

This is a good function for hiding information from someone who doesn't want to see it or doesn't want to see it very much. To really prevent people from retrieving the information, use stronger encryption such as that described in Cryptographic Functions.

Portability Note: This function is unique to the GNU C Library.
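To make the transformation concrete, here is a portable stand-in that performs the same exclusive-OR with binary 00101010 (decimal 42). The function frob is a hypothetical illustration written for this text and is not the library's implementation.

```c
#include <stddef.h>

/* Hypothetical stand-in for memfrob: exclusive-OR each of the
   LENGTH bytes at MEM with binary 00101010 (0x2A), in place, and
   return MEM.  */
void *
frob (void *mem, size_t length)
{
  unsigned char *bytes = mem;

  for (size_t i = 0; i < length; i++)
    bytes[i] ^= 0x2A;

  return mem;
}
```

Since x ^ 0x2A ^ 0x2A equals x for every byte value, applying the function twice restores the original data.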
To store or transfer binary data in environments which only support text one has to encode the binary data by mapping the input bytes to characters in the range allowed for storing or transferring. SVID systems (and nowadays XPG compliant systems) provide minimal support for this task.

The l64a function encodes a 32-bit input value using characters from the basic character set. It returns a pointer to a 7 character buffer which contains an encoded version of n. To encode a series of bytes the user must copy the returned string to a destination buffer. It returns the empty string if n is zero, which is somewhat bizarre but mandated by the standard.

Warning: Since a static buffer is used this function should not be used in multi-threaded programs. There is no thread-safe alternative to this function in the C library.
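Because the result lives in a static buffer, a caller normally copies it out before the next call. The sketch below wraps that step. The name l64a_copy is hypothetical, and the feature test macro is an assumption about how l64a and a64l are made visible on a given system.

```c
#define _DEFAULT_SOURCE 1
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: encode N with l64a and copy the result into
   OUT, which must have room for 7 bytes (up to 6 characters plus
   the terminating null).  Returns OUT.  */
char *
l64a_copy (long n, char out[7])
{
  /* l64a returns a pointer to a static buffer that a later call
     overwrites, so take a private copy at once.  */
  strncpy (out, l64a (n), 6);
  out[6] = '\0';
  return out;
}
```

The copy can later be handed to a64l to recover the value; like l64a itself, the helper is not safe to call from several threads at once.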
Compatibility Note: The XPG standard states that the return value of l64a is undefined if n is negative. In the GNU implementation, l64a treats its argument as unsigned, so it will return a sensible encoding for any nonzero n; however, portable programs should not rely on this.

To encode a large buffer l64a must be called in a loop, once for each 32-bit word of the buffer. For example, one could do something like this:

    char *
    encode (const void *buf, size_t len)
    {
      /* We know in advance how long the buffer has to be. */
      unsigned char *in = (unsigned char *) buf;
      char *out = malloc (6 + ((len + 3) / 4) * 6 + 1);
      char *cp = out, *p;

      /* Encode the length. */
      /* Using `htonl' is necessary so that the data can be
         decoded even on machines with different byte order.
         `l64a' can return a string shorter than 6 bytes, so
         we pad it with encoding of 0 ('.') at the end by
         hand. */
      p = stpcpy (cp, l64a (htonl (len)));
      cp = mempcpy (p, "......", 6 - (p - cp));

      while (len > 3)
        {
          unsigned long int n = *in++;
          n = (n << 8) | *in++;
          n = (n << 8) | *in++;
          n = (n << 8) | *in++;
          len -= 4;
          p = stpcpy (cp, l64a (htonl (n)));
          cp = mempcpy (p, "......", 6 - (p - cp));
        }
      if (len > 0)
        {
          unsigned long int n = *in++;
          if (--len > 0)
            {
              n = (n << 8) | *in++;
              if (--len > 0)
                n = (n << 8) | *in;
            }
          cp = stpcpy (cp, l64a (htonl (n)));
        }
      *cp = '\0';
      return out;
    }

It is strange that the library does not provide the complete functionality needed but so be it.
To decode data produced with l64a the following function should be used.