As we proceed through these tutorials, we’ll highlight many important points under the following three categories:
Rules are instructions that you must follow, as required by the language. Failure to abide by a rule will generally result in your program not working.
Best practices are things that you should do, because that way of doing things is generally considered a standard or highly recommended. That is, either everybody does it that way (and if you do otherwise, you’ll be doing something people don’t expect), or it is superior to the alternatives.
Warnings are things that you should not do, because they will generally lead to unexpected results.
The underlying design philosophy of C and C++ can be summed up as “trust the programmer” – which is both wonderful and dangerous. C++ is designed to allow the programmer a high degree of freedom to do what they want. However, this also means the language often won’t stop you from doing things that don’t make sense, because it will assume you’re doing so for some reason it doesn’t understand. There are quite a few pitfalls that new programmers are likely to fall into if caught unaware. This is one of the primary reasons why knowing what you shouldn’t do in C/C++ is almost as important as knowing what you should do.
Name your code files something.cpp, where something is a name of your choosing, and .cpp is the extension that indicates the file is a C++ source file.
To write a C++ program inside an IDE, we typically start by creating a new project (we’ll show you how to do this in a bit). A project is a container that holds all of your source code files, images, data files, etc… that are needed to produce an executable (or library, website, etc…) that you can run or use. The project also saves various IDE, compiler, and linker settings, as well as remembering where you left off, so that when you reopen the project later, the state of the IDE can be restored to wherever you left off. When you choose to compile your program, all of the .cpp files in the project will get compiled and linked.
Each project corresponds to one program. When you’re ready to create a second program, you’ll either need to create a new project, or overwrite the code in an existing project (if you don’t want to keep it). Project files are generally IDE specific, so a project created for one IDE will need to be recreated in a different IDE.
Create a new project for each new program you write.
Use the debug build configuration when developing your programs. When you’re ready to release your executable to others, or want to test performance, use the release build configuration.
Disable compiler extensions to ensure your programs (and coding practices) remain compliant with C++ standards and will work on any system.
Don’t let warnings pile up. Resolve them as you encounter them (as if they were errors). Otherwise a warning about a serious issue may be lost amongst warnings about non-serious issues.
Turn your warning levels up to the maximum, especially while you are learning. It will help you identify possible issues.
Enable “Treat warnings as errors”. This will force you to resolve all issues causing warnings.
Every C++ program must have a special function named main (all lower case letters). When the program is run, the statements inside of main are executed in sequential order.
Don’t use multi-line comments inside other multi-line comments. Wrapping single-line comments inside a multi-line comment is okay.
Comment your code liberally, and write your comments as if speaking to someone who has no idea what the code does. Don’t assume you’ll remember why you made specific choices.
Although the language allows you to do so, avoid defining multiple variables of the same type in a single statement. Instead, define each variable in a separate statement on its own line (and then use a single-line comment to document what it is used for).
One of the most common mistakes that new programmers make is to confuse the assignment operator (=) with the equality operator (==). Assignment (=) is used to assign a value to a variable. Equality (==) is used to test whether two operands are equal in value.
Favor initialization using braces whenever possible.
Initialize your variables upon creation.
Output a newline whenever a line of output is complete.
Using std::endl can be a bit inefficient, as it actually does two jobs: it moves the cursor to the next line of the console, and it flushes the buffer. When writing text to the console, we typically don’t need to flush the buffer at the end of each line. It’s more efficient to let the system flush itself periodically (which it has been designed to do efficiently).
Because of this, use of the ‘\n’ character is typically preferred instead. The ‘\n’ character moves the cursor to the next line of the console, but doesn’t request a flush, so it will often perform better. The ‘\n’ character also tends to be easier to read since it’s both shorter and can be embedded into existing text.
Prefer ‘\n’ over std::endl when outputting text to the console.
‘\n’ uses a backslash (as do all special characters in C++), not a forward slash. Using a forward slash (e.g. ‘/n’) instead may result in unexpected behavior.
There’s some debate over whether it’s necessary to initialize a variable immediately before you give it a user provided value via another source (e.g. std::cin), since the user-provided value will just overwrite the initialization value. In line with our previous recommendation that variables should always be initialized, best practice is to initialize the variable first.
Some compilers, such as Visual Studio, will initialize the contents of memory to some preset value when you’re using a debug build configuration. This will not happen when using a release build configuration. Therefore, if you want to run the above program yourself, make sure you’re using a release build configuration (see lesson 0.9 – Configuring your compiler: Build configurations for a reminder on how to do that). For example, if you run the above program in a Visual Studio debug configuration, it will consistently print -858993460, because that’s the value (interpreted as an integer) that Visual Studio initializes memory with in debug configurations.
Take care to avoid all situations that result in undefined behavior, such as using uninitialized variables.
What is undefined behavior, and what can happen if you do something that exhibits undefined behavior?
Undefined behavior is the result of executing code whose behavior is not well defined by the language. The result can be almost anything, including something that behaves correctly.
When working in an existing program, use the conventions of that program (even if they don’t conform to modern best practices). Use modern best practices when you’re writing new programs.
Your lines should be no longer than 80 chars in length.
If a long line is split with an operator (e.g. << or +), the operator should be placed at the beginning of the next line, not the end of the current line.
std::cout << 3 + 4
    + 5 + 6
    * 7 * 8;
Easier to read:
cost          = 57;
pricePerItem  = 24;
value         = 5;
numberOfItems = 17;
std::cout << "Hello world!\n";                  // cout lives in the iostream library
std::cout << "It is very nice to meet you!\n";  // these comments are easier to read
std::cout << "Yeah!\n";                         // especially when all lined up
// cout lives in the iostream library
std::cout << "Hello world!\n";
// these comments are easier to read
std::cout << "It is very nice to meet you!\n";
// when separated by whitespace
std::cout << "Yeah!\n";
Using the automatic formatting feature is highly recommended to keep your code’s formatting style consistent.
We’ll talk more about the order in which operators execute when we do a deep dive into the topic of operators. For now, it’s enough to know that the arithmetic operators execute in the same order as they do in standard mathematics: Parentheses first, then Exponents, then Multiplication & Division, then Addition & Subtraction. This ordering is sometimes abbreviated PEMDAS, or expanded to the mnemonic “Please Excuse My Dear Aunt Sally”.
Some operators have additional behaviors. An operator that has some observable effect beyond producing a return value is said to have a side effect.
An expression is a combination of literals, variables, operators, and function calls that calculates a single value. The process of executing an expression is called evaluation, and the single value produced is called the result of the expression.
Expressions involving operators with side effects are a little more tricky:
x = 5 // has side effect of assigning 5 to x, evaluates to x
x = 2 + 3 // has side effect of assigning 5 to x, evaluates to x
std::cout << x // has side effect of printing x to console, evaluates to std::cout
Note that expressions do not end in a semicolon, and cannot be compiled by themselves. For example, if you were to try compiling the expression x = 5, your compiler would complain (probably about a missing semicolon). Rather, expressions are always evaluated as part of statements.
For example, take this statement:
int x{ 2 + 3 }; // 2 + 3 is an expression that has no semicolon -- the semicolon is at the end of the statement containing the expression
If you were to break this statement down into its syntax, it would look like this:
type identifier { expression };
type could be any valid type (we chose int). identifier could be any valid name (we chose x). And expression could be any valid expression (we chose 2 + 3, which uses two literals and an operator).
Wherever you can use a single value in C++, you can use a value-producing expression instead, and the expression will be evaluated to produce a single value.
An expression statement is a statement that consists of an expression followed by a semicolon.
Question #1
What is the difference between a statement and an expression?
Statements are used when we want the program to perform an action. Expressions are used when we want the program to calculate a value.
New programmers often try to write an entire program all at once, and then get overwhelmed when it produces a lot of errors. A better strategy is to add one piece at a time, make sure it compiles, and test it. Then when you’re sure it’s working, move on to the next piece.
#include <iostream>

// preferred version
int main()
{
    std::cout << "Enter an integer: ";

    int num{ };
    std::cin >> num;

    // use an expression to multiply num * 2 at the point where we are going to print it
    std::cout << "Double that number is: " << num * 2 << '\n';

    return 0;
}
This is the preferred solution of the bunch. When std::cout executes, the expression num * 2 will get evaluated, and the result will be double num‘s value. That value will get printed. The value in num itself will not be altered, so we can use it again later if we wish.
This version is our reference solution.
However, there’s a saying I’m fond of: “You have to write a program once to know how you should have written it the first time.” This speaks to the fact that the best solution often isn’t obvious, and that our first solutions to problems are usually not as good as they could be.
Too often new programmers focus on optimizing for performance when they should be optimizing for maintainability.
All of this is really to say: don’t be frustrated if/when your solutions don’t come out wonderfully optimized right out of your brain. That’s normal. Perfection in programming is an iterative process (one requiring repeated passes).
One more thing: You may be thinking, “C++ has so many rules and concepts. How do I remember all of this stuff?”.
Short answer: You don’t. C++ is one part using what you know, and two parts looking up how to do the rest.
As you read through this site for the first time, focus less on memorizing specifics, and more on understanding what’s possible. Then, when you have a need to implement something in a program you’re writing, you can come back here (or to a reference site) and refresh yourself on how to do so.
An expression statement is an expression that has been turned into a statement by placing a semicolon at the end of the expression.
When writing programs, add a few lines or a function, compile, resolve any errors, and make sure it works. Don’t wait until you’ve written an entire program before compiling it for the first time!
Focus on getting your code working. Once you are sure you are going to keep some bit of code, then you can spend time removing (or commenting out) temporary/debugging code, adding comments, handling error cases, formatting your code, ensuring best practices are followed, removing redundant logic, etc…
First-draft programs are often messy and imperfect. Most code requires cleanup and refinement to get to great!
A function is a reusable sequence of statements designed to do a particular job.
Functions that you write yourself are called user-defined functions.
Don’t forget to include parentheses () after the function’s name when making a function call.
“foo” is a meaningless word that is often used as a placeholder name for a function or variable when the name is unimportant to the demonstration of some concept. Such words are called metasyntactic variables (though in common language they’re often called “placeholder names” since nobody can remember the term “metasyntactic variable”). Other common metasyntactic variables in C++ include “bar”, “baz”, and 3-letter words that end in “oo”, such as “goo”, “moo”, and “boo”.
For those interested in etymology (how words evolve), RFC 3092 is an interesting read.
Your main function should return the value 0 if the program ran normally.
Make sure your functions with non-void return types return a value in all cases.
Failure to return a value from a value-returning function will cause undefined behavior.
Follow the DRY best practice: “don’t repeat yourself”. If you need to do something more than once, consider how to modify your code to remove as much redundancy as possible. Variables can be used to store the results of calculations that need to be used more than once (so we don’t have to repeat the calculation). Functions can be used to define a sequence of statements we want to execute more than once. And loops (which we’ll cover in a later chapter) can be used to execute a statement more than once.
Return values provide a way for functions to return a single value back to the function’s caller.
Functions provide a way to minimize redundancy in our programs.
Do not put a return statement at the end of a non-value returning function.
An early return is a return statement that occurs before the last line of a function. It causes the function to return to the caller immediately.
When a function is called, all of the parameters of the function are created as variables, and the value of each of the arguments is copied into the matching parameter. This process is called pass by value.
Note that the number of arguments must generally match the number of function parameters, or the compiler will report an error. The argument passed to a function can be any valid expression (as the argument is essentially just an initializer for the parameter, and initializers can be any valid expression).
Function parameters and return values are the key mechanisms by which functions can be written in a reusable way, as they allow us to write functions that can perform tasks and return retrieved or calculated results back to the caller without knowing what the specific inputs or outputs are ahead of time.
Function parameters, as well as variables defined inside the function body, are called local variables.
int add(int x, int y) // function parameters x and y are local variables
{
    int z{ x + y }; // z is a local variable too
    return z;
}
Much like a person’s lifetime is defined to be the time between their birth and death, an object’s lifetime is defined to be the time between its creation and destruction. Note that variable creation and destruction happen when the program is running (called runtime), not at compile time. Therefore, lifetime is a runtime property.
An identifier’s scope determines where the identifier can be seen and used within the source code. When an identifier can be seen and used, we say it is in scope. When an identifier cannot be seen, we cannot use it, and we say it is out of scope. Scope is a compile-time property, and trying to use an identifier when it is not in scope will result in a compile error.
The terms “out of scope” and “going out of scope” can be confusing to new programmers.
An identifier is out of scope anywhere it cannot be accessed within the code. In the example above, the identifier x is in scope from its point of definition to the end of the main function. The identifier x is out of scope outside of that code region.
The term “going out of scope” is typically applied to objects rather than identifiers. We say an object goes out of scope at the end of the scope (the end curly brace) in which the object was instantiated. In the example above, the object named x goes out of scope at the end of the function main.
A local variable’s lifetime ends at the point where it goes out of scope, so local variables are destroyed at this point.
Note that not all types of variables are destroyed when they go out of scope. We’ll see examples of these in future lessons.
Names used for function parameters or variables declared in a function body are only visible within the function that declares them. This means local variables within a function can be named without regard for the names of variables in other functions. This helps keep functions independent.
Local variables inside the function body should be defined as close to their first use as reasonable:
#include <iostream>

int main()
{
    std::cout << "Enter an integer: ";
    int x{}; // x defined here
    std::cin >> x; // and used here

    std::cout << "Enter another integer: ";
    int y{}; // y defined here
    std::cin >> y; // and used here

    int sum{ x + y }; // sum defined here
    std::cout << "The sum is: " << sum << '\n'; // and used here

    return 0;
}
In the above example, each variable is defined just before it is first used. There’s no need to be strict about this – if you prefer to define x before printing the first prompt, that’s fine.
Define your local variables as close to their first use as reasonable.
Effectively using functions
One of the biggest challenges new programmers encounter (besides learning the language) is understanding when and how to use functions effectively. Here are a few basic guidelines for writing functions:
New programmers often combine calculating a value and printing the calculated value into a single function. However, this violates the “one task” rule of thumb for functions. A function that calculates a value should return the value to the caller and let the caller decide what to do with the calculated value (such as call another function to print the value).
When addressing compile errors in your programs, always resolve the first error produced first and then compile again.
Keep the parameter names in your function declarations.
You can easily create function declarations by copy/pasting your function’s header and adding a semicolon.
The one definition rule (or ODR for short) is a well-known rule in C++. The ODR has three parts:
Violating part 1 of the ODR will cause the compiler to issue a redefinition error. Violating ODR part 2 will likely cause the linker to issue a redefinition error. Violating ODR part 3 will cause undefined behavior.
A declaration is a statement that tells the compiler about the existence of an identifier and its type information. Here are some examples of declarations:
int add(int x, int y); // tells the compiler about a function named "add" that takes two int parameters and returns an int. No body!
int x; // tells the compiler about an integer variable named x
A declaration is all that is needed to satisfy the compiler. This is why we can use a forward declaration to tell the compiler about an identifier that isn’t actually defined until later.
In C++, all definitions also serve as declarations. This is why int x appears in our examples for both definitions and declarations. Since int x is a definition, it’s a declaration too. In most cases, a definition serves our purposes, as it satisfies both the compiler and linker. We only need to provide an explicit declaration when we want to use an identifier before it has been defined.
While it is true that all definitions are declarations, the converse is not true: not all declarations are definitions. An example of this is the function declaration – it satisfies the compiler, but not the linker. These declarations that aren’t definitions are called pure declarations.
In common language, the term “declaration” is typically used to mean “a pure declaration”, and “definition” is used to mean “a definition that also serves as a declaration”. Thus, we’d typically call int x; a definition, even though it is both a definition and a declaration.
When you add new code files to your project, give them a .cpp extension.
Because the compiler compiles each code file individually (and then forgets what it has seen), each code file that uses std::cout or std::cin needs to #include <iostream>.
In the above example, if add.cpp had used std::cout or std::cin, it would have needed to #include <iostream>.
When the compiler compiles a multi-file program, it may compile the files in any order. Additionally, it compiles each file individually, with no knowledge of what is in other files.
We will begin working with multiple files a lot once we get into object-oriented programming, so now’s as good a time as any to make sure you understand how to add and compile multiple file projects.
Reminder: Whenever you create a new code (.cpp) file, you will need to add it to your project so that it gets compiled.
Most naming collisions occur in two cases:
A name declared in a namespace won’t be mistaken for an identical name declared in another scope.
In C++, any name that is not defined inside a class, function, or a namespace is considered to be part of an implicitly defined namespace called the global namespace (sometimes also called the global scope).
In the example at the top of the lesson, functions main() and both versions of myFcn() are defined inside the global namespace. The naming collision encountered in the example happens because both versions of myFcn() end up inside the global namespace, which violates the rule that all names in the namespace must be unique.
Only declarations and definition statements can appear in the global namespace. This means we can define variables in the global namespace, though this should generally be avoided (we cover global variables in lesson 6.4 – Introduction to global variables). This also means that other types of statements (such as expression statements) cannot be placed in the global namespace (initializers for global variables being an exception):
#include <iostream> // handled by preprocessor

// All of the following statements are part of the global namespace
void foo(); // okay: function forward declaration in the global namespace
int x; // compiles but strongly discouraged: uninitialized variable definition in the global namespace
int y { 5 }; // compiles but discouraged: variable definition with initializer in the global namespace
x = 5; // compile error: executable statements are not allowed in the global namespace

int main() // okay: function definition in the global namespace
{
    return 0;
}

void goo(); // okay: another function forward declaration in the global namespace
When you use an identifier that is defined inside a namespace (such as the std namespace), you have to tell the compiler that the identifier lives inside the namespace.
#include <iostream>

int main()
{
    std::cout << "Hello world!"; // when we say cout, we mean the cout defined in the std namespace
    return 0;
}
The :: symbol is an operator called the scope resolution operator.
Use explicit namespace prefixes to access identifiers defined in a namespace.
When an identifier includes a namespace prefix, the identifier is called a qualified name.
When using a using-directive in this manner, any identifier we define may conflict with any identically named identifier in the std namespace. Even worse, while an identifier name may not conflict today, it may conflict with new identifiers added to the std namespace in future language revisions. This was the whole point of moving all of the identifiers in the standard library into the std namespace in the first place!
Avoid using-directives (such as using namespace std;) at the top of your program or in header files. They violate the reason why namespaces were added in the first place.
Historically, the preprocessor was a separate program from the compiler, but in modern compilers, the preprocessor is typically built right into the compiler itself.
When the preprocessor has finished processing a code file, the result is called a translation unit. This translation unit is what is then compiled by the compiler.
The entire process of preprocessing, compiling, and linking is called translation.
If you’re curious, here is a list of translation phases. As of the time of writing, preprocessing encompasses phases 1 through 4, and compilation is phases 5 through 7.
Using directives (introduced in lesson 2.9 – Naming collisions and an introduction to namespaces) are not preprocessor directives (and thus are not processed by the preprocessor). So while the term directive usually means a preprocessor directive, this is not always the case.
A translation unit contains both the processed code from the code file, as well as the processed code from all of the #included files.
We recommend avoiding these kinds of macros altogether, as there are better ways to do this kind of thing. We discuss this more in lesson 4.13 – Const variables and symbolic constants.
#include <iostream>

#define PRINT_JOE

int main()
{
#ifdef PRINT_JOE
    std::cout << "Joe\n"; // will be compiled since PRINT_JOE is defined
#endif

#ifdef PRINT_BOB
    std::cout << "Bob\n"; // will be excluded since PRINT_BOB is not defined
#endif

    return 0;
}
One more common use of conditional compilation involves using #if 0 to exclude a block of code from being compiled (as if it were inside a comment block):
#include <iostream>

int main()
{
    std::cout << "Joe\n";

#if 0 // Don't compile anything starting here
    std::cout << "Bob\n";
    std::cout << "Steve\n";
#endif // until this point

    return 0;
}
The above code only prints “Joe”, because “Bob” and “Steve” were inside an #if 0 block that the preprocessor will exclude from compilation.
Header files allow us to put declarations in one location and then import them wherever we need them. This can save a lot of typing in multi-file programs.
When you #include a file, the content of the included file is inserted at the point of inclusion. This provides a useful way to pull in declarations from another file.
Header files should generally not contain function and variable definitions, so as not to violate the one definition rule. An exception is made for symbolic constants (which we cover in lesson 4.13 – Const variables and symbolic constants).
Use a .h suffix when naming your header files.
If a header file is paired with a code file (e.g. add.h with add.cpp), they should both have the same base name (add).
Source files should #include their paired header file (if one exists).
Use double quotes to include header files that you’ve written or are expected to be found in the current directory. Use angled brackets to include headers that come with your compiler, OS, or third-party libraries you’ve installed elsewhere on your system.
The header files with the .h extension define their names in the global namespace, and may optionally define them in the std namespace as well.
The header files without the .h extension will define their names in the std namespace, and may optionally define them in the global namespace as well.
When including a header file from the standard library, use the version without the .h extension if it exists. User-defined headers should still use a .h extension.
Right click on your project in the Solution Explorer, and choose Properties, then the VC++ Directories tab. From here, you will see a line called Include Directories. Add the directories you’d like the compiler to search for additional headers there.
The nice thing about this approach is that if you ever change your directory structure, you only have to change a single compiler or IDE setting instead of every code file.
Each file should explicitly #include all of the header files it needs to compile. Do not rely on headers included transitively from other headers.
To maximize the chance that missing includes will be flagged by the compiler, order your #includes as follows:
Here are a few more recommendations for creating and using header files.
The good news is that we can avoid the above problem via a mechanism called a header guard (also called an include guard). Header guards are conditional compilation directives that take the following form:
#ifndef SOME_UNIQUE_NAME_HERE
#define SOME_UNIQUE_NAME_HERE
// your declarations (and certain types of definitions) here
#endif
In large programs, it’s possible to have two separate header files (included from different directories) that end up having the same filename (e.g. directoryA\config.h and directoryB\config.h). If only the filename is used for the include guard (e.g. CONFIG_H), these two files may end up using the same guard name. If that happens, any file that includes (directly or indirectly) both config.h files will not receive the contents of the include file to be included second. This will probably cause a compilation error.
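Because of this possibility for guard name conflicts, many developers recommend using a more complex/unique name in your header guards. Common suggestions combine the file name with something that distinguishes it, such as the project name and path, a large random number, or the creation date, and end the name with _H.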
Note that the goal of header guards is to prevent a code file from receiving more than one copy of a guarded header. By design, header guards do not prevent a given header file from being included (once) into separate code files. This can also cause unexpected problems.
However, because pragmas are not an official part of the C++ language (and may not be supported consistently, or at all on more esoteric platforms), others (such as Google) still recommend sticking with traditional header guards.
There is one known case where #pragma once will typically fail. If a header file is copied so that it exists in multiple places on the file system, and both copies somehow get included, header guards will successfully de-dupe the identical headers, but #pragma once won’t (because the compiler won’t realize they are actually identical content).
Header guards are designed to ensure that the contents of a given header file are not copied more than once into any single file, in order to prevent duplicate definitions.
Note that duplicate declarations are fine, since a declaration can be declared multiple times without incident – but even if your header file is composed of all declarations (no definitions) it’s still a best practice to include header guards.
Note that header guards do not prevent the contents of a header file from being copied (once) into separate project files. This is a good thing, because we often need to reference the contents of a given header from different project files.
In order to write a successful program, you first need to define what your goal is. Ideally, you should be able to state this in a sentence or two. It is often useful to express this as a user-facing outcome. For example:
Note that your requirements should similarly be focused on the “what”, not the “how”.
For example:
The randomized dungeon should always contain a way to get from the entrance to an exit.
The program should crash in less than 0.1% of user sessions.
A single problem may yield many requirements, and the solution isn’t “done” until it satisfies all of them.
When you are an experienced programmer, there are many other steps that typically would take place at this point, including:
Version control systems have the added advantage of not only being able to restore your files, but also to roll them back to a previous version.
In real life, we often need to perform tasks that are very complex. Trying to figure out how to do these tasks can be very challenging. In such cases, we often make use of the top down method of problem solving. That is, instead of solving a single complex task, we break that task into multiple subtasks, each of which is individually easier to solve. If those subtasks are still too difficult to solve, they can be broken down further. By continuously splitting complex tasks into simpler ones, you can eventually get to a point where each individual task is manageable, if not trivial.
The other way to create a hierarchy of tasks is to do so from the bottom up. In this method, we’ll start from a list of easy tasks, and construct the hierarchy by grouping them.
Remember: Don’t implement your entire program in one go. Work on it in steps, testing each step along the way before proceeding.
Once your program is “finished”, the last step is to test the whole program and ensure it works as intended. If it doesn’t work, fix it.
Keep your programs simple to start.
Add features over time.
Focus on one area at a time.
Test each piece of code as you go.
Don’t invest in perfecting early code.
Most new programmers will shortcut many of these steps and suggestions (because it seems like a lot of work and/or it’s not as much fun as writing the code). However, for any non-trivial project, following these steps will definitely save you a lot of time in the long run. A little planning up front saves a lot of debugging at the end.
The good news is that once you become comfortable with all of these concepts, they will start coming more naturally to you. Eventually you will get to the point where you can write entire functions without any pre-planning at all.
What guessing strategy you want to use is up to you – the best one depends on what type of bug it is, so you’ll likely want to try many different approaches to narrow down the issue. As you gain experience in debugging issues, your intuition will help guide you.
So how do we “make guesses”? There are many ways to do so. We’re going to start with some simple approaches in the next chapter, and then we’ll build on these and explore others in future chapters.
When printing information for debugging purposes, use std::cerr instead of std::cout. One reason for this is that std::cout may be buffered, which means there may be a pause between when you ask std::cout to output information and when it actually does. If you output using std::cout and then your program crashes immediately afterward, std::cout may or may not have actually output yet. This can mislead you about where the issue is. On the other hand, std::cerr is unbuffered, which means anything you send to it will output immediately. This helps ensure all debug output appears as soon as possible (at the cost of some performance, which we usually don’t care about when debugging).
Using std::cerr also helps make clear that the information being output is for an error case rather than a normal case.
#include <iostream>

int getValue()
{
std::cerr << "getValue() called\n";
    return 4;
}

int main()
{
std::cerr << "main() called\n";
    std::cout << getValue; // note: the () after getValue is missing, so getValue() is never called
    return 0;
}
When adding temporary debug statements, it can be helpful to not indent them. This makes them easier to find for removal later.
The third-party library dbg-macro can help make debugging using print statements easier. Check it out if this is something you find yourself doing a lot.
While adding debug statements to programs for diagnostic purposes is a common rudimentary technique, and a functional one (especially when a debugger is not available for some reason), it’s not that great for a number of reasons:
While you can write your own code to create log files and send output to them, you’re better off using one of the many existing third-party logging tools available. Which one you use is up to you.
How you include, initialize, and use a logger will vary depending on the specific logger you select.
Note that conditional compilation directives are also not required using this method, as most loggers have a method to reduce/eliminate writing output to the log. This makes the code a lot easier to read, as the conditional compilation lines add a lot of clutter. With plog, logging can be temporarily disabled by changing the init statement to the following:
If you want to compile the above example yourself, or use plog in your own projects, you can follow these instructions to install it:
First, get the latest plog release:
Finally, for each project, set the somewhere\plog-master\include\ directory as an include directory inside your IDE. There are instructions on how to do this for Visual Studio here: A.2 – Using libraries with Visual Studio, and for Code::Blocks here: A.3 – Using libraries with Code::Blocks.
All of this tracked information is called your program state (or just state, for short).
While integrated debuggers are highly convenient and recommended for beginners, command line debuggers are well supported and still commonly used in environments that do not support graphical interfaces (e.g. embedded systems).
Debugger keyboard shortcuts will only work if the IDE/integrated debugger is the active window.
Don’t neglect learning to use a debugger. As your programs get more complicated, the amount of time you spend learning to use the integrated debugger effectively will pale in comparison to the amount of time you save finding and fixing issues.
Before proceeding with this lesson (and subsequent lessons related to using a debugger), make sure your project is compiled using a debug build configuration (see 0.9 – Configuring your compiler: Build configurations for more information).
If you’re compiling your project using a release configuration instead, the functionality of the debugger may not work correctly (e.g. when you try to step into your program, it will just run the program instead).
Stepping is the name for a set of related debugger features that let us execute (step through) our code statement by statement.
Because operator<< is implemented as a function, your IDE may step into the implementation of operator<< instead.
If this happens, you’ll see your IDE open a new code file, and the arrow marker will move to the top of a function named operator<< (this is part of the standard library). Close the code file that just opened, then find and execute the step out debug command (instructions are below under the “step out” section, if you need help).
In a prior lesson, we mentioned that std::cout is buffered, which means there may be a delay between when you ask std::cout to print a value, and when it actually does. Because of this, you may not see the value 5 appear at this point. To ensure that all output from std::cout is output immediately, you can temporarily add the following statement to the top of your main() function:
std::cout << std::unitbuf; // enable automatic flushing for std::cout (for debugging)
For performance reasons, this statement should be removed or commented out after debugging.
If you don’t want to continually add/remove/comment/uncomment the above, you can wrap the statement in a conditional compilation preprocessor directive (covered in lesson 2.10 – Introduction to the preprocessor):
#ifdef DEBUG
std::cout << std::unitbuf; // enable automatic flushing for std::cout (for debugging)
#endif
You’ll need to make sure the DEBUG preprocessor macro is defined, either somewhere above this statement, or as part of your compiler settings.
Like step into, the step over command executes the next statement in the normal execution path of the program.
Unlike the other two stepping commands, step out does not just execute the next line of code. Instead, it executes all remaining code in the function currently being executed, and then returns control to you when the function has returned.
There’s one more debugging command that’s used less often than the others, but is still worth knowing about. The set next statement command allows us to change the point of execution to some other statement (sometimes informally called jumping). This can be used to jump the point of execution forwards and skip some code that would otherwise execute, or backwards and have something that already executed run again.
The set next statement command will change the point of execution, but will not otherwise change the program state. Your variables will retain whatever values they had before the jump. As a result, jumping may cause your program to produce different values, results, or behaviors than it would otherwise. Use this capability judiciously (especially jumping backwards).
You should not use set next statement to change the point of execution to a different function. This will result in undefined behavior, and likely a crash.
In case you are returning, make sure your project is compiled using a debug build configuration (see 0.9 – Configuring your compiler: Build configurations for more information). If you’re compiling your project using a release configuration instead, the functionality of the debugger may not work correctly.
Identifiers in watched expressions will evaluate to their current values. If you want to know what value an expression in your code is actually evaluating to, run to cursor to it first, so that all identifiers have the correct values.
Because inspecting the value of local variables inside a function is common while debugging, many debuggers will offer some way to quickly watch the value of all local variables in scope.
The line numbers after the function names show the next line to be executed in each function.
Since the top entry on the call stack represents the currently executing function, the line number here shows the next line that will execute when execution resumes. The remaining entries in the call stack represent functions that will be returned to at some point, so the line number for these represent the next statement that will execute after the function is returned to.
Congratulations, you now know the basics of using an integrated debugger! Using stepping, breakpoints, watches, and the call stack window, you now have the fundamentals to be able to debug almost any problem. Like many things, becoming good at using a debugger takes some practice and some trial and error. But again, we’ll reiterate the point that the time devoted to learning how to use an integrated debugger effectively will be repaid many times over in time saved debugging your programs!
When you make a semantic error, that error may or may not be immediately noticeable when you run your program. An issue may lurk undetected in your code for a long time before newly introduced code or changed circumstances cause it to manifest as a program malfunction. The longer an error sits in the code base before it is found, the more likely it is to be hard to find, and something that may have been easy to fix originally turns into a debugging adventure that eats up time and energy.
So what can we do about that?
Well, the best thing is to not make errors in the first place. Here’s an incomplete list of things that can help avoid making errors:
When making changes to your code, make behavioral changes OR structural changes, and then retest for correctness. Making behavioral and structural changes at the same time tends to lead to more errors as well as errors that are harder to find.
Defensive programming is a practice whereby the programmer tries to anticipate all of the ways the software could be misused, either by end-users, or by other developers (including the programmer themselves) using the code. These misuses can often be detected and then mitigated (e.g. by asking a user who entered bad input to try again).
#include <iostream>

int add(int x, int y)
{
    return x + y;
}

void testadd()
{
    std::cout << "This function should print: 2 0 0 -2\n";
    std::cout << add(1, 1) << ' ';
    std::cout << add(-1, 1) << ' ';
    std::cout << add(1, -1) << ' ';
    std::cout << add(-1, -1) << ' ';
}

int main()
{
    testadd();
    return 0;
}
This is a primitive form of unit testing, which is a software testing method by which small units of source code are tested to determine whether they are correct.
Use a static analysis tool on your programs to help find areas where your code is non-compliant with best practices.
When using print statements, use std::cerr instead of std::cout. But even better, avoid debugging via print statements.
Unit testing is a software testing method by which small units of source code are tested to determine whether they are correct.
In C++, we typically work with “byte-sized” chunks of data.
Some older or non-standard machines may have bytes of a different size (from 1 to 48 bits) – however, we generally need not worry about these, as the modern de-facto standard is that a byte is 8 bits. For these tutorials, we’ll assume a byte is 8 bits.
The terms integer and integral are similar, but sometimes have different meanings.
In mathematics, an integer is a number with no decimal or fractional part, including negative and positive numbers and zero.
In C++, the term integer is most often used to refer to the int data type, which holds integer values. However, it is also sometimes used to refer to the broader set of data types that are commonly used to store and display integer values. This includes short, int, long, long long, and their signed and unsigned variants.
The term integral means “like an integer”. Most often, integral is used as part of the term “integral type”, which includes the broader set of types that are stored in memory as integers, even though their behaviors might vary (which we’ll see later in this chapter when we talk about the character types). This includes bool, the integer types, and all the various character types.
Most modern programming languages include a fundamental string type (strings are a data type that lets us hold a sequence of characters, typically used to represent text). In C++, strings aren’t a fundamental type (they’re a compound type). But because basic string usage is straightforward and useful, we’ll introduce strings in this chapter as well (in lesson 4.17 – Introduction to std::string).
Many of the types defined in newer versions of C++ (e.g. std::nullptr_t) use a _t suffix. This suffix means “type”, and it’s a common nomenclature applied to modern types.
If you see something with a _t suffix, it’s probably a type. But many types don’t have a _t suffix, so this isn’t consistently applied.
Use an empty parameter list instead of void to indicate that a function has no parameters.
New programmers often focus too much on optimizing their code to use as little memory as possible. In most cases, this makes a negligible difference. Focus on writing maintainable code, and optimize only when and where the benefit will be substantive.
For maximum compatibility, you shouldn’t assume that variables are larger than the specified minimum size.
If you’re wondering what ‘\t’ is in the above program, it’s a special symbol that inserts a tab (in the example, we’re using it to align the output columns). We will cover ‘\t’ and other special symbols in lesson 4.11 – Chars.
On modern machines, objects of the fundamental data types are fast, so performance while using these types should generally not be a concern.
You might assume that types that use less memory would be faster than types that use more memory. This is not always true. CPUs are often optimized to process data of a certain size (e.g. 32 bits), and types that match that size may be processed quicker. On such a machine, a 32-bit int could be faster than a 16-bit short or an 8-bit char.
C++ only guarantees that integers will have a certain minimum size, not that they will have a specific size. See lesson 4.3 – Object sizes and the sizeof operator for information on how to determine how large each type is on your machine.
In binary representation, a single bit (called the sign bit) is used to store the sign of the number. The non-sign bits (called the magnitude bits) determine the magnitude of the number.
We discuss how the sign bit is used when representing numbers in binary in lesson O.4 – Converting between binary and decimal.
Prefer the shorthand types that do not use the int suffix or signed prefix.
Math time: an 8-bit integer contains 8 bits. 2^8 is 256, so an 8-bit integer can hold 256 possible values. There are 256 possible values from -128 to 127, inclusive.
7 bits are used to hold the magnitude of the number, and 1 bit is used to hold the sign.
Signed integer overflow will result in undefined behavior.
Be careful when using integer division, as you will lose any fractional parts of the quotient. However, if it’s what you want, integer division is safe to use, as the results are predictable.
An n-bit unsigned variable has a range of 0 to (2^n)-1.
When no negative numbers are required, unsigned integers are well-suited for networking and systems with little memory, because unsigned integers can store more positive numbers without taking up extra memory.
Oddly, the C++ standard explicitly says “a computation involving unsigned operands can never overflow”. This is contrary to general programming consensus that integer overflow encompasses both signed and unsigned use cases. Given that most programmers would consider this overflow, we’ll call this overflow despite C++’s statements to the contrary.
Many notable bugs in video game history happened due to wrap around behavior with unsigned integers. In the arcade game Donkey Kong, it’s not possible to go past level 22 due to an overflow bug that leaves the user with not enough bonus time to complete the level.
In the PC game Civilization, Gandhi was known for often being the first one to use nuclear weapons, which seems contrary to his expected passive nature. Players had a theory that Gandhi’s aggression setting was initially set at 1, but if he chose a democratic government, he’d get a -2 aggression modifier (lowering his current aggression value by 2). This would cause his aggression to overflow to 255, making him maximally aggressive! However, more recently Sid Meier (the game’s author) clarified that this wasn’t actually the case.
Favor signed numbers over unsigned numbers for holding quantities (even quantities that should be non-negative) and mathematical operations. Avoid mixing signed and unsigned numbers.
There are still a few cases in C++ where it’s okay / necessary to use unsigned numbers.
First, unsigned numbers are preferred when dealing with bit manipulation.
Second, use of unsigned numbers is still unavoidable in some cases, mainly those having to do with array indexing.
Also note that if you’re developing for an embedded system (e.g. an Arduino) or some other processor/memory limited context, use of unsigned numbers is more common and accepted (and in some cases, unavoidable) for performance reasons.
The 8-bit fixed-width integer types are often treated like chars instead of integer values (and this may vary per system). Prefer the 16-bit fixed integral types for most cases.
Avoid the following when possible:
Some compilers limit the largest creatable object to half the maximum value of std::size_t.
In practice, the largest creatable object may be smaller than this amount (perhaps significantly so), depending on how much contiguous memory your computer has available for allocation.
Here’s the most important thing to understand: The digits in the significand (the part before the ‘e’) are called the significant digits. The number of significant digits defines a number’s precision. The more digits in the significand, the more precise a number is.
int x{5}; // 5 means integer
double y{5.0}; // 5.0 is a floating point literal (no suffix means double type by default)
float z{5.0f}; // 5.0 is a floating point literal, f suffix means float type
Note that floating point literals default to type double. An f suffix is used to denote a literal of type float.
Always make sure the type of your literals match the type of the variables they’re being assigned to or used to initialize. Otherwise an unnecessary conversion will result, possibly with a loss of precision.
Make sure you don’t use integer literals where floating point literals should be used. This includes when initializing or assigning values to floating point objects, doing floating point arithmetic, and calling functions that expect floating point values.
Favor double over float unless space is at a premium, as the lack of precision in a float will often lead to inaccuracies.
Rounding errors occur when a number can’t be stored precisely. This can happen even with simple numbers, like 0.1. Therefore, rounding errors can, and do, happen all the time. Rounding errors aren’t the exception – they’re the rule. Never assume your floating point numbers are exact.
A corollary of this rule is: be wary of using floating point numbers for financial or currency data.
INF stands for infinity, and IND stands for indeterminate. Note that the results of printing Inf and NaN are platform specific, so your results may vary.
Avoid division by 0 altogether, even if your compiler supports it.
To summarize, the two things you should remember about floating point numbers:
Boolean type
Boolean is properly capitalized in the English language because it’s named after its inventor, George Boole.
If you want std::cout to print “true” or “false” instead of 0 or 1, you can use std::boolalpha. Here’s an example:
#include <iostream>

int main()
{
    std::cout << true << '\n';
    std::cout << false << '\n';

    std::cout << std::boolalpha; // print bools as true or false

    std::cout << true << '\n';
    std::cout << false << '\n';
    return 0;
}
This prints:
1
0
true
false
You can use std::noboolalpha to turn it back off.
A condition (also called a conditional expression) is an expression that evaluates to a Boolean value.
If statements only conditionally execute a single statement. We talk about how to conditionally execute multiple statements in lesson 7.2 – If statements and blocks.
You never need an if-statement of the form:
if (condition)
    return true;
else
    return false;
This can be replaced by the single statement return condition.
Be careful not to mix up character numbers with integer numbers. The following two initializations are not the same:
char ch{5}; // initialize with integer 5 (stored as integer 5)
char ch{'5'}; // initialize with code point for '5' (stored as integer 53)
Character numbers are intended to be used when we want to represent numbers as text, rather than as numbers to apply mathematical operations to.
Escape sequences start with a backslash (\), not a forward slash (/). If you use a forward slash by accident, it may still compile, but will not yield the desired result.
Put stand-alone chars in single quotes (e.g. ‘t’ or ‘\n’, not “t” or “\n”). This helps the compiler optimize more effectively.
Avoid multicharacter literals (e.g. ‘56’).
Make sure that your newlines are using escape sequence ‘\n’, not multicharacter literal ‘/n’.
wchar_t should be avoided in almost all cases (except when interfacing with the Windows API). Its size is implementation defined, and is not reliable. It has largely been deprecated.
The term “deprecated” means “still supported, but no longer recommended for use, because it has been replaced by something better or is no longer considered safe”.
Much like ASCII maps the integers 0-127 to American English characters, other character encoding standards exist to map integers (of varying sizes) to characters in other languages. The most well-known mapping outside of ASCII is the Unicode standard, which maps over 144,000 integers to characters in many different languages. Because Unicode contains so many code points, a single Unicode code point needs 32 bits to represent a character (called UTF-32). However, Unicode characters can also be encoded using multiple 16-bit or 8-bit characters (called UTF-16 and UTF-8 respectively).
char16_t and char32_t were added to C++11 to provide explicit support for 16-bit and 32-bit Unicode characters. char8_t has been added in C++20.
You won’t need to use char8_t, char16_t, or char32_t unless you’re planning on making your program Unicode compatible. Unicode and localization are generally outside the scope of these tutorials, so we won’t cover it further.
In the meantime, you should only use ASCII characters when working with characters (and strings). Using characters from other character sets may cause your characters to display incorrectly.
When the compiler does type conversion on our behalf without us explicitly asking, we call this implicit type conversion.
Type conversion produces a new value of the target type from a value of a different type.
You’ll need to disable “treat warnings as errors” temporarily if you want to compile this example. See lesson 0.11 – Configuring your compiler: Warning and error levels for more information about this setting.
Some type conversions are always safe to make (such as int to double), whereas others may result in the value being changed during conversion (such as double to int). Unsafe implicit conversions will typically either generate a compiler warning, or (in the case of brace initialization) an error.
This is one of the primary reasons brace initialization is the preferred initialization form. Brace initialization will ensure we don’t try to initialize a variable with an initializer that will lose value when it is implicitly type converted:
int main()
{
    double d { 5 }; // okay: int to double is safe
    int x { 5.5 }; // error: double to int not safe

    return 0;
}
Whenever you see C++ syntax (excluding the preprocessor) that makes use of angled brackets (<>), the thing between the angled brackets will most likely be a type. This is typically how C++ deals with code that needs a parameterized type.
To convert an unsigned number to a signed number, you can also use the static_cast operator:
#include <iostream>

int main()
{
    unsigned int u { 5u }; // 5u means the number 5 as an unsigned int
    int s { static_cast<int>(u) }; // return value of variable u as an int

    std::cout << s;
    return 0;
}
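The static_cast operator doesn’t do any range checking, so if you cast a value to a type whose range doesn’t contain that value, you won’t get back the value you started with. For the above cast from unsigned int to int, if the value of the unsigned int is greater than the maximum value a signed int can hold, the result is implementation-defined prior to C++20 and wraps around as of C++20. Converting a floating point value to an integral type that can’t hold it is undefined behavior.
The static_cast operator will not preserve the value being converted if that value doesn’t fit in the range of the new type, and in some cases (such as floating point to integral conversions) will produce undefined behavior.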
In cases where std::int8_t is treated as a char, input from the console can also cause problems:
#include <cstdint>
#include <iostream>

int main()
{
    std::cout << "Enter a number between 0 and 127: ";
    std::int8_t myint{};
    std::cin >> myint;

    std::cout << "You entered: " << static_cast<int>(myint);
    return 0;
}
A sample run of this program:
Enter a number between 0 and 127: 35
You entered: 51
Here’s what’s happening. When std::int8_t is treated as a char, the input routines interpret our input as a sequence of characters, not as an integer. So when we enter 35, we’re actually entering two chars, ‘3’ and ‘5’. Because a char object can only hold one character, the ‘3’ is extracted (the ‘5’ is left in the input stream for possible extraction later). Because the char ‘3’ has ASCII code point 51, the value 51 is stored in myint, which we then print later as an int.
Due to the way that the compiler parses more complex declarations, some developers prefer placing the const after the type (because it is slightly more consistent). This style is called “east const”. While this style has some advocates (and some reasonable points), it has not caught on significantly.
Place const before the type (because it is more idiomatic to do so).
#include <iostream>

int main()
{
    std::cout << "Enter your age: ";
    int age{};
    std::cin >> age;

    const int constAge { age }; // initialize const variable using non-const value

    age = 5; // ok: age is non-const, so we can change its value
    constAge = 6; // error: constAge is const, so we cannot change its value
    return 0;
}
In the above example, we initialize const variable constAge with non-const variable age. Because age is still non-const, we can change its value. However, because constAge is const, we cannot change the value it has after initialization.
Don’t use const when passing by value.
Don’t use const when returning by value.
Prefer constant variables over object-like macros with substitution text.
Evaluating constant expressions at compile-time makes our compilation take longer (because the compiler has to do more work), but such expressions only need to be evaluated once (rather than every time the program is run). The resulting executables are faster and use less memory.
The compiler is only required to evaluate constant expressions at compile time in contexts where a value is actually required at compile-time.
In the variable declaration int x { 3 + 4 };, x is not a constant variable and the initialization value does not need to be known at compile-time, so the constant expression 3 + 4 is not required to be evaluated at compile-time.
Even though it is not strictly required, modern compilers will usually evaluate a constant expression at compile-time because it is an easy optimization and more performant to do so.
A const variable is a compile-time constant if its initializer is a constant expression.
Any const variable that is initialized with a non-constant expression is a runtime constant. Runtime constants are constants whose initialization values aren’t known until runtime.
constexpr keyword
A constexpr (which is short for “constant expression”) variable can only be a compile-time constant. If the initialization value of a constexpr variable is not a constant expression, the compiler will error.
For example:
#include <iostream>

int five()
{
    return 5;
}

int main()
{
    constexpr double gravity { 9.8 }; // ok: 9.8 is a constant expression
    constexpr int sum { 4 + 5 }; // ok: 4 + 5 is a constant expression
    constexpr int something { sum }; // ok: sum is a constant expression

    std::cout << "Enter your age: ";
    int age{};
    std::cin >> age;

    constexpr int myAge { age }; // compile error: age is not a constant expression
    constexpr int f { five() }; // compile error: return value of five() is not a constant expression
    return 0;
}
Any variable that should not be modifiable after initialization and whose initializer is known at compile-time should be declared as constexpr.
Any variable that should not be modifiable after initialization and whose initializer is not known at compile-time should be declared as const.
C++ does support functions that can be evaluated at compile-time (and thus can be used in constant expressions) – we discuss these in lesson 6.14 – Constexpr and consteval functions.
Literals are unnamed values inserted directly into the code.
Prefer literal suffix L (upper case) over l (lower case).
By default, floating point literals have a type of double. To make them float literals instead, the f (or F) suffix should be used:
#include <iostream>

int main()
{
    std::cout << 5.0; // 5.0 (no suffix) is type double (by default)
    std::cout << 5.0f; // 5.0f is type float
    return 0;
}
New programmers are often confused about why the following causes a compiler warning:
float f { 4.1 }; // warning: 4.1 is a double literal, not a float literal
Because 4.1 has no suffix, the literal has type double, not float. When the compiler determines the type of a literal, it doesn’t care what you’re doing with the literal (e.g. in this case, using it to initialize a float variable). Since the type of the literal (double) doesn’t match the type of the variable it is being used to initialize (float), the literal value must be converted to a float so it can then be used to initialize variable f. Converting a value from a double to a float can result in a loss of precision, so the compiler will issue a warning.
The solution here is one of the following:
float f { 4.1f }; // use 'f' suffix so the literal is a float and matches variable type of float
double d { 4.1 }; // change variable to type double so it matches the literal type double
There are two different ways to declare floating-point literals:
double pi { 3.14159 }; // 3.14159 is a double literal in standard notation
double avogadro { 6.02e23 }; // 6.02 x 10^23 is a double literal in scientific notation
In the second form, the number after the exponent can be negative:
double electronCharge { 1.6e-19 }; // charge on an electron is 1.6 x 10^-19
A magic number is a literal (usually a number) that either has an unclear meaning or may need to be changed later.
Note that magic numbers aren’t always numbers – they can also be text (e.g. names) or other types.
Avoid magic numbers in your code (use constexpr variables instead).
Prior to C++14, there is no support for binary literals. However, hexadecimal literals provide us with a useful workaround (that you may still see in existing code bases):
#include <iostream>
int main()
{
int bin{}; // assume 16-bit ints
bin = 0x0001; // assign binary 0000 0000 0000 0001 to the variable
bin = 0x0002; // assign binary 0000 0000 0000 0010 to the variable
bin = 0x0004; // assign binary 0000 0000 0000 0100 to the variable
bin = 0x0008; // assign binary 0000 0000 0000 1000 to the variable
bin = 0x0010; // assign binary 0000 0000 0001 0000 to the variable
bin = 0x0020; // assign binary 0000 0000 0010 0000 to the variable
bin = 0x0040; // assign binary 0000 0000 0100 0000 to the variable
bin = 0x0080; // assign binary 0000 0000 1000 0000 to the variable
bin = 0x00FF; // assign binary 0000 0000 1111 1111 to the variable
bin = 0x00B3; // assign binary 0000 0000 1011 0011 to the variable
bin = 0xF770; // assign binary 1111 0111 0111 0000 to the variable
return 0;
}
In C++14, we can use binary literals by using the 0b prefix:
#include <iostream>
int main()
{
int bin{}; // assume 16-bit ints
bin = 0b1; // assign binary 0000 0000 0000 0001 to the variable
bin = 0b11; // assign binary 0000 0000 0000 0011 to the variable
bin = 0b1010; // assign binary 0000 0000 0000 1010 to the variable
bin = 0b11110000; // assign binary 0000 0000 1111 0000 to the variable
return 0;
}
Because long literals can be hard to read, C++14 also adds the ability to use a single quotation mark (') as a digit separator.
#include <iostream>
int main()
{
int bin { 0b1011'0010 }; // assign binary 1011 0010 to the variable
long value { 2'132'673'462 }; // much easier to read than 2132673462
return 0;
}
Also note that the separator can not occur before the first digit of the value:
int bin { 0b'1011'0010 }; // error: ' used before first digit of value
Digit separators are purely visual and do not impact the literal value in any way.
By default, C++ outputs values in decimal. However, you can change the output format via use of the std::dec, std::oct, and std::hex I/O manipulators:
#include <iostream>
int main()
{
int x { 12 };
std::cout << x << '\n'; // decimal (by default)
std::cout << std::hex << x << '\n'; // hexadecimal
std::cout << x << '\n'; // now hexadecimal
std::cout << std::oct << x << '\n'; // octal
std::cout << std::dec << x << '\n'; // return to decimal
std::cout << x << '\n'; // decimal
return 0;
}
This prints:
12
c
c
14
12
12
Note that once applied, the I/O manipulator remains set for future output until it is changed again.
#include <bitset> // for std::bitset
#include <iostream>
int main()
{
// std::bitset<8> means we want to store 8 bits
std::bitset<8> bin1{ 0b1100'0101 }; // binary literal for binary 1100 0101
std::bitset<8> bin2{ 0xC5 }; // hexadecimal literal for binary 1100 0101
std::cout << bin1 << '\n' << bin2 << '\n';
std::cout << std::bitset<4>{ 0b1010 } << '\n'; // create a temporary std::bitset and print it
return 0;
}
This prints:
11000101
11000101
1010
Fortunately, C++ has introduced two additional string types into the language that are much easier and safer to work with: std::string and std::string_view (C++17). Although std::string and std::string_view aren’t fundamental types, they’re straightforward and useful enough that we’ll introduce them here rather than wait until the chapter on compound types (chapter 9).
Using strings with std::cin may yield some surprises! Consider the following example:
#include <iostream>
#include <string>
int main()
{
std::cout << "Enter your full name: ";
std::string name{};
std::cin >> name; // this won't work as expected since std::cin breaks on whitespace
std::cout << "Enter your age: ";
std::string age{};
std::cin >> age;
std::cout << "Your name is " << name << " and your age is " << age << '\n';
return 0;
}
Here are the results from a sample run of this program:
Enter your full name: John Doe
Enter your age: Your name is John and your age is Doe
Hmmm, that isn’t right! What happened? It turns out that when using operator>> to extract a string from std::cin, operator>> only returns characters up to the first whitespace it encounters. Any other characters are left inside std::cin, waiting for the next extraction.
So when we used operator>> to extract input into variable name, only “John” was extracted, leaving " Doe" inside std::cin. When we then used operator>> to extract input into variable age, it extracted “Doe” instead of waiting for us to input an age. Then the program ends.
#include <string> // For std::string and std::getline
#include <iostream>
int main()
{
std::cout << "Enter your full name: ";
std::string name{};
std::getline(std::cin >> std::ws, name); // read a full line of text into name
std::cout << "Enter your age: ";
std::string age{};
std::getline(std::cin >> std::ws, age); // read a full line of text into age
std::cout << "Your name is " << name << " and your age is " << age << '\n';
return 0;
}
Now our program works as expected:
Enter your full name: John Doe
Enter your age: 23
Your name is John Doe and your age is 23
If using std::getline() to read strings, use the std::ws input manipulator (std::cin >> std::ws) to ignore leading whitespace.
Using the extraction operator (>>) with std::cin ignores leading whitespace.
std::getline() does not ignore leading whitespace unless you use input manipulator std::ws.
Also note that std::string::length() returns an unsigned integral value (most likely of type size_t). If you want to assign the length to an int variable, you should static_cast it to avoid compiler warnings about signed/unsigned conversions:
int length { static_cast<int>(name.length()) };
In C++20, you can also use the std::ssize() function to get the length of a std::string as a signed integer:
#include <iostream>
#include <string>
int main()
{
std::string name{ "Alex" };
std::cout << name << " has " << std::ssize(name) << " characters\n";
return 0;
}
With normal functions, we call function(object). With member functions, we call object.function().
Do not pass std::string by value, as making copies of std::string is expensive. Prefer std::string_view parameters.
Double-quoted string literals (like “Hello, world!”) are C-style strings by default (and thus, have a strange type).
We can create string literals with type std::string by using a s suffix after the double-quoted string literal.
#include <iostream>
#include <string> // for std::string
#include <string_view> // for std::string_view
int main()
{
using namespace std::literals; // easiest way to access the s and sv suffixes
std::cout << "foo\n"; // no suffix is a C-style string literal
std::cout << "goo\n"s; // s suffix is a std::string literal
std::cout << "moo\n"sv; // sv suffix is a std::string_view literal
return 0;
}
The “s” suffix lives in the namespace std::literals::string_literals. The easiest way to access the literal suffixes is via the using directive using namespace std::literals. We discuss using directives in lesson 6.12 – Using declarations and using directives. This is one of the exception cases where using an entire namespace is okay, because the suffixes defined within are unlikely to collide with any of your code.
You probably won’t need to use std::string literals very often (as it’s fine to initialize a std::string object with a C-style string literal), but we’ll see a few cases in future lessons where using std::string literals instead of C-style string literals makes things easier.
If you try to define a constexpr std::string, your compiler will probably generate an error:
#include <iostream>
#include <string>
using namespace std::literals;
int main()
{
constexpr std::string name{ "Alex"s }; // compile error
std::cout << "My name is: " << name;
return 0;
}
This happens because constexpr std::string isn’t supported in C++17 or earlier, and only has minimal support in C++20. If you need constexpr strings, use std::string_view instead (discussed in lesson 4.18 – Introduction to std::string_view).
std::string is complex, leveraging many language features that we haven’t covered yet. Fortunately, you don’t need to understand these complexities to use std::string for simple tasks, like basic string input and output. We encourage you to start experimenting with strings now, and we’ll cover additional string capabilities later.
C++17
#include <iostream>
#include <string_view>
void printSV(std::string_view str) // now a std::string_view
{
std::cout << str << '\n';
}
int main()
{
std::string_view s{ "Hello, world!" }; // now a std::string_view
printSV(s);
return 0;
}
Prefer std::string_view over std::string when you need a read-only string, especially for function parameters.
The “sv” suffix lives in the namespace std::literals::string_view_literals. The easiest way to access the literal suffixes is via using directive using namespace std::literals. We discuss using directives in lesson 6.12 – Using declarations and using directives. This is one of the exception cases where using an entire namespace is okay.
Angled brackets are typically used in C++ to represent something that needs a parameterizable type. This is used with static_cast to determine what data type the argument should be converted to (e.g. static_cast<int>(x) will convert x to an int).
A symbolic constant is a name given to a constant value. Constant variables are one type of symbolic constant, as are object-like macros with substitution text.
Use parentheses to make it clear how a non-trivial expression should evaluate (even if they are technically unnecessary).
Expressions with a single assignment operator do not need to have the right operand of the assignment wrapped in parenthesis.
In many cases, the operands in a compound expression may evaluate in any order. This includes function calls and the arguments to those function calls.
Outside of the operator precedence and associativity rules, assume that the parts of an expression could evaluate in any order. Ensure that the expressions you write are not dependent on the order of evaluation of those parts.
For readability, both of these operators should be placed immediately preceding the operand (e.g. -x, not - x).
In the vast majority of cases, integer exponentiation will overflow the integral type. This is likely why such a function wasn’t included in the standard library in the first place.
Strongly favor the prefix version of the increment and decrement operators, as they are generally more performant, and you’re less likely to run into strange issues with them.
However, side effects can also lead to unexpected results:
#include <iostream>
int add(int x, int y)
{
return x + y;
}
int main()
{
int x{ 5 };
int value{ add(x, ++x) }; // is this 5 + 6, or 6 + 6?
// It depends on what order your compiler evaluates the function arguments in
std::cout << value << '\n'; // value could be 11 or 12, depending on how the above line evaluates!
return 0;
}
The C++ standard does not define the order in which function arguments are evaluated. If the left argument is evaluated first, this becomes a call to add(5, 6), which equals 11. If the right argument is evaluated first, this becomes a call to add(6, 6), which equals 12! Note that this is only a problem because one of the arguments to function add() has a side effect.
The C++ standard intentionally does not define these things so that compilers can do whatever is most natural (and thus most performant) for a given architecture.
The C++ standard also does not define the order in which the operands of operators are evaluated. Thus x + ++x will exhibit the same issue as add(x, ++x) above.
There are other cases where the C++ standard does not specify evaluation order, so different compilers may exhibit different behaviors. Even when the C++ standard does make it clear how things should be evaluated, historically this has been an area where there have been many compiler bugs. These problems can generally all be avoided by ensuring that any variable that has a side-effect applied is used no more than once in a given statement.
C++ does not define the order of evaluation for function arguments or the operands of operators.
Don’t use a variable that has a side effect applied to it more than once in a given statement. If you do, the result may be undefined.
The comma operator (,) allows you to evaluate multiple expressions wherever a single expression is allowed. The comma operator evaluates the left operand, then the right operand, and then returns the result of the right operand.
z = (a, b); // evaluate (a, b) first to get result of b, then assign that value to variable z.
z = a, b; // evaluates as "(z = a), b", so z gets assigned the value of a, and b is evaluated and discarded.
Avoid using the comma operator, except within for loops.
In C++, the comma symbol is often used as a separator, and these uses do not invoke the comma operator. Some examples of separator commas:
void foo(int x, int y) // Comma used to separate parameters in function definition
{
add(x, y); // Comma used to separate arguments in function call
constexpr int z{ 3 }, w{ 5 }; // Comma used to separate multiple variables being defined on the same line (don't do this)
}
There is no need to avoid separator commas (except when declaring multiple variables, which you should not do).
The conditional operator (?:) (also sometimes called the “arithmetic if” operator) is a ternary operator (it takes 3 operands). Because it has historically been C++’s only ternary operator, it’s also sometimes referred to as “the ternary operator”.
Because the << operator has higher precedence than the ?: operator, the statement:
std::cout << (x > y) ? x : y << '\n';
would evaluate as:
(std::cout << (x > y)) ? x : y << '\n';
That would print 1 (true) if x > y, or 0 (false) otherwise!
Always parenthesize the conditional part of the conditional operator, and consider parenthesizing the whole thing as well.
#include <iostream>
int main()
{
constexpr bool inBigClassroom { false };
constexpr int classSize { inBigClassroom ? 30 : 20 };
std::cout << "The class size is: " << classSize << '\n';
return 0;
}
#include <iostream>
int main()
{
constexpr int x{ 5 };
std::cout << (x != 5 ? x : "x is 5"); // won't compile
return 0;
}
Only use the conditional operator for simple conditionals where you use the result and where it enhances readability.
Don’t add unnecessary == or != to conditions. It makes them harder to read without offering any additional value.
Avoid using operator== and operator!= to compare floating point values if there is any chance those values have been calculated.
constexpr double gravity { 9.8 };
if (gravity == 9.8) // okay if gravity was initialized with a literal
// we're on earth
It is okay to compare a low-precision (few significant digits) floating point literal to the same literal value of the same type.
Here’s our previous code testing both algorithms:
#include <algorithm> // for std::max
#include <cmath> // for std::abs
#include <iostream>
// return true if the difference between a and b is within epsilon percent of the larger of a and b
bool approximatelyEqualRel(double a, double b, double relEpsilon)
{
return (std::abs(a - b) <= (std::max(std::abs(a), std::abs(b)) * relEpsilon));
}
bool approximatelyEqualAbsRel(double a, double b, double absEpsilon, double relEpsilon)
{
// Check if the numbers are really close -- needed when comparing numbers near zero.
double diff{ std::abs(a - b) };
if (diff <= absEpsilon)
return true;
// Otherwise fall back to Knuth's algorithm
return (diff <= (std::max(std::abs(a), std::abs(b)) * relEpsilon));
}
int main()
{
// a is really close to 1.0, but has rounding errors
double a{ 0.1 + 0.1 + 0.1 + 0.1 + 0.1 + 0.1 + 0.1 + 0.1 + 0.1 + 0.1 };
std::cout << approximatelyEqualRel(a, 1.0, 1e-8) << '\n'; // compare "almost 1.0" to 1.0
std::cout << approximatelyEqualRel(a-1.0, 0.0, 1e-8) << '\n'; // compare "almost 0.0" to 0.0
std::cout << approximatelyEqualAbsRel(a, 1.0, 1e-12, 1e-8) << '\n'; // compare "almost 1.0" to 1.0
std::cout << approximatelyEqualAbsRel(a-1.0, 0.0, 1e-12, 1e-8) << '\n'; // compare "almost 0.0" to 0.0
}
1
0
1
1
You can see that approximatelyEqualAbsRel() handles the small inputs correctly.
Comparison of floating point numbers is a difficult topic, and there’s no “one size fits all” algorithm that works for every case. However, the approximatelyEqualAbsRel() with an absEpsilon of 1e-12 and a relEpsilon of 1e-8 should be good enough to handle most cases you’ll encounter.
If logical NOT is intended to operate on the result of other operators, the other operators and their operands need to be enclosed in parentheses.
Short circuit evaluation may cause logical OR and logical AND to not evaluate one operand. Avoid using expressions with side effects in conjunction with these operators.
The logical OR and logical AND operators are an exception to the rule that the operands may evaluate in any order, as the standard explicitly states that the left operand must evaluate first.
Only the built-in versions of these operators perform short-circuit evaluation. If you overload these operators to make them work with your own types, those overloaded operators will not perform short-circuit evaluation.
When mixing logical AND and logical OR in a single expression, explicitly parenthesize each operation to ensure they evaluate how you intend.
!(x && y)
is equivalent to !x || !y
!(x || y)
is equivalent to !x && !y
If you need a form of logical XOR that works with non-Boolean operands, you can static_cast your operands to bool:
if (static_cast<bool>(a) != static_cast<bool>(b) != static_cast<bool>(c) != static_cast<bool>(d)) ... // a XOR b XOR c XOR d, for any type that can be converted to bool
The following trick (which makes use of the fact that operator! implicitly converts its operand to bool) also works and is a bit more concise:
if (!!a != !!b != !!c != !!d)
Neither of these is very intuitive, so document them well if you use them.
Why should you never do the following:
a) int y{ foo(++x, x) };
Because operator++ applies a side effect to x, we should not use x again in the same expression. In this case, the arguments to function foo() can be evaluated in any order, so it’s indeterminate whether x or ++x gets evaluated first. Because ++x changes the value of x, it’s unclear what values will be passed into the function.
b) double x{ 0.1 + 0.1 + 0.1 }; return (x == 0.3);
Floating point rounding errors will cause this to evaluate as false even though it looks like it should be true.
c) int x{ 3 / 0 };
Division by 0 causes undefined behavior, which is likely expressed in a crash.
Modifying individual bits within an object is called bit manipulation.
Bit manipulation is also useful in encryption and compression algorithms.
Bit manipulation is one of the few times when you should unambiguously use unsigned integers (or std::bitset).
#include <bitset>
#include <iostream>
int main()
{
std::bitset<8> bits{ 0b0000'0101 }; // we need 8 bits, start with bit pattern 0000 0101
bits.set(3); // set bit position 3 to 1 (now we have 0000 1101)
bits.flip(4); // flip bit 4 (now we have 0001 1101)
bits.reset(4); // set bit 4 back to 0 (now we have 0000 1101)
std::cout << "All the bits: " << bits << '\n';
std::cout << "Bit 3 has value: " << bits.test(3) << '\n';
std::cout << "Bit 4 has value: " << bits.test(4) << '\n';
return 0;
}
This prints:
All the bits: 00001101
Bit 3 has value: 1
Bit 4 has value: 0
std::bitset doesn’t make this easy. In order to do this, or if we want to use unsigned integer bit flags instead of std::bitset, we need to turn to more traditional methods. We’ll cover these in the next couple of lessons.
One potential surprise is that std::bitset is optimized for speed, not memory savings. The size of a std::bitset is typically the number of bytes needed to hold the bits, rounded up to the nearest sizeof(size_t), which is 4 bytes on 32-bit machines, and 8 bytes on 64-bit machines.
Thus, a std::bitset<8> will typically use either 4 or 8 bytes of memory, even though it technically only needs 1 byte to store 8 bits. Thus, std::bitset is most useful when we desire convenience, not memory savings.
To avoid surprises, use the bitwise operators with unsigned operands or std::bitset.
Note that if you’re using operator << for both output and left shift, parenthesization is required:
#include <bitset>
#include <iostream>
int main()
{
std::bitset<4> x{ 0b0110 };
std::cout << x << 1 << '\n'; // print value of x (0110), then 1
std::cout << (x << 1) << '\n'; // print x left shifted by 1 (1100)
return 0;
}
This prints:
01101
1100
The last operator is the bitwise XOR (^), also known as exclusive or.
When evaluating two operands, XOR evaluates to true (1) if one and only one of its operands is true (1). If neither or both are true, it evaluates to 0.
There is no bitwise NOT assignment operator. This is because the other bitwise operators are binary, but bitwise NOT is unary (so what would go on the right-hand side of a ~= operator?). If you want to flip all of the bits, you can use normal assignment here: x = ~x;
Summarizing how to evaluate bitwise operations utilizing the column method:
When evaluating bitwise OR, if any bit in a column is 1, the result for that column is 1.
When evaluating bitwise AND, if all bits in a column are 1, the result for that column is 1.
When evaluating bitwise XOR, if there are an odd number of 1 bits in a column, the result for that column is 1.
In the next lesson, we’ll explore how these operators can be used in conjunction with bit masks to facilitate bit manipulation.
A bitwise rotation is like a bitwise shift, except that any bits shifted off one end are added back to the other end. For example 0b1001u << 1 would be 0b0010u, but a left rotate by 1 would result in 0b0011u instead. Implement a function that does a left rotate on a std::bitset<4>. For this one, it’s okay to use test() and set().
A bit mask is a predefined set of bits that is used to select which specific bits will be modified by subsequent operations.
A bit mask essentially performs the same function for bits – the bit mask blocks the bitwise operators from touching bits we don’t want modified, and allows access to the ones we do want modified.
As an aside…
Some compilers may complain about a sign conversion with this line:
flags &= ~mask2;
Because the type of mask2 is smaller than int, operator~ causes operand mask2 to undergo integral promotion to type int. Then the compiler complains that we’re trying to use operator&= where the left operand is unsigned and the right operand is signed.
If this is the case, try the following:
flags &= static_cast<std::uint8_t>(~mask2);
We discuss integral promotion in lesson 8.2 – Floating-point and integral promotion.
Why would you want to? The functions only allow you to modify individual bits. The bitwise operators allow you to modify multiple bits at once.
Instead, if you defined the function using bit flags like this:
void someFunction(std::bitset<32> options);
Then you could use bit flags to pass in only the options you wanted:
someFunction(option10 | option32);
Not only is this much more readable, it’s likely to be more performant as well, since it only involves 2 operations (one bitwise OR and one parameter copy).
This is one of the reasons OpenGL, a well-regarded 3D graphics library, opted to use bit flag parameters instead of many consecutive Boolean parameters.
Here’s a sample function call from OpenGL:
glClear(GL_COLOR_BUFFER_BIT | GL_DEPTH_BUFFER_BIT); // clear the color and the depth buffer
GL_COLOR_BUFFER_BIT and GL_DEPTH_BUFFER_BIT are bit masks defined as follows (in gl2.h):
#define GL_DEPTH_BUFFER_BIT 0x00000100
#define GL_STENCIL_BUFFER_BIT 0x00000400
#define GL_COLOR_BUFFER_BIT 0x00004000
Summarizing how to set, clear, toggle, and query bit flags:
To query bit states, we use bitwise AND:
if (flags & option4) ... // if option4 is set, do something
To set bits (turn on), we use bitwise OR:
flags |= option4; // turn option 4 on.
flags |= (option4 | option5); // turn options 4 and 5 on.
To clear bits (turn off), we use bitwise AND with bitwise NOT:
flags &= ~option4; // turn option 4 off
flags &= ~(option4 | option5); // turn options 4 and 5 off
To flip bit states, we use bitwise XOR:
flags ^= option4; // flip option4 from on to off, or vice versa
flags ^= (option4 | option5); // flip options 4 and 5
Write a program that asks the user to input a number between 0 and 255. Print this number as an 8-bit binary number (of the form #### ####). Don’t use any bitwise operators. Don’t use std::bitset.
#include <iostream>
int printAndDecrementOne(int x, int pow)
{
std::cout << '1';
return (x - pow);
}
// x is our number to test
// pow is a power of 2 (e.g. 128, 64, 32, etc...)
int printAndDecrementBit(int x, int pow)
{
// Test whether our x is greater than or equal to some power of 2 and print the bit
if (x >= pow)
return printAndDecrementOne(x, pow); // If x is greater than or equal to our power of 2, subtract the power of 2
// x is less than pow
std::cout << '0';
return x;
}
int main()
{
std::cout << "Enter an integer between 0 and 255: ";
int x{};
std::cin >> x;
x = printAndDecrementBit(x, 128);
x = printAndDecrementBit(x, 64);
x = printAndDecrementBit(x, 32);
x = printAndDecrementBit(x, 16);
std::cout << ' ';
x = printAndDecrementBit(x, 8);
x = printAndDecrementBit(x, 4);
x = printAndDecrementBit(x, 2);
x = printAndDecrementBit(x, 1);
std::cout << '\n';
return 0;
}
Keep the nesting level of your functions to 3 or less. If your function has a need for more nested levels, consider refactoring your function into sub-functions.
Do not add custom functionality to the std namespace.
In applications, namespaces can be used to separate application-specific code from code that might be reusable later (e.g. math functions). For example, physical and math functions could go into one namespace (e.g. math::). Language and localization functions could go in another (e.g. lang::).
When you write a library or code that you want to distribute to others, always place your code inside a namespace. The code your library is used in may not follow best practices – in such a case, if your library’s declarations aren’t in a namespace, there’s an elevated chance for naming conflicts to occur. As an additional advantage, placing library code inside a namespace also allows the user to see the contents of your library by using their editor’s auto-complete and suggestion feature.
Identifiers have another property named linkage. An identifier’s linkage determines whether other declarations of that name refer to the same object or not.
Local variables have no linkage, which means that each declaration refers to a unique object. For example:
int main()
{
int x { 2 }; // local variable, no linkage
{
int x { 3 }; // this identifier x refers to a different object than the previous x
}
return 0;
}
Scope and linkage may seem somewhat similar. However, scope defines where a single declaration can be seen and used. Linkage defines whether multiple declarations refer to the same object or not.
New developers sometimes wonder whether it’s worth creating a nested block just to intentionally limit a variable’s scope (and force it to go out of scope / be destroyed early). Doing so makes that variable simpler, but the overall function becomes longer and more complex as a result. The tradeoff generally isn’t worth it. If creating a nested block seems useful to intentionally limit the scope of a chunk of code, that code might be better to put in a separate function instead.
Define variables in the most limited existing scope. Avoid creating new blocks whose only purpose is to limit the scope of variables.
Consider using a “g” or “g_” prefix when naming non-const global variables, to help differentiate them from local variables and function parameters.
Global variables are created when the program starts, and destroyed when it ends. This is called static duration. Variables with static duration are sometimes called static variables.
New programmers are often tempted to use lots of global variables, because they can be used without having to explicitly pass them to every function that needs them. However, use of non-constant global variables should generally be avoided altogether! We’ll discuss why in upcoming lesson 6.8 – Why (non-const) global variables are evil.
// Non-constant global variables
int g_x; // defines non-initialized global variable (zero initialized by default)
int g_x {}; // defines explicitly zero-initialized global variable
int g_x { 1 }; // defines explicitly initialized global variable
// Const global variables
const int g_y; // error: const variables must be initialized
const int g_y { 2 }; // defines initialized global constant
// Constexpr global variables
constexpr int g_y; // error: constexpr variables must be initialized
constexpr int g_y { 3 }; // defines initialized global constexpr
Shadowing of local variables should generally be avoided, as it can lead to inadvertent errors where the wrong variable is used or modified. Some compilers will issue a warning when a variable is shadowed.
For the same reason that we recommend avoiding shadowing local variables, we recommend avoiding shadowing global variables as well. This is trivially avoidable if all of your global names use a “g_” prefix.
Avoid variable shadowing.
In lesson 6.3 – Local variables, we said, “An identifier’s linkage determines whether other declarations of that name refer to the same object or not”, and we discussed how local variables have no linkage.
Global variable and function identifiers can have either internal linkage or external linkage. We’ll cover the internal linkage case in this lesson, and the external linkage case in lesson 6.7 – External linkage and variable forward declarations.
An identifier with internal linkage can be seen and used within a single file, but it is not accessible from other files (that is, it is not exposed to the linker). This means that if two files have identically named identifiers with internal linkage, those identifiers will be treated as independent.
Global variables with internal linkage
Global variables with internal linkage are sometimes called internal variables.
To make a non-constant global variable internal, we use the static keyword.
#include <iostream>
static int g_x{}; // non-constant globals have external linkage by default, but can be given internal linkage via the static keyword
const int g_y{ 1 }; // const globals have internal linkage by default
constexpr int g_z{ 2 }; // constexpr globals have internal linkage by default
int main()
{
std::cout << g_x << ' ' << g_y << ' ' << g_z << '\n';
return 0;
}
The use of the static keyword above is an example of a storage class specifier, which sets both the name’s linkage and its storage duration (but not its scope). The most commonly used storage class specifiers are static, extern, and mutable. The term storage class specifier is mostly used in technical documentation.
// Internal global variables definitions:
static int g_x; // defines non-initialized internal global variable (zero initialized by default)
static int g_x{ 1 }; // defines initialized internal global variable
const int g_y { 2 }; // defines initialized internal global const variable
constexpr int g_y { 3 }; // defines initialized internal global constexpr variable
// Internal function definitions:
static int foo() { return 0; } // defines internal function
We provide a comprehensive summary in lesson 6.11 – Scope, duration, and linkage summary.
Global variables with external linkage are sometimes called external variables. To make a global variable external (and thus accessible by other files), we can use the extern keyword:
int g_x { 2 }; // non-constant globals are external by default
extern const int g_y { 3 }; // const globals can be defined as extern, making them external
extern constexpr int g_z { 3 }; // constexpr globals can be defined as extern, making them external (but this is useless, see the note in the next section)
int main()
{
return 0;
}
Non-const global variables are external by default (if used, the extern keyword will be ignored).
If you want to define an uninitialized non-const global variable, do not use the extern keyword, otherwise C++ will think you’re trying to make a forward declaration for the variable.
Although constexpr variables can be given external linkage via the extern keyword, they cannot be forward declared, so there is no value in giving them external linkage.
This is because the compiler needs to know the value of the constexpr variable (at compile time). If that value is defined in some other file, the compiler has no visibility on what value was defined in that other file.
Variable forward declarations need the extern keyword to differentiate variable definitions from variable forward declarations (they otherwise look identical):
// non-constant
int g_x; // variable definition (can have initializer if desired)
extern int g_x; // forward declaration (no initializer)
// constant
extern const int g_y { 1 }; // variable definition (const requires initializers)
extern const int g_y; // forward declaration (no initializer)
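The difference between the two forms can be sketched in a single file. This is only an illustration: the names g_counter and nextCount are made up, and in a real program the definition would typically live in a different .cpp file than the forward declaration. Both are placed in one file here so the example compiles on its own.

```cpp
extern int g_counter; // forward declaration: no initializer, nothing is created here

// Uses g_counter before the compiler has seen its definition
int nextCount()
{
    return ++g_counter;
}

int g_counter{ 10 }; // definition: non-const globals have external linkage by default
```

Calling nextCount() twice returns 11 and then 12, because both the declaration and the definition refer to the same variable.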
However, informally, the term “file scope” is more often applied to global variables with internal linkage, and “global scope” to global variables with external linkage (since they can be used across the whole program, with the appropriate forward declarations).
// External global variable definitions:
int g_x; // defines non-initialized external global variable (zero initialized by default)
extern const int g_x{ 1 }; // defines initialized const external global variable
extern constexpr int g_x{ 2 }; // defines initialized constexpr external global variable
// Forward declarations
extern int g_y; // forward declaration for non-constant global variable
extern const int g_y; // forward declaration for const global variable
extern constexpr int g_y; // not allowed: constexpr variables can't be forward declared
We provide a comprehensive summary in lesson 6.11 – Scope, duration, and linkage summary.
Use local variables instead of global variables whenever possible.
Dynamic initialization of global variables causes a lot of problems in C++. Avoid dynamic initialization whenever possible.
There aren’t many.
As a rule of thumb, any use of a global variable should meet at least the following two criteria: There should only ever be one of the thing the variable represents in your program, and its use should be ubiquitous throughout your program.
If you do find a good use for a non-const global variable, a few useful bits of advice will minimize the amount of trouble you can get into. This advice isn’t only for non-const global variables, but can help with all global variables.
First, prefix all non-namespaced global variables with “g” or “g_”, or better yet, put them in a namespace (discussed in lesson 6.2 – User-defined namespaces and the scope resolution operator), to reduce the chance of naming collisions.
namespace constants
{
constexpr double gravity { 9.8 };
}
int main()
{
return 0;
}
Second, instead of allowing direct access to the global variable, it’s a better practice to “encapsulate” the variable. Make sure the variable can only be accessed from within the file it’s declared in, e.g. by making the variable static or const, then provide external global “access functions” to work with the variable. These functions can ensure proper usage is maintained (e.g. do input validation, range checking, etc…). Also, if you ever decide to change the underlying implementation (e.g. move from one database to another), you only have to update the access functions instead of every piece of code that uses the global variable directly.
For example, instead of:
namespace constants
{
extern const double gravity { 9.8 }; // has external linkage, is directly accessible by other files
}
Do this:
namespace constants
{
constexpr double gravity { 9.8 }; // has internal linkage, is accessible only by this file
}
double getGravity() // this function can be exported to other files to access the global outside of this file
{
// We could add logic here if needed later
// or change the implementation transparently to the callers
return constants::gravity;
}
Global const variables have internal linkage by default, so gravity doesn’t need to be static.
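The same idea applies to a non-const global, where access functions can also validate input. The following is a minimal sketch under assumed names (g_volume, getVolume, setVolume, and the 0 to 100 range are all hypothetical):

```cpp
static int g_volume{ 50 }; // static gives internal linkage: only this file can touch it directly

int getVolume()
{
    return g_volume;
}

// Returns true if the new value was accepted, false if it was out of range
bool setVolume(int volume)
{
    if (volume < 0 || volume > 100)
        return false; // reject invalid values instead of storing them

    g_volume = volume;
    return true;
}
```

Other files would forward declare getVolume() and setVolume() and never see g_volume itself, so the range check can’t be bypassed.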
Third, when writing an otherwise standalone function that uses the global variable, don’t use the variable directly in your function body. Pass it in as an argument instead. That way, if your function ever needs to use a different value for some circumstance, you can simply vary the argument. This helps maintain modularity.
Instead of:
#include <iostream>
namespace constants
{
constexpr double gravity { 9.8 };
}
// This function is only useful for calculating your instant velocity based on the global gravity
double instantVelocity(int time)
{
return constants::gravity * time;
}
int main()
{
std::cout << instantVelocity(5);
}
Do this:
#include <iostream>
namespace constants
{
constexpr double gravity { 9.8 };
}
// This function can calculate the instant velocity for any gravity value (more useful)
double instantVelocity(int time, double gravity)
{
return gravity * time;
}
int main()
{
std::cout << instantVelocity(5, constants::gravity); // pass our constant to the function as a parameter
}
What’s the best naming prefix for a global variable?
Answer: //
C++ jokes are the best.
The term “optimizing away” refers to any process where the compiler optimizes the performance of your program by removing things in a way that doesn’t affect the output of your program. For example, let’s say you have some const variable x that’s initialized to value 4. Wherever your code references variable x, the compiler can just replace x with 4 (since x is const, we know it won’t ever change to a different value) and avoid having to create and initialize a variable altogether.
We use const instead of constexpr in this method because constexpr variables can’t be forward declared, even if they have external linkage. This is because the compiler needs to know the value of the variable at compile time, and a forward declaration does not provide this information.
constants.cpp:
#include "constants.h"
namespace constants
{
// actual global variables
extern const double pi { 3.14159 };
extern const double avogadro { 6.0221413e23 };
extern const double myGravity { 9.2 }; // m/s^2 -- gravity is light on this planet
}
constants.h:
#ifndef CONSTANTS_H
#define CONSTANTS_H
namespace constants
{
// since the actual variables are inside a namespace, the forward declarations need to be inside a namespace as well
extern const double pi;
extern const double avogadro;
extern const double myGravity;
}
#endif
Use in the code file stays the same:
main.cpp:
#include "constants.h" // include all the forward declarations
#include <iostream>
int main()
{
std::cout << "Enter a radius: ";
int radius{};
std::cin >> radius;
std::cout << "The circumference is: " << 2.0 * radius * constants::pi << '\n';
return 0;
}
Because global symbolic constants should be namespaced (to avoid naming conflicts with other identifiers in the global namespace), the use of a “g_” naming prefix is not necessary.
Now the symbolic constants will get instantiated only once (in constants.cpp) instead of in each code file where constants.h is #included, and all uses of these constants will be linked to the version instantiated in constants.cpp. Any changes made to constants.cpp will require recompiling only constants.cpp.
In order for variables to be usable in compile-time contexts, such as array sizes, the compiler has to see the variable’s definition (not just a forward declaration).
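As a rough illustration of what a compile-time context looks like, consider the sketch below. The constant maxStudents and the function scoreSlots are hypothetical names chosen for this example.

```cpp
#include <array>
#include <cstddef>

namespace constants
{
    constexpr std::size_t maxStudents{ 30 }; // the definition is visible, so the value is known here
}

// A static_assert condition must be a constant expression
static_assert(constants::maxStudents == 30, "value is available at compile-time");

std::size_t scoreSlots()
{
    // The size of a std::array must also be a constant expression
    std::array<int, constants::maxStudents> scores{};
    return scores.size();
}

// With only a forward declaration such as
//     extern const std::size_t maxStudents;
// neither use above would compile, because the value would be unknown in this file.
```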
Given the above downsides, prefer defining your constants in a header file (either per the prior section, or per the next section). If you find that the values for your constants are changing a lot (e.g. because you are tuning the program) and this is leading to long compilation times, you can move just the offending constants into a .cpp file as needed.
C++17
C++17 introduced a new concept called inline variables. In C++, the term inline has evolved to mean “multiple definitions are allowed”. Thus, an inline variable is one that is allowed to be defined in multiple files without violating the one definition rule. Inline global variables have external linkage by default.
Inline variables have two primary restrictions that must be obeyed:
With this, we can go back to defining our globals in a header file without the downside of duplicated variables:
constants.h:
#ifndef CONSTANTS_H
#define CONSTANTS_H
// define your own namespace to hold constants
namespace constants
{
inline constexpr double pi { 3.14159 }; // note: now inline constexpr
inline constexpr double avogadro { 6.0221413e23 };
inline constexpr double myGravity { 9.2 }; // m/s^2 -- gravity is light on this planet
// ... other related constants
}
#endif
main.cpp:
#include "constants.h"
#include <iostream>
int main()
{
std::cout << "Enter a radius: ";
int radius{};
std::cin >> radius;
std::cout << "The circumference is: " << 2.0 * radius * constants::pi << '\n';
return 0;
}
We can include constants.h into as many code files as we want, but these variables will only be instantiated once and shared across all code files.
This method does retain the downside of requiring every file that includes the constants header be recompiled if any constant value is changed.
If you need global constants and your compiler is C++17 capable, prefer defining inline constexpr global variables in a header file.
Use std::string_view for constexpr strings. We cover this in lesson 4.18 – Introduction to std::string_view.
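A minimal sketch of such a constant, assuming a C++17 compiler (the name appName and its value are made up for illustration):

```cpp
#include <string_view>

namespace constants
{
    // std::string_view only views an existing string, so it works as a constexpr variable
    inline constexpr std::string_view appName{ "Gravity Simulator" };
}

// size() is a constexpr member function, so it can be checked at compile-time
static_assert(constants::appName.size() == 17);
```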
Initialize your static local variables. Static local variables are only initialized the first time the code is executed, not on subsequent calls.
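One common use is generating unique IDs. A small sketch (the function name generateID and the s_ prefix are conventions assumed for this example):

```cpp
int generateID()
{
    static int s_nextID{ 0 }; // initialized once, the first time execution reaches this line
    return s_nextID++;        // the value persists between calls
}
```

The first call returns 0, the second returns 1, and so on, because s_nextID is not reinitialized on later calls.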
Static local variables can be made const (or constexpr).
With a const/constexpr static local variable, you can create and initialize the expensive object once, and then reuse it whenever the function is called.
Static local variables should only be used if in your entire program and in the foreseeable future of your program, the variable is unique and it wouldn’t make sense to reset the variable.
Avoid static local variables unless the variable never needs to be reset.
When used as part of an identifier declaration, the static and extern keywords are called storage class specifiers. In this context, they set the storage duration and linkage of the identifier.
Specifier | Meaning | Note |
---|---|---|
extern | static (or thread_local) storage duration and external linkage | |
static | static (or thread_local) storage duration and internal linkage | |
thread_local | thread storage duration | |
mutable | object allowed to be modified even if containing class is const | |
auto | automatic storage duration | Removed as a storage class specifier in C++11 |
register | automatic storage duration and hint to the compiler to place in a register | Deprecated in C++11, removed in C++17 |
The term storage class specifier is typically only used in formal documentation.
You’ve probably seen this program in a lot of textbooks and tutorials:
#include <iostream>
using namespace std;
int main()
{
cout << "Hello world!\n";
return 0;
}
Some older IDEs will also auto-populate new C++ projects with a similar program (so you can compile something immediately, rather than starting from a blank file).
If you see this, run. Your textbook, tutorial, or compiler are probably out of date. In this lesson, we’ll explore why.
In 1995, namespaces were standardized, and all of the functionality from the standard library was moved out of the global namespace and into namespace std. This change broke older code that was still using names without std::.
Fast forward to today – if you’re using the standard library a lot, typing std:: before everything you use from the standard library can become repetitive, and in some cases, can make your code harder to read.
C++ provides some solutions to both of these problems, in the form of using statements.
But first, let’s define two terms.
A name can also be qualified by a class name using the scope resolution operator (::), or by a class object using the member selection operators (. or ->). For example:
class C; // some class
C::s_member; // s_member is qualified by class C
obj.x; // x is qualified by class object obj
ptr->y; // y is qualified by pointer to class object ptr
A using declaration allows us to use an unqualified name (with no scope) as an alias for a qualified name.
#include <iostream>
int main()
{
using std::cout; // this using declaration tells the compiler that cout should resolve to std::cout
cout << "Hello world!\n"; // so no std:: prefix is needed here!
return 0;
} // the using declaration expires here
Although this method is less explicit than using the std:: prefix, it’s generally considered safe and acceptable (when used inside a function).
For technical reasons, using directives do not actually import names into the current scope. Instead, they make the names visible as if they had been declared in an outer scope (the nearest enclosing namespace that contains both the using directive and the nominated namespace). However, these names are not accessible from the outer scope: they are only accessible via unqualified (non-prefixed) lookup from the scope of the using directive (or a nested scope).
The practical effect is that (outside of some weird edge cases involving multiple using directives inside nested namespaces), using directives behave as if the names had been imported into the current scope. To keep things simple, we will proceed under the simplification that the names are imported into the current scope.
#include <iostream>
int main()
{
using namespace std; // this using directive tells the compiler to import all names from namespace std into the current namespace without qualification
cout << "Hello world!\n"; // so no std:: prefix is needed here
return 0;
}
In modern C++, using directives generally offer little benefit (saving some typing) compared to the risk. Because using directives import all of the names from a namespace (potentially including lots of names you’ll never use), the possibility for naming collisions to occur increases significantly (especially if you import the std namespace).
If a using declaration or using directive is used within a block, the names are applicable to just that block (it follows normal block scoping rules). This is a good thing, as it reduces the chances for naming collisions to occur to just within that block.
If a using declaration or using directive is used in the global namespace, the names are applicable to the entire rest of the file (they have file scope).
Of course, all of this headache can be avoided by explicitly using the scope resolution operator (::) in the first place.
Avoid using directives (particularly using namespace std;), except in specific circumstances (such as using namespace std::literals to access the s and sv literal suffixes). Using declarations are generally considered safe to use inside blocks. Limit their use in the global namespace of a code file, and never use them in the global namespace of a header file.
Prefer explicit namespaces over using statements. Avoid using directives whenever possible. Using declarations are okay to use inside blocks.
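The std::literals exception mentioned above might look like the following sketch, assuming a C++17 compiler. The function name combinedLength and the strings are made up for illustration.

```cpp
#include <cstddef>
#include <string>
#include <string_view>

std::size_t combinedLength()
{
    using namespace std::literals; // brings in only the literal suffixes, and only for this block

    auto name{ "Alex"s };       // s suffix: a std::string
    auto greeting{ "Hello"sv }; // sv suffix: a std::string_view

    return name.size() + greeting.size();
}
```

Because the using directive sits inside the function body, the suffixes stop being available at the closing brace, which keeps the rest of the file unaffected.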
The using keyword is also used to define type aliases, which are unrelated to using statements. We cover type aliases in lesson 8.6 – Typedefs and type aliases.
A function that is eligible to have its function calls expanded is called an inline function.
Modern optimizing compilers make the decision about when functions should be expanded inline.
Some types of functions are implicitly treated as inline functions. These include:
Do not use the inline keyword to request inline expansion for your functions.
In lesson 6.9 – Sharing global constants across multiple files (using inline variables), we noted that in modern C++, the inline concept has evolved to have a new meaning: multiple definitions are allowed in the program. This is true for functions as well as variables. Thus, if we mark a function as inline, then that function is allowed to have multiple definitions (in different files), as long as those definitions are identical.
The compiler needs to be able to see the full definition of an inline function wherever it is called.
For the most part, you should not mark your functions as inline, but we’ll see examples in the future where this is useful.
Avoid the use of the inline keyword for functions unless you have a specific, compelling reason to do so.
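One case where the keyword is useful is a function defined in a header file. The sketch below assumes a hypothetical header named circle.h containing a hypothetical function circumference.

```cpp
// circle.h (hypothetical header)
#ifndef CIRCLE_H
#define CIRCLE_H

// inline: every .cpp file that includes this header gets an identical definition,
// and the linker treats them all as one function instead of reporting a duplicate
inline double circumference(double radius)
{
    return 2.0 * 3.14159 * radius;
}

#endif
```

Without inline, including this header in two .cpp files would typically produce a linker error about multiple definitions of circumference.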
A constexpr function is a function whose return value may be computed at compile-time. To make a function a constexpr function, we simply use the constexpr keyword in front of the return type. Here’s a similar program to the one above, using a constexpr function:
#include <iostream>
constexpr int greater(int x, int y) // now a constexpr function
{
return (x > y ? x : y);
}
int main()
{
constexpr int x{ 5 };
constexpr int y{ 6 };
// We'll explain why we use variable g here later in the lesson
constexpr int g { greater(x, y) }; // will be evaluated at compile-time
std::cout << g << " is greater!\n";
return 0;
}
So in our example, the call to greater(x, y) will be replaced by the result of the function call, which is the integer value 6. In other words, the compiler will compile this:
#include <iostream>
int main()
{
constexpr int x{ 5 };
constexpr int y{ 6 };
constexpr int g { 6 }; // greater(x, y) evaluated and replaced with return value 6
std::cout << g << " is greater!\n";
return 0;
}
To be eligible for compile-time evaluation, a function must have a constexpr return type and not call any non-constexpr functions. Additionally, a call to the function must have constexpr arguments (e.g. constexpr variables or literals).
We’ll use the term “eligible for compile-time evaluation” later in the article, so remember this definition.
Our greater() function definition and function call in the above example meet these requirements, so the call is eligible for compile-time evaluation.
Use a constexpr return type for functions that need to return a compile-time constant.
The compiler must be able to see the full definition of a constexpr function, not just a forward declaration.
Constexpr functions used in a single source file (.cpp) can be defined in the source file above where they are used.
Constexpr functions used in multiple source files should be defined in a header file so they can be included into each source file.
Functions with a constexpr return type are allowed to be evaluated at either compile-time or runtime so that a single function can serve both cases.
Otherwise, you’d need to have separate functions (a function with a constexpr return type, and a function with a non-constexpr return type). This would not only require duplicate code, the two functions would also need to have different names!
A constexpr function is not allowed to call a non-constexpr function. If this were allowed, the constexpr function wouldn’t be able to evaluate at compile-time, which defeats the point of constexpr. Trying to do so will cause the compiler to produce a compilation error.
#include <iostream>
constexpr int greater(int x, int y)
{
return (x > y ? x : y);
}
int main()
{
constexpr int g { greater(5, 6) }; // case 1: evaluated at compile-time
std::cout << g << " is greater!\n";
int x{ 5 }; // not constexpr
std::cout << greater(x, 6) << " is greater!\n"; // case 2: evaluated at runtime
std::cout << greater(5, 6) << " is greater!\n"; // case 3: may be evaluated at either runtime or compile-time
return 0;
}
Note that your compiler’s optimization level setting may have an impact on whether it decides to evaluate a function at compile-time or runtime. This also means that your compiler may make different choices for debug vs. release builds (as debug builds typically have optimizations turned off).
A constexpr function that is eligible to be evaluated at compile-time will only be evaluated at compile-time if the return value is used where a constant expression is required. Otherwise, compile-time evaluation is not guaranteed.
Thus, a constexpr function is better thought of as “can be used in a constant expression”, not “will be evaluated at compile-time”.
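A short sketch of the distinction, reusing greater() from the examples above (runtimeGreater is a hypothetical helper added for illustration):

```cpp
constexpr int greater(int x, int y)
{
    return (x > y ? x : y);
}

// A static_assert requires a constant expression, so this call must be evaluated at compile-time
static_assert(greater(5, 6) == 6, "evaluated at compile-time");

int runtimeGreater(int x)
{
    // x is an ordinary parameter, not a constant expression,
    // so this call is not required to be evaluated at compile-time
    return greater(x, 6);
}
```

The same function serves both uses. Only the first use forces the compiler’s hand.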
Prior to C++20, there are no standard language tools available to do this.
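In C++20, std::is_constant_evaluated() (defined in the <type_traits> header) returns a bool indicating whether the current function call is being evaluated in a constant-evaluated context:
#include <type_traits> // for std::is_constant_evaluated
constexpr int someFunction()
{
if (std::is_constant_evaluated()) // if compile-time evaluation
return 1; // do something
else // runtime evaluation
return 0; // do something else
}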
Used cleverly, you can have your function produce some observable difference (such as returning a special value) when evaluated at compile-time, and then infer how it evaluated from that result.
However, in C++20, there is a better workaround to this issue, which we’ll present in a moment.
C++20
C++20 introduces the keyword consteval, which is used to indicate that a function must evaluate at compile-time, otherwise a compile error will result. Such functions are called immediate functions.
#include <iostream>
consteval int greater(int x, int y) // function is now consteval
{
return (x > y ? x : y);
}
int main()
{
constexpr int g { greater(5, 6) }; // ok: will evaluate at compile-time
std::cout << greater(5, 6) << " is greater!\n"; // ok: will evaluate at compile-time
int x{ 5 }; // not constexpr
std::cout << greater(x, 6) << " is greater!\n"; // error: consteval functions must evaluate at compile-time
return 0;
}
Just like constexpr functions, consteval functions are implicitly inline.
Use consteval if you have a function that must run at compile-time for some reason (e.g. performance).
An unnamed namespace (also called an anonymous namespace) is a namespace that is defined without a name, like so:
#include <iostream>
namespace // unnamed namespace
{
void doSomething() // can only be accessed in this file
{
std::cout << "v1\n";
}
}
int main()
{
doSomething(); // we can call doSomething() without a namespace prefix
return 0;
}
For functions, this is effectively the same as defining all functions in the unnamed namespace as static functions. The following program is effectively identical to the one above:
#include <iostream>
static void doSomething() // can only be accessed in this file
{
std::cout << "v1\n";
}
int main()
{
doSomething(); // we can call doSomething() without a namespace prefix
return 0;
}
Unnamed namespaces are typically used when you have a lot of content that you want to ensure stays local to a given file, as it’s easier to cluster such content in an unnamed namespace than to individually mark all declarations as static. Unnamed namespaces will also keep user-defined types (something we’ll discuss in a later lesson) local to the file, for which there is no equivalent alternative mechanism.
An inline namespace is a namespace that is typically used to version content. Much like an unnamed namespace, anything declared inside an inline namespace is considered part of the parent namespace. However, inline namespaces don’t give everything internal linkage.
#include <iostream>
inline namespace v1 // declare an inline namespace named v1
{
void doSomething()
{
std::cout << "v1\n";
}
}
namespace v2 // declare a normal namespace named v2
{
void doSomething()
{
std::cout << "v2\n";
}
}
int main()
{
v1::doSomething(); // calls the v1 version of doSomething()
v2::doSomething(); // calls the v2 version of doSomething()
doSomething(); // calls the inline version of doSomething() (which is v1)
return 0;
}
This prints:
v1
v2
v1
In the above example, callers of doSomething() will get the v1 version (the inline version). Callers who want to use the newer version can explicitly call v2::doSomething(). This preserves the function of existing programs while allowing newer programs to take advantage of newer/better variations.
Avoid non-const global variables whenever possible. Const globals are generally seen as acceptable. Use inline variables for global constants if your compiler is C++17 capable.
Local variables can be given static duration via the static keyword.
A qualified name is a name that includes an associated scope (e.g. std::string). An unqualified name is a name that does not include a scoping qualifier (e.g. string).
Inline functions were originally designed as a way to request that the compiler replace your function call with inline expansion of the function code. You should not need to use the inline keyword for this purpose because the compiler will generally determine this for you. In modern C++, the inline keyword is used to exempt a function from the one-definition rule, allowing its definition to be imported into multiple code files. Inline functions are typically defined in header files so they can be #included into any code files that need them.
C++20 introduces the keyword consteval, which is used to indicate that a function must evaluate at compile-time, otherwise a compile error will result. Such functions are called immediate functions.
Finally, C++ supports unnamed namespaces, which implicitly treat all contents of the namespace as if they had internal linkage. C++ also supports inline namespaces, which provide some primitive versioning capabilities for namespaces.
The specific sequence of statements that the CPU executes is called the program’s execution path (or path, for short).
Straight-line programs take the same path (execute the same statements in the same order) every time they are run.
When a control flow statement causes the point of execution to change to a non-sequential statement, this is called branching.
This is where the real fun begins. So let’s get to it!
Consider putting single statements associated with an if or else in blocks (particularly while you are learning). More experienced C++ developers sometimes disregard this practice in favor of tighter vertical spacing.
A middle-ground alternative is to put single statements on the same line as the if or else:
if (age >= 21) purchaseBeer();
This avoids both of the downsides mentioned above at some minor cost to readability.
If the programmer does not declare a block in the statement portion of an if statement or else statement, the compiler will implicitly declare one. Thus:
if (condition)
true_statement;
else
false_statement;
is actually the equivalent of:
if (condition)
{
true_statement;
}
else
{
false_statement;
}
A null statement is an expression statement that consists of just a semicolon:
if (x > 10)
; // this is a null statement
Be careful not to “terminate” your if statement with a semicolon, otherwise your conditional statement(s) will execute unconditionally (even if they are inside a block).
Because testing a variable or expression for equality against a set of different values is common, C++ provides an alternative conditional statement called a switch statement that is specialized for this purpose.
Prefer switch statements over if-else chains when there is a choice.
The one restriction is that the condition must evaluate to an integral type or an enumerated type, or be convertible to one. Expressions that evaluate to floating point types, strings, and most other non-integral types may not be used here.
Why does the switch type only allow for integral (or enumerated) types? The answer is because switch statements are designed to be highly optimized. Historically, the most common way for compilers to implement switch statements is via jump tables, and jump tables only work with integral values.
For those of you already familiar with arrays, a jump table works much like an array: an integral value is used as the array index to “jump” directly to a result. This can be much more efficient than doing a bunch of sequential comparisons.
Of course, compilers don’t have to implement switches using jump tables, and sometimes they don’t. There is technically no reason that C++ couldn’t relax the restriction so that other types could be used as well; the language just hasn’t done so yet (as of C++20).
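The indexing idea can be imitated in ordinary code with a lookup table. To be clear, this is only an analogy: a real jump table is generated by the compiler and stores code addresses rather than data, and the function digitName below is a made-up example that assumes C++17.

```cpp
#include <array>
#include <cstddef>
#include <string_view>

std::string_view digitName(int x)
{
    // The integral value is used directly as an index, with no chain of comparisons
    static constexpr std::array<std::string_view, 4> names{ "Zero", "One", "Two", "Three" };

    // Values outside the table play the role of the default case
    if (x < 0 || x >= static_cast<int>(names.size()))
        return "Unknown";

    return names[static_cast<std::size_t>(x)];
}
```

This also hints at why the values must be integral: a string or floating point value can’t serve as an array index.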
#include <iostream>
void printDigitName(int x)
{
switch (x) // x is evaluated to produce value 2
{
case 1:
std::cout << "One";
return;
case 2: // which matches the case statement here
std::cout << "Two"; // so execution starts here
return; // and then we return to the caller
case 3:
std::cout << "Three";
return;
default:
std::cout << "Unknown";
return;
}
}
int main()
{
printDigitName(2);
std::cout << '\n';
return 0;
}
switch (x)
{
case 54:
case 54: // error: already used value 54!
case '6': // error: '6' converts to integer value 54, which is already used
}
If the conditional expression does not match any of the case labels, no cases are executed. We’ll show an example of this shortly.
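A small sketch of a switch with no matching label and no default (the function describe is hypothetical):

```cpp
#include <string>

std::string describe(int x)
{
    std::string result{};

    switch (x)
    {
    case 1:
        result = "One";
        break;
    case 2:
        result = "Two";
        break;
    } // no default label: if x is neither 1 nor 2, nothing inside the switch runs

    return result;
}
```

Calling describe(5) returns an empty string, since execution skips the whole switch block and continues after it.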
The default label is optional, and there can only be one default label per switch statement. By convention, the default case is placed last in the switch block.
Place the default case last in the switch block.
#include <iostream>
void printDigitName(int x)
{
switch (x) // x evaluates to 3
{
case 1:
std::cout << "One";
break;
case 2:
std::cout << "Two";
break;
case 3:
std::cout << "Three"; // execution starts here
break; // jump to the end of the switch block
default:
std::cout << "Unknown";
break;
}
// execution continues here
std::cout << " Ah-Ah-Ah!";
}
int main()
{
printDigitName(3);
std::cout << '\n';
return 0;
}
Each set of statements underneath a label should end in a break statement or a return statement. This includes the statements underneath the last label in the switch.
This is probably not what we wanted! When execution flows from a statement underneath a label into statements underneath a subsequent label, this is called fallthrough.
Once the statements underneath a case or default label have started executing, they will overflow (fall through) into subsequent cases. Break or return statements are typically used to prevent this.
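To make the effect concrete, here is a sketch of a function with break statements deliberately left out of the first two cases (digitsFrom is a made-up name, and some compilers will warn about this code when extra warnings are enabled):

```cpp
#include <string>

std::string digitsFrom(int x)
{
    std::string out{};

    switch (x)
    {
    case 1:
        out += '1'; // no break, so execution continues into case 2
    case 2:
        out += '2'; // no break, so execution continues into case 3
    case 3:
        out += '3';
        break;      // execution finally leaves the switch here
    default:
        out += '?';
        break;
    }

    return out;
}
```

digitsFrom(2) returns "23" rather than "2": execution starts at case 2 and keeps going until it reaches a break.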
The [[fallthrough]] attribute modifies a null statement to indicate that fallthrough is intentional (and no warnings should be triggered):
#include <iostream>
int main()
{
switch (2)
{
case 1:
std::cout << 1 << '\n';
break;
case 2:
std::cout << 2 << '\n'; // Execution begins here
[[fallthrough]]; // intentional fallthrough -- note the semicolon to indicate the null statement
case 3:
std::cout << 3 << '\n'; // This is also executed
break;
}
return 0;
}
This program prints:
2
3
Use the [[fallthrough]] attribute (along with a null statement) to indicate intentional fallthrough.
You can use the logical OR operator to combine multiple tests into a single statement:
bool isVowel(char c)
{
return (c=='a' || c=='e' || c=='i' || c=='o' || c=='u' ||
c=='A' || c=='E' || c=='I' || c=='O' || c=='U');
}
This suffers …
You can do something similar using switch statements by placing multiple case labels in sequence:
bool isVowel(char c)
{
switch (c)
{
case 'a': // if c is 'a'
case 'e': // or if c is 'e'
case 'i': // or if c is 'i'
case 'o': // or if c is 'o'
case 'u': // or if c is 'u'
case 'A': // or if c is 'A'
case 'E': // or if c is 'E'
case 'I': // or if c is 'I'
case 'O': // or if c is 'O'
case 'U': // or if c is 'U'
return true;
default:
return false;
}
}
Thus, we can “stack” case labels to make all of those case labels share the same set of statements afterward. This is not considered fallthrough behavior, so use of comments or [[fallthrough]] is not needed here.
However, with switch statements, the statements after labels are all scoped to the switch block. No implicit blocks are created.
switch (1)
{
case 1: // does not create an implicit block
foo(); // this is part of the switch scope, not an implicit block to case 1
break; // this is part of the switch scope, not an implicit block to case 1
default:
std::cout << "default case\n";
break;
}
In the above example, the 2 statements between the case 1 and the default label are scoped as part of the switch block, not a block implicit to case 1.
If a case needs to define and/or initialize a new variable, the best practice is to do so inside an explicit block underneath the case statement:
switch (1)
{
case 1:
{ // note addition of explicit block here
int x{ 4 }; // okay, variables can be initialized inside a block inside a case
std::cout << x;
break;
}
default:
std::cout << "default case\n";
break;
}
If defining variables used in a case statement, do so in a block inside the case.
One notable exception is when you need to exit a nested loop but not the entire function – in such a case, a goto to just beyond the loops is probably the cleanest solution.
Avoid goto statements (unless the alternatives are significantly worse for code readability).
Favor while(true) for intentional infinite loops.
Often, we want a loop to execute a certain number of times. To do this, it is common to use a loop variable, often called a counter.
Loop variables should be of type (signed) int.
Each time a loop executes, it is called an iteration.
It is also possible to nest loops inside of other loops. In the following example, the nested loop (which we’re calling the inner loop) and the outer loop each have their own counters. Note that the loop expression for the inner loop makes use of the outer loop’s counter as well!
#include <iostream>
int main()
{
// outer loops between 1 and 5
int outer{ 1 };
while (outer <= 5)
{
// For each iteration of the outer loop, the code in the body of the loop executes once
// inner loops between 1 and outer
int inner{ 1 };
while (inner <= outer)
{
std::cout << inner << ' ';
++inner;
}
// print a newline at the end of each row
std::cout << '\n';
++outer;
}
return 0;
}
This program prints:
1
1 2
1 2 3
1 2 3 4
1 2 3 4 5
Now make the numbers print like this:
1
2 1
3 2 1
4 3 2 1
5 4 3 2 1
A do while statement is a looping construct that works just like a while loop, except the statement always executes at least once.
In practice, do-while loops aren’t commonly used. Having the condition at the bottom of the loop obscures the loop condition, which can lead to errors. Many developers recommend avoiding do-while loops altogether as a result. We’ll take a softer stance and advocate for preferring while loops over do-while when given an equal choice.
Favor while loops over do-while when given an equal choice.
Avoid operator!= when doing numeric comparisons in the for-loop condition.
Off-by-one errors occur when the loop iterates one too many or one too few times to produce the desired result.
Although you do not see it very often, it is worth noting that the following example produces an infinite loop:
for (;;)
statement;
The above example is equivalent to:
while (true)
statement;
This might be a little unexpected, as you’d probably expect an omitted condition-expression to be treated as false. However, the C++ standard explicitly (and inconsistently) defines that an omitted condition-expression in a for loop should be treated as true.
We recommend avoiding this form of the for loop altogether and using while(true) instead.
Defining multiple variables (in the init-statement) and using the comma operator (in the end-expression) is acceptable inside a for statement.
Prefer for loops over while loops when there is an obvious loop variable.
Prefer while loops over for loops when there is no obvious loop variable.
Many textbooks caution readers not to use break and continue in loops, both because it causes the execution flow to jump around, and because it can make the flow of logic harder to follow. For example, a break in the middle of a complicated piece of logic could either be missed, or it may not be obvious under what conditions it should be triggered.
However, used judiciously, break and continue can help make loops more readable by keeping the number of nested blocks down and reducing the need for complicated looping logic.
For example, consider the following program:
#include <iostream>
int main()
{
int count{ 0 }; // count how many times the loop iterates
bool keepLooping { true }; // controls whether the loop ends or not
while (keepLooping)
{
std::cout << "Enter 'e' to exit this loop or any other character to continue: ";
char ch{};
std::cin >> ch;
if (ch == 'e')
keepLooping = false;
else
{
++count;
std::cout << "We've iterated " << count << " times\n";
}
}
return 0;
}
This program uses a boolean variable to control whether the loop continues or not, as well as a nested block that only runs if the user doesn’t exit.
Here’s a version that’s easier to understand, using a break statement:
#include <iostream>
int main()
{
int count{ 0 }; // count how many times the loop iterates
while (true) // loop until user terminates
{
std::cout << "Enter 'e' to exit this loop or any other character to continue: ";
char ch{};
std::cin >> ch;
if (ch == 'e')
break;
++count;
std::cout << "We've iterated " << count << " times\n";
}
return 0;
}
In this version, by using a single break statement, we’ve avoided the use of a Boolean variable (and having to understand both what its intended use is, and where its value is changed), an else statement, and a nested block.
Minimizing the number of variables used and keeping the number of nested blocks down both improve code comprehensibility more than a break or continue harms it. For that reason, we believe judicious use of break or continue is acceptable.
Use break and continue when they simplify your loop logic.
Our stance is that early returns are more helpful than harmful, but we recognize that there is a bit of art to the practice.
Use early returns when they simplify your function’s logic.
The last category of flow control statement we’ll cover in this chapter is the halt. A halt is a flow control statement that terminates the program. In C++, halts are implemented as functions (rather than keywords), so our halt statements will be function calls.
std::exit() is a function that causes the program to terminate normally. Normal termination means the program has exited in an expected way. Note that the term normal termination does not imply anything about whether the program was successful (that’s what the status code is for). For example, let’s say you were writing a program where you expected the user to type in a filename to process. If the user typed in an invalid filename, your program would probably return a non-zero status code to indicate the failure state, but it would still have a normal termination.
std::exit() performs a number of cleanup functions. First, objects with static storage duration are destroyed. Then some other miscellaneous file cleanup is done if any files were used. Finally, control is returned back to the OS, with the argument passed to std::exit() used as the status code.
The std::exit() function does not clean up local variables in the current function or up the call stack.
Because std::exit() terminates the program immediately, you may want to manually do some cleanup before terminating. In this context, cleanup means things like closing database or network connections, deallocating any memory you have allocated, writing information to a log file, etc…
In the above example, we called function cleanup() to handle our cleanup tasks. However, remembering to manually call a cleanup function before every call to std::exit() adds burden to the programmer.
To assist with this, C++ offers the std::atexit() function, which allows you to specify a function that will automatically be called on program termination via std::exit().
#include <cstdlib> // for std::exit()
#include <iostream>
void cleanup()
{
// code here to do any kind of cleanup required
std::cout << "cleanup!\n";
}
int main()
{
// register cleanup() to be called automatically when std::exit() is called
std::atexit(cleanup); // note: we use cleanup rather than cleanup() since we're not making a function call to cleanup() right now
std::cout << 1 << '\n';
std::exit(0); // terminate and return status code 0 to operating system
// The following statements never execute
std::cout << 2 << '\n';
return 0;
}
In multi-threaded programs, calling std::exit() can cause your program to crash (because the thread calling std::exit() will cleanup static objects that may still be accessed by other threads). For this reason, C++ has introduced another pair of functions that work similarly to std::exit() and std::atexit() called std::quick_exit() and std::at_quick_exit(). std::quick_exit() terminates the program normally, but does not clean up static objects, and may or may not do other types of cleanup. std::at_quick_exit() performs the same role as std::atexit() for programs terminated with std::quick_exit().
C++ contains two other halt-related functions.
The std::abort() function causes your program to terminate abnormally.
The std::terminate() function is typically used in conjunction with exceptions (we’ll cover exceptions in a later chapter). Although std::terminate can be called explicitly, it is more often called implicitly when an exception isn’t handled (and in a few other exception-related cases). By default, std::terminate() calls std::abort().
The short answer is “almost never”. Destroying local objects is an important part of C++ (particularly when we get into classes), and none of the above-mentioned functions clean up local variables. Exceptions are a better and safer mechanism for handling error cases.
Only use a halt if there is no safe way to return normally from the main function. If you haven’t disabled exceptions, prefer using exceptions for handling errors safely.
Just because your program worked for one set of inputs doesn’t mean it’s going to work correctly in all cases.
Software testing (also called software validation) is the process of determining whether or not the software actually works as expected.
There’s a lot that can be written about testing methodologies – in fact, we could write a whole chapter on it. But since it’s not a C++ specific topic, we’ll stick to a brief and informal introduction, covered from the point of view of you (as the developer) testing your own code. In the next few subsections, we’ll talk about some practical things you should be thinking about as you test your code.
Testing a small part of your code in isolation to ensure that “unit” of code is correct is called unit testing. Each unit test is designed to ensure that a particular behavior of the unit is correct.
Write your program in small, well defined units (functions or classes), compile often, and test your code as you go.
#include <iostream>
bool isLowerVowel(char c)
{
switch (c)
{
case 'a':
case 'e':
case 'i':
case 'o':
case 'u':
return true;
default:
return false;
}
}
// Not called from anywhere right now
// But here if you want to retest things later
void testVowel()
{
std::cout << isLowerVowel('a'); // temporary test code, should produce 1
std::cout << isLowerVowel('q'); // temporary test code, should produce 0
}
int main()
{
return 0;
}
As you create more tests, you can simply add them to the testVowel() function.
We can do better by writing a test function that contains both the tests AND the expected answers and compares them so we don’t have to.
#include <iostream>
bool isLowerVowel(char c)
{
switch (c)
{
case 'a':
case 'e':
case 'i':
case 'o':
case 'u':
return true;
default:
return false;
}
}
// returns the number of the test that failed, or 0 if all tests passed
int testVowel()
{
if (!isLowerVowel('a')) return 1;
if (isLowerVowel('q')) return 2;
return 0;
}
int main()
{
return 0;
}
Now, you can call testVowel() at any time to re-prove that you haven’t broken anything, and the test routine will do all the work for you, returning either an “all good” signal (return value 0), or the test number that didn’t pass, so you can investigate why it broke. This is particularly useful when going back and modifying old code, to ensure you haven’t accidentally broken anything!
Because writing functions to exercise other functions is so common and useful, there are entire frameworks (called unit testing frameworks) that are designed to help simplify the process of writing, maintaining, and executing unit tests. Since these involve third party software, we won’t cover them here, but you should be aware they exist.
Once each of your units has been tested in isolation, they can be integrated into your program and retested to make sure they were integrated properly. This is called an integration test. Integration testing tends to be more complicated – for now, running your program a few times and spot checking the behavior of the integrated unit will suffice.
The term code coverage is used to describe how much of the source code of a program is executed while testing. There are many different metrics used for code coverage. We’ll cover a few of the more useful and popular ones in the following sections.
The term statement coverage refers to the percentage of statements in your code that have been exercised by your testing routines.
For our isLowerVowel() function:
bool isLowerVowel(char c)
{
switch (c) // statement 1
{
case 'a':
case 'e':
case 'i':
case 'o':
case 'u':
return true; // statement 2
default:
return false; // statement 3
}
}
This function will require two calls to test all of the statements, as there is no way to reach statement 2 and 3 in the same function call.
While aiming for 100% statement coverage is good, it’s not enough to ensure correctness.
Branch coverage refers to the percentage of branches that have been executed, each possible branch counted separately.
Aim for 100% branch coverage of your code.
Loop coverage (informally called the 0, 1, 2 test) says that if you have a loop in your code, you should ensure it works properly when it iterates 0 times, 1 time, and 2 times.
Use the 0, 1, 2 test to ensure your loops work correctly with different numbers of iterations.
Test different categories of input values to make sure your unit handles them properly.
From lesson 5.7 – Logical operators, the following program makes an operator precedence mistake:
#include <iostream>
int main()
{
int x{ 5 };
int y{ 7 };
if (!x > y) // oops: operator precedence issue
std::cout << x << " is not greater than " << y << '\n';
else
std::cout << x << " is greater than " << y << '\n';
return 0;
}
Because logical NOT has higher precedence than operator>, the conditional evaluates as if it was written (!x) > y, which isn’t what the programmer intended.
As a result, this program prints:
5 is greater than 7
This can also happen when mixing Logical OR and Logical AND in the same expression (Logical AND takes precedence over Logical OR). Use explicit parenthesization to avoid these kinds of errors.
The following floating point variable doesn’t have enough precision to store the entire number:
#include <iostream>
int main()
{
float f{ 0.123456789f };
std::cout << f << '\n';
return 0;
}
Because of this lack of precision, the number is rounded slightly:
0.123457
In lesson 5.6 – Relational operators and floating point comparisons, we talked about how using operator== and operator!= can be problematic with floating point numbers due to small rounding errors (as well as what to do about it). Here’s an example:
#include <iostream>
int main()
{
double d{ 0.1 + 0.1 + 0.1 + 0.1 + 0.1 + 0.1 + 0.1 + 0.1 + 0.1 + 0.1 }; // should sum to 1.0
if (d == 1.0)
std::cout << "equal\n";
else
std::cout << "not equal\n";
return 0;
}
This program prints:
not equal
The more arithmetic you do with a floating point number, the more it will accumulate small rounding errors.
In lesson 7.3 – Common if statement problems, we covered null statements, which are statements that do nothing.
In the below program, we only want to blow up the world if we have the user’s permission:
#include <iostream>
void blowUpWorld()
{
std::cout << "Kaboom!\n";
}
int main()
{
std::cout << "Should we blow up the world again? (y/n): ";
char c{};
std::cin >> c;
if (c=='y'); // accidental null statement here
blowUpWorld(); // so this will always execute since it's not part of the if-statement
return 0;
}
However, because of an accidental null statement, the function call to blowUpWorld() is always executed, so we blow it up regardless:
Should we blow up the world again? (y/n): n
Kaboom!
Functions may fail for any number of reasons – the caller may have passed in an argument with an invalid value, or something may fail within the body of the function. For example, a function that opens a file for reading might fail if the file cannot be found.
When this happens, you have quite a few options at your disposal. There is no best way to handle an error – it really depends on the nature of the problem and whether the problem can be fixed or not.
There are 4 general strategies that can be used:
If the error is so bad that the program can not continue to operate properly, this is called a non-recoverable error (also called a fatal error). In such cases, the best thing to do is terminate the program. If your code is in main() or a function called directly from main(), the best thing to do is let main() return a non-zero status code. However, if you’re deep in some nested subfunction, it may not be convenient or possible to propagate the error all the way back to main(). In such a case, a halt statement (such as std::exit()) can be used.
For example:
double doDivision(int x, int y)
{
if (y == 0)
{
std::cerr << "Error: Could not divide by zero\n";
std::exit(1);
}
return static_cast<double>(x) / y;
}
Because returning an error from a function back to the caller is complicated (and the many different ways to do so lead to inconsistency, and inconsistency leads to mistakes), C++ offers an entirely separate way to pass errors back to the caller: exceptions.
The basic idea is that when an error occurs, an exception is “thrown”. If the current function does not “catch” the error, the caller of the function has a chance to catch the error. If the caller does not catch the error, the caller’s caller has a chance to catch the error. The error progressively moves up the call stack until it is either caught and handled (at which point execution continues normally), or until main() fails to handle the error (at which point the program is terminated with an exception error).
We cover exception handling in chapter 20 of this tutorial series.
A program that handles error cases well is said to be robust.
Extraction fails if the input data does not match the type of the variable being extracted to. For example:
int x{};
std::cin >> x;
If the user were to enter ‘b’, extraction would fail because ‘b’ can not be extracted to an integer variable.
char getOperator()
{
while (true) // Loop until user enters a valid input
{
std::cout << "Enter one of the following: +, -, *, or /: ";
char operation{};
std::cin >> operation;
// Check whether the user entered meaningful input
switch (operation)
{
case '+':
case '-':
case '*':
case '/':
return operation; // return it to the caller
default: // otherwise tell the user what went wrong
std::cerr << "Oops, that input is invalid. Please try again.\n";
}
} // and try again
}
Although the above program works, the execution is messy. It would be better if any extraneous characters entered were simply ignored. Fortunately, it’s easy to ignore characters:
std::cin.ignore(100, '\n'); // clear up to 100 characters out of the buffer, or until a '\n' character is removed
This call would remove up to 100 characters, but if the user entered more than 100 characters we’ll get messy output again. To ignore all characters up to the next ‘\n’, we can pass std::numeric_limits<std::streamsize>::max() to std::cin.ignore(). std::numeric_limits<std::streamsize>::max() returns the largest value that can be stored in a variable of type std::streamsize. Passing this value to std::cin.ignore() causes it to disable the count check.
To ignore everything up to and including the next ‘\n’ character, we call
std::cin.ignore(std::numeric_limits<std::streamsize>::max(), '\n');
Because this line is quite long for what it does, it’s handy to wrap it in a function which can be called in place of std::cin.ignore().
#include <limits> // for std::numeric_limits
void ignoreLine()
{
std::cin.ignore(std::numeric_limits<std::streamsize>::max(), '\n');
}
Since the last character the user entered must be a ‘\n’, we can tell std::cin to ignore buffered characters until it finds a newline character (which is removed as well).
Let’s update our getDouble() function to ignore any extraneous input:
double getDouble()
{
std::cout << "Enter a double value: ";
double x{};
std::cin >> x;
ignoreLine();
return x;
}
Now our program will work as expected, even if we enter “5*7” for the first input – the 5 will be extracted, and the rest of the characters will be removed from the input buffer. Since the input buffer is now empty, the user will be properly asked for input the next time an extraction operation is performed!
Some lessons still pass 32767 to std::cin.ignore(). This is a magic number with no special meaning to std::cin.ignore() and should be avoided. If you see such an occurrence, feel free to point it out.
When the user enters ‘a’, that character is placed in the buffer. Then operator>> tries to extract ‘a’ to variable x, which is of type double. Since ‘a’ can’t be converted to a double, operator>> can’t do the extraction. Two things happen at this point: ‘a’ is left in the buffer, and std::cin goes into “failure mode”.
Once in “failure mode”, future requests for input extraction will silently fail. Thus in our calculator program, the output prompts still print, but any requests for further extraction are ignored. This means that instead of waiting for us to enter an operation, the input prompt is skipped, and we get stuck in an infinite loop because there is no way to reach one of the valid cases.
Fortunately, we can detect whether an extraction has failed:
if (std::cin.fail()) // has a previous extraction failed?
{
// yep, so let's handle the failure
std::cin.clear(); // put us back in 'normal' operation mode
ignoreLine(); // and remove the bad input
}
Because std::cin has a Boolean conversion indicating whether the last input succeeded, it’s more idiomatic to write the above as follows:
if (!std::cin) // has a previous extraction failed?
{
// yep, so let's handle the failure
std::cin.clear(); // put us back in 'normal' operation mode
ignoreLine(); // and remove the bad input
}
Let’s integrate that into our getDouble() function:
double getDouble()
{
while (true) // Loop until user enters a valid input
{
std::cout << "Enter a double value: ";
double x{};
std::cin >> x;
if (!std::cin) // has a previous extraction failed?
{
// yep, so let's handle the failure
std::cin.clear(); // put us back in 'normal' operation mode
ignoreLine(); // and remove the bad input
}
else // else our extraction succeeded
{
ignoreLine();
return x; // so return the value we extracted
}
}
}
A failed extraction due to invalid input will cause the variable to be zero-initialized. Zero initialization means the variable is set to 0, 0.0, “”, or whatever value 0 converts to for that type.
#include <cstdint>
#include <iostream>
int main()
{
std::int16_t x{}; // x is 16 bits, holds from -32768 to 32767
std::cout << "Enter a number between -32768 and 32767: ";
std::cin >> x;
std::int16_t y{}; // y is 16 bits, holds from -32768 to 32767
std::cout << "Enter another number between -32768 and 32767: ";
std::cin >> y;
std::cout << "The sum is: " << x + y << '\n';
return 0;
}
What happens if the user enters a number that is too large (e.g. 40000)?
Enter a number between -32768 and 32767: 40000
Enter another number between -32768 and 32767: The sum is: 32767
In the above case, std::cin goes immediately into “failure mode”, but also assigns the closest in-range value to the variable. Consequently, x is left with the assigned value of 32767. Additional inputs are skipped, leaving y with the initialized value of 0. We can handle this kind of error in the same way as a failed extraction.
Here’s our example calculator, updated with a few additional bits of error checking:
#include <iostream>
#include <limits>
void ignoreLine()
{
std::cin.ignore(std::numeric_limits<std::streamsize>::max(), '\n');
}
double getDouble()
{
while (true) // Loop until user enters a valid input
{
std::cout << "Enter a double value: ";
double x{};
std::cin >> x;
// Check for failed extraction
if (!std::cin) // has a previous extraction failed?
{
// yep, so let's handle the failure
std::cin.clear(); // put us back in 'normal' operation mode
ignoreLine(); // and remove the bad input
std::cerr << "Oops, that input is invalid. Please try again.\n";
}
else
{
ignoreLine(); // remove any extraneous input
return x;
}
}
}
char getOperator()
{
while (true) // Loop until user enters a valid input
{
std::cout << "Enter one of the following: +, -, *, or /: ";
char operation{};
std::cin >> operation;
ignoreLine(); // remove any extraneous input
// Check whether the user entered meaningful input
switch (operation)
{
case '+':
case '-':
case '*':
case '/':
return operation; // return it to the caller
default: // otherwise tell the user what went wrong
std::cerr << "Oops, that input is invalid. Please try again.\n";
}
} // and try again
}
void printResult(double x, char operation, double y)
{
switch (operation)
{
case '+':
std::cout << x << " + " << y << " is " << x + y << '\n';
break;
case '-':
std::cout << x << " - " << y << " is " << x - y << '\n';
break;
case '*':
std::cout << x << " * " << y << " is " << x * y << '\n';
break;
case '/':
std::cout << x << " / " << y << " is " << x / y << '\n';
break;
default: // Being robust means handling unexpected parameters as well, even though getOperator() guarantees operation is valid in this particular program
std::cerr << "Something went wrong: printResult() got an invalid operator.\n";
}
}
int main()
{
double x{ getDouble() };
char operation{ getOperator() };
double y{ getDouble() };
printResult(x, operation, y);
return 0;
}
As you write your programs, consider how users will misuse your program, especially around text input. For each point of text input, consider:
The following code will clear any extraneous input:
std::cin.ignore(std::numeric_limits<std::streamsize>::max(), '\n');
The following code will test for and fix failed extractions or overflow:
if (!std::cin) // has a previous extraction failed or overflowed?
{
// yep, so let's handle the failure
std::cin.clear(); // put us back in 'normal' operation mode
std::cin.ignore(std::numeric_limits<std::streamsize>::max(), '\n'); // and remove the bad input
}
Finally, use loops to ask the user to re-enter input if the original input was invalid.
Input validation is important and useful, but it also tends to make examples more complicated and harder to follow. Accordingly, in future lessons, we will generally not do any kind of input validation unless it’s relevant to something we’re trying to teach.
If the program terminates (via std::exit) then we will have lost our call stack and any debugging information that might help us isolate the problem. std::abort is a better option for such cases, as typically the developer will be given the option to start debugging at the point where the program aborted.
In programming, a precondition is any condition that must always be true prior to the execution of some component of code. Our check of y is a precondition that ensures y has a valid value before the function continues.
It’s more common for functions with preconditions to be written like this:
void printDivision(int x, int y)
{
if (y == 0)
{
std::cerr << "Error: Could not divide by zero\n";
return;
}
std::cout << static_cast<double>(x) / y;
}
An invariant is a condition that must be true while some component is executing.
Similarly, a postcondition is something that must be true after the execution of some component of code. Our function doesn’t have any postconditions.
An assertion is an expression that will be true unless there is a bug in the program.
When an assertion evaluates to false, your program is immediately stopped. This gives you an opportunity to use debugging tools to examine the state of your program and determine why the assertion failed. Working backwards, you can then find and fix the issue.
Without an assertion to detect an error and fail, such an error would likely cause your program to malfunction later. In such cases, it can be very difficult to determine where things are going wrong, or what the root cause of the issue actually is.
In C++, runtime assertions are implemented via the assert preprocessor macro, which lives in the <cassert> header.
#include <cassert> // for assert()
#include <cmath> // for std::sqrt
#include <iostream>
double calculateTimeUntilObjectHitsGround(double initialHeight, double gravity)
{
assert(gravity > 0.0); // The object won't reach the ground unless there is positive gravity.
if (initialHeight <= 0.0)
{
// The object is already on the ground. Or buried.
return 0.0;
}
return std::sqrt((2.0 * initialHeight) / gravity);
}
int main()
{
std::cout << "Took " << calculateTimeUntilObjectHitsGround(100.0, -9.8) << " second(s)\n";
return 0;
}
When the program calls calculateTimeUntilObjectHitsGround(100.0, -9.8), assert(gravity > 0.0) will evaluate to false, which will trigger the assert. That will print a message similar to this:
dropsimulator: src/main.cpp:6: double calculateTimeUntilObjectHitsGround(double, double): Assertion 'gravity > 0.0' failed.
The actual message varies depending on which compiler you use.
Although asserts are most often used to validate function parameters, they can be used anywhere you would like to validate that something is true.
Although we told you previously to avoid preprocessor macros, asserts are one of the few preprocessor macros that are considered acceptable to use. We encourage you to use assert statements liberally throughout your code.
Simply add a string literal joined by a logical AND:
assert(found && "Car could not be found in database");
However, when the assert triggers, the string literal will be included in the assert message:
Assertion failed: found && "Car could not be found in database", file C:\\VCProjects\\Test.cpp, line 34
That gives you some additional context as to what went wrong.
Use assertions to document cases that should be logically impossible.
The assert macro comes with a small performance cost that is incurred each time the assert condition is checked. Furthermore, asserts should (ideally) never be encountered in production code (because your code should already be thoroughly tested). Consequently, many developers prefer that asserts are only active in debug builds. C++ comes with a way to turn off asserts in production code. If the macro NDEBUG is defined, the assert macro gets disabled.
C++ also has another type of assert called static_assert. A static_assert is an assertion that is checked at compile-time rather than at runtime, with a failing static_assert causing a compile error. Unlike assert, which is declared in the <cassert> header, static_assert is a keyword, so no header needs to be included to use it.
A static_assert takes the following form:
static_assert(condition, diagnostic_message)
If the condition is not true, the diagnostic message is printed. Here’s an example of using static_assert to ensure types have a certain size:
static_assert(sizeof(long) == 8, "long must be 8 bytes");
static_assert(sizeof(int) == 4, "int must be 4 bytes");
int main()
{
return 0;
}
On the author’s machine, when compiled, the compiler errors:
1>c:\consoleapplication1\main.cpp(19): error C2338: long must be 8 bytes
Because static_assert is evaluated by the compiler, the condition must be able to be evaluated at compile time. Also, unlike normal assert (which is evaluated at runtime), static_assert can be placed anywhere in the code file (even in the global namespace).
Prior to C++17, the diagnostic message had to be supplied as the second parameter. Since C++17, providing a diagnostic message is optional.
An algorithm is a finite sequence of instructions that can be followed to solve some problem or produce some useful result.
An algorithm is considered to be stateful if it retains some information across calls. Conversely, a stateless algorithm does not store any information (and must be given all the information it needs to work with when it is called).
When applied to algorithms, the term state refers to the current values held in stateful variables.
It’s easy to write a basic PRNG algorithm. Here’s a short PRNG example that generates 100 16-bit pseudo-random numbers:
#include <iostream>
// For illustrative purposes only, don't use this
unsigned int LCG16() // our PRNG
{
static unsigned int s_state{ 5323 };
// Generate the next number
// Due to our use of large constants and overflow, it would be
// hard for someone to casually predict what the next number is
// going to be from the previous one.
s_state = 8253729 * s_state + 2396403; // first we modify the state
return s_state % 32768; // then we use the new state to generate the next number in the sequence
}
int main()
{
// Print 100 random numbers
for (int count{ 1 }; count <= 100; ++count)
{
std::cout << LCG16() << '\t';
// If we've printed 10 numbers, start a new row
if (count % 10 == 0)
std::cout << '\n';
}
return 0;
}
As it turns out, this particular algorithm isn’t very good as a random number generator. But most PRNGs work similarly to LCG16() – they just typically use more state variables and more complex mathematical operations in order to generate better quality results.
All of the values that a PRNG will produce are deterministically calculated from the seed value(s).
Over the years, many different kinds of PRNG algorithms have been developed (Wikipedia has a good list here). Every PRNG algorithm has strengths and weaknesses that might make it more or less suitable for a particular application, so selecting the right algorithm for your application is important.
Many PRNGs are now considered relatively poor by modern standards – and there’s no reason to use a PRNG that doesn’t perform well when it’s just as easy to use one that does.
The randomization capabilities in C++ are accessible via the <random> header of the standard library. Within the random library, there are 6 PRNG families available for use (as of C++20).
There is zero reason to use knuth_b, default_random_engine, or rand() (which is a random number generator provided for compatibility with C).
As of C++20, the Mersenne Twister algorithm is the only PRNG that ships with C++ that has both decent performance and quality.
A test called PracRand is often used to assess the performance and quality of PRNGs (to determine whether they have different kinds of biases). You may also see references to SmallCrush, Crush or BigCrush – these are other tests that are sometimes used for the same purpose.
If you want to see what the output of PracRand looks like, this website has output for all of the PRNGs that C++ supports as of C++20.
Probably. For most applications, Mersenne Twister is fine, both in terms of performance and quality.
However, it’s worth noting that by modern PRNG standards, Mersenne Twister is a bit outdated. The biggest issue with Mersenne Twister is that its results can be predicted after seeing 624 generated numbers, making it unsuitable for any application that requires non-predictability.
If you are developing an application that requires the highest quality random results (e.g. a statistical simulation), the fastest results, or one where non-predictability is important (e.g. cryptography), you’ll need to use a 3rd party library.
The Mersenne Twister PRNG, besides having a great name, is probably the most popular PRNG across all programming languages. Although it is a bit old by today’s standards, it generally produces quality results and has decent performance. The random library has support for two Mersenne Twister types: mt19937, which generates 32-bit unsigned integers, and mt19937_64, which generates 64-bit unsigned integers.
Using Mersenne Twister is straightforward:
#include <iostream>
#include <random> // for std::mt19937
int main()
{
std::mt19937 mt{}; // Instantiate a 32-bit Mersenne Twister
// Print a bunch of random numbers
for (int count{ 1 }; count <= 40; ++count)
{
std::cout << mt() << '\t'; // generate a random number
// If we've printed 5 numbers, start a new row
if (count % 5 == 0)
std::cout << '\n';
}
return 0;
}
Since mt is a variable, you may be wondering what mt() means.
In lesson 4.17 – Introduction to std::string, we showed an example where we called the function name.length(), which invoked the length() function on std::string variable name.
mt() is a concise syntax for calling the function mt.operator(), which for these PRNG types has been defined to return the next random result in the sequence. The advantage of using operator() instead of a named function is that we don’t need to remember the function’s name, and the concise syntax is less typing.
The random library has many random number distributions, most of which you will never use unless you’re doing some kind of statistical analysis. But there’s one random number distribution that’s extremely useful: a uniform distribution is a random number distribution that produces outputs between two numbers X and Y (inclusive) with equal probability.
Here’s a similar program to the one above, using a uniform distribution to simulate the roll of a 6-sided die:
#include <iostream>
#include <random> // for std::mt19937 and std::uniform_int_distribution
int main()
{
std::mt19937 mt{};
// Create a reusable random number generator that generates uniform numbers between 1 and 6
std::uniform_int_distribution die6{ 1, 6 }; // for C++14, use std::uniform_int_distribution<> die6{ 1, 6 };
// Print a bunch of random numbers
for (int count{ 1 }; count <= 40; ++count)
{
std::cout << die6(mt) << '\t'; // generate a roll of the die here
// If we've printed 10 numbers, start a new row
if (count % 10 == 0)
std::cout << '\n';
}
return 0;
}
There are only two noteworthy differences in this example compared to the previous one. First, we’ve created a uniform distribution variable (named die6) to generate numbers between 1 and 6. Second, instead of calling mt() to generate 32-bit unsigned integer random numbers, we’re now calling die6(mt) to generate a value between 1 and 6.
It turns out, we really don’t need our seed to be a random number – we just need to pick something that changes each time the program is run. Then we can use our PRNG to generate a unique sequence of pseudo-random numbers from that seed.
There are two methods that are commonly used to do this: use the system clock, or use the system’s random device.
#include <iostream>
#include <random> // for std::mt19937
#include <chrono> // for std::chrono
int main()
{
// Seed our Mersenne Twister using the steady clock
std::mt19937 mt{ static_cast<unsigned int>(
std::chrono::steady_clock::now().time_since_epoch().count()
) };
// Create a reusable random number generator that generates uniform numbers between 1 and 6
std::uniform_int_distribution die6{ 1, 6 }; // for C++14, use std::uniform_int_distribution<> die6{ 1, 6 };
// Print a bunch of random numbers
for (int count{ 1 }; count <= 40; ++count)
{
std::cout << die6(mt) << '\t'; // generate a roll of the die here
// If we've printed 10 numbers, start a new row
if (count % 10 == 0)
std::cout << '\n';
}
return 0;
}
std::chrono::high_resolution_clock is a popular choice instead of std::chrono::steady_clock. std::chrono::high_resolution_clock is the clock that uses the most granular unit of time, but it may use the system clock for the current time, which can be changed or rolled back by users. std::chrono::steady_clock may have a less granular tick time, but is the only clock with a guarantee that users cannot adjust it.
#include <iostream>
#include <random> // for std::mt19937 and std::random_device
int main()
{
std::mt19937 mt{ std::random_device{}() };
// Create a reusable random number generator that generates uniform numbers between 1 and 6
std::uniform_int_distribution die6{ 1, 6 }; // for C++14, use std::uniform_int_distribution<> die6{ 1, 6 };
// Print a bunch of random numbers
for (int count{ 1 }; count <= 40; ++count)
{
std::cout << die6(mt) << '\t'; // generate a roll of the die here
// If we've printed 10 numbers, start a new row
if (count % 10 == 0)
std::cout << '\n';
}
return 0;
}
Use std::random_device to seed your PRNGs (unless it’s not implemented properly for your target compiler/architecture).
std::random_device{} creates a value-initialized temporary object of type std::random_device. The () then calls operator() on that temporary object, which returns a randomized value (which we use as an initializer for our Mersenne Twister).
It’s the equivalent of calling the following function, which uses a syntax you should be more familiar with:
unsigned int getRandomDeviceValue()
{
std::random_device rd{}; // create a value initialized std::random_device object
return rd(); // return the result of operator() to the caller
}
Using std::random_device{}() allows us to get the same result without creating a named function or named variable, so it’s much more concise.
Because std::random_device is implementation defined, we can’t assume much about it. It may be expensive to access or it may cause our program to pause while waiting for more random numbers to become available. The pool of numbers that it draws from may also be depleted quickly, which would impact the random results for other applications requesting random numbers via the same method. For this reason, std::random_device is better used to seed other PRNGs rather than as a PRNG itself.
Only seed a given pseudo-random number generator once, and do not reseed it.
Here’s an example of a common mistake that new programmers make:
#include <iostream>
#include <random>
int getCard()
{
std::mt19937 mt{ std::random_device{}() }; // this gets created and seeded every time the function is called
std::uniform_int_distribution card{ 1, 52 };
return card(mt);
}
int main()
{
std::cout << getCard();
return 0;
}
In the getCard() function, the random number generator is being created and seeded every time the function is called. This is inefficient at best, and will likely cause poor random results.
So if you initialize std::seed_seq with a single 32-bit integer (e.g. from std::random_device) and then initialize a Mersenne Twister with the std::seed_seq object, std::seed_seq will generate 2,492 bytes of additional seed data (the other 623 of the 624 32-bit values in the Mersenne Twister’s state). The results won’t be amazingly high quality, but it’s better than nothing.
#include <iostream>
#include <random>
int main()
{
std::random_device rd;
std::seed_seq ss{ rd(), rd(), rd(), rd(), rd(), rd(), rd(), rd() }; // get 8 integers of random numbers from std::random_device for our seed
std::mt19937 mt{ ss }; // initialize our Mersenne Twister with the std::seed_seq
// Create a reusable random number generator that generates uniform numbers between 1 and 6
std::uniform_int_distribution die6{ 1, 6 }; // for C++14, use std::uniform_int_distribution<> die6{ 1, 6 };
// Print a bunch of random numbers
for (int count{ 1 }; count <= 40; ++count)
{
std::cout << die6(mt) << '\t'; // generate a roll of the die here
// If we've printed 10 numbers, start a new row
if (count % 10 == 0)
std::cout << '\n';
}
return 0;
}
This is pretty straightforward so there isn’t much reason not to do this at a minimum.
You can! However, this may be slow, and risks depleting the pool of random numbers that std::random_device uses.
The seed_seq initialization used by std::mt19937 performs a warm up, so we don’t need to explicitly warm up std::mt19937 objects.
Visual Studio’s implementation of rand() had (or still has?) a bug where the first generated result would not be sufficiently randomized. You may see older programs that use rand() discard a single result as a way to avoid this issue.
What we really want is a single PRNG object that we can share and access anywhere, across all of our functions and files. The best option here is to create a global random number generator object (inside a namespace!). Remember how we told you to avoid non-const global variables? This is an exception.
Here’s a simple, header-only solution that you can #include in any code file that needs access to a randomized, self-seeded std::mt19937:
Random.h:
#ifndef RANDOM_MT_H
#define RANDOM_MT_H
#include <chrono>
#include <random>
namespace Random
{
inline std::mt19937 init()
{
std::random_device rd;
// Create seed_seq with the steady clock and 7 random numbers from std::random_device
std::seed_seq ss{
static_cast<unsigned int>(std::chrono::steady_clock::now().time_since_epoch().count()),
rd(), rd(), rd(), rd(), rd(), rd(), rd() };
return std::mt19937{ ss };
}
inline std::mt19937 mt{ init() }; // here's our std::mt19937 PRNG object
// Generate a random number between [min, max] (inclusive)
inline int get(int min, int max)
{
// we can create a distribution in any function that needs it
std::uniform_int_distribution die{ min, max };
return die(mt); // and then generate a random number from our global generator
}
}
#endif
And a sample program showing how it is used:
main.cpp:
#include <iostream>
#include "Random.h"
int main()
{
// Generate a random number between 1 and 6
std::cout << Random::get(1, 6) << '\n';
// Create a reusable random number generator that generates uniform numbers between 1 and 6
std::uniform_int_distribution die6{ 1, 6 }; // for C++14, use std::uniform_int_distribution<> die6{ 1, 6 };
// Print a bunch of random numbers
for (int count{ 1 }; count <= 10; ++count)
{
std::cout << die6(Random::mt) << '\t'; // generate a roll of the die here
}
return 0;
}
Programs that use random numbers can be difficult to debug because the program may exhibit different behaviors each time it is run. Sometimes it may work, and sometimes it may not. When debugging, it’s helpful to ensure your program executes the same (incorrect) way each time. That way, you can run the program as many times as needed to isolate where the error is.
For this reason, when debugging, it’s a useful technique to seed your PRNG with a specific value (e.g. 5) that causes the erroneous behavior to occur. This will ensure your program generates the same results each time, making debugging easier. Once you’ve found the error, you can use your normal seeding method to start generating randomized results again.
Halts allow us to terminate our program. Normal termination means the program has exited in an expected way (and the status code will indicate whether it succeeded or not). std::exit() is automatically called at the end of main, or it can be called explicitly to terminate the program. It does some cleanup, but does not clean up any local variables or unwind the call stack.
Scope creep occurs when a project’s capabilities grow beyond what was originally intended at the start of the project or project phase.
A pseudo-random number generator (PRNG) is an algorithm that generates a sequence of numbers whose properties simulate a sequence of random numbers. When a PRNG is instantiated, an initial value (or set of values) called a random seed (or seed for short) can be provided to initialize the state of the PRNG. When a PRNG has been initialized with a seed, we say it has been seeded. The size of the seed value can be smaller than the size of the state of the PRNG. When this happens, we say the PRNG has been underseeded. The length of the sequence before a PRNG begins to repeat itself is known as the period.
A random number distribution converts the output of a PRNG into some other distribution of numbers. A uniform distribution is a random number distribution that produces outputs between two numbers X and Y (inclusive) with equal probability.
#include <iostream>
#include <random> // for std::mt19937
#include <limits> // for std::numeric_limits
int getGuess(int count)
{
while (true) // loop until user enters valid input
{
std::cout << "Guess #" << count << ": ";
int guess{};
std::cin >> guess;
if (std::cin.fail()) // did the extraction fail?
{
// yep, so let's handle the failure
std::cin.clear(); // put us back in 'normal' operation mode
std::cin.ignore(std::numeric_limits<std::streamsize>::max(), '\n'); // remove the bad input
continue; // and try again
}
// If the guess was out of bounds
if (guess < 1 || guess > 100)
{
std::cin.ignore(std::numeric_limits<std::streamsize>::max(), '\n'); // remove the bad input
continue; // and try again
}
// We may have gotten a partial extraction (e.g. user entered '43x')
// We'll remove any extraneous input before we proceed
// so the next extraction doesn't fail
std::cin.ignore(std::numeric_limits<std::streamsize>::max(), '\n');
return guess;
}
}
// returns true if the user won, false if they lost
bool playGame(int guesses, int number)
{
// Loop through all of the guesses
for (int count{ 1 }; count <= guesses; ++count)
{
int guess{ getGuess(count) };
if (guess > number)
std::cout << "Your guess is too high.\n";
else if (guess < number)
std::cout << "Your guess is too low.\n";
else // guess == number
return true;
}
return false;
}
bool playAgain()
{
// Keep asking the user if they want to play again until they pick y or n.
while (true)
{
char ch{};
std::cout << "Would you like to play again (y/n)? ";
std::cin >> ch;
switch (ch)
{
case 'y': return true;
case 'n': return false;
default:
// clear out any extraneous input
std::cin.ignore(std::numeric_limits<std::streamsize>::max(), '\n');
}
}
}
int main()
{
std::random_device rd;
std::seed_seq seq{ rd(), rd(), rd(), rd(), rd(), rd(), rd(), rd() };
std::mt19937 mt{ seq }; // Create a mersenne twister, seeded using the seed sequence
std::uniform_int_distribution die{ 1, 100 }; // generate random numbers between 1 and 100
constexpr int guesses{ 7 }; // the user has this many guesses
do
{
int number{ die(mt) }; // this is the number the user needs to guess
std::cout << "Let's play a game. I'm thinking of a number between 1 and 100. You have " << guesses << " tries to guess what it is.\n";
bool won{ playGame(guesses, number) };
if (won)
std::cout << "Correct! You win!\n";
else
std::cout << "Sorry, you lose. The correct number was " << number << "\n";
} while (playAgain());
std::cout << "Thank you for playing.\n";
return 0;
}