Basic C
This page is very much a work in progress and will grow gradually during 2018. Feel free to make requests!
Figure 1: The beauty of C is that it is a small language, where almost nothing happens under the hood that you did not explicitly ask for.
1 Anatomy of a C Program
A C program consists of one or more files. Files come in two forms: code files and header files (with extensions .c and .h, respectively). The .c files contain function definitions and data structure declarations. The .h files contain declarations that are meant to be shared, e.g. type aliases and function prototypes. A function prototype simply names a function and states its parameter types and return type, but not its actual code.
In an executable C program, exactly one .c file has a main() function that starts the program.
The figure below shows a typical .h file. It contains a set of include directives and type aliases which are used in the function prototypes.
```c
#pragma once

[include directives]

[type aliases]

[function prototypes]
```
The include directives list names of .h files whose declarations will be included in the current file. For example, writing #include <stdlib.h> includes the function declarations of that standard library header into the current file, allowing the compiler to check that calls to these functions are correct.
The line #pragma once states that no matter how many times the .h file is included, it will only be included once. [2]
In contrast, a typical .c file will look like this:
```c
[include directives]

[type aliases]

[struct declarations]

[function prototypes]

[function definitions]
```
Here, there is no #pragma once, because .c files are not usually included into other files. Commonly, the file file.c includes file.h, which means that a .c file imports the list of its own public functions. This is useful because the C compiler reads sources top to bottom, and will be confused if a function is called before it has been declared.
The struct declaration part declares how our memory objects are shaped. If there are function definitions in the .c file whose prototypes are not in the .h file, those prototypes are typically listed before the definitions. If type aliases, struct declarations and function prototypes come before the function definitions, we are free to use them in the function definitions, no matter their order.
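To make this layout concrete, here is a minimal sketch of a single .c file that follows the order described above. The names point_t, is_even(), is_odd() and point_sum() are made up for this illustration. Because the prototypes come first, is_even() can call is_odd() even though is_odd() is defined further down.

```c
#include <stdbool.h>

/* Type alias (hypothetical name) */
typedef struct point point_t;

/* Struct declaration: describes how a point is laid out in memory */
struct point
{
  int x;
  int y;
};

/* Function prototypes: names, parameter types and return types only */
bool is_even(unsigned int n);
bool is_odd(unsigned int n);
int point_sum(point_t p);

/* Function definitions: their relative order no longer matters */
bool is_even(unsigned int n)
{
  /* is_odd() is defined below, but its prototype is already known */
  return n == 0 ? true : is_odd(n - 1);
}

bool is_odd(unsigned int n)
{
  return n == 0 ? false : is_even(n - 1);
}

int point_sum(point_t p)
{
  return p.x + p.y;
}
```

Removing the prototype of is_odd() would make the compiler complain when it reaches the call inside is_even().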
1.1 A Complete Example
Below are three files, greeter.h, greeter.c and driver.c, that together make up a complete C program. greeter.h declares a function, greet(), that takes a message and a name. greeter.c implements this function. Finally, driver.c must include greeter.h to learn of the existence of greet().
```c
/* greeter.h */
#pragma once

void greet(char *msg, char *name);
```
```c
/* greeter.c */
#include <stdio.h>   /* needed for printf() */
#include "greeter.h"

void greet(char *msg, char *name)
{
  printf("Hi, %s, %s!\n", name, msg);
}
```
```c
/* driver.c */
#include <stdio.h>   /* needed for puts() */
#include "greeter.h"

int main(int argc, char *argv[])
{
  if (argc == 3)
    {
      greet(argv[1], argv[2]);
    }
  else
    {
      puts("Usage: ./driver <msg> <name>");
    }

  return 0;
}
```
These files can be compiled thus: gcc greeter.c driver.c -o driver. This produces the executable driver. Note how we do not mention the .h file when compiling. That would be redundant since the contents of greeter.h have already been included twice, once in each .c file.
2 Common Compiler Flags
Most C compilers support hundreds if not thousands of options. It is beyond the scope of this course to go beyond the absolute basics. Below are some flags that we will use frequently:
Flag | Description |
---|---|
-c | Separate compilation: produce an object file that can be linked into an executable later |
-o filename | Name the resulting executable filename rather than a.out |
-lm | Link with the mathematics library |
-lcunit | Link with the CUnit testing framework |
-Wall | Makes the compiler warn about things it considers dubious |
-Wextra | Makes the compiler even more suspicious than with -Wall |
-pedantic | Warns about use of C outside of what the standard supports |
-g | Add information to the output that facilitates debugging (you can use -ggdb if using gdb) |
-O2, -O3 | Turn on (increasing) levels of optimisation (this may expose latent bugs in incorrect code) |
-pg | Add profiling information to the output |
On this course, I recommend always using -g (or -ggdb) and -Wall. Warnings in C are often to be taken seriously, especially in an introductory course, and the overhead of adding debugging information to the code is negligible. This very line will get you quite far:
gcc -Wall -pedantic -g file1.c file2.c file3.c
Note that unless you give a -o filename flag, the C compiler will name the resulting file a.out. That is often a fine file name for testing things out locally.
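As a small illustration of why -Wall is worth having, consider the classic mistake of writing = where == was intended. The sketch below shows the correct function, and the comment describes the faulty condition, which GCC reports when -Wall is given (through the -Wparentheses warning that -Wall enables). The function name is_zero() is made up for this example.

```c
#include <stdbool.h>

bool is_zero(int x)
{
  /* Faulty variant: if (x = 0) assigns 0 to x, so the condition is
     always false. It is legal C, so a compiler may accept it
     silently unless warnings are turned on. */
  if (x == 0)
    {
      return true;
    }
  else
    {
      return false;
    }
}
```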
3 Loops
C has 3 kinds of loops. They are equally powerful. When you want to create a temporary loop variable just for a loop, the for loop is generally the right choice. If you know you will always go through at least one iteration, choose a do-while loop. Otherwise, while loops are great!
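To see that the three loops are interchangeable, here is a sketch that sums the integers 1 to n in three ways. The function names are made up for this illustration, and the do-while version assumes that n is at least 1, since its body always runs once.

```c
/* Sum 1 + 2 + ... + n using a while loop */
int sum_while(int n)
{
  int sum = 0;
  int i = 1;
  while (i <= n)
    {
      sum += i;
      i += 1;
    }
  return sum;
}

/* Same sum using a for loop: i only exists inside the loop */
int sum_for(int n)
{
  int sum = 0;
  for (int i = 1; i <= n; ++i)
    {
      sum += i;
    }
  return sum;
}

/* Same sum using a do-while loop: assumes n >= 1 */
int sum_do_while(int n)
{
  int sum = 0;
  int i = 1;
  do
    {
      sum += i;
      i += 1;
    }
  while (i <= n);
  return sum;
}
```

For n == 0, sum_do_while() returns 1 rather than 0, because the body runs before the guard is checked for the first time.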
3.1 While Loops
Here is a recursive Fibonacci function:
```c
int fib(int n)
{
  if (n < 2)
    {
      return n;
    }
  else
    {
      return fib(n - 1) + fib(n - 2);
    }
}
```
And here, written using a while loop:
```c
int fib(int n)
{
  int fib_1 = 0; // fib(0)
  int fib_2 = 1; // fib(1)
  int times = 1; // fib_2 already holds fib(1)

  while (times < n)
    {
      int tmp = fib_1 + fib_2;
      fib_1 = fib_2;
      fib_2 = tmp;

      times = times + 1;
    }

  return fib_2;
}
```
Note: this version has a bug if n == 0. Finding and fixing it is an exercise!
3.2 Do-While Loops
A do-while loop is a good fit for loops where we know we will go through the loop body at least once. Here is an example of going through a password dialogue. If the correct password is entered, strcmp(password, answer) returns 0 and the loop exits. Otherwise, the password question is repeated.
```c
char *password = "secret";
char *answer = NULL;
do
  {
    ask_question_string("Enter password:", &answer);
  }
while (strcmp(password, answer) != 0);
```
Concisely, if a do-while loop is do {body} while (guard), we could rewrite the code using a while loop as body; while (guard) {body}. What would the example above look like using a while loop instead?
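As a separate illustration of that rewrite rule, here is a sketch that counts the decimal digits of a non-negative number. The function names are made up for this example. A do-while loop fits well because even 0 has one digit, so the body must run at least once. The while version has to repeat the body before the loop.

```c
/* Count decimal digits using do {body} while (guard) */
int count_digits_do_while(unsigned int n)
{
  int digits = 0;
  do
    {
      digits += 1;
      n /= 10;
    }
  while (n > 0);
  return digits;
}

/* The same function rewritten as body; while (guard) {body} */
int count_digits_while(unsigned int n)
{
  int digits = 0;

  /* The body, written out once before the loop */
  digits += 1;
  n /= 10;

  while (n > 0)
    {
      digits += 1;
      n /= 10;
    }
  return digits;
}
```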
3.3 For Loops
Technically, there is nothing you can do with a for loop that you could not do with a while. However, in many (but not all) circumstances, a for loop can be much clearer – by providing a single place (at the start of the loop) for declaring loop variables, loop guards and incrementing the loop variables.
The following snippet uses a for loop to print i = 0, i = 1, etc. up to 1023. Note that the first line has three ;-separated compartments. The first is for declaring one or more loop variables and initialising them; the second is the guard, just like in while or do-while loops; the third is applied at the end of each pass through the loop body.
```c
int N = 1024;
for (int i = 0; i < N; ++i)
  {
    printf("i = %d\n", i);
  }
// Cannot use the variable i here!
```
We could translate the for loop above to an equivalent while loop:
```c
int N = 1024;
int i = 0;

while (i < N)
  {
    printf("i = %d\n", i);
    ++i;
  }
// Can use the variable i here!
```
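The first compartment of a for loop can declare more than one loop variable, and the third compartment can update several variables by separating the updates with a comma. The sketch below, with the made-up name reverse_in_place(), uses two indices that walk towards each other to reverse a string.

```c
#include <string.h>

/* Reverse a null-terminated string in place */
void reverse_in_place(char *str)
{
  int len = (int) strlen(str);

  /* Two loop variables: i walks forwards, j walks backwards */
  for (int i = 0, j = len - 1; i < j; ++i, --j)
    {
      char tmp = str[i];
      str[i] = str[j];
      str[j] = tmp;
    }
}
```

Writing the same loop with while would spread the two declarations and the two updates over four separate places.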
4 Increment and Decrement Operators
Many C programs make use of the increment and decrement operators, ++ and --. A pretty strong and compelling case can be made for not using these operators, because they make code hard to read. Indeed, several “modern” languages do not support them.
The ++ operator operates on scalar variables and fields and adds one to their value; the -- operator subtracts one in the same way. The reason why these operators are sometimes confusing is that the result of applying them depends on whether they are used in a prefix position (e.g., ++x) or a postfix position (e.g., x++).
Example: let x be an integer variable declared thus: int x = 5;. The table below shows the meaning of ++ and -- applied to x in both the postfix and prefix positions. Each row assumes that x is 5 before the expression is evaluated.
Expression | Return Value | Side Effect |
---|---|---|
x++ | 5 | x = 6 |
++x | 6 | x = 6 |
x-- | 5 | x = 4 |
--x | 4 | x = 4 |
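The rows of the table can be checked with a small sketch. The helper names below are made up; each takes a pointer to an int so that the caller can observe both the returned value and the side effect.

```c
/* Returns the old value of *x; as a side effect, *x grows by one */
int postfix_increment(int *x)
{
  return (*x)++;
}

/* Adds one to *x and returns the new value */
int prefix_increment(int *x)
{
  return ++(*x);
}

/* Returns the old value of *x; as a side effect, *x shrinks by one */
int postfix_decrement(int *x)
{
  return (*x)--;
}

/* Subtracts one from *x and returns the new value */
int prefix_decrement(int *x)
{
  return --(*x);
}
```

Calling postfix_increment() on a variable holding 5 returns 5 and leaves the variable at 6, whereas prefix_increment() returns 6 and also leaves it at 6.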
4.1 Exercise I
What does the following code print?
```c
int x = 42;
printf("%d ", x++);
printf("%d ", ++x);
printf("%d.", x);
```
What does the following code print?
```c
int x = 42;
printf("%d ", ++x);
printf("%d ", x++);
printf("%d.", x);
```
Brain teaser (and sort-of trick question) What does the following code print?
```c
int x = 42;
printf("%d.", x+++x);
```
The first and second examples are quite simple, but it is easy for your eyes to transpose the two printf() lines and confuse one for the other. The final example is interesting because it stresses the parsing situation: does x+++x parse as (x++) + x or as x + (++x), and does it matter?
4.2 Rationale
Historically, many programmers have enjoyed writing programs which use ++ (and --) when iterating over arrays. Here is a snippet that illustrates this in a very simple way.
```c
char msg[4];
int i = 0;
msg[i++] = 'H';
msg[i++] = 'e';
msg[i++] = 'y';
msg[i++] = '\0';
```
Clearly, the ability to both look up an index and increment it at the same time makes the code above compact. In the code above, one could argue that adding four separate lines for incrementing i would add “a lot of noise” to the program. However, most real-world examples involve loops where we change the index in a single line only. For example:
```c
void puts_equivalent(char *str)
{
  int i = 0;
  while (str[i]) putchar(str[i++]);
}
```
Before we dig in, let us write that code with a proper loop body:
```c
void puts_equivalent(char *str)
{
  int i = 0;
  while (str[i])
    {
      putchar(str[i++]);
    }
}
```
One reason to dislike this code is that putchar(str[i++]); is a very busy statement, with many things going on in a single line:
1. Read the i:th character of str
2. Increment i
3. Print the character we read in 1.
Moving the increment to a line of its own makes the code much clearer:
```c
void puts_equivalent(char *str)
{
  int i = 0;
  while (str[i])
    {
      putchar(str[i]);
      ++i;
    }
}
```
We can rewrite it thus using a for loop, which arguably makes the code clearer by collecting the loop variable, the guard and the change to the loop variable in one coherent place in the text.
```c
void puts_equivalent(char *str)
{
  for (int i = 0; str[i]; ++i)
    {
      putchar(str[i]);
    }
}
```
Note that in the last two cases, the behaviour is the same if the increment is done using ++i or using i++. This is a good thing for readability and maintainability.
I strongly advocate only using ++ in the prefix position to the extent possible. Many modern programming languages have dropped ++ in favour of += n, which is clearer and more flexible.
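As a sketch of that flexibility, += can step by any amount, not just one. The function below, whose name is made up for this example, sums every second element of an array by advancing the index with i += 2.

```c
/* Sum the elements at index 0, 2, 4, ... of an array of length n */
int sum_every_other(const int *values, int n)
{
  int sum = 0;
  for (int i = 0; i < n; i += 2)
    {
      sum += values[i];
    }
  return sum;
}
```

Changing the stride only requires changing the constant, which is not possible with ++.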
4.3 Increments in Pointer Arithmetic
The increment operation is often used to do pointer arithmetic. For example, int *ip declares ip to be a pointer to a place in memory where an int is stored. [7]
Remember that ip + 1 returns a pointer to the next integer after the one that ip points to, meaning that pointer arithmetic moves pointers to values of type T in strides of sizeof(T) bytes.
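The stride can be made visible by measuring the distance in bytes between a pointer and that pointer plus one. This is a minimal sketch with made-up function names. The pointers are converted to char * because a char is one byte, so subtracting two char pointers counts bytes.

```c
#include <stddef.h>

/* Distance in bytes between ip and ip + 1 */
ptrdiff_t int_stride(int *ip)
{
  return (char *) (ip + 1) - (char *) ip;
}

/* Distance in bytes between dp and dp + 1 */
ptrdiff_t double_stride(double *dp)
{
  return (char *) (dp + 1) - (char *) dp;
}
```

When given a pointer into an array, int_stride() returns sizeof(int) and double_stride() returns sizeof(double), whatever those sizes happen to be on the machine at hand.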
Also, because of C’s precedence rules, *ip++ means read *ip, then increase ip by one. Thus, the following code prints 7, 42 and 4711.
```c
int values[] = { 7, 42, 4711 };
int *ip = values;
printf("%d ", *ip++);
printf("%d ", *ip++);
printf("%d ", *ip++);
```
To increment the value pointed to by some pointer, put the dereference in parentheses: (*ip)++ means increase the value of *ip by one and does not change the value of ip.
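The difference between the two forms can be sketched with two small functions, whose names are made up for this example. The first uses *ip++ to walk through an array; the second uses (*ip)++ to change the int being pointed to.

```c
/* Sum n ints: *ip++ reads the current int, then moves ip forward */
int sum_ints(const int *ip, int n)
{
  int sum = 0;
  for (int i = 0; i < n; ++i)
    {
      sum += *ip++;
    }
  return sum;
}

/* (*ip)++ adds one to the int that ip points to; ip is unchanged */
void bump(int *ip)
{
  (*ip)++;
}
```

With the array { 7, 42, 4711 }, sum_ints() returns 4760, and calling bump() on the array changes its first element to 8 while leaving the other elements alone.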
4.4 Exercise II
Look at the following implementation of strcpy(), the string copying function, that copies a string from b to a. Do you understand how it works?
```c
void strcpy(char *a, char *b)
{
  while (*a++ = *b++) ;
}
```
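One way to read the loop is to spell out each step on its own line. The sketch below is a longer version with the same behaviour, under the made-up name copy_string(), chosen to avoid clashing with the standard library's strcpy().

```c
/* Copy the null-terminated string at b to a, one character at a time */
void copy_string(char *a, const char *b)
{
  char c;
  do
    {
      c = *b;  /* read the current character of b */
      *a = c;  /* write it to the current position in a */
      a += 1;  /* move both pointers one step forward */
      b += 1;
    }
  while (c != '\0'); /* stop once the null character has been copied */
}
```

The compact version folds all of these steps into the loop guard: the value of the assignment is the character just copied, and the loop stops when that character is the null character.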
Because sometimes strings are not properly null-terminated, we should never use strcpy(). Instead, we should rely on strncpy(), which takes an additional argument that upper-bounds the number of characters copied. Here is an implementation of that function using --.
```c
void strncpy(char *a, char *b, int n)
{
  /* The parentheses around the assignment are required:
     && binds tighter than = */
  while (n-- > 0 && (*a++ = *b++)) ;
}
```
The documentation for strncpy() says (slightly adapted):
“If the length of b is less than n, strncpy() writes additional null characters to a to ensure that a total of n bytes are written.”
Update strncpy() above to adhere to that specification. Can you do it in a single construct on a single line? (Note: there is nothing better about doing it in a single line, this is just a bit of fun, and also to restrict you from inventing too complex machinery.)
```c
void strncpy(char *a, char *b, int n)
{
  /* Copy at most n characters, stopping after the null character */
  while (n-- > 0 && (*a++ = *b++)) ;
  /* Pad with null characters until n bytes have been written */
  while (n-- > 0) *a++ = '\0';
}
```
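One caveat worth keeping in mind: the standard library's strncpy() does not write a terminating null character when the source has n or more characters. A common pattern, sketched below under the made-up name copy_bounded(), is to copy at most size - 1 characters and then terminate explicitly. The sketch assumes that size is at least 1.

```c
#include <string.h>

/* Copy src into dst, which has room for size bytes (size >= 1).
   The result is always null-terminated, truncating src if needed. */
void copy_bounded(char *dst, const char *src, size_t size)
{
  strncpy(dst, src, size - 1); /* copies at most size - 1 characters */
  dst[size - 1] = '\0';        /* strncpy() may not have terminated dst */
}
```

Copying "Hello" into a buffer of 4 bytes this way leaves the string "Hel" in the buffer.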
4.5 Concluding Remarks
Several modern languages abolish ++ and -- because they make code hard to read. Many languages (including C) support more general += and -= operators that can be used to achieve the same effect with similar brevity.
I strongly advocate only using ++ in the prefix position to the extent possible. But even better, use += 1 and -= 1 instead. The ++ and -- operators are seductive, and it may be hard to stop using them once you have grown accustomed to them.
5 Avoid Global Variables
This slide set discusses this statement further, and explains how to refactor yourself free from global variables.
Nerd fact: These pages are generated using org-mode in Emacs, a modified ReadTheOrg template, and a bunch of scripts.
Ended up here randomly? These are the pages for a one-semester course at 67% speed on imperative and object-oriented programming at the department of Information Technology at Uppsala University, created by Tobias Wrigstad.
Footnotes:
[2] #pragma once is not part of the C standard! Nevertheless, it is supported by GCC and Clang, which are the two C compilers we will use in this course. Later on we will cover the canonical way, which involves digging into the C preprocessor a little deeper.
[7] Just assume ip points to something sensible.