Iterators
Table of Contents
1 Introduction
When we implemented linked lists, we implemented a function
list_apply()
that took as argument a pointer to a function f
and a pointer to some data d and then for each element e in
the list, called f(e, d) in list order. This pattern is often
called an internal iterator as the apply-function iterates over
all the elements of the list. C’s support for higher-order
functions (functions that take or produce other functions) is
quite crude, so in many cases, an external iterator is
preferable.
Imagine that we want to iterate over all elements of a list of integers and count odd and even values. The listing below shows a solution to this problem.
list_t *list = ...; /// a list we have gotten from somewhere int odd = 0; int even = 0; for (int i = 0; i < list_size(list); ++i) { elem_t e = list_get(list, i); if (e.i % 2 == 0) { ++even; } else { ++odd; } }
list_t%20%2Alist%20%3D%20...%3B%20%2F%2F%2F%20a%20list%20we%20have%20gotten%20from%20somewhere%0A%0Aint%20odd%20%3D%200%3B%0Aint%20even%20%3D%200%3B%0A%0Afor%20%28int%20i%20%3D%200%3B%20i%20%3C%20list_size%28list%29%3B%20%2B%2Bi%29%0A%20%20%7B%0A%20%20%20%20elem_t%20e%20%3D%20list_get%28list%2C%20i%29%3B%0A%20%20%20%20if%20%28e.i%20%25%202%20%3D%3D%200%29%0A%20%20%20%20%20%20%7B%0A%09%2B%2Beven%3B%0A%20%20%20%20%20%20%7D%0A%20%20%20%20else%0A%20%20%20%20%20%20%7B%0A%09%2B%2Bodd%3B%0A%20%20%20%20%20%20%7D%0A%20%20%7D%0A
Something which is (probably) not obvious at a first glance is
that this implementation has quadratic time complexity. Fetching
element i
requires i
steps inside list_get()
and
subsequently finding element i+1
requires another i+1
steps.
The problem is that each call to list_get()
must start from the
first link. The internal iterator list_apply()
did not suffer
from this problem and only steps through the list once.
An external iterator to a list can be implemented as an opaque1 object holding a pointer to a link in the list. Each time we advance the iterator to the next element, we only need to take one step. The key to make this possible is that the iterator stores state about the iteration, namely the pointer directly into the list.
With a properly encapsulated list, it is impossible for a client to build an efficient iterator – the iterator needs to know how the list is represented. Thus, iterators are generally implemented as part of a list, or a list exports some basic iterator interface on-top of which more fancy iterators can be built.
2 The Iterator Interface
Our iterator shall support the following standard operations:
- Check if the iterator is positioned at the end of the list
- Move the iterator forward one step
- Get the current element
- Remove the current element
- Reset the iterator (positioning it at the first element again)
Additionally, we must be able to create and destroy iterators.
2.1 A First Stab at an Iterator Structure
Following the principle of starting out with something simple that seems to be able to do most of what we want, let us define an iterator as simply a pointer to a link2.
typedef link_t iter_t;
typedef%20link_t%20iter_t%3B%20%0A
We note that this adds link_t
to the public interface of the
list, which is something we do not want3. We remember to fix that later.
2.2 Creating an Iterator
We can now add a public function to the list that creates an iterator.
Remember that list->first
points to a sentinel link.
iter_t *list_iterator(list_t *list) { return list->first; }
iter_t%20%2Alist_iterator%28list_t%20%2Alist%29%0A%7B%0A%20%20return%20list-%3Efirst%3B%20%0A%7D%0A
A function that tests whether the iterator is at the end of the
list is simple to implement. Because an iterator is really a
pointer to a link, we can simply check whether the link has a
next
field that is not NULL
. Because of the sentinel link, an
iterator to an empty list is not a NULL
pointer, but a pointer
to a link whose next
is NULL
.
2.3 Reading the Iterator
bool iterator_has_next(iter_t *iter) { return iter->next != NULL; }
bool%20iterator_has_next%28iter_t%20%2Aiter%29%0A%7B%0A%20%20return%20iter-%3Enext%20%21%3D%20NULL%3B%20%0A%7D%0A
As a new iterator points to the sentinel, in order to get an
element, we need to return the element of the next
link. If you
have the linked list fresh in mind, you have probably already
realised why we point to the preceding link rather than the
“current link”4.
elem_t iterator_get_current(iter_t *iter) { return iter->next->element; }
elem_t%20iterator_get_current%28iter_t%20%2Aiter%29%0A%7B%0A%20%20return%20iter-%3Enext-%3Eelement%3B%20%0A%7D%0A
So far, things look simple enough – but we have now reached the
limits for how far our definition of iter_t
will get us.
2.4 Iterator Functions with Side-Effects
As we move attempt to implement a function that moves the iterator forward through the list, we run into a problem. Consider the broken implementation in Figure 1.
void iterator_next(iter_t *iter) { iter = iter->next; }
void%20iterator_next%28iter_t%20%2Aiter%29%0A%7B%0A%20%20iter%20%3D%20iter-%3Enext%3B%0A%7D%0A
Hopefully it is obvious to you why this is not working! The iter
variable is a local variable inside iterator_next()
, meaning
that the change to iter
is visible only inside iterator_next()
.
One solution is to change the function to take a pointer to the
pointer to the iterator. This makes the change inside
iterator_next()
visible to the caller.
void iterator_next(iter_t **iter) { *iter = (*iter)->next; }
void%20iterator_next%28iter_t%20%2A%2Aiter%29%0A%7B%0A%20%20%2Aiter%20%3D%20%28%2Aiter%29-%3Enext%3B%0A%7D%0A
Another solution is to add an extra level of indirection – rather than make an iterator a pointer to a link, make an iterator an object that holds a pointer to a link. We will stick with the latter design rather than the pointer-to-pointer, because having an actual iterator object is much more future proof. For one, it will allow us to extend the iterator with new fields if the need should arise.
2.5 Refactoring Step: New Design of Iterator Struct
/// Goes into list.h typedef struct iter iter_t; /// Goes into list.c struct iter { link_t *current; };
%2F%2F%2F%20Goes%20into%20list.h%0Atypedef%20struct%20iter%20iter_t%3B%0A%0A%2F%2F%2F%20Goes%20into%20list.c%0Astruct%20iter%20%0A%7B%0A%20%20link_t%20%2Acurrent%3B%0A%7D%3B%0A
This requires us to update the functions we have implemented until
now, but the changes are simple – simply follow the current
pointer.
Creating an iterator now requires a malloc()
call to create a
new struct and writing the field in the struct.
iter_t *list_iterator(list_t *list) { iter_t *result = malloc(sizeof(struct iter)); result->current = list->first; return result; }
iter_t%20%2Alist_iterator%28list_t%20%2Alist%29%0A%7B%0A%20%20iter_t%20%2Aresult%20%3D%20malloc%28sizeof%28struct%20iter%29%29%3B%0A%0A%20%20result-%3Ecurrent%20%3D%20list-%3Efirst%3B%0A%0A%20%20return%20result%3B%20%0A%7D%0A
Checking whether the iterator has reached the end of the list
requires following the current
pointer.
bool iterator_has_next(iter_t *iter) { return iter->current->next != NULL; }
bool%20iterator_has_next%28iter_t%20%2Aiter%29%0A%7B%0A%20%20return%20iter-%3Ecurrent-%3Enext%20%21%3D%20NULL%3B%20%0A%7D%0A
The code for getting the current element changes in a similar way,
and turns a bit unwieldy. Following the current pointer takes us
to a link in the list. We must then read that pointer’s next
field to get to the actual current link. We are then ready to read
the element
field.
elem_t iterator_get_current(iter_t *iter) { return iter->current->next->element; }
elem_t%20iterator_get_current%28iter_t%20%2Aiter%29%0A%7B%0A%20%20return%20iter-%3Ecurrent-%3Enext-%3Eelement%3B%20%0A%7D%0A
Moving the iterator forward through the list now involves updating
its current
field.
void iterator_next(iter_t *iter) { iter->current = iter->current->next; }
void%20iterator_next%28iter_t%20%2Aiter%29%0A%7B%0A%20%20iter-%3Ecurrent%20%3D%20iter-%3Ecurrent-%3Enext%3B%0A%7D%0A
Finally, because an iterator is allocated on the heap, we need a destructor that can return its resources back to the system.
void iterator_delete(iter_t *iter) { free(iter); }
void%20iterator_delete%28iter_t%20%2Aiter%29%0A%7B%0A%20%20free%28iter%29%3B%0A%7D%0A
2.6 Resetting the Iterator
One reason for adding an iterator struct is because resetting the iterator requires us to keep extra data around in the iterator. Either we store a pointer to the first element when we create the iterator or we store a pointer to the list from which the iterator was created. We choose the latter solution because it is more resilient to change – if the first element has been removed before resetting, asking the list for the current first element will simply return the new first element.
struct iter { link_t *current; list_t *list; /// New field }; iter_t *list_iterator(list_t *list) { iter_t *result = malloc(sizeof(struct iter)); result->current = list->first; result->list = list; /// Iterator remembers where it came from return result; }
struct%20iter%20%0A%7B%0A%20%20link_t%20%2Acurrent%3B%0A%20%20list_t%20%2Alist%3B%20%2F%2F%2F%20New%20field%0A%7D%3B%0A%0Aiter_t%20%2Alist_iterator%28list_t%20%2Alist%29%0A%7B%0A%20%20iter_t%20%2Aresult%20%3D%20malloc%28sizeof%28struct%20iter%29%29%3B%0A%0A%20%20result-%3Ecurrent%20%3D%20list-%3Efirst%3B%0A%20%20result-%3Elist%20%3D%20list%3B%20%2F%2F%2F%20Iterator%20remembers%20where%20it%20came%20from%0A%0A%20%20return%20result%3B%20%0A%7D%0A
With this in place, we can now write the iterator_reset()
function.
void iterator_reset(iter_t *iter) { iter->current = iter->list->first; }
void%20iterator_reset%28iter_t%20%2Aiter%29%0A%7B%0A%20%20iter-%3Ecurrent%20%3D%20iter-%3Elist-%3Efirst%3B%0A%7D%0A
Note that we do not need to update iterator_delete()
to also
call list_delete()
on its list
field, because iterators are
normally much more short-lived than the list they are iterating
over, meaning it makes sense to be able to destroy an iterator
without also destroying the list.
2.7 Removing Elements
Because our iterator is positioned on the link “before” (it starts by pointing to the sentinel), removing an element is simple – otherwise, it would have required us to find the preceding element from the start of the list.
void iterator_remove(iter_t *iter) { iter->current->next = iter->current->next->next; }
void%20iterator_remove%28iter_t%20%2Aiter%29%0A%7B%0A%20%20iter-%3Ecurrent-%3Enext%20%3D%20iter-%3Ecurrent-%3Enext-%3Enext%3B%20%0A%7D%0A
Note that while the implementation above captures the essence of unlinking, it does not free the memory of the unlinked link. This is an easy fix.
void iterator_remove(iter_t *iter) { link_t *to_remove = iter->current->next; /// Cache result iter->current->next = to_remove->next; /// Move forward free(to_remove); /// Remove link }
void%20iterator_remove%28iter_t%20%2Aiter%29%0A%7B%0A%20%20link_t%20%2Ato_remove%20%3D%20iter-%3Ecurrent-%3Enext%3B%20%2F%2F%2F%20Cache%20result%0A%0A%20%20iter-%3Ecurrent-%3Enext%20%3D%20to_remove-%3Enext%3B%20%20%2F%2F%2F%20Move%20forward%0A%0A%20%20free%28to_remove%29%3B%20%2F%2F%2F%20Remove%20link%0A%7D%0A
3 Improving the Public Interface
It is quite common to have the remove function also return the
element of the link it is removing. Similarly, for
iterator_next()
. This is simple. Again, remember that we are
always positioned “to the left” of the current link, that is, we
are relying on being able to point to the sentinel object of the
list.
elem_t iterator_next(iter_t *iter) { iter->current = iter->current->next; return iter->current->element; } elem_t iterator_remove(iter_t *iter) { link_t *to_remove = iter->current->next; elem_t result = to_remove->element; iter->current->next = to_remove->next; free(to_remove); return result; }
elem_t%20iterator_next%28iter_t%20%2Aiter%29%0A%7B%0A%20%20iter-%3Ecurrent%20%3D%20iter-%3Ecurrent-%3Enext%3B%0A%20%20return%20iter-%3Ecurrent-%3Eelement%3B%20%0A%7D%0A%0Aelem_t%20iterator_remove%28iter_t%20%2Aiter%29%0A%7B%0A%20%20link_t%20%2Ato_remove%20%3D%20iter-%3Ecurrent-%3Enext%3B%20%0A%20%20elem_t%20result%20%3D%20to_remove-%3Eelement%3B%0A%0A%20%20iter-%3Ecurrent-%3Enext%20%3D%20to_remove-%3Enext%3B%20%0A%0A%20%20free%28to_remove%29%3B%20%0A%0A%20%20return%20result%3B%0A%7D%0A
4 Testing Our Iterator
To test the iterator, we can implement list_remove()
using it.
List remove removes the i:th
element. This means, we can create
an iterator, take i
steps, and then call iterator_remove()
.
void list_remove(list_t *list, int index) { int valid_index = list_inner_adjust_index(index, list_size(list)); iterator_t *list_iterator(list); for (int i = 0; i < valid_index; ++i) { iterator_next(iter); } iterator_remove(iter); iterator_delete(iter); }
void%20list_remove%28list_t%20%2Alist%2C%20int%20index%29%0A%7B%0A%20%20int%20valid_index%20%3D%20list_inner_adjust_index%28index%2C%20list_size%28list%29%29%3B%0A%0A%20%20iterator_t%20%2Alist_iterator%28list%29%3B%0A%20%20for%20%28int%20i%20%3D%200%3B%20i%20%3C%20valid_index%3B%20%2B%2Bi%29%0A%20%20%20%20%7B%0A%20%20%20%20%20%20iterator_next%28iter%29%3B%20%0A%20%20%20%20%7D%0A%0A%20%20iterator_remove%28iter%29%3B%0A%0A%20%20iterator_delete%28iter%29%3B%0A%7D%0A%0A
Note that, because our iterator works in terms of elements and
not links, we cannot use iterators to implement
list_inner_find_previous()
. However, because we are able to
quickly move an iterator to the right place in the list, and
either get the element or remove it, there is no longer a strong
need for list_inner_find_previous()
.
Let us close with revisiting the initial example, but use an
iterator instead of list_get()
.
list_t *list = ...; /// a list we have gotten from somewhere int odd = 0; int even = 0; for (iter_t *iter = list_iterator(list); iterator_has_next(iter); iterator_next(iter)) { elem_t e = iterator_get_current(iter); if (e.i % 2 == 0) { ++even; } else { ++odd; } }
list_t%20%2Alist%20%3D%20...%3B%20%2F%2F%2F%20a%20list%20we%20have%20gotten%20from%20somewhere%0A%0Aint%20odd%20%3D%200%3B%0Aint%20even%20%3D%200%3B%0A%0Afor%20%28iter_t%20%2Aiter%20%3D%20list_iterator%28list%29%3B%20iterator_has_next%28iter%29%3B%20iterator_next%28iter%29%29%0A%20%20%7B%0A%20%20%20%20elem_t%20e%20%3D%20iterator_get_current%28iter%29%3B%0A%20%20%20%20if%20%28e.i%20%25%202%20%3D%3D%200%29%0A%20%20%20%20%20%20%7B%0A%09%2B%2Beven%3B%0A%20%20%20%20%20%20%7D%0A%20%20%20%20else%0A%20%20%20%20%20%20%7B%0A%09%2B%2Bodd%3B%0A%20%20%20%20%20%20%7D%0A%20%20%7D%0A
If we do not insist in using a for
loop, we can do a little bit
better by relying on the implementation of iterator_next()
returning an elements.
list_t *list = ...; /// a list we have gotten from somewhere int odd = 0; int even = 0; iter_t *iter = list_iterator(list); while (iterator_has_next(iter)) { elem_t e = iterator_next(iter); /// Both step forward and return element if (e.i % 2 == 0) { ++even; } else { ++odd; } }
list_t%20%2Alist%20%3D%20...%3B%20%2F%2F%2F%20a%20list%20we%20have%20gotten%20from%20somewhere%0A%0Aint%20odd%20%3D%200%3B%0Aint%20even%20%3D%200%3B%0A%0Aiter_t%20%2Aiter%20%3D%20list_iterator%28list%29%3B%20%0A%0Awhile%20%28iterator_has_next%28iter%29%29%0A%20%20%7B%0A%20%20%20%20elem_t%20e%20%3D%20iterator_next%28iter%29%3B%20%2F%2F%2F%20Both%20step%20forward%20and%20return%20element%0A%20%20%20%20if%20%28e.i%20%25%202%20%3D%3D%200%29%0A%20%20%20%20%20%20%7B%0A%09%2B%2Beven%3B%0A%20%20%20%20%20%20%7D%0A%20%20%20%20else%0A%20%20%20%20%20%20%7B%0A%09%2B%2Bodd%3B%0A%20%20%20%20%20%20%7D%0A%20%20%7D%0A
You can think of iterator_next(iter)
a little bit like
arr[i++]
– we both look up an element in the array and
increment the index variable so that next use will “move forward”
in the array.
Questions about stuff on these pages? Use our Piazza forum.
Want to report a bug? Please place an issue here. Pull requests are graciously accepted (hint, hint).
Nerd fact: These pages are generated using org-mode in Emacs, a modified ReadTheOrg template, and a bunch of scripts.
Ended up here randomly? These are the pages for a one-semester course at 67% speed on imperative and object-oriented programming at the department of Information Technology at Uppsala University, created by Tobias Wrigstad.
Footnotes:
list_t
and link_t
that we defined then.link_t
is not
visible externally, we may end up not being able to compile this
code.