Different ways to iterate over a vector
August 28th, 2009 § 10 Comments
Iterating over a vector is a pretty simple task we get to do pretty often. It can be achieved in quite a few ways:
- Using normal random access (operator[] with index).
- Using std::iterator.
- Using std::for_each algorithm.
I set out to check the runtime differences between all these options, and the results turned out to be a little surprising (or not).
Let me present the setup:
#include <algorithm>
#include <vector>
typedef std::vector<int> vec;
struct func {
void operator () (int &n) const {
n *= 2;
}
};
void idx (vec &v, func &f) {
for (size_t i=0;i<v.size();++i)
f(v[i]);
}
void iterate (vec &v, func &f) {
for (vec::iterator i=v.begin();i!=v.end();++i)
f(*i);
}
void foreach (vec &v, func &f) {
std::for_each(v.begin(), v.end(), f);
}
I used the Timer class (presented here) to time all the different options, in the following manner:
#include "Timer.hpp"
int main () {
Timer t("iterate");
vec v(30000, 2);
func f;
for (unsigned i=30000;i;--i)
iterate(v, f);
return 0;
}
The tests were run on my own system; Windows Vista, Visual Studio 2008 (release config), Intel core2 quad q9450, 4gb ram. The observed results were the following:
idx: 1.188secs.
iterate: 6.813secs.
foreach: 0.709secs.
Looks like using the foreach algorithm is the fastest way, followed by normal random-access (using operator[]), while the usage of iterators seems to be the slowest by far.
Edit: as it turns out, the usage of checked iterators is enabled in visual studio by default, both in debug and in release modes. As you could guess, this fact has a great effect on the results. When the feature is disabled (#define _SECURE_SCL 0), the results are totally different; std::for_each still wins, but the margins are significantly smaller:
idx: 0.686secs.
iterate: 0.701secs.
foreach: 0.682secs.
I would like to thank Halbling for raising the question.
Ew, calculating .end() on every iteration? No.
void iterate (vec &v, func &f) {
vec::iterator it = v.begin(), end = v.end();
for ( ; it != end; ++it)
f(*it);
}
Try again!
Also, is there something wrong with your SHIFT-key?
Regards
Er, and:
void idx (vec &v, func &f) {
size_t n = v.size();
for (size_t i = 0;i < n; ++i)
f(v[i]);
}
These can't be cached as the compiler does not know if f() has side-effects on the vector. Consequently, using my functions you have to ensure that it doesn't, but if it did then it wouldn't be a very good iteration benchmark would it.
The presented methods are all used in the most straightforward way; no optimization attempts have been made to any of them, and that was on purpose – to show the most common usage scenario.
Moreover, the optimizations you suggested are basically what is done inside std::for_each.
thanks alot for commenting!
There’s a difference between writing non-optimized code and writing silly code. Anyone using the functions you demonstrated should not be employed to write C++, so it’s moot.
On linux (Athlon XP 1600+, g++ -O3), I get the following times for the original versions:
> idx: 1.584 s
> iterate: 1.715 s
> foreach: 1.560 s
Seems like gcc does some decent optimizations.
Also make sure that Visual Studio does not use the range-checking iterators in your benchmark…
You are absolutly correct!
Reading http://msdn.microsoft.com/en-us/library/aa985965(VS.80).aspx it turns out that checked iterators are enabled by default (both in debug & release configurations).
Switching checked iterators off generates far better results, as updated in the above post.
I imagine that writing non-silly code — as I mentioned in my comment above — would produce all but identical results for all three options… if not identical instructions. So this raises the question: just what are you trying to say here?
I was just experimenting with different ways of achieving a pretty common goal, not to make any point. I posted my results, and after discussing them and solving an issue – it turned out that all the different approaches generate pretty much the same results.
What did I learn? Firstly, that the compiler is able to optimize common pieces of code as well as a human being can. And secondly, that Microsoft’s Visual Studio has a pretty odd configuration that should be checked and verified (why would I want checked pointers in release mode?).
All in all, I hope this article will come in handy for the readers.
Another way is Boost.Foreach : http://www.boost.org/doc/libs/1_39_0/doc/html/foreach.html .
vector myvector;
BOOST_FOREACH( int i , myvector )
cout << i;
The code is optimal as for with iterator and caching the end.
(Checked on MSVC 2008 )
First off, thanks for a very relevant comment.
Given the described situation, std::for_each is just as optimal. In a general case though, BOOST_FOREACH is indeed perfect: it provides all the benefits of a regular loop (you’re able to write normal code within it, etc) with all the caching and other issues being totally taken care of by boost.