Login

I have this code (my strlen function)

size_t slen(const char *str)
{
size_t len = 0;
while (*str)
{
len++;
str++;
}
return len;
}

Doing `while (*str++)`, as shown below, the program execution time is much larger:

while (*str++)
{
len++;
}

I'm doing this to probe the code

int main()
{
double i = 11002110;
const char str[] = "long string here blablablablablablablabla"
while (i--)
slen(str);

return 0;
}

In first case the execution time is around 6.7 seconds, while in the second (using `*str++`), the time is around 10 seconds!

Why so much difference?

Trying this out on ideone.com, I get about 0.5s execution with *str++ [here][1]. Without, it takes just over a second ([here][2]). Using *str++ was faster. Perhaps with optimisation on *str++ can be done more efficiently.

[1]:

[To see links please register here]

[2]:

[To see links please register here]

This depends on your compiler, compiler flags, and your architecture. With Apple's LLVM gcc 4.2.1, I don't get a noticeable change in performance between the two versions, and there really shouldn't be. A good compiler would turn the `*str` version into something like

IA-32 (AT&T Syntax):

slen:
pushl %ebp # Save old frame pointer
movl %esp, %ebp # Initialize new frame pointer
movl -4(%ebp), %ecx # Load str into %ecx
xor %eax, %eax # Zero out %eax to hold len
loop:
cmpb (%ecx), $0 # Compare *str to 0
je done # If *str is NUL, finish
incl %eax # len++
incl %ecx # str++
j loop # Goto next iteration
done:
popl %ebp # Restore old frame pointer
ret # Return

The `*str++` version could be compiled exactly the same (since changes to `str` aren't visible outside `slen`, when the increment actually occurs isn't important), or the body of the loop could be:

loop:
incl %ecx # str++
cmpb -1(%ecx), $0 # Compare *str to 0
je done # If *str is NUL, finish
incl %eax # len++
j loop # Goto next iteration

Probably because the post-increment operator (used in the condition of the `while` statement) involves keeping a temporary copy of the variable with its old value.

What `while (*str++)` really means is:

while (tmp = *str, ++str, tmp)
...

By contrast, when you write `str++;` as a single statement in the body of the while loop, it is in a void context, hence the old value isn't fetched because it's not needed.

To summarise, in the `*str++` case you have an assignment, 2 increments, and a jump in each iteration of the loop. In the other case you only have 2 increments and a jump.

Others have already provided some excellent commentary, including analysis for the generated assembly code. I strongly recommend that you read them carefully. As they have pointed out this sort of question can't really be answered without some quantification, so let's and play with it a bit.

First, we're going to need a program. Our plan is this: we will generate strings whose lengths are powers of two, and try all functions in turn. We run through once to prime the cache and then separately time 4096 iterations using the highest-resolution available to us. Once we are done, we will calculate some basic statistics: min, max and the simple-moving average and dump it. We can then do some rudimentary analysis.

In addition to the two algorithms you've already shown, I will show a third option which doesn't involve the use of a counter at all, relying instead on a subtraction, and I'll mix things up by throwing in `std::strlen`, just to see what happens. It'll be an interesting throwdown.

Through the magic of television our little program is already written, so we compile it with `gcc -std=c++11 -O3 speed.c` and we get cranking producing some data. I've done two separate graphs, one for strings whose size is from 32 to 8192 bytes and another for strings whose size is from 16384 all the way to 1048576 bytes long. In the following graphs, the Y axis is the time consumed in *nanoseconds* and the X axis shows the length of the string in bytes.

Without further ado, let's look at performance for "small" strings from 32 to 8192 bytes:

[![Performance Plot - Small Strings][1]](

)

Now *this* is interesting. Not only is the `std::strlen` function outperforming everything across the board, it's doing it with gusto too since it's performance is a lot of more stable.

Will the situation change if we look at larger strings, from 16384 all the way to 1048576 bytes long?

[![enter image description here][2]](

)

Sort of. The difference is becoming even more pronounced. As our custom-written functions huff-and-puff, `std::strlen` continues to perform admirably.

An interesting observation to make is that you can't necessarily translate number of C++ instructions (or even, number of assembly instructions) to performance, since functions whose bodies consist of fewer instructions sometimes take longer to execute.

An even more interesting -- and **important** observation is to notice just how well the `str::strlen` function performs.

# So what does all this get us?

First conclusion: don't reinvent the wheel. Use the standard functions available to you. Not only are they already written, but they are very very heavily optimized and will almost certainly outperform anything you can write unless you're [Agner Fog](

[To see links please register here]

).

Second conclusion: unless you have **hard data** from a profiler that a particular section of code or function is hot-spot in your application, don't bother optimizing code. Programmers are notoriously bad at detecting hot-spots by looking at high level function.

Third conclusion: prefer algorithmic optimizations in order to improve your code's performance. Put your mind to work and let the compiler shuffle bits around.

Your original question was: "why is function `slen2` slower than `slen1`?" I could say that it isn't easy to answer without a lot more information, and even then it might be a lot longer and more involved than you care for. Instead what I'll say is this:

Who cares why? Why are you even bothering with this? Use `std::strlen` - which is better than anything that you can rig up - and move on to solving more important problems - because I'm sure that this *isn't* the biggest problem in your application.

[1]:

[2]:

kvetchiest635533

copolymeric917161

alfonsomyjiejz

debonair311

iridin867711