Login

Sometimes I encounter code that reads TSC with `rdtsc` instruction, but calls `cpuid` right before.

Why is calling `cpuid` necessary? I realize it may have something to do with different cores having TSC values, but what *exactly* happens when you call those two instructions in sequence?

Two reasons:

- As paxdiablo says, when the CPU sees a CPUID opcode it makes sure all the previous instructions are executed, then the CPUID taken, before any subsequent instructions execute. Without such an instruction, the CPU execution pipeline may end up executing TSC before the instruction(s) you'd like to time.
- A significant proportion of machines fail to synchronise the TSC registers across cores. In you want to read it from _a_ horse's mouth - knock yourself out at

[To see links please register here]

. So, when measuring an interval between TSC readings, unless they're taken on the same core you'll have an effectively random but possibly constant (see below) interval introduced - it can easily be several seconds (yes seconds) even soon after bootup. This effectively reflects how long the BIOS was running on a single core before kicking off the others, plus - if you've any nasty power saving options on - increasing drift caused by cores running at different frequencies or shutting down again. So, if you haven't nailed the threads reading TSC registers to the same core then you'll need to build some kind of cross-core delta table and know the core id (which is returned by CPUID) of each TSC sample in order to compensate for this offset. That's another reason you can see CPUID alongside RDTSC, and indeed a reason why with newer RDTSCP many OSes are storing core id numbers into the extra TSC_AUX[31:0] data returned. (Available from Core i7 and Athlon 64 X2, RDTSCP is a much better option in all respects - the OS normally gives you the core id as mentioned, atomic to the TSC read, *and* prevent instruction reordering).

CPUID is serializing, preventing out-of-order execution of RDTSC.

These days you can safely use LFENCE instead. It's documented as serializing on the instruction stream (but not stores to memory) on Intel CPUs, and now also on AMD after their microcode update for Spectre.

[To see links please register here]

explains more about LFENCE.

See also

for a way to use RDTSC**P** that keeps CPUID (or LFENCE) out of the timed region:

LFENCE ; (or CPUID) Don't start the timed region until everything above has executed
RDTSC ; EDX:EAX = timestamp
mov ebx, eax ; low 32 bits of start time

code under test

RDTSCP ; built-in one way barrier stops it from running early
LFENCE ; (or CPUID) still use a barrier after to prevent anything weird
sub eax, ebx ; low 32 bits of end-start

See also

[To see links please register here]

for more about RDTSC caveats, like constant_tsc and nonstop_tsc.

As a bonus, RDTSCP gives you a core ID. You could use RDTSCP for the start time as well, if you want to check for core migration. But if your CPU has the `constant_tsc` features, all cores in the package should have their TSCs synced so you typically don't need this on modern x86.

You could get the core ID from CPUID instead, as @Tony's answer points out.

It's to prevent out-of-order execution. From a link that has now disappeared from the web (but which was fortuitously copied here before it disappeared), this text is from an article entitled "Performance monitoring" by one John Eckerdal:

>The Pentium Pro and Pentium II processors support out-of-order execution instructions may be executed in another order as you programmed them. This can be a source of errors if not taken care of.
>
>To prevent this the programmer must serialize the the instruction queue. This can be done by inserting a serializing instruction like CPUID instruction before the RDTSC instruction.

stentor31538

Sirperpetuate8

people874

porras792