Short story: I did some profiling (I do a lot of profiling), and QueryPerformanceCounter showed up a lot more than I felt it should. So, after some reading up and testing, I am now using rdtsc/rdtscp.
Longer story: A long time ago, before computers had more than one core, the fastest way to time things (if the CPU supported it) was the rdtsc instruction. Its resolution was much finer than that of any other timing mechanism available at the time, partly because it wasn’t really a timing instruction: it returned the number of clock cycles since the processor was reset.
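A minimal sketch of reading the counter around a piece of work, assuming an x86/x86-64 CPU and the `__rdtsc` intrinsic (declared in `<x86intrin.h>` by GCC and Clang, and in `<intrin.h>` by MSVC). The names `read_tsc`, `busy_work` and `ticks_for_busy_work` are hypothetical, made up for illustration. The result is a raw tick count, not a time.

```c
#include <stdint.h>
#if defined(_MSC_VER)
#include <intrin.h>      /* MSVC: __rdtsc */
#else
#include <x86intrin.h>   /* GCC/Clang: __rdtsc */
#endif

/* Read the time-stamp counter. rdtsc is not serializing, so the CPU
   may reorder it relative to neighbouring instructions. */
static inline uint64_t read_tsc(void)
{
    return (uint64_t)__rdtsc();
}

/* Hypothetical workload, only here to have something to time.
   The volatile sink keeps the compiler from deleting the loop. */
static uint64_t busy_work(uint64_t n)
{
    volatile uint64_t sink = 0;
    for (uint64_t i = 0; i < n; ++i)
        sink += i;
    return sink;
}

/* Raw TSC ticks spent in busy_work(n). Converting to seconds
   requires knowing the counter's tick rate. */
static uint64_t ticks_for_busy_work(uint64_t n)
{
    uint64_t start = read_tsc();
    (void)busy_work(n);
    uint64_t end = read_tsc();
    return end - start;
}
```

Because of the reordering noted in the comment, very short measurements taken this way are only approximate.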
The problems:
- If a CPU changed speed, the counter no longer advanced at a constant rate per second, which made it hard to use for timing.
- Multi-core. The rdtsc value read on one core might be completely different from the value read on another, and our thread might switch cores a lot. The cure for...
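Both problems in the list can at least be detected in code. A minimal sketch, assuming GCC or Clang on x86/x86-64 (`<cpuid.h>` for `__get_cpuid`, `<x86intrin.h>` for `__rdtscp`): CPUID leaf 0x80000007, EDX bit 8, reports an "invariant TSC" whose rate does not change with CPU speed, and rdtscp returns the IA32_TSC_AUX register alongside the counter, which makes it possible to notice when the two reads ran on different cores. What the OS stores in TSC_AUX is OS-specific (Linux stores the CPU number in the low 12 bits); the sketch only assumes the value differs between logical processors. `has_invariant_tsc`, `tsc_sample`, `spin` and `time_spin` are hypothetical names.

```c
#include <stdint.h>
#include <cpuid.h>       /* GCC/Clang: __get_cpuid */
#include <x86intrin.h>   /* GCC/Clang: __rdtscp */

/* CPUID.80000007H:EDX bit 8 = invariant TSC: the counter ticks at a
   constant rate regardless of CPU frequency changes. */
static int has_invariant_tsc(void)
{
    unsigned int eax, ebx, ecx, edx;
    if (!__get_cpuid(0x80000007u, &eax, &ebx, &ecx, &edx))
        return 0;                    /* extended leaf not supported */
    return (int)((edx >> 8) & 1u);
}

typedef struct {
    uint64_t ticks;   /* end - start, in TSC ticks */
    int same_cpu;     /* 1 if both reads reported the same TSC_AUX value */
} tsc_sample;

/* Hypothetical workload; volatile keeps the loop from being removed. */
static void spin(uint64_t n)
{
    volatile uint64_t sink = 0;
    for (uint64_t i = 0; i < n; ++i)
        sink += i;
}

/* rdtscp waits for all earlier instructions to execute before reading
   the counter, and also returns TSC_AUX through its pointer argument. */
static tsc_sample time_spin(uint64_t n)
{
    unsigned int aux_start, aux_end;
    tsc_sample s;
    uint64_t start = __rdtscp(&aux_start);
    spin(n);
    uint64_t end = __rdtscp(&aux_end);
    s.ticks = end - start;
    s.same_cpu = (aux_start == aux_end);
    return s;
}
```

A sample with `same_cpu == 0` could simply be discarded and retried; the alternative is pinning the thread to one core (for example with `SetThreadAffinityMask` on Windows or `sched_setaffinity` on Linux), at the cost of giving the scheduler less freedom.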