07-24-2023, 03:41 AM
I used llvm-mca to compute the **total cycles** of a piece of code, expecting them to predict its runtime. However, measuring the runtime dynamically showed almost no correlation. So: **Why do the total cycles computed by llvm-mca not accurately predict the runtime? Can I predict the runtime in some better way with llvm-mca?**
--------------------------------
Details:
I wanted to know the runtime of the following code for different types of `begin` (and `end`) iterators, and for `startValue` being either `0.0` or `0ULL`:

```cpp
std::accumulate(begin, end, startValue)
```
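One detail matters when comparing the two start values: `std::accumulate` keeps its running sum in the type of the initial value, not in the element type. The minimal sketch below illustrates this; the helper name `sumWith` is hypothetical and not from the original code.

```cpp
#include <numeric>
#include <type_traits>
#include <vector>

// Hypothetical helper: sum a vector of doubles starting from `init`.
// std::accumulate computes `acc = acc + element` and returns the type of `init`.
template <class T>
T sumWith(const std::vector<double>& v, T init)
{
    return std::accumulate(v.begin(), v.end(), init);
}

// The accumulator type follows the initial value, not the element type.
static_assert(std::is_same_v<decltype(sumWith(std::vector<double>{}, 0.0)), double>);
static_assert(std::is_same_v<decltype(sumWith(std::vector<double>{}, 0ULL)), unsigned long long>);
```

For example, with the elements `{0.5, 1.5, 2.5}`, `sumWith(v, 0.0)` returns `4.5`, while `sumWith(v, 0ULL)` returns `3`: each step converts the running sum to `double`, adds the element, and truncates the result back to `unsigned long long` (0.5 → 0, 1.5 → 1, 3.5 → 3).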
To predict the runtimes, I used the Compiler Explorer with its LLVM Machine Code Analyzer (llvm-mca) plugin, since llvm-mca is "a performance analysis tool that uses information available in LLVM (e.g. scheduling models) to statically measure the performance". I used the following code:

```cpp
#include <algorithm>
#include <numeric>
#include <random>
#include <vector>

using vec_t = std::vector<double>;

// Fill a vector with `size` uniformly distributed doubles in [0.0, 1.1).
vec_t generateRandomVector(vec_t::size_type size)
{
    std::random_device rnd_device;
    std::mt19937 mersenne_engine{rnd_device()};
    std::uniform_real_distribution dist{0.0, 1.1};
    auto gen = [&dist, &mersenne_engine]() {
        return dist(mersenne_engine);
    };
    vec_t result(size);
    std::generate(result.begin(), result.end(), gen);
    return result;
}

double start()
{
    vec_t vec = generateRandomVector(30000000);
    vec_t::iterator vectorBegin = vec.begin();
    vec_t::iterator vectorEnd = vec.end();
    // Only the code between these two markers is analyzed by llvm-mca.
    __asm volatile("# LLVM-MCA-BEGIN stopwatchedAccumulate");
    double result = std::accumulate(vectorBegin, vectorEnd, 0.0);
    __asm volatile("# LLVM-MCA-END");
    return result;
}
```
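The post does not show how the wall-clock runtimes were measured. A minimal sketch of one way to take such a measurement, assuming `std::chrono::steady_clock` is an acceptable timer (the helper name `timeAccumulate` is hypothetical):

```cpp
#include <chrono>
#include <numeric>
#include <utility>
#include <vector>

// Hypothetical helper: run std::accumulate once over `v` with initial value
// `init`, and return the sum together with the elapsed wall-clock time in ms.
template <class T>
std::pair<T, double> timeAccumulate(const std::vector<double>& v, T init)
{
    const auto t0 = std::chrono::steady_clock::now();
    const T sum = std::accumulate(v.begin(), v.end(), init);
    const auto t1 = std::chrono::steady_clock::now();
    // Convert the clock's native duration to floating-point milliseconds.
    const std::chrono::duration<double, std::milli> elapsed = t1 - t0;
    return {sum, elapsed.count()};
}
```

Calling `timeAccumulate(vec, 0.0)` and `timeAccumulate(vec, 0ULL)` on the same vector from `generateRandomVector(30000000)` gives the two runtimes to compare. Returning the sum, and using it afterwards, keeps the compiler from discarding the loop as dead code. A single run is noisy, so repeating each measurement and taking the minimum or median is usually more reliable.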
However, I see no correlation between the total cycles computed by llvm-mca and the wall-clock time of running the corresponding `std::accumulate`. For example, for the code above llvm-mca reports 2806 Total Cycles and the runtime is 14 ms. When I switch the start value to `0ULL`, the Total Cycles drop to 2357, but the runtime rises to 117 ms.
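To put the two sets of numbers on the same scale, the measured times can be converted to cycles. This assumes a fixed 3 GHz core clock, which the post does not state, and uses the 30 000 000 elements of the vector:

$$
\begin{aligned}
14\ \text{ms} \times 3\times10^{9}\ \text{cycles/s} &= 4.2\times10^{7}\ \text{cycles} \approx 1.4\ \text{cycles/element} \\
117\ \text{ms} \times 3\times10^{9}\ \text{cycles/s} &= 3.51\times10^{8}\ \text{cycles} \approx 11.7\ \text{cycles/element}
\end{aligned}
$$

Both figures are several orders of magnitude above the 2806 and 2357 Total Cycles. llvm-mca reports Total Cycles for repeating the marked instruction sequence a fixed number of times (100 by default, set with `-iterations`), independent of the 30 000 000 loop iterations executed at runtime.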