Avoiding cache pollution while loading a stream of numbers

On x86 processors, is there a way to load data from regular write-back memory into registers without going through the cache hierarchy?

My use case is that I have a big lookup structure (a hash map or B-tree). I am working through a large stream of numbers (much bigger than my L3 cache, but it fits in memory). What I am trying to do is very simple:

    int result = 0;
    for (int num : stream_numbers) {
        int lookup_result = lookup_using_b_tree(num);
        result += do_some_math_that_touches_registers_only(lookup_result);
    }
    return result;

Since I visit every number only once and the stream as a whole is larger than L3, I expect the numbers to evict cache lines that hold parts of my B-tree. Ideally, none of the numbers from this stream would be placed in cache at all, since they have no temporal locality (each is read only once). That way I can maximize the chance that my B-tree stays in cache and lookups are faster.

I have looked at the `(v)movntdqa` instruction, available since SSE4.1, for non-temporal loads. It doesn't seem to be a good fit because it appears to bypass the cache only for uncacheable write-combining (WC) memory. This old [article][1] from Intel claims that:

> Future generations of Intel processors may contain optimizations and enhancements for streaming loads, such as increased utilization of the streaming load buffers and support for additional memory types, creating even more opportunities for software developers to increase the performance and energy-efficiency of their applications.

However, I am unaware of any such processor today. I have read [elsewhere][2] that a processor can simply ignore this hint for write-back memory and treat the instruction as a `movdqa`. So is there any way to load from regular write-back memory without going through the cache hierarchy on x86 processors, even if it is only possible on Haswell and later models? I'd also appreciate any information on whether this will become possible in the future.
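
For reference, here is a minimal sketch of what the `movntdqa` approach looks like with intrinsics. The function name `sum_stream_nt` and the plain sum are placeholders I made up for illustration; the real loop would do the B-tree lookup instead. It assumes an x86 target with GCC or Clang, a 16-byte-aligned buffer, and a count that is a multiple of 4. As far as I understand, on ordinary write-back memory this is expected to behave like a normal aligned load, which is exactly the problem described above.

```cpp
#include <cstddef>
#include <cstdint>
#include <immintrin.h>  // x86 only; provides the SSE4.1 intrinsics

// Hypothetical helper: sums 32-bit integers, loading them with MOVNTDQA.
// Assumptions: `data` is 16-byte aligned and `count` is a multiple of 4.
#if defined(__GNUC__)
__attribute__((target("sse4.1")))  // lets this compile without -msse4.1
#endif
std::int32_t sum_stream_nt(const std::int32_t* data, std::size_t count) {
    __m128i acc = _mm_setzero_si128();
    for (std::size_t i = 0; i < count; i += 4) {
        // Non-temporal load hint; some headers declare the parameter as
        // a non-const pointer, hence the const_cast.
        __m128i v = _mm_stream_load_si128(
            reinterpret_cast<__m128i*>(const_cast<std::int32_t*>(data + i)));
        acc = _mm_add_epi32(acc, v);  // four lane-wise partial sums
    }
    // Spill the four lanes and add them up.
    alignas(16) std::int32_t lanes[4];
    _mm_store_si128(reinterpret_cast<__m128i*>(lanes), acc);
    return lanes[0] + lanes[1] + lanes[2] + lanes[3];
}
```

This only shows how the hint is issued; it makes no claim that the loads actually bypass the cache on write-back memory.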


[1]:

[To see links please register here]

[2]:

[To see links please register here]
