However, memory latency is greatly increased as a trade-off.
Context switching threads to main memory is a much more expensive operation when compared to memory latency.
As this gap widened, large amounts of die area were dedicated to hiding memory latencies.
This further integration reduces memory latency even more.
So as machines get wider, memory latency becomes more and more problematic.
When it comes to memory latency, it's the loads that are the big problem, not the stores.
Yes, it will hide memory latency, since when one thread stalls, the other gets 100% of the execution core to itself.
In doing so it effectively hides all memory latency from the processor's perspective.
Given these trends, it was expected that memory latency would become an overwhelming bottleneck in computer performance.
However, this does not happen in computer hardware because of memory latency and other aspects of the architecture.