Despite the rumors, the switching process took only five cycles, the time needed for the processor to empty its instruction pipeline.
One of the first, and most powerful, techniques to improve performance is the use of the instruction pipeline.
This is the name given to the 20-stage instruction pipeline within the Willamette core.
For another example, some early ways of implementing the instruction pipeline led to a delay slot.
The purpose of the branch predictor is to improve the flow in the instruction pipeline.
In order to allow the system to work even with the high inter-unit latencies, each processor used an 8-deep instruction pipeline.
The eight-stage instruction pipeline allowed instructions from eight different processes to proceed at once.
The basic concept was to increase performance through the use of deep instruction pipelines.
In order to solve this problem, Cray turned to the concept of an instruction pipeline.
It uses a 14-stage instruction pipeline to achieve significantly higher clock speeds than the Core processors.
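
Several of the sentences above tie pipelining to higher performance. As a rough illustration only, the sketch below compares cycle counts for an idealized pipeline against a design that finishes each instruction before starting the next. It assumes one cycle per stage and no stalls, branch mispredictions, or flushes, which no real processor guarantees; the function names are hypothetical and do not come from any of the quoted sources.

```python
def unpipelined_cycles(n_instructions: int, n_stages: int) -> int:
    """Cycles needed when each instruction passes through every stage
    before the next instruction is allowed to start."""
    return n_instructions * n_stages


def pipelined_cycles(n_instructions: int, n_stages: int) -> int:
    """Cycles needed in an idealized pipeline: the first instruction takes
    n_stages cycles to fill the pipeline, then one instruction completes
    per cycle. Assumes no stalls or flushes."""
    if n_instructions == 0:
        return 0
    return n_stages + n_instructions - 1


def ideal_speedup(n_instructions: int, n_stages: int) -> float:
    """Ratio of unpipelined to pipelined cycle counts under the same
    idealized assumptions. Approaches n_stages for long instruction streams."""
    return unpipelined_cycles(n_instructions, n_stages) / pipelined_cycles(
        n_instructions, n_stages
    )


if __name__ == "__main__":
    # Compare a few of the pipeline depths mentioned in the sentences above.
    for stages in (8, 14, 20):
        print(stages, pipelined_cycles(1000, stages), ideal_speedup(1000, stages))
```

Under these assumptions the speedup approaches the number of stages as the instruction stream grows, and the fill cost of `n_stages - 1` cycles is the same order as the cost of emptying the pipeline, which is why deeper pipelines pay more for each flush.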