Pentium 4 Cache Organization

The evolution of cache organization is seen clearly in the evolution of Intel microprocessors. All of the Pentium processors include two on-chip L1 caches, one for data and one for instructions. For the Pentium 4, the L1 data cache is 8 Kbytes, using a line size of 64 bytes and a four-way set associative organization. The Pentium 4 includes an L2 cache that feeds both of the L1 caches. The L2 cache is eight-way set associative with a size of 256KB and a line size of 128 bytes.
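The sizes quoted above determine how many lines and sets each cache holds, and how an address is divided into tag, set index, and byte offset. The following is a minimal sketch of that arithmetic, not a model of the actual hardware; the class name `CacheGeometry` and the example address are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class CacheGeometry:
    """Hypothetical helper describing a set-associative cache."""
    size_bytes: int
    line_bytes: int
    ways: int

    @property
    def num_lines(self) -> int:
        # Total number of cache lines the cache can hold.
        return self.size_bytes // self.line_bytes

    @property
    def num_sets(self) -> int:
        # Each set groups `ways` lines.
        return self.num_lines // self.ways

    def split_address(self, addr: int) -> Tuple[int, int, int]:
        # Break an address into (tag, set index, byte offset within the line).
        offset = addr % self.line_bytes
        set_index = (addr // self.line_bytes) % self.num_sets
        tag = addr // (self.line_bytes * self.num_sets)
        return tag, set_index, offset


# Figures taken from the text above.
L1_DATA = CacheGeometry(size_bytes=8 * 1024, line_bytes=64, ways=4)
L2_UNIFIED = CacheGeometry(size_bytes=256 * 1024, line_bytes=128, ways=8)

print(L1_DATA.num_lines, L1_DATA.num_sets)        # 128 lines in 32 sets
print(L2_UNIFIED.num_lines, L2_UNIFIED.num_sets)  # 2048 lines in 256 sets
print(L1_DATA.split_address(0x12345))             # (36, 13, 5)
```

With these numbers, the L1 data cache holds 128 lines arranged as 32 sets of four, and the L2 cache holds 2048 lines arranged as 256 sets of eight.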

The Pentium 4 uses what Intel calls “Hyper-Pipelined Technology” to improve performance. Its pipeline has 20 stages, so up to 20 instructions can be in different stages of execution at the same time. This is twice the depth of the Pentium III pipeline, which has only 10 stages.


With the longer pipeline, each stage does less work per clock cycle, and more time is spent filling the pipeline. Therefore, the Pentium 4 pipeline has to run at a higher frequency in order to do the same amount of work as the shorter Pentium III pipeline.
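The cost of filling a deeper pipeline can be illustrated with a very rough model. The sketch below assumes an idealized pipeline that completes one instruction per cycle once full, and it treats every flush (for example, after a mispredicted branch) as a complete refill. Both are simplifying assumptions, and the instruction and flush counts are made-up values, not measurements of either processor.

```python
def cycles_to_run(instructions: int, stages: int, flushes: int = 0) -> int:
    """Cycles needed by an idealized pipeline (an assumed, simplified model).

    The first instruction needs `stages` cycles to complete and each later
    one completes a cycle after its predecessor. Every flush is modeled as
    paying the fill cost (stages - 1 cycles) again.
    """
    if instructions == 0:
        return 0
    fill = stages - 1
    return instructions + fill * (1 + flushes)


# Hypothetical workload: 1000 instructions with 10 pipeline flushes.
short_pipe = cycles_to_run(1000, stages=10, flushes=10)  # 1099 cycles
long_pipe = cycles_to_run(1000, stages=20, flushes=10)   # 1209 cycles

# Clock-frequency ratio the 20-stage pipeline needs just to break even.
print(short_pipe, long_pipe, long_pipe / short_pipe)
```

Under these assumptions the 20-stage pipeline needs about 10% more cycles for the same work, so it only comes out ahead if its clock runs more than about 10% faster.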

Intel uses an enhanced out-of-order speculative execution engine in the Pentium 4. Advanced prediction algorithms supply more instructions to execute, and deeper out-of-order resources allow up to 126 instructions to be in flight at once. This is about three times as many in-flight instructions as the Pentium III supports. The engine reduces the number of branch mispredictions relative to the Pentium III by about 33%, using a 4K-entry branch target buffer to store past branches together with an advanced branch prediction algorithm.

The level 1 data cache is kept small to reduce latency: a hit takes 2 cycles for an integer load and 6 cycles for a floating-point load. The cache uses a write-back policy, but a dynamic configuration allows this to be changed to write-through.
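The difference between the two write policies can be seen by counting how many writes reach the next level of the memory hierarchy. The sketch below uses a toy direct-mapped cache with write-allocate, which is an assumption made for brevity (the real level 1 data cache is four-way set associative); the function name and the store trace are hypothetical.

```python
def writes_to_next_level(store_addrs, num_lines, line_bytes, write_back):
    """Count writes forwarded to the next level by a toy direct-mapped cache.

    Write-through forwards every store. Write-back only writes a line out
    when a dirty line is evicted, plus one final flush of the lines that
    are still dirty at the end of the trace.
    """
    if not write_back:
        return len(store_addrs)

    dirty = {}   # set index -> tag of the dirty line currently held there
    writes = 0
    for addr in store_addrs:
        line = addr // line_bytes
        index, tag = line % num_lines, line // num_lines
        if index in dirty and dirty[index] != tag:
            writes += 1          # a different dirty line is evicted
        dirty[index] = tag
    return writes + len(dirty)   # remaining dirty lines are flushed


# Hypothetical trace: repeated stores to two nearby lines.
trace = [0, 4, 8, 64, 0]
print(writes_to_next_level(trace, num_lines=2, line_bytes=64, write_back=False))  # 5
print(writes_to_next_level(trace, num_lines=2, line_bytes=64, write_back=True))   # 2
```

Write-back saves traffic when the same lines are stored to repeatedly, at the cost of having to track dirty lines.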

The level 2 cache is a unified cache, 256KB in size, with a line size of 128 bytes and eight-way set associativity, meaning that each set is made up of eight cache lines. Compared with the level 1 cache, its larger capacity and higher associativity reduce the chance of a miss, which compensates for its slower access time. The larger line size can increase the latency of line refills, so the Pentium 4 employs a 400MHz system bus (a 100MHz clock that transfers data four times per cycle) delivering a data rate of 3.2GB/s to offset this. The system bus has a 64-byte access length, so 2 main memory accesses are needed to fill a level 2 cache line. The level 2 cache also employs a hardware pre-fetcher that fills 2 cache lines to take advantage of locality of reference. The hardware monitors the history of cache misses to help it avoid unnecessary pre-fetches.
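The bus figures above fit together arithmetically. The sketch below reproduces them; the 8-byte (64-bit) data bus width is not stated in the text and is assumed here because it is the width that yields 3.2GB/s at 400 million transfers per second. The function names are illustrative only.

```python
BUS_CLOCK_HZ = 100_000_000   # 100MHz bus clock (from the text)
TRANSFERS_PER_CLOCK = 4      # gives the 400MHz effective rate (from the text)
BUS_WIDTH_BYTES = 8          # assumption: 64-bit data bus, not stated in the text
ACCESS_BYTES = 64            # bus access length (from the text)
L2_LINE_BYTES = 128          # level 2 line size (from the text)


def peak_bytes_per_second(clock_hz, transfers_per_clock, width_bytes):
    # Peak rate: transfers per second times bytes moved per transfer.
    return clock_hz * transfers_per_clock * width_bytes


def accesses_per_line(line_bytes, access_bytes):
    # Ceiling division: number of bus accesses needed to fill one line.
    return -(-line_bytes // access_bytes)


def line_transfer_ns(line_bytes, bytes_per_second):
    # Time to move one line at the peak rate, ignoring memory latency.
    return line_bytes * 1_000_000_000 / bytes_per_second


bandwidth = peak_bytes_per_second(BUS_CLOCK_HZ, TRANSFERS_PER_CLOCK, BUS_WIDTH_BYTES)
print(bandwidth)                                       # 3200000000 bytes/s
print(accesses_per_line(L2_LINE_BYTES, ACCESS_BYTES))  # 2
print(line_transfer_ns(L2_LINE_BYTES, bandwidth))      # 40.0
```

At the peak rate, a full 128-byte line takes about 40ns of pure transfer time, split across 2 bus accesses; real refill latency would be higher because this ignores main memory access time.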


Figure: Pentium 4 Block Diagram


The Processor Core Consists Of Four Major Components:

Fetch/decode unit: Fetches program instructions in order from the L2 cache, decodes these into a series of micro-operations, and stores the results in the L1 instruction cache.

Out-of-order execution logic: Schedules execution of the micro-operations subject to data dependencies and resource availability; thus, micro-operations may be scheduled for execution in a different order than they were fetched from the instruction stream.

Execution units: These units execute micro-operations, fetching the required data from the L1 data cache and temporarily storing results in registers.

Memory subsystem: This unit includes the L2 cache and the system bus, which is used to access main memory when the L1 and L2 caches have a cache miss, and to access the system I/O resources.



