Cache Memory : Elements of Cache Design

Cache memory is a small but very fast memory used to hold frequently used data. Caches are also used extensively in high-performance computing (HPC). Although there are a large number of cache implementations, a few basic design elements serve to classify and differentiate cache architectures. They are listed below :

1. Cache Addresses
2. Cache Size
3. Mapping Function
4. Replacement Algorithms
5. Write Policy
6. Line Size
7. Number of Caches


1. Cache Addresses :  Cache memory can be located on either side of the memory management unit (MMU) and can use either physical or logical addresses as its tag data. The location of the cache relative to the MMU can dramatically affect system performance.

Logical Address : The cache can be placed between the processor and the MMU or between the MMU and main memory. A logical cache, also known as a virtual cache, stores data using virtual (logical) addresses. The processor accesses the cache directly, without going through the MMU. This organization is shown in the figure below.


With a logical cache, the tag information refers to the logical addresses currently in use by the executing task. If the task is switched out during a context switch, the cache tags are no longer valid, and the cache, together with its often hard-won data, must be flushed and cleared. The processor must then go to main memory to fetch the first instructions and wait until the second iteration of a loop before any benefit is obtained from the cache. However, cache accesses do not need to go through the MMU and therefore do not suffer from any associated delay.

Physical Address : A physical cache stores data using main memory physical addresses. One advantage of the logical cache over the physical cache is that cache access is faster, because the cache can respond before the MMU performs an address translation. The organization of a physical cache is shown in the figure below.


Because physical caches use physical addresses, they do not need flushing on a context switch, and so data is preserved within the cache. The disadvantage is that all accesses must go through the memory management unit, thus incurring delays. Particular care must also be exercised when pages are swapped to and from disk.

2. Cache Size :  Cache size is an especially important consideration for high-performance CPUs. The bigger the cache, the better the performance, but also the higher the cost.

3. Mapping Function : Because there are fewer cache lines than main memory blocks, an algorithm is needed for mapping main memory blocks into cache lines. Further, a means is needed for determining which main memory block currently occupies a cache line. The choice of mapping function dictates how the cache is organized. Three techniques are normally used : direct, associative, and set-associative.

Direct Mapping : The simplest technique, known as direct mapping, maps each block of main memory into only one possible cache line. The mapping is represented as follows :

         i = j modulo m

where
        i = cache line number 
        j = main memory block number
        m = number of lines in the cache

The direct mapping technique is simple and inexpensive to implement. Its main disadvantage is that there is a fixed cache location for any given block. Thus, if a program happens to reference words repeatedly from two different blocks that map into the same line, the blocks will be continually swapped in the cache and the hit ratio will be low.
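As an illustration, the line selection can be sketched in a few lines of Python; the cache size below is an assumption chosen only for the example, not a value from the text.

    # A minimal sketch of direct mapping: block j of main memory can only
    # occupy cache line i = j modulo m. The value of m is assumed here.
    M_LINES = 8  # assumed number of cache lines (m)

    def direct_map_line(block_number, num_lines=M_LINES):
        """Return the single cache line that this block may occupy."""
        return block_number % num_lines

    # Blocks 3, 11 and 19 all map to line 3 when m = 8, so repeated
    # references to any two of them keep evicting each other.
    print([direct_map_line(j) for j in (3, 11, 19)])  # -> [3, 3, 3]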

Associative Mapping : Associative mapping overcomes the disadvantage of direct mapping by permitting each main memory block to be loaded into any line of the cache. In this case, the cache control logic interprets a memory address simply as a tag field and a word field. The tag field uniquely identifies a block of main memory. To determine whether a block is in the cache, the cache control logic must simultaneously examine every line's tag for a match.
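The lookup can be sketched in Python as follows; in hardware the tag comparison is done in parallel across all lines, whereas the loop here checks them one at a time, and the block size and tag values are assumptions made only for illustration.

    WORDS_PER_BLOCK = 4  # assumed block size in words

    def associative_lookup(address, cache_tags):
        """Return the line holding the addressed block, or None on a miss."""
        tag = address // WORDS_PER_BLOCK        # everything above the word field
        for line, line_tag in enumerate(cache_tags):
            if line_tag == tag:                 # compared in parallel in hardware
                return line                     # hit
        return None                             # miss

    tags = [5, 42, 7, 13]                       # hypothetical tag store
    print(associative_lookup(42 * WORDS_PER_BLOCK + 2, tags))  # -> 1 (hit in line 1)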

Set-Associative Mapping : Set-associative mapping exhibits the strengths of both the direct and associative approaches while reducing their disadvantages. Here the cache is divided into v sets, each consisting of k lines. The relationships are:

        m = v x k
        i = j modulo v

    where 
        i = cache set number
        j = main memory block number
        m = number of lines in the cache
        v = number of sets
        k = number of lines in each set

This is referred to as k-way set-associative mapping.
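A small Python sketch of the placement rule is given below; the values of m and k are assumptions for the example.

    M_LINES = 16                 # assumed total number of cache lines (m)
    K_WAYS = 4                   # assumed lines per set (k)
    V_SETS = M_LINES // K_WAYS   # number of sets (v), since m = v x k

    def candidate_lines(block_number):
        """Return the k lines in which this main memory block may be placed."""
        set_index = block_number % V_SETS        # i = j modulo v
        return [set_index * K_WAYS + way for way in range(K_WAYS)]

    print(candidate_lines(9))    # set 1 -> lines [4, 5, 6, 7]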

4. Replacement Algorithms : Once the cache has been filled, when a new block is brought into the cache, one of the existing blocks must be replaced. For direct mapping, there is only one possible line for any particular block, so no choice is possible. For the associative and set-associative techniques, a replacement algorithm is needed; to achieve high speed, such an algorithm must be implemented in hardware. The most common algorithms are :

Least Recently Used (LRU) : Probably the most effective algorithm is least recently used (LRU): replace the block in the set that has been in the cache longest with no reference to it. For two-way set-associative mapping, LRU is easily implemented. Because we assume that more recently used memory locations are more likely to be referenced again, LRU should give the best hit ratio.
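The policy can be sketched in Python with an ordered dictionary acting as the recency list; this is only an illustration of the policy, not the hardware mechanism used for two-way sets.

    from collections import OrderedDict

    class LRUSet:
        """One cache set replaced with the least-recently-used policy."""
        def __init__(self, k):
            self.k = k                            # associativity (lines per set)
            self.lines = OrderedDict()            # tag -> data, oldest first

        def access(self, tag, data=None):
            if tag in self.lines:
                self.lines.move_to_end(tag)       # hit: mark most recently used
                return True
            if len(self.lines) >= self.k:
                self.lines.popitem(last=False)    # evict the least recently used line
            self.lines[tag] = data                # miss: fill the line
            return False

    s = LRUSet(k=2)
    for t in (1, 2, 1, 3):                        # block 3 evicts 2, not 1
        print(t, "hit" if s.access(t) else "miss")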

First In First Out (FIFO) : FIFO replaces the block in the set that has been in the cache longest. FIFO is easily implemented as a round-robin or circular buffer technique.
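A round-robin version can be sketched as follows; the structure is an assumption made only for illustration.

    class FIFOSet:
        """One cache set replaced in round-robin (first-in-first-out) order."""
        def __init__(self, k):
            self.tags = [None] * k        # tag stored in each of the k lines
            self.next_victim = 0          # round-robin replacement pointer

        def access(self, tag):
            if tag in self.tags:
                return True                                   # hit
            self.tags[self.next_victim] = tag                 # overwrite oldest line
            self.next_victim = (self.next_victim + 1) % len(self.tags)
            return False                                      # miss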

Least Frequently Used (LFU) : LFU replaces the block in the set that has experienced the fewest references. LFU can be implemented by associating a counter with each line.
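A counter-based sketch in Python might look like the following; it is illustrative only.

    class LFUSet:
        """One cache set replaced with the least-frequently-used policy."""
        def __init__(self, k):
            self.k = k
            self.counts = {}              # tag -> number of references

        def access(self, tag):
            if tag in self.counts:
                self.counts[tag] += 1                         # hit: bump the counter
                return True
            if len(self.counts) >= self.k:
                victim = min(self.counts, key=self.counts.get)
                del self.counts[victim]                       # evict fewest references
            self.counts[tag] = 1                              # miss: fill with count 1
            return False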

5. Write Policy : Before a block that is resident in the cache can be replaced, it is necessary to consider whether it has been altered in the cache but not in main memory. If it has not been altered, the old block in the cache may simply be overwritten. If it has been altered, at least one write operation has been performed on a word in that line of the cache, and main memory must be updated by writing the line of cache out to the block of memory before the new block is brought in. A variety of write policies, with different performance and economic trade-offs, are possible :

Write through : With this technique, all write operations are made to main memory as well as to the cache, ensuring that main memory is always valid. Any other processor-cache module can monitor traffic to main memory to maintain consistency within its own cache. The main disadvantage of this technique is that it generates substantial memory traffic and may create a bottleneck.
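The policy amounts to the following sketch, where the cache and main memory are modeled as plain Python dictionaries purely for illustration.

    def write_through(address, value, cache, memory):
        """Every write updates main memory, so memory is always valid."""
        if address in cache:
            cache[address] = value    # update the cached copy if present
        memory[address] = value       # always write to main memory as well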

Write back : An alternative technique, known as write back, minimizes memory writes. With write back, updates are made only in the cache. When an update occurs, a dirty (UPDATE) bit associated with the line is set. When a block is replaced, it is written back to main memory only if its dirty bit is set. The main disadvantage of write back is that portions of main memory are invalid, so accesses by I/O modules can be allowed only through the cache. This makes for complex circuitry and a potential bottleneck.
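A sketch of the policy is given below; each cache line is modeled as a (value, dirty) pair, which is an assumption made only for the example.

    def write_back(address, value, cache):
        """Writes touch only the cache and set the dirty (UPDATE) bit."""
        cache[address] = (value, True)

    def evict(address, cache, memory):
        """On replacement, write the line to memory only if it is dirty."""
        value, dirty = cache.pop(address)
        if dirty:
            memory[address] = value   # flush the modified data before reuse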

6. Line Size : Another design element is the line size. When a block of data is retrieved and placed in the cache, not only the desired word but also some number of adjacent words are retrieved. As the block size increases, more useful data are brought into the cache and the hit ratio at first increases because of the principle of locality. The hit ratio will begin to decrease, however, as the block becomes even bigger and the probability of using the newly fetched information becomes less than the probability of reusing the information that has to be replaced.
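The effect of line size on the address split can be sketched as follows for a direct-mapped cache; the field widths and example values are assumptions for illustration.

    def split_address(address, words_per_line, num_lines):
        """Split a word address into tag, line and word fields (direct mapping)."""
        word = address % words_per_line      # word within the line
        block = address // words_per_line    # main memory block number
        line = block % num_lines             # cache line (i = j modulo m)
        tag = block // num_lines             # remaining high-order bits
        return tag, line, word

    # Doubling the line size moves one bit from the line/tag fields into
    # the word field, so each miss fetches more adjacent words.
    print(split_address(219, words_per_line=4, num_lines=8))   # -> (6, 6, 3)
    print(split_address(219, words_per_line=8, num_lines=8))   # -> (3, 3, 3)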

7. Number of Caches : When caches were first introduced, the typical system had a single cache. More recently, the use of multiple caches has become the norm. With increased levels of integration, it has become possible to have a cache on the same chip as the processor: the on-chip cache. Compared with a cache reachable via an external bus, the on-chip cache reduces the processor's external bus activity and therefore speeds up execution times and increases overall system performance. When a requested instruction or data item is found in the on-chip cache, the bus access is eliminated, and during this period the bus is free to support other transfers.

MULTILEVEL CACHES: Most contemporary designs include both on-chip and external caches. The simplest such organization is known as a two-level cache, with the internal cache designated as level 1 (L1) and the external cache designated as level 2 (L2). There can also be three or more levels of cache. Adding levels of cache helps reduce the number of main memory accesses.
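The benefit can be seen in a two-level lookup sketch like the one below; all of the structures are plain dictionaries used only for illustration.

    def read(address, l1, l2, memory):
        """Check L1 first, then L2; go to main memory only if both miss."""
        if address in l1:
            return l1[address]            # L1 hit: fastest path, no bus access
        if address in l2:
            l1[address] = l2[address]     # L2 hit: promote the block into L1
            return l1[address]
        value = memory[address]           # miss in both levels
        l2[address] = value               # fill both caches on the way back
        l1[address] = value
        return value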

UNIFIED VERSUS SPLIT CACHES: Earlier on-chip cache designs consisted of a single cache used to store references to both data and instructions. This is the unified approach. More recently, it has become common to split the cache into two: one dedicated to instructions and one dedicated to data, with both caches existing at the same level. This is the split cache approach. Whether to use a unified or a split cache is another design issue.




