Cache Mapping - The Basics
Proficiency in CPU cache mapping is crucial for the parallel execution of code. The tag, line/block, and word/offset fields are central to how a CPU cache is organized.
Introduction
Cache mapping is the process of determining how cache memory is used to store data from the main memory. It determines which memory addresses in the main memory are mapped to which locations in the cache memory.
In this article, I first discuss cache mapping and its types. Next, I focus on the three components of the address structure used for cache mapping in a typical CPU. I then show how to read these structures with the help of diagrams for all three types of cache mapping. Lastly, I use the cache organization of a typical Intel i7 processor as an example.
What is Cache Mapping?
Cache mapping is the technique used to determine which cache block (or set of blocks) stores a particular memory address.
Types of cache mapping
There are three main types of cache mapping techniques:
Direct-mapped cache
In this technique, each memory block can be mapped to only one specific block in the cache. Because main memory is far larger than the cache, many memory blocks share the same cache block, so the correspondence between memory blocks and cache blocks is many-to-one. The location in the cache is determined by the index bits of the memory address, which sit just above the offset bits. Direct-mapped caches are simple and easy to implement, but they are prone to conflict misses (when multiple memory blocks that map to the same cache block keep evicting one another).
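To make conflict misses concrete, here is a minimal sketch of a toy direct-mapped cache that only counts misses. The function name, the 4-line cache, and the 16-byte block size are illustrative assumptions, not figures from any real CPU.

```python
def direct_mapped_misses(addresses, num_lines=4, block_size=16):
    """Count misses in a toy direct-mapped cache (sizes are illustrative)."""
    lines = [None] * num_lines          # each slot remembers the tag it holds
    misses = 0
    for addr in addresses:
        block = addr // block_size      # memory block number
        index = block % num_lines       # the single line this block may use
        tag = block // num_lines        # distinguishes blocks that share a line
        if lines[index] != tag:
            misses += 1
            lines[index] = tag          # evict whatever was there before
    return misses


# Addresses 0 and 64 are blocks 0 and 4: both map to line 0 and evict each other.
print(direct_mapped_misses([0, 64, 0, 64, 0, 64]))  # 6
# Addresses 0 and 16 are blocks 0 and 1: they map to different lines.
print(direct_mapped_misses([0, 16, 0, 16, 0, 16]))  # 2
```

The first access pattern misses every time even though three of the four lines stay empty, which is exactly the weakness of direct mapping.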
Set-associative cache
In this technique, each memory block can be mapped to one of several blocks in the cache. The cache memory is divided into multiple sets, each containing a fixed number of cache lines (the ways). Each memory address maps to exactly one set, but within that set the block may occupy any line, and the cache controller searches only that set for the requested data. Set-associative caches reduce the number of conflict misses by allowing more flexibility in the mapping, but they are more complex to implement than direct-mapped caches.
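The effect of associativity can be sketched with a toy simulator. The code below assumes a least-recently-used (LRU) replacement policy, which the text above does not specify; the function name and the tiny cache sizes are likewise illustrative.

```python
def set_associative_misses(addresses, num_sets=2, ways=2, block_size=16):
    """Count misses in a toy set-associative cache with LRU replacement (assumed)."""
    # Each set is a list of tags, ordered from least to most recently used.
    sets = [[] for _ in range(num_sets)]
    misses = 0
    for addr in addresses:
        block = addr // block_size
        index = block % num_sets        # the one set this block may live in
        tag = block // num_sets
        tags = sets[index]
        if tag in tags:
            tags.remove(tag)            # hit: re-appended below as most recent
        else:
            misses += 1
            if len(tags) == ways:
                tags.pop(0)             # set is full: evict the LRU tag
        tags.append(tag)
    return misses


# Blocks 0 and 4 share set 0, but two ways let both stay resident.
print(set_associative_misses([0, 64, 0, 64, 0, 64]))  # 2
# With one way per set the same capacity behaves like a direct-mapped cache.
print(set_associative_misses([0, 64, 0, 64, 0, 64], num_sets=4, ways=1))  # 6
```

Both runs use four lines in total; only the grouping into sets changes, which is why the two-way version avoids the repeated evictions.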
Fully associative cache
In this technique, any memory block can be mapped to any block in the cache. The cache behaves as a single set that contains every line, so the address places no restriction on where a block is stored. The cache controller must therefore compare the requested tag against every line in the cache. Fully associative caches provide the most flexibility in mapping and eliminate conflict misses, but they require complex hardware to search the entire cache efficiently, which makes them practical only for small caches.
The choice of cache mapping technique depends on various factors, such as the cache size, memory access patterns, and the desired trade-off between hit rate and hardware complexity.
Modern CPUs typically use set-associative mapping for their main caches to balance performance and complexity. Direct-mapped and fully associative caches are the two extremes of set associativity: one line per set, and one set holding every line. The exact details of the cache mapping used by a specific CPU depend on the architecture and design choices made by the processor manufacturer.
The address structure of the cache in the CPU
The address structure of the cache in a CPU typically consists of three components: tag, index, and offset. Together, these fields map a main memory address to the corresponding cache location and are used to locate data in the cache. A fully associative cache is the exception: it has no index field.
Components of address structure of a cache
The three components of the address structure of the cache in a CPU are as follows:
Tag
The tag is a field in the cache entry that stores the upper bits of the memory address of the block held in that entry. The same bits form the tag field of the address being looked up. Comparing the two identifies the memory block stored in the cache line and tells the cache whether the requested data is present (a hit) or not (a miss). The tag is typically the upper bits of the address that are not used for the index or offset fields.
Index
The index identifies where in the cache a memory block may reside. In a direct-mapped cache, the index is the line/block number, so it selects exactly one line. In a set-associative cache, the index is the set number, and the block may sit in any line (way) of that set. A fully associative cache has a single set and therefore no index field. In a nutshell, the index is the field in the address structure that selects the line or set to search. It is typically taken from the middle bits of the address, between the tag and the offset.
Offset
The word/offset is the byte location within a cache entry that holds the data. Thus, the offset is a field in the address structure that is used to identify the location of the data within the cache line. It is used to extract the data from the cache entry once it is identified using the tag and line/block. It is typically the lower bits of the address that are used for this purpose.
Together, these components enable the CPU to quickly locate and retrieve data from the cache when it is needed. When the CPU needs to access a memory location, it first checks the cache for the data. If the data is present in the cache, it is retrieved quickly. If the data is not present in the cache, the CPU must retrieve it from the main memory, which takes much longer. The cache organization and mapping strategy can have a significant impact on the performance of the CPU.
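The split into tag, index, and offset is plain bit manipulation. The sketch below assumes a hypothetical helper named split_address and field widths of 6 offset bits and 6 index bits (a 64-byte block and 64 sets); the widths are parameters, not fixed by the text.

```python
def split_address(addr, offset_bits, index_bits):
    """Split an address into (tag, index, offset) fields."""
    offset = addr & ((1 << offset_bits) - 1)                 # lowest bits
    index = (addr >> offset_bits) & ((1 << index_bits) - 1)  # middle bits
    tag = addr >> (offset_bits + index_bits)                 # remaining upper bits
    return tag, index, offset


tag, index, offset = split_address(0x12345678, offset_bits=6, index_bits=6)
print(hex(tag), index, offset)  # 0x12345 25 56
```

Shifting and masking like this is cheap in hardware, which is one reason cache and block sizes are powers of two.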
Understanding the address structure in cache mapping
Here's a diagram showing how the tag, line/block index, and word/offset fields of a memory address relate to an entry in a direct-mapped cache:
Memory address:
 ____________________________________________________________
| Tag (upper bits)   | Line/Block Index   | Word/Byte Offset |
 ------------------------------------------------------------
Cache entry selected by the index:
 ____________________________________________________________
| Valid bit | Tag    | Data (cache block)                    |
 ------------------------------------------------------------
In this diagram, the upper row is the memory address, split from the most significant to the least significant bits into the tag, the line/block index, and the word/byte offset. The lower row is the cache entry that the index selects: a valid bit, the stored tag, and the data field that holds the cache block itself.
The line/block index determines which line of the cache should contain the required block of the main memory. If that entry's valid bit is set and its stored tag equals the tag bits of the address, the access is a hit, and the word/byte offset specifies the byte or word within the cache block that is required.
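As a minimal sketch of how the valid bit, tag, data block, and offset work together on a lookup, the following models a tiny direct-mapped cache in front of a toy main memory. The CacheLine class, the read_byte function, and the sizes (4 lines of 16 bytes) are assumptions made for illustration.

```python
from dataclasses import dataclass

BLOCK_SIZE = 16   # bytes per cache block (illustrative)
NUM_LINES = 4     # lines in the cache (illustrative)


@dataclass
class CacheLine:
    valid: bool = False
    tag: int = 0
    data: bytes = bytes(BLOCK_SIZE)


def read_byte(cache, memory, addr):
    """Return (byte, hit) for addr, filling the line from memory on a miss."""
    offset = addr % BLOCK_SIZE
    block = addr // BLOCK_SIZE
    index = block % NUM_LINES
    tag = block // NUM_LINES
    line = cache[index]
    hit = line.valid and line.tag == tag    # both conditions are required
    if not hit:
        start = block * BLOCK_SIZE
        line.valid = True
        line.tag = tag
        line.data = memory[start:start + BLOCK_SIZE]
    return line.data[offset], hit


memory = bytes(range(256))                  # toy main memory: byte i holds value i
cache = [CacheLine() for _ in range(NUM_LINES)]
print(read_byte(cache, memory, 37))    # (37, False): cold miss, line is invalid
print(read_byte(cache, memory, 40))    # (40, True): same block as address 37
print(read_byte(cache, memory, 101))   # (101, False): same line, different tag
```

The valid bit matters on the very first access: without it, an empty line whose tag field happens to read 0 would be mistaken for a hit.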
Let us now understand the address structure in all three types of cache mapping, viz., direct-mapped cache, set-associative cache, and fully associative cache.
Address structure in direct cache mapping
The address is divided into three parts: tag bits, index/line bits, and offset/word bits.
+-------------------+----------------------+------------------------+
| Tag Bits | Index/Line Bits | Offset/Word Bits |
+-------------------+----------------------+------------------------+
The number of bits in each part depends on the size and organization of the cache. The tag bits identify which block of memory the data belongs to, the index/line bits identify which line of the cache the block must occupy (in direct mapping, each index value selects exactly one line), and the offset/word bits identify the position of the data within the block.
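The field widths follow directly from the cache geometry. The sketch below assumes power-of-two sizes and a 32-bit address; the function name and the 16 KB example cache are hypothetical.

```python
def direct_mapped_layout(cache_bytes, block_bytes, address_bits=32):
    """Return (tag_bits, index_bits, offset_bits) for a direct-mapped cache.

    Assumes cache_bytes and block_bytes are powers of two.
    """
    num_lines = cache_bytes // block_bytes
    offset_bits = block_bytes.bit_length() - 1   # log2 of the block size
    index_bits = num_lines.bit_length() - 1      # log2 of the number of lines
    tag_bits = address_bits - index_bits - offset_bits
    return tag_bits, index_bits, offset_bits


# 16 KB cache with 64-byte blocks: 256 lines, so 8 index bits and 6 offset bits.
print(direct_mapped_layout(16 * 1024, 64))  # (18, 8, 6)
```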
Address structure in set-associative cache mapping
In set associative cache mapping, the address is divided into three parts: the tag, the set index, and the word/byte offset.
|--------------------------------------------------------------|
| Tag | Set Index | Word/Byte Offset |
|--------------------------------------------------------------|
| Tag bits | Set Index bits | Word/Byte Offset bits |
|--------------------------------------------------------------|
The tag is used to identify the block of data stored in the cache. The set index identifies the set in which the block is stored, and the word/byte offset identifies the location of the requested data within the block. The number of bits allocated to each part depends on the size and organization of the cache.
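To see how the organization changes the bit allocation, the sketch below computes the field widths for a fixed capacity at several associativities. It assumes power-of-two sizes and a 32-bit address; the function name and the 32 KB example are illustrative.

```python
def set_associative_layout(cache_bytes, block_bytes, ways, address_bits=32):
    """Return (tag_bits, index_bits, offset_bits) for a set-associative cache.

    Assumes all sizes are powers of two.
    """
    num_sets = cache_bytes // (block_bytes * ways)
    offset_bits = block_bytes.bit_length() - 1   # log2 of the block size
    index_bits = num_sets.bit_length() - 1       # log2 of the number of sets
    tag_bits = address_bits - index_bits - offset_bits
    return tag_bits, index_bits, offset_bits


# Same 32 KB capacity and 64-byte blocks, increasing associativity.
for ways in (1, 2, 4, 8):
    print(ways, set_associative_layout(32 * 1024, 64, ways))
# 1 (17, 9, 6)
# 2 (18, 8, 6)
# 4 (19, 7, 6)
# 8 (20, 6, 6)
```

Each doubling of the ways halves the number of sets, so one bit moves from the index to the tag while the offset stays fixed by the block size.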
Address structure in fully associative cache mapping
In fully associative cache mapping, there is only one set, so the index bits are not needed. The address is divided into just two parts: the tag bits and the offset/word bits.
+------------------------------------------+------------------------+
|                 Tag Bits                 |    Offset/Word Bits    |
+------------------------------------------+------------------------+
The entire cache is searched for a match with the tag bits of the address, which are compared against the tag stored in every cache entry. The offset bits indicate the byte within the cache block that is being accessed. Each cache entry holds both the tag that identifies the block and the data of the block itself.
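A fully associative lookup can be sketched as a search over every entry. The representation of entries as (valid, tag, data) tuples, the function name, and the 6 offset bits are assumptions for illustration; real hardware compares all tags in parallel rather than in a loop.

```python
def fully_associative_lookup(entries, addr, offset_bits=6):
    """Return the requested byte, or None on a miss.

    entries is a list of (valid, tag, data) tuples (hypothetical layout).
    """
    tag = addr >> offset_bits                  # everything above the offset
    offset = addr & ((1 << offset_bits) - 1)
    for valid, entry_tag, data in entries:     # sequential stand-in for parallel compare
        if valid and entry_tag == tag:
            return data[offset]
    return None


block = bytes(range(64))                       # byte i of the block holds value i
entries = [(False, 0, bytes(64)), (True, 0x48D, block)]
print(fully_associative_lookup(entries, (0x48D << 6) | 5))  # 5
print(fully_associative_lookup(entries, 0x40))              # None
```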
Cache mapping in Intel i7 processors
In general, cache organization and mapping vary widely between different processors, and even between models within the same family. Intel processors are no exception. However, a typical i7 processor has a three-level cache hierarchy consisting of a Level 1 (L1) cache, a Level 2 (L2) cache, and a Level 3 (L3) cache.
L1 Cache
8-way set-associative (the instruction cache is 4-way on some earlier generations)
64-byte cache line size
Separate instruction and data caches
64 KB total size per core (32 KB for instructions and 32 KB for data)
About 4-5 cycles for L1 data cache access
The L1 cache is split into two parts: one for data and one for instructions. On most i7 generations, each has a capacity of 32KB and a 64-byte cache line size, and both caches are 8-way set associative.
L2 Cache
8-way set-associative on most generations
64-byte cache line size
Unified cache for instructions and data, private to each core
256 KB per core on many generations (larger on newer models)
Roughly 10-15 cycles for L2 cache access
The L2 cache has a capacity of 256KB to 1MB or more and a 64-byte cache line size, depending on the specific model of the i7 processor. It is 8-way set associative on most generations.
L3 Cache
16-way set-associative on the common 8 MB parts (associativity varies with model and capacity)
64-byte cache line size
Unified cache for instructions and data, shared among all cores
8 MB total size (may vary depending on the specific i7 model)
30-45 cycles for L3 cache access
The L3 cache has a capacity of 8MB to 20MB and a 64-byte cache line size, again depending on the specific model. It is typically shared among all cores on the processor, and its associativity (commonly 12-way to 20-way) depends on the capacity.
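The number of sets at each level follows from capacity, line size, and associativity. The sketch below assumes one common configuration (32 KB 8-way L1 data cache, 256 KB 8-way L2, 8 MB 16-way L3, all with 64-byte lines); the dictionary and function names are hypothetical. Note that an 8 MB cache cannot be divided evenly into 12 ways of 64-byte lines, which is why 16 ways is assumed here for the 8 MB case.

```python
LINE_BYTES = 64

# (capacity in bytes, ways) for one assumed i7 configuration.
levels = {
    "L1 data": (32 * 1024, 8),
    "L2": (256 * 1024, 8),
    "L3": (8 * 1024 * 1024, 16),
}


def sets_and_index_bits(cache_bytes, ways, line_bytes=LINE_BYTES):
    """Return (number of sets, index bits), rejecting impossible geometries."""
    num_sets, remainder = divmod(cache_bytes, line_bytes * ways)
    if remainder or num_sets & (num_sets - 1):
        raise ValueError("geometry does not give a power-of-two number of sets")
    return num_sets, num_sets.bit_length() - 1


for name, (size, ways) in levels.items():
    print(name, sets_and_index_bits(size, ways))
# L1 data (64, 6)
# L2 (512, 9)
# L3 (8192, 13)
```

The L3 row is an idealization: on real parts the L3 is distributed across per-core slices, so the set selection is more involved than a single index field.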
Overall, the cache organization and architecture of the Intel i7 processor are designed to provide fast access to frequently used data and instructions, improving the overall performance of the processor.
Furthermore, the tag, line/block, and word/offset arrangement in cache mapping for the Intel i7 processor depends on the specific organization of each level of cache. In general, however, every cache level in the Intel i7 processor uses set-associative mapping, so each address is split into a tag, a set index, and an offset whose widths depend on that level's size and associativity.