Caching is the process of storing a small, frequently used portion of a large data source in faster storage in order to reduce access time. In this case it is achieved by keeping a copy of a small part of main memory in fast RAM, speeding up memory access.
CPU caching makes use of the sequential and predictable nature of running a program. In general a program runs the instruction at memory location 0, followed by the instruction at memory location 1, then 2, 3, 4 and so on. This property is known as locality of reference (specifically, spatial locality).
The slowest part of running a program is fetching each instruction from memory into the CPU. If we can speed up this part of the fetch process, we can dramatically increase the overall speed of the computer.
To do this we place a small piece of fast RAM (SRAM, access time of about 10ns) between the CPU and the relatively slow main memory (DRAM, access time of about 60ns). When the CPU requests a memory location, the cache hardware automatically loads the locations close to the requested one into the cache as well. When the CPU requests the next memory location, the chances are that it has already been loaded into the fast cache memory and can be supplied to the CPU much more quickly.
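The effect described above can be sketched in a short simulation. The code below is a simplified model, not real cache hardware: the block size of 8 locations is an assumed figure, and the 10ns/60ns access times are the SRAM and DRAM speeds quoted earlier. On a miss, the whole block of neighbouring locations is loaded, so a program stepping sequentially through memory hits the cache on most fetches.

```python
BLOCK_SIZE = 8              # locations loaded into cache per miss (assumed)
SRAM_NS, DRAM_NS = 10, 60   # cache and main-memory access times from the text

cache = set()               # addresses currently held in the cache
hits = misses = 0

def fetch(addr):
    """Model a CPU fetch: on a miss, load the whole block containing addr."""
    global hits, misses
    if addr in cache:
        hits += 1
        return SRAM_NS
    misses += 1
    block_start = addr - addr % BLOCK_SIZE
    cache.update(range(block_start, block_start + BLOCK_SIZE))
    return DRAM_NS

# A program running sequentially through memory locations 0..999
total_ns = sum(fetch(a) for a in range(1000))

hit_rate = hits / (hits + misses)
print(f"hit rate: {hit_rate:.1%}")                      # hit rate: 87.5%
print(f"average access: {total_ns / 1000:.2f} ns")      # average access: 16.25 ns
```

With 1000 sequential fetches and a block size of 8, only one fetch in eight misses, so the average access time falls from 60ns to 16.25ns even though only a small fraction of memory is ever in the cache.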