Design of a Simple Cache Controller in VHDL

by AmCoder in Circuits > Electronics



[Figure: Cache and Main Memory, Single Cache]

Note: this blog is now archived. I have started a new website/blog of my own, Chipmunk Logic, where I will publish all my future technical blogs. I hope you will follow/subscribe there for free content and knowledge, and continue supporting me :)


I am writing this instructable because I found it difficult to find reference VHDL code for learning how to design a cache controller. So I designed a cache controller myself from scratch and tested it successfully on an FPGA. I present a simple direct-mapped cache controller here, along with a model of an entire Processor-Memory System used to test the Cache Controller. I hope you find this instructable useful as a reference for designing your own cache controllers.

Specifications


These are the main specifications of the Cache Controller we are going to design:

  • Direct Mapped. (For an Associative Mapped Cache Controller, see my newer instructable.)
  • Single-Banked, Blocking Cache.
  • Write-Through Policy on Write hits.
  • No-Write-Allocate (Write-Around) Policy on Write misses.
  • No Write Buffer or other optimizations.
  • Tag Array is Incorporated.
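
To make the two write policies above concrete, here is a minimal combinational sketch of what they imply for the write enables. It is not taken from the attached files: the entity name and all port names (wr_req, hit, cache_wr_en, mem_wr_en) are hypothetical, and the real controller sequences these actions over clock cycles rather than in pure combinational logic.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical sketch: names are illustrative, not from the attached design.
entity write_policy_sketch is
  port (
    wr_req      : in  std_logic;  -- processor requests a write (store)
    hit         : in  std_logic;  -- tag comparison result for the addressed line
    cache_wr_en : out std_logic;  -- update the cache data array
    mem_wr_en   : out std_logic   -- update main memory
  );
end entity write_policy_sketch;

architecture rtl of write_policy_sketch is
begin
  -- Write-through: a write hit updates the cache line as well as main memory.
  cache_wr_en <= wr_req and hit;

  -- No-write-allocate (write-around): a write miss goes to main memory only
  -- and does not bring the line into the cache. Together with write-through,
  -- this means main memory is written on every write, hit or miss.
  mem_wr_en <= wr_req;
end architecture rtl;
```

Because main memory always holds the latest data under these policies, a cache line never has to be written back when it is replaced.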

Besides that, we will design a Cache Memory and a Main Memory System as well.

The default (configurable) specifications of the Cache Memory:

  • 256 Bytes Single-Banked Cache.
  • 16 Cache Lines, each Cache Line (Block) = 16 Bytes.

The specifications of the Main Memory:

  • Synchronous Read/Write Memory.
  • Multi-banked Interleaved Memory - four memory banks.
  • Each bank = 1 kB. Hence, total size = 4 kB.
  • Word-addressable memory (1 word = 4 Bytes) with a 10-bit Address Bus.
  • Higher Bandwidth for Read. Read Data Width = 16 Bytes in one clock cycle.
  • Write Data Width = 4 Bytes.
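
The default numbers above determine how a 10-bit word address can be divided for a direct-mapped lookup: 4 kB / 4 Bytes = 1024 words (10 address bits), 16 Bytes per line / 4 Bytes = 4 words per line (2 offset bits), 16 lines (4 index bits), which leaves 10 - 4 - 2 = 4 tag bits. The sketch below shows that split. The field order (tag in the upper bits, offset in the lower bits) is the conventional layout and is an assumption here, as are the entity and port names; check the attached documentation for the author's exact layout.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical sketch: splits a 10-bit word address into tag / index / offset.
entity addr_split_sketch is
  port (
    addr   : in  std_logic_vector(9 downto 0);  -- 10-bit word address
    tag    : out std_logic_vector(3 downto 0);  -- compared against the tag array
    index  : out std_logic_vector(3 downto 0);  -- selects 1 of 16 cache lines
    offset : out std_logic_vector(1 downto 0)   -- selects 1 of 4 words in a line
  );
end entity addr_split_sketch;

architecture rtl of addr_split_sketch is
begin
  offset <= addr(1 downto 0);  -- 4 words per 16-byte line
  index  <= addr(5 downto 2);  -- 16 cache lines
  tag    <= addr(9 downto 6);  -- remaining upper bits
end architecture rtl;
```

For example, under this assumed layout the word address "0110100111" gives tag "0110", index "1001" and offset "11".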

NOTE: Check my newer instructable if you are looking for a 4-way associative cache controller design.

RTL View of the Entire System

[Figure: RTL view of the Top Module]

The complete RTL representation of the Top Module is shown in the figure (excluding the processor). The default specs for the buses are:

  • All Data Buses are 32-bit Buses.
  • Address Bus = 32-bit Bus (but only 10 bits are used to address the Memory here).
  • Data Block = 128 bits (Wide Bandwidth Bus for Read).
  • All components are driven by the same clock.
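
Since the read path delivers a 128-bit Data Block while the processor-side data buses are 32 bits wide, one word has to be picked out of the block somewhere. A minimal sketch of such a selector follows. The entity and port names are hypothetical, and the word ordering (word 0 in the lowest 32 bits) is an assumption rather than something stated in the attached files.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical sketch: picks one 32-bit word out of a 128-bit data block.
entity word_select_sketch is
  port (
    block_in : in  std_logic_vector(127 downto 0);  -- 16-byte data block
    offset   : in  std_logic_vector(1 downto 0);    -- word offset within block
    word_out : out std_logic_vector(31 downto 0)    -- word for the 32-bit bus
  );
end entity word_select_sketch;

architecture rtl of word_select_sketch is
begin
  -- Assumed ordering: offset "00" selects the least significant word.
  with offset select
    word_out <= block_in(31 downto 0)   when "00",
                block_in(63 downto 32)  when "01",
                block_in(95 downto 64)  when "10",
                block_in(127 downto 96) when others;
end architecture rtl;
```

One natural reading of the four-bank interleaved memory is that each bank supplies one of these four words in the same clock cycle, but that mapping is also an assumption.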

Test Environment

The Top Module was tested using a Test Bench that simply models a non-pipelined Processor (because designing an entire processor is not at all easy!). The Test Bench frequently generates Read/Write Data requests to the Memory. This mimics the typical "Load" and "Store" instructions common to all programs executed by a processor. The test results verified the functionality of the Cache Controller. The following results were observed:

  • All Read/Write Miss and Hit signals were generated correctly.
  • All Read/Write Data operations were successful.
  • No Data incoherence/inconsistency problems were detected.
  • The Design passed timing verification at a maximum clock frequency of 110 MHz on a Xilinx Virtex-4 ML-403 Board (whole system), and 195 MHz for the Cache Controller alone.
  • Block RAMs were inferred for the Main Memory. All other arrays were implemented on LUTs.
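
The general shape of such a load/store test can be sketched as a small self-checking test bench. This is not the attached Test Bench: every signal name is hypothetical, and a plain behavioral memory stands in for the Top Module so that the example is self-contained. It issues one store, then one load from the same address, and checks the returned data.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

entity tb_load_store_sketch is
end entity tb_load_store_sketch;

architecture sim of tb_load_store_sketch is
  constant CLK_PERIOD : time := 10 ns;
  type mem_t is array (0 to 1023) of std_logic_vector(31 downto 0);

  signal clk   : std_logic := '0';
  signal done  : boolean   := false;
  signal wr_en : std_logic := '0';
  signal rd_en : std_logic := '0';
  signal addr  : std_logic_vector(9 downto 0)  := (others => '0');
  signal wdata : std_logic_vector(31 downto 0) := (others => '0');
  signal rdata : std_logic_vector(31 downto 0) := (others => '0');
  signal mem   : mem_t := (others => (others => '0'));
begin
  -- Free-running clock that stops once the stimulus process is finished.
  clk <= not clk after CLK_PERIOD / 2 when not done else '0';

  -- Stand-in for the design under test: a synchronous read/write memory.
  stand_in_memory : process (clk)
  begin
    if rising_edge(clk) then
      if wr_en = '1' then
        mem(to_integer(unsigned(addr))) <= wdata;
      end if;
      if rd_en = '1' then
        rdata <= mem(to_integer(unsigned(addr)));
      end if;
    end if;
  end process stand_in_memory;

  -- Models a "Store" followed by a "Load" from the same address.
  stimulus : process
  begin
    wait until falling_edge(clk);
    addr  <= "0000000101";
    wdata <= x"CAFEF00D";
    wr_en <= '1';                      -- store request
    wait until falling_edge(clk);
    wr_en <= '0';
    rd_en <= '1';                      -- load request
    wait until falling_edge(clk);
    rd_en <= '0';
    assert rdata = x"CAFEF00D"
      report "load returned wrong data" severity error;
    done <= true;
    wait;
  end process stimulus;
end architecture sim;
```

In a real test, the stand-in memory would be replaced by an instance of the Top Module, and the stimulus would also wait on the controller's handshake or stall signals before sampling the read data.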

Attached Files

The following files are attached to this blog:

  • .VHD files of Cache Controller, Cache Data Array, Main Memory System.
  • Test Bench.
  • Documentation on Cache Controller.

Notes:

  • Go through the documentation for full understanding of the specifications of the Cache Controller presented here.
  • Any change in the code may affect other modules, so make changes judiciously. Pay attention to all the comments and headers that I have given.
  • If, for any reason, Block RAMs are not inferred for the Main Memory, reduce the size of the memory and update the address bus widths across the files accordingly, so that the same memory can be implemented in LUTs or Distributed RAM. This will save routing time and resources. Alternatively, look up the Block RAM coding template in the documentation for your specific FPGA, edit the code accordingly, and keep the same address bus width specifications. The same technique applies to Altera FPGAs.
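
For reference, a commonly used coding style from which synthesis tools typically infer Block RAM is a memory array with a registered (synchronous) read, as sketched below for one 1 kB bank (256 words of 32 bits). This is a generic template, not the code from the attached files. The entity, generic and port names are hypothetical, and whether Block RAM is actually inferred depends on the tool, its settings and the target device, so check your vendor's HDL coding guidelines.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- Hypothetical sketch: single-port RAM with synchronous read.
entity bank_ram_sketch is
  generic (
    ADDR_WIDTH : natural := 8;   -- 2**8 = 256 words = 1 kB per bank
    DATA_WIDTH : natural := 32   -- one 4-byte word
  );
  port (
    clk  : in  std_logic;
    we   : in  std_logic;
    addr : in  std_logic_vector(ADDR_WIDTH - 1 downto 0);
    din  : in  std_logic_vector(DATA_WIDTH - 1 downto 0);
    dout : out std_logic_vector(DATA_WIDTH - 1 downto 0)
  );
end entity bank_ram_sketch;

architecture rtl of bank_ram_sketch is
  type ram_t is array (0 to 2**ADDR_WIDTH - 1)
    of std_logic_vector(DATA_WIDTH - 1 downto 0);
  signal ram : ram_t := (others => (others => '0'));
begin
  process (clk)
  begin
    if rising_edge(clk) then
      if we = '1' then
        ram(to_integer(unsigned(addr))) <= din;
      end if;
      -- Registered read: the output changes only on a clock edge.
      -- On a simultaneous write, this returns the old contents (read-first).
      dout <= ram(to_integer(unsigned(addr)));
    end if;
  end process;
end architecture rtl;
```

An asynchronous read (dout assigned outside the clocked process) usually maps to LUTs or Distributed RAM instead, which is one reason a memory may fail to land in Block RAM.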

For queries and feedback, mail me:

iammituraj@gmail.com

Mitu Raj