Simple direct-mapped cache simulation on FPGA
This article is part of a course project for first-year bachelor students at Innopolis University. All work was done in a team. The purpose of the article is to demonstrate an understanding of the topic and to help others understand it through simulation.
From the user's side, the design works as follows:
- To write data to memory, access the RAM with the data and the address to write to.
- To read data, address the cache. If the cache does not hold the requested data, it fetches the data from RAM and keeps a copy.
In Verilog, each individual block of a design is described as a module. A cache is not an independent piece of fast memory: to operate properly it has to take its data from another memory block, the RAM. Therefore, to simulate a cache on an FPGA we have to simulate a whole memory subsystem, RAM plus cache, although the main point is the cache itself.
The implementation consists of the following modules:
- ram.v: the RAM module
- cache.v: the cache module
- cache_and_ram.v: the module that performs reads and writes on the RAM and the cache
- testbench.v and testbench2.v: testbenches that exercise the main modules
RAM module:
module ram();
    parameter size = 4096; // RAM size in 32-bit words
    reg [31:0] ram [0:size-1]; // data array of the RAM
endmodule
The module represents the memory used as RAM. It has 4096 addressable 32-bit cells for storing data.
Cache module:
module cache();
    parameter size = 64; // number of cache lines
    parameter index_size = 6; // index width in bits (log2 of size)
    reg [31:0] cache [0:size - 1]; // cached data words
    reg [11 - index_size:0] tag_array [0:size - 1]; // tag of each cache line
    reg valid_array [0:size - 1]; // 0 - line is empty, 1 - line holds data
    initial
    begin: initialization
        integer i;
        for (i = 0; i < size; i = i + 1)
        begin
            valid_array[i] = 1'b0;
            tag_array[i] = 0;
        end
    end
endmodule
So the cache contains more than just copies of the data in memory: each line also stores a tag, which identifies the RAM address the data came from, and a valid bit, which tells whether the line holds data at all.
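As a minimal sketch of how a RAM address maps onto those bits, the stand-alone module below splits two of the addresses used later in the testbenches into a tag and an index. It assumes the article's parameters (a 12-bit RAM address and a 6-bit index). The module name address_split_demo and its signal names are illustrative only and are not part of the project.

```verilog
// Illustrative sketch, not part of the course project.
// Assumes a 4096-word RAM (12 address bits) and a 64-line cache (6 index bits).
module address_split_demo;
    localparam ADDR_BITS  = 12; // log2(4096)
    localparam INDEX_BITS = 6;  // log2(64)

    reg  [31:0] address;

    // Wrapping into the RAM range keeps only the low 12 bits (same as address % 4096).
    wire [ADDR_BITS-1:0]            ram_address = address[ADDR_BITS-1:0];
    // The low 6 bits select the cache line.
    wire [INDEX_BITS-1:0]           index = ram_address[INDEX_BITS-1:0];
    // The remaining upper 6 bits are stored as the tag.
    wire [ADDR_BITS-INDEX_BITS-1:0] tag   = ram_address[ADDR_BITS-1:INDEX_BITS];

    initial begin
        address = 32'd2816867292; // wraps to 3036
        #1 $display("address = %0d -> tag = %0d, index = %0d", ram_address, tag, index);
        // expected: address = 3036 -> tag = 47, index = 28

        address = 32'd1001425;    // wraps to 2001
        #1 $display("address = %0d -> tag = %0d, index = %0d", ram_address, tag, index);
        // expected: address = 2001 -> tag = 31, index = 17
        $finish;
    end
endmodule
```

Two addresses with the same index but different tags compete for the same cache line, which is what makes the cache direct-mapped.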
Cache and RAM module:
module cache_and_ram(
    input [31:0] address,
    input [31:0] data,
    input clk,
    input mode, // 1 = write, 0 = read
    output [31:0] out
);
    // previous input values, used to detect a new request
    reg [31:0] prev_address, prev_data;
    reg prev_mode;
    reg [31:0] temp_out;
    reg [cache.index_size - 1:0] index; // index of the current address
    reg [11 - cache.index_size:0] tag; // tag of the current address
    ram ram();
    cache cache();
    initial
    begin
        index = 0;
        tag = 0;
        prev_address = 0;
        prev_data = 0;
        prev_mode = 0;
    end
    always @(posedge clk)
    begin
        // act only when the inputs have changed (prev_address holds the wrapped address)
        if (prev_address != address % ram.size || prev_data != data || prev_mode != mode)
        begin
            prev_address = address % ram.size; // wrap the 32-bit address into the RAM range
            prev_data = data;
            prev_mode = mode;
            tag = prev_address >> cache.index_size; // tag = upper bits of the RAM address, above the index bits (6 bits here)
            index = address % cache.size; // index = lowest index_size bits of the address (6 bits here)
            if (mode == 1)
            begin
                ram.ram[prev_address] = data;
                // also update the matching cache line, if this address is cached
                if (cache.valid_array[index] == 1 && cache.tag_array[index] == tag)
                    cache.cache[index] = data;
            end
            else
            begin
                // on a miss, fill the cache line from RAM, since the same address is likely to be read again soon
                if (cache.valid_array[index] != 1 || cache.tag_array[index] != tag)
                begin
                    cache.valid_array[index] = 1;
                    cache.tag_array[index] = tag;
                    cache.cache[index] = ram.ram[prev_address];
                end
                temp_out = cache.cache[index];
            end
        end
    end
    assign out = temp_out;
endmodule
This module implements the operations on the data held in the memory modules. It samples its inputs on each positive clock edge and, if they have changed, performs the operation selected by mode (1 for write, 0 for read).
If mode is 1 (write):
- Write the data to RAM at the given address, then check whether that address is present in the cache. If it is, update the cached copy; otherwise leave the cache unchanged.
If mode is 0 (read):
- Check whether the address is present in the cache. If it is, return the cached data; otherwise fetch the data from RAM, store it in the corresponding cache line, and return it.
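The read and write rules above can also be sketched in a more hardware-oriented style. The following is not the course code but an illustrative variant, assuming the same 12-bit address and 6-bit index: it uses non-blocking assignments in the clocked block and adds a hypothetical hit output so that hits and misses are visible. All names (dm_cache_sketch, dm_cache_sketch_tb, hit, step) are made up for this example.

```verilog
// Illustrative sketch only: a write-through, direct-mapped cache in front of a RAM array.
module dm_cache_sketch #(
    parameter ADDR_BITS  = 12,  // 4096-word RAM, as in the article
    parameter INDEX_BITS = 6    // 64 cache lines, as in the article
) (
    input                      clk,
    input                      mode,    // 1 = write, 0 = read
    input      [ADDR_BITS-1:0] address,
    input      [31:0]          data,
    output reg [31:0]          out,
    output reg                 hit      // hypothetical: 1 when a read was served from the cache
);
    localparam TAG_BITS = ADDR_BITS - INDEX_BITS;
    localparam LINES    = 1 << INDEX_BITS;

    reg [31:0]         ram  [0:(1 << ADDR_BITS) - 1];
    reg [31:0]         line [0:LINES-1];
    reg [TAG_BITS-1:0] tags [0:LINES-1];
    reg [LINES-1:0]    valid;

    wire [INDEX_BITS-1:0] index = address[INDEX_BITS-1:0];
    wire [TAG_BITS-1:0]   tag   = address[ADDR_BITS-1:INDEX_BITS];
    wire                  match = valid[index] && (tags[index] == tag);

    initial valid = 0; // all lines start empty

    always @(posedge clk) begin
        if (mode) begin
            ram[address] <= data;            // write-through to RAM
            if (match) line[index] <= data;  // keep an existing cached copy in sync
            hit <= 1'b0;
        end else begin
            hit <= match;
            if (match) begin
                out <= line[index];          // read hit
            end else begin
                out          <= ram[address]; // read miss: serve from RAM ...
                line[index]  <= ram[address]; // ... and fill the cache line
                tags[index]  <= tag;
                valid[index] <= 1'b1;
            end
        end
    end
endmodule

module dm_cache_sketch_tb;
    reg         clk = 0;
    reg         mode;
    reg  [11:0] address;
    reg  [31:0] data;
    wire [31:0] out;
    wire        hit;

    dm_cache_sketch dut (.clk(clk), .mode(mode), .address(address),
                         .data(data), .out(out), .hit(hit));

    always #5 clk = ~clk;

    // Apply one request, wait for the clock edge, then let the outputs settle.
    task step(input m, input [11:0] a, input [31:0] d);
        begin
            mode = m; address = a; data = d;
            @(posedge clk);
            #1;
        end
    endtask

    initial begin
        step(1'b1, 12'd3036, 32'd526421);          // write
        step(1'b0, 12'd3036, 32'd0);               // first read: miss
        $display("out = %0d hit = %b", out, hit);  // expected: out = 526421 hit = 0
        step(1'b0, 12'd3036, 32'd0);               // second read: hit
        $display("out = %0d hit = %b", out, hit);  // expected: out = 526421 hit = 1
        $finish;
    end
endmodule
```

In this sketch the first read of address 3036 is served from RAM and fills the cache line, so the second read of the same address should be reported as a hit.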
Testbenches:
module testbench;
reg [31:0] address, data;
reg mode, clk;
wire [31:0] out;
cache_and_ram tb(
.address(address),
.data(data),
.mode(mode),
.clk(clk),
.out(out)
);
initial
begin
clk = 1'b1;
address = 32'b00000000000000000000000000000000; // 0
data = 32'b00000000000000000011100011000000; // 14528
mode = 1'b1;
#200
address = 32'b10100111111001011111101111011100; // 2816867292 % size = 3036
data = 32'b00000000000010000000100001010101; // 526421
mode = 1'b1;
#200
address = 32'b00000000000011110100011111010001; // 1001425 % size = 2001
data = 32'b00000001100000110001101100010110; // 25369366
mode = 1'b1;
#200
address = 32'b10100111111001011111101111011100; // 2816867292 % size = 3036
data = 32'b00000000000000000011100011000000; // 14528
mode = 1'b1;
#200
address = 32'b00000000000011110100011111010001; // 1001425 % size = 2001
data = 32'b00000000000000000011100011000000; // 14528
mode = 1'b1;
#200
address = 32'b00000000000011110100011111010001; // 1001425 % size = 2001
data = 32'b00000000000000000000000000000000; // 0
mode = 1'b0;
#200
address = 32'b10100111111001011111101111011100; // 2816867292 % size = 3036
data = 32'b00000000000000000000000000000000; // 0
mode = 1'b0;
#200
address = 32'b00000000000000000000000000000000; // 0
data = 32'b00000000000000000011100011000000; // 14528
mode = 1'b0;
end
initial
$monitor("address = %d data = %d mode = %d out = %d", address % 4096, data, mode, out);
always #25 clk = ~clk;
endmodule
module testbench2;
reg [31:0] address, data;
reg mode, clk;
wire [31:0] out;
cache_and_ram tb(
.address(address),
.data(data),
.mode(mode),
.clk(clk),
.out(out)
);
initial
begin
clk = 1'b1;
address = 32'b00000000000000000000000000000000; // 0
data = 32'b00000000000000000011100011000000; // 14528
mode = 1'b1;
#200
address = 32'b10100111111001011111101111011100; // 2816867292 % size = 3036
data = 32'b00000000000010000000100001010101; // 526421
mode = 1'b1;
#200
address = 32'b00000000000000000000000000000000; // 0
data = 32'b00000000000000000011100011000000; // 14528
mode = 1'b0;
#200
address = 32'b10100111111001011111101111011100; // 2816867292 % size = 3036
data = 32'b00000000000010000000100001010101; // 526421
mode = 1'b0;
#200
address = 32'b00000000000011110100011111010001; // 1001425 % size = 2001
data = 32'b00000001100000110001101100010110; // 25369366
mode = 1'b1;
#200
address = 32'b00000000000011110100011111010001; // 1001425 % size = 2001
data = 32'b00000001100000110001101100010110; // 25369366
mode = 1'b0;
#200
address = 32'b10100111111001011111101111011100; // 2816867292 % size = 3036
data = 32'b00000000000000000011100011000000; // 14528
mode = 1'b1;
#200
address = 32'b00000000000011110100011111010001; // 1001425 % size = 2001
data = 32'b00000000000000000011100011000000; // 14528
mode = 1'b1;
#200
address = 32'b00000000000011110100011111010001; // 1001425 % size = 2001
data = 32'b00000000000000000000000000000000; // 0
mode = 1'b0;
#200
address = 32'b10100111111001011111101111011100; // 2816867292 % size = 3036
data = 32'b00000000000000000000000000000000; // 0
mode = 1'b0;
end
initial
$monitor("address = %d data = %d mode = %d out = %d", address % 4096, data, mode, out);
always #25 clk = ~clk;
endmodule
To run a testbench, load all the files into a ModelSim project and simulate one of the testbench modules.
Comments (13)
Mogwaika
06.12.2018 17:43 Is this code for FPGA or for simulation only? It isn't useful for FPGA implementation.
32bit_me
06.12.2018 18:27 Obviously, code with the initial keyword can't be implemented in hardware.
nerudo
06.12.2018 20:26 Of course that's not true. Initial is a mechanism for setting initial values, and it can be used on the platforms that support it.
32bit_me
06.12.2018 20:53 On which ones, specifically?
nerudo
06.12.2018 20:56 Well, all modern FPGAs can initialize flip-flops and most kinds of block memory. I don't know about ASICs, that's a separate conversation.
32bit_me
06.12.2018 21:03 So you can provide a project for any modern FPGA, for example Cyclone 10 (that is a modern FPGA, isn't it), in which the flip-flops are initialized in an initial block, and that project will synthesize and run on a dev board.
Well, OK, post a link to GitHub and we'll take a look.
nerudo
06.12.2018 21:37 Of course I can. The rate is standard: $100/hour. However, I can offer a lighter option: it will be enough if you eat your hat.
module tffmy (
    input clk,
    output [31:0] sr_out
);
    reg [31:0] sr;
    initial begin
        sr = 32'hdeada550;
    end
    always @ (posedge clk) begin
        sr <= {sr[0], sr[31:1]};
    end
    assign sr_out = sr;
endmodule
set_global_assignment -name FAMILY "Cyclone IV E"
set_global_assignment -name DEVICE EP4CE6F17C8
set_global_assignment -name TOP_LEVEL_ENTITY tffmy
set_global_assignment -name LAST_QUARTUS_VERSION "16.1.0 Lite Edition"
Post-route simulation
Mogwaika
06.12.2018 22:10 And do you manage to write a lot in an hour? And where do they pay that much, and for what, if it's not a secret?
Mogwaika
06.12.2018 20:59 I was talking about the remainder operator, meaning that in the general case it won't work, and about the silly assignment in the always block.
justhabrauser
07.12.2018 01:15 SLY_G and marks, with their dubious translations, must be tearing their hair out: "wait, you could do that?"
old_bear
Why hide the text under spoilers? There is, to put it mildly, not much of it here as it is, which devalues the initially decent idea of writing English-language articles.
P.S. The code looks like a typical case of "let's rewrite this C in HDL syntax". Frankly, I would be embarrassed to publish something like this.