6.12. Memory Hierarchy
6.12.1. The L1 Caches
Each CPU tile has an L1 instruction cache and L1 data cache. The size and
associativity of these caches can be configured. The default RocketConfig
uses 16 KiB, 4-way set-associative instruction and data caches. However, if
you use the WithNMedCores or WithNSmallCores config fragments, the L1I and
L1D are instead configured as 4 KiB direct-mapped caches.
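The capacity of a set-associative cache is sets * ways * block size. The Scala sketch below is a standalone helper, not part of Chipyard, and it assumes the default 64-byte cache block. It shows that the 16 KiB 4-way and the 4 KiB direct-mapped L1 configurations both correspond to 64 sets, and that 128 sets with 2 ways also gives 16 KiB.

```scala
// Standalone sketch (not Chipyard code): relates L1 sets, ways, and capacity.
// Assumption: 64-byte cache blocks, the usual Rocket Chip default.
object L1Geometry {
  // Assumed cache block size in bytes.
  val BlockBytes: Int = 64

  /** Capacity in bytes = sets * ways * block size. */
  def capacityBytes(sets: Int, ways: Int, blockBytes: Int = BlockBytes): Int =
    sets * ways * blockBytes

  /** Number of sets needed to reach `totalBytes` with `ways`-way associativity. */
  def setsFor(totalBytes: Int, ways: Int, blockBytes: Int = BlockBytes): Int = {
    require(totalBytes % (ways * blockBytes) == 0, "capacity must divide evenly into sets")
    totalBytes / (ways * blockBytes)
  }

  def main(args: Array[String]): Unit = {
    // Default RocketConfig L1: 16 KiB, 4-way.
    println(s"16 KiB, 4-way: ${setsFor(16 * 1024, 4)} sets")
    // WithNMedCores / WithNSmallCores L1: 4 KiB, direct-mapped (1 way).
    println(s"4 KiB, direct-mapped: ${setsFor(4 * 1024, 1)} sets")
    // 128 sets x 2 ways keeps the capacity at 16 KiB while halving associativity.
    println(s"128 sets x 2 ways: ${capacityBytes(128, 2) / 1024} KiB")
  }
}
```

Because capacity is the product of the three terms, changing only the number of ways (or only the number of sets) also changes the cache size unless the other term is adjusted to compensate.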
If you only want to change the number of sets or ways, there are config
fragments for that too. See Config Fragments for how to add these to a
custom Config.
new freechips.rocketchip.subsystem.WithL1ICacheSets(128) ++ // change rocket I$
new freechips.rocketchip.subsystem.WithL1ICacheWays(2) ++ // change rocket I$
new freechips.rocketchip.subsystem.WithL1DCacheSets(128) ++ // change rocket D$
new freechips.rocketchip.subsystem.WithL1DCacheWays(2) ++ // change rocket D$
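As a minimal sketch of how these fragments might be combined into a complete config, the example below builds one Rocket core whose L1 caches each use 128 sets and 2 ways. The class name TwoWayL1RocketConfig is hypothetical, and the import assumes a recent Chipyard where Config comes from org.chipsalliance.cde.config. The cache fragments are placed to the left of the core fragment because, in Rocket Chip, they modify tile parameters defined by fragments further to the right.

```scala
import org.chipsalliance.cde.config.Config

// Hypothetical example config: one Rocket core with 128-set, 2-way L1 I$ and D$
// (16 KiB each, assuming 64-byte cache blocks).
class TwoWayL1RocketConfig extends Config(
  // These fragments rewrite the tiles created by the fragments to their right,
  // so they must come before (to the left of) WithNBigCores.
  new freechips.rocketchip.subsystem.WithL1ICacheSets(128) ++
  new freechips.rocketchip.subsystem.WithL1ICacheWays(2) ++
  new freechips.rocketchip.subsystem.WithL1DCacheSets(128) ++
  new freechips.rocketchip.subsystem.WithL1DCacheWays(2) ++
  new freechips.rocketchip.rocket.WithNBigCores(1) ++ // one Rocket tile
  new chipyard.config.AbstractConfig)
```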
You can also configure the L1 data cache as a data scratchpad instead. This has some limitations: if you use a data scratchpad, you can only use a single core, and you cannot give the design an external DRAM.
class ScratchpadOnlyRocketConfig extends Config(
new chipyard.config.WithL2TLBs(0) ++
new testchipip.soc.WithNoScratchpads ++ // remove subsystem scratchpads, confusingly named, does not remove the L1D$ scratchpads
new freechips.rocketchip.subsystem.WithNBanks(0) ++
new freechips.rocketchip.subsystem.WithNoMemPort ++ // remove offchip mem port
new freechips.rocketchip.rocket.WithScratchpadsOnly ++ // use rocket l1 DCache scratchpad as base phys mem
new freechips.rocketchip.rocket.WithNBigCores(1) ++
new chipyard.config.AbstractConfig)
This configuration fully removes the L2 cache and memory bus by removing the off-chip memory port (WithNoMemPort) and setting the number of L2 banks to 0 (WithNBanks(0)).
6.12.2. The System Bus
The system bus is the TileLink network that sits between the tiles and the L2 agents and MMIO peripherals. Ordinarily, it is a fully-connected crossbar, but a network-on-chip-based implementation can be generated using Constellation. See SoCs with NoC-based Interconnects for more.
6.12.3. The Inclusive Last-Level Cache
The default RocketConfig provided in the Chipyard example project uses the
Rocket-Chip InclusiveCache generator to produce a shared L2 cache. In the
default configuration, the L2 uses a single cache bank with 512 KiB capacity
and 8-way set-associativity. However, you can change these parameters to
obtain your desired cache configuration. The main restriction is that the
number of ways and the number of banks must be powers of 2.
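The sketch below is a standalone helper, not Chipyard code, that computes the per-bank set count for a banked L2. It assumes a 64-byte cache block and that the total capacity is split evenly across banks, which is how we understand the InclusiveCache config to derive its set count; check the generator's source before relying on it. It also rejects way and bank counts that are not powers of 2, mirroring the restriction stated above.

```scala
// Standalone sketch (not Chipyard code): per-bank set count for a banked L2.
// Assumptions: 64-byte blocks; total capacity split evenly across banks.
object L2Geometry {
  def isPowerOfTwo(n: Int): Boolean = n > 0 && (n & (n - 1)) == 0

  /** Sets per bank = capacity / (block size * ways * banks). */
  def setsPerBank(capacityKB: Int, ways: Int, banks: Int, blockBytes: Int = 64): Int = {
    require(isPowerOfTwo(ways), s"ways ($ways) must be a power of 2")
    require(isPowerOfTwo(banks), s"banks ($banks) must be a power of 2")
    val totalBytes = capacityKB * 1024
    val bytesPerSetAcrossBanks = blockBytes * ways * banks
    require(totalBytes % bytesPerSetAcrossBanks == 0, "capacity must divide evenly into sets")
    totalBytes / bytesPerSetAcrossBanks
  }

  def main(args: Array[String]): Unit = {
    // Default: 512 KiB, 8-way, 1 bank.
    println(s"1 bank:  ${setsPerBank(512, 8, 1)} sets per bank")
    // Same capacity and ways split over 2 banks.
    println(s"2 banks: ${setsPerBank(512, 8, 2)} sets per bank")
  }
}
```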
Refer to the CacheParameters object defined in rocket-chip-inclusive-cache
for customization options.
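A minimal sketch of a config that changes the L2 geometry is shown below: a 256 KiB, 4-way L2 split across 2 banks. The class name is hypothetical, and the nWays and capacityKB parameter names of WithInclusiveCache are assumptions to verify against rocket-chip-inclusive-cache. WithNBanks is placed to the right of WithInclusiveCache so that, if the cache fragment reads the bank count from the fragments below it, it sees 2 banks.

```scala
import org.chipsalliance.cde.config.Config

// Hypothetical example: 256 KiB, 4-way inclusive L2 split across 2 banks
// (512 sets per bank, assuming 64-byte blocks).
class TwoBankL2RocketConfig extends Config(
  // Assumed parameter names; overrides the L2 settings from AbstractConfig.
  new freechips.rocketchip.subsystem.WithInclusiveCache(nWays = 4, capacityKB = 256) ++
  // Bank count (must be a power of 2), visible to the cache fragment above.
  new freechips.rocketchip.subsystem.WithNBanks(2) ++
  new freechips.rocketchip.rocket.WithNBigCores(1) ++
  new chipyard.config.AbstractConfig)
```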
6.12.4. The Broadcast Hub
If you do not want to use the L2 cache (say, for a resource-limited embedded
design), you can create a configuration without it. In that case, coherence
is handled by RocketChip's TileLink broadcast hub instead of the L2 cache.
To make such a configuration, copy the definition of RocketConfig but omit
the WithInclusiveCache config fragment from the list of included config
fragments.
If you want to reduce the resources used even further, you can configure
the Broadcast Hub to use a bufferless design with the
freechips.rocketchip.subsystem.WithBufferlessBroadcastHub config fragment.
6.12.5. The Outer Memory System
The L2 coherence agent (either the L2 cache or the Broadcast Hub) makes requests to an outer memory system consisting of an AXI4-compatible DRAM controller.
The default configuration uses a single memory channel, but you can configure the system to use multiple channels. As with the number of L2 banks, the number of DRAM channels is restricted to powers of two.
new freechips.rocketchip.subsystem.WithNMemoryChannels(2)
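As a minimal sketch, the fragment can be added to a complete config as follows. The class name TwoChannelRocketConfig is hypothetical, and the import assumes a recent Chipyard where Config comes from org.chipsalliance.cde.config.

```scala
import org.chipsalliance.cde.config.Config

// Hypothetical example: one Rocket core with two DRAM channels.
class TwoChannelRocketConfig extends Config(
  new freechips.rocketchip.subsystem.WithNMemoryChannels(2) ++ // 2 channels (must be a power of two)
  new freechips.rocketchip.rocket.WithNBigCores(1) ++          // one Rocket tile
  new chipyard.config.AbstractConfig)
```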
In VCS and Verilator simulation, the DRAM is simulated using the SimAXIMem
module, which simply attaches a single-cycle SRAM to each memory channel.
Instead of connecting to off-chip DRAM, you can connect a scratchpad and
remove the off-chip link. To do this, add a fragment like
testchipip.soc.WithScratchpad to your configuration and remove the memory
port with freechips.rocketchip.subsystem.WithNoMemPort.
class MbusScratchpadOnlyRocketConfig extends Config(
new testchipip.soc.WithMbusScratchpad(banks=2, partitions=2) ++ // add an mbus backing scratchpad with 2 partitions of 2 banks each
new freechips.rocketchip.subsystem.WithNoMemPort ++ // remove offchip mem port
new freechips.rocketchip.rocket.WithNBigCores(1) ++
new chipyard.config.AbstractConfig)
If you want a more realistic memory simulation, you can use FireSim, which can simulate the timing of DDR3 controllers. More documentation on FireSim memory models is available in the FireSim docs.