6.5. MMIO Peripherals

The easiest way to create a MMIO peripheral is to use the TLRegisterRouter or AXI4RegisterRouter widgets, which abstracts away the details of handling the interconnect protocols and provides a convenient interface for specifying memory-mapped registers. Since Chipyard and Rocket Chip SoCs primarily use Tilelink as the on-chip interconnect protocol, this section will primarily focus on designing Tilelink-based peripherals. However, see generators/chipyard/src/main/scala/GCD.scala for how an example AXI4 based peripheral is defined and connected to the Tilelink graph through converters.

To create a RegisterRouter-based peripheral, you will need to specify a parameter case class for the configuration settings, a bundle trait with the extra top-level ports, and a module implementation containing the actual RTL.

For this example, we will show how to connect a MMIO peripheral which computes the GCD. The full code can be found in generators/chipyard/src/main/scala/GCD.scala.

In this case we use a submodule GCDMMIOChiselModule to actually perform the GCD. The GCDModule class only creates the registers and hooks them up using regmap.

6.5.1. Advanced Features of RegField Entries

RegField exposes polymorphic r and w methods that allow read- and write-only memory-mapped registers to be interfaced to hardware in multiple ways.

  • RegField.r(2, status) is used to create a 2-bit, read-only register that captures the current value of the status signal when read.
  • RegField.r(params.width, gcd) “connects” the decoupled handshaking interface gcd to a read-only memory-mapped register. When this register is read via MMIO, the ready signal is asserted. This is in turn connected to output_ready on the GCD module through the glue logic.
  • RegField.w(params.width, x) exposes a plain register via MMIO, but makes it write-only.
  • RegField.w(params.width, y) associates the decoupled interface signal y with a write-only memory-mapped register, causing y.valid to be asserted when the register is written.

Since the ready/valid signals of y are connected to the input_ready and input_valid signals of the GCD module, respectively, this register map and glue logic has the effect of triggering the GCD algorithm when y is written. Therefore, the algorithm is set up by first writing x and then performing a triggering write to y. Polling can be used for status checks.

6.5.3. Top-level Traits

After creating the module, we need to hook it up to our SoC. Rocket Chip accomplishes this using the cake pattern. This basically involves placing code inside traits. In the Rocket Chip cake, there are two kinds of traits: a LazyModule trait and a module implementation trait.

The LazyModule trait runs setup code that must execute before all the hardware gets elaborated. For a simple memory-mapped peripheral, this just involves connecting the peripheral’s TileLink node to the MMIO crossbar.

Note that the GCDTL class we created from the register router is itself a LazyModule. Register routers have a TileLink node simply named “node”, which we can hook up to the Rocket Chip bus. This will automatically add address map and device tree entries for the peripheral. Also observe how we have to place additional AXI4 buffers and converters for the AXI4 version of this peripheral.

For peripherals which instantiate a concrete module, or which need to be connected to concrete IOs or wires, a matching concrete trait is necessary. We will make our GCD example output a gcd_busy signal as a top-level port to demonstrate. In the concrete module implementation trait, we instantiate the top level IO (a concrete object) and wire it to the IO of our lazy module.

6.5.4. Constructing the Top and Config

Now we want to mix our traits into the system as a whole. This code is from generators/chipyard/src/main/scala/Top.scala.

class Top(implicit p: Parameters) extends System
  with testchipip.CanHaveTraceIO // Enables optionally adding trace IO
  with testchipip.CanHaveBackingScratchpad // Enables optionally adding a backing scratchpad
  with testchipip.CanHavePeripheryBlockDevice // Enables optionally adding the block device
  with testchipip.CanHavePeripherySerial // Enables optionally adding the TSI serial-adapter and port
  with sifive.blocks.devices.uart.HasPeripheryUART // Enables optionally adding the sifive UART
  with sifive.blocks.devices.gpio.HasPeripheryGPIO // Enables optionally adding the sifive GPIOs
  with icenet.CanHavePeripheryIceNIC // Enables optionally adding the IceNIC for FireSim
  with chipyard.example.CanHavePeripheryInitZero // Enables optionally adding the initzero example widget
  with chipyard.example.CanHavePeripheryGCD // Enables optionally adding the GCD example widget
{
  override lazy val module = new TopModule(this)
}

class TopModule[+L <: Top](l: L) extends SystemModule(l)
  with testchipip.CanHaveTraceIOModuleImp
  with testchipip.CanHavePeripheryBlockDeviceModuleImp
  with testchipip.CanHavePeripherySerialModuleImp
  with sifive.blocks.devices.uart.HasPeripheryUARTModuleImp
  with sifive.blocks.devices.gpio.HasPeripheryGPIOModuleImp
  with icenet.CanHavePeripheryIceNICModuleImp
  with chipyard.example.CanHavePeripheryGCDModuleImp
  with freechips.rocketchip.util.DontTouch

Just as we need separate traits for LazyModule and module implementation, we need two classes to build the system. The Top class contains the set of traits which parameterize and define the Top. Typically these traits will optionally add IOs or peripherals to the Top. The Top class includes the pre-elaboration code and also a lazy val to produce the module implementation (hence LazyModule). The TopModule class is the actual RTL that gets synthesized.

And finally, we create a configuration class in generators/chipyard/src/main/scala/Configs.scala that uses the WithGCD config fragment defined earlier.

6.5.5. Testing

Now we can test that the GCD is working. The test program is in tests/gcd.c.

#include "mmio.h"

#define GCD_STATUS 0x2000
#define GCD_X 0x2004
#define GCD_Y 0x2008
#define GCD_GCD 0x200C

unsigned int gcd_ref(unsigned int x, unsigned int y) {
  while (y != 0) {
    if (x > y)
      x = x - y;
    else
      y = y - x;
  }
  return x;
}

// DOC include start: GCD test
int main(void)
{
  uint32_t result, ref, x = 20, y = 15;

  // wait for peripheral to be ready
  while ((reg_read8(GCD_STATUS) & 0x2) == 0) ;

  reg_write32(GCD_X, x);
  reg_write32(GCD_Y, y);


  // wait for peripheral to complete
  while ((reg_read8(GCD_STATUS) & 0x1) == 0) ;

  result = reg_read32(GCD_GCD);
  ref = gcd_ref(x, y);

  if (result != ref) {
    printf("Hardware result %d does not match reference value %d\n", result, ref);
    return 1;
  }
  return 0;
}
// DOC include end: GCD test

This just writes out to the registers we defined earlier. The base of the module’s MMIO region is at 0x2000 by default. This will be printed out in the address map portion when you generate the Verilog code. You can also see how this changes the emitted .json addressmap files in generated-src.

Compiling this program with make produces a gcd.riscv executable.

Now with all of that done, we can go ahead and run our simulation.

cd sims/verilator
make CONFIG=GCDTLRocketConfig BINARY=../../tests/gcd.riscv run-binary