6.7. MMIO Peripherals

The easiest way to create a MMIO peripheral is to follow the GCD TileLink MMIO example. Since Chipyard and Rocket Chip SoCs primarily use Tilelink as the on-chip interconnect protocol, this section will primarily focus on designing Tilelink-based peripherals. However, see generators/chipyard/src/main/scala/example/GCD.scala for how an example AXI4 based peripheral is defined and connected to the Tilelink graph through converters.

To create a MMIO-mapped peripheral, you will need to specify a LazyModule wrapper containing the TileLink port as a Diplomacy Node, as well as an internal LazyModuleImp class that defines the MMIO’s implementation and any non-TileLink I/O.

For this example, we will show how to connect a MMIO peripheral which computes the GCD. The full code can be found in generators/chipyard/src/main/scala/example/GCD.scala.

In this case we use a submodule GCDMMIOChiselModule to actually perform the GCD. The GCDTL and GCDAXI4 classes are the LazyModule classes which construct the TileLink or AXI4 ports, wrapping the inner GCDMMIOChiselModule. The node object is a Diplomacy node, which connects the peripheral to the Diplomacy interconnect graph.

class GCDMMIOChiselModule(val w: Int) extends Module {
  val io = IO(new GCDIO(w))
  val s_idle :: s_run :: s_done :: Nil = Enum(3)

  val state = RegInit(s_idle)
  val tmp   = Reg(UInt(w.W))
  val gcd   = Reg(UInt(w.W))

  io.input_ready := state === s_idle
  io.output_valid := state === s_done
  io.gcd := gcd

  when (state === s_idle && io.input_valid) {
    state := s_run
  } .elsewhen (state === s_run && tmp === 0.U) {
    state := s_done
  } .elsewhen (state === s_done && io.output_ready) {
    state := s_idle
  }

  when (state === s_idle && io.input_valid) {
    gcd := io.x
    tmp := io.y
  } .elsewhen (state === s_run) {
    when (gcd > tmp) {
      gcd := gcd - tmp
    } .otherwise {
      tmp := tmp - gcd
    }
  }

  io.busy := state =/= s_idle
}
class GCDTL(params: GCDParams, beatBytes: Int)(implicit p: Parameters) extends ClockSinkDomain(ClockSinkParameters())(p) {
  val device = new SimpleDevice("gcd", Seq("ucbbar,gcd")) 
  val node = TLRegisterNode(Seq(AddressSet(params.address, 4096-1)), device, "reg/control", beatBytes=beatBytes)

  override lazy val module = new GCDImpl
  class GCDImpl extends Impl with HasGCDTopIO {
    val io = IO(new GCDTopIO)
    withClockAndReset(clock, reset) {
      // How many clock cycles in a PWM cycle?
      val x = Reg(UInt(params.width.W))
      val y = Wire(new DecoupledIO(UInt(params.width.W)))
      val gcd = Wire(new DecoupledIO(UInt(params.width.W)))
      val status = Wire(UInt(2.W))

      val impl_io = if (params.useBlackBox) {
        val impl = Module(new GCDMMIOBlackBox(params.width))
        impl.io
      } else {
        val impl = Module(new GCDMMIOChiselModule(params.width))
        impl.io
      }

      impl_io.clock := clock
      impl_io.reset := reset.asBool

      impl_io.x := x
      impl_io.y := y.bits
      impl_io.input_valid := y.valid
      y.ready := impl_io.input_ready

      gcd.bits := impl_io.gcd
      gcd.valid := impl_io.output_valid
      impl_io.output_ready := gcd.ready

      status := Cat(impl_io.input_ready, impl_io.output_valid)
      io.gcd_busy := impl_io.busy

// DOC include start: GCD instance regmap
      node.regmap(
        0x00 -> Seq(
          RegField.r(2, status)), // a read-only register capturing current status
        0x04 -> Seq(
          RegField.w(params.width, x)), // a plain, write-only register
        0x08 -> Seq(
          RegField.w(params.width, y)), // write-only, y.valid is set on write
        0x0C -> Seq(
          RegField.r(params.width, gcd))) // read-only, gcd.ready is set on read
// DOC include end: GCD instance regmap
    }
  }
}

class GCDAXI4(params: GCDParams, beatBytes: Int)(implicit p: Parameters) extends ClockSinkDomain(ClockSinkParameters())(p) {
  val node = AXI4RegisterNode(AddressSet(params.address, 4096-1), beatBytes=beatBytes)
  override lazy val module = new GCDImpl
  class GCDImpl extends Impl with HasGCDTopIO {
    val io = IO(new GCDTopIO)
    withClockAndReset(clock, reset) {
      // How many clock cycles in a PWM cycle?
      val x = Reg(UInt(params.width.W))
      val y = Wire(new DecoupledIO(UInt(params.width.W)))
      val gcd = Wire(new DecoupledIO(UInt(params.width.W)))
      val status = Wire(UInt(2.W))

      val impl_io = if (params.useBlackBox) {
        val impl = Module(new GCDMMIOBlackBox(params.width))
        impl.io
      } else {
        val impl = Module(new GCDMMIOChiselModule(params.width))
        impl.io
      }

      impl_io.clock := clock
      impl_io.reset := reset.asBool

      impl_io.x := x
      impl_io.y := y.bits
      impl_io.input_valid := y.valid
      y.ready := impl_io.input_ready

      gcd.bits := impl_io.gcd
      gcd.valid := impl_io.output_valid
      impl_io.output_ready := gcd.ready

      status := Cat(impl_io.input_ready, impl_io.output_valid)
      io.gcd_busy := impl_io.busy

      node.regmap(
        0x00 -> Seq(
          RegField.r(2, status)), // a read-only register capturing current status
        0x04 -> Seq(
          RegField.w(params.width, x)), // a plain, write-only register
        0x08 -> Seq(
          RegField.w(params.width, y)), // write-only, y.valid is set on write
        0x0C -> Seq(
          RegField.r(params.width, gcd))) // read-only, gcd.ready is set on read
    }
  }
}

6.7.1. Advanced Features of RegField Entries

RegField exposes polymorphic r and w methods that allow read- and write-only memory-mapped registers to be interfaced to hardware in multiple ways.

  • RegField.r(2, status) is used to create a 2-bit, read-only register that captures the current value of the status signal when read.

  • RegField.r(params.width, gcd) “connects” the decoupled handshaking interface gcd to a read-only memory-mapped register. When this register is read via MMIO, the ready signal is asserted. This is in turn connected to output_ready on the GCD module through the glue logic.

  • RegField.w(params.width, x) exposes a plain register via MMIO, but makes it write-only.

  • RegField.w(params.width, y) associates the decoupled interface signal y with a write-only memory-mapped register, causing y.valid to be asserted when the register is written.

Since the ready/valid signals of y are connected to the input_ready and input_valid signals of the GCD module, respectively, this register map and glue logic has the effect of triggering the GCD algorithm when y is written. Therefore, the algorithm is set up by first writing x and then performing a triggering write to y. Polling can be used for status checks.

      node.regmap(
        0x00 -> Seq(
          RegField.r(2, status)), // a read-only register capturing current status
        0x04 -> Seq(
          RegField.w(params.width, x)), // a plain, write-only register
        0x08 -> Seq(
          RegField.w(params.width, y)), // write-only, y.valid is set on write
        0x0C -> Seq(
          RegField.r(params.width, gcd))) // read-only, gcd.ready is set on read

Note

In older versions of Chipyard and Rocket-Chip, a TLRegisterRouter abstrat class was used to abstract away the construction of the TLRegisterNode and LazyModule classes necessary to construct MMIO peripherals. This was removed, in favor of requiring users to explicitly construct the necessary classes.

This matches more closely how standard Modules and LazyModules are constructed, making it clearer how a MMIO peripheral fits into the Module and LazyModule design patterns.

6.7.3. Top-level Traits

After creating the module, we need to hook it up to our SoC. The LazyModule abstract class containst the TileLink node representing the peripheral’s I/O. For a simple memory-mapped peripheral, connecting the peripheral’s TileLink node must be connected to the relevant bu.

trait CanHavePeripheryGCD { this: BaseSubsystem =>
  private val portName = "gcd"

  private val pbus = locateTLBusWrapper(PBUS)

  // Only build if we are using the TL (nonAXI4) version
  val gcd_busy = p(GCDKey) match {
    case Some(params) => {
      val gcd = if (params.useAXI4) {
        val gcd = LazyModule(new GCDAXI4(params, pbus.beatBytes)(p))
        gcd.clockNode := pbus.fixedClockNode
        pbus.coupleTo(portName) {
          gcd.node :=
          AXI4Buffer () :=
          TLToAXI4 () :=
          // toVariableWidthSlave doesn't use holdFirstDeny, which TLToAXI4() needsx
          TLFragmenter(pbus.beatBytes, pbus.blockBytes, holdFirstDeny = true) := _
        }
        gcd
      } else {
        val gcd = LazyModule(new GCDTL(params, pbus.beatBytes)(p))
        gcd.clockNode := pbus.fixedClockNode
        pbus.coupleTo(portName) { gcd.node := TLFragmenter(pbus.beatBytes, pbus.blockBytes) := _ }
        gcd
      }
      val gcd_busy = InModuleBody {
        val busy = IO(Output(Bool())).suggestName("gcd_busy")
        busy := gcd.module.io.gcd_busy
        busy
      }
      Some(gcd_busy)
    }
    case None => None
  }
}

Also observe how we have to place additional AXI4 buffers and converters for the AXI4 version of this peripheral.

Peripherals which expose I/O can use InModuleBody to punch their I/O to the DigitalTop module. In this example, the GCD module’s gcd_busy signal is exposed as a I/O of DigitalTop.

6.7.4. Constructing the DigitalTop and Config

Now we want to mix our traits into the system as a whole. This code is from generators/chipyard/src/main/scala/DigitalTop.scala.

class DigitalTop(implicit p: Parameters) extends ChipyardSystem
  with testchipip.tsi.CanHavePeripheryUARTTSI // Enables optional UART-based TSI transport
  with testchipip.boot.CanHavePeripheryCustomBootPin // Enables optional custom boot pin
  with testchipip.boot.CanHavePeripheryBootAddrReg // Use programmable boot address register
  with testchipip.cosim.CanHaveTraceIO // Enables optionally adding trace IO
  with testchipip.soc.CanHaveBankedScratchpad // Enables optionally adding a banked scratchpad
  with testchipip.iceblk.CanHavePeripheryBlockDevice // Enables optionally adding the block device
  with testchipip.serdes.CanHavePeripheryTLSerial // Enables optionally adding the tl-serial interface
  with testchipip.serdes.old.CanHavePeripheryTLSerial // Enables optionally adding the DEPRECATED tl-serial interface
  with testchipip.soc.CanHavePeripheryChipIdPin // Enables optional pin to set chip id for multi-chip configs
  with sifive.blocks.devices.i2c.HasPeripheryI2C // Enables optionally adding the sifive I2C
  with sifive.blocks.devices.timer.HasPeripheryTimer // Enables optionally adding the timer device
  with sifive.blocks.devices.pwm.HasPeripheryPWM // Enables optionally adding the sifive PWM
  with sifive.blocks.devices.uart.HasPeripheryUART // Enables optionally adding the sifive UART
  with sifive.blocks.devices.gpio.HasPeripheryGPIO // Enables optionally adding the sifive GPIOs
  with sifive.blocks.devices.spi.HasPeripherySPIFlash // Enables optionally adding the sifive SPI flash controller
  with sifive.blocks.devices.spi.HasPeripherySPI // Enables optionally adding the sifive SPI port
  with icenet.CanHavePeripheryIceNIC // Enables optionally adding the IceNIC for FireSim
  with chipyard.example.CanHavePeripheryInitZero // Enables optionally adding the initzero example widget
  with chipyard.example.CanHavePeripheryGCD // Enables optionally adding the GCD example widget
  with chipyard.example.CanHavePeripheryStreamingFIR // Enables optionally adding the DSPTools FIR example widget
  with chipyard.example.CanHavePeripheryStreamingPassthrough // Enables optionally adding the DSPTools streaming-passthrough example widget
  with nvidia.blocks.dla.CanHavePeripheryNVDLA // Enables optionally having an NVDLA
  with chipyard.clocking.HasChipyardPRCI // Use Chipyard reset/clock distribution
  with chipyard.clocking.CanHaveClockTap // Enables optionally adding a clock tap output port
  with fftgenerator.CanHavePeripheryFFT // Enables optionally having an MMIO-based FFT block
  with constellation.soc.CanHaveGlobalNoC // Support instantiating a global NoC interconnect
{
  override lazy val module = new DigitalTopModule(this)
}

class DigitalTopModule(l: DigitalTop) extends ChipyardSystemModule(l)
  with freechips.rocketchip.util.DontTouch

Just as we need separate traits for LazyModule and module implementation, we need two classes to build the system. The DigitalTop class contains the set of traits which parameterize and define the DigitalTop. Typically these traits will optionally add IOs or peripherals to the DigitalTop. The DigitalTop class includes the pre-elaboration code and also a lazy val to produce the module implementation (hence LazyModule). The DigitalTopModule class is the actual RTL that gets synthesized.

And finally, we create a configuration class in generators/chipyard/src/main/scala/config/MMIOAcceleratorConfigs.scala that uses the WithGCD config fragment defined earlier.

class WithGCD(useAXI4: Boolean = false, useBlackBox: Boolean = false) extends Config((site, here, up) => {
  case GCDKey => Some(GCDParams(useAXI4 = useAXI4, useBlackBox = useBlackBox))
})
class GCDTLRocketConfig extends Config(
  new chipyard.example.WithGCD(useAXI4=false, useBlackBox=false) ++          // Use GCD Chisel, connect Tilelink
  new freechips.rocketchip.subsystem.WithNBigCores(1) ++
  new chipyard.config.AbstractConfig)

6.7.5. Testing

Now we can test that the GCD is working. The test program is in tests/gcd.c.

#include "mmio.h"

#define GCD_STATUS 0x4000
#define GCD_X 0x4004
#define GCD_Y 0x4008
#define GCD_GCD 0x400C

unsigned int gcd_ref(unsigned int x, unsigned int y) {
  while (y != 0) {
    if (x > y)
      x = x - y;
    else
      y = y - x;
  }
  return x;
}

// DOC include start: GCD test
int main(void)
{
  uint32_t result, ref, x = 20, y = 15;

  // wait for peripheral to be ready
  while ((reg_read8(GCD_STATUS) & 0x2) == 0) ;

  reg_write32(GCD_X, x);
  reg_write32(GCD_Y, y);


  // wait for peripheral to complete
  while ((reg_read8(GCD_STATUS) & 0x1) == 0) ;

  result = reg_read32(GCD_GCD);
  ref = gcd_ref(x, y);

  if (result != ref) {
    printf("Hardware result %d does not match reference value %d\n", result, ref);
    return 1;
  }
  printf("Hardware result %d is correct for GCD\n", result);
  return 0;
}
// DOC include end: GCD test

This just writes out to the registers we defined earlier. The base of the module’s MMIO region is at 0x2000 by default. This will be printed out in the address map portion when you generate the Verilog code. You can also see how this changes the emitted .json addressmap files in generated-src.

Compiling this program with make produces a gcd.riscv executable.

Now with all of that done, we can go ahead and run our simulation.

cd sims/verilator
make CONFIG=GCDTLRocketConfig BINARY=../../tests/gcd.riscv run-binary