6.7. MMIO Peripherals
The easiest way to create an MMIO peripheral is to follow the GCD TileLink MMIO example. Since Chipyard and Rocket Chip SoCs primarily use TileLink as the on-chip interconnect protocol, this section focuses on designing TileLink-based peripherals. However, see generators/chipyard/src/main/scala/example/GCD.scala for an example of how an AXI4-based peripheral is defined and connected to the TileLink graph through converters.
To create an MMIO peripheral, you will need to specify a LazyModule wrapper containing the TileLink port as a Diplomacy node, as well as an internal LazyModuleImp class that defines the peripheral’s implementation and any non-TileLink I/O.
For this example, we will show how to connect an MMIO peripheral which computes the GCD. The full code can be found in generators/chipyard/src/main/scala/example/GCD.scala.
In this case we use a submodule GCDMMIOChiselModule to actually perform the GCD computation. The GCDTL and GCDAXI4 classes are the LazyModule classes which construct the TileLink or AXI4 ports, wrapping the inner GCDMMIOChiselModule. The node object is a Diplomacy node, which connects the peripheral to the Diplomacy interconnect graph.
class GCDMMIOChiselModule(val w: Int) extends Module {
  val io = IO(new GCDIO(w))
  // Three-state FSM: wait for operands, subtract until tmp is zero, hold the result
  val s_idle :: s_run :: s_done :: Nil = Enum(3)
  val state = RegInit(s_idle)
  val tmp = Reg(UInt(w.W))
  val gcd = Reg(UInt(w.W))
  io.input_ready := state === s_idle
  io.output_valid := state === s_done
  io.gcd := gcd
  // State transitions
  when (state === s_idle && io.input_valid) {
    state := s_run
  } .elsewhen (state === s_run && tmp === 0.U) {
    state := s_done
  } .elsewhen (state === s_done && io.output_ready) {
    state := s_idle
  }
  // Datapath: load the operands, then repeatedly subtract the smaller from the larger
  when (state === s_idle && io.input_valid) {
    gcd := io.x
    tmp := io.y
  } .elsewhen (state === s_run) {
    when (gcd > tmp) {
      gcd := gcd - tmp
    } .otherwise {
      tmp := tmp - gcd
    }
  }
  io.busy := state =/= s_idle
}
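The state machine above can be mirrored in software to get a feel for its cycle-by-cycle behavior. The following C sketch is a hypothetical host-side model and is not part of Chipyard: the names gcd_model, gcd_step and gcd_run are invented for illustration, and the model assumes the operands are presented for exactly one cycle while the module is idle.

```c
#include <stdint.h>

/* States mirror s_idle, s_run and s_done in GCDMMIOChiselModule. */
enum gcd_state { S_IDLE, S_RUN, S_DONE };

struct gcd_model {
    enum gcd_state state;
    uint32_t gcd;
    uint32_t tmp;
};

/* Advance the model by one clock cycle. All next values are computed from
 * the current state, the way registers update on a clock edge. */
static void gcd_step(struct gcd_model *m, int input_valid,
                     uint32_t x, uint32_t y, int output_ready)
{
    struct gcd_model next = *m;

    if (m->state == S_IDLE && input_valid) {
        next.state = S_RUN;
        next.gcd = x;
        next.tmp = y;
    } else if (m->state == S_RUN) {
        if (m->tmp == 0)
            next.state = S_DONE;
        if (m->gcd > m->tmp)
            next.gcd = m->gcd - m->tmp;
        else
            next.tmp = m->tmp - m->gcd;
    } else if (m->state == S_DONE && output_ready) {
        next.state = S_IDLE;
    }
    *m = next;
}

/* Load the operands, then clock the model until it reaches S_DONE or
 * max_cycles run cycles have elapsed. Returns 1 and stores the result on
 * completion, 0 if the bound was hit first. */
static int gcd_run(uint32_t x, uint32_t y, unsigned max_cycles, uint32_t *result)
{
    struct gcd_model m = { S_IDLE, 0, 0 };

    gcd_step(&m, 1, x, y, 0);
    for (unsigned i = 0; i < max_cycles && m.state != S_DONE; i++)
        gcd_step(&m, 0, 0, 0, 0);
    if (m.state != S_DONE)
        return 0;
    *result = m.gcd;
    return 1;
}
```

In this model, gcd_run(20, 15, 1000, &r) completes with r equal to 5. Note that an x of zero with a nonzero y never reaches S_DONE, because subtracting zero makes no progress; the cycle bound exists to keep the model from spinning in that case.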
class GCDTL(params: GCDParams, beatBytes: Int)(implicit p: Parameters) extends ClockSinkDomain(ClockSinkParameters())(p) {
  val device = new SimpleDevice("gcd", Seq("ucbbar,gcd"))
  val node = TLRegisterNode(Seq(AddressSet(params.address, 4096-1)), device, "reg/control", beatBytes=beatBytes)
  override lazy val module = new GCDImpl
  class GCDImpl extends Impl with HasGCDTopIO {
    val io = IO(new GCDTopIO)
    withClockAndReset(clock, reset) {
      // x is a plain register; y and gcd are decoupled (ready/valid) interfaces
      val x = Reg(UInt(params.width.W))
      val y = Wire(new DecoupledIO(UInt(params.width.W)))
      val gcd = Wire(new DecoupledIO(UInt(params.width.W)))
      val status = Wire(UInt(2.W))
      val impl_io = if (params.useBlackBox) {
        val impl = Module(new GCDMMIOBlackBox(params.width))
        impl.io
      } else {
        val impl = Module(new GCDMMIOChiselModule(params.width))
        impl.io
      }
      impl_io.clock := clock
      impl_io.reset := reset.asBool
      impl_io.x := x
      impl_io.y := y.bits
      impl_io.input_valid := y.valid
      y.ready := impl_io.input_ready
      gcd.bits := impl_io.gcd
      gcd.valid := impl_io.output_valid
      impl_io.output_ready := gcd.ready
      status := Cat(impl_io.input_ready, impl_io.output_valid)
      io.gcd_busy := impl_io.busy
      // DOC include start: GCD instance regmap
      node.regmap(
        0x00 -> Seq(
          RegField.r(2, status)), // a read-only register capturing current status
        0x04 -> Seq(
          RegField.w(params.width, x)), // a plain, write-only register
        0x08 -> Seq(
          RegField.w(params.width, y)), // write-only, y.valid is set on write
        0x0C -> Seq(
          RegField.r(params.width, gcd))) // read-only, gcd.ready is set on read
      // DOC include end: GCD instance regmap
    }
  }
}
class GCDAXI4(params: GCDParams, beatBytes: Int)(implicit p: Parameters) extends ClockSinkDomain(ClockSinkParameters())(p) {
  val node = AXI4RegisterNode(AddressSet(params.address, 4096-1), beatBytes=beatBytes)
  override lazy val module = new GCDImpl
  class GCDImpl extends Impl with HasGCDTopIO {
    val io = IO(new GCDTopIO)
    withClockAndReset(clock, reset) {
      // x is a plain register; y and gcd are decoupled (ready/valid) interfaces
      val x = Reg(UInt(params.width.W))
      val y = Wire(new DecoupledIO(UInt(params.width.W)))
      val gcd = Wire(new DecoupledIO(UInt(params.width.W)))
      val status = Wire(UInt(2.W))
      val impl_io = if (params.useBlackBox) {
        val impl = Module(new GCDMMIOBlackBox(params.width))
        impl.io
      } else {
        val impl = Module(new GCDMMIOChiselModule(params.width))
        impl.io
      }
      impl_io.clock := clock
      impl_io.reset := reset.asBool
      impl_io.x := x
      impl_io.y := y.bits
      impl_io.input_valid := y.valid
      y.ready := impl_io.input_ready
      gcd.bits := impl_io.gcd
      gcd.valid := impl_io.output_valid
      impl_io.output_ready := gcd.ready
      status := Cat(impl_io.input_ready, impl_io.output_valid)
      io.gcd_busy := impl_io.busy
      node.regmap(
        0x00 -> Seq(
          RegField.r(2, status)), // a read-only register capturing current status
        0x04 -> Seq(
          RegField.w(params.width, x)), // a plain, write-only register
        0x08 -> Seq(
          RegField.w(params.width, y)), // write-only, y.valid is set on write
        0x0C -> Seq(
          RegField.r(params.width, gcd))) // read-only, gcd.ready is set on read
    }
  }
}
6.7.1. Advanced Features of RegField Entries
RegField exposes polymorphic r and w methods that allow read-only and write-only memory-mapped registers to be interfaced to hardware in multiple ways.
- RegField.r(2, status) is used to create a 2-bit, read-only register that captures the current value of the status signal when read.
- RegField.r(params.width, gcd) “connects” the decoupled handshaking interface gcd to a read-only memory-mapped register. When this register is read via MMIO, the ready signal is asserted. This is in turn connected to output_ready on the GCD module through the glue logic.
- RegField.w(params.width, x) exposes a plain register via MMIO, but makes it write-only.
- RegField.w(params.width, y) associates the decoupled interface signal y with a write-only memory-mapped register, causing y.valid to be asserted when the register is written.
Since the ready/valid signals of y are connected to the input_ready and input_valid signals of the GCD module, respectively, this register map and glue logic have the effect of triggering the GCD algorithm when y is written. Therefore, the algorithm is set up by first writing x and then performing a triggering write to y. Polling can be used for status checks.
node.regmap(
  0x00 -> Seq(
    RegField.r(2, status)), // a read-only register capturing current status
  0x04 -> Seq(
    RegField.w(params.width, x)), // a plain, write-only register
  0x08 -> Seq(
    RegField.w(params.width, y)), // write-only, y.valid is set on write
  0x0C -> Seq(
    RegField.r(params.width, gcd))) // read-only, gcd.ready is set on read
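When polling the status register from software, it helps to name the two bits rather than use bare masks. The glue logic builds the value as Cat(impl_io.input_ready, impl_io.output_valid), and Cat places its first argument in the more significant position, so input_ready ends up in bit 1 and output_valid in bit 0. The C sketch below encodes that layout; the macro and function names are hypothetical and do not appear in the Chipyard sources.

```c
#include <stdint.h>

/* Bit layout implied by status := Cat(input_ready, output_valid):
 * bit 1 is input_ready, bit 0 is output_valid. Names are illustrative. */
#define GCD_STATUS_OUTPUT_VALID (1u << 0)
#define GCD_STATUS_INPUT_READY  (1u << 1)

/* Nonzero when the peripheral can accept a new pair of operands. */
static int gcd_input_ready(uint8_t status)
{
    return (status & GCD_STATUS_INPUT_READY) != 0;
}

/* Nonzero when a result is waiting to be read from the gcd register. */
static int gcd_output_valid(uint8_t status)
{
    return (status & GCD_STATUS_OUTPUT_VALID) != 0;
}
```

A driver would typically wait for gcd_input_ready before writing x and y, and for gcd_output_valid before reading the result.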
Note
In older versions of Chipyard and Rocket Chip, a TLRegisterRouter abstract class was used to abstract away the construction of the TLRegisterNode and LazyModule classes necessary to construct MMIO peripherals. This was removed in favor of requiring users to explicitly construct the necessary classes. This matches more closely how standard Modules and LazyModules are constructed, making it clearer how an MMIO peripheral fits into the Module and LazyModule design patterns.
6.7.2. Connecting by TileLink
The key to connecting to the TileLink Diplomatic graph is the construction of the TileLink node for this peripheral.
In this case, since the peripheral acts as a manager of some register-mapped address space, it uses the TLRegisterNode object.
The parameters to the TLRegisterNode object specify the size of the managed space, the base address, and the port width.
Within the register-mapped peripheral, the control registers can be mapped using the node.regmap function, as described above.
A similar procedure is followed for both AXI4 and TileLink peripherals.
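The managed space is described by AddressSet(params.address, 4096-1), that is, a base address and a mask of 0xFFF covering a 4 KiB region. An address belongs to the set when it agrees with the base on every bit outside the mask. The C sketch below restates that rule; the struct and function names are hypothetical, and the base of 0x4000 in the usage note is taken from the register addresses in the test program later in this section.

```c
#include <stdint.h>

/* Hypothetical C mirror of an address set: a base plus a mask of
 * "don't care" bits. */
struct address_set {
    uint64_t base;
    uint64_t mask;
};

/* An address is in the set when it matches the base on all bits that are
 * not covered by the mask. */
static int address_set_contains(struct address_set s, uint64_t addr)
{
    return ((addr ^ s.base) & ~s.mask) == 0;
}
```

With a base of 0x4000 and a mask of 0xFFF, addresses 0x4000 through 0x4FFF are inside the set, while 0x3FFF and 0x5000 are not.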
6.7.3. Top-level Traits
After creating the module, we need to hook it up to our SoC.
The LazyModule abstract class contains the TileLink node representing the peripheral’s I/O.
For a simple memory-mapped peripheral, the peripheral’s TileLink node must be connected to the relevant bus.
trait CanHavePeripheryGCD { this: BaseSubsystem =>
  private val portName = "gcd"
  private val pbus = locateTLBusWrapper(PBUS)
  // Only build the GCD if GCDKey is set; params.useAXI4 selects the AXI4 or TL version
  val gcd_busy = p(GCDKey) match {
    case Some(params) => {
      val gcd = if (params.useAXI4) {
        val gcd = LazyModule(new GCDAXI4(params, pbus.beatBytes)(p))
        gcd.clockNode := pbus.fixedClockNode
        pbus.coupleTo(portName) {
          gcd.node :=
          AXI4Buffer () :=
          TLToAXI4 () :=
          // toVariableWidthSlave doesn't use holdFirstDeny, which TLToAXI4() needs
          TLFragmenter(pbus.beatBytes, pbus.blockBytes, holdFirstDeny = true) := _
        }
        gcd
      } else {
        val gcd = LazyModule(new GCDTL(params, pbus.beatBytes)(p))
        gcd.clockNode := pbus.fixedClockNode
        pbus.coupleTo(portName) { gcd.node := TLFragmenter(pbus.beatBytes, pbus.blockBytes) := _ }
        gcd
      }
      val gcd_busy = InModuleBody {
        val busy = IO(Output(Bool())).suggestName("gcd_busy")
        busy := gcd.module.io.gcd_busy
        busy
      }
      Some(gcd_busy)
    }
    case None => None
  }
}
Also observe how we have to place additional AXI4 buffers and converters for the AXI4 version of this peripheral.
Peripherals which expose I/O can use InModuleBody to punch their I/O to the DigitalTop module.
In this example, the GCD module’s gcd_busy signal is exposed as an I/O of DigitalTop.
6.7.4. Constructing the DigitalTop and Config
Now we want to mix our traits into the system as a whole.
This code is from generators/chipyard/src/main/scala/DigitalTop.scala.
class DigitalTop(implicit p: Parameters) extends ChipyardSystem
  with testchipip.tsi.CanHavePeripheryUARTTSI // Enables optional UART-based TSI transport
  with testchipip.boot.CanHavePeripheryCustomBootPin // Enables optional custom boot pin
  with testchipip.boot.CanHavePeripheryBootAddrReg // Use programmable boot address register
  with testchipip.cosim.CanHaveTraceIO // Enables optionally adding trace IO
  with testchipip.soc.CanHaveBankedScratchpad // Enables optionally adding a banked scratchpad
  with testchipip.iceblk.CanHavePeripheryBlockDevice // Enables optionally adding the block device
  with testchipip.serdes.CanHavePeripheryTLSerial // Enables optionally adding the tl-serial interface
  with testchipip.serdes.old.CanHavePeripheryTLSerial // Enables optionally adding the DEPRECATED tl-serial interface
  with testchipip.soc.CanHavePeripheryChipIdPin // Enables optional pin to set chip id for multi-chip configs
  with sifive.blocks.devices.i2c.HasPeripheryI2C // Enables optionally adding the sifive I2C
  with sifive.blocks.devices.timer.HasPeripheryTimer // Enables optionally adding the timer device
  with sifive.blocks.devices.pwm.HasPeripheryPWM // Enables optionally adding the sifive PWM
  with sifive.blocks.devices.uart.HasPeripheryUART // Enables optionally adding the sifive UART
  with sifive.blocks.devices.gpio.HasPeripheryGPIO // Enables optionally adding the sifive GPIOs
  with sifive.blocks.devices.spi.HasPeripherySPIFlash // Enables optionally adding the sifive SPI flash controller
  with sifive.blocks.devices.spi.HasPeripherySPI // Enables optionally adding the sifive SPI port
  with icenet.CanHavePeripheryIceNIC // Enables optionally adding the IceNIC for FireSim
  with chipyard.example.CanHavePeripheryInitZero // Enables optionally adding the initzero example widget
  with chipyard.example.CanHavePeripheryGCD // Enables optionally adding the GCD example widget
  with chipyard.example.CanHavePeripheryStreamingFIR // Enables optionally adding the DSPTools FIR example widget
  with chipyard.example.CanHavePeripheryStreamingPassthrough // Enables optionally adding the DSPTools streaming-passthrough example widget
  with nvidia.blocks.dla.CanHavePeripheryNVDLA // Enables optionally having an NVDLA
  with chipyard.clocking.HasChipyardPRCI // Use Chipyard reset/clock distribution
  with chipyard.clocking.CanHaveClockTap // Enables optionally adding a clock tap output port
  with fftgenerator.CanHavePeripheryFFT // Enables optionally having an MMIO-based FFT block
  with constellation.soc.CanHaveGlobalNoC // Support instantiating a global NoC interconnect
{
  override lazy val module = new DigitalTopModule(this)
}
class DigitalTopModule(l: DigitalTop) extends ChipyardSystemModule(l)
  with freechips.rocketchip.util.DontTouch
Just as a peripheral has separate LazyModule and module implementation classes, we need two classes to build the system.
The DigitalTop class contains the set of traits which parameterize and define the DigitalTop. Typically these traits will optionally add IOs or peripherals to the DigitalTop.
The DigitalTop class includes the pre-elaboration code and also a lazy val to produce the module implementation (hence LazyModule).
The DigitalTopModule class is the actual RTL that gets synthesized.
And finally, we create a configuration class in generators/chipyard/src/main/scala/config/MMIOAcceleratorConfigs.scala that uses the WithGCD config fragment, which sets GCDKey.
class WithGCD(useAXI4: Boolean = false, useBlackBox: Boolean = false) extends Config((site, here, up) => {
  case GCDKey => Some(GCDParams(useAXI4 = useAXI4, useBlackBox = useBlackBox))
})
class GCDTLRocketConfig extends Config(
  new chipyard.example.WithGCD(useAXI4=false, useBlackBox=false) ++ // Use GCD Chisel, connect Tilelink
  new freechips.rocketchip.subsystem.WithNBigCores(1) ++
  new chipyard.config.AbstractConfig)
6.7.5. Testing
Now we can test that the GCD is working. The test program is in tests/gcd.c.
#include <stdio.h>
#include "mmio.h"
#define GCD_STATUS 0x4000
#define GCD_X 0x4004
#define GCD_Y 0x4008
#define GCD_GCD 0x400C
unsigned int gcd_ref(unsigned int x, unsigned int y) {
  while (y != 0) {
    if (x > y)
      x = x - y;
    else
      y = y - x;
  }
  return x;
}
// DOC include start: GCD test
int main(void)
{
  uint32_t result, ref, x = 20, y = 15;
  // wait for peripheral to be ready
  while ((reg_read8(GCD_STATUS) & 0x2) == 0) ;
  reg_write32(GCD_X, x);
  reg_write32(GCD_Y, y);
  // wait for peripheral to complete
  while ((reg_read8(GCD_STATUS) & 0x1) == 0) ;
  result = reg_read32(GCD_GCD);
  ref = gcd_ref(x, y);
  if (result != ref) {
    printf("Hardware result %d does not match reference value %d\n", result, ref);
    return 1;
  }
  printf("Hardware result %d is correct for GCD\n", result);
  return 0;
}
// DOC include end: GCD test
This just writes out to the registers we defined earlier.
The test program assumes the module’s MMIO region is based at 0x4000, as its register addresses show.
The actual base will be printed out in the address map portion when you generate the Verilog code, so check it there if you change the address.
You can also see how this changes the emitted .json addressmap files in generated-src.
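If the peripheral’s base address changes, every register address in the test program has to change with it. One way to keep that to a single edit is to derive the addresses from a base plus the offsets used in node.regmap. The sketch below assumes the 0x4000 base used by the test program; GCD_BASE is a name introduced here for illustration and is not defined in tests/gcd.c.

```c
#include <stdint.h>

/* Assumed base of the GCD MMIO region; confirm it against the generated
 * address map before relying on it. */
#define GCD_BASE   0x4000UL

/* Offsets follow the node.regmap entries: status, x, y, gcd. */
#define GCD_STATUS (GCD_BASE + 0x00)
#define GCD_X      (GCD_BASE + 0x04)
#define GCD_Y      (GCD_BASE + 0x08)
#define GCD_GCD    (GCD_BASE + 0x0C)
```

With this base the macros expand to the same four addresses the test program uses, 0x4000 through 0x400C.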
Compiling this program with make produces a gcd.riscv executable.
Now with all of that done, we can go ahead and run our simulation.
cd sims/verilator
make CONFIG=GCDTLRocketConfig BINARY=../../tests/gcd.riscv run-binary