6.7. MMIO Peripherals
The easiest way to create an MMIO peripheral is to follow the GCD TileLink MMIO example. Since Chipyard and Rocket Chip SoCs primarily use TileLink as the on-chip interconnect protocol, this section focuses on designing TileLink-based peripherals. However, see generators/chipyard/src/main/scala/example/GCD.scala for an example of how an AXI4-based peripheral is defined and connected to the TileLink graph through converters.
To create an MMIO peripheral, you need to specify a LazyModule wrapper containing the TileLink port as a Diplomacy node, as well as an internal LazyModuleImp class that defines the peripheral's implementation and any non-TileLink I/O.
For this example, we will show how to connect an MMIO peripheral that computes the GCD.
The full code can be found in generators/chipyard/src/main/scala/example/GCD.scala.
In this case we use a submodule GCDMMIOChiselModule to actually perform the GCD. The GCDTL and GCDAXI4 classes are the LazyModule classes which construct the TileLink or AXI4 ports, wrapping the inner GCDMMIOChiselModule.
The node object is a Diplomacy node, which connects the peripheral to the Diplomacy interconnect graph.
class GCDMMIOChiselModule(val w: Int) extends Module {
  val io = IO(new GCDIO(w))
  val s_idle :: s_run :: s_done :: Nil = Enum(3)

  val state = RegInit(s_idle)
  val tmp = Reg(UInt(w.W))
  val gcd = Reg(UInt(w.W))

  io.input_ready := state === s_idle
  io.output_valid := state === s_done
  io.gcd := gcd

  when (state === s_idle && io.input_valid) {
    state := s_run
  } .elsewhen (state === s_run && tmp === 0.U) {
    state := s_done
  } .elsewhen (state === s_done && io.output_ready) {
    state := s_idle
  }

  when (state === s_idle && io.input_valid) {
    gcd := io.x
    tmp := io.y
  } .elsewhen (state === s_run) {
    when (gcd > tmp) {
      gcd := gcd - tmp
    } .otherwise {
      tmp := tmp - gcd
    }
  }

  io.busy := state =/= s_idle
}
class GCDTL(params: GCDParams, beatBytes: Int)(implicit p: Parameters) extends ClockSinkDomain(ClockSinkParameters())(p) {
  val device = new SimpleDevice("gcd", Seq("ucbbar,gcd"))
  val node = TLRegisterNode(Seq(AddressSet(params.address, 4096-1)), device, "reg/control", beatBytes=beatBytes)

  override lazy val module = new GCDImpl
  class GCDImpl extends Impl with HasGCDTopIO {
    val io = IO(new GCDTopIO)
    withClockAndReset(clock, reset) {
      // How many clock cycles in a PWM cycle?
      val x = Reg(UInt(params.width.W))
      val y = Wire(new DecoupledIO(UInt(params.width.W)))
      val gcd = Wire(new DecoupledIO(UInt(params.width.W)))
      val status = Wire(UInt(2.W))

      val impl_io = if (params.useBlackBox) {
        val impl = Module(new GCDMMIOBlackBox(params.width))
        impl.io
      } else {
        val impl = Module(new GCDMMIOChiselModule(params.width))
        impl.io
      }

      impl_io.clock := clock
      impl_io.reset := reset.asBool

      impl_io.x := x
      impl_io.y := y.bits
      impl_io.input_valid := y.valid
      y.ready := impl_io.input_ready

      gcd.bits := impl_io.gcd
      gcd.valid := impl_io.output_valid
      impl_io.output_ready := gcd.ready

      status := Cat(impl_io.input_ready, impl_io.output_valid)
      io.gcd_busy := impl_io.busy

      // DOC include start: GCD instance regmap
      node.regmap(
        0x00 -> Seq(
          RegField.r(2, status)), // a read-only register capturing current status
        0x04 -> Seq(
          RegField.w(params.width, x)), // a plain, write-only register
        0x08 -> Seq(
          RegField.w(params.width, y)), // write-only, y.valid is set on write
        0x0C -> Seq(
          RegField.r(params.width, gcd))) // read-only, gcd.ready is set on read
      // DOC include end: GCD instance regmap
    }
  }
}
class GCDAXI4(params: GCDParams, beatBytes: Int)(implicit p: Parameters) extends ClockSinkDomain(ClockSinkParameters())(p) {
  val node = AXI4RegisterNode(AddressSet(params.address, 4096-1), beatBytes=beatBytes)

  override lazy val module = new GCDImpl
  class GCDImpl extends Impl with HasGCDTopIO {
    val io = IO(new GCDTopIO)
    withClockAndReset(clock, reset) {
      // How many clock cycles in a PWM cycle?
      val x = Reg(UInt(params.width.W))
      val y = Wire(new DecoupledIO(UInt(params.width.W)))
      val gcd = Wire(new DecoupledIO(UInt(params.width.W)))
      val status = Wire(UInt(2.W))

      val impl_io = if (params.useBlackBox) {
        val impl = Module(new GCDMMIOBlackBox(params.width))
        impl.io
      } else {
        val impl = Module(new GCDMMIOChiselModule(params.width))
        impl.io
      }

      impl_io.clock := clock
      impl_io.reset := reset.asBool

      impl_io.x := x
      impl_io.y := y.bits
      impl_io.input_valid := y.valid
      y.ready := impl_io.input_ready

      gcd.bits := impl_io.gcd
      gcd.valid := impl_io.output_valid
      impl_io.output_ready := gcd.ready

      status := Cat(impl_io.input_ready, impl_io.output_valid)
      io.gcd_busy := impl_io.busy

      node.regmap(
        0x00 -> Seq(
          RegField.r(2, status)), // a read-only register capturing current status
        0x04 -> Seq(
          RegField.w(params.width, x)), // a plain, write-only register
        0x08 -> Seq(
          RegField.w(params.width, y)), // write-only, y.valid is set on write
        0x0C -> Seq(
          RegField.r(params.width, gcd))) // read-only, gcd.ready is set on read
    }
  }
}
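To make the state machine in GCDMMIOChiselModule easier to follow, here is a minimal software sketch of it in C, stepping one clock edge per call. It is only a model under our reading of the Chisel listing above; the names gcd_model_t, gcd_step, and gcd_run are hypothetical and do not exist in Chipyard.

```c
#include <stdbool.h>
#include <stdint.h>

typedef enum { S_IDLE, S_RUN, S_DONE } gcd_state_t;

typedef struct {
    gcd_state_t state;
    uint32_t gcd;
    uint32_t tmp;
} gcd_model_t;

/* Advance the model by one clock edge. All decisions use the pre-edge
 * snapshot `cur`, mirroring how register updates behave in the RTL. */
static void gcd_step(gcd_model_t *m, bool input_valid, uint32_t x, uint32_t y,
                     bool output_ready)
{
    const gcd_model_t cur = *m;

    if (cur.state == S_IDLE && input_valid) {
        m->state = S_RUN;              /* accept the operands */
        m->gcd = x;
        m->tmp = y;
    } else if (cur.state == S_RUN) {
        if (cur.tmp == 0)
            m->state = S_DONE;         /* result is in gcd */
        if (cur.gcd > cur.tmp)
            m->gcd = cur.gcd - cur.tmp;
        else
            m->tmp = cur.tmp - cur.gcd;
    } else if (cur.state == S_DONE && output_ready) {
        m->state = S_IDLE;             /* result consumed */
    }
}

/* Present (x, y) with input_valid, then clock until output_valid.
 * Returns false if the model is still running after max_cycles. */
static bool gcd_run(uint32_t x, uint32_t y, uint32_t max_cycles,
                    uint32_t *result)
{
    gcd_model_t m = { S_IDLE, 0, 0 };

    gcd_step(&m, true, x, y, false);
    for (uint32_t i = 0; i < max_cycles && m.state != S_DONE; i++)
        gcd_step(&m, false, 0, 0, false);
    if (m.state != S_DONE)
        return false;
    *result = m.gcd;
    return true;
}
```

One thing the model makes visible: with x equal to 0 and y nonzero, the subtraction never drives tmp to 0, so the run state is never left. The cycle bound in gcd_run exists for that case.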
6.7.1. Advanced Features of RegField Entries
RegField exposes polymorphic r and w methods
that allow read- and write-only memory-mapped registers to be
interfaced to hardware in multiple ways.
RegField.r(2, status) is used to create a 2-bit, read-only register that captures the current value of the status signal when read.
RegField.r(params.width, gcd) “connects” the decoupled handshaking interface gcd to a read-only memory-mapped register. When this register is read via MMIO, the ready signal is asserted. This is in turn connected to output_ready on the GCD module through the glue logic.
RegField.w(params.width, x) exposes a plain register via MMIO, but makes it write-only.
RegField.w(params.width, y) associates the decoupled interface signal y with a write-only memory-mapped register, causing y.valid to be asserted when the register is written.
Since the ready/valid signals of y are connected to the
input_ready and input_valid signals of the GCD module,
respectively, this register map and glue logic have the effect of
triggering the GCD algorithm when y is written. Therefore, the
algorithm is set up by first writing x and then performing a
triggering write to y. Software can poll the status register to check
whether the peripheral is ready for input or has produced a result.
node.regmap(
  0x00 -> Seq(
    RegField.r(2, status)), // a read-only register capturing current status
  0x04 -> Seq(
    RegField.w(params.width, x)), // a plain, write-only register
  0x08 -> Seq(
    RegField.w(params.width, y)), // write-only, y.valid is set on write
  0x0C -> Seq(
    RegField.r(params.width, gcd))) // read-only, gcd.ready is set on read
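The software-visible behavior of this register map can be sketched as a small host-side model in C. This is an illustration under several assumptions, not the generated hardware: the run phase is collapsed into the write to y, reads of write-only fields are assumed to return 0, and an access that the decoupled interface would not yet accept is assumed to stall the bus, which the model reports by returning false. The names gcd_dev_t, mmio_write, and mmio_read are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Register offsets taken from the regmap above. */
enum { GCD_OFF_STATUS = 0x00, GCD_OFF_X = 0x04, GCD_OFF_Y = 0x08, GCD_OFF_GCD = 0x0C };

typedef struct {
    uint32_t x;       /* plain register behind RegField.w(width, x) */
    uint32_t result;  /* value presented on gcd.bits */
    bool done;        /* output_valid; the model treats input_ready as !done */
} gcd_dev_t;

static uint32_t subtractive_gcd(uint32_t a, uint32_t b)
{
    assert(!(a == 0 && b != 0));  /* this operand pair would never finish */
    while (b != 0) {
        if (a > b) a -= b; else b -= a;
    }
    return a;
}

/* Returns false when the write would not be accepted yet. */
static bool mmio_write(gcd_dev_t *d, uint32_t off, uint32_t val)
{
    switch (off) {
    case GCD_OFF_X:
        d->x = val;
        return true;
    case GCD_OFF_Y:
        if (d->done)
            return false;                        /* y.ready is low */
        d->result = subtractive_gcd(d->x, val);  /* y.valid && y.ready */
        d->done = true;
        return true;
    default:
        return true;                             /* read-only fields ignore writes */
    }
}

/* Returns false when the read would not complete yet. */
static bool mmio_read(gcd_dev_t *d, uint32_t off, uint32_t *val)
{
    switch (off) {
    case GCD_OFF_STATUS:                         /* Cat(input_ready, output_valid) */
        *val = ((uint32_t)!d->done << 1) | (uint32_t)d->done;
        return true;
    case GCD_OFF_GCD:
        if (!d->done)
            return false;                        /* gcd.valid is low */
        *val = d->result;
        d->done = false;                         /* gcd.ready consumes the result */
        return true;
    default:
        *val = 0;                                /* write-only fields (assumed) */
        return true;
    }
}
```

Writing x, then y, then reading the result register walks the model through the same sequence the prose describes: the write to y is the trigger, and the read of gcd returns the peripheral to its idle state.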
Note
In older versions of Chipyard and Rocket-Chip, a TLRegisterRouter abstract
class was used to abstract away the construction of the TLRegisterNode and
LazyModule classes necessary to construct MMIO peripherals. This was removed
in favor of requiring users to explicitly construct the necessary classes.
This matches more closely how standard Modules and LazyModules are
constructed, making it clearer how a MMIO peripheral fits into the Module
and LazyModule design patterns.
6.7.2. Connecting by TileLink
The key to connecting to the TileLink Diplomatic graph is the construction of the TileLink node for this peripheral.
In this case, since the peripheral acts as a manager of some register-mapped address space, it uses the TLRegisterNode object.
The parameters to the TLRegisterNode object specify the size of the managed space, the base address, and the port width.
Within the register-mapped peripheral, the control registers can be mapped using the node.regmap function, as described above.
A similar procedure is followed for both AXI4 and TileLink peripherals, as the GCDTL and GCDAXI4 classes show.
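The AddressSet(params.address, 4096-1) argument pairs a base address with a mask of "don't care" bits. As a rough sketch of how such a pair describes the managed space, the C helpers below check alignment and membership. The matching rule is our understanding of Rocket Chip's AddressSet and should be checked against the version you use; the base of 0x4000 in the usage note is taken from the register defines in the test program, and the address_set_* names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Mirrors the (base, mask) pair: bits set in mask may take any value. */
typedef struct {
    uint64_t base;
    uint64_t mask;
} address_set_t;

/* The base must not have any of the "don't care" bits set. */
static bool address_set_is_aligned(address_set_t s)
{
    return (s.base & s.mask) == 0;
}

/* An address belongs to the set if it agrees with base on every
 * bit that is not masked out. */
static bool address_set_contains(address_set_t s, uint64_t addr)
{
    return ((addr ^ s.base) & ~s.mask) == 0;
}

/* For a contiguous low-order mask such as 4096-1, the region size in bytes. */
static uint64_t address_set_size(address_set_t s)
{
    return s.mask + 1;
}
```

With base 0x4000 and mask 4096-1, the set covers 0x4000 through 0x4FFF, so all four GCD registers (offsets 0x00 to 0x0C) fall inside it with room to spare.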
6.7.3. Top-level Traits
After creating the module, we need to hook it up to our SoC.
The LazyModule abstract class contains the TileLink node representing the peripheral’s I/O.
For a simple memory-mapped peripheral, the peripheral’s TileLink node must be connected to the relevant bus.
trait CanHavePeripheryGCD { this: BaseSubsystem =>
  private val portName = "gcd"
  private val pbus = locateTLBusWrapper(PBUS)

  // Only build if we are using the TL (nonAXI4) version
  val (gcd_busy, gcd_clock) = p(GCDKey) match {
    case Some(params) => {
      // If externallyClocked is true, create an input port for the GCD clock.
      // This clock is distinct from the pbus clock or other internal clocks.
      // It's defined within InModuleBody as it's a hardware port.
      val gcd_clock = Option.when(params.externallyClocked) {
        InModuleBody { IO(Input(Clock())).suggestName("gcd_clock_in") }
      }

      // Define the clock source node for the GCD module.
      val gcdClockNode = if (params.externallyClocked) {
        // If externally clocked, create a new ClockSourceNode.
        // This node acts as the root of the GCD's independent clock domain.
        val gcdSourceClockNode = ClockSourceNode(Seq(ClockSourceParameters()))
        InModuleBody {
          // Connect the ClockSourceNode's output clock to the external gcd_clock input.
          gcdSourceClockNode.out(0)._1.clock := gcd_clock.get
          // The reset signal for the GCD's clock domain must be synchronous to the gcd_clock.
          // ResetCatchAndSync synchronizes the asynchronous pbus reset to the gcd_clock domain.
          gcdSourceClockNode.out(0)._1.reset := ResetCatchAndSync(gcd_clock.get, pbus.module.reset.asBool)
        }
        gcdSourceClockNode
      } else {
        // If not externally clocked, the GCD runs on the same clock as the pbus.
        pbus.fixedClockNode
      }

      // Define the type of clock crossing required between the pbus and the GCD module.
      val gcdCrossing = if (params.externallyClocked) {
        // If the GCD has its own clock, an AsynchronousCrossing is necessary
        // to safely transfer data between the pbus clock domain and the GCD clock domain.
        AsynchronousCrossing()
      } else {
        // If the GCD uses the pbus clock, a SynchronousCrossing can be used.
        SynchronousCrossing()
      }

      // Instantiate the GCD module (either TL, AXI4, or HLS variant)
      val gcd = if (params.useAXI4) {
        val gcd = LazyModule(new GCDAXI4(params, pbus.beatBytes)(p))
        // Connect the GCD's clock input to our determined gcdClockNode.
        gcd.clockNode := gcdClockNode
        // Couple the GCD to the pbus, inserting the necessary clock crossing logic.
        pbus.coupleTo(portName) {
          // AXI4InwardClockCrossingHelper handles crossing details for AXI4.
          AXI4InwardClockCrossingHelper("gcd_crossing", gcd, gcd.node)(gcdCrossing) :=
          AXI4Buffer () :=
          TLToAXI4 () :=
          // toVariableWidthSlave doesn't use holdFirstDeny, which TLToAXI4() needs
          TLFragmenter(pbus.beatBytes, pbus.blockBytes, holdFirstDeny = true) := _
        }
        gcd
      } else if (params.useHLS) {
        val gcd = LazyModule(new HLSGCDAccel(params, pbus.beatBytes)(p))
        // Connect the GCD's clock input to our determined gcdClockNode.
        gcd.clockNode := gcdClockNode
        // Couple the GCD to the pbus, inserting the necessary clock crossing logic.
        pbus.coupleTo(portName) {
          // TLInwardClockCrossingHelper handles crossing details for TileLink.
          TLInwardClockCrossingHelper("gcd_crossing", gcd, gcd.node)(gcdCrossing) :=
          TLFragmenter(pbus.beatBytes, pbus.blockBytes) := _
        }
        gcd
      } else {
        val gcd = LazyModule(new GCDTL(params, pbus.beatBytes)(p))
        // Connect the GCD's clock input to our determined gcdClockNode.
        gcd.clockNode := gcdClockNode
        // Couple the GCD to the pbus, inserting the necessary clock crossing logic.
        pbus.coupleTo(portName) {
          // TLInwardClockCrossingHelper handles crossing details for TileLink.
          TLInwardClockCrossingHelper("gcd_crossing", gcd, gcd.node)(gcdCrossing) :=
          TLFragmenter(pbus.beatBytes, pbus.blockBytes) := _
        }
        gcd
      }

      // Expose the GCD's busy signal.
      val gcd_busy = InModuleBody {
        val busy = IO(Output(Bool())).suggestName("gcd_busy")
        busy := gcd.module.io.gcd_busy
        busy
      }

      // Return the busy signal (always needed if GCD exists) and the optional external clock input.
      // The Option[Clock] allows the IOBinder (WithGCDIOPunchthrough) to conditionally
      // create the top-level clock input only when `externallyClocked` is true.
      // The busy signal is Some(busy) because the entire GCD peripheral itself is optional based on GCDKey.
      (Some(gcd_busy), gcd_clock)
    }
    // If GCDKey is None, the GCD peripheral is not instantiated. Return None for both signals.
    case None => (None, None)
  }
}
Also observe how we have to place additional AXI4 buffers and converters for the AXI4 version of this peripheral.
Peripherals that expose I/O can use InModuleBody to punch their I/O through to the DigitalTop module.
In this example, the GCD module’s gcd_busy signal is exposed as an I/O of DigitalTop.
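The decisions made inside CanHavePeripheryGCD can be summarized as a plain lookup. The C sketch below restates them as we read the trait: which variant is built, which crossing is used, and which adapters sit between the pbus and the peripheral's node, listed from the bus side inward. It is only a summary aid; gcd_attachment_t and gcd_attachment are hypothetical names.

```c
#include <stdbool.h>
#include <stddef.h>

typedef enum { GCD_VARIANT_AXI4, GCD_VARIANT_HLS, GCD_VARIANT_TL } gcd_variant_t;
typedef enum { CROSSING_SYNCHRONOUS, CROSSING_ASYNCHRONOUS } gcd_crossing_t;

typedef struct {
    gcd_variant_t variant;
    gcd_crossing_t crossing;
    const char *adapters[4];  /* bus side first, peripheral side last */
    size_t n_adapters;
} gcd_attachment_t;

static gcd_attachment_t gcd_attachment(bool use_axi4, bool use_hls,
                                       bool externally_clocked)
{
    gcd_attachment_t a = { GCD_VARIANT_TL, CROSSING_SYNCHRONOUS,
                           { NULL, NULL, NULL, NULL }, 0 };

    /* An external clock forces an asynchronous crossing from the pbus. */
    a.crossing = externally_clocked ? CROSSING_ASYNCHRONOUS
                                    : CROSSING_SYNCHRONOUS;

    if (use_axi4) {
        /* useAXI4 is tested first, so it takes precedence over useHLS. */
        a.variant = GCD_VARIANT_AXI4;
        a.adapters[a.n_adapters++] = "TLFragmenter(holdFirstDeny = true)";
        a.adapters[a.n_adapters++] = "TLToAXI4";
        a.adapters[a.n_adapters++] = "AXI4Buffer";
        a.adapters[a.n_adapters++] = "AXI4InwardClockCrossingHelper";
    } else {
        a.variant = use_hls ? GCD_VARIANT_HLS : GCD_VARIANT_TL;
        a.adapters[a.n_adapters++] = "TLFragmenter";
        a.adapters[a.n_adapters++] = "TLInwardClockCrossingHelper";
    }
    return a;
}
```

The AXI4 path needs four stages where the TileLink paths need two, which is the extra buffering and conversion the text refers to.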
6.7.4. Constructing the DigitalTop and Config
Now we want to mix our traits into the system as a whole.
This code is from generators/chipyard/src/main/scala/DigitalTop.scala.
class DigitalTop(implicit p: Parameters) extends ChipyardSystem
  with testchipip.tsi.CanHavePeripheryUARTTSI // Enables optional UART-based TSI transport
  with testchipip.boot.CanHavePeripheryCustomBootPin // Enables optional custom boot pin
  with testchipip.cosim.CanHaveTraceIO // Enables optionally adding trace IO
  with testchipip.soc.CanHaveSubsystemInjectors // Enables the subsystem injector API
  with testchipip.soc.CanHaveSwitchableOffchipBus // Enables optional off-chip-bus with interface-switch
  with testchipip.iceblk.CanHavePeripheryBlockDevice // Enables optionally adding the block device
  with testchipip.serdes.CanHavePeripheryTLSerial // Enables optionally adding the tl-serial interface
  with testchipip.serdes.old.CanHavePeripheryTLSerial // Enables optionally adding the DEPRECATED tl-serial interface
  with testchipip.soc.CanHavePeripheryChipIdPin // Enables optional pin to set chip id for multi-chip configs
  with sifive.blocks.devices.i2c.HasPeripheryI2C // Enables optionally adding the sifive I2C
  with sifive.blocks.devices.timer.HasPeripheryTimer // Enables optionally adding the timer device
  with sifive.blocks.devices.pwm.HasPeripheryPWM // Enables optionally adding the sifive PWM
  with sifive.blocks.devices.uart.HasPeripheryUART // Enables optionally adding the sifive UART
  with sifive.blocks.devices.gpio.HasPeripheryGPIO // Enables optionally adding the sifive GPIOs
  with sifive.blocks.devices.spi.HasPeripherySPIFlash // Enables optionally adding the sifive SPI flash controller
  with sifive.blocks.devices.spi.HasPeripherySPI // Enables optionally adding the sifive SPI port
  with icenet.CanHavePeripheryIceNIC // Enables optionally adding the IceNIC for FireSim
  with chipyard.example.CanHavePeripheryGCD // Enables optionally adding the GCD example widget
  with chipyard.clocking.HasChipyardPRCI // Use Chipyard reset/clock distribution
  with chipyard.clocking.CanHaveClockTap // Enables optionally adding a clock tap output port
  with constellation.soc.CanHaveGlobalNoC // Support instantiating a global NoC interconnect
  with rerocc.CanHaveReRoCCTiles // Support tiles that instantiate rerocc-attached accelerators
  with testchipip.ctc.CanHavePeripheryCTC // Support optional CTC link
{
  override lazy val module = new DigitalTopModule(this)
}

class DigitalTopModule(l: DigitalTop) extends ChipyardSystemModule(l)
  with freechips.rocketchip.util.DontTouch
Just as we need separate traits for LazyModule and module implementation, we need two classes to build the system.
The DigitalTop class contains the set of traits which parameterize and define the DigitalTop. Typically these traits will optionally add IOs or peripherals to the DigitalTop.
The DigitalTop class includes the pre-elaboration code and also a lazy val to produce the module implementation (hence LazyModule).
The DigitalTopModule class is the actual RTL that gets synthesized.
Finally, we create a configuration class in generators/chipyard/src/main/scala/config/MMIOAcceleratorConfigs.scala that uses the WithGCD config fragment (chipyard.example.WithGCD).
class WithGCD(useAXI4: Boolean = false, useBlackBox: Boolean = false, useHLS: Boolean = false, externallyClocked: Boolean = false) extends Config((site, here, up) => {
  case GCDKey => {
    // useHLS cannot be used with useAXI4 and useBlackBox
    assert(!useHLS || (useHLS && !useAXI4 && !useBlackBox))
    Some(GCDParams(useAXI4 = useAXI4, useBlackBox = useBlackBox, useHLS = useHLS, externallyClocked = externallyClocked))
  }
})

class GCDTLRocketConfig extends Config(
  new chipyard.example.WithGCD(useAXI4=false, useBlackBox=false) ++ // Use GCD Chisel, connect Tilelink
  new freechips.rocketchip.rocket.WithNHugeCores(1) ++
  new chipyard.config.AbstractConfig)
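The assertion in WithGCD decides which flag combinations are accepted. As a quick check of what it allows, the C sketch below transcribes the condition, compares it against a shorter equivalent form, and counts the accepted combinations. The function names are hypothetical; only the boolean expression comes from the listing above.

```c
#include <assert.h>
#include <stdbool.h>

/* The condition exactly as written in WithGCD. */
static bool with_gcd_flags_ok(bool use_axi4, bool use_black_box, bool use_hls)
{
    return !use_hls || (use_hls && !use_axi4 && !use_black_box);
}

/* Equivalent shorter form: the inner use_hls test is redundant. */
static bool with_gcd_flags_ok_short(bool use_axi4, bool use_black_box,
                                    bool use_hls)
{
    return !use_hls || (!use_axi4 && !use_black_box);
}

/* Enumerate all eight combinations, check that both forms agree,
 * and return how many combinations are accepted. */
static int count_valid_flag_combinations(void)
{
    int valid = 0;
    for (int bits = 0; bits < 8; bits++) {
        bool axi4 = (bits & 1) != 0;
        bool black_box = (bits & 2) != 0;
        bool hls = (bits & 4) != 0;
        assert(with_gcd_flags_ok(axi4, black_box, hls) ==
               with_gcd_flags_ok_short(axi4, black_box, hls));
        if (with_gcd_flags_ok(axi4, black_box, hls))
            valid++;
    }
    return valid;
}
```

All four combinations without useHLS pass, and useHLS passes only when both other flags are false. The externallyClocked flag is not part of the check and can be combined with any accepted setting.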
6.7.5. Testing
Now we can test that the GCD is working. The test program is in tests/gcd.c.
#include "mmio.h"

#define GCD_STATUS 0x4000
#define GCD_X 0x4004
#define GCD_Y 0x4008
#define GCD_GCD 0x400C

unsigned int gcd_ref(unsigned int x, unsigned int y) {
  while (y != 0) {
    if (x > y)
      x = x - y;
    else
      y = y - x;
  }
  return x;
}

// DOC include start: GCD test
int main(void)
{
  uint32_t result, ref, x = 20, y = 15;

  // wait for peripheral to be ready
  while ((reg_read8(GCD_STATUS) & 0x2) == 0) ;

  reg_write32(GCD_X, x);
  reg_write32(GCD_Y, y);

  // wait for peripheral to complete
  while ((reg_read8(GCD_STATUS) & 0x1) == 0) ;

  result = reg_read32(GCD_GCD);
  ref = gcd_ref(x, y);

  if (result != ref) {
    printf("Hardware result %d does not match reference value %d\n", result, ref);
    return 1;
  }
  printf("Hardware result %d is correct for GCD\n", result);
  return 0;
}
// DOC include end: GCD test
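The program writes the operands to the registers we defined earlier, polls the status register, and reads back the result.
The base of the module’s MMIO region is at 0x4000 by default, which matches the register addresses defined in the test program.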
This will be printed out in the address map portion when you generate the Verilog code.
You can also see how this changes the emitted .json addressmap files in generated-src.
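If the base address changes, the register addresses in the test program must change with it. The C sketch below derives them from a base plus the regmap offsets, names the two status bits according to Cat(input_ready, output_valid), and bounds the polling loop so a hung peripheral cannot spin forever. The helper names, the callback type, and fake_status are all hypothetical; fake_status is a host-side stand-in for reg_read8(GCD_STATUS) so the polling logic can be exercised without hardware.

```c
#include <stdbool.h>
#include <stdint.h>

/* Offsets from the regmap; the base is whatever the address map reports. */
enum { GCD_OFF_STATUS = 0x00, GCD_OFF_X = 0x04, GCD_OFF_Y = 0x08, GCD_OFF_GCD = 0x0C };

/* status = Cat(input_ready, output_valid): bit 1 and bit 0 respectively. */
#define GCD_STATUS_OUTPUT_VALID 0x1u
#define GCD_STATUS_INPUT_READY  0x2u

static uintptr_t gcd_reg(uintptr_t base, unsigned offset)
{
    return base + offset;
}

/* Callback that returns the current status byte. */
typedef uint8_t (*status_reader_t)(void *ctx);

/* Poll until any bit in mask is set, giving up after max_polls reads.
 * Returns true if the bit was seen, false on timeout. */
static bool gcd_wait_for(status_reader_t read_status, void *ctx, uint8_t mask,
                         unsigned max_polls)
{
    for (unsigned i = 0; i < max_polls; i++) {
        if (read_status(ctx) & mask)
            return true;
    }
    return false;
}

/* Host-side stand-in: reports "running" (0x0) for *remaining reads,
 * then reports output_valid. */
static uint8_t fake_status(void *ctx)
{
    unsigned *remaining = ctx;
    if (*remaining > 0) {
        (*remaining)--;
        return 0x0;
    }
    return GCD_STATUS_OUTPUT_VALID;
}
```

On the target, the callback would wrap reg_read8 at gcd_reg(base, GCD_OFF_STATUS), and the test program would wait on GCD_STATUS_INPUT_READY before writing the operands and on GCD_STATUS_OUTPUT_VALID before reading the result.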
Compiling this program with make produces a gcd.riscv executable.
Now with all of that done, we can go ahead and run our simulation.
cd sims/verilator
make CONFIG=GCDTLRocketConfig BINARY=../../tests/gcd.riscv run-binary