6.2. Adding an Accelerator/Device

Accelerators or custom IO devices can be added to your SoC in several ways:

  • MMIO Peripheral (a.k.a. TileLink-Attached Accelerator)
  • Tightly-Coupled RoCC Accelerator

These approaches differ in how the processor communicates with the custom block.

With the TileLink-Attached approach, the processor communicates with MMIO peripherals through memory-mapped registers.

In contrast, the processor communicates with RoCC accelerators through a custom protocol and non-standard instructions that use opcodes reserved for custom extensions in the RISC-V ISA encoding space. Each core can have up to four accelerators that are controlled by custom instructions and share resources with the CPU. RoCC coprocessor instructions have the following form.

customX rd, rs1, rs2, funct

The X is a number from 0 to 3 and determines the opcode of the instruction, which in turn controls which accelerator the instruction is routed to. The rd, rs1, and rs2 fields are the register numbers of the destination register and the two source registers. The funct field is a 7-bit integer that the accelerator can use to distinguish different instructions from each other.
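To make the field layout concrete, the following C sketch packs a customX instruction into a 32-bit word. The helper name rocc_encode and the table CUSTOM_OPCODES are hypothetical names chosen for this illustration; the bit positions follow the standard RISC-V R-type layout, with the xd, xs1, and xs2 flags (described later in this section) occupying the funct3 position.

```c
#include <assert.h>
#include <stdint.h>

/* Major opcodes reserved for custom0..custom3 in the RISC-V base encoding. */
static const uint32_t CUSTOM_OPCODES[4] = { 0x0B, 0x2B, 0x5B, 0x7B };

/*
 * Pack a RoCC instruction into a 32-bit word (illustrative helper).
 *   x            - which customX opcode to use (0-3)
 *   rd, rs1, rs2 - 5-bit register numbers
 *   funct        - 7-bit function code
 *   xd, xs1, xs2 - flags saying whether rd, rs1, rs2 are actually used
 */
static uint32_t rocc_encode(unsigned x, unsigned rd, unsigned rs1,
                            unsigned rs2, unsigned funct,
                            unsigned xd, unsigned xs1, unsigned xs2)
{
    assert(x < 4 && rd < 32 && rs1 < 32 && rs2 < 32 && funct < 128);
    assert(xd < 2 && xs1 < 2 && xs2 < 2);

    return ((uint32_t)funct << 25) |  /* bits 31:25 */
           ((uint32_t)rs2   << 20) |  /* bits 24:20 */
           ((uint32_t)rs1   << 15) |  /* bits 19:15 */
           ((uint32_t)xd    << 14) |  /* bit 14 */
           ((uint32_t)xs1   << 13) |  /* bit 13 */
           ((uint32_t)xs2   << 12) |  /* bit 12 */
           ((uint32_t)rd    << 7)  |  /* bits 11:7 */
           CUSTOM_OPCODES[x];         /* bits 6:0 */
}
```

For example, rocc_encode(0, 10, 11, 12, 3, 1, 1, 1) describes a custom0 instruction with funct 3 that reads registers 11 and 12 and writes register 10.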

Note that communication through a RoCC interface requires a custom software toolchain, whereas MMIO peripherals can use the standard toolchain with appropriate driver support.

6.2.1. Integrating into the Generator Build System

While developing, you may want to keep your Chisel code in a submodule so that it can be shared by different projects. To add a submodule to the Chipyard framework, make sure that your project is organized as follows.

yourproject/
    build.sbt
    src/main/scala/
        YourFile.scala

Put this in a git repository and make it accessible. Then add it as a submodule under the following directory hierarchy: generators/yourproject.

cd generators/
git submodule add https://git-repository.com/yourproject.git

Then add yourproject to the Chipyard top-level build.sbt file.

lazy val yourproject = (project in file("generators/yourproject")).settings(commonSettings).dependsOn(rocketchip)

You can then import the classes defined in the submodule in a new project if you add it as a dependency. For instance, if you want to use this code in the example project, change the final line in build.sbt to the following.

lazy val example = (project in file(".")).settings(commonSettings).dependsOn(testchipip, yourproject)

6.2.2. MMIO Peripheral

The easiest way to create a TileLink peripheral is to use the TLRegisterRouter, which abstracts away the details of handling the TileLink protocol and provides a convenient interface for specifying memory-mapped registers. To create a RegisterRouter-based peripheral, you will need to specify a parameter case class for the configuration settings, a bundle trait with the extra top-level ports, and a module implementation containing the actual RTL. In this case we use a submodule, PWMBase, to actually perform the pulse-width modulation. The PWMModule trait only creates the registers and hooks them up using regmap.

case class PWMParams(address: BigInt, beatBytes: Int)

class PWMBase(w: Int) extends Module {
  val io = IO(new Bundle {
    val pwmout = Output(Bool())
    val period = Input(UInt(w.W))
    val duty = Input(UInt(w.W))
    val enable = Input(Bool())
  })

  // The counter should count up until period is reached
  val counter = Reg(UInt(w.W))

  when (counter >= (io.period - 1.U)) {
    counter := 0.U
  } .otherwise {
    counter := counter + 1.U
  }

  // If PWM is enabled, pwmout is high when counter < duty
  // If PWM is not enabled, it will always be low
  io.pwmout := io.enable && (counter < io.duty)
}

trait PWMBundle extends Bundle {
  val pwmout = Output(Bool())
}

trait PWMModule extends HasRegMap {
  val io: PWMBundle
  implicit val p: Parameters
  def params: PWMParams

  // How many clock cycles in a PWM cycle?
  val period = Reg(UInt(32.W))
  // For how many cycles should the clock be high?
  val duty = Reg(UInt(32.W))
  // Is the PWM even running at all?
  val enable = RegInit(false.B)

  val base = Module(new PWMBase(32))
  io.pwmout := base.io.pwmout
  base.io.period := period
  base.io.duty := duty
  base.io.enable := enable

  regmap(
    0x00 -> Seq(
      RegField(32, period)),
    0x04 -> Seq(
      RegField(32, duty)),
    0x08 -> Seq(
      RegField(1, enable)))
}

Once you have these classes, you can construct the final peripheral by extending the TLRegisterRouter and passing the proper arguments. The first set of arguments determines where the register router will be placed in the global address map and what information will be put in its device tree entry. The second set of arguments is the IO bundle constructor, which we create by extending TLRegBundle with our bundle trait. The final set of arguments is the module constructor, which we create by extending TLRegModule with our module trait.

class PWMTL(c: PWMParams)(implicit p: Parameters)
  extends TLRegisterRouter(
    c.address, "pwm", Seq("ucbbar,pwm"),
    beatBytes = c.beatBytes)(
      new TLRegBundle(c, _) with PWMBundle)(
      new TLRegModule(c, _, _) with PWMModule)

The full module code can be found in generators/example/src/main/scala/PWM.scala.
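The counter logic in PWMBase can be checked against a small cycle-level software model. The C sketch below is only an approximation written for this explanation: the names pwm_model, pwm_step, and pwm_high_cycles are hypothetical, and the model assumes the counter starts at zero, whereas the Chisel register has no reset value.

```c
#include <assert.h>
#include <stdint.h>

/* Software stand-in for the state and inputs of PWMBase. */
struct pwm_model {
    uint32_t counter;  /* assumed to start at 0 (not guaranteed in hardware) */
    uint32_t period;   /* clock cycles in one PWM cycle */
    uint32_t duty;     /* cycles per period that the output is high */
    int enable;        /* nonzero when the PWM is running */
};

/* Advance one clock cycle and return the pwmout level during that cycle. */
static int pwm_step(struct pwm_model *m)
{
    int out = m->enable && (m->counter < m->duty);

    /* Same wrap condition as the Chisel code: counter >= period - 1. */
    if (m->counter >= m->period - 1u)
        m->counter = 0;
    else
        m->counter += 1;
    return out;
}

/* Count how many of the next `cycles` clock cycles drive pwmout high. */
static unsigned pwm_high_cycles(struct pwm_model *m, unsigned cycles)
{
    unsigned high = 0;

    assert(m != 0);
    for (unsigned i = 0; i < cycles; i++)
        high += (unsigned)pwm_step(m);
    return high;
}
```

With period set to 20 and duty set to 5, the model holds the output high for 5 of every 20 cycles, which is the 25% duty cycle one would expect from the hardware.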

After creating the module, we need to hook it up to our SoC. Rocket Chip accomplishes this using the cake pattern. This basically involves placing code inside traits. In the Rocket Chip cake, there are two kinds of traits: a LazyModule trait and a module implementation trait.

The LazyModule trait runs setup code that must execute before all the hardware gets elaborated. For a simple memory-mapped peripheral, this just involves connecting the peripheral’s TileLink node to the MMIO crossbar.

trait HasPeripheryPWMTL { this: BaseSubsystem =>
  implicit val p: Parameters

  private val address = 0x2000
  private val portName = "pwm"

  val pwm = LazyModule(new PWMTL(
    PWMParams(address, pbus.beatBytes))(p))

  pbus.toVariableWidthSlave(Some(portName)) { pwm.node }
}

Note that the PWMTL class we created from the register router is itself a LazyModule. Register routers have a TileLink node simply named “node”, which we can hook up to the Rocket Chip bus. This will automatically add address map and device tree entries for the peripheral.

The module implementation trait is where we instantiate our PWM module and connect it to the rest of the SoC. Since this module has an extra pwmout output, we declare that in this trait, using Chisel’s multi-IO functionality. We then connect the PWMTL’s pwmout to the pwmout we declared.

trait HasPeripheryPWMTLModuleImp extends LazyModuleImp {
  implicit val p: Parameters
  val outer: HasPeripheryPWMTL

  val pwmout = IO(Output(Bool()))

  pwmout := outer.pwm.module.io.pwmout
}

Now we want to mix our traits into the system as a whole. This code is from generators/example/src/main/scala/Top.scala.

class TopWithPWMTL(implicit p: Parameters) extends Top
  with HasPeripheryPWMTL {
  override lazy val module = new TopWithPWMTLModule(this)
}

class TopWithPWMTLModule(l: TopWithPWMTL) extends TopModule(l)
  with HasPeripheryPWMTLModuleImp

Just as we need separate traits for LazyModule and module implementation, we need two classes to build the system. The Top classes already have the basic peripherals included for us, so we will just extend those.

The Top class includes the pre-elaboration code and also a lazy val to produce the module implementation (hence LazyModule). The TopModule class is the actual RTL that gets synthesized.

Next, we need to add a configuration mixin in generators/example/src/main/scala/ConfigMixins.scala that tells the TestHarness to instantiate TopWithPWMTL instead of the default Top.

class WithPWMTop extends Config((site, here, up) => {
  case BuildTop => (clock: Clock, reset: Bool, p: Parameters) =>
    Module(LazyModule(new TopWithPWMTL()(p)).module)
})

And finally, we create a configuration class in generators/example/src/main/scala/Configs.scala that uses this mixin.

class PWMRocketConfig extends Config(
  new WithPWMTop ++                                        // use top with tilelink-controlled PWM
  new WithBootROM ++
  new freechips.rocketchip.subsystem.WithInclusiveCache ++
  new freechips.rocketchip.subsystem.WithNBigCores(1) ++
  new freechips.rocketchip.system.BaseConfig)

Now we can test that the PWM is working. The test program is in tests/pwm.c.

#define PWM_PERIOD 0x2000
#define PWM_DUTY 0x2004
#define PWM_ENABLE 0x2008

#include "mmio.h"

int main(void)
{
	reg_write32(PWM_PERIOD, 20);
	reg_write32(PWM_DUTY, 5);
	reg_write32(PWM_ENABLE, 1);

	return 0;
}

This just writes to the registers we defined earlier. The base of the module’s MMIO region is at 0x2000. This address is printed in the address map when you generate the Verilog code.
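The three addresses in the test program are the base address plus the regmap offsets 0x00, 0x04, and 0x08. The C sketch below makes that relationship explicit by writing relative to a base pointer. The contents of mmio.h are not shown in this section, so reg_write32_at and pwm_configure are hypothetical helpers that only assume an MMIO write is a 32-bit store through a volatile pointer.

```c
#include <assert.h>
#include <stdint.h>

/* Byte offsets of the registers, matching the regmap keys. */
enum {
    PWM_PERIOD_OFF = 0x00,
    PWM_DUTY_OFF   = 0x04,
    PWM_ENABLE_OFF = 0x08
};

/* Store a 32-bit value at base + offset through a volatile pointer. */
static void reg_write32_at(volatile void *base, uintptr_t offset,
                           uint32_t value)
{
    assert(offset % 4 == 0);  /* registers are 32-bit aligned */
    *(volatile uint32_t *)((volatile uint8_t *)base + offset) = value;
}

/* Program the PWM in the same order as the test program. */
static void pwm_configure(volatile void *base, uint32_t period,
                          uint32_t duty)
{
    reg_write32_at(base, PWM_PERIOD_OFF, period);
    reg_write32_at(base, PWM_DUTY_OFF, duty);
    reg_write32_at(base, PWM_ENABLE_OFF, 1);
}
```

On the target, base would be the peripheral's address (0x2000 here) cast to a pointer; on a host machine, passing a small array instead lets you check the register layout without any hardware.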

Compiling this program with make produces a pwm.riscv executable.

Now with all of that done, we can go ahead and run our simulation.

cd sims/verilator
make CONFIG=PWMRocketConfig TOP=TopWithPWMTL
./simulator-example-PWMRocketConfig ../../tests/pwm.riscv

6.2.3. Adding a RoCC Accelerator

RoCC accelerators are lazy modules that extend the LazyRoCC class. Their implementation should extend the LazyRoCCModuleImp class.

class CustomAccelerator(opcodes: OpcodeSet)
    (implicit p: Parameters) extends LazyRoCC(opcodes) {
  override lazy val module = new CustomAcceleratorModule(this)
}

class CustomAcceleratorModule(outer: CustomAccelerator)
    extends LazyRoCCModuleImp(outer) {
  val cmd = Queue(io.cmd)
  // The parts of the command are as follows
  // inst - the parts of the instruction itself
  //   opcode
  //   rd - destination register number
  //   rs1 - first source register number
  //   rs2 - second source register number
  //   funct
  //   xd - is the destination register being used?
  //   xs1 - is the first source register being used?
  //   xs2 - is the second source register being used?
  // rs1 - the value of source register 1
  // rs2 - the value of source register 2
  ...
}

The opcodes parameter for LazyRoCC is the set of custom opcodes that will map to this accelerator. More on this in the next subsection.

The LazyRoCC class contains two TLOutputNode instances, atlNode and tlNode. The former connects into a tile-local arbiter along with the backside of the L1 instruction cache. The latter connects directly to the L1-L2 crossbar. The corresponding TileLink ports in the module implementation’s IO bundle are atl and tl, respectively.

The other interfaces available to the accelerator are mem, which provides access to the L1 cache; ptw, which provides access to the page-table walker; the busy signal, which indicates when the accelerator is still handling an instruction; and the interrupt signal, which can be used to interrupt the CPU.

Look at the examples in generators/rocket-chip/src/main/scala/tile/LazyRoCC.scala for detailed information on the different IOs.

6.2.3.1. Adding RoCC accelerator to Config

RoCC accelerators can be added to a core by overriding the BuildRoCC parameter in the configuration. This takes a sequence of functions producing LazyRoCC objects, one for each accelerator you wish to add.

For instance, if we wanted to add the previously defined accelerator and route custom0 and custom1 instructions to it, we could do the following.

class WithCustomAccelerator extends Config((site, here, up) => {
  case BuildRoCC => Seq((p: Parameters) => LazyModule(
    new CustomAccelerator(OpcodeSet.custom0 | OpcodeSet.custom1)(p)))
})

class CustomAcceleratorConfig extends Config(
  new WithCustomAccelerator ++ new RocketConfig)
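The effect of an opcode set on instruction routing can be pictured with a small software model. The C sketch below is not how Rocket Chip implements the routing; it represents each accelerator's opcode set as a hypothetical 4-bit mask (one bit per customX opcode) and returns the index of the first accelerator whose set contains the instruction's opcode.

```c
#include <assert.h>
#include <stdint.h>

/* One bit per customX opcode; OR the bits together to build a set. */
enum {
    SET_CUSTOM0 = 1 << 0,
    SET_CUSTOM1 = 1 << 1,
    SET_CUSTOM2 = 1 << 2,
    SET_CUSTOM3 = 1 << 3
};

/* Map the low 7 bits of an instruction to its customX index, or -1. */
static int custom_index(uint32_t inst)
{
    switch (inst & 0x7F) {
    case 0x0B: return 0;
    case 0x2B: return 1;
    case 0x5B: return 2;
    case 0x7B: return 3;
    default:   return -1;  /* not a custom opcode */
    }
}

/*
 * Return the index of the first accelerator whose opcode set contains
 * the instruction's opcode, or -1 if no accelerator claims it.
 */
static int route_instruction(uint32_t inst, const unsigned *sets,
                             int n_accels)
{
    int x = custom_index(inst);

    assert(n_accels >= 0 && n_accels <= 4);  /* at most four per core */
    if (x < 0)
        return -1;
    for (int i = 0; i < n_accels; i++)
        if (sets[i] & (1u << x))
            return i;
    return -1;
}
```

In this model, a single accelerator configured with SET_CUSTOM0 | SET_CUSTOM1 receives both custom0 and custom1 instructions, while custom2 and custom3 instructions are left unclaimed.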

To add RoCC instructions in your program, use the RoCC C macros provided in tests/rocc.h. You can find examples in the files tests/accum.c and tests/charcount.c.

6.2.4. Adding a DMA port

For IO devices or accelerators (like a disk or network driver), we may want the device to write directly to the coherent memory system instead of having the CPU poll data from the device. For example, here is a device that writes zeros to the memory at a configured address.

package example

import chisel3._
import chisel3.util._
import freechips.rocketchip.subsystem.{BaseSubsystem, CacheBlockBytes}
import freechips.rocketchip.config.{Parameters, Field}
import freechips.rocketchip.diplomacy.{LazyModule, LazyModuleImp, IdRange}
import testchipip.TLHelper

case class InitZeroConfig(base: BigInt, size: BigInt)
case object InitZeroKey extends Field[InitZeroConfig]

class InitZero(implicit p: Parameters) extends LazyModule {
  val node = TLHelper.makeClientNode(
    name = "init-zero", sourceId = IdRange(0, 1))

  lazy val module = new InitZeroModuleImp(this)
}

class InitZeroModuleImp(outer: InitZero) extends LazyModuleImp(outer) {
  val config = p(InitZeroKey)

  val (mem, edge) = outer.node.out(0)
  val addrBits = edge.bundle.addressBits
  val blockBytes = p(CacheBlockBytes)

  require(config.size % blockBytes == 0)

  val s_init :: s_write :: s_resp :: s_done :: Nil = Enum(4)
  val state = RegInit(s_init)

  val addr = Reg(UInt(addrBits.W))
  val bytesLeft = Reg(UInt(log2Ceil(config.size+1).W))

  mem.a.valid := state === s_write
  mem.a.bits := edge.Put(
    fromSource = 0.U,
    toAddress = addr,
    lgSize = log2Ceil(blockBytes).U,
    data = 0.U)._2
  mem.d.ready := state === s_resp

  when (state === s_init) {
    addr := config.base.U
    bytesLeft := config.size.U
    state := s_write
  }

  when (edge.done(mem.a)) {
    addr := addr + blockBytes.U
    bytesLeft := bytesLeft - blockBytes.U
    state := s_resp
  }

  when (mem.d.fire()) {
    state := Mux(bytesLeft === 0.U, s_done, s_write)
  }
}

trait HasPeripheryInitZero { this: BaseSubsystem =>
  implicit val p: Parameters

  val initZero = LazyModule(new InitZero()(p))
  fbus.fromPort(Some("init-zero"))() := initZero.node
}

trait HasPeripheryInitZeroModuleImp extends LazyModuleImp {
  // Don't need anything here
}

class TopWithInitZero(implicit p: Parameters) extends Top
    with HasPeripheryInitZero {
  override lazy val module = new TopWithInitZeroModuleImp(this)
}

class TopWithInitZeroModuleImp(l: TopWithInitZero) extends TopModule(l)
  with HasPeripheryInitZeroModuleImp

We use TLHelper.makeClientNode to create a TileLink client node for us. We then connect the client node to the memory system through the front bus (fbus). For more info on creating TileLink client nodes, take a look at Client Node.
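The state machine in InitZeroModuleImp can be followed with a simplified software model. The C sketch below uses hypothetical names (init_zero, iz_step, iz_run) and makes several simplifying assumptions: each block is written in a single step, every request is accepted immediately, and the response arrives on the next step. Real TileLink traffic may take multiple beats per block.

```c
#include <assert.h>
#include <stdint.h>

enum iz_state { S_INIT, S_WRITE, S_RESP, S_DONE };

struct init_zero {
    enum iz_state state;
    uint64_t addr;            /* next address to write */
    uint64_t bytes_left;      /* bytes still to be zeroed */
    unsigned blocks_written;  /* model-only counter for inspection */
};

/* Advance the model by one step, mirroring the when-blocks in the RTL. */
static void iz_step(struct init_zero *z, uint64_t base, uint64_t size,
                    uint64_t block_bytes)
{
    switch (z->state) {
    case S_INIT:   /* load the configured range */
        z->addr = base;
        z->bytes_left = size;
        z->state = S_WRITE;
        break;
    case S_WRITE:  /* the Put for one block has been sent */
        z->addr += block_bytes;
        z->bytes_left -= block_bytes;
        z->blocks_written++;
        z->state = S_RESP;
        break;
    case S_RESP:   /* acknowledgement received */
        z->state = (z->bytes_left == 0) ? S_DONE : S_WRITE;
        break;
    case S_DONE:
        break;
    }
}

/* Run the model to completion and return its final state. */
static struct init_zero iz_run(uint64_t base, uint64_t size,
                               uint64_t block_bytes)
{
    struct init_zero z = { S_INIT, 0, 0, 0 };

    /* Mirrors the require() in the RTL; size > 0 is an extra guard
     * added for this model only. */
    assert(block_bytes != 0 && size > 0 && size % block_bytes == 0);
    while (z.state != S_DONE)
        iz_step(&z, base, size, block_bytes);
    return z;
}
```

Assuming a 64-byte cache block (the actual value comes from CacheBlockBytes), zeroing a 0x1000-byte region takes 64 write/response round trips in this model.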

Once we’ve created our top-level module including the DMA widget, we can create a configuration for it as we did before.

class WithInitZero(base: BigInt, size: BigInt) extends Config((site, here, up) => {
  case InitZeroKey => InitZeroConfig(base, size)
})

class WithInitZeroTop extends Config((site, here, up) => {
  case BuildTop => (clock: Clock, reset: Bool, p: Parameters) =>
    Module(LazyModule(new TopWithInitZero()(p)).module)
})

class InitZeroRocketConfig extends Config(
  new WithInitZero(0x88000000L, 0x1000L) ++
  new WithInitZeroTop ++
  new WithBootROM ++
  new freechips.rocketchip.subsystem.WithInclusiveCache ++
  new freechips.rocketchip.subsystem.WithNBigCores(1) ++
  new freechips.rocketchip.system.BaseConfig)