Dsptools is a Chisel library that aids in writing custom signal processing accelerators. It does this by: * Giving types and helpers that allow you to express mathematical operations more directly. * Typeclasses that let you write polymorphic generators, for example an FIR filter generator that works for both real- and complex-valued filters. * Structures for packaging DSP blocks and integrating them into a rocketchip-based SoC. * Test harnesses for testing DSP circuits, as well as VIP-style drivers and monitors for DSP blocks.
The Dsptools repository has more documentation.
6.8. Dsptools Blocks
A DspBlock
is the basic unit of signal processing functionality that can be integrated into an SoC.
It has a AXI4-stream interface and an optional memory interface.
The idea is that these DspBlocks
can be easily designed, unit tested, and assembled lego-style to build complex functionality.
A DspChain
is one example of how to assemble DspBlocks
, in which case the streaming interfaces are connected serially into a pipeline, and a bus is instatiated and connected to every block with a memory interface.
Chipyard has example designs that integrate a DspBlock
to a rocketchip-based SoC as an MMIO peripheral. The custom DspBlock
has a ReadQueue
before it and a WriteQueue
after it, which allow memory mapped access to the streaming interfaces so the rocket core can interact with the DspBlock
[1]. This section will primarily focus on designing Tilelink-based peripherals. However, through the resources provided in Dsptools, one could also define an AXI4-based peripheral by following similar steps. Furthermore, the examples here are simple, but can be extended to implement more complex accelerators, for example an OFDM baseband or a spectrometer.
For this example, we will show you how to connect a simple FIR filter created using Dsptools as an MMIO peripheral as shown in the figure above. The full code can be found in generators/chipyard/src/main/scala/example/dsptools/GenericFIR.scala
. That being said, one could substitute any module with a ready valid interface in the place of the FIR and achieve the same results. As long as the read and valid signals of the module are attached to those of a corresponding DSPBlock
wrapper, and that wrapper is placed in a chain with a ReadQueue
and a WriteQueue
, following the general outline establised by these steps will allow you to interact with that block as a memory mapped IO.
The module GenericFIR
is the overall wrapper of our FIR module. This module links together a variable number of GenericFIRDirectCell
submodules, each of which performs the computations for one coefficient in a FIR direct form architecture. It is important to note that both modules are type-generic, which means that they can be instantiated for any datatype T
that implements Ring
operations (e.g. addition, multiplication, identities).
class GenericFIR[T<:Data:Ring](genIn:T, genOut:T, coeffs: => Seq[T]) extends Module {
val io = IO(GenericFIRIO(genIn, genOut))
// Construct a vector of genericFIRDirectCells
val directCells = Seq.fill(coeffs.length){ Module(new GenericFIRDirectCell(genIn, genOut)).io }
// Construct the direct FIR chain
for ((cell, coeff) <- directCells.zip(coeffs)) {
cell.coeff := coeff
}
// Connect input to first cell
directCells.head.in.bits.data := io.in.bits.data
directCells.head.in.bits.carry := Ring[T].zero
directCells.head.in.valid := io.in.valid
io.in.ready := directCells.head.in.ready
// Connect adjacent cells
// Note that .tail() returns a collection that consists of all
// elements in the inital collection minus the first one.
// This means that we zip together directCells[0, n] and
// directCells[1, n]. However, since zip ignores unmatched elements,
// the resulting zip is (directCells[0], directCells[1]) ...
// (directCells[n-1], directCells[n])
for ((current, next) <- directCells.zip(directCells.tail)) {
next.in.bits := current.out.bits
next.in.valid := current.out.valid
current.out.ready := next.in.ready
}
// Connect output to last cell
io.out.bits.data := directCells.last.out.bits.carry
directCells.last.out.ready := io.out.ready
io.out.valid := directCells.last.out.valid
}
class GenericFIRDirectCell[T<:Data:Ring](genIn: T, genOut: T) extends Module {
val io = IO(GenericFIRCellIO(genIn, genOut))
// Registers to delay the input and the valid to propagate with calculations
val hasNewData = RegInit(0.U)
val inputReg = Reg(genIn.cloneType)
// Passthrough ready
io.in.ready := io.out.ready
// When a new transaction is ready on the input, we will have new data to output
// next cycle. Take this data in
when (io.in.fire) {
hasNewData := 1.U
inputReg := io.in.bits.data
}
// We should output data when our cell has new data to output and is ready to
// recieve new data. This insures that every cell in the chain passes its data
// on at the same time
io.out.valid := hasNewData & io.in.fire
io.out.bits.data := inputReg
// Compute carry
// This uses the ring implementation for + and *, i.e.
// (a * b) maps to (Ring[T].prod(a, b)) for whicever T you use
io.out.bits.carry := inputReg * io.coeff + io.in.bits.carry
}
6.8.1. Creating a DspBlock
The first step in attaching the FIR filter as a MMIO peripheral is to create an abstract subclass of DspBlock
the wraps around the GenericFIR
module. Streaming outputs and inputs are packed and unpacked into UInt
s. If there were control signals, this is where they’d go from raw IOs to memory mapped. The main steps of this process are as follows.
Instantiate a
GenericFIR
withinGenericFIRBlock
.Attach the ready and valid signals from the in and out connections.
Cast the module input data to the input type of
GenericFIR
(GenericFIRBundle
) and attach.Cast the output of
GenericFIR
toUInt
and attach to the module output.
abstract class GenericFIRBlock[D, U, EO, EI, B<:Data, T<:Data:Ring]
(
genIn: T,
genOut: T,
coeffs: => Seq[T]
)(implicit p: Parameters) extends DspBlock[D, U, EO, EI, B] {
val streamNode = AXI4StreamIdentityNode()
val mem = None
lazy val module = new LazyModuleImp(this) {
require(streamNode.in.length == 1)
require(streamNode.out.length == 1)
val in = streamNode.in.head._1
val out = streamNode.out.head._1
// instantiate generic fir
val fir = Module(new GenericFIR(genIn, genOut, coeffs))
// Attach ready and valid to outside interface
in.ready := fir.io.in.ready
fir.io.in.valid := in.valid
fir.io.out.ready := out.ready
out.valid := fir.io.out.valid
// cast UInt to T
fir.io.in.bits := in.bits.data.asTypeOf(GenericFIRBundle(genIn))
// cast T to UInt
out.bits.data := fir.io.out.bits.asUInt
}
}
Note that at this point the GenericFIRBlock
does not have a type of memory interface specified. This abstract class can be used to create different flavors that use AXI-4, TileLink, AHB, or whatever other memory interface you like like.
6.8.2. Connecting DspBlock by TileLink
With these classes implemented, you can begin to construct the chain by extending GenericFIRBlock
while using the TLDspBlock
trait via mixin.
class TLGenericFIRBlock[T<:Data:Ring]
(
val genIn: T,
val genOut: T,
coeffs: => Seq[T]
)(implicit p: Parameters) extends
GenericFIRBlock[TLClientPortParameters, TLManagerPortParameters, TLEdgeOut, TLEdgeIn, TLBundle, T](
genIn, genOut, coeffs
) with TLDspBlock
We can then construct the final chain by utilizing the TLWriteQueue
and TLReadeQueue
modules found in generators/chipyard/src/main/scala/example/dsptools/DspBlocks.scala
. The chain is created by passing a list of factory functions to the constructor of TLChain
. The constructor then automatically instantiates these DspBlocks
, connects their stream nodes in order, creates a bus, and connects any DspBlocks
that have memory interfaces to the bus.
class TLGenericFIRChain[T<:Data:Ring] (genIn: T, genOut: T, coeffs: => Seq[T], params: GenericFIRParams)(implicit p: Parameters)
extends TLChain(Seq(
TLWriteQueue(params.depth, AddressSet(params.writeAddress, 0xff))(_),
{ implicit p: Parameters =>
val fir = LazyModule(new TLGenericFIRBlock(genIn, genOut, coeffs))
fir
},
TLReadQueue(params.depth, AddressSet(params.readAddress, 0xff))(_)
))
6.8.3. Top Level Traits
As in the previous MMIO example, we use a cake pattern to hook up our module to our SoC.
trait CanHavePeripheryStreamingFIR extends BaseSubsystem {
val streamingFIR = p(GenericFIRKey) match {
case Some(params) => {
val pbus = locateTLBusWrapper(PBUS)
val domain = pbus.generateSynchronousDomain.suggestName("fir_domain")
val streamingFIR = domain { LazyModule(new TLGenericFIRChain(
genIn = FixedPoint(8.W, 3.BP),
genOut = FixedPoint(8.W, 3.BP),
coeffs = Seq(1.U.asFixedPoint(0.BP), 2.U.asFixedPoint(0.BP), 3.U.asFixedPoint(0.BP)),
params = params)) }
pbus.coupleTo("streamingFIR") { domain { streamingFIR.mem.get := TLFIFOFixer() := TLFragmenter(pbus.beatBytes, pbus.blockBytes) } := _ }
Some(streamingFIR)
}
case None => None
}
}
Note that this is the point at which we decide the datatype for our FIR. You could create different configs that use different types for the FIR, for example a config that instantiates a complex-valued FIR filter.
6.8.4. Constructing the Top and Config
Once again following the path of the previous MMIO example, we now want to mix our traits into the system as a whole. The code is from generators/chipyard/src/main/scala/DigitalTop.scala
class DigitalTop(implicit p: Parameters) extends ChipyardSystem
with testchipip.tsi.CanHavePeripheryUARTTSI // Enables optional UART-based TSI transport
with testchipip.boot.CanHavePeripheryCustomBootPin // Enables optional custom boot pin
with testchipip.boot.CanHavePeripheryBootAddrReg // Use programmable boot address register
with testchipip.cosim.CanHaveTraceIO // Enables optionally adding trace IO
with testchipip.soc.CanHaveBankedScratchpad // Enables optionally adding a banked scratchpad
with testchipip.iceblk.CanHavePeripheryBlockDevice // Enables optionally adding the block device
with testchipip.serdes.CanHavePeripheryTLSerial // Enables optionally adding the tl-serial interface
with testchipip.serdes.old.CanHavePeripheryTLSerial // Enables optionally adding the DEPRECATED tl-serial interface
with testchipip.soc.CanHavePeripheryChipIdPin // Enables optional pin to set chip id for multi-chip configs
with sifive.blocks.devices.i2c.HasPeripheryI2C // Enables optionally adding the sifive I2C
with sifive.blocks.devices.timer.HasPeripheryTimer // Enables optionally adding the timer device
with sifive.blocks.devices.pwm.HasPeripheryPWM // Enables optionally adding the sifive PWM
with sifive.blocks.devices.uart.HasPeripheryUART // Enables optionally adding the sifive UART
with sifive.blocks.devices.gpio.HasPeripheryGPIO // Enables optionally adding the sifive GPIOs
with sifive.blocks.devices.spi.HasPeripherySPIFlash // Enables optionally adding the sifive SPI flash controller
with sifive.blocks.devices.spi.HasPeripherySPI // Enables optionally adding the sifive SPI port
with icenet.CanHavePeripheryIceNIC // Enables optionally adding the IceNIC for FireSim
with chipyard.example.CanHavePeripheryInitZero // Enables optionally adding the initzero example widget
with chipyard.example.CanHavePeripheryGCD // Enables optionally adding the GCD example widget
with chipyard.example.CanHavePeripheryStreamingFIR // Enables optionally adding the DSPTools FIR example widget
with chipyard.example.CanHavePeripheryStreamingPassthrough // Enables optionally adding the DSPTools streaming-passthrough example widget
with nvidia.blocks.dla.CanHavePeripheryNVDLA // Enables optionally having an NVDLA
with chipyard.clocking.HasChipyardPRCI // Use Chipyard reset/clock distribution
with chipyard.clocking.CanHaveClockTap // Enables optionally adding a clock tap output port
with fftgenerator.CanHavePeripheryFFT // Enables optionally having an MMIO-based FFT block
with constellation.soc.CanHaveGlobalNoC // Support instantiating a global NoC interconnect
with rerocc.CanHaveReRoCCTiles // Support tiles that instantiate rerocc-attached accelerators
{
override lazy val module = new DigitalTopModule(this)
}
class DigitalTopModule(l: DigitalTop) extends ChipyardSystemModule(l)
with freechips.rocketchip.util.DontTouch
Finally, we create the configuration class in generators/chipyard/src/main/scala/config/MMIOAcceleratorConfigs.scala
that uses the WithFIR
mixin defined in generators/chipyard/src/main/scala/example/dsptools/GenericFIR.scala
.
class WithStreamingFIR extends Config((site, here, up) => {
case GenericFIRKey => Some(GenericFIRParams(depth = 8))
})
class StreamingFIRRocketConfig extends Config (
new chipyard.example.WithStreamingFIR ++ // use top with tilelink-controlled streaming FIR
new freechips.rocketchip.rocket.WithNBigCores(1) ++
new chipyard.config.AbstractConfig)
6.8.5. FIR Testing
We can now test that the FIR is working. The test program is found in tests/streaming-fir.c
.
#define PASSTHROUGH_WRITE 0x2000
#define PASSTHROUGH_WRITE_COUNT 0x2008
#define PASSTHROUGH_READ 0x2100
#define PASSTHROUGH_READ_COUNT 0x2108
#define BP 3
#define BP_SCALE ((double)(1 << BP))
#include "mmio.h"
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <stdint.h>
uint64_t roundi(double x)
{
if (x < 0.0) {
return (uint64_t)(x - 0.5);
} else {
return (uint64_t)(x + 0.5);
}
}
int main(void)
{
double test_vector[15] = {1.0, 2.0, 3.0, 4.0, 5.0, 4.0, 3.0, 2.0, 1.0, 0.5, 0.25, 0.125, 0.125};
uint32_t num_tests = sizeof(test_vector) / sizeof(double);
printf("Starting writing %d inputs\n", num_tests);
for (int i = 0; i < num_tests; i++) {
reg_write64(PASSTHROUGH_WRITE, roundi(test_vector[i] * BP_SCALE));
}
printf("Done writing\n");
uint32_t rcnt = reg_read32(PASSTHROUGH_READ_COUNT);
printf("Write count: %d\n", reg_read32(PASSTHROUGH_WRITE_COUNT));
printf("Read count: %d\n", rcnt);
int failed = 0;
if (rcnt != 0) {
for (int i = 0; i < num_tests - 3; i++) {
uint32_t res = reg_read32(PASSTHROUGH_READ);
// double res = ((double)reg_read32(PASSTHROUGH_READ)) / BP_SCALE;
double expected_double = 3*test_vector[i] + 2*test_vector[i+1] + test_vector[i+2];
uint32_t expected = ((uint32_t)(expected_double * BP_SCALE + 0.5)) & 0xFF;
if (res == expected) {
printf("\n\nPass: Got %u Expected %u\n\n", res, expected);
} else {
failed = 1;
printf("\n\nFail: Got %u Expected %u\n\n", res, expected);
}
}
} else {
failed = 1;
}
if (failed) {
printf("\n\nSome tests failed\n\n");
} else {
printf("\n\nAll tests passed\n\n");
}
return 0;
}
The test feed a series of values into the fir and compares the output to a golden model of computation. The base of the module’s MMIO write region is at 0x2000 and the base of the read region is at 0x2100 by default.
Compiling this program with make
produces a streaming-fir.riscv
executable.
Now we can run our simulation.
cd sims/verilator
make CONFIG=StreamingFIRRocketConfig BINARY=../../tests/streaming-fir.riscv run-binary