Tiny Tapeout Course 2nd lesson - FPGAs

Paul Campbell - March 2023

paul@taniwha.com @moonbaseotago

Course notes


(C) Copyright Moonbase Otago 2023

All rights reserved

Before you start

Install Vivado on your laptop.

Vivado is big: a full install is ~60GB, but you can get it down to ~20GB if you install only what's required for the Arty boards

(I understand that Mac users may have to do this in a VM)

Things to Learn

  • Making a chip
  • Driving Vivado

Making a chip

We talked last time about how to take a Verilog design and compile it into a form where we can simulate a circuit. Now we're going to compile one into gates and load it into an FPGA. There are 3 basic steps:

  • Synthesis - convert a Verilog design into gates and wires
  • Place and route aka 'P&R' (usually done in this order):
    • Physically place the gates
    • Route the wires between them

All of these steps are timing driven: the tools calculate the delays through and between gates. Synthesis has to guess at wire delays, but it can change the gates it produces to make things faster; place and route can move things around to make them faster.
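
As a rough illustration of what synthesis does, here is a 2-input mux written behaviourally, followed by the kind of gate netlist a synthesis tool might produce from it. The module names mux_rtl and mux_gates are made up for this sketch, and a real tool would pick cells from its own gate library rather than Verilog's built-in primitives.

```verilog
// Behavioural description: what you write
module mux_rtl(
    input a,
    input b,
    input sel,
    output y
    );

    assign y = sel ? a : b;

endmodule

// Structural equivalent: roughly what synthesis might emit
// (hypothetical netlist, using Verilog's built-in gate primitives)
module mux_gates(
    input a,
    input b,
    input sel,
    output y
    );

    wire sel_n, a_path, b_path;

    not g0(sel_n, sel);          // inverter
    and g1(a_path, a, sel);      // pass a when sel is 1
    and g2(b_path, b, sel_n);    // pass b when sel is 0
    or  g3(y, a_path, b_path);   // combine the two paths

endmodule
```

Both modules compute the same function; the second one is just gates and wires, which is the form place and route works on.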

Gate libraries

When we're building a chip we normally start with a gate library. This is a group of 'gates', often hand-made physical designs. Usually they are all the same height (so that the power rails hook up in the metal1 layer).

Libraries usually contain something like:

  • flip-flops
  • buffers and inverters of different sizes
  • nand/nors in different sizes
  • xor, half/full adders, muxes
  • misc collection of 3-4 input gates

Each library is different, and people argue about which is best and how much effort should go into one; usually you don't need to know the details. The library designer will have used SPICE to do heavy timing simulation of each gate, and synthesis and P&R use those results to calculate a design's timing

Timing

Max time: timing is usually calculated for every possible path from every flip-flop to every other flip-flop it can reach (for both rising and falling signals). The goal is to make sure that the signal is stable before the destination flip-flop's setup time (with respect to the clock)

Min time: flip-flops also have a hold time (how long an input signal must stay stable after a clock edge). This is an easy fix: synthesis will insert buffers in paths that are too fast to slow them down (usually only needed for direct flop-to-flop wires)
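
The two checks above can be written as a pair of inequalities. This is a simplified sketch that ignores clock skew and jitter, and the symbol names are just shorthand for this example, not something the tools print: T_clk is the clock period, t_cq the source flip-flop's clock-to-output delay, t_logic the gate plus wire delay along a path, and t_setup and t_hold the destination flip-flop's setup and hold times.

```latex
% Setup (max time): the slowest path must arrive before the next clock edge
t_{cq} + t_{logic,\max} + t_{setup} \le T_{clk}

% Hold (min time): the fastest path must not change the input too soon after the same edge
t_{cq} + t_{logic,\min} \ge t_{hold}
```

With made-up numbers: a 100MHz clock (T_clk = 10ns), t_cq = 0.5ns and t_setup = 0.1ns leave at most 9.4ns for gates and wires on any path.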

Place and Route

As mentioned before, place and route is the physical layout of the gates produced during synthesis. It starts with placing the gates, guided by the timing of the paths through them

The next step is usually to insert a clock tree: a bunch of buffers/inverters arranged so that clock edges are sharp (because each one only drives a few flip-flops). The geometry of the tree (fanout etc.) is adjusted, and the layout is then carefully managed so that every flip-flop sees the clock edge at the same time.

Some systems insert a scan chain at this point for test during manufacturing

Next we do a timing driven route, wiring up the gates - at this point sometimes the design will be ripped up and rerouted to meet timing constraints

FPGAs

FPGAs essentially have arrays of predesigned gate libraries that can be configured with a bitstream when they are powered up (or later).

Each cell (called a 'LUT' for 'look up table') consists of a flop or two and a configurable logic cell at its input. If you don't use the flops, multiple LUTs can be configured together to make more complex logic
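
To make the 'look up table' idea concrete, here is a minimal behavioural model of the configurable logic part of a cell. It is a sketch, not a vendor primitive: the name lut4_model, the 4-input width and the INIT parameter are assumptions chosen for brevity, and real devices differ in size and detail. The bitstream effectively supplies the INIT value.

```verilog
// Hypothetical model of a 4-input LUT: the inputs form an index
// into a 16-bit truth table that is fixed at configuration time.
module lut4_model #(
    parameter [15:0] INIT = 16'h8000   // bit i is the output when in == i
    )(
    input [3:0]in,
    output out
    );

    assign out = INIT[in];

endmodule
```

With INIT = 16'h8000 only entry 15 is set, so this behaves as a 4-input AND; INIT = 16'h6996 would give a 4-input XOR instead. The same hardware does either job depending on the configuration.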

System wide clock trees are pre-wired across the entire chip - you usually have to do something special to hook them up correctly

There are chip-wide routing resources (wires and configurable muxes), some global, some local

Different FPGAs have other specialized resources - RAM, DSP resources, clock PLLs

Pad connectors allow you to connect your logic to external pads

FPGAs 2

Synthesis and place and route for FPGAs are all about configuring LUTs, choosing which ones to use for which timing path, and configuring the wiring resources

This config is a bitstream that's shifted into the chip, often into SRAM cells, at system startup time to configure the design

Every FPGA manufacturer has their own bespoke tool and secret bitstream format, and they're all incompatible. Open source developers have been reverse engineering the bitstreams and building their own tools - it's all very much a work in progress

Vivado

Vivado is Xilinx's proprietary design tool

Everyone hates Vivado

Everyone hates the alternatives

Underneath it's a Tcl based framework driving various tools (synthesis, P&R). You'll want to use it that way if you're making something big and want to be able to walk away while it builds (on AWS my CPU takes 20+ hours)

We're going to use the GUI

Sample code

As an example we're going to build a design with this at its core, in a file called "count.v"

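    module count(
        input clk,
        input reset,
        input sel,
        output [3:0]out
        );

        // r_count is the registered count, c_count is its next value
        reg [24:0]r_count, c_count;

        // sel low shows the slow top bits, sel high shows faster bits
        assign out = !sel?r_count[24:21]:r_count[22:19];

        // register the next value on every rising clock edge
        always @(posedge clk)
            r_count <= c_count;

        // combinational next-state logic: clear on reset, else count up
        always @(*) begin
            if (reset) begin
                c_count = 0;
            end else begin
                c_count = r_count+1;
            end
        end

    endmodule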
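
Before taking this to the FPGA it can be worth a quick simulation. The testbench below is a minimal sketch: the module name count_tb, the file it would live in, and the 10ns clock period are all assumptions for illustration. It resets the counter, runs it for 2^19 clocks so that bit 19 is the only bit set, and prints what "out" shows for each setting of sel.

```verilog
`timescale 1ns/1ps

// Hypothetical testbench for count.v (not part of the FPGA build)
module count_tb;

    reg clk = 0;
    reg reset;
    reg sel;
    wire [3:0]out;

    count dut(.clk(clk), .reset(reset), .sel(sel), .out(out));

    always #5 clk = ~clk;       // 10ns period, i.e. 100MHz

    initial begin
        reset = 0;
        sel = 1;                // select the faster bits [22:19]

        // assert reset shortly after time 0 so the always @(*)
        // block in count sees the change, then hold it for 2 clocks
        #2 reset = 1;
        repeat (2) @(posedge clk);
        @(negedge clk) reset = 0;

        // count 2^19 clocks: r_count is then exactly 1<<19
        repeat (1 << 19) @(posedge clk);
        @(negedge clk);
        $display("sel=1 out=%h", out);

        sel = 0;                // now look at the slower bits [24:21]
        #1 $display("sel=0 out=%h", out);
        $finish;
    end

endmodule
```

Run it in whichever simulator you used last time, compiling count.v alongside it. At that point bit 19 is set and bits 21 and up are still clear, so the first line should show a non-zero value and the second should still show 0.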

Sample code 2

And this at the top level - put this in "top.v":

    module top(
        input clk_in_1,
        input sel,
        output out_3,
        output out_2,
        output out_1,
        output out_0
        );

        wire clk;

        clk_wiz_0 c(.clk_in1(clk_in_1), .reset(1'b0), .clk_out1(clk));
        count cl(.clk(clk), .reset(1'b0), .sel(sel), .out({out_3, out_2, out_1, out_0}));

    endmodule
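
clk_wiz_0 is generated by Vivado later on, so top.v will not compile in a plain simulator by itself. If you want to simulate the top level outside Vivado, one option is a stand-in like the sketch below. It is an assumption-laden placeholder: it simply passes the clock through, while the real block contains a PLL and hooks up the clock tree. Keep it out of the Vivado project so it doesn't clash with the generated module.

```verilog
// Simulation-only stand-in for the Vivado-generated clk_wiz_0.
// Assumption: only the three ports that top.v connects are needed.
module clk_wiz_0(
    input clk_in1,
    input reset,        // ignored in this stand-in
    output clk_out1
    );

    assign clk_out1 = clk_in1;  // no PLL: output clock equals input clock

endmodule
```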

Putting verilog into a design

Now I'm going to give an example of how to put some Verilog into a design. We'll:

  • Make a new project, set it up for the correct FPGA
  • Insert the Verilog source
  • Create a clock PLL (required for clock wiring) using a wizard
  • Run synthesis
  • Hook up our top level pins to physical pins
  • Run place and route
  • Load a bitstream into a real device and test it

Make a Vivado project

  • Quick-start->Create Project
  • next->next->RTL Project, tick "Do not specify sources at this time"
  • next->choose board (mine is Arty A7-35)->next->finish

Add Sources to a Vivado project

  • Proj manager/Add Sources->Add files->(choose count.v/top.v)->finish

Add a clock/PLL block

  • Choose Proj manager/IP Catalog, scroll down to FPGA features/Clocking/Clocking Wizard
  • Under Clocking options leave clk_in1 at 100MHz (our board has a 100MHz oscillator on it)
  • Under Output Clocks set clk_out1 to whatever frequency you want your system clock to be
  • Click OK and then generate, then wait for that to complete
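
The clock frequency you pick here sets how fast the LEDs driven by count.v will blink. As a rough worked example (the symbols T_n and f_clk are shorthand introduced here, and 100MHz is just an example choice), bit n of a free-running counter completes one on/off cycle every 2^(n+1) clocks:

```latex
T_n = \frac{2^{\,n+1}}{f_{clk}}

% Example with f_{clk} = 100\,\mathrm{MHz}:
T_{24} = \frac{2^{25}}{10^{8}\,\mathrm{Hz}} \approx 0.34\,\mathrm{s}
\qquad
T_{21} = \frac{2^{22}}{10^{8}\,\mathrm{Hz}} \approx 0.042\,\mathrm{s}

% With sel high the design shows bits 22:19 instead, four times faster:
T_{22} = \frac{2^{23}}{10^{8}\,\mathrm{Hz}} \approx 0.084\,\mathrm{s}
\qquad
T_{19} = \frac{2^{20}}{10^{8}\,\mathrm{Hz}} \approx 0.010\,\mathrm{s}
```

So at 100MHz the slowest LED blinks about 3 times a second with sel low, and a slower system clock scales all of these periods up in proportion.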

Run Synthesis

  • Proj manager/Run Synthesis
  • look in the top right corner, wait until it's done
  • Open Synthesized design lets you explore the result - try clicking on "schematic"

Assign pads to ports in "Top" - 1

Checking the Arty schematic gives us these pins:

  • E3 clock 100MHz 3.3V
  • D9 push button 0 3.3V
  • C9 push button 1 3.3V
  • B9 push button 2 3.3V
  • B8 push button 3 3.3V
  • H5 LED4 3.3V
  • J5 LED5 3.3V
  • T9 LED6 3.3V
  • T10 LED7 3.3V

Assign pads to ports in "Top" - 2

  • In the bottom right pane choose "I/O Ports"
  • Open scalar ports
  • For each port set I/O Std to LVCMOS33 - these ports are 3.3V on the board
  • Hook each port up to its pin: clk_in_1 to E3, sel to D9, out_0-3 to H5/J5/T9/T10
  • Click Run Synthesis again

Run Place and Route

  • Vivado calls place and route "Implementation"
  • Click "Run Implementation"
  • Watch that top right corner until it's done
  • Open Implemented Design, you can explore the layout/wiring

Load your design into the FPGA

  • Generate Bitstream
  • Open Hardware Manager
  • Program Device - you'll need to look around for the bitstream file, it will be somewhere like:

    project_1.runs/impl_1/top.bit