A cloud-based workshop on Verilog coding guidelines that lead to predictable logic in silicon. The workshop covers:
- Digital logic design using Verilog HDL
- Functionality Validation of the design using Functional Simulation
- Creating Test Benches to validate the functionality of the RTL design
- Logic synthesis of the Functional RTL Code
- Gate Level Simulation of the Synthesized Netlist
DAY COUNT | TOPICS COVERED |
---|---|
Day 1 | Introduction to Verilog RTL Design and Synthesis |
Day 2 | Timing Libs, Hierarchical vs Flat Synthesis and Efficient Flop Coding Styles |
Day 3 | Combinational and Sequential Optimizations |
Day 4 | GLS, blocking vs non-blocking and Synthesis-Simulation mismatch |
Day 5 | Optimization in synthesis |
VSD - Kunal Ghosh and Team: https://www.vlsisystemdesign.com/rtl-design-using-verilog-with-sky130-technology/
Register Transfer Level (RTL) is an abstraction that describes a digital circuit in terms of the data flowing between registers and the logic operations performed on that data. Describing a design at RTL lets tools automate the rest of the flow: synthesis converts the functionality written in a Hardware Description Language (HDL) into an equivalent circuit of combinational and sequential logic. Fundamentally, the design is the Verilog code (or set of Verilog files) that implements the intended functionality to meet the specification. The RTL design is checked for adherence to the specification by simulation: the simulator is the tool that evaluates the design, and the testbench is the setup that applies stimulus to the design to check its functionality.
Note: A design module may have one or more primary inputs and primary outputs. The testbench, however, has no primary inputs or outputs.
Note: The simulator is event driven: it watches for changes on the inputs. When an input changes, the output is re-evaluated; if no input changes, the output is never re-evaluated.
#Steps Followed:
//create a directory
$ mkdir VLSI
//Git Clone vsdflow.
//Reference: https://github.com/kunalg123/vsdflow
$ git clone https://github.com/kunalg123/vsdflow.git
//Git Clone sky130RTLDesignAndSynthesisWorkshop.
//Reference: https://github.com/kunalg123/sky130RTLDesignAndSynthesisWorkshop
$ git clone https://github.com/kunalg123/sky130RTLDesignAndSynthesisWorkshop.git
Note: The sky130RTLDesignAndSynthesisWorkshop directory contains my_lib, which holds all the necessary library files: lib contains the standard cell libraries used in synthesis, and verilog_model contains the Verilog models of all the standard cells present in the lib. The verilog_files folder contains all the experiments for the lab sessions, including both the design and the testbench code.
The multiplexer (MUX) is a combinational logic circuit that switches one of several input lines through to a single common output. In a 2:1 mux, the select input A controls which data input (I0 or I1) is passed to the output Q: when A is at logic 0, I0 passes its data to the output while I1 is blocked; when A is at logic 1, I1 passes its data to the output while I0 is blocked. The output expression is Q = A'I0 + AI1.
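A minimal sketch of what good_mux.v and tb_good_mux.v might contain is shown below. The port names i0, i1, sel and y follow the synthesis observation later in this section; the actual workshop files may differ. The testbench has no ports, applies every input combination and dumps the VCD file that GTKWave opens.

```verilog
// good_mux.v (sketch): 2:1 multiplexer, y = sel ? i1 : i0
module good_mux (
    input      i0,
    input      i1,
    input      sel,
    output reg y
);
    // Complete sensitivity list: re-evaluated whenever any input changes
    always @(*) begin
        if (sel)
            y = i1;
        else
            y = i0;
    end
endmodule

// tb_good_mux.v (sketch): the testbench has no primary inputs or outputs
module tb_good_mux;
    reg  i0, i1, sel;
    wire y;
    integer k;

    // Design under test (DUT)
    good_mux uut (.i0(i0), .i1(i1), .sel(sel), .y(y));

    initial begin
        $dumpfile("tb_good_mux.vcd"); // file later opened with gtkwave
        $dumpvars(0, tb_good_mux);
        // Apply all 8 input combinations, one every 10 time units
        for (k = 0; k < 8; k = k + 1) begin
            {sel, i1, i0} = k;
            #10;
        end
    end
endmodule
```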
#Steps Followed:
//Load the design in iVerilog by giving the verilog and testbench file names
$ iverilog good_mux.v tb_good_mux.v
//List the directory to confirm that a.out has been created
$ ls
//Run the simulation, which dumps the VCD file
$ ./a.out
//To load the VCD file in GTKWave
$ gtkwave tb_good_mux.vcd
$ gtkwave tb_good_mux.vcd
Note: The iverilog command must be given both the design and the testbench Verilog files. It produces the a.out file, which dumps the VCD file when executed. The GTKWave viewer is then used to open the VCD dump file and analyze the waveform.
Note: GTKWave is the waveform analyzer and is the primary tool used for visualization and thereby to check the design functionality.
Additionally, the verilog and test bench modules can be accessed using the following command:
$ gvim tb_good_mux.v -o good_mux.v
Note: Vim is a highly configurable text editor built to enable efficient text editing. gvim brings all the functionality, power, and features of Vim while adding the convenience and intuitive nature of a GUI environment.
The synthesizer (Yosys) is the tool that converts RTL into a netlist. The netlist is the representation of the design in terms of standard cells (the cells present in the .lib). The design and the netlist should be logically equivalent. Logic synthesis is the optimization stage of the CAD flow in which the RTL code is transformed into a netlist.
When the netlist is simulated with the same stimulus, its output should match the output observed in the RTL simulation. The design is written as Verilog code and the netlist is the standard cell representation of that code, and the set of primary inputs and primary outputs remains the same between the RTL design and the synthesized netlist. Hence, the same testbench used for the RTL design can be reused for the netlist.
RTL Design is the behavioral representation in HDL form for the required specification.
But how is the code mapped to a hardware circuit? Through synthesis, the RTL-to-gate-level translation: the design is converted into gates, the interconnections between the gates are made, and the result is written out as a file called the netlist. In short, the RTL design is combined with the .lib and synthesized to obtain the netlist.
Note: A .lib file is a collection of standard cells, including all the basic logic gates. It may also contain different flavors of the same gate (for example 2-input and 3-input AND gates, each in slow, medium and fast versions).
Fast and slow cells each come with their own advantages and disadvantages when considering the critical path delays in combinational circuits. The synthesizer needs to be guided to select the cells that are optimal for implementing the logic; this guidance is given through constraints.
Note: Faster cells lead to increased area and power, and can potentially lead to hold time violations. Slower cells result in slower circuits that may fail to meet the performance target.
#Steps Followed:
//Invoke Yosys
$ yosys
//Read library
$ read_liberty -lib ../my_lib/lib/sky130_fd_sc_hd__tt_025C_1v80.lib
//Read Design
$ read_verilog good_mux.v
//Synthesize Design
$ synth -top good_mux
//Generate Netlist
$ abc -liberty ../my_lib/lib/sky130_fd_sc_hd__tt_025C_1v80.lib
//Realizing Graphical Version of Logic
$ show
//Write the netlist without attributes (-noattr) for a compact, readable file
$ write_verilog -noattr good_mux_netlist.v
Screenshot: Steps for Design Synthesis
Screenshot: Inference from Synthesis and Execution of netlist generation
Screenshot: Inference from abc command - Synthesized Netlist
Screenshot: Graphical Representation of the Logic using show command
Observation: The generated circuit has a 2-input NAND gate, an OR-AND-Invert (OAI) gate and a NOT gate. The output of this circuit is i1·sel + i0·sel'.
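As a hand check of this observation, the sketch below wires Verilog gate primitives into NOT, NAND and OR-AND-Invert stages. The connection of each stage is an assumption worked out by hand, not a copy of the Yosys output, which instantiates sky130 cells rather than primitives.

```verilog
// good_mux_gates (hypothetical): gate-level equivalent of good_mux
module good_mux_gates (input i0, input i1, input sel, output y);
    wire i0_n;    // NOT gate output:  i0'
    wire n_sel1;  // NAND gate output: (sel . i1)'
    wire or_out;  // OR stage inside the OAI function
    not  g_inv  (i0_n, i0);
    nand g_nand (n_sel1, sel, i1);
    // OR-AND-Invert: y = ((sel + i0') . n_sel1)'
    or   g_or   (or_out, sel, i0_n);
    nand g_oai  (y, or_out, n_sel1);
endmodule
```

With sel = 1 the OR stage outputs 1, so y = ((sel·i1)')' = i1; with sel = 0 the NAND outputs 1, so y = (i0')' = i0.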
Screenshot: Writing the Netlist
# | TOPICS COVERED |
---|---|
1. | UNDERSTANDING THE LIBRARY |
2. | CONTENTS OF THE LIBRARY FILE |
3. | HIERARCHICAL SYNTHESIS |
4. | FLAT SYNTHESIS |
5. | SUB MODULE LEVEL SYNTHESIS |
6. | FLIP FLOP OVERVIEW |
7. | FLIP FLOP SIMULATION |
8. | FLIP FLOP SYNTHESIS |
9. | OPTIMIZATION TECHNIQUES |
For a design to work, three important parameters determine how the silicon behaves: Process (variations due to fabrication), Voltage (changes in supply voltage change the behavior of the circuit) and Temperature (semiconductor devices are sensitive to temperature). Libraries are characterized across combinations of these parameters (PVT corners) to model these variations.
//Steps Followed:
//Command to open the library file
$ gvim ../my_lib/lib/sky130_fd_sc_hd__tt_025C_1v80.lib
//To turn off syntax highlighting:
:syn off
//To enable line numbers:
:se nu
Breakdown of the library name sky130_fd_sc_hd__tt_025C_1v80.lib:
PARAMETER | MEANING |
---|---|
sky130 | Technology - SkyWater 130 nm CMOS |
fd | Foundry - SkyWater |
sc | Standard Cell - Digital |
hd | Density - High |
tt | Process corner - Typical |
025C | Temperature - 25 °C |
1v80 | Voltage - 1.80 V |
Screenshot: .lib Sample File
The .lib file is a collection of all available standard cells, including every flavor of each cell. For each cell it also describes characteristics such as area, power, timing, pin details and capacitance, to mention a few.
//Steps Followed:
//To view the Equivalent Verilog model (in order to understand the functionality of the cell)
//to open without power ports
:sp ../my_lib/verilog_model/sky130_fd_sc_hd__a2111o.behavioral.v
Screenshot: .lib file containing the cell details for sky130_fd_sc_hd__a2111o_1
Screenshot: Equivalent Verilog model for sky130_fd_sc_hd__a2111o_1
Note: For the above 5-input gate, there are 2^5 = 32 input combinations for each characterized value (e.g. leakage power is given for each of the 32 combinations).
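To make the 32-combination count concrete, here is a behavioral sketch assuming the a2111o function X = (A1·A2) + B1 + C1 + D1, i.e. a 2-input AND feeding a 4-input OR. The module name a2111o_sketch is hypothetical; the equivalent Verilog model in verilog_model remains the authoritative definition.

```verilog
// a2111o_sketch (hypothetical): X = (A1 . A2) + B1 + C1 + D1
module a2111o_sketch (input A1, input A2, input B1, input C1, input D1, output X);
    assign X = (A1 & A2) | B1 | C1 | D1;
endmodule

// Walks all 2^5 = 32 input combinations and counts how many drive X high
module tb_a2111o_sketch;
    reg  [4:0] vec;
    wire X;
    integer k, ones;
    a2111o_sketch dut (.A1(vec[4]), .A2(vec[3]), .B1(vec[2]),
                       .C1(vec[1]), .D1(vec[0]), .X(X));
    initial begin
        ones = 0;
        for (k = 0; k < 32; k = k + 1) begin
            vec = k;
            #1;
            if (X) ones = ones + 1;
        end
        $display("X = 1 for %0d of 32 combinations", ones);
    end
endmodule
```

X is 0 only when B1 = C1 = D1 = 0 and A1·A2 = 0, which covers 3 of the 32 combinations, so the count should be 29.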
Screenshot: Various Flavours of AND Cell
//Steps Followed:
//To search for the AND cells
/cell .*and
//To open another variant of the AND cell in a vertical split
:vsp
Note: AND2_4 is a wider cell than the other two, so it has larger area and power, but also the smallest delay. Conversely, AND2_0 has the smallest area and power but a larger delay.
//Steps Followed:
//Opening the file used for this experiment
$ gvim multiple_modules.v
//Invoke Yosys
$ yosys
//Read library
$ read_liberty -lib ../my_lib/lib/sky130_fd_sc_hd__tt_025C_1v80.lib
//Read Design
$ read_verilog multiple_modules.v
//Synthesize Design
$ synth -top multiple_modules
//Generate Netlist
$ abc -liberty ../my_lib/lib/sky130_fd_sc_hd__tt_025C_1v80.lib
//Realizing Graphical Version of Logic for multiple modules
$ show multiple_modules
//Write the netlist without attributes (-noattr) for a compact, readable file
$ write_verilog -noattr multiple_modules_hier.v
$ !gvim multiple_modules_hier.v
Screenshot: Multiple_modules File
Note: There are two sub-modules in this file. The module multiple_modules instantiates the two sub-modules.
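The file itself is only shown as a screenshot. Assuming, consistent with the observations in this section, that sub_module1 is an AND gate, sub_module2 is an OR gate and the top module instantiates them as u1 and u2, multiple_modules.v would look roughly like this sketch:

```verilog
// sub_module1 (assumed): 2-input AND
module sub_module1 (input a, input b, output y);
    assign y = a & b;
endmodule

// sub_module2 (assumed): 2-input OR
module sub_module2 (input a, input b, output y);
    assign y = a | b;
endmodule

// Top module: y = (a . b) + c, built from two sub-module instances
module multiple_modules (input a, input b, input c, output y);
    wire net1;  // AND output feeding the OR
    sub_module1 u1 (.a(a),    .b(b), .y(net1));
    sub_module2 u2 (.a(net1), .b(c), .y(y));
endmodule
```

Hierarchical synthesis keeps u1 and u2 as separate modules in the netlist; flattening merges their gates into multiple_modules.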
Screenshot: Statistics of Multiple_modules File
Screenshot: Graphical Realization of the Logic
Observation: The realization after the show command is different from the expected realization of the file. Instances of the sub-modules, u1 and u2, are seen instead of the AND and OR gates. This is called a hierarchical design, where the hierarchies are preserved.
Screenshot: Expected Realization of Multiple_modules File
Screenshot: Netlist file showing the sub-modules, preserving the hierarchy
Observation: The OR gate is realized through De Morgan's law, a + b = (a'·b')', i.e. as a NAND gate with inverted inputs. The synthesis tool chose the NAND implementation because a CMOS NAND gate has stacked NMOS transistors, whereas a NOR-based OR gate has stacked PMOS transistors. Stacked PMOS is undesirable: holes have lower mobility than electrons, so stacked PMOS devices have poor logical effort and must be made wider to compensate, which requires a wider cell.
//Steps Followed:
//To flatten the netlist
$ flatten
//Write the netlist without attributes (-noattr) and view it
$ write_verilog -noattr multiple_modules_flat.v
$ !gvim multiple_modules_flat.v
Screenshot: Graphical Realization of the Logic
Screenshot: Netlist file showing the flattened netlist, without any hierarchy
Observation: No hierarchy is preserved here: the gates are instantiated directly under the module multiple_modules instead of under the sub-modules.
Sub-module level synthesis is preferred when there are multiple instances of the same module. Synthesizing the same module several times wastes runtime; instead, synthesis can be performed once for that module, and its netlist can be replicated and stitched together in the top module. It is also used in massive designs, following a divide-and-conquer approach.
//Steps Followed:
//Invoke Yosys
$ yosys
//Read library
$ read_liberty -lib ../my_lib/lib/sky130_fd_sc_hd__tt_025C_1v80.lib
//Read Design
$ read_verilog multiple_modules.v
//Synthesize Design - this controls which module to synthesize
$ synth -top sub_module1
//Generate Netlist
$ abc -liberty ../my_lib/lib/sky130_fd_sc_hd__tt_025C_1v80.lib
//Realizing Graphical Version of Logic for single modules
$ show
//Write the netlist without attributes (-noattr) for a compact, readable file
$ write_verilog -noattr multiple_modules_hier.v
$ !gvim multiple_modules_hier.v
Screenshot: Statistics of Sub-module
Screenshot: Graphical Realization of the Logic
Screenshot: NetList File of Sub-module
Note: Only the gates instantiated under sub_module1 can be seen.
In a digital design, when an input signal changes state, the output changes after a propagation delay. All logic gates add some delay to signals. These delays can cause unexpected and unwanted transitions at the output, called glitches, where the output is momentarily different from its expected value. A glitch can occur when signals that travelled through paths with different delays are combined at an output gate. In short, the more combinational logic there is, the more glitchy the output becomes, and it may not settle to a stable value.
Hence, there is a need for storage elements, called flops, to hold the values. D flip-flops (also called Data or Delay flip-flops) are the most widely used storage elements for restricting glitches.
A flop stores a single bit of data, in state 0 or 1. Flops are placed between combinational circuits, and the flop output changes only at the clock edge. Even though its input may be glitchy, its output is stable. That way, the next combinational circuit receives a stable input, and its output is also predictable and settles down.
Every flop needs an initial state; otherwise the combinational circuit will evaluate to a garbage value. To achieve this, flops have control pins, Set and Reset, which can be either synchronous or asynchronous.
Note: For a flop with asynchronous set/reset, the always block is evaluated on the positive edge of the clock or on a change in the set/reset signal. The flop normally captures data on the positive edge of the clock, but as soon as the reset (or set) signal is asserted, q changes accordingly. Hence, it does not wait for the positive edge of the clock and happens irrespective of the clock.
Note: For a flop with synchronous reset, the sensitivity list contains only posedge clk. The reset is effectively applied through the D pin of the flop: it waits for the positive edge of the clock, and only at that edge does the output change accordingly.
Note: Care needs to be taken when using the Set and Reset control pins, as they may lead to race conditions. The always block is evaluated on the positive edge of the clock and of the asynchronous reset, so reaching the else if branch implies that the block was triggered by the positive edge of the clock.
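A glitch of this kind can be reproduced in simulation with a small hypothetical module, glitch_demo, whose inverter is given a delay of 2 time units. Ideally y = a·a' is always 0, but because the inverter path is slower than the direct path, both AND inputs are briefly 1 right after a rises.

```verilog
// glitch_demo (hypothetical): static hazard caused by unequal path delays.
// Delays are in simulator time units.
module glitch_demo (input a, output y);
    wire a_n;
    assign #2 a_n = ~a;      // inverter with a delay of 2 time units
    assign    y   = a & a_n; // AND gate modelled with zero delay
endmodule
```

When a rises, y goes high immediately and falls back to 0 two time units later, when a_n catches up.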
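The asynchronous and synchronous reset styles described in these notes can be sketched as follows. The module and port names (dff_asyncres with async_reset, dff_syncres with sync_reset) are assumptions modelled on the lab file names and may not match the workshop code exactly.

```verilog
// D flip-flop with asynchronous reset: reset acts immediately,
// without waiting for a clock edge
module dff_asyncres (input clk, input async_reset, input d, output reg q);
    always @(posedge clk, posedge async_reset) begin
        if (async_reset)
            q <= 1'b0;
        else
            q <= d;
    end
endmodule

// D flip-flop with synchronous reset: reset is only sampled at posedge clk
module dff_syncres (input clk, input sync_reset, input d, output reg q);
    always @(posedge clk) begin
        if (sync_reset)
            q <= 1'b0;
        else
            q <= d;
    end
endmodule
```

Asserting reset between clock edges clears the asynchronous flop at once, while the synchronous flop keeps its old value until the next rising edge.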
#Steps Followed for analysing Asynchronous behavior:
//Load the design in iVerilog by giving the verilog and testbench file names
$ iverilog dff_asyncres.v tb_dff_asyncres.v
//List the directory to confirm that a.out has been created
$ ls
//Run the simulation, which dumps the VCD file
$ ./a.out
//To load the VCD file in GTKWave
$ gtkwave tb_dff_asyncres.vcd
$ gtkwave tb_dff_asyncres.vcd
Screenshot: Waveform Behavior of DFF with Asynchronous Reset
Observation: The output does not wait for the clock (independent of positive edge of the clock).
//Steps Followed:
//Invoke Yosys
$ yosys
//Read library
$ read_liberty -lib ../my_lib/lib/sky130_fd_sc_hd__tt_025C_1v80.lib
//Read Design
$ read_verilog dff_asyncres.v
//Synthesize Design - this controls which module to synthesize
$ synth -top dff_asyncres
//There may be a separate flop library alongside the standard cell library,
//so dfflibmap tells the tool where to pick up the DFF cells from
//Here we point to the same library, and the tool looks only at the flop cells in it
$ dfflibmap -liberty ../my_lib/lib/sky130_fd_sc_hd__tt_025C_1v80.lib
//Generate Netlist
$ abc -liberty ../my_lib/lib/sky130_fd_sc_hd__tt_025C_1v80.lib
//Realizing Graphical Version of Logic for single modules
$ show
//Write the netlist without attributes (-noattr) for a compact, readable file
$ write_verilog -noattr dff_asyncres_ff.v
$ !gvim dff_asyncres_ff.v
Screenshot: Statistics of D Flip-Flop with Asynchronous Reset
Screenshot: Graphical Realization of the Logic
Screenshot: Netlist File of D Flip-Flop with Asynchronous Reset
//Steps Followed:
//modules used for this experiment are opened using the command
$ gvim mult_*.v -o
//Invoke Yosys
$ yosys
//Read library
$ read_liberty -lib ../my_lib/lib/sky130_fd_sc_hd__tt_025C_1v80.lib
//Read Design
$ read_verilog mult_2.v
//Synthesize Design - this controls which module to synthesize
$ synth -top mul2
//Generate Netlist
$ abc -liberty ../my_lib/lib/sky130_fd_sc_hd__tt_025C_1v80.lib
//Realizing Graphical Version of Logic for single modules
$ show
//Write the netlist without attributes (-noattr) to a separate file so the RTL source mult_2.v is not overwritten
$ write_verilog -noattr mult_2_net.v
$ !gvim mult_2_net.v
Screenshot: Expected logic from RTL file
Note: mul2 has a 3-bit input and generates a 4-bit output equal to twice the input a. The output is simply a with a zero appended as the LSB (a left shift by one bit), so no hardware, and in particular no multiplier, is required.
Screenshot: Statistics of mul2
Note: No hardware is required: there are no memories, memory bits, processes or cells. The number of cells inferred is 0.
Screenshot: abc command output due to the absence of standard cells
Observation: As there are no cells in the design, the tool reports that the abc command need not be called, since there is nothing to map.
Screenshot: Graphical Realization of the Logic
Note: The output here is just a with a zero appended.
Screenshot: Netlist File of mul2
Screenshot: Expected logic from RTL file
Note: In this case, the input a has 3 bits and the output y has 6 bits, and y is always a constant (here 9) times the input a. 9a can be split as (8 + 1)a: a·1 is just a (a 3-bit number), and a·8 is a followed by 000 (a 6-bit number). Since the three LSBs of 8a are 0, adding a produces no carry, so y is simply a concatenated with itself.
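Both multipliers can be sketched as one-line assign statements. The module name mul2 matches the synth -top command above; mult8 is an assumed name for the 9a design. In both cases synthesis reduces the multiplication to pure wiring.

```verilog
// mul2 (sketch): y = 2a is a shifted left by one bit, i.e. y = {a, 1'b0}
module mul2 (input [2:0] a, output [3:0] y);
    assign y = a * 2;
endmodule

// mult8 (assumed name): y = 9a = 8a + a. The three LSBs of 8a = {a, 3'b000}
// are zero, so adding a never carries and y = {a, a}
module mult8 (input [2:0] a, output [5:0] y);
    assign y = a * 9;
endmodule
```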
Screenshot: Statistics
Screenshot: Graphical Realization of the Logic
Note: There is no hardware requirement.
Screenshot: Netlist File of the 9a Multiplier
# | TOPICS COVERED |
---|---|
1. | LOGIC CIRCUITS OVERVIEW |
2. | COMBINATIONAL LOGIC OPTIMIZATION |
3. | SEQUENTIAL LOGIC OPTIMIZATION |
4. | SEQUENTIAL UNUSED OUTPUT OPTIMIZATION |
There are two types of digital logic circuits: combinational and sequential. Combinational circuits are collections of basic logic gates whose output depends only on the current inputs, and they do not require a clock. They implement logic using gates only. Sequential circuits contain memory elements called flip-flops, so their output depends on the current inputs as well as on past inputs. Because of the flip-flops, they require a clock input, and they can implement more complex logic that relies on stored state.
Logic optimization is the part of logic synthesis that finds an equivalent representation of the specified logic circuit under one or more specified constraints. It is performed to squeeze the logic and obtain the most optimized design, which leads to area and power savings. This can be achieved through constant propagation (direct optimization) or through Boolean logic optimization (e.g. K-map or Quine-McCluskey methods). In constant propagation, the constant value of an input is propagated through each stage all the way to the output, simplifying the logic along the way. In Boolean logic optimization, the synthesis tool reduces complex logic equations to simplified versions using Boolean algebra or K-map reductions.
One of the most basic optimization techniques for sequential circuits is sequential constant propagation: when a flop's D input is tied to a constant (for example low) so that its Q pin always holds a constant value, the flop can be replaced by that constant. There are also advanced techniques for obtaining the most condensed state machine: 1) State optimization, where unused states are optimized away. 2) Cloning, done during physical-aware synthesis: if two flops are placed far apart, there may be a large routing delay between them; to prevent this, a flop with large positive slack can be cloned so that timing is met. 3) Re-timing, where the combinational logic is partitioned effectively between flops to reduce the delay of the critical stage, thereby increasing the frequency of operation and improving the performance of the circuit.
//Steps Followed for each of the optimization problems:
//to view all optimization files
$ ls *opt_check*
//Invoke Yosys
$ yosys
//Read library
$ read_liberty -lib ../my_lib/lib/sky130_fd_sc_hd__tt_025C_1v80.lib
//Read Design
$ read_verilog opt_check.v
//Synthesize Design - this controls which module to synthesize
$ synth -top opt_check
//Remove unused cells and wires left over after optimization
$ opt_clean -purge
//Generate Netlist
$ abc -liberty ../my_lib/lib/sky130_fd_sc_hd__tt_025C_1v80.lib
//Realizing Graphical Version of Logic for single modules
$ show
Screenshot: Expected logic from verilog file
Note: The value of y depends on a, y = ab.
Screenshot: Command for performing combinational optimization using constant propogation method
Screenshot: Graphical Realization of the Logic
Note: The optimized graphical realization thus shows a 2-input AND gate being implemented.
Screenshot: Expected logic from verilog file
Note: The value of y depends on a, y = a+b.
Screenshot: Graphical Realization of the Logic
Note: The optimized graphical realization thus shows a 2-input OR gate being implemented. Although an OR gate can be realized using a NOR gate, that leads to a stacked PMOS configuration, which is not recommended. So the OR gate is realized using NAND and NOT gates (which use a stacked NMOS configuration).
Screenshot: Expected logic from verilog file
Note: The value of y depends on a, y = abc.
Screenshot: Graphical Realization of the Logic
Note: The optimized graphical realization thus shows 3-input AND gate being implemented.
Screenshot: Expected logic from verilog file
Note: The value of y depends on a and c: y = a'c' + ac.
Screenshot: Graphical Realization of the Logic
Note: The optimized graphical realization thus shows A XNOR C gate being implemented.
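A sketch of the kind of nested ternary that reduces to a XNOR c is shown below. The module name opt_check4 and the exact expression are assumptions chosen to match the note (y = a'c' + ac). When a = 1 both inner branches give c, and when a = 0 the result is c', so b drops out entirely.

```verilog
// opt_check4 (assumed): nested ternaries that constant propagation and
// Boolean optimization reduce to y = a XNOR c
module opt_check4 (input a, input b, input c, output y);
    assign y = a ? (b ? (a & c) : c) : (!c);
endmodule
```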
Note: Due to the presence of multiple modules, the netlist was flattened before optimizing the logic circuit.
Screenshot: Verilog file
Screenshot: Graphical Realization of the Logic
Note: The optimized graphical realization thus shows a 2-input AND into first input of 2-input OR gate being implemented.
Screenshot: Verilog file
Screenshot: Graphical Realization of the Logic
//Steps Followed for each of the optimization problems:
//To view all optimization files
$ ls *df*const*
//To open multiple files
$ gvim dff_const1.v -o dff_const2.v
//Performing Simulation
//Load the design in iVerilog by giving the verilog and testbench file names
$ iverilog dff_const1.v tb_dff_const1.v
//To dump the VCD file
$ ./a.out
//To load the VCD file in GTKWave
$ gtkwave tb_dff_const1.vcd
//Performing Synthesis
//Invoke Yosys
$ yosys
//Read library
$ read_liberty -lib ../my_lib/lib/sky130_fd_sc_hd__tt_025C_1v80.lib
//Read Design
$ read_verilog dff_const1.v
//Synthesize Design - this controls which module to synthesize
$ synth -top dff_const1
//There may be a separate flop library alongside the standard cell library,
//so dfflibmap tells the tool where to pick up the DFF cells from
//Here we point to the same library, and the tool looks only at the flop cells in it
$ dfflibmap -liberty ../my_lib/lib/sky130_fd_sc_hd__tt_025C_1v80.lib
//Generate Netlist
$ abc -liberty ../my_lib/lib/sky130_fd_sc_hd__tt_025C_1v80.lib
//Realizing Graphical Version of Logic for single modules
$ show
Screenshot: Expected logic from verilog file
Note: Even after reset goes low, Q waits for the next positive edge of the clock to become high. Hence, a flop will be inferred in this design.
Screenshot: Verifying the Observation using Simulation
Screenshot: Statistics showing a flop inferred
Screenshot: Graphical Realization of the Logic
Note: The optimized graphical realization thus shows the flop inferred. Also, the design code has active high reset and the standard cell library has active low reset - so, there is a presence of inverter for the reset.
Screenshot: Expected logic from verilog file
Note: Q is constant with value of 1
Screenshot: Verifying the Observation using Simulation
Screenshot: Statistics showing no flop inferred
Screenshot: Graphical Realization of the Logic
Note: The optimized graphical realization thus does not have any flop inferred and is a constant value of 1, irrespective of reset or clock signals.
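The two designs above can be sketched as follows. The module names follow the lab files dff_const1.v and dff_const2.v, but their exact contents are an assumption; the only difference is the value assigned on reset.

```verilog
// dff_const1 (sketch): after reset is released, q only becomes 1 at the
// next rising clock edge, so a flop is still required
module dff_const1 (input clk, input reset, output reg q);
    always @(posedge clk, posedge reset) begin
        if (reset)
            q <= 1'b0;
        else
            q <= 1'b1;
    end
endmodule

// dff_const2 (sketch): both branches assign 1, so q is a constant 1 once
// the flop has been reset or clocked, and synthesis can remove the flop
module dff_const2 (input clk, input reset, output reg q);
    always @(posedge clk, posedge reset) begin
        if (reset)
            q <= 1'b1;
        else
            q <= 1'b1;
    end
endmodule
```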
Screenshot: Expected logic from verilog file
Note: There are two flops: q1 is reset to 0 and Q is reset to 1. After reset is released, q1 becomes 1 at the next positive edge of the clock. At that same edge, Q samples the old value of q1 (0), because q1 changes only after its clock-to-Q delay. As a result, Q stays high except for one clock cycle after reset, so both flops are expected to be present and will not be optimized away.
Screenshot: Verifying the Observation using Simulation
Screenshot: Statistics showing both flops being inferred
Screenshot: Graphical Realization of the Logic
Note: The optimized graphical realization thus shows both flops being inferred. It can also be seen that the first flop is reset and second is set.
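A sketch consistent with the note on this two-flop design is shown below; the module name dff_const3 and the internal signal q1 are assumptions.

```verilog
// dff_const3 (sketch): q1 resets to 0 and q resets to 1. After reset is
// released, q samples the old value of q1 (0) at the first clock edge,
// so q drops low for exactly one clock cycle and both flops are needed.
module dff_const3 (input clk, input reset, output reg q);
    reg q1;
    always @(posedge clk, posedge reset) begin
        if (reset) begin
            q  <= 1'b1;
            q1 <= 1'b0;
        end else begin
            q1 <= 1'b1;
            q  <= q1;   // non-blocking: reads q1 before it is updated
        end
    end
endmodule
```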
Screenshot: Expected logic from verilog file
Note: Both flops have a constant 1 at their outputs, so they are expected to be optimized away, and the resulting netlist will not have any flops inferred.
Screenshot: Verifying the Observation using Simulation
Screenshot: Statistics showing no flops being inferred
Screenshot: Graphical Realization of the Logic
Note: The optimized graphical realization thus does not have flops inferred and is a constant value of 1, irrespective of reset or clock signals.
Screenshot: Expected logic from verilog file
Screenshot: Verifying the Observation using Simulation
Screenshot: Statistics showing the same flop being inferred twice
Screenshot: Graphical Realization of the Logic
Note: The optimized graphical realization has the same flop inferred twice.
//Steps Followed for each of the unused output optimization problems:
//opening the file
$ gvim counter_opt.v
//Invoke Yosys
$ yosys
//Read library
$ read_liberty -lib ../my_lib/lib/sky130_fd_sc_hd__tt_025C_1v80.lib
//Read Design
$ read_verilog counter_opt.v
//Synthesize Design - this controls which module to synthesize
$ synth -top counter_opt
//Remove unused cells and wires left over after optimization
$ opt_clean -purge
//Generate Netlist
$ abc -liberty ../my_lib/lib/sky130_fd_sc_hd__tt_025C_1v80.lib
//Realizing Graphical Version of Logic for single modules
$ show
Screenshot: Expected logic from verilog file
Note: On reset, the counter is initialized to 0; otherwise it is incremented, acting as an up-counter. Since it is a 3-bit counter, it rolls over after 7 (000, 001, 010, ..., 111). However, the output q senses only count[0], so q toggles every clock cycle. The other two bits are unused and do not drive any output, so they need not be present in the design.
Screenshot: Statistics showing only one flop inferred instead of three, even though it is a 3-bit counter
Screenshot: Graphical Realization of the Logic
Note: In the optimized graphical realization, the output Q of the count[0] flop is fed back through a NOT gate to perform the toggle function. The other bits, which have no influence on the primary output, are optimized away.
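A sketch of counter_opt.v consistent with the note above is shown below; the exact file contents are an assumption. Only count[0] reaches the output, which is why synthesis keeps a single flop.

```verilog
// counter_opt (sketch): 3-bit up-counter, but only count[0] drives q
module counter_opt (input clk, input reset, output q);
    reg [2:0] count;
    assign q = count[0];  // count[2:1] have no path to any output

    always @(posedge clk, posedge reset) begin
        if (reset)
            count <= 3'b000;
        else
            count <= count + 1'b1;
    end
endmodule
```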
//Steps Followed:
//Copying the code to a new file
$ cp counter_opt.v counter_opt2.v
$ gvim counter_opt2.v
//Changes made in the verilog code, i for insert mode:
- assign q = (count[2:0] == 3'b100);
Screenshot: Expected logic from verilog file
Note: In this case, all three bits of the counter are used, and hence three flops are expected in the optimized netlist.
Screenshot: Statistics showing all three flops inferred
Screenshot: Graphical Realization of the Logic
Note: All three flops can be seen. Incrementer logic is needed, so the logic other than the flops represents the adder circuit. The expression at the output is q = count[2]·count[1]'·count[0]'. Therefore, only the flops that have no role in driving the primary output are optimized away.
# | TOPICS COVERED |
---|---|
1. | GATE LEVEL SIMULATION |
2. | SYNTHESIS SIMULATION MISMATCH |
3. | EXPERIMENTS WITH GLS |
4. | MISSING SENSITIVITY LIST |
5. | CAVEATS IN BLOCKING ASSIGNMENTS |
Previously, stimulus was applied to the design through a testbench module and the output was verified to meet the specification; the RTL design was the DUT (Design Under Test). In Gate Level Simulation, the netlist is the Design Under Test. The netlist is logically the same as the RTL code, converted into standard cell gates, so the same testbench can be used with it.
- To logically verify the correctness of the design after Synthesis.
- During RTL simulation, timing is not accounted for, but for practical applications the timing of the design must be met and verified.
//consider a netlist that instantiates standard cells (cell names here are placeholders)
and_cell uand (.a(a), .b(b), .y(n1));
or_cell uor (.a(n1), .b(c), .y(y));
//The simulator needs to know what and_cell and or_cell mean
//Thus GLS needs the netlist, the testbench and the Verilog models of the standard cells
Note: The netlist consists of instantiated standard cells, and their meaning is conveyed to iverilog through gate level Verilog models. Gate level Verilog models can be functional or timing aware. If the gate level models are delay annotated, GLS can be used for timing validation in addition to functional validation.
If the netlist is a true representation of the RTL, why is there a need to validate the functionality of the netlist? Because there may be a synthesis-simulation mismatch, for the following reasons:
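The difference between functional and delay-annotated gate models can be sketched with two hypothetical models of a 2-input AND cell. The names and2_func and and2_timed are illustrative, not sky130 cells, and the fixed #2 delay is an assumption; real timing-aware cell models often use specify blocks instead.

```verilog
// and2_func (hypothetical): functional model, output follows inputs with zero delay
module and2_func (input a, input b, output y);
    assign y = a & b;
endmodule

// and2_timed (hypothetical): delay-annotated model, output changes
// 2 time units after the inputs
module and2_timed (input a, input b, output y);
    assign #2 y = a & b;
endmodule
```

Simulating a netlist with the first kind of model checks only logic; the second kind also exposes when each output settles.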
- Absence of Sensitivity List
- Blocking Vs Non Blocking Assignments
- Non Standard Verilog Coding
module mux(
input i0, input i1,
input sel,
output reg y
);
always @ (sel)
begin
if (sel)
y = i1;
else
y = i0;
end
endmodule
The simulator re-evaluates the output only when a signal in the sensitivity list changes; when there is no activity, the output is not evaluated. In the above 2x1 mux code, the always block is sensitive only to sel, so y is evaluated only when sel toggles. Changes on i0 or i1 while sel is stable are not reflected at the output.
Time | Logic |
---|---|
During Simulation | Logic acts as a Latch/Double edged Flop |
During Synthesis | Logic acts as a Mux |
Hence, there is a synthesis-simulation mismatch due to the missing sensitivity list: the synthesizer does not take the sensitivity list into account and infers the logic from the functionality of the code.
module mux(
input i0, input i1,
input sel,
output reg y
);
always @ (*)
begin
if (sel)
y = i1;
else
y = i0;
end
endmodule
Note: The mismatch is corrected by using always @ (*), so the always block is evaluated whenever any signal read inside it changes. Any change on the inputs is thus reflected at the output.
This issue arises only inside an always block. Blocking assignments execute in the order in which they are written: the first statement is always evaluated before the second (as in a C program). Non-blocking assignments execute in parallel: all right-hand sides are evaluated first, and only then are the left-hand sides updated.
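To see the mismatch in simulation, both versions of the mux can be instantiated side by side; the module names mux_incomplete and mux_complete are hypothetical. With sel held at 1, a change on i1 updates only the mux with the complete sensitivity list.

```verilog
// mux_incomplete (hypothetical): sensitivity list holds only sel, so the
// simulator ignores changes on i0 and i1 while sel is stable
module mux_incomplete (input i0, input i1, input sel, output reg y);
    always @(sel) begin
        if (sel)
            y = i1;
        else
            y = i0;
    end
endmodule

// mux_complete (hypothetical): @(*) wakes the block on any input change
module mux_complete (input i0, input i1, input sel, output reg y);
    always @(*) begin
        if (sel)
            y = i1;
        else
            y = i0;
    end
endmodule
```

Synthesis produces the same mux for both, which is exactly why the first one mismatches its own simulation.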
Assignment | Statement |
---|---|
= | Blocking Statement |
<= | Non-Blocking Statement |
module code (input clk,input reset,
input d,
output reg q);
reg q0;
always @ (posedge clk,posedge reset)
begin
if(reset)
begin
q0 = 1'b0;
q = 1'b0;
end
else
begin
q = q0;
q0 = d;
end
end
endmodule
The code is intended to create a shift register with two flops. The assignments inside the code are blocking. On reset, q0 and q are both cleared to 0, which gives the asynchronous reset connection. Otherwise, q0 is first assigned to q and then d is assigned to q0, so q receives the old value of q0. Now suppose the order of the statements is changed:
module code (input clk,input reset,
input d,
output reg q);
reg q0;
always @ (posedge clk,posedge reset)
begin
if(reset)
begin
q0 = 1'b0;
q = 1'b0;
end
else
begin
q0 = d;
q = q0;
end
end
endmodule
In this case, d is assigned to q0 and then q0 is assigned to q. By the time the second statement executes, q0 already holds the value of d, so q = d, and only one flop is implemented. In the previous version, q took the old value of q0 while q0 took the value of d, which led to two storage elements.
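module code (input clk, input reset,
  input d,
  output reg q);
  reg q0; // first flop of the shift register
  always @ (posedge clk, posedge reset)
  begin
    if (reset)
    begin
      q0 <= 1'b0;
      q <= 1'b0;
    end
    else
    begin
      // non-blocking: both RHS values are sampled before any update,
      // so the order of these two lines does not matter (two flops)
      q0 <= d;
      q <= q0;
    end
  end
endmodule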
Therefore, the order does not matter here, because every RHS is evaluated before any assignment takes place; two flops are inferred irrespective of the order. Always use non-blocking statements when writing sequential circuits.
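The same rule scales to longer chains. Below is a sketch of a parameterized shift register (the module name shift_reg, the parameter N and the stages vector are assumptions for illustration); it assumes N >= 2 and, like the code above, uses an active-high asynchronous reset:

```verilog
// Hypothetical N-stage shift register built with non-blocking assignments.
// Assumes N >= 2 so that stages[N-2:0] is a valid part-select.
module shift_reg #(parameter N = 2) (
  input  wire clk,
  input  wire reset,   // asynchronous, active-high
  input  wire d,
  output wire q
);
  reg [N-1:0] stages;

  always @(posedge clk, posedge reset) begin
    if (reset)
      stages <= {N{1'b0}};
    else
      // The old contents are read before any stage is updated,
      // so every bit moves exactly one stage per clock.
      stages <= {stages[N-2:0], d};
  end

  assign q = stages[N-1];
endmodule
```

Because the whole vector is updated with a single non-blocking assignment, q lags d by N clock cycles.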
module code (input a, b, c,
output reg y);
reg q0;
always @ (*)
begin
y = q0 & c;
q0 = a|b ;
end
endmodule
The code is intended to implement y = (a+b).c. When the always block is entered, the blocking statements are evaluated in order. So y is evaluated first (q0.c), using the value of q0 from the previous evaluation; q0 is updated only by the second statement.
Time | Logic |
---|---|
During Simulation | Logic mimics a delay or flop |
During Synthesis | Logic will not have a flop |
When the order of the statements is changed, a OR b is evaluated first and its latest value is used to calculate y.
module code (input a, b, c,
output reg y);
reg q0;
always @ (*)
begin
q0 = a|b ;
y = q0 & c;
end
endmodule
It is therefore essential to run GLS on the netlist and check it against the specification, to ensure there is no synthesis-simulation mismatch.
//Steps Followed:
//opening the file
$ gvim ternary_operator_mux.v
//PERFORMING SIMULATION
//Load the design in iVerilog by giving the verilog and testbench file names
$ iverilog ternary_operator_mux.v tb_ternary_operator_mux.v
//To dump the VCD file
$ ./a.out
//To load the VCD file in GTKwaveform
$ gtkwave tb_ternary_operator_mux.vcd
//PERFORMING SYNTHESIS
//Invoke Yosys
$ yosys
//Read library
$ read_liberty -lib ../my_lib/lib/sky130_fd_sc_hd__tt_025C_1v80.lib
//Read Design
$ read_verilog ternary_operator_mux.v
//Synthesize Design - this controls which module to synthesize
$ synth -top ternary_operator_mux
//Generate Netlist
$ abc -liberty ../my_lib/lib/sky130_fd_sc_hd__tt_025C_1v80.lib
//Realizing Graphical Version of Logic for single modules
$ show
//Write Verilog netlist
$ write_verilog ternary_operator_mux_net.v
//PERFORMING GLS
//Opening Verilog Models, Netlist and Test Bench
$ iverilog ../my_lib/verilog_model/primitives.v ../my_lib/verilog_model/sky130_fd_sc_hd.v ternary_operator_mux_net.v tb_ternary_operator_mux.v
//To dump the VCD file
$ ./a.out
//To load the VCD file in GTKwaveform
$ gtkwave tb_ternary_operator_mux.vcd
Note: The mux function is written using the ternary operator, which takes three operands in the format:
<Condition>?<True>:<False>
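A sketch of such a mux written with a continuous assignment is shown below. It assumes the module name ternary_operator_mux used in the synth -top command above and the port names i0, i1, sel and y that appear in the observations; the actual workshop file may differ:

```verilog
// Sketch of a 2:1 mux using the ternary operator (port names assumed).
module ternary_operator_mux (
  input  wire i0,
  input  wire i1,
  input  wire sel,
  output wire y
);
  // <Condition> ? <True> : <False>
  assign y = sel ? i1 : i0;
endmodule
```

Since y is driven by a continuous assign rather than an always block, there is no sensitivity list that could be left incomplete.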
Screenshot: Verilog file
Screenshot: Verifying the Observation using Simulation
Observation: Function of a 2x1 mux.
Screenshot: Synthesis Statistics Report
Observation: A mux has been inferred.
Screenshot: Graphical Realization of the Logic
Note: The logic is realized with a NAND gate on i1 and sel, an inverter on i0, and an OR-AND-INVERT gate whose inputs are sel, the inverted i0, and the NAND output. The output is given by y = sel'.i0 + sel.i1.
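The expression can be checked with De Morgan's laws. Assuming the OR-AND-INVERT gate computes the complement of (A1 + A2).B1, with A1 = sel, A2 = i0' and B1 driven by the NAND output (i1.sel)':

```latex
\begin{aligned}
y &= \overline{(\mathit{sel} + \overline{i_0}) \cdot \overline{(i_1 \cdot \mathit{sel})}} \\
  &= \overline{(\mathit{sel} + \overline{i_0})} + (i_1 \cdot \mathit{sel}) \\
  &= \overline{\mathit{sel}} \cdot i_0 + \mathit{sel} \cdot i_1
\end{aligned}
```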
Screenshot: Commands to perform Gate Level Simulation
Screenshot: GLS Output
Observation: Confirms the functionality of 2x1 mux.
Screenshot: Verilog file
Expected Behavior: The sensitivity list contains only the select input. So during simulation the logic acts as a latch, while during synthesis it acts as a mux.
Screenshot: Simulation Output
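Observation: The output is evaluated only when sel changes. While sel is low, y takes the value of i0 at the last sel transition and, since there is no further activity on the select line, the output remains low. Similarly, while sel is high, y holds the value of i1 captured when sel rose, because there is again no activity on the select line. Thus the logic acts like a flop, retaining its value.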
Screenshot: Synthesis Statistics Report
Screenshot: Synthesis Output
Note: There is a mux inferred during synthesis of the logic.
Screenshot: GLS Output
Observation: GLS confirms the 2x1 mux functionality after synthesis: when select is low, activity on i0 is reflected on y, and when select is high, activity on i1 is reflected on y. Hence there is a synthesis-simulation mismatch due to the incomplete sensitivity list.
Screenshot: Verilog file
Expected Behavior: When the always block is entered, the blocking statements are evaluated in order. So d is evaluated first (x.c), using the value of x (a|b) from the previous evaluation; x is updated only by the second statement. The intended output expression is d = (a+b).c.
Screenshot: Simulation Output
Observation: With d = (a+b).c, if a = b = 0 then a+b = 0 and d should be 0. However, the simulation shows d = 1, because it uses the past value of x, when a+b was 1.
Screenshot: Synthesis Statistics Report
Screenshot: Synthesis Output
Note: The synthesized design realizes d = (a+b).c using OR-AND gate logic (a 2-input OR feeding an AND).
Screenshot: GLS Output
Observation: For the same input values, the output d is 1 in RTL simulation but 0 in GLS. Hence there is a synthesis-simulation mismatch due to blocking assignments.
# | TOPICS COVERED |
---|---|
1. | IF STATEMENTS |
2. | CASE STATEMENTS |
3. | INCOMPLETE IF STATEMENTS |
4. | INCOMPLETE CASE STATEMENTS |
5. | STATEMENTS USING FOR |
6. | STATEMENTS USING GENERATE |
An if statement is a conditional statement that uses Boolean conditions to decide which block of Verilog code to execute. In hardware, an if statement translates into a multiplexer. It is used for priority logic and is always used inside an always block. The variable assigned inside it should be declared as a reg.
if (<cond>)
begin
.....
.....
end
else
begin
.....
.....
end
if (<cond1>)
begin
.....
executes cb1
.....
end
else if (<cond2>)
begin
.....
executes cb2
.....
end
else if (<cond3>)
begin
.....
executes cb3
.....
end
else
begin
.....
executes cb4
.....
end
Note: Condition 1 gets the highest priority. If condition 1 is met, the other conditions are not evaluated. Leg N is evaluated only if all the preceding conditions fail.
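As a sketch of priority logic (priority_enc, req, grant and valid are hypothetical names), a 4-input priority encoder written as an if / else if chain gives the highest priority to req[3], because the first true condition wins:

```verilog
// Hypothetical 4-input priority encoder: req[3] has the highest priority.
module priority_enc (
  input  wire [3:0] req,
  output reg  [1:0] grant,
  output reg        valid
);
  always @(*) begin
    valid = 1'b1;
    if (req[3])      grant = 2'd3;   // checked first
    else if (req[2]) grant = 2'd2;   // only if req[3] is low
    else if (req[1]) grant = 2'd1;
    else if (req[0]) grant = 2'd0;
    else begin                       // no request at all
      grant = 2'd0;
      valid = 1'b0;
    end
  end
endmodule
```

The final else also assigns every output, so no latch is inferred.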
Inferred latches serve as a 'warning sign' that the logic might not be implemented as intended. They result from a bad coding style, typically incomplete if statements or other missing statements in the design. For example, if the else statement is missing, the hardware is not told what to do in that case, so it latches and tries to retain the previous value. This type of design should be avoided unless it is required by the intended functionality (ex: Counter Design).
Note: Combinational circuits must not have inferred latches.
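One common way to keep such logic latch-free is to give the output a value on every path, either with a final else or with a default assignment at the top of the always block. The sketch below uses the default-assignment style; the module name comp_if_sketch and the role of i2 as the fallback input are assumptions, loosely modelled on the i0/i1/i2 signals in the experiments that follow:

```verilog
// Hypothetical latch-free version of an incomplete if.
module comp_if_sketch (
  input  wire i0,   // acts as the select line
  input  wire i1,
  input  wire i2,
  output reg  y
);
  always @(*) begin
    y = i2;         // default: y is assigned on every path
    if (i0)
      y = i1;       // overrides the default when i0 is high
  end
endmodule
```

This is equivalent to if (i0) y = i1; else y = i2; and synthesizes to a mux rather than a D latch.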
A case statement is also implemented in hardware as a multiplexer. Similar to if statements, case statements are used inside an always block, and the assigned variable should be a reg.
reg y;
always @ (*)
begin
case(sel)
2'b00:begin
....
end
2'b01:begin
....
end
.
.
.
endcase
end
- An incomplete case structure is dangerous because it may lead to inferred latches. To avoid inferred latches, code the case with a default branch, which executes when none of the listed conditions is met.
reg y;
always @ (*)
begin
case(sel)
2'b00:begin
....
end
2'b01:begin
....
end
.
.
default:begin
....
end
endcase
end
- Partial assignments in case statements (not specifying the value of every output in every branch) also create inferred latches. To avoid them, assign all the outputs in all the segments of the case statement.
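A sketch of this rule (the module partial_case_fixed and its ports are hypothetical): default values are assigned to both outputs before the case statement, so the leg that does not mention x still leaves x defined. The value 1'b0 is an assumed safe value, not one taken from the workshop:

```verilog
// Hypothetical case block where every output is assigned on every path.
module partial_case_fixed (
  input  wire       i0, i1, i2,
  input  wire [1:0] sel,
  output reg        x,
  output reg        y
);
  always @(*) begin
    // Defaults first: any leg that skips an output keeps these values.
    x = 1'b0;   // assumed safe value
    y = 1'b0;
    case (sel)
      2'b00: begin
        y = i0;
        x = i2;
      end
      2'b01: y = i1;   // x keeps its default here, so no latch
      default: begin
        y = i2;
        x = i1;
      end
    endcase
  end
endmodule
```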
If - Else If - Else If - Else Structure | Case Structure |
---|---|
Follows the concept of priority. | No priority among the segments of a case structure. |
Only one segment of the code executes, as the conditions are checked sequentially from top to bottom. | Bad case structures with overlapping items may give unpredictable outputs, since more than one segment can match the same value. Thus, case items should not overlap. |
//Steps Followed for all the experiments:
//opening the file
$ ls *incomp*
$ gvim *incomp* -o
//PERFORMING SIMULATION
//Load the design in iVerilog by giving the verilog and testbench file names
$ iverilog incomp_if.v tb_incomp_if.v
//To dump the VCD file
$ ./a.out
//To load the VCD file in GTKwaveform
$ gtkwave tb_incomp_if.vcd
//PERFORMING SYNTHESIS
//Invoke Yosys
$ yosys
//Read library
$ read_liberty -lib ../my_lib/lib/sky130_fd_sc_hd__tt_025C_1v80.lib
//Read Design
$ read_verilog incomp_if.v
//Synthesize Design - this controls which module to synthesize
$ synth -top incomp_if
//Generate Netlist
$ abc -liberty ../my_lib/lib/sky130_fd_sc_hd__tt_025C_1v80.lib
//Realizing Graphical Version of Logic for single modules
$ show
//To write the netlist
$ write_verilog -noattr incomp_if_net.v
//PERFORMING GLS
//Opening Verilog Models, Netlist and Test Bench
$ iverilog ../my_lib/verilog_model/primitives.v ../my_lib/verilog_model/sky130_fd_sc_hd.v incomp_if_net.v tb_incomp_if.v
//To dump the VCD file
$ ./a.out
//To load the VCD file in GTKwaveform
$ gtkwave tb_incomp_if.vcd
Screenshot: Verilog file
Expected Behavior: Else case is missing so there will be a D latch.
Screenshot: Simulation Output
Observation: When i0 (select line) is low, the output latches to a constant value. Presence of inferred latches due to incomplete if structure.
Screenshot: Synthesis Statistics Report
Screenshot: Synthesis Output
Note: The synthesized design has a D Latch inferred due to incomplete if structure (missing else statement).
Screenshot: Verilog file
Expected Behavior: Else case is missing so there will be a latch.
Screenshot: Simulation Output
Observation: When i0 is high, the output follows i1. When i0 is low, the output latches to a constant value (when both i0 and i2 are 0). Presence of inferred latches due to incomplete if structure.
Screenshot: Synthesis Statistics Report
Screenshot: Synthesis Output
Note: The synthesized design has a D Latch inferred due to incomplete if structure (missing else statement).
Screenshot: Verilog file
Expected Behavior: There is an incomplete case structure, so a latch is expected.
Screenshot: Simulation Output
Observation: When the select signal is 00, the output follows i0, and when it is 01, the output follows i1. Since the output is undefined for select values 10 and 11, the output latches to the previously available value.
Screenshot: Synthesis Statistics Report
Screenshot: Synthesis Output
Note: The synthesized design has a D Latch inferred due to incomplete case structure (missing output definition for 2 of the select statements).
Screenshot: Verilog file
Expected Behavior: There is an incomplete case structure but with a default condition, so a latch is not expected.
Observation: When the select signal is 00, the output follows i0, and when it is 01, the output follows i1. For select values 10 and 11, the default branch sets the output to i2. The output does not latch, giving a proper combinational circuit.
Screenshot: Synthesis Statistics Report
Note: The synthesized design has combinational logic without latch due to the presence of default case statement.
Screenshot: Verilog file
Expected Behavior: There is a partial case structure in which output x is undefined for one of the select values, so a latch is expected for x.
Observation: The mux for output y will not have a latch, while the mux for output x will have a latch, as one of its conditions is not defined.
Select | Output y | Output x |
---|---|---|
00 | i0 | i2 |
01 | i1 | Latch |
10 | i2 | i1 |
11 | i2 | i1 |
Screenshot: Synthesis Statistics Report
Screenshot: Synthesis Output
Note: The synthesized design has a latch due to the partial case assignment for output x. Even though a default condition is written, latches can still be inferred.
Screenshot: Verilog file
Expected Behavior: Although the case structure is not complete, its legs overlap when the select input is 10 or 11, because ? indicates that the bit can be either 0 or 1. Thus, the simulator may be confused.
Screenshot: Simulation Output
Observation: Here, when the select input is 11, the output latches to its previous value.
Screenshot: Synthesis Statistics Report
Screenshot: Synthesis Output
Note: It can be inferred that there is no Latch in the synthesized netlist as the case structure is complete (no presence of inferred latches)
Screenshot: GLS Output
Observation: No latch is observed in the GLS output; the synthesis tool is not confused by the overlap. Hence there is a synthesis-simulation mismatch due to overlapping legs in the code. Care must be taken to address each leg individually without any overlap (mutually exclusive legs).
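A sketch of a case structure with mutually exclusive legs is shown below (good_case_sketch and its ports are hypothetical). Each select value matches exactly one leg, so simulation and synthesis interpret it the same way:

```verilog
// Hypothetical 4:1 mux whose case items do not overlap.
module good_case_sketch (
  input  wire       i0, i1, i2, i3,
  input  wire [1:0] sel,
  output reg        y
);
  always @(*) begin
    case (sel)
      2'b00:   y = i0;
      2'b01:   y = i1;
      2'b10:   y = i2;
      2'b11:   y = i3;    // each 0/1 select value matches exactly one leg
      default: y = 1'bx;  // reached only if sel holds X or Z in simulation
    endcase
  end
endmodule
```

The default branch is reached only when sel contains X or Z during simulation; most synthesis tools treat the 1'bx assigned there as a don't-care.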
FOR STATEMENTS | GENERATE STATEMENTS |
---|---|
These statements are used inside the always block | These statements are used outside the always block |
Used for evaluating expressions | Used for instantiating/replicating hardware |
Screenshot: Verilog file
Expected Behavior: The structure is complete and expected to behave as a 4x1 multiplexer
Screenshot: Simulation Output
Observation: The design behaves as a 4x1 multiplexer, as given below:
Select | Output y |
---|---|
00 | follows i0 |
01 | follows i1 |
10 | follows i2 |
11 | follows i3 |
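A sketch of how such a 4x1 mux can be written with a for loop inside an always block follows. The vector port i[3:0] standing in for separate i0..i3 inputs and the module name mux_for_sketch are assumptions, not the workshop's exact code:

```verilog
// Hypothetical 4:1 mux built with a for loop inside an always block.
module mux_for_sketch (
  input  wire [3:0] i,    // i[0]..i[3] stand in for i0..i3
  input  wire [1:0] sel,
  output reg        y
);
  integer k;
  always @(*) begin
    y = 1'b0;                       // default avoids a latch
    for (k = 0; k < 4; k = k + 1)
      if (k == sel)
        y = i[k];                   // only the selected input reaches y
  end
endmodule
```

The loop is unrolled by the synthesizer into plain mux logic; the default assignment before it ensures y is assigned on every path.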
Screenshot: Verilog file
Expected Behavior: All the outputs are initialised to 0 to avoid inferring latches. Depending on the select line, the input is routed to one of the outputs.
Screenshot: Simulation Output
Observation: The design behaves as a 1x8 demultiplexer, as given below:
Select | Output that follows i |
---|---|
000 | o0 |
001 | o1 |
010 | o2 |
011 | o3 |
100 | o4 |
101 | o5 |
110 | o6 |
111 | o7 |
Screenshot: Synthesis Statistics Report
Screenshot: Synthesis Output
Note: It can be inferred that there is no Latch in the synthesized netlist as the case structure is complete (no presence of inferred latches)
Screenshot: GLS Output
Observation: The waveforms observed in simulation and after synthesis match and confirm the code functionality.
Note: The demux written with generate if statements functions the same as the demux written with a case statement. However, the advantage of the generate version is that the number of lines of code remains almost the same even as the demultiplexer grows.
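A sketch of the compact style is shown below, using a for loop inside an always block (the module name demux_for_sketch and the vector output o[7:0] standing in for o0..o7 are assumptions). Widening the demux only changes the loop bound and vector widths, not the number of lines:

```verilog
// Hypothetical 1:8 demux written with a for loop.
module demux_for_sketch (
  input  wire       i,
  input  wire [2:0] sel,
  output reg  [7:0] o     // o[0]..o[7] stand in for o0..o7
);
  integer k;
  always @(*) begin
    o = 8'b0;                      // all outputs default to 0: no latches
    for (k = 0; k < 8; k = k + 1)
      if (k == sel)
        o[k] = i;                  // route i to the selected output only
  end
endmodule
```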
Screenshot: Verilog file
Screenshot: Simulation Output
Screenshot: Synthesis Statistics Report
Screenshot: Synthesis Output
Screenshot: GLS Output
Observation: The waveforms observed in simulation and after synthesis match and confirm the code functionality.
Note: The full adder is instantiated in a loop to replicate the hardware.
Screenshot: Verilog file
Note: If both inputs are n bits, the output is n+1 bits. Since the full adder is defined in a separate file, its definition must also be provided to the tools. No always block is used, and the loop variable is a genvar instead of an integer.
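A sketch of the ripple-carry adder with a generate for loop is given below. The module names fa and rca follow the synthesis screenshots, but the port names, the default width N = 8 and the in-line full adder definition are assumptions; in the workshop the full adder lives in a separate file:

```verilog
// One-bit full adder (port names assumed).
module fa (
  input  wire a, b, cin,
  output wire sum, cout
);
  assign {cout, sum} = a + b + cin;   // 2-bit result: carry and sum
endmodule

// N-bit ripple-carry adder: generate-for replicates the full adder.
module rca #(parameter N = 8) (
  input  wire [N-1:0] num1,
  input  wire [N-1:0] num2,
  output wire [N:0]   sum      // n-bit inputs give an (n+1)-bit sum
);
  wire [N:0] carry;
  assign carry[0] = 1'b0;      // no carry into bit 0

  genvar g;                    // generate loops use genvar, not integer
  generate
    for (g = 0; g < N; g = g + 1) begin : fa_chain
      fa u_fa (
        .a(num1[g]), .b(num2[g]), .cin(carry[g]),
        .sum(sum[g]), .cout(carry[g+1])
      );
    end
  endgenerate

  assign sum[N] = carry[N];    // final carry becomes the MSB
endmodule
```

The genvar g exists only during elaboration, when the loop is unrolled into N full adder instances named fa_chain[0].u_fa, fa_chain[1].u_fa, and so on.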
Screenshot: Simulation Output
Screenshot: Synthesis Statistics Report
Screenshot: Synthesis Output - rca
Note: read_verilog for the rca file is performed before read_verilog for the fa file.
Screenshot: Synthesis Output - fa
Screenshot: Commands for performing GLS
Screenshot: GLS Output
Observation: The waveforms observed in simulation and after synthesis match and confirm the code functionality.