This artifact includes:
- the code for our rtl-repair tool in:
rtlrepair
andsynth
- repair benchmarks from the CirFix paper:
benchmarks/cirfix
- the CirFix tool:
cirfix
- repair benchmarks derived from reproducible bugs from "Debugging in the Brave New World of Reconfigurable Hardware" (ASPLOS'22):
benchmarks/fpga-debugging
- benchmarking scripts
scripts
- a small rtl-repair demo:
demo
To quickly test out that the artifact works on your machine we recommend going through the following steps which you should be able to complete in around 30min:
- Installation (~15min)
- Artifact Setup (~2min)
- OSDD Measurements (~4min)
- Default RTL Repair Evaluation (run only the first command ~7min)
- Demo (Optional) (play with the tool, min. 2min)
Please make sure that you have Python in version 3.10 or newer installed. We have tested all scripts with Python 3.10.6
.
Please make sure you have a recent version of Rust installed. When in doubt, please update your version to the latest (most commonly through rustup update
).
Download OSS CAD Suite version 2022-06-22
from github and put the binaries on your path.
Note: newer versions of the OSS CAD Suite include Verilator 5. Unfortunately, the current implementation of RTL-Repair only works with Verilator 4.
Check to make sure that you have all tools that we require on your path in the correct version:
$ verilator --version
Verilator 4.225 devel rev v4.224-91-g0eeb40b9
$ bitwuzla --version
1.0-prerelease
$ iverilog -v
Icarus Verilog version 12.0 (devel) (s20150603-1556-g542da1166)
$ yosys -version
Yosys 0.18+29
To perform a full evaluation, you need access to the commercial VCS simulator from Synopsys. The setup for VCS may depend on the kind of licensing you use. Please contact your local computer support person for guidance.
While RTL-Repair works perfectly well without VCS, there are two reasons why VCS is required to perform the evaluation:
- The CirFix tool that we compare against only works well with VCS. We did try to make it work with
iverilog
, however, the results were much worse. - Some of the benchmarks provided by CirFix only execute correctly on VCS. Unfortunately their tests or their designs were written in such a manner that they won't work on
iverilog
.
Thus we require VCS to evaluate CirFix, to execute benchmarks when we check repairs and to execute benchmarks in order to calculate the OSDD (output state divergence delta).
This step should take around 2 min.
First we need to create a virtual environment in the root folder of the repository. This virtual environment needs to be activated in all later steps.
python3 -m venv venv
source venv/bin/activate
pip install -r requirements.txt
We also need to build the synthesizer binary once.
cd synth
cargo build --release
cd ..
We implemented a VCD comparison for our OSDD measurement in rust as well:
cd scripts/osdd
cargo build --release
cd ../..
This step should take around 4 min.
To measure the output / state divergence delta (OSDD) we need to first generate VCD traces of all ground truth and buggy designs. To do so, run the following:
./scripts/generate_vcd_traces.py --sim=vcs --timeout=25 --verbose vcd-traces
./scripts/calc_osdd.py --working-dir=vcd-traces
# delete VCD files after analysis to save disk space
rm vcd-traces/*.vcd
This step should take around 14 h (with 8 threads on a 16-core CPU).
We added a script which automatically runs CirFix on all benchmarks used in the original evaluation. This script can also parallelize the execution which speeds up the evaluation. Please use a conservative number of threads in order not to disadvantage CirFix. Half the number of physical cores on your machine is a good starting point, i.e., if your machine has 8 physical cores, use 4 threads.
From the root folder please run the following (where $N
is the number of threads):
./cirfix/run.py --working-dir=cirfix-repairs --clear --experiment=cirfix-paper --simulator=vcs --threads=$N
This step should take around 40 min.
We run RTL-Repair on the CirFix benchmarks in three different configurations as well as on benchmarks from a paper on debugging FPGA hardware designs:
# takes ~7 min
./scripts/run_rtl_repair_experiment.py --working-dir=rtl-repair-default --clear --experiment=default
# expected final output line: defaultdict(<class 'int'>, {'success': 17, 'cannot-repair': 10, 'no-repair': 1, 'timeout': 4})
# takes ~14 min
./scripts/run_rtl_repair_experiment.py --working-dir=rtl-repair-all-templates --clear --experiment=all-templates
# expected final output line: defaultdict(<class 'int'>, {'success': 17, 'cannot-repair': 14, 'no-repair': 1})
# takes ~10 min
./scripts/run_rtl_repair_experiment.py --working-dir=rtl-repair-basic-synth --clear --experiment=basic-synth
# expected final output line: defaultdict(<class 'int'>, {'success': 16, 'cannot-repair': 8, 'no-repair': 1, 'timeout': 7})
# NOTE: if you are on a faster machine, you might experience fewer timeouts!
# takes ~4 min
./scripts/run_rtl_repair_experiment.py --working-dir=rtl-repair-fpga --clear --experiment=fpga
# expected final output line: defaultdict(<class 'int'>, {'success': 9, 'timeout': 3, 'cannot-repair': 1})
# NOTE: if you are on a faster machine, you might experience fewer timeouts!
Note 1: there is no parallelism built into the run script. However, feel free to run all four experiments in parallel, which is safe since each has its own independent working-dir
.
Note 2: error messages from failing repair attempts are expected since the run script does not intercept stderr
.
Note 3: the all-templates
and the basic-synth
configuration are used in the ablation study in Table 5.
Have a look at some repair results. Like:
# machine readable results file for `sha3_s1` benchmark
cat rtl-repair-default/sha3_padder_ssscrazy_buggy1_oracle-full/result.toml
# diff of ground truth and buggy Verilog file
cat rtl-repair-default/sha3_padder_ssscrazy_buggy1_oracle-full/bug_diff.txt
# diff of buggy and repaired Verilog file
cat rtl-repair-default/sha3_padder_ssscrazy_buggy1_oracle-full/padder_ssscrazy_buggy1.repaired.0.diff.txt
This step should take around 3 min.
As mentioned in our paper "We had to manually remove a tri-state bus and an asynchronous reset for two benchmarks as these constructs are not supported by RTL-Repair.". Thus, for the i2c
and sdram
benchmarks, RTL-Repair will work with a version of the benchmark where the tri-state bus or the asynchronous reset was manually removed. Unfortunately this means that the repair has to manually be ported to the original design.
To simplify this process we rely on the fact that repairs generated by RTL-Repair are always deterministic. We thus copied all repairs we expect you to get into the results/manual-port
directory and also added a port of the repair to the original Verilog code. Now all you need to do to add these manual repairs is to run a script on your results like this:
./scripts/add_manually_ported_repairs.py --working-dir=rtl-repair-default
./scripts/add_manually_ported_repairs.py --working-dir=rtl-repair-all-templates
./scripts/add_manually_ported_repairs.py --working-dir=rtl-repair-basic-synth
This script will look through all rtl-repair results and if it finds one for a benchmark that has been ported it:
- first checks to make sure that the repaired Verilog code matches the repair that we manually ported 100%
- if it matches, the manual port of the repair to the original Verilog (with the async-reset / tri-state bus) is copied to the results directory
- a
manual=
entry is added to theresult.toml
to indicate that there is a manual overwrite
This step should take around 130 min (~40min if you can run the scripts in parallel).
We provide a script that performs all the correctness tests listed in Table 4. We run that on all results from the CirFix benchmark set.
# takes ~40 min
./scripts/check_repairs.py --results=cirfix-repairs --working-dir=cirfix-repairs-check --sim=vcs
# takes ~30 min
./scripts/check_repairs.py --results=rtl-repair-default --working-dir=rtl-repair-default-check --sim=vcs
# takes ~30 min
./scripts/check_repairs.py --results=rtl-repair-all-templates --working-dir=rtl-repair-all-templates-check --sim=vcs
# takes ~30 min
./scripts/check_repairs.py --results=rtl-repair-basic-synth --working-dir=rtl-repair-basic-synth-check --sim=vcs
Note 1: there is no parallelism built into the script. However, feel free to run all invocations in parallel, which is safe since each has its own independent working-dir
. There are also no performance numbers taken in this step, so don't worry about interference from other tasks.
Note 2: the checks do not work with the FPGA benchmark set because they rely on Verilog testbenches, whereas the FPGA benchmarks come with C++ testbenches for the Verilator simulator.
This step should take around 2 min.
We provide a script that generates LaTex versions of Tables 1, 2, 4 and 5 from the data we previously collected.
./scripts/create_tables.py --working-dir=tables \
--osdd-toml=vcd-traces/osdd.toml \
--baseline-toml=vcd-traces/baseline_results.toml \
--cirfix-result-dir=cirfix-repairs-check \
--rtlrepair-result-dir=rtl-repair-default-check \
--rtlrepair-all-templates-result-dir=rtl-repair-all-templates-check \
--rtlrepair-basic-synth-result-dir=rtl-repair-basic-synth-check
Now we want to compare the results from the tables generated in the previous step to the results reported in the paper.
The performance numbers should roughly match the ones reported in the paper. Some variation is expected because the reproduction is run on a different machine. Different versions of VCS or different VCS license setups might also affect results.
cat tables/performance_statistics_table.txt
Note: If your experiments ran on a faster machine, you might see fewer timeouts.
cat tables/osdd_table.txt
cat tables/correctness_table.txt
cat tables/ablation_table.txt
Note: If your experiments ran on a faster machine, you might see fewer timeouts.
Unfortunately this table was created manually. To check the results, have a look at the result.toml
in your rtl-repair-fpga
folder.
This step takes around 2 min. Longer, if you would like to play some more with the tool.
The artifact contains a small demo to explore the repair capabilities of RTL-Repair.
In the demo
folder you find:
- a simple Verilog design from the CirFix benchmarks:
project/first_counter.v
- a simple testbench in the CSV format:
project/tb.csv
- a script to execute the testbench with
iverilog
:./run_tb.py
- a script to execute a repair with rtl-repair:
./do_repair.py
To get started, try changing the reset condition in project/first_counter.v
:
- if(reset == 1'b1) begin
+ if(reset == 1'b0) begin
Then run ./do_repair.py
which should output:
success (0.32s)
0) Possible repair with 1 changes:
14c14
< if(reset == 1'b0) begin
---
> if(reset == 1'b1) begin
================================================================================