FPGA Design Fundas 0.3: The Real Cost of Skipping Simulation Verification!

Many companies developing FPGA-based electronic products choose to carry out inadequate or almost no simulation verification, resorting instead to ad-hoc lab testing for debugging, verification and integration testing. Some commonly given reasons for this attitude towards simulation are: “there is no budget allocated in the project for simulation verification”, “there is no time left for simulation in the project”, “the business is willing to take the risk of not carrying out simulation verification”, etc.

In this article, the term “firmware development” is used for FPGA development, and “lab testing” refers to unstructured, undocumented, unplanned and infrequent ad-hoc hardware testing of some or all functionality of the firmware under development, i.e. quick-and-dirty prototyping, proof-of-concept demonstrators, etc. The terms ChipScope and ILA are used interchangeably.

This article assumes that the reader is familiar with industry-standard FPGA design methodology and has read our previous article on FPGA Simulation Verification & Integration Testing. Furthermore, it is assumed that the reader understands what it means to synthesise a VHDL/Verilog RTL statement such as C = A&B through a synthesis tool, in contrast to compiling it with a ‘C’ compiler to create executable code for a processor. If not, then please read my post fpga-fundas-0-vhdl-is-not-a-programming-language on the same.
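To make that distinction concrete, here is a minimal, illustrative sketch (the entity name and ports are invented for this example, not taken from any particular design). The VHDL below describes a piece of hardware: an AND gate permanently wired between ports A, B and C, operating continuously and concurrently once the bitstream is loaded. The similar-looking C statement c = a & b; is instead an instruction executed once, in sequence, each time the processor reaches that line.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Synthesis infers a single AND gate driving port C at all times;
-- this is a description of hardware, not a step in a program.
entity and_example is
  port (
    A : in  std_logic;
    B : in  std_logic;
    C : out std_logic
  );
end entity and_example;

architecture rtl of and_example is
begin
  C <= A and B;  -- concurrent signal assignment: the gate is always "running"
end architecture rtl;
```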

Pitfalls of Skipping Simulation Verification

Internal Logic Analyser (ILA, or ChipScope) tools are excellent for debugging a corner-case bug or an issue which can’t be recreated in simulation. For example, a bug due to metastability in the design generally can’t be recreated in simulation. ILA is an excellent tool to help debug such difficult bugs.

However, such tools provide limited visibility, and only if there is adequate space inside the chip to accommodate the ILA cores, which may not always be possible in a densely packed design. In that case, without the ILA, any causes identified for the bugs or incorrect behaviour observed during lab testing are merely speculative (i.e. guesswork). Additionally, many companies, for cost savings, deliberately select just the right-sized or a smaller FPGA device at the outset of a project, without factoring in the additional logic needed to accommodate debugging IP cores.

Furthermore, the introduction of such debugging logic blocks does impact the timing behaviour of the signals marked for debug, and this impact can further conceal timing-violation bugs. For example, if the signals are not adequately synchronised or are crossing clock boundaries within the design, the debugging logic could either exacerbate the timing and cause a signal to go metastable, or hide potential metastability by letting the signal value resolve in time before the clock edge.
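As context for the “adequately synchronised” remark above, the usual protection for a single-bit signal crossing clock domains is a two-flip-flop synchroniser; the following is a generic, minimal sketch (entity and signal names are illustrative), not code from any particular design. Probing the asynchronous input directly with an ILA clocked in the destination domain would bypass exactly this protection.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Minimal two-flop synchroniser for a single-bit clock-domain crossing.
entity sync_2ff is
  port (
    clk_dst  : in  std_logic;  -- destination clock domain
    async_in : in  std_logic;  -- signal arriving from another clock domain
    sync_out : out std_logic   -- safe to use in the clk_dst domain
  );
end entity sync_2ff;

architecture rtl of sync_2ff is
  signal meta_ff, stable_ff : std_logic := '0';
begin
  process (clk_dst)
  begin
    if rising_edge(clk_dst) then
      meta_ff   <= async_in;  -- first flop may go metastable
      stable_ff <= meta_ff;   -- second flop gets a full clock period to resolve
    end if;
  end process;

  sync_out <= stable_ff;
end architecture rtl;
```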

FPGA designers are required to make an educated guess to identify the signals needed to debug an issue using the ILA. In this flow, the signals can only be marked for debug after the design has been synthesised. After synthesis, many signals get optimised out or no longer carry the names declared in the VHDL code, which makes identifying the relevant signals even more difficult and time consuming: the designer ends up tracing through the synthesised netlist in a netlist viewer to find the correct optimised signal name. Once the signals are marked for debug, the design has to be implemented, which can take a further 15 minutes to half a day depending on the design size, and the added ILA logic slows the implementation down by another 2x to 5x. If the designer has guessed the signals incorrectly, or more signals are required to debug the incorrect behaviour, the same synthesise, mark-debug, implement cycle has to be repeated. The total time required to verify the design in the actual chip instead of in simulation is then simply the time to build the design for debugging multiplied by the number of such cycles. Even this estimate does not factor in any issues with the components and peripherals on the board, which may result in more time spent isolating an issue between the firmware and the board.
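For reference, the “mark debug” step can be done by picking nets from the synthesised netlist or, in the Vivado flow, by tagging a signal in the RTL with the mark_debug synthesis attribute, as in the sketch below (the entity and signal names are invented for illustration). Note that the RTL route still relies on the same up-front guess about which signals will matter, and every change to the list forces a fresh synthesis and implementation run.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

entity debug_tag_example is
  port (
    clk     : in  std_logic;
    wr_en   : in  std_logic;
    wr_full : out std_logic
  );
end entity debug_tag_example;

architecture rtl of debug_tag_example is
  signal wr_count : unsigned(7 downto 0) := (others => '0');

  -- Vivado synthesis attribute: keep this signal visible for the ILA so that
  -- it is not optimised away or renamed beyond recognition during synthesis.
  attribute mark_debug : string;
  attribute mark_debug of wr_count : signal is "true";
begin
  process (clk)
  begin
    if rising_edge(clk) then
      if wr_en = '1' then
        wr_count <= wr_count + 1;
      end if;
    end if;
  end process;

  wr_full <= '1' when wr_count = 255 else '0';
end architecture rtl;
```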

So, what are the chances of the synthesised design preserving the signals of interest for a particular issue, and of the designer guessing, capturing and triggering the right signals to isolate and visualise the issue through the ILA? Do you want to leave your design to chance or luck in this manner? Has engineering come down purely to luck?

Even after all of the above, lab testing does not provide any means to quantify the testing effort. The two main metrics used to quantify any software/firmware testing/verification effort are functional coverage and code coverage. Also, what kind of evidence-based confidence can the designer provide?

In the case of lab testing, functional coverage can only be gathered manually, by means of a tick-box spreadsheet of functional requirements against tests conducted. This may not even be possible if the project does not have a set of formal written requirements. It is obvious that code coverage cannot be generated in a lab testing exercise, as there is no visibility of which parts of the code the tests actually exercised. At the same time, both of these metrics can be reliably generated in simulation verification if the testbenches are constructed following the industry-standard Metric Driven Verification (MDV) methodology.
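As a purely hypothetical illustration (the requirement IDs, tests and dates are invented), such a manual tick-box spreadsheet might look like this:

```
  Requirement     Lab test conducted               Date    Result
  REQ-UART-001    Loopback at 115200 baud          12/03   Pass
  REQ-UART-002    Parity error injection           -       Not run
  REQ-DMA-003     Burst transfer to external DDR   14/03   Fail
```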

Back of the Envelope Calculations of the Effort Spent in Lab Testing

In many companies, decision makers are opting to perform RTL testing directly in hardware without any simulation verification. This may be perhaps just to observe some blinkies to use as a measure of progress, whilst lacking any appreciation of the time spent in endless cycles of wasted effort in:

RTL->Synthesis->Guessing->ChipScope/ILA->Implementation->Ad-hoc Testing in Hardware->More Guessing->Repeat!!!

A simulation test bench does not have to be a fully automatic, super-duper SystemVerilog UVM one. Even a simple test bench without any sophistication, or even without a reference model, will simulate the design faster than the above cycle while providing 100% visibility of the DUT signals upfront, i.e. without requiring synthesis (a minimal sketch of such a test bench is given below). Eyeballing waveforms from a simulation test bench is much more time efficient than eyeballing ILA/ChipScope waveforms. This is because:

Time Spent in ChipScope/ILA cycles >>>>> Time Spent in Developing a Simple Test bench
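As a minimal sketch of what “a simple test bench without any sophistication” can look like, the following drives the small AND-gate example from earlier in this article (so the entity name is an assumption carried over from that sketch), checks the output with plain assertions and reports the result. Every internal DUT signal is visible in the simulator’s waveform viewer immediately, with no synthesis or implementation run.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

entity tb_and_example is
end entity tb_and_example;

architecture sim of tb_and_example is
  signal A, B, C : std_logic := '0';
begin
  -- Device under test: the AND-gate entity sketched earlier in this article
  dut : entity work.and_example
    port map (A => A, B => B, C => C);

  stimulus : process
  begin
    A <= '0'; B <= '0'; wait for 10 ns;
    assert C = '0' report "FAIL: 0 and 0 should be 0" severity error;

    A <= '1'; B <= '0'; wait for 10 ns;
    assert C = '0' report "FAIL: 1 and 0 should be 0" severity error;

    A <= '1'; B <= '1'; wait for 10 ns;
    assert C = '1' report "FAIL: 1 and 1 should be 1" severity error;

    report "Test bench finished" severity note;
    wait;  -- halt the simulation
  end process stimulus;
end architecture sim;
```

Even a crude test bench like this exercises the design in seconds and can be re-run after every code change, which is exactly what the ILA cycle above cannot offer.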

Let us do an exercise of roughly estimating the effort required to verify an FPGA design using only the ILA, without any simulation verification, in order to support the above.

Let’s examine the claim: the project has run out of budget, so there is no time left to build any testbenches and run simulation; however, for the reasons outlined above, the engineer is allowed to perform debugging and testing using the ILA, i.e. lab testing, to create the finished product and complete the development.

The variables that can impact the effort estimate are the following (not a comprehensive list):

  1. Designer’s experience and skill level in VHDL design for synthesis, FPGA technology, general digital logic debugging skills and debugging using ILA.
  2. Design complexity, size and occupied area in the chip. Amount of space left for ILA logic. This dictates how many signals can be marked debug and whether ILA can be accommodated or not.
  3. Availability of required signals after synthesis, i.e. signals not optimised away during synthesis.
  4. Availability of identifiable signals after synthesis, i.e. whether the required signal names have been changed during synthesis.
  5. Knowledge of the design functionality and implementation. If the designer has implemented the design from scratch, he/she may have substantial design knowledge, whereas if the implementation includes a partial or full legacy design, he/she may not know the full internal workings of the design.
  6. The prior verification/testing, and thus the stability, of the legacy part of the design. While reusing legacy code, one cannot assume that it is going to be bug-free, and debugging it requires knowledge of the legacy design.
  7. State of the design implementation: stable or unstable, readability of the code, code/design partitioning, implicit vs explicit coding style, and whether the design follows design-for-synthesis rules.
  8. Number of bugs or issues encountered.

Let’s say the designer has encountered an issue in the FPGA design. Let’s assume that the design fits and occupies 70-80% of the area in the FPGA, as the project is near its end; the consequence is that it would be quite difficult to fit an ILA with many of the required signals. Let’s assume that the design is of small/medium size, large enough to need an Artix-7 type device with 50K to 75K logic cells. Let’s assume a working day has 7.5 hours, a working week has 37.5 hours, a working month has 150 hours, and a year has roughly 260 working days without holidays and 240 with holidays.

  1. The designer first synthesises the design – 30 minutes to 2 hours spent.
  2. The designer then looks for the required signals and marks them for debug; some are found, but some have to be traced and identified in the synthesised netlist – easily 1 to 3 hours spent.
  3. The designer then implements the design with the ILA and waits – easily 1 to 2 hours spent. Many times this can fail due to insufficient logic resources, failed packing, failed routing, failed timing, etc., and the implementation time is increased by the extra ILA logic. Let’s say the designer has to cycle this 2 times, so another 2 to 4 hours spent.
  4. The designer then adds trigger conditions and performs the visual inspection, only to find that some signals are missing: the captured signals are not the right ones, or not enough to pinpoint the problem, so steps 2 and 3 are performed again. Let’s assume the designer has to cycle this another 2 times, so another 6 to 14 hours spent.
  5. The designer then realises that the bug cannot be recreated, as there is no direct way to stimulate the design to make the bug happen, so some time is spent fixing the stimulus. If the design is changed for any reason, the entire cycle from step 1 has to be repeated. Let’s say the designer has to cycle this 2 times, so another 19 to 46 hours spent. That is roughly 2.5 days to 1.5 weeks spent on a single bug, conservatively speaking. Is it guaranteed that the time spent will always be the best case of 2.5 days per issue? Of course not, as this depends on how lucky the designer is.
  6. Let’s say the designer comes across 10 such small and big issues. The total time spent on this activity then becomes 190 to 460 hours, that is 5 weeks to 3 months if we clump the time together (see the tally after this list). The total time may not feel like much while the designer is performing the debugging and testing, as each of the above steps takes time in a distributed manner.
  7. Surely, an experienced designer can come up with a decent enough testbench in 3 months for a small or medium sized FPGA design and even perform simulation verification. A simulation testbench provides manyfold returns: first by verifying the entire design as much as possible, and then, whenever there is any issue, the 100% visibility of signals and the direct, easier way of recreating bugs in simulation reduce the debugging effort drastically.
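Tallying the same figures from the steps above in one place (using the 7.5-hour day, 37.5-hour week and 150-hour month assumed earlier):

```
  Step 1  Synthesis                              0.5 – 2 hours
  Step 2  Identify and mark debug signals        1 – 3 hours
  Step 3  Implement with ILA (2 attempts)        2 – 4 hours
  Step 4  Repeat steps 2–3 twice more            6 – 14 hours
          One complete debug pass                9.5 – 23 hours
  Step 5  Two passes per issue                   19 – 46 hours   (roughly 2.5 days to 1.5 weeks)
  Step 6  10 such issues                         190 – 460 hours (roughly 5 weeks to 3 months)
```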

Conclusion

Many managers and leaders argue that by skipping testbenching and simulation they are saving time; it is difficult to see how that is the case, no matter what stage the project is in. If the product is shipped with hidden bugs, chances are it is going to come back later, and more time will have to be invested in fixing the bugs in future, made harder by factors such as lost project knowledge, at the additional cost of damage to reputation.

References

Some of the references include:

  1. Section 3.8 of the NASA/ESA “Lessons Learned from FPGA Developments”, a report by Gaisler Research:
    http://microelectronics.esa.int/asic/fpga_001_01-0-2.pdf
  2. Section 3 of “Best FPGA Development Practices”:
    http://www.irtc-hq.com/wp-content/uploads/2015/04/Best-FPGA-Development-Practices-2014-02-20.pdf