Verilog Races

In Verilog certain type of assignments or expression are scheduled for execution at the same time and order of their execution is not guaranteed. This means they could be executed in any order and the order could be change from time to time. This non-determinism is called the race condition in Verilog.

For the purpose of refreshing your memory here is the Verilog execution order again, which we had discussed in a prior post.

Figure : Verilog execution order.

If you look at the active event queue, it has multiple types of statements and commands with equal priority, which means they all are scheduled to be executed together in any random order, which leads to many of the races..

Lets look at some of the common race conditions that one may encounter.

1) Read-Write or Write-Read race condition.

Take the following example :

always @(posedge clk)
x = 2;

always @(posedge clk)
y = x;

Both assignments have same sensitivity ( posedge clk ), which means when clock rises, both will be scheduled to get executed at the same time. Either first ‘x’ could be assigned value ‘2’ and then ‘y’ could be assigned ‘x’, in which case ‘y’ would end up with value ‘2’. Or it could be other way around, ‘y’ could be assigned value of ‘x’ first, which could be something other than ‘2’ and then ‘x’ is assigned value of ‘2’. So depending on the order final value of ‘y’ could be different.

How can you avoid this race ? It depends on what your intention is. If you wanted to have a specific order, put both of the statements in that order within a ‘begin’…’end’ block inside a single ‘always’ block. Let’s say you wanted ‘x’ value to be updated first and then ‘y’ you can do following. Remember blocking assignments within a ‘begin’ .. ‘end’ block are executed in the order they appear.

always @(posedge clk)
x = 2;
y = x;

2) Write-Write race condition.

always @(posedge clk)
x = 2;

always @(posedge clk)
x = 9;

Here again both blocking assignments have same sensitivity, which means they both get scheduled to be executed at the same time in ‘active event’ queue, in any order. Depending on the order you could get final value of ‘x’ to be either ‘2’ or ‘9’. If you wanted a specific order, you can follow the example in previous race condition.

3) Race condition arising from a ‘fork’…’join’ block.

always @(posedge clk)
x = 2;
y = x;

Unlike ‘begin’…’end’ block where expressions are executed in the order they appear, expression within ‘fork’…’join’ block are executed in parallel. This parallelism can be the source of the race condition as shown in above example.

Both blocking assignments are scheduled to execute in parallel and depending upon the order of their execution eventual value of ‘y’ could be either ‘2’ or the previous value of ‘x’, but it can not be determined beforehand.

4) Race condition because of variable initialization.

reg clk = 0

clk = 1

In Verilog ‘reg’ type variable can be initialized within the declaration itself. This initialization is executed at time step zero, just like initial block and if you happen to have an initial block that does the assignment to the ‘reg’ variable, you have a race condition.

There are few other situations where race conditions could come up, for example if a function is invoked from more than one active blocks at the same time, the execution order could become non-deterministic.



Synchronous or Asynchronous resets ?

Both synchronous reset and asynchronous reset have advantages and disadvantages and based on their characteristics and the designers needs, one has to choose particular implementation.

Synchronous reset :

Advantages :

– This is the obvious advantage. synchronous reset conforms to synchronous design guidelines hence it ensures your design is 100% synchronous. This may not be a requirement for everyone, but many times it is a requirement that design be 100% synchronous. In such cases, it will be better to go with synchronous reset implementation.

– Protection against spurious glitches. Synchronous reset has to set up to the active clock edge in order to be effective. This provides for protection against accidental glitches as long these glitches don’t happen near the active clock edges. In that sense it is not 100% protection as random glitch could happen near the active clock edge and meet both setup and hold requirements and can cause flops to reset, when they are not expected to be reset.

This type of random glitches are more likely to happen if reset is generated by some internal conditions, which most of the time means reset travels through some combinational logic before it finally gets distributed throughout the system.

Figure : Glitch with synchronous reset

As shown in the figure, x1 and x2 generate (reset)bar. Because of the way x1 and x2 transition during the first clock cycle we get a glitch on reset signal, but because reset is synchronous and because glitch did not happen near the active clock edge, it got filtered and we only saw reset take effect later during the beginning of 4th clock cycle, where it was expected.

– One advantage that is touted for synchronous resets is smaller flops or the area savings. This is really not that much of an advantage. In terms of area savings it is really a wash between synchronous and asynchronous resets.

Synchronous reset flops are smaller as reset is just and-ed outside the flop with data, but you need that extra and gate per flop to accommodate reset. While asynchronous reset flop has to factor reset inside the flop design, where typically one of the last inverters in the feedback loop of the slave device is converted into NAND gate

Figure : Synchronous v/s Asynchronous reset flop comparison.

Disadvantages :

– Wide enough pulse of the reset signal. We saw that being synchronous, reset has to meet the setup to the clock. We saw earlier in the figure that spurious glitches gets filtered in synchronous design, but this very behavior could be a problem. On the flip side when we do intend the reset to work, the reset pulse has to be wide enough such that it meets setup to the active edge of the clock for the all receivers sequentials on the reset distribution network.

– Another major issue with synchronous is clock gating. Designs are increasingly being clock gated to save power. Clock gating is the technique where clock is passed through an and gate with an enable signal, which can turn off the clock toggling when clock is not used thus saving power. This is in direct conflict with reset. When chip powers up, initially the clocks are not active and they could be gated by the clock enable, but right during the power up we need to force the chip into an known set and we need to use reset to achieve that. Synchronous reset will not take into effect unless there is active edge and if clock enable is off, there is no active edge of the clock.

Designer has to carefully account for this situation and design reset and clock enabling strategy which accounts for proper circuit operation.

– Use of tri-state structures. When tri-state devices are used, they need to be disabled at power-up. Because, when inadvertently enabled, tri-state device could crowbar and excessive current could flow through them and it could damage the chip. If tri-state enable is driven by a synchronous reset flop, the flop output could not be low, until the active edge of the clock arrives, and hence there is a potential to turn on tri-state device.

Figure : Tri-state Enable.

Asynchronous reset :

Advantages :

– Faster data path. Asynchronous reset scheme removes that AND gate at the input of the flop, thus saving one stage delay along the data path. When you are pushing the timing limits of the chip. This is very helpful.

– It has obvious advantage of being able to reset flops without the need of a clock. Basically assertion of the reset doesn’t have to setup to clock, it can come anytime and reset the flop. This could be double edged sword as we have seen earlier, but if your design permits the use of asynchronous reset, this could be an advantage.

Disadvantages :

– Biggest issue with asynchronous reset is reset de-assertion edge. Remember that when we refer to reset as ‘asynchronous’, we are referring to only the assertion of reset. You can see in figure about synchronous and asynchronous reset comparison, that one of the way asynchronous reset is implemented is through converting one the feedback loop inverters into NAND gate. You can see that when reset input of the NAND gate, goes low it forces the Q output to be low irrespective of the input of the feedback loop. But as soon as you deassert reset, that NAND gate immediately becomes an inverter and we are back to normal flop, which is susceptible to the setup and hold requirements. Hence de-assertion of the reset could cause flop output to go metastable depending upon the relative timing between de-assertion and the clock edge. This is also called reset recovery time check, which asynchronous reset have to meet even if they are asynchronous ! You don’t have this problem in synchronous reset, as you are explicitly forced to check both setup and hold on reset as well as data, as both are AND-ed and fed to the flop.

– Spurious glitches. With asynchronous reset, unintended glitches will cause circuit to go into reset state. Usually a glitch filter has to be introduced right at the reset input port. Or one may have to switch to synchronous reset.

– If reset is internally generated and is not coming directly from the chip input port, it has to be excluded for DFT purposes. The reason is that, in order for the ATPG test vectors to work correctly, test program has to be able to control all flop inputs, including data, clock and all resets. During the test vector application, we can not have any flop get reset. If reset is coming externally, test program hold it at its inactive value. If master asynchronous reset is coming externally, test program also holds it at inactive state, but if asynchronous reset is generated internally, test program has no control on the final reset output and hence the asynchronous reset net has to be removed for DFT purpose.

One issue that is common to both type of reset is that reset release has to happen within one cycle. If reset release happen in different clock cycles, then different flops will come out of reset in different clock cycles and this will corrupt the state of your circuit. This could very well happen with large reset distribution trees, where by some of receivers are closer to the master distribution point and others could be farther away.

Thus reset tree distribution is non-trivial and almost as important as clock distribution. Although you don’t have to meet skew requirements like clock, but the tree has to guarantee that all its branches are balanced such that the difference between time delay of any two branches is not more than a clock cycle, thus guaranteeing that reset removal will happen within one clock cycle and all flops in the design will come out of reset within one clock cycle, maintaining the coherent state of the design.

To address this problem with asynchronous reset, where it could be more severe, the master asynchronous reset coming off chip, is synchronized using a synchronizer, the synchronizer essentially converts asynchronous reset to be more like synchronous reset and it becomes the master distributor of the reset ( head of reset tree). By clocking this synchronizer with the clock similar to the clock for the flops( last stage clock in clock distribution), we can minimize the risk of reset tree distribution not happening within one clock.


Verilog execution order

Following three items are essential for getting to the bottom of Verilog execution order.

1) Verilog event queues.

2) Determinism in Verilog.

3) Non determinism in Verilog.

Verilog event queues : 

To get a very good idea of the execution order of different statements and assignments, especially the blocking and non-blocking assignments, one has to have a sound comprehension of inner workings of Verilog.

This is where Verilog event queues come into picture. Sometime it is called stratified event queues of Verilog. It is the standard IEEE spec about system Verilog, as to how different events are organized into logically segmented events queues during Verilogsimulation and in what order they get executed.

 Figure : Stratified Verilog Event Queues.

As per standard the event queue is logically segmented into four different regions. For sake of simplicity we’re showing the three main event queues. The “Inactive” event queue has been omitted as #0 delay events that it deals with is not a recommended guideline.

As you can see at the top there is ‘active’ event queue. According to the IEEE Verilog spec, events can be scheduled to any of the event queues, but events can be removed only from the “active” event queue. As shown in the image, the ‘active’ event queue holds blocking assignments, continuous assignments. primitive IO updates and $write commands. Within “active” queue all events have same priority, which is why they can get executed in any order and is the source of nondeterminism in Verilog.
There is a separate queue for the LHS update for the nonblocking assignments. As you can see that LHS updates queue is taken up after “active” events have been exhausted, but LHS updates for the nonblocking assignments could re-trigger active events.

Lastly once the looping through the “active” and non blocking LHS update queue has settled down and finished, the “postponed” queue is taken up where $strobe and $monitor commands are executed, again without any particular preference of order.

At the end simulation time is incremented and whole cycle repeats.

Determinism in Verilog. 

Based on the event queue diagram above we can make some obvious conclusions about the determinism.

– $strobe and $monitor commands are executed after all the assignment updates for the current simulation unit time have been done, hence $strobe and $monitor command would show the latest value of the variables at the end of the current simulation time.

– Statements within a begin…end block are evaluated sequentially. This means the statements within the begin…end block are executed in the order they appear within the block. The current block execution could get suspended for execution of other active process blocks, but the execution order of any being..end block does not change in any circumstances.

This is not to be confused with the fact that nonblocking assignment LHS update will always happen after the blocking assignments even if blocking assignment appears later in the begin..end order. Take following example.

initial begin
x = 0
y <= 3
z = 8

When we refer of execution order of these three assignments.

1) First blocking statement is executed along with other blocking statements which are active in other processes.

2) Secondly for the nonblocking statement only RHS is evaluated, it is crucial to understand that the update to variable ‘y’ by value of ‘3’ doesn’t happen yet. Remember that nonblocking statement execution happens in two stages, first stage is the evaluation of the RHS and second step is update of LHS. Evaluation of RHS of nonblocking statement has same priority as blocking statement execution in general. Hence in our example here, second step is the evaluation of RHS of nonblocking statement and

3) third step is execution of the last blocking statement ‘z = 8’. The last step here will be the update to ‘y’ for the nonblocking statement. As you can see here the begin .. end block maintains the execution order in so far as the within the same priority events.

4) last step would be the update of the LHS for the nonblocking assignment, where ‘y’ will be assigned value of 3.

– One obvious question that comes to mind, having gone through previous example is that what would be the execution order of the nonblocking LHS udpate !! In the previous example we only had one nonblocking statement. What if we had more than one nonblocking statement within the begin..end block. We will look at two variation of this problem. One where two nonblocking assignments are to two different variable and the two nonblocking assignments to same variable !!

First variation.

initial begin
x = 0
y <= 3
z = 8
p <= 6

For the above mentioned case, the execution order still follows the order in which statements appear.

1) blocking statement ‘x = 0’ is executed in a single go.

2) RHS of nonblocking assignment ‘y <= 3’ is evaluated and LHS update is scheduled.

3) blocking assignment ‘z = 8’ is executed.

4) RHS of nonblocking assignment ‘p <= 6’ is evaluated and LHS update is scheduled.

5) LHS update from the second nonblocking assignment is carried out.

6) LHS update from the last nonblocking assignment is carried out.

Second variation.

initial begin

x = 0
y <= 3
z = 8
y <= 6

For the above mentioned case, the execution order still follows the order in which statements appear.

1) blocking statement ‘x = 0’ is executed in a single go.

2) RHS of nonblocking assignment ‘y <= 3’ is evaluated and LHS update is scheduled.

3) blocking assignment ‘z = 8’ is executed.

4) RHS of nonblocking assignment ‘y <= 6’ is evaluated and LHS update is scheduled.

5) LHS update from the second nonblocking assignment is carried out, ‘y’ is 3 now.

6) LHS update from the last nonblocking assignment is carried out, ‘y’ is 6 now.

Non-determinism in Verilog.

One has to look at the active event queue in the Verilog event queues figure, to get an idea as to where the non-determinism in Verilog stems from. You can see that within the active event queue, items could be executed in any order. This means that blocking assignments, continuous assignments, primitive output updates, and $display command, all could be executed in any random order across all the active processes.

Non-determinism especially bits when race conditions occur. For example we know that blocking assignments across all the active processes will be carried out in random order. This is dandy as long as blocking assignments are happening to different variables. As soon as one make blocking assignments to same variable from different active processes one will run into issues and one can determine the order of execution. Similarly if two active blocking assignments happen to read from and write to the same variable, you’ve a read write race.

We’ll look at Verilog race conditions and overall good coding guidelines in a separate post.


Latch using a 2:1 MUX

After the previous post about XNOR gate using 2:1 MUX, one might have thought that finally we exhausted the number of gates that we could make using 2:1 MUX. But that is not entirely true !!

There are still more devices that we can make using a 2:1 MUX. These are some of the favorite static timing analysis and logic design interview questions and they are about making memory elements using the 2:1 MUX.

We know the equation of a MUX is :

Out = S * A + (S)bar * B

We also know that level sensitive latch equation is

If ( Clock )

Q = D [ This means if Clock is high Q follows D ]


Q = Q [ If clock is off, Q holds previous state ]

We can rewrite this as following :

Q = Clock * D + (Clock)bar * Q

This means we can easily make a latch using 2:1 MUX like following.

Latch using a 2:1 MUX

When CLK is high it passes through D to O and when CLK is off, O is fed back to D0 input of mux, hence O appears back at the output, in other words, we retain the value of O when CLK is off. This is what exactly latch does.

So what else can we make now ?


XNOR gate using 2:1 MUX

Following is a refresher on the XNOR gates.

Figure 1 XNOR gate

Alternate equation for an XNOR gate is :

O = (A)bar * (B)bar + A * B

This can be derived using Kmaps as following.

 Figure 2. Kmap for XNOR gate.

We know that the equation for a 2:1 MUX is of following form :

Out = S * A + (S)bar * B.

We need to turn this of the form

O = (A)bar * (B)bar + A * B

It seems easy enough to achieve this. In the 2:1 MUX equation, if we tie (A)bar in place of B, we get

Out = S * A + (S)bar * (A)bar

This is the equation of XNOR gate for inputs S and A.


Figure 3 XNOR using 2:1 MUX

Do you think we are done with 2:1 MUX series now ? There is still some more devices that we can make. Will continue in next post.


XOR gate using 2:1 MUX

XOR gate is kind of a special gate. We are familiar with the truth table of the XOR gate.

Figure1. XOR gate

We often use symbol OR symbol ‘+’ with circle around it to represent the XOR operation. There is an alternate way to describe XOR operation, which one can observe based on the truth table. One of the best way to find out a equation representation of any table is to use K-maps.

Figure 2. XOR K-map.

Out = A * (B)bar + (A)bar * B

Now having this equation at our hand it is easier to start with 2:1 MUX equation and convert it to XOR equation that we want.

2:1 MUX equation is :

Out = S * A + (S)bar * B

We can see that if we replace A with (B)bar, we get what we want.

Out = S * (B)bar + (S)bar * B

This is the XOR between S and B.

Following is the figure for the same.

Figure 3. XOR using 2:1 MUX

Next time we will work on XNOR gate.


A NOR gate using a 2:1 MUX

As previously discussed we start with the equation for 2:1 MUX like following. MUX has two inputs A and B and select pin is S and output pin Out

Out = S * A + (S)bar * B

As we want to make a NOR gate we are looking for equation of the form

Out = (A + B)bar

Just keep in mind taht we don’t have to get exact same equation with same pins, we could use any of teh three input pins A, B and S.

Where do we start ? This is something I did not mentioned while forming a NAND gate, but this applies to both NAND and NOR gate. We have already made AND gate, OR gate and inverter using 2:1 MUX. You can use AND gate and inverter and combine them to make NAND. Similarly you can use OR gate and inverter and combine them to make NOR gate. That would be one option to come up with NOR gate using 2:1 MUX.

Here we will try to come up with NOR gate using alternative way.

Going back we have

Out = S * A + (S)bar * B and we want to convert this to a NOR equation.

We want form

Out = (A + B)bar which is same as

Out = (A)bar * (B)bar

In our 2:1 MUX equation, you can easily see that we need to have (B)bar in place of B and we need to zero out the ‘S * A’ term.

We can achieve both by tying (B)bar in place of B and tying 0 to input A.

Out = S * 0 + (S)bar * (B)bar

Out = (S)bar + (B)bar

Out = (S + B)bar [ DeMorgan’s rule]

That is our NOR gate.

Figure NOR using 2:1 MUX

By now you should have become expert at this process. We will look at XOR gate in next  example. Please provide your feedback, ideas and critique, through various communications channels, like comments, contact form etc. I look forward to hearing from you.


Make a NAND gate using a MUX

Lets start with the equation of a 2:1 MUX, with input pins A and B, select pin S and output pin Out.

Out = S * A + (S)bar * B

We need to come up with a NAND gate and equation of a NAND gate is of the form :

Out = (A * B)bar or  Our = (A)bar + (B)bar [ using De Morgan’s rule]

Given the form of equations that we have to begin with it looks like we might have a better shot of converting. This is purely a guess.

Out = S * A + (S)bar * B  to the latter form of NAND equation [ Out = (A)bar + (B)bar]

If we actually tie (A)bar instead of A pin to the MUX and if we tie 1 to the B input

Out = S * (A)bar + (S)bar * 1

Out = S * (A)bar + (S)bar

Now lets prove that this is same as Out = (A)bar + (S)bar

Out = S * (A)bar + (S)bar

(Out)bar = (S * (A)bar + (S)bar) bar

(Out)bar = (S * (A)bar)bar * ((S)bar)bar

(Out)bar = (S * (A)bar)bar * S

(Out)bar = ((S)bar + ((A)bar)bar) * S

(Out)bar = ((S)bar + A) * S

(Out)bar = (S)bar * S + A * S

(Out)bar = 0 + A * S

(Out)bar = A * S

Out = (A * S)bar

Our guess worked and we have a  NAND gate by tying (A)bar instead of A and tying B to 1.

Figure NAND using 2:1 MUX

We will look at NOR gate in next post.



Make an OR gate using a MUX

We start with the equation of 2:1 MUX, where inputs to the mux are ‘A’ and ‘B’. Select is ‘S’ and output pin is ‘Out’.

Out = S * A + (S)bar * B

We need to come with an OR gate, hence we need to keep the ORing term ‘+’ in the equation, this means that we can not tie ‘S’ to 1 or 0. Because of tie ‘S’ to 1 or 0, one of the terms in the equation of MUX will have a product of 0 and that term will disappear.

What I mean is :

If we tie ‘S’ to 0.  Out = 0 * A + (0)bar * B = 0 + B = B

If we tie ‘S’ to 1.  Out = 1 * A + (1)bar * B = A + 0 * B = A

In both cases we look the ORing term ‘+’.

Same way, we can not tie either ‘A’ or ‘B’ to zero to maintain the ‘+’ term.

Lets start with ‘A’ and tie to 1.

Out = S * 1 + (S)bar * B

Out = S + (S)bar * B

This is what we end up with. We really want : Out = S + B.

If you are a master at boolean algebra or digital logic, you can look at equation

Out = S + (S)bar * B

and tell that it’s equivalent to

Out = S + B

But do not worry. We’ll prove that to be the case.

Out = S + (S)bar * B

(Out)bar = (S + (S)bar * B)bar    [ taking bar of both sides ]

(Out)bar = (S)bar * ((S)bar * B)bar  [ Applying De-morgan’s rule on right hand side ]

(Out)bar = (S)bar * (S + (B)bar)   [Applying De-morgan’s rule on right most term]

(Out)bar = (S)bar * S + (S)bar * (B)bar   [Expanding right hand side]

(Out)bar = 0 + (S)bar * (B)bar   [Simplifying first term on right hand side]

(Out)bar = (S)bar * (B)bar

(Out)bar = (S + B)bar          [Applying De-morgan’s rule on right hand side]

Out = S + B     [Taking bar of both sides]

Hence we prove that if we tie input ‘A’ to 1, for a 2:1 MUX, we get an OR gate.

Figure 1. OR gate using 2:1 MUX.

Next we will look at the NAND gate using 2:1 MUX.



Make an AND gate using MUX

As we discussed in previous post, we can start with the equation of the MUX and reduce it down to equation of an AND gate like following.

For a 2:1 MUX with input A and B and select S and output Out, following is the equation.

Out = S * A + (S)bar * B

We can see that first product term in the sum of products on right hand side is actually an AND operation. All we need to do is zero out the second product term. We can achieve this by making B as ‘0’. We can tie B input of MUX to zero.

Out = S * A + (S)bar * 0

Out = S * A + 0

Out = S * A

Thus we can get an AND gate by tying B input of MUX to zero, resulting circuit is AND gate which ANDs input A and mux select S.

Following figure shows this arrangement.

Figure 1. AND gate using 2:1 MUX

How about an OR gate using 2:1 MUX ?