Answers to your VLSI Interview questions.


Dynamic Gates

Dynamic gates use a clock for their normal operation, as opposed to static gates, which don't use clocks.

Dynamic gates use NMOS or PMOS logic rather than the full CMOS logic of regular static gates. Because a dynamic gate uses either NMOS or PMOS logic and not both, it usually has fewer transistors than its static counterpart, although it does need a few extra transistors for the clock.

Figure : NMOS pull down logic for NOR gate.

The figure shows the pull down NMOS logic for a NOR gate. This pull down structure is used in the dynamic gates.

How dynamic gates work :

In static gates, inputs switch and, after a finite input-to-output delay, the output settles to the expected state.

 

Figure : Dynamic NOR gate.

As you can see in the figure above, a dynamic gate is made of NMOS pull-down logic along with clock transistors on both the pull-up and pull-down paths.

We know that the clock has two phases, the low phase and the high phase. A dynamic gate has two operating phases based on the clock phases. During the low clock phase, the output of the dynamic gate is pre-charged high through the PMOS transistor in the pull-up network. This is the pre-charge phase of the dynamic gate.

When the clock is in the high phase, the output of the dynamic gate may change based on the inputs, or it may stay pre-charged, depending on the inputs. This phase, when the clock is high, is called the evaluate phase, as the gate is essentially evaluating what the output should be.

Figure : Dynamic NOR waveforms when input ‘A’ is high.

As seen in the waveforms above, as soon as CLK goes low, the output node 'Out' is pre-charged high. While in the pre-charge phase, NOR input 'A' goes high. When CLK goes high and the evaluate phase begins, 'Out' is discharged low because input 'A' is high. Input 'B' is not shown in the waveform as it is not relevant to this case.

If both inputs 'A' and 'B' were to remain low, the output node would simply hold its pre-charged high value through the evaluate phase.
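This precharge/evaluate behavior can be sketched in a small Python model (a behavioral illustration with made-up 0/1 signal values, not a circuit simulation): when CLK is low the output is driven high, and when CLK is high the output can only be discharged by the pull-down network, never pulled back up.

```python
def dynamic_nor(clk, a, b, out_prev):
    """Behavioral sketch of a dynamic NOR gate.

    clk low  -> precharge phase: output driven high.
    clk high -> evaluate phase: output discharges if any
                pull-down input is high; otherwise it holds
                its precharged value (it is never pulled up).
    """
    if clk == 0:
        return 1             # precharge phase
    if a or b:
        return 0             # pull-down network fires
    return out_prev          # stays at precharged value

# Waveform from the figure: precharge, then 'A' rises
# before the evaluate phase begins.
out = dynamic_nor(0, 0, 0, 0)    # CLK low: precharge, out = 1
out = dynamic_nor(1, 1, 0, out)  # CLK high, A = 1: out = 0
```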

This technique of always priming (pre-charging) the output high is a way to minimize switching of the output node: if, for a new set of inputs, the output is supposed to be high, it doesn't have to switch, as it is already pre-charged. The output only has to switch when it needs to go low.

But such a reduction in output switching doesn't come free: it means introducing clocks and an extra pre-charge phase, during which the output is not ready to be sampled.

One of the biggest concerns with dynamic gates is crowbar current. It must be ensured that the clock inputs to the pull-up and pull-down transistors come from the same node; if the pull-up and pull-down clocks come from different sources, there is a higher likelihood of both the pull-up and pull-down transistors being on at the same time, and hence crowbar current.

Dynamic gates burn more power because of the associated clocks: the clock signal switches continuously, so more dynamic power is dissipated.

The biggest benefit of dynamic gates is that they can be cascaded, and their pull-down-only property can be leveraged to achieve a very fast delay through a chain of multiple dynamic gate stages.

NMOS and PMOS logic

CMOS is short for Complementary Metal Oxide Semiconductor. 'Complementary' refers to the fact that CMOS-based logic uses both p-type and n-type devices.

Logic circuits that use only p-type devices are referred to as PMOS logic, and similarly, circuits using only n-type devices are called NMOS logic. Before CMOS technology became prevalent, NMOS logic was widely used; PMOS logic had also found use in specific applications.

Let's look at how NMOS logic works. By definition, we are only allowed to use n-type devices as building blocks; no p-type devices are allowed. Let's take an example to clarify this. Following is the truth table for a NOR gate.

Figure : NOR truth table.

We need to come up with a circuit for this NOR gate using only NMOS transistors. From our understanding of CMOS logic, we can start with the pull-down tree, which is made up of only NMOS devices.

Figure : NOR pulldown logic.

Here we can see that when either of the inputs 'A' or 'B' is high, the output is pulled down to ground. But this circuit only implements the partial functionality of the NOR gate: the cases where at least one of the inputs is high. It doesn't cover the case where both inputs are low, the first row of the truth table. In an equivalent CMOS NOR gate, a pull-up tree made of p-type devices would handle that case.

But here we are restricted to NMOS logic and are not allowed to use p-type devices. How could we build the pull-up logic for our NOR gate? The answer is a resistor. Essentially, when both NMOS transistors are turned off, we want the 'out' node to be pulled up and held at VDD. A resistor tied between VDD and the 'out' node achieves this. More elaborate schemes using NMOS transistors for the pull-up are possible, but in practice an NMOS device configured as a resistor is used to pull up the output node.

Of course, there are some immediate drawbacks. When at least one of the pull-down NMOS transistors is on, a static bias current flows from VDD to ground even in the steady state, which is why such circuits dissipate almost an order of magnitude more power than their CMOS equivalents. On top of that, this type of circuit is very susceptible to input noise glitches.

Any NMOS device can be made into a resistor by keeping it permanently on. An NMOS device has an inherent resistance, and we can achieve the desired resistance by adjusting the width of the transistor.
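To put rough numbers on that static bias current, here is a quick calculation in Python. The supply voltage and resistance values below are illustrative assumptions, not figures from the text:

```python
VDD = 5.0        # supply voltage (V), assumed for illustration
R_PULLUP = 10e3  # pull-up resistance (ohms), assumed
R_ON = 1e3       # on-resistance of a pull-down NMOS (ohms), assumed

# Output low: current flows from VDD through both resistances.
i_static = VDD / (R_PULLUP + R_ON)           # steady-state current (A)
p_static = VDD * i_static                    # static power (W)
v_out_low = VDD * R_ON / (R_PULLUP + R_ON)   # output-low level (V)

print(f"static current = {i_static * 1e3:.3f} mA")
print(f"static power   = {p_static * 1e3:.3f} mW")
print(f"V_OL           = {v_out_low:.3f} V")
```

The same divider also shows why the pull-up resistance must be much larger than the pull-down on-resistance: the output-low level is VDD scaled by R_ON/(R_PULLUP + R_ON), so it is never a clean 0 V.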

Figure : NMOS logic NOR gate.

The above figure shows the NOR gate made using NMOS logic. Similarly any gate can also be made using PMOS logic.

Verilog Races

In Verilog, certain types of assignments or expressions are scheduled for execution at the same time, and the order of their execution is not guaranteed. This means they could be executed in any order, and the order could change from run to run. This non-determinism is called a race condition in Verilog.

To refresh your memory, here is the Verilog execution order again, which we discussed in a prior post.

Figure : Verilog execution order.

If you look at the active event queue, it holds multiple types of statements and commands with equal priority, which means they are all scheduled to be executed together in an arbitrary order. This leads to many of the races.

Let's look at some of the common race conditions one may encounter.

1) Read-Write or Write-Read race condition.

Take the following example :

always @(posedge clk)
x = 2;

always @(posedge clk)
y = x;

Both assignments have the same sensitivity (posedge clk), which means that when the clock rises, both are scheduled to execute at the same time. Either 'x' is first assigned the value 2 and then 'y' is assigned 'x', in which case 'y' ends up with the value 2; or, the other way around, 'y' is assigned the old value of 'x', which could be something other than 2, and then 'x' is assigned 2. So, depending on the order, the final value of 'y' could differ.
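To see the race concretely, the two always blocks can be imitated with a toy Python model that executes them in both legal orders (this is only an analogy for the scheduling nondeterminism, not Verilog semantics; the initial value 7 for 'x' is made up):

```python
def run_order(order, x_init):
    """Run the two clocked 'always' blocks in a given order.

    'writer' models: always @(posedge clk) x = 2;
    'reader' models: always @(posedge clk) y = x;
    """
    state = {'x': x_init, 'y': None}
    blocks = {
        'writer': lambda s: s.update(x=2),
        'reader': lambda s: s.update(y=s['x']),
    }
    for name in order:        # the simulator picks some order
        blocks[name](state)
    return state['y']

# Same clock edge, two legal execution orders, two results:
y1 = run_order(['writer', 'reader'], x_init=7)  # y ends up 2
y2 = run_order(['reader', 'writer'], x_init=7)  # y ends up 7
```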

How can you avoid this race? It depends on your intention. If you want a specific order, put both statements in that order within a 'begin'…'end' block inside a single 'always' block. Say you want 'x' to be updated first and then 'y'; you can do the following. Remember, blocking assignments within a 'begin'…'end' block are executed in the order they appear.

always @(posedge clk)
begin
x = 2;
y = x;
end

2) Write-Write race condition.

always @(posedge clk)
x = 2;

always @(posedge clk)
x = 9;

Here again, both blocking assignments have the same sensitivity, which means they both get scheduled to execute at the same time in the 'active event' queue, in any order. Depending on the order, the final value of 'x' could be either 2 or 9. If you want a specific order, follow the example in the previous race condition.

3) Race condition arising from a ‘fork’…’join’ block.

always @(posedge clk)
fork
x = 2;
y = x;
join

Unlike a 'begin'…'end' block, where statements are executed in the order they appear, statements within a 'fork'…'join' block are executed in parallel. This parallelism can be the source of a race condition, as shown in the example above.

Both blocking assignments are scheduled to execute in parallel, and depending on the order of their execution, the eventual value of 'y' could be either 2 or the previous value of 'x'; it cannot be determined beforehand.

4) Race condition because of variable initialization.

reg clk = 0;

initial
clk = 1;

In Verilog, a 'reg' type variable can be initialized within its declaration. This initialization is executed at time step zero, just like an initial block, so if you also have an initial block that assigns to the same 'reg' variable, you have a race condition.

There are a few other situations where race conditions can come up; for example, if a function is invoked from more than one active block at the same time, the execution order becomes non-deterministic.

-SS.

 

Max Fanout of a CMOS Gate

When it comes to digital circuit design, one has to know how to size gates. The idea is to pick gate sizes that give the best power versus performance trade-off. We use the concept of 'fanout' when we talk about gate sizes. Fanout of a CMOS gate is the ratio of the load capacitance (the capacitance it is driving) to its input gate capacitance. As capacitance is proportional to gate size, fanout turns out to be the ratio of the size of the driven gate to the size of the driver gate.

Fanout of a CMOS gate depends upon the load capacitance and how fast the driving gate can charge and discharge that load. Digital circuit design is largely a speed versus power trade-off. Simply put, a CMOS gate's load should be within the range where the driving gate can charge or discharge it within reasonable time and with reasonable power dissipation.

Our aim is to find the nominal fanout value that gives the best speed with the least possible power dissipation. To simplify the analysis, we can focus on leakage power, which is proportional to the width, or size, of the gate. Hence our problem reduces to: how do we get the smallest delay through the gates while choosing the smallest possible gate sizes?

The typical fanout value can be found using CMOS gate delay models. Some CMOS gate models are very complicated in nature. Luckily, there are simple delay models which are fairly accurate. For the sake of understanding the issue, we will go through an overly simplified delay model.

We know that the I-V curves of a CMOS transistor are not linear, so we can't really treat a transistor as a resistor when it is ON; but, as mentioned earlier, we will assume it is one in our simplified model, for our understanding. The following figure shows an NMOS and a PMOS device. Let's assume the NMOS device has unit gate width 'W', and that for such a unit-width device the resistance is 'R'. If we assume that the mobility of electrons is double that of holes, we get an approximate P/N width ratio of 2/1 for the same delay (with very recent process technologies, the P/N ratio for equal rise and fall delays is getting close to 1/1). In other words, to achieve the same resistance 'R' in a PMOS device, it needs to be double the width of the NMOS device. That is why, to get resistance 'R' through the PMOS device, it needs to be '2W' wide.

Figure 1. R and C model of CMOS inverter

Our model inverter has an NMOS of width 'W' and a PMOS of width '2W', with equal rise and fall delays. We know that gate capacitance is directly proportional to gate width. Let's also assume that for width 'W' the gate capacitance is 'C'. This means our NMOS gate capacitance is 'C' and our PMOS gate capacitance is '2C'. Again, for the sake of simplicity, let's assume the diffusion capacitance of the transistors is zero.

Let's assume an inverter with gate width 'W' drives another inverter whose gate width is 'a' times that of the driver. This multiplier 'a' is our fanout. For the receiving (load) inverter, the NMOS gate capacitance is a*C and the PMOS gate capacitance is 2a*C, since gate capacitance is proportional to gate width.

Figure 2. Unit size inverter driving ‘a’ size inverter

Now let’s represent this back to back inverter in terms of their R and C only models.

Figure 3. Inverter R & C model

For this RC circuit, we can calculate the delay at the driver output node using the Elmore delay approximation. Recall how the Elmore model finds the total delay through multiple nodes in a circuit like this: start at the first node of interest and keep going downstream along the path where you want to find the delay. At each node along the path, find the total resistance from that node back to VDD/VSS and multiply that resistance by the total capacitance on the node. Sum up these R*C products over all nodes.

In our circuit there is only one node of interest: the driver inverter output, at the end of resistance 'R'. The total resistance from that node to VDD/VSS is 'R', and the total capacitance on the node is aC + 2aC = 3aC. Hence the delay can be approximated as R * 3aC = 3aRC.
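The Elmore recipe just described ("at each node, multiply the resistance back to the supply by the capacitance on that node, then sum") can be captured in a few lines of Python; the single-node case reproduces the 3aRC result. The numeric values of R, C, and a below are arbitrary:

```python
def elmore_delay(stages):
    """Elmore delay of an RC ladder.

    `stages` is a list of (R, C) pairs ordered from driver to
    load. The resistance seen at node i is the sum of all
    resistances up to and including stage i.
    """
    delay, r_total = 0.0, 0.0
    for r, c in stages:
        r_total += r           # resistance from this node to supply
        delay += r_total * c   # this node's R*C contribution
    return delay

# Driver inverter (resistance R) driving an 'a'-times-larger
# inverter: load cap = a*C (NMOS) + 2a*C (PMOS) = 3a*C.
R, C, a = 1.0, 1.0, 4.0
d = elmore_delay([(R, 3 * a * C)])  # equals 3*a*R*C
```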

Now, to find the typical value of fanout 'a', we can build a chain of back-to-back inverters like the following circuit.

Figure 4. Chain of inverters.

The objective is to drive the load CL with optimum delay through the chain of inverters. Let's assume the input capacitance of the first, unit-width inverter is 'C', as shown in the figure. With fanout 'a', the next inverter's width would be 'a', and so forth.

The number of inverters along the path can be represented as a function of CL and C like following.

Total number of inverters along the chain N = log_a(CL/C) = ln(CL/C)/ln(a)

Total delay along the chain D = N * (delay of each inverter)

Earlier we found that for back-to-back inverters, where the driver input gate capacitance is 'C' and the fanout ratio is 'a', the delay through the driver inverter is 3aRC.

Total delay along the chain D = ln(CL/C)/ln(a) * 3aRC

To find the value of fanout 'a' that minimizes the total delay, we take the derivative of the total delay with respect to 'a' and set it to zero. That gives us the minimum of the total delay with respect to 'a'.

D = 3*RC*ln(CL/C) * a/ln(a)

dD/da = 3*RC*ln(CL/C) * [ (ln(a) - 1)/ln²(a) ] = 0

For this to be true

ln(a) - 1 = 0

which means ln(a) = 1, the root of which is a = e.

This is how we derive the fanout of 'e' as the optimal fanout for a chain of inverters.
If one were to plot the total delay 'D' against 'a' for such an inverter chain, it looks like the following.

Figure 5. Total delay v/s Fanout graph

As you can see in the graph, you get the lowest delay through a chain of inverters around a ratio of 'e'. Of course, we made simplifying assumptions, including zero diffusion capacitance. In reality, the graph still follows a similar contour even when you improve the inverter delay model to be very accurate. What actually happens is that from a fanout of 2 to a fanout of 6, the delay stays within about a 5% range. That is why, in practice, a fanout of 2 to 6 is used, with the ideal being close to 'e'.
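A quick numerical check of the simplified model D = 3*RC*ln(CL/C)*a/ln(a) confirms both observations: the minimum sits at a = e, and the curve is shallow around it. The RC and CL/C values below are arbitrary illustrative choices:

```python
import math

RC = 1.0            # unit-inverter RC product (assumed)
CL_OVER_C = 1000.0  # load-to-input capacitance ratio (assumed)

def chain_delay(a):
    """Total chain delay for stage fanout a (simplified model)."""
    return 3 * RC * math.log(CL_OVER_C) * a / math.log(a)

# Sweep fanout from 1.5 to 8.0 in steps of 0.01; the minimum
# lands at a value of 'a' very close to e.
best = min((chain_delay(i / 100), i / 100) for i in range(150, 800))

# The curve is shallow around the minimum: in this simplified
# model, a = 4 costs only ~6% extra delay versus a = e.
penalty = chain_delay(4) / chain_delay(math.e) - 1
```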

One more thing to remember here: we assumed a chain of inverters. In practice, you will often find a gate driving a long wire. The theory still applies; one just has to find the effective wire capacitance that the driving gate sees and use that to come up with the fanout ratio.

-SS.

Inverted Temperature Dependence.

It is known that with an increase in temperature, the resistivity of a metal wire (conductor) increases. The reason for this phenomenon is that with increasing temperature, thermal vibrations in the lattice increase, giving rise to increased electron scattering. One can visualize this as electrons colliding with each other more often and hence contributing less to the streamlined flow needed for electric current.

A similar effect happens in semiconductors: the mobility of the primary carriers decreases with increasing temperature. This applies to holes as well as to electrons.

But in semiconductors, when the supply voltage of a MOS transistor is reduced, an interesting effect is observed. At lower voltages, the delay through the MOS device decreases with increasing temperature, rather than increasing. After all, common wisdom says that with increasing temperature the mobility decreases, so one would have expected reduced current and consequently increased delay. This effect is referred to as low-voltage Inverted Temperature Dependence.
Let's first see what the delay of a MOS transistor depends upon, in a simplified model.

Delay = ( Cout * Vdd )/ Id [ approx ]

Where
Cout = Drain Cap
Vdd = Supply voltage
Id = Drain current.

Now lets see what drain current depends upon.

Id = µ(T) * (Vdd − Vth(T))^α

Where
µ = mobility
Vth = threshold voltage
α = positive constant ( small number )

One can see that Id depends on both the mobility µ and the threshold voltage Vth. Let's examine the dependence of mobility and threshold voltage upon temperature.

μ(T) = μ(300) * (300/T)^m
Vth(T) = Vth(300) − κ(T − 300)
Here '300' is room temperature in Kelvin.

Mobility and threshold voltage both decrease with temperature. But a decrease in mobility means less drain current and a slower device, whereas a decrease in threshold voltage means more drain current and a faster device.

The final drain current is determined by whichever trend dominates at a given voltage and temperature pair. At high voltages, mobility determines the drain current, whereas at lower voltages, the threshold voltage dominates the drain current.

This is the reason that at higher voltages device delay increases with temperature, while at lower voltages device delay decreases with temperature.
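The competition between the two trends is easy to check numerically with the simplified Id model above. Every parameter value below (m, κ, α, Vth(300), the two supply voltages) is an illustrative assumption in a plausible range, not process data:

```python
MU_300 = 1.0    # mobility at 300 K (normalized, assumed)
VTH_300 = 0.4   # threshold voltage at 300 K (V, assumed)
M = 1.5         # mobility temperature exponent m (assumed)
KAPPA = 2e-3    # Vth temperature coefficient kappa (V/K, assumed)
ALPHA = 1.3     # alpha-power exponent (assumed)

def drain_current(vdd, t):
    """Id = mu(T) * (Vdd - Vth(T))**alpha, simplified model."""
    mu = MU_300 * (300.0 / t) ** M
    vth = VTH_300 - KAPPA * (t - 300.0)
    return mu * (vdd - vth) ** ALPHA

# High Vdd: the mobility loss dominates, so Id drops as T
# rises and the device slows down with temperature.
print(drain_current(1.2, 300) > drain_current(1.2, 400))  # True

# Low Vdd: the Vth reduction dominates, so Id rises as T
# rises: inverted temperature dependence.
print(drain_current(0.5, 300) < drain_current(0.5, 400))  # True
```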

-SS.

Synchronous or Asynchronous resets ?

Both synchronous and asynchronous resets have advantages and disadvantages, and based on their characteristics and the designer's needs, one has to choose a particular implementation.

Synchronous reset :

Advantages :

- This is the obvious advantage: synchronous reset conforms to synchronous design guidelines, hence it ensures your design is 100% synchronous. This may not be a requirement for everyone, but many times it is a requirement that the design be 100% synchronous. In such cases, it is better to go with a synchronous reset implementation.

- Protection against spurious glitches. A synchronous reset has to set up to the active clock edge in order to take effect. This provides protection against accidental glitches, as long as these glitches don't happen near the active clock edges. In that sense it is not 100% protection: a random glitch could happen near the active clock edge, meet both setup and hold requirements, and cause flops to reset when they are not expected to.

Such random glitches are more likely if the reset is generated by internal conditions, which usually means the reset travels through some combinational logic before it finally gets distributed throughout the system.

Figure : Glitch with synchronous reset

As shown in the figure, x1 and x2 generate (reset)bar. Because of the way x1 and x2 transition during the first clock cycle, we get a glitch on the reset signal; but because the reset is synchronous and the glitch did not happen near an active clock edge, it got filtered, and reset only takes effect later, at the beginning of the 4th clock cycle, where it was expected.
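The glitch-filtering property can be sketched in Python: a synchronous reset takes effect only if the reset happens to be asserted when an active clock edge samples it. The times and waveform below are made up for illustration:

```python
def resets_taken(reset_low_intervals, clk_edge_times):
    """Return the clock edges at which a synchronous (active-low)
    reset is actually taken.

    reset_low_intervals: list of (start, end) times where the
    reset signal is asserted (low).
    clk_edge_times: times of active clock edges.
    """
    return [t for t in clk_edge_times
            if any(s <= t < e for s, e in reset_low_intervals)]

# Clock edges at t = 0, 10, 20, 30. A glitch at t = 12..14
# falls between edges and is filtered out; the intended reset
# pulse at t = 28..35 covers the edge at t = 30 and is taken.
taken = resets_taken([(12, 14), (28, 35)], [0, 10, 20, 30])
```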

- One advantage touted for synchronous resets is smaller flops, i.e. area savings. This is really not much of an advantage; in terms of area, it is close to a wash between synchronous and asynchronous resets.

A synchronous reset flop is smaller because the reset is simply ANDed with the data outside the flop, but you need that extra AND gate per flop to accommodate the reset. An asynchronous reset flop, on the other hand, has to factor the reset into the flop design itself, where typically one of the last inverters in the feedback loop of the slave latch is converted into a NAND gate.

Figure : Synchronous v/s Asynchronous reset flop comparison.

Disadvantages :

- The reset pulse must be wide enough. Being synchronous, the reset has to meet setup to the clock. We saw earlier that spurious glitches get filtered in a synchronous design, but this very behavior can be a problem: when we do intend the reset to work, the reset pulse has to be wide enough to meet setup to the active clock edge at all receiving sequentials on the reset distribution network.

- Another major issue with synchronous reset is clock gating. Designs are increasingly clock gated to save power. Clock gating is the technique where the clock is passed through an AND gate with an enable signal, which can stop the clock toggling when it is not in use, thus saving power. This is in direct conflict with reset. When the chip powers up, the clocks are initially inactive and could be gated off by the clock enable, but right at power-up we need to force the chip into a known state, and we need reset to achieve that. A synchronous reset will not take effect unless there is an active clock edge, and if the clock enable is off, there is no active clock edge.

The designer has to carefully account for this situation and devise a reset and clock-enabling strategy that ensures proper circuit operation.

- Use of tri-state structures. When tri-state devices are used, they need to be disabled at power-up, because, if inadvertently enabled, a tri-state bus could crowbar: excessive current could flow through the drivers and damage the chip. If the tri-state enable is driven by a synchronous-reset flop, the flop output cannot be forced low until the active clock edge arrives, and hence there is a potential to turn on the tri-state device at power-up.

Figure : Tri-state Enable.

Asynchronous reset :

Advantages :

- Faster data path. The asynchronous reset scheme removes the AND gate at the input of the flop, saving one gate delay along the data path. When you are pushing the timing limits of the chip, this is very helpful.

- It has the obvious advantage of being able to reset flops without the need of a clock. Assertion of the reset doesn't have to set up to the clock; it can come at any time and reset the flop. This can be a double-edged sword, as we have seen earlier, but if your design permits the use of asynchronous reset, it is an advantage.

Disadvantages :

- The biggest issue with asynchronous reset is the reset de-assertion edge. Remember that when we call a reset 'asynchronous', we are referring only to the assertion of the reset. You can see in the figure comparing synchronous and asynchronous reset flops that one way an asynchronous reset is implemented is by converting one of the feedback-loop inverters into a NAND gate. When the reset input of that NAND gate goes low, it forces the Q output low irrespective of the feedback loop input. But as soon as you de-assert reset, that NAND gate becomes an inverter again and we are back to a normal flop, which is subject to setup and hold requirements. Hence de-assertion of the reset can cause the flop output to go metastable, depending on the relative timing between the de-assertion and the clock edge. This is the reset recovery time check, which asynchronous resets have to meet even though they are asynchronous! You don't have this problem with synchronous reset, as you are explicitly forced to check setup and hold on reset as well as data, since both are ANDed and fed to the flop.

- Spurious glitches. With asynchronous reset, unintended glitches will cause the circuit to go into the reset state. Usually a glitch filter has to be introduced right at the reset input port, or one may have to switch to a synchronous reset.

- If the reset is internally generated and is not coming directly from a chip input port, it has to be excluded for DFT purposes. The reason is that, for ATPG test vectors to work correctly, the test program has to be able to control all flop inputs, including data, clock, and all resets. During test vector application, no flop may get reset. If the reset comes from an external pin, the test program simply holds it at its inactive value. The same is true for an external master asynchronous reset; but if an asynchronous reset is generated internally, the test program has no control over the final reset net, and hence the asynchronous reset net has to be bypassed for DFT purposes.

One issue common to both types of reset is that reset release has to happen within one clock cycle. If reset release spreads across different clock cycles, different flops will come out of reset in different cycles, corrupting the state of your circuit. This can easily happen with large reset distribution trees, where some receivers are close to the master distribution point and others are farther away.

Thus reset tree distribution is non-trivial, and almost as important as clock distribution. Although you don't have to meet clock-like skew requirements, the tree has to guarantee that all its branches are balanced such that the delay difference between any two branches is less than a clock cycle. This guarantees that reset removal happens within one clock cycle, so all flops in the design come out of reset in the same cycle, maintaining a coherent design state.

To address this problem with asynchronous reset, where it can be more severe, the master asynchronous reset coming onto the chip is synchronized using a synchronizer. The synchronizer essentially converts the asynchronous reset into something more like a synchronous reset, and it becomes the master distribution point (the head of the reset tree). By clocking this synchronizer with a clock similar to the one feeding the flops (a last-stage clock in the clock distribution), we can minimize the risk of the reset tree distribution not completing within one clock.

-SS.

Verilog execution order

Following three items are essential for getting to the bottom of Verilog execution order.

1) Verilog event queues.

2) Determinism in Verilog.

3) Non determinism in Verilog.

Verilog event queues : 

To get a very good idea of the execution order of different statements and assignments, especially the blocking and non-blocking assignments, one has to have a sound comprehension of inner workings of Verilog.

This is where the Verilog event queues come into the picture, sometimes called the stratified event queues of Verilog. The IEEE Verilog/SystemVerilog standard specifies how different events are organized into logically segmented event queues during Verilog simulation, and in what order they get executed.

 Figure : Stratified Verilog Event Queues.

Per the standard, the event queue is logically segmented into four different regions. For the sake of simplicity, we show the three main event queues; the "inactive" event queue has been omitted, because the #0 delay events it handles are not a recommended practice.

As you can see, at the top there is the "active" event queue. According to the IEEE Verilog spec, events can be scheduled into any of the event queues, but events can be removed only from the "active" event queue. As shown in the image, the "active" event queue holds blocking assignments, continuous assignments, primitive I/O updates, and $write commands. Within the "active" queue, all events have the same priority, which is why they can execute in any order; this is the source of nondeterminism in Verilog.

There is a separate queue for the LHS updates of nonblocking assignments. The LHS update queue is taken up after the "active" events have been exhausted, but LHS updates of nonblocking assignments can re-trigger active events.

Lastly, once the looping through the "active" and nonblocking LHS update queues has settled down and finished, the "postponed" queue is taken up, where $strobe and $monitor commands are executed, again without any particular order.

At the end, simulation time is incremented and the whole cycle repeats.

Determinism in Verilog. 

Based on the event queue diagram above we can make some obvious conclusions about the determinism.

- $strobe and $monitor commands are executed after all assignment updates for the current simulation time have been done; hence $strobe and $monitor show the latest values of the variables at the end of the current simulation time.

- Statements within a begin…end block are evaluated sequentially. This means the statements within the begin…end block are executed in the order they appear within the block. The current block's execution may get suspended in favor of other active process blocks, but the execution order within any begin…end block does not change under any circumstances.

This is not to be confused with the fact that a nonblocking assignment's LHS update will always happen after the blocking assignments, even if the blocking assignment appears later in the begin…end order. Take the following example:

initial begin
x = 0;
y <= 3;
z = 8;
end

When we refer to the execution order of these three assignments:

1) The first blocking statement is executed, along with other blocking statements that are active in other processes.

2) For the nonblocking statement, only the RHS is evaluated; it is crucial to understand that the update of variable 'y' to the value 3 doesn't happen yet. Remember that a nonblocking assignment executes in two stages: the first stage is evaluation of the RHS and the second stage is the update of the LHS. Evaluation of a nonblocking RHS has the same priority as blocking statement execution in general.

3) The last blocking statement 'z = 8' is executed. As you can see, the begin…end block maintains execution order among events of the same priority.

4) The last step is the LHS update of the nonblocking assignment, where 'y' is assigned the value 3.

- An obvious question, having gone through the previous example, is: what would be the execution order of the nonblocking LHS updates if there were more than one nonblocking statement within the begin…end block? We will look at two variations of this problem: one where the two nonblocking assignments are to two different variables, and one where both are to the same variable.

First variation.

initial begin
x = 0;
y <= 3;
z = 8;
p <= 6;
end

For the above mentioned case, the execution order still follows the order in which statements appear.

1) The blocking statement 'x = 0' is executed in a single go.

2) The RHS of the nonblocking assignment 'y <= 3' is evaluated and the LHS update is scheduled.

3) The blocking assignment 'z = 8' is executed.

4) The RHS of the nonblocking assignment 'p <= 6' is evaluated and the LHS update is scheduled.

5) The LHS update from the first nonblocking assignment is carried out: 'y' becomes 3.

6) The LHS update from the last nonblocking assignment is carried out: 'p' becomes 6.
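A convenient way to see the difference between the active region and the nonblocking-update region in this case is $strobe, which prints at the end of the time step, after the scheduled LHS updates have been applied (module name is illustrative):

```verilog
module tb;
  reg [3:0] x, y, z, p;
  initial begin
    x = 0;
    y <= 3;
    z = 8;
    p <= 6;
    // $display runs in the active region: y and p are not updated yet.
    $display("display: y=%b p=%b", y, p);
    // $strobe runs after the nonblocking updates of this time step.
    $strobe ("strobe : y=%0d p=%0d", y, p);
  end
endmodule
```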

Second variation.

initial begin
x = 0;
y <= 3;
z = 8;
y <= 6;
end

For the above case, the execution order still follows the order in which the statements appear.

1) The blocking statement 'x = 0' is executed in a single go.

2) The RHS of the nonblocking assignment 'y <= 3' is evaluated and the LHS update is scheduled.

3) The blocking assignment 'z = 8' is executed.

4) The RHS of the nonblocking assignment 'y <= 6' is evaluated and the LHS update is scheduled.

5) The LHS update from the first nonblocking assignment is carried out; 'y' is 3 now.

6) The LHS update from the last nonblocking assignment is carried out; 'y' is 6 now. The scheduled updates are applied in the order they were scheduled, so the later assignment wins.
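This "last write wins" behavior can be checked with a short sketch (names are illustrative):

```verilog
module tb;
  reg [3:0] x, y, z;
  initial begin
    x = 0;
    y <= 3;  // first nonblocking update scheduled
    z = 8;
    y <= 6;  // second nonblocking update to the same variable scheduled
    // Updates are applied in scheduling order: y becomes 3, then 6.
    #1 $display("y = %0d", y);
  end
endmodule
```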

Non-determinism in Verilog.

One has to look at the active event queue in the Verilog event queues figure to get an idea of where the non-determinism in Verilog stems from. Within the active event queue, items may be executed in any order. This means that blocking assignments, continuous assignments, primitive output updates, and $display commands may all be executed in any order across all the active processes.

Non-determinism especially bites when race conditions occur. For example, we know that blocking assignments across all the active processes will be carried out in an arbitrary order. This is fine as long as the blocking assignments are to different variables. As soon as one makes blocking assignments to the same variable from different active processes, one runs into trouble: the order of execution cannot be determined, and the final value is simulator-dependent. Similarly, if two active processes read from and write to the same variable, you have a read-write race.
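A minimal write-write race looks like this (names are illustrative; the point is precisely that the result is not guaranteed):

```verilog
module race;
  reg [3:0] v;
  // Two active processes write 'v' with blocking assignments at time 0.
  // The simulator may execute them in either order, so the final value
  // of 'v' is simulator-dependent: a write-write race.
  initial v = 1;
  initial v = 2;
  initial #1 $display("v = %0d", v);  // may print 1 or 2
endmodule
```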

We’ll look at Verilog race conditions and overall good coding guidelines in a separate post.

-SS.

Interview preparation for a VLSI design position

Some people believe that explicitly preparing for job interview questions and answers is futile: when it comes to the important matter of a job interview, what counts is real knowledge of the field. It is not an academic exam, where textbook preparation might come in handy; you have to know the real deal to survive a job interview. Also, it is not only your technical expertise that gets tested during a job interview; your overall aptitude, your social skills, your analytical skills and a bunch of other things are at stake.

Agreed, it is not as simple as preparing a few specific technical questions landing you the job. But the author's perspective is that one should prepare specific interview questions as a supplement to the real deal. One has to have the fundamental technical knowledge and ability, but it doesn't hurt to do some targeted preparation for a job interview. It is more of a brush-up: a revision of old knowledge, a tackling of some well-known technical tricks and, more importantly, a boost to your confidence in the process. There is no harm, and it definitely helps, to do targeted preparation for an interview. Not only should one prepare for technical questions, but there is also a set of most often asked behavioral questions available. One would be surprised how much the preparation really helps.

It really depends on which position you are applying for. Chip design involves several different skill and ability areas, including RTL design, synthesis, physical design, static timing analysis, verification, DFT and a lot more. One has to focus on the narrow field relevant to the position one is interviewing for. Most job positions tend to be related to ASIC design or digital design. There are a few positions in custom design, circuit design, memory design and analog or mixed-signal design.

What helps is a fundamental understanding of CMOS, more than you might realize. Secondly, you need to know Verilog well, as you will be dealing with Verilog for as long as you are in the semiconductor industry. Next comes static timing analysis: every chip has to run at a certain frequency, so you need to know about timing as well. Knowing about DFT is also crucial, because every chip has one form of testability features or another; in submicron technology no chip is designed without DFT. Basically, a focus on Verilog, timing, DFT and MOS fundamentals is what you need to begin with.

After having done this de-facto preparation of VLSI interview questions, you can focus more on the specific niche that you are interviewing for, which could be verification, analog design or something else.

Latch using a 2:1 MUX

After the previous post about an XNOR gate using a 2:1 MUX, one might have thought that we had finally exhausted the gates we could make using a 2:1 MUX. But that is not entirely true !!

There are still more devices that we can make using a 2:1 MUX. These are some of the favorite static timing analysis and logic design interview questions, and they are about making memory elements using a 2:1 MUX.

We know the equation of a MUX is :

Out = S * A + (S)bar * B
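This MUX equation maps directly to a gate-level description. A sketch in Verilog (module and port names are illustrative):

```verilog
// 2:1 MUX implementing Out = S * A + (S)bar * B
module mux2 (
  input  wire s,   // select
  input  wire a,   // passed to out when s = 1
  input  wire b,   // passed to out when s = 0
  output wire out
);
  assign out = (s & a) | (~s & b);
endmodule
```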

We also know that the level-sensitive latch equation is

If ( Clock )

Q = D [ This means if Clock is high Q follows D ]

else

Q = Q [ If clock is off, Q holds previous state ]

We can rewrite this as follows:

Q = Clock * D + (Clock)bar * Q

This means we can easily make a latch using 2:1 MUX like following.

Figure : Latch using a 2:1 MUX.

When CLK is high, the MUX passes D through to O, and when CLK is low, O is fed back to the D0 input of the MUX, so O reappears at the output; in other words, we retain the value of O when CLK is low. This is exactly what a latch does.
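The same feedback structure can be sketched in Verilog (module and port names are illustrative):

```verilog
// Level-sensitive latch from the MUX equation Q = Clock*D + (Clock)bar*Q.
// Note the feedback: q appears on both sides of the assignment, which is
// what makes synthesis infer a latch rather than pure combinational logic.
module mux_latch (
  input  wire clk,  // select input of the MUX
  input  wire d,    // data input (the D1 leg of the MUX)
  output wire q     // MUX output, fed back to the D0 leg
);
  assign q = clk ? d : q;  // clk high: q follows d; clk low: q holds
endmodule
```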

So what else can we make now ?

-SS