In the post, Integated Clock Gating Cell, we discussed that an ICG has a negative level-sensitive latch preceding an AND gate in order to relax hold timing for clock gating check. And we discussed that it gives benefits for area, power and timing. Let us discuss how area, power and timing are saved. We will discuss only for the case of AND gate, the same will follow for OR gate.
1. Architectural benefits - simplicity in clock handling: By introducing ICGs in place of discrete gates, you dont have to worry about the launch edge of the signal while writing RTL (for details, see here). One can always launch the signal from positive edge-triggered flip-flop for timing and architectural simplicity without worrying about possibility of glitch in clock path due to wrong polarity flip-flop launching enable signal.
2. Benefits in area and power: Having custom module allows for better utilization of resources inside the custom ICG module; hence, it is expected to have lesser power than a latch and an AND gate combined.
3. Benefits in timing: Having the path from latch -> AND inside ICG saves us from having to meet these paths individually, which could take a lot of effort with discrete latch and AND gate. Also, it allows for latch to have almost full time borrow, thereby making almost a full cycle path from a positive edge-triggered flip-flop to ICG.
Problem statement: Given an 8:1 multiplexer such that the input connected to 5th input is the most setup timing critical and other inputs are timing critical in the order D0 > D1 > D2 > D3 > D4 > D6 > D7. Restructure the logic accordingly. Solution: We know that the most setup timing critical signal should have least logic in the data path. So, we need to prioritize 5th input such that it has least logic out of all the inputs. In other words, this is a problem of converting an ordinary multiplexer to a priority multiplexer. Let us first discuss how we can convert a multiplexer to priority mux.
Figure 1 below shows a multiplexer with 8-inputs D0 - D7 and selects S2,S1,S0.
This multiplexer can be represented in the form of a priority multiplexer as required is as shown in figure 2 below.
We can start from the equation of the priority multiplexer and prove that it is actually equivalent to 8:1 mux.
The equation of the priority multiplexer is given as:
O = (S0.S1'.S2).D5 + (S0.S1'.S2)'.(S0'.S1'.S2').D0 + (S0.S1'.S2)'.(S0'.S1'.S2').(S0.S1'.S2').D1 + (S0.S1'.S2)'.(S0'.S1'.S2').(S0.S1'.S2')'.(S0'.S1.S2').D2 + (S0.S1'.S2)'.(S0'.S1'.S2').(S0.S1'.S2')'.(S0'.S1.S2')'.(S0.S1.S2').D3 + (S0.S1'.S2)'.(S0'.S1'.S2').(S0.S1'.S2')'.(S0'.S1.S2')'.(S0.S1.S2')'.(S0'.S1.S2).D4 + (S0.S1'.S2)'.(S0'.S1'.S2').(S0.S1'.S2')'.(S0'.S1.S2')'.(S0.S1.S2')'.(S0'.S1.S2)'.(S0'.S1.S2).D6 + (S0.S1'.S2)'.(S0'.S1'.S2').(S0.S1'.S2')'.(S0'.S1.S2')'.(S0.S1.S2')'.(S0'.S1.S2)'.(S0'.S1.S2)'.(S0.S1.S2).D7 Simplifying the above equation leads us to the equation of ordinary multiplexer.
Priority multiplexers are common in case one of the inputs is to be prioritized. The reason for this can be either functional or timing. In case of timing being the reason, the most setup timing critical input is connected to the highest priority input. The reason is that the highest priority signal gets the least logic in its path in a priority multiplexer as discussed below. The schematic for a 4-input priority mux is shown in figure 1 below:
A priority multiplexer selects the input if the select corresponding select is "1" and none of the selects of with higher priority is "1". For example, if S0 is "1", X0 will be selected always. But if S2 is "1", X2 will be selected only if both S1 and S0 are "0".
The logic equation of a priority mux can be written as:
Y = S0.X0 + S0'.S1.X1 + S0'.S1'.S2 X2 + S0'.S1'.S2'.S3.X3
The output of the priority mux is not valid if all the select signals are "0", as we dont know which input to select in that case. That is why, figure 1 shows the D0 of the left multixer as don't care.
Here, we are given a problem wherein only ( 0 -> 1 ) transition of the signal is delayed by a single clock cycle whereas the other transition changes the output combination-ally. In other words, we are given the task to implement a Mealy state machine as output is both a function of state variables and input. However, this is a pretty simple problem involving single state. The output is a function of:
Input value one cycle before
Output should go "0" as soon as input goes "0". But it should go "1" when input one cycle back is "1". But there is a twist. What if current input is "0" and one cycle back, it was "1"? There is no clarity in the problem statement. Let us assume the output remains unchanged in such condition. The state transition table looks as shown below:
We can use K-map to solve for O. The solution is given in the figure below:
The resulting circuit is as shown in figure below.
Can you figure out the circuit that design that delays the negative edge of a signal by one cycle?
Here, the problem involves detecting if each bit of a signal is equal to corresponding bit of the other signal and then generating a resultant. First of all, the circuit which provides equivalence of 1-bit is nothing but an XNOR gate as explained here. So, we require 8 XNOR gates to judge equivalence of individual bits. Even if one of the bits is "0", it means the numbers are not equal, which can be obtained by ANDing the eight bits together.
Alternatively, we can use an XOR gate as well. An XOR gate provides output as "1" if the two inputs are not equal as explained here. Even if one of the 8 individual XOR gates provides output as "1", it will mean that the numbers are not equal, which can be obtained by NORing the eight bits together.
Can you judge which of these can be implemented with less area and power?
Ever thought why it is expected for setup checks to be single cycle or zero cycle? Common sense prevails that the data launched at any instant is expected to be captured at the next available instant, forming a setup check. We will explain here with the help of an example of latch-to-reg and reg-to-latch path.
Why setup check for postive latch to positive edge-triggered register is full cycle: Figure 1 below shows the clock waveforms for a positive latch to positive edge-triggered flip-flop timing paths. Positive edge-triggered register samples data only on positive edges. In the below figure, those instances are either "Time = 0" or "Time = T". The output of latch can change anytime when it is transparent. The earliest it can change is at "Time = 0+". So, the next instant it can get captured at the register is "Time = T". This makes the default setup check for such paths. That is why setup check for positive latch to positive edge-triggered registers is full cycle.
Figure 1: Single cycle setup check from positive latch to positive edge-triggered flip-flop
Why setup check for positive edge-triggered register to positive level-sensitive latch: Similar to the above case, figure 2 shows the clock wave-forms for positive edge-triggered flip-flop to positive latch timing paths. The flip-flop can launch data at either "Time = 0" or "Time = T". So, data will be available at its output at either "Time = 0+" or "Time = T+". The latch can capture the data at the same instant as it launched as it is transparent at that time. So, setup check is considered to be zero cycle in this case.
Figure 2: Zero cycle setup check from positive edge-triggered flip-flop to positive latch
We need to note that the "from" and "to" edges in these checks denoted in text-books (or shown in timing reports by STA tools by default) are default setup and hold checks. The actual setup and hold check edges are what is represented by the state machine the timing path belongs to. It is possible to ovverride default check edges by command set_multicycle_path. I would recommend going through setup and hold - the state machine essenstials to have a deeper understanding of this concept. In other words, you could have made positive latch to positive edge-triggered flip-flop setup check as single cycle. But you would also have to modify the hold check in that case. Can you guess what it would be?
Problem statement: An 8:1 multiplexer selects one out of three inputs based upon different combinations of S2, S1 and S0 as shown in figure below. Minimize the logic with a view that B is the most timing critical input.
A 4-input MUX has one 4-input AND and one 8-input OR between each input and the output. However, since, there is one signal connected to many of the inputs, there seems to be a scope of logic minimization. Let us use K-map to minimize the logic for problem. The K-map for this problem is as shown below:
Writing the boolean expression, we get:
O = S2'S1'S0' C + S2'S1'S0 B + S1 C + S2S1'S0'A + S2S1'S0 C
O = C (S1 + S1' (S2 ⊕ S0)) + S2'S1'S0 B + S2 S1' S0' A
O = C (S1 + ( S2 ⊕ S0 )) + S2'S1'S0 B + S2 S1' S0' A [Using A + A'B = A + B]
If we analyze carefully, we see that O is obtained by OR-ing three terms; one for A, one for B and one for C. The resulting structure is shown below:
Since, B is the most timing-critical input; there should be minimum logic between B and output. In other words, it should be closest to output. In the above figure, we see that there is a 4-input AND gate and a 3-input OR gate between O and B. We can reduce the logic between B and O by breaking 3-input OR into 2-input OR gates such that B is closest to output. similarly, we can break the 4-input AND gate into 2-input and 3-input AND gates. Thus, we are left with one 2-input AND gate and one 2-input OR gate between B and O as shown in figure below.
We see that there is still a possibility of logic re-structuring between B and O. De-Morgans theorem states that
A + B = (A' B')'
Going by this, we can convert the OR gate at the output into NAND gate as shown in figure below.
The bubbles at the input of NAND gate can be moved to the outputs of respective drivers. Or, saying, more sophistically, there are two NAND gates between B and O.
Thus, we have achieved our purpose of
Minimizing the logic
Assuring that there is minimum logic between B and O, since B is the most timing critical input.
We need to keep in mind that there may be more than one solutions to each logic minimization problems. Can you think of a better realization of the circuit in question? What could have been the realization of the circuit in case C was the most timing critical input?
We all know that a 2-input multiplexer selects one out of the available two inputs. Similarly, an N-input multiplexer selects one out of the available N inputs. Now, coming to the answer of the question,
The number of 2-input multiplexers needed to implement an N-input multiplexer is (N-1).
We will arrive at this conclusion by giving an example of an 8-input multiplexer implemented with the help of 2-input multiplexers. Each of the 2-input muxes reduces the number of signals by 1, thereby requiring total of "7" 2-input muxes to implement an 8-input mux.
Let us consider a circuit with 8 inputs and variable outputs. Firstly, if it has 8 outputs, there is no mux in-between.
Figure 1: Circuit with 8 output lines to represent 8 input lines
Now, if we add a 2-input mux between any of the 2 lines, then, we are reducing the number of output lines by 1 as shown in figure 2.
Figure 2: Circuit with 7 output lines and 8 input lines
Simlarly, adding another 2-input mux will leave the number of output lines to be 6. Figure 3 shows two of the all possible configurations of such circuits.
Figure 3: Circuits with 6 output lines and 8 input lines
Similarly, continuing on similar lines, we will need "7" 2-input muxes to converge to a single output. Thus, we can say that "7" 2-input muxes make an 8-input mux.
Similarly, going by this, we will need 13 2-input muxes for a 14:1 mux, 31 for a 32:1 mux, and so on. In other words, (N-1) 2:1 muxes will make up an N-input mux.
In the post 2x1 mux using NAND gates, we discussed how we can use NAND gates to build a 2x1 multilexer. In this post, we will discuss how we can use NAND gates to build a 4x1 mux:
1. Using structural approach: As we know that a 4x1 mux can be structurally built from 2x1 muxes as shown in figure 1 below. Thus, in the same way, we can arrange the 2-input NAND gates to build 4x1 muxes as shown in figure 1.
Figure 1: 4x1 mux using NAND gates with structural approach
2. Building 4x1 mux directly from NAND gates: The logical equation of a 4x1 multiplexer is given as:
Y = (S1' S0' A + S1' S0 B + S1 S0' C + S1 S0 D)
where S1 and S0 are the selects of the multiplexer and A, B, C and D are the multiplexer inputs.