# Quantifying Error in Dynamic Power Estimation of CMOS Circuits\*

Puneet Gupta<sup>†</sup> and Andrew B. Kahng<sup>†‡</sup>

<sup>†</sup> Department of Electrical and Computer Engineering, UC San Diego, La Jolla, CA, USA; <sup>‡</sup> Department of Computer Science and Engineering, UC San Diego, La Jolla, CA, USA. puneet@ucsd.edu, abk@ucsd.edu

## Abstract

Conventional power estimation techniques are prone to many sources of error. With increasing dominance of coupling capacitances, capacitive coupling potentially contributes significantly to UDSM power consumption. We analyze potential sources of inaccuracy in power estimation, focusing on those due to coupling. Our results suggest that traditional power estimates can be off by as much as 50%.

## 1 Introduction

The advent of low-power portable devices, along with continued increases in device density and operating frequency, makes power consumption a major concern in modern VLSI design. In this work, we seek to identify sources of error in power estimation, focusing on those which arise if effects of capacitive coupling are ignored. Our aim is not to propose new coupling-aware power estimation techniques but rather to quantify the error in non-coupling aware methods. Here, "ignoring" coupling includes ignoring crosstalk noise, as well as ignoring neighbor switching and its effect on the effective value of coupling capacitance.

Capacitive switching, leakage, short-circuit current and standby current are the sources of power consumption. *Static power* refers to the sum of leakage and standby power while *dynamic power* is the sum of short-circuit and switching power<sup>1</sup>. In this work, we concern ourselves with dynamic power consumption and capacitive switching in particular. Power estimation approaches can be classified into two categories, as follows.

- Simulative Techniques. These techniques employ direct simulation or statistical sampling techniques. Issues such as hazard generation and propagation, or reconvergent fanout-induced correlations, are automatically taken into consideration. If performed after layout and parasitic extraction, accurate estimation of capacitances (including coupling) and their effects is possible. A circuit simulator such as HSpice [10] or Powermill [18] is used for estimation of average power. A gate-level HDL simulation using tools such as NC-Verilog can also be adapted to report power dissipation using power models of gates from the library.
- Probabilistic Techniques. To avoid the strong pattern dependence and huge running times of simulation-based approaches, probabilistic approaches are used. These calculate probabilities of switching activity for each circuit node and multiply by CV<sup>2</sup><sub>dd</sub> to obtain the node's energy consumption. This dynamic capacitive power is summed up over all nodes to obtain total energy consumption of the circuit. The transition probability of each gate is sometimes referred to as the activity factor [3, 7]. [8] predicts a constant activity factor of 0.15 through all technology nodes.
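As a rough illustration of the probabilistic approach, the sketch below sums activity-weighted $CV_{dd}^2$ energy over circuit nodes. The node names, capacitance values and supply voltage are hypothetical placeholders; only the activity factor of 0.15 is taken from the text.

```python
def dynamic_energy_per_cycle(nodes, vdd):
    """Sum activity * C * Vdd^2 over all nodes (joules per clock cycle)."""
    return sum(activity * cap * vdd ** 2 for activity, cap in nodes.values())


# Hypothetical nodes: name -> (activity factor, capacitance in farads).
nodes = {
    "n1": (0.15, 300e-15),
    "n2": (0.15, 600e-15),
}

energy = dynamic_energy_per_cycle(nodes, vdd=1.8)  # vdd is an assumed value
print(f"Expected switching energy per cycle: {energy * 1e12:.4f} pJ")
```

Note that nothing in this sum depends on what neighboring wires are doing, which is exactly the limitation examined in the rest of the paper.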

Except for transistor-level simulation, typical power estimation techniques are oblivious to coupling. Coupling capacitances are larger than ground capacitances for interconnect, and their contribution depends on the transitions on the wires. According to [8], at the 90nm technology node capacitance density from interconnect is  $0.8nF/mm^2$  while the logic capacitance density is  $0.13nF/mm^2$ . Coupling capacitances can contribute 0-80% (a conservative estimate) of the total interconnect capacitance. Hence, coupling can account for up to 60% of the dynamic power consumption.

[25, 22] present algorithms to compute the total energy consumption considering coupling but [22] assumes complete voltage transitions and ignores crosstalk noise. The model outlined in [25] takes into account incomplete voltage transitions but ignores crosstalk noise. Both of these approaches use an infinite driver and ignore short-circuit power. [23] follows a simple Spice simulation based approach to generate three-wire look-up tables for various interconnect lengths; these are used to compute energy consumption using interpolation. It is not clear whether simple linear interpolation is a correct assumption. All three of the above approaches assume a known input vector (and thus suffer from pattern dependence) and essentially replace the circuit simulation step in the simulative techniques. [9, 21] take a switch factor based approach to model capacitive coupling. [9] incorrectly assumes the worst-case switch factor to be one, and also assumes all lines to be transitioning simultaneously. [21] essentially modifies activity factors of lines to take into account neighbor switching. The effect of slew times and switching windows is ignored. Moreover, it is not clear how the switching correlation between lines is being estimated. The above-mentioned techniques compute average power consumption in a design. However, the worst-case impact of capacitive coupling can be large, and is more visible in the case of peak power computation which is relevant for worst-case voltage drop calculations. Even recent literature [26, 14] ignores this. For instance, [26] assumes just gate fanout to be a measure of capacitance.

The remainder of our paper is organized as follows. In Section 2, we list possible sources of error in power estimation, concentrating on errors previously unexplored in the literature. Section 3 describes our simulation setup while Section 4 explains the assumptions made and their experimental confirmation. Section 5 gives some simulation results which help in quantifying various errors. Finally, in Section 6 we conclude by extrapolating the error estimates to full-chip estimates and predicting the extent of guard-banding required in conventional power estimation.

## 2 Sources of Inaccuracies

We have identified the following as potential sources of error in conventional power estimation. In later sections, we experimentally quantify all except numbers 5, 7 and 8.



<sup>\*</sup>This research was supported in part by the MARCO Gigascale Silicon Research Center and Cadence Design Systems, Inc.

<sup>1</sup>There may be some clash of terminology here, as the term dynamic power is sometimes used to refer to total instantaneous peak power rather than the time-averaged power we use it for. We use the terminology presented in [17].

- 1. *Crosstalk noise*. This error source is distinct from glitches arising from mismatched arrival times. Capacitive coupling can cause power-drawing voltage glitches on a silent line. Also, a transition on a victim line may become nonmonotone due to coupling with neighbors. In this case, the supply has to provide more current than in the case of a monotone waveform.
- 2. Short-circuit power due to crosstalk. Due to capacitive coupling, the victim receiver may spend a large (small) time in the middle transition region essentially due to increased (decreased) slew time and the transition waveform may not be monotone. This can cause a larger (smaller) short-circuit power dissipation in the victim receiver. Similarly, for the victim driver the output transition time changes due to capacitive coupling and can lead to increased or decreased short-circuit power depending on the transitions of the aggressors.
- 3. *Incorrect switch factors*. Coupling capacitances need to be treated differently as their power contribution is dependent on *two* transitions (victim and aggressor) rather than one. The typical approach to account for coupling capacitances is to convert them to equivalent grounded capacitances via a *switch factor* or *Miller factor* that depends on the overlap between victim and aggressor switching windows as well as their relative slew times. For delay, the switch factor is typically computed as $1 - \frac{\Delta V_a}{V_{th}}$ where $\Delta V_a$ is the change in the aggressor voltage for a victim voltage swing of $V_{th}$ (typically $50\% V_{dd}$) [4]. For power computations the switch factor should be computed for a rail-to-rail victim swing. This means that delay-based switch factors can overestimate the power consumption: $-1 \le SF_{delay} \le 3$ [4, 13] while $0 \le SF_{power} \le 2$.
- 4. *Incomplete voltage swings*. If slew times are large, then a transition may not be rail-to-rail. In this case the total charge drawn from supply, and hence the power consumption, is reduced. This consideration is even more relevant for glitches which may not have enough time for a complete swing. Glitches due to mismatched arrival times of input signals are modeled as logic glitches by both logic simulators and probabilistic techniques. This means the power drawn by these glitches is computed assuming complete voltage swings while in most cases they will be smaller than complete voltage swings. Only simulative techniques can correctly take incomplete voltage swings into account. Power calculation as outlined in the IEEE Delay and Power Calculation System [6] has function prototypes to handle partial swing events.
- 5. Incorrect estimation of activity factors. Besides the above-mentioned errors due to approximations, probabilistic determination of activity factors suffers from errors due to ignoring neighboring transitions. Transition probabilities are found for each gate in isolation while the power consumption also depends on the neighboring transitions. As a result, triplet-wise activity factors (e.g., P(000→011)) are required as opposed to gate activity factors (e.g., P(0 → 1)). The calculation of these activity factors requires consideration of transitions on interconnect segments rather than just gates. Moreover, switching windows and slew rates of the transitions must be taken into account. Quantifying this error is beyond the scope of our work as it requires extensive knowledge of switching correlations in the design under analysis.
- 6. Incorrect estimation of load capacitances. If we assume that all voltage swings in the circuit are rail-to-rail (i.e., 0 ↔ V<sub>dd</sub>), then there is no resistive shielding of capacitances for switching-power calculation purposes. Thus, all ground capacitances should be lumped together, rather than using effective capacitance models such as those given in [12, 19]. For uncoupled lines [12] gives C<sub>ramp</sub> ≤ C<sub>eff</sub> ≤ C<sub>total</sub> for delay computation, but for power computation $C_{total}$ must be used.

In today's design flows, model order reduction steps used for DSPF to RSPF conversion potentially lead to incorrect power calculations. However, any model order reduction method which preserves the first moment of the driving point admittance function will also preserve the total capacitance. The commonly used $\Pi$ model of [16] preserves the total capacitance, and to our knowledge, commercial tools use $C_{total}$ and DSPF [24]. Hence, we do not expect this error to be present in most power estimation methodologies and do not address it in our experiments. It is worth noting that for calculation of short-circuit power, $C_{eff}$ which preserves the output slew time needs to be used. If $C_{total}$ is used instead, short-circuit power will be underestimated as it decreases with increasing output slew time. (This error presently is likely to be small, as short-circuit power is a small component of total power consumption.)

- 7. *Incorrect propagation of glitches*. Verilog based simulation reports only complete logic glitches (lower bound) while probabilistic estimates based on STA and gate delay models overestimate the number of glitches. This is a well-known error [17, 26] and we do not address it in this work.
- 8. Other Errors. Probabilistic power estimation suffers from various other errors such as inaccurate gate delay models and spatial and temporal independence assumptions. These sources of errors are well known [17, 15] and are again beyond the scope of this work.

The above sources of errors motivate the question of where accurate coupling-aware power estimation is possible in the design flow. From the above discussion, we see that switching of neighboring wires can potentially have sizable impact on average as well as peak power consumption. Exact adjacency information can be known only after detailed routing, hence a switching activity aware power estimation which correctly takes capacitive coupling into account is best done only after detailed routing. An approximation can be to compute guardbanded values of power consumption ignoring neighbor-switching (and therefore assuming worst and best case values of coupling capacitance) and using independent activity factors. This can be done at any point in the flow where coupling and ground capacitances can be estimated with reasonable accuracy (e.g., based on post-placement global routing). In our experiments we assume that accurately extracted parasitics are available.

## 3 Experimental Testbed

Our experimental testbed consists of systems of either one or two global RC interconnect lines in 180nm technology. Specifically, a 5 mm coupled interconnect is simulated with a real inverter at the source end and a load capacitance at the sink end. All values of interconnect and device parameters are derived from [11]. Device models from [2] are used and simulations are performed using *Synopsys Star-HSpice* [10]. Unless otherwise stated, interconnect is modeled with lumped L segments. We model distributed interconnect by 250µm-long segments.

Interconnect resistance per unit length is  $r=0.04\Omega/\mu m$ , ground capacitance per unit length is  $c_g=0.06fF/\mu m$ , and coupling capacitance is  $c_c=0.12fF/\mu m$  per nearest-neighbor aggressor. The load capacitance  $C_L$  is kept equal to the total interconnect ground capacitance in all our experiments; this is typical of global buffered interconnect. W/L ratio is taken to be 83 for NMOS transistors for all drivers, and W/L for PMOS is twice that of NMOS. If parameter values differ from the ones given here, we call this out in the experiment description. We adopt the following notation.

- Line 1 in a two-line system is the designated victim line, while Line 2 is the only aggressor.<sup>2</sup>
- U represents a $0 \rightarrow V_{dd}$ transition on the line.
- D represents a $V_{dd} \rightarrow 0$ transition on the line.
- 0 represents a static 0 on the line.
- 1 represents a static 1 on the line.

For example, 0U means that Line 1 is quiet at 0V and Line 2 is making an upward transition on the inverter output. We measure the average current drawn over a period of 1800ps on the power supply of the designated victim line; this measurement interval is kept large to ensure that a rail-to-rail transition occurs. We take charge drawn from supply as the measure of power (energy drawn can be obtained by multiplying the charge drawn by $V_{dd}$).

| Output Logic-level | W/L (NMOS) | I<sub>Leakage</sub> (nA) |
|--------------------|------------|--------------------------|
| 0                  | 83         | 22                       |
| 1                  | 83         | 12                       |
| 0                  | 55         | 15                       |
| 1                  | 55         | 8                        |

Table 1: Leakage power for typical global buffers used in our simulations.

| Transition | $t_{slew}$ (ps) | W/L (NMOS) | Avg. Curr. (mA) |
|------------|-----------------|------------|-----------------|
| D          | 100             | 55         | 0.085           |
| U          | 100             | 55         | 0.026           |
| D          | 200             | 55         | 0.11            |
| U          | 200             | 55         | 0.055           |
| D          | 100             | 83         | 0.126           |
| U          | 100             | 83         | 0.039           |
| D          | 200             | 83         | 0.165           |
| U          | 200             | 83         | 0.082           |

Table 2: Variation of short-circuit power with $t_{slew}$ and device W/L. This is the worst-case short-circuit power (zero load capacitance).
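The testbed quantities can be sketched numerically as follows. The per-unit-length values and the 1800ps interval come from the text; the supply voltage of 1.8V is an assumption (the text does not state it), chosen because it is typical for 180nm and because it reproduces the reported energies from the reported currents. The function names are illustrative only.

```python
VDD = 1.8          # volts; assumed, not stated in the text
LENGTH_UM = 5000   # 5 mm global line

# Line totals derived from the per-unit-length parameters in the text.
R_TOTAL_OHM = 0.04 * LENGTH_UM   # about 200 ohm
CG_TOTAL_FF = 0.06 * LENGTH_UM   # about 300 fF ground capacitance
CC_TOTAL_FF = 0.12 * LENGTH_UM   # about 600 fF coupling capacitance


def charge_pc(avg_current_ma, interval_ps):
    """Charge drawn from the supply in pC: Q = I_avg * T."""
    return avg_current_ma * 1e-3 * interval_ps * 1e-12 * 1e12


def energy_pj(avg_current_ma, interval_ps, vdd=VDD):
    """Energy drawn from the supply in pJ: E = Q * Vdd."""
    return charge_pc(avg_current_ma, interval_ps) * vdd


# 0.728 mA averaged over 1800 ps (a UU current reported in Table 6).
print(f"{energy_pj(0.728, 1800):.2f} pJ")
```

Under the assumed supply voltage, the 0.728mA UU current corresponds to roughly 2.36pJ, in line with the UU entry of Table 7 for the 1800ps interval.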

## 4 Assumptions and Confirmations

The following basic assumptions are implicit in our experiments. (1) We assume that leakage and short-circuit current account for only a small part of the total current drawn. Therefore, we approximate total current by capacitive switching current. (2) We perform experiments with only one slew time. Then, to show that our results are applicable for a wide range of slew times, we show that the dependence of power consumption on slew time is weak. (3) Finally, we model interconnect by a lumped model which is also verified to only insignificantly affect the current drawn. In this section, we provide justifications for these basic assumptions. (For simplicity of presentation, we do not present exhaustive simulation results, but rather show the essential (and representative) data.)

**Negligible Leakage Power.** The leakage component of power is small for 180nm technology but can rise rapidly in future technology nodes, particularly for high-performance devices as opposed to low operating-power or low standby-power devices [11]. Typical values of leakage power for the inverters used in our experiments are summarized in Table 1. The leakage current is 3-4 orders of magnitude smaller than the switching current. More important, it is independent of load capacitances. We therefore ignore the leakage component of power in all our experiments.

**Small and/or Constant Short-circuit Power.** To measure the short-circuit power, we measure the current passing through the ground terminal of the driver. Table 2 tabulates worst-case (zero load capacitance) values for short-circuit power; Table 3 shows that short-circuit power decreases as load capacitance increases. To compute the switching current ($I_{sw}$) we consider the U transition and subtract $I_{SC}$ from the total current drawn. With respect to Table 3, we observe that typical values of load capacitance for drivers and repeaters in global buffered interconnects are greater than 500fF. We further observe that (i) the dependence of short-circuit current on load capacitance is weak, while the switching current is strongly dependent on load capacitance; and (ii) short-circuit power can be kept to less than 10% of total dynamic power with proper design (balanced input and output slew times) [27, 20]. Our experiments involve the input to the driver switching much faster than its output, resulting in very small short-circuit power. In light of these observations, we consider any variation in total power drawn from the supply to stem from switching current alone.

| $C_L$ (fF) | Avg. $I_{SC}$ (mA) | Avg. $I_{Total}$ (mA) |
|------------|--------------------|-----------------------|
| 0          | 0.082              | 0.175                 |
| 50         | 0.076              | 0.219                 |
| 500        | 0.052              | 0.64                  |

Table 3: Variation of short-circuit power with load capacitance for NMOS W/L=83 and input slew time=200ps. Note that typical values of load capacitance for drivers of buffered global interconnect will exceed 500fF.

| Input $t_{slew}$ (ps) | Avg. Curr. (mA) |
|-----------------------|-----------------|
| 50                    | 0.73            |
| 100                   | 0.74            |
| 200                   | 0.76            |
| 300                   | 0.80            |

Table 4: Power consumption for an uncoupled line. $C_g = C_L = 300 fF$.
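The subtraction $I_{sw} = I_{Total} - I_{SC}$ can be applied directly to the Table 3 entries. The sketch below only re-derives numbers from that table; the function and variable names are illustrative.

```python
# (C_L in fF, avg I_SC in mA, avg I_Total in mA), copied from Table 3.
TABLE3 = [
    (0, 0.082, 0.175),
    (50, 0.076, 0.219),
    (500, 0.052, 0.64),
]


def switching_current(i_total, i_sc):
    """Switching current obtained by removing the short-circuit component."""
    return i_total - i_sc


def short_circuit_share_pct(i_total, i_sc):
    """Short-circuit current as a percentage of total current."""
    return 100 * i_sc / i_total


for c_load, i_sc, i_total in TABLE3:
    i_sw = switching_current(i_total, i_sc)
    share = short_circuit_share_pct(i_total, i_sc)
    print(f"C_L={c_load:>3} fF  I_sw={i_sw:.3f} mA  I_SC share={share:.1f}%")
```

The short-circuit share drops from nearly half of the total at zero load to under 10% at 500fF, which is the regime relevant to buffered global interconnect.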

**Independence of Slew Times and Power Estimation.** For uncoupled lines, the power drawn should depend only on the total switched capacitance. Short-circuit power is affected by the slew time but, as discussed above, is a very small component of the total dynamic power. Table 4 shows the variation of total power with varying slew times for a single uncoupled line. For coupled lines, the effective coupling capacitance varies with relative slew times and switching windows of the victim and the aggressor. If the aggressor signal arrives after the victim signal, then the effective capacitance is almost independent of the slew time, so that power is only weakly dependent on slew times. This phenomenon is examined in greater depth in Section 5.2. Table 5 shows results with total coupling capacitance $C_c = C_g + C_L = 600 fF$, where $C_g$ and $C_L$ are total ground and load capacitance respectively. We see that the power depends on the transition type but not on slew time; thus, below we use only one slew time (100ps) to quantify errors in power estimation.

**Little Effect of Distributed Nature of Interconnect.** Since the power consumption depends only on the total capacitance, it is independent of the interconnect resistance. For an uncoupled line there is no effect of its distributed nature on power. For a coupled line, there is a small impact of its distributed nature: slew times and arrival times will differ along the aggressor and victim lines, leading to different switch factors along the victim line. Table 6 compares the power consumption in a distributed versus lumped coupled interconnect. The UU case, which essentially corresponds to an uncoupled interconnect, shows almost no effect of distribution while the other cases show an effect of roughly 1-2%. We are concerned with the relative rather than the absolute difference between the UU, U0 and UD cases, and our experiments show that this relative difference remains almost constant, with or without distribution. Therefore, for simplicity of analysis, we use a lumped interconnect model for power computation.<sup>3</sup>

| $t_{slew}^{Victim}$ (ps) | $t_{slew}^{Aggressor}$ (ps) | Transition | Victim Curr. (mA) |
|--------------------------|-----------------------------|------------|-------------------|
| 100                      | 100                         | U0         | 1.33              |
| 50                       | 100                         | U0         | 1.31              |
| 100                      | 100                         | UD         | 1.91              |
| 50                       | 100                         | UD         | 1.90              |
| 100                      | 100                         | UU         | 0.73              |
| 50                       | 100                         | UU         | 0.72              |

Table 5: Power consumption for a coupled line. Victim and aggressor arrival times are assumed to be equal.

| Transition | 1 segment | 2 segments | 10 segments | 20 segments |
|------------|-----------|------------|-------------|-------------|
| UU         | 0.731     | 0.730      | 0.728       | 0.728       |
| U0         | 1.307     | 1.322      | 1.324       | 1.324       |
| UD         | 1.884     | 1.914      | 1.921       | 1.921       |

Table 6: Power consumption in the victim line of a 2-line coupled interconnect system, shown versus the number of segments used in the lumped-distributed model. Entries are average current in mA.

| Transition | $t_{clock}$ (ps) | Energy<sub>total</sub> (pJ) |
|------------|------------------|-----------------------------|
| U0         | 600              | 3.79                        |
| U0         | 1200             | 4.24                        |
| U0         | 1800             | 4.29                        |
| UU         | 600              | 2.33                        |
| UU         | 1200             | 2.36                        |
| UU         | 1800             | 2.36                        |
| UD         | 600              | 5.20                        |
| UD         | 1200             | 6.12                        |
| UD         | 1800             | 6.22                        |

Table 7: Total energy drawn from $V_{dd}$ for various transitions with varying measurement intervals. The clock period corresponds to the measurement interval of the current. The input slew in all the cases is 100ps.

<sup>2</sup>Implicitly, this means that if there are multiple aggressors, their effects on current drawn can be summed.
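To make the lumped-versus-distributed comparison concrete, the following sketch computes the percentage change in victim current between the 1-segment and 20-segment models, using only the Table 6 values. Names are illustrative.

```python
# Average victim current (mA) for 1, 2, 10 and 20 segments, from Table 6.
TABLE6 = {
    "UU": (0.731, 0.730, 0.728, 0.728),
    "U0": (1.307, 1.322, 1.324, 1.324),
    "UD": (1.884, 1.914, 1.921, 1.921),
}


def lumped_vs_distributed_pct(currents):
    """Percent change from the 1-segment model to the 20-segment model."""
    lumped, distributed = currents[0], currents[-1]
    return 100 * (distributed - lumped) / lumped


changes = {name: lumped_vs_distributed_pct(c) for name, c in TABLE6.items()}
for name, pct in changes.items():
    print(f"{name}: {pct:+.2f}%")
```

The changes come out at well under half a percent for UU and on the order of one to two percent for U0 and UD, small enough that the lumped model is a reasonable simplification for comparing the three cases.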

## 5 Experimental Results

We now seek to quantify the sources of errors listed in Section 2. Since logic simulation and probabilistic estimation of switching probabilities are beyond the scope of this work, we do not present any results for errors 5, 7 and 8 from Section 2. Errors 7 and 8 are independent of the other errors, hence the inaccuracy due to them would be in addition to any error calculated in this and the next section. To model error 5, we consider expected and worst-case values of activity factors in Section 6.

## 5.1 Incomplete Voltage Swings

Partial voltage swings can occur if the slew time is large relative to the clock period. We use a distributed interconnect for this experiment. Table 7 shows the impact of clock period on power consumption. The UD transition has the largest slew time and therefore requires more time for the victim transition to settle. Given good design practices for high-performance designs, the clock period is about 6 times the input rise time [8]. Therefore, we can consider the values given in Table 7 to represent the energy drawn for the driver switching every one (600ps measurement interval), two (1200ps) or three (1800ps) clock cycles. These correspond to average activity factors of 0.5, 0.33 and 0.25 respectively. If we assume an activity factor of 0.15, as in the high-performance MPU model of the 2001 ITRS [11], this error becomes essentially negligible. Note that in case of glitches, the problem of partial swings would be more pronounced as they are usually short-lasting, transient switching events. Subject to our ignoring glitching in our analysis, we can conclude: The effect of incomplete voltage swings is negligible for designs with typical activity factors.
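The size of the partial-swing effect can be read off Table 7 by comparing each measurement interval against the longest one. The sketch below does this; treating the 1800ps value as the full-swing reference is an assumption made for illustration.

```python
# Total energy (pJ) keyed by measurement interval (ps), from Table 7.
TABLE7 = {
    "U0": {600: 3.79, 1200: 4.24, 1800: 4.29},
    "UU": {600: 2.33, 1200: 2.36, 1800: 2.36},
    "UD": {600: 5.20, 1200: 6.12, 1800: 6.22},
}


def shortfall_pct(energies, interval_ps, reference_ps=1800):
    """Percent of the reference energy not drawn within interval_ps."""
    reference = energies[reference_ps]
    return 100 * (reference - energies[interval_ps]) / reference


for name, energies in TABLE7.items():
    for interval in (600, 1200):
        pct = shortfall_pct(energies, interval)
        print(f"{name} @ {interval} ps: {pct:.1f}% below the 1800 ps value")
```

For a 600ps interval the UD shortfall is in the mid-teens of percent, while at 1200ps it is already below 2% for every transition, which is consistent with the conclusion that the effect fades quickly as the activity factor drops.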

## 5.2 Incorrect Switch Factors

For delay analyses, the appropriate switch factor is typically computed as [4]

$$SF_{delay} = 1 - \frac{\Delta V_a}{\Delta V_v} \tag{1}$$



where $\Delta V_a$ is the change in the aggressor voltage for a victim voltage swing of $\Delta V_v$ (typically 50% of $V_{dd}$) [4]. For power computations the switch factor should be computed for a threshold of 100% of $V_{dd}$. Moreover, if the aggressor transition starts after the victim transition and the clock period is large enough, the victim will always "see" almost the complete aggressor transition since the theoretical 100% threshold delay for an RC network is ∞ and the voltage waveforms have a "knee" beyond which the slope of the curve is very small. Therefore, if the aggressor and victim knee points lie within the clock period and the aggressor signal arrives after the victim, we will have $SF_{power} \in \{0, 1, 2\}$.

Figure 1: Variation of $SF_{power}$ with the difference between victim and aggressor arrival times. Both victim and aggressor have slew times of 100ps at the coupling point; the lines transition in opposite directions.

The switch factor for power (UD transition) can hence be approximated by

$$SF_{power} = \begin{cases} 1 + \left(1 - \frac{t_v^A - t_a^A}{t_a^R}\right) & \text{if } t_v^A \ge t_a^A \\ 2 & \text{otherwise} \end{cases} \tag{2}$$

where  $t_v^A$  and  $t_a^A$  are the arrival times for victim and aggressor, and  $t_a^R$  denotes the slew time of the aggressor. For the UU transition, the expression for switch factor is

$$SF_{power} = \begin{cases} 1 - \left(1 - \frac{t_v^A - t_a^A}{t_a^R}\right) & \text{if } t_v^A \ge t_a^A \\ 0 & \text{otherwise} \end{cases} \tag{3}$$

and when the aggressor is quiet,

$$SF_{power} = SF_{delay} = 1 \tag{4}$$
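Equations (2), (3) and (4) can be sketched as a single function. The function name and argument names are illustrative. One detail is an assumption not stated in the equations: when the victim arrives more than one aggressor slew time after the aggressor, the overlap term is clamped at zero so that the result stays within $0 \le SF_{power} \le 2$.

```python
def sf_power(transition, t_v, t_a, t_r_aggr):
    """Approximate power switch factor for a victim/aggressor pair.

    transition: "UD", "UU" or "U0" (victim first, aggressor second).
    t_v, t_a:   victim and aggressor arrival times (same time unit).
    t_r_aggr:   aggressor slew time (same time unit).
    """
    if transition == "U0":
        return 1.0  # Eq. (4): quiet aggressor

    if t_v < t_a:
        # Aggressor arrives after the victim: the victim sees the whole
        # aggressor transition (the "otherwise" branches of Eqs. 2 and 3).
        overlap = 1.0
    else:
        overlap = 1.0 - (t_v - t_a) / t_r_aggr
        overlap = max(0.0, overlap)  # assumption: clamp, see lead-in

    if transition == "UD":
        return 1.0 + overlap  # Eq. (2)
    if transition == "UU":
        return 1.0 - overlap  # Eq. (3)
    raise ValueError(f"unsupported transition: {transition}")


# Equal arrival times, 100 ps aggressor slew.
print(sf_power("UD", 0, 0, 100), sf_power("UU", 0, 0, 100))
```

With equal arrival times this gives the limiting values 2 and 0 for UD and UU, and a victim that lags the aggressor by half the aggressor slew time gets 1.5 and 0.5.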

For our experiments, we define the following terms.

- Measured $SF_{delay}$: This is the SF computed experimentally such that the delays in the coupled and decoupled cases are equal.
- Theoretical  $SF_{power}$ : This is the SF computed according to Equations (2), (3) and (4). As we will see, this accurately models the power consumption.

Table 8 shows the results for a set of simulations for a two-line system. Variation of  $SF_{power}$  with the difference between victim and aggressor arrival times is shown in Figure 1. From Table 8, we can conclude: Using delay-based switch factors for grounding coupling capacitances in power calculations leads to estimation errors; furthermore, the proposed power-based switch factors accurately model coupling capacitances for power analysis.
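The conclusion drawn from Table 8 can be checked by computing the relative error of each switch-factor-based current against the coupled-line simulation. The sketch assumes the Table 8 columns are read as three groups (coupled lines, measured $SF_{delay}$, theoretical $SF_{power}$), each reporting an average current; the names are illustrative.

```python
# (transition, coupled-line current, current using measured SF_delay,
#  current using theoretical SF_power), all in mA, read from Table 8.
TABLE8 = [
    ("UD", 1.87, 1.72, 1.88),
    ("UU", 0.74, 1.09, 0.73),
    ("UD", 1.88, 1.95, 1.88),
    ("UU", 0.73, 0.82, 0.73),
]


def error_pct(estimate, reference):
    """Signed percent error of an estimate relative to the reference."""
    return 100 * (estimate - reference) / reference


delay_errors = [error_pct(d, ref) for _, ref, d, _ in TABLE8]
power_errors = [error_pct(p, ref) for _, ref, _, p in TABLE8]

for (name, *_), d_err, p_err in zip(TABLE8, delay_errors, power_errors):
    print(f"{name}: SF_delay error {d_err:+.1f}%, SF_power error {p_err:+.1f}%")
```

Under this reading, the delay-based switch factors are off by several percent in the UD rows and by well over 10% in the UU rows, while the power-based switch factors stay within about 1.5% in every row.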

## 5.3 Crosstalk Noise Power

A transitioning aggressor can cause a power-consuming glitch on a quiet victim line. Moreover, as mentioned above, if the victim transition is non-monotone, the total charge drawn is greater than $CV_{dd}$ since the victim driver has to pull up the switched capacitance by more than $V_{dd}$. Capacitive coupling can cause such non-monotone transitions. It is difficult to estimate the power consumed by such non-monotonicity. Here, we estimate crosstalk noise pulse power on a quiet victim only. We consider only the 1D transition to be power-consuming (drawing current from $V_{dd}$), and ignore all other glitches. Comparing values of current in Table 9 with the current drawn for normal logic transitions in Table 6, we easily conclude: Power consumption due to crosstalk noise glitches is significant and cannot be ignored in power calculation.

| Transition | Coupled lines: Delay (ps) | Coupled lines: Avg. Curr. (mA) | Measured $SF_{delay}$: Value | Measured $SF_{delay}$: Delay (ps) | Measured $SF_{delay}$: Avg. Curr. (mA) | Theoretical $SF_{power}$: Value | Theoretical $SF_{power}$: Delay (ps) | Theoretical $SF_{power}$: Avg. Curr. (mA) |
|------------|-----|------|------|-----|------|---|-----|------|
| UD         | 343 | 1.87 | 1.7  | 342 | 1.72 | 2 | 377 | 1.88 |
| UU         | 214 | 0.74 | 0.6  | 215 | 1.09 | 0 | 145 | 0.73 |
| UD         | 391 | 1.88 | 2.15 | 393 | 1.95 | 2 | 377 | 1.88 |
| UU         | 159 | 0.73 | 0.15 | 162 | 0.82 | 0 | 145 | 0.73 |

Table 8: Switch factor based estimation of power.

| Victim W/L (NMOS) | $C_c$ (fF) | Peak Noise (V) | Avg. Curr. (mA) |
|-------------------|------------|----------------|-----------------|
| 83                | 600        | 0.35           | 0.578           |
| 110               | 600        | 0.34           | 0.580           |
| 83                | 300        | 0.23           | 0.298           |
| 110               | 300        | 0.22           | 0.298           |

Table 9: Crosstalk noise power for the 1D transition.

<sup>3</sup>The lumped interconnect model will overestimate the receiver short-circuit power and underestimate driver short-circuit power. We ignore this effect because short-circuit power is a small component of the total power.

<sup>4</sup>Note that a more accurate probabilistic analysis may be performed, as in Section 6 below, but the impact of partial swing events is likely to be small. Hence, we do not give a detailed analysis here.
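The comparison between Table 9 and Table 6 can be made explicit by expressing the noise current as a fraction of the current drawn by a normal U0 transition. The sketch assumes the two tables were measured under comparable conditions (same driver size and measurement interval), which the text implies but does not state; names are illustrative.

```python
# Avg. supply current (mA) for the 1D noise glitch, victim W/L = 83,
# keyed by coupling capacitance in fF (Table 9).
NOISE_CURRENT_MA = {600: 0.578, 300: 0.298}

# Avg. supply current (mA) for a normal U0 transition (Table 6, 20 segments).
U0_CURRENT_MA = 1.324

noise_fraction = {
    c_c: current / U0_CURRENT_MA for c_c, current in NOISE_CURRENT_MA.items()
}

for c_c, fraction in noise_fraction.items():
    print(f"C_c = {c_c} fF: glitch draws {100 * fraction:.0f}% of a U0 transition")
```

Under these assumptions a single glitch on a quiet victim draws a substantial fraction of the current of a full logic transition, and that fraction scales roughly with the coupling capacitance.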

## 6 Extrapolating to Full Chip Error Estimate

In this section we give an estimate of cumulative error in power estimation, extrapolated to the whole chip. We stress that this is a rough estimate in that it does not take into account any switching correlations, and assumes an average activity factor. On the other hand, our estimate points out the significance of the error sources that we have analyzed above.

From the previous sections we may derive an estimate of expected error for each of the power consuming transitions (i.e., U0, UU, UD and 1D<sup>6</sup>). The baseline power is calculated assuming a U0 transition with zero crosstalk noise power. As in [8, 11], we assume a given average activity factor A. A corresponds to the averaged transition probability for the $0 \rightarrow 1$ transition. Then, the probabilities of transition pairs are as given in Table 10. Note that these probability values are derived assuming spatial and temporal independence. Table 10 also gives the corresponding expected value of $SF_{power}$, assuming a uniform distribution of relative arrival times of the aggressor and the victim. Note that the entry 1D in the table corresponds to crosstalk glitches, and we also take into account partial glitches when the aggressor arrives before the victim in the UD case (the aggressor must be transitioning down for the victim to draw power). The amplitude of the partial glitch is assumed to be proportional to the difference between aggressor and victim arrival times. The expected switch factors of 0.25 and 1.75 and partial glitch probability of $A^2/4$ are calculated assuming the clock period is not much larger than the sum of slew times of victim and aggressor. If the clock period is large, then the corresponding values would be 0.5, 1.5 and $A^2/2$.
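A short numerical check of the transition probabilities and expected switch factors is sketched below. It uses the probabilities $A(1-2A)$, $A^2$ and $A^2$ for U0, UU and UD and the expected switch factors quoted above; the function names are illustrative, and independence of victim and aggressor is assumed as in the text.

```python
def transition_table(a, sf_uu=0.25, sf_ud=1.75):
    """Map each transition to (probability, expected SF_power)."""
    return {
        "U0": (a * (1 - 2 * a), 1.0),
        "UU": (a * a, sf_uu),
        "UD": (a * a, sf_ud),
        "1D": (a * (1 - 2 * a) / 2 + a * a / 4, 0.0),
    }


def expected_coupling_weight(a, **switch_factors):
    """Probability-weighted SF_power over the victim's U transitions."""
    table = transition_table(a, **switch_factors)
    return sum(p * sf for name, (p, sf) in table.items() if name != "1D")


A = 0.15
print(expected_coupling_weight(A))                         # short clock period
print(expected_coupling_weight(A, sf_uu=0.5, sf_ud=1.5))   # long clock period
```

In both cases the weighted switch factor equals $A$ up to rounding, because the UU and UD deviations from 1 cancel when the two transitions are equally likely. This is why the expected coupling contribution is $A \cdot C_c$, the same as treating coupling capacitance as grounded with a switch factor of 1.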

 $\begin{array}{|c|c|c|}\hline \text{Transition} & \text{Probability} & \text{Expected } SF_{power} \\ \hline U0 & A(1-2A) & 1 \\ UU & A^2 & 0.25 \\ UD & A^2 & 1.75 \\ 1D & \frac{A(1-2A)}{2} + \frac{A^2}{4} & 0 \\ \hline \end{array}$ 

Table 10: Probabilities of transition and expected  $SF_{power}$  for a given average activity factor A.

For each of the transitions, error in short-circuit power may also be taken into account. An average-case analysis based on Table 10 yields a very small error in power estimation (essentially due to crosstalk noise power). Similarly, a worst-case analysis can be performed. The highest power consumption involves all U transitions corresponding to UD transitions, while the lowest power consumption occurs with all transitions being UU. Thus, a guardbanding range can be constructed. In the following, we try to derive some crude values for power consumption with various levels of accuracy.<sup>7</sup>

Conventional non-coupling aware estimation. In this case, coupling capacitance is treated as ground capacitance with switch factor of 1, i.e.,

$$P_{conventional} = A(C_g + C_c + C_L)V_{dd}^2 \tag{5}$$

Expected coupling "aware" estimation. This uses either simulation or correct switch factors but ignores crosstalk noise. If we assume neighbor switching to be completely independent, we end up with transition probabilities as in Table 10. The probability-weighted switch factor over the U0, UU and UD transitions is then $(A(1-2A) \cdot 1 + A^2 \cdot 0.25 + A^2 \cdot 1.75)/A = 1$, so in this case coupling-aware power is the same as the conventional power estimate.

$$P_{coupled} = A(C_g + C_c + C_L)V_{dd}^2 \tag{6}$$

Correct noise-aware estimate. This takes crosstalk glitches into account. It can be approximated as

$$P_{correct} = P_{conventional} + \left(\frac{A(1-2A)}{2} + \frac{A^2}{4}\right)P_{noise} \tag{7}$$

Worst-case estimate. This assumes all U transitions to be UD and all D transitions to be DU. I.e.,

$$P_{WC} = A(C_g + 2C_c + C_L)V_{dd}^2 \tag{8}$$

Best-case estimate. This assumes all U transitions to be UU and all D transitions to be DD. I.e.,

$$P_{BC} = A(C_g + C_L)V_{dd}^2 \tag{9}$$
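The five estimates can be evaluated side by side. The sketch below is only an illustration of Equations (5) through (9): the function name `power_estimates` and all numeric inputs are hypothetical, and `P_noise` in particular is a placeholder rather than the value measured in Section 5.

```python
def power_estimates(A, C_g, C_c, C_L, V_dd, P_noise):
    """Evaluate Equations (5)-(9) for one victim net with a single aggressor.

    Results are in the same normalized units as the equations; P_noise must
    be supplied in those units as well.
    """
    v2 = V_dd ** 2
    conventional = A * (C_g + C_c + C_L) * v2      # Eq. (5): switch factor 1
    coupled = A * (C_g + C_c + C_L) * v2           # Eq. (6): equal under independence
    p_glitch = A * (1 - 2 * A) / 2 + A ** 2 / 4    # 1D row of Table 10
    correct = conventional + p_glitch * P_noise    # Eq. (7)
    worst = A * (C_g + 2 * C_c + C_L) * v2         # Eq. (8): switch factor 2
    best = A * (C_g + C_L) * v2                    # Eq. (9): switch factor 0
    return {
        "conventional": conventional,
        "coupled": coupled,
        "correct": correct,
        "worst": worst,
        "best": best,
    }


# Hypothetical normalized inputs (assumptions for illustration only).
est = power_estimates(A=0.15, C_g=0.5, C_c=1.0, C_L=0.5, V_dd=1.0, P_noise=1.0)
for name, value in est.items():
    print(f"{name:>12}: {value:.4f}")
```

Keeping `P_noise` as an explicit argument makes it clear that the noise-aware estimate is the only one of the five that depends on measured glitch power; the other four follow from the capacitances and the activity factor alone.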

Assuming A = 0.15 [8, 11] and other values from Sections 5 and 4, we get the following:<sup>8</sup>

- We calculate *expected error* as  $\frac{P_{correct} - P_{conventional}}{P_{conventional}}$ . From Section 5 and Equations (5), (7) we estimate this error to be +18%. I.e., conventional power estimation techniques are likely to underestimate power consumption.
- We calculate *worst-case error* as  $\frac{P_{WC} - P_{conventional}}{P_{conventional}}$ . From Section 5 and Equations (5), (8) we estimate this error to be +50%.
- We calculate *best-case error* as  $\frac{P_{BC} - P_{conventional}}{P_{conventional}}$ . From Section 5 and Equations (5), (9), we infer that the conventional power estimation techniques can overestimate by 50%.
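The worst-case and best-case figures depend only on the ratio of coupling capacitance to the remaining capacitance, since A and $V_{dd}^2$ cancel in the error ratios. A minimal sketch follows, assuming a hypothetical helper `bounding_errors` and the ratio $r = C_c/(C_g + C_L)$; the value $r = 1$ corresponds to the experimental assumption $C_c = C_g + C_L$ stated in the conclusions.

```python
def bounding_errors(coupling_ratio):
    """Worst- and best-case relative error of Eq. (5), for r = C_c / (C_g + C_L).

    Worst case: (P_WC - P_conventional) / P_conventional = r / (1 + r)
    Best case:  (P_BC - P_conventional) / P_conventional = -r / (1 + r)
    """
    r = coupling_ratio
    worst = r / (1 + r)
    best = -r / (1 + r)
    return worst, best


worst_error, best_error = bounding_errors(1.0)
print(f"worst-case error: {worst_error:+.0%}, best-case error: {best_error:+.0%}")
```

Written this way, the bounds are symmetric about the conventional estimate and widen as coupling takes a larger share of the total capacitance. The expected error is not reproduced here because it additionally requires the measured $P_{noise}$.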

<sup>8</sup>A positive error means underestimation by the conventional methods, while a negative error means overestimation.



<sup>5</sup>Four transitions (1D, 0D, 0U, 1U) can cause glitches. The 0U and 1U glitches do not draw any power from the victim supply: a 0U transition causes current to flow to victim ground through the NMOS while a 1U transition causes current to flow to victim supply through the PMOS. In the 0D case, current is drawn from the victim ground through the NMOS while in the case of a 1D glitch, power is drawn from the victim supply through the PMOS. We therefore consider only the 1D transition as power consuming.

<sup>6</sup>We may assume that U1 is subsumed by U0.

<sup>7</sup>We assume just one aggressor. The worst (and best) case effect of two or more aggressors can be modeled by simply scaling the coupling capacitance and assuming validity of linear superposition.

The above analysis not only suggests that certain guardbanding may be necessary in power estimation, but also guides such guardbanding. For example, the above analysis specifically implies that guardbanding the conventional power estimate by +18% will lead to zero expected error, +27% worst-case error and -58% best-case error.
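The guardbanded figures follow from rescaling each error against the enlarged estimate. As a rough sketch (the helper name `residual_error` is an assumption), if the raw error relative to the conventional estimate is $e$ and the guardband is $g$, the error that remains relative to the guardbanded estimate is $(1 + e)/(1 + g) - 1$:

```python
def residual_error(raw_error, guardband):
    """Relative error left after scaling the conventional estimate by (1 + guardband).

    raw_error is (P_true - P_conventional) / P_conventional; the result is
    measured against the guardbanded estimate (1 + guardband) * P_conventional.
    """
    return (1 + raw_error) / (1 + guardband) - 1


guardband = 0.18  # +18% expected error quoted in Section 6
for label, raw in (("expected", 0.18), ("worst-case", 0.50), ("best-case", -0.50)):
    print(f"{label:>10}: {residual_error(raw, guardband):+.1%}")
```

The same function can be used to explore other guardbands, for example one chosen to balance the magnitudes of the worst-case and best-case residuals instead of zeroing the expected error.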

#### 7 Conclusions and Future Work

We have shown that current power estimation methods suffer from various inaccuracies which can result in over 50% error in estimation. In the absence of switching correlations (which are hard to compute), the expected power estimate given in Section 6 can be used to give more accurate results without being as pessimistic as the worst-case estimate. Of course, power estimation errors can result from various sources other than the ones highlighted in this work (e.g., parasitic extraction, input pattern dependence, etc.). The values of the errors given above assume zero error from these "other" sources. We have also given expressions for power-based switch factors as distinct from delay-based switch factors. These switch factors should be used when coupling capacitances are grounded for power computation purposes.

There are several interesting implications of the work that we have reported, notably the large impact of coupling on power consumption.

1. All coupling delay reduction techniques (segment permutation, repeater staggering, etc.) are also applicable for power optimization. Moreover, since there is no parallel for hold-time violations in power, any approach which reduces delay due to coupling strictly improves power consumption. Due to coupling, there is a direct correspondence between delay uncertainty and power uncertainty.
2. Reducing switching activity factor may not always reduce power consumption. To see this, consider the example of $2n$ parallel lines coupled along their entire length to their immediate neighbors. As in our experiments, assume $C_c = C_g + C_L = C$. Now we compare two transitions, U0U0... and UUUU.... The first transition draws power $P_{U0} = (3n-1)CV^2$ while the second transition draws power $P_{UU} = 2nCV^2$. Further assume that all odd-indexed lines (starting from 1) have an activity factor $A_1$ while all even-indexed lines (starting from 2) have an activity factor $A_2$. Also assume that switching activities are correlated such that a U transition occurs on any even-index line only when a U transition occurs on the neighboring odd-index lines.<sup>9</sup> Thus, $A_1 \geq A_2$. The power consumption of the $2n$-line system is then given by $P = A_2 P_{UU} + (A_1 - A_2) P_{U0} + A_1 (A_1 - A_2) P_{noise}$. Since $P_{U0} > P_{UU}$ for $n > 1$, increasing $A_2$ decreases power consumption until we reach $A_2 = A_1$.
3. The fact that a major component of power comes from interconnect capacitance suggests renewed attention to wirelength- and activity-driven layout methodologies to optimize power. The goal of routing the most frequently switching nets with smallest wirelengths has been well-understood for the past decade, but the relative significance of such an objective increases with technology scaling.
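The coupled-bus example in the second implication can be checked numerically. The sketch below evaluates the stated expression for $P$; the function name `bus_power`, the normalized defaults $C = V = 1$, and the sample activity factors are illustrative assumptions, and `P_noise` defaults to zero so that the effect of the switching terms alone is visible.

```python
def bus_power(n, A1, A2, C=1.0, V=1.0, P_noise=0.0):
    """Power of the 2n-line bus with odd-line activity A1 and even-line activity A2.

    Uses P = A2*P_UU + (A1 - A2)*P_U0 + A1*(A1 - A2)*P_noise with
    P_U0 = (3n - 1)*C*V^2 and P_UU = 2n*C*V^2, and requires A2 <= A1.
    """
    if A2 > A1:
        raise ValueError("the correlation assumption requires A2 <= A1")
    P_U0 = (3 * n - 1) * C * V ** 2   # odd lines rise, even lines quiet
    P_UU = 2 * n * C * V ** 2         # all lines rise together
    return A2 * P_UU + (A1 - A2) * P_U0 + A1 * (A1 - A2) * P_noise


# Hypothetical sweep of the even-line activity for an 8-line bus (n = 4).
sweep = [bus_power(n=4, A1=0.3, A2=a2) for a2 in (0.1, 0.2, 0.3)]
print(sweep)
```

The slope of $P$ with respect to $A_2$ is $P_{UU} - P_{U0} - A_1 P_{noise} = -(n-1)CV^2 - A_1 P_{noise}$, so in this model raising the even-line activity lowers total power whenever $n > 1$ or the glitch term is nonzero.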

Our ongoing and future work includes the following.

- Estimation of power for non-monotone transitions, which was ignored in the crosstalk glitch power.
- Estimation of error in short-circuit power estimation. We expect this error to be small as short-circuit current itself can be limited to small values by proper design.
- Assessment of crosstalk glitches, which can also cause an increase in receiver short-circuit power if the amplitude of the glitch exceeds transistor threshold voltages. This effect is likely to be small as the glitches are usually not of high amplitude.
- Accounting for all crosstalk glitch transitions. Currently, we just consider the 1D transition as the power consuming transition. This needs to be verified with respect to, e.g., 1U transitions driving power to the supply or power being drawn from ground in the 0D case.<sup>10</sup>

#### 8 Acknowledgments

We thank Professor Dennis Sylvester of the University of Michigan at Ann Arbor for his continuous help and guidance during the course of this work. We also thank Dr. Tak Young of Monterey Design Systems for providing valuable input and reviewing several previous drafts.

#### References

- [1] D. Blaauw, A. Dharchoudhury, R. Panda, S. Sirichotiyakul, C. Oh and T. Edwards, "Emerging power management tools for processor design", *Proc. International Symposium on Low Power Electronics and Design*, 1998, pp. 143-148.
- [2] "Berkeley Predictive Technology Model", http://www-device.eecs.berkeley.edu/~ptm/
- [3] A.P. Chandrakasan and R.W. Brodersen, "Low Power Digital CMOS Design", Kluwer Academic Publishers, 1995.
- [4] P. Chen, D.A. Kirkpatrick and K. Keutzer, "Miller Factor for Gate-Level Coupling Delay Calculation", Proc. IEEE/ACM International Conference on Computer Aided Design, 2000, pp. 68-74.
- [5] A. Dharchoudhury, R. Panda, D. Blaauw, R. Vaidyanathan, B. Tutuianu and D. Bearden, "Design and analysis of power distribution networks in PowerPC microprocessors", *Proc. Design Automation Conference*, 1998, pp. 738-743.
- [6] "Delay and Power Calculation System", IEEE Standard 1481-1999, 2000.
- [7] B.M. Geuskens, "Modeling the Influence of Multilevel Interconnect on Chip Performance", Ph.D. dissertation, Rensselaer Polytechnic Institute, 1997.
- [8] "The GSRC Technology Extrapolation System" http://www.gigascale.org/gtx/.
- [9] J. Henkel and H. Lekatsas, "A<sup>2</sup>BC: Adaptive Address Bus Coding for Low Power Deep Sub-Micron Designs", *Proc. Design Automation Conference*, 2001, pp. 744-749.
- [10] Star-HSpice, http://www.synopsys.com/products/avmrg/star\_hspice\_ds.html
- [11] "International Technology Roadmap for Semiconductors", 2001 http://public.itrs.net.
- [12] A. B. Kahng and S. Muddu, "New Efficient Algorithms for Computing Effective Capacitance", Proc. ACM/IEEE Intl. Symp. on Physical Design, 1998, pp. 147-151.
- [13] A.B. Kahng, S. Muddu and E. Sarto, "On Switch Factor Based Analysis of Coupled RC Interconnects", Proc. ACM/IEEE Design Automation Conference, 2000, pp. 79-84.
- [14] T. Murayama, K. Ogawa and H. Yamaguchi, "Estimation of peak current through CMOS VLSI circuit supply lines", Proc. Asia and South Pacific Design Automation Conference, 1999, pp. 295-298.
- [15] F.N. Najm, "A Survey of Power Estimation Techniques in VLSI Circuits", IEEE Transactions on VLSI Systems, 1994, pp. 446-455.
- [16] P.R. O'Brien and T.L. Savarino, "Modeling the Driving Point Characteristic of Resistive Interconnect for Accurate Delay Estimation", Proc. IEEE International Conference on Computer Aided Design, 1989, pp. 512-515.
- [17] M. Pedram, "Power Minimization in IC Design: Principles and Applications", ACM Trans. on Design Automation of Electronic Systems 1(1) (1996), pp. 3-56.
- [18] PowerMill Datasheet, Synopsys Inc.
- [19] J. Qian, S. Pullela and L. Pillage, "Modeling the Effective Capacitance for the RC interconnect of CMOS Gates", *IEEE Trans. on CAD of ICs and Systems* 13(12) (1994), pp. 1526-1535.
- [20] J.M. Rabaey, Digital Integrated Circuits, Prentice Hall Publishers, 1996.
- [21] Y. Shin and T. Sakurai, "Coupling-Driven Bus Design for Low Power Application-Specific Systems", Proc. Design Automation Conference, 2001, pp. 750-753.
- [22] P. Sotiriadis, A. Wang and A. Chandrakasan, "Transition Pattern Coding: An approach to reduce Energy in Interconnect", 26th European Solid-State Circuit Conference, 2000, pp. 320-323.
- [23] C.N. Taylor, S. Dey and Y. Zhao, "Modeling and Minimization of Interconnect Dissipation in Nanometer Technologies", *Proc. Design Automation Conference*, 2001, pp. 754-757.
- [24] C.C Teng, Cadence Design Systems, Personal Communication.
- [25] T. Uchino and J. Cong, "An Interconnect Energy Model Considering Coupling Effects", Proc. Design Automation Conference, 2001, pp. 555-558.
- [26] C.Y. Wang and K. Roy, "Maximization of power dissipation in large CMOS circuits considering spurious transitions", *IEEE Trans. on Circuits and Systems I: Fundamental Theory and Applications* 47(4) (2000), pp. 483-490.
- [27] H.J.M. Veendrick, "Short-Circuit Dissipation of Static CMOS Circuitry and Its Impact on the Design of Buffer Circuits", *IEEE Journal of Solid State Circuits* 19(4) (1984), pp. 468-473.

<sup>10</sup>For instance, consider the 1U transition. If the victim is being held high and a neighboring wire goes high as well, positive charge is injected across the coupling cap (displacement current) which temporarily causes the victim voltage to exceed  $V_{dd}$. The PMOS driver of the victim then removes this charge by conducting current since it sees a non-zero  $V_{ds}$.



<sup>9</sup>Note that this implicitly assumes that all odd-indexed lines switch together, as do all even-indexed lines.