# Optimization of Overdrive Signoff in High-Performance and Low-Power ICs

Tuck-Boon Chan, Andrew B. Kahng, Jiajia Li, Siddhartha Nath, and Bongil Park


Abstract-In modern system-on-chip implementations, multimode design is commonly used to achieve better circuit performance and power across voltage-scaled, "turbo" and other operating modes. To the best of our knowledge, there is no available systematic analysis or methodology for the selection of associated signoff modes for multimode circuit implementations. In this brief, we observe significant impacts of signoff mode selection on circuit area, power, and performance. For example, incorrect choice of signoff voltages for required overdrive frequencies can incur 12% suboptimality in power or 20% in area. Using the concept of mode dominance as a guideline, we propose a scalable, model-based adaptive search methodology to explore the design space for signoff mode selection. Our proposed methodology is duty cycle-aware in its minimization of lifetime energy. Results show that our proposed methodology provides >8% improvement in performance, for given  $V_{dd}$ , area and power constraints, compared with the traditional "signoff and scale" method. Further, the signoff modes determined by our methods result in <6% overhead in power compared with the optimal signoff modes.

*Index Terms*—Design space exploration, frequency overdrive, multicorner multimode design, signoff optimization.

# I. INTRODUCTION

In the era of heterogeneous multicore systems-on-a-chip (SoCs), the performance of single-threaded operations limits the overall speedup of applications. Designers use frequency overdrive at elevated voltages to obtain better performance in consumer electronic devices [2]. An operating mode (for simplicity, mode) is defined by an (operating frequency, voltage) pair. Devices typically operate at two or three modes, e.g., supply voltage-scaled (SVS), nominal, and turbo (overdrive). The nominal and SVS modes correspond to a lower operating voltage and a lower frequency, whereas the overdrive mode corresponds to a higher operating voltage and a higher frequency. We define the average power ( $P_{avg}$ ) for a circuit with both nominal and overdrive modes as

$$P_{\text{avg}} = r \times P_{\text{OD}} + (1 - r) \times P_{\text{nom}}, \quad 0 < r < 1 \tag{1}$$

where the duty cycle r is the total overdrive time normalized to the total lifetime.  $P_{\text{OD}}$  and  $P_{\text{nom}}$  are the circuit power at overdrive and nominal modes, respectively.
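As a minimal sketch of (1), the helper below computes the duty-cycle-weighted average power. The function name and the sample powers (50 mW in overdrive, 30 mW at nominal) are illustrative assumptions, not data from this brief.

```python
def average_power(p_od: float, p_nom: float, r: float) -> float:
    """Average power per (1): r * P_OD + (1 - r) * P_nom, with 0 < r < 1."""
    if not 0.0 < r < 1.0:
        raise ValueError("duty cycle r must satisfy 0 < r < 1")
    return r * p_od + (1.0 - r) * p_nom


# Hypothetical powers in mW; r = 0.1 means 10% of the lifetime is spent in overdrive.
p_avg = average_power(p_od=50.0, p_nom=30.0, r=0.1)
print(f"P_avg = {p_avg:.1f} mW")
```

Because r weights the two modes, a small duty cycle keeps the average close to the nominal power even when the overdrive power is much higher.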

We define the signoff mode design space (or design space) as the set of feasible signoff mode combinations. A point in this design space specifies m (frequency, voltage) pairs for m-mode signoff, where $m \ge 1$. Signing off at different points in a design space results in circuits with different performance, power, and area. Fig. 1 shows that the average power of a given design can vary up to 26% across 40 different definitions of the overdrive mode, with a fixed nominal mode. Even when the overdrive frequency is fixed, the average power can vary up to 12% for different overdrive voltages. Circuit power varies with signoff voltage because, when signing off at a lower voltage, buffer insertion to meet timing constraints leads to higher power. On the other hand, although circuit area decreases with a higher signoff voltage, power increases with operating voltage. The optimal signoff voltage must balance these two effects.

Manuscript received September 9, 2013; revised April 13, 2014; accepted June 23, 2014. Date of publication August 15, 2014; date of current version July 22, 2015.

T.-B. Chan, A. B. Kahng, J. Li, and S. Nath are with the University of California at San Diego, La Jolla, CA 92093 USA (e-mail: tbchan@ucsd.edu; abk@ucsd.edu; jil150@ucsd.edu; sinath@ucsd.edu).

B. Park is with the System LSI Division, Samsung Electronics Company, Ltd., Hwaseong 445-330, Korea (e-mail: bongil.park@samsung.com).

Color versions of one or more of the figures in this paper are available online at http://ieeexplore.ieee.org.

Digital Object Identifier 10.1109/TVLSI.2014.2339848

Fig. 1. $P_{\text{avg}}$ of circuits signed off at the same nominal mode (500 MHz, 0.9 V), but 40 different overdrive modes. Design: AES [12]. Technology: foundry 65 nm. Corner: FF/125 °C. r = 10%.

Fig. 1 suggests that we can reduce design cost by carefully optimizing the signoff modes. Accordingly, in this brief, we study the signoff mode optimization problem, which seeks the optimal nominal and overdrive modes with respect to optimization objectives and constraints. Similar multimode signoff optimization has been studied by [5]. However, our work achieves greater insight into the basic tradeoff between frequency and voltage at the circuit level. As an extension to the previous work [1], we propose a more efficient and effective methodology for multimode signoff optimization.

Our contributions are summarized as follows.

- Based on the property of equivalent dominance, we propose a global optimization flow (using model-based adaptive search) to analyze and identify the dominant modes *before* circuit implementation.
- Our proposed methodologies lead to >8% and 6% performance improvements compared with the traditional "signoff and scale" and previous work [1], respectively, while maintaining similar power and area.
- The proposed methodologies can successfully determine signoff modes that reduce lifetime energy for a given duty cycle.

# II. DOMINANCE OF MODES

To analyze the dominance of modes, we define the concept of design cone as follows.

*Definition:* The design cone of a given mode M is the union of (maximum frequency, voltage) operating modes for all feasible circuit implementations that are signed off at mode M.

Fig. 2 shows the design cone R of mode A. Circuits signed off at mode A will have their own frequency versus voltage tradeoffs.<sup>1</sup> At a given voltage, the boundary of the design cone is determined by the upper and lower bounds of the maximum frequency that is achievable by circuits signed off at mode A.

1063-8210 © 2014 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission. See http://www.ieee.org/publications\_standards/publications/rights/index.html for more information.

<sup>1</sup>Boundaries of a design cone can be affected by threshold voltage, gate type, and/or wire resistance. In this brief, we determine the boundaries using frequency versus voltage curves of high-threshold voltage (HVT) and low-threshold voltage (LVT) cells since other parameters have little impact at 65-nm technology [1].



Fig. 2. Illustration of the design cone of mode A (the shaded region).

Given the design cone of mode A, a mode C ( $f_C$ ,  $V_C$ ) has a positive slack (respectively, a negative slack) with respect to mode A if  $f_C$  is below (respectively, above) the lower (respectively, upper) boundary of design cone at  $V_C$ . Since the positive slack can be exploited to reduce power [1], we say that the existence of positive slack indicates overdesign.

We further define the dominance of modes as follows.

*Definition:* Given two modes  $M_1$  and  $M_2$ , if mode  $M_2$  shows positive slacks with respect to mode  $M_1$ , we define mode  $M_1$  as the dominant mode and mode  $M_2$  as the dominated mode.

*Definition:* Given two modes  $M_1$  and  $M_2$ , if mode  $M_1$  is in the design cone of mode  $M_2$  and mode  $M_2$  is in the design cone of mode  $M_1$ , we say that modes  $M_1$  and  $M_2$  exhibit equivalent dominance.

In Fig. 2, mode A is dominant and mode C is dominated. The dominant mode has tighter constraints, thus determining the properties (e.g., area, gate count, and total capacitance) of a design in a multimode signoff. In addition, when equivalent dominance holds for multiple signoff modes, we expect resulting design properties that are similar to those when the design is signed off at each mode individually.
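The slack and dominance definitions above can be sketched as a small classifier. This is only an illustration: the straight-line cone boundaries, their slopes (in MHz/V), and the three example modes are hypothetical, whereas the brief derives the boundaries from HVT and LVT frequency versus voltage curves.

```python
from typing import Callable, Tuple

Mode = Tuple[float, float]  # (frequency in MHz, voltage in V)
# A cone is a pair (lower, upper) of bounds on the maximum frequency versus voltage.
Cone = Tuple[Callable[[float], float], Callable[[float], float]]


def linear_cone(mode: Mode, slope_a: float, slope_b: float) -> Cone:
    """Hypothetical cone: two straight lines through the signoff mode."""
    f0, v0 = mode
    line_a = lambda v: f0 + slope_a * (v - v0)
    line_b = lambda v: f0 + slope_b * (v - v0)
    return (lambda v: min(line_a(v), line_b(v)),
            lambda v: max(line_a(v), line_b(v)))


def slack_sign(mode: Mode, cone: Cone) -> int:
    """+1: positive slack (below the cone), -1: negative slack (above), 0: inside."""
    f, v = mode
    lower, upper = cone
    if f < lower(v):
        return 1
    if f > upper(v):
        return -1
    return 0


def equivalent_dominance(m1: Mode, cone1: Cone, m2: Mode, cone2: Cone) -> bool:
    """True if each mode lies inside the design cone of the other."""
    return slack_sign(m1, cone2) == 0 and slack_sign(m2, cone1) == 0


# Hypothetical modes: A is the signoff mode, B and C are candidate overdrive modes.
mode_a, mode_b, mode_c = (500.0, 0.9), (700.0, 1.1), (600.0, 1.1)
cone_a = linear_cone(mode_a, 800.0, 1200.0)
cone_b = linear_cone(mode_b, 800.0, 1200.0)
cone_c = linear_cone(mode_c, 800.0, 1200.0)
print("slack of C w.r.t. A:", slack_sign(mode_c, cone_a))
print("A and B equivalent:", equivalent_dominance(mode_a, cone_a, mode_b, cone_b))
```

In this toy setup, mode C falls below the cone of mode A, so A is the dominant mode of that pair, mirroring the situation described for Fig. 2.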

Based on the equivalent dominance concept, we state the following lemmas.<sup>2</sup>

*Lemma 1:* If two modes do not exhibit equivalent dominance, then each mode is outside of the design cone of the other.

*Lemma 2:* Multimode signoff at modes which do not exhibit pairwise equivalent dominance leads to overdesign.

*Lemma 3:* Mutual pairwise equivalent dominance holds among $m$ ($m \ge 3$) modes if and only if the modes are collinear in the $(V, f)$ space for signoff.

# III. PROBLEM FORMULATIONS

To sign off a circuit that operates at both nominal and overdrive modes, we need to select four parameters: nominal frequency  $(f_{nom})$  and voltage  $(V_{nom})$ , and overdrive frequency  $(f_{OD})$  and voltage  $(V_{OD})$ . In this brief, we study the problems where two parameters are given and two parameters must be determined as follows.

## A. FIND\_OD Problem

Given  $f_{\text{nom}}$ ,  $V_{\text{nom}}$ , and r, and upper bounds on  $V_{\text{OD}}$ ,  $P_{\text{avg}}$ , and  $P_{\text{OD}}$ , determine  $f_{\text{OD}}$  and  $V_{\text{OD}}$  such that  $f_{\text{OD}}$  is maximized.

## B. FIND\_NOM Problem

Given  $f_{\text{OD}}$ ,  $V_{\text{OD}}$ , and r, and upper bounds on  $P_{\text{avg}}$  and  $P_{\text{OD}}$ , determine  $f_{\text{nom}}$  and  $V_{\text{nom}}$  such that  $f_{\text{nom}}$  is maximized.

## C. FIND\_VOLT Problem

Given  $f_{\text{nom}}$ ,  $f_{\text{OD}}$ , and r, and upper bounds on  $V_{\text{OD}}$  and  $P_{\text{OD}}$ , determine  $V_{\text{nom}}$  and  $V_{\text{OD}}$  such that  $P_{\text{avg}}$  is minimized.

<sup>2</sup>Proofs of Lemma 1 and Lemma 2 are given in [1]. Lemma 3 can be established for m = 3, followed by a simple induction for m > 3.



Fig. 3. Our adaptive search flow (top) and power model (dotted box).

## D. FIND\_FREQ Problem

Given  $V_{\text{nom}}$ ,  $V_{\text{OD}}$ , and r, and upper bounds on  $P_{\text{avg}}$  and  $P_{\text{OD}}$ , determine  $f_{\text{nom}}$  and  $f_{\text{OD}}$  such that  $(1 - r) \times f_{\text{nom}} + r \times f_{\text{OD}}$  is maximized.

# IV. EFFICIENT EXPLORATION OF DESIGN SPACE

The key challenge in signoff mode optimization is to efficiently search for the desired modes using a small number of implementation trials. To this end, we propose a model-based adaptive search to explore the design space for signoff mode selection. In the model-based adaptive search, new solutions are determined using models, which are updated or derived from implementations with previous solutions [3]. Fig. 3 shows our adaptive search flow. We construct our power model based on initial samples. Using the power model, we *predict* the optimal signoff mode and sample [i.e., run synthesis, placement, and routing (SP&R)] at the predicted mode. We iteratively sample and update the power model until the flow converges.

## A. Power Model

Following industry standard models (Liberty format) and tools (e.g., [16]), we model circuit power as being comprised of three components—switching ( $P_{sw}$ ), internal ( $P_{int}$ ), and leakage ( $P_{leak}$ ). Our power model uses the following circuit properties: load capacitance ( $C_{load}$ ), which includes wire capacitance and the capacitance of input pins driven by nets [6], total gate capacitance ( $C_{gate}$ ), and percentage of cell instances with different  $V_T$  flavors ( $Pct_{\{LVT,HVT,NVT\}}$ ). As we observed in Fig. 1(b), circuit power exhibits unimodal behavior with varying signoff voltage. This suggests that we model power as a second-order polynomial of the signoff voltage. We also observe below that power linearly depends on circuit properties. Therefore, we also model the circuit properties as second-order polynomials of the signoff voltage or frequency,<sup>3</sup> as

$$C_{\text{load}} = q_1 \times V^2 + q_2 \times V + q_3 \tag{2}$$

$$C_{\text{gate}} = q_4 \times V^2 + q_5 \times V + q_6 \tag{3}$$

$$Pct_{\text{LVT}} = q_7 \times V^2 + q_8 \times V + q_9 \tag{4}$$

where $q_1$–$q_9$ are fitting parameters. Equations (2)–(4) are used when V is the variable in adaptive search; when f is the variable, we use f in place of V. We then use the estimated circuit properties to model power components.

<sup>3</sup>Note that circuit properties may not always behave as second-order relations with the signoff voltage or frequency, which can lead to errors in power estimation. However, our experimental results show that the estimation error is <10%.
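A second-order fit of the kind used in (2)–(4) can be sketched with a small least-squares routine. The brief does not specify a particular solver, so the normal-equation approach, the function name `fit_quadratic`, and the sample $C_{\text{load}}$ values (arbitrary units) are all assumptions made for illustration.

```python
from typing import Sequence, Tuple


def fit_quadratic(xs: Sequence[float], ys: Sequence[float]) -> Tuple[float, float, float]:
    """Least-squares fit of y = q_a*x^2 + q_b*x + q_c (needs >= 3 distinct x values).

    Inputs should be of order one (e.g., volts or GHz) to keep the system well conditioned.
    """
    rows = [[x * x, x, 1.0] for x in xs]
    # Augmented normal equations [A^T A | A^T y].
    m = [[sum(r[i] * r[j] for r in rows) for j in range(3)]
         + [sum(r[i] * y for r, y in zip(rows, ys))] for i in range(3)]
    for col in range(3):  # Gauss-Jordan elimination with partial pivoting
        piv = max(range(col, 3), key=lambda k: abs(m[k][col]))
        m[col], m[piv] = m[piv], m[col]
        for k in range(3):
            if k != col:
                fac = m[k][col] / m[col][col]
                m[k] = [u - fac * w for u, w in zip(m[k], m[col])]
    q_a, q_b, q_c = (m[i][3] / m[i][i] for i in range(3))
    return q_a, q_b, q_c


# Hypothetical C_load samples at three signoff voltages (the three initial samples).
volts = [0.90, 1.05, 1.20]
c_load = [15.6, 14.775, 14.4]
q1, q2, q3 = fit_quadratic(volts, c_load)
predict = lambda v: q1 * v * v + q2 * v + q3
print(f"q1={q1:.3f}, q2={q2:.3f}, q3={q3:.3f}, C_load(1.00 V)={predict(1.0):.3f}")
```

With exactly three samples the fit interpolates the data; as more SP&R runs are added, the same routine returns the least-squares coefficients.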

1) Net Switching Power: We model net switching power as

$$P_{\rm sw} = k_1 \times \alpha \times C_{\rm load} \times f \times V^2 \tag{5}$$

where  $\alpha$  is the switching activity factor, f and V are operating frequency and supply voltage, respectively, and  $k_1$  is a fitting parameter used during adaptive search.

2) Internal Power: Since the internal power mainly consists of the short circuit power, based on [8], [9], we model internal power as

$$P_{\rm int} = k_2 \times \alpha \times C_{\rm gate} \times f \times V^2 \tag{6}$$

where  $k_2$  is a fitting parameter used during adaptive search.

3) Leakage Power: We use gate capacitance as a parameter to fit leakage power [7]. Further, we use the functional form  $e^{\beta \times V}$  ( $\beta$  is a fitting parameter depending on technology and threshold voltages of transistors) to model the leakage current. Due to dominant impact of LVT cells on leakage power as compared with NVT and HVT cells, we also use percentage of LVT cell instances in our model. We model leakage power as

$$P_{\text{leak}} = V \times C_{\text{gate}} \times (k_3 \times Pct_{\text{LVT}} + k_4) \times e^{\beta \times V} \tag{7}$$

where  $k_3$  and  $k_4$  are fitting parameters for adaptive search.

We emphasize that (5)–(7) are not for accurate power calculation. Rather, they are based on chosen parameters for power estimation within our adaptive search. In multimode signoff, since the circuit is mainly determined by the dominant mode, which has the tightest timing constraints, we use the dominant mode to model  $C_{\text{load}}$ ,  $C_{\text{gate}}$ , and  $Pct_{\text{LVT}}$ . However, when two or more modes exhibit equivalent dominance, we choose the modes that are not yet fixed and among these modes we choose the mode with the largest duty cycle for power modeling as it has the greatest impact on  $P_{\text{avg}}$ .
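The three components in (5)–(7) can be evaluated together as in the sketch below. The function name, the argument layout, and the numeric values in the usage example are assumptions chosen only to make the arithmetic easy to follow; they are not fitted parameters from this brief.

```python
import math
from typing import Tuple


def power_components(f: float, v: float, alpha: float, c_load: float, c_gate: float,
                     pct_lvt: float, k: Tuple[float, float, float, float],
                     beta: float) -> Tuple[float, float, float]:
    """Return (P_sw, P_int, P_leak) following (5), (6), and (7)."""
    k1, k2, k3, k4 = k
    p_sw = k1 * alpha * c_load * f * v ** 2                          # (5) net switching power
    p_int = k2 * alpha * c_gate * f * v ** 2                         # (6) internal power
    p_leak = v * c_gate * (k3 * pct_lvt + k4) * math.exp(beta * v)   # (7) leakage power
    return p_sw, p_int, p_leak


# Hypothetical, unit-free inputs (beta = 0 switches off the exponential term).
p_sw, p_int, p_leak = power_components(
    f=2.0, v=1.0, alpha=0.5, c_load=3.0, c_gate=4.0,
    pct_lvt=0.25, k=(1.0, 1.0, 2.0, 0.5), beta=0.0)
print(p_sw, p_int, p_leak, p_sw + p_int + p_leak)
```

In the adaptive search, $k_1$–$k_4$ and $\beta$ would be refitted after every SP&R run, while $C_{\text{load}}$, $C_{\text{gate}}$, and $Pct_{\text{LVT}}$ come from (2)–(4).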

## B. Adaptive Search

We now propose two generic adaptive search flows for signoff mode selection (shown in Algorithm 1). We then extend them to solve the problems described in Section III.

Given a signoff frequency (f), we use the MIN\_POWER flow to search for the signoff voltage (V) that minimizes circuit power (P). The inputs  $V_{\min}$  and  $V_{\max}$  are user-specified minimum and maximum signoff voltages, respectively.  $V_{\text{stop}}$  is a stopping criterion for adaptive search. We first construct our power model based on three initial samples (Lines 1–3). Based on the obtained power model, we predict the optimal signoff voltage to minimize power (Line 6). We then run SP&R with the predicted signoff voltage and update the power model (Lines 7–9). If the change in the value of the estimated optimal signoff voltage is less than  $V_{\text{stop}}$ , the adaptive search terminates. Otherwise, more accurate estimation of the optimal signoff voltage is predicted from the improved power model.

Given a signoff voltage (V), we use the MAX\_FREQ flow to search for the maximum signoff frequency (f) under particular power constraints ( $P_{\text{max}}$ ). The input  $f_{\text{min}}$  is the predefined lower bound on performance and  $f_{\text{max}}$  is the maximum achievable frequency with voltage V.  $f_{\text{min}}$  and  $f_{\text{max}}$  define the range of signoff frequency selection.  $f_{\text{stop}}$  is a stopping criterion.
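The two flows can be sketched as follows. This is a simplified illustration under several assumptions: power is fitted directly as a quadratic in the search variable instead of going through (2)–(7), the convergence test uses the absolute change, a `max_iter` guard is added, and `toy_spr` is a made-up analytic stand-in for a real SP&R run.

```python
def fit_quadratic(xs, ys):
    """Least-squares quadratic in t = (x - x0) / s; returns (model, vertex or None)."""
    x0, s = 0.5 * (min(xs) + max(xs)), 0.5 * (max(xs) - min(xs))
    rows = [[t * t, t, 1.0] for t in ((x - x0) / s for x in xs)]
    m = [[sum(r[i] * r[j] for r in rows) for j in range(3)]
         + [sum(r[i] * y for r, y in zip(rows, ys))] for i in range(3)]
    for col in range(3):  # Gauss-Jordan elimination with partial pivoting
        piv = max(range(col, 3), key=lambda k: abs(m[k][col]))
        m[col], m[piv] = m[piv], m[col]
        for k in range(3):
            if k != col:
                fac = m[k][col] / m[col][col]
                m[k] = [u - fac * w for u, w in zip(m[k], m[col])]
    a, b, c = (m[i][3] / m[i][i] for i in range(3))
    model = lambda x: a * ((x - x0) / s) ** 2 + b * ((x - x0) / s) + c
    return model, (x0 - s * b / (2.0 * a) if a > 0 else None)


def min_power(run_spr, f, v_min, v_max, v_stop, max_iter=20):
    """MIN_POWER sketch: search for the signoff voltage with the lowest modeled power."""
    vs = [v_min, v_max, 0.5 * (v_min + v_max)]        # three initial samples
    ps = [run_spr(f, v) for v in vs]
    v_prev, delta, it = v_min, float("inf"), 0
    while delta >= v_stop and it < max_iter:
        _, vertex = fit_quadratic(vs, ps)
        v_new = vertex if vertex is not None else vs[ps.index(min(ps))]
        v_new = min(max(v_new, v_min), v_max)         # stay inside the allowed range
        vs.append(v_new)
        ps.append(run_spr(f, v_new))                  # sample at the predicted optimum
        delta, v_prev, it = abs(v_new - v_prev), v_new, it + 1
    return v_prev


def max_freq(run_spr, v, p_max, f_min, f_max, f_stop, max_iter=20):
    """MAX_FREQ sketch: search for the largest f whose modeled power meets p_max."""
    fs = [f_min, f_max, 0.5 * (f_min + f_max)]
    ps = [run_spr(f, v) for f in fs]
    f_prev, delta, it = f_min, float("inf"), 0
    while delta >= f_stop and it < max_iter:
        model, _ = fit_quadratic(fs, ps)
        lo, hi = f_min, f_max
        if model(hi) <= p_max:
            f_new = hi
        else:                                         # bisection; assumes power rises with f
            for _ in range(60):
                mid = 0.5 * (lo + hi)
                lo, hi = (mid, hi) if model(mid) <= p_max else (lo, mid)
            f_new = lo                                # equals f_min if no f is feasible
        fs.append(f_new)
        ps.append(run_spr(f_new, v))
        delta, f_prev, it = abs(f_new - f_prev), f_new, it + 1
    return f_prev


def toy_spr(f, v):
    """Hypothetical power (mW): a buffering cost that grows at low V plus a dynamic term."""
    return 8e-5 * f * f / (v - 0.7) + 0.12 * f * v * v


v_opt = min_power(toy_spr, f=500.0, v_min=0.9, v_max=1.2, v_stop=0.005)
f_opt = max_freq(toy_spr, v=1.0, p_max=150.0, f_min=400.0, f_max=800.0, f_stop=5.0)
print(f"V_opt ~ {v_opt:.3f} V, f_opt ~ {f_opt:.1f} MHz")
```

Each loop iteration costs one SP&R run, which is why the stopping criteria $V_{\text{stop}}$ and $f_{\text{stop}}$ directly control the runtime of the search.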

# V. METHODOLOGY

## A. Design Space Reduction

According to Lemma 2, we search only the design space in which the equivalent dominance property holds to reduce overdesign.

**Algorithm 1** Adaptive Search Flows

**Procedure** MIN\_POWER $(f, V_{\min}, V_{\max}, V_{\text{stop}})$

- 1: Run SP&R with $(f, V_{\min})$, $(f, V_{\max})$, $(f, \frac{V_{\min}+V_{\max}}{2})$;
- 2: Extract circuit information ($C_{\text{load}}$, $C_{\text{gate}}$, $Pct_{\text{LVT}}$, $P_{\text{sw}}$, $P_{\text{int}}$, and $P_{\text{leak}}$);
- 3: Build the power model based on extracted information;
- 4: $i \leftarrow 1$; $V_0 \leftarrow V_{\min}$;
- 5: while $\Delta V \ge V_{\text{stop}}$ do
- 6: $V_i \leftarrow$ select the optimal V based on the power model;
- 7: Run SP&R with $(f, V_i)$;
- 8: Extract circuit information;
- 9: Update the power model using least squares regression (LSQR) based on extracted information;
- 10: $\Delta V \leftarrow V_i - V_{i-1}$; $i \leftarrow i+1$;
- 11: end while
- 12: return $V_{i-1}$

**Procedure** MAX\_FREQ $(V, P_{\max}, f_{\min}, f_{\max}, f_{\text{stop}})$

- 1: Run SP&R with $(f_{\min}, V)$, $(f_{\max}, V)$, $(\frac{f_{\min}+f_{\max}}{2}, V)$;
- 2: Extract circuit information;
- 3: Build the power model based on extracted information;
- 4: $i \leftarrow 1$; $f_0 \leftarrow f_{\min}$;
- 5: while $\Delta f \ge f_{\text{stop}}$ do
- 6: $f_i \leftarrow$ select f based on the power model such that $P = P_{\max}$;
- 7: Run SP&R with $(f_i, V)$;
- 8: Extract circuit information;
- 9: Update the power model using LSQR based on extracted information;
- 10: $\Delta f \leftarrow f_i - f_{i-1}$; $i \leftarrow i+1$;
- 11: end while
- 12: return $f_{i-1}$

## B. Duty-Cycle Awareness

Our power model estimates  $P_{avg}$  based on r and our optimizations aim at reducing  $P_{avg}$  or are constrained by an upper bound on  $P_{avg}$ .

## C. Design Cone Approximation

We estimate a design cone using LVT- and HVT-only inverter chains, as in [1].

## D. FIND\_OD Problem

We extend the MAX\_FREQ flow to solve the FIND\_OD problem (Algorithm 2). One key observation which reduces the number of multicorner multimode (MCMM) implementations during the adaptive search is that a circuit implemented at a particular pair of nominal mode and overdrive mode can also run at other overdrive modes along its frequency versus voltage tradeoff curve as shown in Fig. 4(a). This implies that circuits implemented with a nominal mode and any overdrive mode along one frequency versus voltage tradeoff curve will have similar circuit properties. Thus, we can extract circuit properties for solutions in the design cone by generating a few trial circuits with different frequency versus voltage tradeoffs.

## E. FIND\_NOM Problem

The FIND\_NOM problem is similar to the FIND\_OD problem. We solve the FIND\_NOM problem using the same methodology as for the FIND\_OD problem.

## F. FIND\_VOLT Problem

Finding the optimal ( $V_{\text{nom}}$ ,  $V_{\text{OD}}$ ) pair using exhaustive search incurs large runtime because there are  $O(n^2)$  feasible solutions (*n* is the number of feasible signoff voltages). To reduce the runtime complexity, we propose an approximate optimization method: for each  $V_{\text{nom}}$ , we consider only one  $V_{\text{OD}}$ , in which we determine the  $V_{\text{OD}}$  based on a parameter  $\lambda(V_{\text{nom}})$ , as shown in Fig. 4(b).<sup>4</sup>

<sup>4</sup>Experimental results in Section VI show that our approximate optimization can achieve results similar to those of the exhaustive search.

**Algorithm 2** Method for Solving the FIND\_OD Problem

- 1: Find the design cone of the nominal mode $(f_{nom}, V_{nom})$;
- 2: Find the intersections of the maximum supply voltage $V_{\text{max}}$ and boundaries of the design cone. Define the minimum and maximum frequencies of these intersections as $f_a$ and $f_b$, respectively;
- 3: Run MCMM SP&R with the given nominal mode and overdrive modes defined by $\{f_a, f_b, \frac{f_a+f_b}{2}\}$ and $V_{\text{max}}$;
- 4: Extract circuit information. Build or update the power model;
- 5: Estimate $P_{avg}$, based on the given r, corresponding to feasible overdrive modes within the design cone. Find the maximum $f_{OD}$ along with the corresponding $V_{OD}$ satisfying power constraints;
- 6: Run MCMM SP&R with the overdrive mode obtained in Step 5. Repeat Steps 4–6 until $\Delta f_{OD}$ is less than a stopping criterion $f_{stop}$.



Fig. 4. (a) Projection of mode **B** to mode **B**' for circuit property modeling. (b) $\lambda(V_{\text{nom}})$ calculation, where $\lambda(V_{\text{nom}}) = \Delta V_1 / \Delta V_2$. $V_{\text{HVT}}$ and $V_{\text{LVT}}$ are defined by the intersections of $f_{OD}$ and the design cone.

$\lambda(V_{\text{nom}})$ indicates the ratio of HVT cells to total cells in the critical paths. When the signoff voltage increases, paths become faster and more HVT cells are used to reduce power. As a result, for a fixed $f_{\text{nom}}$, $\lambda(V_{\text{nom}})$ increases with $V_{\text{nom}}$. We heuristically approximate $\lambda(V_{\text{nom}})$ as a linear function of $V_{\text{nom}}$ in our method as

$$\lambda(V_{\text{nom}}) = \frac{\lambda(V_{\text{max}}) - \lambda(V_{\text{min}})}{V_{\text{max}} - V_{\text{min}}} \times (V_{\text{nom}} - V_{\text{min}}) + \lambda(V_{\text{min}}) \tag{8}$$

in which $V_{\text{max}}$ and $V_{\text{min}}$ are, respectively, the maximum $V_{\text{nom}}$ at the given technology node and the minimum supply voltage at $f_{\text{nom}}$, which we assume can be determined by designers. Written this way, the line passes through $\lambda(V_{\text{min}})$ at $V_{\text{nom}} = V_{\text{min}}$ and through $\lambda(V_{\text{max}})$ at $V_{\text{nom}} = V_{\text{max}}$. We calculate $\lambda(V_{\text{max}})$ and $\lambda(V_{\text{min}})$ based on the desired $V_{\text{OD}}$ that minimizes $P_{\text{avg}}$ when $V_{\text{nom}}$ equals $V_{\text{max}}$ and $V_{\text{min}}$, respectively. Algorithm 3 shows the steps to solve the FIND\_VOLT problem.
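The linear approximation of $\lambda$ can be sketched as a two-point interpolation. The sketch assumes that (8) is meant as the straight line through the two calibrated endpoints $(V_{\text{min}}, \lambda(V_{\text{min}}))$ and $(V_{\text{max}}, \lambda(V_{\text{max}}))$; the function name and the numeric values are hypothetical.

```python
def interpolate_lambda(v_nom: float, v_min: float, v_max: float,
                       lam_min: float, lam_max: float) -> float:
    """Straight line through (v_min, lam_min) and (v_max, lam_max), evaluated at v_nom."""
    if v_max <= v_min:
        raise ValueError("v_max must be greater than v_min")
    slope = (lam_max - lam_min) / (v_max - v_min)
    return lam_min + slope * (v_nom - v_min)


# Hypothetical endpoints: lambda = 0.2 at 0.8 V and lambda = 0.6 at 1.0 V.
lam_mid = interpolate_lambda(0.9, v_min=0.8, v_max=1.0, lam_min=0.2, lam_max=0.6)
print(f"lambda(0.9 V) = {lam_mid:.2f}")
```

Only the two endpoint values require a MIN\_POWER search; every intermediate $V_{\text{nom}}$ then maps to a single candidate $V_{\text{OD}}$, which is what reduces the $O(n^2)$ search to a one-dimensional one.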

## G. FIND\_FREQ Problem

For each  $f_{nom}$ , we consider only one  $f_{OD}$ . Further, we approximate  $\lambda(f_{nom})$  as a linear function of  $f_{nom}$ . Since the methodology for the FIND\_FREQ problem is similar to that for the FIND\_VOLT problem (in that the frequency and voltage axes are swapped), we skip the detailed descriptions.

# VI. EXPERIMENTS AND RESULTS

Our experiments use two RTL designs (AES and JPEG) from OpenCores [12] and four blocks (FPU, MUL, EXU, and SPU) from OpenSPARC T1 [13]. Designs are implemented with foundry 65-nm triple-$V_T$ libraries. We synthesize designs at both nominal and overdrive modes using Synopsys Design Compiler [15], and pick the mode with less power after routing.<sup>5</sup> We run MCMM P&R using Cadence SoC Encounter [10]. To eliminate tool noise, we execute each P&R run three times, perturbing the timing constraints by a small amount (i.e., 0.5% of the clock period) [4]. We use SensOpt [14] for postrouting leakage optimization, and Synopsys PrimeTime [16] for timing and power analyses. We run timing analysis at the SS corner and power analysis at the FF corner. Our basic experimental configuration assumes r = 50%. All implemented designs have worst negative slacks (WNS) $\geq -10$ ps.<sup>6</sup>

**Algorithm 3** Method for Solving the FIND\_VOLT Problem

- 1: Define two nominal modes $(f_{nom}, V_{\min})$ and $(f_{nom}, V_{\max})$. For each nominal mode, determine the $V_{OD}$ with the minimum $P_{avg}$ by using the MIN\_POWER flow;
- 2: Calculate $\lambda(V_{\min})$ and $\lambda(V_{\max})$ with the resultant $V_{OD}$;
- 3: Run MCMM SP&R at $\{V_{\min}, V_{\max}, \frac{V_{\min}+V_{\max}}{2}\}$ (with $f_{nom}$) and the corresponding $V_{\text{OD}}$ (with $f_{\text{OD}}$) determined by $\lambda$ values;
- 4: Extract circuit information. Build or update the power model;
- 5: Find $V_{\text{nom}}$ and the corresponding $V_{\text{OD}}$ that achieve minimum $P_{\text{avg}}$ based on the power model;
- 6: Run MCMM SP&R with the $V_{\text{nom}}$ and $V_{\text{OD}}$ obtained in Step 5. Repeat Steps 4–6 until $\Delta P_{avg}$ is less than a stopping criterion $P_{stop}$.

TABLE I

EXPERIMENTAL SETUP FOR THE FIND\_OD PROBLEM

| Case | Design | $f_{nom}$ (MHz) | $V_{nom}$ (V) | $P_{avg\_max}$ (mW) | $P_{OD\_max}$ (mW) | $V_{max}$ (V) |
|------|--------|-----------------|---------------|---------------------|--------------------|---------------|
| 1    | AES    | 500             | 0.9           | 40                  | 55                 | 1.2           |
| 2    | JPEG   | 400             | 0.9           | 80                  | 100                | 1.2           |
| 3    | OST1   | 600             | 0.9           | 210                 | 300                | 1.2           |

TABLE II

METRICS OF CIRCUITS IMPLEMENTED FOR THE FIND\_OD PROBLEM

| Design        | Metric              | Signoff&scale | Proposed method | Exhaustive search | Method in [1] |
|---------------|---------------------|---------------|-----------------|-------------------|---------------|
| AES (Case 1)  | $f_{OD}$ (MHz)      | 760           | 822             | 840               | 810           |
|               | $V_{OD}$ (V)        | 1.20          | 1.18            | 1.16              | 1.18          |
|               | Area ($\mu$m$^2$)   | 30002         | 30594           | 31405             | 30832         |
|               | $P_{avg}$ (mW)      | 35.1          | 36.2            | 37.3              | 36.0          |
|               | #P&R runs           | 2             | 4               | 66                | 7             |
| JPEG (Case 2) | $f_{OD}$ (MHz)      | 580           | 638             | 660               | 600           |
|               | $V_{OD}$ (V)        | 1.16          | 1.18            | 1.12              | 1.18          |
|               | Area ($\mu$m$^2$)   | 114679        | 122394          | 127361            | 117355        |
|               | $P_{avg}$ (mW)      | 67.6          | 70.5            | 69.3              | 69.7          |
|               | #P&R runs           | 2             | 4               | 66                | 7             |
| OST1 (Case 3) | $f_{OD}$ (MHz)      | 860           | 916             | 940               | 870           |
|               | $V_{OD}$ (V)        | 1.16          | 1.14            | 1.12              | 1.16          |
|               | Area ($\mu$m$^2$)   | 151149        | 154253          | 156363            | 150491        |
|               | $P_{avg}$ (mW)      | 163.2         | 162.0           | 162.0             | 162.4         |
|               | #P&R runs           | 2             | 5               | 66                | 7             |

During adaptive search, we derive and refine our power model using MATLAB [11].

## A. FIND\_OD Problem

Table I shows the experimental setup, where $P_{avg\_max}$, $P_{OD\_max}$, and $V_{max}$, respectively, constrain $P_{\text{avg}}$, $P_{\text{OD}}$, and $V_{\text{OD}}$. We assume the same overdrive mode for all four blocks from OpenSPARC T1 and combine them into a single instance, which we denote as OST1. For each instance, we implement four methods to optimize the overdrive mode. The signoff&scale method applies the traditional "signoff and scale," where we first sign off circuits with the given nominal mode and then perform timing and power analyses with libraries characterized at higher voltages to search for the maximum $f_{OD}$ under power constraints. Note that we perform an additional MCMM P&R run to optimize power at both modes after the overdrive mode is selected. The proposed method uses the proposed adaptive search. The exhaustive search explores the entire feasible design space for given design parameters. We also compare with the method in [1].

Results in Table II show that the proposed method achieves >8% and 6% overdrive performance improvements compared with the signoff and scale method and the method in [1], respectively, while maintaining similar area and power. Further, the $f_{OD}$ of the proposed method is within 4% of that obtained from the exhaustive search, while using <8% of the exhaustive search runtime. We also note that our proposed method is scalable due to its use of adaptive search, which is able to converge to a near-optimal solution after a small number of SP&R runs.
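The percentages quoted from Table II can be reproduced with the short script below. The $f_{OD}$ and run-count values are copied from Table II; the dictionary layout and helper name are only illustrative.

```python
# f_OD (MHz) from Table II: (signoff&scale, proposed, exhaustive search, method in [1])
F_OD = {
    "AES": (760, 822, 840, 810),
    "JPEG": (580, 638, 660, 600),
    "OST1": (860, 916, 940, 870),
}
# Number of P&R runs from Table II: (proposed, exhaustive search)
RUNS = {"AES": (4, 66), "JPEG": (4, 66), "OST1": (5, 66)}


def percent_gain(new: float, base: float) -> float:
    """Relative improvement of `new` over `base`, in percent."""
    return 100.0 * (new / base - 1.0)


gain_vs_scale = {d: percent_gain(v[1], v[0]) for d, v in F_OD.items()}
gain_vs_prev = {d: percent_gain(v[1], v[3]) for d, v in F_OD.items()}
gap_to_exhaustive = {d: 100.0 * (1.0 - v[1] / v[2]) for d, v in F_OD.items()}
run_share = {d: 100.0 * p / e for d, (p, e) in RUNS.items()}

for d in F_OD:
    print(f"{d}: +{gain_vs_scale[d]:.1f}% vs signoff&scale, "
          f"+{gain_vs_prev[d]:.1f}% vs [1], "
          f"{gap_to_exhaustive[d]:.1f}% below exhaustive, "
          f"{run_share[d]:.1f}% of exhaustive P&R runs")
```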

When we optimize each block in OST1 individually (fine-grained optimization), the proposed method achieves 4%–8% $f_{OD}$ improvement compared with the signoff and scale method. For Case 3 in Table II (coarse-grained optimization), the corresponding $f_{OD}$ improvement is 6.5%. These consistent $f_{OD}$ improvements suggest that the proposed method is scalable.

<sup>5</sup>Although this may be unnecessary when modes are equivalently dominant, we use the same implementation for all experiments for fair comparisons.

<sup>6</sup>The small WNS is due to the discrepancy between timing analysis in Cadence SoC Encounter [10] and in Synopsys PrimeTime [16].

TABLE III

EXPERIMENTAL SETUP FOR THE FIND\_VOLT PROBLEM

| Case | Design | $f_{nom}$ (MHz) | $f_{OD}$ (MHz) | $P_{OD\_max}$ (mW) | $V_{max}$ (V) |
|------|--------|-----------------|----------------|--------------------|---------------|
| 4    | AES    | 700             | 850            | 50                 | 1.2           |
| 5    | JPEG   | 600             | 720            | 100                | 1.2           |

TABLE IV

METRICS OF CIRCUITS IMPLEMENTED FOR THE FIND\_VOLT PROBLEM

| Design        | Metric            | Proposed method | Exhaustive search | Method in [1] |
|---------------|-------------------|-----------------|-------------------|---------------|
| AES (Case 4)  | $V_{nom}$ (V)     | 0.92            | 0.92              | 0.90          |
|               | $V_{OD}$ (V)      | 1.02            | 1.02              | 1.04          |
|               | Area ($\mu$m$^2$) | 35349           | 35349             | 34599         |
|               | $P_{avg}$ (mW)    | 41.8            | 41.8              | 44.1          |
|               | #P&R runs         | 7               | 44                | 10            |
| JPEG (Case 5) | $V_{nom}$ (V)     | 0.94            | 0.90              | 0.86          |
|               | $V_{OD}$ (V)      | 1.04            | 0.94              | 0.96          |
|               | Area ($\mu$m$^2$) | 136747          | 148360            | 145906        |
|               | $P_{avg}$ (mW)    | 85.4            | 80.9              | 91.9          |
|               | #P&R runs         | 6               | 46                | 9             |

TABLE V

EXPERIMENTAL SETUP FOR THE FIND\_FREQ PROBLEM

| Case | Design | $V_{nom}$ (V) | $V_{OD}$ (V) | $P_{avg\_max}$ (mW) | $P_{OD\_max}$ (mW) |
|------|--------|---------------|--------------|---------------------|--------------------|
| 6    | AES    | 0.9           | 1.1          | 40                  | 55                 |
| 7    | JPEG   | 0.9           | 1.2          | 80                  | 120                |

TABLE VI

METRICS OF CIRCUITS IMPLEMENTED FOR THE FIND\_FREQ PROBLEM

| Design        | Metric            | Proposed method | Exhaustive search |
|---------------|-------------------|-----------------|-------------------|
| AES (Case 6)  | $f_{nom}$ (MHz)   | 618             | 610               |
|               | $f_{OD}$ (MHz)    | 810             | 860               |
|               | $f_{avg}$ (MHz)   | 714             | 735               |
|               | Area ($\mu$m$^2$) | 31526           | 32740             |
|               | $P_{avg}$ (mW)    | 40.3            | 39.6              |
|               | #P&R runs         | 6               | 70                |
| JPEG (Case 7) | $f_{nom}$ (MHz)   | 431             | 440               |
|               | $f_{OD}$ (MHz)    | 623             | 630               |
|               | $f_{avg}$ (MHz)   | 527             | 535               |
|               | Area ($\mu$m$^2$) | 119777          | 120670            |
|               | $P_{avg}$ (mW)    | 81.2            | 82.6              |
|               | #P&R runs         | 6               | 52                |

## B. FIND\_VOLT Problem

Tables III and IV, respectively, show our experimental setup and results. The proposed method achieves <6% power overhead, with  $7\times$  runtime reduction, compared with exhaustive search. The proposed method also achieves up to 12% reduction of average power compared with the method in [1].

## C. FIND\_FREQ Problem

Tables V and VI, respectively, show our experimental setup and results. The proposed method achieves <3% performance overhead, with around  $10\times$  runtime reduction, compared with exhaustive search.
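The overhead and runtime figures quoted above can be rechecked from the Table VI entries. The following is a minimal sketch, not part of the original flow: the names `TABLE_VI`, `f_avg`, and `compare` are hypothetical, the equal weighting of the two modes in `f_avg` is an assumption that happens to reproduce the tabulated $f_{avg}$ values, and the number of P&R runs is used as a proxy for runtime.

```python
# Values transcribed from Table VI: frequencies in MHz and number of P&R runs.
TABLE_VI = {
    "AES (Case 6)": {
        "proposed": {"f_nom": 618, "f_od": 810, "pnr_runs": 6},
        "exhaustive": {"f_nom": 610, "f_od": 860, "pnr_runs": 70},
    },
    "JPEG (Case 7)": {
        "proposed": {"f_nom": 431, "f_od": 623, "pnr_runs": 6},
        "exhaustive": {"f_nom": 440, "f_od": 630, "pnr_runs": 52},
    },
}


def f_avg(entry):
    """Average frequency in MHz.

    Assumption: both modes are weighted equally, which matches the
    f_avg rows of Table VI.
    """
    return (entry["f_nom"] + entry["f_od"]) / 2


def compare(design):
    """Return (f_avg overhead in %, ratio of P&R runs) versus exhaustive search."""
    proposed = TABLE_VI[design]["proposed"]
    exhaustive = TABLE_VI[design]["exhaustive"]
    overhead_pct = 100 * (f_avg(exhaustive) - f_avg(proposed)) / f_avg(exhaustive)
    # Assumption: runtime scales with the number of P&R runs.
    run_ratio = exhaustive["pnr_runs"] / proposed["pnr_runs"]
    return overhead_pct, run_ratio


for design in TABLE_VI:
    overhead, ratio = compare(design)
    print(f"{design}: f_avg overhead {overhead:.1f}%, {ratio:.1f}x fewer P&R runs")
```

Under these assumptions, both cases stay below the 3% overhead bound, and the ratios of P&R runs bracket the "around $10\times$" figure.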

## D. Duty Cycle-Awareness Validation

To show that our proposed methodology is duty cycle-aware, we optimize AES (in the context of the FIND\_OD problem) with different duty cycles ( $r_{opt}$ ). We fix the nominal mode at (500 MHz, 0.9 V) and constrain  $P_{avg}$  to 30 mW and  $V_{max}$  to 1.2 V. We then evaluate the maximum  $f_{OD}$  that each resulting design achieves under the same power constraints when it is operated with a different duty cycle ( $r_{eval}$ ).

TABLE VII Metrics of Circuits Implemented With Different $r_{opt}$

| Design | $r_{opt}$ | $f_{OD}$ (MHz) | $V_{OD}$ (V) | $f_{max}$ (MHz) with $r_{eval} = 0.1$ | $r_{eval} = 0.3$ | $r_{eval} = 0.5$ | $r_{eval} = 0.7$ | $r_{eval} = 0.9$ |
|--------|-----------|----------------|--------------|---------------------------------------|------------------|------------------|------------------|------------------|
| AES    | 0.1       | 844            | 1.20         | 845                                   | 830              | 725              | 670              | 640              |
|        | 0.3       | 832            | 1.19         | 840                                   | 830              | 725              | 670              | 640              |
|        | 0.5       | 726            | 1.10         | 815                                   | 815              | 730              | 670              | 635              |
|        | 0.7       | 670            | 1.05         | 805                                   | 805              | 720              | 670              | 635              |
|        | 0.9       | 638            | 1.02         | 805                                   | 805              | 720              | 670              | 640              |

Results in Table VII show that  $f_{OD}$  and  $V_{OD}$  decrease with a larger  $r_{opt}$ . That is, given fixed power constraints, optimization with a smaller  $r_{opt}$  results in a faster design. Further, for each  $r_{eval}$ , the highest  $f_{max}$  is obtained by the design optimized with  $r_{opt} = r_{eval}$  (in some columns, tied with designs optimized for other duty cycles). These observations confirm the duty cycle-awareness of our proposed method. The results also show the cost of inaccurate prediction for r. For example, if r = 0.1 ( $f_{max} = 845$  MHz), but the optimization assumes r = 0.9 ( $f_{max} = 805$  MHz), there is a performance penalty of about 5% (40 MHz out of 845 MHz).
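The cost of a mispredicted duty cycle can be tabulated directly from Table VII. The sketch below is illustrative only: `R_VALUES`, `F_MAX`, and `penalty_pct` are hypothetical names, and the penalty is defined here (as an assumption consistent with the example above) as the relative loss in $f_{max}$ with respect to the design optimized for the actual duty cycle.

```python
# Duty cycles used for both optimization (r_opt) and evaluation (r_eval).
R_VALUES = [0.1, 0.3, 0.5, 0.7, 0.9]

# F_MAX[r_opt][i] is f_max in MHz evaluated at r_eval = R_VALUES[i],
# transcribed from Table VII (AES).
F_MAX = {
    0.1: [845, 830, 725, 670, 640],
    0.3: [840, 830, 725, 670, 640],
    0.5: [815, 815, 730, 670, 635],
    0.7: [805, 805, 720, 670, 635],
    0.9: [805, 805, 720, 670, 640],
}


def penalty_pct(r_actual, r_assumed):
    """Relative f_max loss (%) when the design was optimized for r_assumed
    but actually operates with duty cycle r_actual."""
    col = R_VALUES.index(r_actual)
    matched = F_MAX[r_actual][col]      # design optimized for the actual r
    mismatched = F_MAX[r_assumed][col]  # design optimized for the wrong r
    return 100 * (matched - mismatched) / matched


# Print the full penalty matrix: rows are the actual r, columns the assumed r.
for r_actual in R_VALUES:
    row = [f"{penalty_pct(r_actual, r_assumed):4.1f}" for r_assumed in R_VALUES]
    print(f"r = {r_actual}: " + "  ".join(row))
```

With these values, `penalty_pct(0.1, 0.9)` evaluates to roughly 4.7%, which is the penalty quoted in the text, and no entry of the matrix is negative, since the matched design is never slower than a mismatched one in Table VII.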

# VII. CONCLUSION

Based on the properties of equivalent dominance, we propose guidelines and efficient methodologies to search for the optimal modes for overdrive signoff. The proposed methodologies can successfully determine the signoff modes that reduce lifetime energy, and are shown to achieve >8% and 6% performance improvements compared with the traditional "signoff and scale" and the previous work [1], respectively. The methodologies also result in <6% power overhead as compared with the optimal solutions.

# REFERENCES

- [1] T.-B. Chan, A. B. Kahng, J. Li, and S. Nath, "Optimization of overdrive signoff," in *Proc. ASP-DAC*, Jan. 2013, pp. 344–349.
- [2] M. Elgebaly, K. Z. Malik, L. G. Chua-Eoan, and S.-O. Jung, "Adaptive voltage scaling for an electronics device," U.S. Patent 7417482, Aug. 26, 2008.
- [3] J. Hu, M. C. Fu, and S. I. Marcus, "A model reference adaptive search method for global optimization," *Oper. Res.*, vol. 55, no. 3, pp. 549–568, 2005.
- [4] K. Jeong and A. B. Kahng, "Methodology from chaos in IC implementation," in *Proc. ISQED*, Mar. 2010, pp. 885–892.
- [5] A. B. Kahng, S. Kang, R. Kumar, and J. Sartori, "Enhancing the efficiency of energy-constrained DVFS designs," *IEEE Trans. Very Large Scale Integr. (VLSI) Syst.*, vol. 21, no. 10, pp. 1769–1782, Oct. 2013.
- [6] A. B. Kahng, B. Lin, and S. Nath, "Explicit modeling of control and data for improved NoC router estimation," in *Proc. 49th DAC*, Jun. 2012, pp. 392–397.
- [7] R. Kumar and C. P. Ravikumar, "Leakage power estimation for deep submicron circuits in an ASIC design environment," in *Proc. 7th ASP-DAC*, 2002, pp. 45–50.
- [8] K. Nose and T. Sakurai, "Analysis and future trend of short-circuit power," *IEEE Trans. Comput.-Aided Design Integr. Circuits Syst.*, vol. 19, no. 9, pp. 1023–1030, Sep. 2000.
- [9] H. J. M. Veendrick, "Short-circuit dissipation of static CMOS circuitry and its impact on the design of buffer circuits," *IEEE J. Solid-State Circuits*, vol. 19, no. 4, pp. 468–473, Aug. 1984.
- [10] Cadence SoC Encounter User Guide. [Online]. Available: http://www.cadence.com/
- [11] MATLAB. [Online]. Available: http://www.mathworks.com/products/matlab/, accessed Sept. 2013.
- [12] OpenCores. [Online]. Available: http://opencores.org/, accessed Jul. 2012.
- [13] OpenSPARC T1. [Online]. Available: http://www.oracle.com/technetwork/systems/opensparc/, accessed Apr. 2014.
- [14] Sensitivity-Based Leakage Optimizer. [Online]. Available: http://vlsicad.ucsd.edu/SIZING/, accessed Sept. 2013.
- [15] Synopsys Design Compiler User Guide. [Online]. Available: http://www.synopsys.com/, accessed Sept. 2013.
- [16] Synopsys PrimeTime User's Manual. [Online]. Available: http://www.synopsys.com/, accessed Sept. 2013.