## Smart Non-Default Routing for Clock Power Reduction

#### Andrew B. Kahng, Seokhyeong Kang, <u>Hyein Lee</u>

VLSI CAD Laboratory, UC San Diego. 50<sup>th</sup> Design Automation Conference, June 5, 2013




## Outline

- Motivation
- Our Wire Sizing Algorithm
- Experimental Results
- Conclusions and Future Works

#### **Non-Default Routing Rule**

- Non-Default Routing Rules (NDRs) are used to increase wire widths and spacings
  - Reduce wire parasitic and delay variability
  - Reduce coupling capacitance
  - Avoid Electromigration (EM) violation

**Default Rule** 





#### Electromigration

- EM causes unwanted opens or shorts in wires
- EM reliability ↔ reduced current density
  - Widen wires (use NDR)

Can NDRs always cure the EM problems?





Smart NDR (SNDR)



## **Related Works**

#### Wire sizing in clock trees

- [Tsai04] buffer insertion and wire sizing to optimize delay and power using dynamic programming
- [Guthaus06] clock buffer/wire sizing to minimize skew using sequential linear programming
- Related to timing; EM reliability not considered

#### EM-constrained wire sizing

- [PullelaMP95] low-power clock tree with EM constraints
- Inserts buffers to reduce wire width

#### Our work considers EM reliability without buffer insertion

## Outline

- Motivation
- Our Wire Sizing Algorithm
- Experimental Results
- Conclusions and Future Works

## **SNDR Wire Sizing Algorithm**

- Objective: minimize the total capacitance of the clock network
- Subject to:
  - Clock latency, maximum transition time, and clock skew are maintained
  - No EM violations
- Solution: an NDR for each wire segment of a given clock network

#### Wire Delay, Slew, RC and EM Limit Models

Wire delay and slew model

- Wire delay: Elmore delay model [Elmore48]
- Wire slew: PERI model [HuAHK07]

Wire RC and EM limits: $f(w)$, where $w$ is the wire width

$$R \propto 1/w, \qquad C \propto w, \qquad EM_{limit} \propto w$$

- R, C, EM<sub>limit</sub> = linear functions of log(w)
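The delay model and width scaling above can be illustrated with a minimal Python sketch. It assumes a pi-model lumping of each segment (half of the segment's own capacitance sits at its far end) and exact proportionality $R \propto 1/w$, $C \propto w$. The `Segment` class and the constants `R_UNIT` and `C_UNIT` are hypothetical names and values, not taken from the slides.

```python
from dataclasses import dataclass, field

R_UNIT = 1.0  # assumed resistance per unit length at width 1
C_UNIT = 1.0  # assumed capacitance per unit length at width 1


@dataclass
class Segment:
    """One wire segment of a clock tree (hypothetical structure)."""
    length: float
    width: float                 # width multiplier relative to the default rule
    sink_cap: float = 0.0        # load capacitance attached at the far end
    children: list = field(default_factory=list)


def resistance(seg):
    return R_UNIT * seg.length / seg.width   # R proportional to 1/w


def capacitance(seg):
    return C_UNIT * seg.length * seg.width   # C proportional to w


def downstream_cap(seg):
    """Total capacitance of this segment and everything below it."""
    return (capacitance(seg) + seg.sink_cap
            + sum(downstream_cap(c) for c in seg.children))


def elmore_delays(seg, upstream=0.0):
    """Return the Elmore delays at the leaf ends below `seg`."""
    # The segment resistance charges half of its own capacitance
    # plus all capacitance attached at or below its far end.
    load = downstream_cap(seg) - capacitance(seg) / 2.0
    delay = upstream + resistance(seg) * load
    if not seg.children:
        return [delay]
    result = []
    for child in seg.children:
        result.extend(elmore_delays(child, delay))
    return result


leaf_a = Segment(length=1.0, width=1.0, sink_cap=1.0)
leaf_b = Segment(length=2.0, width=1.0, sink_cap=1.0)
root = Segment(length=1.0, width=1.0, children=[leaf_a, leaf_b])
delays = elmore_delays(root)
print("sink delays:", delays, "skew:", max(delays) - min(delays))
```

Changing a segment's `width` trades resistance against capacitance, which is the trade-off the sizing problem exploits.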



#### **SNDR for a Clock Subnet**

Problem formulation

Minimize total capacitance

Subject to

$$
\begin{aligned}
Delay_i &< MaxDelay \\
Skew_{i,j} &< MaxClockSkew \\
I_c &< MaxEMCurrentLimit_c \\
W_e &\ge W_{desc(e)} \quad (desc(e) \equiv \text{edges downstream from } e)
\end{aligned}
$$
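The constraints of this formulation can be checked against a candidate sizing with a short sketch. All names here (`check_subnet` and its arguments) are hypothetical. The sketch assumes that currents and EM limits are indexed per edge, that skew is the largest latency difference between any two sinks, and that checking each edge against its direct parent is enough to enforce $W_e \ge W_{desc(e)}$ for all downstream edges.

```python
def check_subnet(delays, currents, em_limits, widths, parent,
                 max_delay, max_skew):
    """Return the names of the constraints violated by one candidate sizing.

    delays:    {sink: latency}
    currents:  {edge: current through the edge}
    em_limits: {edge: EM current limit at the edge's chosen width}
    widths:    {edge: chosen width}
    parent:    {edge: upstream edge, or None for the subnet root}
    """
    violations = []
    if any(d >= max_delay for d in delays.values()):
        violations.append("delay")
    if max(delays.values()) - min(delays.values()) >= max_skew:
        violations.append("skew")
    if any(currents[e] >= em_limits[e] for e in currents):
        violations.append("em")
    # An upstream edge must be at least as wide as each edge directly below it.
    if any(p is not None and widths[p] < widths[e] for e, p in parent.items()):
        violations.append("width_order")
    return violations


example = dict(
    delays={"s1": 7.0, "s2": 9.5},
    currents={"e0": 1.0, "e1": 0.4, "e2": 0.6},
    em_limits={"e0": 2.0, "e1": 1.0, "e2": 1.0},
    widths={"e0": 2, "e1": 1, "e2": 1},
    parent={"e0": None, "e1": "e0", "e2": "e0"},
)
print(check_subnet(**example, max_delay=10.0, max_skew=3.0))
print(check_subnet(**example, max_delay=10.0, max_skew=2.0))
```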



*Figure: subnet*

#### NDR solutions are obtained for wire segments

#### **SNDR for Entire Clock Tree**

- Solve SNDR subnet problems from the bottom to top of clock tree
- Skew propagation: maximum/minimum clock latencies of downstream subnets are propagated to upstream subnets
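The bottom-up order and the latency propagation can be sketched as a post-order traversal. This is a minimal illustration, not the authors' implementation: the dictionary layout with a `"branches"` list of `(delay, child)` pairs is a hypothetical structure, and the per-subnet sizing step is only marked by a comment.

```python
import math


def propagate_latency(subnet):
    """Return (min, max) latency from this subnet's input to the sinks below.

    Downstream subnets are resolved first, so the traversal runs from the
    bottom of the tree to the top.
    """
    lo, hi = math.inf, -math.inf
    for delay, child in subnet["branches"]:
        # A branch ends at a sink (child is None) or at the input of a
        # downstream subnet whose latency range is already resolved.
        c_lo, c_hi = (0.0, 0.0) if child is None else propagate_latency(child)
        lo = min(lo, delay + c_lo)
        hi = max(hi, delay + c_hi)
    # In a full flow, the sizing problem for this subnet would be solved
    # here, using the downstream ranges in its skew constraint.
    return lo, hi


leaf1 = {"branches": [(1.0, None), (1.5, None)]}
leaf2 = {"branches": [(2.0, None)]}
top = {"branches": [(0.5, leaf1), (0.25, leaf2)]}

lo, hi = propagate_latency(top)
print("min latency:", lo, "max latency:", hi, "skew:", hi - lo)
```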



*Figure: entire clock tree*

## **Iterative Linear Programming**

- Sizing problem is a <u>quadratic program</u> due to RC delay (Elmore delay = $R \cdot C$)
- Separate the sizing problem into <u>two linear programs</u> and solve them iteratively until the solution ($x_e$) converges

$$
\underbrace{f_R(x_e) \cdot f_C(x_e)}_{\text{quadratic program}}
\;\longrightarrow\;
\underbrace{f_R(x_e) \cdot f_C(K) \quad \text{and} \quad f_R(K) \cdot f_C(x_e)}_{\text{two linear programs, solved iteratively}}
$$

where $K$ is held constant within each linear program.

- $\Rightarrow$ 5X-30X runtime reduction by avoiding quadratic formulation
- $\Rightarrow$ 170 minutes vs. 30 minutes for *dma* testcase
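To see why fixing one factor makes the delay term linear, consider a sketch that assumes affine fits $f_R(x_e) = a_R + b_R x_e$ and $f_C(x_e) = a_C + b_C x_e$. The coefficients $a_R, b_R, a_C, b_C$ are illustrative notation, not taken from the slides.

$$
\begin{aligned}
f_R(x_e) \cdot f_C(x_e) &= a_R a_C + (a_R b_C + a_C b_R)\,x_e + b_R b_C\,x_e^2 && \text{(quadratic in } x_e\text{)} \\
f_R(x_e) \cdot f_C(K) &= (a_R + b_R x_e)\,f_C(K) && \text{(linear in } x_e\text{)} \\
f_R(K) \cdot f_C(x_e) &= f_R(K)\,(a_C + b_C x_e) && \text{(linear in } x_e\text{)}
\end{aligned}
$$

With $K$ fixed, $f_C(K)$ and $f_R(K)$ are plain numbers, so each product is a linear expression in $x_e$ that a linear programming solver can handle.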

## Outline

- Motivation
- Our Wire Sizing Algorithm
- Experimental Results
- Conclusions and Future Works

## **Implementation Flow**

#### Practical and <u>automated</u> flow



#### **Experimental Environments**

- Cadence SOC Encounter, Matlab, Synopsys 32/28nm PDK

#### Various NDRs are tried

| NDR  | Width | Space | Norm. Cap. |
|------|-------|-------|------------|
| 1W2S | 1     | 2     | 1.21       |
| 1W5S | 1     | 5     | 0.73       |
| 2W4S | 2     | 4     | 0.89       |
| 1W8S | 1     | 8     | 0.66       |
| 2W7S | 2     | 7     | 0.76       |
| 3W6S | 3     | 6     | 0.88       |
| 4W5S | 4     | 5     | 1.00       |

Iso area: W+S is fixed
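The iso-area grouping can be made concrete with a small sketch over the rows of the NDR table. It assumes that Width and Space are multiples of the default width and spacing, so that W+S measures the track area a rule occupies. The name `group_by_pitch` is hypothetical.

```python
from collections import defaultdict

# (name, width, space, normalized capacitance) rows from the NDR table
NDRS = [
    ("1W2S", 1, 2, 1.21),
    ("1W5S", 1, 5, 0.73),
    ("2W4S", 2, 4, 0.89),
    ("1W8S", 1, 8, 0.66),
    ("2W7S", 2, 7, 0.76),
    ("3W6S", 3, 6, 0.88),
    ("4W5S", 4, 5, 1.00),
]


def group_by_pitch(ndrs):
    """Group rules that occupy the same W+S, i.e. iso-area rules."""
    groups = defaultdict(list)
    for name, width, space, cap in ndrs:
        groups[width + space].append((cap, name))
    return dict(groups)


groups = group_by_pitch(NDRS)
for pitch in sorted(groups):
    cap, name = min(groups[pitch])
    names = [n for _, n in groups[pitch]]
    print(f"W+S={pitch}: {names}, lowest capacitance: {name} ({cap})")
```

Within each iso-area group in this table, the narrowest rule has the lowest normalized capacitance.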

- Experiments
  - Iso area NDRs
  - Non-iso (less) area NDRs
- Results essentially satisfy all timing and EM constraints
  - Runtime: 10 seconds to 100 minutes per subnet
- Matlab R2012b and 2.5GHz Intel Xeon processor

## **Results: Proportion of NDRs**

- Smaller-width NDRs replace ≥ 80% of the wiring



#### **Results: Less Capacitance**

Clock switching power and wire capacitance reduction



#### **Results: Less Area**

- 50% track cost reduction can be achieved by non-iso area NDRs
- Track cost: total amount of track length occupied

## **Conclusion and Future Works**

- Smart NDRs for clock networks ⇒ reduce the clock tree wire capacitance under timing and EM constraints
- Less capacitance: 9% reduction of wire cap, 5% reduction of clock switching power
- Less area: 50% track cost reduction by using non-iso area NDRs

#### Future work

- Control local skews for improved chip timing and robustness
- Use better delay models for problem formulation
- Noise and variability consideration

**Thank You!** 

#### References

- [Elmore48] W. C. Elmore, "The Transient Response of Damped Linear Networks with Particular Regard to Wideband Amplifiers", J. Applied Physics 19(1) (1948), pp. 55-63.
- [HuAHK07] S. Hu, C. J. Alpert, J. Hu, S. Karandikar, Z. Li, W. Shi and C. N. Sze, "Fast Algorithms For Slew Constrained Minimum Cost Buffering", IEEE TCAD 26(11) (2007), pp. 2009-2022.
- [PullelaMP95] S. Pullela, N. Menezes and L. T. Pillage, "Low Power IC Clock Tree Design", Proc. CICC, 1995, pp. 263-266.