# Improved Signoff Methodology with Tightened BEOL Corners

Tuck-Boon Chan<sup>‡</sup>, Sorin Dobre<sup>§</sup> and Andrew B. Kahng<sup>†‡</sup>

<sup>†</sup>CSE and <sup>‡</sup>ECE Departments, UC San Diego, La Jolla, CA 92093

<sup>§</sup> Qualcomm Technologies, Inc., San Diego, CA 92121

{tbchan, abk}@ucsd.edu, sdobre@qti.qualcomm.com

Abstract: To ensure functional correctness, conventional chip implementation methodology signs off the SoC design at extreme process, voltage and temperature (PVT) conditions. At the 20nm node and beyond, voltage and back-end-of-line (BEOL) layers have become major sources of variation, which must be accounted for by signoff at various *BEOL* corners. Conventional signoff methodology uses extreme BEOL corners, in which all BEOL layers are skewed to the worst-case condition (e.g., all BEOL layers have the worst parasitic capacitance). However, such a BEOL condition is very pessimistic because the probability that all BEOL layers skew toward the worst-case condition simultaneously is extremely small. Such pessimism results in longer chip implementation schedules and poorer design quality. In this paper, we propose a signoff methodology with *tightened BEOL corners* to recover the pessimism incurred by the conventional BEOL corners. This approach is based on the observation that most timing-critical paths use different BEOL layers. When the variations of BEOL layers are not fully correlated, the BEOL-induced timing variation is much smaller due to averaging of random variations. Our experimental results show that by using tightened BEOL corners, we can reduce the number of paths with timing violations by up to 100% and improve the WNS and TNS by up to 101ps and 53ns, respectively.

# I. INTRODUCTION

In a conventional implementation methodology, designers sign off an SoC design at extreme PVT conditions to ensure functional correctness. As wire geometries continue to shrink with each new process node, wire resistance (R) and capacitance (C) have become major sources of variation [15], which must be accounted for by signoff at BEOL corners. In current industry-standard signoff methods, conventional BEOL corners (CBCs) are defined such that all BEOL layers vary in the same way [6]. For example, Table I (see Section II) shows common BEOL corners in which the wire width ($\Delta W$), wire thickness ($\Delta T$) and dielectric thickness ($\Delta H$) variations are biased to their minimum or maximum values.<sup>1</sup> Although BEOL parameters have strong spatial correlations within a die [12], different BEOL parameters are not fully correlated [5] [6] [10] [13] [22]. When the parameters are not fully correlated, the likelihood of a worst-case (or best-case) condition on all layers simultaneously is vanishingly small (if not a physical impossibility). Therefore, CBCs are unnecessarily pessimistic, which results in longer chip implementation schedules (i.e., more time spent on design closure steps).

To reduce the pessimism in CBCs, various statistical RC extraction and timing analysis methods have been proposed [1] [2] [3]. The main drawback of statistics-based methods is the lack of commercial EDA tools that can characterize an RC variation model (e.g., sensitivities of RC to BEOL physical parameters). Although we can construct the RC variation model by extracting RC at the nominal corner and at a perturbed corner for each variation source [3], this method requires substantial computing resources. For example, to characterize an interconnect stack with nine metal layers and three variation sources per layer, we need 28 RC extractions: one for the nominal corner and 27 for the perturbed corners. Moreover, the extracted parasitics are design-specific, and they must be updated whenever the design changes.
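The extraction count above is one nominal run plus one perturbed run per variation source. A trivial sketch of that arithmetic (the function name is our own, for illustration only):

```python
def num_rc_extractions(n_layers: int, sources_per_layer: int = 3) -> int:
    """Finite-difference characterization: one nominal extraction plus
    one perturbed extraction per BEOL variation source."""
    n_sources = n_layers * sources_per_layer
    return 1 + n_sources


# Nine metal layers with three variation sources (dW, dT, dH) each.
print(num_rc_extractions(9))  # 28
```

The cost grows linearly with the number of layers, and the whole set must be re-extracted for every design revision.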

Alternatively, Lu and McCullen [11] propose a BEOL variation-aware timing analysis method based on a layout-to-SPICE [25] netlist extraction tool. Since the extraction tool can annotate the nominal RC values as well as the bounds of RC in the SPICE netlist, the BEOL-induced timing variation can be simulated using SPICE. However, SPICE-based timing analysis is slower than static timing analysis (STA), and commercial RC extraction tools do not have an option to extract and annotate BEOL parameters into a netlist.

For corner-based timing analysis, there are methods to find the worst-case BEOL variation scenarios [6] [14] [20], but these scenarios are far from the typical BEOL variations seen in IC manufacturing. Thus, signing off a design using these BEOL variation scenarios will incur large design overheads [7]. Yamada and Oda [22] propose a simple method to tighten BEOL corners based on the wirelengths of BEOL layers. This corner-based method has the advantage that statistical extraction is only required once per technology for validation. However, this approach is so oversimplified that it may be inaccurate when path delays of an IC have different and opposite sensitivities to BEOL variations.

In this paper, we propose a signoff methodology with tightened BEOL corners (TBCs) to reduce the impact of pessimism in CBCs. Our method is based on an observation similar to [22], i.e., the wires on timing-critical paths are typically routed through different BEOL layers. For example, Figure 1 shows that the wires of (setup) critical paths extracted from a design are mostly routed on layers M2 to M6. Figure 2 shows that, for 92% of the paths, the maximum wirelength on a single layer is less than 60% of the total wirelength. When process variations of the BEOL layers are not fully correlated, the timing variation on a critical path is typically much smaller than that estimated using CBCs due to averaging of uncorrelated variations.<sup>2</sup> Our analysis (see Section III) shows that the delay variation at a CBC (with respect to the typical BEOL condition) can be much larger than the delay variation obtained from a statistical analysis. Further, we observe that the pessimism of a CBC depends on the sensitivities of critical-path delays to resistance and capacitance variations. Our results also show that CBCs have small or no pessimism for certain kinds of critical paths. Thus, we cannot apply TBCs to the entire design as suggested in [22]. To address this issue, we propose to choose the signoff corners (i.e., CBCs or TBCs) for each path based on its delay sensitivities to resistance and capacitance. With this method, we can safely sign off each path using TBCs or CBCs without underestimating its delay variation.

Our main contributions are as follows.

- We show that the pessimism of a CBC depends on the sensitivities of critical-path delay to BEOL resistance and capacitance, and that the trend is similar across different designs.
- We propose a method to identify the critical paths which can use tightened BEOL corners for signoff. We show that this method can reduce the number of paths with timing violations by up to 100% and improve WNS and TNS by up to 101*ps* and 53*ns*, respectively.

The organization of the rest of this paper is as follows. In Section II, we review the BEOL delay variation model. In Section III, we describe our approach to derive tightened BEOL corners based on the properties of critical paths. The experimental results of our study are presented in Section IV. We conclude this paper in Section V.

<sup>1</sup>The $\Delta W$, $\Delta T$ and $\Delta H$ in Table I are extracted from foundry BEOL corners. The definitions of the BEOL corners match those described in [8].

<sup>2</sup>As explained in [6], given a timing path, it is possible to find a worst-case BEOL scenario at which the estimated delay is worse than those at CBCs. However, such a worst-case BEOL scenario is rare, or else not significant enough to cause timing violations in actual chips.

Fig. 1: Wirelength distribution of critical paths on different BEOL layers.

Fig. 2: Cumulative probability of the maximum wirelength percentage of a single layer (relative to total wirelength on its corresponding path).

# II. BEOL VARIATION MODEL

We denote the index of a metal layer in an interconnect stack by *m* and the total number of metal layers by  $N_{layer}$ . We denote the conductor width and thickness of the layer *m* by  $W_m$  and  $T_m$ , respectively. Similarly, we denote the thickness of the layer's interlayer dielectric (i.e., the distance between layer *m* and layer *m*+1) by  $H_m$ . Figure 3 illustrates an example of the interconnect stack with three metal layers (M1, M2 and M3).

#### A. Conventional BEOL Corners

The major variation sources in a BEOL corner are $\Delta W_m$, $\Delta T_m$ and $\Delta H_m$, which correspond to the variations in $W_m$, $T_m$ and $H_m$, respectively.<sup>3</sup> A CBC is modeled by biasing the variation sources in a BEOL technology file (e.g., itf [28] or ict [23]). For example, Table I shows the $\Delta W_m$, $\Delta T_m$ and $\Delta H_m$ for typical CBCs. Note that $\Delta W_m$, $\Delta T_m$ and $\Delta H_m$ are biased in the same way for all layers in a CBC. It should also be noted that the RC-best ($Y_{rcb}$) and C-worst ($Y_{cw}$) corners have similar $\Delta W$ and $\Delta T$. Meanwhile, the RC-worst ($Y_{rcw}$) and C-best ($Y_{cb}$) corners have similar $\Delta W$ and $\Delta T$. Thus, the wire resistances extracted at $Y_{rcb}$ and $Y_{cw}$ (resp. $Y_{rcw}$ and $Y_{cb}$) are similar, but the capacitance is larger (resp. smaller) at $Y_{cw}$ (resp. $Y_{cb}$) because of a smaller (resp. larger) inter-layer dielectric thickness.

TABLE I: Typical BEOL corners with skewed parameters.

| Corner     | $\Delta W_m$ | $\Delta T_m$ | $\Delta H_m$ |
|------------|--------------|--------------|--------------|
| $Y_{typ}$  | typical      | typical      | typical      |
| $Y_{cb}$   | minimum      | minimum      | maximum      |
| $Y_{cw}$   | maximum      | maximum      | minimum      |
| $Y_{rcb}$  | maximum      | maximum      | maximum      |
| $Y_{rcw}$  | minimum      | minimum      | minimum      |

Fig. 3: Illustration of the cross-section of a typical metal stack.

#### B. Tightened BEOL Corners

We denote a tightened BEOL corner by $Y_{ref\_\alpha}$, where $\alpha$ is a scaling factor and $Y_{ref}$ is a CBC, i.e., $Y_{ref} \in \{Y_{cb}, Y_{cw}, Y_{rcb}, Y_{rcw}\}$. We define $\Delta W_m$, $\Delta T_m$ and $\Delta H_m$ of $Y_{ref\_\alpha}$ as

$$\begin{aligned}
\Delta W_m \text{ of } Y_{ref\_\alpha} &= \alpha \cdot \Delta W_m \text{ of corner } Y_{ref} \\
\Delta T_m \text{ of } Y_{ref\_\alpha} &= \alpha \cdot \Delta T_m \text{ of corner } Y_{ref} \\
\Delta H_m \text{ of } Y_{ref\_\alpha} &= \alpha \cdot \Delta H_m \text{ of corner } Y_{ref}
\end{aligned} \tag{1}$$
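Because the same $\alpha$ multiplies every parameter of every layer, a TBC can be generated mechanically from a CBC description. The sketch below assumes a hypothetical representation of a corner as a dictionary mapping layer names to $(\Delta W_m, \Delta T_m, \Delta H_m)$ skews in standard-deviation units; it is not the itf or ict file format, and the skew values are placeholders.

```python
from typing import Dict, Tuple

# (dW_m, dT_m, dH_m) for one layer, in units of each parameter's standard
# deviation. Layer names and values here are hypothetical placeholders.
LayerSkew = Tuple[float, float, float]


def tightened_corner(ref: Dict[str, LayerSkew], alpha: float) -> Dict[str, LayerSkew]:
    """Build Y_ref_alpha from a reference CBC Y_ref per Equation (1):
    every dW_m, dT_m and dH_m is multiplied by the same factor alpha."""
    return {layer: (alpha * dw, alpha * dt, alpha * dh)
            for layer, (dw, dt, dh) in ref.items()}


# Hypothetical RC-worst reference: all parameters at -3 sigma on all nine
# layers (cf. footnote 4), then tightened with alpha = 0.6.
y_rcw = {f"M{m}": (-3.0, -3.0, -3.0) for m in range(1, 10)}
y_rcw_06 = tightened_corner(y_rcw, 0.6)
print(y_rcw_06["M2"])  # each entry is approximately -1.8
```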

#### C. Statistical BEOL Variation

For an interconnect stack with $N_{layer}$ layers, there are $3N_{layer}$ variation sources. We model each of these variation sources as a Gaussian random variable $z_v$ ($v = 1, 2, \ldots, 3N_{layer}$). The correlations among the random variables are defined by a correlation matrix ($\Sigma$). Since BEOL parameters are correlated if they are fabricated using the same *process module* [13], we model the correlation between two variation sources as follows.

$$\Sigma_{u,v} = \begin{cases} 1 & \text{if } u = v \\ \gamma & \text{if both } z_u \text{ and } z_v \text{ are } \Delta W,\ \Delta T \text{ or } \Delta H \text{ of different BEOL layers, and the layers are in the same process module} \\ 0 & \text{otherwise} \end{cases} \tag{2}$$

where $\Sigma_{u,v}$ is the entry in the $u^{th}$ row and $v^{th}$ column of $\Sigma$, and $\gamma$ is the correlation between $z_u$ and $z_v$. Due to the lack of actual manufacturing data, we assume that $\gamma$ is the same for all correlated pairs of variation sources. In our experiments, we study two scenarios, with $\gamma = 0.5$ [13] and $\gamma = 0.0$ (i.e., all variation sources are independent). Unless otherwise specified, the following statistical analyses use $\gamma = 0.0$. For the nine-layer interconnect stack in our experiment, there are three process modules:

- Layers M1, M2 and M3 $\in$ process module 1
- Layers M4, M5, M6 and M7 $\in$ process module 2
- Layers M8 and M9 $\in$ process module 3

<sup>3</sup>Spacing variation is implicitly defined by $\Delta W_m$.
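Equation (2) and the module assignment above are enough to build $\Sigma$ for the nine-layer stack. A minimal sketch, assuming the 27 sources are ordered layer by layer as $(\Delta W, \Delta T, \Delta H)$, and reading Equation (2) as correlating only sources of the *same* parameter type (e.g., $\Delta W$ of M1 with $\Delta W$ of M2); both the ordering and that reading are our assumptions.

```python
from typing import Dict, List, Tuple

# Process-module assignment of the nine-layer stack listed above.
MODULE: Dict[int, int] = {1: 1, 2: 1, 3: 1, 4: 2, 5: 2, 6: 2, 7: 2, 8: 3, 9: 3}
PARAMS = ("W", "T", "H")


def variation_sources(n_layers: int = 9) -> List[Tuple[int, str]]:
    """Order the 3*N_layer sources z_v as (layer, parameter) pairs."""
    return [(m, p) for m in range(1, n_layers + 1) for p in PARAMS]


def correlation_matrix(gamma: float, n_layers: int = 9,
                       module: Dict[int, int] = MODULE) -> List[List[float]]:
    """Build Sigma following Equation (2).

    Assumption: two sources are correlated (gamma) only when they are the
    same parameter type on different layers of the same process module.
    """
    src = variation_sources(n_layers)
    n = len(src)
    sigma = [[0.0] * n for _ in range(n)]
    for u, (mu, pu) in enumerate(src):
        for v, (mv, pv) in enumerate(src):
            if u == v:
                sigma[u][v] = 1.0
            elif pu == pv and mu != mv and module[mu] == module[mv]:
                sigma[u][v] = gamma
    return sigma


sigma = correlation_matrix(0.5)
print(len(sigma))  # 27 variation sources for nine layers
```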

We define $Y_v$ as the BEOL corner in which only the $v^{th}$ variation source is perturbed by one standard deviation from the typical condition.<sup>4</sup> We extract the delay sensitivity of the $j^{th}$ path ($p_j$) to the $v^{th}$ variation source ($\Delta d_{j,v}$) by using the finite-difference method [3].<sup>5</sup>

$$\Delta d_{j,\nu} = d_j(Y_\nu) - d_j(Y_{typ}) \tag{3}$$

where  $Y_{typ}$  is the typical BEOL corner.  $d_j(Y_v)$  and  $d_j(Y_{typ})$  are, respectively, the delay of  $p_j$  at  $Y_v$  and  $Y_{typ}$ . Note that the layout-induced RC variation is accounted for in the RC extraction. The BEOL-induced delay variation for  $p_j$  ( $\sigma_{path_j}$ ) is given by the following equation.

$$\sigma_{path_j} = \sqrt{\sum_{\nu=1}^{3N_{layer}} (\Delta d'_{j,\nu})^2}, \quad \text{where } [\Delta d'_{j,1}, \ldots, \Delta d'_{j,3N_{layer}}] = [\Delta d_{j,1}, \ldots, \Delta d_{j,3N_{layer}}] \cdot \lambda \text{ and } \lambda \cdot \lambda^T = \Sigma \tag{4}$$

We decompose  $\Sigma$  to obtain  $\lambda$  by using the *Cholesky decomposition* method.  $\lambda$  is a lower triangular matrix and  $\lambda^T$  is the transpose of  $\lambda$ .
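Equations (3) and (4) can be sketched in plain Python as follows. The hand-written Cholesky routine assumes $\Sigma$ is symmetric positive definite (which holds for the block structure of Equation (2) with $0 \le \gamma < 1$); the function names and example delays are hypothetical. Because $\lambda \lambda^T = \Sigma$, the result equals $\sqrt{\Delta d_j \, \Sigma \, \Delta d_j^T}$, which the second example checks by hand: $9 + 16 + 2 \cdot 0.5 \cdot 3 \cdot 4 = 37$.

```python
import math
from typing import List

Matrix = List[List[float]]


def cholesky(a: Matrix) -> Matrix:
    """Lower-triangular lam with lam * lam^T = a (a symmetric positive definite)."""
    n = len(a)
    lam = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(lam[i][k] * lam[j][k] for k in range(j))
            if i == j:
                lam[i][i] = math.sqrt(a[i][i] - s)
            else:
                lam[i][j] = (a[i][j] - s) / lam[j][j]
    return lam


def path_sigma(d_typ: float, d_perturbed: List[float], sigma: Matrix) -> float:
    """sigma_path_j from Equations (3) and (4).

    d_typ:       delay of p_j at Y_typ.
    d_perturbed: d_perturbed[v] is the delay of p_j at Y_v (source v at +1 sigma).
    sigma:       correlation matrix Sigma of the variation sources.
    """
    dd = [d_v - d_typ for d_v in d_perturbed]          # Equation (3)
    lam = cholesky(sigma)
    n = len(dd)
    # Row vector times lower-triangular lam: only rows r >= c feed column c.
    dd_prime = [sum(dd[r] * lam[r][c] for r in range(c, n)) for c in range(n)]
    return math.sqrt(sum(x * x for x in dd_prime))     # Equation (4)


# Two hypothetical sources with sensitivities of 3 ps and 4 ps.
print(path_sigma(100.0, [103.0, 104.0], [[1.0, 0.0], [0.0, 1.0]]))  # 5.0
print(path_sigma(100.0, [103.0, 104.0], [[1.0, 0.5], [0.5, 1.0]]))  # sqrt(37), about 6.08
```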

Note that the delay variation is also affected by the drive strength of standard cells, which has within-die random variation [18]. Therefore, the delay variations of different nets on the same metal layer may not be fully correlated. Since our variation model assumes that the delay variation on a single metal layer is fully correlated, we may underestimate the effect of averaging random variations.

# III. PESSIMISM IN CONVENTIONAL BEOL CORNERS

Unlike hold-time violations, which can be fixed by buffer insertion, setup timing-critical paths have become very challenging to fix at CBCs due to increased wire resistance and BEOL variation. For example, increasing the drive strengths of standard cells along a setup timing-critical path is a typical approach to fix a setup-time violation. However, when the path is dominated by wire delay (e.g., a path with relatively long wires), increasing the drive strengths of cells can reduce only a fraction of the path delay, which may be insufficient to fix the setup timing violation. This problem is even more critical at high-$V_{dd}$ and/or high-temperature operating conditions, in which the impact of wire delay variation is more significant. In the following discussion, we focus only on reducing the pessimism of CBCs on the data paths of setup timing-critical paths.<sup>6</sup>

We define $\Delta d_j(Y)$ as the difference between the delays of $p_j$ at corners $Y$ and $Y_{typ}$, i.e., $\Delta d_j(Y) = d_j(Y) - d_j(Y_{typ})$. We consider $p_j$ "safe" if the path is signed off at a corner $Y$ for which $\Delta d_j(Y)$ is at least $3\sigma_{path_j}$:

$$\exists Y, \Delta d_j(Y) \ge 3\sigma_{path_j} \tag{5}$$

<sup>4</sup>We assume that the  $\Delta W_m$ ,  $\Delta H_m$  and  $\Delta T_m$  in the  $Y_{rcb}$  and  $Y_{rcw}$  corners correspond to +3 and -3 standard deviations, respectively.

Our goal is to find the tightened BEOL corners such that the design signed off using these corners will meet the safe condition in Equation (5). Meanwhile, the corners should not be overly pessimistic, i.e., the difference between  $\Delta d_j(Y)$  and  $3\sigma_{path_j}$  should be minimized.

#### A. Analysis

When BEOL variations are small, path delay variations can be approximated as a linear function of BEOL variations [1]. Based on this assumption and the definition of the TBC in Equation (1),

$$\Delta d_j(Y_\alpha) = \alpha \cdot \Delta d_j(Y) \tag{6}$$

where $\Delta d_j(Y_\alpha)$ is the delay variation of $p_j$ at a given TBC $Y_\alpha$. To satisfy the safe condition at $Y_\alpha$, the smallest scaling factor for $p_j$, denoted $\alpha_j(Y)$, is given by

$$\alpha_j(Y) = \frac{3\sigma_{path_j}}{\Delta d_j(Y)} \tag{7}$$



Fig. 4:  $\alpha_j$  versus  $\Delta d_j$  for critical paths obtained from the *NETCARD* benchmark circuit.

Figure 4 shows the scaling factors of a set of critical paths for $Y_{cw}$ and $Y_{rcw}$. The figure shows that $\alpha_j(Y)$ is small when $\Delta d_j(Y)$ is large, but increases rapidly as $\Delta d_j(Y)$ approaches zero. Also, for some paths $\Delta d_j(Y_{cw})$ (resp. $\Delta d_j(Y_{rcw})$) becomes negative. This happens because the $Y_{cw}$ (resp. $Y_{rcw}$) corner has smaller parasitic resistance (resp. capacitance), and these paths are more sensitive to changes in resistance (resp. capacitance). The results also imply that we need to sign off at both the $Y_{cw}$ and $Y_{rcw}$ corners to capture the impact of interconnect variation. When we analyze both corners, the paths that have a smaller $\Delta d_j(Y_{cw})$ have a larger $\Delta d_j(Y_{rcw})$, and vice versa. Thus, we should consider only the $\alpha_j$ at the *dominant corner*, i.e., the corner with the larger $\Delta d_j(Y)$. The actual scaling factor ($\alpha_j^{act}$) is defined as

$$\alpha_j^{act} = \frac{3\sigma_{path_j}}{\max\left(\Delta d_j(Y_{cw}),\ \Delta d_j(Y_{rcw})\right)} \tag{8}$$
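Equations (7) and (8) reduce to a pair of one-line functions. A minimal sketch with hypothetical names; the guard for a path whose delay does not increase at either corner is our own addition, since Equation (8) is undefined in that case.

```python
def alpha_j(sigma_path: float, dd_corner: float) -> float:
    """Equation (7): smallest alpha with alpha * dd_corner >= 3 * sigma_path."""
    return 3.0 * sigma_path / dd_corner


def alpha_act(sigma_path: float, dd_cw: float, dd_rcw: float) -> float:
    """Equation (8): evaluate alpha_j only at the dominant corner."""
    dominant = max(dd_cw, dd_rcw)
    if dominant <= 0.0:
        # Added guard (not in the paper): no corner increases the delay.
        raise ValueError("no dominant corner with a positive delay shift")
    return alpha_j(sigma_path, dominant)


# Hypothetical path (ps): sigma_path = 2, shift at Y_cw = 20, at Y_rcw = -5.
print(alpha_act(2.0, 20.0, -5.0))  # 0.3
```

Taking the maximum of the two shifts discards the non-dominant corner, where a negative or near-zero $\Delta d_j(Y)$ would otherwise produce a meaningless or exploding $\alpha_j$.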

To understand the trends in Figure 4, we analyze the relationship between $\sigma_{path_j}$ and $\Delta d_j(Y)$. Figure 5 shows that there is a strong correlation between $3\sigma_{path_j}$ and $\Delta d_j(Y)$. Moreover, most of the paths have an $\alpha_j^{act}$ smaller than 0.5. The small $\alpha_j^{act}$ is due to the averaging of uncorrelated variations when the wires along the paths are routed on many metal layers.

Figure 6 shows the relationships between $\alpha_j^{act}$, $\Delta d_j(Y_{cw})$ and $\Delta d_j(Y_{rcw})$. Each circle in the figure represents a path: its x- and y-coordinates indicate its (normalized) $\Delta d_j(Y_{cw})$ and $\Delta d_j(Y_{rcw})$, respectively, and its color indicates the magnitude of $\alpha_j^{act}$. From the figure, we can see that the paths with a large $\alpha_j^{act}$ have small $\Delta d_j(Y_{cw})$ and $\Delta d_j(Y_{rcw})$; e.g., both $\Delta d_j(Y_{cw})$ and $\Delta d_j(Y_{rcw})$ are smaller than 0.03 when $\alpha_j^{act}$ is larger than 0.5.

Our analysis shows that the paths with a large $\alpha_j^{act}$ have similar delay sensitivities to R and C. Since a CBC is biased such that R and C change in opposite directions (with respect to $Y_{typ}$), the total delay variation at a CBC is very small for paths with similar delay sensitivities to R and C. In other words, the delay variations due to R and C cancel out. Note that this cancellation effect is an artifact of CBCs, which does not exist in statistical RC analysis. Thus, for this kind of path, $3\sigma_{path_j}$ is larger than the delay variation at a CBC (i.e., $\alpha_j^{act}$ is large).

<sup>5</sup>We assume that the path delay varies linearly with the variation sources [1].

<sup>6</sup>Our signoff methodology is not applicable to hold-critical paths because there is little averaging effect in short data paths. Also, the pessimism of CBCs is not significant for the clock network, which is typically implemented on a few BEOL layers.

Fig. 5: $3\sigma_{path_j}$ versus $\Delta d_j(Y)$.

Fig. 6: $\alpha_j^{act}$ versus $\Delta d_j$ at $Y_{cw}$ and $Y_{rcw}$ corners.
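The cancellation can be illustrated with a deliberately simplified two-parameter model. This is only a toy, not the 27-source model of Section II: it lumps all BEOL variation into one resistance sensitivity and one capacitance sensitivity (in ps per standard deviation) and, unlike real wires, where R and C are coupled through $W$ and $T$, treats R and C as independent. The sensitivities are hypothetical.

```python
import math


def cbc_delay_shift(s_r: float, s_c: float, k: float = 3.0) -> float:
    """Toy CBC: R moves k sigma down while C moves k sigma up, so the
    R and C contributions enter with opposite signs."""
    return -k * s_r + k * s_c


def statistical_3sigma(s_r: float, s_c: float) -> float:
    """3-sigma delay variation when R and C vary independently."""
    return 3.0 * math.sqrt(s_r ** 2 + s_c ** 2)


# Hypothetical path with equal delay sensitivity (ps per sigma) to R and C.
s_r, s_c = 2.0, 2.0
print(cbc_delay_shift(s_r, s_c))     # 0.0: the R and C effects cancel
print(statistical_3sigma(s_r, s_c))  # about 8.49 ps
```

With equal sensitivities the corner reports no delay shift at all, while the statistical estimate remains about 8.5 ps, so $\alpha_j^{act}$ for such a path is unbounded.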

Since the  $\alpha_j^{act}$  is mainly affected by  $\Delta d_j(Y_{cw})$  or  $\Delta d_j(Y_{rcw})$ , we propose to classify the critical paths based on their  $\Delta d_j(Y)$ .

$$p_j \in \begin{cases} G_{TBC} & \text{if } \left[ (\Delta d_j(Y_{rcw}) > A_{rcw}) \text{ or } (\Delta d_j(Y_{cw}) > A_{cw}) \right] \\ G_{CBC} & \text{otherwise} \end{cases} \tag{9}$$

$G_{CBC}$ and $G_{TBC}$ are, respectively, the sets of paths to be signed off using CBCs and TBCs. $A_{rcw}$ and $A_{cw}$ are, respectively, the thresholds for $\Delta d_j(Y_{rcw})$ and $\Delta d_j(Y_{cw})$ that determine whether a path is in $G_{TBC}$ or $G_{CBC}$.
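The classification rule in Equation (9) is a simple threshold test. In the sketch below, the function name is hypothetical, and we assume the delay shifts are expressed in the same normalized percentage units as the thresholds listed in Table III (the TBC-0.6, $\gamma = 0.0$ row is used as an example).

```python
def classify_path(dd_rcw: float, dd_cw: float, a_rcw: float, a_cw: float) -> str:
    """Equation (9): a path goes to G_TBC if either corner shift exceeds its
    threshold; otherwise it is signed off with the CBCs (G_CBC)."""
    if dd_rcw > a_rcw or dd_cw > a_cw:
        return "G_TBC"
    return "G_CBC"


# Thresholds of the TBC-0.6 configuration with gamma = 0.0 (Table III),
# assuming the delay shifts use the same normalized percentage units.
A_CW, A_RCW = 3.2, 3.0
print(classify_path(dd_rcw=5.1, dd_cw=0.4, a_rcw=A_RCW, a_cw=A_CW))  # G_TBC
print(classify_path(dd_rcw=1.5, dd_cw=2.0, a_rcw=A_RCW, a_cw=A_CW))  # G_CBC
```

A path that sits exactly on a threshold stays in $G_{CBC}$ because Equation (9) uses strict inequalities.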

#### B. Proposed Method

Figure 7 describes our signoff methodology. Given a routed design, we first analyze the data paths at $Y_{cw}$, $Y_{rcw}$ and $Y_{typ}$ to classify the setup timing-critical paths into $G_{TBC}$ or $G_{CBC}$. The paths in $G_{TBC}$ (resp. $G_{CBC}$) are analyzed using TBCs (resp. CBCs). If there are timing violations, the paths are fixed through a path-based ECO at the corresponding BEOL corners. The design is closed when there are no paths with timing violations in either $G_{TBC}$ or $G_{CBC}$.
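The timing check inside one iteration of this flow can be sketched as follows. This is a simplified illustration, not the PrimeTime-based flow itself: it ignores clock skew, setup time and the ECO step, uses a bare "period minus delay" slack, and approximates the delay at $Y_{cw\_\alpha}$ and $Y_{rcw\_\alpha}$ with the linear model of Equation (6) instead of re-extracting and re-running STA. All path data and names are hypothetical.

```python
from typing import List, Tuple

# (name, delay at Y_typ, shift at Y_cw, shift at Y_rcw, group), delays in ns.
Path = Tuple[str, float, float, float, str]


def signoff_delay(d_typ: float, dd_cw: float, dd_rcw: float,
                  group: str, alpha: float) -> float:
    """Worst delay over the two corners. G_TBC paths use Y_cw_alpha and
    Y_rcw_alpha, approximated as alpha times the CBC shift (Equation (6))."""
    scale = alpha if group == "G_TBC" else 1.0
    return d_typ + max(scale * dd_cw, scale * dd_rcw)


def violating_paths(paths: List[Path], period: float, alpha: float) -> List[str]:
    """Names of paths whose simplified setup slack (period - delay) is negative."""
    return [name for name, d_typ, dd_cw, dd_rcw, group in paths
            if period - signoff_delay(d_typ, dd_cw, dd_rcw, group, alpha) < 0.0]


paths = [
    ("p1", 1.80, 0.30, 0.10, "G_TBC"),  # 1.80 + 0.5 * 0.30 = 1.95 ns
    ("p2", 1.90, 0.05, 0.08, "G_CBC"),  # 1.90 + 0.08 = 1.98 ns
    ("p3", 1.95, 0.02, 0.10, "G_CBC"),  # 1.95 + 0.10 = 2.05 ns
]
print(violating_paths(paths, period=2.0, alpha=0.5))  # ['p3']
```

With $\alpha = 1.0$ (i.e., every path signed off at the CBCs), p1 would also fail; the remaining violator p3 would go to the ECO step.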

Based on our experimental results (see Section IV), we observe that the critical paths of designs implemented using the same technology and design flow have similar structures. Therefore, we propose to extract the values of $A_{cw}$ and $A_{rcw}$ from a set of representative critical paths and use them for other designs implemented using the same technology and design flow. With this approach, we only need to perform the costly statistical analysis to characterize $A_{cw}$ and $A_{rcw}$ when there is a major change in the technology or design flow.

Fig. 7: Proposed signoff flow.

Given a set of representative critical paths, as well as their corresponding timing constraints and operating conditions, the problem is to select $A_{cw}$, $A_{rcw}$ and the TBCs that minimize the pessimism in CBCs while satisfying the safe condition in Equation (5). To solve this problem, we perform a statistical analysis and extract the *optimal scaling factors* ($\alpha^{opt}(Y_{rcw})$ and $\alpha^{opt}(Y_{cw})$) for different $A_{cw}$ and $A_{rcw}$.<sup>7</sup>

$$\begin{aligned}
\alpha^{opt}(Y_{rcw}) &= \max_{j} \alpha_j^{act}(Y_{rcw}), \quad \text{subject to } \Delta d_j(Y_{rcw}) > A_{rcw} \\
\alpha^{opt}(Y_{cw}) &= \max_{j} \alpha_j^{act}(Y_{cw}), \quad \text{subject to } \Delta d_j(Y_{cw}) > A_{cw}
\end{aligned} \tag{10}$$
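A sketch of Equation (10) and of the threshold sweep behind Figure 8 is shown below. Two readings are our assumptions: $\alpha_j^{act}(Y)$ in Equation (10) is taken to be the $\alpha_j^{act}$ of Equation (8) for the paths passing the threshold at corner $Y$, and a configuration such as those in Table III is taken to pick the smallest candidate threshold whose $\alpha^{opt}$ does not exceed the target. The path data are hypothetical, and the 1% safety margin used in Section IV is not modeled.

```python
from typing import Dict, List, Optional

# Each representative path: sigma_path and delay shifts at Y_cw and Y_rcw.
PathStats = Dict[str, float]


def path_alpha_act(p: PathStats) -> float:
    """Equation (8)."""
    return 3.0 * p["sigma"] / max(p["dd_cw"], p["dd_rcw"])


def alpha_opt(paths: List[PathStats], corner: str, threshold: float) -> Optional[float]:
    """Equation (10): max alpha_act over paths whose shift at `corner`
    ('cw' or 'rcw') exceeds `threshold`; None if no path qualifies."""
    vals = [path_alpha_act(p) for p in paths if p["dd_" + corner] > threshold]
    return max(vals) if vals else None


def smallest_threshold(paths: List[PathStats], corner: str, target: float,
                       candidates: List[float]) -> Optional[float]:
    """Smallest candidate threshold whose alpha_opt does not exceed `target`,
    i.e., the one that keeps G_TBC as large as possible (cf. Fig. 8)."""
    for a in sorted(candidates):
        ao = alpha_opt(paths, corner, a)
        if ao is None or ao <= target:  # None: no path would use the TBC
            return a
    return None


reps = [
    {"sigma": 1.0, "dd_cw": 10.0, "dd_rcw": 2.0},  # alpha_act = 0.3
    {"sigma": 1.0, "dd_cw": 4.0, "dd_rcw": 3.0},   # alpha_act = 0.75
    {"sigma": 1.0, "dd_cw": 2.0, "dd_rcw": 8.0},   # alpha_act = 0.375
]
print(smallest_threshold(reps, "cw", target=0.5, candidates=[0.0, 3.0, 5.0, 8.0]))  # 5.0
```

Raising the threshold excludes the paths with large $\alpha_j^{act}$, which lowers $\alpha^{opt}$ but also shrinks $G_{TBC}$; this is the tradeoff plotted in Figure 8.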

Figure 8 shows that as  $\alpha^{opt}(Y_{rcw})$  (resp.  $\alpha^{opt}(Y_{cw})$ ) reduces, the  $A_{rcw}$  (resp.  $A_{cw}$ ) increases but the  $|G_{TBC}|$  reduces. In other words, as we tighten a BEOL corner, the number of paths which can be signed off using the TBC reduces.



Fig. 8: Tradeoff between  $A_{rcw,cw}$  and  $|G_{TBC}|$  with  $\gamma = 0.0$ .

# IV. EXPERIMENTAL RESULTS

We use three designs from the ISPD contests [16] [21] and OpenCores [24] as the testcases in our experiments. The designs are placed and routed with a triple-$V_{th}$ 45*nm* foundry library using *Synopsys IC Compiler* [26]. To emulate the highly resistive BEOL in advanced technologies, we scale the resistivity in the BEOL model file by 8×. For timing signoff, we use *Synopsys PrimeTime* [27]. The PVT condition for setup timing analysis is *SS*, 0.90V and 125°C. We use $Y_{cw}$ and $Y_{rcw}$ during the implementations. The key design parameters of the implemented testcases are listed in Table II.

<sup>7</sup>The $\alpha^{opt}(Y_{cw})$ (resp. $\alpha^{opt}(Y_{rcw})$) is optimal for a given set of representative critical paths, along with a threshold value $A_{cw}$ (resp. $A_{rcw}$).

|                     | LEON3MP | NETCARD | SUPERBLUE12 |
|---------------------|---------|---------|-------------|
| Clock period (ns)   | 1.80    | 2.00    | 3.10        |
| Gate count          | 232K    | 575K    | 1031K       |
| Utilization (%)     | 84      | 79      | 82          |
| Core area $(mm^2)$  | 0.45    | 1.04    | 1.91        |
| Max Transition (ns) | 0.33    | 0.33    | 0.33        |

TABLE II: Physical implementation results of testcases.

TABLE III: Configurations for TBC-based signoff.

| Configuration | $\alpha^{opt}$ | $A_{cw}$ (%), $\gamma = 0.0$ | $A_{rcw}$ (%), $\gamma = 0.0$ | $A_{cw}$ (%), $\gamma = 0.5$ | $A_{rcw}$ (%), $\gamma = 0.5$ |
|---------------|----------------|------------------------------|-------------------------------|------------------------------|-------------------------------|
| TBC-0.5       | 0.5            | 3.6                          | 4.5                           | 4.3                          | 7.3                           |
| TBC-0.6       | 0.6            | 3.2                          | 3.0                           | 3.3                          | 5.0                           |
| TBC-0.7       | 0.7            | 2.9                          | 2.9                           | 3.0                          | 3.4                           |

#### A. Experiment Setup

After placement and routing, we fix the timing violations in the designs by using the *fix\_eco* commands in *Synopsys PrimeTime* [27] until no further improvement is obtained. Then we extract 1000 setup timing-critical paths at $Y_{cw}$ and $Y_{rcw}$, separately. To emulate our signoff methodology, we filter the extracted paths based on the classification in Equation (9) to obtain $G_{TBC}$. For our signoff methodology, the paths in $G_{TBC}$ are analyzed using $Y_{cw\_\alpha}$ and $Y_{rcw\_\alpha}$, while the paths in $G_{CBC}$ are analyzed using $Y_{cw}$ and $Y_{rcw}$. In our experiments, we set $\alpha^{opt}(Y_{rcw})$ equal to $\alpha^{opt}(Y_{cw})$.<sup>8</sup> The $A_{rcw}$ and $A_{cw}$ values for different $\alpha^{opt}$ and statistical BEOL models are listed in Table III. To collect the representative timing-critical paths, we implement another NETCARD benchmark circuit with a clock period of 2.3ns and extract the top 10000 paths at $Y_{rcw}$ and $Y_{cw}$. Note that these critical paths are different from those of the NETCARD testcase described in Table II. Since the representative timing-critical paths can differ from those of the actual testcases, we increase the values of $A_{rcw}$ and $A_{cw}$ by 1% to account for the sampling error in the construction of the representative paths.

#### B. Results

Figure 9 shows that $\alpha^{act}$ values are large when $\Delta d_j(Y_{rcw})$ or $\Delta d_j(Y_{cw})$ values are small. This validates our assumption that different testcases have similar trends (i.e., $\alpha^{act}$ versus $\Delta d_j(Y_{rcw})$ and $\Delta d_j(Y_{cw})$), even though the testcases have different clock periods, gate counts and core areas. Note that we repeated the experiments for only three netlists. It is possible that other netlists show trends different from those in Figure 9.

Table IV shows the timing analysis results with  $\gamma = 0.0$ . By using our methods (TBC-0.5, TBC-0.6 and TBC-0.7), we can improve the WNS by 46*ps* to 125*ps* and TNS by up to 68*ns*. Meanwhile, the total number of paths with timing violations is reduced by 42% to 100%.

Table V shows the results of a similar experiment with $\gamma = 0.5$. For all testcases, $|G_{TBC}|$ is smaller than in Table IV, where $\gamma = 0.0$. This is because $A_{cw}$ and $A_{rcw}$ are larger for the same $\alpha$ when there are stronger correlations among variation sources. For the *LEON3MP* testcase, $|G_{TBC}|$ is zero for the TBC-0.5 configuration; thus, the TBC-0.5 configuration gives no improvement over the CBC approach. Meanwhile, the results in Table V show that by using TBC-0.6 and TBC-0.7, we can still improve WNS by up to 101ps and TNS by up to 53ns; the total number of paths with timing violations is also reduced by 10% to 100%.

The *delay estimation error* in Tables IV and V is defined as $\Delta d_j(Y) - 3\sigma_{path_j}$. Since the delay estimation errors of all TBC configurations in the tables are positive, no TBC case underestimates the delay variation.
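The sign check on the delay estimation error can be sketched as below. As in Equation (6), we assume the shift at a TBC is $\alpha$ times the shift at its reference CBC, so $\alpha = 1$ recovers the CBC case; the function name and example values are hypothetical.

```python
def estimation_error(sigma_path: float, dd_ref: float, alpha: float = 1.0) -> float:
    """Delta d_j(Y) - 3 * sigma_path_j, where Delta d_j(Y) at a TBC is
    approximated as alpha * Delta d_j(Y_ref) (Equation (6)); alpha = 1.0
    gives the CBC case. A negative value means the corner underestimates
    the delay variation, i.e., the safe condition (5) is violated."""
    return alpha * dd_ref - 3.0 * sigma_path


# Hypothetical path (ps): sigma_path = 2, shift at the dominant CBC = 15.
print(estimation_error(2.0, 15.0, alpha=0.5))  # 1.5: still safe at the TBC
print(estimation_error(2.0, 15.0, alpha=0.3))  # negative: unsafe
```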

To fix the remaining paths with timing violations, we have several options. First, we can upsize standard cells along critical paths to reduce path delay. Second, if the wire delay is large, we can insert buffers to break long wires into shorter ones so as to reduce wire delay. Note that both approaches will change $\Delta d_j(Y_{rcw})$ or $\Delta d_j(Y_{cw})$. If $\Delta d_j(Y_{rcw})$ or $\Delta d_j(Y_{cw})$ becomes larger than the corresponding $A_{rcw}$ or $A_{cw}$, we can use a TBC, which will reduce the delay variation and improve WNS. Alternatively, we can intentionally route the wires over multiple layers during the physical implementation stages so as to create critical paths that have less BEOL variation, as already discussed in [17] [19].

# V. CONCLUSIONS

Due to highly resistive BEOL layers in advanced technology nodes, signoff using conventional BEOL corners (CBCs) results in longer chip implementation schedules and poorer design quality. We propose a method to reduce the pessimism in CBCs by using tightened BEOL corners (TBCs). Our method is based on the observation that most timing-critical paths use different BEOL layers. When the variations of BEOL layers are not fully correlated, the BEOL-induced timing variation is much smaller due to averaging of random variations.

Further, our analysis shows that by extracting the delay sensitivities of the critical paths to the RC-worst and C-worst BEOL corners, we can identify the paths which can use TBC for signoff without underestimating the delay variation (compared to a statistical analysis). The advantage of our method is that the TBC can be precharacterized and calibrated with statistical analysis when there is a major change in the technology node or design flow. Our experimental results show that our method which uses tightened BEOL corners on selected paths can reduce the number of paths with timing violations by up to 100% and improve the WNS and TNS by up to 101ps and 53ns, respectively.

We observe that when $\alpha_j^{act}$ is large, the delay variations at $Y_{cw}$ and $Y_{rcw}$ are small. Thus, it may be possible to cover all critical paths by using $Y_{typ}$ with a small derating factor on wire delay. In other words, the design can be implemented and signed off by using $Y_{cw\_\alpha}$, $Y_{rcw\_\alpha}$ and $Y_{typ}$ (with a derating factor). We expect that this approach will further reduce the pessimism in BEOL corners because the design is not implemented at CBCs.

# REFERENCES

- [1] K. Agarwal, M. Agarwal, D. Sylvester and D. Blaauw, "Statistical Interconnect Metrics for Physical-Design Optimization", *IEEE TCAD* 25(7) (2006), pp. 1273-1288.
- [2] W. Dai and H. Ji, "Timing Analysis Taking Into Account Interconnect Process Variation", *IEEE International Workshop on Statistical Methodology*, 2001, pp. 51-53.
- [3] Z. Feng, P. Li and Z. Ren, "SICE: Design-Dependent Statistical Interconnect Corner Extraction Under Inter/Intra-Die Variations", *IET Circuits, Devices and Systems* 3(5) (2009), pp. 248-258.
- [4] E. A. Foreman, P. A. Habitz, M.-C. Cheng and C. Visweswariah, "A Novel Method for Reducing Metal Variation with Statistical Static Timing Analysis", *IEEE TCAD* 31(8) (2012), pp. 1293-1297.

<sup>8</sup>It is possible that using different $\alpha^{opt}(Y_{rcw})$ and $\alpha^{opt}(Y_{cw})$ can improve the benefits of our signoff methodology.



Fig. 9: Factor $\alpha^{act}$ versus $\Delta d_j(Y)$ of critical paths of different testcases.

TABLE IV: Timing analysis results with $\gamma = 0.0$.

| Metric | LEON3MP CBC | LEON3MP TBC-0.5 | LEON3MP TBC-0.6 | LEON3MP TBC-0.7 | NETCARD CBC | NETCARD TBC-0.5 | NETCARD TBC-0.6 | NETCARD TBC-0.7 | SUPERBLUE12 CBC | SUPERBLUE12 TBC-0.5 | SUPERBLUE12 TBC-0.6 | SUPERBLUE12 TBC-0.7 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| WNS (ns)                               | -0.046  | 0.000   | 0.000   | -0.010  | -0.134  | -0.009  | -0.033  | -0.059  | -0.154      | -0.085  | -0.091  | -0.106  |
| TNS (ns)                               | -2.519  | 0.000   | 0.000   | -0.043  | -7.290  | -0.030  | -0.409  | -0.894  | -80.351     | -18.899 | -24.373 | -34.993 |
| #Timing violations                     | 170     | 0       | 0       | 12      | 246     | 10      | 19      | 19      | 1422        | 869     | 972     | 1206    |
| Delay estimation error (ns)            | 0.001   | 0.008   | 0.010   | 0.007   | 0.006   | 0.005   | 0.011   | 0.016   | -0.001      | 0.006   | 0.003   | 0.007   |
| $\lvert G_{TBC} \rvert$ / total number of paths (%) | 0.0 | 26.1 | 27.9 | 29.6 | 0.0 | 41.4 | 54.5 | 63.2 | 0.0 | 32.6 | 41.4 | 44.0 |

TABLE V: Timing analysis results with  $\gamma = 0.5$ .

| Metric | LEON3MP CBC | LEON3MP TBC-0.5 | LEON3MP TBC-0.6 | LEON3MP TBC-0.7 | NETCARD CBC | NETCARD TBC-0.5 | NETCARD TBC-0.6 | NETCARD TBC-0.7 | SUPERBLUE12 CBC | SUPERBLUE12 TBC-0.5 | SUPERBLUE12 TBC-0.6 | SUPERBLUE12 TBC-0.7 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| WNS (ns)                               | -0.046  | -0.046  | 0.000   | -0.010  | -0.134  | -0.134  | -0.033  | -0.059  | -0.154      | -0.146  | -0.091  | -0.106  |
| TNS (ns)                               | -2.519  | -2.519  | 0.000   | -0.043  | -7.290  | -1.986  | -0.434  | -0.894  | -80.351     | -60.186 | -27.039 | -36.337 |
| #Timing violations                     | 170     | 170     | 0       | 12      | 246     | 35      | 20      | 19      | 1422        | 1229    | 1078    | 1276    |
| Delay estimation error (ns)            | 0.001   | 0.000   | 0.011   | 0.010   | 0.005   | 0.004   | 0.011   | 0.019   | 0.000       | 0.002   | 0.006   | 0.002   |
| $\lvert G_{TBC} \rvert$ / total number of paths (%) | 0.0 | 0.0 | 25.4 | 28.6 | 0.0 | 25.4 | 47.2 | 56.7 | 0.0 | 9.7 | 32.3 | 37.8 |
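The improvement arithmetic behind Table V can be checked with a short Python sketch. It compares the CBC column with the TBC-0.6 column for each testcase; the values are copied from Table V, while the choice of the TBC-0.6 column and the helper name `improvements` are ours. The largest gains it yields (101 ps WNS on NETCARD, about 53.3 ns TNS on SUPERBLUE12, and all 170 violations removed on LEON3MP) coincide with the figures quoted in the abstract.

```python
# Table V (gamma = 0.5): CBC vs. TBC-0.6 columns, copied from the table.
TABLE_V = {
    # testcase: (WNS_cbc, WNS_tbc, TNS_cbc, TNS_tbc, viol_cbc, viol_tbc); WNS/TNS in ns
    "LEON3MP": (-0.046, 0.000, -2.519, 0.000, 170, 0),
    "NETCARD": (-0.134, -0.033, -7.290, -0.434, 246, 20),
    "SUPERBLUE12": (-0.154, -0.091, -80.351, -27.039, 1422, 1078),
}


def improvements(rows):
    """Per testcase: (WNS gain in ps, TNS gain in ns, % of violations removed)."""
    out = {}
    for name, (wns_c, wns_t, tns_c, tns_t, v_c, v_t) in rows.items():
        out[name] = (
            round((wns_t - wns_c) * 1000.0, 3),      # ns -> ps
            round(tns_t - tns_c, 3),                 # ns
            round(100.0 * (v_c - v_t) / v_c, 1),     # percent
        )
    return out


if __name__ == "__main__":
    res = improvements(TABLE_V)
    for name, (dw, dt, dv) in res.items():
        print(f"{name:12s} WNS +{dw:.0f} ps  TNS +{dt:.3f} ns  violations -{dv:.1f}%")
    print("max WNS gain (ps):", max(r[0] for r in res.values()))
    print("max TNS gain (ns):", max(r[1] for r in res.values()))
    print("max violation reduction (%):", max(r[2] for r in res.values()))
```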

- [5] T. Fukuoka, A. Tsuchiya and H. Onodera, "Worst-Case Delay Analysis Considering the Variability of Transistors and Interconnects", *Proc. ACM ISPD*, 2007, pp. 35-41.
- [6] F. Huebbers, A. Dasdan and Y. Ismail, "Multi-Layer Interconnect Performance Corners for Variation-Aware Timing Analysis", *Proc. IEEE/ACM ICCAD*, 2007, pp. 713-718.
- [7] K. Jeong, A. B. Kahng and K. Samadi, "Quantified Impacts of Guardband Reduction on Design Process Outcomes", *Proc. ISQED*, 2008, pp. 790-797.
- [8] A. Kurokawa, H. Masuda, J. Fujii, T. Inoshita, A. Kasebe, Z. Huang and Y. Inoue, "Determination of Interconnect Structural Parameters for Best- and Worst-Case Delays", *IEICE Transactions Fundamentals of Electronics* E89-A(4) (2006), pp. 856-864.
- [9] A. Kurokawa, T. Sato, T. Kanamoto and M. Hashimoto, "Interconnect Modeling: A Physical Design Perspective", *IEEE Transactions on Electron Devices* 56(9) (2009), pp. 1840-1851.
- [10] Y. Liu, S. R. Nassif, L. T. Pileggi and A. J. Strojwas, "Impact of Interconnect Variations on the Clock Skew of a Gigahertz Microprocessor", *Proc. ACM/IEEE DAC*, 2000, pp. 168-171.
- [11] N. Lu and J. McCullen, "Enablement of Variation Aware Timing: Treatment of Parasitic Resistance and Capacitance", *Proc. ISQED*, 2007, pp. 743-748.
- [12] J. Luo, S. Sinha, Q. Su, J. Kawa and C. Chiang, "An IC-Manufacturing Yield Model Considering Intra-Die Variation", *Proc. ACM/IEEE DAC*, 2006, pp. 749-754.
- [13] P. McGuinness, "Variations, Margins, and Statistics", *Proc. ACM ISPD*, 2008, pp. 60-67.
- [14] A. Mutlu, J. Le, R. Molina and M. Celik, "Parametric Analysis to Determine Accurate Interconnect Extraction Corners for Design Performance", *Proc. ISQED*, 2010, pp. 419-423.
- [15] S. R. Nassif, G.-J. Nam and S. Banerjee, "Wire Delay Variability in Nanoscale Technology and Its Impact on Physical Design", *Proc. ISQED*, pp. 591-596.

- [16] M. M. Ozdal, C. Amin, A. Ayupov, S. M. Burns, G. R. Wilke and C. Zhuo, "An Improved Benchmark Suite for the ISPD-2013 Discrete Cell Sizing Contest", *Proc. ISPD*, 2013, pp. 168-170. http://www.ispd.cc/contests/13/ispd2013\_contest.html
- [17] U. Padmanabhan, J. M. Wang and J. Hu, "Robust Clock Tree Routing in the Presence of Process Variations", *IEEE TCAD* 27(8) (2008), pp. 1385-1397.
- [18] L.-T. Pang and B. Nikolić, "Measurement and Analysis of Variability in 45nm Strained-Si CMOS Technology", *Proc. IEEE Custom Integrated Circuits Conference*, 2008, pp. 129-132.
- [19] A. Sharifi and M. Kandemir, "Process Variation-Aware Routing in NoC Based Multicores", *Proc. ACM/IEEE DAC*, 2011, pp. 924-929.
- [20] L. G. Silva, L. M. Silveira and J. R. Phillips, "Efficient Computation of the Worst-Delay Corner", *Proc. DATE*, 2007, pp. 1617-1622.
- [21] N. Viswanathan, C. J. Alpert, C. Sze, Z. Li, G.-J. Nam and J. A. Roy, "The ISPD-2011 Routability-Driven Placement Contest and Benchmark Suite", *Proc. ACM ISPD*, 2011, pp. 141-146. http://www.ispd.cc/contests/11/ispd2011\_contest.html
- [22] K. Yamada and N. Oda, "Statistical Corner Conditions of Interconnect Delay (Corner LPE Specifications)", *Proc. ASPDAC*, 2006, pp. 706-711.
- [23] Cadence ICT, http://www.cadence.com/Community/tags/ICT/default.aspx
- [24] OpenCores, http://opencores.org
- [25] SPICE, http://bwrcs.eecs.berkeley.edu/Classes/IcBook/SPICE/
- [26] Synopsys IC Compiler, http://www.synopsys.com/
- [27] Synopsys PrimeTime, http://www.synopsys.com/Tools/Implementation/SignOff/PrimeTime/Pages/default.aspx
- [28] Synopsys Interconnect Technology Format, http://www.synopsys.com/community/interoperability/pages/tapinitf.aspx