# Timing Analysis and Optimization Implications of Bimodal CD Distribution in Double Patterning Lithography

Kwangok Jeong<sup>†</sup> and Andrew B. Kahng<sup>†‡</sup>

<sup>†</sup>ECE and <sup>‡</sup>CSE Departments, University of California at San Diego kjeong@vlsicad.ucsd.edu, abk@cs.ucsd.edu

Abstract— Double patterning lithography (DPL) is in current production for memory products, and is widely viewed as inevitable for logic products at the 32nm node. DPL decomposes and prints the shapes of a critical-layer layout in two exposures. In traditional single-exposure lithography, adjacent identical layout features will have identical mean critical dimension (CD), and spatially correlated CD variations. However, with DPL, adjacent features can have distinct mean CDs, and uncorrelated CD variations. This introduces a new set of 'bimodal' challenges for timing analysis and optimization. We assess the potential impact of DPL on timing analysis error and guardbanding, and find that the traditional 'unimodal' characterization and analysis framework may not be viable for DPL. For example, using 45nm models, we find that different DPL mask layout solutions can cause 50ps skew in clock distribution that is unseen by traditional analyses. Different mask layouts can also result in 20% or more change in timing path delays. Such results lead to insights into physical design optimizations for clock and data path placement and mask coloring that can help mitigate the error and guardband costs of DPL.

#### I. INTRODUCTION

Double Patterning Lithography (DPL) [2] [3] allows 32nm half-pitch logic patterning using 193nm ArF lithography tools. DPL partitions a critical-layer layout into two mask layouts and exposures, each with relaxed critical pitch and spacing. DPL incurs a throughput overhead, and necessitates tight overlay control between the two exposures. However, there is likely no other solution for logic patterning at 32nm [1], and indeed DPL is already in production for leading-edge memory products.

In DPL, lines, or spaces between lines, are printed in two sequential processes. Thus, CDs of adjacent lines or spaces can have different mean and sigma values, i.e., a 'bimodal' CD distribution. The existence of two different CD populations removes the spatial correlation that enables pessimism reduction in on-chip variation-aware timing analysis as well as in statistical timing analysis.

Currently, three main technology options exist for DPL: double exposure, double patterning, and spacer double patterning. The mechanism leading to a bimodal CD distribution, and its overall significance, will differ according to the technology option. Each option can print trenches (space is critical) or lines (line is critical) according to resist types. We now briefly review the three DPL technology options for a specific resist type.

**Double exposure DPL.** Double exposure DPL with negative tone resist creates trenches at twice the resolution of normal lithography, using two successive exposure steps. Double exposure DPL prints spaces rather than target line shapes, and is hence called a *negative dual trench process* [4]. One edge of a target line is formed with the first exposure, and the other edge is generated with the second exposure, as shown in Figure 1. Both edges of two adjacent lines facing each other are formed at the same time. While an exposure dose variation can result in an edge placement error, both lines will be affected by the same amount, and CDs of adjacent lines remain identical, as shown in Figure 1(b). However, in the presence of misalignment, CDs of adjacent lines can differ by the amount of the misalignment error, as shown in Figure 1(c).

We note that while double exposure DPL entails a relatively simple process, the fact that CDs are determined by misalignment error reduces the technique's viability. This is because the roadmap for overlay control capability is significantly looser than the general CD control requirement (e.g., the 2007 ITRS specifies overlay tolerance at the 45nm node to be as large as 9nm [10]).



Fig. 1. Double exposure DPL.

**Double patterning DPL.** Double patterning DPL with positive tone resist creates lines at twice the normally achievable resolution, using a LELE (litho-etch-litho-etch) process. At the first etch step, the patterns of the first resist layer are transferred to an underlying hard mask. Photoresist is then coated onto the surface remaining after the first process, and exposed in the second exposure step. The flow finishes with the hard mask that prints one line and the resist of the second exposure that prints the other line. In double patterning DPL, the two edges of a line that are printed by the first etch and the second exposure, and the two edges of the adjacent line that are printed by the second exposure and etch process, can be different, as shown in Figure 2(a). While the first patterns are made on a perfectly flat wafer, the second resist is coated onto a topography that is a result of overetch of the first patterning step. The topography implies greater depth of focus (DOF) variation, so that the CDs between the first and the second patterns can differ. Plasma exposure of the first line during the second etch could additionally cause CD change [4]. For these reasons, CDs of critical features will have a bimodal distribution.

Unlike the double exposure DPL, double patterning DPL will have two different CD populations due to the CD control error, as shown in Figure 2(b). Since misalignment just shifts the line, without changing the linewidth, misalignment itself does not matter; this is illustrated in Figure 2(c).



Fig. 2. LELE double patterning DPL.

**Spacer double patterning DPL.** The third DPL technology option [8] [9] uses sacrificial spacer technology, as illustrated in Figure 3. Similar to LELE double patterning, spacer double patterning prints target lines instead of edges. Therefore, given a well-controlled spacer generation and etch process, the CD difference between adjacent lines can be maintained to be as small as the CD control capability, even in the presence of misalignment error.



Fig. 3. Sacrificial spacer double patterning DPL.

The 2007 ITRS [10] roadmap indicates that CD control capability is much better than overlay control capability (e.g., the CD control target for 45nm technology node is 1.9nm). Thus, we expect that printing trenches with negative tone resist will not be attractive in the future, and in this paper we deal with bimodal CD distribution arising from CD control error of DPL processes that print lines with positive tone resist.

The remainder of this paper is organized as follows. In Section II, we describe the 'bimodal challenge' from DPL. In Section III, we experimentally assess the impact of bimodal CD distribution on timing delay, slack, and guardband. Section IV then describes potential courses of action for the industry as standards and flows are evolved to address the bimodal challenge, and finally, Section V gives conclusions.

#### II. THE 'BIMODAL CHALLENGE'

Figure 4(a) shows a bimodal CD distribution for 32nm technology measured from 24 wafers processed by DPL, as reported in [4]. Figure 4(b) shows a simplified illustration of the bimodal CD distribution, in which two CD groups have independent mean and sigma values. The bimodal CD distribution affects design timing as follows.



Fig. 4. Bimodal CD distribution.

**Loss of spatial correlation.** The existence of two independent CD populations in a design takes away the presumption of spatial correlation that has always been used to reduce pessimism in corner-based timing analysis. For example, consider two closely placed, identical inverters made with different steps of double patterning DPL, i.e., one inverter is made by the first litho-etch step and the other is made by the second litho-etch step. These two inverters can have different gate CDs, so that their electrical characteristics, such as delay and power, can also be very different from each other despite the inverters being adjacent in the same die.

In general, within-die variations are taken into account by on-chip variation (OCV) models or by statistical timing analysis flows. Bimodal CD distribution can also be treated as an additional variation source. However, the important problem that we address in this work is that the size of the variation from the bimodal CD distribution can be very large, e.g., a mean CD difference between the groups of over 8%, as shown in Figure 4(a); therefore, designers must consider more extreme within-die variations during timing optimization as a direct consequence of DPL.

**Increase of overall CD variation.** Unless the two CD populations have the same mean values, overall CD variation must increase with DPL. Dusa et al. propose the use of a unimodal representation pooled from the bimodal CD distribution [4], specifically,

$$3\sigma_{CD,pooled}^{2} = \left(\frac{3\sigma_{CD,G1}}{2}\right)^{2} + \left(\frac{3\sigma_{CD,G2}}{2}\right)^{2} + \left(\frac{3}{2}\left(\mu_{CD,G1} - \mu_{CD,G2}\right)\right)^{2} \tag{1}$$

where G1 and G2 are the two different groups of CD populations. Using the pooled CD model, Dusa et al. observed a $3\sigma$ CD variation of about 20% of the mean CD for a 32nm DPL process. Table I shows, for various CD mean differences between G1 and G2, the CD mean and sigma values for the bimodal distribution, and for the corresponding unimodal distributions as calculated using Equation (1) for a 50nm target CD.

TABLE I

MEAN AND SIGMA OF BIMODAL AND POOLED UNIMODAL CD DISTRIBUTIONS.

| Mean Diff. | Model       | G1 Mean (nm) | G1 $3\sigma$ (nm) | G2 Mean (nm) | G2 $3\sigma$ (nm) |
|------------|-------------|-------|-----------|-------|-----------|
|            | Unimodal    | 50.00 | 2.00      | -     | -         |
| 0 nm       | Pooled uni. | 50.00 | 2.00      | -     | -         |
|            | Bimodal     | 50.00 | 2.00      | 50.00 | 2.00      |
| 1 nm       | Pooled uni. | 50.00 | 2.50      | -     | -         |
|            | Bimodal     | 49.50 | 2.00      | 50.50 | 2.00      |
| 2 nm       | Pooled uni. | 50.00 | 3.61      | -     | -         |
|            | Bimodal     | 49.00 | 2.00      | 51.00 | 2.00      |
| 3 nm       | Pooled uni. | 50.00 | 4.92      | -     | -         |
|            | Bimodal     | 48.50 | 2.00      | 51.50 | 2.00      |
| 4 nm       | Pooled uni. | 50.00 | 6.32      | -     | -         |
|            | Bimodal     | 48.00 | 2.00      | 52.00 | 2.00      |
| 5 nm       | Pooled uni. | 50.00 | 7.76      | -     | -         |
|            | Bimodal     | 47.50 | 2.00      | 52.50 | 2.00      |
| 6 nm       | Pooled uni. | 50.00 | 9.22      | -     | -         |
|            | Bimodal     | 47.00 | 2.00      | 53.00 | 2.00      |

As seen in the table, overall CD variation of the unimodal representation in Column 4 increases with the increasing mean difference between CD groups. This increased variation will necessarily increase the guardband of the design process, and in turn will worsen optimization and design closure runtime, as well as standard design metrics such as area, wirelength, violations, etc., as recently reported by Jeong et al. [7] (cf. the discussion of Figure 11 below).
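As a numeric check on the "Pooled uni." rows of Table I, the following minimal sketch computes a pooled $3\sigma$ for an equal-weight mixture of the two CD groups. It assumes that the two within-group terms are averaged, i.e., $((3\sigma_{CD,G1})^2 + (3\sigma_{CD,G2})^2)/2$, before the mean-offset term of Equation (1) is added; this averaged form is an assumption, chosen because it reproduces the tabulated values, and the function name is illustrative only.

```python
import math


def pooled_three_sigma(three_sigma_g1, three_sigma_g2, mean_g1, mean_g2):
    """Pooled 3-sigma (nm) of a 50/50 mixture of two CD groups.

    Assumption: the within-group squared 3-sigma terms are averaged, and the
    mean offset contributes (3/2 * (mean_g1 - mean_g2))^2.
    """
    within = (three_sigma_g1 ** 2 + three_sigma_g2 ** 2) / 2.0
    between = (1.5 * (mean_g1 - mean_g2)) ** 2
    return math.sqrt(within + between)


# 50nm target CD, 2nm 3-sigma per group, mean difference swept from 0 to 6nm.
pooled_table = {}
for diff in range(0, 7):
    g1_mean = 50.0 - diff / 2.0
    g2_mean = 50.0 + diff / 2.0
    pooled_table[diff] = round(pooled_three_sigma(2.0, 2.0, g1_mean, g2_mean), 2)
    print(f"mean diff {diff} nm -> pooled 3-sigma {pooled_table[diff]:.2f} nm")
```

Under these assumptions the loop yields 2.00, 2.50, 3.61, 4.92, 6.32, 7.76 and 9.22 nm, matching the pooled unimodal entries of Table I.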

#### III. IMPACTS OF BIMODAL CD DISTRIBUTION

In this section, we analyze the timing problems that arise from a bimodal CD distribution. In this discussion, we refer to the different CD distributions as corresponding to the different colorings (i.e., mask exposures) of the gate polys in a cell layout. In DPL coloring, adjacent minimum-pitch poly lines must be colored differently. Thus, a cell can have (at least) two basic versions according to its coloring sequence, as shown in Figure 5. To distinguish between these colorings when the cell is instantiated in the standard "North" orientation, we use $M_{12}$ (respectively, $M_{21}$) to refer to a cell in which the first or leftmost poly is colored by CD group1 (respectively, CD group2), the second poly is colored by CD group2 (CD group1), and so on. It is important to note that regardless of whether a cell has an odd or an even number of polys, and regardless of cells' placement locations and orientations, there exist two different colorings for the cell, based on which color is assigned to the first (leftmost) poly. We now discuss three key impacts of the bimodal CD distribution: on path delay variation, on timing slack variation, and on the design process.



Fig. 5. Example of two different DPL colorings for a NOR3 cell.

#### A. Path Delay Variation in DPL

Every cell instance in a design can be colored differently according to its location and the surrounding cell instances. Therefore, instances of the same master cell in a timing path can be differently colored, and can have different electrical behaviors. As mentioned in Section II above, due to the loss of the spatial correlation between differently colored cells, delays across cell types ( $M_{12}$  and  $M_{21}$ ) in a path can vary randomly or with less correlation, even while cells of the same type coloring have strong correlation. Finding the path delay variation of a timing path in the presence of bimodal CD distribution requires solution of the following problem formulation.

**Bimodal Path Delay Variation Analysis:** Given m cells  $g_i$  of  $M_{12}$  type and n cells  $q_j$  of  $M_{21}$  type in a timing path, determine the delay variation of the timing path, subject to the constraints:

(a) $\min_{i,j} cov(g_i, g_j) > \max_{i,j} cov(g_i, q_j)$

(b) $\min_{i,j} cov(q_i, q_j) > \max_{i,j} cov(g_i, q_j)$

According to the constraints, the covariance between cells in the same group is larger than the covariance between cells in different groups.

The delay variation of a delay path is:

$$\begin{aligned}
\sigma^{2}(d(path)) &= \sigma^{2}\left(\sum_{i} d(g_{i}) + \sum_{j} d(q_{j})\right) \\
&= \sum_{i}\sigma^{2}(d(g_{i})) + \sum_{j}\sigma^{2}(d(q_{j})) \\
&\quad + 2\sum_{i<j}cov(g_{i},g_{j}) + 2\sum_{i<j}cov(q_{i},q_{j}) + 2\sum_{i,j}cov(g_{i},q_{j})
\end{aligned} \tag{2}$$

From Equation (2), since  $cov(g_i,q_j)$  is small (e.g., zero in the case of no correlation), the path delay variation for a path composed of uncorrelated different types of cells is smaller than that of a path composed of only correlated cells.
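The effect of the cross-covariance term in Equation (2) can be illustrated with a minimal sketch. The per-stage sigma values, the correlation coefficients `rho_same` and `rho_cross`, and the function name are all hypothetical; the sketch simply sums every entry of the stage-delay covariance matrix, which equals the variance terms plus twice each pairwise covariance.

```python
def path_delay_variance(sigmas, colors, rho_same=1.0, rho_cross=0.0):
    """Variance of the sum of stage delays, following Equation (2).

    sigmas:    per-stage delay standard deviations (hypothetical values).
    colors:    per-stage coloring label, e.g. "M12" or "M21".
    rho_same:  assumed correlation between stages of the same coloring.
    rho_cross: assumed correlation between stages of different colorings.
    """
    total = 0.0
    for i, (s_i, c_i) in enumerate(zip(sigmas, colors)):
        for j, (s_j, c_j) in enumerate(zip(sigmas, colors)):
            if i == j:
                rho = 1.0  # diagonal entry: the stage's own variance
            elif c_i == c_j:
                rho = rho_same
            else:
                rho = rho_cross
            total += rho * s_i * s_j
    return total


stage_sigma = [1.0, 1.0, 1.0, 1.0]  # hypothetical per-stage sigma (ps)
var_same = path_delay_variance(stage_sigma, ["M12"] * 4)
var_alt = path_delay_variance(stage_sigma, ["M12", "M21", "M12", "M21"])
print(var_same, var_alt)
```

With perfectly correlated same-color stages and uncorrelated cross-color stages, the alternately colored path has half the delay variance of the single-color path in this toy setting.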

Recall that for the DPL process, patterns are first partitioned into two groups, and that the two groups are each assigned a distinct color. The constraint is that same-color patterns should not be placed within the minimum distance that is permitted by the litho and etch equipment. According to the placement locations, orientations and the neighboring cells, a cell can be colored in different ways. Figure 6 shows the delay variation of 4-stage inverter chains and buffer chains for all possible colorings of cells for two different CD mean differences, 0nm in (a) and 3nm in (b). We measure the delay of the timing paths across the four combinations of extreme CD corners (Min and Max CD values for each CD group). The x-axis shows the different path coloring sequences, and the legends in the figure show the combinations of the extreme corners of each CD group. Note that even for such a simple timing path, the number of required timing analyses in the DPL regime increases exponentially with the number of stages.

For this experiment, we use the 45nm bulk CMOS SPICE model from Arizona State University's Predictive Technology Model website [11] and 45nm circuits from NANGATE's Open Cell Libraries [12]. We assume that the CD values of each CD group have perfect spatial correlation, so as to isolate the impact of bimodality as well as to reduce the number of experiments. The number of configurations of each path, accounting for different colorings and process corners, is $4 \cdot 2^4 = 64$. Table II shows all possible CD corners (Column 1), and all possible coloring sequences (Column 3), in the 4-stage inverter and buffer chains. Right arrows ($\rightarrow$) denote logical signal propagation, and cells can be placed anywhere in a die.

TABLE II

PATH CONFIGURATIONS FOR 4-STAGE INVERTER AND BUFFER CHAINS.

| CD corner (G1 group - G2 group) | No. | Path coloring configuration               |
|---------------------------------|-----|-------------------------------------------|
|                                 | 1   | $M_{12} \to M_{12} \to M_{12} \to M_{12}$ |
|                                 | 2   | $M_{12} \to M_{12} \to M_{12} \to M_{21}$ |
|                                 | 3   | $M_{12} \to M_{12} \to M_{21} \to M_{12}$ |
|                                 | 4   | $M_{12} \to M_{12} \to M_{21} \to M_{21}$ |
| MAX - MAX                       | 5   | $M_{12} \to M_{21} \to M_{12} \to M_{12}$ |
|                                 | 6   | $M_{12} \to M_{21} \to M_{12} \to M_{21}$ |
| MAX - MIN                       | 7   | $M_{12} \to M_{21} \to M_{21} \to M_{12}$ |
|                                 | 8   | $M_{12} \to M_{21} \to M_{21} \to M_{21}$ |
| MIN - MAX                       | 9   | $M_{21} \to M_{12} \to M_{12} \to M_{12}$ |
|                                 | 10  | $M_{21} \to M_{12} \to M_{12} \to M_{21}$ |
| MIN - MIN                       | 11  | $M_{21} \to M_{12} \to M_{21} \to M_{12}$ |
|                                 | 12  | $M_{21} \to M_{12} \to M_{21} \to M_{21}$ |
|                                 | 13  | $M_{21} \to M_{21} \to M_{12} \to M_{12}$ |
|                                 | 14  | $M_{21} \to M_{21} \to M_{12} \to M_{21}$ |
|                                 | 15  | $M_{21} \to M_{21} \to M_{21} \to M_{12}$ |
|                                 | 16  | $M_{21} \to M_{21} \to M_{21} \to M_{21}$ |
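The configuration space of Table II can be enumerated with a short sketch. The string labels and variable names are illustrative; the only facts taken from the text are the four CD corner combinations, the two colorings per cell, and the four stages.

```python
from itertools import product

# CD corner combinations, written as "G1 corner - G2 corner".
CORNERS = ["MAX-MAX", "MAX-MIN", "MIN-MAX", "MIN-MIN"]
COLORINGS = ["M12", "M21"]
STAGES = 4

# All coloring sequences of a 4-stage path, in the same order as Table II.
colorings = list(product(COLORINGS, repeat=STAGES))

# Every (corner, coloring sequence) pair that must be simulated.
configurations = [(corner, seq) for corner in CORNERS for seq in colorings]

print(len(colorings), len(configurations))
print(" -> ".join(colorings[5]))
```

The product grows as $4 \cdot 2^n$ for an $n$-stage path, which is why exhaustive enumeration is only practical for very short paths.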

We assume the CD variation within each CD group to be 2nm, which is comparable to the ITRS predicted value for CD control in the 45nm node, i.e., 1.9nm [10]. Finally, we measure the delay of the 64 different path configurations while sweeping the difference of means between CD group1 and CD group2 from 0nm to 6nm. We also compare the delay estimated from the pooled unimodal CD model (ref. Table I) with that estimated from the more realistic bimodal CD model.

From this study, we observe that for most cases, the delay values are within the boundary of the delay at the MAX-MAX and MIN-MIN corners, and that most results from bimodal analysis are within the window established by the pooled unimodal model. However, not all cases are covered by the pooled model when the mean CD difference between the two groups is 0nm. In addition, delay variation increases when the mean difference between the two CD groups increases. Note that the delay variation of pooled unimodal cases becomes significantly larger than that for the bimodal cases, when the mean CD difference becomes nonzero, as shown in Figure 6(b). This immediately raises the question as to whether the pessimism of a pooled unimodal delay model (i.e., today's standard practice) will be too costly in the DPL regime. We also observe that for skewed processes (MAX-MIN or MIN-MAX), delay variation across all the path configurations is larger than for MAX-MAX or MIN-MIN.

Figure 7 shows delay variations of a 16-stage inverter chain, normalized to mean values. Here, only four (out of 2<sup>16</sup>) path colorings are studied: (i) M1-only, (ii) M1-M2-M1-... alternation, (iii) M2-M1-M2-... alternation, and (iv) M2-only. Consistent with the analytical result in Equation (2), alternating coloring of the timing path shows smaller delay variation.

#### B. Timing Slack Variation in DPL

While path delay variation can be reduced by the bimodal CD distribution, we find a very different situation with variation of timing slack – which is the most important parameter for design timing. Timing slack ( $T_{slack}$ ) of the design is defined by

$$T_{slack} = T_{clock} + T_{cycle} - T_{data} \tag{3}$$

The variation of the timing slack is calculated by

$$\sigma_{T_{slack}}^2 = \sigma_{T_{clock}}^2 + \sigma_{T_{data}}^2 - 2cov\left(T_{clock}, T_{data}\right) \tag{4}$$

<sup>1</sup>As noted in the earlier review of double exposure DPL technology, since overlay control in the 45nm node is 9nm, it is difficult to use the negative double exposure process in light of the CD variation requirement. Hence, we do not consider the negative correlation between CD groups that would result with double exposure DPL, and we assume that CD variation is determined only by CD control capability.



Fig. 6. Delay variations of 4-inverter and 4-buffer chains. Path configurations are as given in Table II.



Fig. 7. Relative delay variation  $\sigma/\mu$  (%) over all process corners.

For a traditional single-exposure process, if we assume that spatial correlation is high, the covariance term in Equation (4) will reduce the slack variation. However, in DPL, since cells in the clock path can be colored in a different way from cells in the data path, the covariance term will be reduced to zero, so that timing slack variation becomes a sum of clock path and data path variations. To meet signoff timing constraints with this increased slack variation in DPL, designs will require more stringent and difficult timing optimization.
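The role of the covariance term in Equation (4) can be seen in a minimal numeric sketch. The sigma values and the correlation coefficient `rho` are hypothetical, and the covariance is modeled as `rho * sigma_clock * sigma_data`.

```python
import math


def slack_sigma(sigma_clock, sigma_data, rho):
    """Standard deviation of timing slack, following Equation (4).

    rho is an assumed correlation coefficient between clock and data delays.
    """
    cov = rho * sigma_clock * sigma_data
    variance = sigma_clock ** 2 + sigma_data ** 2 - 2.0 * cov
    return math.sqrt(max(variance, 0.0))  # guard against tiny negative round-off


# Hypothetical sigmas (ps): fully correlated versus uncorrelated paths.
sigma_correlated = slack_sigma(3.0, 4.0, rho=1.0)
sigma_uncorrelated = slack_sigma(3.0, 4.0, rho=0.0)
print(sigma_correlated, sigma_uncorrelated)
```

In this toy case the slack sigma grows from 1ps to 5ps when the correlation between clock and data paths is lost, even though neither path's own variation has changed.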

We illustrate this concept with Figure 8, which portrays the slack calculation for the traditional single-exposure process in (a), and for the DPL process in (b). In this simple example, we assume that the nominal delays of both the clock path and the data path are 10ns, and, following the analysis of path delay variation in Equation (2), we assume that DPL has smaller delay variation than single exposure, e.g., $\pm 5$ns for single exposure and $\pm 2$ns for DPL.



Fig. 8. Worst timing slack calculation in the DPL and (traditional) single-exposure regimes.

In the single-exposure case, due to the strong spatial correlation between the clock path and the data path, process variation does not make timing slack worse. However, in the DPL case, although the delay variation is small, we can see large negative slack due to the weak correlation between the clock and data paths; that is, each path delay can vary independently.
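The corner reasoning behind this example can be sketched as follows. The sketch assumes a zero nominal slack (`t_cycle = 0` in Equation (3)), models perfect correlation as both paths sharing the same delay shift, and models the DPL case as fully independent shifts; these modeling choices and the function name are assumptions, not values read from Figure 8.

```python
from itertools import product


def worst_slack(nominal_clock, nominal_data, spread, correlated, t_cycle=0.0):
    """Worst-case slack (ns) over +/- spread corners, following Equation (3)."""
    deltas = (-spread, +spread)
    if correlated:
        # Clock and data delays shift together (single-exposure assumption).
        corners = [(d, d) for d in deltas]
    else:
        # Clock and data delays shift independently (DPL assumption).
        corners = list(product(deltas, repeat=2))
    return min(
        (nominal_clock + d_clk) + t_cycle - (nominal_data + d_data)
        for d_clk, d_data in corners
    )


single_exposure = worst_slack(10.0, 10.0, 5.0, correlated=True)
dpl = worst_slack(10.0, 10.0, 2.0, correlated=False)
print(single_exposure, dpl)
```

Under these assumptions the correlated case keeps zero worst slack despite its larger $\pm 5$ns spread, while the uncorrelated case reaches a negative worst slack with only a $\pm 2$ns spread.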

To see more explicitly and realistically the impact of bimodal CD distribution on the timing slack, we extract a top critical path from the *aes* core, obtained as RTL from the open-source site *opencores.org* [13], which synthesizes to 40K instances, and is placed and routed with a reduced set of 45nm library cells. The launching and capturing clock paths are each composed of 14 inverter stages. The two clock paths share the initial 4 stages, but differ from each other in the latter 10 stages of each path. The data path is composed of 30 logic stages, e.g., 2-input NAND, NOR, OR and AND logic cells, and 1-input BUF and INV cells. An exhaustive design of experiments (DOE) would require 4 · 2<sup>54</sup> cases (24 distinct clock stages plus 30 data stages, each with two possible colorings, across four CD corner combinations). We reduce the DOE complexity by restricting alternatives for the clock paths, the combinational data path, and registers.

First, we assume that the colorings of all cells in the data path are fixed. This allows us to evaluate the impact of bimodal CD distribution only on the clock design. Second, the number of clock path configurations still remains very large  $(4 \cdot 2^{24})$ , so we further limit our experiments to the 5 extreme cases shown in Table III.

TABLE III

COLORING CONFIGURATIONS OF THE CRITICAL PATH EXAMPLE.

|        | Data path                   | Launching clock path                          | Capturing clock path                          |
|--------|-----------------------------|-----------------------------------------------|-----------------------------------------------|
| Case 1 | $M_{12} \rightarrow M_{12}$ | $M_{12} \rightarrow M_{12} \rightarrow \dots$ | $M_{12} \rightarrow M_{12} \rightarrow \dots$ |
| Case 2 | $M_{12} \rightarrow M_{12}$ | $M_{21} \rightarrow M_{21} \rightarrow \dots$ | $M_{21} \rightarrow M_{21} \rightarrow \dots$ |
| Case 3 | $M_{12} \rightarrow M_{12}$ | $M_{12} \rightarrow M_{12} \rightarrow \dots$ | $M_{21} \rightarrow M_{21} \rightarrow \dots$ |
| Case 4 | $M_{12} \rightarrow M_{12}$ | $M_{21} \rightarrow M_{21} \rightarrow \dots$ | $M_{12} \rightarrow M_{12} \rightarrow \dots$ |
| Case 5 | $M_{12} \rightarrow M_{12}$ | $M_{12} \rightarrow M_{21} \rightarrow \dots$ | $M_{12} \rightarrow M_{21} \rightarrow \dots$ |

For a design to operate correctly, data signals must be carried from one (launching) register to the next (capturing) register once per clock cycle. The timing slacks for setup and hold time are defined by<sup>2</sup>

• setup timing slack

$$\begin{aligned}
T_{slack,setup} &= T_{RAT,setup} - T_{AAT,setup} \\
&= (T_{capture} - T_{launch}) + T_{cycle} - T_{setup} - T_{data} \ge 0
\end{aligned} \tag{5}$$

• hold timing slack

$$\begin{aligned}
T_{slack,hold} &= T_{AAT,hold} - T_{RAT,hold} \\
&= (T_{launch} - T_{capture}) + T_{data} - T_{hold} \ge 0
\end{aligned} \tag{6}$$

The difference of delays between launching and capturing clock paths, i.e., clock skew, plays an important role in both the setup and hold timing slacks. If $T_{capture}$ is greater (resp. smaller) than $T_{launch}$, this increases (decreases) setup time slack but decreases (increases) hold time slack regardless of data path delay. Therefore, however well one optimizes the circuit to have zero slack, an unbalanced clock network can create clock skew and cause timing problems by either setup or hold time violations. Figure 9 shows the maximum skew that occurs as a result of the bimodal CD distribution, across the path coloring sequences shown in Table III. Note that the clock skew is originally designed to be zero. Intuitively, we can expect that there is no clock skew when the coloring sequences of both clock paths are the same, i.e., Cases 1, 2 and 5. However, even when the mean difference between two CD groups is zero, Cases 3 and 4 show substantial clock skew due to the different coloring sequences of launching and capturing clock paths, and the skew increases when the mean CD difference increases. The maximum clock skews of Cases 3 and 4 with 0nm mean CD difference are 22.7ps each, and these skews increase up to 52.2ps and 53.4ps, respectively, with 6nm mean difference. Another implication of Figure 9 is that the pooled unimodal CD representation cannot discern the potential skew-related timing problems in DPL designs, even though the pooled model accounts for the physical distribution of CDs, and is very pessimistic with respect to CD corners. This is because the pooled CD model cannot distinguish the colorings of paths.

<sup>2</sup>We use the standard acronyms of AAT for actual arrival time, and RAT for required arrival time.
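The opposite effect of clock skew on setup and hold slack can be sketched with the right-hand expressions of Equations (5) and (6). All delay values below are hypothetical (in ps), the hold check uses a separate short-path data delay, non-negative slack is taken to mean that the check passes, and the 50ps skew is only of the order reported for Cases 3 and 4.

```python
def setup_slack(t_launch, t_capture, t_cycle, t_setup, t_data):
    """Setup slack: right-hand expression of Equation (5)."""
    return (t_capture - t_launch) + t_cycle - t_setup - t_data


def hold_slack(t_launch, t_capture, t_hold, t_data):
    """Hold slack: right-hand expression of Equation (6)."""
    return (t_launch - t_capture) + t_data - t_hold


# Hypothetical values in ps; both checks have zero slack when skew is zero.
T_CYCLE, T_SETUP, T_HOLD = 1000, 50, 30
T_DATA_MAX, T_DATA_MIN = 950, 30
T_LAUNCH = 200

results = {}
for skew in (-50, 0, 50):  # skew = T_capture - T_launch
    t_capture = T_LAUNCH + skew
    results[skew] = (
        setup_slack(T_LAUNCH, t_capture, T_CYCLE, T_SETUP, T_DATA_MAX),
        hold_slack(T_LAUNCH, t_capture, T_HOLD, T_DATA_MIN),
    )
    print(f"skew {skew:+d} ps -> setup {results[skew][0]} ps, hold {results[skew][1]} ps")
```

A path tuned to zero slack therefore fails one of the two checks as soon as coloring-induced skew of either sign appears.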



Fig. 9. Clock skew versus CD mean difference between CD groups, across combinations of process corners. Cases 1, 2 and 5 are superposed on the x-axis.

Figure 10 shows the slack changes of each coloring sequence of clock paths versus the mean difference of the CD groups at the worst CD corner combination (MAX-MAX). The timing path originally has zero slack when the CD mean difference is zero (i.e., the two color groups have the same CD mean). For Case 4, since the delay of the capturing (resp. launching) clock path decreases (increases), the slack becomes negative<sup>3</sup>; this will worsen when the number of stages of the clock network increases. For Cases 1, 2, 3 and 5, the delay of the capturing clock path is greater than that of the launching clock path, so that the slack is still positive or even improved. However, since the improved slack on this path comes only from clock skew, there can easily be a resulting timing problem for the next timing path that starts with this path's capturing register, or increased hold time violations per Equation (6). We also notice again that the pooled unimodal CD representation shows unnecessarily pessimistic setup timing slack values.



Fig. 10. Timing slack versus CD mean difference between CD groups across combinations of process corners.

#### C. Guardband and Design Process in DPL

The simplest way to consider the bimodal CD distribution in the design process is to model bimodal as unimodal. The already-cited pooled unimodal CD model from [4] can be useful, and today's conventional flow can still be used. However, the pooled unimodal model gives an overly pessimistic guardband, which can lead to significant overdesign.

Figure 11 shows the best-case and worst-case delays of the 45nm INV cell for each of the pooled unimodal and bimodal models, with the mean difference between the two CD groups on the x-axis. The difference between worst-case and best-case delay shows the size of the guardband. As seen in the figure, simple unimodal modeling will lead to more than a 2X increase of guardband, even for the small mean difference cases; according to the recent study on guardband impact in [7], this will lead to increases of over 15%, 39% and 14% in area, runtime and wirelength, respectively.



Fig. 11. Timing guardband for each characterization method.

To reduce such pessimism in unimodal representation, separate timing models for each CD group are required. However, this increases the difficulty of circuit optimization. Placement location and surrounding patterns will determine the timing model of a cell instance, since these factors affect the DPL coloring. Consequently, even slight cell movement or resizing can give large and non-obvious changes in delay values under skewed process combinations, i.e., MIN-MAX or MAX-MIN. This may lead to more physical design iterations, since at every ECO placement step, cells' timings can be changed by the applied DPL patterning and coloring solution.

From the above results and discussions, we can conclude that a pooled unimodal representation with pessimistic corner values does not suffice in the future of DPL, and furthermore, as we demonstrated above, the pooled unimodal model cannot capture the potential timing problems caused by uncorrelated data and clock delay variations. To deal with the challenges presented by the bimodal CD distribution, novel timing analysis and optimization methodologies are required.

#### IV. POTENTIAL SOLUTIONS AND MITIGATIONS FOR BIMODAL CD DISTRIBUTION

DPL doubles the resolution of optical lithography, in comparison to the traditional single-exposure process. However, our results strongly suggest that to incorporate DPL into production, the bimodal CD distribution must be dealt with accurately in both analysis and optimization. Orthogonally, accurate control and measurement of mean difference between two CD populations are required to reduce pessimism. Even as the traditional timing flow undergoes substantial changes today, DPL-awareness appears to be a looming key issue. In this section, we give some considerations for future courses of action by which the industry can respond to uncorrelated distributions of device behaviors in the DPL context.

#### A. Timing Analysis

<sup>3</sup> With 6nm mean CD difference, -18ps of slack violation occurs. This value is about 10% of the clock path delay of our test case.

DPL requires (1) more guardbanding and/or (2) a new methodology to characterize electrical properties such as delay, power, etc. of DPL circuits. Existing methodology and infrastructure allow modeling via a pooled unimodal CD model, but our timing analyses in Section III show that the pooled unimodal description is likely too pessimistic. Therefore, device modeling, extraction and cell characterization strategies must change to accommodate DPL. The device parameters of the process must capture the reality of bimodal CD variation, and production methodology should permit each printed transistor (finger) to independently reference the appropriate model card. To this end, another mask layer is required to distinguish transistors in G1 and G2, and each type of device must correspond to its model within the bimodal-aware SPICE modeling. Each cell master has at least two distinct instantiations in silicon ($M_{12}$ type and $M_{21}$ type) that require modeling, so that two independent timing libraries for $M_{12}$ and $M_{21}$ type cells are required. Our simulation results show that the delay of the MAX-MAX combination for CD groups G1 and G2 dominates other combinations for the worst-case delay, and that the delay of the MIN-MIN combination dominates other combinations for the best-case delay. Hence, for delay calculation, it may be reasonable to use the MAX-MAX corner as the worst delay corner and the MIN-MIN corner as the best delay corner, similar to conventional corner-based timing analysis. However, for the timing slack calculation, new statistical and deterministic timing analysis methodologies, comprehending transistor-level spatial correlation, will be required to reduce the pessimism.
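The corner enumeration described above can be sketched with a toy model. The code below assumes a linear delay model in which each cell type has its own sensitivity to the G1 and G2 CD shifts; the nominal delay, the sensitivities and the corner shifts are hypothetical numbers, not values characterized in this work. It is meant only to show how a path built from $M_{12}$ and $M_{21}$ cells would be evaluated at the four G1-G2 corner combinations.

```python
from itertools import product

# All numbers below are hypothetical and for illustration only.
NOMINAL_PS = 20.0                      # nominal delay of one cell, in ps
CD_SHIFT_NM = {"MIN": -3.0, "MAX": +3.0}
# (d delay / d CD_G1, d delay / d CD_G2) in ps per nm, one pair per library.
SENS_PS_PER_NM = {"M12": (1.5, 0.5), "M21": (0.5, 1.5)}


def cell_delay(cell_type: str, g1_corner: str, g2_corner: str) -> float:
    """Delay of one cell under a linear CD-to-delay model (an assumption)."""
    s1, s2 = SENS_PS_PER_NM[cell_type]
    return NOMINAL_PS + s1 * CD_SHIFT_NM[g1_corner] + s2 * CD_SHIFT_NM[g2_corner]


def path_delay(cell_types, g1_corner: str, g2_corner: str) -> float:
    """Sum of cell delays along a path at one G1-G2 corner combination."""
    return sum(cell_delay(t, g1_corner, g2_corner) for t in cell_types)


def corner_table(cell_types) -> dict:
    """Path delay at MIN-MIN, MIN-MAX, MAX-MIN and MAX-MAX."""
    return {
        f"{g1}-{g2}": path_delay(cell_types, g1, g2)
        for g1, g2 in product(("MIN", "MAX"), repeat=2)
    }


if __name__ == "__main__":
    path = ["M12", "M21", "M12", "M12"]
    table = corner_table(path)
    for corner, delay in table.items():
        print(f"{corner}: {delay:.1f} ps")
    print("worst corner:", max(table, key=table.get))
    print("best corner:", min(table, key=table.get))
```

Because every sensitivity in this toy model is positive, MAX-MAX and MIN-MIN bound the path delay, in line with the simulation results reported above. The skewed MIN-MAX and MAX-MIN corners matter once differences between paths, i.e., slacks, are computed.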

#### B. Timing Optimization

For bimodal-aware timing optimization, we can split a timing graph into three pieces: data path, clock path and sequential cells. As is implied by the  $cov(g_i, q_i)$  term in Equation (2) and our experiments in Section III (e.g., Figure 7), the alternative coloring reduces delay variation as well as the worst-case delay of both clock and data paths. Thus, we suggest the use of 'self-compensation', in the sense of deliberate balancing of cell colorings in timing paths, to reduce delay variation. The term 'self-compensation' has been used by Gupta et al. [5] in previous work that reduces through-focus timing variation by balancing 'isolated' and 'dense' (pitch) timing arcs. Here, we refer to the balancing of timing arcs between two uncorrelated CD distributions. For clock paths, we suggest the use of the same type of coloring for all cells in the clock network (thus exploiting spatial correlation maximally). This will further reduce clock variation as well as slack variation. According to the experiment in Figure 9, a skew reduction of several tens of ps will result.
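The effect of coloring on delay and skew variation can be illustrated with a small variance calculation. The sketch below is not Equation (2) of this paper; it assumes a simplified linear model in which all G1 gates share one CD deviation, all G2 gates share another, the two deviations are independent with equal sigma, and each cell type has hypothetical sensitivities to the two groups.

```python
import math

# Hypothetical sensitivities (ps per nm) of each cell type to the
# G1 and G2 CD deviations; not characterized values.
SENS = {"M12": (1.5, 0.5), "M21": (0.5, 1.5)}


def group_coefficients(cell_types):
    """Total sensitivity of a path to the G1 and G2 CD deviations."""
    c1 = sum(SENS[t][0] for t in cell_types)
    c2 = sum(SENS[t][1] for t in cell_types)
    return c1, c2


def path_sigma(cell_types, sigma_cd: float = 1.0) -> float:
    """Delay sigma of a path: sqrt(c1^2 + c2^2) * sigma_cd, because the
    two group deviations are assumed independent with equal sigma."""
    c1, c2 = group_coefficients(cell_types)
    return sigma_cd * math.hypot(c1, c2)


def skew_sigma(branch_a, branch_b, sigma_cd: float = 1.0) -> float:
    """Sigma of the delay difference between two clock branches."""
    a1, a2 = group_coefficients(branch_a)
    b1, b2 = group_coefficients(branch_b)
    return sigma_cd * math.hypot(a1 - b1, a2 - b2)


if __name__ == "__main__":
    same = ["M12"] * 4
    alternating = ["M12", "M21", "M12", "M21"]
    print(f"data path, same coloring:        {path_sigma(same):.2f} ps")
    print(f"data path, alternating coloring: {path_sigma(alternating):.2f} ps")
    print(f"clock skew, matched branches:    {skew_sigma(same, same):.2f} ps")
    print(f"clock skew, opposite branches:   "
          f"{skew_sigma(same, ['M21'] * 4):.2f} ps")
```

In this model, alternating the cell types balances the path's exposure to the two groups and lowers the delay sigma, while two clock branches with identical coloring see the same deviations and their skew variation cancels. Real cells will not be perfectly correlated within a group, so the numbers indicate the trend only.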

However, we note that coloring based on the timing, i.e., alternative coloring or same type coloring of paths, can increase coloring conflicts with neighboring cells that are already colored (Figure 13 (b)). Although detailed coloring methods in DPL are out of the scope of this paper, an intermediate methodology may be to apply a hierarchical DPL process based on master cells, whereby all master cells have pre-defined coloring. To avoid DPL coloring conflicts between adjacent cell instances, it may be necessary to develop larger-sized cells, in which all critical features can be colored independent of the colorings of other neighboring cells. For cells to be colored independently, twice the distance from poly center to the cell boundary  $(2d_{pb})$  must be greater than the minimum resolution  $(Res_{min})$  of the single exposure system. Figure 12 illustrates a placement of two cells with the original minimum-sized cells in (a) and with larger-sized cells that avoid coloring conflicts in (b).



Fig. 12. Conflict removal with larger-sized cell.
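The independence condition $2d_{pb} > Res_{min}$ can be written as a simple check. The sketch below also reports how far an abutting pair of cells falls short of the single-exposure resolution, assuming the spacing between their boundary poly lines is the sum of the two poly-to-boundary distances. The function names and the numeric values are hypothetical.

```python
def can_color_independently(d_pb_nm: float, res_min_nm: float) -> bool:
    """True if a cell satisfies 2 * d_pb > Res_min, so that its boundary
    poly never conflicts with the boundary poly of an identical neighbor."""
    return 2.0 * d_pb_nm > res_min_nm


def spacing_shortfall(d_pb_left_nm: float, d_pb_right_nm: float,
                      res_min_nm: float) -> float:
    """Amount by which the poly-to-poly spacing across an abutment falls
    short of Res_min (0.0 if there is no shortfall). Any extra separation
    strictly larger than this value satisfies the condition above."""
    spacing = d_pb_left_nm + d_pb_right_nm
    return max(0.0, res_min_nm - spacing)


if __name__ == "__main__":
    res_min_nm = 80.0  # hypothetical single-exposure resolution
    for d_pb_nm in (35.0, 45.0):
        ok = can_color_independently(d_pb_nm, res_min_nm)
        gap = spacing_shortfall(d_pb_nm, d_pb_nm, res_min_nm)
        print(f"d_pb = {d_pb_nm:.0f} nm: independent = {ok}, "
              f"shortfall = {gap:.0f} nm")
```

The shortfall can be closed either inside the cell, by enlarging it as in Figure 12 (b), or outside the cell, by separating the two instances.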

To avoid an increase in cell area, we suggest fixing coloring conflicts by placement perturbation using remaining white space, i.e., increasing the distance between conflicting cells, as shown in Figure 13: (a) find the timing critical path, (b) determine the best coloring, and (c) remove coloring conflicts with white space management. To this end, we suggest adapting the 'Corr' approach of Gupta et al.

For sequential cells, e.g., latches and registers, it is unclear which process combination gives the worst or best behavior. 'D-Q' and 'CK-Q' delays can be maximized (minimized) at the MAX-MAX (MIN-MIN) corner, but hazard timing margins such as setup and hold may take on their worst values at different combinations of process corners, since these result from racing between clock and data paths within a cell. Measurement and optimization of delay and hazard timing margin across process corners appears to be an open and challenging research topic.



(a) Original placement (b) Alternative coloring (c) Conflict removal

Fig. 13. Conflict removal with intelligent white space management.

#### V. CONCLUSIONS AND ONGOING WORK

Double Patterning Lithography (DPL) allows 32nm half-pitch logic patterning using 193nm ArF lithography tools, and is already in production for leading-edge memory products. However, the associated 'bimodal' CD distribution and loss of spatial correlation between differently colored (exposure) cells have far-reaching impacts on circuit properties that are neither well-defined nor well-studied. In this paper, we have given both analytic and empirical assessments of the potential impact of DPL on timing analysis error and guardbanding. We observe that the traditional 'unimodal' characterization and analysis framework may not be viable for DPL. For example, our analyses demonstrate that different mask layouts can result in 20% or more change in timing path delays. Based on our observations, we have proposed potential solutions for each step of the design process. Our next goal is to provide more accurate, efficient and practical solutions for the 'bimodal-aware' challenges in timing analysis and circuit design optimization.

#### REFERENCES

- [1] G. Capetti et al., "Sub k1 = 0.25 Lithography with Double Patterning Technique for 45nm Technology Node Flash Memory Devices at 193nm", *Proc. SPIE Optical Microlithography*, vol. 6520, pp. 65202K-1–65202K-12, 2007.
- [2] J. Finders, M. Dusa and S. Hsu, "Double Patterning Lithography: The Bridge Between Low k1 ArF and EUV", *Microlithography World*, Feb. 2008.
- [3] M. Drapeau, V. Wiaux, E. Hendrickx, S. Verhaegen and T. Machida, "Double Patterning Design Split Implementation and Validation for the 32nm Node", *Proc. SPIE Design for Manufacturability through Design-Process Integration*, vol. 6521, 2007.
- [4] M. Dusa et al., "Pitch Doubling Through Dual-Patterning Lithography: Challenges in Integration and Litho Budgets", *Proc. SPIE Conference on Optical Microlithography*, 2007, pp. 65200G-1–65200G-10.
- [5] P. Gupta, A. B. Kahng, Y. Kim and D. Sylvester, "Self-Compensating Design for Focus Variation", *Proc. Design Automation Conf.*, June 2005, pp. 365-368.
- [6] P. Gupta, A. B. Kahng and C.-H. Park, "Detailed Placement for Improved Depth of Focus and CD Control", *Proc. Asia and South Pacific Design Automation Conf.*, Jan. 2005, pp. 343-348.
- [7] K. Jeong, A. B. Kahng and K. Samadi, "Quantified Impacts of Guardband Reduction on Design Process Outcomes", *Proc. Intl. Symp. on Quality Electronic Design*, March 2008, pp. 890-897.
- [8] S.-M. Kim et al., "Issues and Challenges of Double Patterning Lithography in DRAM", *Proc. SPIE Conference on Optical Microlithography*, 2006, pp. 65200H-1–65200H-7.
- [9] M. Maenhoudt, J. Versluijs, H. Struyf, J. Van Olmen and M. Van Hove, "Double Patterning Scheme for Sub-0.25 k1 Single Damascene Structures at NA=0.75, $\lambda$=193nm", *Proc. SPIE Conference on Optical Microlithography*, 2005, pp. 1508-
- [10] International Technology Roadmap for Semiconductors, 2007 Edition, http://public.itrs.net/.
- [11] Predictive Technology Model, http://www.eas.asu.edu/~ptm.
- [12] NANGATE, http://www.nangate.com/.
- [13] OPENCORES.ORG, http://www.opencores.org/.