# Voltage Drop Mitigation by Adaptive Voltage Scaling using Clock-Data Compensation

Andrés Malavasi-Mora<sup>1</sup> and Renato Rimolo-Donadio<sup>2</sup> <sup>1</sup>Intel Labs, Hillsboro, Oregon 97124, USA. andres.f.malavasi@intel.com <sup>2</sup>Instituto Tecnológico de Costa Rica, 30101 Cartago, Costa Rica. rimolo@tec.ac.cr

*Abstract*— High-frequency power supply noise compromises the performance and energy efficiency of microprocessor-based products, restricting the maximum operating frequency of electronic systems and decreasing device reliability. A guard band must be set to tolerate voltage drops without execution errors, at the cost of reduced performance. This work proposes a technique to enhance voltage drop tolerance through adaptive voltage scaling, taking advantage of the clock-data compensation effect. The proposed solution is validated with test cases in a FinFET CMOS technology at the post-layout simulation level, reaching from 6% up to 30% more voltage drop tolerance.

*Index Terms*—Adaptive voltage scaling, clock-data compensation, CMOS circuit, power noise, voltage drop.

## I. INTRODUCTION

High-frequency supply voltage ( $V_{CC}$ ) drop degrades the performance and energy efficiency of microprocessor products, limiting the maximum operating frequency ( $f_{max}$ ) of electronic systems such as microprocessors [1]. Supply noise caused by on-chip current introduces delay variation in data paths, as well as jitter in clock paths. As a result, data launched from one pipeline stage can no longer be guaranteed to be captured by the next clock edge within a given timing window, leading to a timing failure. For this reason, a guard band must be set to tolerate voltage drops without execution errors, at the cost of a performance penalty.

Several adaptive circuit techniques have been reported in the literature (e.g. [2]-[4]) that aim to reduce the effect of voltage drops by explicitly sensing the variation with on-die monitors and adjusting the operating condition. Although this is effective at low frequencies, the ability to mitigate high-frequency drops is very limited. Other techniques, such as resilient timing-error detection and recovery circuits [5]-[8], can be very useful since they detect the timing violation, isolate the error so that it does not corrupt the architectural state, and correct the error through instruction replay. However, the architectural design complexity of implementing error recovery in a high-performance microprocessor while ensuring coverage for all failure scenarios is a significant challenge.

It is possible to enhance $f_{max}$ immunity to power supply noise by designing the clock distribution such that the clock edge sampling the data at the receiver is pushed out every time the data signal is delayed due to a power supply drop [9]. This effect is known as "Clock-Data Compensation" (CDC) and is illustrated in Fig. 1, which shows a simple pipeline circuit consisting of a phase-locked loop (PLL), a clock path, and a data path.



Fig. 1. Illustration of clock-data non-ideal effects: (a) simple clock and data path pipeline, (b) voltage drop event with ideal clock, (c) voltage drop event with a non-ideal clock.

In traditional analysis, the clock period is assumed constant while only the data path delay is assumed to change under the influence of supply noise. Fig. 1(b) illustrates example waveforms based on the traditional analysis, showing several sampling failures during a supply voltage undershoot event. In reality, however, the clock path delay is also modulated by the supply noise and therefore stretches the clock period during supply downswings. As a result, the clock path delay and data path delay compensate for each other, which relaxes the timing constraint. Fig. 1(c) depicts example waveforms for this scenario. This clock period modulation effect results in an extra timing margin that compensates for the slowdown in the data path.
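
The compensation described above can be illustrated with a minimal setup-slack sketch. All numbers are illustrative assumptions rather than results of this work: a 500 ps nominal clock period, a data path occupying 90% of it, a data delay sensitivity of 1.5 (percent delay increase per percent supply drop), and a 61 ps clock period stretch. The function names are hypothetical.

```python
def data_delay_ps(nominal_ps, drop_frac, sensitivity):
    """First-order model: delay grows linearly with the fractional supply drop."""
    return nominal_ps * (1.0 + sensitivity * drop_frac)


def setup_slack_ps(clock_period_ps, data_delay, setup_ps=0.0):
    """Setup slack: time left between data arrival and the capturing clock edge."""
    return clock_period_ps - data_delay - setup_ps


# Illustrative assumptions (not measurements from the paper)
T_NOMINAL_PS = 500.0
DATA_NOMINAL_PS = 0.90 * T_NOMINAL_PS
DROP = 0.10
SENSITIVITY = 1.5
CLOCK_STRETCH_PS = 61.0

delayed = data_delay_ps(DATA_NOMINAL_PS, DROP, SENSITIVITY)

# Traditional analysis: the clock period stays constant during the drop
slack_ideal_clock = setup_slack_ps(T_NOMINAL_PS, delayed)

# CDC: the supply drop also stretches the clock period
slack_with_cdc = setup_slack_ps(T_NOMINAL_PS + CLOCK_STRETCH_PS, delayed)

print(f"ideal clock: {slack_ideal_clock:+.1f} ps, with CDC: {slack_with_cdc:+.1f} ps")
```

With these assumed values the slack is negative under the constant-period analysis and positive once the stretched period is taken into account, which is the qualitative behavior sketched in Fig. 1(b) and Fig. 1(c).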

Building on this idea, adaptive clock distribution (ACD) combined with clock gating techniques has become an attractive solution to mitigate the effect of voltage drop on microprocessor performance [1]. These techniques take advantage of the clock-data compensation effect, in which both clock and data are affected by the drop, so that changes in the clock signal compensate for changes in the data paths. Nevertheless, clock gating can introduce synchronization problems between blocks.

This work evaluates a different approach to this problem by proposing an adaptive and scalable technique to enhance voltage drop tolerance in CMOS circuits through adaptive voltage scaling, taking advantage of the clock-data compensation effect without any clock gating or frequency reduction. The proposed solution is explained and validated by applying it to different scenarios, considering process corners and diverse operating conditions in a FinFET CMOS technology. Finally, layout implementations and their evaluation were included to improve confidence in the results obtained with the proposed technique.

978-1-7281-3427-7/20/\$31.00 ©2020 IEEE

## II. CLOCK TREE AND CDC IMPACT ON THE CRITICAL PATH

This section evaluates the effect of the clock-data compensation (CDC) when a voltage drop arises. The focus is on high-frequency voltage drops, since available literature on this topic indicates that the resonant supply noise typically lies in the 50-300 MHz range with a maximum magnitude of approximately 10% of $V_{CC}$ [10]. The configuration under study is depicted in Fig. 2.

The purpose of having a tunable data path is to mimic the different timing paths present in a design, which can be slow, medium or fast delay paths. Also, an error signal is generated when the data is not captured correctly at the output flip-flop.

Choosing the clock tree length to have the optimum delay for the critical path in a circuit is a key task to maximize the benefit of the clock-data compensation effect. For extremely long or short clock path delays, the slack considering the beneficial jitter effect (i.e. noisy clock supply) approaches the conventional analysis case (i.e. clean clock supply). A very short clock path makes the clock period modulation effect weaker, and conversely, in very long clock paths each clock edge will see a similar average supply voltage.
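
The dependence on clock path length can be sketched with a first-order model. This is an illustration under stated assumptions, not the simulation setup of this work: the supply noise is taken as a pure 100 MHz sinusoid with 10% amplitude, the clock path delay grows linearly with the average drop seen while an edge travels through the tree (sensitivity of 1.0), and the nominal period is 500 ps, as implied by the 561 ps and 61 ps figures quoted in this section. The leaf clock period is then the nominal period plus the difference in path delay between two consecutive edges.

```python
import math

F_NOISE = 100e6   # assumed resonant noise frequency (Hz)
DEPTH = 0.10      # assumed peak fractional supply drop (10% of VCC)
SENS = 1.0        # assumed: percent delay increase per percent supply drop
T0 = 500e-12      # nominal clock period (s), inferred from the text


def clock_path_delay(t_launch, d0):
    """Delay of an edge entering a clock path of nominal delay d0 at t_launch.

    The fractional drop is modeled as DEPTH*sin(w*t); the extra delay is
    proportional to its integral over the nominal traversal window.
    """
    w = 2.0 * math.pi * F_NOISE
    drop_integral = DEPTH * (math.cos(w * t_launch) - math.cos(w * (t_launch + d0))) / w
    return d0 + SENS * drop_integral


def max_period_stretch(d0, steps=2000):
    """Largest increase of the leaf clock period over one noise cycle."""
    t_noise = 1.0 / F_NOISE
    best = 0.0
    for i in range(steps):
        t = i * t_noise / steps
        # Period change = delay of the next edge minus delay of this edge
        stretch = clock_path_delay(t + T0, d0) - clock_path_delay(t, d0)
        best = max(best, stretch)
    return best


for d0 in (0.05e-9, 1e-9, 5e-9, 10e-9):
    print(f"path delay {d0 * 1e9:5.2f} ns -> max stretch {max_period_stretch(d0) * 1e12:6.1f} ps")
```

In this model a very short path produces almost no stretch, and a path whose delay spans a full noise period produces none either, because every edge sees the same average supply. The stretch peaks for an intermediate path delay, in line with the qualitative argument above.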

The impact of adding different numbers of stages (clock buffers) to the clock tree gives a better picture of the sensitivity of the clock path, defined as the percentage increase in path delay normalized to the percentage decrease in supply voltage under a supply noise condition.

Fig. 3 illustrates how the clock period changes during the voltage drop depending on the clock path length. Fig. 3(a) shows the supply voltage during the drop event: the maximum drop occurs at 52.5 ns, which corresponds to a 100 MHz frequency. The voltage level at this point is ~587 mV, which corresponds to a ~10% voltage drop. Fig. 3(b) shows the clock period change for different clock trees, where 30, 100, and 190 stages are added to the clock path.

It is evident that the largest change is produced by the clock with 190 stages ( $clk_{190}$ ). This clock tree stretches the clock period up to 561 ps (61 ps of extra clock period) but compresses it by about 35 ps while the drop is recovering. The next largest stretching corresponds to $clk_{100}$, with about 40 ps of stretching and 25 ps of compression. Finally, $clk_{30}$ has a very short clock path, which makes the clock period modulation effect weaker.

Regarding path timing, the slack changes with the clock path length. The 190-stage clock shows the smallest slack change thanks to the CDC effect: its slack drops to as low as -50 ps, which is better than the -75 ps slack of the most affected path, the one driven by a clock tree without any compensation. The longer the clock path, the smaller the slack reduction in the circuit.

The slack shows its largest change during the first-order drop; in the subsequent drops the slack still varies, but less than in the first drop. For example, for the second-order voltage drop, the slack of the circuit without any compensation falls to as low as -10 ps, causing a new failure in the circuit. For the 190-stage clock, the slack remains positive thanks to the compensation, avoiding any failure in the circuit.

Fig. 4 shows the behavior of the error signal. For the $clk_{190}$ path, the circuit does not fail until the voltage reaches its lowest value, whereas in the remaining cases (with smaller clock trees) the circuit fails earlier. When there is not enough compensation, as in $clk_{30}$, the circuit tends to fail earlier, at a higher voltage level (15 mV higher). The paths with no compensation or with $clk_{30}$ fail even in the second-order drop, which has a smaller voltage drop compared to the first-order drop.



Fig. 2. High level schematic for DUT built with a clock tree, launching and receiving flip-flops, a tunable data path and error signal generation.



Fig. 3. Impact of voltage drop on clock period: (a) voltage supply; (b) clock period measurement for different clock lengths.



Fig. 4. Error signal for 30, 100 and 190 clock buffers in the clock tree. Error signal occurs later when having more CDC effect.

## III. PROPOSED VOLTAGE DROP MITIGATION SCHEME

The proposed solution is depicted in Fig. 5. It is assumed that the system has a load, which can be a certain part of a chip that experiences fluctuations in the supply voltage due to changes in the surrounding circuitry. The variable load represents these changes through an increase or decrease in current consumption. The voltage drop monitor (VDM) senses the supply voltage level and captures these fluctuations. Based on a voltage threshold and the supply voltage, it sends information to the voltage controller, which can vary the resistance seen by the power supply and inject more charge into the system.

The VDM (Fig. 6) constantly checks whether the nearby $V_{CC\_exp}$ value crosses a programmable drop threshold. If so, the associated drop controller turns on the local power gate (PG) block to inject charge locally from a clean power supply ( $V_{CC\_hi}$ ) into $V_{CC\_exp}$ and restore the voltage quickly (very similar to [2]). The VDM features a time-to-digital converter (TDC), which is used to measure the voltage drop in the system. The tunable-length delay block provides an offset for the TDC, in case timing issues between the clock and the TDC need to be overcome when testing the circuit. However, the delay to be measured cannot be greater than the offset plus the resolution of the TDC; in that case, either the offset must be decreased or more stages must be added to the VDM. The threshold selector allows the voltage threshold to be adjusted externally (assuming this technique gets prototyped), depending on the timing slack of the critical path, by changing the input code of the controller.
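
How a delay-line monitor can turn a supply drop into a digital code may be sketched as follows. This is a generic model and not the circuit of Fig. 6: it assumes that an edge first crosses the offset delay and then advances through identical TDC stages during a fixed measurement window, that all delays scale linearly with the fractional supply drop, and that the output is a thermometer code with one bit per stage reached. The function name and every parameter value are hypothetical.

```python
def vdm_code(drop_frac, window_ps, offset_ps, stage_ps, n_stages, sensitivity=1.0):
    """Thermometer code of a generic delay-line monitor (illustrative model).

    A lower supply slows down both the offset block and the TDC stages, so
    fewer stages are reached within the window and the code has fewer ones.
    """
    scale = 1.0 + sensitivity * drop_frac          # assumed linear slowdown
    remaining_ps = window_ps - offset_ps * scale   # time left after the offset
    if remaining_ps <= 0:
        reached = 0                                # offset too long: code underflows
    else:
        reached = int(remaining_ps // (stage_ps * scale))
    reached = max(0, min(n_stages, reached))       # saturates if stages run out
    return "1" * reached + "0" * (n_stages - reached)


# Hypothetical configuration: 500 ps window, 200 ps offset, 16 stages of 20 ps
nominal = vdm_code(0.00, 500.0, 200.0, 20.0, 16)
drooped = vdm_code(0.10, 500.0, 200.0, 20.0, 16)
print(nominal, drooped)
```

In this model, an all-zero code suggests that the offset is too long for the window, while an all-one code suggests that more stages are needed, which is one possible reading of the trade-off between offset and number of stages described above.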

The voltage controller manages the current injection based on a user-defined threshold, the VDM code (a thermometer code coming from the TDC), and the previous value of that code. It can determine whether or not the system is experiencing a voltage drop. Since both the present and past values of the voltage are inputs to the controller, it can also determine whether the voltage is falling or rising.

When the voltage drop exceeds the threshold and the voltage is still falling, the voltage controller injects more current into the system through the current injection block. On the other hand, if the voltage drop exceeds the threshold but the voltage is rising, the voltage is recovering, so the controller turns off the current injection block in order to save power.
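
The decision rule described above can be written as a small state update. This is a minimal sketch under assumptions that are not stated in the text: the thermometer code is taken to have more ones at higher supply voltage, the threshold is expressed as a code level, and the controller holds its previous state when the code does not change. The function names are hypothetical.

```python
def level(code):
    """Decode a thermometer code such as '111100' into its level (number of ones)."""
    return code.count("1")


def next_injection_state(prev_code, curr_code, threshold_level, injecting):
    """Return True if the current injection block should be on after this sample."""
    prev, curr = level(prev_code), level(curr_code)
    if curr >= threshold_level:
        return False       # the drop has not crossed the threshold
    if curr < prev:
        return True        # beyond the threshold and still falling: inject
    if curr > prev:
        return False       # beyond the threshold but recovering: save power
    return injecting       # unchanged code: hold the state (assumption)


# Hypothetical code sequence for one drop event, threshold at level 5
codes = ["111111", "111110", "111100", "111000", "111100", "111110", "111111"]
state = False
history = []
for prev_code, curr_code in zip(codes, codes[1:]):
    state = next_injection_state(prev_code, curr_code, 5, state)
    history.append(state)
print(history)
```

In this sequence the injection turns on only while the code is below the threshold and still decreasing, and turns off as soon as the code starts to rise again.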

The VDM allows multi-threshold operation plus voltage drop direction detection, so that the controller can react depending on the voltage level. For the VDM to perform well, its clock source needs to be isolated from the system clock; otherwise, the VDM will be affected by the CDC effect, producing an inaccurate code while a voltage drop is occurring. The proposed VDM comes with an area overhead of 360 $\mu$m<sup>2</sup> for the studied case.

Fig. 7 shows how the PG size can be swept to find the minimum size and avoid overdesigning the mitigation scheme. Having more PGs than necessary translates into extra area and leakage power consumption. In the evaluated scenario, the minimum number of PGs needed to inject enough charge into the system to start recovering the voltage was six.
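
The sweep can be summarized as a search for the smallest PG count that passes a recovery check. In the sketch below the check is a hypothetical callback standing in for a circuit simulation run; the stand-in used in the example simply encodes the value of six reported above and is not a model of the circuit.

```python
from typing import Callable, Optional


def minimum_pg_count(recovers: Callable[[int], bool], max_pgs: int) -> Optional[int]:
    """Return the smallest number of PGs for which the voltage starts recovering.

    `recovers(n)` is expected to run the evaluation with n power gates enabled
    and report whether the injected charge is sufficient.
    """
    for n in range(1, max_pgs + 1):
        if recovers(n):
            return n
    return None  # no configuration within the swept range was sufficient


# Stand-in for the simulation result, using the threshold reported in the text
def stand_in_check(n: int) -> bool:
    return n >= 6


best = minimum_pg_count(stand_in_check, max_pgs=16)
print(best)
```

A linear sweep is used here because each evaluation is independent and the smallest passing size is the one of interest, since every additional PG adds area and leakage.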



Fig. 5. Block diagram for the regulated system with a voltage comparator, voltage controller, and current regulation block in a variable load system with a power grid model.



Fig. 6. Voltage Drop Monitor (VDM) block diagram.



Fig. 7. PG size sweep for the current regulation block, used to find the minimum number of PGs needed to start injecting enough charge into the circuit to overcome the voltage drop.

## IV. RESULTS

The results in this section are based on post-layout simulations in which a non-regulated system is compared to the proposed technique. The voltage drop tolerance (guard band reduction) of both circuits is analyzed, which translates into a higher tolerance to an extra current variation event. The performance improvement is also shown by subjecting both systems to the same current variation event and finding their maximum operating frequency. This comparison was performed for different timing paths, where the voltage threshold can be adjusted according to the most critical path. The fast path corresponds to 72% of the clock period, the medium path to 78%, and the slow path to 90%.

Table I summarizes the iso-frequency test for all the different paths, reporting the amount of extra current variation ( $\Delta I$ ) the system can tolerate with the proposed technique and with the baseline, which is the non-regulated (NR) system. Table II summarizes the $f_{max}$ improvement under iso-current variation conditions, obtained by finding the maximum operating frequency of the NR system when the current variation equals the maximum $\Delta I$ tolerated by the proposed technique at 645 MHz ( $f_{max}$ for the proposed technique).

TABLE I ISO-FREQUENCY TEST SUMMARY

| Data path type | NR system max ΔI | Proposed technique max ΔI | Extra current tolerance improvement |
|----------------|------------------|---------------------------|-------------------------------------|
| Slow           | 40 µA (10%)      | 80 µA (20%)               | 10%                                 |
| Medium         | 100 µA (25%)     | 165 µA (41%)              | 16%                                 |
| Fast           | 170 µA (42.5%)   | 300 µA (75%)              | 32.5%                               |
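
The percentages in Table I can be reproduced with a short calculation. The reference current is not stated in the text; the 400 µA used below is inferred from the first row, where 40 µA is listed as 10%, and should be read as an assumption. Under that assumption, the last column is the difference between the two percentages.

```python
BASE_UA = 400.0  # inferred reference current (assumption): 40 uA is listed as 10%

# (NR system max delta-I, proposed technique max delta-I) in microamperes, from Table I
ROWS = {"Slow": (40.0, 80.0), "Medium": (100.0, 165.0), "Fast": (170.0, 300.0)}


def tolerance_row(nr_ua, proposed_ua, base_ua=BASE_UA):
    """Return (NR %, proposed %, improvement in percentage points)."""
    nr_pct = 100.0 * nr_ua / base_ua
    proposed_pct = 100.0 * proposed_ua / base_ua
    return nr_pct, proposed_pct, proposed_pct - nr_pct


for path, (nr_ua, proposed_ua) in ROWS.items():
    nr_pct, proposed_pct, gain = tolerance_row(nr_ua, proposed_ua)
    print(f"{path:6s} NR {nr_pct:5.2f}%  proposed {proposed_pct:5.2f}%  gain {gain:5.2f} points")
```

For the medium path this calculation gives 41.25% and 16.25 points, which the table lists as 41% and 16%.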

TABLE II ISO-CURRENT VARIATION TEST SUMMARY

| Data path type | Max ΔI tolerated by proposed technique @ 645 MHz | $f_{max}$ in NR system under the same current variation | Frequency improvement |
|----------------|--------------------------------------------------|---------------------------------------------------------|-----------------------|
| Slow           | 80 µA                                            | 605 MHz                                                 | 6.2%                  |
| Medium         | 165 µA                                           | 605 MHz                                                 | 6.2%                  |
| Fast           | 300 µA                                           | 500 MHz                                                 | 22.4%                 |
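
The text does not state how the frequency improvement in Table II is normalized. The listed values appear consistent with normalizing the frequency difference to the 645 MHz of the proposed technique; this is an inference, and the sketch below also computes the alternative normalization to the NR frequency for comparison. The function names are hypothetical.

```python
F_PROPOSED_MHZ = 645.0
NR_FMAX_MHZ = {"Slow": 605.0, "Medium": 605.0, "Fast": 500.0}  # from Table II


def improvement_vs_proposed(f_proposed, f_nr):
    """Frequency difference as a percentage of the proposed-technique frequency."""
    return 100.0 * (f_proposed - f_nr) / f_proposed


def improvement_vs_baseline(f_proposed, f_nr):
    """Frequency difference as a percentage of the NR (baseline) frequency."""
    return 100.0 * (f_proposed - f_nr) / f_nr


for path, f_nr in NR_FMAX_MHZ.items():
    a = improvement_vs_proposed(F_PROPOSED_MHZ, f_nr)
    b = improvement_vs_baseline(F_PROPOSED_MHZ, f_nr)
    print(f"{path:6s} vs proposed: {a:5.2f}%  vs baseline: {b:5.2f}%")
```

Normalizing to 645 MHz gives approximately 6.20% and 22.48%, close to the 6.2% and 22.4% listed in the table, whereas normalizing to the NR frequency gives approximately 6.61% and 29.00%.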

As shown above, the proposed scheme increases the voltage drop tolerance of the system. Incorporating such a scheme adds some complexity to the power management portion of the design, plus an area overhead. In the test case, the mitigation scheme area is larger than that of the critical data path, but in a more realistic scenario the data path is more complex and should occupy much more area. Another consideration when incorporating this solution is the maximum variation frequency of the load, since the response speed of the monitor limits the applicability of the proposed solution. In the evaluated VDM configuration, the operating frequency cannot exceed 770 MHz.

## V. CONCLUSIONS

This work proposed an alternative to enhance voltage drop tolerance in digital circuits by exploiting the CDC effect, using a second-order power grid model. An in-situ, adjustable-threshold voltage drop monitor (VDM) was developed to detect and react to the voltage drop quickly. The post-layout simulations show 10% (slow path), 16% (medium delay path), and 32.5% (fast path) extra current tolerance improvement in the circuit using the proposed scheme compared to a circuit without it. Additionally, they show 6.2% (slow path), 6.2% (medium delay path), and 22.4% (fast path) frequency improvement in the circuit using the voltage drop mitigation scheme in comparison to the reference case without it.

The response of the VDM is limited by its maximum operating frequency, which was 770 MHz in this study. This limits the reaction of the voltage controller when the load frequency is much higher than this value, as well as its response at low voltages, since the voltage code can be misread, potentially causing incorrect behavior of the voltage controller.

## REFERENCES

- [1] K. A. Bowman, C. Tokunaga, T. Karnik, V. K. De, and J. W. Tschanz, "A 22nm dynamically adaptive clock distribution for voltage drop tolerance," in 2012 Symposium on VLSI Circuits (VLSIC), vol. 48, no. 4, pp. 94–95.
- [2] H. Mair et al., "A 10nm FinFET 2.8GHz Tri-gear Decacore CPU complex with optimized power-delivery network for mobile SoC performance," in Proc. IEEE ISSCC, San Francisco, CA, USA, Feb. 2017, pp. 56–5.
- [3] J. Tschanz et al., "Adaptive frequency and biasing techniques for tolerance to dynamic temperature-voltage variations and aging", IEEE ISSCC Tech. Papers 2007.
- [4] N. Kurd, P. Mosalikanti, M. Neidengard, J. Douglas, and R. Kumar, "Next generation Intel core micro-architecture (Nehalem) clocking," IEEE J. Solid-State Circuits, vol. 44, no. 4, pp. 1121–1129, Apr. 2009.
- [5] S. Das et al., "A self-tuning DVS processor using delay-error detection and correction," IEEE J. Solid-State Circuits, pp. 792–804, Apr. 2006.
- [6] S. Das et al., "Razor II: In situ error detection and correction for PVT and SER tolerance," IEEE J. Solid-State Circuits, pp. 32–48, Jan. 2009.
- [7] K. A. Bowman et al., "Energy-efficient and metastability-immune resilient circuits for dynamic variation tolerance," IEEE J. Solid-State Circuits, pp. 49–63, Jan. 2009.
- [8] K. A. Bowman et al., "A 45 nm resilient microprocessor core for dynamic variation tolerance," IEEE J. Solid-State Circuits, Jan. 2011.
- [9] K. L. Wong, T. Rahal-Arabi, M. Ma, and G. Taylor, "Enhancing microprocessor immunity to power supply noise with clock-data compensation," IEEE J. Solid-State Circuits, no. 4, pp. 749–758, 2006.
- [10] D. Jiao, J. Gu, P. Jain, and C. Kim, "Enhancing beneficial jitter using phase-shifted clock distribution," in Proc. IEEE Int. Symp. Low Power Electronics and Design (ISLPED), Aug. 2008, pp. 21–26.
- [11] A. Malavasi, "Voltage Drop Mitigation by Adaptive Voltage Scaling using Clock-Data Compensation," M.Sc. dissertation, Instituto Tecnológico de Costa Rica, 2019.