Friday, May 4, 2018

V8 release v6.7

Every six weeks, we create a new branch of V8 as part of our release process. Each version is branched from V8’s Git master immediately before a Chrome Beta milestone. Today we’re pleased to announce our newest branch, V8 version 6.7, which is in beta until its release in coordination with Chrome 67 Stable in several weeks. V8 v6.7 is filled with all sorts of developer-facing goodies. This post provides a preview of some of the highlights in anticipation of the release.

JavaScript language features

V8 v6.7 ships with BigInt support enabled by default. BigInts are a new numeric primitive in JavaScript that can represent integers with arbitrary precision. Read the Web Fundamentals article on BigInt for more info on how BigInts can be used in JavaScript, and check out our write-up with more details about the V8 implementation.

Untrusted code mitigations

In V8 v6.7 we’ve landed more mitigations for side-channel vulnerabilities to prevent information leaks to untrusted JavaScript and WebAssembly code.


Please use git log branch-heads/6.6..branch-heads/6.7 include/v8.h to get a list of the API changes.

Developers with an active V8 checkout can use git checkout -b 6.7 -t branch-heads/6.7 to experiment with the new features in V8 v6.7. Alternatively you can subscribe to Chrome’s Beta channel and try the new features out yourself soon.

Posted by the V8 team

Wednesday, May 2, 2018

Adding BigInts to V8

Over the past couple of months, we have implemented support for BigInts in V8, as currently specified by this proposal, to be included in a future version of ECMAScript. The following post tells the story of our adventures.


As a JavaScript programmer, you now¹ have integers with arbitrary² precision in your toolbox:

const a = 2172141653n;
const b = 15346349309n;
a * b;
// → 33334444555566667777n     // Yay!
Number(a) * Number(b);
// → 33334444555566670000      // Boo!
const such_many = 2n ** 222n;
// → 6739986666787659948666753771754907668409286105635143120275902562304n

For details about the new functionality and how it could be used, refer to our in-depth Web Fundamentals article on BigInt. We are looking forward to seeing the awesome things you’ll build with them!

1 Now if you run Chrome Beta, Dev, or Canary, or a preview Node.js version, otherwise soon (Chrome 67, Node.js master probably around the same time).

2 Arbitrary up to an implementation-defined limit. Sorry, we haven’t yet figured out how to squeeze an infinite amount of data into your computer’s finite amount of memory.

Representing BigInts in memory

Typically, computers store integers in their CPU’s registers (which nowadays are usually 32 or 64 bits wide), or in register-sized chunks of memory. This leads to the minimum and maximum values you might be familiar with. For example, a 32-bit signed integer can hold values from -2,147,483,648 to 2,147,483,647. The idea of BigInts, however, is to not be restricted by such limits.

So how can one store a BigInt with a hundred, or a thousand, or a million bits? It can’t fit in a register, so we allocate an object in memory. We make it large enough to hold all the BigInt’s bits, in a series of chunks, which we call “digits”, because this is conceptually very similar to how one can write bigger numbers than “9” by using more digits, like in “10”; except where the decimal system uses digits from 0 to 9, our BigInts use digits from 0 to 4294967295 (i.e. 2**32-1). That’s the value range of a 32-bit CPU register³, without a sign bit; we store the sign bit separately. In pseudo-code, a BigInt object with 3*32 = 96 bits looks like this:

{
  type: 'BigInt',
  sign: 0,
  num_digits: 3,
  digits: [0x12…, 0x34…, 0x56…],
}

3 On 64-bit machines, we use 64-bit digits, i.e. from 0 to 18446744073709551615 (i.e. 2n**64n-1n).
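To make the digit layout concrete, here is a minimal sketch that splits a BigInt into 32-bit digits plus a separate sign, mirroring the pseudo-code above. The helper names `toDigits` and `fromDigits` are made up for this illustration, and storing the least significant digit first is an assumption; this is not V8's actual C++ code.

```javascript
// Illustration only: `toDigits` and `fromDigits` are hypothetical helpers,
// not part of V8 or of the BigInt proposal.
const DIGIT_BITS = 32n;
const DIGIT_MASK = (1n << DIGIT_BITS) - 1n; // 4294967295n

function toDigits(value) {
  const sign = value < 0n ? 1 : 0;
  let magnitude = sign ? -value : value;
  const digits = [];
  while (magnitude > 0n) {
    digits.push(magnitude & DIGIT_MASK); // lowest 32 bits
    magnitude >>= DIGIT_BITS;
  }
  return { type: 'BigInt', sign, num_digits: digits.length, digits };
}

function fromDigits({ sign, digits }) {
  let magnitude = 0n;
  // Walk from the most significant digit down to the least significant one.
  for (let i = digits.length - 1; i >= 0; i--) {
    magnitude = (magnitude << DIGIT_BITS) | digits[i];
  }
  return sign ? -magnitude : magnitude;
}

const parts = toDigits(-(2n ** 64n));
// parts.sign is 1 and parts.digits is [0n, 0n, 1n]
```

The sign lives outside the digits, so negating a value only flips `sign` and leaves the digit array untouched.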

Back to school, and back to Knuth

Working with integers kept in CPU registers is really easy: to e.g. multiply two of them, there’s a machine instruction which software can use to tell the CPU “multiply the contents of these two registers!”, and the CPU will do it. For BigInt arithmetic, we have to come up with our own solution. Thankfully this particular task is something that quite literally every child at some point learns how to solve: remember what you did back in school when you had to multiply 345 * 678 and weren’t allowed to use a calculator?

345 * 678
---------
     30    //   5 * 6
+   24     //  4  * 6
+  18      // 3   * 6
+     35   //   5 *  7
+    28    //  4  *  7
+   21     // 3   *  7
+      40  //   5 *   8
+     32   //  4  *   8
+    24    // 3   *   8
=========
   233910

That’s exactly how V8 multiplies BigInts: one digit at a time, adding up the intermediate results. The algorithm works just as well for 0 to 9 as it does for a BigInt’s much bigger digits.

Donald Knuth published a specific implementation of multiplication and division of large numbers made up of smaller chunks in Volume 2 of his classic The Art of Computer Programming, all the way back in 1969. V8’s implementation follows this book, which shows that this is a pretty timeless piece of computer science.
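The schoolbook method can be sketched in a few lines of JavaScript. This is an illustration under stated assumptions, not V8's implementation: `multiplyDigits` is a hypothetical helper, digits are stored least significant first, and the base is kept small so that plain Numbers can hold every intermediate product exactly.

```javascript
// Illustration only: `multiplyDigits` is a hypothetical helper.
// Digits are stored least-significant first, e.g. 345 is [5, 4, 3].
function multiplyDigits(a, b, base = 10) {
  const result = new Array(a.length + b.length).fill(0);
  for (let i = 0; i < a.length; i++) {
    let carry = 0;
    for (let j = 0; j < b.length; j++) {
      // One single-digit product plus whatever is already in this column.
      const sum = result[i + j] + a[i] * b[j] + carry;
      result[i + j] = sum % base;
      carry = Math.floor(sum / base);
    }
    result[i + b.length] += carry;
  }
  // Drop leading zeros (they sit at the end of the array).
  while (result.length > 1 && result[result.length - 1] === 0) result.pop();
  return result;
}

// 345 * 678, written least-significant digit first:
const product = multiplyDigits([5, 4, 3], [8, 7, 6]);
// → [0, 1, 9, 3, 3, 2], i.e. 233910
```

With 32-bit or 64-bit digits the structure stays the same, but each single-digit product needs twice the digit width, which is why this sketch sticks to a small base.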

“Less desugaring” == more sweets?

Perhaps surprisingly, we had to spend quite a bit of effort on getting seemingly simple unary operations, like -x, to work. So far, -x did exactly the same as x * (-1), so to simplify things, V8 applied precisely this replacement as early as possible when processing JavaScript, namely in the parser. This approach is called “desugaring”, because it treats an expression like -x as “syntactic sugar” for x * (-1). Other components (the interpreter, the compiler, the entire runtime system) didn’t even need to know what a unary operation is, because they only ever saw the multiplication, which of course they must support anyway.

With BigInts, however, this implementation suddenly becomes invalid, because multiplying a BigInt with a Number (like -1) must throw a TypeError⁴. The parser would have to desugar -x to x * (-1n) if x is a BigInt, but the parser has no way of knowing what x will evaluate to. So we had to stop relying on this early desugaring, and instead add proper support for unary operations on both Numbers and BigInts everywhere.

4 Mixing BigInt and Number operand types is generally not allowed. That’s somewhat unusual for JavaScript, but there is an explanation for this decision.
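A short example shows why the old desugaring no longer works. The variable names are only for illustration:

```javascript
const x = 5n;
const negated = -x;          // → -5n: unary minus works on BigInts

let mixedError = null;
try {
  x * -1;                    // BigInt * Number, what the old desugaring produced
} catch (error) {
  mixedError = error;        // a TypeError
}

const explicit = x * -1n;    // → -5n: both operands are BigInts
```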

A bit of fun with bitwise ops

Most computer systems in use today store signed integers using a neat trick called “two’s complement”, which has the nice properties that the first bit indicates the sign, and adding 1 to the bit pattern always increments the number by 1, taking care of the sign bit automatically. For example, for 8-bit integers:

  • 10000000 is -128, the lowest representable number,
  • 10000001 is -127,
  • 11111111 is -1,
  • 00000000 is 0,
  • 00000001 is 1,
  • 01111111 is 127, the highest representable number.

This encoding is so common that many programmers expect it and rely on it, and the BigInt specification reflects this fact by prescribing that BigInts must act as if they used two’s complement representation. As described above, V8’s BigInts don’t!

To perform bitwise operations according to spec, our BigInts therefore must pretend to be using two’s complement under the hood. For positive values, it doesn’t make a difference, but negative numbers must do extra work to accomplish this. That has the somewhat surprising effect that a & b, if a and b are both negative BigInts, actually performs four steps (as opposed to just one if they were both positive): both inputs are converted to fake-two’s-complement format, then the actual operation is done, then the result is converted back to our real representation. Why the back-and-forth, you might ask? Because all the non-bitwise operations are much easier that way.
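Here is one possible way to spell out those four steps, working only with the magnitudes of two negative BigInts. It is a sketch of the idea, not V8's code, and `andNegative` is a hypothetical helper. It relies on the fact that in two's complement, -m has the same bit pattern as ~(m - 1).

```javascript
// Illustration only: `andNegative` is a hypothetical helper. It takes the
// magnitudes of two negative BigInts and returns the magnitude of (-a) & (-b).
function andNegative(aMagnitude, bMagnitude) {
  // Steps 1 and 2: in two's complement, -m has the bit pattern ~(m - 1).
  const aBits = aMagnitude - 1n;
  const bBits = bMagnitude - 1n;
  // Step 3: ~p & ~q equals ~(p | q), so the AND turns into an OR of
  // non-negative values.
  const combined = aBits | bBits;
  // Step 4: ~combined equals -(combined + 1), so the magnitude of the
  // negative result is combined + 1.
  return combined + 1n;
}

const viaMagnitudes = -andNegative(6n, 10n);
const builtIn = -6n & -10n;
// Both are -14n.
```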

Two new types of TypedArrays

The BigInt proposal includes two new TypedArray flavors: BigInt64Array and BigUint64Array. We can have TypedArrays with 64-bit wide integer elements now that BigInts provide a natural way to read and write all the bits in those elements, whereas if one tried to use Numbers for that, some bits might get lost. That’s why the new arrays aren’t quite like the existing 8/16/32-bit integer TypedArrays: accessing their elements is always done with BigInts; trying to use Numbers throws an exception.

> const big_array = new BigInt64Array(1);
> big_array[0] = 123n;  // OK
> big_array[0]
123n
> big_array[0] = 456;
TypeError: Cannot convert 456 to a BigInt
> big_array[0] = BigInt(456);  // OK

Just as JavaScript code working with these types of arrays looks and works a bit differently from traditional TypedArray code, we had to generalize our TypedArray implementation to behave differently for the two newcomers.
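The following sketch shows what "all the bits" means in practice. The variable names are only for illustration:

```javascript
const signed = new BigInt64Array(1);
const unsigned = new BigUint64Array(1);

// All 64 bits survive a round trip.
unsigned[0] = 2n ** 64n - 1n;
const maxUnsigned = unsigned[0];      // → 18446744073709551615n

// Values outside the element range wrap around modulo 2**64.
signed[0] = 2n ** 63n;
const wrapped = signed[0];            // → -9223372036854775808n

// A Number cannot hold that many bits exactly.
const lossy = Number(maxUnsigned);    // → 18446744073709552000
```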

Optimization considerations

For now, we are shipping a baseline implementation of BigInts. It is functionally complete and should provide solid performance (a little bit faster than existing userland libraries), but it is not particularly optimized. The reason is that, in line with our aim to prioritize real-world applications over artificial benchmarks, we first want to see how you will use BigInts, so that we can then optimize precisely the cases you care about!

For example, if we see that relatively small BigInts (up to 64 bits) are an important use case, we could make those more memory-efficient by using a special representation for them:

{
  type: 'BigInt-Int64',
  value: 0x12…,
}

One of the details that remain to be seen is whether we should do this for “int64” value ranges, “uint64” ranges, or both — keeping in mind having to support fewer fast paths means that we can ship them sooner, and also that every additional fast path ironically makes everything else a bit slower, because affected operations always have to check whether it is applicable.
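From the JavaScript side, the two candidate ranges can be told apart with `BigInt.asIntN` and `BigInt.asUintN`, which are part of the BigInt proposal. The helpers `fitsInt64` and `fitsUint64` below are hypothetical names for this sketch and say nothing about how V8 would implement such a check internally.

```javascript
// Illustration only: hypothetical range checks, not V8 internals.
const fitsInt64 = (value) => BigInt.asIntN(64, value) === value;
const fitsUint64 = (value) => BigInt.asUintN(64, value) === value;

const checks = [
  fitsInt64(-1n),        // true:  -1 is a valid int64
  fitsUint64(-1n),       // false: uint64 has no negative values
  fitsInt64(2n ** 63n),  // false: one past the largest int64
  fitsUint64(2n ** 63n), // true:  well within uint64
];
```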

Another story is support for BigInts in the optimizing compiler. For computationally heavy applications operating on 64-bit values and running on 64-bit hardware, keeping those values in registers would be much more efficient than allocating them as objects on the heap as we currently do. We have plans for how we would implement such support, but it is another case where we would first like to find out whether that is really what you, our users, care about the most; or whether we should spend our time on something else instead.

Please send us feedback on what you’re using BigInts for, and any issues you encounter! You can reach us at our bug tracker, via mail, or via @v8js on Twitter.

Posted by Jakob Kummerow, arbitrator of precision

Tuesday, April 24, 2018

Improved code caching

V8 uses code caching to cache the generated code for frequently-used scripts. Starting with Chrome 66, we are caching more code by generating the cache after top-level execution. This leads to a 20-40% reduction in parse and compilation time during the initial load.


V8 uses two kinds of code caching to cache generated code to be reused later. The first is the in-memory cache that is available within each instance of V8. The code generated after the initial compile is stored into this cache, keyed on the source string. This is available for reuse within the same instance of V8. The other kind of code caching serializes the generated code and stores it on disk for future use. This cache is not specific to a particular instance of V8 and can be used across different instances of V8. This blog post focuses on this second kind of code caching as used in Chrome. (Other embedders also use this kind of code caching; it’s not limited to Chrome. However, this blog post only focuses on the usage in Chrome.)

Chrome stores the serialized generated code onto the disk cache and keys it with the URL of the script resource. When loading a script, Chrome checks the disk cache. If the script is already cached, Chrome passes the serialized data to V8 as a part of compile request. V8 then deserializes this data instead of parsing and compiling the script. There are also additional checks involved to ensure that the code is still usable (for example: a version mismatch makes the cached data unusable).
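The same produce-then-consume flow can be tried out in Node.js, another V8 embedder. The sketch below uses Node's `vm.Script` API (`createCachedData()` needs Node.js 10.6 or later); it illustrates the idea and is not how Chrome wires up its disk cache. The file name `square.js` is a placeholder.

```javascript
const vm = require('vm');

const source = 'function square(x) { return x * x; } square(7);';

// First load: compile from source and serialize the generated code.
const first = new vm.Script(source, { filename: 'square.js' });
const cache = first.createCachedData(); // a Buffer an embedder could store on disk

// Later load: hand the serialized data back to V8 with the same source.
const second = new vm.Script(source, { filename: 'square.js', cachedData: cache });

// V8 checks whether the data is still usable and reports the outcome.
const rejected = second.cachedDataRejected; // boolean
const result = second.runInNewContext();    // → 49
```

If `rejected` is true, V8 has fallen back to compiling from source, so the script runs either way.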

Real-world data shows that the code cache hit rate (for scripts that could be cached) is high (~86%). Though the hit rate is high for these scripts, the amount of code we cache per script is not very high. Our analysis showed that increasing the amount of code that is cached would reduce the time spent in parsing and compiling JavaScript code by around 40%.

Increasing the amount of code that is cached

In the previous approach, code caching was coupled with the requests to compile the script.

Embedders could request that V8 serialize the code it generated during its top-level compilation of a new JavaScript source file. V8 returned the serialized code after compiling the script. When Chrome requests the same script again, V8 fetches the serialized code from the cache and deserializes it. V8 completely avoids recompiling functions that are already in the cache. These scenarios are shown in the following figure:

V8 only compiles the functions that are expected to be immediately executed (IIFEs) during the top-level compile and marks other functions for lazy compilation. This helps improve page load times by avoiding compiling functions that are not required. However, it also means that the serialized data only contains the code for the functions that are eagerly compiled.

Prior to Chrome 59, we had to generate the code cache before any execution had started. The earlier baseline compiler of V8 (Full-codegen) generated specialized code for the execution context. Full-codegen used code patching to fast-path operations for the specific execution context. Such code cannot easily be stripped of its context-specific data and serialized for use in other execution contexts.

With the launch of Ignition in Chrome 59, this restriction is no longer necessary. Ignition uses data-driven inline caches to fast-path operations in the current execution context. The context-dependent data is stored in feedback vectors and is separate from the generated code. This has opened the possibility of generating code caches even after the execution of the script. As we execute the script, more functions (that were marked for lazy compile) are compiled, allowing us to cache more code.

V8 exposes a new API, ScriptCompiler::CreateCodeCache, to request code caches independent of the compile requests. Requesting code caches along with compile requests is deprecated and would not work in V8 v6.6 onwards. Since version 66, Chrome uses this API to request the code cache after the top-level execute. The following figure shows the new scenario of requesting the code cache. The code cache is requested after the top level execute and hence contains the code for functions that were compiled later during the execution of the script. In the later runs (shown as hot runs in the following figure), it avoids compilation of functions during top level execute.
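Node.js exposes a similar capability through `script.createCachedData()` on `vm.Script`, which can be called both before and after a script has run. The sketch below uses that Node.js API to illustrate the difference between the two moments; it does not show `ScriptCompiler::CreateCodeCache` itself or Chrome's integration.

```javascript
const vm = require('vm');

const script = new vm.Script(`
  function add(a, b) { return a + b; }
  add(1, 2);
`);

// Requested right after the top-level compile: `add` has not run yet.
const cacheBeforeRun = script.createCachedData();

const sum = script.runInNewContext(); // → 3

// Requested after execution: functions that were compiled lazily while the
// script ran can now be part of the cache.
const cacheAfterRun = script.createCachedData();
```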


The performance of this feature is measured using our internal real-world benchmarks. The following graph shows the reduction in the parse and compile time over the earlier caching scheme. There is a reduction of around 20–40% in both parse and compilation time on most of the pages.

Data from the wild shows similar results with a 20–40% reduction in the time spent in compiling JavaScript code both on desktop and mobile. On Android, this optimization also translates to a 1–2% reduction in the top-level page-load metrics like the time a webpage takes to become interactive. We also monitored the memory and disk usage of Chrome and did not see any noticeable regressions.

Posted by Mythri Alle, Chief Code Cacher

Tuesday, March 27, 2018

V8 release v6.6

Every six weeks, we create a new branch of V8 as part of our release process. Each version is branched from V8’s Git master immediately before a Chrome Beta milestone. Today we’re pleased to announce our newest branch, V8 version 6.6, which is in beta until its release in coordination with Chrome 66 Stable in several weeks. V8 v6.6 is filled with all sorts of developer-facing goodies. This post provides a preview of some of the highlights in anticipation of the release.

JavaScript language features

Function.prototype.toString() now returns exact slices of source code text, including whitespace and comments. Here’s an example comparing the old and the new behavior:

// Note the comment between the `function` keyword
// and the function name, as well as the space following
// the function name.
function /* a comment */ foo () {}

// Previously:
foo.toString();
// → 'function foo() {}'
//             ^ no comment
//                ^ no space

// Now:
foo.toString();
// → 'function /* a comment */ foo () {}'

Line separator (U+2028) and paragraph separator (U+2029) symbols are now allowed in string literals, matching JSON. Previously, these symbols were treated as line terminators within string literals, and so using them resulted in a SyntaxError exception.
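A quick way to see the change is to evaluate source text that contains a raw U+2028 character inside a string literal. The variable names are only for illustration:

```javascript
// "\u2028" in the outer literal is an escape sequence; the evaluated source
// text contains the raw U+2028 character between the inner quotes.
const sourceText = "'\u2028'";
const value = eval(sourceText); // threw a SyntaxError before this change

// JSON has always accepted the raw character inside strings.
const fromJson = JSON.parse('"\u2028"');
```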

The catch clause of try statements can now be used without a parameter. This is useful if you don’t have a need for the exception object in the code that handles the exception.

try {
  doSomethingThatMightThrow(); // hypothetical function
} catch { // → Look mom, no binding!
}

In addition to String.prototype.trim(), V8 now implements String.prototype.trimStart() and String.prototype.trimEnd(). This functionality was previously available through the non-standard trimLeft() and trimRight() methods, which remain as aliases of the new methods for backward compatibility.

const string = '  hello world  ';
string.trimStart();
// → 'hello world  '
string.trimEnd();
// → '  hello world'
string.trim();
// → 'hello world'

The Array.prototype.values() method gives arrays the same iteration interface as the ES2015 Map and Set collections: all can now be iterated over by keys, values, or entries by calling the same-named method. This change has the potential to be incompatible with existing JavaScript code. If you discover odd or broken behavior on a website, please try to disable this feature via chrome://flags/#enable-array-prototype-values and file an issue.
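For example, arrays and Maps can now be iterated in the same three ways:

```javascript
const array = ['a', 'b'];
const map = new Map([['x', 1], ['y', 2]]);

const arrayKeys = [...array.keys()];        // → [0, 1]
const arrayValues = [...array.values()];    // → ['a', 'b']
const arrayEntries = [...array.entries()];  // → [[0, 'a'], [1, 'b']]

const mapKeys = [...map.keys()];            // → ['x', 'y']
const mapValues = [...map.values()];        // → [1, 2]
const mapEntries = [...map.entries()];      // → [['x', 1], ['y', 2]]
```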

Code caching after execution

The terms cold and warm load might be well-known for people concerned about loading performance. In V8, there is also the concept of a hot load. Let’s explain the different levels with Chrome embedding V8 as an example:

  • Cold load: Chrome sees the visited web page for the first time and does not have any data cached at all.
  • Warm load: Chrome remembers that the web page was already visited and can retrieve certain assets (e.g. images and script source files) from the cache. V8 recognizes that the page shipped the same script file already, and therefore caches the compiled code along with the script file in the disk cache.
  • Hot load: The third time Chrome visits the web page, when serving the script file from the disk cache, it also provides V8 with the code cached during the previous load. V8 can use this cached code to avoid having to parse and compile the script from scratch.

Before V8 v6.6 we cached the generated code immediately after the top-level compile. V8 only compiles the functions that are known to be immediately executed during the top-level compile and marks other functions for lazy compilation. This meant that cached code only included top-level code, while all other functions had to be lazily compiled from scratch on each page load. Beginning with version 6.6, V8 caches the code generated after the script’s top-level execution. As we execute the script, more functions are lazily compiled and can be included in the cache. As a result, these functions don’t need to be compiled on future page loads, reducing compile and parse time in hot load scenarios by 20–60%. The user-visible change is a less congested main thread, thus a smoother and faster loading experience.

Look out for a detailed blog post on this topic soon.

Background compilation

For some time V8 has been able to parse JavaScript code on a background thread. With V8’s new Ignition bytecode interpreter that shipped last year, we were able to extend this support to also enable compilation of the JavaScript source to bytecode on a background thread. This enables embedders to perform more work off the main thread, freeing it up to execute more JavaScript and reduce jank. We enabled this feature in Chrome 66, where we see a 5–20% reduction in main-thread compilation time on typical websites. For more details, please see the recent blog post on this feature.

Removal of AST numbering

We have continued to reap benefits from simplifying our compilation pipeline after the Ignition and TurboFan launch last year. Our previous pipeline required a post-parsing stage called "AST Numbering", where nodes in the generated abstract syntax tree were numbered so that the various compilers using it would have a common point of reference.

Over time this post-processing pass had ballooned to include other functionality: numbering suspend points for generators and async functions, collecting inner functions for eager compilation, initializing literals, and detecting unoptimizable code patterns.

With the new pipeline, the Ignition bytecode became the common point of reference, and the numbering itself was no longer required — but, the remaining functionality was still needed, and the AST numbering pass remained.

In V8 v6.6, we finally managed to move this remaining functionality into other passes or deprecate it, allowing us to remove this tree walk. This resulted in a 3–5% improvement in real-world compile time.

Asynchronous performance improvements

We managed to squeeze out some nice performance improvements for promises and async functions, and especially managed to close the gap between async functions and desugared promise chains.

In addition, the performance of async generators and async iteration was improved significantly, making them a viable option for the upcoming Node 10 LTS, which is scheduled to include V8 v6.6. As an example, consider the following Fibonacci sequence implementation:

async function* fibonacciSequence() {
  for (let a = 0, b = 1;;) {
    yield a;
    const c = a + b;
    a = b;
    b = c;
  }
}

async function fibonacci(id, n) {
  for await (const value of fibonacciSequence()) {
    if (n-- === 0) return value;
  }
}

We’ve measured the following improvements for this pattern, before and after Babel transpilation:

Finally, bytecode improvements to “suspendable functions” such as generators, async functions, and modules, have improved the performance of these functions while running in the interpreter, and decreased their compiled size. We’re planning on improving the performance of async functions and async generators even further with upcoming releases, so stay tuned.

Array performance improvements

The throughput performance of Array#reduce was increased by more than 10× for holey double arrays (see our blog post for an explanation of what holey and packed arrays are). This widens the fast-path for cases where Array#reduce is applied to holey and packed double arrays.
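As a reminder of what these arrays look like, here is a small sketch. How V8 classifies an array internally is not observable from JavaScript, so the "packed" and "holey" labels in the comments are assumptions based on how the arrays are created:

```javascript
const packed = [1.5, 2.5, 3.5];   // doubles only, no holes
const holey = [1.5, , 3.5];       // doubles only, index 1 is a hole

const add = (sum, value) => sum + value;
const packedSum = packed.reduce(add, 0);   // → 7.5
const holeySum = holey.reduce(add, 0);     // → 5 (the hole is skipped)
const hasHole = !(1 in holey);             // → true
```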

Untrusted code mitigations

In V8 v6.6 we’ve landed more mitigations for side-channel vulnerabilities to prevent information leaks to untrusted JavaScript and WebAssembly code.

GYP is gone

This is the first V8 version that officially ships without GYP files. If your product needs the deleted GYP files, you need to copy them into your own source repository.

Memory profiling

Chrome’s DevTools can now trace and snapshot C++ DOM objects and display all reachable DOM objects from JavaScript with their references. This feature is one of the benefits of the new C++ tracing mechanism of the V8 garbage collector. For more information please have a look at the dedicated blog post.


Please use git log branch-heads/6.5..branch-heads/6.6 include/v8.h to get a list of the API changes.

Developers with an active V8 checkout can use git checkout -b 6.6 -t branch-heads/6.6 to experiment with the new features in V8 v6.6. Alternatively you can subscribe to Chrome’s Beta channel and try the new features out yourself soon.

Posted by the V8 team

Monday, March 26, 2018

Background compilation

TL;DR: Starting with Chrome 66, V8 compiles JavaScript source code on a background thread, reducing the amount of time spent compiling on the main thread by 5–20% on typical websites.


Since version 41, Chrome has supported parsing of JavaScript source files on a background thread via V8’s StreamedSource API. This enables V8 to start parsing JavaScript source code as soon as Chrome has downloaded the first chunk of the file from the network, and to continue parsing in parallel while Chrome streams the file over the network. This can provide considerable loading time improvements since V8 can be almost finished parsing the JavaScript by the time the file has finished downloading.

However, due to limitations in V8’s original baseline compiler, V8 still needed to go back to the main thread to finalize parsing and compile the script into JIT machine code that would execute the script’s code. With the switch to our new Ignition + TurboFan pipeline, we are now able to move bytecode compilation to the background thread as well, thereby freeing up Chrome’s main-thread to deliver a smoother, more responsive web browsing experience.

Building a background thread bytecode compiler

V8’s Ignition bytecode compiler takes the abstract syntax tree (AST) produced by the parser as input and produces a stream of bytecode (BytecodeArray) along with associated meta-data which enables the Ignition interpreter to execute the JavaScript source.

Ignition’s bytecode compiler was built with multi-threading in mind; however, a number of changes were required throughout the compilation pipeline to enable background compilation. One of the main changes was to prevent the compilation pipeline from accessing objects in V8’s JavaScript heap while running on the background thread. Objects in V8’s heap are not thread-safe, since JavaScript is single-threaded, and might be modified by the main thread or V8’s garbage collector during background compilation.

There were two main stages of the compilation pipeline which accessed objects on V8’s heap: AST internalization, and bytecode finalization. AST internalization is a process by which literal objects (strings, numbers, object-literal boilerplate, etc.) identified in the AST are allocated on the V8 heap, such that they can be used directly by the generated bytecode when the script is executed. This process traditionally happened immediately after the parser built the AST. As such, there were a number of steps later in the compilation pipeline that relied on the literal objects having been allocated. To enable background compilation we moved AST internalization later in the compilation pipeline, after the bytecode had been compiled. This required modifications to the later stages of the pipeline to access the raw literal values embedded in the AST instead of internalized on-heap values.

Bytecode finalization involves building the final BytecodeArray object, used to execute the function, alongside associated metadata — for example, a ConstantPoolArray which stores constants referred to by the bytecode, and a SourcePositionTable which maps the JavaScript source line and column numbers to bytecode offset. Since JavaScript is a dynamic language, these objects all need to live in the JavaScript heap to enable them to be garbage-collected if the JavaScript function associated with the bytecode is collected. Previously some of these metadata objects would be allocated and modified during bytecode compilation, which involved accessing the JavaScript heap. In order to enable background compilation, Ignition’s bytecode generator was refactored to keep track of the details of this metadata and defer allocating them on the JavaScript heap until the very final stages of compilation.

With these changes, almost all of the script’s compilation can be moved to a background thread, with only the short AST internalization and bytecode finalization steps happening on the main thread just before script execution.

Currently, only top-level script code and immediately invoked function expressions (IIFEs) are compiled on a background thread — inner functions are still compiled lazily (when first executed) on the main thread. We are hoping to extend background compilation to more situations in the future. However, even with these restrictions, background compilation leaves the main thread free for longer, enabling it to do other work such as reacting to user-interaction, rendering animations or otherwise producing a smoother more responsive experience.
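The snippet below labels which parts of a script fall into which category, going by the description above. Nothing in it observes compilation; the comments only restate the post, and the function names are placeholders.

```javascript
const log = [];

// Top-level code and immediately invoked function expressions: the parts
// described above as eligible for background compilation.
(function setup() {
  log.push('setup ran during top-level execution');
})();

// An ordinary inner function: as described above, it is compiled lazily
// on the main thread when it is first called.
function onFirstClick() {
  log.push('onFirstClick ran');
}

onFirstClick();
```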


We evaluated the performance of background compilation using our real-world benchmarking framework across a set of popular webpages.

The proportion of compilation that can happen on a background thread varies depending on the proportion of bytecode compiled during top-level streaming-script compilation versus being lazily compiled as inner functions are invoked (which must still occur on the main thread). As such, the proportion of time saved on the main thread varies, with most pages seeing a 5–20% reduction in main-thread compilation time.

Next steps

What’s better than compiling a script on a background thread? Not having to compile the script at all! Alongside background compilation we have also been working on improving V8’s code-caching system to expand the amount of code cached by V8, thereby speeding up page loading for sites you visit often. We hope to bring you updates on this front soon. Stay tuned!

Posted by Ross McIlroy, main thread defender

Thursday, March 1, 2018

Tracing from JS to the DOM and back again


Debugging memory leaks in Chrome 66 just became much easier. Chrome’s DevTools can now trace and snapshot C++ DOM objects and display all reachable DOM objects from JavaScript with their references. This feature is one of the benefits of the new C++ tracing mechanism of the V8 garbage collector.


A memory leak in a garbage collection system occurs when an unused object is not freed due to unintentional references from other objects. Memory leaks in web pages often involve interaction between JavaScript objects and DOM elements.

The following toy example shows a memory leak that happens when a programmer forgets to unregister an event listener. None of the objects referenced by the event listener can be garbage collected. In particular, the iframe window leaks together with the event listener.

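// Main window:
const iframe = document.createElement('iframe');
iframe.src = 'iframe.html';
// Attach the iframe so that it loads and fires its `load` event.
document.body.appendChild(iframe);
iframe.addEventListener('load', function() {
  const local_variable = iframe.contentWindow;
  function leakingListener() {
    // Do something with `local_variable`.
    if (local_variable) {}
  }
  document.body.addEventListener('my-debug-event', leakingListener);
  // Detach the iframe; the listener below still references its window.
  document.body.removeChild(iframe);
  // BUG: forgot to unregister `leakingListener`.
});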

The leaking iframe window also keeps all its JavaScript objects alive.

// iframe.html:
class Leak {};
window.global_variable = new Leak();
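The fix for this kind of bug is to unregister the listener once it is no longer needed. The sketch below shows the pattern with a plain `EventTarget` standing in for `document.body`, so that it runs outside a browser (Node.js 15 or later provides `EventTarget` and `Event` globally). `registerDebugListener` is a hypothetical helper, not part of any API.

```javascript
// Illustration only: a plain EventTarget stands in for `document.body`,
// and `registerDebugListener` is a hypothetical helper.
function registerDebugListener(target, resource, onEvent) {
  function listener() {
    // Do something with `resource`; the closure keeps it alive.
    if (resource) onEvent();
  }
  target.addEventListener('my-debug-event', listener);
  // Return a function that unregisters the listener again.
  return () => target.removeEventListener('my-debug-event', listener);
}

const target = new EventTarget();
let calls = 0;
const unregister = registerDebugListener(target, { big: 'object' }, () => calls++);

target.dispatchEvent(new Event('my-debug-event')); // listener runs once
unregister();                                      // target drops the listener
target.dispatchEvent(new Event('my-debug-event')); // no listener left
// calls is 1
```

Once `unregister()` has run, the event target no longer references the listener, so the closure and everything it captured can be garbage collected.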

It is important to understand the notion of retaining paths to find the root cause of a memory leak. A retaining path is a chain of objects that prevents garbage collection of the leaking object. The chain starts at a root object such as the global object of the main window. The chain ends at the leaking object. Each intermediate object in the chain has a direct reference to the next object in the chain. For example, the retaining path of the Leak object in the iframe looks as follows:

Figure 1: Retaining path of an object leaked via iframe and event listener.

Note that the retaining path crosses the JavaScript / DOM boundary (highlighted in green/red, respectively) two times. The JavaScript objects live in the V8 heap, while DOM objects are C++ objects in Chrome.

DevTools heap snapshot

We can inspect the retaining path of any object by taking a heap snapshot in DevTools. The heap snapshot precisely captures all objects on the V8 heap. Up until recently it had only approximate information about the C++ DOM objects. For instance, Chrome 65 shows an incomplete retaining path for the Leak object from the toy example:

Figure 2: Retaining path in Chrome 65.

Only the first row is precise: the Leak object is indeed stored in the global_variable of the iframe’s window object. Subsequent rows approximate the real retaining path and make debugging of the memory leak hard.

As of Chrome 66, DevTools traces through C++ DOM objects and precisely captures the objects and references between them. This is based on the powerful C++ object tracing mechanism that was introduced for cross-component garbage collection earlier. As a result, the retaining path in DevTools is actually correct now:

Figure 3: Retaining path in Chrome 66.

Under the hood: cross-component tracing

DOM objects are managed by Blink, the rendering engine of Chrome, which is responsible for translating the DOM into actual text and images on the screen. Blink and its representation of the DOM are written in C++, which means that the DOM cannot be directly exposed to JavaScript. Instead, objects in the DOM come in two halves: a V8 wrapper object available to JavaScript and a C++ object representing the node in the DOM. These objects have direct references to each other. Determining liveness and ownership of objects across multiple components, such as Blink and V8, is difficult because all involved parties need to agree on which objects are still alive and which ones can be reclaimed.

In Chrome 56 and older versions (i.e. until Mar 2017), Chrome used a mechanism called object grouping to determine liveness. Objects were assigned to groups based on containment in documents. A group, together with all the objects it contained, was kept alive as long as a single one of its objects was kept alive through some other retaining path. This made sense in the context of DOM nodes that always refer to their containing document, forming so-called DOM trees. However, this abstraction removed all of the actual retaining paths, which made it hard to use for debugging, as shown in Figure 2. For objects that did not fit this scenario, e.g. JavaScript closures used as event listeners, this approach also became cumbersome. It led to various bugs where JavaScript wrapper objects were collected prematurely and then replaced by empty JS wrappers that had lost all their properties.

Starting from Chrome 57, this approach was replaced by cross-component tracing, a mechanism that determines liveness by tracing from JavaScript to the C++ implementation of the DOM and back. We implemented incremental tracing on the C++ side with write barriers to avoid the stop-the-world tracing jank discussed in previous blog posts. Cross-component tracing not only provides better latency but also approximates the liveness of objects across component boundaries more closely, and it fixes several scenarios that used to cause leaks. On top of that, it allows DevTools to provide a snapshot that actually represents the DOM, as shown in Figure 3.
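
The core idea of tracing across the component boundary can be illustrated with a toy marking loop. This is only a sketch under simplifying assumptions: the object model (`makeObject`, `component`, `refs`) is invented for illustration, and incremental marking and write barriers are left out entirely. It is not how Blink or V8 implement tracing.

```javascript
// Toy heap objects. `component` is 'v8' or 'blink'; `refs` holds direct
// references, which may point into the other component.
function makeObject(name, component) {
  return { name, component, refs: [] };
}

// Marks everything reachable from `roots`, following references
// regardless of which component owns the referenced object.
function traceLive(roots) {
  const live = new Set();
  const worklist = [...roots];
  while (worklist.length > 0) {
    const obj = worklist.pop();
    if (live.has(obj)) continue;
    live.add(obj);
    worklist.push(...obj.refs);
  }
  return live;
}

// A DOM node and its wrapper reference each other. The listener closure
// is reachable only through the C++ node.
const wrapper = makeObject('body wrapper', 'v8');
const node = makeObject('body node', 'blink');
const listener = makeObject('listener closure', 'v8');
const garbage = makeObject('unreferenced object', 'v8');
wrapper.refs.push(node);
node.refs.push(wrapper, listener);

const live = traceLive([wrapper]);
```

Here the listener closure ends up in `live` only because the trace follows the edge from the wrapper into the C++ node and back out to JavaScript, while the unreferenced object is left unmarked.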

Try it out! We are happy to hear your feedback.

Posted by Ulan Degenbaev, Alexei Filippov, Michael Lippautz, and Hannes Payer — the fellowship of the DOM

Monday, February 12, 2018

Lazy deserialization

TL;DR: Lazy deserialization was recently enabled by default in V8 version 6.4, reducing V8’s memory consumption by over 500 KB per browser tab on average. Read on to find out more!

Introducing V8 snapshots

But first, let’s take a step back and have a look at how V8 uses heap snapshots to speed up creation of new Isolates (each of which roughly corresponds to a browser tab in Chrome). My colleague Yang Guo gave a good introduction on that front in his article on custom startup snapshots:

“The JavaScript specification includes a lot of built-in functionality, from math functions to a full-featured regular expression engine. Every newly-created V8 context has these functions available from the start. For this to work, the global object (for example, the window object in a browser) and all the built-in functionality must be set up and initialized into V8’s heap at the time the context is created. It takes quite some time to do this from scratch.

“Fortunately, V8 uses a shortcut to speed things up: just like thawing a frozen pizza for a quick dinner, we deserialize a previously-prepared snapshot directly into the heap to get an initialized context. On a regular desktop computer, this can bring the time to create a context from 40 ms down to less than 2 ms. On an average mobile phone, this could mean a difference between 270 ms and 10 ms.”

To recap: snapshots are critical for startup performance, and they are deserialized to create the initial state of V8’s heap for each Isolate. The size of the snapshot thus determines the minimum size of the V8 heap, and larger snapshots translate directly into higher memory consumption for each Isolate.

A snapshot contains everything needed to fully initialize a new Isolate, including language constants (e.g., the undefined value), internal bytecode handlers used by the interpreter, built-in objects (e.g., String), and the functions installed on built-in objects (e.g., String.prototype.replace) together with their executable Code objects.

Startup snapshot size in bytes from 2016-01 to 2017-09. The x-axis shows V8 revision numbers.

Over the past two years, the snapshot has nearly tripled in size, going from roughly 600 KB in early 2016 to over 1500 KB today. The vast majority of this increase comes from serialized Code objects, which have increased both in count (e.g., through recent additions to the JavaScript language as the language specification evolves and grows) and in size (built-ins generated by the new CodeStubAssembler pipeline ship as native code rather than as the more compact bytecode or minified JS formats).

This is bad news, since we’d like to keep memory consumption as low as possible.

Lazy deserialization

One of the major pain points was that we used to copy the entire content of the snapshot into each Isolate. Doing so was especially wasteful for built-in functions, which were all loaded unconditionally but may never have ended up being used.

This is where lazy deserialization comes in. The concept is quite simple: what if we were to only deserialize built-in functions just before they were called?

A quick investigation of some of the most popular websites showed this approach to be quite attractive: on average, only 30% of all built-in functions were used, with some sites only using 16%. This looked remarkably promising, given that most of these sites are heavy JS users and these numbers can thus be seen as a (fuzzy) lower bound of potential memory savings for the web in general.

As we began working in this direction, it turned out that lazy deserialization integrated very well with V8’s architecture, and only a few, mostly non-invasive design changes were necessary to get up and running:

  1. Well-known positions within the snapshot. Prior to lazy deserialization, the order of objects within the serialized snapshot was irrelevant since we’d only ever deserialize the entire heap at once. Lazy deserialization must be able to deserialize any given built-in function on its own, and therefore has to know where it is located within the snapshot.
  2. Deserialization of single objects. V8’s snapshots were initially designed for full heap deserialization, and bolting on support for single-object deserialization required dealing with a few quirks such as non-contiguous snapshot layout (serialized data for one object could be interspersed with data for other objects) and so-called backreferences (which can directly reference objects previously deserialized within the current run).
  3. The lazy deserialization mechanism itself. At runtime, the lazy deserialization handler must be able to a) determine which code object to deserialize, b) perform the actual deserialization, and c) attach the deserialized code object to all relevant functions.

Our solution to the first two points was to add a new dedicated built-ins area to the snapshot, which may only contain serialized code objects. Serialization occurs in a well-defined order, and the starting offset of each Code object is kept in a dedicated section within the built-ins snapshot area. Both backreferences and interspersed object data are disallowed.
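
The layout described above can be sketched roughly as a contiguous byte buffer plus an offset table. This is a simplified illustration, not V8’s actual snapshot format: plain strings stand in for Code objects, and the function names, the `{ data, offsets }` shape, and the trailing end marker in the offset table are all assumptions made for the example.

```javascript
// Serializes each code object (here: a plain string) back to back into
// one buffer and records where each one starts.
function serializeBuiltins(codeObjects) {
  const encoder = new TextEncoder();
  const chunks = codeObjects.map((code) => encoder.encode(code));
  const offsets = [];
  let total = 0;
  for (const chunk of chunks) {
    offsets.push(total);
    total += chunk.length;
  }
  offsets.push(total); // End marker, so every entry has a known length.
  const data = new Uint8Array(total);
  chunks.forEach((chunk, i) => data.set(chunk, offsets[i]));
  return { data, offsets };
}

// Deserializes exactly one built-in by id, without touching the rest.
function deserializeBuiltin(area, id) {
  const bytes = area.data.subarray(area.offsets[id], area.offsets[id + 1]);
  return new TextDecoder().decode(bytes);
}
```

Because each entry is contiguous and its offset is known up front, a single built-in can be read without scanning or decoding anything that precedes it, which is the property lazy deserialization relies on.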

Lazy built-in deserialization is handled by the aptly named DeserializeLazy built-in, which is installed on all lazy built-in functions at deserialization time. When called at runtime, it deserializes the relevant Code object and finally installs it on both the JSFunction (representing the function object) and the SharedFunctionInfo (shared between functions created from the same function literal). Each built-in function is deserialized at most once.
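
The self-replacing stub pattern can be modeled in a few lines of JavaScript. This is a conceptual sketch only: the objects `fn` and `shared` loosely model a JSFunction and its SharedFunctionInfo, while `snapshot`, `deserialize`, `makeLazyBuiltin`, and `deserializeCount` are hypothetical names invented for the example.

```javascript
// Hypothetical stand-ins: `snapshot` maps a built-in id to a factory
// producing the real code; `deserializeCount` records how often
// deserialization actually ran.
let deserializeCount = 0;
const snapshot = {
  stringReplace: () => (s, from, to) => s.split(from).join(to),
};

function deserialize(id) {
  deserializeCount++;
  return snapshot[id]();
}

function makeLazyBuiltin(id) {
  const shared = { id, code: null }; // Loosely models SharedFunctionInfo.
  const fn = { shared, code: null }; // Loosely models JSFunction.
  const deserializeLazy = (...args) => {
    // First call: deserialize, install the real code in both places,
    // then run it with the original arguments.
    const code = deserialize(id);
    fn.code = code;
    shared.code = code;
    return code(...args);
  };
  fn.code = shared.code = deserializeLazy;
  fn.call = (...args) => fn.code(...args);
  return fn;
}

const replace = makeLazyBuiltin('stringReplace');
```

Creating `replace` does not deserialize anything. The first `replace.call(...)` triggers deserialization and swaps the stub out, so later calls go straight to the real code.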

In addition to built-in functions, we have also implemented lazy deserialization for bytecode handlers. Bytecode handlers are code objects that contain the logic to execute each bytecode within V8’s Ignition interpreter. Unlike built-ins, they have neither an attached JSFunction nor a SharedFunctionInfo. Instead, their code objects are stored directly in the dispatch table into which the interpreter indexes when dispatching to the next bytecode handler. Lazy deserialization works much as it does for built-ins: the DeserializeLazy handler determines which handler to deserialize by inspecting the bytecode array, deserializes the code object, and finally stores the deserialized handler in the dispatch table. Again, each handler is deserialized at most once.
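
The dispatch-table variant can be sketched with a tiny stack machine. Everything here is hypothetical and chosen for illustration: the bytecodes `PUSH` and `ADD`, the `handlerSources` table standing in for the snapshot, and the `run` loop bear no relation to Ignition’s real bytecodes or data structures.

```javascript
// Hypothetical handler bodies, standing in for serialized handlers.
const handlerSources = {
  PUSH: (state, operand) => { state.stack.push(operand); },
  ADD: (state) => { state.stack.push(state.stack.pop() + state.stack.pop()); },
};
const deserialized = []; // Records which handlers were deserialized.

// Every slot of the dispatch table starts out as the same lazy handler.
const dispatchTable = {};
function deserializeLazyHandler(state, operand) {
  // Inspect the bytecode array to find out which handler is needed.
  const bytecode = state.bytecodes[state.pc].op;
  deserialized.push(bytecode);
  // Store the real handler in the table, then execute it.
  dispatchTable[bytecode] = handlerSources[bytecode];
  return dispatchTable[bytecode](state, operand);
}
for (const op of Object.keys(handlerSources)) {
  dispatchTable[op] = deserializeLazyHandler;
}

// Minimal interpreter loop that indexes into the dispatch table.
function run(bytecodes) {
  const state = { bytecodes, pc: 0, stack: [] };
  for (; state.pc < bytecodes.length; state.pc++) {
    const { op, operand } = bytecodes[state.pc];
    dispatchTable[op](state, operand);
  }
  return state.stack.pop();
}
```

Running a program that uses `PUSH` twice and `ADD` once deserializes each of the two handlers exactly once; the second `PUSH` dispatches directly to the already-installed handler.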


Results

We evaluated memory savings by loading the 1000 most popular websites using Chrome 65 on an Android device, with and without lazy deserialization.

On average, V8’s heap size decreased by 540 KB, with 25% of the tested sites saving more than 620 KB, 50% saving more than 540 KB, and 75% saving more than 420 KB.

Runtime performance (measured on standard JS benchmarks such as Speedometer, as well as a wide selection of popular websites) has remained unaffected by lazy deserialization.

Next steps

Lazy deserialization ensures that each Isolate only loads the built-in code objects that are actually used. That is already a big win, but we believe it is possible to go one step further and reduce the (built-in-related) cost of each Isolate to effectively zero.

We hope to bring you updates on this front later this year. Stay tuned!

Posted by Jakob Gruber (@schuay)