Tuesday, January 10, 2017

Speeding up V8 Regular Expressions

This blog post covers V8's recent migration of RegExp's built-in functions from a self-hosted JavaScript implementation to one that hooks straight into our new code generation architecture based on TurboFan.

V8’s RegExp implementation is built on top of Irregexp, which is widely considered to be one of the fastest RegExp engines. While the engine itself encapsulates the low-level logic to perform pattern matching against strings, functions on the RegExp prototype such as RegExp.prototype.exec do the additional work required to expose its functionality to the user.

Historically, various components of V8 have been implemented in JavaScript. Until recently, regexp.js has been one of them, hosting the implementation of the RegExp constructor, all of its properties as well as its prototype’s properties.

Unfortunately this approach has disadvantages, including unpredictable performance and expensive transitions to the C++ runtime for low-level functionality. The recent addition of built-in subclassing in ES6 (allowing JavaScript developers to provide their own customized RegExp implementation) has resulted in a further RegExp performance penalty, even if the RegExp built-in is not subclassed. These regressions could not be fully addressed in the self-hosted JavaScript implementation.

We therefore decided to migrate the RegExp implementation away from JavaScript.  However, preserving performance turned out to be more difficult than expected. An initial migration to a full C++ implementation was significantly slower, reaching only around 70% of the original implementation’s performance.  After some investigation, we found several causes:

  • RegExp.prototype.exec contains a couple of extremely performance-sensitive areas, most notably including the transition to the underlying RegExp engine, and construction of the RegExp result with its associated substring calls. For these, the JavaScript implementation relied on highly optimized pieces of code called 'stubs', written either in native assembly language or by hooking directly into the optimizing compiler pipeline. It is not possible to access these stubs from C++, and their runtime equivalents are significantly slower.
  • Accesses to properties such as RegExp's lastIndex can be expensive, possibly requiring lookups by name and traversal of the prototype chain. V8's optimizing compiler can often automatically replace such accesses with more efficient operations, while these cases would need to be handled explicitly in C++.
  • In C++, references to JavaScript objects must be wrapped in so-called Handles in order to cooperate with garbage collection. Handle management produces further overhead in comparison to the plain JavaScript implementation.
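
To make the lastIndex point from the list above concrete, here is a minimal sketch of how a global RegExp reads and writes lastIndex on every exec call. The helper name collectMatches is hypothetical and only serves the illustration; it says nothing about how V8 implements the access internally.

```javascript
// Collect [match index, lastIndex after the match] for every match of a
// global RegExp. Each exec() call reads lastIndex to find its starting
// position and writes it back afterwards.
function collectMatches(re, input) {
  if (!re.global) throw new TypeError('expected a RegExp with the g flag');
  re.lastIndex = 0;
  const steps = [];
  let match;
  while ((match = re.exec(input)) !== null) {
    steps.push([match.index, re.lastIndex]);
    // Avoid looping forever on empty matches, which do not advance lastIndex.
    if (match[0] === '') re.lastIndex++;
  }
  return steps;
}

const re = /a/g;
console.log(collectMatches(re, 'banana'));  // [[1, 2], [3, 4], [5, 6]]
console.log(re.lastIndex);                  // 0, reset by the final failed match
```

Every iteration touches lastIndex twice, which is why the cost of that property access matters for tight matching loops.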

Our new design for the RegExp migration is based on the CodeStubAssembler, a mechanism that allows V8 developers to write platform-independent code which will later be translated into fast, platform-specific code by the same backend that is also used for the new optimizing compiler TurboFan. Using the CodeStubAssembler allows us to address all shortcomings of the initial C++ implementation. Stubs (such as the entry-point into the RegExp engine) can easily be called from the CodeStubAssembler. While fast property accesses still need to be explicitly implemented on so-called fast paths, such accesses are extremely efficient in the CodeStubAssembler. Handles simply do not exist outside of C++. And since the implementation now operates at a very low level, we can take further shortcuts such as skipping expensive result construction when it is not needed.

Results have been very positive. Our score on a substantial RegExp workload has improved by 15%, more than regaining our recent subclassing-related performance losses. Microbenchmarks (Figure 1) show improvements across the board, from 7% for RegExp.prototype.exec, up to 102% for RegExp.prototype[@@split].

Figure 1: RegExp speedup broken down by function.
So how can you, as a JavaScript developer, ensure that your RegExps are fast? If you are not interested in hooking into RegExp internals, make sure that neither the RegExp instance, nor its prototype is modified in order to get the best performance:
var re = /./g;
re.exec('');  // Fast path.
re.new_property = 'slow';
RegExp.prototype.new_property = 'also slow';
re.exec('');  // Slow path.
And while RegExp subclassing may be quite useful at times, be aware that subclassed RegExp instances require more generic handling and thus take the slow path:
class SlowRegExp extends RegExp {}
new SlowRegExp(".", "g").exec('');  // Slow path.
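As a rough userland approximation of the conditions above, the following sketch checks whether a RegExp instance still looks untouched. The helper looksUnmodified is hypothetical and is not a V8 API; it cannot see changes made to RegExp.prototype itself, and the engine's real fast-path checks are internal and may differ.

```javascript
// Hypothetical helper (not a V8 API): returns true when the instance has
// the built-in prototype and no own properties besides lastIndex.
function looksUnmodified(re) {
  const ownKeys = Reflect.ownKeys(re);
  return Object.getPrototypeOf(re) === RegExp.prototype &&
         ownKeys.length === 1 &&
         ownKeys[0] === 'lastIndex';
}

class SubRegExp extends RegExp {}

const plain = /./g;
const extended = /./g;
extended.newProperty = 'slow';

console.log(looksUnmodified(plain));                       // true
console.log(looksUnmodified(extended));                    // false
console.log(looksUnmodified(new SubRegExp('.', 'g')));     // false
```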
The full RegExp migration will be available in V8 5.7.

Posted by Jakob Gruber, Regular Software Engineer

Wednesday, December 21, 2016

How V8 measures real-world performance

Over the last year the V8 team has developed a new methodology to measure and understand real-world JavaScript performance. We’ve used the insights that we gleaned from it to change how the V8 team makes JavaScript faster. Our new real-world focus represents a significant shift from our traditional performance focus. We’re confident that as we continue to apply this methodology in 2017, it will significantly improve users’ and developers’ ability to rely on predictable performance from V8 for real-world JavaScript in both Chrome and Node.js.

The old adage “what gets measured gets improved” is particularly true in the world of JavaScript virtual machine (VM) development. Choosing the right metrics to guide performance optimization is one of the most important things a VM team can do over time. The following timeline roughly illustrates how JavaScript benchmarking has evolved since the initial release of V8:

Evolution of JavaScript benchmarks.

Historically, V8 and other JavaScript engines have measured performance using synthetic benchmarks. Initially, VM developers used microbenchmarks like SunSpider and Kraken. As the browser market matured a second benchmarking era began, during which they used larger but nevertheless synthetic test suites such as Octane and JetStream.

Microbenchmarks and static test suites have a few benefits: they’re easy to bootstrap, simple to understand, and able to run in any browser, making comparative analysis easy. But this convenience comes with a number of downsides. Because they include a limited number of test cases, it is difficult to design benchmarks which accurately reflect the characteristics of the web at large. Moreover, benchmarks are usually updated infrequently; thus, they tend to have a hard time keeping up with new trends and patterns of JavaScript development in the wild. Finally, over the years VM authors explored every nook and cranny of the traditional benchmarks, and in the process they discovered and took advantage of opportunities to improve benchmark scores by shuffling around or even skipping externally unobservable work during benchmark execution. This kind of benchmark-score-driven improvement and over-optimizing for benchmarks doesn’t always provide much user- or developer-facing benefit, and history has shown that over the long-term it’s very difficult to make an “ungameable” synthetic benchmark.

Measuring real websites: WebPageReplay & Runtime Call Stats

Given an intuition that we were only seeing one part of the performance story with traditional static benchmarks, the V8 team set out to measure real-world performance by benchmarking the loading of actual websites. We wanted to measure use cases that reflected how end users actually browsed the web, so we decided to derive performance metrics from websites like Twitter, Facebook, and Google Maps. Using a piece of Chrome infrastructure called WebPageReplay we were able to record and replay page loads deterministically.

In tandem, we developed a tool called Runtime Call Stats which allowed us to profile how different JavaScript code stressed different V8 components. For the first time, we had the ability not only to test V8 changes easily against real websites, but to fully understand how and why V8 performed differently under different workloads.

We now monitor changes against a test suite of approximately 25 websites in order to guide V8 optimization. In addition to the aforementioned websites and others from the Alexa Top 100, we selected sites which were implemented using common frameworks (React, Polymer, Angular, Ember, and more), sites from a variety of different geographic locales, and sites or libraries whose development teams have collaborated with us, such as Wikipedia, Reddit, Twitter, and Webpack. We believe these 25 sites are representative of the web at large and that performance improvements to these sites will be directly reflected in similar speedups for sites being written today by JavaScript developers.

For an in-depth presentation about the development of our test suite of websites and Runtime Call Stats, see the BlinkOn 6 presentation on real-world performance. You can even run the Runtime Call Stats tool yourself.

Making a real difference

Analyzing these new, real-world performance metrics and comparing them to traditional benchmarks with Runtime Call Stats has also given us more insight into how various workloads stress V8 in different ways.

From these measurements, we discovered that Octane performance was actually a poor proxy for performance on the majority of our 25 tested websites. You can see in the chart below: Octane’s color bar distribution is very different than any other workload, especially those for the real-world websites. When running Octane, V8’s bottleneck is often the execution of JavaScript code. However, most real-world websites instead stress V8’s parser and compiler. We realized that optimizations made for Octane often lacked impact on real-world web pages, and in some cases these optimizations made real-world websites slower.

Distribution of time running all of Octane, running the line-items of Speedometer and loading websites from our test suite on Chrome M57.

We also discovered that another benchmark was actually a better proxy for real websites. Speedometer, a WebKit benchmark that includes applications written in React, Angular, Ember, and other frameworks, demonstrated a very similar runtime profile to the 25 sites. Although no benchmark matches the fidelity of real web pages, we believe Speedometer does a better job of approximating the real-world workloads of modern JavaScript on the web than Octane.

Bottom line: A faster V8 for all

Over the course of the past year, the real-world website test suite and our Runtime Call Stats tool have allowed us to deliver V8 performance optimizations that speed up page loads across the board by an average of 10-20%. Given the historical focus on optimizing page load across Chrome, a double-digit improvement to the metric in 2016 is a significant achievement. The same optimizations also improved our score on Speedometer by 20-30%.

These performance improvements should be reflected in other sites written by web developers using modern frameworks and similar patterns of JavaScript. Our improvements to builtins such as Object.create and Function.prototype.bind, optimizations around the object factory pattern, work on V8’s inline caches, and ongoing parser improvements are intended to be generally applicable improvements to underlooked areas of JavaScript used by all developers, not just the representative sites we track.
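
For readers unfamiliar with the patterns named above, here is a small sketch of the object factory pattern, Object.create, and Function.prototype.bind. The names createPoint and pointProto are made up for illustration; the snippet shows the JavaScript patterns only, not the optimizations themselves.

```javascript
// Object factory pattern: a plain function that returns a fresh object.
function createPoint(x, y) {
  return {
    x,
    y,
    norm() { return Math.hypot(this.x, this.y); },
  };
}

// Object.create: build an object with an explicit prototype.
const pointProto = {
  describe() { return `(${this.x}, ${this.y})`; },
};
const p = Object.create(pointProto);
p.x = 3;
p.y = 4;

// Function.prototype.bind: fix `this` so the method can be passed around.
const describeP = pointProto.describe.bind(p);

console.log(createPoint(3, 4).norm());  // 5
console.log(describeP());               // "(3, 4)"
```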

We plan to expand our usage of real websites to guide V8 performance work. Stay tuned for more insights about benchmarks and script performance.

Posted by the V8 team

Thursday, December 15, 2016

V8 ❤️ Node.js

Node's popularity has been growing steadily over the last few years, and we have been working to make Node better. This blog post highlights some of the recent efforts in V8 and DevTools.

Debug Node.js in DevTools

You can now debug Node applications using the Chrome developer tools. The Chrome DevTools Team moved the source code that implements the debugging protocol from Chromium to V8, thereby making it easier for Node Core to stay up to date with the debugger sources and dependencies. Other browser vendors and IDEs use the Chrome debugging protocol as well, collectively improving the developer experience when working with Node.

ES6 Speed-ups

We are working hard on making V8 faster than ever. A lot of our recent performance work centers around ES6 features, including promises, generators, destructuring, and rest/spread operators. Because the versions of V8 in Node 6.2 and onwards fully support ES6, Node developers can use new language features "natively", without polyfills. This means that Node developers are often the first to benefit from ES6 performance improvements. Similarly, they are often the first to recognize performance regressions. Thanks to an attentive Node community, we discovered and fixed a number of regressions, including performance issues with instanceof, buffer.length, long argument lists, and let/const.
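
As a quick illustration of some of the language features mentioned above, the following sketch combines parameter destructuring with a default value, rest parameters, and spread. The function summarize and its field names are hypothetical.

```javascript
// Destructure the first argument (with a default for `tags`) and gather the
// remaining arguments into `scores` with a rest parameter.
function summarize({ name, tags = [] }, ...scores) {
  // Spread copies the array before sorting; array destructuring then
  // splits off the highest score from the rest.
  const [best = null, ...others] = [...scores].sort((a, b) => b - a);
  return { name, tagCount: tags.length, best, otherCount: others.length };
}

console.log(summarize({ name: 'v8', tags: ['js'] }, 3, 9, 5));
// { name: 'v8', tagCount: 1, best: 9, otherCount: 2 }
console.log(summarize({ name: 'empty' }));
// { name: 'empty', tagCount: 0, best: null, otherCount: 0 }
```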

Fixes for Node.js vm module and REPL coming

The vm module has had some long standing limitations. In order to address these issues properly, we have extended the V8 API to implement more intuitive behavior. We are excited to announce that the vm module improvements are one of the projects we’re supporting as mentors in Outreachy for the Node Foundation. We hope to see additional progress on this project and others in the near future.


With async functions, you can drastically simplify asynchronous code by rewriting program flow by awaiting promises sequentially. Async/await will land in Node with the next V8 update. Our recent work on improving the performance of promises and generators has helped make async functions fast. On a related note, we are also working on providing promise hooks, a set of introspection APIs needed for the Node AsyncHook API.

Want to try Bleeding Edge Node.js?

If you’re excited to test the newest V8 features in Node and don’t mind using bleeding edge, unstable software, you can try out our integration branch here. V8 is continuously integrated into Node before V8 hits Node master, so we can catch issues early. Be warned though, this is more experimental than Node master.

Posted by Franziska Hinkelmann, Node Monkey Patcher

Friday, December 2, 2016

V8 Release 5.6

Every six weeks, we create a new branch of V8 as part of our release process. Each version is branched from V8’s git master immediately before a Chrome Beta milestone. Today we’re pleased to announce our newest branch, V8 version 5.6, which will be in beta until it is released in coordination with Chrome 56 Stable in several weeks. V8 5.6 is filled with all sorts of developer-facing goodies, so we’d like to give you a preview of some of the highlights in anticipation of the release.

Ignition and TurboFan pipeline for ES.next (and more) shipped

Starting with 5.6, V8 can optimize the entirety of the JavaScript language. Moreover, many language features are sent through a new optimization pipeline in V8. This pipeline uses V8’s Ignition interpreter as a baseline and optimizes frequently executed methods with V8’s more powerful TurboFan optimizing compiler. The new pipeline activates for new language features (e.g. many of the new features from the ES2015 and ES2016 specifications) or whenever Crankshaft (V8’s “classic” optimizing compiler) cannot optimize a method (e.g. try-catch, with).

Why are we only routing some JavaScript language features through the new pipeline? 

The new pipeline is better-suited to optimizing the whole spectrum of the JS language (past and present). It's a healthier, more modern codebase, and it has been designed specifically for real-world use cases including running V8 on low-memory devices.

We've started using the Ignition/TurboFan pipeline with the newest ES.next features we've added to V8 (ES.next = JavaScript features as specified in ES2015 and later) and will route more features through it as we continue improving its performance. In the medium term, the V8 team is aiming to switch all JavaScript execution in V8 to the new pipeline. However, as long as there are still real-world use cases where Crankshaft runs JavaScript faster than the new Ignition/TurboFan pipeline, for the short term we'll support both pipelines to ensure that JavaScript code running in V8 is as fast as possible in all situations.

So, why does the new pipeline use both the new Ignition interpreter and the new TurboFan optimizing compiler?

Running JavaScript fast and efficiently requires having multiple mechanisms, or tiers, under the hood in a JavaScript virtual machine to do the low-level busywork of execution. For example, it’s useful to have a first tier that starts executing code quickly, and then a second optimizing tier that spends longer compiling hot functions in order to maximize performance for longer-running code.

Ignition and TurboFan are V8’s two new execution tiers that are most effective when used together. Due to efficiency, simplicity and size considerations, TurboFan is designed to optimize JavaScript methods starting from the bytecode produced by V8's Ignition interpreter. Because both components are designed to work closely together, each enables optimizations in the other. As a result, starting with 5.6 all functions which will be optimized by TurboFan first run through the Ignition interpreter. Using this unified Ignition/TurboFan pipeline enables the optimization of features that were not optimizable in the past, since they now can take advantage of TurboFan's optimization passes. For example, by routing generators through both Ignition and TurboFan, generator runtime performance has nearly tripled.
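
For reference, this is the kind of generator code the paragraph above refers to. The fibonacci function is just an illustrative example and is not taken from any V8 benchmark.

```javascript
// A generator function: execution pauses at each `yield` and resumes when
// the consumer asks for the next value.
function* fibonacci(limit) {
  let [a, b] = [0, 1];
  while (a <= limit) {
    yield a;
    [a, b] = [b, a + b];
  }
}

console.log([...fibonacci(20)]);  // [0, 1, 1, 2, 3, 5, 8, 13]
```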

For more information on V8's journey to adopt Ignition and TurboFan please have a look at Benedikt's dedicated blog post.

Performance improvements

V8 5.6 delivers a number of key improvements in memory and performance footprint.

Memory-induced jank

Concurrent remembered set filtering was introduced: One step more towards Orinoco.

Greatly improved ES2015 performance

Developers typically start using new language features with the help of transpilers because of two challenges: backwards-compatibility and performance concerns.

V8's goal is to reduce the performance gap between transpilers and V8’s “native” ES.next performance in order to eliminate the latter challenge. We’ve made great progress in bringing the performance of new language features on-par with their transpiled ES5 equivalents. In this release you will find that the performance of ES2015 features is significantly better than in previous V8 releases, and in some cases ES2015 feature performance is approaching that of transpiled ES5 equivalents.

Particularly the spread operator should now be ready to be used natively. Instead of writing ...
// Like Math.max, but returns 0 instead of -∞ for no arguments.
function specialMax(...args) {
    if (args.length === 0) return 0;
    return Math.max.apply(Math, args);
}
… you should now be able to write ...
function specialMax(...args) {
    if (args.length === 0) return 0;
    return Math.max(...args);
}
… and get similar performance results. In particular 5.6 includes speed-ups for the following micro-benchmarks:
See the chart below for a comparison between V8 5.4 and 5.6.

Comparing the ES2015 feature performance of V8 5.4 and 5.6
 Source:  https://fhinkel.github.io/six-speed/ (Cloned from http://kpdecker.github.io/six-speed/)

This is just the beginning, a lot more to follow in upcoming releases!

Language features

String.prototype.padStart / String.prototype.padEnd

String.prototype.padStart and String.prototype.padEnd are the latest stage 4 additions to ECMAScript. These library functions are officially shipped in 5.6.
Note: Unshipped again.
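
A short sketch of what the two functions do, aligning a label and a count in fixed-width columns. Given the note above, code that must run on engines without these functions may want to feature-detect them first; the row data here is made up.

```javascript
// Feature-detect before relying on the functions.
const hasPadding = typeof String.prototype.padStart === 'function' &&
                   typeof String.prototype.padEnd === 'function';

const rows = [['apples', 7], ['kiwis', 42]];
const lines = hasPadding
  ? rows.map(([label, count]) =>
      label.padEnd(8, '.') + String(count).padStart(4, ' '))
  : rows.map(([label, count]) => label + ' ' + count);

console.log(lines.join('\n'));
// With padStart/padEnd available:
// apples..   7
// kiwis...  42
```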

WebAssembly browser preview

Chromium 56 (which includes 5.6) is going to ship the WebAssembly browser preview. Please refer to the dedicated blog post for further information.


Please check out our summary of API changes. This document is regularly updated a few weeks after each major release.

Developers with an active V8 checkout can use 'git checkout -b 5.6 -t branch-heads/5.6' to experiment with the new features in V8 5.6. Alternatively you can subscribe to Chrome's Beta channel and try the new features out yourself soon.

Posted by the V8 team

Monday, October 31, 2016

WebAssembly Browser Preview

Today we’re happy to announce, in tandem with Firefox and Edge, a WebAssembly Browser Preview. WebAssembly or wasm is a new runtime and compilation target for the web, designed by collaborators from Google, Mozilla, Microsoft, Apple, and the W3C WebAssembly Community Group.

What does this milestone mark?

This milestone is significant because it marks:
  • a release candidate for our MVP (minimum viable product) design (including semantics, binary format, and JS API)
  • compatible and stable implementations of WebAssembly behind a flag on trunk in V8 and SpiderMonkey, in development builds of Chakra, and in progress in JavaScriptCore
  • a working toolchain for developers to compile WebAssembly modules from C/C++ source files
  • a roadmap to ship WebAssembly on-by-default barring changes based on community feedback 
You can read more about WebAssembly on the project site as well as follow our developers guide to test out WebAssembly compilation from C & C++ using Emscripten. The binary format and JS API documents outline the binary encoding of WebAssembly and the mechanism to instantiate WebAssembly modules in the browser, respectively. Here’s a quick sample to show what wasm looks like:

Raw Bytes    Text Format      C Source
02 40                         while (x != 0) {
03 40
20 00        get_local 0
0d 01        br_if 1

20 00        get_local 0        z = x;
21 02        set_local 2

20 01        get_local 1        x = y % x;
20 00        get_local 0
21 00        set_local 0

20 02        get_local 2        y = z;
21 01        set_local 1

0c 00        br 0

20 01        get_local 1      return y;

Greatest Common Divisor function
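
To give a feel for the JS API mentioned above, here is a minimal sketch that instantiates a tiny module from raw bytes and calls its export. Note the assumptions: the bytes use the later version-1 binary encoding rather than the preview-era format discussed in this post, the module is a hand-written add function rather than the GCD sample, and the synchronous constructors are used only because the module is tiny.

```javascript
// A hand-assembled module exporting add(a, b) for two 32-bit integers.
const bytes = new Uint8Array([
  0x00, 0x61, 0x73, 0x6d,                                // magic "\0asm"
  0x01, 0x00, 0x00, 0x00,                                // binary format version 1
  0x01, 0x07, 0x01, 0x60, 0x02, 0x7f, 0x7f, 0x01, 0x7f,  // type: (i32, i32) -> i32
  0x03, 0x02, 0x01, 0x00,                                // one function, of type 0
  0x07, 0x07, 0x01, 0x03, 0x61, 0x64, 0x64, 0x00, 0x00,  // export "add" = function 0
  0x0a, 0x09, 0x01, 0x07, 0x00,                          // code: one body, no locals
  0x20, 0x00, 0x20, 0x01, 0x6a, 0x0b,                    // get_local 0, get_local 1, i32.add, end
]);

// Compile and instantiate; the second argument holds imports (none here).
const wasmModule = new WebAssembly.Module(bytes);
const instance = new WebAssembly.Instance(wasmModule, {});

console.log(instance.exports.add(2, 3));  // 5
```

For larger modules the promise-based WebAssembly.instantiate is usually the better choice, since it does not block the calling thread during compilation.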

Since WebAssembly is still behind a flag in Chrome (chrome://flags/#enable-webassembly), it is not yet recommended for production use. However, the Browser Preview period marks a time during which we are actively collecting feedback on the design and implementation of the spec. Developers are encouraged to test out compiling and porting applications and running them in the browser.

V8 continues to optimize the implementation of WebAssembly in the TurboFan compiler. Since last March when we first announced experimental support, we’ve added support for parallel compilation. In addition, we’re nearing completion of an alternate asm.js pipeline, which converts asm.js to WebAssembly under the hood so that existing asm.js sites can reap some of the benefits of WebAssembly ahead-of-time compilation.

What's next?

Barring major design changes arising from community feedback, the WebAssembly Community Group plans to produce an official specification in Q1 2017, at which point browsers will be encouraged to ship WebAssembly on-by-default. From that point forward, the binary format will be reset to version 1 and WebAssembly will be versionless, feature-tested, and backwards-compatible. A more detailed roadmap can be found on the WebAssembly project site.

Monday, October 24, 2016

V8 Release 5.5

Every six weeks, we create a new branch of V8 as part of our release process. Each version is branched from V8’s git master immediately before a Chrome Beta milestone. Today we’re pleased to announce our newest branch, V8 version 5.5, which will be in beta until it is released in coordination with Chrome 55 Stable in several weeks. V8 5.5 is filled with all sorts of developer-facing goodies, so we’d like to give you a preview of some of the highlights in anticipation of the release.

Language features

Async functions

In 5.5, V8 ships JavaScript ES2017 async functions, which makes it easier to write code that uses and creates Promises. Using async functions, waiting for a Promise to resolve is as simple as typing await before it and proceeding as if the value were synchronously available - no callbacks required. See this article for an introduction.

Here’s an example function which fetches a URL and returns the text of the response, written in a typical asynchronous, Promise-based style.
function logFetch(url) {
  return fetch(url)
    .then(response => response.text())
    .then(text => {
      console.log(text);
    }).catch(err => {
      console.error('fetch failed', err);
    });
}
Here’s the same code rewritten to remove callbacks, using async functions.
async function logFetch(url) {
  try {
    const response = await fetch(url);
    console.log(await response.text());
  } catch (err) {
    console.log('fetch failed', err);
  }
}

Performance improvements

V8 5.5 delivers a number of key improvements in memory footprint.


Memory consumption is an important dimension in the JavaScript virtual machine performance trade-off space. Over the last few releases, the V8 team analyzed and significantly reduced the memory footprint of several websites that were identified as representative of modern web development patterns. V8 5.5 reduces Chrome’s overall memory consumption by up to 35% on low-memory devices (compared to V8 5.3 in Chrome 53) due to reductions in the V8 heap size and zone memory usage. Other device segments also benefit from the zone memory reductions. Please have a look at the dedicated blog post to get a detailed view.


Please check out our summary of API changes. This document is regularly updated a few weeks after each major release. 

V8 inspector migrated

The V8 inspector was migrated from Chromium to V8. The inspector code now fully resides in the V8 repository.

Developers with an active V8 checkout can use 'git checkout -b 5.5 -t branch-heads/5.5' to experiment with the new features in V8 5.5. Alternatively you can subscribe to Chrome's Beta channel and try the new features out yourself soon.

Posted by the V8 team

Friday, October 7, 2016

Fall cleaning: Optimizing V8 memory consumption

Memory consumption is an important dimension in the JavaScript virtual machine performance trade-off space. Over the last few months the V8 team analyzed and significantly reduced the memory footprint of several websites that were identified as representative of modern web development patterns. In this blog post we present the workloads and tools we used in our analysis, outline memory optimizations in the garbage collector, and show how we reduced memory consumed by V8’s parser and its compilers.


In order to profile V8 and discover optimizations that have impact for the largest number of users, it is crucial to define workloads that are reproducible, meaningful, and simulate common real-world JavaScript usage scenarios. A great tool for this task is Telemetry, a performance testing framework that runs scripted website interactions in Chrome and records all server responses in order to enable predictable replay of these interactions in our test environment. We selected a set of popular news, social, and media websites and defined the following common user interactions for them:

A workload for browsing news and social websites:
  1. Open a popular news or social website, e.g. Hacker News.
  2. Click on the first link.
  3. Wait until the new website is loaded.
  4. Scroll down a few pages.
  5. Click the back button.
  6. Click on the next link on the original website and repeat steps 3-6 a few times.
A workload for browsing media websites:
  1. Open an item on a popular media website, e.g. a video on YouTube.
  2. Consume that item by waiting for a few seconds.
  3. Click on the next item and repeat steps 2-3 a few times.
Once a workflow is captured, it can be replayed as often as needed against a development version of Chrome, for example each time there is a new version of V8. During playback, V8’s memory usage is sampled at fixed time intervals to obtain a meaningful average. The benchmarks can be found here.
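
The sampling step can be sketched as follows. This is a simplified model, not the Telemetry implementation: averageSampledUsage and the usage curve are hypothetical, and time is simulated rather than measured.

```javascript
// Sample a memory-usage function at fixed time intervals over the length of
// a replayed workload and return the average of the samples.
function averageSampledUsage(usageAt, durationMs, intervalMs) {
  if (intervalMs <= 0) throw new RangeError('intervalMs must be positive');
  let total = 0;
  let count = 0;
  for (let t = 0; t <= durationMs; t += intervalMs) {
    total += usageAt(t);
    count++;
  }
  return total / count;
}

// Made-up usage curve in MB: a 10 MB baseline with a spike to 40 MB
// between 200 ms and 400 ms.
const usageAt = (t) => (t >= 200 && t < 400 ? 40 : 10);

console.log(averageSampledUsage(usageAt, 900, 100));  // 16
```

Averaging over many samples keeps a single short spike from dominating the result, while still letting sustained plateaus show up in the number.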

Memory Visualization

One of the main challenges when optimizing for performance in general is to get a clear picture of internal VM state to track progress or weigh potential tradeoffs. For optimizing memory consumption, this means keeping accurate track of V8’s memory consumption during execution. There are two categories of memory that must be tracked: memory allocated to V8’s managed heap and memory allocated on the C++ heap. The V8 Heap Statistics feature is a mechanism used by developers working on V8 internals to get deep insight into both. When the --trace-gc-object-stats flag is specified when running Chrome (M54 or newer) or the d8 command line interface, V8 dumps memory-related statistics to the console. We built a custom tool, the v8 heap visualizer, to visualize this output. The tool shows a timeline-based view for both the managed and C++ heaps. The tool also provides a detailed breakdown of the memory usage of certain internal data types and size-based histograms for each of those types.
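
The statistics and visualizer described above are aimed at V8 developers. As a loosely related illustration only, Node.js exposes coarse numbers for both categories through its own APIs; the sketch below assumes a Node.js environment and is not the tooling described in this post.

```javascript
const v8 = require('v8');

// Managed (garbage-collected) heap, as reported by V8.
const heap = v8.getHeapStatistics();
// Process-level view; `external` counts C++ memory tied to JavaScript objects.
const usage = process.memoryUsage();

function toMiB(bytes) {
  return (bytes / (1024 * 1024)).toFixed(1) + ' MiB';
}

console.log('managed heap used: ', toMiB(heap.used_heap_size));
console.log('managed heap total:', toMiB(heap.total_heap_size));
console.log('external (C++):    ', toMiB(usage.external));
```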

A common workflow during our optimization efforts involves selecting an instance type that takes up a large portion of the heap in the timeline view, as depicted in Figure 1. Once an instance type is selected, the tool then shows a distribution of uses of this type. In this example we selected V8’s internal FixedArray data structure, which is an untyped vector-like container used ubiquitously in all sorts of places in the VM. Figure 2 shows a typical FixedArray distribution, where we can see that the majority of memory can be attributed to a specific FixedArray usage scenario. In this case FixedArrays are used as the backing store for sparse JavaScript arrays (what we call DICTIONARY_ELEMENTS). With this information it is possible to refer back to the actual code and either verify whether this distribution is indeed the expected behavior or whether an optimization opportunity exists. We used the tool to identify inefficiencies with a number of internal types.
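
A sparse JavaScript array of the kind mentioned above can be produced like this. How an engine stores it is an internal detail; the snippet only shows the observable behavior that makes a dense backing store impractical.

```javascript
const dense = [1, 2, 3];

// One element at a very high index leaves a billion holes before it.
const sparse = [];
sparse[1000000000] = 1;

console.log(dense.length);                // 3
console.log(sparse.length);               // 1000000001
console.log(Object.keys(sparse).length);  // 1
console.log(0 in sparse);                 // false, index 0 is a hole
```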

Figure 1: Timeline view of managed heap and off-heap memory

Figure 2: Distribution of instance type

Figure 3 shows C++ heap memory consumption, which consists primarily of zone memory (temporary memory regions that V8 uses for a short period of time; discussed in more detail below). Since zone memory is used most extensively by the V8 parser and compilers, the spikes correspond to parsing and compilation events. A well-behaved execution consists only of spikes, indicating that memory is freed as soon as it is no longer needed. In contrast, plateaus (i.e. longer periods of time with higher memory consumption) indicate that there is room for optimization.

Figure 3: Zone memory

Early adopters can also try out the integration into Chrome’s tracing infrastructure. To do so, run the latest Chrome Canary with --track-gc-object-stats and capture a trace including the category v8.gc_stats. The data will then show up as a V8.GC_Object_Stats event.

JavaScript Heap Size Reduction

There is an inherent trade-off between garbage collection throughput, latency, and memory consumption. For example, garbage collection latency (which causes user-visible jank) can be reduced by using more memory to avoid frequent garbage collection invocations. For low-memory mobile devices, i.e. devices with under 512M of RAM, prioritizing latency and throughput over memory consumption may result in out-of-memory crashes and suspended tabs on Android.

To better balance the right tradeoffs for these low-memory mobile devices, we introduced a special memory reduction mode which tunes several garbage collection heuristics to lower memory usage of the JavaScript garbage collected heap. 1) At the end of a full garbage collection, V8’s heap growing strategy determines when the next garbage collection will happen based on the amount of live objects with some additional slack. In memory reduction mode, V8 will use less slack resulting in less memory usage due to more frequent garbage collections. 2) Moreover this estimate is treated as a hard limit, forcing unfinished incremental marking work to finalize in the main garbage collection pause. Normally, when not in memory reduction mode, unfinished incremental marking work may result in going over this limit arbitrarily to trigger the main garbage collection pause only when marking is finished. 3) Memory fragmentation is further reduced by performing more aggressive memory compaction.
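
The first two heuristics can be sketched as a toy model. Everything here is an assumption made for illustration: the function names, the slack factors, and the simplified finalization rule are not V8's actual values or code.

```javascript
const MB = 1024 * 1024;

// (1) The next GC is scheduled at the live size plus some slack.
function nextGcLimit(liveBytes, slackFactor) {
  return Math.ceil(liveBytes * (1 + slackFactor));
}

// (2) In memory reduction mode the limit is hard: reaching it forces
// finalization even if incremental marking is unfinished. Otherwise the
// main pause waits until marking is done.
function shouldFinalize(heapBytes, limit, markingDone, memoryReductionMode) {
  return markingDone || (memoryReductionMode && heapBytes >= limit);
}

const live = 40 * MB;
const normalLimit = nextGcLimit(live, 0.5);    // illustrative slack: 60 MB
const reducedLimit = nextGcLimit(live, 0.25);  // less slack: 50 MB

console.log(normalLimit / MB, reducedLimit / MB);               // 60 50
console.log(shouldFinalize(55 * MB, reducedLimit, false, true)); // true
console.log(shouldFinalize(65 * MB, normalLimit, false, false)); // false
```

Less slack means the limit is hit sooner, so collections run more often and the heap stays smaller, at the cost of more time spent in garbage collection.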

Figure 4 depicts some of the improvements on low memory devices since Chrome M53. Most noticeably, the average V8 heap memory consumption of the mobile New York Times benchmark reduced by about 66%. Overall, we observed a 50% reduction of average V8 heap size on this set of benchmarks.

Figure 4: V8 heap memory reduction since M53 on low memory devices

Another recently introduced optimization reduces memory not only on low-memory devices but also on beefier mobile and desktop machines. Reducing the V8 heap page size from 1MB to 512KB results in a smaller memory footprint when not many live objects are present and lowers overall memory fragmentation by up to 2x. It also allows V8 to perform more compaction work since smaller work chunks allow more work to be done in parallel by the memory compaction threads.
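
A back-of-the-envelope sketch of why smaller pages help when few objects are live. The model is an assumption: it treats the heap as a set of spaces that each commit whole pages, and the live sizes are made up.

```javascript
const KB = 1024;

// Simplified model: every space rounds its live bytes up to whole pages.
function committedBytes(liveBytesPerSpace, pageSize) {
  return liveBytesPerSpace.reduce(
    (sum, live) => sum + Math.ceil(live / pageSize) * pageSize, 0);
}

// Three spaces holding 100 KB, 20 KB and 600 KB of live objects.
const liveBytes = [100 * KB, 20 * KB, 600 * KB];

console.log(committedBytes(liveBytes, 1024 * KB) / KB);  // 3072 (3 pages of 1 MB)
console.log(committedBytes(liveBytes, 512 * KB) / KB);   // 2048 (4 pages of 512 KB)
```

In this example the same 720 KB of live data occupies a third less committed memory with the smaller page size, because less space is stranded in partially filled pages.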

Zone Memory Reduction

In addition to the JavaScript heap, V8 uses off-heap memory for internal VM operations. The largest chunk of memory is allocated through memory areas called zones. Zones are a type of  region-based memory allocator which enables fast allocation and bulk deallocation where all zone allocated memory is freed at once when the zone is destroyed. Zones are used throughout V8’s parser and compilers. 
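
The idea behind a region-based allocator can be sketched in a few lines. This is an illustrative model in JavaScript, not V8's C++ Zone class: the class below, its 8-byte alignment, and its fixed capacity are all assumptions.

```javascript
// A toy zone: allocation just bumps an offset into one backing buffer, and
// all allocations are released together when the zone is destroyed.
class Zone {
  constructor(capacityBytes) {
    this.buffer = new ArrayBuffer(capacityBytes);
    this.offset = 0;
  }

  // Fast allocation: align the offset to 8 bytes and advance it.
  allocate(size) {
    const start = (this.offset + 7) & ~7;
    if (start + size > this.buffer.byteLength) {
      throw new RangeError('zone exhausted');
    }
    this.offset = start + size;
    return new Uint8Array(this.buffer, start, size);
  }

  // Bulk deallocation: drop the whole backing buffer at once.
  destroy() {
    this.buffer = new ArrayBuffer(0);
    this.offset = 0;
  }
}

const zone = new Zone(64);
const first = zone.allocate(5);
const second = zone.allocate(8);
console.log(first.byteOffset, second.byteOffset, zone.offset);  // 0 8 16
zone.destroy();
```

There is no per-object free operation, which is what makes allocation cheap and also why keeping a zone alive longer than needed holds on to all of its memory.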

One of the major improvements in M55 comes from reducing memory consumption during background parsing. Background parsing allows V8 to parse scripts while a page is being loaded. The memory visualization tool helped us discover that the background parser would keep an entire zone alive long after the code was already compiled. By immediately freeing the zone after compilation, we reduced the lifetime of zones significantly which resulted in reduced average and peak memory usage.

Another improvement results from better packing of fields in abstract syntax tree nodes generated by the parser. Previously we relied on the C++ compiler to pack fields together where possible. For example, two booleans require just two bits and should be located within one word or within the unused fraction of the previous word. The C++ compiler doesn’t always find the most compressed packing, so we instead manually pack bits. This not only results in reduced peak memory usage, but also improved parser and compiler performance.
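
Manual bit packing of the kind described above can be illustrated as follows. The flag names are hypothetical and the sketch uses JavaScript integers rather than C++ bit fields.

```javascript
// Each boolean gets one bit of a single integer instead of its own field.
const IS_ASYNC = 1 << 0;
const IS_GENERATOR = 1 << 1;

function packFlags({ isAsync, isGenerator }) {
  return (isAsync ? IS_ASYNC : 0) | (isGenerator ? IS_GENERATOR : 0);
}

function hasFlag(bits, flag) {
  return (bits & flag) !== 0;
}

const bits = packFlags({ isAsync: true, isGenerator: false });
console.log(bits);                         // 1
console.log(hasFlag(bits, IS_ASYNC));      // true
console.log(hasFlag(bits, IS_GENERATOR));  // false
```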

Figure 5 shows the peak zone memory improvements since M54: peak zone memory fell by about 40% on average over the measured websites.

Figure 5: V8 peak zone memory reduction since M54 on desktop

Over the next months we will continue our work on reducing the memory footprint of V8. We have more zone memory optimizations planned for the parser and we plan to focus on devices ranging from 512M-1G of memory.

Update: All the improvements discussed above reduce the Chrome 55 overall memory consumption by up to 35% on low-memory devices compared to Chrome 53.  Other device segments will only benefit from the zone memory improvements.

Posted by the V8 Memory Sanitation Engineers Ulan Degenbaev, Michael Lippautz, Hannes Payer, and Toon Verwaest.