V8 JavaScript Engine: April 2016

Friday, April 29, 2016

ES6, ES7, and beyond

The V8 team places great importance on the evolution of JavaScript into an increasingly expressive and well-defined language that makes writing fast, safe, and correct web applications easy. In June 2015, the ES6 specification was ratified by the TC39 standards committee, making it the largest single update to the JavaScript language. New features include classes, arrow functions, promises, iterators / generators, proxies, well-known symbols, and additional syntactic sugar. TC39 has also increased the cadence of new specifications and released the candidate draft for ES7 in February 2016, to be ratified this summer. While not as expansive as the ES6 update due to the shorter release cycle, ES7 notably introduces the exponentiation operator and Array.prototype.includes().

Today we’ve reached an important milestone: V8 supports ES6 and ES7. You can use the new language features today in Chrome Canary, and they will ship by default in the M52 release of Chromium.

Given the nature of an evolving spec, the differences between various types of conformance tests, and the complexity of maintaining web compatibility, it can be difficult to determine when a certain version of ECMAScript is considered fully supported by a JavaScript engine. Read on for why spec support is more nuanced than version numbers, why proper tail calls are still under discussion, and what caveats remain at play.

An evolving spec

When TC39 decided to publish more frequent updates to the JavaScript specification, the most up-to-date version of the language became the master, draft version. Although versions of the ECMAScript spec are still produced yearly and ratified, V8 implements a combination of the most recently ratified version (e.g. ES6), certain features which are close enough to standardization that they are safe to implement (e.g. the exponentiation operator and Array.prototype.includes() from the ES7 candidate draft), and a collection of bug fixes and web compatibility amendments from more recent drafts. Part of the rationale for such an approach is that language implementations in browsers should match the specification, even if the it’s the specification that needs to be updated. In fact, the process of implementing a ratified version of the spec often uncovers many of the fixes and clarifications that comprise the next version of the spec.

Currently shipping parts of the evolving ECMAScript specification

For example, when implementing the ES6 RegExp sticky flag, the V8 team discovered that the semantics of the ES6 spec broke many existing sites (including all sites using versions 2.x.x of the the popular XRegExp library on npm). Since compatibility is a cornerstone of the web, engineers from the V8 and Safari JavaScriptCore teams proposed an amendment to the RegExp specification to fix the breakage, which was agreed upon by TC39. The amendment won't appear in a ratified version until ES8, but it's still a part of the ECMAScript language and we've implemented it in order to ship the RegExp sticky flag.

The continual refinement of the language specification and the fact that each version (including the yet-to-be-ratified draft) replaces, amends, and clarifies previous versions makes it tricky to understand the complexities behind ES6 and ES7 support. While it's impossible to state succinctly, it's perhaps most accurate to say that V8 supports compliance with the “continually maintained draft future ECMAScript standard”!

Measuring conformance

In an attempt to make sense of this specification complexity, there are a variety of ways to measure JavaScript engine compatibility with the ECMAScript standard. The V8 team, as well as other browser vendors, use the test262 test suite as the gold standard of conformance to the continually maintained draft future ECMAScript standard. This test suite is continually updated to match the spec and it provides 16,000 discrete functional tests for all the features and edge cases which make up a compatible, compliant implementation of JavaScript. Currently V8 passes approximately 98% of test262, and the remaining 2% are a handful of edge cases and future ES features not yet ready to be shipped.

Since it’s difficult to skim the enormous number of test262 tests, other conformance tests exist, such as the Kangax compatibility table. Kangax makes it easy to skim to see whether a particular feature (like arrow functions) has been implemented in a given engine, but doesn’t test all the conformance edge cases that test262 does. Currently, Chrome Canary scores a 98% on the Kangax table for ES6 and 100% on the sections of Kangax corresponding to ES7 (e.g. the sections labelled “2016 features” and “2016 misc” under the ESnext tab).

The remaining 2% of the Kangax ES6 table tests proper tail calls, a feature which has been implemented in V8, but deliberately turned off in Chrome Canary due to outstanding developer experience concerns detailed below. With the “Experimental JavaScript features” flag enabled, which forces this feature on, Canary scores 100% on the entirety of the Kangax table for ES6.

Proper Tail Calls

Proper tail calls have been implemented but not yet shipped given that a change to the feature is currently under discussion at TC39. ES6 specifies that strict mode function calls in tail position should never cause a stack overflow. While this is a useful guarantee for certain programming patterns, the current semantics have two problems. First, since the tail call elimination is implicit, it can be difficult for programmers to identify which functions are actually in tail call position. This means that developers may not discover misplaced attempted tail calls in their programs until they overflow the stack. Second, implementing proper tail calls requires eliding tail call stack frames from the stack, which loses information about execution flow. This in turn has two consequences:

It makes it more difficult to understand during debugging how execution arrived at a certain point since the stack contains discontinuities and
Error.prototype.stack contains less information about execution flow which may break telemetry software that collects and analyzes client-side errors.

Implementing a shadow stack can improve the readability of call stacks, but the V8 and DevTools teams believe that debugging is easiest, most reliable, and most accurate when the stack displayed during debugging is completely deterministic and always matches the true state of the actual virtual machine stack. Moreover, a shadow stack is too expensive performance-wise to turn on all the time.

For these reasons, the V8 team strongly support denoting proper tail calls by special syntax. There is a pending TC39 proposal called syntactic tail calls to specify this behavior, co-championed by committee members from Mozilla and Microsoft. We have implemented and staged proper tail calls as specified in ES6 and started implementing syntactic tail calls as specified in the new proposal. The V8 team plans to resolve the issue at the next TC39 meeting before shipping implicit proper tail calls or syntactic tail calls by default. You can test out each version in the meantime by using the V8 flags --harmony-tailcalls and --harmony-explicit-tailcalls.

Modules

One of the most exciting promises of ES6 is support for JavaScript modules to organize and separate different parts of an application into namespaces. ES6 specifies import and export declarations for modules, but not how modules are loaded into a JavaScript program. In the browser, loading behavior was recently specified by the new <script type="module"> tag. Although additional standardization work is needed to specify advanced dynamic module-loading APIs, Chromium support for module script tags is already in development. You can track implementation work on the launch bug and read more about experimental loader API ideas in the whatwg/loader repository.

ESnext and beyond

In the future, developers can expect ECMAScript updates to come in smaller, more frequent updates with shorter implementation cycles. The V8 team is already working to bring upcoming features such as async / await keywords, Object.values() / Object.entries(), String.prototype.padStart() / String.prototype.padEnd() and RegExp lookbehind to the runtime. Check back for more updates on our ESnext implementation progress and performance optimizations for existing ES6 and ES7 features.

We strive to continue evolving JavaScript and strike the right balance of implementing new features early, ensuring compatibility and stability of the existing web, and providing TC39 implementation feedback around design concerns. We look forward to seeing the incredible experiences developers will build with these new features.

-- Posted by the V8 team, ECMAScript Enthusiasts

Saturday, April 23, 2016

V8 Release 5.1

The first step in the V8 release process is a new branch from the git master immediately before Chromium branches for a Chrome Beta milestone (roughly every six weeks). Our newest release branch is V8 5.1, which will remain in beta until we release a stable build in conjunction with Chrome 51 Stable. Here’s a highlight of the new developer-facing features in this version of V8.

Improved ECMAScript support

V8 5.1 contains a number of changes towards compliance with the ES2017 draft spec.

Symbol.species

Array methods like Array.prototype.map construct instances of the subclass as its output, with the option to customize this by changing Symbol.species. Analogous changes are made to other built-in classes.

instanceof customization

Constructors can implement their own Symbol.hasInstance method, which overrides the default behavior.

Iterator closing

Iterators created as part of a for-of loop (or other built-in iteration, such as the spread operator) are now checked for a close method which is called if the loop terminates early. This can be used for clean-up duty after the iteration has finished.

RegExp subclassing exec method

RegExp subclasses can overwrite the exec method to change just the core matching algorithm, with the guarantee that this is called by higher level functions like String.prototype.replace.

Function name inference

Function names inferred for function expressions are now typically made available in the name property of functions, following the ES2015 formalization of these rules. This may change existing stack traces and provide different names from previous V8 versions. It also gives useful names to properties and methods with computed property names:

class Container {
   ...
   [Symbol.iterator]() { ... }
   ...
}
let c = new Container;
// Logs "[Symbol.iterator]".
console.log(c[Symbol.iterator].name);

Array.prototype.values

Analogous to other collection types, the values method on Array returns an iterator over the contents of the Array.

Performance improvements

Release 5.1 also brings a few notable performance improvements to the following JavaScript features:

Executing loops like for-in
Object.assign
Promise and RegExp instantiation
Calling Object.prototype.hasOwnProperty
Math.floor, Math.round and Math.ceil
Array.prototype.push
Object.keys
Array.prototype.join & Array.prototype.toString
Flattening repeat strings e.g. '.'.repeat(1000)

WASM

5.1 has a preliminary support for WASM. You can enable it via the flag --expose_wasm in d8. Alternatively you can try out the WASM demos with Chrome 51 (Beta Channel).

Memory

V8 implemented more slices of Orinoco:

Parallel young generation evacuation
Scalable remembered sets
Black allocation

The impact is reduced jank and memory consumption in times of need.

V8 API

Please check out our summary of API changes. This document gets regularly updated a few weeks after each major release.

Developers with an active V8 checkout can use 'git checkout -b 5.1 -t branch-heads/5.1' to experiment with the new features in V8 5.1. Alternatively you can subscribe to Chrome's Beta channel and try the new features out yourself soon.

Posted by the V8 team

Tuesday, April 12, 2016

Jank Busters Part Two: Orinoco

In a previous blog post, we introduced the problem of jank caused by garbage collection interrupting a smooth browsing experience. In this blog post we introduce three optimizations that lay the groundwork for a new garbage collector in V8, codenamed Orinoco. Orinoco is based on the idea that implementing a mostly parallel and concurrent garbage collector without strict generational boundaries will reduce garbage collection jank and memory consumption while providing high throughput. Instead of implementing Orinoco behind a flag as a separate garbage collector, we decided to ship features of Orinoco incrementally on V8 tip of tree to benefit users immediately. The three features discussed in this post are parallel compaction, parallel remembered set processing, and black allocation.

V8 implements a generational garbage collector where objects may move within the young generation, from the young to the old generation, and within the old generation. Moving objects is expensive since the underlying memory of objects needs to be copied to new locations and the pointers to those objects are also subject to updating. Figure 1 shows the phases and how they were executed before Orinoco. Essentially, objects were moved first and then pointers between those objects were updated afterwards, all in sequential order, resulting in observable jank.

Figure 1: Sequential moving of objects and updating pointers

V8 partitions its heap memory into fixed-size chunks, called pages, that are assigned to either young or old generation space. Objects are initially allocated in the young generation. Upon garbage collection, live objects are moved within the young generation once. Objects that survive another garbage collection are promoted to the old generation. For both phases, which we call collectively young generation evacuation, we parallelize the copying of memory based on pages. Within the young generation, moving objects always involves allocating memory on fresh pages (and releasing the old pages), leaving behind a compact memory layout. In the old generation this process happens in a slightly different manner, since dead memory leaves behind unusable holes (or fragmentation). Some of these holes can be reused via free lists, but others are left behind, requiring compaction to move live objects to a better packed (potentially new) page. Similar to the young generation this process is parallelized on page-level.

Since there are no dependencies between young generation evacuation and old generation compaction, Orinoco now performs these phases in parallel, as shown in Figure 2. The result of these improvements is a reduction of compaction time of 75% from ~7ms to under 2ms on average.

Figure 2: Parallel moving of objects and updating pointers

The second optimization introduced by Orinoco improves how garbage collection tracks pointers. When an object moves location on the heap, the garbage collector has to find all pointers that contain the old location of the moved object and update them with the new location. Since iterating through the heap to find the pointers would be very slow, V8 uses a data structure called a remembered set to keep track of all the interesting pointers on the heap. A pointer is interesting if it points to an object that may move during garbage collection. For example, all pointers from the old generation to the new generation are interesting because new generation objects move on every garbage collection. Pointers to objects in heavily fragmented pages are also interesting because these objects will move to other pages during compaction.

Previously, V8 implemented remembered sets as arrays of pointer addresses, or store buffers. There was one store buffer for the young generation and one for each of the fragmented old generation pages. The store buffer of a page contains addresses of all incoming pointers as shown in Figure 3. Entries are appended to a store buffer in a write barrier, which guards write operations in JavaScript code. This may result in duplicate entries since a store buffer may include a pointer multiple times and two different store buffers may include the same pointer. Duplicate entries make parallelization of the pointer update phase difficult because of the data race caused by two threads trying to update the same pointer.

Figure 3: Old remembered set

Orinoco removes this complexity by reorganizing the remembered set to simplify parallelization and make sure that threads get disjoint sets of pointers to update. Instead of storing incoming interesting pointers in an array, each page now stores the offsets of interesting pointers originating from that page in buckets of bitmaps as shown in Figure 4. Each bucket is either empty or points to a bitmap of a fixed length. A bit in the bitmap corresponds to a pointer offset in the page. If a bit is set then the pointer is interesting and is in the remembered set. Using this data-structure we can parallelize pointer updates based on pages. The absence of duplicate entries and the dense representation of pointers also allowed us to remove complex code for handling remembered set overflow. In our long running Gmail benchmark, this change reduced the maximum pause time of compacting garbage collection by 45% from 42ms to 23 ms.

Figure 4: New remembered set

The third optimization that Orinoco introduces is black allocation, an improvement to the marking phase of the garbage collector. Black allocation (shipped in V8 5.1) is a garbage collection technique in which all objects allocated in the old generation (e.g. pre-tenured allocations or promoted objects by the garbage collector) are marked black immediately in order to designate them as "live". The intuition behind black allocation is that objects allocated in the old generation are likely long living. Therefore, objects that were recently allocated in the old generation should at least survive the next old generation garbage collection, otherwise they were falsely promoted. After coloring newly allocated objects black the garbage collector will not visit them. We speed up coloring of black objects by allocating them on black pages where all all objects are black by default. Another benefit of black pages is that they do not have to be swept, since all objects allocated on them are (by definition) live. Black allocation speeds up incremental marking progress since marking work does not increase with new allocations. The impact of black allocation is clearly visible on the Octane Splay benchmark where the throughput and latency score improved by about 30% while using about 20% less memory due to faster marking progress and less garbage collection work overall.

We plan to roll out more Orinoco features soon. Stay tuned, we are still tinkering!

Posted by the jank busters: Ulan Degenbaev, Michael Lippautz, and Hannes Payer