The JavaScript specification includes a lot of built-in functionality, from math functions to a full-featured regular expression engine. Every newly-created V8 context has these functions available from the start. For this to work, the global object (for example, the window object in a browser) and all the built-in functionality must be set up and initialized into V8’s heap at the time the context is created. It takes quite some time to do this from scratch.
Fortunately, V8 uses a shortcut to speed things up: just like thawing a frozen pizza for a quick dinner, we deserialize a previously-prepared snapshot directly into the heap to get an initialized context. On a regular desktop computer, this can bring the time to create a context from 40 ms down to less than 2 ms. On an average mobile phone, this could mean a difference between 270 ms and 10 ms.
Applications other than Chrome that embed V8 may require more than vanilla JavaScript. Many load additional library scripts at startup, before the “actual” application runs. For example, a simple TypeScript VM based on V8 would have to load the TypeScript compiler on startup in order to translate TypeScript source code into JavaScript on the fly.
As of the release of V8 4.3 two months ago, embedders can use snapshotting to skip the startup time incurred by such initialization. The test case for this feature shows how this API works.
To create a snapshot, we call v8::V8::CreateSnapshotDataBlob with the to-be-embedded script as a null-terminated C string. It creates a new context, then compiles and executes the script in it. In our example, we create two custom startup snapshots, each of which defines functions on top of what JavaScript already has built in.
We can then use v8::Isolate::CreateParams to configure a newly-created isolate so that it initializes contexts from a custom startup snapshot. Contexts created in that isolate are exact copies of the one from which we took a snapshot. The functions defined in the snapshot are available without having to define them again.
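Concretely, the whole flow might look like the following minimal sketch against the V8 4.3-era API (signatures have changed in later versions; platform setup and error handling are omitted, and the embedded sum() script is just an illustrative stand-in):

#include <v8.h>

int main() {
  // Create a snapshot whose contexts come with a sum() function predefined.
  // The script is passed as a null-terminated C string; it is compiled and
  // executed once, when the snapshot is created, not at every startup.
  v8::StartupData blob =
      v8::V8::CreateSnapshotDataBlob("function sum(a, b) { return a + b; }");

  // Configure a new isolate to initialize its contexts from that snapshot.
  v8::Isolate::CreateParams params;
  params.snapshot_blob = &blob;
  v8::Isolate* isolate = v8::Isolate::New(params);
  {
    v8::Isolate::Scope isolate_scope(isolate);
    v8::HandleScope handle_scope(isolate);
    // This context already contains sum(); no extra script needs to run.
    v8::Local<v8::Context> context = v8::Context::New(isolate);
    // ... enter the context and call sum() as usual ...
  }
  isolate->Dispose();
  delete[] blob.data;  // The embedder owns the snapshot blob's memory.
  return 0;
}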
There is an important limitation to this: the snapshot can only capture V8’s heap. Any interaction from V8 with the outside is off-limits when creating the snapshot. Such interactions include:
defining and calling API callbacks (i.e. functions created via v8::FunctionTemplate)
creating typed arrays, since the backing store may be allocated outside of V8
And of course, values derived from sources such as Math.random or Date.now are fixed once the snapshot has been captured: they are no longer random, nor do they reflect the current time.
Limitations aside, startup snapshots remain a great way to save time on initialization. In the TypeScript example above, a snapshot shaves the roughly 100 ms otherwise spent loading the TypeScript compiler off startup (on a regular desktop computer). We're looking forward to seeing how you might put custom snapshots to use!
Posted by Yang Guo, Software Engineer and engine pre-heater supplier
Roughly every six weeks, we create a new branch of V8 as part of our release process. Each version is branched from V8’s git master immediately before Chrome branches for a Chrome Beta milestone. Today we’re pleased to announce our newest branch, V8 version 4.6, which will be in beta until it is released in coordination with Chrome 46 Stable. V8 4.6 is filled with all sorts of developer-facing goodies, so we’d like to give you a preview of some of the highlights in anticipation of the release in several weeks.
The spread operator makes it much more convenient to work with arrays. For example, it makes imperative code unnecessary when you simply want to merge arrays.
// Merging arrays
// Code without spread operator
let inner = [3, 4];
let merged = [0, 1, 2].concat(inner, [5]);
// Code with spread operator
let inner = [3, 4];
let merged = [0, 1, 2, ...inner, 5];
Another good use of the spread operator is to replace apply().
// Function parameters stored in an array
// Code without spread operator
function myFunction(a, b, c) {
  console.log(a);
  console.log(b);
  console.log(c);
}
let argsInArray = ["Hi ", "Spread ", "operator!"];
myFunction.apply(null, argsInArray);
// Code with spread operator
function myFunction(a, b, c) {
  console.log(a);
  console.log(b);
  console.log(c);
}
let argsInArray = ["Hi ", "Spread ", "operator!"];
myFunction(...argsInArray);
new.target
new.target is one of ES6's features designed to improve working with classes. Under the hood, it’s actually an implicit parameter to every function. If a function is called with the keyword new, the parameter holds a reference to the called function. If new is not used, the parameter is undefined.
In practice, this means that you can use new.target to figure out whether a function was called normally or constructor-called via the new keyword.
function myFunction() {
  if (new.target === undefined) {
    throw new Error("Try calling it with new.");
  }
  console.log("Works!");
}
// Will break
myFunction();
// Will work
let a = new myFunction();
When ES6 classes and inheritance are used, new.target inside the constructor of a super-class is bound to the derived constructor that was invoked with new. In particular, this gives super-classes access to the prototype of the derived class during construction.
Reduce the jank
Jank can be a pain, especially when playing a game, and it's often even worse when the game features multiple players. oortonline.gl is a WebGL benchmark that tests the limits of current browsers by rendering a complex 3D scene with particle effects and modern shader rendering. The V8 team set off on a quest to push the limits of Chrome’s performance in these environments. We’re not done yet, but the fruits of our efforts are already paying off. Chrome 46 shows incredible advances in oortonline.gl performance, which you can see for yourself below.
TypedArrays are heavily used in rendering engines such as Turbulenz (the engine behind oortonline.gl). For example, engines often create typed arrays (such as Float32Array) in JavaScript and pass them to WebGL after applying transformations.
The key was optimizing the interaction between the embedder (Blink) and V8. In particular:
There’s no need to create additional handles (that are also tracked by V8) for typed arrays when they are passed to WebGL as part of a one-way communication.
On hitting external (Blink) allocated memory limits we now start an incremental garbage collection instead of a full one.
Freeing of unused memory chunks is performed on additional threads concurrent to the main thread which significantly reduces the main garbage collection pause time.
The good news is that all of the changes motivated by oortonline.gl are general improvements that potentially benefit all users of applications that make heavy use of WebGL.
V8 API
Please check out our summary of API changes. This document is regularly updated a few weeks after each major release.
Developers with an active V8 checkout can use 'git checkout -b 4.6 -t branch-heads/4.6' to experiment with the new features in V8 4.6. Alternatively you can subscribe to Chrome's Beta channel and try the new features out yourself soon.
JavaScript performance continues to be a key part of Chrome's value proposition, especially when it comes to enabling a smooth experience. Starting in Chrome 41, V8 takes advantage of a new technique to increase the responsiveness of web applications by hiding expensive memory management operations inside of small, otherwise unused chunks of idle time. As a result, web developers can expect smoother scrolling and buttery animations with much-reduced jank due to garbage collection.
Many modern language engines such as Chrome's V8 JavaScript engine dynamically manage memory for running applications so that developers don't need to worry about it themselves. The engine periodically passes over the memory allocated to the application, determines which data is no longer needed, and clears it out to free up room. This process is known as garbage collection.
In Chrome, we strive to deliver a smooth, 60 frames per second (FPS) visual experience. Although V8 already attempts to perform garbage collection in small chunks, larger garbage collection operations can and do occur at unpredictable times, sometimes in the middle of an animation, pausing execution and preventing Chrome from hitting that 60 FPS goal.
Chrome 41 included a task scheduler for the Blink rendering engine which enables prioritization of latency-sensitive tasks to ensure Chrome remains responsive and snappy. As well as being able to prioritize work, this task scheduler has centralized knowledge of how busy the system is, which tasks need to be performed, and how urgent each of these tasks is. As such, it can estimate when Chrome is likely to be idle and roughly how long it expects to remain idle.
An example of this occurs when Chrome is showing an animation on a web page. The animation will update the screen at 60 FPS, giving Chrome around 16.6 ms of time to perform the update. As such, Chrome will start work on the current frame as soon as the previous frame has been displayed, performing input, animation and frame rendering tasks for this new frame. If Chrome completes all this work in less than 16.6 ms, then it has nothing else to do for the remaining time until it needs to start rendering the next frame. Chrome’s scheduler enables V8 to take advantage of this idle time period by scheduling special idle tasks when Chrome would otherwise be idle.
Figure 1: Frame rendering with idle tasks.
Idle tasks are special low-priority tasks which are run when the scheduler determines it is in an idle period. Idle tasks are given a deadline which is the scheduler's estimate of how long it expects to remain idle. In the animation example in Figure 1, this would be the time at which the next frame should start being drawn. In other situations (e.g., when no on-screen activity is happening) this could be the time when the next pending task is scheduled to be run, with an upper bound of 50 ms to ensure that Chrome remains responsive to unexpected user input. The deadline is used by the idle task to estimate how much work it can do without causing jank or delays in input response.
Garbage collection work performed in idle tasks is hidden from critical, latency-sensitive operations, which means these garbage collection tasks are done for “free”. To understand how V8 does this, it is worth reviewing V8’s current garbage collection strategy.
Deep dive into V8’s garbage collection engine
V8 uses a generational garbage collector, with the JavaScript heap split into a small young generation for newly allocated objects and a large old generation for long-lived objects. Since most objects die young, this generational strategy enables the garbage collector to perform regular, short garbage collections in the smaller young generation (known as scavenges), without having to trace objects in the old generation.
The young generation uses a semi-space allocation strategy, where new objects are initially allocated in the young generation’s active semi-space. Once that semi-space becomes full, a scavenge operation moves live objects to the other semi-space. Objects which have already been moved once are promoted to the old generation and considered long-lived. Once the live objects have been moved, the new semi-space becomes active and any remaining dead objects in the old semi-space are discarded.
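To make the algorithm concrete, here is a minimal sketch of a Cheney-style semi-space copy, the textbook technique underlying a scavenge. All names are hypothetical, and V8's real implementation (with pages, promotion, and write barriers) is far more involved:

#include <cstddef>
#include <vector>

// One heap object: a forwarding pointer (set once the object has been
// copied to the other semi-space) plus its outgoing references.
struct Object {
  Object* forward = nullptr;
  std::vector<Object*> children;
};

class SemiSpace {
 public:
  // Copy an object to to-space if it has not been copied yet and return its
  // new address. The forwarding pointer ensures that every reference to the
  // same object ends up pointing at a single copy, which also handles cycles.
  Object* Evacuate(Object* obj) {
    if (obj->forward != nullptr) return obj->forward;
    Object* copy = new Object(*obj);  // Shallow copy; children updated below.
    obj->forward = copy;
    to_space_.push_back(copy);
    return copy;
  }

  // Scavenge: evacuate the roots, then sweep a single scan cursor across
  // to-space, evacuating everything reachable from it. Whatever was never
  // copied is dead, so from-space can afterwards be discarded wholesale.
  void Scavenge(std::vector<Object*>& roots) {
    to_space_.clear();
    for (Object*& root : roots) root = Evacuate(root);
    for (std::size_t scan = 0; scan < to_space_.size(); ++scan) {
      for (Object*& child : to_space_[scan]->children) {
        child = Evacuate(child);
      }
    }
    // The semi-spaces now swap roles. A real scavenger would also promote
    // objects that survived a previous copy to the old generation.
  }

 private:
  std::vector<Object*> to_space_;
};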
The duration of a young generation scavenge therefore depends on the size of live objects in the young generation. A scavenge will be fast (<1 ms) when most of the objects become unreachable in the young generation. However, if most objects survive a scavenge, the duration of the scavenge may be significantly longer.
A major collection of the whole heap is performed when the size of live objects in the old generation grows beyond a heuristically-derived limit. The old generation uses a mark-and-sweep collector with several optimizations to improve latency and memory consumption. Marking latency depends on the number of live objects that have to be marked, with marking of the whole heap potentially taking more than 100 ms for large web applications. In order to avoid pausing the main thread for such long periods, V8 has long had the ability to incrementally mark live objects in many small steps, with the aim of keeping each marking step below 5 ms in duration.
After marking, the free memory is made available again for the application by sweeping the whole old generation memory. This task is performed concurrently by dedicated sweeper threads. Finally, memory compaction is performed to reduce memory fragmentation in the old generation. This task may be very time-consuming and is only performed if memory fragmentation is an issue.
In summary, there are four main garbage collection tasks:
Young generation scavenges, which are usually fast
Marking steps performed by the incremental marker, which can be arbitrarily long depending on the step size
Full garbage collections, which may take a long time
Full garbage collections with aggressive memory compaction, which may take a long time, but clean up fragmented memory
In order to perform these operations in idle periods, V8 posts garbage collection idle tasks to the scheduler. When these idle tasks are run they are provided with a deadline by which they should complete. V8’s garbage collection idle time handler evaluates which garbage collection tasks should be performed in order to reduce memory consumption, while respecting the deadline to avoid future jank in frame rendering or input latency.
The garbage collector will perform a young generation scavenge during an idle task if the application’s measured allocation rate shows that the young generation may be full before the next expected idle period. Additionally, it calculates the average time taken by recent scavenge tasks in order to predict the duration of future scavenges and ensure that it doesn’t violate idle task deadlines.
When the size of live objects in the old generation is close to the heap limit, incremental marking is started. Incremental marking steps can be linearly scaled by the number of bytes that should be marked. Based on the average measured marking speed, the garbage collection idle time handler tries to fit as much marking work as possible into a given idle task.
A full garbage collection is scheduled during an idle task if the old generation is almost full and the deadline provided to the task is estimated to be long enough to complete the collection. The collection pause time is predicted from the amount of allocated memory and the measured marking speed. Full garbage collections with additional compaction are performed only if the webpage has been idle for a significant amount of time.
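Putting these heuristics together, the decision logic might be sketched roughly as follows (hypothetical names and structure; this is not V8's actual idle time handler):

#include <cstddef>

// What the idle time handler can decide to do with one idle period.
enum class IdleGCAction { kDone, kScavenge, kIncrementalMarkStep, kFullGC };

// Measurements described in the text, gathered by the garbage collector.
struct HeapState {
  double new_space_remaining_bytes;     // Free space left in the young generation.
  double old_space_size_bytes;          // Memory to process in a full collection.
  bool old_space_nearly_full;           // Close to the heuristic heap limit.
  bool incremental_marking_in_progress;
  double allocation_rate_bytes_per_ms;  // Measured recent allocation rate.
  double average_scavenge_time_ms;      // Average of recent scavenge durations.
  double marking_speed_bytes_per_ms;    // Measured incremental marking speed.
  double time_to_next_idle_ms;          // Scheduler's estimate of the next gap.
};

// Choose an action for an idle period lasting deadline_ms milliseconds.
IdleGCAction ChooseIdleGCAction(const HeapState& h, double deadline_ms,
                                std::size_t* marking_step_bytes) {
  // 1. Scavenge if the young generation is predicted to fill up before the
  //    next idle period and a scavenge is expected to fit in the deadline.
  double predicted_allocation =
      h.allocation_rate_bytes_per_ms * h.time_to_next_idle_ms;
  if (h.new_space_remaining_bytes < predicted_allocation &&
      h.average_scavenge_time_ms <= deadline_ms) {
    return IdleGCAction::kScavenge;
  }
  // 2. A full collection only if the old generation is almost full and the
  //    predicted pause (bytes to process / marking speed) fits the deadline.
  double predicted_full_gc_ms =
      h.old_space_size_bytes / h.marking_speed_bytes_per_ms;
  if (h.old_space_nearly_full && predicted_full_gc_ms <= deadline_ms) {
    return IdleGCAction::kFullGC;
  }
  // 3. Otherwise, fit as much incremental marking into the deadline as the
  //    measured marking speed allows.
  if (h.incremental_marking_in_progress) {
    *marking_step_bytes =
        static_cast<std::size_t>(h.marking_speed_bytes_per_ms * deadline_ms);
    return IdleGCAction::kIncrementalMarkStep;
  }
  return IdleGCAction::kDone;
}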
Performance evaluation
In order to evaluate the impact of running garbage collection during idle time, we used Chrome’s Telemetry performance benchmarking framework to evaluate how smoothly popular websites scroll while they load. We benchmarked the top 25 sites on a Linux workstation as well as typical mobile sites on an Android Nexus 6 smartphone, both of which open popular webpages (including complex webapps such as Gmail, Google Docs and YouTube) and scroll their content for a few seconds. Chrome aims to keep scrolling at 60 FPS for a smooth user experience.
Figure 2 shows the percentage of garbage collection that was scheduled during idle time. The workstation’s faster hardware results in more overall idle time compared to the Nexus 6, thereby enabling a greater percentage of garbage collection to be scheduled during this idle time (43% compared to 31% on the Nexus 6), resulting in about a 7% improvement on our jank metric.
Figure 2: The percentage of garbage collection that occurs during idle time.
As well as improving the smoothness of page rendering, these idle periods also provide an opportunity to perform more aggressive garbage collection when the page becomes fully idle. Recent improvements in Chrome 45 take advantage of this to drastically reduce the amount of memory consumed by idle foreground tabs. Figure 3 shows a sneak peek at how memory usage of Gmail’s JavaScript heap can be reduced by about 45% when it becomes idle, compared to the same page in Chrome 43.
Figure 3: Memory usage of Gmail in the latest version of Chrome 45 (left) vs. Chrome 43.
These improvements demonstrate that it is possible to hide garbage collection pauses by being smarter about when expensive garbage collection operations are performed. Web developers no longer have to fear the garbage collection pause, even when targeting silky smooth 60 FPS animations. Stay tuned for more improvements as we push the bounds of garbage collection scheduling.
Posted by Hannes Payer and Ross McIlroy, Idle Garbage Collectors
V8 uses just-in-time (JIT) compilation to execute JavaScript code. This means that immediately prior to running a script, it has to be parsed and compiled, which can cause considerable overhead. As we announced recently, code caching is a technique that lessens this overhead. When a script is compiled for the first time, cache data is produced and stored. The next time V8 needs to compile the same script, even in a different V8 instance, it can use the cache data to recreate the compilation result instead of compiling from scratch. As a result, the script executes much sooner.
Code caching has been available since V8 version 4.2 and is not limited to Chrome. It is exposed through V8’s API, so every V8 embedder can take advantage of it. The test case used to exercise this feature serves as an example of how to use this API.
When a script is compiled by V8, cache data can be produced to speed up later compilations by passing v8::ScriptCompiler::kProduceCodeCache as an option. If the compilation succeeds, the cache data is attached to the source object and can be retrieved via v8::ScriptCompiler::Source::GetCachedData. It can then be persisted for later use, for example by writing it to disk.
During later compilations, the previously produced cache data can be attached to the source object, with v8::ScriptCompiler::kConsumeCodeCache passed as an option. This time, code is produced much faster, as V8 bypasses compiling the code and deserializes it from the provided cache data.
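As a rough sketch of both halves against the V8 4.2-era API described here (later versions moved Compile() to a context-based, MaybeLocal-returning form; error handling is omitted and the helper names are ours):

#include <v8.h>
#include <cstdint>
#include <cstring>
#include <utility>

// First run: compile the script and ask V8 to produce cache data. Returns a
// heap-allocated copy of the cache bytes so they can be persisted, e.g.
// written to disk.
std::pair<const uint8_t*, int> CompileAndProduceCache(
    v8::Isolate* isolate, v8::Local<v8::String> source_text) {
  v8::ScriptCompiler::Source source(source_text);
  v8::ScriptCompiler::Compile(isolate, &source,
                              v8::ScriptCompiler::kProduceCodeCache);
  // On success the cache data hangs off the source object.
  const v8::ScriptCompiler::CachedData* cache = source.GetCachedData();
  uint8_t* copy = new uint8_t[cache->length];
  std::memcpy(copy, cache->data, cache->length);
  return {copy, cache->length};
}

// Later runs: attach the persisted cache data to the source object, and V8
// deserializes the compilation result instead of compiling from scratch.
v8::Local<v8::Script> CompileWithCache(v8::Isolate* isolate,
                                       v8::Local<v8::String> source_text,
                                       const uint8_t* cache_data,
                                       int cache_length) {
  v8::ScriptCompiler::Source source(
      source_text,
      new v8::ScriptCompiler::CachedData(cache_data, cache_length));
  return v8::ScriptCompiler::Compile(isolate, &source,
                                     v8::ScriptCompiler::kConsumeCodeCache);
}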
Producing cache data comes at a certain computational and memory cost. For this reason, Chrome only produces cache data if the same script is seen at least twice within a couple of days. This way Chrome is able to turn script files into executable code twice as fast on average, saving users valuable time on each subsequent page load.
Roughly every six weeks, we create a new branch of V8 as part of our release process. Each version is branched from V8’s git master immediately before Chrome branches for a Chrome Beta milestone. Today we’re pleased to announce our newest branch, V8 version 4.5, which will be in beta until it is released in coordination with Chrome 45 Stable. V8 4.5 is filled with all sorts of developer-facing goodies, so we’d like to give you a preview of some of the highlights in anticipation of the release in several weeks.
All of the new methods on Arrays and TypedArrays that are specified in ES2015 are now supported in 4.5. They make working with Arrays and TypedArrays more convenient. Among the methods added are Array.from and Array.of. Methods which mirror most Array methods on each kind of TypedArray were added as well.
Object.assign
Object.assign enables developers to quickly merge and clone objects.
let target = {a:"Hello, "};
let source = {b:"world!"};
// Objects are mergedObject.assign(target, source);
console.log(target.a + target.b);
This feature can also be used to mix in functionality, for example by assigning an object containing methods onto a constructor's prototype.
More JavaScript language features are “optimizable”
For many years, V8’s traditional optimizing compiler, Crankshaft, has done a great job of optimizing many common JavaScript patterns. However, it never had the capability to support the entire JavaScript language, and using certain language features in a function—such as “try/catch” and “with”—would prevent it from being optimized. V8 would have to fall back to its slower, baseline compiler for that function.
One of the design goals of V8’s new optimizing compiler, TurboFan, is to be able to eventually optimize all of JavaScript, including ECMAScript 2015 features. In V8 4.5, we’ve started using TurboFan to optimize some of the language features that are not supported by Crankshaft: ‘for-of’, ‘class’, ‘with’ and computed property names.
Here is an example of code that uses 'for-of', which can now be compiled by TurboFan:
let sequence = ["First", "Second", "Third"];
for (let value of sequence) {
  // This scope is now optimizable
  let object = {a: "Hello, ", b: "world!", c: value};
  console.log(object.a + object.b + object.c);
}
Although initially functions that use these language features won't reach the same peak performance as other code compiled by Crankshaft, TurboFan can now speed them up well beyond our current baseline compiler. Even better, performance will continue to improve quickly as we develop more optimizations for TurboFan.
V8 API
Please check out our summary of API changes. This document is regularly updated a few weeks after each major release.
Developers with an active V8 checkout can use 'git checkout -b 4.5 -t branch-heads/4.5' to experiment with the new features in V8 4.5. Alternatively you can subscribe to Chrome's Beta channel and try the new features out yourself soon.
Last week we announced that we've turned on TurboFan for certain types of JavaScript. In this post we wanted to dig deeper into the design of TurboFan.
Performance has always been at the core of V8’s strategy. TurboFan combines a cutting-edge intermediate representation with a multi-layered translation and optimization pipeline to generate better-quality machine code than was previously possible with the Crankshaft JIT. Optimizations in TurboFan are more numerous, more sophisticated, and more thoroughly applied than in Crankshaft, enabling fluid code motion, control flow optimizations, and precise numerical range analysis, all of which were previously unattainable.
A layered architecture
Compilers tend to become complex over time as new language features are supported, new optimizations are added, and new computer architectures are targeted. With TurboFan, we've taken lessons from many compilers and developed a layered architecture to allow the compiler to cope with these demands over time. A clearer separation between the source-level language (JavaScript), the VM's capabilities (V8), and the architecture's intricacies (from x86 to ARM to MIPS) allows for cleaner and more robust code. Layering allows those working on the compiler to reason locally when implementing optimizations and features, as well as write more effective unit tests. It also saves code: each of the seven target architectures supported by TurboFan requires fewer than 3,000 lines of platform-specific code, versus 13,000-16,000 in Crankshaft. This has enabled engineers at ARM, Intel, MIPS, and IBM to contribute to TurboFan in a much more effective way. TurboFan is able to more easily support all of the coming features of ES6 because its flexible design separates the JavaScript frontend from the architecture-dependent backends.
More sophisticated optimizations
The TurboFan JIT implements more aggressive optimizations than Crankshaft through a number of advanced techniques. JavaScript enters the compiler pipeline in a mostly unoptimized form and is translated and optimized into progressively lower forms until machine code is generated. The centerpiece of the design is a relaxed sea-of-nodes internal representation (IR) of the code, which allows more effective reordering and optimization.
Example TurboFan graph
Numerical range analysis helps TurboFan understand number-crunching code much better. The graph-based IR allows most optimizations to be expressed as simple local reductions which are easier to write and test independently. An optimization engine applies these local rules in a systematic and thorough way. Transitioning out of the graphical representation involves an innovative scheduling algorithm that makes use of the reordering freedom to move code out of loops and into less frequently executed paths. Finally, architecture-specific optimizations like complex instruction selection exploit features of each target platform for the best quality code.
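As a toy illustration of such a local reduction (with entirely hypothetical names; V8's actual node and reducer classes differ), consider a rule that simplifies additions by looking only at one node and its immediate inputs:

#include <cstdint>
#include <vector>

enum class Op { kConstant, kAdd /* ..., many more in a real IR */ };

// A node in a tiny graph IR: an operator, an optional constant value,
// and its input nodes.
struct Node {
  Op op;
  int64_t value;  // Only meaningful for kConstant.
  std::vector<Node*> inputs;
};

// A local reduction inspects one node and its immediate inputs, and either
// returns a simpler replacement or the node unchanged. An optimization
// engine applies rules like this across the graph until a fixed point is
// reached, which keeps each rule easy to write and test in isolation.
Node* ReduceAdd(Node* node) {
  if (node->op != Op::kAdd) return node;
  Node* left = node->inputs[0];
  Node* right = node->inputs[1];
  // Constant folding: Add(Constant(a), Constant(b)) => Constant(a + b).
  if (left->op == Op::kConstant && right->op == Op::kConstant) {
    return new Node{Op::kConstant, left->value + right->value, {}};
  }
  // Identity: Add(x, Constant(0)) => x.
  if (right->op == Op::kConstant && right->value == 0) {
    return left;
  }
  return node;
}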
Delivering a new level of performance
We're already seeing some great speedups with TurboFan, but there's still a ton of work to do. Stay tuned as we enable more optimizations and turn TurboFan on for more types of code!
Posted by Ben L. Titzer, Software Engineer and TurboFan Mechanic
Historically, the V8 team has published blog posts on the Chromium blog that appeal to the wider web development audience. In order to deliver more in-depth posts especially targeted to the hardcore coder and virtual machine enthusiasts out there, we've spun off this V8-specific blog. Big-ticket items will continue to be posted to the Chromium blog, but we hope you'll stay tuned here for even more posts about advanced compiler techniques, low-level performance optimizations, and all the technical bits that make V8 tick.