Self-updating scripts

Updates: Philip Tellis deployed this as part of Log-Normal’s snippet and found two problems. One is fixed and the other is still being investigated:
The previous code had a race condition when beacon.js called document.body.appendChild before BODY existed. This was fixed.
Some users reported the update.php iframe was opened in a new tab in IE8. I can’t repro this bug but am investigating.
Analyzing your site using Page Speed or YSlow often produces lower scores than you might expect due to 3rd party resources with short cache times. 3rd party snippet owners use short cache times so that users receive updates in a timely fashion, even if this means slowing down the site owner’s page.

Stoyan and I were discussing this and wondered if there was a way to have longer cache times and update resources when necessary. We came up with a solution. It’s simple and reliable. Adopting this pattern will reduce unnecessary HTTP requests resulting in faster pages and happier users, as well as better Page Speed and YSlow scores.

Long cache & revving URLs
Caching is an important best practice for making websites load faster. (If you’re already familiar with caching and 304s you may want to skip to the self-updating section.) Caching is easily achieved by giving resources an expiration date far in the future using the Cache-Control response header. For example, this tells the browser that the response can be cached for 1 year:

Cache-Control: max-age=31536000
But what happens if you make changes to the resource before the year is over? Users who have the old version in their cache won’t get the new version until the resource expires, meaning it would take 1 year to update all users. The simple answer is for the developer to change the resource’s URL. Often this is done by adding a “fingerprint” to the path, such as the source control version number, file timestamp, or checksum. Here’s an example for a script from Facebook:

http://static.ak.fbcdn.net/rsrc.php/v1/yx/r/N-kcJF3mlg6.js
It’s likely that if you compare the resource URLs for major websites over time you’ll see these fingerprints changing with each release. Using the HTTP Archive we see how the URL changes for Facebook’s main script:

http://static.ak.fbcdn.net/rsrc.php/v1/y2/r/UVaDehc7DST.js (March 1)
http://static.ak.fbcdn.net/rsrc.php/v1/y-/r/Oet3o2R_9MQ.js (March 15)
http://static.ak.fbcdn.net/rsrc.php/v1/yS/r/B-e2tX_mUXZ.js (April 1)
http://static.ak.fbcdn.net/rsrc.php/v1/yx/r/N-kcJF3mlg6.js (April 15)
Facebook sets a 1 year cache time for this script, so when they make changes they rev the URL to make sure all users get the new version immediately. Setting long cache times and revving the URL is a common solution for websites focused on performance. Unfortunately, this isn’t possible when it comes to 3rd party snippets.

Snippets don’t rev
Revving a resource’s URL is an easy solution for getting updates to the user when it comes to the website’s own resources. The website owner knows when there’s an update and since they own the web page they can change the resource URL.

3rd party snippets are a different story. In most cases, 3rd party snippets contain the URL for a bootstrap script. For example, here’s the Tweet Button snippet:

<a href="https://twitter.com/share" class="twitter-share-button"
data-lang="en">Tweet</a>
<script>
!function(d,s,id){
    var js,fjs=d.getElementsByTagName(s)[0];
    if(!d.getElementById(id)){
        js=d.createElement(s); js.id=id;
        js.src="//platform.twitter.com/widgets.js";
        fjs.parentNode.insertBefore(js,fjs);
}}(document,"script","twitter-wjs");
</script>
Website owners paste this snippet code into their pages. In the event of an emergency update, the Twitter team can’t rev the widgets.js URL because they don’t have access to change all the web pages containing this snippet. Notifying all the website owners to update the snippet isn’t an option, either. Since there’s no way to rev the URL, bootstrap scripts typically have a short cache time to ensure users get updates quickly. Twitter’s widgets.js is cacheable for 30 minutes, Facebook’s all.js is cacheable for 15 minutes, and Google Analytics’ ga.js is cacheable for 2 hours. This is much shorter than the recommended practice of setting the expiration date a month or more in the future.

Conditional GETs hurt performance
Unfortunately, these short cache times for bootstrap scripts have a negative impact on web performance. When the snippet’s resource is requested after the cache time has expired, instead of reading the resource from cache the browser has to issue a Conditional GET request (containing the If-Modified-Since and If-None-Match request headers). Even if the response is a simple 304 Not Modified with no response body, the time it takes to complete that roundtrip impacts the user experience. That impact varies depending on whether the bootstrap script is loaded in the normal way vs. asynchronously.

Loading scripts the “normal way” means using HTML: <script src="..."></script>. Scripts loaded this way have several negative impacts: they block all subsequent DOM elements from rendering, and in older browsers they block subsequent resources from being downloaded. These negative impacts also happen when the browser makes a Conditional GET request for a bootstrap script with a short cache time.

If the snippet is an async script, as is the case for widgets.js, the negative impact is reduced. In this case the main drawback impacts the widget itself – it isn’t rendered until the response to the Conditional GET is received. This is disconcerting to users because they see these async widgets popping up in the page after the surrounding content has already rendered.

Increasing the bootstrap script’s cache time reduces the number of Conditional GET requests which in turn avoids these negative impacts on the user experience. But how can we increase the cache time and still get updates to the user in a timely fashion without the ability to rev the URL?

Self-updating bootstrap scripts
A bootstrap script is defined as a 3rd party script with a hardwired URL that can’t be changed. We want to give these scripts long cache times so they don’t slow down the page, but we also want the cached version to get updated when there’s a change. There are two main problems to solve: notifying the browser when there’s an update, and replacing the cached bootstrap script with the new version.

update notification: Here we make the assumption that the snippet is making some subsequent request to the 3rd party server for dynamic data, to send a beacon, etc. We piggyback on this. In the case of the Tweet Button, there are four requests to the server: one for an iframe HTML document, one for JSON containing the tweet count, and two 1×1 image beacons (presumably for logging). Any one of these could be used to trigger an update. The key is that the bootstrap script must contain a version number. That version number is then passed back to the snippet server in order for it to detect if an update is warranted.

replacing the cached bootstrap script: This is the trickier part. If we give the bootstrap script a longer cache time (which is the whole point of this exercise), we have to somehow overwrite that cached resource even though it’s not yet expired. We could dynamically re-request the bootstrap script URL, but it’ll just be read from cache. We could rev the URL by adding a querystring, but that won’t overwrite the cached version with the hardwired URL referenced in the snippet. We could do an XHR and modify the caching headers using setRequestHeader, but that doesn’t work across all browsers.

Stoyan struck on the idea of dynamically creating an iframe that contains the bootstrap script, and then reloading that iframe. When the iframe is reloaded it’ll generate a Conditional GET request for the bootstrap script (even though the bootstrap script is cached and still fresh), and the server will respond with the updated bootstrap script which overwrites the old one in the browser’s cache. We’ve achieved both goals: a longer cache time while preserving the ability to receive updates when needed. And we’ve replaced numerous Conditional GET requests (every 30 minutes in the case of widgets.js) with only one Conditional GET request when the bootstrap script is actually modified.

An example
Take a look at this Self-updating Scripts example modeled after Google Analytics. This example contains four pages.

page 1: The first page loads the example snippet containing bootstrap.js:

(function() {
    var s1 = document.createElement('script');
    s1.async = true;
    s1.src = 'http://souders.org/tests/selfupdating/bootstrap.js';
    var s0 = document.getElementsByTagName('script')[0];
    s0.parentNode.insertBefore(s1, s0);
})();
Note that the example is hosted on stevesouders.com but the snippet is served from souders.org. This shows that the technique works for 3rd party snippets served from a different domain. At this point your browser cache contains a copy of bootstrap.js which is cacheable for 1 week and contains a “version number” (really just a timestamp). The version (timestamp) of bootstrap.js is shown in the page, for example, 16:23:53. A side effect of bootstrap.js is that it sends a beacon (beacon.js) back to the snippet server. For now the beacon returns an empty response (204 No Content).

page 2: The second page just loads the snippet again to confirm we’re using the cache. This time bootstrap.js is read from cache, so the timestamp should be the same, e.g., 16:23:53. A beacon is sent but again the response is empty.

page 3: Here’s where the magic happens. Once again bootstrap.js is read from cache (so you should see the same version timestamp again). But this time when it sends the beacon the server returns a notification that there’s an update. This is done by returning some JavaScript inside beacon.js:

(function() {
  var doUpdate = function() {
    if ( "undefined" === typeof(document.body) || !document.body ) {
      setTimeout(doUpdate, 500);
    }
    else {
      var iframe1 = document.createElement("iframe");
      iframe1.style.display = "none";
      iframe1.src = "http://souders.org/tests/selfupdating/update.php?v=[ver #]";
      document.body.appendChild(iframe1);
    }
  };
  doUpdate();
})();
The iframe src points to update.php:

<html>
<head>
<script src="http://souders.org/tests/selfupdating/bootstrap.js"></script>
</head>
<body>
<script>
if (location.hash === '') {
    location.hash = "check";
    location.reload(true);
}
</script>
</body>
</html>
The two key pieces of update.php are a reference to bootstrap.js and the code to reload the iframe. The location hash property is assigned a string to avoid reloading infinitely. The best way to understand the sequence is to look at the waterfall chart.



This page (newver.php) reads bootstrap.js from cache (1). The beacon.js response contains JavaScript that loads update.php in an iframe and reads bootstrap.js from cache (2). But when update.php is reloaded it issues a request for bootstrap.js (3) which returns the updated version and overwrites the old version in the browser’s cache. Voilà!

page 4: The last page loads the snippet once again, but this time it reads the updated version from the cache, as indicated by the newer version timestamp , e.g., 16:24:17.

Observations & adoption
One observation about this approach is the updated version is used the next time the user visits a page that needs the resource (similar to the way app cache works). We saw this in page 3 where the old version of bootstrap.js was used in the snippet and the new version was downloaded afterward. With the current typical behavior of short cache times and many Conditional GET requests the new version is used immediately. However, it’s also true with the old approach that if an update occurs while a user is in the middle of a workflow, the user won’t get the new version for 30 minutes or 2 hours (or whatever the short cache time is). Whereas with the new approach the user would get the update as soon as it’s available.

It would be useful to do a study about whether this approach increases or decreases the number of beacons with an outdated bootstrap script. Another option is to always check for an update. This would be done by having the bootstrap script append and reload the update.php iframe when it’s done. The downside is this would greatly increase the number of Conditional GET requests. The plus side is the 3rd party snippet owner doesn’t have to deal with the version number logic.

An exciting opportunity with this new approach is to treat update.php as a manifest list. It can reference bootstrap.js as well as any other resources that have long cache times but need to be overwritten in the browser’s cache. It should be noted that update.php doesn’t need to be a dynamic page – it can be a static page with a far future expiration date. Also, the list of resources can be altered to reflect only the resources that need to be updated (based on the version number received).

A nice aspect of this approach is that existing snippets don’t need to change. All of the changes necessary to adopt this self-updating behavior are on the snippet owner’s side:

Add a version number to the bootstrap script.
Pass back the version number to the snippet server via some other request that can return JavaScript. (Beacons work – they don’t have to be 1×1 transparent images.)
Modify that request handler to return JavaScript that creates a dynamic iframe when the version is out-of-date.
Add an update.php page that includes the bootstrap script (and other resources you want to bust out of cache).
Increase the cache time for the bootstrap script! 10 years would be great, but going from 30 minutes to 1 week is also a huge improvement.
If you own a snippet I encourage you to consider this self-updating approach for your bootstrap scripts. It’ll produce faster snippets, a better experience for your users, and fewer requests to your server.



http://www.stevesouders.com/blog/2012/05/22/self-updating-scripts/

你可能感兴趣的:(JavaScript)