Steve Souders: Your Script Just Killed My Site - Transcript
Full audio & slides for this talk posted here.
Doug: Greetings programmers. Welcome to Tech Exploration. I’m Doug Crockford. In a moment I’m going to introduce my friend Steve Souders. Steve began looking at problems of website performance while he was at Yahoo. A lot of people had looked at that problem before, and they would usually do things like fiddle with a database or fiddle with the servers and try to figure out why things weren’t going faster. Steve looked at the whole web as a system, everything from the server to the browser, discovered that the browser doesn’t work anywhere near as well as we thought it would or should, and found lots of work-arounds for that, which substantially sped up the web. He wrote a number of articles about that and then published a couple of best-selling books, High Performance Web Sites and Even Faster Web Sites. They have earned ranks of 48,000 and 66,000 on Amazon, which is pretty great. So here he is, the fastest man in the world, Steve Souders.
Steve: Thank you, Doug. So I’ve got the stage now. You’re going to come back later for the Q&A? Okay. I’ll try sitting down but we’ll see how that works. So, thanks for that great introduction. Where’s Amanda? There she is. Amanda, thanks for contacting me and being considerate of schedules and tolerant of my missing things and putting this thing together. It’s great to be here at Tech Exploration to do this talk. I’ve done a version of this talk once before, and I loved it. I loved it even more because there was some really interesting serendipity with events that were outside of my control. Now, conspiracy theorists will say that those events were actually caused by me, but I want to guarantee you, they were not. And we’ll dig into that more when we get to that in a few slides.
I love this pic and I wanted to point out, it might be hard to see, but at the bottom is the URL for the PowerPoint deck that I’m going to go through tonight, and also my website is there, stevesouders.com. In fact, if you go to stevesouders.com, in the upper right corner is a link to the slides, and all the talks that I’ve done and all the slides I’ve used are available through there. So, that might be a good URL to remember, stevesouders.com. I want to mention two more. Usually these come up during the course of my presentation, but tonight’s presentation is a little different because it’s really polished, so these other URLs won’t come up and I’m going to mention them now. One is webpagetest.org. It’s even more important to remember than stevesouders.com, and it’s a great testing tool from a guy named Pat Meenan who started at AOL and is now at Google.
The other URL, the third URL I want you to remember, is perfplanet.com, and that, again, is even more important than stevesouders.com. It’s an RSS aggregator run by Stoyan Stefanov, a former member of the Yahoo Exceptional Performance team and YSlow developer who’s now at Facebook. It includes blogs like mine. So if you go to perfplanet.com and subscribe to that, you’ll get my blog posts plus posts from about 40 or 50 other gurus in web performance.
it doesn’t work anymore or if it’s fried, then the whole architecture is down.
Now, I think it’s kind of comical that they only have one application server and they didn’t put a red circle around that as well. I didn’t put the red circle there myself. I thought I had, and I was going to copy and paste it over onto the application server, because I pulled this image about four months ago, but then I realized the red circle was burned into the image from Wikipedia. So both of those are single points of failure: if either one of those goes down, the whole system is out, right, and that’s really bad. Ten or fifteen years ago, even big websites would still have single points of failure, and we all learned our lessons. Now, unless you run a mom and pop site like mine, you’ve eliminated all of these single points of failure.
So, people who do the web as a business are really good at this single point of failure stuff when it comes to hardware, but they’re not so good at it when it comes to software and we’ll see that. Here’s an example. Does anyone here work at Business Insider? They must be owned by some media company. I don’t know who it is. I should have done more research. So, I can look at this page and I can spot 1, 2, 3, 4, 5, five likely single points of failure. Anyone want to guess? I’m sorry?
Male audience member: External scripts are one of them.
Steve: Yeah, visually we can see those external scripts. The answer was correct, external scripts. We can see those visually on the page. We have an ad here at the top. JavaScript is really awesome: it’s a way that one company, one server, can talk to another server without ever having to share any kind of API or anything like that. So that’s a really nice way to get ads into the page, and we have a couple of ads in the page. We also have these third party snippets: a Facebook Like button, a Twitter button. I don’t know why there isn’t a Google+ button there. I was paid to say that. In fact, every two weeks I’m paid to say that. And so, it’s not obvious to most people that these are single points of failure in the page, but anyone who’s looked behind that and read stuff like my books and my blog knows that they are. So let’s take a peek at that and try to figure out why these are single points of failure.
The gentleman in the audience mentioned that they have external scripts behind them. So what’s the issue with external scripts? How about a little shout out for the background photos in these slides? Aren’t they good? Because scripts block, get it, they block? Okay, so it only gets worse. So synchronous scripts block all the elements that follow them in the page. And what do I mean by a synchronous script? I mean a script that you do the typical way. Script tag,
<script src="main.js"></script>
That’s a synchronous script. It means that no other JavaScript on the page can be executed until that is downloaded, parsed, and executed.
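The non-blocking alternative can be sketched like this. It’s the widely used dynamic-loader pattern, not code from the talk, and the tiny `document` stub exists only so the sketch runs outside a browser; in a real page the browser’s DOM would be used instead:

```javascript
// Minimal DOM stub so this sketch runs outside a browser.
const inserted = [];
const document = {
  createElement: (tag) => ({ tagName: tag.toUpperCase() }),
  getElementsByTagName: () => [
    { parentNode: { insertBefore: (node, ref) => inserted.push(node) } },
  ],
};

// Classic async-loader pattern: a dynamically created script element
// downloads in parallel and never blocks HTML parsing or rendering.
function loadScriptAsync(src) {
  const s = document.createElement('script');
  s.src = src;
  s.async = true;
  const first = document.getElementsByTagName('script')[0];
  first.parentNode.insertBefore(s, first);
  return s;
}

loadScriptAsync('main.js');
```

Because the script element is created from script rather than written as a plain tag, the parser never stops to wait for it.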
Well, guess what? That happens at the onload event, and in most cases the user’s not going to wait for the onload event of this page. After 19 seconds, or 15 seconds, or if you’re like me, a second and a half, you’re going to hit reload, right? So anyone who’s relying on RUM to keep track of these front end stalls: you’re missing the slowest times that your users are experiencing if you’re waiting for onload. So a simple thing to do there is, if you’re doing any kind of RUM, instrumentation, monitoring, metrics on your page, have a timeout. Maybe keep the onload code there, but also, after about 10 seconds, you might want to send a beacon back that says, it’s been 10 seconds and this page still hasn’t loaded; I’m just letting you know it’s been over 10 seconds. You might know from looking at your stats what a good abandonment point is: 10 seconds, 15 seconds, 20 seconds, 5 seconds. So why did this happen on Business Insider? Is that readable?
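The abandonment timeout he describes might look roughly like this; the function and beacon names are invented for illustration, not from any real RUM library:

```javascript
// Sketch of an abandonment beacon: report at onload as usual, but also
// arm a timer so a beacon goes out even if onload never fires.
// Exactly one beacon is sent per page view.
function makeRumReporter(sendBeacon, now = Date.now) {
  const start = now();
  let sent = false;
  function report(label) {
    if (sent) return false;        // only the first event beacons
    sent = true;
    sendBeacon({ label, elapsedMs: now() - start });
    return true;
  }
  return {
    onLoad: () => report('onload'),
    onTimeout: () => report('abandon-suspected'),
  };
}

// In a real page you would wire it up roughly like this:
//   const rum = makeRumReporter(d => navigator.sendBeacon('/rum', JSON.stringify(d)));
//   window.addEventListener('load', rum.onLoad);
//   setTimeout(rum.onTimeout, 10000);  // 10s and the page still hasn't loaded
```

The 10-second figure is the abandonment point from the talk; you would tune it against your own stats.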
<script src="blah.js"></script>, and when that happens, the browser’s going to stop there and not render anything else below that script tag. And guess what? This script is in the head, so the entire body is blocked from rendering by this script. Now, how is it that it’s timing out, taking 20 to 120 seconds? I first noticed this and wrote a blog post about it. This is something I call front end SPOF, and that name has kind of caught on. You can read that blog post if you want. So the way I discovered this is kind of interesting. I was in Beijing last December for Velocity China, and, because I’m so impatient with time...
I have this script that I fire up in the morning. I click go and that loads the 30 web pages that I read every morning. And while they’re loading, I go and get my breakfast and I bring it back to my desk and by the time I get back, those pages are loaded. I don’t have to wait for anything. So it’s like a dagger in my heart watching these websites load so slowly. So in the morning, I run the script and Business Insider, by the time I get to it, it’s blank. I go, why is it blank? I noticed that. So I fired up a packet sniffer and I noticed it was because of the Twitter widget. So when I got home, I decided to write this up. I needed a way to reproduce that and that’s where Web Page Test comes in. So I actually did mention it as part of the talk. So I’m using Web Page Test and I’m going to businessinsider.com and here’s the failure that we see.
This is called filmstrip view, and there is no painting on the page until sometime after 20 seconds. That’s because of this Twitter script, and I’m in IE, which times out after 20 seconds. So no rendering happens for 20 seconds. The way I was able to reproduce this using WebPagetest is that they have a WebPagetest instance in China, which is behind the firewall, which blocks twitter.com. So anyone who has users in China and is using any of these widgets that are blocked by the great firewall (Google, by the way, is blocked), you might want to look at your page load times in China, because I bet they’re over 20 seconds, and I think #[17:45:3] is even longer than that. I also noticed that Radar, O’Reilly Radar, had a similar issue.
So, I then went and looked at the Twitter code to see who was really to blame here. For example, Google Analytics is asynchronous, but we still publish, we still document, the synchronous version in case people want to ruin their website. Here’s the documentation. I love Twitter, but here’s the documentation: a blocking script. They tell you to put this blocking script in the page, and look, it’s in the head, so it’s going to block the entire body if it’s ever blocked by a firewall or if it’s just slow. It’s not implausible that the Twitter script could take three seconds to download, on some occasions four seconds. Does Business Insider want their entire page blocked for this Twitter widget, which is this big on the page? No, it’s ridiculous. And it’s in the head, so the entire body is blocked from rendering. I think I’m going to be sick. Okay, so that’s why we have this behavior: the entire page is blocked for over 20 seconds. Now, how unusual is this?
So, I don’t spend much time in China. We don’t have a firewall here, or at least not a government one; you might have one at work. So I run this side project called the HTTP Archive, where I crawl hundreds of thousands of URLs a couple times a month and gather information about how many scripts they have, how many bytes they have, how fast they load, and I ran this query. I thought it would be good to go through this query line by line, so we’ll do that. My goal was: I want to find some other instances of this front end SPOF.
I want the URL, and I want the WebPagetest ID that will take me to the results, so I can look at them. I’m joining my two tables: a table of just the high level page information, and then all the resources, all the requests, that each page contains. And I want to bound those to a specific run; I think this was May, the run from May 1st through May 15th, so the page IDs were in that range. And I wanted to find some fairly popular sites, so I picked ones that were ranked in the top 20,000 worldwide, where they had at least one request, one response, whose content type contained the word script, and the load time for that one resource was over ten seconds, and the rendering of the page was also over ten seconds.
And I’ll group those by page ID. All right, that wasn’t really that enjoyable; there weren’t any pretty photographs in the background of the slide. But the point is, all the HTTP Archive code and the data is open source. You can download it all and have an instance of your own running in probably a couple hours, and then you can run your own queries like this. You can slice and dice this data any way you want, and there are people around the world who care about performance who are doing stuff like that. So, it wasn’t super fun, but it’s pretty easy once you learn the schema.
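As a rough paraphrase of that query’s logic (not the real HTTP Archive schema; every field name below is made up for illustration), the filter amounts to this:

```javascript
// Find pages that rendered slowly AND had a single script response that
// took over ten seconds: the front end SPOF signature described above.
function findFrontEndSpofCandidates(pages, requests) {
  // Group the per-request rows by their page, like the SQL join.
  const byPage = new Map();
  for (const r of requests) {
    if (!byPage.has(r.pageId)) byPage.set(r.pageId, []);
    byPage.get(r.pageId).push(r);
  }
  return pages
    .filter((p) =>
      p.rank <= 20000 &&                 // fairly popular sites
      p.renderMs > 10000 &&              // rendering took over 10 seconds
      (byPage.get(p.pageId) || []).some(
        (r) => r.contentType.includes('script') && r.loadMs > 10000
      )                                  // ...and one script took over 10s too
    )
    .map((p) => ({ url: p.url, wptId: p.wptId }));
}
```

The real thing is a SQL join over the pages and requests tables, but the shape of the filter is the same.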
And so I found these examples. Here, this is one brand’s site. For some reason (I don’t know why; the Internet is not deterministic, you know) this Facebook script took 14 seconds to download, and that blocked the entire page from rendering for 14 seconds. This is Big Follow; I think it has something to do with SEO or something like that. And here they had scripts from apis #[0:22:00.7] google.com and Facebook that for some reason took over 25 seconds to return, and so the rendering was blocked for 25 seconds. This one, I can’t even read this, and they have a lam.com script, which is the industry leader in some type of media advertising. But they’re not the industry leader in high performance scripts, and here this script is taking 15 seconds or something like that. It started a little late, so it blocks rendering of the entire page until about the 25-second mark. Look, this is what kills me. It might be hard to read, but this is a wonderful chart, and each of the rows of the waterfall chart is a request. The first one is the HTML document, then you have a bunch of scripts and images and style sheets and other stuff, so that’s like 15 requests that have already been downloaded. The whole HTML document, all of the text in the HTML document, is #[0:23:07.9] right now, and none of it is being drawn, waiting for this one blocking script, which is request #17. It just doesn’t make any sense. It’s madness, I tell you.
So, it was May 30th, and I talked about how these third party widgets you put on your page can bring your site down. You really have got to be vigilant and diligent about making things asynchronous. Guess what happened the next day? Facebook had an outage. Right? It’s hard to see (I don’t know why they chose yellow for the plot), but all of a sudden, and this is a plot of just reaching facebook.com, I think, it was taking 20 to 30 seconds to reach facebook.com. But it wasn’t just the facebook.com webpage; it was any of the resources on facebook.com. So, what happened? And this, I thought, was really awesome: I think the tech news community is getting really intelligent about how the web works, and they got the story right. These are fairly big media sources. Forbes: the Facebook outage slowed thousands of retail sites. So, it wasn’t just Facebook itself. And I’ll tell you, I’m not picking on Facebook here. This was the joke I was making before: people think I coordinated this. No one was listening to me on May 30th, so I brought Facebook down on May 31st, just to teach them a lesson. It wasn’t me; I don’t have that kind of power. PC World: these outages have happened in the past. They happened to folks like Google and Yahoo and AOL and Twitter, and they’re usually not too long, and if all they did was bring down the main publisher site, that would at least limit it. But it’s these snippets that spiral away through the entire Internet, to thousands, tens of thousands, hundreds of thousands of other sites. When all of those sites get brought down because they’re loading scripts synchronously, that’s a massive outage.
So: as Facebook goes, so goes the Internet. Yesterday’s Facebook outage also slowed down major retail sites and banks too, and we can see these major U.S. retail sites’ #[0:26:29.9] times, above and below, as the Facebook outage hit. Load times: you can see there’s a very, very, very strong correlation. So, I think that’s kind of the first half of the talk, and it’s been a tale of woe and foreboding. But all is not lost; the sun will come out tomorrow. So, we’re going to talk about a few things that maybe make the future not as gloomy as the past.
Here’s the first one. I confess that this is the HTML source code I showed you from last May, and, being the minutiae-focused freak that I am, I went back and I checked it, and they changed their code. So, here’s the same Twitter snippet that brought them down before, that caused them to have that front end SPOF, and now they’ve got this script #[0:27:29.2] equals post-loaded, not src, data-#[0:27:33.4].
Doug: data- is a way to add custom attributes to HTML without violating HTML conformance.
So, this is unfortunate. I actually haven’t gone and looked at whether the LinkedIn snippet has an advertised async pattern and Business Insider just hasn’t used it yet. I know in the case of Twitter, they had evangelized their async version. They did that after I called them out on stage, and so that was good. So, Business Insider fixed their Twitter snippet because I called them out on stage; I’m calling them out on stage for their LinkedIn widget now: fix that, and I’ll find the next one. Or you can find it by typing in the host names for all the scripts in your page. It’s pretty easy to get a list of the scripts in your page: open it in any packet sniffer. I use Chrome primarily now; you can get the network list of resources, and you can filter to scripts so that all it shows you is scripts. Just go through it and note all the host names, have two windows open at a time, type all those things in for all the scripts in there, and see if the page has a single point of failure if any of those third-party scripts don’t return. I wouldn’t bother to do your own scripts: if your server isn’t returning your own resources, then they’re probably not getting the HTML page either. But certainly all the third-party stuff you should type in there.
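The hostname audit he describes can be sketched as follows. In a real page you would feed it the `src` of every element in `document.scripts`; a plain array stands in for that here:

```javascript
// List every distinct hostname the page loads scripts from, skipping
// your own domain: if your own host is failing, the HTML never arrived
// anyway, so only the third-party hosts are interesting SPOF candidates.
function thirdPartyScriptHosts(scriptSrcs, ownHost) {
  const hosts = new Set();
  for (const src of scriptSrcs) {
    if (!src) continue;                    // inline scripts have no src
    const host = new URL(src).hostname;
    if (host !== ownHost) hosts.add(host); // skip first-party scripts
  }
  return [...hosts];
}
```

Each hostname this returns is a candidate front end SPOF: block it (for example, with a hosts-file entry pointing it at an unroutable address) and reload the page to see whether rendering hangs.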
And it’s kind of funny. Okay, so now I’m transitioning to another thing, which is, you know, a possible #[0:31:27.5] for this problem. The Twitter guys are really smart, but they have this bad stuff in their documentation, and they’re talking about this anywhere.js file, which is really small, and: we want you to place it as close to the top of the page as possible. You shouldn’t worry about it because it’s small; it’s less than 3k, and we gzip it, and all of the other scripts are loaded asynchronously, so you only have to put one single point of failure in your page for us. So, don’t worry about it. But it doesn’t matter that it’s only 3k. It doesn’t matter whether it’s small or big; it’s going to bring that page down, right, if it has to load synchronously. So, you know, I want to go off on this a little bit. We all know that failures happen; you’ve got to plan on that. Now, if I’m downloading a script synchronously, the server is going to give me a 200 or a 304 response. Two hundred if I don’t have it in my cache. If I have it in my cache, but it’s expired (like they said I could cache it for a day, and it’s been a day and a half), I have to issue an If-Modified-Since conditional GET request for that script, and if it hasn’t changed on the server, I won’t get any response body; I’ll just get a very tiny, couple-hundred-byte 304 response. But both of those responses block the browser from moving forward and rendering and parsing the HTML document. So, it doesn’t matter if it’s in the cache, expired, and hasn’t changed: even a tiny, little, short 304 response, if it times out, is going to cause this front end single point of failure. And in this case, anywhere.js is only cacheable for 15 minutes. So, after 15 minutes, we have this script that’s loading as a single point of failure (at least back in May it was), and every 15 minutes it’s going to issue another GET request for it.
And the browser’s blocked until that get request comes back.
So, just because scripts are small and you gzip them, that’s really #[0:33:45.7] to this issue of whether or not it’s a front end single point of failure. If you load it synchronously, it’s a front end single point of failure. And I don’t want to pick on anywhere.js. Where was that, Twitter? I don’t want to pick on Twitter; their widgets.js is only 30 minutes. But if you look at Facebook, it’s only 15 minutes. If you look at Google Analytics, it’s only 2 hours. I think this was back in May; I think they might’ve changed that, and I actually don’t know if they made it shorter or longer. So, this is kind of a pattern that we see for bootstrap scripts. This is a popular pattern: let me get a foothold with some script, and I’ll make it small, and then, depending on what I’m trying to do, asynchronously, dynamically load a bunch of other scripts. But these bootstrap scripts: since I’m some other website giving you a snippet to put in your page, I can’t get you to change that to, you know, bootstrapone.js, bootstraptwo.js every time I rev a new version of my bootstrap script. So what do I do? I have to give my bootstrap a really short expiration time so that, if there is a change, the browser will check for the new version. But these more frequent (is that a message for me? No? Okay) conditional GET requests mean that front end SPOF is more likely. These short cache times are kind of required by these third-party snippets, though, so that the user will get an update even though the file name doesn’t change.
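To make that trade-off concrete, here is a minimal sketch. The cache lifetimes are the ones quoted in the talk; the function is mine:

```javascript
const MINUTES = 60, HOURS = 3600;

// Cache lifetimes quoted in the talk (as of that May; they may have changed):
const maxAge = {
  'platform.twitter.com/widgets.js': 30 * MINUTES,
  'connect.facebook.net/en_US/all.js': 15 * MINUTES,
  'www.google-analytics.com/ga.js': 2 * HOURS,
};

// Once a cached copy's age passes max-age, the next page view issues a
// blocking conditional GET; even a tiny 304 response halts parsing and
// rendering until it arrives, so each revalidation is a SPOF window.
function mustRevalidate(ageSeconds, maxAgeSeconds) {
  return ageSeconds >= maxAgeSeconds;
}
```

A regular reader who returns once an hour pays the blocking round trip on every visit with a 15-minute lifetime, and never with a 1-year lifetime, which is the tension the self-updating pattern below the fold is meant to resolve.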
Well, is there another way to solve this problem? I mentioned #[0:35:30.5] before. I told him I had this idea that wasn’t quite working, and he said, oh, well, here’s how you can fix that. So we worked together, Google and Facebook, on something that we call self-updating bootstrap scripts. So, a page is using some bootstrap script, and as part of the bootstrap, it sends a beacon back, with an ad impression or page load time or whether the user finished the conversion or whatever you might beacon back, or it could just be an Ajax request. But there’s some other request that goes back to the server, and if there isn’t one, create one just for the purposes of implementing the self-updating script. So, I have this bootstrap script, and it’s going to be cached for an hour.
So, how can I avoid generating a conditional GET request to my third-party domain from some other person’s website without increasing the probability of front end SPOF? Well, here’s what I can do. My server always knows what the current version is, and when this beacon comes across, it can have the version number in the URL.
And since this is an iframe, it’s not going to block the page at all; it’s loaded after the page, after the bootstrap, has already started. And then it’s got this other cute little snippet, which basically reloads the page once and only once. Now, what happens when you reload a page is that, instead of sending an If-Modified-Since conditional request, it sends a plain GET request; there are no If-Modified-Since or If-None-Match headers. And what’s going to happen when you do that is that the server is going to return a real response, the new version of bootstrap.js, version 1.8. And there’s another thing that it does that’s more powerful, that I’ll get to next. So, the page loads. The first time update.php loads, it loads bootstrap.js from cache, because it’s in the cache. But then the #[0:39:55.3] happens, and the next time update.php gets loaded, as part of a reload, it requests bootstrap.js, and it’s not a conditional request. Even though it’s in the cache and doesn’t expire for another 364 days, since I did a reload, we re-request it, and it’s going to get the latest version and store it in cache, overwriting the old version 1.7 with the new version 1.8. So, voila, we can have bootstrap scripts, and instead of having to give them a short cache time, which increases the chance of front end SPOF, we give them long cache times, and we can use the self-updating pattern to get new versions automatically in the background.
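The whole flow can be modeled as a toy simulation. The names `update.php` and `bootstrap.js` come from the talk, but the functions are mine, and the browser cache is just a plain object here:

```javascript
// Toy model of the self-updating bootstrap pattern: the beacon reports
// the cached version, the server replies with the current version, and
// a mismatch triggers the iframe-reload, which re-requests bootstrap.js
// with an unconditional GET and overwrites the cached copy.
function makeClient(server) {
  const cache = { 'bootstrap.js': null };
  return {
    loadPage() {
      // Normal page view: long max-age means the cached copy is used
      // with no conditional GET, so there is no SPOF window.
      if (!cache['bootstrap.js']) {
        cache['bootstrap.js'] = server.get('bootstrap.js'); // first visit
      }
      const running = cache['bootstrap.js'];
      const current = server.beacon(running.version);
      if (current !== running.version) {
        // Stale: simulate the invisible iframe (update.php) reloading
        // once, which fetches and caches the new version in background.
        cache['bootstrap.js'] = server.get('bootstrap.js');
      }
      return running.version; // this page view still ran the old code
    },
  };
}
```

Note that the page view which detects the new version still runs the old one; the fresh copy is only in place for the next view, which is exactly the caveat discussed next.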
The one caveat is that the new version is going to be used the next time the user hits the page. Is that super bad? Most of the time it’s not. And, in fact, that’s the way app cache works, so it is something to consider. If there was no way that you could ever tolerate someone doing even one page view with a script that had since been updated, then you might not want to do this. But there are a lot of cases where it’s a perfectly good pattern.
So, what he found was, when they just did it the normal way of having it expire after seven days, with the version from, I think, last April, they still saw about 5,000 beacons from that old version during this week in June. And then he rolled out the self-updating script pattern, and it dropped to 600, a 90% drop. So, this is another benefit. Not only can you set a longer cache time for your bootstrap script, but forcing this reload bypasses misconfigured intermediate proxies, or other anomalies that are keeping expired content in the user’s cache regardless of what you’ve set the headers to. So, that’s pretty cool. Okay, now, I think this is the third and last one. And this was the one I was hoping Doug would appreciate.
Doug: That was great.
Doug: I need to swear you in right now.
Steve: Oh, all right.
Doug: Do you swear to covet property, propriety, plurality, sureness, security, and voucher of the state’s say what—
Doug: It’s a fact.
Steve: Now, isn’t that just cool?
Doug: Okay, so I want to start by asking a question, and I’m sure it’s foremost on everybody’s mind. Website is two words?
Steve: That’s actually a funny story. I love telling stories. So, you know, O’Reilly looks at various sources, the—what is it the Chicago newspaper has a style guide?
Doug: The Chicago Manual of Style.
Steve: And when I wrote the first book in 2006/2007, either way, I was used to using two words. In 2009, when this one came out, the style guide said it’s one word. And I told O'Reilly, you know, here's the name of the book, Even Faster Web Sites, with two words. And they said, well, yeah, we're going to make it one word; that's our style guide now. And I said, no, you can't do that, because I can't live with a discrepancy like that: I've got two books, at least, that spell web sites differently. And he said, oh, wait, your book is the last book where we'll let web sites be two words.
Doug: So, I think there's a moral there: that, particularly in the web, but in the world, in life, things change, right? And sometimes change is good, sometimes change is bad, and sometimes things don't change, and then we're stuck with stuff. Like, you know, it's heartbreaking, the story you were telling, but it's not a new story. It comes from a mistake made in 1995, when document.write was added to the primitive scripting model, which I hope will be the worst mistake its author ever makes, but he made it. And we'd known that was a mistake for a long, long time, and we have made no progress in fixing it. And, you know, so we keep rediscovering it, and it's bad. And we can't get rid of it at this point because the advertising industry adopted it –
Doug: A decade or so ago, and so if we break document.write, the ad networks fail, and then there's no money in the web, and that's the end of that. So solving this is going to be really hard.
Steve: Does it solely rest on the shoulders of document.write? Couldn't you, in that ad code or something else, say, you know, let me find the current script that I'm in, and let me insert a DOM #[0:51:48.4] or something like that right above or right below the script, because I know the script is where I want this ad to occur? And so it seems like – and I think that that would cause the same sort of problems, that, even if you did that asynchronously, the browser would have charged ahead, and then you – oh, no, because it's not doing document.write. Yeah.
Steve: You're absolutely right.
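The alternative Steve is gesturing at resembles the `document.currentScript` approach. This is a hedged sketch, not actual ad-network code; the DOM is stubbed in the test so it runs outside a browser:

```javascript
// Instead of document.write, an ad script can find its own <script>
// element and insert a container right before it, then fill that
// container asynchronously once the ad response arrives.
function insertAdContainer(doc, containerId) {
  // document.currentScript points at the <script> element now executing.
  const me = doc.currentScript;
  const slot = doc.createElement('div');
  slot.id = containerId;
  me.parentNode.insertBefore(slot, me); // the ad lands where the tag sits
  return slot;
}
```

The page keeps parsing and rendering while the ad loads; the placeholder simply marks where the ad belongs, which is the piece of information document.write was providing implicitly.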
Doug: The thing I like about your approach is that it's really pragmatic. Whereas, I think I'd be more theoretical. I want to find what is fundamentally wrong and fundamentally fix it. Whereas, you look at what we actually have, and what is actually achievable, and in a sense, you're not expecting that you can fix the browser. You're going to try to fix it on the application side, and so you've come up with rules and conventions, which allow you to work around that stuff.
Steve: Yeah. Theoretically, you're right.
Doug: Well, of course, I'm right. But you also get this thing where you get two steps forward and one step back, where good practices, next season, turn out to be bad practices. Like you recommended domain sharding because the browsers had this stupid limitation of only two connections per host, and that throttled how much data you could load at once, and so you said, well, let's just spread it over several domains. And that was great until the browsers fixed the connection limit, and now sharding actually slows things down because we don't use DNS and the cache as effectively.
Steve: In some cases.
Doug: So, you know, kind of, you know, we're doing a lot of lazy loading stuff now, and it makes me uncomfortable because it's way too much work to be lazy. You know, lazy patterns should be less work, not more work when you're working really hard. And now Google is reporting that there are performance problems with that. They want to be able to mouse over a link and, just by hovering, they want to start loading the HTML and the scripts, but they can't load all the scripts because they can't see all the scripts. The scripts won't be visible until the first script’s #[0:54:02.0]. So we put – sorry, there's no right way, there's not a good answer, and I find it really frustrating.
Steve: And the hard part is, if you're doing the right thing, which is, like, really staying up to date, using the latest best practices, then you're having to do more work than the people who, like, really have no idea what's going on. But the other side of that is, hopefully, one, you're developing your craft; even if you have to change things and put something in and take a step back, you're learning more about how things work and hopefully understanding why you do that. And also, by putting in that extra effort, in that time between when it was two steps forward and one step back, you know, it might have been a year or two-year time period, it's true that you had to undo that work you did two years ago, but in that two-year period, hopefully you had a better site, you know, better user experience or whatever it is. So, you know, there are benefits to it, but yeah, I know – you and I have talked a lot about that, about how a lot of these habits are things that people are going to have to unlearn, and that's really unfortunate.
And to me, you know, maybe that's one of the reasons that, in the HTTP Archive, you'll see that a lot of these performance best practices have high adoption in the top 1,000 web sites, but as you get closer to the tail, the adoption of these best practices drops off a lot. And so it might be that the people who are really paying attention to this stuff, and putting in that extra effort even if they might have to undo it later, are – excuse me – the companies that have dozens, if not hundreds, of developers, and so they can afford some of that busywork of put in, take out, put in, take out, put in, revise, revise.
Doug: Mm-hmm, yeah, you really have to keep on top of it. And unlearning is really hard. You know, you still see reflections of Dreamweaver in the scripts we look at, you know, and that stuff was done a long time ago, before anybody had any idea of how programming worked. And it's still copied, and so someone will say, look, well, that's the right way to do it, without any evidence of why that is, or the evidence has since expired, but it's still out there. And we spend a lot of time trying to educate our community, but, even so, it is a struggle.
Steve: You'll see – another example of that that you see every day is, like, Business Insider fixed their Twitter snippet. But, even though it was causing, you know, anyone in China to see a blank page for twenty seconds, they might not have fixed it. Like, I don't know if they fixed it because I – I had them as my counterexample in this presentation, but, a lot of times, you'll see these antiquated patterns. The same is true with Google Analytics. We've had the async pattern out for two years now, and people, you know, still, a large number of sites are using the synchronous pattern. So, unless something fails in a very noticeable way, people don't go back and revisit work that they've already done.
Steve: They've done it. They've finished it. It's behind them, and they don't want to go back and look at it unless there's some kind of catastrophe or emergency around it.
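The async pattern Steve is referring to can be sketched roughly like this (the function name, the mock-friendly `doc` parameter, and the URL are illustrative, not Google Analytics' exact snippet): inject a script element with `async` set so the download never blocks HTML parsing or rendering.

```javascript
// Inject a script tag asynchronously instead of writing a blocking
// <script src="..."> into the page. The async flag tells the browser
// the download must not block parsing, which is what protects the
// page from a slow or unreachable third-party server.
function loadScriptAsync(doc, src) {
  var s = doc.createElement('script');
  s.src = src;
  s.async = true; // do not block the parser while downloading
  // Insert before the first existing script, a spot guaranteed to exist.
  var first = doc.getElementsByTagName('script')[0];
  first.parentNode.insertBefore(s, first);
  return s;
}
```

In a real page this would be called as `loadScriptAsync(document, 'https://example.com/widget.js')`; the synchronous pattern it replaces is exactly the kind of snippet that turns into a front-end single point of failure.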
Doug: So, you've demonstrated tonight that, if you don't attend to your performance, it can get so bad that it turns into unreliability, and then into failure. But it's even worse than that because of our dependence on third-party scripts. The security implications of that are completely horrendous. If you're Facebook, maybe it's not that much of a problem, because all you do is waste people's time, and there are lots of ways to waste people's time, right? And one's as good as another, maybe. But, you know, where I work, we're trying to move money through the network, and if any of that stuff goes wrong, the consequences are a whole lot worse than wasting time.
Doug: And right now, the browser is not a safe platform for doing that stuff, and I despair sometimes, you know, how do we get it fixed?
Steve: I've never written about this before, but years and years ago, I tried to figure out where performance fit into the grand scheme of things, and I stopped once I got to performance, but it turned out, in my mind, it's Number Four. I didn't think about five and lower. To me, the first priority is availability. If the site's not up, then it doesn't matter how fast or performant it is. The next is security and privacy. If you're not protecting your users and their information, that can be disastrous, even to the point of legal ramifications, certainly financial ramifications and #[0:59:22.5] ramifications. Third is functionality. If it doesn't do what it's supposed to do, if you can't view a page or look at someone's profile, then people aren't going to use it. And really, the fourth one is performance. If the features don't work, users are scared to use it, or the site's not even available, it doesn't matter how fast it is. But, you know, ten years ago, I was in companies that dealt with availability, like just trying to stay up, because the number of users was growing so fast, and we didn't know a lot about security back then. And staying on top of bugs, that's always a constant battle, but usually functionality is pretty good, and certainly, I think for the last five years or so, most companies have been on top of those, what I consider higher priorities, enough to start paying attention to performance.
Doug: I completely agree with you. I want websites to be really fast, but it's even more important to protect users and their money and security. So, let's hear from you. We have microphones set up around the room. Please come up and talk to us.
Steve: Yes, it's really hard to set up that problem, but the set-up is: I'm a third party, and I can't change the code that's in your web site, so I have to give you this snippet, and it can never change. That snippet includes a URL, so that means I can never change that URL, and since I can never change the URL but I still want users to get changes, I make it expire after fifteen minutes. And because the user's browser has to keep requesting it so many times throughout the day, and there's, I don't know, a 1-in-1,000 chance that it can take fifteen seconds to download, we're just increasing the probability of this front-end SPOF problem. And you're right, the only reason that I have to rely on the expiration date to get updates to the user is because I can't modify that code in someone else's page. If it's my page, I can just make it my combo-1.js, my combo-2.js, my combo-3.js. So you're absolutely right, it's only for third-party snippets, third-party scripts.
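The versioning idea Steve contrasts with short expirations can be sketched with two trivial helpers (the names and URL scheme here are made up for illustration): the payload lives at a versioned URL that can be cached for a year, because shipping a change just means bumping the version number, which changes the URL.

```javascript
// Build the versioned payload URL. Because the version is baked into
// the filename, the file itself can be served with a far-future
// expiration: a new release gets a new URL, never a stale cache hit.
function payloadUrl(base, version) {
  return base + '/combo-' + version + '.js';
}

// Decide whether a fresh payload must be fetched, e.g. comparing the
// version a bootstrap last loaded with the version the server now
// advertises.
function needsUpdate(loadedVersion, currentVersion) {
  return loadedVersion !== currentVersion;
}
```

This is exactly why first-party pages escape the fifteen-minute-expiration trap: they control the markup, so they can rename combo-1.js to combo-2.js, while a frozen third-party snippet cannot.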
Unknown Speaker #1: Down in the kind of a #[1:02:08.0] is, if you are managing those scripts that are injected from the source, say, you can make other versions in the source and then put the latest version of the script that is going to be injected from the source, and it doesn’t work. I'm just saying, as I work it out, if you are not following this approach, the self-updating scripts –
Unknown Speaker #1: You can read other versions in, yeah, the source, then inject the actual version to frame the URLs for the third-party scripts, and then update it.
Steve: Well, how – what would be the version for – ?
Unknown Speaker #1: In a sense, like, on the scripts, the third-party scripts you would be inserting from your #[1:02:46.0], like HSPR, any other PSP pages.
Unknown Speaker #1: So, you'd be maintaining the script that needs to be injected into the HTML page. In that case, so you made in all the URLs that need to be injected, so we can actually directly go and change the configuration, like in swap injecting version or #[1:03:02.0] version or sharding.
Unknown Speaker #1: Right?
Steve: That would be a possible approach on your back end. You could, like, have a cron job that's constantly pinging apis.google.com, Facebook, and Google Analytics, always detecting as soon as there's a change, and doing something to twiddle your page. You could do that, yeah.
Unknown Speaker #1: Okay, thanks.
Steve: Brendan, hi!
Doug: We'll see. At this point, I – I'm not willing to predict what is definitely going to be in the next edition or not. We need to finalize stuff pretty soon because we need to present to the ECMA General Assembly a year from now, which, given the number of things we have to do between now and then, including writing the standard and testing it and sending it through the central committee of ECMA, I think we're probably going to be late, given the amount of work we have left to do, and that – that's too bad because I want the good parts of ES6, but they're going to have to be delayed because of the bad parts of ES6.
Unknown Speaker #2: And oh, got one more question, a quick question for Steve. It's about – I've noticed, like, when you're browsing a lot of tabs, and when you switch from your active tab to a different one, it starts to – the spinner goes off. What – what are web sites doing to – ? Is that a good pattern to use for performance reasons? What are they doing?
Steve: It depends on what's going on. One thing that I have noticed that I'm actually kind of pleased about is people using the Page Visibility API, which came out of the recently formed W3C Web Performance Working Group. So the idea is, a lot of people open URLs in a background tab. I do it all the time: I'll command-click and open a bunch of URLs, and whichever one comes back first is the one I go to. And on those pages that you're opening in a background tab, they might do something like play a video, play audio, start a photo carousel spinning, or send beacons saying, hey, the user just looked at our page, when actually the user never looked at the page. With this Page Visibility API, you can now detect when someone brings your page from a background tab to the topmost tab, and so sometimes, I've seen, when that happens, people will take action. They'll load a carousel or load an app, or do something, maybe send a beacon. It depends on the browser, but typically, the spinner only goes off if they're doing HTTP requests, and so it's possible that triggering that visibility state is triggering some other HTTP request, which I think is a good thing.
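A minimal sketch of the deferral pattern Steve describes, assuming the standard Page Visibility events (`doc` is passed in rather than using the global `document` so the logic can be exercised outside a browser; the callback is hypothetical):

```javascript
// Run `callback` exactly once, only when the page is, or becomes,
// visible. Lets a page opened in a background tab hold off on beacons,
// carousels, or video until the user actually brings it to the front.
function onFirstVisible(doc, callback) {
  if (!doc.hidden) { // already the foreground tab
    callback();
    return;
  }
  function handler() {
    if (!doc.hidden) {
      doc.removeEventListener('visibilitychange', handler);
      callback(); // tab was just brought to the front
    }
  }
  doc.addEventListener('visibilitychange', handler);
}
```

In a real page you would call something like `onFirstVisible(document, sendPageviewBeacon)`, so a background-tab load never reports a pageview the user never saw.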
Unknown Speaker #2: Over to you, thanks.
Unknown Speaker #3: Howdy. How do any of your performance recommendations change when you enter the mobile space? For instance, the average round trip time on a 3G network in the U.S. is something around 240 milliseconds a request, so something like the self-updating script seems to take several requests to complete. Are you looking at the mobile space?
Steve: Yeah, I'll touch on your last point, and then I'll go back to the initial question. So, the last point: even though it's generating more requests, they're in the background, and they're not blocking anything in the page. And the benefit of that is this bootstrap script that might have had a fifteen-minute expiration time can now have a one-year expiration. Otherwise, sixteen minutes from now, when I load that page again, I would have had to do an If-Modified-Since request, and even though the response can be a very tiny 304 response, that latency, which is even bigger on mobile than it is on desktop, is even more of a problem. So the Number One rule in the very first book was reduce the number of HTTP requests, and that's still the Number One rule, and it's especially true, excuse me, for mobile. The number of requests also includes conditional GET requests, which, percentage-wise, suffer the most from latency because they're typically a very small, pithy, quick response, but instead they're on this high-latency connection, and so this thing that should be quick is actually taking a long time. In general, the best practices from my books apply just the same to mobile; the priorities would be a little different. For example, one of my rules is to reduce redirects. On the desktop, a redirect will average probably 200-250 milliseconds.
On mobile, it's typically over a second, and it's exacerbated because, depending on how you've implemented your mobile site, what a lot of people do is, if the user has gone to www.mysite.com, it will redirect them to m.mysite.com. What we want to have happen is to get the HTML document, which is, you know, the commands for the browser to get going rendering this page, into the browser's hands as quickly as possible, and now what we've done is put this one-second redirect delay in front of everything, in front of the HTML document.
Steve: So, you know, that one, reducing redirects, I think in the first book I put the rules in priority order, and it was like number nine or something like that. I'd make it like number three or four for mobile. So really, all of the rules still apply. There was one exception, and it was domain sharding, because, unfortunately, mobile browsers will often open fewer TCP connections than they do on the desktop. Like, the Android browser will only open two, whereas the Chrome browser on the desktop opens six for a given hostname. That would actually argue more for domain sharding, because they're only opening two connections per hostname, but they also set the maximum total number of connections very low, like at four or eight or ten, so domain sharding really doesn't help you that much, because you're going to exhaust the TCP connection pool fairly quickly, and you'll have the cost of doing these DNS lookups that really aren't buying you much benefit. That's the only one that I think I would kind of pull back on for mobile, but otherwise, they all apply, just with some different priorities.
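The conditional-GET cost Steve describes can be sketched as the server-side decision (a simplified model for illustration, not a real HTTP implementation): a fresh cached copy earns a tiny 304 with no body, but the client still pays a full round trip of latency just to learn that, which is exactly why these "cheap" requests hurt most on high-latency mobile links.

```javascript
// Simplified If-Modified-Since handling. The 304 path saves bytes,
// not the round trip; on a ~1-second mobile RTT the user still waits.
function conditionalGet(ifModifiedSince, lastModified, body) {
  if (ifModifiedSince &&
      Date.parse(ifModifiedSince) >= Date.parse(lastModified)) {
    return { status: 304, body: '' }; // client's cached copy is fresh
  }
  return { status: 200, body: body, lastModified: lastModified };
}
```

A far-future expiration on a versioned URL avoids this round trip entirely: the browser serves straight from cache and never asks the server at all.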
Unknown Speaker #3: What about in-lining versus externalizing? Like, I know a lot of times we externalize on desktop whereas actually in-lining that content on mobile can be faster.
Steve: Yeah, you're right. I'm remembering now because I have a presentation that talks about that, and there are one or two rules that I cross out, and I think that's one of them. One of the rules in the first book was definitely make scripts and style sheets external, and, depending on the usage patterns for your web site, on mobile it can make a lot more sense to inline those. Although I will say that, in the book, I talked about dynamic inlining, which I still don't see a lot of people doing but which would work really well for mobile: inline them if you don't see a cookie, but then, once the page is loaded, dynamically download the actual external scripts and style sheets, get them in the cache, and set a cookie. The next time your request goes to the server, if it sees the cookie, it can say, oh, I don't need to inline, I'll just reference the external resources, and hopefully they'll be in the cache. I did an experiment, just getting volunteers to do something, and they didn't really know the purpose. It was only a couple of hundred people, but I found a very high correlation between clearing the cache and clearing cookies, so the cookie is likely a pretty good proxy for the state of the cache.
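The dynamic-inlining decision can be sketched as the server-side branch (the cookie name, file names, and version scheme here are all made up): no cookie means a likely-empty cache, so inline the code; a cookie matching the current version means the external file was probably fetched into the cache on a previous visit, so a cheap external reference is enough.

```javascript
// Choose inline vs. external markup based on a cookie that a previous
// visit set after downloading the external file into the browser cache.
function renderScriptTag(cookieHeader, version, inlineSource) {
  var m = /(?:^|;\s*)cached_v=(\d+)/.exec(cookieHeader || '');
  if (m && m[1] === String(version)) {
    // Repeat visitor with a primed cache: reference the external file.
    return '<script src="/js/main-' + version + '.js"></script>';
  }
  // Likely an empty cache: inline to avoid an extra request. An onload
  // handler would then fetch /js/main-<version>.js and set cached_v.
  return '<script>' + inlineSource + '</script>';
}
```

Note the version check: if the site ships main-4.js, an old `cached_v=3` cookie falls through to inlining, so users never reference a file that is not the current one.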
Unknown Speaker #3: Okay, thanks.
Steve: I have not done any performance analysis comparing those, and I would bet that it has less to do with the style and more to do with the, you know, things that I look at, like how the scripts are loaded and, you know, things like that.
Unknown Speaker #4: So what do you think of AMD, then, and RequireJS and LABjs, then? Have you looked at that?
Steve: I'm sorry?
Unknown Speaker #4: There's Require JS –
Unknown Speaker #4: AMD, LABjs – that's another point of confusion.
Unknown Speaker #4: One last question. Do you – what do you think of client-side templating in terms of performance and so forth?
Doug: So, one bit of warning about that: ES6 will probably have its own templating system, which will be radically different from everything else, and will also be significantly better because it will be built into the language, which means all the templating stuff that you're doing now, you're going to want to trash it. That's that. Seriously.
Steve: But that doesn't mean you shouldn't do it. Remember we talked about two steps forward and one step back? You learned the #[1:19:54.0].
Unknown Speaker #4: And when will that be available?
Doug: Well, that depends. Do you have to run on IE6? Seven? Seven? Eight? Nine?
Unknown Speaker #4: No IE.
Doug: Oh, then you're – then you're bold. Yeah, so, probably within two years, it should be fairly universal as long as you don't have to be IE.
Unknown Speaker #4: Thank you, sir. Thank you.
Unknown Speaker #5: Thank you.
Unknown Speaker #6: Excuse me. Okay, so, coming to the mobile apps #[1:24:05.0], the HTML5 hybrid mobile apps, the number is increasing, and it's harder to update, and then there are a lot of API calls in them, and there is maybe no tool #[1:24:17.0] like WebPagetest, so will there be a #[1:24:23.5] SP or a different SP for mobile apps? Yeah.
Steve: You mean for native apps?
Unknown Speaker #6: Not native, but the, what, HTML5 hybrid apps? Like the old Facebook app.
Steve: Yeah, well, I think, you know, #[1:24:45.0] still exists. I think maybe it's a little reduced because people just put less third-party crap on their mobile pages. But, you know, Business Insider, if they build a mobile version of their web site, yeah, they'll still have the same problem. If they've got the widgets, you know, the LinkedIn and Facebook Like buttons and stuff like that, built into their mobile app and it times out, it's still going to be a front-end SPOF.
Unknown Speaker #6: Yes, so there would be, I don't know, tool support there, WebPagetest, yes? Things like that. And if there's a robber, then maybe worse, I don't know.
Steve: Yeah. You could look at it. WebPagetest does have iPhone and Android, so you can do some work there, and it also has scripting, so even if you have a single-page mobile web app, you could actually walk through a couple of actions and see if SPOF exists. But certainly, on the initial loading, you can see if front-end SPOF exists.
Unknown Speaker #6: Using WebPagetest?
Steve: Yeah. Thank you.
Unknown Speaker #7: Yeah, how big of a boat anchor is IE these days? What is its market share, how is it changing, and is there any chance of expediting lifting it off the floor?
Doug: That's a real good question, so IE is declining. It's not declining as fast as we'd like it to. IE10 is actually very good. It's actually very, very good.
Unknown Speaker #7: Mm-hmm.
Doug: But it's only previews at this point.
Unknown Speaker #7: Yeah.
Doug: So it's going to take a while for it to – to get into the market. There are still large communities where IE6 still dominates, and that is going to be a source of pain. The thing I've been trying to advise the industry on is, screw them. It's just so hard, so painful to have to support IE6. We should just cut them off.
Unknown Speaker #7: Well! I mean, you might say that about a toe that the doctor says we might not be able to save, but, well, I mean, it's a big enough toe.
Doug: Well, well, we don't actually know that. There was an experiment that I wanted to run at Yahoo while I was there, but I never managed to get it done, and that was to measure the revenue per user per browser.
Unknown Speaker #7: Yes.
Doug: And my prediction was that we were actually losing money supporting IE6.
Unknown Speaker #7: Aha.
Doug: You know, the higher support and – and failed ad delivery and all those things, but I – I never got the #[1:27:29.5] on it.
Unknown Speaker #7: So – what the basic – the basic model is, until they get a new PC, forget it?
Steve: So the definition of forget it, you know, kind of – it's not a black and white.
Unknown Speaker #7: I mean, why would they ever upgrade until they get a new PC if you're Grandma?
Doug: Right, so – the thing I proposed a few years ago was all the major web sites in the world pick one day and date where, if someone comes to us with IE6, we send them to a page that says, here are five other browsers you can load for free, please do that and come on back.
Unknown Speaker #7: Right, okay.
Doug: We never really did that, though.
Unknown Speaker #7: Oh, okay, too bad.
Steve: So Yahoo started it, you know, the grade-A browsers, grade-B browsers. I know that's what we do at Google, and it's not that we just make it so it doesn't work in IE6; we just say, yeah, you know, this new capability, this new feature, it's not worth it to make it work in IE6. And so stuff still works, you just don't get all the features. It's not like they just can't do anything; they just don't get the latest and greatest.
Unknown Speaker #7: Mm-hmm, okay, thank you.
Steve: So those are three big questions, and you actually didn't ask a question, you said I want to ask questions about these three big things, and I'm getting a sign that I think we're supposed to wrap up, and you already asked two questions before, so I think we should let the gentleman behind you go, and you hang out and come up after, and we'll have a longer discussion about those. Anyone else can come up, too.
Doug: It was actually my job to do that, but – but he's right, so thank you.
Steve: You were too slow. It's all about speed.
Steve: You mean, like, you've pushed out a version of main.js, and then you made a change to it?
Unknown Speaker #9: Yeah, and the user needs to get that change.
Steve: The best thing is to change the name. Call it main-2.js.
Unknown Speaker #9: Okay. Second question, very brief. Okay, this is more for Doug. What's the fastest server-side scripting language?
Doug: Who cares? Because, mostly, we're not constrained by the programming language; we're constrained by architecture, and so who cares is, I think, probably the right answer.
Steve: The other thing –
Unknown Speaker #9: Okay.
Steve: – from the performance perspective is, you know, you could drop the server-side time to zero and most people would never notice.
Unknown Speaker #9: Yeah.
Steve: So whether you use something that's twice as fast as something else doesn't really matter.
Doug: Yeah, and so the key insight of performance is always measure before you cut, right? And it's amazing how many people are trying to optimize stuff without looking to see where the hot spots are. You know, you could theoretically make the fastest possible web server by writing it all in assembly language. It probably isn't going to work; you're probably going to end up with something that's really slow. A scripting language will probably give you more leverage in surprising ways, because you get better control over the architecture, over the application. So don't micro-optimize, or, as Knuth says, premature optimization is the root of all evil.
Unknown Speaker #9: Right. Okay, thanks.
Doug: All right, so let's all thank Steve. Thank all of you for coming out tonight, and we'll see you next time. Good night.