Tuesday, October 8, 2013

The internet is getting slower, time to speed it up!

I figured that I would create this blog to get a movement, well, moving.

Computers are faster than ever; even mobile phones now have dual-core, multi-GHz processors.  Network speeds are faster than ever too: 100Mbit residential connections are within the grasp of many consumers, and 25Mbit is becoming the norm in much of the western world.

Then why is it that, on my dual-core computer with 8GB of RAM, it often takes as long as 8 seconds to display a simple web page?

The answer is that the page is not that simple.  Not simple enough in my books, at any rate, if it takes 8 seconds to load.

Scripting technologies like JavaScript, Flash, AJAX and the like are stuffed into pages.  Images (because of their sheer size) used to be the main cause of slow web pages; today the fastest growing segment of web page data is scripts.

In 2000, the average page carried about 25KB of scripts.  By 2010 that number was around 150KB, and in 2013 it exceeds 200KB.  Even on a 5Mbit internet connection, 200KB of scripts is not a big deal to download.  The staggering page display times don't come from the amount of data being loaded; they come from the amount of CPU work handed to the browser.  The browser has to parse, execute and render far more than ever before, and this is getting worse by the week.
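If you want to see that split on your own machine, the Navigation Timing API (supported by most browsers of this vintage) gives rough numbers.  A minimal sketch, assuming a browser that exposes window.performance.timing; run it from the console once a page has finished loading:

// Splits "how long the data took" from "how long the browser churned".
// Values are in milliseconds.
var t = window.performance.timing;
var network = t.responseEnd - t.navigationStart;   // DNS, connect, request, transfer of the HTML
var churn = t.loadEventEnd - t.responseEnd;        // subresources plus parsing, script execution and layout
console.log("Network time for the HTML: " + network + " ms");
console.log("Browser work after the HTML arrived: " + churn + " ms");

The second number also includes downloading images and scripts, so it is only a rough split, but on script-heavy pages it tends to dwarf the first.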

This is caused in part by geeks wanting to make your web browser the way to run applications.  "Cloud Computing" is the buzzword.  While scripts rendered in the client's browser are only a minuscule part of what "Cloud Computing" is really about, the drive by developers to make web pages behave like native applications has produced an exploding number of tools for drop-down menus, content that updates without reloading the page, ways of hiding and showing content to make pages more dynamic, and the like.

The problem with this is twofold:

The web protocol (HTTP) was never intended to run applications like the ones installed on your computer, the kind you launch from your desktop or start menu.  Google the web browser "Lynx" and download it.  That is the original intent of HTTP, HTML and the WWW: see how web pages look when brought back to the original design.  Looks like crap, right?

Well, we have come a long way since the early days of the web, that's for sure, and not all of these improvements are bad.  Robots like Googlebot see web pages much like Lynx does: script free and image free.  Imagine if the billions and billions of pages indexed by Google all took 4-10 seconds to load; Google would need server farms large enough to have a noticeable effect on the global climate.  This is why SEO experts suggest looking at your own pages in Lynx when tweaking them.

The other part of the problem is the advent of the pop-up blocker.  While this was a marvelous thing - pop-ups were genuinely annoying and badly overused - ever since, advertisers have been hunting for novel and innovative ways to force you to look at their ads.

Now websites have these annoying in-page popups.  If you have an older browser, computer, or mobile device, these bloody things can actually crash your browser.  One of the main drivers behind the constant push to upgrade your browser these days is adding the features advertisers need to annoy you.

Hold the phone.  Does that make any sense?  Television viewing came a long way with the advent and mainstream adoption of the PVR.  On one medium we finally found a way to skip the commercials (just before PVRs went mainstream I recall watching a major national newscast - I had a PVR - and after skipping all of the commercials, the hour-long show was finished before I could even finish my dinner, and I don't eat big dinners), yet on the other medium, the internet, we are downloading and installing updates so we can be shown ever more annoying advertisements?

The answer is not to wind back the clock, deep-six all scripting, and return to looking at pages in Lynx.  These individual tools are useful for their purposes.

However, a web page with the Facebook API, the Twitter API, Disqus, and a zillion Flash, Java, JavaScript, AJAX and other scripts all running simultaneously, each one providing little to no benefit to the user who came for the content, points to a major flaw in the web design industry today.  Customers want bells and whistles, and web developers do their best to wow them with bells and whistles.

Amazon.com found that every 100 milliseconds of delay in page rendering cost them 1% in revenue.  Smaller sites have seen revenue increase by a very noticeable amount simply by speeding up their pages.  As for me, if a page is taking too long to render (I view a lot of news story pages), I just hit the back button to Google News and look for another version of the same story.

It is in the best interests of web site owners to speed up their pages.  It drives revenue, it drives eyeballs.  Your advertiser may pay a pretty penny for that annoying in-page popup, but if people nix your entire page, you are paying a pretty penny for that pretty penny: people just stop going to your site altogether.  There are billions of web pages.  Unless you really have a lock on the content you're burying behind all that annoying dazzle, someone else will nab your user base.

Now, I have major issues with "Cloud Computing" and "Software as a Service (SaaS)" in general.  These technologies are driving most of the hyperscripted tools finding their way onto web pages all over the place.

Of course, if you are trying to build a browser-based word processor, you have no choice but to use innovative scripting techniques to make a web browser (originally designed to show static pages of content) act like Microsoft Word (highly dynamic content).

The problem I find with cloud computing is that (a) your saved file data is never in your possession, (b) the service can go away or crash in the blink of an eye, (c) you cannot use the software or service when you have no internet connection, and (d) it is slower, clunkier, and less efficient than just running a native application coded for your computer.  I think it's a fad.  Cloud computing will always have its place in some applications - webmail is a great example - but the virtualization of everything will, I believe, one day be recognized as a dumb way to move everything.

Think I'm nuts?  Well, on that iPhone or Android device you're using: do you download and install your apps, or do you use the web browser for everything?  The answer is obvious.  Even at the cutting edge, native code is here to stay.  Why aren't mobile devices cloud heavy?  It comes down to how the code runs.

Here are the major types of code you will encounter in your computing lifetime:

1) Machine Language or Assembly

This is the absolute fastest code you will ever run.  Operating systems usually have their most performance-critical parts written directly in assembly for speed.

It is quite difficult to code directly in assembly, and it is usually very processor specific, making it difficult to port from one platform to another without entirely rewriting the code.

It's blazingly fast, and if more people were willing to code in assembly, computers would probably never need to get much faster than they are today for anything other than video, imagery, audio processing and other CPU-intensive tasks - and tools written in assembly for those tasks would probably boast tremendous speed advantages over current applications for the same purpose.

2) Compiled code

There are a lot of different programming languages, but the most popular are variants of C.  Programmers write their code in plain text files and test it in their favorite development environment (which may compile just the sections of code being tested on the fly, somewhat like the interpreted languages I will touch on later).  When the application is ready for hard testing or release, they compile the code for the operating system and processor of each target platform using a compiler for that platform.

Compiled code is pretty fast, because the compiler does its best to translate it into assembly or machine language as efficiently as possible.  However, unlike code written directly in assembly, compiled code has some inherent inefficiencies.  It makes use of libraries, which may contain plenty of code the application never uses; while developers can omit libraries that are never touched, each included library still brings along a variety of functions just so they are available to the program.  Different compilers are more or less efficient than each other, but by design a compiler has to produce a wide variety of applications for different tasks in order to be useful to the widest number of developers.

Compilers therefore cannot fine-tune every instruction cycle of the CPU to do exactly what the developer intends.  They speed up development enormously, however, so compiled code is a great thing.

You may have noticed that some video games "ported" from one platform to another never use the great features of the target platform.  That is a somewhat lazy way to make compiled code available on another computer.  With video hardware now abstracted away from being platform specific, this is not as bad an issue as it once was.  A classic example was games written for the IBM PC in the late 1980s and early 1990s and then ported to platforms that, at the time, had far better graphics and sound hardware, like the Amiga and Atari ST.  A more recent example is the failure of the Sony, IBM and Toshiba Cell processor in the PlayStation 3.  The processing engine was tremendously powerful and gave games nearly supercomputer-class muscle behind their engines.  Unfortunately, developers rarely developed exclusively for the Cell-powered PS3; they wanted to target the PC, Xbox, Nintendo and PS3 markets at once, so they wrote code that was poorly optimized for the Cell's incredible horsepower.  Yielding to its developer base and choosing ease of development over raw power, Sony switched the PlayStation 4 to a conventional x86 chip from AMD - a far less powerful design on paper.

3) Interpreted Languages

Almost everything on the web is interpreted.  The PHP running in the background on most websites is interpreted by the server.  The AJAX and other scripts bloating web pages are run by the client's web browser.  Some server-side optimizers will precompile and cache PHP or ASP files to reduce server CPU load.  There is little opportunity to do this in the client's browser, because by its very nature web browsing means visiting a wide variety of pages that are regularly updated.  The client could compile the page before executing it, but the first load would then take even longer; the speed boost would only come the second time the page was loaded, and only if it had not changed.

Caching is used by virtually every web browser and proxy server to speed up page loading.  The banner logo of your favorite website, for example, rarely changes, so why download it every time?  But skipping a whopping 2.5KB download on a 25Mbit connection saves you a fraction of a millisecond of transfer time - hardly a big deal when the page takes 5 seconds to render.  While some parts of the page can be cached, such as images and maybe some externally loaded scripts, at current network speeds caching doesn't really do much anymore except for larger files.  With the sheer number of elements on today's pages, caching can actually slow loading down: some pages have more than 200 separate file elements, and your browser has to ask the server for the last-modified date of every single one of them and compare it against the version you have cached.  Try turning caching off entirely - you may notice that some pages actually load faster when you just download everything every single time.
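To put rough numbers on that revalidation overhead, here is a back-of-the-envelope sketch.  Every figure in it is an assumption picked for illustration, not a measurement:

// Rough cost of revalidating cached elements vs. just downloading them again.
// All of these numbers are illustrative assumptions.
var elements = 200;           // file elements on a heavy page
var rttMs = 40;               // one round trip for an "is this stale?" check
var parallelRequests = 6;     // typical per-host connection limit in 2013-era browsers
var avgSizeKB = 8;            // average size of one element
var linkKbps = 25 * 1000;     // 25Mbit connection

var revalidateMs = (elements / parallelRequests) * rttMs;          // ~1,300 ms of conditional-request chatter
var redownloadMs = (elements * avgSizeKB * 8 / linkKbps) * 1000;   // ~500 ms of raw transfer

console.log("Revalidating everything: ~" + Math.round(revalidateMs) + " ms");
console.log("Re-downloading everything: ~" + Math.round(redownloadMs) + " ms");

The re-download figure ignores its own request latency, so take the comparison as a sketch rather than proof - but it shows why checking 200 tiny files one round trip at a time adds up.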

Finally, interpreted languages are the slowest code of all.  The reason is that every line has to be translated from text into something the computer can execute, in real time, every time it runs.  Where a compiled language lets the developer spare you (the client) the work of compiling by doing it once ahead of time, an interpreted language does you and your browser the "favor" of re-translating the code EVERY SINGLE TIME YOU VIEW THE PAGE.
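You can get a rough feel for this overhead in the browser console.  Here is a minimal sketch that compares calling an already-parsed function with re-parsing the same source text through eval() on every call; the workload and iteration counts are arbitrary, and some engines cache repeated eval of identical strings, so treat the numbers as illustrative only:

// Compare "parsed once" against "re-parsed on every run".
var source = "var s = 0; for (var i = 0; i < 1000; i++) { s += i; }";
var fn = new Function(source);     // parsed a single time, like compiled or cached code

var t0 = Date.now();
for (var run = 0; run < 10000; run++) { fn(); }
var parsedOnce = Date.now() - t0;

var t1 = Date.now();
for (var run = 0; run < 10000; run++) { eval(source); }  // parse + execute, every time
var reparsed = Date.now() - t1;

console.log("Parsed once: " + parsedOnce + " ms");
console.log("Re-parsed every time: " + reparsed + " ms");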

By way of comparison, if you had to compile Microsoft Word every single time you opened it, it might take even a powerful computer 4-6 hours to get the software loaded.  To reduce load times they could selectively compile portions of the software as you used them, but that would still mean staggering waits - say 25-30 minutes when you hit the spell check button, or 5-10 minutes when you clicked print preview.  You would never use that software again.  Microsoft isn't that dumb, so they compile it as efficiently as they can.  Even compiled software can get bloated, though: at one point, when Microsoft upgraded Works from 4.0 to 5.0, they reportedly found over fifty megabytes of code that hadn't been used since version 3.

So what's the problem?  On the web, whether on the server or in the browser, something is interpreting that code every single time someone views the page.  Servers are usually fast, efficient machines running lean operating system installs, and they can pre-optimize by caching compiled code.  Your web browser doesn't have that luxury.

The result?  You need an ever faster browser, an ever faster CPU, and ever more memory to load a web page when, in most instances, all you wanted was to read a page of text or view a few pictures.  Text can be displayed blazingly fast: try uploading a TXT file to a web server and hitting it with your browser.  Practically instant.  Try uploading a few images with a simple HTML page to display them, and hit that with your browser.  Depending on the size of the images, it's practically instant at today's internet speeds, provided they have been optimized for web viewing (that 10-megapixel digital camera image would never fit on a browser page at a 1:1 pixel ratio - it would take up something like 3x3 monitors at full resolution, so why send data that will never be seen?).
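The arithmetic behind that "3x3 monitors" claim, assuming a typical 10-megapixel frame and a common 1280x1024 monitor of the day:

// A typical 10-megapixel camera produces roughly 3648 x 2736 pixels.
var image = { width: 3648, height: 2736 };
var monitor = { width: 1280, height: 1024 };

var across = image.width / monitor.width;    // ~2.85 monitors wide
var down = image.height / monitor.height;    // ~2.67 monitors tall

console.log("About " + Math.ceil(across) + " x " + Math.ceil(down) + " monitors at a 1:1 pixel ratio");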

What's sick is that most of this CPU-churning interpreted garbage slowing down your pages is stuff you have no interest in.  The advertiser floating something in front of the content you want to see.  The Flash advertisement hoping you will roll over it and trigger a video you don't want to watch.  Some AJAX thing floating at the bottom of the page hoping you will tweet, share, reddit and foursquare-spam the page to all of your social media contacts.  Do you really want to spend a quarter second of *your own computer's* time running code that skims your cookies and tracking data and pushes analytics that enrich the site and its advertisers?  Think of those advertisements that have no relevance to the page whatsoever: a few days ago you searched for a car, and now every website you visit is loaded with car ads, even when the page you are reading is all about fighting climate change.

Can you just switch it off?

Unfortunately, a lot of this stuff is integral to how the page is displayed.  If your browser misses on part of an AJAX-heavy page, the whole thing can end up poorly formatted, unusable and unnavigable.  Try turning off all scripts and viewing a Facebook page: a garbled mess a hundred miles long, if you can even find what you are looking for.

Facebook is actually a prime example of a browser application trying to behave like native code.  I prefer viewing Facebook content in the mobile app, mostly because a regular browser runs it buggy and slow, and it crashes or hangs.  Ironically, Facebook is struggling to monetize mobile use because browser loads are way down.  That is more their own doing than anyone else's, and I don't think it's just the general shift to mobile devices.  If Facebook had a downloadable Windows application that made it easier to navigate, it would probably become one of the most popular downloads of all time.

Am I some starchy old dinosaur luddite who can't get with the times?  Not really.  I'm 37, I did web development from the tender age of 24 until I was 33, and I am a network engineer.  Given my age I may long for the good old days when you could download a news page in 2 seconds on a 56K modem, but my grumbling is not about going back to the 90s and modems.  It is frustration that my computer is now 200 times faster, with 100 times the memory, and my network connection is faster still (a 50Mbit connection is nearly 900 times faster than a 56K modem), yet pages now sometimes take 5-10 times longer to load than they did then.

This is not progress at all.  With the advancements in bandwidth and computing power, web pages should be loading instantaneously.

A curious exception here is the browser on my BlackBerry PlayBook.  The PlayBook runs QNX, a very aggressively compartmentalized operating system, and BlackBerry always put security first.  The tablet may not have been the most successful, but one interesting quirk of its browser is that it wipes most cookies and other tidbits every time the application is closed, because each session opens inside a fresh virtual space.  Pages load incredibly fast; it runs circles around my PCs running either IE or Chrome.  Script-stuffed pages still take a while to load, and the occasional page will crash the browser, either because it lacks the in-browser software to run something or because a runaway, poorly written script eats all the memory and the OS closes the browser to maintain overall stability.

Here's another interesting thing to try: open Task Manager in Windows, then open your web browser.  Hit page after page of popular websites and watch your browser's memory usage as you go.  It is not even surprising anymore for an IE session to reach 300+ megabytes of RAM.

Now that's insane.  You need 300 megabytes of RAM to display 1.5 megabytes of downloaded content, of which less than 500-600 kilobytes is the stuff you actually went to the page to view in the first place?  Something has gone horribly wrong.  I know memory only gets cheaper, but needing 200 times the memory of the downloaded content just to display it is absolutely haywire.

I've talked to ISP technical support reps, and this one problem causes the most support calls.  "Why is my internet so slow?" they get asked over and over again.  They lead the user to a speed test page, which usually shows speeds well into the megabits - hardly slow.  Then the user walks the rep through their favorite web pages, bloated with excessive scripts, and the rep has to explain that it's the user's favorite sites at fault, not the connection.  Then comes the scathingly simple question: "Well, if my internet is so fast, how does it take my brand new laptop close to ten seconds to load a simple news story from my local paper?  It's only a page long."

ISPs have even tried to correct this problem for web developers, with little success.  They run hypercaches for the most visited pages that actively seek out updates so as to keep their upstream bandwidth down.  They have throttled and traffic-shaped bandwidth culprits like torrent traffic.  They have experimented with inline code optimizers that clean up developers' shoddy code and strip unnecessary analytics before the user's computer ever sees a byte from the site.  Unfortunately, the scripts and code engines are too numerous and change too often, so collectively the ISPs have wasted millions trying to clean up shoddy code just to reduce their support calls.

What is the answer?  It's hard to tell.  Even Google, once known for teeny tiny pages that loaded blisteringly fast, is starting to bloat its pages with doodads and features.  I don't go to Google for drop-down menus, I go for search results.  Google would actually be a great certification body for clean pages, because it is the most popular search engine.  A periodic test on indexed pages to detect bad or unnecessary scripts and excessively long render times for the amount of content actually delivered would give developers an incentive to write clean, fast pages with minimal scripting.  A simple warning beside a failing result - "This page may take excessively long to display due to excessive scripting" - would tell users they may be about to waste their time and battery power on a bunch of crap they don't want or need, and they would just scroll down to the next result without the warning.  If Google went further and ranked approved pages higher in search results, developers would clean up their pages within a month or two.

Then again, putting a for-profit corporation in charge of certification would raise its own deep-seated questions.  What code would count as extraneous, and what would be deemed worthy?  Who would decide?  What about anticompetitive practices and the open internet?  Google's refusal to support BlackBerry or Windows Phone with its mapping products, for example, shows that Google is now willing to use its clout vertically to squeeze competitors of its Android ecosystem.  (Worth noting: a company as sophisticated as Google could have an Android app sideloaded onto BB10 in a matter of hours.)

A body like ICANN is unsuitable for the job.  It assigns blocks of internet numbers, and deregistering static IPs over bloated web pages could take down hundreds of sites per IP (shared servers serve many websites from the same address).

Registrars would also be poorly suited for the job.  They make money by selling domain names; it would become a race to the bottom over who could offer the laxest code inspection at the lowest price.

Browser makers would only hurt themselves by refusing to display bloated pages.  Who wants a browser that won't display an obscure web page when you need it, however badly built that page is?

Really, at this point the only way to drive change is us, the users.  If a page takes too long to load, don't view it; if you really must view it, complain to the operator; and absolutely do not click on any advertisement that is blocking the content you came to see.  If operators lose money, receive complaints and lose traffic, and advertisers see no business, many of these practices will find their way to the dustbin.

Another route is for the social networking sites themselves to switch off their in-page scripts on URLs where they don't get used, and to build that into their APIs.  If website X carries Twitter, Facebook, reddit, foursquare and a zillion other "like/share" buttons, have the API remove the script and replace it with a single simple button if nobody uses it within a reasonable amount of time, say a month; clicking the button would reactivate the script for a limited period, and if it gets used daily it never gets deactivated.  The social media sites could be a great help here, since a lot of the extra scripts on pages exist to interact with their services, and those scripts cost them a lot of bandwidth and resources too.  How much effort does it take Facebook, for example, to determine how many likes a particular URL on some random site has had?  Facebook has more than enough computing power, but if its webmaster API is installed on 10 million websites and only 1 million see much use, that's 9 million websites making a query back to Facebook on every single page load.  A rough sketch of the on-page half of this idea follows below.
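Site owners don't even have to wait for the networks; they can defer the heavy widgets themselves.  A minimal sketch of the idea - the widget URL here is a made-up placeholder, not any network's real endpoint, and it assumes an empty container element already exists on the page:

// Show a plain, zero-script placeholder button and only pull in the heavy
// third-party widget script when someone actually clicks it.
function deferWidget(containerId, widgetScriptUrl, label) {
  var container = document.getElementById(containerId);  // e.g. an empty <div id="share-box">
  var button = document.createElement("button");
  button.textContent = label;
  button.onclick = function () {
    var script = document.createElement("script");       // load the real widget on demand
    script.src = widgetScriptUrl;
    document.body.appendChild(script);
    button.disabled = true;
  };
  container.appendChild(button);
}

// One lightweight button instead of a third-party script on every page view.
deferWidget("share-box", "https://social.example.com/widget.js", "Share this page");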

There will come a time when computing does not get much faster, and it is already largely here.  Desktop PCs are no longer chasing the MHz/GHz bandwagon; they are adding more and more cores instead.  A dual-core PC and an 8-core PC are not much different in speed for most ordinary tasks, because a thread can only run on one core at a time.  Web browsing does not lend itself well to multithreaded code, and if it did, the problem I am highlighting here would be much, much worse (one core to interpret each language - JavaScript, CSS and so on - would be insane).  Mobile devices have reached the multicore stage too.  The sketch below shows just how single-threaded a page really is.
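A minimal demonstration of that single main thread; the 5-second figure is arbitrary.  Run it in the console of any open page and try to scroll or click while it spins:

// Layout, clicks, scrolling and your scripts all share one main thread.
// While this loop runs, the page is frozen - even on an 8-core machine.
function blockMainThread(ms) {
  var end = Date.now() + ms;
  while (Date.now() < end) {
    // busy-wait: nothing else on the page can run on this thread
  }
}

blockMainThread(5000);  // freezes the page for roughly 5 seconds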

So as developers jam more and more useless crap into pages, they hurt not only themselves but the web as a whole.  People are turning back to apps, especially on small mobile devices.  Ever try viewing Facebook in your phone's browser?  It's practically unnavigable by touch.  There's a reason Facebook tells you to get the app when it detects you're on a phone.

As people turn away from the web, we lose choice as a whole.  Take a look at some of the ways Apple has limited choice.  There was an app on iTunes that was a game about throwing your phone as high in the air as you could, rating your tosses against those of your friends and of all players.  A stupid game, for sure, caveated with warnings that you could destroy your phone.  But do we want Apple and Google deciding which content and games we can use and which we are too stupid to judge for ourselves?  Do we really want to hand the kind of control the web took away from mainstream media and software publishers back to companies with controlled ecosystems like Apple and Google?

That is perhaps a bit of a doomsday prediction, and a bit of a stretch from my initial gripe about overly bloated web pages, but it has already happened with many forum sites.

I ran one of the most popular community forum sites for a sport I liked.  When I started it there was little competition; it grew quickly, it was unique, and it eventually reached millions of hits per month.  Competition sprang up all around and flourished in its niches, but with our head start it didn't affect us much.  As Facebook grew ever more ubiquitous, I polled my competitors' sites to see how we were doing relative to the pack and eventually noticed that 90% of their forums were dead, with the most recent posts months and occasionally years old - and some of those sites had been quite busy at times.  Our own traffic was way down too, from millions of hits per month to tens of thousands.  Where had these people gone?  To Facebook Groups.

The reason that story grips me is not that my own community forum eventually got so slow it wasn't worth running anymore, although it was a sad day to give up on what was once the second busiest site of its type in the world.  What bothered me was the siloization it caused.  My site attracted people from across the country who shared an interest in the sport.  People met each other on my site and formed entire groups and teams spanning multiple communities; users who were travelling would meet up with each other thousands of kilometres away like old buddies.  It was truly a community unto itself.

Now that everything is on Facebook, groups are smaller and more insular.  Most people who connect there already know each other within a degree of separation or so.  While there are millions of different groups, you wouldn't know what to search for, and the group itself might be closed and innocently reject an outsider it feared was just trying to spam its page.

While this was most definitely the choice of the individual users, they never set out to see the mega-communities shut down.  Even if a mega-community keeps its page up, with no traffic people will visit it less and less until they don't visit at all.  Nobody wants to pay for and maintain a site that has become a collection of links and self-promotions asking people to visit somebody's Facebook group.  Facebook is worth billions; why would a small-time web site operator keep paying a few hundred bucks a year to bolster a billion-dollar company?  Some community forum sites actually block Facebook links in the hope of surviving.

But I digress.  There are similarities between Facebook and Twitter decimating community forum sites and what I see as a coming move to apps driven by bloated web content, but the two are not the same.  In the former case, the ease of posting to Facebook - already installed as an app on people's phones - slowly drained away the content they used to seek out on standard web pages from a desktop or a clunky mobile browser.  The bloating of web pages will have a similar effect, driving people to apps in general, and we are ever more likely to see eyeballs concentrated in apps inside closed ecosystems.

If nobody cares much about the web anymore, then nobody will develop for it; everyone will move to apps, and the cycle of The Next Big Thing will continue.  That is still a long way off, but it would be sad to see something as empowering as the world wide web implode under the overzealous ingenuity of the very geeks who embraced it for its universal access and brought it to the forefront in the first place.