AngularJS & SEO - finally a piece of cake

To be honest, timing has never been my strongest suit. But taking my time before getting down and dirty with AngularJS SEO turned out to be worth the wait.

About a month ago, on Friday, May 23, 2014 at 9:00 AM, Google announced that they're finally crawling JavaScript. Not like before, unpredictably and on a minimal scale, no. The whole thing.

If you're here for the code, then TL;DR - skip to Part #2.

Part #1

I agree. It's not 1998 anymore, and yes, it would be super awesome if the powerful apps we're able to build nowadays were crawlable and therefore findable by our end-users - in all their beautiful awesomeness. One can wonder about all the time that passed between cause and effect here. Rumor has it we owe these new capabilities to the Chrome browser - which supposedly could be the Googlebot itself. Meaning, Google had some big crowd-testing going on for a while. And most of us have been at that party.

And now everything just works - no matter what? Well, at least this caveat from the announcement should be mentioned:

Sometimes the JavaScript may be too complex or arcane for us to execute, in which case we can’t render the page fully and accurately.

So in the end, be cool, don't get too funky. And if you have to, because you're a bad boy, it's debuggable. Staying within this range of compatibility won't harm your project, but gently forces you into rational conventions, hopefully resulting in better and cleaner code. And a better product.

So - enough fuff-fuff. To the meatballs.

Before this, if you wanted your Single-Page-Application to be seen on or by whichever result-spitting html-splitting machine, you had to do an awful lot of work to get it right.

A brief summary of the steps involved:

  1. building the SPA itself
  2. building (or using one of the services listed below) a service that's able to pre-render the html for your landing-pages, with its own routing and magic stick collection
  3. making your SPA distinguish between normal users and crawlers - and re-route (somehow) to the special crawler-only endpoints if a bot is requesting the page
  4. creating and serving a sitemap pointing to the crawler-only endpoints

And right now (if it's enough to be seen by mighty Google only... which is almost the whole pie anyway):

  1. building the SPA
  2. creating and serving a sitemap pointing to the actual end-user URLs

done!

A statistician would've shouted: Hooray, a fifty percent decrease! But it's even better. Going with an SPA in the first place is a gain because, generally and subjectively speaking, one gets a leaner and better decoupled codebase, less complex APIs, smaller traffic footprints - I could go on. And now there's the SEO side, too. SEO used to be the killer argument against SPAs, especially for small and mid-range projects: too much effort to put into a feature that comes for free everywhere else. Now that the only thing left to do, once you have your app up and running, is kicking out a nice little sitemap (which can be achieved in a lot of ways - look below for an example), this con simply goes away.

Bringing a customer's project from serving a sitemap with five simple static links (to the main entry point of the app and some static landing-pages like 'about-us') to being fully crawlable began, what else could it be, with a search on Google. Not knowing about the Googlebot's evolutionary jump, I started to search for working examples on the topic. A little overwhelmed but in a good mood afterwards - albeit it'd been a rough 3-hour search-read-and-coffee-break session - my status was this:

http://www.ng-newsletter.com/posts/serious-angular-seo.html

http://www.yearofmoo.com/2012/11/angularjs-and-seo.html

https://coderwall.com/p/vqpfka

https://prerender.io/

http://www.brombone.com/

http://getseojs.com/

and so on.

The TL;DR quintessence is: be a hero and build the html-snapshot-and-rerouting stuff yourself, for example with Angular and PhantomJS, as laid out in the nice yearofmoo article. Or go somewhere and pay for it. To be fair, there are some well-made service providers for this, and the whole technique still has its right to exist, even getting mentioned in Google's announcement:

It's always a good idea to have your site degrade gracefully. This will help users enjoy your content even if their browser doesn't have compatible JavaScript implementations. It will also help visitors with JavaScript disabled or off, as well as search engines that can't execute JavaScript yet.

And they care, even about their direct competitors. Breathtaking. Nevertheless, it's a niche and will definitely stay one. But it won't disappear completely - as is the case with most stuff.

Coming out of my research session, grabbing a cup of black fuel from the office's gas station, I stumbled upon the highest in hierarchy, telling him straightaway about my newly enriched information pool. He just echoed: Not relevant anymore!

Somehow he had found out. The announcement.

Quickly I double-checked - and questioned my hard-trained searching skills afterwards. How could this small but important piece of information not have shown up before my eyeballs throughout the last hours? Must be their SEO guy... that one's on him!

Rethinking the process for an AngularJS app we had built roughly a year ago made the result even more enjoyable for me, seeing all the spared workload. For the sake of completeness, we're finally coming to the part describing the implementation.

Part #2

Starting point.

Productmate is a business based in Hamburg, Germany. They provide a showcase for local businesses, where those can present their products, curated and chosen by the Productmate team, ensuring a high quality standard and a consistent theme throughout the offerings presented on the platform. When the project was started a year ago, it was a risk to get in bed with such a premature gal as AngularJS, at least with regard to SEO. But looking back now, the timing was perfect.

The Productmate app consists of an AngularJS front-end served by a Rails backend. Previously it was working with AngularJS's default hash-based URLs like https://productmate.de/#/shop/sh-dessous, telling the browser that the route in question isn't one that should be handled by the server, but by the client itself. Going down the old road would've meant changing these to hashbang-URLs as specified here: https://developers.google.com/webmasters/ajax-crawling/docs/specification?hl=de. The new road is using the pushState feature of newer browsers, which still falls back to the hash-based method if pushState isn't available. Therefore the old routing continues to work side by side with pretty URLs: https://productmate.de/shop/sh-dessous.

In Angular, if your app-configuration looked something like this before:

angular.module('awesomeApp', ['ngRoute', 'ngResource', ...])
  .config(function ($routeProvider) {
    $routeProvider
      .when('/', {
        templateUrl: 'views/foo.html',
        controller: 'FooCtrl'
      })
      .when('/b/:bar', {
        templateUrl: 'views/bar.html',
        controller: 'BarCtrl'
      })
      .otherwise({
        redirectTo: '/'
      });
  });

you'll have to inject the $locationProvider and activate its html5Mode:

angular.module('awesomeApp', ['ngRoute', 'ngResource', ...])
  .config(function ($routeProvider, $locationProvider) {
    $routeProvider
      .when('/', {
        templateUrl: 'views/foo.html',
        controller: 'FooCtrl'
      })
      .when('/b/:bar', {
        templateUrl: 'views/bar.html',
        controller: 'BarCtrl'
      })
      .otherwise({
        redirectTo: '/'
      });
    $locationProvider.html5Mode(true);
  });

and eventually you're good to go already. Depending on your current configuration, you may have to adjust your asset paths or the base path (also mentioned here: http://scotch.io/quick-tips/js/angular/pretty-urls-in-angularjs-removing-the-hashtag).
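
The base path is typically set via a <base> tag in the head of your index file. A minimal sketch - the actual href depends on where your app is mounted, and newer Angular versions even insist on having one in html5Mode unless explicitly told otherwise:

<base href="/">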

If not already configured correctly on the server, you'll also have to re-route every request that doesn't target the main entry point of your app (which will naturally be "/") - otherwise your app only works when entered through the root path. If one tries to reload or navigate directly to a link that isn't the root path, they'll get a simple 404. That's because the browser no longer has any indication whether it's dealing with real or client-side URLs. Hence, the request gets handed over from the browser to the server, which can't find anything.

In the simplest case you just have to get your server to serve the app back on all routes (or just the ones you want, e.g. by using regexes in the route resolution), and the app then picks up the routing once fired up, far away in the client's basement. This step really depends on the which, what and how of your setup.
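
Productmate's backend is Rails, but to make the idea concrete, here's a minimal catch-all sketch using Node/Express - the dist paths and the port are made up for illustration:

// HTML5-mode fallback sketch: serve assets normally, hand everything
// else to the app's entry point so deep links survive a reload.
var express = require('express');
var path = require('path');
var app = express();

// Static assets keep their real routes.
app.use('/scripts', express.static(path.join(__dirname, 'dist/scripts')));
app.use('/styles', express.static(path.join(__dirname, 'dist/styles')));

// Everything else gets index.html; Angular picks up the routing
// client-side, so /shop/sh-dessous no longer 404s on reload.
app.get('*', function (req, res) {
  res.sendFile(path.join(__dirname, 'dist/index.html'));
});

app.listen(3000);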

Comforting, to mention it once again: when making this change to an app that's already been up and running for some time, the old #-URLs continue to work in html5Mode - meaning existing references to them don't get broken anywhere.

Being finished with these preparations, one can head over to the Google Webmaster Tools and finally try out the new landing-pages, previewing them through the Googlebot's eyes directly with Fetch as Google.

On top of that, you should consider these steps we took to ensure visibility:

  1. Creating a sitemap that gives the crawler directions where to head (read more about sitemaps at http://www.sitemaps.org/protocol.html; a short example follows at the end of this section). Worth mentioning is that you can create sitemaps pointing to other sitemaps, or serve custom sitemaps depending on the route. For example, we're serving one sitemap containing some static landing-pages, a second containing all the shops registered on the platform, and a third containing all the shop products.
  2. Enriching your app with meta information. Besides getting found, you want to be pleasingly presented in the search results. For Productmate we've added a PageTitle service, as seen in this StackOverflow answer:
angular.module('awesomeApp')
  .service('PageTitle', function() {
    // Holds the current page title; templates read it, controllers set it.
    var title = 'Productmate';
    return {
      title: function() { return title; },
      setTitle: function(newTitle) { title = newTitle; }
    };
  });

Using it in the template then looks like this (ng-bind prevents flickering, which would be an issue if you used curly braces as shown in the StackOverflow answer above):

<title ng-bind="PageTitle.title()"></title>

Additionally, we set up a service to dynamically change the content of the meta tags in the head section of the page:

angular.module('awesomeApp')
  .service('MetaInformation', function() {
    var metaDescription = '';
    var metaKeywords = '';
    return {
      metaDescription: function() { return metaDescription; },
      metaKeywords: function() { return metaKeywords; },
      reset: function() {
        metaDescription = '';
        metaKeywords = '';
      },
      setMetaDescription: function(newMetaDescription) {
        metaDescription = newMetaDescription;
      },
      // Expects a collection of objects with a "name" property and
      // joins them into a comma-separated keyword list.
      appendMetaKeywords: function(newKeywords) {
        for (var key in newKeywords) {
          if (metaKeywords === '') {
            metaKeywords += newKeywords[key].name;
          } else {
            metaKeywords += ', ' + newKeywords[key].name;
          }
        }
      }
    };
  });

In the template:

    <meta name="description" content="{{ MetaInformation.metaDescription() }}">
    <meta name="keywords" content="{{ MetaInformation.metaKeywords() }}">

In your controller, service, directive or wherever, you can now inject the service and put in the data you want to appear in the meta tags, resetting and refilling it on route changes. Put the service on the scope and access it from your template, as sketched below.
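
To illustrate the wiring, here's a minimal sketch - the run block, ShopCtrl and the :shop route parameter are hypothetical stand-ins, not Productmate's actual code:

angular.module('awesomeApp')
  .run(function ($rootScope, PageTitle, MetaInformation) {
    // Expose the services so the templates above can read them.
    $rootScope.PageTitle = PageTitle;
    $rootScope.MetaInformation = MetaInformation;

    // Start every route with a clean slate.
    $rootScope.$on('$routeChangeStart', function () {
      PageTitle.setTitle('Productmate');
      MetaInformation.reset();
    });
  })
  .controller('ShopCtrl', function ($routeParams, PageTitle, MetaInformation) {
    // Hypothetical data for the current shop.
    PageTitle.setTitle('Productmate - ' + $routeParams.shop);
    MetaInformation.setMetaDescription('Curated products from ' + $routeParams.shop);
    MetaInformation.appendMetaKeywords([{ name: 'shop' }, { name: $routeParams.shop }]);
  });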
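
And, as promised, the sitemap from step 1: a stripped-down example following the sitemaps.org protocol - the URLs are made up in the spirit of the Productmate routes:

<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>https://productmate.de/shop/sh-dessous</loc>
    <changefreq>daily</changefreq>
  </url>
  <url>
    <loc>https://productmate.de/about-us</loc>
    <changefreq>monthly</changefreq>
  </url>
</urlset>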

Finished. You should be Google-Ready by now. Congrats!