Routing Shenaniganry

Posted: 27 July, 2019 Category: code Tagged: gatsby SPA routing

I've spent the afternoon tidying up a lot of things to do with links, routes, missing pages and seo leakage.

(1/3) A bad link: Only in prod, only on first page load.

SOLUTION

  • don't set pathPrefix in gatsby-config.js if you want full control of site paths (tho' whatever you do, strive to end all paths with /, in conformity with gatsby).
  • force 'clean' of .cache/ and public/ during production builds
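A minimal sketch of the "end all paths with /" habit, assuming you build paths by hand somewhere (in gatsby-node.js, say). The helper name withSlashes is made up for illustration and is not part of gatsby; it also ignores query strings and hashes.

```javascript
// Hypothetical helper (not a gatsby API): make a site path start and end with "/".
function withSlashes(path) {
  // Drop any leading/trailing slashes and whitespace, then add exactly one of each back.
  const trimmed = String(path).trim().replace(/^\/+|\/+$/g, "");
  return trimmed === "" ? "/" : `/${trimmed}/`;
}

// Typical use when creating pages by hand in gatsby-node.js (slug is an assumed variable):
//   createPage({ path: withSlashes(slug), component, context })
console.log(withSlashes("blog/routing")); // prints "/blog/routing/"
```

Running every hand-built path through one helper like this means the prefixing decision lives in a single place, instead of being re-made (and forgotten) at every call site.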

BACKSTORY

Homepage link: TOTALLY BORKED, BUT ONLY in prod. Oh and ONLY on first page load. WTAF? I can't stop marvelling at the weirdness of the effects, when gatsby tanks.

I lost a bit of time checking and re-checking all my gatsby <Link> constructions, as well as the dynamically generated ones in gatsby-node.js. No joy. A fair bit of googling later, I came across a gem: something to do with pathPrefix in gatsby-config.js gumming up the works. Toss this setting, seemed to be the general advice. I did. It worked.

By which I mean, now that I had broken it in two places, two wrongs were now making a right: The site in question was birthed from the loins of an old starter template, and one of the helpful things that the starter tried to do, was to help you configure the site url and path prefix in a config file. But the whole reason you're even using a starter, is that you're a starter, so the whole gambit is lost on you... in fact, when you get around to constructing your own paths for things, you've already forgotten that you set a path prefix... and you in fact gleefully supply such things anew, manually prefixing "/" everywhere.

Somewhere deep inside gatsby, something may have grumbled in disapproval, but you didn't know. The thing grudgingly built, and the site worked. Until you decided to add this new link on the home page. That was the last straw. "F@*k it!", goes Gatsby. "Learn to build your kottam paths properly!".

Back to the solution: having made sure I now had consistent path-prefixing, I turned my attention to the "only in prod" problem. Whenever gatsby started behaving weirdly, I had the tendency to nuke the .cache folder. So you can imagine how gobsmacked I was, a couple of google searches later, to find that nuking .cache/ without also nuking public/ generally confuses gatsby (which, to be fair, would rather you leave both folders alone). Huh, I thought. I peeked inside the package.json file and, shockingly, the production build step did neither: it wiped neither the cache nor the public folder. It just glibly went on to build prod with potentially dev assets still kicking around. Even though its author had seen fit to also add a clean command, it wasn't being invoked when you most needed it. Gah! I prefixed the build with the clean command and FINALLY. Consistent behaviour at bootup!
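For the curious, a clean step can be as small as the sketch below. It is an assumption about what such a command does, not the starter's actual script: the function name cleanBuildFolders and the idea of keeping it in its own file are made up for illustration. (Newer gatsby versions also ship a gatsby clean command that does the same job.)

```javascript
const fs = require("fs");
const path = require("path");

// Remove gatsby's two build folders together, never just one of them.
// Returns the names of the folders that were actually deleted.
function cleanBuildFolders(projectRoot) {
  const removed = [];
  for (const dir of [".cache", "public"]) {
    const target = path.join(projectRoot, dir);
    if (fs.existsSync(target)) {
      // recursive + force behaves like `rm -rf` (needs Node 14.14 or later).
      fs.rmSync(target, { recursive: true, force: true });
      removed.push(dir);
    }
  }
  return removed;
}

// To use it as a pre-build step, call it on the project folder before `gatsby build`:
//   cleanBuildFolders(process.cwd());
```

The point is only that both folders go in one move, and that the production build never runs without it.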

(2/3) The 404 page that wasn't

SOLUTION

  • strip out any runtime intelligence (eg gatsby info, runtime javascript that would live in gatsby-browser etc.)
  • keep it as close to bare-bones html + css as possible; gatsby is going to spit it out into a TOTALLY static html file anyway... it won't be included in any of the SPA magic.

BACKSTORY

I hadn't built a 404 page yet. Now that a link was broken and I actually needed one, it reminded me to build one. Well it didn't work!

I can't pin this one on gatsby. This was all me. I had forgotten the "brilliant stupidity" inherent in all 404 pages: they have to be totally lacking in any intelligence whatsoever, to truly shine at what they do.

404 is ALL runtime. Gatsby is ALL buildtime. I had the 404 react component rigged like all the others on the site, wrapping itself in the standard <Layout>, which in turn bootstrapped all kinds of javascript jiggery-pokery. One of these things was a loading... screen, left over from the site's php-driven, pre-gatsby incarnations. Now you had to know when to turn off the loader... this ancient site was just clunky/janky without it. And the only reasonable place to do that, ended up being in the onRouteUpdate() call inside gatsby-browser.js.

So you can guess what was happening: load site => bad url => 404 page => loading... loading... loading... loading... (it never stopped loading). In the end, honestly, just stripping out the loader component and manually in-filling portions of the overall layout got it working.

(3/3) SEO leaks due to netlify domain names and paths

SOLUTION

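  • add a _redirects file to map the netlify domain to your own (netlify reads it from the published folder, so with gatsby it goes in static/, which gets copied into public/ at build time)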
  • install and configure gatsby-plugin-canonical-urls (for both of these, google is your friend).

BACKSTORY

What I haven't mentioned so far is that I noticed all of this in production (hosted at netlify) first. So the actual site behavior, end to end, was:

  • load site =>
  • click new link =>
  • pathPrefix confusion =>
  • no such route =>
  • 404 page =>
  • no 404 page?! wtf! =>
  • bounce to Netlify's own default 404.

A spectacular chain of FAIL.

With the earlier 2 fixes though, everything worked perfectly now, but it DID remind me that the netlify substrate was there... including their fun domain names and deploy urls all potentially leaking seo traffic (not that we care about such things; it's more the penalties-by-dilution).

The added _redirects file will cause netlify's build and deploy process to set up http redirects for you (away from their made-up name for your domain, and toward the proper name for your domain).
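To give a feel for what goes in that file, here is a small sketch that prints one domain-level rule. Both hostnames are placeholders and the helper name domainRedirectRule is made up; the printed line is what you would paste into _redirects (for a gatsby site that usually means static/_redirects, since static/ is copied into public/ at build time).

```javascript
// Build one netlify _redirects rule that forwards every path on one origin to another.
function domainRedirectRule(fromOrigin, toOrigin) {
  const strip = (origin) => origin.replace(/\/+$/, "");
  // "*" matches any path, ":splat" re-uses whatever "*" matched,
  // and the trailing "!" applies the rule even when a file exists at that path.
  return `${strip(fromOrigin)}/* ${strip(toOrigin)}/:splat 301!`;
}

// Placeholder hostnames, swap in your own.
console.log(domainRedirectRule("https://my-site.netlify.com", "https://www.example.com"));
// prints "https://my-site.netlify.com/* https://www.example.com/:splat 301!"
```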

The canonical urls plugin adds a rel="canonical" link to the head of every page, pointing at the "canonical url" of your site, so that regardless of the current domain or url or link href, a search engine bot knows that "all of the base are", in spite of lexical appearances, "belong to you".

Happy routing & linking!