A bespoke db migrator in rudimentary go.

Posted: 23 May, 2023 Category: backend Tagged: go

"I don't often roll my own DB migrator. But when I do, I do it in a language I barely know." — noone, ever. Until now.

Btw: I had posted this in April but then took it down. In the end I decided my efforts weren't entirely without merit, so here we are again... after some rewrites.


I was going to use golang-migrate. I really was. But I had overzealously skipped ahead and built a SQL template. As in, not a vanilla sql file, but one with all sorts of placeholders for env vars, because I knew that's what I'd want in the end.

I thought golang-migrate would handle sql templates. Upon further investigation though, I found out that:

  1. It didn't handle the use of environment variables
  2. It didn't handle sql templates as migration scripts (PR open as I write)
  3. It was mightily biased toward up and down migrations.

That last thing is my personal pet peeve. Yes, I know how important it is to back out of things in prod. Yes I know that it becomes important with large teams. I can still irrationally like what I like though, so emoji-tonguepththh!

Anyway: for a go project that's mainly for getting going with go, I wasn't about to complicate my life with down migrations. Down migrations can go kick rocks.

Alas - an entirely different bout of rock-kicking ensued: fast forward to me lying fetal in the hard place, sobbing, clutching stubbed toes repeatedly bashed against the rock that is golang.


(image: confusion)


Things I learned

The Basics

  • Files, project structures, go modules, paths, imports, installing stuff... the whole capitalization-makes-it-public thing. A lot of this felt terribly alien coming from nodejs world. I'm actually disappointed in how easy it is for a language to so completely rewire your brain on its own behalf.
  • Coming to grips with basic data structures (hello, structs emoji-wave), slices and arrays and their quirks (I am still a bit discombobulated because I was trying to ingest this stuff before the kotlin variants of such things had fully embedded). I did end up needing to define, populate and then ferry around some structs for a bunch of metadata (a rough sketch follows this list).
  • Also had to go over control structures like for loops, switch statements etc. real fast...
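
Re those metadata structs: here's a minimal sketch of the rough shape. The field names mirror the ones used in the handler further down, but the exact types are my guesses here, not gospel:

// Migration describes one nnnn-prefixed sql file found on disk (sketch).
type Migration struct {
	Id       int32  // parsed from the nnnn- filename prefix
	Name     string // the human-readable bit of the filename
	Filepath string // path relative to the migrations directory
	Hash     string // hash of the hydrated sql, stored alongside the migration record
}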

Ingesting env settings

Okay so it turns out there's nothing super-canned (to be fair, I haven't looked hard), other than:

  • the fact that you can prefix your go run commands with env exports, just like any other unix command
  • the fact that there's the godotenv package
    • (no idea if it's the "go" thing to use, but my nodejs wired brain lit up at the dotenv lexeme and before I knew it, I was go getting it into my project)
    • emoji-bulb belated discovery: so turns out viper is a thing. LOL! Totally what I should've used. Sigh...

I actually had to use an initial bootstrap env var on the command line, to tell godotenv where I preferred it to look, then the rest of env ingestion was easy.
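
In code, that bootstrap step looked roughly like this (ENV_FILE and the variable names are illustrative, not my exact ones):

package main

import (
	"fmt"
	"log"
	"os"

	"github.com/joho/godotenv"
)

func main() {
	// Bootstrap: one env var on the command line tells godotenv where to look,
	// e.g.  ENV_FILE=./config/dev.env go run .
	envFile := os.Getenv("ENV_FILE")
	if envFile == "" {
		envFile = ".env"
	}
	if err := godotenv.Load(envFile); err != nil {
		log.Fatalf("could not load %s: %v", envFile, err)
	}
	// After this, the rest of the settings are plain os.Getenv calls.
	fmt.Println(os.Getenv("DB_NAME"))
}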

Embedding, and files in general

I ran into the thing where, with a compiled language, you can reference source files where they sit on disk during development, but can't necessarily do so from the built binary. The js-marinated part of my brain was irrationally appalled/disappointed at this lack of magic. To the extent that it over-compensated: at one point I began embedding all the relevant SQL files thusly:

//go:embed migrations/*
var migrationsFS embed.FS

I then expected to be able to walk the directory and get files like so:

  • foo.sql
  • bar.sql

as entries, but fs.WalkDir() keeps spitting out migrations/foo.sql, migrations/bar.sql etc. Ten to one I hadn't used the root parameter correctly.
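
For the record, I believe the missing piece was fs.Sub, which re-roots the embedded filesystem so the migrations/ prefix goes away. A sketch of what I think would have worked (untested against my actual project):

package main

import (
	"embed"
	"fmt"
	"io/fs"
	"log"
)

//go:embed migrations/*
var migrationsFS embed.FS

func main() {
	// Re-root the embedded FS at migrations/ so WalkDir reports
	// foo.sql instead of migrations/foo.sql.
	sub, err := fs.Sub(migrationsFS, "migrations")
	if err != nil {
		log.Fatal(err)
	}
	walkErr := fs.WalkDir(sub, ".", func(path string, d fs.DirEntry, err error) error {
		if err != nil {
			return err
		}
		if !d.IsDir() {
			fmt.Println(path) // foo.sql, bar.sql, ...
		}
		return nil
	})
	if walkErr != nil {
		log.Fatal(walkErr)
	}
}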

emoji-bulb ANYHOO. Thankfully, I woke up to the sheer stupidity of embedding for this particular use case, and with my new-found .env-snarfing skills, I just passed in an absolute path for the migrations folder. Further maximised farce by realising that I didn't even need this to be a built tool (I mean, who would I distribute this to and why would they accept it? Haha...), so yeah, so far I am just-in-time compiling my way through this for as long as I can get away with it.

Also also: During this portion of the build and other parts where I had to read or write files, I learned a bit more about the os, io/fs and path/filepath packages.

The swiss army knife strings package

The vast majority of string manipulations were actually niftily handled by fmt.Sprintf. Also, strconv.Atoi came in handy a lot when converting my cutesy nnnn-prefixed migration filenames to an integer id. Everything else came down to sprinklings of strings.Split, strings.Join, strings.Contains etc. So, nothing complex, but at least it forced me to acquaint myself with the docs for the strings package.
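
As an illustration of that filename-to-id dance (the filename here is made up):

package main

import (
	"fmt"
	"log"
	"strconv"
	"strings"
)

func main() {
	// e.g. 0007-add-users-table.sql -> id 7, name "add-users-table"
	filename := "0007-add-users-table.sql"
	parts := strings.Split(filename, "-")
	id, err := strconv.Atoi(parts[0])
	if err != nil {
		log.Fatalf("no numeric prefix on %s: %v", filename, err)
	}
	name := strings.TrimSuffix(strings.Join(parts[1:], "-"), ".sql")
	fmt.Println(id, name) // 7 add-users-table
}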

The coolest aspect of strings in go has to be the existence of / concept of runes !! emoji-heart_eyes

The other key strings-related task was hunting down tokens to replace in my sql template files. So there's this, which seemed a smidgen overkill-y for my tiny use case, so I just went with this for now:

replacer := strings.NewReplacer(
  "${DB_NAME}", pgdb,
  "${MAIN_SCHEMA}", mainSchema,
  "${APP_USER}", pguser,
  "${APP_USER_PASS}", pgpwd,
)
migrationStr := replacer.Replace(templateStr)

emoji-point_up This rather gives away my js-world provenance, because I guess the go-idiomatic delimiter would've been the {{ and }} like it shows in the package I bypassed. Sigh...
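
For comparison, the more idiomatic route would presumably look something like this with text/template (assuming that's the package I sidestepped; the file name and field names here are invented):

package main

import (
	"os"
	"text/template"
)

func main() {
	// The template file would contain {{.DBName}}, {{.MainSchema}} etc.
	// instead of ${DB_NAME}-style placeholders.
	tmpl := template.Must(template.ParseFiles("migrations/0001-init.sql.tmpl"))
	data := map[string]string{
		"DBName":      "appdb",
		"MainSchema":  "app",
		"AppUser":     "app_user",
		"AppUserPass": "sekret",
	}
	if err := tmpl.Execute(os.Stdout, data); err != nil {
		panic(err)
	}
}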

Talking to postgres

I am using pgx, which seems to be a popular way for go to talk to postgres. I ran into a couple of issues:

  • First: up until this point, my go code was a massively atemporal, not-ordered-in-time, do-everything-whenever-it's-invoked spaghetti bowl. And the universe (and go) had been rather forgiving. Until I created a vanilla db connection, read the last migration from the db, forgot I was doing this elsewhere AND THEN ALSO SET UP A TRANSACTION ON THAT VERY SAME CONNECTION. Whooo!! Faceplant! emoji-bomb!! It was at this point that I realised it was time to finally wade into async processing in go. emoji-sweat_smile
  • Second, I really needed a connection pool anyway. Not because I needed to do 2 different operations, but mainly because it's the only way to get multi-statement sql scripts to work. Connection pools necessitated a sister library: pgxpool.
  • Last but not least, it was necessary to determine the go equivalents for pg types. I wasn't using any complicated types, but even the humble integer pg type (for which int is an alias) needs to map to int32 in go (for which int is decidedly not an alias). I haven't gotten to the point where I use lightweight ORMs like sqlx or sqlc to marshal fields for me; I presume they'd take care of these under the hood... or at least, make you define such mappings precisely once. In short, if you're doing such raw, un-abstracted finagling with pg rows, you probably need tabs open for the table of pg types as well as table of go types, lol. (A rough sketch of both the pool setup and the int32 scanning follows this list.)
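
To make the pool and the int32 points concrete, here's roughly what reading the last committed migration id could look like through pgxpool (this assumes pgx v5; the connection string and table name are made up for illustration):

package main

import (
	"context"
	"log"

	"github.com/jackc/pgx/v5/pgxpool"
)

func main() {
	ctx := context.Background()

	// A pool rather than a single vanilla connection (see above).
	pool, err := pgxpool.New(ctx, "postgres://app_user:sekret@localhost:5432/appdb")
	if err != nil {
		log.Fatalf("could not create pool: %v", err)
	}
	defer pool.Close()

	// The id column is a pg integer, so it scans into an int32, not a plain int.
	var lastId int32
	err = pool.QueryRow(ctx, "select coalesce(max(id), 0) from migrations").Scan(&lastId)
	if err != nil {
		log.Fatalf("could not read last migration id: %v", err)
	}
	log.Printf("last committed migration: %d", lastId)
}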

Contexts

Another learning while poking around with pgx is the idea of contexts: they're great for propagating things like request ids, trace ids etc. between different layers of your application (think observability!) but also for timeboxing operations so you can handle timeouts and errors (e.g. for go routines). At this point my context isn't really doing anything yet, lol. In hindsight I could've shoved my config info/metadata in there. I still might.
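
The timeboxing bit, as a sketch (this reuses the pool from the previous snippet; the five-second budget is arbitrary, and the only new imports are context and time):

// Give the whole operation a deadline; any pgx call handed this ctx
// gets cancelled once the five seconds are up.
ctx, cancel := context.WithTimeout(context.Background(), 5*time.Second)
defer cancel()

if err := pool.Ping(ctx); err != nil {
	log.Fatalf("db not reachable within the deadline: %v", err)
}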

Concurrency: Go routines, channels and defers.

None of this was at all clear initially (because I did not follow foundational tutorials emoji-see_no_evil). The first thing I met was defer; it shows up in all db connection examples, so it was the first thing I was forced to look up. defer x() basically invokes x() when we leave the current scope / return. Handy place to do cleanup tasks such as closing db connections or fetched rows, etc... hence the defers all over db code.
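
i.e. the pattern that shows up in pretty much every db example (a sketch, again leaning on the pool from earlier):

rows, err := pool.Query(ctx, "select id, name from migrations order by id")
if err != nil {
	log.Fatal(err)
}
// rows.Close() runs when this function returns, however it returns.
defer rows.Close()

for rows.Next() {
	// ... scan each row here ...
}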

The next thing I had to contend with was go routines: I expected golang go f(x, y) to behave like javascript await f(x, y), but it's actually more like p = f(x, y) where p is a js-style promise... in other words a result that will just disappear into the ether unless you indicate you want a tap on the shoulder when things are done. My go routines were being invoked, but my program would still end abruptly like "welp! I done CALLED that, like you wanted... so we're done here. EXIT!". Because waiting isn't magically built in with await-style syntactic sugar! That's where channels come in...

So I had to learn a bit about channels. And I mean rudimentary, do-all-your-own-coordination channels. I know there's abstractions above that, with groups and what not, but I've only gotten as far as waiting on a channel to know a go routine has actually finished. This can't be the only or even best way... I look forward to staring at more pro go code to see what the conventions are.
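
Boiled down, the pattern I landed on looks like this (a generic sketch, not my actual functions):

package main

import (
	"fmt"
	"time"
)

func slowThing(ch chan<- int32) {
	time.Sleep(500 * time.Millisecond) // stand-in for a db call
	ch <- 42                           // hand the result back
}

func main() {
	ch := make(chan int32)

	go slowThing(ch) // fire off the goroutine...
	result := <-ch   // ...and block here until it sends

	fmt.Println("got", result) // without that receive, main would just exit
}

(The "groups" I keep hearing about are presumably sync.WaitGroup and friends, which I haven't touched yet.)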

emoji-exclamation I still can't get over the syntax ch := make(chan int) like... I'm sorry but that LOOKS LIKE A TYPE PASSED IN AS A SECOND PARAM... and yet that's not what it is AT ALL. emoji-unamused If I ever write my own programming language and decide to pepper it with lexical unexpectedness, I will be taking a leaf out of make(chan T).

The core migration handler

This is the resulting core migration handler, in all its newbie glory:

func runMigration(ctx context.Context, config *helpers.Config, pool *pgxpool.Pool) {
	fsys := os.DirFS(config.MigrationsDir)
	files, errFindAllFiles := allFiles(fsys)
	if errFindAllFiles != nil {
		log.Fatal("can't list migration files!")
	}

	ch := make(chan int32)
	chmig := make(chan bool)

	// prepare for migrations
	go getLatestCommittedMigrationId(ctx, pool, ch)
	lastCommittedId := <-ch
	migs := latestMigration(files, lastCommittedId)

	var ok bool
	for _, mig := range migs {
		sqlTemplate := getFileContents(fmt.Sprintf("%s/%s", config.MigrationsDir, mig.Filepath))
		sqlStr := helpers.HydrateSQLTemplate(sqlTemplate, *config)

		mig.Hash = makeHash(sqlStr)
		sqlHashStore := sqlForMigrationsRecord(mig)

		fingerprint := fmt.Sprintf("Migration for [%d][%s][%s]", mig.Id, mig.Name, mig.Hash)

		tx, _ := pool.Begin(ctx)
		go runInTransaction(ctx, tx, []string{sqlStr, sqlHashStore}, chmig)
		ok = <-chmig

		if ok {
			fmt.Printf("✅ Committing: %s\n", fingerprint)
			tx.Commit(ctx)
		} else {
			fmt.Printf("❌ Failed - Rolling Back: %s\n", fingerprint)
			tx.Rollback(ctx)
			break //stop processing
		}
	}
}

emoji-point_up So a lot of other preprocessing happens in other functions before we get here. A summary of those other pieces:

  • the Config and Migration structs that I'd defined for useful metadata
  • the ability to walk a directory and find all the relevant files (yeah, shoulda been called getAllFiles, not allFiles!)
  • the use of go routines and the attendant go channels
  • determining the migrations (migs, of type Migration) that still need to run, after comparing with / checking the database (via the very badly named latestMigration)
  • "hydrating" sql templates with env vars to yield the proper sql scripts
  • db transactions with error handling to commit or rollback

Improvements

  • Should probably lay off log.Fatalf() a bit? Apparently it's pretty drastic, as far as terminations go. emoji-sweat_smile
  • Should learn go testing and add a test harness!

Overall: was a good thing to try and build, for the sake of self-learning / self-teaching basic go. I learned a lot! Hope it encourages others to pick up golang and give it a shot.