Quickly configure a Keycloak server for Single Sign-On

I’m writing a tutorial on how to make Single Sign-On work with the Play Framework in Scala, and how to integrate the Silhouette authentication library with Keycloak (it isn’t published yet, though). One part of that tutorial is spinning up a Keycloak server you can run your app against. These are the minimal steps required to get something running.

  1. Spin up a Keycloak server in a local Docker container
  2. Add a client app with a secret
  3. Add a user with a username/password combo

1. Start and configure a local Keycloak server for testing.

The following command requires Docker. It starts Keycloak locally, listening on port 8080, with the admin username and password both set to admin. It uses an in-memory database, so be aware that changes are lost when the container is shut down. For this tutorial, however, this is acceptable.

docker run -e KEYCLOAK_USER=admin -e KEYCLOAK_PASSWORD=admin -e DB_VENDOR=H2 -p 8080:8080 -p 9990:9990 jboss/keycloak

Once it’s running you should be able to navigate to http://localhost:8080 to access the main page (shown below)

Click on the “Administration Console” link and log in with the username admin and the password admin. Below is a screenshot of the main screen, “The Master Realm”. It’s too much work for me to explain how Keycloak and realms work, except to say that one Keycloak instance can manage dozens of upstream and downstream auth providers and applications, and each realm is a segregation of users and permissions in some way. The master realm typically controls the auth for the Keycloak instance itself, and each app would have its own realm. For this tutorial we are going to breeze over this and do the minimum.

Click on “Clients” in the left-hand navigation menu. A client is an application that can use the realm. Think of Keycloak as a database of users, and clients as programs that can log in to query it. (So clients are user accounts for logging in to the “user database”, where those users are just data.) We’re going to write a Keycloak client app in the next tutorial, so we need to tell Keycloak who we are and how we’re going to connect. Click Create in the top right corner of the clients table. You will be asked to add your client, as shown below.

Here I’ve entered the ClientId. Make a note of this! I’ve also said what the Root URL for the project is, and for Play applications it’s port 9000 by default. Keycloak needs this because it checks “referers” [sic] and redirects users to and from our site. Therefore it needs to be http://localhost:9000 if that is where your app is running and you’re following my Scala guide. Once you save this new client you will be taken to the main screen for configuring it (shown below):

When this page opens you will not have a “Credentials” tab but you need one! You should change the Access type from public to confidential and hit save. Then the credentials tab will appear. The credential page is shown below:

On the credentials tab we can now see the secret. Make a note of this secret.

Now that we have a working Keycloak server, plus a clientId and secret that enable another program to log in to it, we need an actual user account for testing and for logging into the app we write ourselves. Let’s click on the “Users” tab (under Manage) and click “Add User” in the top right corner of the users table.

You can fill this form in however you want, I don’t really care, but make a note of the username and email! Save the user. Then go to the Credentials tab shown below

First turn off the temporary option and then enter a password. I recommend “pwd” but whatever you choose make a note of it. Then click Reset Password and confirm the prompt when it opens.

Now that you’ve done this, I highly recommend you also put an email address against the admin user. Configuring Keycloak and testing the app at the same time can get confusing for some people, and doing this helps.

Your system is now minimally functional for the Scala tutorial. (I haven’t written that tutorial yet, so please wait.) If you wander off and change other settings, do remember that all those settings are lost when you stop the container! Before we finish we need to record our information. By now we should have:

  • A client id (keycloak-seed)
  • A client secret (different for everyone, mine is 45cb055e-d93c-4a14-a4ce-43c2bc0c1414)
  • A user account with a username and password (mine are sinclair/pwd)

What we need now are the special Keycloak URLs to connect to. Click on Realm Settings under Configure to go back to the main page.

See the link next to Endpoints that says OpenID Endpoint Configuration? Click it and read the JSON (use a formatter to help you if your browser sucks).

We need and care about the following URLs that we’re going to use in our app:

authorization_endpoint: http://localhost:8080/auth/realms/master/protocol/openid-connect/auth
token_endpoint:         http://localhost:8080/auth/realms/master/protocol/openid-connect/token
userinfo_endpoint:      http://localhost:8080/auth/realms/master/protocol/openid-connect/userinfo
end_session_endpoint:   http://localhost:8080/auth/realms/master/protocol/openid-connect/logout
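As a rough sketch of how these values might be carried into an application, here they are gathered into one small Scala class. KeycloakConfig, loginUrl and the /callback redirect path are names I have made up for illustration; they are not part of Keycloak, Play or Silhouette. The query parameters are the standard OpenID Connect authorization code ones. The top-level val assumes a script, worksheet or REPL.

```scala
import java.net.URLEncoder
import java.nio.charset.StandardCharsets

// Hypothetical holder for the values collected in this post.
final case class KeycloakConfig(
  baseUrl: String,
  realm: String,
  clientId: String,
  clientSecret: String
) {
  // All four endpoints share this prefix in the JSON shown above.
  private val oidc = s"$baseUrl/auth/realms/$realm/protocol/openid-connect"

  val authorizationEndpoint: String = s"$oidc/auth"
  val tokenEndpoint: String         = s"$oidc/token"
  val userInfoEndpoint: String      = s"$oidc/userinfo"
  val endSessionEndpoint: String    = s"$oidc/logout"

  private def enc(s: String): String =
    URLEncoder.encode(s, StandardCharsets.UTF_8.name())

  // URL to send the browser to in order to start an authorization code login.
  def loginUrl(redirectUri: String, state: String): String =
    s"$authorizationEndpoint?response_type=code&client_id=${enc(clientId)}" +
      s"&redirect_uri=${enc(redirectUri)}&scope=openid&state=${enc(state)}"
}

val config = KeycloakConfig(
  baseUrl      = "http://localhost:8080",
  realm        = "master",
  clientId     = "keycloak-seed",
  clientSecret = "<your client secret>" // placeholder, use your own
)
```

Calling config.loginUrl("http://localhost:9000/callback", "some-random-state") gives a URL on the authorization endpoint with the redirect URI safely encoded.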

We now have a running server and the following information:

  • Authorization Endpoint
  • Token Endpoint
  • UserInfo Endpoint
  • End Session Endpoint
  • ClientId
  • ClientSecret

This is everything we’re going to need in our application, so now we’re ready to move to the Scala part of my tutorial.

IF == BAD

This post is essentially a write-up of a talk my friend Johan Lindstrom did years and years ago, which in turn was built on ideas stolen from other people. This advice is aimed at really novice programmers who rely heavily on the first pieces of knowledge they picked up when they started out. I don’t see this advice shared a lot online, despite it being common knowledge in some circles, so please forgive me if you think it’s overly simplified beginner stuff.

IF statements in programming are bad. Johan and I worked on a warehouse backend system, one that involved taking orders, reserving stock, doing stock checks etc. At the time we had two warehouses, DC1 in England and DC2 in America, so code would often look like this (examples are transposed from Perl into Scala):

if (warehouse == DC1)
    startConveyorBelt()
else
    showManualInstructions()

Our code was absolutely full of these bad boys. Hundreds upon hundreds of separate statements throughout an enormous legacy monstrosity. This code base will celebrate its 20-year anniversary next year.

def printInvoice(warehouse :String) = {
    val address = if (warehouse == "DC1") "England" else "America"
    val papersize = if (warehouse == "DC2") USLetter else A4
    val invoice = generateInvoice(address, papersize)
    ...
}

Of course, when we added a third warehouse nothing worked, and it took an enormous effort to isolate all the behaviour and fix it. Some of the changes were in little blocks that went together: an IF on one value would assume that a key existed in a map, or that a function had already been called. Adding the third DC didn’t result in a random blend of features. Just unpredictable crashes and a world of pain.

The way == or != was used would shape the way the default behaviour played out. Stringification and easy regexes in Perl also made it harder to track down where comparisons or warehouse-specific logic even resided.

warehouse.toLowerCase == "dc1"    // lowercased alternatives

wh == "DC2"                       // alternative names

warehouse.matches(".*1")          // regexes are seamless in Perl, so
                                  // there they don't look noticeably odd

if (letterSizeIsUSLegal)          // warehouse derived from something
                                  // set earlier and not passed through

Perl doesn’t have the support of rich IDEs to help track references, and all these different programming styles that have grown up over 20 years mean that finding these errors involves dozens of greps, lots of testing and a lot of code base inspection.

It didn’t take too long to realise that our IF statements should be based on small reusable features (ie. modular reusable components) and not switch on a global “whole warehouse” value. This code would have been much easier to manage:

if (warehouseHasConveyorBelts)
    sendRoutingMessage()
else
    showDestinationPageOnScreen()

if (shipmentRequiresInvoice) {
   val invoice = getInvoiceTemplateForCountry(
         getWarehouseCountry(warehouse)
   )
   Printer.print(invoice)
}

Ultimately, however, the problem also extends past this modularity, to the realisation that IF statements themselves are bad. Necessary in a few places and possibly the simplest fundamental building blocks of all programs… but still bad… Let’s look at a comparison to find out why.

The history of goto

Many languages like C, C++, VB and Perl support the GOTO keyword, a language construct that allows you to jump around a function by placing a label next to a statement and then jumping to that named label. (Java reserves the word but never implemented it.) Here is an example.

#include <stdio.h>

int main(void) {
    int someNumber = 0;
    int stop = 0;

BEGIN:
    if (someNumber < 23)
        goto ADD_THIRTEEN;

    /* only reached once someNumber is 23 or more */
    printf("hello. app finished with someNumber = %d\n", someNumber);
    stop = 1;

ADD_THIRTEEN:
    someNumber += 13;
    if (stop == 0)
        goto BEGIN;

    return 0;
}

The code is really difficult to read since execution jumps around all over the place. You may have difficulties even following the simple example above. Tracking the state of variables is really hard. Pretty much everyone is in agreement that GOTO statements are too low level and difficult to use and that IF, FOR/WHILE/DO loops and a good use of function calls actually make GOTOs redundant and bad practice.

Foreach loops are so much more elegant than GOTO statements because it’s obvious that you’re visiting each element once. It really speaks to the intent of the programmer or algorithm. Do-while loops make it obvious the loop will always execute at least once. Scala supports .map, .filter, .headOption, .dropWhile and .foldLeft, which all perform very simple, well-defined operations that convey intent to other people reading the code in a way that GOTO simply can’t.
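For comparison, here is one way the goto example above might look with Scala’s collection operations. It is only a sketch of the same arithmetic (start at 0, keep adding 13, stop once the value is no longer below 23), and it assumes a script or REPL for the top-level val.

```scala
// Generate 0, 13, 26, 39, ... lazily and take the first value that is not below 23.
val someNumber = Iterator.iterate(0)(_ + 13).dropWhile(_ < 23).next()

println(s"hello. app finished with someNumber = $someNumber")
```

There are no labels and no stop flag to track. The loop condition and the step each appear exactly once, and the names iterate and dropWhile say what is going on.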

So if a construct like GOTO is confusing, leads to spaghetti code, and can be replaced with more elegant solutions, should we not prefer those alternatives? Of course! IF statements scatter your business logic around and leave it in disjointed locations across your code base that are hard to track, follow and change. They make refactoring hard. IF statements are bad for the same reasons that GOTO statements are bad, and that’s why we should aim to use them as little as possible.

Switching it up

Here’s a collection of constructs that can be used instead of IF statements to keep your application more readable and easier to follow and maintain.

Switch Statements

Not much of an improvement in most languages, but Scala’s version (pattern matching) can be. If your choices extend a sealed trait, Scala can warn you which match expressions aren’t exhaustive. No DC3 slipping into DC2’s warehouse code paths!

sealed trait Warehouse
case object DC1 extends Warehouse
case object DC2 extends Warehouse
case object DC3 extends Warehouse

val myWarehouse :Warehouse = DC1

myWarehouse match {
   case DC1 => println("europe")
   case DC2 => println("america")
}

// scala reports: warning: match may not be exhaustive.
// It would fail on the following input: DC3

Option.map

A super common one, especially in Scala, is to map over an optional value, only doing something if it exists and doing nothing if it doesn’t. This is the functional equivalent of an “if null” check.

invoices.map { invoice => invoice.print() }

Map is way more generic than this. It applies a function to a value inside a Monad and is commonly used to manipulate lists. Please don’t punish my brevity, it’s just an example for my own ends.
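A slightly fuller sketch of the same idea, using a made-up Invoice case class and describe function (neither comes from the real code base), shows both the “has a value” and the “has nothing” case without an if in sight:

```scala
// Hypothetical type, purely for illustration.
final case class Invoice(id: Int)

def describe(maybeInvoice: Option[Invoice]): String =
  maybeInvoice
    .map(invoice => s"printing invoice ${invoice.id}") // only runs when a value is present
    .getOrElse("nothing to print")                      // the fallback, stated once

val withInvoice = describe(Some(Invoice(42)))
val without     = describe(None)
```

When you only care about the side effect and not a result, foreach reads better than map for the same job.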

Inheritance

Inheritance allows you to override the behaviour of an existing type to do many specific things, so it’s absolutely perfect for reducing the use of IF.

trait Warehouse {
  def hasAutomation :Boolean
  def address :String
  def isInEurope :Boolean
}

class DC1 extends Warehouse {
  override def hasAutomation = true
  override def address = "England"
  override def isInEurope = true
}

class DC2 extends Warehouse {
  override def hasAutomation = false
  override def address = "America"
  override def isInEurope = false
}

class DC3 extends Warehouse {
  override def hasAutomation = false
  override def address = "Europe"
  override def isInEurope = true
}

// App is set up once. warehouseName is a hypothetical value read from config.
val warehouse :Warehouse = if (warehouseName == "DC1") new DC1 else new DC2

// use in code
if (warehouse.hasAutomation && warehouse.isInEurope)

 ...

When it comes to adding DC3, we have an interface to extend, so we know exactly which methods we need to define in order to specify how a warehouse behaves. Our behaviour is far more centralised. We only have to extend the initial warehouse setup once as well, since we’ve brought everything together.

We can also go a step further and make the Warehouse class responsible for doing things. This removes IF statements even more!

type Invoice = String
case class RoutingInstruction(destination :String)
val REDIRECT = 303

// stand-ins for the real printer and browser integrations
object Printer { def print(invoice :Invoice) :Unit = println(s"printing: $invoice") }
object Browser { def handle(page :(Int, String)) :Unit = println(s"${page._1} -> ${page._2}") }

trait Warehouse {
  def packItem() :Either[String, Boolean]
  def generateInvoice() :List[Invoice]
  def maybeRouteItem() :Option[RoutingInstruction]
  def getNextWebpage() :Option[(Int, String)]
}

class DC1 extends Warehouse {
  override def packItem() = Right(true)
  override def generateInvoice() = List.empty // no invoice since we are in england
  override def maybeRouteItem() = Some(RoutingInstruction("PackingArea11")) // we have automation
  override def getNextWebpage() = Some((REDIRECT, "/confirmation/place-on-conveyor"))
}

val warehouse :Warehouse = new DC1

// look, no if statements yet lots of diverse functionality
// being used.

warehouse.packItem()

warehouse.generateInvoice().foreach(Printer.print)

warehouse.getNextWebpage().foreach(Browser.handle)

There are some variations of Inheritance I won’t cover, such as Mixins and Traits or Interfaces. They all follow the same theme so I won’t list them individually. The code might be a little crap here because I’m trying to be slightly language independent in my samples.

Function Pointer Tables

You can effectively have cheap object orientation by having a Hash/Map of functions and passing around whole “collections of decisions” together.


def accessGranted() = println("granted!")
def accessDenied() = println("denied!")
val permission = "allowed"

// old, redundant.
if (permission == "allowed") accessGranted() else accessDenied()

// single place for logic.
val mapOfAnswers = Map(
    "allowed" -> accessGranted _,
    "denied" -> accessDenied _
)

val func = mapOfAnswers(permission) // no if here

func() // executes function which causes println to run

Partial Application / Closures

Partially applied functions allow us to build new functions out of existing ones by fixing some of their arguments ahead of time. This can help mix up and select the appropriate logic without actually having to use IF statements.


def makeAddress(inEurope :Boolean)(country :String)(addressLines :String) =
    println(s"$addressLines\n$country\ninEurope: $inEurope")

val europeanFactory = makeAddress(true) _    // variables type
                                             // refers to a function
val britishFactory = europeanFactory("UK")

britishFactory("London")

Closures are functions that reference variables outside of their direct scope. They allow you to do something like this:

def setTimeout(timeMs :Int, onTimeout :() => Unit) :Unit = {
    Thread.sleep(timeMs)   // crude stand-in for a real timer
    onTimeout()
}

val myVariable = 66
def doingMyThing() = println(myVariable)   // closes over myVariable

setTimeout(500, () => doingMyThing()) // setTimeout doesn't have any logic
                                      // but does the right thing

Lambdas are typically short hand syntax for functions so this general class of ideas can be used to encapsulate decision making without callers having to use IF statements everywhere.

Dependency Injection

Dependency injection is generally a technique to remove global variables from an application and is just an application of inheritance to a certain degree but it’s perfect for dynamically changing the behaviour of code without using repetitive IF statements.

// Old code with embedded IF statements

class FetchData {
   def fetchOrders() :List[Order] = {
      if (testMode)
        List(sampleOrder1, sampleOrder2)
      else if (DC == 1)
        httpLibrary.httpGet("http://backend-1/")
      else
        httpLibrary.httpGet("http://backend-2").map(doConvert)
   }
}

// New version simply trusts whatever is passed in.

class FetchData(httpLibrary :DCSpecificHttpLibrary, convertor :Option[Order => Order] = None) {

    def fetchOrders() :List[Order] = {
       val orders = httpLibrary.httpGet() // was built knowing which DC
       convertor.map { c => orders.map(c) }.getOrElse(orders)
    }
}

// testing code would make a fake httpLibrary and pass it in before the test. Real code would use the real one.
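To make the testing remark concrete, here is a small self-contained sketch of the injected version with a fake source. Order, OrderSource, FakeSource and toSterling are all invented for this example, and the trait stands in for whatever DC-specific HTTP library the real application would use.

```scala
// Hypothetical domain type for the example.
final case class Order(id: Int, currency: String)

// The only thing FetchData needs to know about the outside world.
trait OrderSource {
  def httpGet(): List[Order]
}

class FetchData(source: OrderSource, convertor: Option[Order => Order] = None) {
  def fetchOrders(): List[Order] = {
    val orders = source.httpGet()
    // Apply the converter to every order if one was injected, otherwise pass through.
    convertor.fold(orders)(c => orders.map(c))
  }
}

// A fake source for tests: no network, and no testMode flag inside FetchData.
object FakeSource extends OrderSource {
  def httpGet(): List[Order] = List(Order(1, "USD"), Order(2, "USD"))
}

val toSterling: Order => Order = _.copy(currency = "GBP")

val plain     = new FetchData(FakeSource).fetchOrders()
val converted = new FetchData(FakeSource, Some(toSterling)).fetchOrders()
```

The decision about which backend and which conversion to use is made once, wherever FetchData is constructed, rather than on every call.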

Summary

I’m going to stop listing alternatives now, but hopefully you go away with some interesting thoughts on the subject and possibly an idea that IF statements can be detrimental if overused.

Some of my examples are really poor, especially my Inheritance one. I was going to model lots of subprocesses of a warehouse like ScreenFlowChartManager, StockCheckManager and make a warehouse point to them but the code was getting too big for a simple example.

I would accept some criticism that some IF statements can’t be avoided, and I would accept that some of these alternatives only move the IF statement to another place in the code base. Certainly dependency injection only moves things to when the application starts. Still, armed with this knowledge, you can write applications which are easier to maintain, and move your variables and mutable state into places that make them easier to work with.

Devops often don’t understand logging

My job involves writing software. Working on bug fixes, adding new features and generally making the software better. That could mean easier to use so less training time for users. It could mean the software is faster so our users can do more of their other work. It could mean safer so we cause less frustration and upset to the general public. This all fits into this end goal we call “delivering value”. Value is an incredibly loose term not necessarily related to money but commonly it can be. It can also simply be called “improvements to the product”. It’s not a science but we identify pain points and try and smooth them out.

Businesses should try to use data in all their decision making and move away from gut-based decisions, because the latter is significantly flawed. I can name dozens of examples from my previous experience where assumptions essentially wasted money and introduced avoidable technical debt and other complexities. As one example, at the place I currently work someone was moving all the Mongo database backups to use a new Mongo replica instead of the master, because the backups were supposedly slowing down the production applications. That turned out to be a waste of two months, since it never had an impact on application speed. In another example, the business asks for dozens of reports, each more meaningless than the last unless truly challenged. Maybe look at an actual report once and decide if it’s useful before I code it into the application and have to support it forever. It’s always best practice to try to use data to prove your beliefs, and numerous companies exist to help companies understand their own data better. In short, we should use the data we have to identify and assign value to work when we prioritise it, instead of just guessing at what will improve the product.

Data warehousing is a very old discipline used by many companies. You collect ad-hoc and unprocessed data from across your business and then practise combining it in different ways to try to understand your customers and objectives in new ways. I personally see my application logs as a huge data warehousing effort. So my boss and I will discuss a problem, like how long it takes to do some task in the system, and we’ll start looking at our logs and our database. Maybe the “edits” a user makes to a page denote how many mistakes other users are making. Perhaps comparing two URLs allows us to see how long a mistake goes unnoticed for. Perhaps if we quantify this mistake rate we can prove our work yields improvements by measuring how many fewer edits are made after the change. We can measure it before and after in order to prove our work is of some demonstrable value. One thing we do in our department is count the number of emails to our support bucket and ask ourselves which changes will reduce that expensive and annoying correspondence the most. However, I don’t know what metric or check is going to be useful until I am looking at the JIRA tickets on my backlog. It could be the distance between log lines, it could be URLs, it could be times and it could be a mixture. It’s incredibly situational.

Perhaps you think the work of attaching costs and values or parsing logs is for the product owners or managers. I would argue it’s a shared responsibility across all levels, and we should challenge work and enrich requests with real stats rather than blindly implement meaningless change for a pay cheque.

In order to enrich JIRA tickets with provable estimates and data, I specifically need access to an ad-hoc, dynamic tool where I can make meaning out of unstructured data with no upfront planning. I can do this from the logs with Splunk. Splunk allows me to perform a free-form regex-like search over my logs and then draw graphs from them and derive averages, maximums, trends and deviations. However, if I need to either define a fixed parsing pipeline to turn ad-hoc logs into structured JSON data, or add triggers to my code for sysdig, this immediately means I cannot evaluate any historic data. It also means I have to do upfront and expensive development work to find out if another piece of work is worth doing. That is expensive in terms of time, effort and efficiency, especially since it’s not a science and could be meaningless. I need to be able to experiment very cheaply (i.e. a regex or a SQL query), and writing data to sysdig manually is not cheap. It means waiting for two weeks to find out the answer to my question, assuming two weeks of data is even enough to make an informed decision. It’s better to have a tool that runs like dogshit but answers business questions on demand with no upfront planning than to have a tool that draws graphs from extracted data but requires forethought when configuring it.
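To give a feel for what “cheap” means here, this is a rough Scala sketch of the kind of throwaway question I mean: average response time per URL, pulled out of raw log lines with a single regex. The log lines and their format are invented for this example and real logs will look different. It has nothing to do with how Splunk works internally.

```scala
// Invented log lines; only the lines matching the regex are of interest.
val logLines = List(
  "2018-03-01T10:00:00 GET /orders/edit took=120ms",
  "2018-03-01T10:00:05 GET /orders/view took=30ms",
  "2018-03-01T10:00:09 GET /orders/edit took=80ms",
  "2018-03-01T10:00:11 something unstructured we do not care about"
)

// The whole "parsing pipeline" is one regex, written after the question came up.
val Timed = """^\S+ GET (\S+) took=(\d+)ms$""".r

val averageMsByUrl: Map[String, Double] =
  logLines
    .collect { case Timed(url, ms) => url -> ms.toInt } // non-matching lines are skipped
    .groupBy { case (url, _) => url }
    .map { case (url, hits) => url -> hits.map(_._2).sum.toDouble / hits.size }
```

Because nothing had to be decided when the logs were written, the same question can be asked of last year’s logs as easily as today’s.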

People who think Kibana and logs are useful for finding errors but should only keep data short-term, and people who think Kibana should only be fed parsed, structured JSON, are ignoring the enormous amounts of useful information that would make them better developers. I hate to generalise, but at every company I go to I find that DevOps members tend to overlap with the former group. Kibana and Splunk have similar-looking UIs, but since one opens up a world of business intelligence and the other doesn’t, that’s where the similarities end. I also advise you to keep logs forever, as you may want to do “year-on-year” analysis of growth and things like that later.

The closed source Scala code problem

Java touts itself as the write-once, run anywhere programming language. Unfortunately Scala is not. It’s write once but when you publish a library, it must be compiled against a known specific major version of Scala such as 2.11 or 2.12. The version of Scala goes into the library’s name. If you upgrade your applications from Scala 2.11 to 2.12, you will need to recompile your libraries with the matching version as well.

This page of the sbt documentation explains how you can build and publish the library for multiple versions of Scala (for instance 2.10, 2.11 and 2.12) in a single instruction. However, you can’t compile the library against future versions, which obviously do not exist yet.
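As a minimal sketch of what that cross-building setup looks like in a build.sbt: the organisation, library name and exact patch versions below are placeholders I have picked for illustration, not taken from any of the projects mentioned here.

```scala
// build.sbt (sketch): organisation, name and patch versions are placeholders
organization := "com.example"
name         := "my-library"

// the version used by default when you just run `sbt compile`
scalaVersion := "2.12.8"

// every version the library should be built and published for
crossScalaVersions := Seq("2.10.7", "2.11.12", "2.12.8")
```

Prefixing a task with a plus sign, as in sbt +publish, runs it once per listed version and produces artefacts with the Scala version in the name, such as my-library_2.11 and my-library_2.12. Applications that depend on the library with the %% operator then pick up the artefact matching their own Scala version.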

The underlying reason we need to recompile the library is to allow the compiler to make “breaking changes” to the bytecode between versions, so they can more aggressively improve the Scala compiler with fewer concerns for supporting backwards compatible Java bytecode. This makes a lot of sense for them for a minor inconvenience on the user side but it does have a larger implication for the community.

I recently upgraded a Play application from Scala 2.11 to 2.12 and I ran across a few projects that hadn’t been upgraded to 2.12 such as play2-auth and stackable-controller. Fortunately the code was open source and someone was able to create a working 2.12 fork. Yay for open source! The compiled version wasn’t published anywhere, so I had to fork it again and publish it to my organisation’s internal Artifactory repo. This was an inconvenient pain, configuring the Drone pipeline etc but what concerns me more is that, if this library were closed source, this fix would not have been possible.

Our application would be locked to Scala 2.11 at the whim of the library author, or until we managed to rewrite the dependency out. For this reason, I highly suggest you don’t choose to make your application depend on closed source libraries.

Job Security Y2K

I see a lot of folks advising young people that job security is important and they should pick a career path or skill set that gives them job security. I consider this bad advice and will outline why I believe so below.

Job security, meaning how unlikely you are to lose your job, is incredibly important, especially when you get to the age where you are responsible for others as well as yourself and going home to your parents is no longer an option. However, it is not the end goal. The true security you want is financial security. Money to live on, even if you’re unable to work. It’s an important distinction and the terms are not interchangeable.

Whoever you work for, everyone is expendable and companies just do not give a fuck about you. They never will, and they are probably actively seeking to replace you behind your back. They have teams and projects designed to replace you. There is no such thing as job security. People do get made redundant from government jobs, regardless of what is claimed. That threat is always there.

Everyone in a company falls into one of two categories: back office “workers”, who are cost centres to be reduced via offshoring or automation, and front office staff, to be replaced by self-service websites. I’ve seen jobs like accountancy morphed into “work pipelines” filled by unskilled, minimum wage people who escalate to a limited number of real accountants for actual issues. This “process driven” approach reduces the need for expensive skilled employees and can be seen in every sector.

Nurses do the most work and escalate to doctors who in turn escalate to consultants and specialists.

I know many people who have been made redundant from jobs and it can cause some incredibly difficult problems for them, especially if their job or skill or being a provider is what gave them their self-worth. Who doesn’t define themselves by their work just a little?

That’s why I say never work for a single employer like the government (teacher, NHS, admin etc). Despite the claims of unions, they can sack you, and you’ve nowhere to take your skills when they do. Can you really dodge being a political scapegoat for 40 years, or somehow play out 40 years without taking on at least some responsibility? We can all be fired, and you shouldn’t consider yourself an exception.

People worry that computers and robots, self service checkout tills and vacuum cleaners are going to replace their jobs, and they’re probably right. They also believe us IT guys are completely safe, building these replacements and we’ve gotten the better end of the job security situation. Unfortunately they couldn’t be more wrong.

I work in IT, and whilst “job security” for a job in general is high because it’s an in-demand skill, in any given company that’s not true. For example, I worked for a website founded in 2001 that now makes over £1 billion per year selling clothes. It’s privately owned and can splash its cash anywhere. I worked on complex warehouse software that I believe helped our company edge out its luxury customer unique selling point. Only we got bought by a rival. They already have a competing warehouse, so you know how that goes down… send us your customer database to load into our system, ship your stock to our warehouse locations and go home. (OK, it wasn’t quite like that at all, but that’s a real thing.)

As a population we need to understand that job security is a meaningless word and that we should be aiming for “employability” and changing careers to suit demand. That’s just how we should view life now, because it’s the only way to survive in the real world of uncaring companies.

Even if you hate your job you still need the money so don’t confuse job security with your real concern: financial security.

If you have the opportunity to be a contractor, on twice the salary for only half the time.. I’d even go so far as to recommend that, personally. The purpose of the article is only to achieve the basic goal of making you think twice about job security as a metric.

When talking about groups being made redundant, all this “go get another job” stuff is meaningless, because almost no one can afford to go out and re-educate or reskill themselves and take a zero-experience entry role, even if they did have the motivation to. The companies laying them off or replacing them with machines certainly aren’t footing the bill.

I don’t know what the solution is, it just seems to be the government that’s supposed to pick up the pieces. It just seems that regardless of what we say, companies will do what they want and we have to live with this situation regardless. Fighting the technology doesn’t even work. How we can help people stay in their jobs, which definitely helps those people, I honestly don’t know. You’re literally fighting the employers themselves. Something that only unions or the government can successfully do.

Defeating Racism in Artificial Intelligence

Like a dog or a child, AI systems are black boxes. You don’t really know what goes on inside their brains. You train them all the same. Repetition. You feed in your inputs or commands and tell them what the expected result should be and given a large enough sample set, hopefully the Neural Net or pet learns the pattern and can predict new correct outputs for never-before-seen inputs.

AI is here, it’s real, it’s very effective at dealing with some problems that old-style programming can’t deal with, and it’s sticking around.

If victims of systematic or subtle racism currently have a hard time proving existing non IT systems are racist (such as job interviewing, flagging down a taxi) they may have an even greater problem with upcoming AI systems and ignorant (aka wilfully complicit) companies who will pretend AI takes racism off the table.

I foresee mortgage websites, where you fill in your details to get mortgage offers, and car dealership websites, all trained with data collected from the 70s onwards that could be racially biased in places. Obviously we won’t tell the AI what race the applicants are, we won’t even ask on the form, but like a human with good intuition, the computer will do what it needs to do to hit its targets, and the guy named Jamal gets the red cross while Richard gets the green tick. The computer just imitates the patterns it’s seen before.

Race can be inferred from name, location, occupation, college names and many other subtle clues. Just because you can’t see it doesn’t mean the computer can’t.

I wasn’t going to write on this subject until I saw a fascinating talk at Scala Exchange 2017. There’s this program called Matryoshka that can analyse decision trees. The woman who gave the talk showed us the complete list of passengers on the Titanic: their names, gender, age, some other bits and finally, whether they survived or died. Matryoshka allowed her to see which factors on the input played the largest part in deciding the final outcome. Were the women and children prioritised? It’s quite simple really: you group the data by a field (e.g. Name) and look at how close to 50% the outcomes were. I kind of lie, it actually gets really tricky with non-binary data and dozens of discrete inputs, but the point is, it can be understood and built to a certain point. Certainly by people smarter than me.
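To make the “group by a field and look at the outcomes” idea concrete, here is a minimal sketch in Python. It is not Matryoshka’s API and not what the speaker showed; the records are made up for illustration (they are not real Titanic data) and the function names are my own assumptions. It only shows the simple binary-outcome case described above.

```python
from collections import defaultdict

# Made-up records for illustration only; NOT real Titanic data.
passengers = [
    {"sex": "female", "pclass": 1, "survived": True},
    {"sex": "female", "pclass": 1, "survived": True},
    {"sex": "female", "pclass": 3, "survived": True},
    {"sex": "female", "pclass": 3, "survived": False},
    {"sex": "male", "pclass": 1, "survived": False},
    {"sex": "male", "pclass": 1, "survived": False},
    {"sex": "male", "pclass": 3, "survived": True},
    {"sex": "male", "pclass": 3, "survived": False},
]


def survival_rate_by(records, field):
    """Group records by `field` and return the survival rate of each group."""
    totals = defaultdict(int)
    survivors = defaultdict(int)
    for record in records:
        key = record[field]
        totals[key] += 1
        if record["survived"]:
            survivors[key] += 1
    return {key: survivors[key] / totals[key] for key in totals}


def spread(rates):
    """Gap between the best and worst group; a bigger gap suggests the field matters more."""
    return max(rates.values()) - min(rates.values())


def rank_fields(records, fields):
    """Order fields from most to least influential by their spread."""
    return sorted(
        fields,
        key=lambda f: spread(survival_rate_by(records, f)),
        reverse=True,
    )


if __name__ == "__main__":
    for field in rank_fields(passengers, ["sex", "pclass"]):
        print(field, survival_rate_by(passengers, field))
```

In this toy data the groups for `sex` sit far from 50% while the groups for `pclass` sit right on it, so `sex` ranks first. The same check works for a field like race, which is why the data has to be captured in the first place.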

When it comes to race and ensuring equality, you achieve nothing if you sweep race-related questions under the table. You must capture everyone’s race and then look back to see if a system is racist or not. This is the practice used by the UK government when you apply for jobs there.

Matryoshka and similar tools could be a decisive tool in helping us to understand the AI we build, to keep it transparent and going forwards to ensure we build the kind of fair, racism-free systems we all want to see.

Disclaimer: I have not actually worked on or studied AI, nor have I actually used Matryoshka! I just sit in conferences and feel the sound waves bounce off my forehead.

Microservices are the right solution for true Monoliths

I’ve come across many people who don’t like Microservices. They complain that it fragments the code and adds potential network failures. These people are not stupid and they are not wrong. They just haven’t worked with a true monolith (even if they claim they have).

Microservices do add network latency and there are more potential failures. It does make transactional operations less reliable. It does add serialisation overhead. It does divide teams or offer the chance to split the solution over too many technologies. It does mean your app has to cope with multiple different versions in production. It does mean that integration testing scope is limited to smaller individual pieces like unit tests and not true end-to-end tests. It adds overhead in terms of the bureaucracy of adding or changing contracts. It adds documentation and the need to use Swagger endpoints everywhere! It just fundamentally adds more code and therefore a greater chance of bugs.

However, all that overhead is worth it if your app is so unmanageable it takes 6 hours for the test suite to run. It is worth it if a series of breaks has made the business enforce some sort of bi-weekly release schedule on you. The knock-on effect of that bad decision is that each bi-weekly release is now bigger, more risky, and potentially causing even more failures. You have a monolith if you branch to make a large refactor and, by the time you come to merge it back to master, master has moved on by 50 commits from 10+ people and you feel like you’re back at square one. You have to go around to people and ask how to sensibly merge the changes because you don’t know anything about what their code does: they’re in a completely different business domain to you, the acronyms are completely foreign and you’ve never met the person. The project is so large that each Agile team has adopted its own coding guidelines within the same codebase.

In those situations, Microservices are a real way out. Having your system as a collection of smaller microservices means you can drop a lot of these troubles.

“Monolith” does not mean a shitty product, an unreliable product, a clunky slow product or an old product with a lot of technical debt. It means one so fraught with edge cases, fear and uncertainty that even small, isolated and obvious bug fixes are delayed or forced through tons of bureaucracy for fear of breaking the critical paths of the application.

Roughly how Counterstrike Matchmaking Ranks work

I’m answering this question because it comes up often and many find this answer useful.

The algorithm behind Valve’s matchmaking for Counterstrike is closed source. It’s a secret and they don’t want you to know. They don’t want you to know how it works so you don’t cheat the system. If defusing the bomb helps your rank they don’t want you shooting team mates for it. It’s an intentional black box.

When people ask in forums how it works, others often say “no-one knows so anyone who answers is lying” but those people are being ignorant. We don’t know how gravity works but we still don’t fly out into space. Valve have confirmed that the algorithm is based on a publicly known algorithm called “Glicko 2 [PDF]”. With that, and with experience of using the system as a community, we have some understanding of how it works and this is generally useful enough to give you a picture.


Comparing PDF Binary Data

If you generate the same PDF twice using iText5 you get different results. The creation date, modification date and ID are different. This makes it difficult to write a repeatable test. It’s also not ideal to mock out the whole PDF generation from a library whose sole purpose is to manipulate PDFs, as that gives no confidence the code works.

I decided to read the PDF Reference which documents the PDF format on disk to figure out how to write a binary comparison function that ignores the differences.
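As a rough illustration of the idea (not the function from the full post), a comparison that ignores those three fields could be sketched in Python like this. It assumes the document information dictionary and trailer are stored as uncompressed text, that the dates are literal strings written as `/CreationDate (...)` and `/ModDate (...)` with no nested parentheses, and that the ID is a pair of hex strings in `/ID [<...> <...>]`. The function names are hypothetical.

```python
import re

# Patterns for the three fields that differ between otherwise identical runs.
# Assumption: these entries are plain, uncompressed text in the file.
VOLATILE_FIELDS = [
    re.compile(rb"/CreationDate\s*\([^)]*\)"),
    re.compile(rb"/ModDate\s*\([^)]*\)"),
    re.compile(rb"/ID\s*\[\s*<[0-9A-Fa-f]*>\s*<[0-9A-Fa-f]*>\s*\]"),
]


def normalise_pdf(data: bytes) -> bytes:
    """Strip the creation date, modification date and ID from raw PDF bytes."""
    for pattern in VOLATILE_FIELDS:
        data = pattern.sub(b"", data)
    return data


def pdfs_equivalent(first: bytes, second: bytes) -> bool:
    """True if the two PDFs are byte-identical once the volatile fields are removed."""
    return normalise_pdf(first) == normalise_pdf(second)
```

A test would then generate the document twice and assert `pdfs_equivalent(run_one, run_two)`. If the generator compresses these entries or writes them in another form, the patterns will not match and the comparison will simply report a difference.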


Vim as a Scala IDE

I’ve been using Vim as my Scala IDE for the last year and I can report I’ve had a very positive experience with it. Below I have a short table of what IDE features I use and what plugins offer those features. Before you read that stuff, however, you need to align your perceptions and expectations with mine. I’ve used Vim to write Python and Perl every workday for 8 years. I have put considerable effort into making Vim work for me as a Scala IDE. It didn’t happen overnight. Python and Perl are dynamically typed. Many IDEs simply can’t offer “Jump to definition” in Perl code accurately as there are no strict rules on where `$obj->$method` would even go without evaluating the program at runtime. My expectations for a productive environment are probably far lower than those of people coming from IntelliJ or Visual Studio. You have been warned.
