DevOps often don’t understand logging

My job involves writing software: working on bug fixes, adding new features and generally making the software better. That could mean easier to use, so users need less training. It could mean faster, so our users can get on with the rest of their work. It could mean safer, so we cause less frustration and upset to the general public. This all feeds into the end goal we call “delivering value”. Value is an incredibly loose term, not necessarily related to money, though commonly it is. It can also simply be called “improvements to the product”. It’s not a science, but we identify pain points and try to smooth them out.

Businesses should try to use data in all their decision making and move away from gut-based decisions, because the latter are significantly flawed. I can name dozens of examples from my own experience where assumptions wasted money, introduced avoidable technical debt and added other complexities. As one example, at the place I currently work someone spent two months moving all the Mongo database backups onto a new Mongo replica instead of the master, because the backups were “slowing down the production applications”. That turned out to be a waste of two months, since the backups had never had any impact on application speed. In another example, the business asks for dozens of reports, each more meaningless than the last unless truly challenged. Maybe look at an actual report once and decide whether it’s useful before I code it into the application and have to support it forever. It’s always best practice to use data to prove your beliefs, and numerous companies exist solely to help other businesses understand their own data better. In short, we should use the data we have to identify and assign value to work when we prioritise it, instead of just guessing at what will improve the product.

Data warehousing is a very old discipline used by many companies: you collect ad-hoc, unprocessed data from across your business and then experiment with combining it in different ways to understand your customers and objectives in new ways. I personally see my application logs as a huge data warehousing effort. My boss and I will discuss a problem, like how long it takes to do some task in the system, and we’ll start looking at our logs and our database. Maybe the number of “edits” a user makes to a page indicates how many mistakes other users are making. Perhaps comparing two URLs lets us see how long a mistake goes unnoticed for. Perhaps if we quantify this mistake rate we can prove our work yields improvements by measuring how many fewer edits are made after the change; we can measure it before and after to show our work has some demonstrable value. One thing we do in our department is count the number of emails to our support bucket and ask ourselves which changes would reduce that expensive and annoying correspondence the most. However, I don’t know what metric or check is going to be useful until I’m looking at the JIRA tickets on my backlog. It could be the distance between log lines, it could be URLs, it could be times, it could be a mixture. It’s incredibly situational.

Perhaps you think the work of attaching costs and values, or parsing logs, is for the product owners or managers. I would argue it’s a shared responsibility across all levels, and that we should challenge work and enrich requests with real stats rather than blindly implement meaningless change for a pay cheque.

In order to enrich JIRA tickets with provable estimates and data, I specifically need access to an ad-hoc, dynamic tool where I can make meaning out of unstructured data with no upfront planning. I can do this from the logs with Splunk. Splunk lets me perform a free-form, regex-like search over my logs, draw graphs from the results and derive averages, maximums, trends and deviations. However, if I have to define a fixed parsing pipeline to turn ad-hoc logs into structured JSON, or add triggers to my code for sysdig, then I immediately cannot evaluate any historic data. It also means I have to do upfront and expensive development work just to find out whether another piece of work is worth doing. That is expensive in terms of time, effort and efficiency, especially since it’s not a science and the answer could be meaningless. I need to be able to experiment very cheaply (i.e. a regex or a SQL query), and writing data to sysdig manually is not cheap. It means waiting two weeks to find out the answer to my question, assuming two weeks’ data is even enough to make an informed decision. It’s better to have a tool that runs like dogshit but answers business questions on demand with no upfront planning than a tool that draws graphs from extracted data but requires forethought when configuring it.
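To make the “cheap experiment” point concrete, here’s roughly the shape of the kind of question I mean, sketched in Scala (the log format, field names and file path are invented for illustration). In Splunk this is a one-line search; the point is that it’s just a regex over raw lines followed by a group-and-aggregate, with zero upfront schema work:

```scala
import scala.io.Source

object LogExperiment {
  // Hypothetical log line: "2021-03-02T10:15:30 INFO request url=/orders/edit duration_ms=124"
  private val Line = """.*url=(\S+)\s+duration_ms=(\d+).*""".r

  def main(args: Array[String]): Unit = {
    val source = Source.fromFile(args(0))
    try {
      // Pull (url, duration) pairs out of raw, unstructured lines
      val samples = source.getLines().collect {
        case Line(url, ms) => url -> ms.toLong
      }.toList

      // Average duration per URL, slowest first
      samples
        .groupBy { case (url, _) => url }
        .map { case (url, xs) => url -> xs.map(_._2).sum.toDouble / xs.size }
        .toList
        .sortBy { case (_, avg) => -avg }
        .foreach { case (url, avg) => println(f"$url%-40s $avg%8.1f ms") }
    } finally source.close()
  }
}
```

If the answer turns out to be interesting, it might justify a proper dashboard; if not, I’ve lost an hour rather than a sprint.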

People who think Kibana and logs are only useful for finding errors and that the data should only be kept short-term, and people who think Kibana should only be fed parsed, structured JSON, are ignoring the enormous amount of useful information that would make them better developers. I hate to generalise, but at every company I go to I find that the DevOps people tend to fall into those groups. Kibana and Splunk have similar-looking UIs, but since one opens up a world of business intelligence and the other doesn’t, that’s where the similarities end. I also advise you to keep logs forever, as you may later want to do year-on-year analysis of growth and the like.

The closed source Scala code problem

Java touts itself as the write-once, run-anywhere programming language. Unfortunately, Scala is not. It’s write once, but when you publish a library it must be compiled against a specific major version of Scala, such as 2.11 or 2.12, and that version goes into the library’s name. If you upgrade your applications from Scala 2.11 to 2.12, you need to recompile your libraries against the matching version as well.

This page of the sbt documentation explains how you can build and publish a library for multiple versions of Scala (for instance 2.10, 2.11 and 2.12) in a single instruction. However, you can’t compile the library against future versions, which obviously do not exist yet.
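For reference, the sbt side of this looks roughly like the snippet below (version numbers are illustrative, not a recommendation). `crossScalaVersions` lists the Scala versions to build against, and the `%%` operator is what appends the `_2.11` / `_2.12` suffix to artifact names when dependencies are resolved:

```scala
// build.sbt (illustrative versions)
name := "mylib"
organization := "com.example"

scalaVersion := "2.12.8"                                  // default version for normal builds
crossScalaVersions := Seq("2.10.7", "2.11.12", "2.12.8")  // versions to cross-build against

// Consumers use %%, which resolves to mylib_2.11, mylib_2.12, etc.
// depending on their own scalaVersion.
libraryDependencies += "com.example" %% "some-dependency" % "1.0.0"
```

Running `sbt +publish` (note the `+` prefix) then compiles and publishes one artifact per entry in `crossScalaVersions`.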

The underlying reason we need to recompile libraries is to allow the compiler to make breaking changes to the bytecode between versions, so the Scala team can improve the compiler more aggressively without worrying about emitting backwards-compatible bytecode. This makes a lot of sense for them, and is only a minor inconvenience on the user side, but it does have a larger implication for the community.

I recently upgraded a Play application from Scala 2.11 to 2.12 and ran across a few projects that hadn’t been upgraded to 2.12, such as play2-auth and stackable-controller. Fortunately the code was open source and someone had been able to create a working 2.12 fork. Yay for open source! The compiled version wasn’t published anywhere, though, so I had to fork it again and publish it to my organisation’s internal Artifactory repo. That was an inconvenient pain (configuring the Drone pipeline and so on), but what concerns me more is that if this library were closed source, the fix would not have been possible.
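If you find yourself doing the same, the publishing half is only a couple of lines of sbt, assuming your repository manager speaks the usual Maven layout (the URL and credentials path below are placeholders):

```scala
// build.sbt additions for publishing the fork to an internal repository
publishTo := Some("Internal Artifactory" at "https://artifactory.example.com/artifactory/libs-release-local")
credentials += Credentials(Path.userHome / ".sbt" / ".credentials")
```

The fiddly part was never the publish itself; it was wiring the CI pipeline around it.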

Our application would be locked to Scala 2.11 at the whim of the library author, or until we managed to rewrite the dependency out. For this reason, I strongly suggest you don’t make your application depend on closed-source Scala libraries.

Job Security Y2K

I see a lot of folks advising young people that job security is important and they should pick a career path or skill set that gives them job security. I consider this bad advice and will outline why I believe so below.

Job security, the likelihood that you won’t lose your job, is incredibly important, especially once you reach the age where you are responsible for others as well as yourself, and going home to your parents is no longer an option. However, it is not the end goal. The true security you want is financial security: money to live on, even if you’re unable to work. It’s an important distinction and the terms are not interchangeable.

Whoever you work for, everyone is expendable and companies just do not give a fuck about you. They never will, and they are probably actively seeking to replace you behind your back. They have teams and projects designed to replace you. There is no such thing as job security. People do get made redundant from government jobs, regardless of what is claimed. That threat is always there.

Everyone in a company falls into two categories: back-office “workers”, who are cost centres to be reduced via offshoring or automation, and front-office staff, to be replaced by self-service websites. I’ve seen jobs like accountancy morphed into “work pipelines” staffed by unskilled, minimum-wage people who escalate to a limited number of real accountants for genuine issues. This “process driven” approach removes the demand for expensive skilled employees and can be seen in every sector.

Nurses do the most work and escalate to doctors who in turn escalate to consultants and specialists.

I know many people who have been made redundant from jobs and it can cause some incredibly difficult problems for them, especially if their job or skill or being a provider is what gave them their self-worth. Who doesn’t define themselves by their work just a little?

That’s why I say never work for a single employer like the government (teacher, NHS, admin etc). Despite the claims of unions, they can sack you, and you’ve nowhere to take your skills when they do. Can you really dodge being a political scapegoat for 40 years, or somehow play out 40 years without taking on at least some responsibility? We can all be fired, and you shouldn’t consider yourself an exception.

People worry that computers and robots, self-service checkout tills and vacuum cleaners are going to replace their jobs, and they’re probably right. They also believe us IT guys are completely safe, building these replacements, and that we’ve got the better end of the job security situation. Unfortunately, they couldn’t be more wrong.

I work in IT, and whilst “job security” for a job in general is high because it’s an in-demand skill, in any given company that’s not true. For example, I worked for a website founded in 2001 that now makes over £1 billion per year selling clothes. It’s privately owned and can splash its cash anywhere. I worked on complex warehouse software that I believe helped the company edge out its luxury-customer unique selling point. Only we got bought by a rival. They already had a competing warehouse, so you know how that went down: send us your customer database to load into our system, ship your stock to our warehouse locations and go home. (OK, it wasn’t quite like that at all… but it’s a real thing.)

As a population we need to understand that job security is a meaningless term, and that we should be aiming for “employability” and changing careers to suit demand. That’s just how we should view life now, because it’s the only way to survive in the real world of uncaring companies.

Even if you hate your job you still need the money so don’t confuse job security with your real concern: financial security.

If you have the opportunity to be a contractor, on twice the salary for only half the time, I’d even go so far as to recommend that, personally. The purpose of this article is only to make you think twice about job security as a metric.

When talking about groups being made redundant, all this “go get another job” stuff is meaningless, because almost no one can afford to go out and re-educate or reskill themselves and take a zero-experience entry role, even if they had the motivation to. The companies laying them off or replacing them with machines certainly aren’t footing the bill.

I don’t know what the solution is; it just seems to be the government that’s left to pick up the pieces. Regardless of what we say, companies will do what they want and we have to live with that. Fighting the technology doesn’t work either. How we can help people stay in their jobs, which would certainly help those people, I honestly don’t know. You’re literally fighting the employers themselves, something that only unions or the government can successfully do.

Defeating Racism in Artificial Intelligence

Like a dog or a child, AI systems are black boxes. You don’t really know what goes on inside their brains. You train them all the same. Repetition. You feed in your inputs or commands and tell them what the expected result should be and given a large enough sample set, hopefully the Neural Net or pet learns the pattern and can predict new correct outputs for never-before-seen inputs.

AI is here, it’s real, it’s very effective at dealing with some problems that old-style programming can’t, and it’s sticking around.

If victims of systemic or subtle racism currently have a hard time proving that existing non-IT systems are racist (job interviews, flagging down a taxi), they may have an even greater problem with upcoming AI systems and ignorant (aka wilfully complicit) companies who will pretend AI takes racism off the table.

I foresee mortgage websites where you fill in your details to get offers, and car dealership websites, all trained on data collected from the 70s onwards, data that could in places be racially biased. Obviously we won’t tell the AI what race the applicants are, we won’t even ask on the form, but like a human with “good intuition”, the computer will do whatever it needs to do to hit its targets, and the guy named Jamal gets the red cross while Richard gets the green tick. The computer just imitates the patterns it has seen before.

Race can be inferred from name, location, occupation, college names and many other subtle clues. Just because you can’t see it doesn’t mean the computer can’t.

I wasn’t going to write on this subject until I saw a fascinating talk at Scala Exchange 2017. There’s a program called Matryoshka that can analyse decision trees. The woman who gave the talk showed us the complete list of passengers on the Titanic: their names, gender, age, some other bits and, finally, whether they survived or died. Matryoshka allowed her to see which input factors played the largest part in deciding the final outcome. Were the women and children prioritised? It’s quite simple really: you group the data by a field (e.g. gender) and look at how close to 50% the outcomes were within each group. I kind of lie; it actually gets really tricky with non-binary data and dozens of discrete inputs, but the point is that it can be understood and built, at least to a certain point. Certainly by people smarter than me.
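I haven’t used Matryoshka myself (see the disclaimer below), but the grouping idea on its own is small enough to sketch in plain Scala. The data model and example rows here are made up purely to show the shape of the calculation, not the talk’s actual code:

```scala
object TitanicSplit extends App {
  // A minimal sketch of "group by a field and look at the outcome split".
  final case class Passenger(name: String, sex: String, age: Int, survived: Boolean)

  def survivalRateBy[K](passengers: Seq[Passenger])(field: Passenger => K): Map[K, Double] =
    passengers.groupBy(field).map { case (key, group) =>
      key -> group.count(_.survived).toDouble / group.size
    }

  val manifest = Seq(
    Passenger("Allen, Miss. Elisabeth", "female", 29, survived = true),
    Passenger("Braund, Mr. Owen", "male", 22, survived = false)
    // ... and the rest of the passenger manifest
  )

  // A field whose groups sit far from an even 50/50 split (and far from each
  // other) is doing a lot of the deciding.
  survivalRateBy(manifest)(_.sex).foreach { case (sex, rate) =>
    println(f"$sex%-8s ${rate * 100}%5.1f%% survived")
  }
}
```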

When it comes to race and ensuring equality, you achieve nothing if you sweep race-related questions under the carpet. You must capture everyone’s race and then retrospectively look back to see whether a system is racist or not. This is the practice used by the UK government when you apply for jobs there.

Matryoshka and similar tools could be decisive in helping us understand the AI we build, keep it transparent and, going forwards, ensure we build the kind of fair, racism-free systems we all want to see.

Disclaimer: I have not actually worked on or studied AI, nor have I actually used Matryoshka! I just sit in conferences and feel the sound waves bounce off my forehead.

Microservices are the right solution for true Monoliths

I’ve come across many people who don’t like microservices. They complain that they fragment the code and add potential network failures. These people are not stupid and they are not wrong. They just haven’t worked with a true monolith (even if they claim they have).

Microservices do add network latency, and there are more potential failures. They do make transactional operations less reliable. They do add serialisation overhead. They do divide teams, or offer the chance to split the solution across too many technologies. They do mean your app has to cope with multiple different versions running in production. They do mean that integration testing is limited to smaller individual pieces, like unit tests, rather than true end-to-end tests. They add overhead in terms of the bureaucracy of adding or changing contracts. They add documentation and the need for Swagger endpoints everywhere! They just fundamentally add more code and therefore a greater chance of bugs.

However, all that overhead is worth it if your app is so unmanageable that the test suite takes 6 hours to run. It is worth it if a series of breakages has made the business enforce some sort of bi-weekly release schedule on you; the knock-on effect of that bad decision is that each bi-weekly release is now bigger, riskier and potentially the cause of even more failures. You have a monolith if you branch to make a large refactor and, by the time you come to merge it back to master, master has moved on by 50 commits from 10+ people and you feel like you’re back at square one. You have to go around asking people how to sensibly merge the changes, because you don’t know what their code does: they’re in a completely different business domain to you, the acronyms are foreign and you’ve never met the person. The project is so large that each Agile team has adopted its own coding guidelines within the same codebase.

In those situations, Microservices are a real way out. Having your system as a collection of smaller microservices means you can drop a lot of these troubles.

“Monolith” does not mean a shitty product, an unreliable product, a clunky slow product or an old product with a lot of technical debt. It means one so fraught with edge cases, fear and uncertainty that even small, isolated and obvious bug fixes are delayed or forced through tons of bureaucracy for fear of breaking the critical paths of the application.

Roughly how Counterstrike Matchmaking Ranks work

I’m answering this question because it comes up often and many find this answer useful.

The algorithm behind Valve’s matchmaking for Counterstrike is closed source. It’s a secret: they don’t want you to know how it works, so that you can’t cheat the system. If defusing the bomb helps your rank, they don’t want you shooting team mates for it. It’s an intentional black box.

When people ask in forums how it works, others often say “no one knows, so anyone who answers is lying”, but those people are being ignorant. We don’t know exactly how gravity works, but we still don’t fly off into space. Valve have confirmed that the algorithm is based on a publicly known rating system called “Glicko 2 [PDF]“. With that, and with the community’s experience of using the system, we have some understanding of how it works, and that is generally enough to give you a useful picture.
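Glicko 2 itself tracks a rating deviation and volatility per player, and I won’t reproduce the full maths here, but the Elo-style core it builds on is easy to sketch: compute an expected score from the rating gap, then nudge the rating in proportion to how much the real result differed from the expectation. This is purely illustrative and obviously not Valve’s actual code; the fixed K factor below is a made-up constant, which is roughly the knob Glicko 2 replaces with per-player uncertainty:

```scala
// Elo-style core that rating systems like Glicko 2 build on (illustrative only).
object RatingSketch {
  // Expected score of player A against player B, derived from the rating gap.
  def expectedScore(ratingA: Double, ratingB: Double): Double =
    1.0 / (1.0 + math.pow(10, (ratingB - ratingA) / 400.0))

  // score: 1.0 for a win, 0.5 for a draw, 0.0 for a loss.
  def updatedRating(rating: Double, opponent: Double, score: Double, k: Double = 32.0): Double =
    rating + k * (score - expectedScore(rating, opponent))

  def main(args: Array[String]): Unit = {
    // Beating a higher-rated opponent moves your rating more, because the
    // result was less expected.
    println(updatedRating(1500, 1600, score = 1.0)) // ~1520.5
    println(updatedRating(1500, 1400, score = 1.0)) // ~1511.5
  }
}
```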

Continue reading “Roughly how Counterstrike Matchmaking Ranks work”

Comparing PDF Binary Data

If you generate the same PDF twice using iText 5 you get different results: the creation date, modification date and document ID differ. This makes it difficult to write a repeatable test. It’s also not ideal to mock out the whole of PDF generation from a library whose sole purpose is to manipulate PDFs, as that gives no confidence the code works.

I decided to read the PDF Reference, which documents the on-disk PDF format, to figure out how to write a binary comparison function that ignores those differences.
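The gist of the approach is sketched below: read both files, blank out the entries that legitimately differ between runs, then compare what’s left. This is a simplified illustration rather than the full solution from the post; in particular it assumes the CreationDate, ModDate and ID entries appear as plain, uncompressed text in the file, which may not hold for every producer or every PDF:

```scala
import java.nio.charset.StandardCharsets
import java.nio.file.{Files, Paths}

object PdfCompare {
  // Entries that legitimately differ between two otherwise-identical runs.
  private val volatileEntries = Seq(
    """/CreationDate\s*\(D:[^)]*\)""".r,
    """/ModDate\s*\(D:[^)]*\)""".r,
    """/ID\s*\[\s*<[0-9A-Fa-f]*>\s*<[0-9A-Fa-f]*>\s*\]""".r
  )

  // ISO-8859-1 maps every byte to exactly one char, so binary content
  // survives the round trip through String.
  private def normalise(path: String): String = {
    val raw = new String(Files.readAllBytes(Paths.get(path)), StandardCharsets.ISO_8859_1)
    volatileEntries.foldLeft(raw)((text, re) => re.replaceAllIn(text, ""))
  }

  def effectivelyEqual(pathA: String, pathB: String): Boolean =
    normalise(pathA) == normalise(pathB)
}
```

Stripping the entries rather than comparing them also means the cross-reference byte offsets can differ if the removed values had different lengths, so a real implementation has to be a bit more careful than this.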

Continue reading “Comparing PDF Binary Data”

Vim as a Scala IDE

[This is an old post, written at a time when Ensime-vim was the only viable option. In 2021 I recommend “Scala Metals”.]

I’ve been using Vim as my Scala IDE for the last year and I can report I’ve had a very positive experience with it. Below is a short table of which IDE features I use and which plugins provide them. Before you read that, however, you need to align your perceptions and expectations with mine. I’ve used Vim to write Python and Perl every workday for 8 years, and I have put considerable effort into making Vim work for me as a Scala IDE. It didn’t happen overnight. Python and Perl are dynamically typed; many IDEs simply can’t offer accurate “jump to definition” in Perl code, as there are no strict rules about where `$obj->$method` would even resolve to without evaluating the program at runtime. My expectations for a productive environment are probably far lower than those of someone coming from IntelliJ or Visual Studio. You have been warned.

Continue reading “Vim as a Scala IDE”

Adding a “scrollmarks” feature to KDE’s Konsole

Most of my hacks are successful but rarely see the light of day outside of my work machine. They only have to help me to be worth it. I’ve hacked Konsole in a number of ways, along with Bash, DWM and generally anything that gets in my way. My desktop setup is incredibly unique due to the fact that quite a lot of the software on it is now compiled solely for myself.

Continue reading “Adding a “scrollmarks” feature to KDE’s Konsole”

The Terminal with Jira Integration (A KDE Konsole Hack)

I believe the following statements:

  • I am a professional software engineer
  • I should be as efficient as possible
  • Modifying other people’s software is easy.

People often deride the benefit of Free and Open Source Software (FOSS) as futile, arguing that no one really reads, modifies or understands the software anyway. I appear to be something of an exception.

Continue reading “The Terminal with Jira Integration (A KDE Konsole Hack)”