“Premature Optimization” and the Pandora Box of Debates that Followed

All Evil Started with a Quote on All Evil

Donald E. Knuth, professor of computer science at Stanford University, popularized a phrase now famous in the programming community: “Premature optimization is the root of all evil.” Little did he know, however, that this statement about “all evil” would open a Pandora’s Box – fierce, passionate, at times head-banging debates ranging from optimization itself all the way to the meaning of engineering.

“Programmers waste enormous amounts of time thinking about, or worrying about, the speed of noncritical parts of their programs, and these attempts at efficiency actually have a strong negative impact when debugging and maintenance are considered.”

“We should forget about small efficiencies, say about 97% of the time: premature optimization is the root of all evil.”

“Yet we should not pass up our opportunities in that critical 3%. A good programmer will not be lulled into complacency by such reasoning, he will be wise to look carefully at the critical code; but only after that code has been identified.”

Donald Knuth, “Structured Programming with go to Statements” (1974)

Warning: you are about to peek inside Pandora’s Box… which may lead to an insightful soul-searching journey, a mental hurricane, or something in between.

Still with me? Then let’s dive in! 🙂

Premature Optimization vs. Technical Debt

A (somewhat) related concept to premature optimization is technical debt. Although most in the software engineering world would agree on the definition of each term, folks are less aligned on how the two relate to each other – are they synonyms or opposites?

Technical debt refers to the “cost of additional rework caused by choosing an easy solution now instead of using a better approach that would take longer” (Wikipedia). In layman’s terms, technical debt means that if you are lazy now, you will have to make up for it later. Just as dirty laundry stacks up, you will have to clean it sooner or later. And sooner is better than later, which is better than never – that is what people really mean when they remind you to “avoid technical debt.”

“Technical debt” as a phrase is looked upon favorably by programmers who believe chivalry isn’t dead. For them, “please avoid technical debt” is a civil alternative to “stop being lazy and get the $%@!#$$# up and do something.” So you could say “technical debt” existed in peace and had its supporters until it was put next to “premature optimization” – and that is when things got interesting.

This post asks the interesting question of whether premature optimization is “the opposite concept of technical debt.” What is more interesting than the question itself are the comments that followed – I highly recommend a read.

Some believe that “premature optimization” is generally a worse offense than “technical debt”: at least technical debt saves you time now (even though you pay it back later), so the argument goes that, on a net basis, it wastes less time than premature optimization:

“There is no optimization included in this concept (of premature optimization). Optimization is doing something to improve value delivery. Eliminating waste is one form of optimization. This premature “optimization” introduces waste now (time is spent while not adding value). And if that isn’t bad enough, it introduces future waste as well.

“To me it (premature optimization) seems even worse than technical debt. Both (premature optimization and technical debt) result in future waste, but with technical debt you at least don’t waste a whole lot of time now.”

Comment by Henri van der Horst

However, is it really true that premature optimization only wastes time and creates no benefit at all? Randall Hyde argues that it is not as bad as it sounds – on the contrary, programmers gain experience, and the application as a whole does not suffer much:

“One thing nice about optimization is that if you optimize a section of code that doesn’t need it, you’ve not done much damage to the application. Other than possible maintenance issues, all you’ve really lost is some time optimizing code that doesn’t need it. Though it might seem that you’ve lost some valuable time unnecessarily optimizing code, don’t forget that you have gained valuable experience so you are less likely to make that same mistake in a future project.”

“The Fallacy of Premature Optimization”, Randall Hyde

To put it simply, Hyde considers premature optimization a “tuition” paid for learning how to code better. If we go with Hyde’s argument, the logical implication is that technical debt is worse than premature optimization – the former teaches you nothing, other than that being lazy now has consequences down the road, a lesson you conveniently forget the moment you decide to let the debt accumulate.

Some say premature optimization and technical debt, instead of being opposite concepts, overlap in meaning:

“You suggest premature optimization as an opposite, but I would say that premature optimization is technical debt. At least in a software context, optimization usually comes at the expense of readability and maintainability of the underlying code. If you didn’t need the optimization to support the use of the system under design, all you accomplished is making the code more difficult to maintain. This difficulty in maintenance is likely to cause new features to take longer to design, develop, test, and deploy, which is a key indicator of technical debt.”

Comment by Thomas Owens

To rephrase, Owens argues that premature optimization creates problems that must be remedied later, and I agree with him on that. What I disagree with is that premature optimization creates “technical debt.” Under the Wikipedia definition above, technical debt refers specifically to problems caused by being lazy now (going for an easy solution or doing nothing), not by being inappropriately diligent (i.e., premature optimization). Owens has broadened “technical debt” to cover code with any kind of problem, regardless of whether the cause was laziness (technical debt) or misguided diligence (premature optimization). That distinction is a nice way to summarize where I stand:

I believe both “premature optimization” and “technical debt” create problematic code that needs to be fixed later – the key difference is the root cause of the problem. Premature optimization is caused by misguided diligence, which yields very low ROI at best and zero ROI (100% wasted effort) at worst; technical debt is caused by plain laziness. While technical debt reinforces the old lesson that one should not be lazy, premature optimization shows that too much diligence can be a bad thing.

Writing great code does not mean writing perfect code at every single step, and not every single line is worth the same investment of time and energy. Premature optimization is the result of misallocating your time – which is in limited supply – and it ultimately keeps you from maximizing the quality of your output.

That was a mouthful, yet it is only the start of the debates surrounding premature optimization. We now slide further down the slippery slope – to talk about the slippery slope itself.

The Premature Slippery Slope and the “Swiss Cheese” Model

A slippery slope is when “a relatively small first step leads to a chain of related events culminating in some significant effect” (Wikipedia). It has been more than four decades since Donald Knuth popularized “premature optimization” in his 1974 paper – and four decades is long enough for his statement to slide down a premature slippery slope of its own. 🙂

Some programmers cite avoiding “premature optimization” as an excuse for being lazy or thoughtless. In this post, Joe Duffy expresses frustration at programmers using Knuth’s statement “to defend all sorts of choices, ranging from poor architectures, to gratuitous memory allocations, to inappropriate choices of data structures and algorithms” – in other words, laziness. It sounds like “premature optimization is the root of all evil” has slipped down the slope to “optimization is the root of all evil” and then to “optimization is evil.”

Check out Randall Hyde’s humorous yet witty take on the various ways the “slippery slope” has gone too far: “The Fallacy of Premature Optimization.” My favorite parts are his sarcastic observations about programmers – some are a bit exaggerated and obviously do not apply to every programmer, yet they are food for thought, and I find myself guilty of similar errors in non-programming fields:

“Observation #3: Software engineers use the Pareto Principle (also known as the “80/20 rule”) to delay concern about software performance, mistakenly believing that performance problems will be easy to solve at the end of the software development cycle. This belief ignores the fact that the 20 percent of the code that takes 80 percent of the execution time is probably spread throughout the source code and is not easy to surgically modify. Further, the Pareto Principle doesn’t apply that well if the code is not well-written to begin with (i.e., a few bad algorithms, or implementations of those algorithms, in a few locations can completely skew the performance of the system).”

“Observation #4: Many software engineers have come to believe that by the time their application ships CPU performance will have increased to cover any coding sloppiness on their part. While this was true during the 1990s, the phenomenal increases in CPU performance seen during that decade have not been matched during the current decade.”

“Observation #6: Software engineers have been led to believe that their time is more valuable than CPU time; therefore, wasting CPU cycles in order to reduce development time is always a win. They’ve forgotten, however, that the application users’ time is more valuable than their time.”

“The Fallacy of Premature Optimization”, Randall Hyde

The central point Hyde is trying to get across is that when some programmers claim to be “minimizing premature optimization,” what they are actually doing is minimizing the time spent on thoughtful design – a betrayal of the engineering ethos of maximizing performance. There is no excuse for not investing the time to think through the performance of the system as a whole; this is what is expected of any good software developer, per Charles Cook (unfortunately, the link to Cook’s blog article is no longer valid):

“It’s usually not worth spending a lot of time micro-optimizing code before it’s obvious where the performance bottlenecks are. But, conversely, when designing software at a system level, performance issues should always be considered from the beginning. A good software developer will do this automatically, having developed a feel for where performance issues will cause problems. An inexperienced developer will not bother, misguidedly believing that a bit of fine tuning at a later stage will fix any problems.”

Charles Cook
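Cook’s advice – find the bottleneck first, then micro-optimize – is exactly what a profiler is for. Here is a minimal sketch in Python using the standard library’s `cProfile` (the two toy functions are made up for illustration): the profiler report tells you which function actually dominates the runtime *before* you touch any code.

```python
import cProfile
import io
import pstats

def slow_string_concat(n):
    # Deliberately quadratic: each += copies the whole buffer.
    s = ""
    for i in range(n):
        s += str(i)
    return s

def fast_string_join(n):
    # Linear alternative: build the pieces, join once.
    return "".join(str(i) for i in range(n))

def main():
    slow_string_concat(20000)
    fast_string_join(20000)

# Measure first, optimize second.
profiler = cProfile.Profile()
profiler.enable()
main()
profiler.disable()

stream = io.StringIO()
pstats.Stats(profiler, stream=stream).sort_stats("cumulative").print_stats(5)
print(stream.getvalue())
```

The point is not the string trick itself but the workflow: only the functions the report identifies as hot spots are candidates for optimization, which is Knuth’s “critical 3%” in practice.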

Rico Mariani makes a similar point when he says: “Never give up your performance accidentally.”

Now, it’s time for the simple yet clever rule: Never give up your performance accidentally. That sums it up for me, really. I have used other axioms in the past — rules such as making sure you measure, making sure you understand your application and how it interacts with your system, and making sure you’re giving your customers a “good deal.” Those are all still good notions, but it all comes down to this: Most factors will tend to inexorably erode your performance, and only the greatest vigilance will keep those forces under control.

If you fail to be diligent, you can expect all manner of accidents to reduce your system’s performance to mediocre at best, and more likely to something downright unusable. If you fail to use discipline, you can expect to spend hours or days tuning aspects of your system that don’t really need tuning, and you will finally conclude that all such efforts are ‘premature optimizations’ and are indeed ‘the root of all evil.’ You must avoid both of these extremes, and instead walk the straight and narrow between them.

Rico Mariani (Microsoft, Performance Architect, 2004)

Rico’s principle of “never give up your performance” – whether accidentally or consciously – is applicable to all walks of life, not just programming. It is particularly important when we are dealing with complex systems:

What are good values for performance work? Well, to start with you need to know a basic truth. Software is in many ways like other complex systems: There’s a tendency toward increasing entropy. It isn’t anyone’s fault; it’s just the statistical reality. There are just so many more messed-up states that the system could be in than there are good states that you’re bound to head for one of the messed-up ones. Making sure that doesn’t happen is what great engineering is all about.

Rico Mariani (Microsoft, Performance Architect, 2004)

There you go: great engineering is indeed about great performance, but it is not about guaranteeing perfect performance – in fact, that is downright impossible. Great engineering is about preventing, or minimizing, the chance of performance so messed up that it brings about catastrophic consequences. It is not about delivering a perfect show 100% of the time – it is about making sure a messed-up sh*t-show happens 0% (or close to 0%) of the time. A truly great engineer will therefore steer away from wasteful “premature optimization” while never forgetting, or giving up on, the goal of performance optimization. In fact, avoiding premature optimization is itself a way to optimize performance: it invests your time where it matters most for the output.

On the point of keeping a sh*t-show from happening, I came across the Swiss cheese model of accident management, as explained by Matt Parker in his book Humble Pi: When Math Goes Wrong in the Real World:

“[The] Swiss cheese model of disasters … looks at the whole system, instead of focusing on individual people. The Swiss cheese model looks at how ‘defenses, barriers, and safeguards may be penetrated by an accident trajectory.’ This accident trajectory imagines accidents as similar to a barrage of stones being thrown at a system: only the ones that make it all the way through result in a disaster. Within the system are multiple layers, each with its own defenses and safeguards to slow mistakes. But each layer has holes. They are like slices of Swiss cheese.”

“I love this view of accident management, because it acknowledges that people will inevitably make mistakes a certain percentage of the time. The pragmatic approach is to acknowledge this and build a system robust enough to filter mistakes out before they become disasters. When a disaster occurs, it is a system-wide failure, and it may not be fair to find a single human to take the blame.”

Humble Pi: When Math Goes Wrong in the Real World (Matt Parker)

The Swiss cheese model is easy to visualize: imagine stacking slices of Swiss cheese on top of each other, the holes in each slice representing problems. Catastrophic events only happen when the holes in every slice happen to line up, so that an error can pass through them in a straight line. As Matt Parker points out, when a bunch of mistakes “conveniently” line up and result in a gigantic mistake, it is usually indicative of systemic issues. This is not to say that individuals or specific actions are not at fault – but one should not focus on a tree and forget the forest, i.e., the system as a whole. There is often a lot to be done at the system level, e.g., improved processes or better tools.
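The intuition behind the stacked slices can be put in numbers. A small sketch (the layer counts and hole rates below are made up for illustration), under the simplifying assumption that each layer’s holes are independent:

```python
def disaster_probability(layer_hole_rates):
    """Probability that an error passes through every layer,
    assuming each layer fails independently of the others."""
    p = 1.0
    for rate in layer_hole_rates:
        p *= rate
    return p

# One safeguard that misses 10% of errors lets 1 in 10 through...
single = disaster_probability([0.10])   # -> 0.1

# ...but four such imperfect layers stacked together let only about
# 1 in 10,000 through: 0.1 * 0.1 * 0.1 * 0.1 = 0.0001.
stacked = disaster_probability([0.10, 0.10, 0.10, 0.10])  # ~0.0001

print(single, stacked)
```

This is why several mediocre safeguards can outperform one near-perfect one – and why a disaster that does get through usually says something about the whole stack, not one slice.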

Two final remarks:

(1) I am not a programmer and I don’t code myself, so yes, I am commenting on a trade in which I have little experience. That being said, just as you don’t have to be a professional mathematician to apply mathematical thinking in your daily life, I believe you don’t have to be a full-time software engineer to appreciate computational thinking. At the end of the day, although concepts like “premature optimization” and “technical debt” originated in the context of software, they apply, and remain relevant, to all walks of life;

(2) I highly recommend Matt Parker’s highly entertaining & educational book on mathematics: Humble Pi: When Math Goes Wrong in the Real World. If you love mathematics, there is no reason not to read it. If you hate mathematics, the biggest reason to read it is that it will make you fall in love with math. Mathematics is a truly beautiful language and way of thinking.

See you later, world.

Naming Contest: A New “A”-Word For A.I.

Here’s a naming contest: pick a new word the letter “A” stands for in A.I. – conventionally known as artificial intelligence. What would be your pick?

Candidate #1: A = Amplified

Just like A.I., machine learning is an amplifier of all things human…reflecting [human actions] exponentially back into society, that’s where the hidden danger really is.

Dekai Wu, Professor at the Hong Kong University of Science & Technology, one of eight inaugural members of Google’s AI Ethics Council, in an interview on the “Exponential View with Azeem Azhar” podcast

Think of A.I. as a magical 10x amplifier. Whatever we feed it – whether good or bad – will get reflected back at us with greater force.

One problem this brings is the amplification of our biases, whether explicit or implicit. For instance, machines can internalize stereotypes, as one study found: “European American names were more closely associated with pleasant words than they were with unpleasant ones, in comparison to African American names, and female names were more closely associated with words that have familial connotations than with career-oriented words, as compared to male names.”

Candidate #2: B = Biased?

Being original, you come up with this out-of-the-box answer: why don’t we scrap the letter “A” altogether, and replace it with the letter “B” instead? How about B.I. for “Biased Intelligence”?

While “artificial” is a relatively neutral term and “amplified” is neutral to slightly positive, “biased” takes a U-turn into negative connotation. But the picture is not all gloomy. On the bright side, biases from algorithms may be easier to spot:

In contrast to human thought processes, certain elements of algorithmic decision-making—such as the inputs used to make predictions and the outcomes algorithms are designed to estimate—are inherently explicit, though not always publicly visible.

Amy Merrick, “How making algorithms transparent can promote equity“, Chicago Booth Review

Biases from algorithmic decision-making are more explicit than human biases, in that we can identify and measure them with objective data – we can analyze the record of every decision made. In contrast, it is much harder to quantify how “biased” an actual person is in real life; it is hard to imagine tracking every single decision, action, or word of a person and analyzing how much of it is attributable to “bias.”
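That measurability can be made concrete. A toy sketch (the decision log, group labels, and threshold below are all fabricated for illustration): given the full record of an algorithm’s decisions, we can compute approval rates per group and compare them directly – something rarely possible for a human decision-maker.

```python
from collections import defaultdict

def approval_rates(decision_log):
    """decision_log: list of (group, approved) pairs recorded from an
    algorithm's decisions. Returns the approval rate for each group."""
    counts = defaultdict(lambda: [0, 0])  # group -> [approved, total]
    for group, approved in decision_log:
        counts[group][0] += int(approved)
        counts[group][1] += 1
    return {g: a / t for g, (a, t) in counts.items()}

# Fabricated log: every decision the algorithm ever made is on record.
log = ([("A", True)] * 80 + [("A", False)] * 20
       + [("B", True)] * 60 + [("B", False)] * 40)

rates = approval_rates(log)       # A: 0.8, B: 0.6
ratio = rates["B"] / rates["A"]   # ~0.75

# One common screening heuristic (the "four-fifths rule") flags cases
# where one group's rate falls below 80% of another group's rate.
print(rates, "flagged:", ratio < 0.8)
```

The same audit is intractable for a person: there is no complete, machine-readable log of every judgment they ever made.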

Candidate #3: Get Rid of the 1st Letter Altogether

Now, to be even more original, some of you may question why we need an adjective in front of the word “intelligence” in the first place. In a Chinese-language interview on the podcast “迟早更新”, a guest remarked that the ultimate purpose of artificial intelligence is to get rid of the “A” – i.e., to reach the point where the intelligence of machines no longer seems artificial.

You could argue that if we want the name to be more aspirational and reflect our optimism about the future, we should embrace “less is more” and scrap the first letter altogether.

Candidate #4 (&5): Let’s Go Natural & Organic

At this point, some of you may ask: since the opposite of “artificial” is “natural”, why don’t we rename A.I. as Natural Intelligence? Or if we want to stick with the initials (since people are already so used to it), we could call it All-Natural Intelligence? Or perhaps Almost-Natural Intelligence for now, before A.I. hits perfection?

Or how about replacing “natural” with its cousin, “organic”? Say Almost-Organic Intelligence? Or Inorganic Intelligence for a fancy twin-initial of I.I.?

I will leave it to yourself to explore the rabbit hole…be sure not to dig too deep! 🙂

What is your pick or nomination? Share your views – write to me at fullybookedclub.blog@gmail.com or reach me on LinkedIn.

Interested in similar write-ups? Apart from publishing articles on this blog, I also send out a newsletter with original content and curated ideas. Subscribe here or view past issues here.

Talk Takeaway: “Fintech & Blockchain in China” by Prof. He Zhiguo (University of Chicago)

I attended a talk by Prof. He Zhiguo of the University of Chicago Booth School of Business, titled “Fintech & Blockchain in China.” The talk was recorded, and I hope it will be shared online for public viewing later. In the meantime, here are the most interesting parts I took away:

Pig Faces & Fraud Detection in Insurance

…have nothing in common? You read the section title and stare at me with a puzzled look.

You are not alone – that is what I thought too. But contrary to our intuition, the technology of identifying the face of a pig is very valuable for insurance companies in detecting fraud. (Yes, “facial recognition” does not necessarily mean human facial recognition.)

Here’s the trick: imagine you are an insurance company, and a farm owner comes to you to buy insurance for his pigs – he wants to insure against, say, diseases or other factors that may cause his pigs to fall ill or die.

As a shrewd insurer, you worry about potential fraud. Say you enter an insurance contract with the farmer covering 100 pigs. Ten months later, the farmer comes back and claims that one of the pigs has caught an illness covered by your contract – let’s call that pig Piglet X.

Now, how can you be sure that Piglet X is one of the original 100 pigs covered by your contract? How can you be sure the pig you see today is the same pig you saw ten months ago? The answer is: you cannot – unless you have technology that can reliably recognize the face of a pig!

At current technology levels, an accuracy rate of around 80% is already top of the league – with much room for improvement. When the number of pigs is large enough, a 20% error rate could mean considerable losses for an insurer!

Understanding Bitcoin Mining: Think About Kings & Followers (a mathematical game-of-thrones)

In explaining the rules-of-the-game for Bitcoin mining, Prof. He used the analogy of a “game-of-thrones”.

In the Bitcoin “mining competition,” multiple miners compete each round for their own block of transactions to be chosen as *the* canonical block that everyone else follows. Think of each round as multiple miners competing to be elected King-of-the-Round, who gets to write history – and this history is recognized as The Universal History that everyone treats as sacred. All other versions of “history” from other miners – those who lose the fight for the throne – are treated as heresy not to be trusted.

Importantly, to be elected King and to keep the “crown,” you must write history truthfully. If you blatantly lie (e.g., make up a transaction), you risk losing followers – your people “rebel” and rally behind a new king. In other words, there are cryptographic and mathematical checks and balances to ensure the “King” does not get to dictate history entirely at his will.

One common critique of Bitcoin mining is that it is an arms race – and a relatively inefficient one at that. With more inputs in the race (i.e., more miners and/or more computing power and energy entering it), the output does not grow proportionately.
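The “King-of-the-Round” analogy maps onto proof-of-work. Below is a toy sketch (vastly simplified – real Bitcoin uses double SHA-256 over a binary block header, a 256-bit difficulty target, and much more): whoever first finds a nonce whose hash meets the difficulty wins the round and gets to “write history,” while everyone else can cheaply verify the winner’s work.

```python
import hashlib

def mine(block_data, difficulty):
    """Toy proof-of-work: find a nonce such that
    sha256(block_data + nonce) starts with `difficulty` hex zeros."""
    nonce = 0
    target = "0" * difficulty
    while True:
        digest = hashlib.sha256(f"{block_data}{nonce}".encode()).hexdigest()
        if digest.startswith(target):
            return nonce, digest
        nonce += 1

def verify(block_data, nonce, difficulty):
    # Anyone can check the winner's work instantly. That asymmetry -
    # hard to find, easy to verify - is what keeps the "election" honest.
    digest = hashlib.sha256(f"{block_data}{nonce}".encode()).hexdigest()
    return digest.startswith("0" * difficulty)

block = "Alice pays Bob 1 BTC"
nonce, digest = mine(block, difficulty=4)
print(nonce, digest)

# Tampering with "history" invalidates the proof, so the followers
# almost surely reject the forged block.
print(verify("Alice pays Bob 100 BTC", nonce, difficulty=4))
```

The “arms race” critique also shows up here: raising `difficulty` by one hex digit multiplies the expected mining work by 16, while verification stays a single hash.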

Money, In Simplest Terms, Is A State-of-the-World With 3 Factors: You, Me, How Much

The concept of money, put simply, is an accounting system – it answers one question: who paid whom, and by how much?

Say I paid you $100. The ‘money’ in this case is a “state-of-the-world” with 3 variables – you, me, how much ($100).

When we say “money,” we mostly care about two things:

  1. Who are the transacting parties – sending vs. receiving?
  2. What is the amount transacted?

The forms of money could vary – from paper money to numbers on a ledger to digital entries with no physical form…but all forms of money have one thing in common: they (aim to) represent a state-of-the-world with 3 key factors: You (Who), Me (Who Else), How Much?
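The “who paid whom by how much” framing is literally a data structure. Here is a minimal sketch (the names and the `Transfer` record are made up for illustration): money as a ledger of three-field entries, with balances derived entirely from the record rather than stored anywhere.

```python
from collections import defaultdict
from dataclasses import dataclass

@dataclass(frozen=True)
class Transfer:
    sender: str    # who
    receiver: str  # who else
    amount: int    # how much (in cents, to avoid float rounding)

def balances(ledger):
    """Balances are not stored anywhere - they are derived entirely
    from the record of who-paid-whom-how-much."""
    totals = defaultdict(int)
    for t in ledger:
        totals[t.sender] -= t.amount
        totals[t.receiver] += t.amount
    return dict(totals)

ledger = [
    Transfer("me", "you", 10000),  # I paid you $100.00
    Transfer("you", "me", 2500),   # you paid me back $25.00
]
print(balances(ledger))  # {'me': -7500, 'you': 7500}
```

Paper bills, bank databases, and blockchains differ only in *how* this three-factor state-of-the-world is recorded and agreed upon.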

Interested in similar write-ups of fun ideas across all walks of life? Subscribe to my newsletter (free!) and check out past issues.

Have comments or want to discuss more? I’d love to hear from you! Write to me at fullybookedclub.blog@gmail.com or reach me on LinkedIn. (P.S. At time of writing, I am working at a fintech / blockchain company.)

Interested in other fun events in Hong Kong? Check out this newsletter where I list out events I’m going to this month.

* * *

Special thanks to the University of Chicago for hosting this talk, for free, at their gorgeous Hong Kong campus. The event was well-organized – shuttle buses to and from metro stations, refreshments and food, and name badges for registered attendees were all thoughtfully provided. Kudos to the team!