Benford’s Law

august 28th, 2022

I was curious as to what I actually could do with Benford’s Law. Could I detect anomalies that would warrant questioning? Where could I apply it?

So – first off – what is Benford’s Law – or the Newcomb-Benford law? It describes the naturally occurring distribution of leading digits in data sets whose values span several orders of magnitude. It states that 1 is then – by far – the most prevalent most significant digit. The theoretical frequency of each digit is log10(1+1/digit).

Let’s say our data set is based upon the number of people living in cities, then the cities with 10-19 million, 1-1.9 million, 100-199 thousand, 10-19 thousand, 1,000-1,999, 100-199, 10-19, and 1 citizen would cover a little more than 30% of the cities, whereas cities with 20-29 million, 2-2.9 million, 200-299 thousand, 20-29 thousand, 2,000-2,999, 200-299, 20-29, and 2 citizens would cover 17.6% of the cities.
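The two percentages follow directly from the formula. A minimal sketch in base R, where the variable names are chosen for illustration only:

```r
# Theoretical Benford frequency for each possible leading digit 1..9
digit <- 1:9
benford <- log10(1 + 1 / digit)

# Show the frequencies as percentages, rounded to one decimal
print(data.frame(digit = digit, percent = round(100 * benford, 1)))

# The nine frequencies sum to 1, because the product of (d+1)/d telescopes to 10
print(sum(benford))
```

The first two rows come out as 30.1 and 17.6, matching the city example.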

I find this counter intuitive, and thus very fascinating.

I had a bank statement with deposits and withdrawals – 212 in total – and I thought, why not try to apply Benford’s law to that lot and see how well it fits, and if it does not, whether there is any explanation.

To get the most significant digit I chose to use the log10 function, basically writing the number in scientific notation. I take the floor of the logarithm as the exponent, divide the original amount by 10 to the power of that exponent to get a number between 1 and 10, and finally take the absolute integer value:


Let’s assume the amount is 12345, then log10 would give us 4.091491, whose floor is 4. We then divide 12345 by 10^4 to get 1.2345, and take the integer value – not rounding – which gives us 1.
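The steps can be wrapped in a small helper. This is a sketch in base R; the function name leading_digit is made up here, and zero amounts are assumed to have been filtered out beforehand, since log10(0) is not finite.

```r
# Most significant digit of a non-zero amount, via scientific notation
leading_digit <- function(amount) {
  exponent <- floor(log10(abs(amount)))   # 12345 -> 4
  mantissa <- abs(amount) / 10^exponent   # 12345 -> 1.2345
  as.integer(mantissa)                    # truncate, do not round -> 1
}

# Works for withdrawals (negative amounts) and amounts below 1 as well
leading_digit(c(12345, -987.65, 0.042, 7))   # 1 9 4 7
```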

Using dplyr in R to get the frequency for this, I used:

library(dplyr)
# `statement` is a placeholder name for the data frame holding the amounts
data_hist <- statement %>% group_by(digit=abs(as.integer(amount/10^floor(log10(abs(amount)))))) %>% summarize(count=n())

Which gave me:


To add the theoretical Benford values, simply add a column:

data_hist$benford = log10(1+1/data_hist$digit)

We would also like to have the frequency as a percentage, not as an observation count.

data_hist$freq = data_hist$count / sum(data_hist$count)

Now, we could use these two values to calculate which digits are the most off either simply using freq/benford or (freq-benford)/(freq+benford) – but let’s just plot the data as a bar chart along with the Benford curve.
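For completeness, the two deviation measures could be computed as below. The counts are made up for illustration and are not the counts from the bank statement.

```r
# Made-up observation counts per leading digit, for illustration only
data_hist <- data.frame(digit = 1:9,
                        count = c(28, 15, 16, 8, 12, 6, 5, 4, 6))
data_hist$benford <- log10(1 + 1 / data_hist$digit)
data_hist$freq <- data_hist$count / sum(data_hist$count)

# Ratio: 1 is a perfect match, above 1 means overrepresented
data_hist$ratio <- data_hist$freq / data_hist$benford

# Normalized difference: bounded by -1 and 1, 0 is a perfect match
data_hist$norm_diff <- (data_hist$freq - data_hist$benford) /
  (data_hist$freq + data_hist$benford)

# Digits ordered from most overrepresented to most underrepresented
data_hist[order(-data_hist$norm_diff), c("digit", "ratio", "norm_diff")]
```

Both measures rank the digits in the same order; the normalized difference is merely easier to read because it is symmetric around 0.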

library(ggplot2)
ggplot(data_hist, aes(x=digit, y=freq)) + geom_bar(stat="identity") + geom_line(aes(x=digit, y=benford)) + labs(title="Bank payment most significant digit: Benford's law") + scale_x_continuous("digit", breaks=data_hist$digit)

It would seem that 3, 5, and 9 are overrepresented, which raises the question: Why?

It turns out that I have subscriptions in here, and those are not random and will thus skew my distribution.

While this was relatively easy to do, and I did spot a few outliers, it seems I didn’t quite get what I was looking for. And the 4 subscriptions didn’t expose themselves in this plot – maybe if the other subscriptions had been removed. Studies for another day.

Pipe Dream

august 29th, 2016

In which we speculate about software reuse, serverless architecture, and find out it is all a pipe dream in the end.

Let us begin in a different business: Plumbing. I can go to an online store and buy pipes and fittings, taps, heaters, and a lot of other stuff. Components can be bought for electricians and electronics in a similar fashion. That is, if I want, I can buy components and put them together as I see fit – luckily you have to be certified to create installations in your home – at least in Denmark. Software on the other hand is almost always bought in complete installations.

While it did not make any sense to buy a host of single components in the old days, there are basically no obstacles to doing so today. Amazon (and other *aaS vendors) sell by the second or even millisecond, which means you could make micropayments for micro – or even nano – services. It seems it should be possible to have a McIlroy (bash) pipe connection of components across the internet with differentiated payment schemes. Perhaps a Mac Automator for the web.

The step up from “basically no obstacles”

Communication – both the format and the pipes are obstacles. In plumbing, you cannot seamlessly fit a 1 inch to a 2 inch pipe, which is visibly different. If water runs through the system, then water runs through every part. In IT the common denominator is text. Sometimes this seems too simple, but it works quite well for bash programming.

The other element is the pipes, or rather the network. Networks are slow compared to local disk access, yet n-distributed parallel execution is extremely fast, and with a forwarding address on the output, i.e. redirection of STDOUT in bash, the data would not have to return to the origin for every segment.

This then suggests that there should be a protocol along the lines:

setup: parameters, forward to, payment key
execute: stream of data
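
To make the two steps a bit more concrete, here is a purely hypothetical sketch in R where local functions stand in for remote services. The names setup, forward_to, and payment_key mirror the outline above; nothing here is an existing API, and the payment key is only carried along, no billing is modelled.

```r
# Hypothetical sketch: "setup" returns a stage, calling the stage is "execute"
setup <- function(transform, parameters = list(), forward_to = NULL,
                  payment_key = NA_character_) {
  function(stream) {
    # execute: apply this stage's work to the stream of text lines
    result <- do.call(transform, c(list(stream), parameters))
    # forward the output to the next stage instead of returning to the origin
    if (is.null(forward_to)) result else forward_to(result)
  }
}

# Two local stand-ins for remote services, wired back to front like a pipe
counter <- setup(function(lines) length(lines), payment_key = "demo-key")
matcher <- setup(grep, parameters = list(pattern = "error", value = TRUE),
                 forward_to = counter, payment_key = "demo-key")

matcher(c("error: disk full", "ok", "error: timeout"))   # 2
```

The point of the design is that the origin only talks to the first stage; each stage knows where to send its output.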

Having such a setup would allow you to actually define the schema of the data – if it conforms to a schema. This in turn would allow SQL like filtering, but that really depends upon the service.

Why would I want this?

Well, first off, real re-use. It seems silly that we’re re-inventing the wheel every time a new Rails/Django/Drupal/… is installed. It seems unnecessary that a non-programming customer should have to set up and maintain a webservice to do something which is already possible by using existing components, possibly in a different way. Even if the customer does this, everyone else with the same ideas would have to do the same for themselves.

Naturally I want it because I’ve run into some missing functionality in one product, that I could construct myself or use from another vendor, but without the pipe and the linking it is not possible.

Whether it is a poor search functionality in a webshop, an e-mail filtering/handling service, or news feeds I’d like to filter – I have to do it myself, and I have to pull the information.

Why it is all a pipe dream after all

Spinning up the services can take a lot of time. Sending massive amounts of data across the web will congest an already congested infrastructure. Hoping for every service provider to provide a standardized service interface is in itself a pipe dream.

Pipe dream – a hope, wish, or dream that is impossible to achieve or not practical (Merriam-Webster)

Programming by the pound

august 21st, 2016

In which we try to figure out why some programs are worse off for the end users than they need to be.

We are likely to spot the root cause, and less than pleasing hairdos, if a hairdresser cuts by the pound, that is, with no regard to style or aesthetics, but rather focused on cutting a pound of hair off each customer.

We’d see: bowl cuts, bald, half-bald, … Speed and weight are of the essence.

The source code of some programs looks as if they have been created in a similar fashion. Just push code in without a sense of style or aesthetics. Copy and paste, ad hoc design from the beginning, Rube Goldberg like amendments.

Most people will not be able to see this, as they only have the user interface to interact with, if the software ever hits the market.

The programmer on the other hand will never learn any better. Basically he is doing a stellar job – the customers get served in a speedy manner, and there are no complaints. As mentioned above, the customers are less likely to spot the ugliness as they don’t see the code. The programmer rarely sees other people’s code, and likely cannot tell the difference between a good style and his own.

Sort of having a blind barber tell you: “Looks good to me.”

But if the end users are happy with the delivered product should we care at all? Well that depends, we’re all paying the price. The price for the software has to be earned somewhere. If the productivity of the end users doesn’t increase to cover the cost, then the choice to get the software in the first place was a bad idea. The cost of updates and additions goes up the more convoluted the source code is, because it will take longer to fit the new requirements into the existing code. Starting from scratch means losing the entire investment and betting on another system with a high probability of having the exact same issues. You may even lose part of your data.

I’m not arguing that the hairdresser’s shop should be a cathedral, rather that it shouldn’t be a shanty building with wires and hoses dangling in dangerous positions.

Whenever you spot a user interacting with a user interface, and they become frustrated, odds are that the developers had more focus on programming by the pound – getting things done – than on actually enabling their customers to get things done.

The architect’s grievance with MVC

july 19th, 2016

MVC is a separation of concerns pattern in which you have a Model, a View, and a Controller for a given entity. The Model should contain the data and methods for the entity while the view – or views – are responsible for visual representation of the model, e.g. a pie chart or a bar chart of data. The controller is responsible for providing the model with command requests.

Often, when developers are trying to follow the MVC pattern they follow the pattern as implemented by Rails; all the models go into the app/models directory, all the views reside in app/views, and all the controllers will be found inside the app/controllers directory.

This is comparable to designing a house with a special room for all faucets, power outlets, and drains, and another room for all the handles and switches.

The faucet you would usually find in the kitchen will now be labelled “kitchen” but reside in the faucet room, and will likely sit next to the faucet labelled “bathroom”.

You could run a hose from the faucet to the kitchen, but that would only save some trouble. The handle for turning on and off the water resides in the controller room, you have the “kitchen faucet” controller. Next to these you may have the power on/off switches for the oven.

This construct is quite easy for the installers to set up; for the software equivalent, it is easy on the framework.

But we are not building houses to please the work crew, but rather for the ease of living. We should focus upon the user experience as well, when we write code.

What we are achieving by this model is a high cohesion in unrelated entities performing the same role, which is contrary to what Larry Constantine suggested in 1968. Teasing apart the application for reuse is much more difficult – we cannot easily swap out one kitchen for a different one.

The better structure would be to have the strongly related entities in the same place, i.e. instead of:




it would make sense to have:



At least this would easily identify the views associated with a specific model, and if otherwise keeping with modular discipline should make it possible to pull out one entity.

Logic in the controller

Sometimes you run across a project where there is (model) logic in the controller, but that is a bad idea. It should be possible to keep the controller and change the model implementation, e.g. my keyboard (controller) does not have to change because my application changes or the keyboard layout changes. The controller should send events to the provided model to be interpreted there.

If you have logic in the controller, then you will need to change both the controller and the model when you make a change. That means you have one more element in the cognitive load, which makes things just a bit more complicated. Complication that does not have to exist.

It seems that by tooling we are building software that is easy for the frameworks and the original constructors, but not good for those who have to maintain or live with the product. That is simply not the right way to be service minded.

Snake oil and “everybody can program”

july 6th, 2016

Everyone can program, but not necessarily code – I fully agree with Quincy Larson: Coding isn’t easy.

I believe that all of us are capable of programming, that all of us – in the way of Socrates in Plato’s Meno dialogue – have the ability to describe a set of procedures to apply in a given order.

We’re all capable of writing novels, but very few of those who do will make a successful novel.  That is to say: “It is not as easy as it looks or sounds.”

All of us can cook, though we’d be pressed to get a Michelin star.

Not all of us can write these procedures in a programming language, and – of those who can – not all should. And not all software should be written in a procedural/imperative style.

Some will throw together grammatically correct code with no regard to the task being solved. I’m not sure this should count as programming. It is true that working software provides value to the user, but with little understanding of the needed solution in both behaviour and coding, there is so much more value to be had by doing it right (but there are so many more ways to do it wrong).

If they had worked the same level in the restaurant business, it is likely Gordon Ramsay would have called it a Kitchen Nightmare. As no one can see the internals of the code fewer customers turn away from the business, and the parallel fails in that we usually eat every day, we don’t get a new software product served every day.

In the business world scarcity with increased demand means prices will go up, leading to more resources being applied. In software development, this leads to people who really shouldn’t program being hired to hack away at the next big thing.

As a society we are not better off having poorly constructed “cathedrals” forced upon us. If we had to jump through hoops every time we needed to go through a door, we would be quick to remedy this odd contraption, but in the software world there is usually no such way.

The pursuit of infinite savings allows for expenses now, but apparently not for investment in solid work, nor in the hidden value of improving the software.

I am still wondering why there is so little regulation in a field so wide and with so far reaching consequences. Why do people accept snake oil?

Simple insights into source code bases

april 8th, 2016

Does the code base scream the domain?

I was wondering whether or not it would be possible to use parts of PageRank to gain insights into a code base. If PageRank works on web pages to ascertain what the contents of the page relates to, then likely a similar way could be constructed for source code.

The simplest thing that could possibly work?
I chose the n-gram approach – unigram to be specific. While bi- and tri-grams are better for text, I’m not so sure for code bases, nevertheless, it could be tested.

The simple process

  • Find all files of a specific language inside the project structure. Likely it would be prudent to examine source and test code independently
  • Remove all forms of newlines
  • Tokenize on non alphanumeric entities
  • Build histogram of these tokens

Removing comments and possibly strings would likely be a good idea, but that would require parsing and not just bash.

find . -name "*.java" -type f | xargs cat | tr -d '\n' | tr -d '\r' | tr -cs '[:alnum:]' '\n' | sort | uniq -c | sort -rn > wordfreq.txt
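
The same histogram can be sketched directly in R. This is a minimal illustration on a made-up snippet rather than a real code base, and it differs from the shell pipeline in one respect: newlines are treated as separators instead of being deleted before tokenizing.

```r
# A tiny stand-in for the concatenated source files
source_text <- paste(
  "public String getName() { return this.name; }",
  "public void setName(String name) { this.name = name; }",
  sep = "\n")

# Tokenize on runs of non-alphanumeric characters (newlines included)
tokens <- unlist(strsplit(source_text, "[^[:alnum:]]+"))
tokens <- tokens[tokens != ""]   # drop empty tokens caused by leading separators

# Histogram of the tokens, most frequent first
wordfreq <- sort(table(tokens), decreasing = TRUE)
head(wordfreq)
```

On this snippet the token "name" comes out on top with 4 occurrences.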

Looking at gerrit’s word frequency, we get something along these lines:

  • 27560 import
  • 25092 the
  • 21758 com
  • 21385 google
  • 16615 License
  • 14676 public
  • 14544 gerrit
  • 13553 String
  • 12309 final
  • 11823 return
  • 10431 private
  • 10191 new
  • 9809 if
  • 8940 this
  • 8196 0
  • 7665 in
  • 7225 void
  • 7163 under
  • 6809 a
  • 6590 null
  • 6389 server
  • 6234 client
  • 6185 static
  • 6125 for
  • 6024 2
  • 5965 org
  • 5953 to
  • 5384 or
  • 5212 class
  • 4972 may
  • 4963 Override
  • 4934 name
  • 4923 get
  • 4752 distributed
  • 4666 of
  • 4602 java
  • 4492 throws
  • 4392 n
  • 4164 is
  • 3705 e

Reading it “import the com google License public gerrit String final return private new if this 0 in void under a null server client static for 2 org to or class may Override name get distributed of java throws n is e” doesn’t quite make sense. Clearly the “License” header and the package namespace influence the result heavily.

Removing the keywords we get:
“the com google License gerrit 0 in under a server client 2 org to or may name get distributed of n is e”

It is not as if the source code really screams what gerrit is about. From Chinese Whisper reconstruction I get something about a “client server with name distribution” – not quite the “Gerrit provides web based code review and repository management for the Git version control system” tagline.

The frequency count drops rapidly – let’s pull the data into R to see if there are some patterns.

gerrit <- read.table("wordfreq.txt", header=F)
f <- as.data.frame(table(gerrit$V1)) # how many tokens share each frequency
f$Var1 <- as.numeric(as.character(f$Var1))
plot(log(f), type="l", xlab="log(frequency)", ylab="log(count)", main="Gerrit source code tokens\nlog-log plot")

gerrit loglog plot

This seems to be a power law distribution, but with a lot of outliers above 7 (corresponding to around 1100) – and with an anomaly just short of 8 (corresponding to 2374 to be exact). This is quite likely the template License.

gerrit[gerrit$V1 == 2374,]
V1 V2
101 2374 Unless
102 2374 Licensed
103 2374 LICENSE
104 2374 law
105 2374 governing
106 2374 express
108 2374 compliance
109 2374 BASIS
110 2374 agreed

Plotting the more conformant data

k <- f[f$Var1 < 1100,]
plot(log(k), type="l", xlab="log(frequency)", ylab="log(count)", main="Gerrit source code tokens\nfrequency < 1100\nlog-log plot")
abline(glm(log(k$Freq) ~ log(k$Var1)), col="red")

glm(log(k$Freq ) ~ log(k$Var1))

Call: glm(formula = log(k$Freq) ~ log(k$Var1))

(Intercept) log(k$Var1)
8.554 -1.372

Degrees of Freedom: 495 Total (i.e. Null); 494 Residual
Null Deviance: 1299
Residual Deviance: 152.9 AIC: 829.7

[1] 510.1444

gerrit loglog < 1100

So, we should likely look at values with the frequency in this area to get a better suggestion for what the code base is used for.

gerrit[gerrit$V1 < 600 & gerrit$V1 >= 500,]
V1 V2
240 598 code
241 597 url
242 592 rw
243 590 values
244 589 label
245 581 plugin
246 580 v
247 563 ctx
248 561 Result
249 558 Util
250 550 UUID
251 544 2013
252 541 bind
253 538 cb
254 533 IdentifiedUser
255 532 err
256 531 u
257 530 o
258 528 substring
259 526 master
260 525 Repository
261 522 CurrentUser
262 522 as
263 521 res
264 520 dom
265 517 assertEquals
266 516 token
267 508 start
269 508 interface
270 507 lang
271 506 servlet
272 500 Object

This has a better match with the core of the project, though we still see comment debris, e.g. “2013”

Gerrit can be found at – I was looking at the codebase from 02bafe0f4c51aa24b2b05d4d1309ecfc828762c0 (January 20th, 2016)

Independence check

With the previous information – and the notion of a vector representation – I thought about the possibility to check for independence.

If two vectors are independent, then they should be orthogonal. If two code bases are independent, then they should be orthogonal in their domain vectors. To test this, we can try to plot the words used in the code bases. Naturally, we would need to strip away the language keywords, but as we will see, this is not quite as necessary as expected. We can even gain other insights by looking at the keyword uses.

So, as above, I created word frequency files for two JavaScript projects.

p1 <- read.table("p1-wordfreq.txt", header=F)
p2 <- read.table("p2-wordfreq.txt", header=F)

We don’t really want the exact count, so we pick the relative frequencies

p1$V1 <- p1$V1/max(p1$V1)
p2$V1 <- p2$V1/max(p2$V1)

Now, we only want to look at the tokens they have in common to see whether or not they are orthogonal – the tokens not common are already orthogonal.

common <- merge(p1, p2, by = "V2")

plot(common$V1.x, common$V1.y, xlab="p1", ylab="p2", main="Comparing p1 and p2")

comparing JavaScript projects p1 and p2
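
The plot gives a visual impression. As a complement, and not something done in the original analysis, the orthogonality could also be put into a single number: the cosine of the angle between the two frequency vectors. A minimal sketch with made-up tokens and frequencies, using the same V1/V2 column layout:

```r
# Made-up relative token frequencies for two projects:
# V1 is the relative frequency, V2 the token
p1 <- data.frame(V1 = c(1.0, 0.5, 0.2), V2 = c("function", "invoice", "err"))
p2 <- data.frame(V1 = c(0.8, 0.6, 0.4), V2 = c("function", "sprite", "err"))

# Full outer join; tokens missing from one project get frequency 0
both <- merge(p1, p2, by = "V2", all = TRUE)
both[is.na(both)] <- 0

# Cosine of the angle between the two frequency vectors:
# 0 means orthogonal (no shared vocabulary), 1 means the same direction
cosine <- sum(both$V1.x * both$V1.y) /
  (sqrt(sum(both$V1.x^2)) * sqrt(sum(both$V1.y^2)))
cosine
```

Tokens that only occur in one project contribute nothing to the numerator, which is the same observation as above: they are already orthogonal.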

Next, we want to identify the JavaScript keywords.

js <- read.table("JavaScriptKeywords.txt", header=F)
names(js) <- "V2" # js is a single column; name it V2 so we can merge on the token column
js2 <- merge(js, common, by="V2")
points(js2$V1.x, js2$V1.y, pch=19, col="red")

# mark the 20% in both directions, thus we get a Pareto segmentation
abline(h=.2, col=”blue”)
abline(v=.2, col=”blue”)

high <- common[common$V1.x > .2 & common$V1.y > .2,]

The most frequently used non-keywords:

high[-(match(intersect(high$V2, js2$V2), high$V2)),]
V2 V1.x V1.y
34 data 0.4170306 0.2444444
49 err 0.5545852 0.4555556
50 error 0.3013100 0.8000000
115 censored 0.6812227 0.6888889
131 settings 0.2052402 0.2111111

The second to last in this list has been censored, but it does provide an indication that the projects aren’t quite independent. The error, err, and data are so common and nondescript that it is somewhat okay to find them in this area, though I’d rather have fewer callback functions and better names in general.

The most frequently used keywords:

high[(match(intersect(high$V2, js2$V2), high$V2)),]
V2 V1.x V1.y
47 else 0.3449782 0.4444444
65 function 1.0000000 0.8000000
72 if 0.4716157 0.5000000
154 var 1.0000000 0.6444444

Again this can be explained by a lot of callbacks, which are often of the form:

function(err, data) {
  if (err) { /* handle the error */ } else { /* use the data */ }
}

Another explanation could be lots of anonymous functions, though those are usually callbacks too.


Removing comments and imports should provide for a better picture of the code base. Even so, it seems to not exactly scream the domain or architecture.

Bi-grams could be another improvement.

Independence check of supposedly independent projects may reveal that they aren’t or that the code is skewed towards an unwanted design.

It is far from perfect, but as always it brings a different way of looking at the code base, and it is relatively quick to do.

Comparing large code bases somewhat defeats the purpose as regression to the mean tells nothing much of interest. Taking Gerrit as an example, then the most used token is “import”, which is used 27560 times and as we saw above, the interesting parts reveal themselves around 1100 uses, which is less than 4%.

comparing gerrit to dotCMS
comparing gerrit to dotCMS (loglog)

Comparing Gerrit and an old repo I had of dotCMS, we find that the most used keywords including entities in java.lang are:


Which could indicate a lot of String constants and conditional logic (with return statements instead of else clauses), and with a possibility of Primitive Obsession – well, the web does call for a lot of String use.

How many bugs are left?

january 7th, 2016

After reading How many bugs are left? I was intrigued by the use of the Lincoln Index to estimate the number of bugs residing in a solution. But after reading the blog post I was a bit baffled that the conclusion didn’t pick up on what was really reflected in the data.

In the blog post there are 2 examples concerning 2 QAs, A and B, finding 20 and 30 bugs respectively in each case. The real difference is the overlap.

In the first example there is only 1 bug in the overlap, and the Lincoln Index is then 20*30/1 = 600 – in total 49 bugs found

In the second example there are 18 bugs in the overlap, making the Lincoln Index 20*30/18 = 33.3 – in total 32 bugs found
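
The arithmetic for both examples can be reproduced with a small sketch in R. The function names are made up for illustration; the formula is the one used above.

```r
# Lincoln Index: estimated total = (found by A) * (found by B) / (found by both)
lincoln_index <- function(a, b, overlap) {
  if (overlap == 0) return(Inf)   # the estimate degenerates without an overlap
  a * b / overlap
}

# Distinct bugs actually found by the two QAs together
found_total <- function(a, b, overlap) a + b - overlap

lincoln_index(20, 30, 1)    # 600
found_total(20, 30, 1)      # 49
lincoln_index(20, 30, 18)   # 33.33...
found_total(20, 30, 18)     # 32
```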

The probability that a QA finds a bug is then:

            QA A             QA B             Total
Example 1   20/600  = .03    30/600  = .05    49/600  = .08
Example 2   20/33.3 = .6     30/33.3 = .9     32/33.3 = .96

While this is an example of the method it tells me something not mentioned in the blog post: The bugs in Example 2 must have been extremely obvious making it questionable whether the trials are independent.

Another thing, while it may seem like overkill to have 2 QAs in the 2nd example, it seems too little to be worth the effort in the first example, but we really should have 3 QAs in both cases.

There is nothing indicating the size of the example solutions – which is part why the example is good, and part why I was a bit skeptical at first. There is no right answer for the examples, but if the Lincoln Index is to be considered a sufficient estimate of the number of bugs in the systems, then what should we do?

Starting with Example 2 we have found almost all the bugs, and hopefully the fixes will not introduce new ones. There is a good probability that the remaining bugs will be fixed when the code base is fixed – after all 33.3 bugs in a code base is not a lot (depending on the size of the code base itself naturally).

Examining Example 1 we have a different problem. We have discovered approximately 1/12th of the bugs, and we have an estimated 600 bugs in the system. It would seem that we are in dire need for some sort of assistance. Possibly rework of the system as well.

Code base size estimates

Yes – I know – “Measuring software productivity by lines of code is like measuring progress on an airplane by how much it weighs” (Bill Gates), but the bugs have to come from somewhere, and if we are somewhat consistent in our styles, the number of lines does serve as a quantifiable metric.

According to Dan Mayer (bugs per line of code ratio) referencing Steve McConnell, then we have different ratios of bugs per 1000 lines of code (bugs/kloc): 3, 10-20, 15-50 bugs/kloc

Apart from the obvious 600/33.3 = 18 factor in number of bugs between the examples, which may be as simple as 18 times as much code, there are alternative explanations for the number.

Example 1
600 bugs at  3 bugs/kloc = 200,000 lines
600 bugs at 50 bugs/kloc =  12,000 lines

Example 2
33.3 bugs at  3 bugs/kloc =  11,111 lines
33.3 bugs at 50 bugs/kloc =     666 lines
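
The four size estimates can be recomputed with a short sketch in R. The helper name implied_lines is made up, and Example 2 is taken as 100/3 bugs, which is what the figures suggest.

```r
# Lines of code implied by a bug estimate and a defect density in bugs/kloc
implied_lines <- function(bugs, bugs_per_kloc) bugs / bugs_per_kloc * 1000

# All four combinations of density and bug estimate
estimates <- expand.grid(bugs_per_kloc = c(3, 50), bugs = c(600, 100 / 3))

# Truncated to whole lines, as in the figures above
estimates$lines <- floor(implied_lines(estimates$bugs, estimates$bugs_per_kloc))
estimates
```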

That is, if Example 1 is 200 kloc with 3 bugs/kloc, and Example 2 is 666 lines with 50 bugs/kloc, then Example 1 is 300 times the lines, but only 18 times the bugs – in which case it is a rather small amount of bugs even at 600. Example 2 though should really clean up the mess.

If it is the opposite, that is Example 1 12,000 lines at 50 bugs/kloc, and Example 2 11,111 lines at 3 bugs/kloc, then the number of lines is almost the same, yet the number of bugs is 18 times higher. In this case Example 1 is truly in dire need of some help.

Alternative Analysis

These speculations are really afterthoughts on the blog’s content. My real beef was with the Lincoln Index itself – it degenerates at a 0 overlap, basically saying that if two observers examine the same area, they must find some of the same elements. That is a natural assumption if observers are stringent and actually look at the same things. Seeing some of the Escape Room issues where contestants overlook the obvious it would seem that for a software solution there would be several opportunities for QAs to overlook something the developers already overlooked.

There are some suggestions on improving the Lincoln Index in case the overlap is less than 10, e.g. Bailey (1952) suggesting N = A*(B+1)/(C+1), which would lead Example 1 to 310 bugs instead of the 600. My idea was to turn to the German Tank Problem and estimate the number of bugs from the Bayesian credibility score.
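
A quick check of Bailey’s variant, as a sketch in R with a made-up function name:

```r
# Bailey's (1952) small-overlap variant as quoted above: N = A * (B + 1) / (C + 1)
bailey_estimate <- function(a, b, overlap) a * (b + 1) / (overlap + 1)

bailey_estimate(20, 30, 1)    # 310
bailey_estimate(20, 30, 18)   # about 32.6
bailey_estimate(20, 30, 0)    # 620, still finite with no overlap
```

Unlike the plain Lincoln Index, this estimate stays finite when the overlap is 0.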

By applying our own serial number system to the bugs (tracking ID) we aren’t really playing into the correct scenario, but bear with me. The maximum serial number we see is thus the total number of unique bugs found. We only have 2 observations – one from each QA.

Only having 2 observations means that the mean, µ, is infinite. We should have at least 4 observations to come up with both a mean and a standard deviation.

We can still try to make a credible guess. Given at least 2 observations, the credibility that the number of bugs is equal to n, is:

0 if n < m
(k-1)/k * C(m-1,k-1)/C(n,k) if n >= m

m = number of distinct bugs found in the k observations

As k is 2 in our case, the formula simplifies into:

0 if n < m
(m-1)/(n*(n-1)) if n >= m

The credibility that we have more than n bugs is:

1 if n < m
C(m-1, k-1)/C(n, k-1) if n >= m

Again with k = 2 this simplifies into:

1 if n < m
(m-1)/n if n >= m

This latter formula means that if we want to be 95% confident that there are no more than n bugs, we accept a 5% risk that N > n: .05 = (m-1)/n <=> n = (m-1)/0.05 = 20*(m-1)
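
The credibility formulas can be sketched in R as below. The function names are made up. Note that the exact-n credibility uses C(n, k) in the denominator, which is what makes it reduce to (m-1)/(n*(n-1)) for k = 2.

```r
# Credibility that the total is exactly n, given k observations with maximum m
credibility <- function(n, m, k = 2) {
  ifelse(n < m, 0, (k - 1) / k * choose(m - 1, k - 1) / choose(n, k))
}

# Credibility that the total exceeds n
credibility_more <- function(n, m, k = 2) {
  ifelse(n < m, 1, choose(m - 1, k - 1) / choose(n, k - 1))
}

# n at which the credibility of more than n bugs drops to `risk`, for k = 2
upper_bound <- function(m, risk) (m - 1) / risk

upper_bound(49, 0.05)   # 960
upper_bound(32, 0.05)   # 620
```

As a sanity check, summing credibility(n, m) from n = m up to some large N gives 1 - credibility_more(N, m), so the two formulas agree with each other.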

Running the examples under the German Tank Problem setting we get:

Example 1: A = 20, B = 30, C = 1, m = A+B-C = 49

Number of bugs at 95% confidence: 20*(49-1) = 960

pA    = 20/960 = 0.02

pB    = 30/960 = 0.03

total = 49/960 = 0.05

Example 2: A = 20, B = 30, C = 18, m = A+B-C = 32

Number of bugs at 95% confidence: 20*(32-1) = 620

pA    = 20/620 = 0.03

pB    = 30/620 = 0.05

total = 32/620 = 0.05

We see that we have a lot more bugs than our previous estimates, but the QAs’ probabilities of finding bugs are almost the same (at or below 5%) for both examples, and we have found an estimated 5% of the total amount of bugs.


Looking at the accumulated credibility score, we can see that it grows rapidly, then slows down, so perhaps an 80% confidence is sufficient. In this case .2 = (m-1)/n <=> n = 5*(m-1), which is a quarter of the 95% confidence numbers.

Example 1: A = 20, B = 30, C = 1, m = A+B-C = 49

Number of bugs at 80% confidence: 5*(49-1) = 240

pA    = 20/240 = 0.08

pB    = 30/240 = 0.13

total = 49/240 = 0.20

Example 2: A = 20, B = 30, C = 18, m = A+B-C = 32

Number of bugs at 80% confidence: 5*(32-1) = 155

pA    = 20/155 = 0.13

pB    = 30/155 = 0.19

total = 32/155 = 0.21

This is certainly better for Example 1, both compared to the 95% confidence and compared to the Lincoln Index – even the improved estimate.


I didn’t know about the Lincoln Index, so I learned something new today – that is always good. The original application to estimate the number of bugs in total seems good, at least better than disregarding data from the trenches.

John D. Cook suggests calibrating through experiments. This blog post has been a thought experiment on some of the deliveries presented by the data and an unrealistic application of the German Tank Problem – the odds of getting the “tanks” in sequence diminishes quickly, thus improvements can be applied to the m estimate.

Cutting the confidence level from 95% to 80% may seem drastic – and it is – as it cuts 75% off of the number of expected bugs, but for thought experiments it may be good enough.

QAs are valuable, and there is value in having several (at least 2, but 4 is better) to test a product.


Learning Java, Programming, TDD, Clean Code – which order?

november 26th, 2015

Recently, Marcus Biel asked me to review and comment on his “Free Java Clean Code Beginner Course”.

I’m quite flattered that anyone would ask my opinion, so naturally I gave him some feedback. I think the concept of Marcus’ project is valuable, especially considering the large community (9-10 million) of Java programmers, and the number of would be programmers, and the current state of the quality we – as a profession – provide. Just take a look at some of the questions asked on LinkedIn’s Java Developers group.

One of the key hurdles, I think, is that Marcus wants it all: Teach Java, Programming, OOP, TDD, and Clean Code. While these are all good things to know, I find it quite a lot all at once. That said, what should be left out in the beginning? How should you structure learning programming? The easiest way is to use imperative style – but that is hardly the “right” way for Java. Starting out with too much OOP will also lead to highly coupled classes.

If you simply teach Java and programming, you’re bound to fail at good OOP and Clean Code practices, because Java sometimes forces you to do things in a bad way.

TDD is having its own troubles – as debated by DHH, Fowler and Beck in “Is TDD Dead?”

Rich Hickey compares TDD to “driving a car around banging into the guard rails”, and Donald Knuth says something along the lines of tests are good for probing and figuring out an otherwise unknown domain. This blog has links to both.

Ward Cunningham created Fit, which Uncle Bob built FitNesse on top of, so I believe that they are quite happy with repeatable testing. Uncle Bob at least writes about it in Clean Code.

Edsger Dijkstra said: “Program testing can be used to show the presence of bugs, but never to show their absence!” – but then he was likely into proving correctness using Hoare triples – the pre and post condition proofs.

In “Working Effectively with Legacy Code”, Michael Feathers says that legacy code is code without tests, and that tests make it safe to refactor code.

I really like Hickey’s notion. The tests only show the code in the exercises the tester had in mind. If the tester is the developer, then it is likely a proof of concept rather than an attempt to disprove working software.

I also really like Feathers’ concept – it’s really nice to have exercises for a section of code making sure that a change in the section will not misbehave, when swapped out with an equivalent. At least it is nice to have tests for the modules you depend upon, to be able to check that an upgrade does not cause any bad things. Basically, we use what Dijkstra said – making sure that we are not introducing previously known bugs again.

Knowing programmers, we’re likely neither to be modest nor to follow the scientific method: Observe, Think, Hypothesize, Create testable predictions, Test, Refine, General theory, nor Deming’s circle: Observe, Plan, Do, Check, Act. It is often more: Hack, Observe, Repeat – using a Waterfall approach it is sometimes more like: hack, hack, hack, observe, wtf!, repeat.

Dijkstra, Hickey, and Knuth seem to have their own disciplined framework in place, and TDD is a formal way of trying to introduce discipline to the masses, though it is often misunderstood, and due to our bias for confirming our beliefs (“Don’t believe everything you think” by Thomas Kida) we make poor tests more often than good tests. Sometimes we even make tests just to get a high test coverage, because someone, somewhere heard that this was a good metric.

Can you learn Clean Code without knowing programming? I don’t think so, and quite likely Clean Code should be left until after Patterns – which isn’t currently part of Marcus’ course.

Should you learn Clean Code before creating your own ugly mess? Would you know the difference if taught from day one?

How to refactor a refactored switch/case statement

november 26th, 2015

When good intentions go slightly wrong

For some odd reason I picked up a link to DZONE on “How to refactor a switch/case statement” – the link is now defunct, and I’m not sure why. Anyway, Gianluca Tomasino, the original author, still has the article on his blog.

So I read through this – I know I dislike switch/case jump tables, though not as much as I hate if-else-if – or as I like to reminisce about Sid Meier’s Pirates! and call it the “evil El Sif”.

Gianluca is quite right that one option would be to use the Strategy pattern, but then goes on to show how not to implement this pattern by adding a method for each of the enums, then tying a specific implementation inside the enum, ending up with less readable and less maintainable code.

The enum part is right – eliminate the magic strings, define the different types.

The strategy interface definition is wrong – the name “HasStrategies” does not convey any useful information. The 2 methods bind concrete enums to an interface; 1 abstract method, e.g. ‘execute’, should be sufficient. Then the specific strategy is pushed inside the enums themselves. Enums should not care about whichever strategies you have for them, thus that sort of coupling is not wanted.

In the Decider class, we now define the specific strategy to use, which sort of defeats the purpose of extracting the code from a switch – the specific class will now have 2 reasons for change:

  1. Change to the strategy
  2. Change to the enum definitions

“A class should have one, and only one reason to change.” That is the intent of the Single Responsibility Principle.

If we add another value to the enums, then we need to change the Decider implementation as well, which is contrary to the Open Closed Principle. From the looks of it, we have to change the enums (well, that’s a given), the strategy, and the decider implementation.

What I’d recommend:

Define the strategy interface using only one method

interface Strategy {
    String execute();
}

Simply define the values

enum Values {
    PIPPO, PLUTO
}

Implement the strategies for each of the values, and add them to an EnumMap

import java.util.EnumMap;

class ValueStrategies {
    final static EnumMap<Values, Strategy> MAP =
             new EnumMap<Values, Strategy>(Values.class);
    static {
        MAP.put(Values.PIPPO, new Strategy() {
            public String execute() {
                return "methodA";
            }
        });
        MAP.put(Values.PLUTO, new Strategy() {
            public String execute() {
                return "methodB";
            }
        });
    }

    static Strategy get(Values value) {
        return MAP.get(value);
    }
}

Implement the decider using these elements:

public class AltDecider implements Decider {

    public String call(String which) {
        Values value = Values.valueOf(which.toUpperCase());
        return ValueStrategies.get(value).execute();
    }
}

Well, the mapping from a raw String to the enum should not take place inside the method; the Decider interface should be modified to fix such hacks. If the String, which, is null, a NullPointerException is thrown from the Values conversion, and if it does not represent one of the Values, an IllegalArgumentException is thrown.
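One way to move that conversion out of the decider is to do it once, at the boundary. This is only a minimal sketch: the helper ValuesParser is a hypothetical name of mine, not something from Gianluca’s article, and it assumes the same two enum values as above.

```java
import java.util.Locale;
import java.util.Optional;

// Same two values as in the example above.
enum Values { PIPPO, PLUTO }

// Hypothetical helper (not in the original article): converts raw input
// into a Values constant, so callers never see the conversion exceptions.
final class ValuesParser {
    private ValuesParser() {}

    static Optional<Values> parse(String raw) {
        if (raw == null) {
            return Optional.empty();   // would otherwise be a NullPointerException
        }
        try {
            // Locale.ROOT keeps the upper-casing independent of the default locale.
            return Optional.of(Values.valueOf(raw.toUpperCase(Locale.ROOT)));
        } catch (IllegalArgumentException e) {
            return Optional.empty();   // no constant with that name
        }
    }
}
```

A Decider could then accept a Values directly, and the caller decides what an empty result means.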

The names are still not meaningful.

With this solution a new enum value will require a change to Values and an implementation of its strategy inside ValueStrategies.

If re-use of the strategy implementations were of concern, then naturally they should be implemented in their own classes and not as anonymous classes inside the map.
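A rough sketch of that variant follows. The class names MethodAStrategy and MethodBStrategy are made up for illustration; the rest mirrors the code above.

```java
import java.util.EnumMap;
import java.util.Map;

enum Values { PIPPO, PLUTO }

interface Strategy {
    String execute();
}

// Named, reusable implementations (hypothetical names).
final class MethodAStrategy implements Strategy {
    @Override
    public String execute() {
        return "methodA";
    }
}

final class MethodBStrategy implements Strategy {
    @Override
    public String execute() {
        return "methodB";
    }
}

final class ValueStrategies {
    private static final Map<Values, Strategy> MAP = new EnumMap<>(Values.class);

    static {
        // The map only wires values to strategies; it no longer contains logic.
        MAP.put(Values.PIPPO, new MethodAStrategy());
        MAP.put(Values.PLUTO, new MethodBStrategy());
    }

    private ValueStrategies() {}

    static Strategy get(Values value) {
        return MAP.get(value);
    }
}
```

Since Strategy has a single abstract method, a Java 8 lambda would also do where a named class feels like too much.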

9 bad programming habits we secretly love

october 26th, 2015

Reading the article “9 bad programming habits we secretly love”, I’m appalled that apparently this is considered the norm.

While I have likely used all of these bad programming habits at one time or another, I’m pretty sure that these are mostly due to the fact that the programmer is a poor one. Thus writing the article and hoping to get cheered on, we’re apparently worshiping the mediocre or even poor workmanship.

Depending upon where I reside in the hierarchy you should then either follow my advice or that of the article.

The habits:

No.1: Using goto

Using line numbers, you were bound to run out of valid line identifiers, thus you had to insert one line in essence telling “the code is to be continued on line xxx”, which made reading the code in its entirety really difficult.

The article states that a goto in a case statement will produce something that’s simpler to understand than a more properly structured list of cascading if-then-else blocks. Well, swapping one bad idea for another really isn’t the way to argue. Chain of Responsibility would be the right abstraction in that case. The part that sometimes needs a goto is then a different method called by some of these handlers.
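To make that concrete, here is a minimal Chain of Responsibility sketch in Java 8. All names (Handler, HandlerChain, ChainDemo, shared) are hypothetical and the integer request is just a stand-in; the point is that the step several cases would jump to becomes an ordinary method that some handlers call.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Optional;

// Hypothetical handler: answers only if it can deal with the request.
interface Handler {
    Optional<String> handle(int request);
}

final class HandlerChain {
    private final List<Handler> handlers;

    HandlerChain(Handler... handlers) {
        this.handlers = Arrays.asList(handlers);
    }

    // Asks each handler in order; the first one that answers wins.
    // Streams are lazy, so later handlers are not invoked after a match.
    String dispatch(int request) {
        return handlers.stream()
                .map(handler -> handler.handle(request))
                .filter(Optional::isPresent)
                .map(Optional::get)
                .findFirst()
                .orElse("unhandled");
    }
}

final class ChainDemo {
    private ChainDemo() {}

    // The shared step that a goto would have jumped to.
    static String shared(String prefix) {
        return prefix + " then shared";
    }

    static HandlerChain build() {
        return new HandlerChain(
                r -> r < 0 ? Optional.of("negative") : Optional.empty(),
                r -> r == 0 ? Optional.of(shared("zero")) : Optional.empty(),
                r -> r < 10 ? Optional.of(shared("small")) : Optional.empty());
    }
}
```

With this chain, ChainDemo.build().dispatch(5) yields "small then shared", and adding a case means adding a handler rather than another branch.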

No. 2: Eschewing documentation

Documentation should state evergreen information on the entities. The intent of a class, function, field. In some rare cases it should provide sufficient insights as to why a specific approach is used over another.

The code may be changing – and you’d likely then be violating the Open Closed Principle, but that’s another story – but the intent of the code rarely changes at the same pace as the code itself.

The function names suggested, insertReservation and cancelReservation, are really poor names. Quite likely there would be an argument to these functions in the form of a reservation object, and you would end up with code such as:

insertReservation(Reservation reservation)
cancelReservation(Reservation reservation)

Which – when read out loud – really is stuttering and horrible. I prefer:

insert(Reservation reservation)
cancel(Reservation reservation)

No. 3: Jamming too much code on one line

Readability is at a premium, why would anyone write long lines of code? I know Java is basically a one dimensional source code language – hence the need for semicolons between statements. Why you cannot have a line wrapping string is then a bit odd, but that’s a different story.

Yes, minified JavaScript loads faster, but leave minifying to minifiers.

If you need to put things in a grouped environment, then either use functions or separate within functions with additional blank lines.

The code is not getting longer – well, perhaps line-wise, but not really byte-wise or at least code-wise. The readability on the other hand goes up.

No. 4: Not declaring types

Well, that really depends upon your programming language. In a type safe language it does give you insight, when reading the code, into what the writer likely had in mind. You know that a + b is supposed to be string concatenation and not an arithmetic sum if one of the arguments is a string. You know that 1/2 is integer division and will result in 0 and not .5.
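In Java, for instance, the declared types alone tell the reader which operation is meant. A tiny sketch (TypesDemo and its method names are only for illustration):

```java
final class TypesDemo {
    private TypesDemo() {}

    // One operand is a String, so + means concatenation.
    static String concat(String a, int b) {
        return a + b;
    }

    // Both operands are ints, so + means arithmetic addition.
    static int sum(int a, int b) {
        return a + b;
    }

    // Integer division truncates towards zero.
    static int intHalf() {
        return 1 / 2;
    }

    // A double operand makes it floating point division.
    static double doubleHalf() {
        return 1.0 / 2;
    }
}
```

Here concat("1", 2) gives "12" while sum(1, 2) gives 3, and intHalf() gives 0 while doubleHalf() gives 0.5.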

No. 5: Yo-yo code

With the web a more frequent part of any development, there is a need for string to something else conversions, and sometimes back to strings again. With JSON we have basic numbers, but no dates or timestamps, thus a string <-> x conversion is needed.
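As an example of such an unavoidable round trip, a timestamp can travel through JSON as an ISO-8601 string. This sketch assumes java.time (Java 8), and TimestampCodec is a hypothetical name:

```java
import java.time.Instant;

// Hypothetical codec: a timestamp goes out as a string and comes back as an Instant.
final class TimestampCodec {
    private TimestampCodec() {}

    // Instant.toString() produces ISO-8601 in UTC, e.g. 2015-10-26T12:00:00Z
    static String toJson(Instant instant) {
        return instant.toString();
    }

    // Instant.parse() reads the same format back.
    static Instant fromJson(String text) {
        return Instant.parse(text);
    }
}
```

The yo-yo is then confined to one place instead of being sprinkled throughout the code.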

No. 6: Writing your own data structures

Usually you really shouldn’t, not even for an anticipated performance improvement. But then – who knows – maybe you’re writing the next great data structure to be used for decades.

No. 7: Breaking out of loops in the middle

The reason for not breaking or returning at several different places is code readability – and thus maintainability. The odd thing is that loop breaks are often used for “find first”. Java – being a bit slow on this – only caters for it through the Stream API introduced in Java 8, whereas Scala has find(predicate) doing exactly what is needed.
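For what it’s worth, a “find first” without a loop break can be sketched with Java 8 streams. Finder is a hypothetical helper name; filter and findFirst are the standard Stream methods:

```java
import java.util.List;
import java.util.Optional;
import java.util.function.Predicate;

// Hypothetical helper: the Java 8 counterpart of Scala's find(predicate).
final class Finder {
    private Finder() {}

    static <T> Optional<T> findFirst(List<T> items, Predicate<? super T> predicate) {
        // The stream is lazy, so it stops at the first element that matches.
        return items.stream().filter(predicate).findFirst();
    }
}
```

The Optional result forces the caller to deal with “nothing found” instead of checking a flag after the loop.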

No. 8: Using short variable names (but i, x, and and make sense)

Definitely! Working with coordinates, math, physics you’d be less confusing when using the nomenclature of those domains. Using a for an array and l for a list seems to be counter intuitive given habit No.4 “Not declaring types”. To be honest, I don’t care if it’s an array or a list – I care what it is: List of some sort of entities: Books, users, …

No. 9: Redefining operators and functions

This is only funny until you have to debug the 2nd and 3rd redefinition. Use a mapping between whatever needs to be hacked using inverse logic and the sane world.


If you deliberately use any of the bad programming habits – with the exception of No.5, which has a few valid excuses – then my take is that you are a bad programmer. Luckily there are ways to improve – start by not doing these bad things. Follow up by not taking bad advice (mine included).