Archive for the ‘Software development’ Category

The architect’s grievance with MVC

Tuesday, July 19th, 2016

MVC is a separation of concerns pattern in which you have a Model, a View, and a Controller for a given entity. The Model should contain the data and methods for the entity, while the view – or views – are responsible for the visual representation of the model, e.g. a pie chart or a bar chart of the data. The controller is responsible for providing the model with command requests.

Often, when developers are trying to follow the MVC pattern they follow the pattern as implemented by Rails; all the models go into the app/models directory, all the views reside in app/views, and all the controllers will be found inside the app/controllers directory.

This is comparable to designing a house and having a special room for all faucets, power outlets, and drains, and another room for all the handles and switches.

The faucet you would usually find in the kitchen will now be labelled “kitchen” but reside in the faucet room, and will likely sit next to the faucet labelled “bathroom”.

You could run a hose from the faucet to the kitchen, but that would only solve part of the trouble. The handle for turning the water on and off resides in the controller room, where you have the “kitchen faucet” controller, and next to it you may find the power on/off switches for the oven.

This construct is quite easy for the installers to set up; the software equivalent is easy on the framework.

But we are not building houses to please the work crew; we build them for the ease of living in them. We should likewise focus on the user experience when we write code.

What we achieve with this layout is grouping by technical role rather than by related functionality – the opposite of the high cohesion within modules that Larry Constantine advocated in 1968. Teasing the application apart for reuse becomes much more difficult; we cannot easily swap out one kitchen for a different one.

The better structure would be to have the strongly related entities in the same place, i.e. instead of:

models/
  kitchen
  bathroom

views/
  kitchen
  bathroom

controllers/
  kitchen
  bathroom

it would make sense to have:

kitchen/
  model
  view
  controller

bathroom/
  model
  view
  controller

At least this would make it easy to identify the views associated with a specific model, and – if we otherwise keep to a modular discipline – it should make it possible to pull out one entity.
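In Java terms, a hedged sketch of what such a feature-based layout might look like (the package and class names here are purely hypothetical):

// com/example/kitchen/KitchenModel.java
package com.example.kitchen;

public class KitchenModel {
    // data and behaviour for the kitchen
}

// com/example/kitchen/KitchenView.java
package com.example.kitchen;

public class KitchenView {
    // renders the kitchen model, e.g. as a floor plan or a parts list
}

// com/example/kitchen/KitchenController.java
package com.example.kitchen;

public class KitchenController {
    // translates user input into commands on the kitchen model
}

Pulling the kitchen out of the application is then a matter of moving one package, not collecting files from three parallel trees.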

Logic in the controller

Sometimes you run across a project where there is (model) logic in the controller, but that is a bad idea. It should be possible to keep the controller and change the model implementation; my keyboard (a controller) does not have to change because my application changes or the keyboard layout changes. The controller should send events to the provided model and let them be interpreted there.

If you have logic in the controller, then you need to change both the controller and the model when you make a change. That is one more element in the cognitive load, which makes things just a bit more complicated – complication that does not have to exist.
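As a hedged illustration of the keyboard analogy (all names are hypothetical), the controller only relays events and stays unchanged when the model implementation changes:

// The model owns the interpretation of the events.
interface TextModel {
    void characterTyped(char c);
}

// The controller has no logic of its own; swap the model and the controller stays the same.
class KeyboardController {
    private final TextModel model;

    KeyboardController(TextModel model) {
        this.model = model;
    }

    void onKeyPressed(char key) {
        model.characterTyped(key); // relay only, no interpretation here
    }
}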

It seems that, through our tooling, we are building software that is easy on the frameworks and the original constructors, but not good for those who have to maintain or live with the product. That is simply not the right way to be service minded.

Snake oil and “everybody can program”

Wednesday, July 6th, 2016

Everyone can program, but not necessarily code – I fully agree with Quincy Larson: Coding isn’t easy.

I believe that all of us are capable of programming – in the sense of Socrates’ Meno dialogue – that all of us have the ability to describe a set of procedures to apply in a given order.

We’re all capable of writing novels, but very few of those who do will produce a successful one. That is to say: “It is not as easy as it looks or sounds.”

All of us can cook, though we’d be pressed to get a Michelin star.

Not all of us can write these procedures in a programming language, and – of those who can – not all should. And not all software should be written in a procedural/imperative style.

Some will throw together grammatically correct programming language constructs with no regard to the task being solved. I’m not sure that should count as programming. It is true that working software provides value to the user, but with little understanding of the needed solution – in both behaviour and code – there is so much more value to be had by doing it right (and so many more ways to do it wrong).

Had they performed at the same level in the restaurant business, Gordon Ramsay would likely have called it a Kitchen Nightmare. Since no one can see the internals of the code, fewer customers turn away from the business, and the parallel also fails in that we usually eat every day, while we don’t get a new software product served every day.

In the business world, scarcity combined with increased demand means prices go up, leading to more resources being applied. In software development, this leads to people who really shouldn’t program being hired to hack away at the next big thing.

As a society we are not better off having poorly constructed “cathedrals” forced upon us. If, every time we needed to go through a door, we had to jump through hoops, we would be quick to remedy the odd contraption; in the software world there is usually no such way.

The pursuit of infinite savings allows for expenses now, but apparently not for investment in solid work, nor in the hidden value of improving the software.

I am still wondering why there is so little regulation in a field so wide and with such far-reaching consequences. Why do people accept snake oil?

Simple insights into source code bases

Friday, April 8th, 2016

Does the code base scream the domain?

I was wondering whether or not it would be possible to use parts of PageRank (https://en.wikipedia.org/wiki/PageRank) to gain insights into a code base. If PageRank works on web pages to ascertain what the contents of a page relate to, then a similar approach could likely be constructed for source code.

The simplest thing that could possibly work?
I chose the n-gram (https://en.wikipedia.org/wiki/N-gram) approach – unigrams to be specific. While bi- and tri-grams are better for text, I’m not so sure they are for code bases; nevertheless, it could be tested.

The simple process

  • Find all files of a specific language inside the project structure; it would likely be prudent to examine source and test code independently
  • Remove all forms of new-lines
  • Tokenize on non-alphanumeric characters
  • Build a histogram of these tokens

Removing comments and possibly strings would likely be a good idea, but that would require parsing and not just bash.

find . -name "*.java" -type f | xargs cat | tr -d '\n' | tr -d '\r' | tr -cs '[:alnum:]' '\n' | sort | uniq -c | sort -rn > wordfreq.txt
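The same tokenize-and-count step could also be done in a little Java instead of bash – a hedged sketch (with crude comment stripping thrown in; the directory and file extension are assumptions):

import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Arrays;
import java.util.Map;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class TokenHistogram {

    public static void main(String[] args) throws IOException {
        Map<String, Long> histogram;
        try (Stream<Path> paths = Files.walk(Paths.get("."))) {
            histogram = paths
                    .filter(p -> p.toString().endsWith(".java"))
                    .flatMap(TokenHistogram::tokens)
                    .collect(Collectors.groupingBy(t -> t, Collectors.counting()));
        }
        // print "count token" lines, most frequent first, like the bash pipeline above
        histogram.entrySet().stream()
                .sorted(Map.Entry.<String, Long>comparingByValue().reversed())
                .forEach(e -> System.out.println(e.getValue() + " " + e.getKey()));
    }

    private static Stream<String> tokens(Path file) {
        try {
            String source = new String(Files.readAllBytes(file));
            // crude comment stripping; a real parser would do better
            source = source.replaceAll("(?s)/\\*.*?\\*/", " ").replaceAll("//.*", " ");
            return Arrays.stream(source.split("[^A-Za-z0-9]+")).filter(t -> !t.isEmpty());
        } catch (IOException e) {
            return Stream.empty();
        }
    }
}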

Looking at gerrit’s word frequency, we get something along these lines:

  • 27560 import
  • 25092 the
  • 21758 com
  • 21385 google
  • 16615 License
  • 14676 public
  • 14544 gerrit
  • 13553 String
  • 12309 final
  • 11823 return
  • 10431 private
  • 10191 new
  • 9809 if
  • 8940 this
  • 8196 0
  • 7665 in
  • 7225 void
  • 7163 under
  • 6809 a
  • 6590 null
  • 6389 server
  • 6234 client
  • 6185 static
  • 6125 for
  • 6024 2
  • 5965 org
  • 5953 to
  • 5384 or
  • 5212 class
  • 4972 may
  • 4963 Override
  • 4934 name
  • 4923 get
  • 4752 distributed
  • 4666 of
  • 4602 java
  • 4492 throws
  • 4392 n
  • 4164 is
  • 3705 e

Reading it – “import the com google License public gerrit String final return private new if this 0 in void under a null server client static for 2 org to or class may Override name get distributed of java throws n is e” – doesn’t quite make sense. Clearly the “License” boilerplate and the “com.google” namespace influence the result heavily.

Removing the keywords we get:
“the com google License gerrit 0 in under a server client 2 org to or may name get distributed of n is e”

It is not as if the source code really screams what gerrit is about. From a Chinese-whispers style reconstruction I get something about a “client server with name distribution” – not quite the “Gerrit provides web based code review and repository management for the Git version control system” tagline.

The frequency count drops rapidly – let’s pull the data into R to see if there are some patterns.

gerrit <- read.table("wordfreq.txt", header=F)
f <- as.data.frame(table(gerrit$V1))
f$Var1 <- as.numeric(as.character(f$Var1))
plot(log(f), type="l", xlab="log(frequency)", ylab="log(count)", main="Gerrit source code tokens\nlog-log plot")

[Figure: Gerrit source code tokens – log-log plot]

This looks like a power law distribution, but with a lot of outliers above a log-frequency of 7 (corresponding to around 1100) – and with an anomaly just short of 8 (corresponding to 2374 to be exact). This is quite likely the boilerplate License text.

gerrit[gerrit$V1 == 2374,]
V1 V2
101 2374 Unless
102 2374 Licensed
103 2374 LICENSE
104 2374 law
105 2374 governing
106 2374 express
107 2374 CONDITIONS
108 2374 compliance
109 2374 BASIS
110 2374 agreed

Plotting the more conformant data

k <- f[f$Var1 < 1100,]
plot(log(k), type="l", xlab="log(frequency)", ylab="log(count)", main="Gerrit source code tokens\nfrequency < 1100\nlog-log plot")
abline(glm(log(k$Freq) ~ log(k$Var1)), col="red")

glm(log(k$Freq ) ~ log(k$Var1))

Call: glm(formula = log(k$Freq) ~ log(k$Var1))

Coefficients:
(Intercept) log(k$Var1)
8.554 -1.372

Degrees of Freedom: 495 Total (i.e. Null); 494 Residual
Null Deviance: 1299
Residual Deviance: 152.9 AIC: 829.7

exp(8.554/1.372)
[1] 510.1444

[Figure: Gerrit source code tokens with frequency < 1100 – log-log plot]

So, we should likely look at tokens with frequencies in this area (around 500) to get a better suggestion of what the code base is used for.

gerrit[gerrit$V1 < 600 & gerrit$V1 >= 500,]
V1 V2
240 598 code
241 597 url
242 592 rw
243 590 values
244 589 label
245 581 plugin
246 580 v
247 563 ctx
248 561 Result
249 558 Util
250 550 UUID
251 544 2013
252 541 bind
253 538 cb
254 533 IdentifiedUser
255 532 err
256 531 u
257 530 o
258 528 substring
259 526 master
260 525 Repository
261 522 CurrentUser
262 522 as
263 521 res
264 520 dom
265 517 assertEquals
266 516 token
267 508 start
268 508 RESOURCES
269 508 interface
270 507 lang
271 506 servlet
272 500 Object

This is a better match with the core of the project, though we still see comment debris, e.g. “2013”.

Gerrit can be found at https://gerrit.googlesource.com/gerrit/ – I was looking at the codebase from 02bafe0f4c51aa24b2b05d4d1309ecfc828762c0 (January 20th, 2016)

Independence check

With the previous information – and the notion of a vector representation – I thought about the possibility to check for independence.

If two vectors are independent, then they should be orthogonal. If two code bases are independent, then they should be orthogonal in their domain vectors. To test this, we can try to plot the words used in the code bases. Naturally, we would need to strip away the language keywords, but as we will see, this is not quite as necessary as expected. We can even gain other insights by looking at the keyword uses.

So, as above, I created word frequency files for two JavaScript projects.

p1 <- read.table("p1-wordfreq.txt", header=F)
p2 <- read.table("p2-wordfreq.txt", header=F)

We don’t really want the exact count, so we pick the relative frequencies

p1$V1 <- p1$V1/max(p1$V1)
p2$V1 <- p2$V1/max(p2$V1)

Now, we only want to look at the tokens they have in common to see whether or not they are orthogonal – the tokens not common are already orthogonal.

common <- merge(p1, p2, by = "V2")

plot(common$V1.x, common$V1.y, xlab="p1", ylab="p2", main="Comparing p1 and p2")

[Figure: comparing JavaScript projects p1 and p2]
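The plot gives the visual impression; the same idea can also be condensed into a single number. A minimal Java sketch (assuming the frequency files have been loaded into maps of token to relative frequency – the file loading itself is left out):

import java.util.Map;

public class DomainOverlap {

    // Cosine similarity of two token-frequency vectors; only common tokens contribute
    // to the dot product, so independent vocabularies score close to 0.
    static double cosineSimilarity(Map<String, Double> p1, Map<String, Double> p2) {
        double dot = 0;
        for (Map.Entry<String, Double> e : p1.entrySet()) {
            Double other = p2.get(e.getKey());
            if (other != null) {
                dot += e.getValue() * other;
            }
        }
        double norm1 = p1.values().stream().mapToDouble(v -> v * v).sum();
        double norm2 = p2.values().stream().mapToDouble(v -> v * v).sum();
        return dot == 0 ? 0 : dot / Math.sqrt(norm1 * norm2);
    }
}

A score near 0 suggests near-orthogonal (independent) projects, while a score near 1 suggests heavily overlapping vocabularies.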

Next, we want to identify the JavaScript keywords.

js <- read.table("JavaScriptKeywords.txt", header=F)
names(js) <- "V2" # js is a single column; rename it so we can merge on the same column name
js2 <- merge(js, common, by="V2")
points(js2$V1.x, js2$V1.y, pch=19, col="red")

# mark the 20% in both directions, thus we get a Pareto segmentation
abline(h=.2, col="blue")
abline(v=.2, col="blue")

high <- common[common$V1.x > .2 & common$V1.y > .2,]

The most frequently used non-keywords:

high[-(match(intersect(high$V2, js2$V2), high$V2)),]
V2 V1.x V1.y
34 data 0.4170306 0.2444444
49 err 0.5545852 0.4555556
50 error 0.3013100 0.8000000
115 censored 0.6812227 0.6888889
131 settings 0.2052402 0.2111111

The second to last entry in this list has been censored; it does provide an indication that the projects aren’t quite independent. error, err, and data are so common and nondescript that it is somewhat okay to find them in this area, though I’d rather have fewer callback functions and better names in general.

The most frequently used keywords:

high[(match(intersect(high$V2, js2$V2), high$V2)),]
V2 V1.x V1.y
47 else 0.3449782 0.4444444
65 function 1.0000000 0.8000000
72 if 0.4716157 0.5000000
154 var 1.0000000 0.6444444

Again this can be explained by a lot of callbacks, which are often of the form:

function(err, data) {
    if (err) {
    } else {
    }
}

Another explanation could be lots of anonymous functions, though those are usually callbacks anyway.

Conclusion

Removing comments and imports should provide a better picture of the code base. Even so, it does not exactly scream the domain or architecture.

Bi-grams could be another improvement.

Independence check of supposedly independent projects may reveal that they aren’t or that the code is skewed towards an unwanted design.

It is far from perfect, but as always it brings a different way of looking at the code base, and it is relatively quick to do.

Comparing large code bases somewhat defeats the purpose, as regression to the mean tells us little of interest. Taking Gerrit as an example, the most used token is “import”, used 27560 times, while as we saw above the interesting parts reveal themselves around 1100 uses – less than 4% of that.

[Figures: comparing Gerrit to dotCMS – counts and log-log plot]

Comparing Gerrit and an old repository I had of dotCMS, we find that the most used keywords, including entities from java.lang, are:

import
String
public
return
if
new
this
null
private
void
static

This could indicate a lot of String constants and conditional logic (with return statements instead of else clauses), and possibly Primitive Obsession – well, the web does call for a lot of String use.

How many bugs are left?

Thursday, January 7th, 2016

After reading How many bugs are left? I was intrigued by the use of the Lincoln Index to estimate the number of bugs residing in a solution. But after reading the blog post I was a bit baffled that the conclusion didn’t pick up on what was really reflected in the data.

In the blog post there are two examples concerning two QAs, A and B, who find 20 and 30 bugs respectively in each case. The real difference is the overlap.

In the first example there is only 1 bug in the overlap, and the Lincoln Index is then 20*30/1 = 600 – in total 49 bugs found

In the second example there are 18 bugs in the overlap, making the Lincoln Index 20*30/18 = 33.3 – in total 32 bugs found

The probability that a QA finds a bug is then:

            QA A             QA B             Total
Example 1   20/600  = .03    30/600  = .05    49/600  = .08
Example 2   20/33.3 = .60    30/33.3 = .90    32/33.3 = .96

While this is an example of the method, it tells me something not mentioned in the blog post: the bugs in Example 2 must have been extremely obvious, making it questionable whether the trials are independent.

Another thing: while it may seem like overkill to have two QAs in the second example, it seems too little to be worth the effort in the first example – really, we should have 3 QAs in both cases.

There is nothing indicating the size of the example solutions – which is partly why the example is good, and partly why I was a bit skeptical at first. There is no right answer for the examples, but if the Lincoln Index values are to be considered sufficient estimates of the number of bugs in the systems, then what should we do?

Starting with Example 2, we have found almost all the bugs, and hopefully the fixes will not introduce new ones. There is a good probability that the remaining bugs will be fixed when the code base is fixed – after all, 33.3 bugs in a code base is not a lot (depending, naturally, on the size of the code base itself).

Examining Example 1, we have a different problem. We have discovered approximately 1/12th of the bugs, and we have an estimated 600 bugs in the system. It would seem that we are in dire need of some sort of assistance – possibly a rework of the system as well.

Code base size estimates

Yes – I know – “Measuring software productivity by lines of code is like measuring progress on an airplane by how much it weighs” (Bill Gates), but the bugs have to come from somewhere, and, being somewhat consistent in our styles, the number of lines does pose as a quantifiable metric.

According to Dan Mayer (bugs per line of code ratio), referencing Steve McConnell, we have different ratios of bugs per 1000 lines of code (bugs/kloc): 3, 10-20, and 15-50 bugs/kloc.

Apart from the obvious factor of 600/33.3 = 18 in the number of bugs between the examples, which may be as simple as 18 times as much code, there are alternative explanations for the numbers.

Example 1
600 bugs at  3 bugs/kloc = 200,000 lines
600 bugs at 50 bugs/kloc =  12,000 lines

Example 2
33.3 bugs at  3 bugs/kloc = 11,111 lines
33.3 bugs at 50 bugs/kloc =    666 lines

That is, if Example 1 is 200 kloc with 3 bugs/kloc, and Example 2 is 666 lines with 50 bugs/kloc, then Example 1 has 300 times the lines, but only 18 times the bugs – in which case 600 is a rather small number of bugs. Example 2, though, should really clean up its mess.

If it is the opposite – Example 1 at 12,000 lines with 50 bugs/kloc, and Example 2 at 11,111 lines with 3 bugs/kloc – then the number of lines is almost the same, yet the number of bugs is 18 times higher. In this case Example 1 is truly in dire need of some help.

Alternative Analysis

These speculations are really afterthoughts on the blog’s content. My real beef was with the Lincoln Index itself – it degenerates at an overlap of 0, basically saying that if two observers examine the same area, they must find some of the same elements. That is a natural assumption if the observers are diligent and actually look at the same things. Seeing some of the Escape Room situations where contestants overlook the obvious, it would seem that for a software solution there would be several opportunities for the QAs to overlook something the developers already overlooked.

There are suggestions for improving the Lincoln Index when the overlap is less than 10, e.g. Bailey (1952) suggesting N = A*(B+1)/(C+1), which would lead Example 1 to 310 bugs instead of the 600. My idea was instead to turn to the German Tank Problem and estimate the number of bugs from the Bayesian credibility score.

By applying our own serial number system to the bugs (tracking ID) we aren’t really playing into the correct scenario, but bear with me. The maximum serial number we see is thus the total number of unique bugs found. We only have 2 observations – one from each QA.

Only having 2 observations means the mean, µ, is infinite. We need at least 4 observations to come up with both a mean and a standard deviation.

We can still try to make a credible guess. Given at least 2 observations, the credibility that the number of bugs is equal to n, is:

0 if n < m
(k-1)/k * C(m-1, k-1)/C(n, k) if n >= m

m = number of distinct bugs found in the k observations

As k is 2 in our case, the formula simplifies into:

0 if n < m
(m-1)/(n*(n-1)) if n >= m

The credibility that we have more than n bugs is:

1 if n < m
C(m-1, k-1)/C(n, k-1) if n >= m

Again with k = 2 this simplifies into:

1 if n < m
(m-1)/n if n >= m

This latter formula means that if we want to be 95% confident in the number of bugs n, then we accept a 5% risk that N > n: .05 = (m-1)/n <=> n = (m-1)/0.05 = 20*(m-1)

Running the examples under the German Tank Problem setting we get:

Example 1: A = 20, B = 30, C = 1, m = A+B-C = 49

Number of bugs at 95% confidence: 20*(49-1) = 960

pA    = 20/960 = 0.02

pB    = 30/960 = 0.03

total = 49/960 = 0.05

Example 2: A = 20, B = 30, C = 18, m = A+B-C = 32

Number of bugs at 95% confidence: 20*(32-1) = 620

pA    = 20/620 = 0.03

pB    = 30/620 = 0.05

total = 32/620 = 0.05

We see that we have a lot more bugs than in our previous estimates, but the QAs’ probabilities of finding bugs are almost the same (below 5%) for both examples, and we have found an estimated 5% of the total number of bugs.

[Figure: credibility of the total number of bugs]

Looking at the accumulated credibility score, we can see that it grows rapidly and then slows down; perhaps an 80% confidence is sufficient. In that case .2 = (m-1)/n <=> n = 5*(m-1), a quarter of the 95% confidence numbers.

Example 1: A = 20, B = 30, C = 1, m = A+B-C = 49

Number of bugs at 80% confidence: 5*(49-1) = 240

pA    = 20/240 = 0.08

pB    = 30/240 = 0.13

total = 49/240 = 0.20

Example 2: A = 20, B = 30, C = 18, m = A+B-C = 32

Number of bugs at 80% confidence: 5*(32-1) = 155

pA    = 20/155 = 0.13

pB    = 30/155 = 0.19

total = 32/155 = 0.20
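A minimal Java sketch reproducing the numbers above, using the two-observation tail formula P(N > n) = (m-1)/n from earlier:

public class BugEstimates {

    // Lincoln Index: A and B are the bug counts found by the two QAs, C the overlap.
    static double lincolnIndex(int a, int b, int c) {
        return (double) a * b / c;
    }

    // Smallest n such that the credibility of N > n drops below 1 - confidence,
    // i.e. n = (m - 1)/(1 - confidence) for k = 2 observations.
    static double tankBound(int m, double confidence) {
        return (m - 1) / (1.0 - confidence);
    }

    public static void main(String[] args) {
        int[][] examples = { { 20, 30, 1 }, { 20, 30, 18 } };
        for (int[] ex : examples) {
            int a = ex[0], b = ex[1], c = ex[2];
            int m = a + b - c; // distinct bugs found
            System.out.printf("A=%d B=%d C=%d: Lincoln=%.1f, m=%d, n(95%%)=%.0f, n(80%%)=%.0f%n",
                    a, b, c, lincolnIndex(a, b, c), m, tankBound(m, 0.95), tankBound(m, 0.80));
        }
    }
}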

This is certainly better for Example 1, both compared to the 95% confidence numbers and compared to the Lincoln Index – even the improved estimate.

Conclusion

I didn’t know about the Lincoln Index, so I learned something new today – that is always good. The original application, estimating the total number of bugs, seems good – at least better than disregarding data from the trenches.

John D. Cook suggests calibrating through experiments. This blog post has been a thought experiment on some of the insights presented by the data and an unrealistic application of the German Tank Problem – the odds of getting the “tanks” in sequence diminish quickly, so improvements could be applied to the m estimate.

Cutting the confidence level from 95% to 80% may seem drastic – and it is, as it cuts 75% off the number of expected bugs – but for thought experiments it may be good enough.

QAs are valuable, and there is value in having several (at least 2, but 4 is better) to test a product.

Resources:

http://leankit.com/blog/2015/12/how-many-bugs-are-left-the-software-qa-puzzle/
https://en.wikipedia.org/wiki/German_tank_problem
https://en.wikipedia.org/wiki/Lincoln_index
http://c2.com/cgi/wiki?LinesOfCode
http://www.mayerdan.com/ruby/2012/11/11/bugs-per-line-of-code-ratio/

Learning Java, Programming, TDD, Clean Code – which order?

Thursday, November 26th, 2015

Recently, Marcus Biel asked me to review and comment on his “Free Java Clean Code Beginner Course”.

I’m quite flattered that anyone would ask my opinion, so naturally I gave him some feedback. I think the concept of Marcus’ project is valuable, especially considering the large community (9-10 million) of Java programmers, the number of would-be programmers, and the current state of the quality we – as a profession – provide. Just take a look at some of the questions asked in LinkedIn’s Java Developers group.

One of the key hurdles, I think, is that Marcus wants it all: teach Java, programming, OOP, TDD, and Clean Code. While these are all good things to know, I find it quite a lot all at once. That said, what should be left out in the beginning? How should you structure learning to program? The easiest way is to use an imperative style – but that is hardly the “right” style for Java. Starting out with too much OOP will also lead to highly coupled classes.

If you simply teach Java and programming, you’re bound to fall short on good OOP and Clean Code practices, because Java sometimes forces you to do things in a bad way.

TDD is having its own troubles – as debated by DHH, Fowler and Beck in “Is TDD Dead?”

Rich Hickey compares TDD to “driving a car around banging into the guard rails”, and Donald Knuth says something along the lines of tests being good for probing and figuring out an otherwise unknown domain. This blog has links to both.

Ward Cunningham created Fit, which Uncle Bob built FitNesse on top of, so I believe they are quite happy with repeatable testing. Uncle Bob at least writes about it in Clean Code.

Edsger Dijkstra said: “Program testing can be used to show the presence of bugs, but never to show their absence!” – but then he was likely into proving correctness using Hoare triples – the pre- and post-condition proofs.

In “Working Effectively with Legacy Code”, Michael Feathers says that legacy code is code without tests, and that tests make it safe to refactor code.

I really like Hickey’s notion. The tests only exercise the code in the ways the tester had in mind. If the tester is the developer, then it is likely a proof of concept rather than an attempt to disprove working software.

I also really like Feathers’ concept – it’s really nice to have exercises for a section of code, making sure that the section will not misbehave when swapped out with an equivalent. At least it is nice to have tests for the modules you depend upon, to be able to check that an upgrade does not cause any bad things. Basically, we use what Dijkstra said – making sure that we are not introducing previously known bugs again.

Knowing programmers, we’re likely to be neither modest nor follow the scientific method (observe, think, hypothesize, create testable predictions, test, refine, form a general theory), nor Deming’s cycle (observe, plan, do, check, act). It is often more: hack, observe, repeat – and with a waterfall approach it is sometimes more like: hack, hack, hack, observe, wtf!, repeat.

Dijkstra, Hickey, and Knuth seem to have their own disciplined frameworks in place, and TDD is a formal way of trying to introduce discipline to the masses, though it is often misunderstood; due to our bias for confirming our beliefs (“Don’t Believe Everything You Think” by Thomas Kida) we make poor tests more often than good tests. Sometimes we even make tests just to get a high test coverage, because someone, somewhere heard that this was a good metric.

Can you learn Clean Code without knowing programming? I don’t think so, and quite likely Clean Code should then come after Patterns – which isn’t currently part of Marcus’ course.

Should you learn Clean Code before creating your own ugly mess? Would you know the difference if taught from day one?

How to refactor a refactored switch/case statement

Thursday, November 26th, 2015

When good intentions go slightly wrong

For some odd reason I picked up a link to DZone on “How to refactor a switch/case statement” – the link https://dzone.com/articles/how-to-refactor-a-switchcase-statement is now defunct, I’m not sure why. Anyway, Gianluca Tomasino, the original author, still has the article on his blog.

So I read through this – I know I dislike switch/case jump tables, though not as much as I hate if-else-if – or as I like to reminisce Sid Meier’s Pirates! and call it the “evil El Sif”

Gianluca is quite right that one option would be to use the Strategy pattern, but he then goes on to show how not to implement the pattern: adding a method for each of the enums and tying a specific implementation inside the enum, ending up with less readable and less maintainable code.

The enum part is right – eliminate the magic strings, define the different types.

The strategy interface definition is wrong – the name “HasStrategies” does not convey any useful information. Its 2 methods bind concrete enums to an interface; 1 abstract method, e.g. ‘execute’, should be sufficient. Then the specific strategy is pushed inside the enums themselves, but enums should not care about whichever strategies you have for them, so that sort of coupling is not wanted.

In the Decider class, we now define the specific strategy to use, which sort of defeats the purpose of extracting the code from a switch – the specific class will now have 2 reasons to change:

  1. Change to the strategy
  2. Change to the enum definitions

“A class should have one, and only one reason to change.” That is the intent of the Single Responsibility Principle

If we add another value to the enums, then we need to change the Decider implementation as well, which is contrary to the Open Closed Principle. From the looks of it, we have to change the enums (well, that’s a given), the strategy, and the decider implementation.

What I’d recommend:

Define the strategy interface using only one method

interface Strategy {
    String execute();
}

Simply define the values

enum Values {
    PIPPO, PLUTO;
}

Implement the strategies for each of the values, and add them to an EnumMap

class ValueStrategies {
    final static EnumMap<Values, Strategy> MAP = 
             new EnumMap<Values, Strategy>(Values.class);
    static {
        MAP.put(Values.PIPPO, new Strategy() {
            @Override
            public String execute() {
                return "methodA";
            }
        });
        MAP.put(Values.PLUTO, new Strategy() {
            @Override
            public String execute() {
                return "methodB";
            }
        });
    }
    static Strategy get(Values value) {
        return MAP.get(value);
    }
}

Implement the decider using these elements:

public class AltDecider implements Decider {

    @Override
    public String call(String which) {
        Values value = Values.valueOf(which.toUpperCase());
        return ValueStrategies.get(value).execute();
    }

}

Now, the mapping from a primitive to the enum should not take place inside the method; the Decider interface should be modified to fix such hacks. If the String parameter, which, is null or does not represent a Value, then a NullPointerException or an IllegalArgumentException, respectively, will be thrown by the Value conversion.

The names are still not meaningful.

With this solution a new enum value will require a change to Values and the implementation for its strategy inside the ValueStrategies.

If re-use of the strategy implementations were of concern, then naturally they should be implemented in their own classes and not as anonymous values inside the map.
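For example (a hedged sketch reusing the names above), a strategy worth reusing becomes a top-level class and is registered in the map instead of the anonymous class:

// A named strategy can be reused elsewhere and unit tested on its own.
class MethodAStrategy implements Strategy {
    @Override
    public String execute() {
        return "methodA";
    }
}

// In the static initializer of ValueStrategies:
//     MAP.put(Values.PIPPO, new MethodAStrategy());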

The Nature of Software Development

Sunday, September 21st, 2014

A great book on a difficult topic: what software development should be like for the best possible outcome for developers, company, and customers alike.

When you have tasted freedom and received the respect of being trusted to know what you are doing – when you have experienced truly autonomous teams – it is all the more detrimental to have to work with a micromanaging and secretive project management hierarchy.

But what are the odds that such companies will actually read, understand, and implement the insights in the book?

Hopefully this book will instill some trust in the rigid companies insisting that a waterfall process with a single release is the only way. In that sense software remains as invisible as the clothes in The Emperor’s New Clothes – or The Emperor’s New Software.


ISBN: 978-1-94122-237-9

This is in regard to the Beta 1.0 release of the e-book, released on 2014-09-08 (ISO 8601 format).

The odds of getting it right

Saturday, August 30th, 2014

While it is easy to point out when people are getting things wrong – IMHO or in your opinion – it may serve a greater purpose to examine why things go as utterly wrong as they often do, especially when we’re speaking about software development.

Software development is mostly about communication. Whether it is communicating with a programmer to get what you want made, or telling a project manager so they can tell a programmer what you want – it is in any case a matter of turning communicated vision into understanding.

So let us try to map out the different possibilities when facing a decision – or what may seem clear to you, but isn’t for at least one of the links in the development chain.

[Figure: binary decision tree]

I have chosen a binary tree to depict the decision “right” or “wrong”. While the normal interpretation of such a tree is a 50/50 split, let us not make such a hasty assumption – at least we, as developers, should be better than a 50% guess at understanding customer requirements.

In the binary tree above there are only 4 decisions which have to be right. If we simplify the model to a fixed probability, p, that we make the right decision, we can use Bernoulli’s binomial distribution to determine the odds of making s successes in as many trials. In this case the binomial distribution deteriorates into a simple power function, p^s.

Given either p or s, we can calculate the other if we want at least a 50% chance of ending up with the right solution.

Let us try that with a 6-sigma probability – p = 0.9999966.

s log p = log (.50) <=> s = log(.50) / log(p)

s = log(.50) / log(.9999966) ~  203867

That is, if we have an almost unheard-of quality of understanding of customer communication, then at a bit more than 200,000 decisions the solution has only a 50/50 chance of hitting the anticipated solution.

If we want to be 90% sure, then we cannot make more than 30988 decisions with 6-sigma understanding.

So, let us try it the other way around – we would like to know, with a sufficiently high confidence of say 90%, that our project meets our expectations. We have identified 10,000 key decisions. How good must the communication then be?

s log(p) = log(x) <=> p = exp(log(x)/s)

p = exp(log(.90)/10000)  = 0.999989

Which means we need 6-sigma communication to achieve this goal.
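A small Java sketch of these two calculations (nothing more than the formulas above):

public class DecisionOdds {

    // Maximum number of decisions s such that p^s stays at or above the target confidence.
    static double maxDecisions(double p, double confidence) {
        return Math.log(confidence) / Math.log(p);
    }

    // Required per-decision probability p so that p^s meets the target confidence.
    static double requiredProbability(long s, double confidence) {
        return Math.exp(Math.log(confidence) / s);
    }

    public static void main(String[] args) {
        double sixSigma = 0.9999966;
        System.out.printf("50%% chance at 6-sigma: %.0f decisions%n", maxDecisions(sixSigma, 0.50)); // ~203867
        System.out.printf("90%% chance at 6-sigma: %.0f decisions%n", maxDecisions(sixSigma, 0.90)); // ~30988
        System.out.printf("p needed for 10000 decisions at 90%%: %.6f%n", requiredProbability(10000, 0.90)); // ~0.999989
    }
}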

On top of all this, the calculations assume that the customer knows and communicates exactly what he or she wants, and that all decision points are uncovered and communicated at the same high level.

The only immediately sane way of improving the odds is to reduce the scope drastically. It may sound silly, but merely having to get 10 things right becomes daunting for most of us. In the binary tree above, we would need 2^(10+1) - 1 = 2047 nodes – the sheer size of such a tree should be sufficient to deter anyone wanting more than 10 decisions.

Reduce scope. Improve communication by shortening the feedback loop.

Naturally, we could reduce the scope right down to a single decision – but that would quickly throw us off balance, as a single point makes it impossible to determine direction.

Bad Teaching

Sunday, June 29th, 2014

Someone is wrong on the internet!

[xkcd comic: “Someone is wrong on the internet” – visit http://xkcd.com/ for a lot more of similar stuff]

Making too many mistakes while trying to teach a concept is worse than not teaching at all.

Don’t get me wrong – I admire the people making an effort to teach, especially when it is about how to program. But just as enthusiastic as I am about those who can, I am frustrated and angry with those who really can’t. Unfortunately there are plenty of those who can’t who try – most likely due to the Dunning-Kruger effect.

For some odd reason I stumbled upon one of these bad teaching resources, http://javapostsforlearning.blogspot.in/2014/06/method-overriding-in-java.html.

I became so furious with the content. A teacher should know better. A teacher should do better.

In an effort to teach inheritance in Object Oriented Programming, specifically for Java, the author takes a simple example and, in doing so, violates principles of design and good practice.

Concrete vs. Abstract

One of the benefits of inheritance is the ability to use the interface – or, in the absence of an interface, the super class – to abstract the hierarchy.

In the code, MethodOverridingMain lines 9-12 the declared objects (left hand side) should be Employee.


package org.arpit.javapostsforlearning;

public class MethodOverridingMain {

    public static void main(String[] args) {
        Employee d1 = new Developer(1, "Arpit", 20000);
        Employee d2 = new Developer(2, "John", 15000);
        Employee m1 = new Manager(1, "Amit", 30000);
        Employee m2 = new Manager(2, "Ashwin", 50000);

        System.out.println("Name of Employee:" + d1.getEmployeeName() + "---"
                + "Salary:" + d1.getSalary());
        System.out.println("Name of Employee:" + d2.getEmployeeName() + "---"
                + "Salary:" + d2.getSalary());
        System.out.println("Name of Employee:" + m1.getEmployeeName() + "---"
                + "Salary:" + m1.getSalary());
        System.out.println("Name of Employee:" + m2.getEmployeeName() + "---"
                + "Salary:" + m2.getSalary());
    }
}

This is particularly useful as the subclasses don’t provide additional methods. Now every employee can be thought of as an Employee.

Violation of Liskov Substitution Principle

When working with hierarchies – which are a natural part of inheritance – it is important to adhere to best practices such as the Liskov Substitution Principle (LSP), which states that if a program module is using a Base class, then the reference to the Base class can be replaced with a Derived class without affecting the functionality of the program module.

Why is this important? It allows any developer using your source code as a library to reduce their cognitive load and only be concerned with the base class, which is another reason why you should program to an interface and not a concrete implementation (Interface Segregation Principle).

The violation is in the getSalary method of Manager and Developer. For the base class Employee, what you set is what you get – not so for the others.

Let us say that we have a policy of dividing the surplus every month into equal shares for every employee. The code to set the new salaries for the employees would look something like this:


    public static void divideSurplus(double surplus, List<Employee> employees) {
        if (employees != null && employees.size() > 0) {
            double share = surplus / employees.size();
            for (Employee employee : employees) {
                employee.setSalary(employee.getSalary() + share);
            }
        }
    }

Yes, this is ugly mutating code but let us not be concerned with this yet.

If every employee were created as Employee this would work, that is, if version 1 of the library only had Employee, then this would have been the implementation to do the work.

When employees are created as Developer and Manager as well as Employee, the code doesn’t break, but the business logic does. You end up paying out more than you have made. This is an extremely ugly side effect of not adhering to LSP.


import java.util.ArrayList;
import java.util.List;

import org.arpit.javapostsforlearning.Developer;
import org.arpit.javapostsforlearning.Employee;
import org.arpit.javapostsforlearning.Manager;

public class SurplusDivision {

    public static void divideSurplus(double surplus, List<Employee> employees) {
        if (employees != null && employees.size() > 0) {
            double share = surplus / employees.size();
            for (Employee employee : employees) {
                employee.setSalary(employee.getSalary() + share);
            }
        }
    }

    // cannot be used if salaries are 0
    public static void divideSurplus2(double surplus, List<Employee> employees) {
        if (employees != null && employees.size() > 0) {
            double share = surplus / totalSalaries(employees);
            for (Employee employee : employees) {
                employee.setSalary(employee.getSalary()*(1 + share));
            }
        }
    }

    public static double totalSalaries(List<Employee> employees) {
        double total = 0;
        for (Employee employee : employees) {
            total += employee.getSalary();
        }
        return total;
    }

    public static double calculateSalary(Employee employee) {
        return employee.getSalary();
    }

    public static void main(String[] args) {
        double revenue = 90000.0;
        List<Employee> employees = new ArrayList<>();
        employees.add(new Employee(1, "name1", 10000.0));
        employees.add(new Employee(2, "name2", 20000.0));
        employees.add(new Employee(3, "name3", 30000.0));
        double surplus = revenue - totalSalaries(employees); 

        divideSurplus(surplus, employees);
        System.out.println(totalSalaries(employees)); // prints 90000.0

        employees = new ArrayList<>();
        employees.add(new Employee (1, "name1", 10000.0));
        employees.add(new Developer(2, "name2", 20000.0));
        employees.add(new Manager  (3, "name3", 30000.0));
        surplus = revenue - totalSalaries(employees); 

        divideSurplus(surplus, employees);
        System.out.println(totalSalaries(employees)); // prints 101600.0

        divideSurplus(0, employees);
        System.out.println(totalSalaries(employees)); // prints 115226.66666666666

        // surplus 2
        employees = new ArrayList<>();
        employees.add(new Employee(1, "name1", 10000.0));
        employees.add(new Employee(2, "name2", 20000.0));
        employees.add(new Employee(3, "name3", 30000.0));
        surplus = revenue - totalSalaries(employees); 

        divideSurplus2(surplus, employees);
        System.out.println(totalSalaries(employees)); // prints 90000.0

        employees = new ArrayList<>();
        employees.add(new Employee (1, "name1", 10000.0));
        employees.add(new Developer(2, "name2", 20000.0));
        employees.add(new Manager  (3, "name3", 30000.0));
        surplus = revenue - totalSalaries(employees);

        divideSurplus2(surplus, employees);
        System.out.println(totalSalaries(employees)); // prints 102441.17647058822

        divideSurplus2(0, employees);
        System.out.println(totalSalaries(employees)); // prints 117079.41176470587

    }

}

The main method should print 90000.0 in every case, but it doesn’t.

Not only has the hierarchy broken the business case – it has also made it quite impossible to get the calculation right.

Double the money

This is a widespread mistake – having a decimal point in the string representation of a number does not make it a viable currency representation. Please go and read What Every Computer Scientist Should Know About Floating-Point Arithmetic.

It is incredible that people over and over again seem to think that infinitely many real numbers can be represented exactly in a finite machine.
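A classic illustration (not taken from the article under review) of why double is a poor fit for money, and BigDecimal a safer choice:

import java.math.BigDecimal;

public class MoneyDemo {

    public static void main(String[] args) {
        // double cannot represent most decimal fractions exactly
        System.out.println(0.1 + 0.2); // prints 0.30000000000000004, not 0.3

        // BigDecimal constructed from Strings keeps decimal arithmetic exact
        BigDecimal salary = new BigDecimal("20000.00");
        BigDecimal bonus = salary.multiply(new BigDecimal("0.20"));
        System.out.println(salary.add(bonus)); // prints 24000.0000
    }
}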

Public Fields are Bad

Well, there are no public fields in the code shown; they are protected fields. While that is technically true, the fact is that getters and setters galore are basically no better. Having this bean structure of promiscuous objects makes it easy to implement things in an imperative style, and extremely hard to preserve the code’s maintainability, because you have violated the core tenet of Object Oriented Programming: encapsulation. Read Why getter and setter methods are evil for more on this.

How often is it required to set the Employee name, Id, or salary?

Hardcoded Values

The BONUSPERCENT constants for both Manager and Developer are hardcoded; if one manager is allowed a different bonus percentage, that is not possible. If every developer needs a different bonus, then the code needs to be recompiled.

Unclear or Misleading names

BONUSPERCENT is in decimal representation, i.e. 0.2 = 20%, which is not what the name suggests.

Bad Design

While I do understand the notion that we seemingly have a hierarchy because a Developer is an Employee and a Manager is an Employee, there really is no reason for it. In the provided example they differ only by title – which apparently isn’t part of the object – and by how their salaries are calculated. If an existing employee becomes a manager or developer, we cannot shift their role, but must create a new instance of the matching class – something there isn’t support for in the code provided.

So if the Employee had a Role associated, then something else could calculate the salary to be paid based upon the role. Naturally this wouldn’t help with explaining code inheritance, and it probably wouldn’t help with the surplus division.
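A minimal sketch of that idea (all names hypothetical, and still using double for brevity despite the objection above):

import java.util.EnumMap;
import java.util.Map;

// The role is plain data on the employee; changing role is a field update, not a new object.
enum Role { DEVELOPER, MANAGER }

class StaffMember {
    private final String name;
    private final double baseSalary;
    private Role role;

    StaffMember(String name, double baseSalary, Role role) {
        this.name = name;
        this.baseSalary = baseSalary;
        this.role = role;
    }

    String getName() { return name; }
    double getBaseSalary() { return baseSalary; }
    Role getRole() { return role; }
    void changeRole(Role newRole) { this.role = newRole; }
}

// The pay policy lives outside the employee, keyed by role, so bonus rules can change
// without touching or recompiling the employee class.
class PayrollPolicy {
    private final Map<Role, Double> bonusFactor = new EnumMap<>(Role.class);

    PayrollPolicy() {
        bonusFactor.put(Role.DEVELOPER, 0.1);
        bonusFactor.put(Role.MANAGER, 0.2);
    }

    double payOut(StaffMember member) {
        return member.getBaseSalary() * (1 + bonusFactor.getOrDefault(member.getRole(), 0.0));
    }
}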

The Emperor’s New Software

Saturday, June 22nd, 2013

Many years ago there lived an Emperor. He was so fond of new clothes that he spent all his time and all his money in order to be well dressed.

Apparently, today we have a lot of ’emperors’ so fond of new software that they spend all their time and all their money in pursuit of software solutions.

Visitors arrived at court every day, and one day there came two men who called themselves weavers, but they were in fact clever robbers.

They pretended that they knew how to weave cloth of the most beautiful colors and magnificent patterns. Moreover, they said, the clothes woven from this magic cloth could not be seen by anyone who was unfit for the office he held or who was very stupid.

The Emperor thought: “If I had a suit made of this magic cloth, I could find out at once what men in my kingdom are not good enough for the positions they hold, and I should be able to tell who are wise and who are foolish. This stuff must be woven for me immediately.”

And he ordered large sums of money to be given to both the weavers in order that they might begin their work at once.

The hype curve of the potential usages of the product is as clear in software today as it is in this story.

The Emperor sends his old minister to check up on the weavers’ progress. The minister can’t see any product, but will not attest to the possibility that he is unfit for his job or very stupid, thus he expresses the wonders of the cloth.

The story repeats itself with other officials all claiming to see the wonderful product. Finally the Emperor is presented with the ‘cloth’ – and he too is too proud to admit that there is nothing there.

Getting dressed up in the make-believe clothes, the Emperor starts off on a procession through the fair city. Everyone he passes speaks wonders of the cloth, until a little child says: “But he hasn’t anything on” – and it resounds throughout the crowd.

I’m not saying that software developers are swindlers – far from it, though there are developers less than adequate for some tasks. What I am trying to say is that people would rather try to keep up a facade of understanding than ask questions.

As Groucho Marx said: ‘It is better to remain silent and be thought a fool, than to open your mouth and remove all doubt.’

Silence is golden – unfortunately it is the price of silence, not the reward.

Software is not incomprehensible magic. If the solution you get is nothing like the solution you wanted, then most likely there have been communication issues.

If there is no executive support, no user involvement in the process, then you – the customer – will suffer. Whether this is due to failed projects, cumbersome work processes, or brittle solutions, you are partly responsible.

While H.C. Andersen might have had other reasons for writing The Emperor’s New Clothes, my parallel is the IT-illiterate decision makers out there. I’m not saying that everyone must speak IT; I’m saying that you should know your limits, and if you don’t know the stuff you have to deal with, you should ally yourself with someone who can bridge the gap. But you should not be any less engaged in the production.

If you order a steak, medium-rare, at a restaurant, you would complain if you got a boiled steak, a well-done one, or a bleu one. And rightly so. In software it seems you would not complain, just assume that you misunderstood the term and that the production facility – the kitchen and the waiter – performed their magic par excellence.

But that is just when it doesn’t go too badly. Quite often the parallel would be you ordering lemon sole (the fish) and getting the sole of a boot with a lemon on top – then paying the restaurant for their services, leaving the establishment still hungry, and returning the next day for another order of misconceptions.

Example: USAF wasting $1 billion on failed ERP project

“The Emperor’s New Clothes” is a short tale by Hans Christian Andersen.