Archive for the ‘programming’ Category

The architects grievance with MVC

tirsdag, juli 19th, 2016

MVC is a separation of concerns pattern in which you have a Model, a View, and a Controller for a given entity. The Model should contain the data and methods for the entity while the view – or views – are responsible for visual representation of the model, e.g. a pie chart or a bar chart of data. The controller is responsible for the providing the model with command requests.

Often, when developers are trying to follow the MVC pattern they follow the pattern as implemented by Rails; all the models go into the app/models directory, all the views reside in app/views, and all the controllers will be found inside the app/controllers directory.

This is comparable to designing a house and have a special room for all faucets, power outlets, and drains, and another room for all the handles and switches.

The faucet you would usually find in the kitchen will now be labelled “kitchen” but reside in the faucet room, and will likely sit next to the faucet labelled “bathroom”.

You could run a hose from the faucet to the kitchen, but that would only save some trouble. The handle for turning on and off the water resides in the controller room, you have the “kitchen faucet” controller. Next to these you may have the power on/off switches for the oven.

This construct is quite easy for the installers to set up, for the software equivalent this is easy on the framework.

But we are not building houses to please the work crew, but rather for the ease of living. We should focus upon the user experience as well, when we write code.

What we are achieving by this model is a high cohesion in unrelated entities performing the same role, which is contrary to what Larry Constantine suggested in 1968. Teasing apart the application for reuse is much more difficult – we cannot easily swap out one kitchen for a different one.

The better structure would be to have the strongly related entities in the same place, i.e. instead of:

models/
  kitchen
  bathroom

views/
  kitchen
  bathroom

controllers/
  kitchen
  bathroom

it would make sense to have:

kitchen/
  model
  view
  controller

bathroom/
  model
  view
  controller

At least this would easily identify the views associated with a specific model, and if otherwise keeping with modular discipline should make it possible to pull out one entity.

Logic in the controller

Sometimes you run across a project where there is (model) logic in the controller, but that is a bad idea. It should be possible to keep the controller and change the model implementation, e.g. my keyboard (controller) does not have to change because my application changes or the keyboard layout changes. The controller should send events to the provided model to be interpreted there.

If you have logic in the controller, then you will need to change both the controller and the model, when you make a change, that means you have one more element in the cognitive load, which makes things just a bit more complicated. Complication that does not have to exist.

It seems that by tooling we are building software that is easy for the frameworks and the original constructors, but not good for those who have to maintain or live with the product. That is simply not the right way to be service minded.

Snake oil and “everybody can program”

onsdag, juli 6th, 2016

Everyone can program, but not necessarily code – I fully agree with Quicy Larson: Coding isn’t easy.

I believe that all of us are capable of programming, that all of us can – in the Socrates Meno dialogue way – we have the ability to describe a set of procedures to apply in a given order.

We’re all capable of writing novels, but very few of those who do will make a successful novel.  That is to say: “It is not as easy as it looks or sounds.”

All of us can cook, though we’d be pressed to get a Michelin star.

Not all of us can write these procedures in a programming language, and – of those who can – not all should. And not all software should be written in a procedural/imperative style.

Some will throw together programming language correct grammar with no regards to the task being solved. I’m not sure this should constitute as programming. It is true that working software provides value to the user, but with little understanding of the needed solution in both behaviour and coding, there is so much more value to be had by doing it right (but there are so many more ways to do it wrong).

If they had worked the same level in the restaurant business, it is likely Gordon Ramsay would have called it a Kitchen Nightmare. As no one can see the internals of the code fewer customers turn away from the business, and the parallel fails in that we usually eat every day, we don’t get a new software product served every day.

In the business world scarcity with increased demand means prices will go up, leading to more resources being applied. In software development, this leads to people who really shouldn’t program are being hired to hack away at the next big thing.

As a society we are not better off having poorly constructed “cathedrals” forced upon us. If everytime we need to go through a door, we will have to use jump through hoops, we would be quick to remedy this odd contraption, but in the software world, there are usually no such way.

The pursuit for infinite savings allows for expenses now, but apparantly not investment in solid work, nor the hidden value by improving the software.

I am still wondering why there is so little regulation in a field so wide and with so far reaching consequences. Why do people accept snake oit?

Simple insights into source code bases

fredag, april 8th, 2016

Does the code base scream the domain?

I was wondering whether or not it would be possible to use parts of PageRank (https://en.wikipedia.org/wiki/PageRank) to gain insights into a code base. If PageRank works on web pages to ascertain what the contents of the page relates to, then likely a similar way could be construed for source code.

The simplest thing that could possibly work?
I chose the n-gram (https://en.wikipedia.org/wiki/N-gram) approach – unigram to be specific. While bi- and tri-grams are better for text, I’m not so sure for code bases, nevertheless, it could be tested.

The simple process

  • Find all files of a specific language inside the project structure. Likely it would be prudent to examine source and test code independently
  • Remove all for of new-lines
  • Tokenize on non alphanumeric entities
  • Build histogram of these tokens

Removing comments and possibly strings would likely be a good idea, but that would require parsing and not just bash.

find . -name “*.java” -type f | xargs cat | tr -d ‘\n’ | tr -d ‘\r’| tr -cs ‘[:alnum:]’ ‘\n’ | sort | uniq -c | sort -rn > wordfreq.txt

Looking at gerrit’s word frequency, we get something along these lines:

  • 27560 import
  • 25092 the
  • 21758 com
  • 21385 google
  • 16615 License
  • 14676 public
  • 14544 gerrit
  • 13553 String
  • 12309 final
  • 11823 return
  • 10431 private
  • 10191 new
  • 9809 if
  • 8940 this
  • 8196 0
  • 7665 in
  • 7225 void
  • 7163 under
  • 6809 a
  • 6590 null
  • 6389 server
  • 6234 client
  • 6185 static
  • 6125 for
  • 6024 2
  • 5965 org
  • 5953 to
  • 5384 or
  • 5212 class
  • 4972 may
  • 4963 Override
  • 4934 name
  • 4923 get
  • 4752 distributed
  • 4666 of
  • 4602 java
  • 4492 throws
  • 4392 n
  • 4164 is
  • 3705 e

Reading it “import the com google License public gerrit String final return private new if this 0 in void under a null server client static for 2 org to or class may Override name get distributed of java throws n is e” doesn’t quite make sense. Clearly the “License” and namespace “com.google” influences heavily.

Removing the keywords we get:
“the com google License gerrit 0 in under a server client 2 org to or may name get distributed of n is e”

It is not as if the source code really screams what gerrit is about. From Chinese Whisper reconstruction I get something about a “client server with name distribution” – not quite the “Gerrit provides web based code review and repository management for the Git version control system” tagline.

The frequency count drops rapidly – let’s pull the data into R to see if there are some patterns.

gerrit <- read.table(“wordfreq.txt”, header=F)
f <- as.data.frame(table(gerrit$V1))
f$Var1 <- as.numeric(as.character(f$Var1))
plot(log(f), type=”l”, xlab=”log(frequency)”, ylab=”log(count)”, main =”Gerrit source code tokens\nlog-log plot”)

gerrit loglog plot

This seems to be a power law distribution, but with a lot of outliers above 7 (corresponding to around 1100) – and with an anomaly just short of 8 (corresponding to 2374 to be exact). This is quite likely the template License.

gerrit[gerrit$V1 == 2374,]
V1 V2
101 2374 Unless
102 2374 Licensed
103 2374 LICENSE
104 2374 law
105 2374 governing
106 2374 express
107 2374 CONDITIONS
108 2374 compliance
109 2374 BASIS
110 2374 agreed

Plotting the more conformant data

k <- f[f$Var1 <1100,]
plot(log(k), type=”l”, xlab=”log(frequency)”, ylab=”log(count)”, main =”Gerrit source code tokens\nfrequency < 1100\nlog-log plot”)
abline(glm(log(k$Freq ) ~ log(k$Var1)), col=”red”)

glm(log(k$Freq ) ~ log(k$Var1))

Call: glm(formula = log(k$Freq) ~ log(k$Var1))

Coefficients:
(Intercept) log(k$Var1)
8.554 -1.372

Degrees of Freedom: 495 Total (i.e. Null); 494 Residual
Null Deviance: 1299
Residual Deviance: 152.9 AIC: 829.7

exp(8.554/1.372)
[1] 510.1444

gerrit loglog < 1100

So, we should likely look at values with the frequency in this area to get a better suggestion for what the code base is used for.

gerrit[gerrit$V1 < 600 & gerrit$V1 >= 500,]
V1 V2
240 598 code
241 597 url
242 592 rw
243 590 values
244 589 label
245 581 plugin
246 580 v
247 563 ctx
248 561 Result
249 558 Util
250 550 UUID
251 544 2013
252 541 bind
253 538 cb
254 533 IdentifiedUser
255 532 err
256 531 u
257 530 o
258 528 substring
259 526 master
260 525 Repository
261 522 CurrentUser
262 522 as
263 521 res
264 520 dom
265 517 assertEquals
266 516 token
267 508 start
268 508 RESOURCES
269 508 interface
270 507 lang
271 506 servlet
272 500 Object

This has a better match with the core of the project, though we still see comment debris, e.g. “2013”

Gerrit can be found at https://gerrit.googlesource.com/gerrit/ – I was looking at the codebase from 02bafe0f4c51aa24b2b05d4d1309ecfc828762c0 (January 20th, 2016)

Independence check

With the previous information – and the notion of a vector representation – I thought about the possibility to check for independence.

If two vectors are independent, then they should be orthogonal. If two code bases are independent, then they should be orthogonal in their domain vectors. To test this, we can try to plot the words used in the code bases. Naturally, we would need to strip away the language keywords, but as we will see, this is not quite as necessary as expected. We can even gain other insights by looking at the keyword uses.

So, as above, I created word frequence files for two JavaScript projects.

p1 <- read.table(“p1-wordfreq.txt”, header=F)
p2 <- read.table(“p2-wordfreq.txt”, header=F)

We don’t really want the exact count, so we pick the relative frequencies

p1$V1 <- p1$V1/max(p1$V1)
p2$V1 <- p2$V1/max(p2$V1)

Now, we only want to look at the tokens they have in common to see whether or not they are orthogonal – the tokens not common are already orthogonal.

common <- merge(p1, p2, by = “V2”)

plot(common$V1.x, common$V1.y, xlab=”p1″, ylab=”p2″, main=”Comparing p1 and p2″)

comparing JavaScript projects p1 and p2

Next, we want to identify the JavaScript keywords.

js <- read.table(“JavaScriptKeywords.txt”, header=F)
names(js) <- “V2″ # js is a single column, we want to merge on the keywords in the same column names
js2 <- merge(js, common, by=”V2″)
points(js2$V1.x, js2$V1.y, pch=19, col=”red”)

# mark the 20% in both directions, thus we get a Pareto segmentation
abline(h=.2, col=”blue”)
abline(v=.2, col=”blue”)

high <- common[common$V1.x > .2 & common$V1.y > .2,]

The most frequently used non-keywords:

high[-(match(intersect(high$V2, js2$V2), high$V2)),]
V2 V1.x V1.y
34 data 0.4170306 0.2444444
49 err 0.5545852 0.4555556
50 error 0.3013100 0.8000000
115 censored 0.6812227 0.6888889
131 settings 0.2052402 0.2111111

The second to last in this list has been censored, it does provide an indication that the projects aren’t quite independent. The error, err, and data are so common and nondescript that it is somewhat okay to find them in this area, though I’d rather have less callback functions and better names in general.

The most frequently used keywords:

high[(match(intersect(high$V2, js2$V2), high$V2)),]
V2 V1.x V1.y
47 else 0.3449782 0.4444444
65 function 1.0000000 0.8000000
72 if 0.4716157 0.5000000
154 var 1.0000000 0.6444444

Again this can be explained by a lot of callbacks, which are often on the form:

function(err, data) {
if(err){
} else {
}
}

Another explanation could be lots of anonymous functions, though usually callback.

Conclusion

Removing comments and imports should provide for a better picture of the code base. Even so, it seems to not exactly scream the domain or architecture.

Bi-grams could be another improvement.

Independence check of supposedly independent projects may reveal that they aren’t or that the code is skewed towards an unwanted design.

It is far from perfect, but as always it brings a different way of looking at the code base, and it is relatively quick to do.

Comparing large code bases somewhat defeats the purpose as regression to the mean tells nothing much of interest. Taking Gerrit as an example, then the most used token is “import”, which is used 27560 times and as we saw above, the interesting parts reveal themselves around 1100 uses, which is less than 4%.

comparing gerrit to dotCMScomparing gerrit to dotCMS (loglog)

Comparing Gerrit and an old repo I had of dotCMS, we find that the most used keywords including entities in java.lang are:

import
String
public
return
if
new
this
null
private
void
static

Which could indicate a lot of String constants and conditional logic (with return statements instead of else clauses), and with a possibility of Primitive Obsession – well, the web does call for a lot of String use.

Learning Java, Programming, TDD, Clean Code – which order?

torsdag, november 26th, 2015

Recently, Marcus Biel asked me to review and comment on his “Free Java Clean Code Beginner Course”.

I’m quite flattered that anyone would ask my opinion, so naturally I gave him some feedback. I think the concept of Marcus’ project is valuable, especially considering the large community (9-10 million) of Java programmers, and the number of would be programmers, and the current state of the quality we – as a profession – provide. Just take a look at some of the questions asked on LinkedIn’s Java Developers group.

One of the key hurdles, I think, is that Marcus wants it all: Teach Java, Programming, OOP, TDD, and Clean Code. While these are all good things to know, I find it quite a lot all at once. That said, what should be left out in the beginning? How should you structure learning programming? The easiest way is to use imperative style – but that is hardly the “right” for Java. Starting out with too much OOP will also lead to highly coupled classes.

If you simply teach Java and programming, you’re bound to fail at only good OOP and Clean Code practices because Java sometimes enforces you to do things in a bad way.

TDD is having its own troubles – as debated by DHH, Fowler and Beck in “Is TDD Dead?”

Rich Hickey compares TDD to “driving a car around banging into the guard rails”, and Donald Knuth says something along the lines of tests are good for probing and figuring out an otherwise unknown domain. This blog has links to both.

Ward Cunningham created Fit, which Uncle Bob built Fitnesseon top of, so I believe that they are quite happy with repeatable testing. Uncle Bob at least writes about it in Clean Code

Edsger Dijkstra said: “Program testing can be used to show the presence of bugs, but never to show their absence!” – but then he was likely into proving correctness using Hoare triplets – the pre and post condition proofs.

In “Working Effectively with Legacy Code”, Michael Feathers says that legacy code is code without tests, and that tests makes it safe to refactor code.

I really like Hickey’s notion. The tests only shows the code in the exercises the tester had in mind. If the tester is the developer, then it is likely a proof of concept rather than an attempt to disprove working software,

I also really like Feathers’ concept – it’s really nice to have exercises for a section of code making sure that a change in the section will not misbehave, when swapped out with an equivalent. At least it is nice to have tests for the modules you depend upon, to be able to check than an upgrade does not cause any bad things. Basically, we use what Dijkstra said – making sure that we are not introducing previously known bugs again.

Knowing programmers, we’re likely to not be modest nor follow the scientific method: Observe, Think, Hypothesize, Create testable predictions, Test, Refine, General theory, nor Deming’s circle: Observe, Plan, Do, Check, Act. It is often more: Hack, Observe, Repeat – using a Waterfall approach it is sometimes more like: hack, hack, hack, observe, wtf!, repeat.

Dijkstra, Hickey, and Knuth seem to have their own disciplined framework in place, and TDD is a formal way trying to introduce discipline to the masses, though often being misunderstood, and due to our bias for confirming our beliefs (“Don’t believe everything you think” by Thomas Kida) we make poor tests more often than good tests. Sometimes we even make tests just to get a high test coverage, because someone, somewhere heard that this was a good metric.

Can you learn Clean Code without knowing programming? I don’t think so, and quite likely, then Clean Code should be left after Patterns – which isn’t currently part of Marcus’ course.

Should you learn Clean Code before creating your own ugly mess? Would you know the difference if taught from day one?

How to refactor a refactored switch/case statement

torsdag, november 26th, 2015

When good intentions go slightly wrong

For some odd reason I picked up a link to DZONE on “How to refactor a switch/case statement” – the link https://dzone.com/articles/how-to-refactor-a-switchcase-statement is now defunct, I’m not sure why. Anyway, Gianluca Tomasino, the original author still has the article on his blog.

So I read through this – I know I dislike switch/case jump tables, though not as much as I hate if-else-if – or as I like to reminisce Sid Meier’s Pirates! and call it the “evil El Sif”

Gianluca is quite right, that one option would be to use the Strategy pattern, but then goes on to show how not to implement this pattern by adding a method for each of the enums, then tie a specific implementation inside the enum ending up with a less readable and less maintainable code.

The enum part is right – eliminate the magic strings, define the different types.

The strategy interface definition is wrong – the name “HasStrategies” does not convey any useful information. The 2 methods bind concrete enums to an interface, 1 abstract method, e.g. ‘execute’ should be sufficient. Then the specific strategy is pushed inside the enums themselves. Enums should not care for whichever strategies you have for them, thus that sort of coupling is not wanted.

In the Decider class, we now define the specific strategy to use, which sort of defies the purpose of extracting the code from a switch – the specific class will now have 2 reasons for change:

  1. Change to the strategy
  2. Change to the enum definitions

“A class should have one, and only one reason to change.” That is the intent of the Single Responsibility Principle

If we add another value to the enums, then we need to change the Decider implementation as well, that is contrary to the Open Close Principle. From the looks of it, we have to change the enums (well, that’s a given), the strategy, and the decider implementation.

What I’d recommend:

Define the strategy interface using only one method

interface Strategy {
    String execute();
}

Simply define the values

enum Values {
    PIPPO, PLUTO;
}

Implement the strategies for each of the values, and add them to an EnumMap

class ValueStrategies {
    final static EnumMap<Values, Strategy> MAP = 
             new EnumMap<Values, Strategy>(Values.class);
    static {
        MAP.put(Values.PIPPO, new Strategy() {
            @Override
            public String execute() {
                return "methodA";
            }
        });
        MAP.put(Values.PLUTO, new Strategy() {
            @Override
            public String execute() {
                return "methodB";
            }
        });
    }
    static Strategy get(Values value) {
        return MAP.get(value);
    }
}

Implement the decider using these elements:

public class AltDecider implements Decider {

    @Override
    public String call(String which) {
        Values value = Values.valueOf(which.toUpperCase());
        return ValueStrategies.get(value).execute();
    }

}

Well, the mapping from a primitive to the enum should not take place inside the method, the Decider interface should be modified to fix such hacks, if the String, which, is null or does not represent a Value, then a NullPointerException and IllegalArgumentException respectively will be thrown from the Value conversion.

The names are still not meaningful.

With this solution a new enum value will require a change to Values and the implementation for its strategy inside the ValueStrategies.

If re-use of the strategy implementations were of concern, then naturally they should be implemented in their own classes and not as anonymous values inside the map.

9 bad programming habits we secretly love

mandag, oktober 26th, 2015

Reading the article 9 bad programming habits we secretly love I’m appalled that apparently this is considered the norm.

While I have likely used all of these bad programming habits at one time or another, I’m pretty sure that these are mostly due to the fact that the programmer is a poor one. Thus writing the article and hoping to get cheered on, we’re apparently worshiping the mediocre or even poor workmanship.

Depending upon where I reside in the hierarchy you should then either follow my advice or that of the article.

The habits:

No.1: Using goto

Using line numbers, you were bound to run out of valid line identifiers, thus you had to insert one line in essence telling “the code is to be continued on line xxx”, which made reading the code in its entirety really difficult.

The article states that a goto in a case statement will produce something that’s simpler to understand than a more properly structured list of cascading if-then-else blocks. Well, swapping one bad idea for another really isn’t the way to argue. Chain of Responsibility would be the right abstraction in that case. The part that sometimes needs a goto is then a different method called by some of these handlers.

No. 2: Eschewing documentation

Documentation should state evergreen information on the entities. The intent of a class, function, field. In some rare cases it should provide sufficient insights as to why a specific approach is used over another.

The code may be changing – and you’d likely then be violating the Open Closed Principle, but that’s another story – rarely the intent of the code is changing at the same pace as the code itself.

The function names suggested: insertReservation and cancelReservationare really poor names. Quite likely there would be an argument to these functions in the sort of a reservation object, and you would end up with having code as:

insertReservation(Reservation reservation)
cancelReservation(Reservation reservation)

Which – when read out loud – really is stuttering and horrible. I prefer:

insert(Reservation reservation)
cancel(Reservation reservation)

No. 3: Jamming too much code on one line

Readability is at a premium, why would anyone write long lines of code? I know Java is basically a one dimensional source code language – hence the need for semicolons between statements. Why you cannot have a line wrapping string is then a bit odd, but that’s a different story.

Yes, minified JavaScript loads faster, but leave minifying to minifiers.

If you need to put things in a grouped environment, then either use functions or separate within functions with additional blank lines.

The code is not getting longer – well, perhaps line-wise, but not really byte-wise or at least code-wise. The readability on the other hand goes up.

No. 4: Not declaring types

Well, that really depends upon your programming language. In a type safe language it does give you insight when reading the code, what the writer likely had in mind. You know that  a + b is supposed to be string concatenation and not an arithmetic sum if one of the arguments is a string. You know that 1/2 is integer division and will
result in 0 and not .5

No. 5: Yo-yo code

With the web a more frequent part of any development, there is a need for string to something else conversions, and sometimes back to strings again. With JSON we have basic numbers, but no dates or timestamps, thus a string <-> x is needed.

No. 6: Writing your own data structures

Usually you really shouldn’t, not even for an anticipated performance improvement. But then – who knows – maybe you’re writing the next great data structure to be used for decades.

No. 7: Breaking out of loops in the middle

The reason for not breaking or returning at several different places is code readability – and thus maintainability. Odd thing is, then loop breaks are often used for “find first”. Java – being a bit slow on this – does not cater for this functionality, whereas Scala has find(predicate) doing exactly what is needed.

No. 8: Using short variable names (but i, x, and and make sense)

Definitely! Working with coordinates, math, physics you’d be less confusing using the nomenclature of those domains. Using a for an array and l for a list seems to be counter intuitive given habit No.4 “Not declaring types”. To be hones, I don’t care if it’s an array or a list – I care what it is: List of some sort of entities: Books, users, …

No. 9: Redefining operators and functions

This is only funny until you have to debug the 2nd and 3rd redefinition. Use a mapping between whatever needs to be hacked using inverse logic and the sane world.

Summary

If you deliberately use any of the bad programming habits – with the exception of No.5, which has a few valid excuses – then my take is that you are a bad programmer. Luckily there are ways to improve – start by not doing these bad things. Follow up by not taking bad advice (mine included).

Bad Teaching

søndag, juni 29th, 2014

Someone is wrong on the internet!

Someone is wrong on the internet
(Visit http://xkcd.com/ for a lot more of similar stuff)

Making too many mistakes while trying to teach a concept is worse than not teaching at all.

Don’t get me wrong – I admire the people making an effort to teach and
especially when it is about how to program. But just as enthusiastic I am about those who can, I am frustrated and angry with those who really can’t. Unfortunately there are plenty of the those who can’t who try – that is most likely due to the Dunning-Kruger effect

For some odd reason I stumbled upon one of these bad teaching resources, http://javapostsforlearning.blogspot.in/2014/06/method-overriding-in-java.html.

I became so furious with the content. A teacher should know better. A teacher should do better.

In an effort to teach inheritance of Object Oriented Programming, specifically for Java, the author takes a simple example and by doing so violates principles of design and good practices.

Concrete vs. Abstract

One of the benefits of inheritance is the ability to use the interface or in case of a missing interface then the super class to abstract the
hierarchy.

In the code, MethodOverridingMain lines 9-12 the declared objects (left hand side) should be Employee.


package org.arpit.javapostsforlearning;

public class MethodOverridingMain {

    public static void main(String[] args) {
        Employee d1 = new Developer(1, "Arpit", 20000);
        Employee d2 = new Developer(2, "John", 15000);
        Employee m1 = new Manager(1, "Amit", 30000);
        Employee m2 = new Manager(2, "Ashwin", 50000);

        System.out.println("Name of Employee:" + d1.getEmployeeName() + "---"
                + "Salary:" + d1.getSalary());
        System.out.println("Name of Employee:" + d2.getEmployeeName() + "---"
                + "Salary:" + d2.getSalary());
        System.out.println("Name of Employee:" + m1.getEmployeeName() + "---"
                + "Salary:" + m1.getSalary());
        System.out.println("Name of Employee:" + m2.getEmployeeName() + "---"
                + "Salary:" + m2.getSalary());
    }
}

This is particular useful as the subclasses don’t provide additional methods. Now every employee can be thought of as an Employee.

Violation of Liskov Substitution Principle

When working with hierarchies – which is a natural part of inheritance – then it is important to adhere to best practices such as Liskov Substitution Principle (LSP), which states that if a program module is using a Base class, then the reference to the Base class can be replaced with a Derived class without affecting the functionality of the program module.

Why is this important? It allows any developers using your source code as a library to reduce the cognitive load to only be concerned with the base class, which is another reason why you should program to an interface and not a concrete implementation (Interface Segregation Principle).

The violation is in the getSalary method of Manager and Developer. For the base class employee what you set is what you get, not so for the others.

Let us say that we have a policy of dividing the surplus every month with equal shares to every employee. The code to set the new salary for the employees would look something like this:


    public static void divideSurplus(double surplus, List<Employee> employees) {
        if (employees != null && employees.size() > 0) {
            double share = surplus / employees.size();
            for (Employee employee : employees) {
                employee.setSalary(employee.getSalary() + share);
            }
        }
    }

Yes, this is ugly mutating code but let us not be concerned with this yet.

If every employee were created as Employee this would work, that is, if version 1 of the library only had Employee, then this would have been the implementation to do the work.

When employees are created as Developer and Manager as well as Employee the code doesn’t break, but the business logic does. You end up paying more than you have made. This is an extremely ugly side effect of not adhering to LSP.


import java.util.ArrayList;
import java.util.List;

import org.arpit.javapostsforlearning.Developer;
import org.arpit.javapostsforlearning.Employee;
import org.arpit.javapostsforlearning.Manager;

public class SurplusDivision {

    public static void divideSurplus(double surplus, List<Employee> employees) {
        if (employees != null && employees.size() > 0) {
            double share = surplus / employees.size();
            for (Employee employee : employees) {
                employee.setSalary(employee.getSalary() + share);
            }
        }
    }

    // cannot be used if salaries are 0
    public static void divideSurplus2(double surplus, List<Employee> employees) {
        if (employees != null && employees.size() > 0) {
            double share = surplus / totalSalaries(employees);
            for (Employee employee : employees) {
                employee.setSalary(employee.getSalary()*(1 + share));
            }
        }
    }

    public static double totalSalaries(List<Employee> employees) {
        double total = 0;
        for (Employee employee : employees) {
            total += employee.getSalary();
        }
        return total;
    }

    public static double claculateSalary(Employee employee) {
        return employee.getSalary();
    }

    public static void main(String[] args) {
        double revenue = 90000.0;
        List<Employee> employees = new ArrayList<>();
        employees.add(new Employee(1, "name1", 10000.0));
        employees.add(new Employee(2, "name2", 20000.0));
        employees.add(new Employee(3, "name3", 30000.0));
        double surplus = revenue - totalSalaries(employees); 

        divideSurplus(surplus, employees);
        System.out.println(totalSalaries(employees)); // prints 90000.0

        employees = new ArrayList<>();
        employees.add(new Employee (1, "name1", 10000.0));
        employees.add(new Developer(2, "name2", 20000.0));
        employees.add(new Manager  (3, "name3", 30000.0));
        surplus = revenue - totalSalaries(employees); 

        divideSurplus(surplus, employees);
        System.out.println(totalSalaries(employees)); // prints 101600.0

        divideSurplus(0, employees);
        System.out.println(totalSalaries(employees)); // prints 115226.66666666666

        // surplus 2
        employees = new ArrayList<>();
        employees.add(new Employee(1, "name1", 10000.0));
        employees.add(new Employee(2, "name2", 20000.0));
        employees.add(new Employee(3, "name3", 30000.0));
        surplus = revenue - totalSalaries(employees); 

        divideSurplus2(surplus, employees);
        System.out.println(totalSalaries(employees)); // prints 90000.0

        employees = new ArrayList<>();
        employees.add(new Employee (1, "name1", 10000.0));
        employees.add(new Developer(2, "name2", 20000.0));
        employees.add(new Manager  (3, "name3", 30000.0));
        surplus = revenue - totalSalaries(employees);

        divideSurplus2(surplus, employees);
        System.out.println(totalSalaries(employees)); // prints 102441.17647058822

        divideSurplus2(0, employees);
        System.out.println(totalSalaries(employees)); // prints 117079.41176470587

    }

}

The main method should print 90000.0 in every case, but it doesn’t.

Not only has the hierarchy broken the business case – it has also made it quite impossible to create correct.

Double the money

This is a widespread mistake – having a decimal point in the string representation of the number does not make it a viable currency indicator. Please go and read What Every Computer Scientist Should Know About Floating-Point Arithmetic

It is incredible that people over and over again seem to think that infinitely many elements can be stored in a finite machine.

Public Fields are Bad

Well, there are no public fields in the code shown, they are class protected fields. While that is technically true, the fact is that getters and setters galore is basically no better. Having this bean structure or promiscuous objects makes it easy to implement in imperative style, and extremely hard to preserve the codes maintainability because you have violated the core tenet of Object Oriented Programming: Encapsulation. Read Why getter and setter methods are evil for more on this.

How often is it required to set the Employee name, Id, or salary?

Hardcoded Values

The BONUSPERCENT constant for both Manager and Developer are hardcoded, that is if one manager is allowed a different bonus percentage then it is not possible. If every developer needs a different bonus, then the code needs to be recompiled.

Unclear or Misleading names

BONUSPERCENT is in the decimal representation, that is 0.2 = 20%.

Bad Design

While I do understand the notion that we seemingly have a hierarchy because a Developer is an Employee and a Manager is an Employee, there really is no reason for this. They are only different – in the provided example – through title, which apparently isn’t a part of the object, and how their salaries are calculated. If an existing employee becomes a manager or developer, we cannot shift their roles, but must create a new instance of the matching object – something there isn’t support for in the code provided.

So if the Employee had a Role associated, then something else could calculate the salary to be paid based upon the role. Naturally this wouldn’t help with explaining code inheritance, and it probably wouldn’t help with the surplus division.

OOP is dead and alive

torsdag, maj 23rd, 2013

Discussing Programming topics with non-programmers

The other day I was in good company with a business owner, who is also a programmer, and a business controller, who isn’t a programmer. We were discussing the issues of one of my favorite topics: Software Quality.

Naturally the software must do what it is intended to do, but the hidden issue, which to me is almost as important: Functionality must be placed in the right areas.

The analogy became looking for $10 in a persons wallet.

To the programmer getting this task, the basic work: Find person, find persons wallet, check if wallet contains $10. Has to be done regardless of where the functionality is applied. To the CPU the steps will be loaded in sequence anyway thus the “correct” position of the code is another matter.

Now, in real life, if I ask someone if they have $10 in their wallet, they are able to check their own wallet and inform me. On the other hand, I could just take their wallet and check myself. Naturally this would be a violation of privacy – and I’d have to know where they keep their wallets.

In the same sense, a controller could implement the functionality required leading to promiscuous objects being violated. Or the objects themselves could have the functionality implemented. The first leads to violation of Law of Demeter, tight coupling, and too much knowledge, which in turn leads to higher risk of introducing bugs, higher maintenance cost, intricate dependencies, and a big ball of mud.

Controller {
  person.getWallet().hasAmount(amount)
}

versus

Controller {
  person.hasAmount(amount)
}
Person {
 hasAmount(Integer amount){
  return myWallet.hasAmount(amount)
 }
}

The example is a bit far fetched, but it served the purpose of why it is important to have clean code in more than one sense.

The first snippet the Controller will have to know of Person and Wallet. In the latter Controller needs to know of Person, and Person needs to know of Wallet. Even though there are the same amount of dependencies, the context for the Controller is much higher in the first snippet.

OOP is dead

A while ago Pinterest suggested I read Object Oriented Programming is Dead which is a bit dated, nevertheless it is still relevant. I just think that there are more reasons why OOP is dead – and yet still alive.

First off, we killed OOP by trying to fit a relational database as the persistence layer – this leads to data transfer objects, which are mostly grouped global variables or Java beans, which has nothing to do with encapsulation.

Second, we killed OOP by placing logic in the wrong classes, classes in the wrong hierarchy, and generally forgetting what OOP is about.

Third blow, we apparently insist on imperative styled programming for an OOP, which leads to the issues described above.

Fourth stab, we seem to be grounded in the snapshot state of databases. That is, an object in a database has a single state, without any prior history. This is similar to register loading, and overwriting, and is prevalent in the Update keyword. You actually have to twist, turn, and contort the default behavior of an RDBMS to get a history/audit trail for the values.

The final death blow was delivered by Martin Odersky – who also kindly revived OOP in junction with FP – in his presentation Object and functions, conflict without a cause. Well he has done so on other occasions touting the Scala horn.

Rich Hickey – the Clojure guy – seems to at least backup the FP and OOP notion – primarily of stateless objects, and Datomic seems like a brilliant choice for a persistence layer.

OOP is alive

The Object Oriented notion is extremely important as it is what should drive SOA services. They should be defined by their interfaces and encapsulate data and implementations. As Steve Yegge mentioned Amazon is quite good at, and Google not quite as good.

I believe SOA is the right level of re-usability of software, and we will have to get much better at it as more and more mash-ups are wanted. That means we have to accept interoperability at a different level, keep disciplined and not query database tables which we know of, but really belongs to another service.

We also have to embrace the functional style – I’m talking REST for web developers – in which objects are immutable/stateless, and transformations can be predictably run whether in parallel or sequence, synchronous or asynchronous to the client.

Not even mediocre

tirsdag, maj 7th, 2013

This is a follow-up to Trust has failed – time for control and assurance

I’ve been annoyed by the findings that we – as software developers – have at best an average of 68% success rate, and with approximately a third of these in need of costly repairs, which in my book constitutes a success rate of only 46%. We are not even mediocre at best, and we haven’t moved away from Frederick P. Brooks’ “Plan to throw one away” (Chapter 11 of The Mythical Man-Month, ISBN: 0201835959) Even though he argues against it in the 20th anniversary edition from 1995 – almost 20 years ago. That is – it seems we haven’t really learned anything for the past 40 years of software development.

I know that there are other professions in which the products are declared as successes while they soon after display less than adequate abilities. Vasa, Titanic, Columbus reaching the west passage to India, Apollo 1 and Soyuz 1 & 11, etc.

Let us assume that the complexity of the requested software solutions are normal distributed. Then 68% is 2s in the Six Sigma terminology, or zero nines in the engineering Nines terminology. The only natural division of things into 2/3 and 1/3 which comes to mind is the awake and sleep division of the day.

The complexity of a software solution is quite hard to measure as it involves the actual solution as well as the work done in order to get to the final solution, and that in turn involves the people associated with the project over its entire scope: Developers, testers, architects, project managers, customers, etc. For a parallel to the world of mechanics we have the Invention of Heavier-than-air Aircraft, in which the Wright brothers won over the better funded and equipped team of Samuel Pierpont Langley.

I’m not sure whether Langleys team was Punished by Rewards or they just over-engineered the task at hand. Fact is, they didn’t deliver before the Wright brothers.

I know – usually we don’t have a race to be first, we have a contract to fulfill instead. Perhaps that is why there are so many failures in the business. It is work and not (serious) play. Perhaps it is the ever increasing measures of confinement. Normally when I’m using someones services, we have a general acceptance of What we are discussing, I know Why I want or need whatever service it is, but I’m not telling the service provider How they must go about their business.

Take travelling for instance – I pay someone to transport me and my luggage to some specified destination, but I don’t tell them How they should do it, which route to take or in any other way meddle with the service, they are providing. A cab driver might ask which way to take, e.g. choosing between fast and cheap.

Conversely for software projects, there are notions of “must be the same as the previous”, “must be XML”, and “must have a response time below 100 ms” – while these are demands on How things must be done, they could very well – and in some cases have – impaired the end product. You end up with something as ugly as trying to use Java with twitter4j to store Tweets in MongoDB. Tweets come in as JSON object, MongoDB stores “rows” of JSON. On paper JSON objects comes in and should be filtered and piped to the database. But twitter4j reads JSON, makes it into standard Java objects, which then have to be constructed back into JSON using a convoluted builder. Adding Java makes the simple solution a lot more complicated.

But I digress.

If we on average are 68% successful, then the software around us are the result of those successes. I’m sorry, but I’m pretty sure that my definition of a success is not quite compatible with the apparent standard. Clunky interfaces and useless messages are one thing, but not being able to see your online bank account around payday due to too much stress on the machines. Not being able to log in to a “secure” facility because Java is not the right version – Java has had so many security breaches that I’m puzzled why it is in use for “security”. For extra fun, try doing this using Firefox on a 64 bit Windows box – there are seemingly no end to the hiccups Firefox will have, and will have to be killed by the process manager. Having to click on a message sent to you – notified by sending you an e-mail, that you have a new message – redirecting to downloading the message as a PDF, then having to open the PDF in another viewer, while maintaining the notion in the interface that you have unread messages. Windows not being able to shut down because it cannot play the logoff sound – it’s apparently extremely important that this sound is played.

I am pretty sure we could do better than this – and as these bugs/annoyances have existed for a long time, I have to believe that these are in the 46% of the successful software which doesn’t have to be remedied.

On the graph we have the normal distribution, the red shaded area is the 46%, the pink shaded area is the next 22% – the successes which have to be mended. In total, the entire shaded area is 68% – our success rate.

If the x-axis reflects the complexity of projects, then we’re struggling with the mediocre, which is quite puzzling as that would probably be the projects we get most of – and then we should have learned from the previous attempts, and thus be better at handling – but then developers aren’t the only part of the solution.

I am certain that with higher discipline on both sides of the table during project development we can consolidate 68% and even jump the next 22% to the first nine: 90% – it could be that 93% is a possibility within a few years, but let’s settle for the first milestone.

The next graph shows shaded areas for 68% (red), 90% (blue), and 93% (green).

Going to 90% means +1.3s and 93% means +1.5s which is approximately one standard deviation above the 68% mark.

I have assembled these points on the accumulated frequency graph. 46% (purple), 68% (red), 90% (blue), 93% (green)

As can be seen, then the step from 46% to 68% is approximately the same as the 68% to 90% – these should be low hanging fruits, ripe for the picking.

If we can pull this off – and quite frankly we simply have to – this would mean that we would have a lot more successes, which should bring greater happiness within the working environment. It could bring more work as the investment is more secure there could be a higher number of companies trying to implement new projects, which at the current rates would be deemed too risky.

The graph shows the number of tries you have to make to ensure a probability of success above 90% for the three success rates: 1/2, 2/3, and 9/10 – currently we are about the 1/2, if we estimate the successes in need of mending as not quite a success, 2/3 is approximately 68%. Thus to probabilistic ensure at least 98% success rate, our customers are expected to be willing to invest 2x, 4x, and 6x the estimated cost of a project depending upon the quality we – as a business – on average can provide.

More positive work – it shouldn’t mean more death marches, nor longer hours – quite the opposite. More projects delivered to the satisfaction of the customers, on time, on budget, hopefully working better than hoped for.

If on the other hand we can’t pull this off by discipline, better contracts, and better cooperation for all involved, then we simply have o cut down on the complexity of the projects. I know that whether you fail or succeed, you still earn the money along the way – but that is not the way I want to live and work. Return customers – the happy ones – are amazingly better customers, and they are replenishable resources, simply by the fact that they return and thus is not depleted.

If we build services that our customers will benefit from, then there should be no other hindrances to mutual benefit.

While this seems like a nobrainer, a win-win situation, why aren’t we already there? What can we do?

Unfortunately I’m not sure, but to start somewhere, I think we can start at the software development process. I’ve been reading the Danish Quality Model (DDKM in Danish) for healthcare. They have done a splendid job in providing reasons Why things must be done, and What the things should encompass, Who is responsible, but not How – this is left for each Hospital to decide. I really like that approach. The hospitals will require renewal of their certificates at least every 3 years. Furthermore, I read the Bonnerup report (in Danish) “Experiences from governmental IT projects – how to do it better?” ISBN:8790221567 from March 2001 – and the article (Still in Danish) “Authors of the Bonnerup-report 10 years later: None the wiser.”

Usually customers don’t quite know what they want until they see it – sometime though, they know exactly what they want, but have a hard time telling developers what it is. Communication is essential, and short iterations of continuous improvement, i.e. Plan-Do-Check-Act with the customer at hand seems essential. We know that getting everything right in the first try is almost impossible. I don’t think any golf player expects to play a perfect round.

The customer representative must be available, knowledgeable, and able to make decisions in all aspects. This will make the communication fluent as opposed to a lot of back and forth, who said what, and similar issues. If there is a reason to change a color, the customer representative should be able to make the call as opposed to having 5 meetings and a committee approve.

Decision makers must be invested in the project. Outside consultants who will earn money regardless of the progress of the project should be discouraged as decision makers.

Teams – on both sides – should be as small as possible to improve collaboration, communication, and understanding of the aspects in the project.

Project management must improve – staffing, estimates, communication, user participation, and post delivery follow up. It is important to learn from previous and others mistakes. At times you feel part of a Monty Python sketch sometimes the Architect Sketch, sometimes the “Meeting to take action” part of Life of Brian. Boehm and Turner discusses Alistair Cockburns notion of competence levels in the book “Balancing Agility and Discipline” ISBN:0321186125. A lawyer with specialty in IT projects mentioned reading meeting minutes from failed projects stating that “The project is delayed” as the sole content. Not why, not how long, not which actions are being taken to counter this. Later on the minutes would read “The project is greatly delayed.”

Developers must work on a single specifically defined atomic task. This makes it easier to describe why something is added to the solution. A task can touch upon several files, but doesn’t have to.

Developers must use version control. Each atomic task is committed to version control – preferably with the task description and Why it was added. This allows a nice readable history of the project, and an audit of the project at any point in time.

Developers must test their code. Not only to ensure correct behavior of the expected flow  and of the negated behavior, but – and this is more important – to guard against future changes. Tests for code makes it possible to freely try alternatives, and if you are stress testing, then it makes it possible to evaluate different configurations.

Have a vision, set goals. It is important for everyone to know which direction the project is heading in, and to know some of the milestones along the way. Accept that even the best laid out plans will fail – they do in sports, so why should we expect anything else from corporate business?

Third party competent people should audit the source code according to these rules and the project description, making it possible to give an adequate description of the projects health – much like an accountant should be able to read a companys ledger and estimate the financial soundness of the company. Third party because we need independent observers to be objective about the project.

If we cannot improve, then it must be imposed that the Minimum Viable Product becomes the Maximum accepted proposal for future projects. 

Should we strive for an accrediting institution certifying software companies on a yearly basis? I really hope not.

Resources

Update

TechRepublic had a blog entry IT projects: Why you need to fail more often, and perhaps this is actually one of the reasons why we don’t do any better: We keep on beating a dead horse as opposed to cutting the losses early and learn from mistakes made, both our own, but also those of our colleagues in the business.

I know, it is hard to keep a business running if you terminate projects early due to infeasibility. But no matter how far you go down a wrong road, going further or faster will not get you back on track.

Why Why is more important than What

torsdag, april 25th, 2013

When trying to understand a new concept the important thing to understand is not what the concept is, but why it exists. Thereby getting to the essence of the thing in itself.

This is probably why the 5 Whys is an important tool for root cause analysis and incident investigation albeit it doesn’t fit all purposes. But if it is a sequence of burrowing down to the core of an issue, then it is probably one of the better methods of examining unknown processes.

As in the story about the newlywed couple. One evening, the husband noticed than when his wife began to prepare a roast beef for dinner she cut off both ends of the meat before placing it in the roasting pan. He asked her why she did that. “I don’t know,” she said. “That’s the way my mother always did it.” The next time they went to the home of the wife’s parents, he told his mother-in-law about the roast beef and asked her why she cut off the ends of the meat. “Well, that’s the way my mother always did it” was her reply.

He decided that he had to get to the bottom of this mystery. So when he went with his wife to visit her grandparents, he talked to his grandmother-in-law. He said, “Your daughter and granddaughter both cut off the ends of the meat when they fix roast beef and they say, ‘That’s the way my mother always did it.’ How about you? Why do you cut the meat in this way?” Without hesitation the grandmother replied, “Oh, that’s because my roaster was too small and the only way I could get the meat to fit in it was to cut off the ends.” (I’ve heard it before, but the only text I could find was from The Everlasting Tradition on Google Books)

If you don’t know the root cause you may end up doing unnecessary work at best, but most likely limiting, and in worst case counterproductive and wasteful work.

Don’t ask people what they want or do, but why they want or do it. It’s just as Henry Ford said: “If I had asked people what they wanted, they would have said faster horses.” They would have asked for faster horses, because horses was something they knew about, and faster or stronger would make transportation better.

In the same vein, it is just as important to learn the reason behind, when embarking on a new project with unknown entities. In particular when starting on new software project, and especially for project managers on both sides of the table. You need to know what to deliver to be able to deliver it in the first place, you can’t tell a developer what you need, if you don’t know what it is, and you cannot accept or test the thing if you don’t know how it should behave.

If a feature has to be cut it is paramount that you can argue why that doesn’t impair the end product too much.

If a feature can be implemented in multiple ways, then the simpler should be opted for. If you don’t know the essence of the feature, you don’t know the feasible ways, and you may choose a too simple solution – these are the solutions which seems to almost work.

Going back to Ford’s quote, it is important that you know what to abstract and how to abstract it, e.g. “faster horses” to “faster means of transportation” and not “faster animals” – that would lead to trying to hitch a cheetah or a bear to a buggy.

As the character Forrest Gump is accustomed to say: “Stupid is as stupid does.” – if we don’t know better, then we do stupid things. If you know why you do things, you may have a chance not to act stupid.

When knowing why as opposed to just what, then you are closer to the Ha step of Shu Ha Ri, because you already know the mechanics, and you are armed with the path. You may not know which quantum leaps you have to make to diverge to another stable level, but at least you know whether a path is perpendicular to the current flow or perhaps an ever so slightly diverging path.

On a much more pragmatic level, it is better to know why a certain color or method is chosen, especially when the time to change it comes around. Which is why the “why” is a much better comment for source code than the “what” – which should be evident by the code itself. And if you have complete memory of the history of changes, you can check if we’re going in circles.