Archive for the ‘programming’ Category

Fragile agile

fredag, april 12th, 2013

Some people don’t “get” agile – then again perhaps it’s just me who doesn’t get it. At least here’s my take/rant on the issue:

Software development is an evolution of the software, thus you’re not really ever done – the product may be fine for now, but will later have to evolve to keep up with customer requests, e.g. new rules, new layout, new features. A product is not a boxed-in set of features for a given amount of time and money. It’s a concept with an evolving portfolio of features, and an evolving wish list.

Sometimes you have to provide a solution to a customer in order for them to better understand what they really want.

A project cannot be a boxed development of features for a set amount of time and money, as time goes by and new and possibly more interesting features evolve for the customer at hand. Well – perhaps a project could be to “change fonts” or “change colors” – but there’s no project which requires more than one task, well perhaps “change fonts and colors”. A Project becomes a single iteration or sprint.

There should be no need to argue whether a request should be in a project or not, except for technical reasons. If the request is there, then at some point in time the request will be fulfilled. If it’s of high value to the customer, then it’ll most likely be done earlier.

Estimates are wrong – by at least a factor of 2: I’m not saying we should work slower, I’m saying that we should work towards improving our legacy systems as well. A sprint is not called a sprint because you blindly go along full speed without concern for what you’re doing – that would be a demolition derby.

A sprint is a just a short distance with the goal easily visibly, and hopefully easily attainable. You go along a sprint as fast as you can while still maintaining the highest level of quality, not building any technical debt, thus highest sustainable pace. You know, that when the sprint is over, there’ll be another one. Thus there will be no time for rest, no time for clearing technical debt.

If a better way during a sprint is to refactor code to get a better structure, then this should be taken into account when the estimates are done – and the refactoring should be done. If not, then the code becomes more rigid – quite the opposite of agile and software.

Sometimes requests are disrupting the flow of the current code, then it is the developers responsibility to spot this and remedy the problem, otherwise the code ends up as an exquisite corpse. Sometimes Project managers think they need to shield developers from information thereby making the developers create exquisite corpses – don’t do that. Be on the same journey, work towards the same goal.

Agile done the wrong way: The team should work as a team. That is on the same features in the sprints backlog – if they’re not working on the same features, then collocation is pointless and so is team size limits. The customer should be passionate about the product, and be involved in the development. This is why the customer should not be represented by a random project manager. The developers should be passionate about the product, which is why decisions shouldn’t be made by random project managers.

The Chinese Whisper strategy of getting answers: Developer asking Lead asking Project Manager asking Customers Project Manager asking feature requesting employee – and sending the answer back in the reverse direction fails on at least 3 accounts:

  1. It takes a long time to get an answer.
  2. The question and answer may get corrupted along the way. Especially if
    any of the communication bridges are verbal. “You tell me they said
    ‘red’ but the task says ‘green’ – which is it? Or better yet: Can I get
    a hex-color for that?”
  3. Information may be lost or withheld along the way – deemed “not
    important” or “not relevant” by any party.

This would be alleviated by developer asking (onsite) customer directly – if the developer could show the customer representative then a lot of other questions are answered so much faster and so much better saving a lot of redoing a feature.

You may think I don’t like Project Managers – well, if there’s no project, then there’s no point in having a manager for it. My take is that the Project Managers, Project Secretaries – or SCRUM Masters – should remove obstacles in the software development process and make sure that the project/sprint gets finished on time. They should not create problems nor prolong the sprint by adding features. There’s probably still need for time/feature accounting thus I’m not entirely against them – they should just not be in the way of progress, and should keep their micromanagement tendencies and software estimates far away from the development and developers. This, of course, requires developers to prove that they are responsible professionals delivering estimates and solutions consistently and on time.

Crossing a River

fredag, april 5th, 2013

Introduction

For those who are not in the software development business, I have tried to create a parallel to an engineering task simple enough that most can see what some of the issues are.

In this story we have a customer who wants to cross a river. He has seen other people using boats to cross the river, but he doesn’t know anything about boats, he simply wants to be able to carry some merchandise from one shore to the other without getting it wet.

The software developers he hires have no business insights into the river crossing domain, but they do know that they can solve almost every problem.

Customer: “I want to cross the river without getting wet. I’ve seen boats, but I can’t be bothered with the hassle of sailing one.”

Developers: “River? Wet? Boat? What are those things?”

Here the customers animosity towards the apparently completely stupid developers germinate.

Customer: “Don’t you know anything? The river is this flowing barrier. Boats float somehow and can be rowed across using muscle strength. But as I said – I can’t be bothered with the hassle of sailing one.”

The developers taking note of the animosity don’t want to ask more questions trying to keep up appearances they conclude that they have sufficient information to start work and their own preliminary investigations on the subject.

Version 1.0

The developers set up an investigation on the objects mentioned. They get that wood will float, and that apparently the river flows carrying the wood downstream, which means that there is a possibility of some kind of raft can be carried across using this flow.

Another team tests the best launch site of the raft within the river to the highest probability that it can land on the other side. They conclude that starting close to the middle of the stream has a high probability of keeping the raft in the middle of the stream.

The developers come up with the following solutions covering all of the customers needs.

Initial design for crossing a river

Proudly they show this to the customer, who in turn is thrilled to see it in action. Though after getting to the other side he’s a bit stomped, and asks the developers: “How do I get back then?”

The developers are fluttered. There is no mention of getting back from the other side, just a simple crossing.

The customer argues: “But that goes without saying. If I take my merchandise from one side to the other I have to get back to resupply.”

Version 2.0

One developer suggests reversing the flow of the river. Naturally you can’t do that thus he shows the lack of domain knowledge of rivers. But if it was possible to reverse the flow, then this would be a viable idea.

Whether the customer is informed of this idea or the developers try it out for themselves, the result is greater animosity or wasted effort.

While the reverse solution seemed stupid, another developer reflect upon the concept and the original idea, then gets his Eureka moment: We’ll just add another dock on the first side further downstream – it’ll be the same as going from the first to the second side, just reversed.

The developers proudly present version 2.0

Getting back to the first side again

The now wary customer is thrilled that he can get back home, but now there’s an issue with reuse. “I want to be able to bring another shipment, but now the raft will just continue down the river.”

Version 3.0, 3.1, …

After mentioning that this was another thing not in the specification, the developers come up with several suggestions for solutions.

  1. Repeat the success. Build more docks alternating sides. While this works for a few round-trips the customer finds that the way back home becomes excruciatingly long. Or they end up wherever the river flows to.
  2. Skip the surplus docks, just keep the three from the second solution. Build new rafts at the start of every round-trip. While this works, and the way back home is constant, the price for a raft: Materials and time spent. Soon becomes unfeasible.
  3. Drag the raft from the last dock to the first. This can be done either by the customer or some other power source, e.g. a horse.

Dragging the raft back to the beginning

While the customer now has a reusable way of crossing the river, there are better alternatives.

Some of the downsides to this solution:

  • Cost – the developers worked on several solutions. If they got to version 3 above before multiple docks or rafts, they would at least have saved a lot of time and resources.
  • Time – while time is a cost, then time spent crossing the river in this fashion seems wasteful
  • Parallelism – everyone crossing must depart at the same time or wait a full revolution of the raft. Naturally more rafts can be issued, but then there’s another problem of cluttering up the river.

The “correct” solution would have been to build a bridge crossing the river.

Crossing the river using a bridge

While this should seem like a no-brainer to us, then there was a first bridge at some point.

The “funny” part is that this could be achieved at 2/3 the cost of the final solution as the docks are going halfway into the river.

Another “funny” part is that most likely then the developers analyzing the flow and suggesting the launch site to be he middle of the river, would probably have build a tiny bridge crossing the entire river from which they could launch probes. Quite possibly they would have build another bridge to measure where the probes went.

The bridge does have a drawback, which the dock solution doesn’t have: Tall things can pass through the river without tearing apart the solution.

To keep with this parallel. A new developer company – a set of bridge builders – shows up telling the customer that there is a better way to do things, namely a bridge. But the customer is quite happy with his solution. It might not be the best, but it works, and the cost/benefit of the current solution is still in the red.

Summary

So – what to take away from this story? First off, this is an extreme simplification of the issues in software development. It would probably have been more correct if the customer had been blind, as most software customers don’t understand the code making up the solution. But then some of the developers would have had to be blind, deaf, mute, and forgetful.

The customer is most likely the best candidate for an expert in the domain for which the developers are building a solution. On the other hand, then developers should be experts in the solution cracking and development domains.

Animosity and trying to keep up appearances are detrimental to the solution at hand. It is quite easy for the customer to think that the developers are stupid when they don’t know what he is trying to communicate. On the other hand, then the developers quite easily find themselves on a death-march project where it seems the customer keeps wanting more features for free.

The unsaid, not written down expected parts of solutions are always there – they may be given for those knowledgeable in the domain, but probably aren’t for others.

Anchoring terms in the conscience of anyone is dangerous if you want something completely different.

Communication is difficult

Trust has failed – time for control and assurance.

mandag, marts 18th, 2013

The general success rate of any software development project is at best 68% – just about 2 for 3, or Two Sigma – and we’re expecting reliability of services to be in the Nine nines or Six Sigma, then perhaps we should do all we can to at least get a single nine or improve onto the 3 Sigma.

Why is this an issue?

Primarily this is an issue because computing is pervasive and software is ubiquitous, interconnected, and in the hands of common people – not just experts. This means that the software solutions provide services for anything, and if mistakenly trusted – that is, trusted to provide the service without hiccups but failing to do so – then at best it is a nuisance to the user, at worst it is catastrophic.

While Apple Maps have some fun pictures – being misled into the Australian outback is probably not a minor nuisance. Putting drivers on the runway is not only dangerous for the driver or the people on board an airplane, but anyone in the vicinity.

Common sense should prevail – Darwin awards and whatnot.

Well, that would be an argument, but if you have to double or triple check all information, then the smart devices aren’t really helpful in anything but increased bewilderment. If all resources point to the same mistake, then any number of double check will fail. If you combine services, e.g. Siri + Apple map + Google self-driving car, then you are betting against the combined odds.

The common sense should be to trust the services provided. That the services are added to be helpful and not in the challenging sense. I know gamification is a hot topic the stakes should just not be the life of people. At the very least that is detrimental to most sales.

Professionalism

If any other business would have success rates of 2/3 – I’d be surprised if they would stay in business for long. I’m pretty sure I would get aboard an airplane if the success rate of getting to the right destination was only 2/3 – I’m not saying “on time”.

Some surgeons are trying to improve on their success rate targeting 97-99% instead of the 90-95% range they currently reside in – and we can’t even reach the 90% level.

Money

It seems funny that we have audit requirements – at least in Denmark – regarding the financial state of a company. It seems natural that the pervasive entity known as money is managed and controlled, checked and audited. The flows accounted for. But in the end it is only money, it is less pervasive than the elusive software.

I bought my computer with money some years ago, that is, I exercised my financial resources at that time for the specific purpose. Every day I turn on the computer, thereby exercising the operating system and various other software entities.

Money: Once. Hardware: Every day. Software: Every day

It wouldn’t seem like an unfair comparison to expect the same engineering effort and dedication to go into both software and hardware, but my hardware hasn’t failed yet, whereas my software has failed often and sometimes unbeknownst to me.

You might not see it, but that doesn’t mean it is not there.

Waste

Getting my hair cut my hairdresser wanted some profile information for their cash register/customer management system. As I had been there before my ID should recall all the information, but it didn’t. If they are spending 5 minutes per day on mistakes or circumventing oddities in flow dynamics of software, then they are wasting 20 hours per year, which naturally the customer will have to pay for in the end.

240 days at 5 minutes/day = 1200 minutes = 20 hours waste

I would rather pay the same and have the business blooming than know that the more software they add which isn’t really working, the more I have to pay to keep them in business.

As it turns out – they are using other software besides the CMS – at least 3 other programs. If they are at the same level of waste, then every year the hairdresser is wasting 80 hours approximately half a month or 4%.

No matter what you have as an acceptable wastage rate, starting out at 4% requires seems unacceptable to me.

4% correlates to the process fallout for Sigma level 2 in process capability index – we really should strive for a better world going for at least Sigma level 3 for starters.

Possible Solution

The other day my Windows 7 installed 16,000+ updates. I was baffled. I don’t think we can get to perfect, but we can strive towards it, we can do better.

If a financial audit exists to add credibility to whatever management does, then we should add a source code audit to add credibility to the software product for the benefit of the end user.

And the source code audit should be obligatory.

A certificate is insufficient to solve any of the issues.

Quality of (once) closed source code

mandag, marts 11th, 2013

Looking at one of the larger files in Libre Office:

com.sun.star.filter.config.tools.utils.Cache

This file has been removed in the above repository – and for good reasons.

I guess that the file  was once a closed source file as the com.sun package name suggests, and I found it only fair to expose some of the defective code for a closed source after writing about dotCMS, because – as I stated then: The problem is in both worlds.

Opening the Libre Office repository this Friday, spending some time to get statistics to suggest looking at a specific group of source files I found this. I must say it is immediate ugly when you just browse through the code. But once you really start to examine it you would have to be a bit masochistic to work on it.

The code is really horrible on several levels:

  • Coupling implementation types instead of interfaces, or Interface vs. specific class
  • Primitive Obsession
  • Non-enumerable enumerable
  • Magic Strings and numbers
  • Dynamic objects of known entities
  • Modification of parameters and poor understanding of Map
  • System.out.println + a logger available.
  • Cyclic dependency with XMLHelper
  • StringBuffer: Setting size, then using +
  • Correcting spelling mistakes, but apparently not all of them
  • Unrolled loops
  • Hungarian misnomer
  • Missing set functionality
  • Keeping current
  • Not conforming to standard Java code convention for packages
  • Sloppy copy-paste

Coupling implementation types instead of interfaces

Using HashMap as the field reference type is considered bad as it exposes too much specific knowledge of the object. As the variables are simply used as maps, there is no reason why they can’t be defined as Map.

That said the Maps shouldn’t really be raw types. Trying to change this reveals another issue: Dynamic Objects of know entities.

Primitive Obsession

No specific modeling is applied to the objects in the solution even though a lot of specific knowledge is assumed. This is bad as assumptions rarely hold up.

Non-enumerable enumerable

Flags, E_types, and CFG are specific enumerable types, but they are defined specifically, which usually is a disaster waiting to happen and unnecessarily hampers development.

Magic Strings and numbers

The code is littered with magic strings and numbers – even though there are constant definitions as well. This often leads to undesirable issues as a single type can go undetected.

Dynamic objects of known entities

This is a combination of the raw types mentioned under coupling and primitive obsession. The maps are used to store entities defined by the PROPS constants.

<sarcasm>If only there was a way in which one could  define an object of known entities – oh wait, that’s one of the benefits of a programming language.</sarcasm>

E.g. m_lTypes should be a Map<String, Map<String, Object>> if not sticking to the raw type. The keys to the value map are defined by:

Key

Value type

PROPNAME_CLIPBOARDFORMAT

String

PROPNAME_CONTENTHANDLER

String

PROPNAME_DETECTSERVICE

String

PROPNAME_DOCUMENTICONID

Integer

PROPNAME_EXTENSIONS

Vector<String>

PROPNAME_FRAMELOADER

String

PROPNAME_MEDIATYPE

String

PROPNAME_PREFERRED

Boolean

PROPNAME_PREFERREDFILTER

String

PROPNAME_UINAME

Map<String, String>

PROPNAME_UIORDER

Integer

PROPNAME_URLPATTERN

Vector<String>

m_lFilters is – on the surface – of the same type: Map<String, Map<String, Object>>, but the keys and types of the value map are defined as follows:

Key

Value type

PROPNAME_DOCUMENTSERVICE

String

PROPNAME_FILEFORMATVERSION

Integer

PROPNAME_FILTERSERVICE

String

PROPNAME_FLAGS

Integer

PROPNAME_ORDER

Integer

PROPNAME_TEMPLATENAME

String

PROPNAME_TYPE

String

PROPNAME_UICOMPONENT

String

PROPNAME_UINAME

Map<String, String>

PROPNAME_USERDATA

Vector<String>

m_lDetectServices, m_lFrameLoaders, and m_lContentHandlers are all of the type:

Map<String, Map<String, Vector<String>>>

And they seem to only really use PROPNAME_TYPES as the key to the value map.

Finally, m_lFilterToTypeRegistrations is on the following form:

Map<String, Vector<String>>

Besides that, it is not used as a field but as a local variable.

Modification of parameters

In the method convertDetectServicePropsToExternal and others have issues with side effects on the parameters. Whether this is to be a flaw of Java’s implementation of Map or the programmers lacking understanding is difficult to say.

private static java.util.HashMap convertDetectServicePropsToExternal(java.util.HashMap aMap   ,
                                                                     int               nFormat)
    throws java.lang.Exception
{
    java.util.HashMap aResultMap = null;
    switch(nFormat)
    {
        //-----------------------------------
        case FORMAT_60 :
        {
            // no changes!
            aResultMap = aMap;
        }
        break;

        //-----------------------------------
        case FORMAT_6Y :
        {
            // remove localized name
            aResultMap = aMap;
            aResultMap.remove(PROPNAME_UINAME);
        }
        break;

        //-----------------------------------
        default :
            throw new java.lang.Exception("unknown format");
    }
    return aResultMap;
}

Probably the intention was to return a different instance with the same basic values as the given map. The problem is that the assignment, aResultMap = aMap, makes the pointer for either point to the same data structure in memory, thus the call to remove removes the key and value from whatever the pointer is pointing to, thereby removing it from the original as well. What’s worse is that the pointer is returned.

The solution: Use putAll()

private static Map convertDetectServicePropsToExternal(Map aMap, int nFormat){
  if (!(nFormat == FORMAT_60 || nFormat == FORMAT_6Y)) {
    throw new IllegalArgumentException("unknown format: " + nFormat);
  }
  Map aResultMap = new HashMap();
  aResultMap.putAll(aMap);

  if (nFormat == FORMAT_6Y) { 
    aResultMap.remove(PROPNAME_UINAME); 
  }
  return aResultMap 
}

System.out.println + a logger available.

If you want to output information, then you should make sure that the information is available to be read. System.out.println() really isn’t doing that and a logger ought to be used instead. As it stands, then the class is actually using a logger, but on top of that there are still calls to System.out.println.

Cyclic dependency with XMLHelper

Cache uses XMLHelper to extract elements from XML – but XMLHelper uses some of the constants defined in Cache. While this seems to be a minor issue, the problem becomes evident when XMLHelper needs to help something else adapting the constants from that entity, which Cache then will be indirectly depending on. Thus starts a big ball of mud.

StringBuffer: Setting size, then using +

StringBuffer is used to append values in a way to circumvent String concatenation, which is costly due to Strings immutability. So if you are using StringBuffer, and you start defining the initial buffer size, then please don’t misuse the purpose for which StringBuffer was intended.

There is a probability that the compiler will spot the less than splendid blemish and help you out.

Correcting spelling mistakes, but apparently not all of them

It seems that the code – under the Libre Office banner – has been modified a handful of times, mostly due to spelling mistakes in the comments. This is not the biggest issue in the code base but if you are really adamant about the text quality, then please fix all the mistakes.

“How these entries will be readed or written can be switch in different modes.”

“superflous”

“They dont have any valid document service name set and cant be handled”

Unrolled loops

It puzzles me that the same code is repeated instead of added to a method. This kind of unrolling or in-lining can be seen in toHTML and other methods, e.g. convertFilterFlagNames2Values and convertFilterFlagValues2Names, which we shall dive into later.

We see something similar in the toXML method where there is a loop over the magic values 0 to 5, which actually corresponds to the enumerable eType, which means that adding a new type you will likely miss this loop termination.

java.util.Iterator aIt = m_lTypes.keySet().iterator();
while (aIt.hasNext())
{
    java.lang.String  sType = (java.lang.String)aIt.next();
    java.util.HashMap aType = (java.util.HashMap)m_lTypes.get(sType);

    sRelationView.append("<tr>");
    sRelationView.append("<td>"+sType+"</td>");

    java.lang.String sVal = (java.lang.String)aType.get(PROPNAME_DETECTSERVICE);
    if (sVal == null || sVal.length()<1)
        sRelationView.append("<td> - </td>");
    else
        sRelationView.append("<td>"+sVal+"</td>");

    sVal = (java.lang.String)aType.get(PROPNAME_PREFERREDFILTER);
    if (sVal == null || sVal.length()<1)
        sRelationView.append("<td> - </td>");
    else
        sRelationView.append("<td>"+sVal+"</td>");

    sVal = (java.lang.String)aType.get(PROPNAME_FRAMELOADER);
    if (sVal == null || sVal.length()<1)
        sRelationView.append("<td> - </td>");
    else
        sRelationView.append("<td>"+sVal+"</td>");

    sVal = (java.lang.String)aType.get(PROPNAME_CONTENTHANDLER);
    if (sVal == null || sVal.length()<1)
        sRelationView.append("<td> - </td>");
    else
        sRelationView.append("<td>"+sVal+"</td>");

    sRelationView.append("</tr>");
}

Could be replaced by something ugly but less verbose.

Iterator<String> aIt = m_lTypes.keySet().iterator();
while (aIt.hasNext())
{
    String  sType = aIt.next();
    Map aType = (Map)m_lTypes.get(sType);

    sRelationView.append("<tr>");
    sRelationView.append(tableData(sType));
    sRelationView.append(tableData((String) aType.get(PROPNAME_DETECTSERVICE)));
    sRelationView.append(tableData((String) aType.get(PROPNAME_PREFERREDFILTER)));
    sRelationView.append(tableData((String) aType.get(PROPNAME_FRAMELOADER)));
    sRelationView.append(tableData((String) aType.get(PROPNAME_CONTENTHANDLER)));

    sRelationView.append("</tr>");
}
private String tableData(String sVal) {
    StringBuffer sb = new StringBuffer();
    sb.append("<td>");
    if (sVal == null || sVal.length()<1)
        sb.append(" - ");
    else
        sb.append(sVal);
    sb.append("</td>");
    return sb.toString();
}

Hungarian misnomer

My personal belief is that Hungarian notation is misleading and verbose – for strongly typed languages – in the same manner that method comments can be misleading if not updated in unison with code changes.

The code has several misleading Hungarian notations, e.g.

/** list of all located types.
*  Format: [string,HashMap]
*/
private java.util.HashMap m_lTypes;

Maps should – according to http://msdn.microsoft.com/en-us/library/aa260976(v=vs.60).aspx – be mpXY prepended thus for the map at hand it would be:

private java.util.HashMap m_mpSMPTypes;

Whether the internal maps types should be a part I have no idea. I would rather have the Java notation correct, that is:
private Map<String, Map<String, Object>> types;

And if fixing another issue – namely that which is caused by the Dynamic entities – where the Map<String, Object> really ought to be its own entity, e.g. Type, then:

private Map<String, Type> types;

But if there is a necessity for Hungarian notation, then stick to it.

Missing set functionality

Often the programming language used has an implication on the solution provided, sometimes for the good, e.g. QuickSort is easy in a functional language, e.g. Haskell (see http://www.haskell.org/haskellwiki/Introduction#Quicksort_in_Haskell).

quicksort :: Ord< a => [a] -> [a]
quicksort [] = []
quicksort (p:xs) = (quicksort lesser) ++ [p] ++ (quicksort greater)
  where
    lesser  = filter (< p) xs
    greater = filter (>= p) xs

Sometimes, though, the implication is for the worse. Javas lack of native support for Sets and Maps make developers choose weird constructs instead. Such as the if-construct in validateTypes:

if (
    (!aType.containsKey(PROPNAME_MEDIATYPE      )) ||
    (!aType.containsKey(PROPNAME_PREFERRED      )) ||
    (!aType.containsKey(PROPNAME_CLIPBOARDFORMAT)) ||
    (!aType.containsKey(PROPNAME_DOCUMENTICONID )) ||
    (!aType.containsKey(PROPNAME_URLPATTERN     )) ||
    (!aType.containsKey(PROPNAME_EXTENSIONS     )) ||
    (!aType.containsKey(PROPNAME_UINAME         ))
)
{
    throw new java.lang.Exception("Type \""+sType
        + "\" does not contain all neccessary properties for a 6.0/6.Y format.");
}

The basic idea is that the type must have all the keys set. Naturally if the type was a real object and not a Map, the validation check should be on that entity.

If on the other hand the properties test were in a Set, e.g. TYPE_REQUIRED_KEYS, then:

StringBuffer missingKeys = new StringBuffer();
for (String required : TYPE_REQUIRED_KEYS) {
    if (!aFilter.containsKey(required)) {
        missingKeys.append(required).append(",");
    }
}
if (missingKeys.length() > 0) {
    throw new Exception("Type \"" + sType
    + "\" does not contain all necessary properties for a 6.0/6.Y format: "
    + missingKeys.toString());
}

Would provide the same check, but it would be guarded against extension, and the exception thrown hold more information.

Furthermore, the code would do one thing and one thing only: Check the validity. Not checking the validity and keeping track of what constitutes required keys.

Keeping current

It may seem like a weird requirement to keep current and update code which works just for the sake of being current.

This is a simple amortized development cost. Doing a little work today to keep the code current helps you to not face an impossible leap once the backwards compatibility is dropped. It keeps you going forward, and you will have a current understanding of the codebase. That is, you won’t have any weird blobs of code that only one programmer knows how to use. You alleviate some of the dependencies of people and libraries.

At another time scale this is comparable to having documents in hieroglyphs along with modern writing. If Shakespeare is difficult to understand, then perhaps the language has just moved along.

These are the reasons for keeping current, which means that todays’ codebase should be Java 7 compliant and at least Java 6 compliant.  Which again means that enums and generics should be used, as should the enhanced for-loop, which should kill off a lot of the iterators throughout the code.

Not conforming to standard Java code convention for packages

There lots of ways a Java project can be defined, but apparently coming out of Sun the least they could have done was to map to the natural layout of each package element matching a folder. This is not the case here, where the com.sun.star.filter.config.tools.utils package maps to the l10ntools/source/filter/utils folder.

For the entire Libre Office polyglot project there are even other weird twist and turns in which test code is on the source path. I haven’t seen that recently.

Sloppy copy-paste

The code you look at today tells you how you will write the code tomorrow. That is, if you are looking at ugly hacks today, then tomorrow you will make new ugly hacks.

If something looks like copy and paste, then that is the way you do it. If you’re not quite aware of what you are doing, then you’ll be sloppy and miss out on a few needed alterations. E.g.

if((field & FLAGVAL_COMBINED) == FLAGVAL_SUPPORTSSELECTION)
    lFlags.add(FLAGNAME_SUPPORTSSELECTION);

This check will always fail, as FLAGVAL_COMBINED doesn’t have any matching values with FLAGVAL_SUPPORTSSELECTION. It should probably have been:

if((field & FLAGVAL_SUPPORTSSELECTION) == FLAGVAL_SUPPORTSSELECTION)
    lFlags.add(FLAGNAME_SUPPORTSSELECTION);

Even so, this is only half of one of the related problems. The code resides in convertFilterFlagValues2Names, but there is a reverse mapping required in convertFilterFlagNames2Values. The problem though is that in that method there is no such check.

One solution would be to have the flags as an enum, and have the two bidirectional mappings work on the entirety of the enum scope. That would allow an addition to the enum to automatically cater for the bidirectional mapping without having to touch up everywhere.

public enum Flag {

    THIRDPARTYFILTER  (0x00080000, "3RDPARTYFILTER"),       // 524288
    ALIEN             (0x00000040, "ALIEN"),                // 64
    ASYNCHRON         (0x00004000, "ASYNCHRON"),            // 16384
    BROWSERPREFERRED  (0x00400000, "BROWSERPREFERRED"),     // 4194304
    CONSULTSERVICE    (0x00040000, "CONSULTSERVICE"),       // 262144
    DEFAULT           (0x00000100, "DEFAULT"),              // 256
    EXPORT            (0x00000002, "EXPORT"),               // 2
    IMPORT            (0x00000001, "IMPORT"),               // 1
    INTERNAL          (0x00000008, "INTERNAL"),             // 8
    NOTINCHOOSER      (0x00002000, "NOTINCHOOSER"),         // 8192
    NOTINFILEDIALOG   (0x00001000, "NOTINFILEDIALOG"),      // 4096
    NOTINSTALLED      (0x00020000, "NOTINSTALLED"),         // 131072
    OWN               (0x00000020, "OWN"),                  // 32
    PACKED            (0x00100000, "PACKED"),               // 1048576
    PREFERRED         (0x10000000, "PREFERRED"),            // 268435456
    READONLY          (0x00010000, "READONLY"),             // 65536
    TEMPLATE          (0x00000004, "TEMPLATE"),             // 4
    TEMPLATEPATH      (0x00000010, "TEMPLATEPATH"),         // 16
    USESOPTIONS       (0x00000080, "USESOPTIONS"),          // 128
    COMBINED          (0x00800000, "COMBINED"),             // 8388608
    SUPPORTSSELECTION (0x00000400, "SUPPORTSSELECTION");    // 1024

    private int bitpattern;
    private String name;

    private Flag(int bitpattern, String name) {
        this.bitpattern = bitpattern;
        this.name = name;
    }

    public static int toBitPattern(List<Flag> flags) {
        int val = 0;
        for (Flag f : flags) {
            val |= f.bitpattern;
        }
        return val;
    }
    public static List<Flag> toList(int bitpattern) {
        List<Flag> flags = new ArrayList<Flag>();
        for(Flag flag : Flag.values()) {
            if ((flag.bitpattern & bitpattern) == flag.bitpattern) {
                flags.add(flag);
            }
        }
        return flags;
    }
}

Most likely the name field isn’t needed anymore.

Conclusion

It does not matter whether code is closed source or open source, there is ugly and flawed in both worlds. Most likely there is something beautiful in both as well.

I just cannot accept the poor quality we seem to use as foundation for almost any profession. There is sloppiness an hacks all around. Expensive bugs and clunky solutions, which make everyone's daily digital life more frustrating than it possibly has to be.

You cannot improve on the foundation - the quality of a solution with dependencies is deteriorating like compound interest with a negative rate. If your dependencies accumulate to 90% the best you can hope for is to do 100% yourself, leaving the resulting solution at 0.9*1.0 = 90%. If - on the other hand - you are only delivering 90% quality yourself, the result will be 0.9*0.9 = 81%

Could I do it better? No, most likely not. I'm not claiming to write perfect code. What is frustrating though is that I can easily spot the massive pitfalls and the ugliness of solutions.

Given the short amount of time I've spent on this and the ease of which it was to find a problematic file my estimate is that there is too much poor quality code in the world and not enough people who cares. Just remember that the code used to handle your flight and luggage, audit your tax returns, ship your parts, handle your bank account, etc. is probably no better.

dotCMS short code inspection

onsdag, marts 6th, 2013

Under the impression that I was welcome to further comment on the dotCMS  source code (see my post Programming vs. Flow steps), I created a brief inspection.

While the source code under scrutiny is dotCMS I’m positive that similar issues can be found in most other source code repositories, thus whether or not you are affiliated with dotCMS you could probably benefit from reading the report (it is a 30 page PDF created with DocBook).

As I created some graphs which don’t fit readable onto a page I uploaded them to my Dropbox as well.

Commit buddies

The conclusion:

Pointing fingers is easy. I hope I have not offended anyone at least it was the intention to only show some of the things which are wrong with a software project. And most likely what is wrong with most software projects.

I didn’t touch upon the more stringent theoretical issues with cyclic dependencies – of which I’m sure there are plenty based on what we saw in Chapter 2, Changes. Nor did I go into functional points, cyclomatic complexity, length of methods, combinatorial signatures, and others such code quality issues, as I find these to be too specific for the code base and there was plenty of other – more general – issues to look at.

The cost of software defects (see sourceninja) is immense. Here I have provided a brief overview of some of the low hanging fruits for remediating some of the issues in dotCMS. These methods are free of charge and generally applicable.

If  the  numbers  are  correct,  then  I’ve  spent  a  few  days  identifying  14  issues  worth  approximately $200,000 – and there are plenty more. Please don’t tell me that you don’t have the resources to go over the code base.

By the way, if I read the infographic correctly, then it is a yearly cost per defect until fixed.

It  could  be  that  it  is  not  your  money  you  are  saving,  but  someone  furhter  down  the  dependency hierarchy.

You may think that I’m just too shallow in my examination, that I don’t know the intricate details which make up the foundation for the decisions made. You would probably be right. I consider myself an average person with average skills – but does that change the results I have provided? It was never my intention to find and fix all bugs.

I’m all for the hard working people out there providing services for all of us, some of the services are even free. If I am that upset with dotCMS I could simply chose something different. While I could chose something different my point is: I don’t think it would be a better choice – merely a different one with bugs, errors, and warts in other places.

I might not even know whether or not I’m making a choice for or against one product or another, as e.g. dotCMS could be the foundation upon which other services are built.

This is a call to arms, people. Stop accumulating technical debt. Help keep the World a clean and sane place. The price today is high, but it is lower than it will be tomorrow.

Programming vs. Flow steps

onsdag, februar 27th, 2013

Looking at source code from different developers – including myself – it has come to my attention that for some reason bright people tend to fall into the not as bright category when it comes to correct placement of the actions.

Programming is not simply appending steps to be performed to a Turing machines tape. That’s basically the entire idea behind programming language – not just high level programming languages. You can separate functional units and call these units from elsewhere. The execution of the program by the computer is a stream of actions, but that doesn’t mean that the source code has to be.

What I’ve seen seems to be developers eager to solve a problem. Having the solution in their head, which steps are needed, they type it into the class or source file, where they are, then move on to the next problem to be solved.

What we miss is the urge to stand back and look at the bigger picture. And in addition ask ourselves: “Is this right? Should these steps be here or somewhere else?”

We have the need for some functionality of an object, which it doesn’t yet provide. Sometimes the object is within our realm. Sometimes it is third party or language intrinsic. In any case we simply dump the ‘solution’ where it is needed with no regards to coupling or cohesion. This leads to promiscuous objects and parts of violations scattered over several different objects. In addition each objects internal complexity rises because it has to know of the internals of several other objects. We simply fail to adhere to the Single Responsibility Principle.

Usually this grows incrementally over time as new issues are brought up, e.g. the requirement to support a new database.

if (DbConnectionFactory.getDBType().equals(DbConnectionFactory.MSSQL))
    reindexJournalCleanupSql.append("DELETE FROM dist_reindex_journal " +
     " WHERE time_entered < DATEADD("+ type.toString() +", "+ sign + "" + time +", GETDATE()) ");
else if(DbConnectionFactory.getDBType().equals(DbConnectionFactory.MYSQL))
    reindexJournalCleanupSql.append("DELETE FROM dist_reindex_journal " +
     " WHERE time_entered < DATE_ADD(NOW(), INTERVAL "+ sign + "" + time +" " + type.toString()+") ");
else if(DbConnectionFactory.getDBType().equals(DbConnectionFactory.POSTGRESQL))
    reindexJournalCleanupSql.append("DELETE FROM dist_reindex_journal " +
     " WHERE time_entered < NOW() "+ sign + " INTERVAL '"+ time +" " + type.toString()  +"' ");
else if(DbConnectionFactory.getDBType().equals(DbConnectionFactory.ORACLE))
    reindexJournalCleanupSql.append("DELETE FROM dist_reindex_journal " +
     " WHERE CAST(time_entered AS TIMESTAMP) <  CAST(SYSTIMESTAMP "+ sign +
                             "  INTERVAL '"+time+"' "+ type.toString() + " AS TIMESTAMP)");

From com.dotcms.journal.business.ESDistributedJournalFactoryImpl.distReindexJournalCleanup()

When you go from one database to two or more, the easy way is to introduce an if-else clause around the differences, but this is wrong. It’s an extremely easy hack, but the technical debt is immense. The next maintenance developer coming along to add yet another database will simply adopt the poor decision and add an extra “else if”

Hopefully people have stopped doing the parallel for languages and are using i18n.

One of the weirder things is that apparently the check for database type is in such a great need that it has a specific implementation, which is used in the following snippet.

void fixNotNull() throws SQLException {
    // contentlet_version_info => working_inode & lang
    DotConnect dc=new DotConnect();
    if(DbConnectionFactory.isMsSql()) {
        // MS SQL
        dc.executeStatement("ALTER TABLE contentlet_version_info ALTER COLUMN lang numeric(19,0) NOT NULL");
        dc.executeStatement("ALTER TABLE contentlet_version_info ALTER COLUMN working_inode varchar(36) NOT NULL");
    }
    else if(DbConnectionFactory.isOracle()) {
        // ORACLE
        dc.executeStatement("ALTER TABLE contentlet_version_info MODIFY (lang NOT NULL)");
        dc.executeStatement("ALTER TABLE contentlet_version_info MODIFY (working_inode NOT NULL)");
    }
    else if(DbConnectionFactory.isMySql()) {
        // MySQL
        dc.executeStatement("ALTER TABLE contentlet_version_info MODIFY lang int8 NOT NULL");
        dc.executeStatement("ALTER TABLE contentlet_version_info MODIFY working_inode varchar(36) NOT NULL");
    }
    else if(DbConnectionFactory.isPostgres()) {
        // PostgreSQL
        dc.executeStatement("ALTER TABLE contentlet_version_info ALTER COLUMN lang SET NOT NULL");
        dc.executeStatement("ALTER TABLE contentlet_version_info ALTER COLUMN working_inode SET NOT NULL");
    }
}

From com.dotmarketing.startup.runonce.Task00840FixContentletVersionInfo

Using the isMsSql() we have:

And using MSSQL we have:

You may have spotted that actually 3 files contain both:

Which is a bit puzzling, perhaps a clean-up wasn’t seen all the way through. But I digress.

My point is that classes should adhere to OOP principles, information hiding, and clearly defined boundaries. A bean with getters and setters for all fields is not supposed to be used as a grouping of global variables. No matter how promiscuous an object seems - violation of its privacy, i.e. rape, is not allowed.

If the current object has a different responsibility, then delegate the work to another object. This way the class does what the name indicates.

20 questions

Sometimes the code indicates that the steps are in the wrong place. This is usually indicated by asking another object - often a parameter - 20 questions, or at least getting a lot of the intrinsic values, and then deciding which steps to take. The object being questioned will seem promiscuous and have getters and setters abound with little or no other methods, and a plethora of updates.

As an example of an indication of too much knowledge on the wrong side of the boundary we can look at com.dotmarketing.util.UtilHTML:

public static String getStatusIcons(Versionable v) throws DotStateException, DotDataException, DotSecurityException{
    StringBuffer buf = new StringBuffer();
    if(!v.isLive() && v.isWorking() && (!v.isArchived() && v.isWorking())){
        buf.append("<span class='workingIcon'></span>");
    }else{
        buf.append("<span class='greyDotIcon' style='opacity:.4'></span>");
    }
    if(v.isArchived()){
        buf.append("<span class='archivedIcon'></span>");
    }else if(APILocator.getVersionableAPI().hasLiveVersion(v)){
        buf.append("<span class='liveIcon'></span>");
    }else{
        buf.append("<span class='greyDotIcon' style='opacity:.4'></span>");
    }

    if(v.isLocked()){
        buf.append("<span class='lockIcon'></span>");
    }
    return buf.toString();
}

public static String getVersionStatusIcons(Versionable v) throws DotStateException, DotDataException, DotSecurityException{
    StringBuffer buf = new StringBuffer();
    if(v.isWorking()){
        buf.append("<span class='workingIcon'></span>");
    }
    if(v.isLive()){
        buf.append("<span class='liveIcon'></span>");
    }

    return buf.toString();
}

Looking at these two utility functions it becomes clear that:

  • The utility function knows more of the Versionable than is absolutely necessary, but apparently not enough. The 20 questions style: isWorking? isLive? is???
  • The difference between the functions getStatusIcons and getVersionStatusIcons not quite clear when one or the other should be used.
  • There is a bit of a discrepancy between which icons are shown given the same status. I.e. if your version isWorking and isLive, then getStatusIcons will give you greyDotIcon possibly followed by a liveIcon, whereas getVersionStatusIcons will give you workingIcon followed by liveIcon.
  • Boolean expressions are apparently illusive. I.e. the expression !v.isLive() && v.isWorking() && (!v.isArchived() && v.isWorking()) is equivalent to !v.isLive() && v.isWorking() && !v.isArchived() - unless there are timing issues with the isWorking() call, in which case it’s even worse to add the second check.

Utility classes are quite often prone to a too wide reach of functionality. There is a simple need of some sort of utility, creating a new class with just this single function seems too much work, and so the function is added to an existing class with a wishy-washy name. And thus starts another Big Ball of Mud

Why bashing on dotCMS?

It is not my intention to bash on dotCMS or open source. I’m simply using dotCMS as an example, I’m confident there are a lot of source codes out there in about the same state, some worse, and a few better.

The main reasons why I chose dotCMS:

  • The code is open source, which means it is available for anyone wanting to read it.
  • I didn’t write any of it. I’m not affiliated in any way.
  • The code seems like a fine example of Java code for average projects.
  • Looking at several CMS systems I came by it and scratched the surface a bit.

Fixing it?

As it is open source I could be helping rather than pointing, but no. I think my opinions are better left not fighting against hard working developers, and my work is better left fighting for my opinions.

I am tempted to analyse the code quality in greater detail though.

That said, the first step of my “solution” to the first issue above:

if (DbConnectionFactory.getDBType().equals(DbConnectionFactory.MSSQL))
    reindexJournalCleanupSql.append("DELETE FROM dist_reindex_journal " +
     " WHERE time_entered < DATEADD("+ type.toString() +", "+ sign + "" + time +", GETDATE()) ");
else if(DbConnectionFactory.getDBType().equals(DbConnectionFactory.MYSQL))
    reindexJournalCleanupSql.append("DELETE FROM dist_reindex_journal " +
     " WHERE time_entered < DATE_ADD(NOW(), INTERVAL "+ sign + "" + time +" " + type.toString()+") ");
else if(DbConnectionFactory.getDBType().equals(DbConnectionFactory.POSTGRESQL))
    reindexJournalCleanupSql.append("DELETE FROM dist_reindex_journal " +
     " WHERE time_entered < NOW() "+ sign + " INTERVAL '"+ time +" " + type.toString()  +"' ");
else if(DbConnectionFactory.getDBType().equals(DbConnectionFactory.ORACLE))
    reindexJournalCleanupSql.append("DELETE FROM dist_reindex_journal " +
     " WHERE CAST(time_entered AS TIMESTAMP) <  CAST(SYSTIMESTAMP "+ sign +
                             "  INTERVAL '"+time+"' "+ type.toString() + " AS TIMESTAMP)");

Would be to add a method:

String getReindexJournalCleanupSQL(String type, String sign, String time) {
    Properties sqlProperties = SQLProperties.get(DbConnectionFactory.getDBType());
    return sqlProperties.get("reindex-journal-cleanup")
                        .replaceAll("%time%", time)
                        .replaceAll("%sign%", sign)
                        .replaceAll("%type%", type);  
}

Then replace the first section with:

    reindexJournalCleanupSql.append(getReindexJournalCleanupSQL(type, sign, time));

Yes, this is a signature which falls under the category of having primitive obsession, but the entire idea is to fiddle about with strings and return a string.

Where SQLProperties is a Map<h;String, Properties> loaded up with the query strings with placeholders in. Thus you would have an entry MSSQL -> Properties where the 'reindex-journal-cleanup' key points to the following value:

"DELETE FROM dist_reindex_journal WHERE time_entered < DATEADD(%type%, %sign%%time%, GETDATE())"

Which means you have all the SQL for a single database type in a single property file, all you have to do to add a new database is add a new property file with the correct SQL commands for all the entries. That is hard work, but it beats having to add another else-if clause everywhere.

If this had been done from the beginning it would have been over-architected and thus a waste of time.

The second step would be to look at the greater picture.

If the method works, then possibly a more general solution is warranted, e.g.

String getSQL(String key, Map substitutions) {
    Properties sqlProperties = SQLProperties.get(DbConnectionFactory.getDBType());
    String template = sqlProperties.get(key);
    for(Map.Entry substitution : substitutions.entrySet()) {
        template = template.replaceAll(substitution.getKey(), substitution.getValue());
    }
    return template;
}

This solution has nothing to do with what the properties are called, how many replaceable entries there are, how these entries are separated in the property string - which naturally must be used in the same manner in the calling code, which is reworked into:

    Map substitutions = new HashMap();
    substitutions.put("%time%", time);
    substitutions.put("%sign%", sign);
    substitutions.put("%type%", type.toString());
    reindexJournalCleanupSql.append(getSQL("reindex-journal-cleanup", substitutions));

The third step: Reiterate, i.e. GO TO 2.

Conclusion

Programming is not the same as listing the flow steps as they would be executed. Anyone with time on their hands can go through the steps of what is required, e.g. Gary Larson’s "First pants, then shoes.". Recipes, TV schedules, and games all fall in the category of flow steps. Making the right decisions and grouping the elements is where it is harder to program than combining flow steps sequentially.

Basically, what I’m saying is: GO TO is still considered harmful, but delegation is not. And while we’re not accustomed to GO TO anymore, we still create the same haphazard styled source code.

It seems most programmers lack the discipline expected of any professional. This is probably due to the facts that:

  • The programs work somewhat as expected.
  • The source code is hidden from - or not understood by - management. Thus fast is better than correct. It is like robbing to solve an issue of lacking funds. It may solve the problem in theory or for a short period, but it’s not a sound way.
  • The big picture is hidden from view of the developers sprinting along to create a working solution within a time bound. Thus knowing what is right at the current time and when to switch is not possible.
  • Estimation of software development apparently lacks the idea of continuous improvement. The notion that you’ll know more about the solution when you’re done than you did in the beginning. That software is a continuous maintenance job, and if you accumulate technical debt you will suffer.
  • There are differences between inventing and creating new functionality and adding support for a wider audience.
  • Evolution is difficult. What was right yesterday may be wrong tomorrow.
  • Engineering practice parallels fail to account for reusability, yet somehow we’re measured using the same yardstick.

Big Ball of Mud with physics

mandag, februar 25th, 2013

Being a fan of Foote and Yoder’s Big Ball of Mud paper. Well, fan in the sense that I think it is a correct observation of most software developed, yet not liking that fact.

So I have been pondering on this – donning my amateur physicists hat I have contemplated some of the issues.

Tournaround

Let’s say that we have a team of developers pushing the project forward. Let’s call them Sisyphus. They have a development velocity, v. The ball of mud has a radius, r, which is comparable to the number of features in the product. The angular veloticy, ω, is proportional to the frequency of revolution, let’s denote this as the version number.

ω = v/r

That is, as the radius grows, then – given the velocity is constant – the angular velocity reduces, which in turn means that the versions will be harder to crank out.

Work

Work, W, is force, F, times the distance, s, which in turn is the velocity times the time passed.
The force is mass times acceleration, and the mass is density, ρ, times the volume, V.

W = F Δs
Δs = v t
F = m a = ρV a

The volume of a sphere is 4Ï€ r3/3

ball-work

Thus the work is proportional to the radius of the sphere cubed.

The interpretation

If something requires twice the number of features, then the work required will be 23 = 8 times greater or run at 1/8 the velocity.

If – on the other hand – this could be done in two disjoint balls of mud, it would only require twice the amount of work.

More generally, we could probably divide the features of the big ball of mud into two smaller balls having different radii, R and r.

The Volume of the original ball with radius R+r will be:

ball-volume

With the volumes of each of the lesser balls as:

balls-volume

Effectively being ball-bigger smaller than the larger ball. We are actually saving more than 6 times the volume of the smallest ball.

ball-excess

If we split the big ball into 80% and 20% – which is likely, then the combined volumes of those two smaller balls would save
ball-save
48% of the original volume.

If volume is comparable to complexity, then we will save complexity and development effort in keeping the balls as small as possible. Nothing new there.

biball

This is just another argument for low coupling and high cohesion. But it is also a reminder that we must look at our code from a different perspective from time to time. Are classes adhering to the Single Responsibility Principle?

Disjoint Communication

If the Big Ball of Mud is split into several smaller balls, they would probably need to communicate. This would have been some of the intrinsic complexity we shaved off.

Let us try to add it in the most stupid form possibly: A complete graph in which each and every ball will know and communicate with each and every other ball.

Renaming the radii having R as the radius of the original Bug Ball of mud, and divide it into n = R/r balls of approximately the same diameter, r. Then taking the added complexity into account we get n(n-1)/2 additional entanglement. Leaving us with:

ball-entangled

Which about the original volume divided by 2n.

As a complete graph is just inverting the big ball of mud, there is a minimum of n-1 connections for a completely connected graph. That is, the refactored ball of mud could – at best – save an additional factor of n.

Different distributions

What if the balls cannot be split using an equal distribution? Perhaps we should contemplate a more natural distribution such as a power law.

Doubling down

Dividing the original radius into n different sizes each half the length of the previous, but with twice as many balls.

Dividing into n=4 different sizes we get:

1/4 + 2/8 + 4/16 + 8/32 = 4/4 = 1

That is, we get 24-1 = 15 balls, the biggest of radius R/4.

The general formula for volume per size is now:

V(c,r) = 4/3 π c r3

Where c is the number of balls with the radius, r.

As the communication is no longer of equal size, we have to do some more work. The rectangle spanned by the two communicating parties radii seems a fitting proporsition.

We end up with the following formula for added complexity due to full communication needs:

ball-comm-sum

As R is a factor of all radii, we see that Vcomm is an area in R2, whereas the volume, V, is a volume in R3. This means that we shouldn’t be allowed to add them, but as we’re not concerned with this type of detail we add them anyway but notice that the effect is dependent upon the initial radius, R.

Using the relative complexity as the yardstick, we can plot the effect of R and n. Remembering that we’re getting 3 balls for n=2, 7 balls for n=3, and 15 balls for n=4.

From the graph it looks as though the initial radius doesn’t matter as much.

Furthermore, it seems that the best value would be to split into 2 or 3 different sized smaller balls. Going beyond seems futile as the gain from splitting the largest ball left would probably be bigger – given that the initial radius doesn’t matter as much.

Without full communication needs the relative complexity would be even lower.
doubledown

Conclusion

While it is human nature to hoard development into a single monolithic big ball of mud. It does pay off to break down the monolith to a certain degree. Whether for sanity, savings, or speed.

The notion of equal distribution – which corresponds to disection into primitive obsession – is luckily reverted by the doubling down distribution. Which is likely to be a more natural division of elements.

It does seem counterintuitive to work at a constant pace and still grind to a halt. Yet this seems to be a logical conclusion of the above calculations. If not to a complete standstill, then at least turning so slowly that it seems like a complete stop.

I didn’t consider friction in the above calculations. Friction would impact on the complete forces, and thus on the work – but not in a positive way.

The question now is: Will a Big Ball of Mud collapse under its own weight if not cared for?

Strongly typed + primitive obsession = poorly typed

torsdag, februar 21st, 2013

Due to the rather large amount of work required to create an object in Java there is a shortcut which leads to primitive obsession: Use the best existing match – for web services a String seems sufficiently applicable for any type, which makes a Map<String, String> or Properties the default data structure for transferable objects.

Property objects easily extended – you simply add new keys – and in that sense they are much handier than Beans, but they aren’t objects in the sense that an object hides information – which is quite difficult if being a data transfer object – but they don’t have methods which works upon the intrinsic state of the objects. Not having methods on the objects leads to procedural programming, which defeats the purpose of objects entirely.

Claiming Java is a strongly typed language and then defaulting to using Strings is really an oxymoron and detrimental to the quality of anything developed in such a language.

I’m all for rapid development and in a tight spot or early on in exploratory development primitives can be used, but they should be replaced by meaningful types as soon as possible. Ages aren’t measured in Integer, money isn’t Double, and I’m sure names belong to a subset of all the possible String values.

While I’m bashing Java – and in particular Java developers – then this is valid for most C-styled languages and developers in those languages. I’m quite puzzled as to why newish strongly typed languages didn’t look at Pascal’s type declaration – it is extremely simple to define or simply alias an existing type. It seems Scala did pick up on that and added the type keyword, which is an easy way to alias another type. E.g. for prototyping you can define Name as a type, which underneath is a String. You can still make type errors by using a declared String as Name or Name as String, but if you define the methods and functions as taking Name as parameter or returning Name, then you should spot some issues in the prototyping phase, and when you exchange the alias for a real type, you should no longer be able to make the type errors. While this is ugly it is far superior to the String hell of Java. NB: This is not the way type was intended to be used.

You should be able to spot the issues by reading code. Sometimes it helps reading it out aloud.

As an example let’s take Martin Fowler’s EventCollaboration example:

class Trader...
        public void PlaceOrder(string symbol, int volume) {
            StockExchange exchange = ServiceLocator.StockExchangeFor(symbol);
            Order order = new Order(symbol, volume, this);
            exchange.SubmitOrder(order);
        }

Transcribing what I was reading out aloud:

  • We have a Trader class in which we can place an order. Placing and order requires a string called symbol and an integer volume – presumably of the things identified by the symbol string.
  • To process the placing of the order we locate the StockExchange, which we will refer to internally as exchange, through a ServiceLocator service using the symbol as argument.
  • Proceeding we create an Order, which we will refer to as order, using the parameters given and the Trader object itself.
  • Finally we submit the order to the exchange.

While that seems like a reasonable process for placing an order only the last step is actually placing the order.

One part is getting the StockExchange on which to place the order, another is actually building the order. This leads to the fact that Trader will have to know about the internals of Order – this may be relevant, then again as Order must know about Trader we have introduced a dependency cycle between the two objects.

Furthermore, if symbol was ever to be confined to some meaningful subset of the String hell, then we would have to modify at least 3 classes: Order, Trader, and ServiceLocator.

While the example goes on changing the method to cater for EventCollaboration:

class Trader...
        public void PlaceOrder(string stock, int volume) {
            Order order = new Order(stock, volume);
            outstandingOrders.Add(order);
            MessageBus.PublishOrderPlacement(order);
        }

A nicer and more consistent way would be to replace the previous method by:

public void place(Order order, StockExchange exchange) { 
    exchange.SubmitOrder(order);
}

Naturally this leads to the discussion of where the remaining steps should go and whether Order should know about Trader or not. My point is that Order should not be created in the flow of PlaceOrder it should be given as a parameter. And giving it as a strongly typed parameter you can remove the type from the method name. Trader.place(Order) reads much better than Trader.placeOrder(Order) – who knows maybe you’ll find a recurring pattern which could lead to an interface and from there to insights.

Why pick on Fowler?

Well, there are several reasons:

  • The code is available. That is, it exists for another purpose than mine. If I wrote example code it would not prove anything except that I couldn’t program.
  • Martin Fowler is known for his books Refactoring: Improving the Design of Existing Code and Patterns of Enterprise Application Architecture – the latter which is used in academia and from which the code in the examples are from – well, the companion web-site.
  • Wanting to “improve the design of existing code” should not leave you stranded with poor code to start with
  • If teaching material is poor, the experience will suffer
  • My guess is that Martin Fowler is a better than average programmer – presumably quite a lot better than average. Showing that his code is not perfect is simply an indication of the state of software in general
  • If you focus on one thing you may lose sight of other elements. In the EventCollaboration example the focus is on event collaboration and not on cyclic dependencies

Back on track

Going back to the introduction I mentioned that ages aren’t measured in Integer, money isn’t Double, and I’m sure names belong to a subset of all the possible String values. Let’s dive into these in greater detail.

Age

The standard 32 bit Integer range is -2,147,483,648 to 2,147,483,647 – if age is measured in years, and we’re talking about humans, then an unsigned 8-bit (0-255) seems quite the fitting range. I’ve never heard of humans being more than 150 years old – and not even that. Most certainly none have been recognized as being -7 years old.

If the age is to be measured in seconds since the start of the epoch (Jan 1st, 1970 00:00:00 UTC), then either it’s not Age but rather a time in space, or we’re into an uncertainty of approximately 43,200 seconds (half a day) – at least I have no idea of when I was born. In either case 32-bit is off. The range of 4 billion seconds is about 136 years. That is, we can only go back to 1901 and forward to 2038, which isn’t suitable for all occasions.

Money

Money consists of an amount of a certain denomination. Most – but not all – currencies have some sort of 1/100 of the denomination, and for percentages, taxation, and currency exchange we often have to work with numbers 1/100 of that, but we shouldn’t go beyond those digits, and we should not accept scientific notation, e.g. $1E2. NaN and Math.PI don’t seem fitting either. Numbers have certain operation, which can be performed upon them, e.g. addition and multiplication. You can’t add $1 and €1 in a meaningful way – at least not without an exchange rate, and you cannot multiply them.

That should leave sufficient arguments not to use floating points without going into details of What Every Computer Scientist Should Know About Floating-Point Arithmetic

Names

I know I’m a bit cynical when saying that nobody is called:

I'm a little teapot,
Short and stout,
Here is my handle, 
Here is my spout,
When I get all steamed up,
Hear me shout,
Tip me over and pour me out!

source

Nor

Robert'); DROP TABLE

XKCD’s Bobby Tables

Why the obsession?

Well it is about minimizing required work. If you have more than one method, which will take an argument of a given type, e.g. age, then you should check the validity of the input for each and every method. Binding the check to where the argument is instantiated, you will know that the argument conforms to the anticipated accepted values everywhere else. In essence you are white listing your input.

If you have a range with start and end, and these two endpoints aren’t connected and the bounds checked, i.e. that start comes before end, but blindly passed on, then you’d have to check over and over again. Possibly introducing an off by one error along the way.

It’s the mathematical laziness, one abstract solution, as opposed to the busy-work mindlessness.

So while primitive obsession is problematic, primitive obsession in a strongly typed language is extremely detrimental. The arguments for a strong type check is defeated by the poor choices made basically removing the foundation for the trust in the types. A dynamically typed programming language would be better, i.e. not as bad in this case – not that you would be better off making mistakes in those languages.

Software rant

tirsdag, februar 19th, 2013

Something is rotten in the state of Denmark. Well – to be sure, it’s not just Denmark it is perhaps in the state of the art of programming.

MVC frameworks such as Rails, Grails, and Play! all fall prey to the fallacy of developing to please if not the compiler, then the framework. Don’t get me wrong, I’m not against the MVC pattern, I’m not against rapid development. But the way that the frameworks separate the entities is just plain wrong. At least according to High Cohesion/Low Coupling. It seems we’ve ended up in the low cohesion/indirect high coupling quadrant.

Models aren’t interchangeable, they do not adhere to the same higher ordered abstraction and should not be packaged together.

In the case where you want to reuse some models of a prior project, the separation of entities in that project will not carry over any easy reuse.

Instead of:

Models/
      + A
      + B
Views/
     + A
     + B
Controllers/
          + A
          + B

It seems it would be prudent to have:

A/
 + Model
 + View
 + Controller
B/
 + Model
 + View
 + Controller

And if the entities A and B are independent services, then by all means they should reside in separate modules to heighten the potential of reuse.

Another benefit of this way of packaging is the immediate overview of the project. You open the project, and see that there are 2 entities: A and B. Not that the project is an MVC project.

This is not to say that MVC frameworks are evil and should be shied away from – there are other evils lurking around in other areas, especially in the Java world with annotations, which on their own can be useful, but through 3rd party libraries they form an extreme vendor lock-in.

The other part is the problem of programmers – apparently on any level – forgetting the scientific evidence, e.g. SOLID principles.

Single Responsibility means one thing only. Perhaps this is where Object Oriented Principles breaks down and you have to adopt some functional programming style into the solution.

For lack of better ideas let’s examine Martin Fowler‘s EventSourcing example. By the way, I’m all for Event Sourcing – it should just be done right. Sadly I think it is taught wrong. I’m not saying that it won’t work; I’m simply saying it could – and most likely should – be done differently. This is not particularly to point fingers at Martin Fowler, it’s to illustrate that we all do less than optimal things. Focus is directed elsewhere. The real point is to raise awareness and in the hope that we don’t end up with big balls of mud.

class EventProcessor...
  IList log = new ArrayList();  
  public void Process(DomainEvent e) {
    e.Process();
    log.Add(e);
  }

class DomainEvent...
  DateTime _recorded, _occurred;
  internal DomainEvent (DateTime occurred) {
    this._occurred = occurred;
    this._recorded = DateTime.Now;
  }
  abstract internal void Process();

class DepartureEvent...
    Port _port;
    Ship _ship;
    internal Port Port   {get { return _port; }}
    internal Ship Ship   {get { return _ship; }}
    internal DepartureEvent(DateTime time, Port port, Ship ship) : base (time)   {
      this._port = port;
      this._ship = ship;
    }
    internal override void Process() {
      Ship.HandleDeparture(this);
    }

class Ship...
  public Port Port;
  public void HandleDeparture(DepartureEvent ev) {
    Port = Port.AT_SEA;
  }

Before the addition of Events the Ship knew about the Port, and most likely the Port knew about the Ships, which is cyclic dependency in itself. With the addition of Events the Ship is now modified – which is a violation of the Open/Closed principle, what’s more: The newly added Events will have to know of Ship and Port – this is fine – but the existing objects will have to know about the new objects, thereby entangling the cyclic dependencies even further.

I do understand the idea behind the Ship knowing which Port if any it is at, an optimized double-link as opposed to examine all the ports and in worst case not find the Ship. But the Port is not part of what makes up a Ship. The solution would be to have a ShipLocationService which ties the Port ID and the Ship ID – basically the Event Sourcing events. Thus a simple query with the known Ship ID should – in O(1) provide the location of the Ship in question. As I see it, then that solution is far superior to the one shown.

Second, if the Ship and Port classes aren’t modified, we should have greater confidence in an update, as the prior solution is untouched, and the addition can be rolled out and back independently.

Furthermore, as the classes for Ship and Port aren’t touched, they can be reused for other services, without the need to update those services as a cascade of changes. I’m not saying they should, only that they could. And I’m not saying that there never will be a need to modify an object definition only that it has to be true to the single responsibility principle.

Naturally for a one-off solution this is overkill, but evidence shows that one-off solution rarely exists, that maintenance is a heavy burden – around 1/3 of the total spent on software. (source)

“There are two ways of constructing a software design: One way is to make it so simple that there are obviously no deficiencies, and the other way is to make it so complicated that there are no obvious deficiencies. The first method is far more difficult.” C.A.R.Hoare

The third part is programming languages. Each and every programming language has flaws in the sense that they force a specific way of doing things. And being habitual people developers assimilate the thought patterns laid out by the language. In this I’m disagreeing with Linus Thorvalds – programming languages actually have a profound impact on the solutions. But the disagreement only goes so far, as I’m agreeing with his “Bad programmes worry about the code. Good programmers worry about data structures and their relationships” E.g. Java without the static Set initializations makes programmers use other data structures.

Java’s requirement to create the folder structure matching the package structure, setting up a file with the correct name, and then naming a method in which the implementation steps for a given solution can reside is detrimental for the progress of development, as there is too much pomp and circumstances – or grinding – associated with this, which has nothing to do with solving a problem.

Thus the programmer can either be careless and get things done with a lot of technical debt – or be meticulous and slow down progress.

It is extremely difficult to convince stakeholders that their working software solution ought to be modified – especially given the track record of software projects (source). I must say that I’m not happy with the statistics and actually quite upset with the comparison of iterative, agile, and lean with the traditional method. To me it seems like the boiling frog anecdote (source).

With a 2/3 average success rate – at best, and probably more likely 1/2 – then the odds of having a continued successful software product after a few versions is diminishing quite rapidly: (2/3)^version. Naturally the Binomial distribution could be used if accepting some failures along the way, but then it would not be continuously successful.

In other words, if a change is to be reasonable, then it should increase the expected value of the product by at least 50% and prudently more than double the value, not counting the cost of development.

continued-success-rate-of-versions

If we can improve on the average success rate, then we can accept less valuable commitments without too much worry. With 90% success rate we can accept 11% increase in value per version, and run 6 versions with an expected continued success above 50% for the last version. Currently we should only accept one version under these criteria. As Dan North says: “Software is a liability.”

Java Application Architecture

tirsdag, oktober 2nd, 2012

Kirk Knoernschild’s book, Java Application Architecture: Modularity Patterns with Examples Using OSGi, ought to be made superfluous, but as it stands it is direly needed.

At least the first two parts: “The Case for Modularity” and “The Patterns” should be known by heart by any Java developer. It should be second nature for disciplined and professional developers.

In “The Case for Modularity” Kirk goes over the definition of what a module is, whether architecture is a phase carved in stone in the beginning of a project, or whether it is pervasive and should be attended to all the time.

How growing complexity must be countered by disciplined maintenance.

Why Technical Debt and Design Rot must be cleared to keep the source code from succumbing to the development death spiral.

Why cyclic dependencies must be rooted at all costs.

And what the benefits of modular structures are.

While you may think that a given piece of software is not ripe for modular extraction, because it will never be replaced or reused elsewhere. This is not the right attitude, as the first or second use of the source code is the test cases preferably developed in sync with the source code, either in a TDD or BDD way. It is simply easier to unit test a unit, no matter how big that unit is.

Finally – just to prove that this is not hot air preached from the pulpit – Kirk refactors a small application in 7 steps each a tiny isolated step improving on the overall structure. This is how software should evolve once it has fallen prey to design rot.

The second part of the book, The Patterns, describes 18 patterns in 5 chapters, which Kirk has made available here:

The Base Patterns:

  • Manage Relationships
  • Module Reuse
  • Cohesive Modules

Dependency Patterns:

  • Acyclic Relationships
  • Levelize Modules
  • Physical Layers
  • Container Independence
  • Independent Deployment

Usability Patterns:

  • Published Interface
  • External Configuration
  • Default Implementation
  • Module Façade

Extensibility Patterns:

  • Abstract Module
  • Implementation Factory
  • Separate Abstractions

Utility Patterns:

  • Colocate Exceptions
  • Levelize Build
  • Test Module

The third part concludes with examples in OSGi, covering Java, Scala, and Groovy implementations.

The book is a must read and must understand for (Java) developers and architects.

I highly recommend the book for anyone interested in modularity, especially developers looking at an existing code base resembling a big ball of mud who wants to disentangle it.