Archive for March, 2013

Trust has failed – time for control and assurance.

Monday, March 18th, 2013

The general success rate of any software development project is at best 68% – just about 2 in 3, or Two Sigma. If we expect the reliability of services to be in the nine-nines or Six Sigma range, then perhaps we should do all we can to at least reach a single nine, or improve to 3 Sigma.

Why is this an issue?

Primarily this is an issue because computing is pervasive and software is ubiquitous, interconnected, and in the hands of ordinary people – not just experts. This means that software solutions provide services for anything, and if mistakenly trusted – that is, trusted to provide the service without hiccups but failing to do so – then at best it is a nuisance to the user; at worst it is catastrophic.

While Apple Maps has produced some fun pictures – being misled into the Australian outback is hardly a minor nuisance. Putting drivers on a runway is dangerous not only for the driver and the people on board an airplane, but for anyone in the vicinity.

Common sense should prevail – Darwin awards and whatnot.

Well, that would be an argument, but if you have to double or triple check all information, then the smart devices aren’t really helpful in anything but increasing bewilderment. If all resources point to the same mistake, then any number of double checks will fail. And if you combine services, e.g. Siri + Apple Maps + a Google self-driving car, then you are betting against the combined odds.

Common sense should be to trust the services provided – that is, that the services are added to be helpful, not challenging. I know gamification is a hot topic, but the stakes should not be people’s lives. At the very least, that is detrimental to sales.

Professionalism

If any other business had a success rate of 2/3, I’d be surprised if it stayed in business for long. I’m pretty sure I would not get aboard an airplane if the success rate of getting to the right destination was only 2/3 – and I’m not even saying “on time”.

Some surgeons are trying to improve their success rate, targeting 97–99% instead of the 90–95% range they currently reside in – and we can’t even reach the 90% level.

Money

It seems funny that we have audit requirements – at least in Denmark – regarding the financial state of a company. It seems natural that the pervasive entity known as money is managed and controlled, checked and audited, the flows accounted for. But in the end it is only money; it is less pervasive than the elusive software.

I bought my computer with money some years ago, that is, I exercised my financial resources at that time for the specific purpose. Every day I turn on the computer, thereby exercising the operating system and various other software entities.

Money: Once. Hardware: Every day. Software: Every day

It wouldn’t seem like an unfair comparison to expect the same engineering effort and dedication to go into both software and hardware, but my hardware hasn’t failed yet, whereas my software has failed often and sometimes unbeknownst to me.

You might not see it, but that doesn’t mean it is not there.

Waste

Getting my hair cut, my hairdresser wanted some profile information for their cash register/customer management system. As I had been there before, my ID should have recalled all the information, but it didn’t. If they are spending 5 minutes per day on mistakes or on circumventing oddities in the flow of the software, then they are wasting 20 hours per year, which the customer will naturally have to pay for in the end.

240 days at 5 minutes/day = 1200 minutes = 20 hours waste

I would rather pay the same and see the business bloom than know that the more non-working software they add, the more I have to pay to keep them in business.

As it turns out, they are using other software besides the CMS – at least 3 other programs. If each is at the same level of waste, then every year the hairdresser is wasting 80 hours – approximately half a month, or 4% of a work year.

No matter what you consider an acceptable wastage rate, starting out at 4% seems unacceptable to me.

4% roughly corresponds to the process fallout at Sigma level 2 on the process capability index – we really should strive for a better world, going for at least Sigma level 3 for starters.

Possible Solution

The other day my Windows 7 installed 16,000+ updates. I was baffled. I don’t think we can get to perfect, but we can strive towards it, we can do better.

If a financial audit exists to add credibility to whatever management does, then we should add a source code audit to add credibility to the software product for the benefit of the end user.

And the source code audit should be obligatory.

A certificate is insufficient to solve any of the issues.

Quality of (once) closed source code

Monday, March 11th, 2013

Looking at one of the larger files in Libre Office:

com.sun.star.filter.config.tools.utils.Cache

This file has been removed in the above repository – and for good reasons.

I guess that the file was once a closed-source file, as the com.sun package name suggests, and I found it only fair to expose some defective code from the closed-source world after writing about dotCMS, because – as I stated then – the problem exists in both worlds.

Opening the Libre Office repository this Friday and spending some time gathering statistics to suggest a specific group of source files to look at, I found this one. I must say it is immediately ugly when you just browse through the code, but once you really start to examine it, you would have to be a bit masochistic to work on it.

The code is really horrible on several levels:

  • Coupling implementation types instead of interfaces, or Interface vs. specific class
  • Primitive Obsession
  • Non-enumerable enumerable
  • Magic Strings and numbers
  • Dynamic objects of known entities
  • Modification of parameters and poor understanding of Map
  • System.out.println + a logger available.
  • Cyclic dependency with XMLHelper
  • StringBuffer: Setting size, then using +
  • Correcting spelling mistakes, but apparently not all of them
  • Unrolled loops
  • Hungarian misnomer
  • Missing set functionality
  • Keeping current
  • Not conforming to standard Java code convention for packages
  • Sloppy copy-paste

Coupling implementation types instead of interfaces

Using HashMap as the field reference type is considered bad, as it exposes too much specific knowledge of the object. As the variables are simply used as maps, there is no reason why they can’t be declared as Map.

That said, the Maps shouldn’t really be raw types either. Trying to change this reveals another issue: dynamic objects of known entities.
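
A minimal sketch of the difference – the field name mirrors the original, and the generic parameters follow the usage described below:

// Before: coupled to the implementation, and a raw type at that.
// private java.util.HashMap m_lTypes;

// After: only the Map interface is exposed, so the implementation
// (HashMap, TreeMap, ...) can be swapped without touching any callers.
private Map<String, Map<String, Object>> m_lTypes =
        new HashMap<String, Map<String, Object>>();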

Primitive Obsession

No specific modeling is applied to the objects in the solution, even though a lot of specific knowledge is assumed. This is bad, as assumptions rarely hold up.
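
As a sketch of the alternative – the MediaType class is hypothetical, not something in the repository – a tiny value type lets the compiler enforce what is otherwise an unchecked assumption about a bare String:

// Hypothetical value type wrapping what the code passes around as a String.
public final class MediaType {
    private final String value;

    public MediaType(String value) {
        // The assumption "a media type is a non-empty string" now lives in
        // exactly one place instead of being implicit all over the code.
        if (value == null || value.length() == 0)
            throw new IllegalArgumentException("media type must be non-empty");
        this.value = value;
    }

    public String getValue() {
        return value;
    }
}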

Non-enumerable enumerable

Flags, E_types, and CFG are conceptually enumerable types, but they are hand-rolled as specific constant definitions, which usually is a disaster waiting to happen and unnecessarily hampers development.
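
A sketch of what a proper enumerable could look like – the member names are illustrative, not the originals:

// Illustrative enum replacing a hand-rolled family of constants; the
// compiler now knows the complete set of values, and values() enumerates it.
public enum EType {
    TYPE, FILTER, DETECT_SERVICE, FRAME_LOADER, CONTENT_HANDLER
}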

Magic Strings and numbers

The code is littered with magic strings and numbers – even though constant definitions exist as well. This often leads to undesirable issues, as a single typo can go undetected.
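
To make that concrete – PROPNAME_MEDIATYPE is one of the existing constants, while the misspelled literal is my illustration:

void example(Map aType) {
    // With the constant, a typo in the name is a compile error:
    Object good = aType.get(PROPNAME_MEDIATYPE);

    // With a magic string, the typo compiles fine and silently returns null:
    Object bad = aType.get("MediaTpye");
}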

Dynamic objects of known entities

This is a combination of the raw types mentioned under coupling and primitive obsession. The maps are used to store entities defined by the PROPS constants.

<sarcasm>If only there was a way in which one could define an object of known entities – oh wait, that’s one of the benefits of a programming language.</sarcasm>

E.g. m_lTypes should be a Map<String, Map<String, Object>> if not sticking to the raw type. The keys of the value map, and their value types, are defined as follows:

Key                          Value type
PROPNAME_CLIPBOARDFORMAT     String
PROPNAME_CONTENTHANDLER      String
PROPNAME_DETECTSERVICE       String
PROPNAME_DOCUMENTICONID      Integer
PROPNAME_EXTENSIONS          Vector<String>
PROPNAME_FRAMELOADER         String
PROPNAME_MEDIATYPE           String
PROPNAME_PREFERRED           Boolean
PROPNAME_PREFERREDFILTER     String
PROPNAME_UINAME              Map<String, String>
PROPNAME_UIORDER             Integer
PROPNAME_URLPATTERN          Vector<String>

m_lFilters is – on the surface – of the same type, Map<String, Map<String, Object>>, but the keys and value types of its value map are defined as follows:

Key                          Value type
PROPNAME_DOCUMENTSERVICE     String
PROPNAME_FILEFORMATVERSION   Integer
PROPNAME_FILTERSERVICE       String
PROPNAME_FLAGS               Integer
PROPNAME_ORDER               Integer
PROPNAME_TEMPLATENAME        String
PROPNAME_TYPE                String
PROPNAME_UICOMPONENT         String
PROPNAME_UINAME              Map<String, String>
PROPNAME_USERDATA            Vector<String>

m_lDetectServices, m_lFrameLoaders, and m_lContentHandlers are all of the type:

Map<String, Map<String, Vector<String>>>

And they seem to only really use PROPNAME_TYPES as the key to the value map.

Finally, m_lFilterToTypeRegistrations is on the following form:

Map<String, Vector<String>>

Besides that, it is not used as a field but as a local variable.
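
Pulling the above together, a sketch of the field declarations – keeping the original names, with the generic parameters inferred from the tables above:

// The original declares all of these as raw java.util.HashMap.
private Map<String, Map<String, Object>>         m_lTypes;
private Map<String, Map<String, Object>>         m_lFilters;
private Map<String, Map<String, Vector<String>>> m_lDetectServices;
private Map<String, Map<String, Vector<String>>> m_lFrameLoaders;
private Map<String, Map<String, Vector<String>>> m_lContentHandlers;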

Modification of parameters

The method convertDetectServicePropsToExternal – and others like it – has issues with side effects on its parameters. Whether this is a flaw of Java’s Map design or the programmers’ lacking understanding of it is difficult to say.

private static java.util.HashMap convertDetectServicePropsToExternal(java.util.HashMap aMap   ,
                                                                     int               nFormat)
    throws java.lang.Exception
{
    java.util.HashMap aResultMap = null;
    switch(nFormat)
    {
        //-----------------------------------
        case FORMAT_60 :
        {
            // no changes!
            aResultMap = aMap;
        }
        break;

        //-----------------------------------
        case FORMAT_6Y :
        {
            // remove localized name
            aResultMap = aMap;
            aResultMap.remove(PROPNAME_UINAME);
        }
        break;

        //-----------------------------------
        default :
            throw new java.lang.Exception("unknown format");
    }
    return aResultMap;
}

Probably the intention was to return a different instance with the same basic values as the given map. The problem is that the assignment, aResultMap = aMap, makes both references point to the same data structure in memory; the call to remove thus removes the key and value from that shared structure – and thereby from the original as well. What’s worse, that same reference is returned.

The solution: Use putAll()

private static Map convertDetectServicePropsToExternal(Map aMap, int nFormat) {
    if (!(nFormat == FORMAT_60 || nFormat == FORMAT_6Y)) {
        throw new IllegalArgumentException("unknown format: " + nFormat);
    }
    // Defensive copy: modifications below no longer leak into the caller's map.
    Map aResultMap = new HashMap();
    aResultMap.putAll(aMap);

    if (nFormat == FORMAT_6Y) {
        // remove localized name
        aResultMap.remove(PROPNAME_UINAME);
    }
    return aResultMap;
}

System.out.println + a logger available.

If you want to output information, then you should make sure that the information is available to be read. System.out.println() really isn’t doing that, and a logger ought to be used instead. As it stands, the class is actually using a logger – yet on top of that there are still calls to System.out.println.
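
A minimal sketch with java.util.logging – the message and level are illustrative:

// Route output through the logger that is already there, so it can be
// filtered, formatted, and redirected - unlike System.out.println.
private static final java.util.logging.Logger LOGGER =
        java.util.logging.Logger.getLogger(Cache.class.getName());

void report(String sType) {
    // instead of: System.out.println("validating type " + sType);
    LOGGER.log(java.util.logging.Level.FINE, "validating type {0}", sType);
}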

Cyclic dependency with XMLHelper

Cache uses XMLHelper to extract elements from XML – but XMLHelper uses some of the constants defined in Cache. While this seems a minor issue, the problem becomes evident when XMLHelper needs to help some other entity and adopts constants from it, at which point Cache ends up indirectly depending on that entity as well. Thus begins a big ball of mud.
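
One way to break the cycle – the class name FilterConstants and the constant’s value are mine, not the repository’s – is to move the shared constants into a third class both can depend on:

// Hypothetical home for the shared constants; the dependency graph becomes
// Cache -> FilterConstants <- XMLHelper instead of Cache <-> XMLHelper.
public final class FilterConstants {
    public static final String PROPNAME_UINAME = "UIName"; // value illustrative

    private FilterConstants() {
        // no instances
    }
}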

StringBuffer: Setting size, then using +

StringBuffer is used to append values in order to circumvent String concatenation, which is costly due to Strings’ immutability. So if you are using StringBuffer – and you even go so far as to set the initial buffer size – then please don’t defeat the purpose by using + inside the append calls: every + builds a temporary buffer and an intermediate String anyway.

The compiler may spot the less than splendid blemish and help you out, but that is nothing to rely on.
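
A minimal sketch of the difference:

String row(String sVal) {
    StringBuffer sb = new StringBuffer(256);

    // Defeats the pre-sized buffer: the + operator builds a temporary
    // buffer and an intermediate String before append() is even called.
    sb.append("<td>" + sVal + "</td>");

    // Uses the buffer as intended: no temporary allocations.
    sb.append("<td>").append(sVal).append("</td>");

    return sb.toString();
}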

Correcting spelling mistakes, but apparently not all of them

It seems that the code – under the Libre Office banner – has been modified a handful of times, mostly to fix spelling mistakes in the comments. This is not the biggest issue in the code base, but if you are really adamant about text quality, then please fix all the mistakes:

“How these entries will be readed or written can be switch in different modes.”

“superflous”

“They dont have any valid document service name set and cant be handled”

Unrolled loops

It puzzles me that the same code is repeated instead of being extracted into a method. This kind of unrolling or in-lining can be seen in toHTML and other methods, e.g. convertFilterFlagNames2Values and convertFilterFlagValues2Names, which we shall dive into later.

We see something similar in the toXML method, where there is a loop over the magic values 0 to 5, which actually correspond to the enumerable eType – meaning that when adding a new type, you will likely miss this loop bound.

java.util.Iterator aIt = m_lTypes.keySet().iterator();
while (aIt.hasNext())
{
    java.lang.String  sType = (java.lang.String)aIt.next();
    java.util.HashMap aType = (java.util.HashMap)m_lTypes.get(sType);

    sRelationView.append("<tr>");
    sRelationView.append("<td>"+sType+"</td>");

    java.lang.String sVal = (java.lang.String)aType.get(PROPNAME_DETECTSERVICE);
    if (sVal == null || sVal.length()<1)
        sRelationView.append("<td> - </td>");
    else
        sRelationView.append("<td>"+sVal+"</td>");

    sVal = (java.lang.String)aType.get(PROPNAME_PREFERREDFILTER);
    if (sVal == null || sVal.length()<1)
        sRelationView.append("<td> - </td>");
    else
        sRelationView.append("<td>"+sVal+"</td>");

    sVal = (java.lang.String)aType.get(PROPNAME_FRAMELOADER);
    if (sVal == null || sVal.length()<1)
        sRelationView.append("<td> - </td>");
    else
        sRelationView.append("<td>"+sVal+"</td>");

    sVal = (java.lang.String)aType.get(PROPNAME_CONTENTHANDLER);
    if (sVal == null || sVal.length()<1)
        sRelationView.append("<td> - </td>");
    else
        sRelationView.append("<td>"+sVal+"</td>");

    sRelationView.append("</tr>");
}

This could be replaced by something still ugly, but less verbose:

Iterator<String> aIt = m_lTypes.keySet().iterator();
while (aIt.hasNext())
{
    String  sType = aIt.next();
    Map aType = (Map)m_lTypes.get(sType);

    sRelationView.append("<tr>");
    sRelationView.append(tableData(sType));
    sRelationView.append(tableData((String) aType.get(PROPNAME_DETECTSERVICE)));
    sRelationView.append(tableData((String) aType.get(PROPNAME_PREFERREDFILTER)));
    sRelationView.append(tableData((String) aType.get(PROPNAME_FRAMELOADER)));
    sRelationView.append(tableData((String) aType.get(PROPNAME_CONTENTHANDLER)));

    sRelationView.append("</tr>");
}
private String tableData(String sVal) {
    StringBuffer sb = new StringBuffer();
    sb.append("<td>");
    if (sVal == null || sVal.length()<1)
        sb.append(" - ");
    else
        sb.append(sVal);
    sb.append("</td>");
    return sb.toString();
}

Hungarian misnomer

My personal belief is that Hungarian notation is misleading and verbose – for strongly typed languages – in the same manner that method comments can be misleading if not updated in unison with code changes.

The code has several misleading Hungarian notations, e.g.

/** list of all located types.
*  Format: [string,HashMap]
*/
private java.util.HashMap m_lTypes;

According to http://msdn.microsoft.com/en-us/library/aa260976(v=vs.60).aspx, maps should be prefixed with mp; thus for the map at hand it would be:

private java.util.HashMap m_mpSMPTypes;

Whether the types of the internal maps should be part of the name, I have no idea. I would rather have the Java notation correct, that is:
private Map<String, Map<String, Object>> types;

And if we also fix another issue – namely the one caused by the dynamic entities – so that the Map<String, Object> becomes its own entity, e.g. Type, then:

private Map<String, Type> types;

But if there is a necessity for Hungarian notation, then stick to it.

Missing set functionality

Often the programming language used has implications for the solution provided, sometimes for the good: e.g. QuickSort is easy in a functional language such as Haskell (see http://www.haskell.org/haskellwiki/Introduction#Quicksort_in_Haskell).

quicksort :: (Ord a) => [a] -> [a]
quicksort [] = []
quicksort (p:xs) = (quicksort lesser) ++ [p] ++ (quicksort greater)
  where
    lesser  = filter (< p) xs
    greater = filter (>= p) xs

Sometimes, though, the implication is for the worse. Java’s lack of native support for sets and maps makes developers choose weird constructs instead, such as the if construct in validateTypes:

if (
    (!aType.containsKey(PROPNAME_MEDIATYPE      )) ||
    (!aType.containsKey(PROPNAME_PREFERRED      )) ||
    (!aType.containsKey(PROPNAME_CLIPBOARDFORMAT)) ||
    (!aType.containsKey(PROPNAME_DOCUMENTICONID )) ||
    (!aType.containsKey(PROPNAME_URLPATTERN     )) ||
    (!aType.containsKey(PROPNAME_EXTENSIONS     )) ||
    (!aType.containsKey(PROPNAME_UINAME         ))
)
{
    throw new java.lang.Exception("Type \""+sType
        + "\" does not contain all neccessary properties for a 6.0/6.Y format.");
}

The basic idea is that the type must have all the keys set. Naturally, if the type were a real object and not a Map, the validation check would live on that entity.

If, on the other hand, the required properties were in a Set, e.g. TYPE_REQUIRED_KEYS, then:

StringBuffer missingKeys = new StringBuffer();
for (String required : TYPE_REQUIRED_KEYS) {
    if (!aType.containsKey(required)) {
        missingKeys.append(required).append(",");
    }
}
if (missingKeys.length() > 0) {
    throw new Exception("Type \"" + sType
        + "\" does not contain all necessary properties for a 6.0/6.Y format: "
        + missingKeys.toString());
}

This would provide the same check, but it would be guarded against extension, and the exception thrown would hold more information.

Furthermore, the code would do one thing and one thing only: check the validity – rather than checking the validity while also keeping track of what constitutes the required keys.
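
For completeness, a sketch of how TYPE_REQUIRED_KEYS could be defined – the constant does not exist in the original code, and the PROPNAME_* constants are assumed to be the ones already in Cache:

// Hypothetical: one unmodifiable place stating what a valid type requires.
private static final Set<String> TYPE_REQUIRED_KEYS =
    Collections.unmodifiableSet(new HashSet<String>(Arrays.asList(
        PROPNAME_MEDIATYPE, PROPNAME_PREFERRED, PROPNAME_CLIPBOARDFORMAT,
        PROPNAME_DOCUMENTICONID, PROPNAME_URLPATTERN, PROPNAME_EXTENSIONS,
        PROPNAME_UINAME)));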

Keeping current

It may seem like a weird requirement to keep current and to update code that works, just for the sake of being current.

This is simply amortized development cost. Doing a little work today to keep the code current helps you avoid facing an impossible leap once backwards compatibility is dropped. It keeps you moving forward, and you will have a current understanding of the codebase. That is, you won’t have any weird blobs of code that only one programmer knows how to use, and you alleviate some of the dependencies on people and libraries.

At another time scale this is comparable to having documents in hieroglyphs along with modern writing. If Shakespeare is difficult to understand, then perhaps the language has just moved along.

These are the reasons for keeping current, which means that today’s codebase should be Java 7 compliant – or at the very least Java 6 compliant. That in turn means that enums and generics should be used, as should the enhanced for-loop, which would kill off a lot of the iterators throughout the code.
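
As a sketch – assuming m_lTypes is typed as suggested earlier – the iterator dance from toHTML could then become:

// The enhanced for-loop replaces the explicit Iterator, and with generics
// in place the casts disappear as well.
for (Map.Entry<String, Map<String, Object>> entry : m_lTypes.entrySet()) {
    String sType = entry.getKey();
    Map<String, Object> aType = entry.getValue();
    // ... build the table row for sType ...
}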

Not conforming to standard Java code convention for packages

There are lots of ways a Java project can be laid out, but for code coming out of Sun, the least they could have done was follow the natural layout where each package element maps to a folder. This is not the case here: the com.sun.star.filter.config.tools.utils package maps to the l10ntools/source/filter/utils folder.

For the entire Libre Office polyglot project there are even other weird twists and turns, such as test code being on the source path. I haven’t seen that recently.

Sloppy copy-paste

The code you look at today tells you how you will write the code tomorrow. That is, if you are looking at ugly hacks today, then tomorrow you will make new ugly hacks.

If something looks like copy and paste, then that is the way you do it. And if you’re not quite aware of what you are doing, then you’ll be sloppy and miss a few needed alterations. E.g.

if((field & FLAGVAL_COMBINED) == FLAGVAL_SUPPORTSSELECTION)
    lFlags.add(FLAGNAME_SUPPORTSSELECTION);

This check will always fail, as FLAGVAL_COMBINED and FLAGVAL_SUPPORTSSELECTION have no bits in common. It should probably have been:

if((field & FLAGVAL_SUPPORTSSELECTION) == FLAGVAL_SUPPORTSSELECTION)
    lFlags.add(FLAGNAME_SUPPORTSSELECTION);

Even so, this is only half of one of the related problems. The code resides in convertFilterFlagValues2Names, and a reverse mapping is required in convertFilterFlagNames2Values – but in that method there is no such check at all.

One solution would be to have the flags as an enum and let the two bidirectional mappings work on the entirety of the enum. That way, an addition to the enum automatically caters for the bidirectional mapping without having to touch things up everywhere.

public enum Flag {

    THIRDPARTYFILTER  (0x00080000, "3RDPARTYFILTER"),       // 524288
    ALIEN             (0x00000040, "ALIEN"),                // 64
    ASYNCHRON         (0x00004000, "ASYNCHRON"),            // 16384
    BROWSERPREFERRED  (0x00400000, "BROWSERPREFERRED"),     // 4194304
    CONSULTSERVICE    (0x00040000, "CONSULTSERVICE"),       // 262144
    DEFAULT           (0x00000100, "DEFAULT"),              // 256
    EXPORT            (0x00000002, "EXPORT"),               // 2
    IMPORT            (0x00000001, "IMPORT"),               // 1
    INTERNAL          (0x00000008, "INTERNAL"),             // 8
    NOTINCHOOSER      (0x00002000, "NOTINCHOOSER"),         // 8192
    NOTINFILEDIALOG   (0x00001000, "NOTINFILEDIALOG"),      // 4096
    NOTINSTALLED      (0x00020000, "NOTINSTALLED"),         // 131072
    OWN               (0x00000020, "OWN"),                  // 32
    PACKED            (0x00100000, "PACKED"),               // 1048576
    PREFERRED         (0x10000000, "PREFERRED"),            // 268435456
    READONLY          (0x00010000, "READONLY"),             // 65536
    TEMPLATE          (0x00000004, "TEMPLATE"),             // 4
    TEMPLATEPATH      (0x00000010, "TEMPLATEPATH"),         // 16
    USESOPTIONS       (0x00000080, "USESOPTIONS"),          // 128
    COMBINED          (0x00800000, "COMBINED"),             // 8388608
    SUPPORTSSELECTION (0x00000400, "SUPPORTSSELECTION");    // 1024

    private int bitpattern;
    private String name;

    private Flag(int bitpattern, String name) {
        this.bitpattern = bitpattern;
        this.name = name;
    }

    public static int toBitPattern(List<Flag> flags) {
        int val = 0;
        for (Flag f : flags) {
            val |= f.bitpattern;
        }
        return val;
    }
    public static List<Flag> toList(int bitpattern) {
        List<Flag> flags = new ArrayList<Flag>();
        for(Flag flag : Flag.values()) {
            if ((flag.bitpattern & bitpattern) == flag.bitpattern) {
                flags.add(flag);
            }
        }
        return flags;
    }
}

Most likely the name field isn’t needed anymore.
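
A quick usage sketch of the round-trip:

// IMPORT | EXPORT = 0x1 | 0x2 = 3
int bits = Flag.toBitPattern(Arrays.asList(Flag.IMPORT, Flag.EXPORT));

// Decodes back to [EXPORT, IMPORT] - the declaration order of the enum.
List<Flag> flags = Flag.toList(bits);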

Conclusion

It does not matter whether code is closed source or open source: there is ugly and flawed code in both worlds. Most likely there is something beautiful in both as well.

I just cannot accept the poor quality we seem to use as the foundation for almost every profession. There is sloppiness and hacks all around – expensive bugs and clunky solutions, which make everyone's daily digital life more frustrating than it has to be.

You cannot improve on the foundation – the quality of a solution with dependencies deteriorates like compound interest with a negative rate. If your dependencies accumulate to 90% quality, the best you can hope for is to do 100% yourself, leaving the resulting solution at 0.9 * 1.0 = 90%. If – on the other hand – you are only delivering 90% quality yourself, the result will be 0.9 * 0.9 = 81%.

Could I do it better? No, most likely not. I'm not claiming to write perfect code. What is frustrating though is that I can easily spot the massive pitfalls and the ugliness of solutions.

Given the short amount of time I've spent on this, and the ease with which I found a problematic file, my estimate is that there is too much poor-quality code in the world and not enough people who care. Just remember that the code used to handle your flight and luggage, audit your tax returns, ship your parts, handle your bank account, etc. is probably no better.

dotCMS short code inspection

Wednesday, March 6th, 2013

Under the impression that I was welcome to further comment on the dotCMS source code (see my post Programming vs. Flow steps), I created a brief inspection.

While the source code under scrutiny is dotCMS, I’m positive that similar issues can be found in most other source code repositories; thus, whether or not you are affiliated with dotCMS, you could probably benefit from reading the report (it is a 30 page PDF created with DocBook).

As I created some graphs which don’t fit readably onto a page, I uploaded them to my Dropbox as well.

Commit buddies

The conclusion:

Pointing fingers is easy. I hope I have not offended anyone – at least the intention was only to show some of the things which are wrong with one software project. And most likely what is wrong with most software projects.

I didn’t touch upon the more stringent theoretical issues with cyclic dependencies – of which I’m sure there are plenty, based on what we saw in Chapter 2, Changes. Nor did I go into function points, cyclomatic complexity, length of methods, combinatorial signatures, and other such code quality issues, as I find these too specific to the code base, and there were plenty of other – more general – issues to look at.

The cost of software defects (see sourceninja) is immense. Here I have provided a brief overview of some of the low hanging fruits for remediating some of the issues in dotCMS. These methods are free of charge and generally applicable.

If the numbers are correct, then I’ve spent a few days identifying 14 issues worth approximately $200,000 – and there are plenty more. Please don’t tell me that you don’t have the resources to go over the code base.

By the way, if I read the infographic correctly, then it is a yearly cost per defect until fixed.

It could be that it is not your money you are saving, but that of someone further down the dependency hierarchy.

You may think that I’m just too shallow in my examination, that I don’t know the intricate details which make up the foundation for the decisions made. You would probably be right. I consider myself an average person with average skills – but does that change the results I have provided? It was never my intention to find and fix all bugs.

I’m all for the hard-working people out there providing services for all of us; some of the services are even free. If I am that upset with dotCMS, I could simply choose something different. But while I could choose something different, my point is: I don’t think it would be a better choice – merely a different one, with bugs, errors, and warts in other places.

I might not even know whether or not I’m making a choice for or against one product or another, as e.g. dotCMS could be the foundation upon which other services are built.

This is a call to arms, people. Stop accumulating technical debt. Help keep the World a clean and sane place. The price today is high, but it is lower than it will be tomorrow.