Archive for February, 2013

Programming vs. Flow steps

Wednesday, February 27th, 2013

Looking at source code from different developers – including myself – it has come to my attention that, for some reason, bright people tend to fall into the not-so-bright category when it comes to the correct placement of actions.

Programming is not simply appending steps to be performed to a Turing machine’s tape. That’s basically the entire idea behind programming languages – not just high level programming languages. You can separate functional units and call these units from elsewhere. The execution of the program by the computer is a stream of actions, but that doesn’t mean that the source code has to be.

What I’ve seen seems to be developers eager to solve a problem. Having the solution in their head – which steps are needed – they type it into the class or source file they happen to be in, then move on to the next problem to be solved.

What we miss is the urge to stand back and look at the bigger picture. And in addition ask ourselves: “Is this right? Should these steps be here or somewhere else?”

We have the need for some functionality of an object which it doesn’t yet provide. Sometimes the object is within our realm; sometimes it is third party or language intrinsic. In any case we simply dump the ‘solution’ where it is needed with no regard to coupling or cohesion. This leads to promiscuous objects and boundary violations scattered over several different objects. In addition each object’s internal complexity rises, because it has to know the internals of several other objects. We simply fail to adhere to the Single Responsibility Principle.

Usually this grows incrementally over time as new issues are brought up, e.g. the requirement to support a new database.

if (DbConnectionFactory.getDBType().equals(DbConnectionFactory.MSSQL))
    reindexJournalCleanupSql.append("DELETE FROM dist_reindex_journal " +
     " WHERE time_entered < DATEADD("+ type.toString() +", "+ sign + "" + time +", GETDATE()) ");
else if(DbConnectionFactory.getDBType().equals(DbConnectionFactory.MYSQL))
    reindexJournalCleanupSql.append("DELETE FROM dist_reindex_journal " +
     " WHERE time_entered < DATE_ADD(NOW(), INTERVAL "+ sign + "" + time +" " + type.toString()+") ");
else if(DbConnectionFactory.getDBType().equals(DbConnectionFactory.POSTGRESQL))
    reindexJournalCleanupSql.append("DELETE FROM dist_reindex_journal " +
     " WHERE time_entered < NOW() "+ sign + " INTERVAL '"+ time +" " + type.toString()  +"' ");
else if(DbConnectionFactory.getDBType().equals(DbConnectionFactory.ORACLE))
    reindexJournalCleanupSql.append("DELETE FROM dist_reindex_journal " +
     " WHERE CAST(time_entered AS TIMESTAMP) <  CAST(SYSTIMESTAMP "+ sign +
                             "  INTERVAL '"+time+"' "+ type.toString() + " AS TIMESTAMP)");

From com.dotcms.journal.business.ESDistributedJournalFactoryImpl.distReindexJournalCleanup()

When you go from one database to two or more, the easy way is to introduce an if-else clause around the differences, but this is wrong. It’s an extremely easy hack, but the technical debt is immense. The next maintenance developer coming along to add yet another database will simply adopt the poor decision and add an extra “else if”.

Hopefully people have stopped doing the parallel hack for natural languages and are using i18n instead.

One of the weirder things is that the check for database type is apparently in such great demand that it has a specific implementation, which is used in the following snippet.

void fixNotNull() throws SQLException {
    // contentlet_version_info => working_inode & lang
    DotConnect dc=new DotConnect();
    if(DbConnectionFactory.isMsSql()) {
        // MS SQL
        dc.executeStatement("ALTER TABLE contentlet_version_info ALTER COLUMN lang numeric(19,0) NOT NULL");
        dc.executeStatement("ALTER TABLE contentlet_version_info ALTER COLUMN working_inode varchar(36) NOT NULL");
    }
    else if(DbConnectionFactory.isOracle()) {
        // ORACLE
        dc.executeStatement("ALTER TABLE contentlet_version_info MODIFY (lang NOT NULL)");
        dc.executeStatement("ALTER TABLE contentlet_version_info MODIFY (working_inode NOT NULL)");
    }
    else if(DbConnectionFactory.isMySql()) {
        // MySQL
        dc.executeStatement("ALTER TABLE contentlet_version_info MODIFY lang int8 NOT NULL");
        dc.executeStatement("ALTER TABLE contentlet_version_info MODIFY working_inode varchar(36) NOT NULL");
    }
    else if(DbConnectionFactory.isPostgres()) {
        // PostgreSQL
        dc.executeStatement("ALTER TABLE contentlet_version_info ALTER COLUMN lang SET NOT NULL");
        dc.executeStatement("ALTER TABLE contentlet_version_info ALTER COLUMN working_inode SET NOT NULL");
    }
}

From com.dotmarketing.startup.runonce.Task00840FixContentletVersionInfo

Searching for usages of isMsSql() yields one list of files, and searching for the MSSQL constant yields another. You may have spotted that 3 files actually contain both styles, which is a bit puzzling – perhaps a clean-up wasn’t seen all the way through. But I digress.

My point is that classes should adhere to OOP principles, information hiding, and clearly defined boundaries. A bean with getters and setters for all fields is not supposed to be used as a grouping of global variables. No matter how promiscuous an object seems – violating its privacy is not allowed.

If the current object has a different responsibility, then delegate the work to another object. This way the class does what the name indicates.

20 questions

Sometimes the code indicates that the steps are in the wrong place. This is usually indicated by asking another object – often a parameter – 20 questions, or at least getting a lot of its intrinsic values, and then deciding which steps to take. The object being questioned will seem promiscuous, with getters and setters abound, few or no other methods, and a plethora of updates.

As an example of an indication of too much knowledge on the wrong side of the boundary we can look at com.dotmarketing.util.UtilHTML:

public static String getStatusIcons(Versionable v) throws DotStateException, DotDataException, DotSecurityException{
    StringBuffer buf = new StringBuffer();
    if(!v.isLive() && v.isWorking() && (!v.isArchived() && v.isWorking())){
        buf.append("<span class='workingIcon'></span>");
    }else{
        buf.append("<span class='greyDotIcon' style='opacity:.4'></span>");
    }
    if(v.isArchived()){
        buf.append("<span class='archivedIcon'></span>");
    }else if(APILocator.getVersionableAPI().hasLiveVersion(v)){
        buf.append("<span class='liveIcon'></span>");
    }else{
        buf.append("<span class='greyDotIcon' style='opacity:.4'></span>");
    }

    if(v.isLocked()){
        buf.append("<span class='lockIcon'></span>");
    }
    return buf.toString();
}

public static String getVersionStatusIcons(Versionable v) throws DotStateException, DotDataException, DotSecurityException{
    StringBuffer buf = new StringBuffer();
    if(v.isWorking()){
        buf.append("<span class='workingIcon'></span>");
    }
    if(v.isLive()){
        buf.append("<span class='liveIcon'></span>");
    }

    return buf.toString();
}

Looking at these two utility functions it becomes clear that:

  • The utility function knows more of the Versionable than is absolutely necessary, but apparently not enough. The 20 questions style: isWorking? isLive? is???
  • The difference between the functions getStatusIcons and getVersionStatusIcons is not quite clear, nor is it clear when one or the other should be used.
  • There is a bit of a discrepancy between which icons are shown given the same status. I.e. if your version isWorking and isLive, then getStatusIcons will give you greyDotIcon possibly followed by a liveIcon, whereas getVersionStatusIcons will give you workingIcon followed by liveIcon.
  • Boolean expressions are apparently elusive. I.e. the expression !v.isLive() && v.isWorking() && (!v.isArchived() && v.isWorking()) is equivalent to !v.isLive() && v.isWorking() && !v.isArchived() – unless there are timing issues with the isWorking() call, in which case it’s even worse to add the second check. See the sketch below.
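
As an illustration of handing these steps to the other side of the boundary, here is a minimal sketch – not dotCMS code: the Versionable stand-in is trimmed, the wrapper class and its name are my own, and the live-version lookup is left out:

interface Versionable {                // trimmed stand-in for the dotCMS interface
    boolean isLive();
    boolean isWorking();
    boolean isArchived();
    boolean isLocked();
}

class VersionStatusIcons {             // hypothetical wrapper; the name is mine
    private final Versionable v;

    VersionStatusIcons(Versionable v) { this.v = v; }

    // The object answers one question instead of being asked twenty.
    String asHtml() {
        StringBuilder buf = new StringBuilder();
        // the simplified boolean expression from the last bullet above
        if (!v.isLive() && v.isWorking() && !v.isArchived()) {
            buf.append("<span class='workingIcon'></span>");
        } else {
            buf.append("<span class='greyDotIcon' style='opacity:.4'></span>");
        }
        if (v.isArchived()) {
            buf.append("<span class='archivedIcon'></span>");
        }
        if (v.isLocked()) {
            buf.append("<span class='lockIcon'></span>");
        }
        return buf.toString();
    }
}

Callers would then write new VersionStatusIcons(v).asHtml(), and the 20-questions interrogation disappears from the utility class.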

Utility classes are quite often prone to too wide a reach of functionality. There is a simple need for some sort of utility, creating a new class with just this single function seems too much work, and so the function is added to an existing class with a wishy-washy name. And thus starts another Big Ball of Mud.

Why bash on dotCMS?

It is not my intention to bash on dotCMS or open source. I’m simply using dotCMS as an example; I’m confident there are a lot of code bases out there in about the same state, some worse, and a few better.

The main reasons why I chose dotCMS:

  • The code is open source, which means it is available for anyone wanting to read it.
  • I didn’t write any of it. I’m not affiliated in any way.
  • The code seems like a fine example of Java code for average projects.
  • Looking at several CMS systems I came by it and scratched the surface a bit.

Fixing it?

As it is open source I could be helping rather than pointing, but no. I think my opinions are better left not fighting against hard working developers, and my work is better left fighting for my opinions.

I am tempted to analyse the code quality in greater detail though.

That said, the first step of my “solution” to the first issue above – the database-specific if-else in distReindexJournalCleanup() shown earlier – would be to add a method:


String getReindexJournalCleanupSQL(String type, String sign, String time) {
    Properties sqlProperties = SQLProperties.get(DbConnectionFactory.getDBType());
    return sqlProperties.getProperty("reindex-journal-cleanup")
                        .replaceAll("%time%", time)
                        .replaceAll("%sign%", sign)
                        .replaceAll("%type%", type);
}

Then replace the first section with:

    reindexJournalCleanupSql.append(getReindexJournalCleanupSQL(type, sign, time));

Yes, this is a signature which falls under the category of primitive obsession, but the entire idea is to fiddle about with strings and return a string.

Where SQLProperties is a Map<String, Properties> loaded up with the query strings with placeholders in them. Thus you would have an entry MSSQL -> Properties where the 'reindex-journal-cleanup' key points to the following value:

"DELETE FROM dist_reindex_journal WHERE time_entered < DATEADD(%type%, %sign%%time%, GETDATE())"

Which means you have all the SQL for a single database type in a single property file; all you have to do to add a new database is add a new property file with the correct SQL commands for all the entries. That is hard work, but it beats having to add another else-if clause everywhere.
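
As a sketch of how SQLProperties itself could be loaded – everything here is my assumption: the class name comes from the snippet above, but the /sql/ classpath location and per-database file names are invented for illustration:

import java.io.IOException;
import java.io.InputStream;
import java.util.HashMap;
import java.util.Map;
import java.util.Properties;

public final class SQLProperties {
    private static final Map<String, Properties> CACHE = new HashMap<String, Properties>();

    // One property file per database type, e.g. /sql/mssql.properties.
    public static synchronized Properties get(String dbType) {
        Properties p = CACHE.get(dbType);
        if (p == null) {
            p = load(dbType);
            CACHE.put(dbType, p);
        }
        return p;
    }

    private static Properties load(String dbType) {
        String resource = "/sql/" + dbType.toLowerCase() + ".properties";
        InputStream in = SQLProperties.class.getResourceAsStream(resource);
        if (in == null) {
            throw new IllegalStateException("No SQL bundle: " + resource);
        }
        Properties p = new Properties();
        try {
            p.load(in);
            in.close();
        } catch (IOException e) {
            throw new IllegalStateException("Could not read " + resource, e);
        }
        return p;
    }
}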

If this had been done from the beginning it would have been over-architected and thus a waste of time.

The second step would be to look at the greater picture.

If the method works, then possibly a more general solution is warranted, e.g.

String getSQL(String key, Map<String, String> substitutions) {
    Properties sqlProperties = SQLProperties.get(DbConnectionFactory.getDBType());
    String template = sqlProperties.getProperty(key);
    for (Map.Entry<String, String> substitution : substitutions.entrySet()) {
        template = template.replaceAll(substitution.getKey(), substitution.getValue());
    }
    return template;
}

This solution is indifferent to what the properties are called, how many replaceable entries there are, and how these entries are delimited in the property string – the placeholders must naturally be used in the same manner in the calling code, which is reworked into:

    Map<String, String> substitutions = new HashMap<String, String>();
    substitutions.put("%time%", time);
    substitutions.put("%sign%", sign);
    substitutions.put("%type%", type.toString());
    reindexJournalCleanupSql.append(getSQL("reindex-journal-cleanup", substitutions));

The third step: Reiterate, i.e. GO TO 2.

Conclusion

Programming is not the same as listing the flow steps in the order they would be executed. Anyone with time on their hands can go through the steps of what is required, e.g. Gary Larson’s "First pants, then shoes.". Recipes, TV schedules, and games all fall in the category of flow steps. Making the right decisions and grouping the elements correctly is what makes programming harder than combining flow steps sequentially.

Basically, what I’m saying is: GO TO is still considered harmful, but delegation is not. And while we’re not accustomed to GO TO anymore, we still create the same haphazardly styled source code.

It seems most programmers lack the discipline expected of any professional. This is probably due to the facts that:

  • The programs work somewhat as expected.
  • The source code is hidden from – or not understood by – management. Thus fast is better than correct. It is like robbing a bank to solve a lack of funds: it may solve the problem in theory or for a short period, but it’s not a sound way.
  • The big picture is hidden from the view of the developers sprinting along to create a working solution within a time box. Thus knowing what is right at the current time – and when to switch – is not possible.
  • Estimation of software development apparently lacks the idea of continuous improvement: the notion that you’ll know more about the solution when you’re done than you did in the beginning, that software is a continuous maintenance job, and that if you accumulate technical debt you will suffer.
  • There are differences between inventing and creating new functionality and adding support for a wider audience.
  • Evolution is difficult. What was right yesterday may be wrong tomorrow.
  • Engineering practice parallels fail to account for reusability, yet somehow we’re measured using the same yardstick.

Big Ball of Mud with physics

Monday, February 25th, 2013

I am a fan of Foote and Yoder’s Big Ball of Mud paper. Well, fan in the sense that I think it is a correct observation of most software developed – while not liking that fact.

So I have been pondering on this – donning my amateur physicist’s hat I have contemplated some of the issues.

Turnaround

Let’s say that we have a team of developers pushing the project forward. Let’s call them Sisyphus. They have a development velocity, v. The ball of mud has a radius, r, which is comparable to the number of features in the product. The angular velocity, ω, is proportional to the frequency of revolution; let’s denote this as the version number.

ω = v/r

That is, as the radius grows, then – given the velocity is constant – the angular velocity reduces, which in turn means that the versions will be harder to crank out.

Work

Work, W, is force, F, times the distance, s, which in turn is the velocity times the time passed.
The force is mass times acceleration, and the mass is density, ρ, times the volume, V.

W = F Δs
Δs = v t
F = m a = ρV a

The volume of a sphere is 4πr³/3, which gives us:

W = F Δs = ρ · 4πr³/3 · a · v t ∝ r³

Thus the work is proportional to the radius of the sphere cubed.

The interpretation

If something requires twice the number of features, then the work required will be 2³ = 8 times greater – or run at 1/8 the velocity.

If – on the other hand – this could be done in two disjoint balls of mud, it would only require twice the amount of work.

More generally, we could probably divide the features of the big ball of mud into two smaller balls having different radii, R and r.

The volume of the original ball with radius R+r will be:

V(R+r) = 4π(R+r)³/3 = 4π(R³ + 3R²r + 3Rr² + r³)/3

With the volumes of each of the lesser balls as:

V(R) = 4πR³/3 and V(r) = 4πr³/3

Effectively the two lesser balls combined are 4πRr(R+r) smaller than the original ball. We are actually saving more than 6 times the volume of the smallest ball (assuming R ≥ r):

V(R+r) − V(R) − V(r) = 4π(R²r + Rr²) = 4πRr(R+r) ≥ 8πr³ = 6 · V(r)

If we split the big ball into 80% and 20% – which is likely – then the combined volumes of those two smaller balls would save

1 − (0.8³ + 0.2³) = 1 − 0.52 = 0.48

i.e. 48% of the original volume.

If volume is comparable to complexity, then we will save complexity and development effort in keeping the balls as small as possible. Nothing new there.


This is just another argument for low coupling and high cohesion. But it is also a reminder that we must look at our code from a different perspective from time to time. Are classes adhering to the Single Responsibility Principle?

Disjoint Communication

If the Big Ball of Mud is split into several smaller balls, they would probably need to communicate. This would have been some of the intrinsic complexity we shaved off.

Let us try to add it in the most stupid form possible: a complete graph in which each and every ball will know and communicate with each and every other ball.

Renaming the radii – having R as the radius of the original Big Ball of Mud – we divide it into n = R/r balls of approximately the same radius, r. Taking the added complexity into account, and counting each of the n(n−1)/2 connections as roughly one small ball’s worth of entanglement, we are left with:

V_total = (n + n(n−1)/2) · 4πr³/3 = (n(n+1)/2) · 4π(R/n)³/3 ≈ V/(2n)

Which is about the original volume, V, divided by 2n.

A complete graph is just the big ball of mud inverted; a connected graph, however, requires a minimum of only n−1 connections. That is, the refactored ball of mud could – at best – save an additional factor of n.

Different distributions

What if the balls cannot be split using an equal distribution? Perhaps we should contemplate a more natural distribution such as a power law.

Doubling down

Dividing the original radius into n different sizes each half the length of the previous, but with twice as many balls.

Dividing into n=4 different sizes we get:

1/4 + 2/8 + 4/16 + 8/32 = 4/4 = 1

That is, we get 2⁴−1 = 15 balls, the biggest of radius R/4.

The general formula for volume per size is now:

V(c, r) = 4/3 π c r³

Where c is the number of balls with the radius, r.

As the communication is no longer between balls of equal size, we have to do some more work. The rectangle spanned by the two communicating parties’ radii seems a fitting proposition.

We end up with the following formula for the added complexity due to full communication needs, summing the radius-rectangles over all pairs of balls:

V_comm = Σ_{a<b} r_a · r_b = ((Σ r_a)² − Σ r_a²)/2

As R is a factor of all radii, we see that V_comm is an area in R², whereas the volume, V, is a volume in R³. This means that we shouldn’t be allowed to add them, but as we’re not concerned with this type of detail we add them anyway and notice that the effect is dependent upon the initial radius, R.

Using the relative complexity as the yardstick, we can plot the effect of R and n. Remembering that we’re getting 3 balls for n=2, 7 balls for n=3, and 15 balls for n=4.

From the graph it looks as though the initial radius doesn’t matter as much.

Furthermore, it seems that the best value would be to split into 2 or 3 different sized smaller balls. Going beyond seems futile as the gain from splitting the largest ball left would probably be bigger – given that the initial radius doesn’t matter as much.

Without full communication needs the relative complexity would be even lower.
[Graph: relative complexity of the doubling-down split as a function of n, for different initial radii, R]

Conclusion

While it is human nature to hoard development into a single monolithic big ball of mud, it does pay off to break down the monolith to a certain degree – whether for sanity, savings, or speed.

The notion of equal distribution – which corresponds to dissection into primitive obsession – is luckily reverted by the doubling-down distribution, which is likely to be a more natural division of elements.

It does seem counterintuitive to work at a constant pace and still grind to a halt. Yet this seems to be a logical conclusion of the above calculations. If not to a complete standstill, then at least turning so slowly that it seems like a complete stop.

I didn’t consider friction in the above calculations. Friction would impact the total forces, and thus the work – and not in a positive way.

The question now is: Will a Big Ball of Mud collapse under its own weight if not cared for?

Strongly typed + primitive obsession = poorly typed

Thursday, February 21st, 2013

Due to the rather large amount of work required to create an object in Java, there is a shortcut which leads to primitive obsession: use the best existing match. For web services a String seems sufficiently applicable for any type, which makes a Map<String, String> or Properties the default data structure for transferable objects.

Property objects are easily extended – you simply add new keys – and in that sense they are much handier than Beans. But they aren’t objects in the sense that an object hides information – which is admittedly quite difficult for a data transfer object – and they don’t have methods which work upon the intrinsic state of the object. Not having methods on the objects leads to procedural programming, which defeats the purpose of objects entirely.

Claiming Java is a strongly typed language and then defaulting to using Strings is really an oxymoron and detrimental to the quality of anything developed in such a language.

I’m all for rapid development and in a tight spot or early on in exploratory development primitives can be used, but they should be replaced by meaningful types as soon as possible. Ages aren’t measured in Integer, money isn’t Double, and I’m sure names belong to a subset of all the possible String values.

While I’m bashing Java – and in particular Java developers – this is valid for most C-styled languages and their developers. I’m quite puzzled as to why newish strongly typed languages didn’t look at Pascal’s type declaration – it is extremely simple to define, or simply alias, an existing type. It seems Scala did pick up on that and added the type keyword, which is an easy way to alias another type.

E.g. for prototyping you can define Name as a type which underneath is a String. You can still make type errors by using a declared String as a Name or a Name as a String, but if you define the methods and functions as taking Name as a parameter or returning Name, then you should spot some issues in the prototyping phase – and when you exchange the alias for a real type, you should no longer be able to make the type errors. While this is ugly, it is far superior to the String hell of Java. NB: This is not the way type was intended to be used.

You should be able to spot the issues by reading code. Sometimes it helps reading it out aloud.

As an example let’s take Martin Fowler’s EventCollaboration example:

class Trader...
        public void PlaceOrder(string symbol, int volume) {
            StockExchange exchange = ServiceLocator.StockExchangeFor(symbol);
            Order order = new Order(symbol, volume, this);
            exchange.SubmitOrder(order);
        }

Transcribing what I was reading out aloud:

  • We have a Trader class in which we can place an order. Placing an order requires a string called symbol and an integer volume – presumably of the things identified by the symbol string.
  • To process the placing of the order we locate the StockExchange, which we will refer to internally as exchange, through a ServiceLocator service using the symbol as argument.
  • Proceeding we create an Order, which we will refer to as order, using the parameters given and the Trader object itself.
  • Finally we submit the order to the exchange.

While that seems like a reasonable process for placing an order, only the last step actually places the order.

One part is getting the StockExchange on which to place the order, another is actually building the order. This means that Trader has to know about the internals of Order – which may be relevant; then again, as Order must know about Trader, we have introduced a dependency cycle between the two objects.

Furthermore, if symbol was ever to be confined to some meaningful subset of the String hell, then we would have to modify at least 3 classes: Order, Trader, and ServiceLocator.

While the example goes on, changing the method to cater for EventCollaboration:

class Trader...
        public void PlaceOrder(string stock, int volume) {
            Order order = new Order(stock, volume);
            outstandingOrders.Add(order);
            MessageBus.PublishOrderPlacement(order);
        }

A nicer and more consistent way would be to replace the previous method by:

public void place(Order order, StockExchange exchange) { 
    exchange.SubmitOrder(order);
}

Naturally this leads to the discussion of where the remaining steps should go and whether Order should know about Trader or not. My point is that Order should not be created in the flow of PlaceOrder; it should be given as a parameter. And given as a strongly typed parameter, you can remove the type from the method name. Trader.place(Order) reads much better than Trader.placeOrder(Order) – who knows, maybe you’ll find a recurring pattern which could lead to an interface and from there to insights.

Why pick on Fowler?

Well, there are several reasons:

  • The code is available. That is, it exists for another purpose than mine. If I wrote example code it would not prove anything except that I couldn’t program.
  • Martin Fowler is known for his books Refactoring: Improving the Design of Existing Code and Patterns of Enterprise Application Architecture – the latter of which is used in academia and is where the code in the examples comes from – well, the companion web-site.
  • Wanting to “improve the design of existing code” should not leave you stranded with poor code to start with
  • If teaching material is poor, the experience will suffer
  • My guess is that Martin Fowler is a better than average programmer – presumably quite a lot better than average. Showing that his code is not perfect is simply an indication of the state of software in general
  • If you focus on one thing you may lose sight of other elements. In the EventCollaboration example the focus is on event collaboration and not on cyclic dependencies

Back on track

Going back to the introduction I mentioned that ages aren’t measured in Integer, money isn’t Double, and I’m sure names belong to a subset of all the possible String values. Let’s dive into these in greater detail.

Age

The standard 32 bit Integer range is -2,147,483,648 to 2,147,483,647 – if age is measured in years, and we’re talking about humans, then an unsigned 8-bit (0-255) seems quite the fitting range. I’ve never heard of humans being more than 150 years old – and not even that. Most certainly none have been recognized as being -7 years old.

If the age is to be measured in seconds since the start of the epoch (Jan 1st, 1970 00:00:00 UTC), then either it’s not Age but rather a time in space, or we’re into an uncertainty of approximately 43,200 seconds (half a day) – at least I have no idea of when I was born. In either case 32-bit is off. The range of 4 billion seconds is about 136 years. That is, we can only go back to 1901 and forward to 2038, which isn’t suitable for all occasions.
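
As a sketch of what a meaningful type could look like – the bounds and names here are my own illustration:

// Age in whole years, white-listed at construction. No -7 year olds,
// and nobody older than 150.
public final class Age {
    private final int years;

    private Age(int years) { this.years = years; }

    public static Age ofYears(int years) {
        if (years < 0 || years > 150) {
            throw new IllegalArgumentException("Not a human age: " + years);
        }
        return new Age(years);
    }

    public int inYears() { return years; }
}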

Money

Money consists of an amount of a certain denomination. Most – but not all – currencies have some sort of 1/100 subdivision of the denomination, and for percentages, taxation, and currency exchange we often have to work with numbers 1/100 of that, but we shouldn’t go beyond those digits, and we should not accept scientific notation, e.g. $1E2. NaN and Math.PI don’t seem fitting either. Numbers have certain operations which can be performed upon them, e.g. addition and multiplication. You can’t add $1 and €1 in a meaningful way – at least not without an exchange rate – and you cannot multiply them.

That should leave sufficient arguments not to use floating points, without going into the details of What Every Computer Scientist Should Know About Floating-Point Arithmetic.
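
A sketch of a Money type along those lines – the class and method names are mine, not from any particular library:

import java.math.BigDecimal;
import java.util.Currency;

// An amount of a certain denomination: exact decimal arithmetic, no floats,
// addition only within the same currency, and deliberately no times(Money).
public final class Money {
    private final BigDecimal amount;
    private final Currency currency;

    public Money(BigDecimal amount, Currency currency) {
        this.amount = amount;
        this.currency = currency;
    }

    public Money plus(Money other) {
        if (!currency.equals(other.currency)) {
            throw new IllegalArgumentException("Cannot add " + currency
                    + " and " + other.currency + " without an exchange rate");
        }
        return new Money(amount.add(other.amount), currency);
    }

    // Scaling by a plain number is meaningful; multiplying two amounts is not.
    public Money times(BigDecimal scalar) {
        return new Money(amount.multiply(scalar), currency);
    }
}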

Names

I know I’m a bit cynical when saying that nobody is called:

I'm a little teapot,
Short and stout,
Here is my handle, 
Here is my spout,
When I get all steamed up,
Hear me shout,
Tip me over and pour me out!

source

Nor

Robert'); DROP TABLE Students;--

XKCD’s Bobby Tables

Why the obsession?

Well, it is about minimizing required work. If you have more than one method which takes an argument of a given type, e.g. age, then you would have to check the validity of the input in each and every method. Binding the check to where the argument is instantiated, you know that the argument conforms to the anticipated accepted values everywhere else. In essence you are white-listing your input.

If you have a range with start and end, and these two endpoints aren’t connected and bounds-checked – i.e. that start comes before end – but blindly passed on, then you’d have to check over and over again, possibly introducing an off-by-one error along the way. See the sketch below.
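
Again a minimal sketch under my own naming – the bounds check lives in the constructor, once:

// A range validated at construction: start never comes after end, so no
// caller has to re-check – or get it off by one.
public final class IntRange {
    private final int start;
    private final int end;

    public IntRange(int start, int end) {
        if (start > end) {
            throw new IllegalArgumentException("start " + start + " comes after end " + end);
        }
        this.start = start;
        this.end = end;
    }

    public int start() { return start; }

    public int end() { return end; }
}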

It’s the mathematical laziness, one abstract solution, as opposed to the busy-work mindlessness.

So while primitive obsession is problematic, primitive obsession in a strongly typed language is extremely detrimental. The arguments for a strong type check are defeated by the poor choices made, basically removing the foundation for the trust in the types. A dynamically typed programming language would be better, i.e. not as bad in this case – not that you would be better off making mistakes in those languages.

Software rant

Tuesday, February 19th, 2013

Something is rotten in the state of Denmark. Well – to be sure, it’s not just Denmark it is perhaps in the state of the art of programming.

MVC frameworks such as Rails, Grails, and Play! all fall prey to the fallacy of developing to please if not the compiler, then the framework. Don’t get me wrong, I’m not against the MVC pattern, I’m not against rapid development. But the way that the frameworks separate the entities is just plain wrong. At least according to High Cohesion/Low Coupling. It seems we’ve ended up in the low cohesion/indirect high coupling quadrant.

Models aren’t interchangeable; they do not adhere to the same higher-ordered abstraction and should not be packaged together.

In the case where you want to reuse some models from a prior project, the separation of entities in that project will not make for easy reuse.

Instead of:

Models/
      + A
      + B
Views/
     + A
     + B
Controllers/
          + A
          + B

It seems it would be prudent to have:

A/
 + Model
 + View
 + Controller
B/
 + Model
 + View
 + Controller

And if the entities A and B are independent services, then by all means they should reside in separate modules to heighten the potential of reuse.

Another benefit of this way of packaging is the immediate overview of the project. You open the project and see that there are 2 entities, A and B – not merely that the project is an MVC project.

This is not to say that MVC frameworks are evil and should be shied away from – there are other evils lurking around in other areas, especially in the Java world with annotations, which on their own can be useful, but through 3rd party libraries they form an extreme vendor lock-in.

The other part is the problem of programmers – apparently on any level – forgetting the scientific evidence, e.g. SOLID principles.

Single Responsibility means one thing only. Perhaps this is where Object Oriented Principles breaks down and you have to adopt some functional programming style into the solution.

For lack of better ideas let’s examine Martin Fowler‘s EventSourcing example. By the way, I’m all for Event Sourcing – it should just be done right. Sadly I think it is taught wrong. I’m not saying that it won’t work; I’m simply saying it could – and most likely should – be done differently. This is not particularly to point fingers at Martin Fowler, it’s to illustrate that we all do less than optimal things. Focus is directed elsewhere. The real point is to raise awareness and in the hope that we don’t end up with big balls of mud.

class EventProcessor...
  IList log = new ArrayList();  
  public void Process(DomainEvent e) {
    e.Process();
    log.Add(e);
  }

class DomainEvent...
  DateTime _recorded, _occurred;
  internal DomainEvent (DateTime occurred) {
    this._occurred = occurred;
    this._recorded = DateTime.Now;
  }
  abstract internal void Process();

class DepartureEvent...
    Port _port;
    Ship _ship;
    internal Port Port   {get { return _port; }}
    internal Ship Ship   {get { return _ship; }}
    internal DepartureEvent(DateTime time, Port port, Ship ship) : base (time)   {
      this._port = port;
      this._ship = ship;
    }
    internal override void Process() {
      Ship.HandleDeparture(this);
    }

class Ship...
  public Port Port;
  public void HandleDeparture(DepartureEvent ev) {
    Port = Port.AT_SEA;
  }

Before the addition of Events the Ship knew about the Port, and most likely the Port knew about the Ships, which is a cyclic dependency in itself. With the addition of Events the Ship is now modified – which is a violation of the Open/Closed Principle. What’s more: the newly added Events will have to know of Ship and Port – this is fine – but the existing objects will have to know about the new objects, thereby entangling the cyclic dependencies even further.

I do understand the idea behind the Ship knowing which Port – if any – it is at: an optimized double-link, as opposed to examining all the ports and in the worst case not finding the Ship. But the Port is not part of what makes up a Ship. The solution would be to have a ShipLocationService which ties the Port ID and the Ship ID together – basically the Event Sourcing events. Thus a simple query with the known Ship ID should – in O(1) – provide the location of the Ship in question (see the sketch below). As I see it, that solution is far superior to the one shown.
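
A minimal sketch of such a ShipLocationService – in Java rather than the example’s C#, with plain String IDs as stand-ins:

import java.util.HashMap;
import java.util.Map;

// Ties ship ID to port ID, fed by the departure/arrival events.
// Ship and Port themselves remain untouched.
public class ShipLocationService {
    private final Map<String, String> portByShip = new HashMap<String, String>();

    public void handleArrival(String shipId, String portId) {
        portByShip.put(shipId, portId);
    }

    public void handleDeparture(String shipId) {
        portByShip.remove(shipId);    // the ship is at sea
    }

    // O(1): returns the port ID, or null when the ship is at sea.
    public String portOf(String shipId) {
        return portByShip.get(shipId);
    }
}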

Second, if the Ship and Port classes aren’t modified, we should have greater confidence in an update, as the prior solution is untouched, and the addition can be rolled out and back independently.

Furthermore, as the classes for Ship and Port aren’t touched, they can be reused for other services, without the need to update those services as a cascade of changes. I’m not saying they should, only that they could. And I’m not saying that there never will be a need to modify an object definition only that it has to be true to the single responsibility principle.

Naturally for a one-off solution this is overkill, but evidence shows that one-off solutions rarely exist, and that maintenance is a heavy burden – around 1/3 of the total spent on software. (source)

“There are two ways of constructing a software design: One way is to make it so simple that there are obviously no deficiencies, and the other way is to make it so complicated that there are no obvious deficiencies. The first method is far more difficult.” – C.A.R. Hoare

The third part is programming languages. Each and every programming language has flaws in the sense that it forces a specific way of doing things. And being habitual people, developers assimilate the thought patterns laid out by the language. In this I’m disagreeing with Linus Torvalds – programming languages actually have a profound impact on the solutions. But the disagreement only goes so far, as I’m agreeing with his “Bad programmers worry about the code. Good programmers worry about data structures and their relationships”. E.g. Java without static Set initializations makes programmers use other data structures.

Java’s requirement to create the folder structure matching the package structure, set up a file with the correct name, and then name a method in which the implementation steps for a given solution can reside is detrimental to the progress of development, as there is too much pomp and circumstance – or grinding – associated with it, which has nothing to do with solving the problem.

Thus the programmer can either be careless and get things done with a lot of technical debt – or be meticulous and slow down progress.

It is extremely difficult to convince stakeholders that their working software solution ought to be modified – especially given the track record of software projects (source). I must say that I’m not happy with the statistics and actually quite upset with the comparison of iterative, agile, and lean with the traditional method. To me it seems like the boiling frog anecdote (source).

With a 2/3 average success rate – at best, and probably more likely 1/2 – the odds of having a continuously successful software product after a few versions diminish quite rapidly: (2/3)^version. Naturally the Binomial distribution could be used if accepting some failures along the way, but then it would not be continuously successful.
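
To put numbers on it: with a 2/3 per-version success rate the odds of uninterrupted success are (2/3)² ≈ 44% after two versions, (2/3)³ ≈ 30% after three, and a mere (2/3)⁶ ≈ 9% after six.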

In other words, if a change is to be reasonable, then it should increase the expected value of the product by at least 50% and prudently more than double the value, not counting the cost of development.

[Graph: continued success rate as a function of the number of versions, for different per-version success rates]

If we can improve on the average success rate, then we can accept less valuable commitments without too much worry. With 90% success rate we can accept 11% increase in value per version, and run 6 versions with an expected continued success above 50% for the last version. Currently we should only accept one version under these criteria. As Dan North says: “Software is a liability.”