Because defining a new type in Java takes a rather large amount of work, there is a tempting shortcut which leads to primitive obsession: use the best existing match. For web services a String seems applicable to any type, which makes a Map<String, String> or Properties the default data structure for transferable objects.
Properties objects are easily extended – you simply add new keys – and in that sense they are much handier than Beans. But they aren’t objects in the proper sense: they hide no information (which admittedly is quite difficult for a data transfer object), and they have no methods which work upon the intrinsic state of the objects. Not having methods on the objects leads to procedural programming, which defeats the purpose of objects entirely.
Claiming Java is a strongly typed language and then defaulting to using Strings is really an oxymoron and detrimental to the quality of anything developed in such a language.
I’m all for rapid development and in a tight spot or early on in exploratory development primitives can be used, but they should be replaced by meaningful types as soon as possible. Ages aren’t measured in Integer, money isn’t Double, and I’m sure names belong to a subset of all the possible String values.
While I’m bashing Java – and Java developers in particular – the same holds for most C-styled languages and the developers in those languages. I’m quite puzzled as to why newish strongly typed languages didn’t look at Pascal’s type declaration, where it is extremely simple to define a new type or simply alias an existing one. It seems Scala did pick up on that and added the type keyword, which is an easy way to alias another type. E.g. for prototyping you can define Name as a type, which underneath is a String. Because an alias is interchangeable with the type it names, you can still make type errors by using a declared String as Name or Name as String. But if you define the methods and functions as taking Name as parameter or returning Name, then you should spot some issues in the prototyping phase, and when you exchange the alias for a real type, you should no longer be able to make the type errors. While this is ugly it is far superior to the String hell of Java. NB: This is not the way type was intended to be used.
You should be able to spot the issues by reading code. Sometimes it helps to read it out loud.
As an example let’s take Martin Fowler’s EventCollaboration example:
    class Trader...
        public void PlaceOrder(string symbol, int volume) {
            StockExchange exchange = ServiceLocator.StockExchangeFor(symbol);
            Order order = new Order(symbol, volume, this);
            exchange.SubmitOrder(order);
        }
Transcribing what I was reading out loud:
- We have a Trader class in which we can place an order. Placing an order requires a string called symbol and an integer volume – presumably of the things identified by the symbol string.
- To process the placing of the order we locate the StockExchange, which we will refer to internally as exchange, through a ServiceLocator service using the symbol as argument.
- Proceeding we create an Order, which we will refer to as order, using the parameters given and the Trader object itself.
- Finally we submit the order to the exchange.
While that seems like a reasonable process for placing an order, only the last step is actually placing the order.
One part is getting the StockExchange on which to place the order, another is actually building the order. As a consequence Trader has to know how an Order is constructed. That may be acceptable, but since Order in turn must know about Trader, we have introduced a dependency cycle between the two classes.
Furthermore, if symbol was ever to be confined to some meaningful subset of the String hell, then we would have to modify at least 3 classes: Order, Trader, and ServiceLocator.
The example goes on to change the method to cater for EventCollaboration:
    class Trader...
        public void PlaceOrder(string stock, int volume) {
            Order order = new Order(stock, volume);
            outstandingOrders.Add(order);
            MessageBus.PublishOrderPlacement(order);
        }
A nicer and more consistent way would be to replace the previous method by:
    public void place(Order order, StockExchange exchange) {
        exchange.SubmitOrder(order);
    }
Naturally this leads to the discussion of where the remaining steps should go and whether Order should know about Trader or not. My point is that Order should not be created in the flow of PlaceOrder; it should be given as a parameter. And when it is given as a strongly typed parameter, you can remove the type from the method name: Trader.place(Order) reads much better than Trader.placeOrder(Order). Who knows, maybe you’ll find a recurring pattern which could lead to an interface, and from there to insights.
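To make the suggestion concrete, here is a minimal sketch in Java rather than the C# of the original example. Everything in it is an assumption made for illustration: the Symbol type and its upper-case-letters rule, the submit method name, and the decision to leave Trader out of Order (which is only one possible answer to the question above). It is not Fowler’s code.

```java
/** Hypothetical ticker symbol: the one place where the raw String is checked. */
final class Symbol {
    private final String ticker;

    Symbol(String ticker) {
        // Assumed rule for illustration only: 1 to 5 upper-case ASCII letters.
        if (ticker == null || !ticker.matches("[A-Z]{1,5}")) {
            throw new IllegalArgumentException("Not a ticker symbol: " + ticker);
        }
        this.ticker = ticker;
    }

    String ticker() { return ticker; }
}

/** An order knows what to trade and how much; it does not refer back to Trader. */
final class Order {
    private final Symbol symbol;
    private final int volume; // still a primitive, only to keep the sketch short

    Order(Symbol symbol, int volume) {
        if (symbol == null) throw new IllegalArgumentException("symbol is required");
        if (volume <= 0) throw new IllegalArgumentException("volume must be positive: " + volume);
        this.symbol = symbol;
        this.volume = volume;
    }

    Symbol symbol() { return symbol; }
    int volume() { return volume; }
}

/** Stand-in for the exchange in the example; only the one call is modelled. */
interface StockExchange {
    void submit(Order order);
}

final class Trader {
    /** Placing is the only step left here: no lookup and no construction. */
    void place(Order order, StockExchange exchange) {
        exchange.submit(order);
    }
}
```

If the rule for a valid symbol changes, only Symbol has to change; Order, Trader and whatever locates the exchange keep passing a Symbol around without inspecting it.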
Why pick on Fowler?
Well, there are several reasons:
- The code is available. That is, it exists for another purpose than mine. If I wrote example code it would not prove anything except that I couldn’t program.
- Martin Fowler is known for his books Refactoring: Improving the Design of Existing Code and Patterns of Enterprise Application Architecture. The latter is used in academia, and the code in the examples comes from it – well, from the companion web-site.
- Wanting to “improve the design of existing code” should not leave you stranded with poor code to start with
- If teaching material is poor, the experience will suffer
- My guess is that Martin Fowler is a better than average programmer – presumably quite a lot better than average. Showing that his code is not perfect is simply an indication of the state of software in general
- If you focus on one thing you may lose sight of other elements. In the EventCollaboration example the focus is on event collaboration and not on cyclic dependencies
Back on track
Going back to the introduction I mentioned that ages aren’t measured in Integer, money isn’t Double, and I’m sure names belong to a subset of all the possible String values. Let’s dive into these in greater detail.
Age
The standard 32-bit Integer range is -2,147,483,648 to 2,147,483,647. If age is measured in years, and we’re talking about humans, then an unsigned 8-bit range (0-255) seems a far better fit. I’ve never heard of humans being more than 150 years old, or even close to that. Most certainly none have been recognized as being -7 years old.
If the age is to be measured in seconds since the start of the epoch (Jan 1st, 1970 00:00:00 UTC), then either it’s not an Age but rather a point in time, or we’re into an uncertainty of approximately 43,200 seconds (half a day) – at least I have no idea at what time of day I was born. In either case 32-bit is off. The range of about 4 billion seconds covers about 136 years. That is, with a signed 32-bit count we can only go back to 1901 and forward to 2038, which isn’t suitable for all occasions.
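A minimal sketch of what an age measured in years could look like as its own type in Java. The class name, the accessor names and the upper bound of 150 are my assumptions for illustration; pick whatever bound your domain supports.

```java
/** Hypothetical value type: a human age in whole years, checked once on construction. */
final class Age {
    // Assumed upper bound, taken loosely from the "not even 150" remark above.
    static final int MAX_YEARS = 150;

    private final int years;

    Age(int years) {
        if (years < 0 || years > MAX_YEARS) {
            throw new IllegalArgumentException("Age out of range: " + years);
        }
        this.years = years;
    }

    int years() { return years; }

    /** Behaviour lives with the data instead of in the callers. */
    boolean isAtLeast(Age other) { return years >= other.years; }

    @Override public boolean equals(Object o) {
        return o instanceof Age && ((Age) o).years == years;
    }

    @Override public int hashCode() { return Integer.hashCode(years); }

    @Override public String toString() { return years + " years"; }
}
```

With this in place new Age(-7) fails at the point of creation, and a method taking an Age no longer needs a range check of its own.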
Money
Money consists of an amount of a certain denomination. Most – but not all – currencies have some sort of 1/100 of the denomination, and for percentages, taxation, and currency exchange we often have to work with numbers 1/100 of that, but we shouldn’t go beyond those digits, and we should not accept scientific notation, e.g. $1E2. NaN and Math.PI don’t seem fitting either. Numbers have certain operations which can be performed upon them, e.g. addition and multiplication. Money does not support all of them: you can’t add $1 and €1 in a meaningful way, at least not without an exchange rate, and you cannot multiply them with each other at all.
That should leave sufficient arguments not to use floating-point numbers, even without going into the details of What Every Computer Scientist Should Know About Floating-Point Arithmetic.
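The constraints above can be sketched as a small Java type on top of java.math.BigDecimal and java.util.Currency. The class itself, the four-decimal scale, the HALF_EVEN rounding and the method names are assumptions for illustration; a real system would take its rounding rules from the business, not from me.

```java
import java.math.BigDecimal;
import java.math.RoundingMode;
import java.util.Currency;
import java.util.regex.Pattern;

/** Hypothetical money type: a decimal amount tied to a currency. */
final class Money {
    // Plain decimal notation with at most four decimals; "1E2" and "NaN" do not match.
    private static final Pattern PLAIN_DECIMAL = Pattern.compile("-?\\d+(\\.\\d{1,4})?");
    private static final int SCALE = 4; // 1/100 of the denomination, and 1/100 of that

    private final BigDecimal amount;
    private final Currency currency;

    static Money of(String amount, String currencyCode) {
        if (amount == null || !PLAIN_DECIMAL.matcher(amount).matches()) {
            throw new IllegalArgumentException("Not a plain decimal amount: " + amount);
        }
        // Currency.getInstance rejects codes that are not ISO 4217.
        return new Money(new BigDecimal(amount), Currency.getInstance(currencyCode));
    }

    private Money(BigDecimal amount, Currency currency) {
        // Rounding mode is an assumption; it only matters after multiplication.
        this.amount = amount.setScale(SCALE, RoundingMode.HALF_EVEN);
        this.currency = currency;
    }

    /** Addition is only defined within one currency. */
    Money plus(Money other) {
        if (!currency.equals(other.currency)) {
            throw new IllegalArgumentException("Cannot add " + other.currency + " to " + currency);
        }
        return new Money(amount.add(other.amount), currency);
    }

    /** Money times a plain number (a rate, a quantity); there is no Money times Money. */
    Money times(BigDecimal factor) {
        return new Money(amount.multiply(factor), currency);
    }

    @Override public boolean equals(Object o) {
        return o instanceof Money
                && ((Money) o).amount.equals(amount)
                && ((Money) o).currency.equals(currency);
    }

    @Override public int hashCode() { return 31 * amount.hashCode() + currency.hashCode(); }

    @Override public String toString() { return amount.toPlainString() + " " + currency.getCurrencyCode(); }
}
```

Adding dollars to euros now fails loudly, and multiplying two Money values does not even compile because no such method exists.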
Names
I know I’m a bit cynical when saying that nobody is called:
I'm a little teapot, Short and stout, Here is my handle, Here is my spout, When I get all steamed up, Hear me shout, Tip me over and pour me out!
Nor
Robert'); DROP TABLE
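A sketch of a Name type in Java that would turn both of those away. The rule used here (1 to 100 characters, starting with a letter, then only letters, spaces, periods, apostrophes and hyphens) is purely my assumption for illustration. Real names are messier than that, so the actual whitelist has to come from your own domain.

```java
import java.util.regex.Pattern;

/** Hypothetical name type: a String restricted to an explicit whitelist. */
final class Name {
    // Assumed rule: a letter, followed by up to 99 letters, spaces, '.', ''' or '-'.
    private static final Pattern ALLOWED = Pattern.compile("\\p{L}[\\p{L} .'\\-]{0,99}");

    private final String value;

    Name(String value) {
        if (value == null || !ALLOWED.matcher(value).matches()) {
            throw new IllegalArgumentException("Not an acceptable name: " + value);
        }
        this.value = value;
    }

    String value() { return value; }

    @Override public boolean equals(Object o) {
        return o instanceof Name && ((Name) o).value.equals(value);
    }

    @Override public int hashCode() { return value.hashCode(); }

    @Override public String toString() { return value; }
}
```

The nursery rhyme fails on its commas, and the second example fails on the parenthesis and semicolon, before either gets anywhere near a database.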
Why the obsession?
Well, it is about minimizing the required work. If you have more than one method which takes an argument of a given type, e.g. age, and that argument is a primitive, then you have to check the validity of the input in each and every method. If you instead bind the check to the place where the argument is instantiated, you know that the argument conforms to the accepted values everywhere else. In essence you are whitelisting your input.
If you have a range with a start and an end, and these two endpoints are blindly passed on separately instead of being tied together with the bounds checked once, i.e. that start comes before end, then you have to check over and over again, possibly introducing an off-by-one error along the way.
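A minimal Java sketch of such a range, where the two endpoints travel together and the bound is checked exactly once. The class and its half-open interpretation (start included, end excluded) are my assumptions for illustration.

```java
/** Hypothetical half-open range [start, end): endpoints are tied together and checked once. */
final class Range<T extends Comparable<? super T>> {
    private final T start;
    private final T end;

    Range(T start, T end) {
        if (start == null || end == null) {
            throw new IllegalArgumentException("Both endpoints are required");
        }
        if (start.compareTo(end) >= 0) {
            throw new IllegalArgumentException("Start must come before end: " + start + " .. " + end);
        }
        this.start = start;
        this.end = end;
    }

    T start() { return start; }
    T end() { return end; }

    /** The inclusive/exclusive decision is made here, once, not in every caller. */
    boolean contains(T value) {
        return start.compareTo(value) <= 0 && value.compareTo(end) < 0;
    }
}
```

Any method receiving a Range can rely on start coming before end, and the off-by-one question is answered in a single place.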
It’s mathematical laziness, one abstract solution, as opposed to mindless busy-work.
So while primitive obsession is problematic, primitive obsession in a strongly typed language is extremely detrimental. The arguments for strong type checking are defeated by poor choices which basically remove the foundation for the trust in the types. A dynamically typed programming language would be better, i.e. not as bad, in this case – not that you would be better off making these mistakes in those languages.