Looking at source code from different developers – including myself – it has come to my attention that for some reason bright people tend to fall into the not as bright category when it comes to correct placement of the actions.
Programming is not simply appending steps to be performed to a Turing machines tape. That’s basically the entire idea behind programming language – not just high level programming languages. You can separate functional units and call these units from elsewhere. The execution of the program by the computer is a stream of actions, but that doesn’t mean that the source code has to be.
What I’ve seen seems to be developers eager to solve a problem. Having the solution in their head, which steps are needed, they type it into the class or source file, where they are, then move on to the next problem to be solved.
What we miss is the urge to stand back and look at the bigger picture. And in addition ask ourselves: “Is this right? Should these steps be here or somewhere else?â€
We have the need for some functionality of an object, which it doesn’t yet provide. Sometimes the object is within our realm. Sometimes it is third party or language intrinsic. In any case we simply dump the ‘solution’ where it is needed with no regards to coupling or cohesion. This leads to promiscuous objects and parts of violations scattered over several different objects. In addition each objects internal complexity rises because it has to know of the internals of several other objects. We simply fail to adhere to the Single Responsibility Principle.
Usually this grows incrementally over time as new issues are brought up, e.g. the requirement to support a new database.
if (DbConnectionFactory.getDBType().equals(DbConnectionFactory.MSSQL)) reindexJournalCleanupSql.append("DELETE FROM dist_reindex_journal " + " WHERE time_entered < DATEADD("+ type.toString() +", "+ sign + "" + time +", GETDATE()) "); else if(DbConnectionFactory.getDBType().equals(DbConnectionFactory.MYSQL)) reindexJournalCleanupSql.append("DELETE FROM dist_reindex_journal " + " WHERE time_entered < DATE_ADD(NOW(), INTERVAL "+ sign + "" + time +" " + type.toString()+") "); else if(DbConnectionFactory.getDBType().equals(DbConnectionFactory.POSTGRESQL)) reindexJournalCleanupSql.append("DELETE FROM dist_reindex_journal " + " WHERE time_entered < NOW() "+ sign + " INTERVAL '"+ time +" " + type.toString() +"' "); else if(DbConnectionFactory.getDBType().equals(DbConnectionFactory.ORACLE)) reindexJournalCleanupSql.append("DELETE FROM dist_reindex_journal " + " WHERE CAST(time_entered AS TIMESTAMP) < CAST(SYSTIMESTAMP "+ sign + " INTERVAL '"+time+"' "+ type.toString() + " AS TIMESTAMP)");
From com.dotcms.journal.business.ESDistributedJournalFactoryImpl.distReindexJournalCleanup()
When you go from one database to two or more, the easy way is to introduce an if-else clause around the differences, but this is wrong. It’s an extremely easy hack, but the technical debt is immense. The next maintenance developer coming along to add yet another database will simply adopt the poor decision and add an extra “else ifâ€
Hopefully people have stopped doing the parallel for languages and are using i18n.
One of the weirder things is that apparently the check for database type is in such a great need that it has a specific implementation, which is used in the following snippet.
void fixNotNull() throws SQLException { // contentlet_version_info => working_inode & lang DotConnect dc=new DotConnect(); if(DbConnectionFactory.isMsSql()) { // MS SQL dc.executeStatement("ALTER TABLE contentlet_version_info ALTER COLUMN lang numeric(19,0) NOT NULL"); dc.executeStatement("ALTER TABLE contentlet_version_info ALTER COLUMN working_inode varchar(36) NOT NULL"); } else if(DbConnectionFactory.isOracle()) { // ORACLE dc.executeStatement("ALTER TABLE contentlet_version_info MODIFY (lang NOT NULL)"); dc.executeStatement("ALTER TABLE contentlet_version_info MODIFY (working_inode NOT NULL)"); } else if(DbConnectionFactory.isMySql()) { // MySQL dc.executeStatement("ALTER TABLE contentlet_version_info MODIFY lang int8 NOT NULL"); dc.executeStatement("ALTER TABLE contentlet_version_info MODIFY working_inode varchar(36) NOT NULL"); } else if(DbConnectionFactory.isPostgres()) { // PostgreSQL dc.executeStatement("ALTER TABLE contentlet_version_info ALTER COLUMN lang SET NOT NULL"); dc.executeStatement("ALTER TABLE contentlet_version_info ALTER COLUMN working_inode SET NOT NULL"); } }
From com.dotmarketing.startup.runonce.Task00840FixContentletVersionInfo
Using the isMsSql() we have:
- com.dotcms.content.elasticsearch.business.ESContentFactoryImpl
- com.dotcms.journal.business.ESDistributedJournalFactoryImpl
- com.dotmarketing.common.reindex.ReindexThread
- com.dotmarketing.startup.AbstractJDBCStartupTask
- com.dotmarketing.startup.runonce.Task00769UpdateTagDataModel
- com.dotmarketing.startup.runonce.Task00785DataModelChanges
- com.dotmarketing.startup.runonce.Task00795LiveWorkingToIdentifier
- com.dotmarketing.startup.runonce.Task00820CreateNewWorkFlowTables
- com.dotmarketing.startup.runonce.Task00840FixContentletVersionInfo
And using MSSQL we have:
- com.dotcms.journal.business.ESDistributedJournalFactoryImpl
- com.dotmarketing.business.PermissionBitFactoryImpl
- com.dotmarketing.business.PermissionedWebAssetUtil
- com.dotmarketing.business.RoleFactoryImpl
- com.dotmarketing.common.db.DotConnect
- com.dotmarketing.common.util.SQLUtil
- com.dotmarketing.db.HibernateUtil
- com.dotmarketing.fixtask.tasks.FixTask00040CheckFileAssetsMimeType
- com.dotmarketing.portlets.dashboard.business.DashboardDataGenerator
- com.dotmarketing.portlets.dashboard.business.DashboardFactory
- com.dotmarketing.quartz.DotJobStore
- com.dotmarketing.startup.AbstractJDBCStartupTask
- com.dotmarketing.startup.runalways.Task00001LoadSchema
- com.dotmarketing.startup.runonce.Task00780UUIDTypeChange
- com.dotmarketing.startup.runonce.Task00785DataModelChanges
- com.dotmarketing.startup.runonce.Task00790DataModelChangesForWebAssets
- com.dotmarketing.startup.runonce.Task00800CreateTemplateContainers
- com.dotmarketing.startup.runonce.Task00815WorkFlowTablesChanges
- com.dotmarketing.startup.StartupTasksExecutor
- com.dotmarketing.util.ImportExportUtil
- com.dotmarketing.util.MaintenanceUtil
- com.dotmarketing.velocity.VelocityServlet
You may have spotted that actually 3 files contain both:
- com.dotcms.journal.business.ESDistributedJournalFactoryImpl
- com.dotmarketing.startup.AbstractJDBCStartupTask
- com.dotmarketing.startup.runonce.Task00785DataModelChanges
Which is a bit puzzling, perhaps a clean-up wasn’t seen all the way through. But I digress.
My point is that classes should adhere to OOP principles, information hiding, and clearly defined boundaries. A bean with getters and setters for all fields is not supposed to be used as a grouping of global variables. No matter how promiscuous an object seems - violation of its privacy, i.e. rape, is not allowed.
If the current object has a different responsibility, then delegate the work to another object. This way the class does what the name indicates.
20 questions
Sometimes the code indicates that the steps are in the wrong place. This is usually indicated by asking another object - often a parameter - 20 questions, or at least getting a lot of the intrinsic values, and then deciding which steps to take. The object being questioned will seem promiscuous and have getters and setters abound with little or no other methods, and a plethora of updates.
As an example of an indication of too much knowledge on the wrong side of the boundary we can look at com.dotmarketing.util.UtilHTML:
public static String getStatusIcons(Versionable v) throws DotStateException, DotDataException, DotSecurityException{ StringBuffer buf = new StringBuffer(); if(!v.isLive() && v.isWorking() && (!v.isArchived() && v.isWorking())){ buf.append("<span class='workingIcon'></span>"); }else{ buf.append("<span class='greyDotIcon' style='opacity:.4'></span>"); } if(v.isArchived()){ buf.append("<span class='archivedIcon'></span>"); }else if(APILocator.getVersionableAPI().hasLiveVersion(v)){ buf.append("<span class='liveIcon'></span>"); }else{ buf.append("<span class='greyDotIcon' style='opacity:.4'></span>"); } if(v.isLocked()){ buf.append("<span class='lockIcon'></span>"); } return buf.toString(); } public static String getVersionStatusIcons(Versionable v) throws DotStateException, DotDataException, DotSecurityException{ StringBuffer buf = new StringBuffer(); if(v.isWorking()){ buf.append("<span class='workingIcon'></span>"); } if(v.isLive()){ buf.append("<span class='liveIcon'></span>"); } return buf.toString(); }
Looking at these two utility functions it becomes clear that:
- The utility function knows more of the Versionable than is absolutely necessary, but apparently not enough. The 20 questions style: isWorking? isLive? is???
- The difference between the functions getStatusIcons and getVersionStatusIcons not quite clear when one or the other should be used.
- There is a bit of a discrepancy between which icons are shown given the same status. I.e. if your version isWorking and isLive, then getStatusIcons will give you greyDotIcon possibly followed by a liveIcon, whereas getVersionStatusIcons will give you workingIcon followed by liveIcon.
- Boolean expressions are apparently illusive. I.e. the expression !v.isLive() && v.isWorking() && (!v.isArchived() && v.isWorking()) is equivalent to !v.isLive() && v.isWorking() && !v.isArchived() - unless there are timing issues with the isWorking() call, in which case it’s even worse to add the second check.
Utility classes are quite often prone to a too wide reach of functionality. There is a simple need of some sort of utility, creating a new class with just this single function seems too much work, and so the function is added to an existing class with a wishy-washy name. And thus starts another Big Ball of Mud
Why bashing on dotCMS?
It is not my intention to bash on dotCMS or open source. I’m simply using dotCMS as an example, I’m confident there are a lot of source codes out there in about the same state, some worse, and a few better.
The main reasons why I chose dotCMS:
- The code is open source, which means it is available for anyone wanting to read it.
- I didn’t write any of it. I’m not affiliated in any way.
- The code seems like a fine example of Java code for average projects.
- Looking at several CMS systems I came by it and scratched the surface a bit.
Fixing it?
As it is open source I could be helping rather than pointing, but no. I think my opinions are better left not fighting against hard working developers, and my work is better left fighting for my opinions.
I am tempted to analyse the code quality in greater detail though.
That said, the first step of my “solution†to the first issue above:
if (DbConnectionFactory.getDBType().equals(DbConnectionFactory.MSSQL)) reindexJournalCleanupSql.append("DELETE FROM dist_reindex_journal " + " WHERE time_entered < DATEADD("+ type.toString() +", "+ sign + "" + time +", GETDATE()) "); else if(DbConnectionFactory.getDBType().equals(DbConnectionFactory.MYSQL)) reindexJournalCleanupSql.append("DELETE FROM dist_reindex_journal " + " WHERE time_entered < DATE_ADD(NOW(), INTERVAL "+ sign + "" + time +" " + type.toString()+") "); else if(DbConnectionFactory.getDBType().equals(DbConnectionFactory.POSTGRESQL)) reindexJournalCleanupSql.append("DELETE FROM dist_reindex_journal " + " WHERE time_entered < NOW() "+ sign + " INTERVAL '"+ time +" " + type.toString() +"' "); else if(DbConnectionFactory.getDBType().equals(DbConnectionFactory.ORACLE)) reindexJournalCleanupSql.append("DELETE FROM dist_reindex_journal " + " WHERE CAST(time_entered AS TIMESTAMP) < CAST(SYSTIMESTAMP "+ sign + " INTERVAL '"+time+"' "+ type.toString() + " AS TIMESTAMP)");
Would be to add a method:
String getReindexJournalCleanupSQL(String type, String sign, String time) { Properties sqlProperties = SQLProperties.get(DbConnectionFactory.getDBType()); return sqlProperties.get("reindex-journal-cleanup") .replaceAll("%time%", time) .replaceAll("%sign%", sign) .replaceAll("%type%", type); }
Then replace the first section with:
reindexJournalCleanupSql.append(getReindexJournalCleanupSQL(type, sign, time));
Yes, this is a signature which falls under the category of having primitive obsession, but the entire idea is to fiddle about with strings and return a string.
Where SQLProperties is a Map<h;String, Properties> loaded up with the query strings with placeholders in. Thus you would have an entry MSSQL -> Properties where the 'reindex-journal-cleanup' key points to the following value:
"DELETE FROM dist_reindex_journal WHERE time_entered < DATEADD(%type%, %sign%%time%, GETDATE())"
Which means you have all the SQL for a single database type in a single property file, all you have to do to add a new database is add a new property file with the correct SQL commands for all the entries. That is hard work, but it beats having to add another else-if clause everywhere.
If this had been done from the beginning it would have been over-architected and thus a waste of time.
The second step would be to look at the greater picture.
If the method works, then possibly a more general solution is warranted, e.g.
String getSQL(String key, Mapsubstitutions) { Properties sqlProperties = SQLProperties.get(DbConnectionFactory.getDBType()); String template = sqlProperties.get(key); for(Map.Entry substitution : substitutions.entrySet()) { template = template.replaceAll(substitution.getKey(), substitution.getValue()); } return template; }
This solution has nothing to do with what the properties are called, how many replaceable entries there are, how these entries are separated in the property string - which naturally must be used in the same manner in the calling code, which is reworked into:
Mapsubstitutions = new HashMap (); substitutions.put("%time%", time); substitutions.put("%sign%", sign); substitutions.put("%type%", type.toString()); reindexJournalCleanupSql.append(getSQL("reindex-journal-cleanup", substitutions));
The third step: Reiterate, i.e. GO TO 2.
Conclusion
Programming is not the same as listing the flow steps as they would be executed. Anyone with time on their hands can go through the steps of what is required, e.g. Gary Larson’s "First pants, then shoes.". Recipes, TV schedules, and games all fall in the category of flow steps. Making the right decisions and grouping the elements is where it is harder to program than combining flow steps sequentially.
Basically, what I’m saying is: GO TO is still considered harmful, but delegation is not. And while we’re not accustomed to GO TO anymore, we still create the same haphazard styled source code.
It seems most programmers lack the discipline expected of any professional. This is probably due to the facts that:
- The programs work somewhat as expected.
- The source code is hidden from - or not understood by - management. Thus fast is better than correct. It is like robbing to solve an issue of lacking funds. It may solve the problem in theory or for a short period, but it’s not a sound way.
- The big picture is hidden from view of the developers sprinting along to create a working solution within a time bound. Thus knowing what is right at the current time and when to switch is not possible.
- Estimation of software development apparently lacks the idea of continuous improvement. The notion that you’ll know more about the solution when you’re done than you did in the beginning. That software is a continuous maintenance job, and if you accumulate technical debt you will suffer.
- There are differences between inventing and creating new functionality and adding support for a wider audience.
- Evolution is difficult. What was right yesterday may be wrong tomorrow.
- Engineering practice parallels fail to account for reusability, yet somehow we’re measured using the same yardstick.