Thursday, October 24, 2013

Retrospective

So, here I am on the feather-edge of retirement (I'll be 70 in a few months) and I'm still learning things.  I had an insight last night that kept me awake mulling it.  My last contract was with Bank of America in Texas and, while it was fun, it was also more than just a little frustrating.

When I first started looking at the code I would be working with at BofA, I was confused.  Everybody these days writes 'strictly structured', right?  No, wrong, and that was what was so confusing.  Last night's insight cleared away all the cobwebs...  just in time for Hallowe'en.


There are two ways to approach a programming problem.  In the first, you start out by assuming that this is an easy problem;  you have to do A, B, C, and finally D.  Voila!  You sit down and write code and it comes out as a single long module.  It may be fairly complex.  (This was typical for most of the BofA code I saw.)

If, instead, you assume that the problem will be complex, that you have to do A, B, C, and finally D, you will sit down and write a top-level module with stubs for the called subroutines.  Then you will write the innards of the subroutines, probably as routers with stubs for their called subroutines.  This process will continue through n levels until each subroutine is so simple it just doesn't make sense to break it down further.  The resulting program will be longish, but (all things considered) pretty simple regardless of the initial estimate of complexity.
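
To make that concrete, here is a minimal sketch of the stubs-first approach in REXX; every name in it is invented for illustration:

    /* REXX -- stubs-first skeleton; flesh out one stub per pass      */
    call Init                     /* A: establish defaults             */
    call Read_Input               /* B: acquire the data               */
    call Process                  /* C: the real work                  */
    call Report                   /* D: display the results            */
    exit

    Init:        say 'Init reached'       ; return
    Read_Input:  say 'Read_Input reached' ; return

    Process:                      /* a router: it, too, gets stubs     */
       call Validate
       call Calculate
    return

    Validate:    say 'Validate reached'   ; return
    Calculate:   say 'Calculate reached'  ; return
    Report:      say 'Report reached'     ; return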

I (almost) always presume a programming task will be complex.  If that turns out to be wrong, no big loss.  If I were to assume some programming task were simple and it turns out not to be quite as simple as I originally thought — that would hurt.  It would hurt because halfway through writing that 'one long module', I would discover the need for the same code I used in lines 47 through 101.  Stop.  Grab a copy of that code.  Create a subroutine at the end of the code.  Insert CALLs at both places.  Continue writing that 'one long module' where you left off.
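
In REXX, the rescue ends up looking something like this contrived sketch (the names and the trivial arithmetic are mine, purely for illustration):

    /* REXX -- after the mid-stream rescue                            */
    parse arg in1 in2 .
    total = 0
    call Tally in1                /* the original site of the block    */
    /* ...many more lines of the 'one long module'...                 */
    call Tally in2                /* the second site, found halfway    */
    say 'Total:' total
    exit

    Tally: procedure expose total /* the block, exiled to the end      */
       parse arg n
       if datatype(n, 'W') then total = total + n
    return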

If that scene plays out more than once or twice, what we wind up with is a long main module with several calls to randomly-placed subroutines.  The coefficient of complexity has just been bumped up, and the bump can be substantial.  If one of those newly-created subroutines must later be partitioned itself, the code soon takes on a distinct air of 'disorganization'.

Do I have to point out that there's way too much overly-complex and disorganized code out there and running in production?  No, I probably don't;  we've all experienced Windows.

So, there's a built-in penalty for assuming simplicity, and it turns out this penalty applies (in REXX, at any rate) no matter how complex the eventual program actually is.

If a (REXX) program is written as 'one long module', possibly with a few random subroutines for function required in more than one place, diagnosis becomes a problem.  Unless the programmer has anticipated bypassing iterative loops, a trace will have to endure every iteration in every loop before getting to the next stage.  To avoid this most painful experience, what happens most often with such code is a quick one-time fix to turn TRACE on here and shut it off there.  But then, the program being diagnosed is no longer the program that failed;  it's a modified version of the failing program.
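
The quick fix tends to look something like this contrived sketch, where the loop body stands in for the real work:

    /* REXX -- the 'quick one-time fix' (do not leave this in!)       */
    total = 0
    do i = 1 to 1000
       if i = 742 then trace r    /* hand-inserted: start tracing      */
       total = total + i          /* stand-in for the real work        */
       if i = 742 then trace off  /* hand-inserted: stop tracing       */
    end
    say 'Total:' total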

If a (REXX) program is highly-structured, function will be encapsulated to the point that any error can be isolated to one or a very small number of suspect segments.  Running such a heavily-encapsulated program in trace-mode means that entire trees of logic can be bypassed:  if TRACE is on for a higher-level module, it can be turned off in a submodule (and all its children) but will still be on when control returns to the higher-level module.  The more structured the code, the easier it is to debug.  With one proviso...
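
This works because an internal CALL in REXX saves the caller's trace setting and RETURN restores it.  A toy demonstration:

    /* REXX -- trace is inherited downward, restored on return        */
    trace r                       /* trace Results from here down      */
    call Helper
    say 'back in the mainline'    /* traced again: setting restored    */
    trace off
    exit

    Helper:
       trace off                  /* silence this whole subtree        */
       say 'inside Helper'        /* not traced                        */
    return                        /* caller's TRACE R resumes here     */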

You can have a highly-structured program that is nevertheless disorganized.  If, for example, you place your subroutines in alphabetical order, the flow of control will appear chaotic.  Ideally, submodules that are merely segments of an upper-level router should appear in roughly their order-of-execution.  Although they're broken out into separate segments, they still retain the flavor of that 'one long module' insofar as they appear one after the other like the cars of a train.  Reading such code becomes easier because a CALL to a submodule is a call to code which is (probably) physically close by.  (This is not always strictly true.)

COBOL programmers long ago adopted a more-or-less universal convention: they prefix the name of each code segment with a glyph ('D100', perhaps) that indicates its logical position in the complete program.  A COBOL programmer seeing a reference to 'D100-something' in module 'C850-GET-USER-ID' knows to look later in the code for that segment.  The same technique works equally well in all languages, and REXX is not an exception.  (I tend to use alpha-only such that the mainline calls A_INIT, B_something, etc.  Module C_whatever calls CA_blah, CB_blah, etc.  Whatever works...)
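
A skeleton of that convention in REXX, with every name invented and the segments laid out in rough execution order:

    /* REXX -- the alpha-prefix convention at work                    */
    call A_INIT
    call B_GET_INPUT
    call C_PROCESS
    call D_REPORT
    exit

    A_INIT:       say 'A_INIT'       ; return
    B_GET_INPUT:  say 'B_GET_INPUT'  ; return

    C_PROCESS:                    /* a router...                       */
       call CA_VALIDATE           /* ...its children sort under 'C'    */
       call CB_CALCULATE
    return

    CA_VALIDATE:  say 'CA_VALIDATE'  ; return
    CB_CALCULATE: say 'CB_CALCULATE' ; return
    D_REPORT:     say 'D_REPORT'     ; return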

Exactly the same sorts of things can be said about modifying an existing program.  The 'one long module' requires careful planning and skillful execution when inserting new function or changes to existing function.  Testing the new function is a chore to the same extent diagnosing an error is a chore, and for the same reasons.  Highly-structured code is designed to be modified;  it was written that way.

Summarizing:  a highly-structured REXX program may be a little longer than it (strictly) has to be, but it will be easier to understand and easier to diagnose in case of an error.  This understanding can be enhanced by strategic naming of segments and by arranging the segments to more closely align with the actual order of execution.

Recommendation:  Structure is your friend.  It may be your best friend.