Tuesday, August 21, 2012

What's Wrong With This picture?

How many times have you seen this and thought nothing of it?

"EXECIO * DISKR FIRST (STEM FIRST. FINIS"
"EXECIO * DISKR NEXT  (STEM NEXT.  FINIS"
do nn = 1 to next.0
   --parse token from next.nn --
   --35 lines of binary search thru first. --
   if notfound then do
      --diagnostic message--
      end 
   else do
      --process the match--
      end
end 

Not only have I seen this kind of code, I have written this kind of code.  A startling revelation, an epiphany, has rocked my world.

The 'revelation' is this:  in MVS, the first time you say 'READ', you don't just read one record; you read five buffers.  If you're talking about modern DASD and 80-byte records, you've just 'read' something like 1400 or 1500 records. They're all in main storage.  Whatever operation you do to them, you do at main storage speeds.  And "EXECIO * DISKR" doesn't stop at five buffers;  it reads all of it.  All the heavy lifting, the SIOs, has already been done.  You've just spent $11 to read from DASD, and now you propose to save four cents by doing a binary search.  Are we all nuts?

In a situation like this, we should leverage the power of REXX's data-associated arrays by spinning through one of those stems and parsing everything we can find, then use the data-association to the second file to know (intuitively) whether there is or is not a match.  It's all in main storage, right?  You would need highly-sophisticated and very expensive equipment to discern how much time you saved by doing a binary search over using a sequential process.  The cost of having a programmer write those thirty-five lines of binary search will never be paid back by the time saved.

Edsger Dijkstra once proposed that a programmer worrying about how fast (or slow) hir code would execute was worrying about the wrong thing.  "Get a faster computer", he advised.  Easier said than done in many cases, but always the optimal solution.

That's not, however, what we're seeing here.  This is truly "penny-wise and pound-foolish" to do all that I/O and then waste the advantage of having it all immediately accessible (let's face it) for free.

I think I may have written my last binary search.  What do you think?

2 comments:

  1. Usually if I really have to do heavy match I just use DFSORT and its wonderful JOIN commands.
    That's really fast.

    When I've really to do some match with Rexx I always use stems as associative arrays something like:
    ...
    MyStem. = 0
    MyStem._thekeytomatch = 1
    ...
    So I really agree with you.

    ReplyDelete
  2. Just a thought that may apply here... in my experience:

    First, write code that works. Second, make changes to improve performance ONLY if it NEEDS to be improved. In most cases better is the enemy of good enough.

    You probably did the binary search because you knew how and thought it was worthwhile. The simplest solution is quite often the best solution, and almost always good enough.

    ReplyDelete