cosmic ray trace analysis

stevan apter — 2003-07-15 12:48:52

and now for something completely different ..

browsing the archives of the k list, i came across this message,
originally posted to (among other groups) comp.lang.apl. i've
seen j and k solutions. how would you go about implementing
bill huber's algorithm in joy?

btw, the application is analyzing cosmic ray traces on photographic
plates.

everything below is part of the original message.

---------------------------

To: Leonard Howell <leonard.howell@...>
Subject: Re: Interesting Programming Problem, Need Some Help
From: whuber <whuber@...>
Date: Thu, 21 May 1998 21:26:27 GMT
Content-Transfer-Encoding: 7bit
Content-Type: text/plain; charset=us-ascii
Newsgroups:
aus.mathematics,comp.lang.apl,comp.lang.fortran,comp.programming.contests,microsoft.public.vb.visual_modeler,sci.stat.math
Organization: Quantitative Decisions
References: <6k1hi1$27j$1@...>
Reply-To: whuber@...
Xref: newshub2.home.com aus.mathematics:30001295 comp.lang.apl:30001564 comp.lang.fortran:30006553
comp.programming.contests:30000503 microsoft.public.vb.visual_modeler:30000271 sci.stat.math:30002564
Leonard Howell wrote:
>
> Here is a interesting problem that perhaps someone has solved
> before or has an idea of how to program the solution. Given a
> square array (NxN) filled with zeros and ones (in practice, there
> are many more zeros than ones), what is the distribution of patterns
> of size 1, 2, 3, ...., where a pattern is defined any string of adjacent
> 1's, either horizontally or /and vertically distributed, and the number
> of "singlets" is also of interest To simply the problem, I'm ignoring
> diagonal connections. For example, in the following array, there are:
> 5 ones, 1 two, 1 three, 2 fours, and 1 six.
>
> 0 0 0 0 0 0 1 0
> 1 0 0 1 1 0 1 1
> 0 1 1 0 1 1 0 0
> 1 1 1 0 0 0 1 0
> 0 0 1 0 0 0 0 1
> 0 1 0 0 0 0 0 1
> 0 0 1 1 1 0 0 0
> 1 0 1 0 0 0 0 1
>
> APL and FORTRAN solutions preferred, but any language appreciated.
> Thanks, Leonard Howell
Let's generalize and suppose there are M rows and N columns, with M >= N
(you can always transpose the array to make this happen). There is a
scanning algorithm that requires O(N) space and O(M*N) time, which is as
good as it gets. The proof is by induction on the number of rows, M.
The algorithm requires a data structure that maintains information about
the distribution of patterns within the M X N array together with
information about the patterns that touch its bottom row. For the
induction step, one updates this structure according to what's in the
next row.

Rather than be formal, I will proceed by example. In the 8 X 8 array
presented, after scanning the first row I will represent the data
structure thus:

0 0 0 0 0 0 [1] 0

where the brackets enclose the tip of the lone identified pattern.
After scanning the second row, the data structure looks like this:

[1] 0 0 [2 2] 0 [3 3]

which echos the pattern of ones on the second row of the array, but also
remembers the size of each pattern in which those ones participate. You
should be able to see that creating this updated structure requires only
the previous structure together with the new row, and that you only have
to scan the new row two or three times (or once, if you're clever with
pointers). At the third step we get:

0 [2 2] 0 [4 4] 0 0 | 1, 3

where now, on the right, we begun to output the complete patterns that
have been identified. The full algorithm for this pattern then looks
like:

0 0 0 0 0 0 [1] 0
[1] 0 0 [2 2] 0 [3 3] |
0 [2 2] 0 [4 4] 0 0 | 1, 3
[5 5 5] 0 0 0 [1] 0 | 4
0 0 [6] 0 0 0 0 [1] | 1
0 [1] 0 0 0 0 0 [2] | 6
0 0 [3 3 3] 0 0 0 | 1, 2
[1] 0 [4] 0 0 0 0 [1] |
| 1, 4, 1

The cumulative output, shown on the right, gives the desired result.
(To get the last bit of output, consider processing an additional row of
zeros.)

Actually, one of the most interesting aspects of the updating is not
used in this example, so let's look at the following array:

1 0 1
1 1 1
1 0 0

The algorithm proceeds as follows:

[1] 0 [1]
[5 5 5] |
[6] 0 0 |
| 6


The interesting thing happened at the second step, where a row of three
ones overlapped two patterns. They were merged. Overall, though, it
should still be easy to see that the updating can occur in one pass,
left to right across each row, and that the data structure can't be any
bigger than a constant times the row size.

Here's another amusing example:

1 1 1 1
1 0 0 1
1 1 0 1
1 0 1 0

which is processed in this way:

[4 4 4 4]
[6 0 0 6] |
[9 9 0 9] |
[10] 0 [1] 0 |
| 10, 1

Here, in the second row, separate runs of 1s were connected to the same
pattern. In general, adjacent pattern tips can merge during the
updating, pattern tips can disappear (and cause some output), and new
patterns can appear.

The trickiest part of all this is demonstrating that the data structure
can be represented succinctly as a disjoint series of intervals: in
other words, that you can't have any interleaving of the tips of two
patterns. But this is clear, by a discrete analog of the Jordan curve
theorem, that states if you find a 1 that is the tip of one pattern, and
another 1 further on in the same row that is the tip of the same
pattern, then any intervening 1s must belong to the same pattern, even
if they are separated along that row by 0s. The previous example
illustrates this.

This scanning algorithm clearly lends itself to a FORTRAN implementation
(which would actually scan left to right) rather than an APL one. I
leave the details to an intrepid programmer. Of course, since I've been
so informal (perhaps cryptic), it's also possible I've been wrong...
there's nothing like the rigor of writing a proof (or a working
program).

- --Bill Huber
Quantitative Decisions
Merion, PA