The Collatz Conjecture Through a Binary Lens

juni 4th, 2026

Per Lauge Buresø Holst  ·  with Claude (Anthropic AI)

4 June 2026

A structural exploration of why 3n+1 trajectories tend downward — and why “tend” cannot yet be upgraded to “must.”

Collaboration & process

This document is a collaboration between Per Lauge Buresø Holst (who conceived and directed the investigation) and Claude, an AI assistant from Anthropic. The work split along its natural seam: the human supplied the ideas, intuitions, and direction — the binary lens, the “firewall” / “fuel” / “glider” framings, the cycle-climb questions, and the persistent skepticism that caught more than one overstatement; Claude supplied verification and formalization — running every claim in code, producing closed forms and counterexamples, correcting errors, cross-checking attributions against the literature (Lagarias, Terras, Tao, Conway, Eliahou), and writing the prose, code, figures, and this document. Every quantitative statement here was checked computationally and every “known result” traced to a source. The honest conclusion that this is expository, not novel (§14–15) is itself an output of that process — as is the recurring theme that each elementary line of attack halts at the same wall. Remaining errors are ours, jointly.


1. The Problem

For a positive integer n, apply: if even, \(n \to n/2\); if odd, \(n \to 3n+1\). The Collatz conjecture asserts every starting value eventually reaches 1. Verified to ~2??; unproven in general.

This writeup follows one line of attack: read the dynamics in binary, where \(\div 2\) is a right shift and \(3n+1 = n + (n\ll 1) + 1\) is an add-with-carry. Dividing by 2 does no real work — it merely pans the bit pattern through a fixed positional frame, a change of reference (Figure 1); all the dynamics live in the odd step \(3n+1\). The goal was to find structure that forces convergence. We found a great deal of structure — enough to explain why trajectories descend on average — but each structural fact stops exactly at the same wall, which turns out to be the conjecture itself.

Figure 1. Dividing by 2 is a change of reference frame: a right shift just pans the bit pattern through a fixed positional “viewport,” dropping the bit that falls off the units end. No information is created or destroyed — the real work happens only at 3n+1.

Notation & Symbols

SymbolMeaning
\(T(n)\)shortcut map on odd \(n\): \(T(n) = (3n+1) / 2^\nu \) — does \(3n+1\) then strips all trailing zeros, landing on the next odd number.
\(\nu \) (= \(\nu _{2}(3n+1)\))2-adic valuation of \(3n+1\): the number of trailing zero bits = how many times you halve in one shortcut step. \(\nu =1 \iff n \equiv 3 \bmod 4\). (Greek “nu”.)
\(v_{3}(m)\)3-adic valuation of \(m\): the exponent of the highest power of 3 dividing \(m\). E.g. \(v_{3}(54) = 3\) because \(54 = 2\cdot 3^{3}\) (so \(3^{3} \mid 54\) but \(3^{4} \nmid 54\)); \(v_{3}(102)=1\) since \(102 = 2\cdot 3\cdot 17\); \(v_{3}(80)=0\). (Latin “v” — not the same as \(\nu \) above.)
\(\equiv a (\bmod b)\)congruence: \(n \equiv 3 \bmod 4\) means \(n\) leaves remainder 3 on division by 4, i.e. its low two bits are \(\dots 11\).
\(M\)the maximum odd element of a hypothetical non-trivial cycle.
\(P, Q, R\)the predecessors (all odd numbers) climbing down from \(M\): \(P\) maps to \(M\), \(Q\) to \(P\), \(R\) to \(Q\). Generally \(x_j\) is the \(j\)-th rung, \(x_{0}=M, x_{1}=P, x_{2}=Q, x_{3}=R\), with \(x_j + 1 = (2/3)^j (M+1)\).
\(a\), \(S\)for a whole cycle: \(a\) = number of odd (\(3n+1\)) steps; \(S = \sum \nu \) = total halvings. The balance is \(3^a \approx 2^S\).
\(H, L\)high / low parts of a number split across a zero-run “firewall”: \(n = H\cdot 2^{m+b} + L\).
\(A_j\), \(r_m\)alternating numbers: \(A_j = (4^{j+1}-1)/3 = 101\dots 01\); \(r_m = (2^m-1)/3\) (same family).
\(\mathrm{octave}\)a factor-of-2 band of magnitude (one bit of length); “log-uniform” = roughly equal count per octave.
\(\ll \), \(\gg \)left / right bit shift (\(n\ll 1 = 2n\)).
\(2^k-1\)“all-ones” number (\(k\) consecutive 1-bits); the \(-1\) of the 2-adic integers \(Z_{2}\) is the infinite all-ones \(\dots 1111\).

2. The Shortcut Map and the Two Residue Classes

Work odd-to-odd via the shortcut map \(T(n) = (3n+1) / 2^\nu \), where ? = number of trailing zeros of \(3n+1\). Even numbers are irrelevant — they just halve down. So only odd residues mod 4 matter:

n mod 43n+1?net per step
? 1 (\(\dots 01\))4(3k+1)? ? 2shrinks (descends below n immediately)
? 3 (\(\dots 11\))2(6k+5)? = 1grows (×3/2)

Immediate consequences:

  • Any cycle’s minimum element is odd, and cannot be ? 1 mod 4 (it would drop below itself). So a non-trivial cycle’s minimum must be ? 3 mod 4 — binary \(\dots 11\).
  • \(n \equiv 1 \bmod 4\) provably descends in one step. All difficulty lives in the \(\equiv 3 \bmod 4\) (“sticky”) class.

Figure 2 shows a full trajectory — that of 27 — read in binary: the heatmap (rows = steps), the value curve, the pattern metrics, and the carry depth \(\nu \) at each odd step.

Figure 2. The trajectory of 27. Top-left: binary heatmap (rows = steps, blue = 1). Top-right: value (log scale). Bottom-left: Hamming density / longest zero-run / longest alternating-run. Bottom-right: carry depth ??(3n+1) at each odd step. Red lines mark “detonation” steps.

3. The Carry Mechanism

In \(3n = n + (n\ll 1)\), classify each bit position by its pair:

  • Generator (\(11\)): always emits a carry.
  • Propagator (\(10\)/\(01\)): passes a carry if one arrives.
  • Absorber (\(00\)): stops the carry.

A carry travels exactly as far as the trailing run of 1s in \(3n\), i.e. \(\nu _{2}(3n+1)\) = length of that run. For a generic number the carry fizzles in 1–2 bits. The exception is the alternating pattern \(1010\dots 1\), which is all propagators: \(3 \times (1010\dots 1) = (111\dots 1)\), so a single triggering carry conducts across its entire length.

\(\times 3\) as a 3-state transducer. Processing \(n\) in 4-bit chunks from LSB to MSB, each chunk computes \(3x + c\) where \(c\) is the carry-in from below; its low 4 bits stay, and \(\mathrm{cout} = (3x+c)\gg 4\) passes up. Since \(3\cdot 15 + 2 = 47\) and \(47\gg 4 = 2\), the carry-out never exceeds 2 when the carry-in is \(\le 2\) — so the carry state lives in a closed set \({0,1,2}\) (carry-in 3 never arises). Multiplication by 3 is thus a finite-state transducer with three carry states; the chunks split into three bands by \(\lfloor 3x/16\rfloor \) (\(0000-0101 \to 0\), \(0110-1010 \to 1\), \(1011-1111 \to 2\)), and the carry-in only matters at the band edges:

chunkbehaviourrole
\(0101\)\(c=0\to \mathrm{cout} 0\), \(c\ge 1\to \mathrm{cout} 1\)conductor — the unique band-0 chunk that lets a carry escape (\(3\cdot 5 = 1111\))
\(1010\)\(c\le 1\to \mathrm{cout} 1\), \(c=2\to \mathrm{cout} 2\)promoter — carries a 2 onward
\(1111\)\(\mathrm{cout} = 2\) for every \(c\)generator — a saturated carry source
all others\(\mathrm{cout}\) fixed by \(3x\) aloneabsorber — incoming carry shifts bits but cannot escape

This is the finite, local object behind the whole carry story (and the cellular-automaton analogy of Section 11): a long alternating run \(\dots 0101 0101\dots \) is a chain of conductors that walks a bottom-injected carry (the \(+1\)) all the way up, while a long all-ones run \(\dots 1111\dots \) is a wall of generators forcing \(\mathrm{cout} = 2\) at every chunk — exactly the two motifs (alternating/firewall, all-ones/\(-1\)) that dominate both obstacles. Figure 3 shows this toggle: the pure alternating number 1365 has carry-reach 0, but ×3 turns it all-ones, flipping carry-reach to full width.

Figure 3. ×3 applied repeatedly to the alternating number 1365 (= a conductor chain). Red = generator chunk, orange = propagator, gray = absorber; the bar shows how far the carry reaches. The alternating pattern (reach 0) becomes all-ones under ×3 (reach = full width), then alternating again — a two-state toggle.

The ?-distribution is geometric, \(P(\nu =k) = 1/2^k\), giving \(E[\nu ] = 2\). Hence the famous heuristic: each odd step multiplies by ~\(3/2^{2} = 3/4\) on average. Figure 4 shows the per-step carry roles along 27’s shortcut trajectory.

Figure 4. Shortcut (odd-only) steps of 27. Left: bit values. Centre: carry roles per bit position (gray = absorb, orange = propagate, red = generate) with carry-reach markers. Right: ? (the firewall the +1 creates) vs carry-reach. Step 1367 is wall-to-wall propagators — “loaded” — just before the firewall at 3077.

4. The Firewall: Isolation by a Zero Run

Write \(n = H\cdot 2^{m+b} + L\), with a low part L (b bits) separated from a high part H by m zeros. Because \(3L+1 < 2^{b+2}\), the carry from L cannot cross a sufficiently wide zero run. While the firewall holds, H and L evolve independently:

\[T^j(n) = 3^j \cdot H \cdot 2^{m+b-\sum \nu } + T^j(L)\]

H is purely passive (just ×3 each step); L alone drives the dynamics. If L converges, n converges, reducing by a factor ? 1/L.

Two conditions (verified in code) make this exact:

  1. \(\mathrm{max}(L-\mathrm{trajectory}) < 2^m\) — carry never reaches H.
  2. \(m + b \ge \sum \nu (L-\mathrm{trajectory})\) — H stays at a positive bit position.

Condition (2) binds: the required firewall scales with L’s stopping time, not its peak. This is a rigorous convergence lemma for an infinite class of numbers — but it only covers numbers already containing such a structure. Whether every trajectory acquires one is the open part.

(This is a concrete restatement of the standard 2-adic view: a wide zero run means n is 2-adically close to L, and the map is Lipschitz there.)


5. The Average Argument, Made Rigorous — and Its Ceiling

The \(\times 3/4\) drift is provable for explicit subclasses:

  • Trivial: \(n \equiv 1 \bmod 4\) descends in one step (half of all odds).
  • Terras / Everett (1976–77): The first k steps depend only on \(n \bmod 2^k\), and the parity-vector map is a bijection. A class with j odd-steps contracts iff \(3^j < 2^k\), i.e. \(j < k\cdot \mathrm{log}2/\mathrm{log}3 \approx 0.631k\). Since odd-steps are Binomial(k, ½) with mean k/2 < 0.631k, the density of contracting classes ? 1. Theorem: the set of integers with finite stopping time has natural density 1.
  • Tao (2019): Almost all orbits (logarithmic density) attain almost bounded values.

Why it is believed (the even-distribution heuristic). The drift is governed by one statistic: how often a step has \(\nu =1\). And \(\nu =1 \iff n \equiv 3 \bmod 4\), i.e. the number simply ends in \(\dots 11\) — half of all odd numbers. If the trailing bits along an orbit are unbiased (each two-bit ending at its natural rate), then

P(?=1)=½, P(?=2)=¼, …  ?  E[?]=2 > log?3 ? 1.585  ?  bit-length drifts down  ?  n ? 1.

This is borne out empirically: over ~570,000 odd steps sampled from ~80-bit starts, \(\nu =1\) occurs 49.97% of the time and average \(\nu = 1.998\) — exactly the unbiased prediction. *The evenness of the trailing bits is the contraction. Divergence would require a sustained skew: with \(\nu =1\) at fraction \(p\) and the rest following the natural tail (\(E[\nu |\nu \ge 2]=3\)), \(E[\nu ] = 3 – 2p\), so neutrality needs \(p = (3 – \mathrm{log}_{2}3)/2 \approx 0.71\) and growth needs more — a \(\equiv 3 \bmod 4\) rate of about 71%, a ~5:2 ratio, versus the natural 50%. (Not to be confused with the Terras threshold \(\mathrm{log}2/\mathrm{log}3 \approx 0.631\), which counts odd steps among all* steps, a different ratio.) There is no apparent mechanism for such a skew — the bits look balanced and Hamming density hovers at ½. The \(\nu \) distribution that drives all this matches the geometric \(1/2^k\) prediction to several orders of magnitude (Figure 5).

Figure 5. Distribution of \(\nu = \nu _{2}(3n+1)\) over odd n, compared with the geometric prediction \(P(\nu =k)=1/2^k\). The match holds across four orders of magnitude — the basis of the “even bits ? ×3/4 drift” heuristic.

Where the fuel comes from (and why the split stays even). A step lands on the growth class (\(T(n) \equiv 3 \bmod 4\), “fuel”) according to the bottom 3 bits of \(n\):

\(n\) ends in\(T(n) \equiv 3 \bmod 4\)?role
\(\dots 111\) (\(7 \bmod 8\))always (\(\nu =1\), \(T=(3n+1)/2\))definite feeder
\(\dots 011\) (\(3 \bmod 8\))never (? \(\dots 01\))definite exit
\(\dots x01\) (\(1 \bmod 4\))~50%, \(T = \mathrm{oddpart}(3\cdot (n\gg 2)+1)\)recursive in the high bits

(Careful: this table is about which \(n\) produce a fuel successor. A \(\dots 011\) number is itself \(\equiv 3 \bmod 4\), so it is a growth step and counts toward the rate — \(\dots 011\) and \(\dots 111\) both grow \(\times 3/2\). The distinction is only in their successor: \(\dots 111\) chains to another growth step, \(\dots 011\) exits to a descending one. So \(\dots 011\) is not “unusable” — it just doesn’t perpetuate the chain.)

The only deterministic fuel source is a \(\dots 111\) ending — but a trailing-1 run erodes one bit per step (\(\dots 111 \to \dots 11 \to \dots 1\)) and cannot be regenerated as structure (Sections 7–8). Once spent, the orbit falls into the \(\dots 01\) case, which is a fresh ~50% coin flip on the high bits. So the guaranteed source is self-limiting and everything else averages ½ — which is exactly why \(P(\equiv 3 \bmod 4) \approx \tfrac{1}{2}\) with no bias. A sustained skew would have to come from the recursive \(\dots 01\) branch landing on \(\equiv 3 \bmod 4\) more than half the time forever — a conspiracy in the high bits that nothing local forbids. That branch is precisely where an un-rule-out-able bias could hide.

The ceiling: “evenly distributed” is provable in the wrong senses and unprovable in the needed one.

sense of “trailing bits are even”status
across all odd numbers (static): exactly ½ are \(\equiv 3 \bmod 4\)proven, trivial — but an orbit visits one specific sequence, not “all numbers”
along almost all orbits ? those contractproven (Terras/Everett density-1; Tao almost-all)
along every orbit (no biased exception)= the conjecture

The trap is that an orbit is deterministic: the trailing bits of \(n_{i+1}\) are a fixed function of \(n_i\), so consecutive endings are correlated, not independent. “Looks random” is not “is random” — a deterministic sequence may carry any bias. Equivalently, a density-1 set is not forward-invariant: when \(n\) descends to \(n’\), the residue class of \(n’\) is uncorrelated with the contraction just used, so “almost all descend once” cannot be iterated into “all reach 1.” Proving no orbit’s mod-4 classes lock into a self-reinforcing \(\equiv 3 \bmod 4\) skew is exactly the open problem.


6. Cycle Structure

The conjecture has two independent obstacles: ruling out non-trivial cycles would not rule out escape to infinity, and vice versa. Neither is known. This section maps what the binary lens forces about any hypothetical non-trivial cycle; Sections 7–12 then turn to divergence.

Everything follows from one exact per-element rule (verified): under T,

\[n \equiv 3 \bmod 4 \implies T(n) > n\qquad \text{(ascends, $\nu$=1, $\times$3/2)}\]

\[n \equiv 1 \bmod 4 \implies T(n) < n\qquad \text{(descends, $\nu$$\ge$2; n > 1)}\]

Minimum and maximum (a dual pair).

  • The minimum of a non-trivial cycle ascends out of itself, so it is ? 3 mod 4 (it cannot be \(\equiv 1 \bmod 4\) — that would drop below itself; the trivial cycle’s \(1 = 4\cdot 0+1\) is the \(k=0\) boundary case).
  • The maximum descends out of itself, so it is ? 1 mod 4.
  • Hence both residue classes are mandatory: a cycle cannot be all-\(\equiv 3\) (strictly increasing) nor all-\(\equiv 1\) (strictly decreasing). \(4k+1\) numbers are not excludable from cycles — they are required, with the maximum among them.

Two-arc decomposition. The elements are distinct, so the minimum \(c_\mathrm{min}\) and maximum \(c_\mathrm{max}\) are unique; cutting the loop at these two points splits it into exactly two arcs — a rise \(c_\mathrm{min} \to \dots \to c_\mathrm{max}\) and a fall \(c_\mathrm{max} \to \dots \to c_\mathrm{min}\). Every element lies on exactly one arc, both arcs stay within \([c_\mathrm{min}, c_\mathrm{max}]\), and the net multipliers are \(c_\mathrm{max}/c_\mathrm{min} > 1\) (rise) and \(c_\mathrm{min}/c_\mathrm{max} < 1\) (fall), product 1 around the loop. The endpoints are pinned as above: \(c_\mathrm{min} \equiv 3 \bmod 4\) opens the rise, \(c_\mathrm{max} \equiv 1 \bmod 4\) opens the fall. Within an arc the path may wander up and down (local peaks and valleys); it need not be monotone. In fact, for every cycle but one it cannot be: a cycle whose rise is a single clean run of increases and whose fall is a single clean run of decreases is a 1-circuit, and Steiner (1977) [14] proved the only 1-circuit is the trivial \(1\to 4\to 2\to 1\). So \({1,2,4}\) is the unique non-choppy cycle; any other cycle must zig-zag — multiple local maxima on the rise, multiple local minima on the fall (the circuit structure below).

The rise tends to be the longer arc. The climb is rate-limited — the only way up is \(\times 3/2\) per step (\(+0.585\) bits) — while the fall can plunge (one large-\(\nu \) step drops many bits at once), so covering the same height needs more up-steps than down-steps. The one informative \(\times 3\) sibling cycle bears this out sharply: \(3n-1\)’s \({17\to 25\to 37\to 55\to 41\to 61\to 91}\) has rise 6, fall 1 (the fall is the single drop \(91\to 17\), \(\nu =4\)). It is only a tendency, not a law: a fall built from gentle \(\nu =2\) steps (\(-0.415\) bits each, slower than the climb) would be longer, and in \(5n+1\) — where the up-step is the faster \(\times 5/2\) — the cycle \({17\to 43\to 27}\) has the fall longer (rise 1, fall 2). And it carries no leverage for the bounds: growth (\(\equiv 3 \bmod 4\)) steps occur on both arcs, so the split neither isolates them nor sharpens the \(5:2\) count — it only makes the cycle’s shape vivid (a slow rate-limited crawl up, punctuated by deep plunges).

The cousins do cycle. The maps \(3n-1\), \(5n+1\), and \(3n+1\) on the negative integers all share the \(\mathrm{an}+b\) shape, yet each has genuine non-trivial cycles (\(3n-1\): \({5,7}\), \({17,\dots ,91}\); \(5n+1\): \({13,33,83}\), \({17,43,27}\)) — which is exactly what let us read real \(\times 3\) cycle data off them above. So the absence of a non-trivial cycle is special to \(3n+1\), not a property of the form: nearby members of the family cycle freely. That sensitivity is the elementary face of Conway’s theorem (1972 [15]) that the general \(\mathrm{an}+b\)-type iteration is Turing-complete and its reachability undecidable — the family is as wild as computation itself, and \(3n+1\) is just one (conjecturally tame) point in it.

The maximum is ? 5 (mod 12). Its odd predecessor \(P\) satisfies \(T(P)=M\); since \(M\) is the max, \(P<M\) forces the ascending case, giving exactly

\[P = (2M-1)/3, P \equiv 3 \bmod 4, P \approx \tfrac{2}{3}M.\]

For \(P\) to be a positive integer, \(2M \equiv 1 \bmod 3\), i.e. \(M \equiv 2 \bmod 3\). (If \(M \not\equiv 2 \bmod 3\), its only odd predecessor has \(\nu \ge 2\) and exceeds \(M\) — impossible for the max.) Combined with \(M \equiv 1 \bmod 4\): \(M \equiv 5 (\bmod 12)\), verified (e.g. \(13 \equiv 1 \bmod 3\) cannot be a peak; \(5, 17, 29, 41, \dots \) can).

And the maximum is never ? 9 (mod 16). A forward version of the same idea: \(M \equiv 9 \bmod 16\) forces \(\nu = 2\) exactly, so the successor \(T(M) = (3M+1)/4 \equiv 3 \bmod 4\) ascends, giving \(T^{2}(M) = (9M+7)/8 > M\) — impossible for the maximum (the cycle is closed, so \(T^{2}(M) \le M\)). The overshoot is special to \(\nu =2\): for \(\nu =3\) (\(M \equiv 13 \bmod 16\)) one gets \(T^{2}(M)=(9M+11)/16 < M\), and smaller for \(\nu \ge 4\), so only \(9 \bmod 16\) self-destructs. Combining with \(M \equiv 5 \bmod 12\): the maximum is \(\equiv 5, 17, \mathrm{or} 29 (\bmod 48)\) — never \(41 \bmod 48\) — sharpening the sieve from 1-in-12 to 3-in-48.

The predecessor’s low bits are \(\dots 011\). Since \(T(P) = M \equiv 1 \bmod 4\), and the mod-8 rule gives \(P \equiv 3 \bmod 8 \implies T(P) \equiv 1 \bmod 4\) while \(P \equiv 7 \bmod 8 \implies T(P) \equiv 3 \bmod 4\), the peak forces \(P \equiv 3 \bmod 8\) — three pinned bits. The 4th bit is free (\(P \equiv 3 \mathrm{or} 11 \bmod 16\), flipping with \(M \bmod 16\)). Structurally, \(3 \bmod 8\) is the exit residue and \(7 \bmod 8\) the stay-sticky residue (Section 7, below), so the maximum caps an ascending sticky-run, and \(P\) is the run’s terminal exit element — the same mechanism as the divergence-side fuel burns, which also peak at \(2\cdot 3^{k-1}-1 \equiv 1 \bmod 4\). The free 4th bit encodes the (invisible-to-the-peak) shape of the run that climbed in.

The predecessor must not be a multiple of 3 — excluding \(M \equiv 5 \bmod 9\). \(P\) is itself a cycle element, so it needs its own predecessor; but an odd multiple of 3 has none (\(3Q+1 \equiv 1 \bmod 3\), never \(\equiv 0\)), so odd multiples of 3 lie in no cycle. When \(M \equiv 5 \bmod 9\), \(P = (2M-1)/3\) comes out an odd multiple of 3 (e.g. \(M=5\to P=3\), \(M=23\to P=15\)) — contradiction. Hence the maximum is \(\equiv 2\) or \(8 (\bmod 9)\), never \(5 \bmod 9\) (an independent sieve, stacking with the mod-4 and mod-16 ones).

\(M \bmod 9\) fixes how the peak is approached. \(P \bmod 3\) (inherited from \(M \bmod 9\)) decides whether \(P\)’s own predecessor \(Q\) ascends or descends into it:

\(M \bmod 9\)\(P \bmod 3\)\(Q\) (predecessor of \(P\))shape before the peak
\(2\)\(1\)\((4P-1)/3 > P\) (descends in, \(\nu =2\))V-dip: \(Q \to P \to M\) — trajectory drops to \(P\), then jumps to the max
\(8\)\(2\)\((2P-1)/3 < P\) (ascends in, \(\nu =1\))steady climb: \(\dots \to Q \to P \to M\), unbroken ascent into the max

So the maximum either crowns a smooth ascending run (\(M \equiv 8 \bmod 9\)) or sits atop a V-shaped spike where the trajectory first dipped to the local minimum \(P\) (\(M \equiv 2 \bmod 9\)).

The surviving residues, and their binary shapes. Intersecting just the mod-16 and mod-9 sieves leaves 6 residues mod 144, which group into the three allowed \(\bmod 16\) classes (their low 4 bits), each paired across \(\bmod 9 \in {2,8}\):

\(\bmod 16\) (low 4 bits)residuessmall-rep shaperegime
\(0001\) (\(1\))\(17, 65\)\(2^k+1\) — sparse, lone high bit\(\nu =2\), gentlest drop
\(1101\) (\(13\))\(29, 125\)\(2^k-3 = (k-2 \mathrm{ones})01\) — near all-ones\(\nu =3\), steeper
\(0101\) (\(5\))\(53, 101\)alternating tail \(\dots 0101\) + cap\(\nu \ge 4\), steepest (firewall)

The clean \(2^k\pm c\) forms are an artifact of taking the smallest representatives (a genuine — astronomically large — maximum only shares the low 4 bits and the mod-9 class). But the three shapes are not artifacts: they are the three binary motifs that recur throughout this whole analysis — sparse, all-ones, and alternating. The \(2^k-3\) class is the cycle-side echo of the \(-1\) glider (Section 12, below); the \(0101\) class is the cycle-side echo of the explosive/firewall fuel (Section 9, below). The deeper maximality sieve (mod 32, 64, …) then carves into the higher bits of each.

The peak is a sharp spike: ascend in from \(P \approx \tfrac{2}{3}M\), drop out to the successor \(T(M) \le (3M+1)/4 \approx \tfrac{3}{4}M\).

An ascending-free zone below the peak. No cycle element in the interval \((P, M) = (\tfrac{2}{3}M, M)\) can be \(\equiv 3 \bmod 4\): an ascending step there gives \(T(x) = (3x+1)/2 > M\) (overshoot), since \(x > P \iff T(x) > M\). So that band is reachable only by descending (\(\equiv 1 \bmod 4\)) steps — the lone ascending entry is the final jump \(P \to M\) onto the peak itself. (When \(\nu (M)=2\), i.e. \(M \equiv 1 \bmod 16\), the successor \(T(M) = (3M+1)/4 \in (P,M)\) is one such descending occupant.) It is tempting to read this as “a third of the numbers just below \(M\) are barred from being growth steps, so the 5:2 ratio is harder.” It is not — the exclusion only relocates growth, it does not thin it. The band \((P, M]\) is just the top \(\mathrm{log}_{2}(M/P) = \mathrm{log}_{2}(3/2) \approx 0.585\) octave (\(M\) and \(P \approx \tfrac{2}{3}M\) share a leading magnitude), and all it asserts is that every \(\equiv 3 \bmod 4\) element sits at or below \(P\). The \(5:2\) count is therefore taken over \([c_\mathrm{min}, P]\) with \(P\) as its ceiling — exactly the band the next bound packs — so the exclusion is already paid for there, not lost. (Note this is a real width, not a negligible sliver: the range \(c_\mathrm{max}/c_\mathrm{min}\) itself is forced \(\ge ~3.23\) below, so \((P, M]\) is a sizeable top fraction, not 1% of some imagined 70-octave spread.) The constraint shapes where growth sits (at or below \(\tfrac{2}{3}M\), never in the top sliver), not how much of it there is.

The climb into the peak has length at most \(v_{3}(M+1)\). Trace ascending predecessors back from \(M\) via \(x_{j+1} = (2x_j – 1)/3\). Since that map has fixed point \(-1\), it has a closed form:

\[x_j + 1 = (2/3)^j \cdot (M+1).\]

This is an integer for exactly \(j \le v_{3}(M+1)\) steps (the 3-adic valuation of \(M+1\)) and then stops — \(3\) runs out of \(M+1\). So the maximal ascending chain that could end at \(M\) has length \(v_{3}(M+1)\), with rungs \(P, Q, R, \dots = (2/3)^j(M+1) – 1\). The actual ascending run feeding \(M\) in an orbit is a suffix of this chain — the preceding descent may land partway up — so its length is \(\le v_{3}(M+1)\), with equality only when the descent lands on the bottom rung. (Verified over 390,193 ascending runs from small and random ~80-bit starts: zero violations of the bound; equality \(\mathrm{runlen} = v_{3}(\mathrm{peak}+1)\) in 65.8% of runs, with the shortfall almost always just 1 — slack \(0/1/2 = 65.8\% / 31.2\% / 2.4\%\). So the bound is tight about two-thirds of the time. E.g. peak \(377\) runs \(2\) though \(v_{3}(378)=3\).) The bottom rung \(2^{v}(M+1)/3^{v} – 1\) is all-ones \(2^k-1\) (the §7 fuel reserve, below, viewed from the top) exactly when \(M+1 = 2^a\cdot 3^b\); otherwise generic. Either way it has no ascending predecessor, so in a cycle it is entered by a descent — a local minimum — giving the Steiner “circuit” shape: descent ? local minimum ? ascending run (length \(\le v_{3}(M+1)\)) ? peak \(M\) ? descent.

The mod-3 floor — and the source of the slack. The rungs have forced residues mod 3: every intermediate rung \(x_{1} \dots x_{v-1}\) is \(\equiv 2 \bmod 3\) (because \(x_j+1 = 2^j(M+1)/3^j\) still carries a factor of 3, so \(x_j \equiv -1\)), while only the bottom \(x_v\) can be \(\equiv 0 \bmod 3\). A \(\equiv 2 \bmod 3\) rung always has predecessors; a \(\equiv 0 \bmod 3\) number has none at all (\(3w+1 \equiv 1 \bmod 3 \ne 0\)), so it lies in no cycle and appears in no orbit except as a start. All-ones numbers \(2^k-1\) are \(\equiv 0 \bmod 3\) exactly when \(k\) is even (\(3, 15, 63, \dots \)) — so the “all-ones reserve” is unreachable precisely in the even-\(k\) case. The consequence: when the maximal bottom \(x_v\) is \(\equiv 0 \bmod 3\), the climb cannot reach it — the run must stop at a higher \(\equiv 2 \bmod 3\) rung fed by a descent. This is the slack measured above: the ~31% of runs that fall one short of \(v_{3}(\mathrm{peak}+1)\) are largely the cases where \(x_v \equiv 0 \bmod 3\) blocks the bottom rung. So the all-ones-÷3 obstruction (and \(\equiv 0 \bmod 3\) generally) doesn’t forbid \(M\); it sets the climb’s floor one or more rungs up, at the lowest \(\not\equiv 0 \bmod 3\) rung.

Climb grammar near the peak. The rungs obey a residue rule. A generator (ascending, \(\equiv 3 \bmod 4\)) strictly inside a band \((Q, P)\) maps into \((P, M)\), the ascending-free zone — so its image must be \(\equiv 1 \bmod 4\), forcing the generator to be \(\equiv 3 \bmod 8\) (the “exit” residue). A \(\equiv 7 \bmod 8\) (“stay-sticky”) generator strictly inside is forbidden: its image would be \(\equiv 3 \bmod 4\) in \((P, M)\) and overshoot \(M\) one step later. The \(\equiv 7 \bmod 8\) rungs are therefore exactly the main climb spine (\(P \equiv 3 \bmod 8\) is its terminal exit rung onto \(M\); the rungs below are \(\equiv 7 \bmod 8\)), and any extra generators must be \(\equiv 3 \bmod 8\) “exit” rungs that peel off into descending side-branches. Verified, \(M=53\): spine \(15\to 23\to 35\to 53\); the off-spine candidate \(27 (\equiv 3 \bmod 8)\) is allowed (\(\to 41\), descends) while \(31 (\equiv 7 \bmod 8)\) is forbidden (\(\to 47\to 71 > 53\)).

The descent off the peak is graded by \(M \bmod 16\) (the three surviving residues \(1, 5, 13\); \(9\) is excluded above). The successor’s depth is fixed by \(\nu \), even though its low bits are not:

\(M \bmod 16\)\(\nu \)\(T(M)/M \approx 3/2^\nu \)character of the drop
\(1\)2\(\approx \tfrac{3}{4}\)gentlest; \(T(M) \equiv 1 \bmod 4\) (descends again), fully fixed by 4 bits
\(13\)3\(\approx \tfrac{3}{8}\)steeper; \(\nu \) fixed, one more bit (bit 4) sets the successor’s class
\(5\)?4\(\le 3/16\), mean \(\approx \tfrac{1}{8}\)steepest and recursive — \(T(M) = \mathrm{oddpart}(3H+2)\) (a 3n+1 step on the high part), unbounded \(\nu \)

Each extra halving roughly halves the successor, so \(13\) and \(5\) are the steep-fall maxima. The depth is determined by the residue; the successor’s low bits are not — they scatter across \({01,11}\) as the higher bits vary (only special sub-cases pin them).

\(M \equiv 5 \bmod 16\) is the alternating/firewall regime, governed by one identity. Writing \(M = H\cdot 2^{2j+1} + A_j\) with \(A_j = (4^{j+1}-1)/3\) the pure alternating tail (\(101\dots 01\)), and using \(3A_j+1 = 2^{2j+2}\):

\[3M+1 = 2^{2j+1} \cdot (3H+2).\]

  • Uncapped (\(H=0\), never-ending alternating, the \(r_m=(2^m-1)/3\) residues): \(3M+1 = 2^{2j+2}\), a strict power of two ? \(T(M)=1\), total collapse in one step.
  • Capped (\(H\ge 1\)): \(3M+1 = [3H+2]\| [2j+1 \mathrm{zeros}]\) — the alternating run is spent to make a \(2j+1\)-zero firewall, and the cap survives as \(3H+2\) (so \(T(M)=\mathrm{oddpart}(3H+2)\), recursing on the cap). E.g. \(110101\): \(H=1\), \(3H+2=5\), \(3M+1 = 101\| 00000 = 160\); successor \(5 = \dots 01\). This is the §9 conservation law (below) made exact for cycle maxima: the alternating tail converts entirely to zeros, only the cap propagates.

What it gives — cycle lower bounds. Around a cycle with \(a\) odd steps and \(S = \sum \nu \) halvings, dropping the \(+1\)s would demand \(3^a = 2^S\) — impossible (coprime). Cycles can exist only via the \(+1\) corrections:

\[3^a \cdot \prod (1 + 1/(3x_{i})) = 2^S.\]

For large elements the correction product is tiny, forcing \(3^a \approx 2^S\) to extreme precision — so \(S/a\) must approximate \(\mathrm{log}_{2}3 \approx 1.585\) extraordinarily well. Good rational approximations to an irrational require large denominators (continued fractions / Baker’s theorem on linear forms in logarithms), forcing \(a\) large, hence the cycle long. Combined with the verified-to-\(2^68\) frontier, this yields the known bounds: a non-trivial cycle’s minimum element must exceed 2?? (? 3×10²?), and its length is bounded below — historically \(~1.7\times 10^{7}\) terms (Eliahou [12], at the verification of its era), far larger now (Simons–de Weger [13]).

Capping the range doesn’t work — but \(c_\mathrm{min}\) in the balance does (the explicit bound). A direct cap on the range is hopeless: the rise multiplies by at most \(3/2\) per step, so only \(c_\mathrm{max}/c_\mathrm{min} \le (3/2)^a\), astronomically loose. The leverage is instead that every element is \(\ge c_\mathrm{min}\), which pins the correction product:

\[1 < 2^S/3^a = \prod (1 + 1/(3x_{i})) \le (1 + 1/(3c_\mathrm{min}))^a \approx 1 + a/(3c_\mathrm{min}).\]

With \(c_\mathrm{min} > 2^{68}\), the right side is \(1 + ~10^{-13}\), so \(2^S/3^a\) lies within \(~10^{-13}\) of 1 — i.e. \(S/a\) must hit \(\mathrm{log}_{2}3\) to ~13–14 digits. Writing \(\delta = S – a\cdot \mathrm{log}_{2}3 > 0\), the requirement is \(\delta < a/(3c_\mathrm{min}\cdot \mathrm{ln}2)\), so a length \(a\) is viable only if its “upward gap” \(\delta \) is that small — which happens only at large continued-fraction denominators of \(\mathrm{log}_{2}3\). Marching through the convergents against the \(2^{68}\) frontier, the smallest viable length is \(a \approx 7.96 \times 10^{19}\) odd steps — the convergent denominator 79,641,170,620,168,673,833 (its upward gap is \(\delta \approx 0.017\)). So a non-trivial cycle with minimum above \(2^{68}\) must have on the order of \(10^{20}\) odd elements (total terms ~2.6× that) — the concrete version of “long.” Note what is and isn’t bounded: the range \(c_\mathrm{max}/c_\mathrm{min}\) stays unbounded; it is the length that the irrationality of \(\mathrm{log}_{2}3\) forces to be enormous. (This estimate captures the mechanism and order of magnitude via the convergents; the rigorous published bounds sharpen the constants with the same engine.)

The growth numbers occupy a sub-band — which can’t be flat. The \(\equiv 3 \bmod 4\) (ascending) elements — the ones the \(5:2\) count tracks — are confined to \([c_\mathrm{min}, P]\), where \(P = (2M-1)/3 \approx \tfrac{2}{3}M\) is the largest of them (the ascending-free zone \((P, M]\) holds none) and \(c_\mathrm{min}\) the smallest. So the relevant span is \(P/c_\mathrm{min} = \tfrac{2}{3}\cdot c_\mathrm{max}/c_\mathrm{min}\), not the full range. Now count the usable slots: a cycle element is coprime to 3 (a multiple of 3 has no predecessor), so among \(\equiv 3 \bmod 4\) integers — residues \(3, 7, 11 \bmod 12\) — only \(7, 11 \bmod 12\) are eligible, density \(\tfrac{1}{6}\), not \(\tfrac{1}{4}\). Packing the \((5/7)a\) ascending elements as distinct such integers needs

\[P – c_\mathrm{min} \ge 6\cdot (5/7)a = (30/7)a.\]

With \(a \ge 8\times 10^{19}\) and \(c_\mathrm{min} > 2^{68}\), that forces \(P \ge 6.4\times 10^{20}\), hence \(c_\mathrm{max} = (3/2)P\) and the clean \(c_\mathrm{min}\)-independent bound

\[c_\mathrm{max} \ge (3/2)c_\mathrm{min} + (45/7)a.\]

Equivalently \(c_\mathrm{max}/c_\mathrm{min} \ge 3/2 + (45/7)(a/c_\mathrm{min})\), which is \(\approx 3.23\) at the frontier (\(c_\mathrm{min} \approx 2^{68}\)). So a non-trivial cycle cannot be flat: its span exceeds \(~3.23\times \) near \(c_\mathrm{min} \approx 2^{68}\), relaxing toward the irreducible \(3/2\) (since \(c_\mathrm{max} = (3/2)P \ge (3/2)c_\mathrm{min}\)) for taller cycles. Density \(\tfrac{1}{6}\) is where this stops — beyond “not a multiple of 3” there is no further universal exclusion on a generic ascending element (the finer residue bounds apply only to the max/predecessor). And it remains a lower bound: like every counting fact here it sets a floor on the shape, not a ceiling that closes the question.

The floor \(c_\mathrm{min}\), by contrast, is verification-bound — packing cannot lift it. Note the asymmetry: the band argument constrains the width (\(P – c_\mathrm{min} \ge (30/7)a\)), which pushes \(c_\mathrm{max}\) up given \(c_\mathrm{min}\) — but it says nothing about how high the band must start. Nothing in the dynamics forces elements low (they may sit arbitrarily high), so density yields no rigorous floor on \(c_\mathrm{min}\); the only lower bound is direct verification (\(c_\mathrm{min} > 2^{68}\): every smaller \(n\) is computed to reach 1), a per-number fact stronger than any count. The log-uniform bottom-octave heuristic merely recovers \(~2^{68}\) — and circularly, since the length \(a\) was itself derived from \(c_\mathrm{min} > 2^{68}\). So the counting/coprimality machinery is a width / \(c_\mathrm{max}\) / ratio tool; the floor is a computation. Raising \(c_\mathrm{min}\) needs more verification, not more counting.

Why this makes cycles heuristically impossible. \(S/a \approx \mathrm{log}_{2}3\) means the cycle’s average \(\nu \) equals \(1.585\) — i.e. \(\nu =1\) at the ~71% (5:2) rate of Section 5, versus the natural 50%. For the probabilistic model (\(\nu \) iid geometric, mean 2, variance 2), holding the sample mean down at \(1.585\) over \(a\) steps is a deviation of \(\approx 0.415\cdot \surd (a/2)\) standard deviations. At the forced length \(a \sim 8\times 10^{19}\) (previous paragraph) that is ~2.6×10? standard deviations, a Gaussian-tail probability on the order of \(10^{-1.5\times 10^{18}}\). Worse, the exponent grows linearly with the length the bounds force — so the longer a cycle must be, the more impossible it becomes, and summed over all candidate lengths the heuristic expected number of non-trivial cycles is essentially zero. (This is the cycle-side analogue of the even-distribution heuristic in Section 5; it rests on the same unprovable pseudorandomness assumption — the \(\nu \) sequence of a deterministic orbit need not behave like iid draws.)

Ceiling. The congruence sieves (\(M \equiv 1 \bmod 4\), \(M \equiv 5 \bmod 12\), \(P \equiv 3 \bmod 8\)) and the spike bounds each remove a constant fraction of candidates and feed the size bound — but \(\mathrm{log}_{2}3\) can be approximated arbitrarily well given enough steps, so they push the minimum cycle size upward without ever reaching zero. Same character as the divergence side: every constraint is sharp and real; none closes the door.


7. Fuel I — Consecutive 1s (Growth, but Weak and Finite)

A consecutive run of growth requires staying in the \(\equiv 3 \bmod 4\) class (?=1) step after step. (Note: net divergence does not need this — only an above-average frequency of ?=1, scattered, which needs no structure; see Section 13. This section is about the structured, consecutive route.) Which numbers stay ?=1 in a row?

Uniqueness theorem (verified): the only residue mod \(2^{j+1}\) that stays ?=1 for j consecutive steps is the all-ones residue \(2^{j+1} – 1\). Concretely: stay-sticky-1-step = \(3 \bmod 4\), 2-steps = \(7 \bmod 8\), 3-steps = \(15 \bmod 16\), … There is no alternative structure — a length-j growth run requires j+1 trailing 1s.

For the all-ones number \(n = 2^k – 1\), the run has an exact closed form:

\[a_i = 3^i \cdot 2^{k-i} – 1\qquad \text{(eats one trailing 1 per step)}\]

It grows ×3/2 for exactly k?1 steps, then exits at \(2\cdot 3^{k-1} – 1 \equiv 1 \bmod 4\) and descends. Maximal fuel is a finite reserve: k ones buy k?1 growth steps.


8. Fuel I, continued — “No Free Fuel”

To reach \(2^k – 1\) (not start there), the nearest odd predecessor has exactly k+1 bits (it is ? 4/3 larger, pushed over the \(2^k\) boundary). And the all-ones numbers split:

  • k even (\(3, 15, 63, 255, \dots \)): divisible by 3. Since \(3n+1\) is never divisible by 3, these have no odd predecessor at all — isolated islands, reachable only on their own doubling chain.
  • k odd (\(7, 31, 127, \dots \)): reachable, but only by descending into them from a strictly longer (k+1-bit) ancestor.

Conclusion: you can never climb into maximal fuel. Pure fuel is always downhill; assembling a length-k all-ones block costs a number of length k+1 — you compact and lose ?1 bit. Fuel cannot be bootstrapped upward.


9. Fuel II — Alternating Structure (Explosive, but Self-Emptying)

The other way to get “power” is a long carry cascade, which needs the alternating pattern. But propagation converts 1s into 0s:

nones(n)ones(3n+1)firewall created
1365 = \(10101010101\)6112 zeros
1367 = \(10101010111\)739 zeros
341 = \(101010101\)5110 zeros

Conservation law: carry distance = 1s destroyed = zeros created. The cascade spends the alternating fuel to manufacture the very firewall (Section 4) that then isolates and collapses the number. Power converts directly into void. The alternating “residues” \(r_m = (2^m-1)/3\) are the pure form of this fuel (Figure 6): each is a \(101\dots 01\) pattern, and \(3\cdot r_m + 1 = 2^m\) exactly — a clean power of two.

Figure 6. The alternating residues \(r_m = (2^m-1)/3\). Left: their binary patterns (pure \(101\dots 01\)). Right: the firewall width each produces — \(3\cdot r_m+1 = 2^m\) is an exact power of two, so the whole number collapses to a single 1.

10. The Complete Structural Picture

There are exactly two sources of power in the binary string, and the carry mechanism dooms both:

FuelBehaviourWhy it cannot sustain divergence
Consecutive 1sslow ×3/2 climbweak, finite (burns out in k?1 steps), cannot be bootstrapped (downhill to reach)
Alternatingone large carry cascadethe explosion is a firewall — it destroys its own 1s to create the void that collapses it

No third option: a carry either fizzles (no effect) or propagates (destroying the 1s that enabled it). You cannot have a large cascade that preserves its fuel — that is contradictory by construction. This is the structural reason behind the average \(\times 3/4\) drift, with both tails of the ?-distribution explained: small ? = weak bounded growth, large ? = self-emptying collapse.

This dooms every structured route to divergence. What it does not address is the unstructured one: a scattered, above-average frequency of ?=1 steps (numbers merely ending in \(\dots 11\), no all-ones block, no run) sustained over infinitely many steps. That carries no fuel to preserve and so dodges this argument entirely — it is the lone survivor of Section 13.


11. The Game of Life Analogy

The binary trajectory updated by a fixed local-ish rule invites comparison to cellular automata (CA) — and in particular to Conway’s Game of Life, where a glider is a finite pattern that, after a fixed number of steps, reappears unchanged but translated: a self-sustaining structure that neither dies nor blows up. The Collatz analogue of a glider is a binary pattern that perpetually reconstructs the conditions for its own growth — i.e. a divergent trajectory or a non-trivial cycle. The conjecture, restated, is: 3n+1 has no glider.

The comparison is sharper than a slogan:

  • In 1-D cellular automata (Wolfram’s elementary CAs), whether gliders exist is decidable for any fixed period and speed, because the rule is local (radius 1) — a glider’s influence spreads at bounded speed, so a finite enumeration (de Bruijn diagrams / transfer matrices) settles it. Wolfram’s Class I/II rules provably have none; Class III/IV (e.g. Rule 110) do.
  • Collatz looks Class III/IV — complex, sensitive, plausibly Turing-complete — yet behaves Class I (everything dies to one fixed point). The obstruction to importing the CA decision procedure is exactly that Collatz has no locality: a single carry can, in principle, couple bit 0 to bit 60. The “firewall” of Section 4 is an attempt to manufacture locality — and it works conditionally, but proving every trajectory creates one is the open problem.

So the right question is not “does a glider exist?” but “can a finite configuration sustain glider-like behaviour without the carry mechanism eventually destroying it?” — which Sections 7–10 answer “not by either available fuel,” and Section 12 answers precisely in the 2-adics.


12. The Glider That Exists — in the 2-adics

A perfect glider does exist, but not as a positive integer. In the 2-adic integers, \(-1 = \dots 11111\) is a fixed point of the shortcut map:

\[3\cdot (-1)+1 = -2 = \dots 11110, \div 2 \to -1.\]

The finite all-ones numbers \(2^k – 1\) are its truncations — which is exactly why they are the maximal growth runs: they track ?1 for k?1 steps before their finiteness leaks at the top. So divergence isn’t logically impossible; the obstruction is arithmetic approximation — whether a finite integer can chase ?1 to ever-greater lengths. (In CA terms: the glider lives on the infinite tape \(\dots 1111\), and every finite truncation of it decays.)

Gun vs glider. The distinction matters. A Life glider conserves mass and merely translates; a Life glider gun creates, growing without bound — possible only because Life has no conservation law. Collatz does (Section 9: carry converts 1s?0s, fuel is never created on net), so the gun is forbidden — no configuration perpetually manufactures fresh trailing-1 fuel. The glider is not killed by conservation alone, but a finite one is squeezed by two opposing constraints:

  • Anchored below. Growth needs trailing 1s (2-adic closeness to ?1). Erode that anchor to a single 1 (\(\equiv 1 \bmod 4\)), or collapse the low part to a power of 2, and the number detaches from ?1 and falls — to 1.
  • In vacuum above. A carry cascade into high structure destroys 1s and creates a zero firewall (Section 9). So a glider cannot grow through structure, only into emptiness — its top must climb into leading-zero vacuum.

The all-ones run shows the geometry exactly: \(111 \to 1011 \to 10001\), the top climbing into vacuum while the anchor erodes (3?2?1 trailing 1s), then detaching. The anchor is finite fuel; refilling it would need the forbidden gun. The one crack: the anchor need not be steady — it could flicker, short anchors re-established by chance often enough to net-grow. Conservation bars a deterministic gun, not a statistically lucky infinite sequence of re-anchorings. That such luck cannot run forever is exactly the conjecture.


13. Where the Wall Stands

Every structural fact above is true per step, per run, or per residue class. Both obstacles fragment into mechanisms that are individually dead, with one survivor each:

Divergence:

MechanismStatus
One infinite growth runDead — capped by bit-length
Climbing into ever-longer all-ones runsDead — entered only by descent / isolated (Section 8)
A finite integer being the gliderDead — the glider is ?1, not an integer (Section 12)
Many short, impure runs regenerating in aggregateOpen = half the conjecture

Cycles:

ConstraintEffect
min ? 3 mod 4; max ? 5 mod 12, ? 9 mod 16, ? 5 mod 9; predecessor ? 3 mod 8, ? 0 mod 3congruence sieves — thin candidates by constant factors (Section 6)
\(3^a \ne 2^S\) exactly; only \(\approx \) via \(+1\)forces \(S/a \approx \mathrm{log}_{2}3\), hence huge \(a\)
A small non-trivial cycleDead — bounded out to enormous size
A large non-trivial cycle (good \(\mathrm{log}_{2}3\) approximation)Open = half the conjecture

Both survivors are infinite-aggregate properties: for divergence, enough ?=1 steps to grow without ever exploding or exhausting the impure fuel, forever; for cycles, a step-count ratio approximating \(\mathrm{log}_{2}3\) well enough to close a loop, which only ever requires more steps. Sharp local facts — per step, per run, per residue — do not compose into infinite-aggregate conclusions. That gap is precisely where Collatz has stood since 1937.


14. Honest Assessment

This is expository, not novel. The binary/carry framing is a concrete shadow of standard 2-adic Collatz analysis [4, 5, 6]; the density results are Terras/Everett [2, 3]; the state of the art is Tao 2019 [7]. What the binary lens adds is intuition — making visible, via fuel and firewalls and carry conservation, why the average drift is downward and why that cannot be promoted to a proof. The value is pedagogical: a tangible account of a problem usually stated abstractly.


15. Relation to Known Results

None of the mathematics here is new; the binary lens is a re-derivation of established theory. The correspondence, made explicit:

In this documentEstablished source
Shortcut map \(T(n)=(3n+1)/2^\nu \); parity of \(\nu \)standard; Terras [2], Lagarias survey [1]
Binary “firewall” = independence across a zero run2-adic continuity of the Collatz map [4, 5, 6]
\(-1 = \dots 111\) as the all-ones fixed point (the “glider”)textbook 2-adic fact [5, 6]
“Conservation law” (carry spends 1s ? zeros)a re-description of the \(\nu _{2}(3n+1)\) arithmetic
Density-1 descent; the \(\mathrm{log}2/\mathrm{log}3 \approx 0.631\) break-evenTerras [2], Everett [3]
“Almost all orbits attain almost bounded values”Tao [7]
Cycle balance \(3^a \approx 2^S\) ? \(\mathrm{log}_{2}3\) approximation ? size boundsCrandall, Steiner, Eliahou [12], Simons–de Weger [13]
Cycle min \(\equiv 3 \bmod 4\); no element \(\equiv 0 \bmod 3\); congruence sieves on the extremaelementary folklore in the cycle literature

The specific small artifacts — \(\mathrm{max} \not\equiv 9 \bmod 16\), \(\mathrm{max} \in {17,29,53,65,101,125} \bmod 144\), the binary-shape grouping of those residues — may not appear verbatim elsewhere, but they are immediate consequences of the standard residue/maximality method, not new facts. They are new expressions, not new results.

What the exercise does offer, honestly stated: (i) a from-scratch reconstruction of the 2-adic structure using only elementary binary reasoning; (ii) a unified narrative in which the same three binary motifs — sparse, all-ones, alternating — govern both the divergence and cycle obstacles; and (iii) a clear demonstration of why every elementary structural attack hits the same wall: each constraint is a constant-factor sieve, and constant-factor sieves push the floor up without ever reaching zero, because sharp local facts do not compose into infinite-aggregate conclusions. The lesson is about the shape of the obstruction, not about new ground gained.


16. References

Collatz / 3n+1

  1. J. C. Lagarias, The 3x+1 problem and its generalizations, American Mathematical Monthly 92 (1985), 3–23. (The standard survey; see also Lagarias, ed., The Ultimate Challenge: The 3x+1 Problem, AMS, 2010.)
  2. R. Terras, A stopping time problem on the positive integers, Acta Arithmetica 30 (1976), 241–252. (Parity vectors; density-1 finite stopping time.)
  3. C. J. Everett, Iteration of the number-theoretic function f(2n)=n, f(2n+1)=3n+2, Advances in Mathematics 25 (1977), 42–45.
  4. H. Möller, Über Hasses Verallgemeinerung des Syracuse-Algorithmus (Kakutanis Problem), Acta Arithmetica 34 (1978), 219–226. (2-adic formulation.)
  5. G. J. Wirsching, The Dynamical System Generated by the 3n+1 Function, Lecture Notes in Mathematics 1681, Springer, 1998.
  6. K. R. Matthews, Generalized 3x+1 mappings: Markov chains and ergodic theory, in [1, The Ultimate Challenge]. (2-adic/ergodic background.)
  7. T. Tao, Almost all orbits of the Collatz map attain almost bounded values, Forum of Mathematics, Pi 10 (2022), e12; arXiv:1909.03562 (2019).

Cellular automata / Game of Life

  1. M. Gardner, Mathematical Games: The fantastic combinations of John Conway’s new solitaire game “life”, Scientific American 223 (Oct. 1970), 120–123.
  2. E. R. Berlekamp, J. H. Conway, R. K. Guy, Winning Ways for Your Mathematical Plays, 2nd ed., A K Peters, 2001–2004. (Life: gliders, guns, universality.)
  3. S. Wolfram, Computation theory of cellular automata, Communications in Mathematical Physics 96 (1984), 15–57. (de Bruijn diagrams, symbolic dynamics.)
  4. S. Wolfram, A New Kind of Science, Wolfram Media, 2002. (Elementary CA classification; full text at wolframscience.com.)

Cycle bounds

  1. S. Eliahou, The 3x+1 problem: new lower bounds on nontrivial cycle lengths, Discrete Mathematics 118 (1993), 45–56.
  2. J. Simons, B. de Weger, Theoretical and computational bounds for m-cycles of the 3n+1 problem, Acta Arithmetica 117 (2005), 51–70. (See also R. Crandall, Math. Comp. 32 (1978), 1281–1292.)
  3. R. P. Steiner, A theorem on the Syracuse problem, in Proc. 7th Manitoba Conf. on Numerical Mathematics (1977), 553–559. (No non-trivial 1-circuit cycle.)

Undecidability of the general family

  1. J. H. Conway, Unpredictable iterations, in Proc. 1972 Number Theory Conference (Univ. of Colorado, Boulder, 1972), 49–52. (Reachability of generalized \(\mathrm{an}+b\) maps is undecidable; basis of FRACTRAN.)

17. Artifacts

  • collatz_explorer.py — trajectory analysis, ? / Hamming / zero-run / alternating-run metrics, carry roles & reach, firewall-isolation verifier, r_m residues.
  • collatz_viz.py — overview heatmaps, 3-panel shortcut heatmap (bits / carry roles / ? & reach), ?-distribution, r_m patterns, multi-trajectory.
  • collatz_carry.py — carry-cycle visualizations (generic vs alternating).
  • collatz_discussion.md — the full chronological Q&A log this writeup summarizes. (not shared)

https://github.com/perlauge/collatz

Key reproducible facts: python3 collatz_explorer.py 27, python3 collatz_viz.py 1365.


18. One-Paragraph Summary

In binary, Collatz descent comes from two facts about carries: consecutive-1 “fuel” produces only weak (×3/2), finite, un-bootstrappable growth, and the alternating structure needed for a far-reaching carry cascade is destroyed by that very cascade (1s ? 0s, creating a collapsing firewall). Together these explain the average ×3/4 drift completely. The perfect self-sustaining structure exists only as the 2-adic integer ?1, which no finite integer can be. On the cycle side, the same per-element rule pins any non-trivial cycle’s extremes — minimum ? 3 mod 4, maximum ? 5 mod 12 with immediate predecessor ? 3 mod 8 (the terminus of an ascending sticky-run, identical to the divergence-side fuel burns) — and the impossibility of \(3^a = 2^S\) forces the step-count ratio to approximate \(\mathrm{log}_{2}3\), making any cycle astronomically large. Both obstacles fragment the same way: every mechanism is provably dead except one infinite-aggregate survivor each — impure short runs regenerating forever (divergence), or an arbitrarily-good \(\mathrm{log}_{2}3\) approximation closing a loop (cycles). Each is exactly half the conjecture, unreachable because sharp per-step, per-run, and per-residue facts never compose into an infinite-aggregate guarantee.

Benford’s Law

august 28th, 2022

I was curious as to what I actually could do with Benford’s Law. Could I detect anomalies that would warrant questioning? Where could I apply it?

So – first off – what is Benford’s Law – or the Newcomb-Benford law? It is a natural occurring distribution if the domain spans several orders of magnitude. It states that 1 is then – by far – the most prevalent most significant digit. The theoretical frequency is found by applying log10(1+1/digit).

Let’s say our data set is based upon the number of people living in cities, then the cities with 10-19 million, 1 million, 100 -199 thousand, 10-19 thousand, 1 thousand, 100-199, 10-19, and 1 citizen would cover a little more than 30% of the cities, whereas cities with 20-29 million, 2 million, 200-299 thousand, 20-29 thousand, 2 thoushand, 200-299, 20-29, and 2 citizens would cover 17.6% of the cities.

I find this counter intuitive, and thus very fascinating.

I had a bank statement with deposits and withdrawal – 212 in total – I though, why not try to apply Benford’s law to that lot and see how well it fits, and if not, if there are any explanation.

To get the most significant digit I chose to use the Log10 function basically writing the number in scientific notation. Then taking the exponent and divide the original amount to get a power of one, then finally take the absolute integer value:

abs(as.integer(amount/10^floor(log10(abs(amount)))))

Let’s assume the amount is 12345, then log10 would give us 4.091491, we then divide 12345 by 10^4 to get 1.2345, then take the integer value – not rounding – this gives us 1.

Using dplyr in R to get the frequency for this, I used:

data_hist <- as.data.frame(data %>% group_by(digit=abs(as.integer(amount/10^floor(log10(abs(amount)))))) %>% summarize(count=n()) )

Which gave me:

digitcount
154
229
342
418
524
615
711
86
913

To add the Benford values simply add the values:

data_hist$benford = log10(1+1/data_hist$digit)

We would also like to have the frequency as a percentage, not as an observation count.

data_hist$freq = data_hist$count / sum(data_hist$count)

Now, we could use these two values to calculate which digits are the most off either simply using freq/benford or (freq-benford)/(freq+benford) – but let’s just plot the data as a bar chart along with the Benford curve.

ggplot(data_hist, aes(x=digit, y =freq)) + geom_bar(stat=”identity”) + geom_line(aes(x=digit, y = benford)) + labs(title = “Bank payment most significant digit: Benfords law”) + scale_x_discrete(“digit”, data_hist$digit, waiver(), factor(data_hist$digit))

It would seem tha 3, 5, and 9 are overrepresented, which begs the question: Why?

It turns out, that I have subscriptions in here and those are not random, and will thus skew my distribution.

While this was relatively easy to do, and I did spot a few outliers, it seems I didn’t quite get what I was looking for. And the 4 subscriptions didn’t expose themselves in this plot – maybe if the other subscriptions had been removed. Studies for another day.

Pipe Dream

august 29th, 2016

In which we speculate about software reuse, serverless architecture, and find out it is all a pipe dream in the end.

Let us begin in a different business: Plumbing. I can go to an online store and buy pipes and fittings, taps, heaters, and a lot of other stuff. Components can be bought for electricians and electronics in a similar fashion. That is, if I want, I can buy components and put them together as I see fit – luckily you have to be certified to do create installations in your home – at least in Denmark. Software on the other hand is almost always bought in complete installations.

While it did not make any sense to buy a host of single components in the old days, there is basically no obstacles to do so today. Amazon (and other *aaS vendors) sell by the second or even millisecond, which means you could make micropayments for micro – or even nano – services. It seems it should be possible to have a McIlroy (bash) pipe connection of components across the internet with differentiated payment schemes. Perhaps a Mac Automator for the web.

The step up from “basically no obstacles”

Communication – both the format and the pipes are obstacles. In plumbing, you cannot seamlessly fit a 1 inch to a 2 inch pipe, which is visibly different. If water runs through the system, then water runs through every part. In IT the common denominator is text. Sometimes this seems too simple, but it works quite well for bash programming.

The other element, the pipes or rather the network. Networks are slow compared to local disk access, yet n-distributed parallel execution is extremely fast, and with a forwarding address on the output, i.e. redirection of STDOUT in bash, the data would not have to return to the origin for every segment.

This then suggests that there should be a protocol along the lines:

setup: parameters, forward to, payment key
execute: stream of data

Having such a setup would allow you to actually define the schema of the data – if it conforms to a schema. This in turn would allow SQL like filtering, but that really depends upon the service.

Why would I want this?

Well, first off, real re-use. It seems silly that we’re re-inventing the wheel every time a new Rails/Django/Drupal/… is installed. It seems unnecessary that a non-programming customer should have to setup and maintain a webservice to do something which is already possible to do by using existing components possibly in a different way. Even if the customer does this, everyone else with the same ideas would have to do the same for themselves.

Naturally I want it because I’ve run into some missing functionality in one product, that I could construct myself or use from another vendor, but without the pipe and the linking it is not possible.

Whether it is a poor search functionality in a webshop, an e-mail filtering/handling service, or news feeds I’d like to filter – I have to do it myself, and I have to pull the information.

Why it is all a pipe dream after all

Spinning up the services can take a lot of time. Sending massive amounts of data across the web will congest an already congested infrastructure. Hoping for every service provider to provide a standardized service interface is in itself a pipe dream.

Pipe dream – a hope, wish, or dream that is impossible to achieve or not practical (Merriam-Webster)

Programming by the pound

august 21st, 2016

In which we try to figure out why some programs are worse off for the end users than they need to be.

We are likely to spot the root cause and less than pleasing hairdos if a hairdresser cut by the pound, that is, with no regards to style or aesthetics, but rahter focused on cutting a pound of hair off each customer.

We’d see: bowl cuts, bald, half-bald, … Speed and weight is of the essence.

The source code of some programs looks as if they have been created in a similar fashion. Just push code in without a sense of style or aesthetics. Copy and paste, ad hoc design from the beginning, Rube Goldberg like amendments.

Most people will not be able to see this, as they only have the user interface to interact with, if the software ever hits the market.

The programmer on the other hand will never learn any better. Basically he is doing a stellar job – the customers get served in a speedily manner, and there are no complaints. As mentioned above, the customers are less likely to spot the ugliness as they don’t see the code. The programmer rarely sees other peoples code, and likely cannot tell the difference between a good style and his own.

Sort of having a blind barber tell you: “Looks good to me.”

But if the end users are happy with the delivered product should we care at all? Well that depends, we’re all paying the price. The price for the software has to be earned somewhere. If the productivity of the end users doesn’t increase to cover the cost, then the choice to get the software in the first place was a bad idea. The cost of updates and additions goes up the more convoluted the source code is, because it will take longer to fit the new requirements into the existing code. Starting from scratch means losing the entire investment and betting on another system with a high probability of having the exact same issues. You may even lose the part of your data.

I’m not arguing that the hairdressers shop should be a cathedral, rather that it shouldn’t be a shanty building with wires and hoses dangling in dangerous positions.

Whenever you spot a user interacting with a user interface, and they become frustrated, odds are that the developers had more focus on programming by the pound getting things done than actually enabling their customers to get things done.

The architects grievance with MVC

juli 19th, 2016

MVC is a separation of concerns pattern in which you have a Model, a View, and a Controller for a given entity. The Model should contain the data and methods for the entity while the view – or views – are responsible for visual representation of the model, e.g. a pie chart or a bar chart of data. The controller is responsible for the providing the model with command requests.

Often, when developers are trying to follow the MVC pattern they follow the pattern as implemented by Rails; all the models go into the app/models directory, all the views reside in app/views, and all the controllers will be found inside the app/controllers directory.

This is comparable to designing a house and have a special room for all faucets, power outlets, and drains, and another room for all the handles and switches.

The faucet you would usually find in the kitchen will now be labelled “kitchen” but reside in the faucet room, and will likely sit next to the faucet labelled “bathroom”.

You could run a hose from the faucet to the kitchen, but that would only save some trouble. The handle for turning on and off the water resides in the controller room, you have the “kitchen faucet” controller. Next to these you may have the power on/off switches for the oven.

This construct is quite easy for the installers to set up, for the software equivalent this is easy on the framework.

But we are not building houses to please the work crew, but rather for the ease of living. We should focus upon the user experience as well, when we write code.

What we are achieving by this model is a high cohesion in unrelated entities performing the same role, which is contrary to what Larry Constantine suggested in 1968. Teasing apart the application for reuse is much more difficult – we cannot easily swap out one kitchen for a different one.

The better structure would be to have the strongly related entities in the same place, i.e. instead of:

models/
  kitchen
  bathroom

views/
  kitchen
  bathroom

controllers/
  kitchen
  bathroom

it would make sense to have:

kitchen/
  model
  view
  controller

bathroom/
  model
  view
  controller

At least this would easily identify the views associated with a specific model, and if otherwise keeping with modular discipline should make it possible to pull out one entity.

Logic in the controller

Sometimes you run across a project where there is (model) logic in the controller, but that is a bad idea. It should be possible to keep the controller and change the model implementation, e.g. my keyboard (controller) does not have to change because my application changes or the keyboard layout changes. The controller should send events to the provided model to be interpreted there.

If you have logic in the controller, then you will need to change both the controller and the model, when you make a change, that means you have one more element in the cognitive load, which makes things just a bit more complicated. Complication that does not have to exist.

It seems that by tooling we are building software that is easy for the frameworks and the original constructors, but not good for those who have to maintain or live with the product. That is simply not the right way to be service minded.

Snake oil and “everybody can program”

juli 6th, 2016

Everyone can program, but not necessarily code – I fully agree with Quicy Larson: Coding isn’t easy.

I believe that all of us are capable of programming, that all of us can – in the Socrates Meno dialogue way – we have the ability to describe a set of procedures to apply in a given order.

We’re all capable of writing novels, but very few of those who do will make a successful novel.  That is to say: “It is not as easy as it looks or sounds.”

All of us can cook, though we’d be pressed to get a Michelin star.

Not all of us can write these procedures in a programming language, and – of those who can – not all should. And not all software should be written in a procedural/imperative style.

Some will throw together programming language correct grammar with no regards to the task being solved. I’m not sure this should constitute as programming. It is true that working software provides value to the user, but with little understanding of the needed solution in both behaviour and coding, there is so much more value to be had by doing it right (but there are so many more ways to do it wrong).

If they had worked the same level in the restaurant business, it is likely Gordon Ramsay would have called it a Kitchen Nightmare. As no one can see the internals of the code fewer customers turn away from the business, and the parallel fails in that we usually eat every day, we don’t get a new software product served every day.

In the business world scarcity with increased demand means prices will go up, leading to more resources being applied. In software development, this leads to people who really shouldn’t program are being hired to hack away at the next big thing.

As a society we are not better off having poorly constructed “cathedrals” forced upon us. If everytime we need to go through a door, we will have to use jump through hoops, we would be quick to remedy this odd contraption, but in the software world, there are usually no such way.

The pursuit for infinite savings allows for expenses now, but apparantly not investment in solid work, nor the hidden value by improving the software.

I am still wondering why there is so little regulation in a field so wide and with so far reaching consequences. Why do people accept snake oit?

Simple insights into source code bases

april 8th, 2016

Does the code base scream the domain?

I was wondering whether or not it would be possible to use parts of PageRank (https://en.wikipedia.org/wiki/PageRank) to gain insights into a code base. If PageRank works on web pages to ascertain what the contents of the page relates to, then likely a similar way could be construed for source code.

The simplest thing that could possibly work?
I chose the n-gram (https://en.wikipedia.org/wiki/N-gram) approach – unigram to be specific. While bi- and tri-grams are better for text, I’m not so sure for code bases, nevertheless, it could be tested.

The simple process

  • Find all files of a specific language inside the project structure. Likely it would be prudent to examine source and test code independently
  • Remove all for of new-lines
  • Tokenize on non alphanumeric entities
  • Build histogram of these tokens

Removing comments and possibly strings would likely be a good idea, but that would require parsing and not just bash.

find . -name “*.java” -type f | xargs cat | tr -d ‘\n’ | tr -d ‘\r’| tr -cs ‘[:alnum:]’ ‘\n’ | sort | uniq -c | sort -rn > wordfreq.txt

Looking at gerrit’s word frequency, we get something along these lines:

  • 27560 import
  • 25092 the
  • 21758 com
  • 21385 google
  • 16615 License
  • 14676 public
  • 14544 gerrit
  • 13553 String
  • 12309 final
  • 11823 return
  • 10431 private
  • 10191 new
  • 9809 if
  • 8940 this
  • 8196 0
  • 7665 in
  • 7225 void
  • 7163 under
  • 6809 a
  • 6590 null
  • 6389 server
  • 6234 client
  • 6185 static
  • 6125 for
  • 6024 2
  • 5965 org
  • 5953 to
  • 5384 or
  • 5212 class
  • 4972 may
  • 4963 Override
  • 4934 name
  • 4923 get
  • 4752 distributed
  • 4666 of
  • 4602 java
  • 4492 throws
  • 4392 n
  • 4164 is
  • 3705 e

Reading it “import the com google License public gerrit String final return private new if this 0 in void under a null server client static for 2 org to or class may Override name get distributed of java throws n is e” doesn’t quite make sense. Clearly the “License” and namespace “com.google” influences heavily.

Removing the keywords we get:
“the com google License gerrit 0 in under a server client 2 org to or may name get distributed of n is e”

It is not as if the source code really screams what gerrit is about. From Chinese Whisper reconstruction I get something about a “client server with name distribution” – not quite the “Gerrit provides web based code review and repository management for the Git version control system” tagline.

The frequency count drops rapidly – let’s pull the data into R to see if there are some patterns.

gerrit <- read.table(“wordfreq.txt”, header=F)
f <- as.data.frame(table(gerrit$V1))
f$Var1 <- as.numeric(as.character(f$Var1))
plot(log(f), type=”l”, xlab=”log(frequency)”, ylab=”log(count)”, main =”Gerrit source code tokens\nlog-log plot”)

gerrit loglog plot

This seems to be a power law distribution, but with a lot of outliers above 7 (corresponding to around 1100) – and with an anomaly just short of 8 (corresponding to 2374 to be exact). This is quite likely the template License.

gerrit[gerrit$V1 == 2374,]
V1 V2
101 2374 Unless
102 2374 Licensed
103 2374 LICENSE
104 2374 law
105 2374 governing
106 2374 express
107 2374 CONDITIONS
108 2374 compliance
109 2374 BASIS
110 2374 agreed

Plotting the more conformant data

k <- f[f$Var1 <1100,]
plot(log(k), type=”l”, xlab=”log(frequency)”, ylab=”log(count)”, main =”Gerrit source code tokens\nfrequency < 1100\nlog-log plot”)
abline(glm(log(k$Freq ) ~ log(k$Var1)), col=”red”)

glm(log(k$Freq ) ~ log(k$Var1))

Call: glm(formula = log(k$Freq) ~ log(k$Var1))

Coefficients:
(Intercept) log(k$Var1)
8.554 -1.372

Degrees of Freedom: 495 Total (i.e. Null); 494 Residual
Null Deviance: 1299
Residual Deviance: 152.9 AIC: 829.7

exp(8.554/1.372)
[1] 510.1444

gerrit loglog < 1100

So, we should likely look at values with the frequency in this area to get a better suggestion for what the code base is used for.

gerrit[gerrit$V1 < 600 & gerrit$V1 >= 500,]
V1 V2
240 598 code
241 597 url
242 592 rw
243 590 values
244 589 label
245 581 plugin
246 580 v
247 563 ctx
248 561 Result
249 558 Util
250 550 UUID
251 544 2013
252 541 bind
253 538 cb
254 533 IdentifiedUser
255 532 err
256 531 u
257 530 o
258 528 substring
259 526 master
260 525 Repository
261 522 CurrentUser
262 522 as
263 521 res
264 520 dom
265 517 assertEquals
266 516 token
267 508 start
268 508 RESOURCES
269 508 interface
270 507 lang
271 506 servlet
272 500 Object

This has a better match with the core of the project, though we still see comment debris, e.g. “2013”

Gerrit can be found at https://gerrit.googlesource.com/gerrit/ – I was looking at the codebase from 02bafe0f4c51aa24b2b05d4d1309ecfc828762c0 (January 20th, 2016)

Independence check

With the previous information – and the notion of a vector representation – I thought about the possibility to check for independence.

If two vectors are independent, then they should be orthogonal. If two code bases are independent, then they should be orthogonal in their domain vectors. To test this, we can try to plot the words used in the code bases. Naturally, we would need to strip away the language keywords, but as we will see, this is not quite as necessary as expected. We can even gain other insights by looking at the keyword uses.

So, as above, I created word frequence files for two JavaScript projects.

p1 <- read.table(“p1-wordfreq.txt”, header=F)
p2 <- read.table(“p2-wordfreq.txt”, header=F)

We don’t really want the exact count, so we pick the relative frequencies

p1$V1 <- p1$V1/max(p1$V1)
p2$V1 <- p2$V1/max(p2$V1)

Now, we only want to look at the tokens they have in common to see whether or not they are orthogonal – the tokens not common are already orthogonal.

common <- merge(p1, p2, by = “V2”)

plot(common$V1.x, common$V1.y, xlab=”p1″, ylab=”p2″, main=”Comparing p1 and p2″)

comparing JavaScript projects p1 and p2

Next, we want to identify the JavaScript keywords.

js <- read.table(“JavaScriptKeywords.txt”, header=F)
names(js) <- “V2″ # js is a single column, we want to merge on the keywords in the same column names
js2 <- merge(js, common, by=”V2″)
points(js2$V1.x, js2$V1.y, pch=19, col=”red”)

# mark the 20% in both directions, thus we get a Pareto segmentation
abline(h=.2, col=”blue”)
abline(v=.2, col=”blue”)

high <- common[common$V1.x > .2 & common$V1.y > .2,]

The most frequently used non-keywords:

high[-(match(intersect(high$V2, js2$V2), high$V2)),]
V2 V1.x V1.y
34 data 0.4170306 0.2444444
49 err 0.5545852 0.4555556
50 error 0.3013100 0.8000000
115 censored 0.6812227 0.6888889
131 settings 0.2052402 0.2111111

The second to last in this list has been censored, it does provide an indication that the projects aren’t quite independent. The error, err, and data are so common and nondescript that it is somewhat okay to find them in this area, though I’d rather have less callback functions and better names in general.

The most frequently used keywords:

high[(match(intersect(high$V2, js2$V2), high$V2)),]
V2 V1.x V1.y
47 else 0.3449782 0.4444444
65 function 1.0000000 0.8000000
72 if 0.4716157 0.5000000
154 var 1.0000000 0.6444444

Again this can be explained by a lot of callbacks, which are often on the form:

function(err, data) {
if(err){
} else {
}
}

Another explanation could be lots of anonymous functions, though usually callback.

Conclusion

Removing comments and imports should provide for a better picture of the code base. Even so, it seems to not exactly scream the domain or architecture.

Bi-grams could be another improvement.

Independence check of supposedly independent projects may reveal that they aren’t or that the code is skewed towards an unwanted design.

It is far from perfect, but as always it brings a different way of looking at the code base, and it is relatively quick to do.

Comparing large code bases somewhat defeats the purpose as regression to the mean tells nothing much of interest. Taking Gerrit as an example, then the most used token is “import”, which is used 27560 times and as we saw above, the interesting parts reveal themselves around 1100 uses, which is less than 4%.

comparing gerrit to dotCMScomparing gerrit to dotCMS (loglog)

Comparing Gerrit and an old repo I had of dotCMS, we find that the most used keywords including entities in java.lang are:

import
String
public
return
if
new
this
null
private
void
static

Which could indicate a lot of String constants and conditional logic (with return statements instead of else clauses), and with a possibility of Primitive Obsession – well, the web does call for a lot of String use.

How many bugs are left?

januar 7th, 2016

After reading How many bugs are left? I was intrigued by the use of the Lincoln Index to estimate the number of bugs residing in a solution. But after reading the blog post I was bit baffled that the conclusion didn’t pick up on what was really reflected in the data.

In the blog post there are 2 examples concerning 2 QAs, A and B, finding 20 and 30 bugs respectively in the each case. The real difference is the overlap.

In the first example there is only 1 bug in the overlap, and the Lincoln Index is then 20*30/1 = 600 – in total 49 bugs found

In the second example there are 18 bugs in the overlap, making the Lincoln Index 20*30/18 = 33.3 – in total 32 bugs found

The probability that a QA finds a bug is then:

QA A QA B Total
Example 1 20/600  = .03 30/600  = .05 49/600  = .08
Example 2 20/33.3 = .6 30/33.3 = .9 32/33.3 = .96

While this is an example of the method it tells me something not mentioned in the blog post: The bugs in Example 2 must have been extremely obvious making it questionable whether the trials are independent.

Another thing, while it may seem like overkill to have 2 QAs in the 2nd example, it seems too little to be worth the effort in the first example, but we really should have 3 QAs in both cases.

There is nothing indicating the size of the example solutions – which is part why the example is good, and part why I was a bit skeptical at first. There is no right answer for the examples, but if the Lincoln Index are to be considered sufficient estimates on the number of bugs in the systems, then what should we do?

Starting with Example 2 we have found almost all the bugs, and hopefully the fixes will not introduce new ones. There is a good probability that the remaining bugs will be fixed when the code base is fixed – after all 33.3 bugs in a code base is not a lot (depending on the size of the code base itself naturally).

Examining Example 1 we have a different problem. We have discovered approximately 1/12th of the bugs, and we have an estimated 600 bugs in the system. It would seem that we are in dire need for some sort of assistance. Possibly rework of the system as well.

Code base size estimates

Yes – I know – “Measuring software productivity by lines of code is like measuring progress on an airplane by how much it weighs” (Bill Gates), but the bugs has to come from somewhere, and being somewhat consistent in our styles the number of lines do pose as a quantifiable metric.

According to Dan Mayer (bugs per line of code ratio) referencing Steve McConnell, then we have different ratios of bugs per 1000 lines of code (bugs/kloc): 3, 10-20, 15-50 bugs/kloc

Apart from the obvious 600/33.3 = 18 factor in number of bugs between the examples, which may be as simple as 18 times as much code, there are alternative explanations for the number.

Example 1
600 bugs at  3 bugs/kloc =  200,000 lines

600 bugs at 50 bugs/kloc =   12,000 lines
Example 2
33.3 bugs at  3 bugs/kloc =  11,111 lines

33.3 bugs at 50 bugs/kloc =     666 lines

That is, if Example 1 is 200 kloc with 3 bugs/kloc, and Example 2 is 666 lines with 50 bugs/kloc, then Example 1 is 300 times the lines, but only 18 times the bugs – in which case it is a rather small amount of bugs even at 600. Example 2 though should really clean up the mess.

If it is the opposite, that is Example 1 12,000 lines at 50 bug/kloc, and Example 2 11,111 lines at 3 bug/kloc, then the number of lines are almost the same, yet the number of bugs is 18 times higher. In this case Example 1 is truly in dire needs of some help.

Alternative Analysis

These speculations are really afterthoughts on the blog’s content. My real beef was with the Lincoln Index itself – it degenerates at a 0 overlap, basically saying that if two observers examine the same area, they must find some of the same elements. That is a natural assumption if observers are stringent and actually look at the same things. Seeing some of the Escape Room issues where contestants overlook the obvious it would seem that for a software solution there would be several opportunities for QAs to overlook something the developers already overlooked.

While there are some suggestions on improving the Lincoln Index in case the overlap is less than 10, e.g. Bailey (1952) suggesting N = A*(B+1)/(C+1), which would lead Example 1 to 310 bugs instead of the 600. My idea was to turn to the German Tank Problem and estimate the number of bugs from the Bayesian credibility score.

By applying our own serial number system to the bugs (tracking ID) we aren’t really playing into the correct scenario, but bear with me. The maximum serial number we see is thus the total number of unique bugs found. We only have 2 observations – one from each QA.

Only having 2 observations mean the mean, µ, is infinite. We should have at least 4 observations to come up with a mean and standard deviation.

We can still try to make a credible guess. Given at least 2 observations, the credibility that the number of bugs is equal to n, is:

0 if n < m
k-1/k * C(m-1,k-1)/C(n,k-1) if n >= m

m = number of distinct bugs found in the k observations

As k is 2 in our case, the formula simplifies into:

0 if n < m
(m-1)/(n*(n-1)) if n >= m

The credibility that we have more than n bugs is:

1 if n < m
C(m-1, k-1)/C(n, k-1) if n >= m

Again with k = 2 this simplifies into:

1 if n < m
(m-1)/n if n >= m

This latter formula means that if we want to be 95% confident in the number of bugs, n, then 5% risk that N > n: .05 = (m-1)/n <=> n = (m-1)/0.05 = 20*(m-1)

Running the examples under the German Tank Problem setting we get:

Example 1: A = 20, B = 30, C = 1, m = A+B-C = 49

Number of bugs at 95% confidence: 20*(49-1) = 960

pA    = 20/960 = 0.02

pB    = 30/960 = 0.03

total = 49/960 = 0.05

Example 2: A = 20, B = 30, C = 18, m = A+B-C = 32

Number of bugs at 95% confidence: 20*(32-1) = 620

pA    = 20/620 = 0.03

pB    = 30/620 = 0.05

total = 32/620 = 0.05

We see that we have a lot more bugs than our previous estimates, but the QAs probability of finding bugs are almost the same (below 5%) for both examples, and we have an estimated 5% of the total amount of bugs.

credibility-of-total-number-of-bugs

Looking at the accumulated credibility score, we can see that it grows rapidly, then slows down, perhaps an 80% confidence is sufficient. In this case .2 = (m-1)/n <=> n = 5*(m-1), this is a quarter of the 95% confidence numbers.

Example 1: A = 20, B = 30, C = 1, m = A+B-C = 49

Number of bugs at 80% confidence: 5*(49-1) = 240

pA    = 20/240 = 0.08

pB    = 30/240 = 0.13

total = 49/240 = 0.20

Example 2: A = 20, B = 30, C = 18, m = A+B-C = 32

Number of bugs at 80% confidence: 5*(32-1) = 155

pA    = 20/155 = 0.13

pB    = 30/155 = 0.19

total = 32/155 = 0.20

This is certainly better for Example 1 both with regards to the 95% confidence, but also with regards to the Lincoln Index – even the improved estimate.

Conclusion

I didn’t know about the Lincoln Index, so I learned something new today – that is always good. The original application to estimate the number of bugs in total seems good, at least better than disregarding data from the trenches.

John D. Cook suggests calibrating through experiments. This blog post has been a thought experiment on some of the deliveries presented by the data and an unrealistic application of the German Tank Problem – the odds of getting the “tanks” in sequence diminishes quickly, thus improvements can be applied to the m estimate.

Cutting the confidence level from 95% to 80% may seem drastic – and it is – as it cuts 75% off of the number of expected bugs, but for thought experiments it may be good enough.

QAs are valuable, and there is value in having several (at least 2, but 4 is better) to test a product.

Resources:

http://leankit.com/blog/2015/12/how-many-bugs-are-left-the-software-qa-puzzle/
https://en.wikipedia.org/wiki/German_tank_problem
https://en.wikipedia.org/wiki/Lincoln_index
http://c2.com/cgi/wiki?LinesOfCode
http://www.mayerdan.com/ruby/2012/11/11/bugs-per-line-of-code-ratio/

Learning Java, Programming, TDD, Clean Code – which order?

november 26th, 2015

Recently, Marcus Biel asked me to review and comment on his “Free Java Clean Code Beginner Course”.

I’m quite flattered that anyone would ask my opinion, so naturally I gave him some feedback. I think the concept of Marcus’ project is valuable, especially considering the large community (9-10 million) of Java programmers, and the number of would be programmers, and the current state of the quality we – as a profession – provide. Just take a look at some of the questions asked on LinkedIn’s Java Developers group.

One of the key hurdles, I think, is that Marcus wants it all: Teach Java, Programming, OOP, TDD, and Clean Code. While these are all good things to know, I find it quite a lot all at once. That said, what should be left out in the beginning? How should you structure learning programming? The easiest way is to use imperative style – but that is hardly the “right” for Java. Starting out with too much OOP will also lead to highly coupled classes.

If you simply teach Java and programming, you’re bound to fail at only good OOP and Clean Code practices because Java sometimes enforces you to do things in a bad way.

TDD is having its own troubles – as debated by DHH, Fowler and Beck in “Is TDD Dead?”

Rich Hickey compares TDD to “driving a car around banging into the guard rails”, and Donald Knuth says something along the lines of tests are good for probing and figuring out an otherwise unknown domain. This blog has links to both.

Ward Cunningham created Fit, which Uncle Bob built Fitnesseon top of, so I believe that they are quite happy with repeatable testing. Uncle Bob at least writes about it in Clean Code

Edsger Dijkstra said: “Program testing can be used to show the presence of bugs, but never to show their absence!” – but then he was likely into proving correctness using Hoare triplets – the pre and post condition proofs.

In “Working Effectively with Legacy Code”, Michael Feathers says that legacy code is code without tests, and that tests makes it safe to refactor code.

I really like Hickey’s notion. The tests only shows the code in the exercises the tester had in mind. If the tester is the developer, then it is likely a proof of concept rather than an attempt to disprove working software,

I also really like Feathers’ concept – it’s really nice to have exercises for a section of code making sure that a change in the section will not misbehave, when swapped out with an equivalent. At least it is nice to have tests for the modules you depend upon, to be able to check than an upgrade does not cause any bad things. Basically, we use what Dijkstra said – making sure that we are not introducing previously known bugs again.

Knowing programmers, we’re likely to not be modest nor follow the scientific method: Observe, Think, Hypothesize, Create testable predictions, Test, Refine, General theory, nor Deming’s circle: Observe, Plan, Do, Check, Act. It is often more: Hack, Observe, Repeat – using a Waterfall approach it is sometimes more like: hack, hack, hack, observe, wtf!, repeat.

Dijkstra, Hickey, and Knuth seem to have their own disciplined framework in place, and TDD is a formal way trying to introduce discipline to the masses, though often being misunderstood, and due to our bias for confirming our beliefs (“Don’t believe everything you think” by Thomas Kida) we make poor tests more often than good tests. Sometimes we even make tests just to get a high test coverage, because someone, somewhere heard that this was a good metric.

Can you learn Clean Code without knowing programming? I don’t think so, and quite likely, then Clean Code should be left after Patterns – which isn’t currently part of Marcus’ course.

Should you learn Clean Code before creating your own ugly mess? Would you know the difference if taught from day one?

How to refactor a refactored switch/case statement

november 26th, 2015

When good intentions go slightly wrong

For some odd reason I picked up a link to DZONE on “How to refactor a switch/case statement” – the link https://dzone.com/articles/how-to-refactor-a-switchcase-statement is now defunct, I’m not sure why. Anyway, Gianluca Tomasino, the original author still has the article on his blog.

So I read through this – I know I dislike switch/case jump tables, though not as much as I hate if-else-if – or as I like to reminisce Sid Meier’s Pirates! and call it the “evil El Sif”

Gianluca is quite right, that one option would be to use the Strategy pattern, but then goes on to show how not to implement this pattern by adding a method for each of the enums, then tie a specific implementation inside the enum ending up with a less readable and less maintainable code.

The enum part is right – eliminate the magic strings, define the different types.

The strategy interface definition is wrong – the name “HasStrategies” does not convey any useful information. The 2 methods bind concrete enums to an interface, 1 abstract method, e.g. ‘execute’ should be sufficient. Then the specific strategy is pushed inside the enums themselves. Enums should not care for whichever strategies you have for them, thus that sort of coupling is not wanted.

In the Decider class, we now define the specific strategy to use, which sort of defies the purpose of extracting the code from a switch – the specific class will now have 2 reasons for change:

  1. Change to the strategy
  2. Change to the enum definitions

“A class should have one, and only one reason to change.” That is the intent of the Single Responsibility Principle

If we add another value to the enums, then we need to change the Decider implementation as well, that is contrary to the Open Close Principle. From the looks of it, we have to change the enums (well, that’s a given), the strategy, and the decider implementation.

What I’d recommend:

Define the strategy interface using only one method

interface Strategy {
    String execute();
}

Simply define the values

enum Values {
    PIPPO, PLUTO;
}

Implement the strategies for each of the values, and add them to an EnumMap

class ValueStrategies {
    final static EnumMap<Values, Strategy> MAP = 
             new EnumMap<Values, Strategy>(Values.class);
    static {
        MAP.put(Values.PIPPO, new Strategy() {
            @Override
            public String execute() {
                return "methodA";
            }
        });
        MAP.put(Values.PLUTO, new Strategy() {
            @Override
            public String execute() {
                return "methodB";
            }
        });
    }
    static Strategy get(Values value) {
        return MAP.get(value);
    }
}

Implement the decider using these elements:

public class AltDecider implements Decider {

    @Override
    public String call(String which) {
        Values value = Values.valueOf(which.toUpperCase());
        return ValueStrategies.get(value).execute();
    }

}

Well, the mapping from a primitive to the enum should not take place inside the method, the Decider interface should be modified to fix such hacks, if the String, which, is null or does not represent a Value, then a NullPointerException and IllegalArgumentException respectively will be thrown from the Value conversion.

The names are still not meaningful.

With this solution a new enum value will require a change to Values and the implementation for its strategy inside the ValueStrategies.

If re-use of the strategy implementations were of concern, then naturally they should be implemented in their own classes and not as anonymous values inside the map.