Article · June 9, 2026 · Digital infrastructure

BTX 0.32.3.
The v2 MatMul path goes production-fast.

Two days after BTX activated v2 MatMul seeds at block 125,000, the project shipped 0.32.3 — a point release with no new activation height, no new consensus rules, and one very specific job: make the GPU backends fast on the v2 path. The 0.32.2 implementation was correct but unbatched. 0.32.3 replaces it with a production-grade batched CUDA implementation, lands a matching architecture on Metal, restores the pool-side tuning hook, and ships a handful of follow-up fixes the activation week surfaced.

The headline number is the CUDA throughput change at block 125,000: roughly 14.1k nonces per second on the v2 path before this release, and roughly 2.45M nonces per second after — call it 170×. That is not a magic optimisation; it is the gap between a hand-rolled first-cut implementation and one that does the pre-hash scan on the GPU, batches the digests, and only ships consensus-passing candidates back across the PCIe boundary.
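The headline ratio is easy to check. The short Python below takes the two release-note figures as inputs and also converts each into time per nonce, which is a more intuitive way to see what batching bought:

```python
# Approximate figures quoted in the 0.32.3 release notes.
BEFORE_NPS = 14_100      # 0.32.2, unbatched v2 path, nonces per second
AFTER_NPS = 2_450_000    # 0.32.3, batched v2 path, nonces per second

speedup = AFTER_NPS / BEFORE_NPS

# Time spent per nonce, in microseconds, at each throughput.
us_per_nonce_before = 1e6 / BEFORE_NPS
us_per_nonce_after = 1e6 / AFTER_NPS

print(f"speed-up: {speedup:.1f}x")  # speed-up: 173.8x
print(f"per nonce: {us_per_nonce_before:.2f} us -> {us_per_nonce_after:.3f} us")
# per nonce: 70.92 us -> 0.408 us
```

Put per nonce, the v2 path went from roughly 71 microseconds of wall-clock time per attempt to well under one.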

v0.32.3 · no new activation · CUDA + Metal batched · mandatory upgrade

Whyte Consolidated Research · 2026-06-09 · 9 min read

1 · What 0.32.3 is — and is not

A point release, with one big job and several small ones.

0.32.3 is not a consensus change. There is no new activation height; the rules at block 125,000 from the v2 MatMul activation stand unchanged. Validation behaviour is unchanged. Wallet behaviour is unchanged. What 0.32.3 ships is a production-grade GPU implementation of the v2 path, a matching Metal architecture, a handful of operator-side quality-of-life improvements, and packaging fixes that the static-release path was missing.

That said, it is described as a mandatory upgrade for anyone running an earlier 0.32.x build, and the reason matters. The 0.32.2 GPU path was correct under the new consensus rules (it would find valid blocks and validate other miners' blocks), but it was effectively unoptimised for v2. A miner still running 0.32.2 while its peers run 0.32.3 is leaving an enormous amount of throughput on the table, and those upgraded peers will find the blocks first.

The release fits the rhythm of the previous two weeks. 0.31 activated the C-002 shielded proof format and FIPS-205 enforcement at block 123,000. 0.32 activated v2 MatMul at block 125,000. 0.32.3 lands two days later, no activation required, doing exactly the kind of tightening a production network needs in the days after a consensus change.

Diagram · The CUDA jump
Figure: CUDA nonces per second on the v2 path after activation, log scale. 0.32.2, pre-tuning (unbatched v2 path): ~14,100 nonces/sec. 0.32.3, production batched (GPU pre-hash + variable-base digest batching): ~2,450,000 nonces/sec. Speed-up: ~170× (2.45M ÷ 14.1k ≈ 174).
The gap between “works correctly on the new consensus rules” and “runs at production speed on the new consensus rules.” The first is what 0.32.2 shipped at activation. The second is what 0.32.3 ships two days later.
2 · How the CUDA path goes that fast

Three things, all on the GPU side of the boundary.

The reason 0.32.2's CUDA path was slow on v2 is not exotic. Under the v2 rules every nonce produces its own seed and its own matrices, so a naïve implementation generates one set on the host, ships it across the PCIe boundary, runs the multiplication on the GPU, ships the digest back, and asks the host to check it. Throughput is bounded by the slowest stage of that loop, and on a fast modern GPU the slowest stage is not the multiplication but the host-to-device round trip itself.
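A toy cost model makes the round-trip argument concrete. It is a sketch, not a profile: `round_trip_us` (host-to-GPU-and-back overhead) and `compute_us` (effective GPU time per nonce) are hypothetical parameters, not measured BTX figures. With one nonce per round trip the trip dominates; with one trip per batch it is amortised across every nonce in the batch.

```python
def unbatched_nps(round_trip_us: float, compute_us: float) -> float:
    """Nonces/sec when every nonce pays a full host -> GPU -> host round trip."""
    return 1e6 / (round_trip_us + compute_us)


def batched_nps(batch: int, round_trip_us: float, compute_us: float) -> float:
    """Nonces/sec when one round trip is shared by `batch` nonces."""
    return 1e6 * batch / (round_trip_us + batch * compute_us)


# Hypothetical parameters, chosen only to illustrate the shape of the curve.
RT_US, COMPUTE_US = 60.0, 0.3
print(f"unbatched: {unbatched_nps(RT_US, COMPUTE_US):>12,.0f} nonces/sec")
for batch in (64, 4096, 65536):
    print(f"batch={batch:>6}: {batched_nps(batch, RT_US, COMPUTE_US):>12,.0f} nonces/sec")
print(f"ceiling as batch grows: {1e6 / COMPUTE_US:,.0f} nonces/sec")
```

As the batch grows, throughput approaches `1e6 / compute_us`, the ceiling set by the GPU's own arithmetic rather than by the bus.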

0.32.3 collapses the round trip. The pre-hash scan that selects candidate nonces runs on the GPU directly, against a batch of seeds derived in place. The matrix multiplications and digests for the entire batch happen on the GPU in one wave. Only the candidates whose digests already pass the consensus check are shipped back to the host, which is a small fraction of the batch. The host stops being a serialised bottleneck and starts being what it should always have been on a GPU mining loop: a coordinator that hands off work and counts blocks.
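The shape of that batched loop can be sketched in Python under loud assumptions. SHA-256 stands in both for the v2 per-nonce seed derivation and for the "generate matrices, multiply, hash the product" digest, because neither BTX algorithm is reproduced here, and `derive_seed`, `matmul_digest`, and `scan_batch` are hypothetical names. What the sketch does show is the data flow: a whole batch is evaluated on the device side, and only nonces whose digests pass the target come back.

```python
import hashlib
from typing import List


def derive_seed(header: bytes, nonce: int) -> bytes:
    # Stand-in for v2 per-nonce seed derivation; NOT the BTX algorithm.
    return hashlib.sha256(header + nonce.to_bytes(8, "little")).digest()


def matmul_digest(seed: bytes) -> int:
    # Stand-in for: expand seed into matrices, multiply, hash the product.
    return int.from_bytes(hashlib.sha256(seed).digest(), "big")


def scan_batch(header: bytes, start: int, batch_size: int, target: int) -> List[int]:
    """Evaluate nonces [start, start + batch_size) and return only passing ones.

    On a GPU each iteration would be an independent thread; only the short
    returned list would cross the PCIe boundary back to the host.
    """
    return [
        n
        for n in range(start, start + batch_size)
        if matmul_digest(derive_seed(header, n)) < target
    ]


# Host side: hand out batches, collect the (rare) candidates.
header = b"example-header"
target = 1 << 248            # easy illustrative target: about 1 in 256 passes
candidates: List[int] = []
for start in range(0, 4096, 1024):
    candidates.extend(scan_batch(header, start, 1024, target))
print(f"{len(candidates)} candidates out of 4096 nonces")
```

The design choice that matters is where the filter sits: because the target comparison happens inside the batch, host-bound traffic scales with the number of passing candidates rather than with the number of nonces tried.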

The performance the release notes attach to that change, roughly 14.1k to 2.45M nonces per second on the v2 path, is a jump of a little over two orders of magnitude. That is the scale of gain to expect when the original implementation was correct but unbatched and the bottleneck was the round trip rather than the arithmetic. It is the difference between “does it work” and “does it run.” 0.32.2 answered the first. 0.32.3 answers the second.

3 · Metal lands the matching architecture

Apple Silicon gets the same pre-hash + batched digest design.

The Metal backend on Apple Silicon has been the small-operator story since launch: a single Mac mini contributing real work to a pool, profiled at length in “Mining pools, AI agents, and the Mac mini that earns.” 0.32.3 ports to Metal the same pre-hash + variable-base digest batching architecture that CUDA gets, plus miner-loop routing through the GPU batch path, so the host side of the loop on macOS is structurally identical to the host side on Linux + CUDA.

Device detection improves at the same time. The Metal backend now reports the GPU core count of the host, which is useful for tuning the per-batch workload and for telling an M-series base chip apart from an M-series Max or Ultra at provision time. The release notes do not attach a throughput number to the Metal change the way they do for CUDA. But the architecture is the same, and so is the bottleneck it removes, so a meaningful jump on the Metal side is a reasonable expectation, even if it is not a published figure.

The SOLVE_BATCH_SIZE tuning knob carries forward from 0.32.2 and is the right surface to re-explore on Apple Silicon under 0.32.3. The value that worked best on the 0.32.2 unbatched path is unlikely to be the one that maximises throughput on the new batched architecture. The v2 MatMul piece flagged this at activation; 0.32.3 makes it worth acting on.

4 · The operator tunables

Two CUDA env vars. One Metal env var. Sensible defaults.

The new CUDA path is configurable through two environment variables. The first, BTX_MATMUL_NONCE_SEED_BATCH_SIZE, controls how many candidate nonces the GPU batches together before checking digests. Larger batches amortise launch overhead and memory transfers; very large batches eventually saturate the GPU's available memory or trip warp scheduling limits. The release ships with batch sizing tuned for production hosts; the env override is the dial for operators whose hardware does not match the production-host profile.

The second, BTX_MATMUL_CUDA_NONCE_SEED_MEMORY_PERCENT, caps the share of GPU memory the nonce-seed path is allowed to consume. The default keeps the path neighbourly on a GPU also being used for inference, training, or any other workload. An operator running a dedicated mining GPU can raise the cap and trade safety margin for additional batch capacity.
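How the two overrides might combine can be sketched as follows. Only the variable names come from the release notes; the default values, the per-nonce memory cost (`bytes_per_nonce`), and the rule that the memory share caps the requested batch size are illustrative assumptions, not taken from the BTX source.

```python
import os
from typing import Mapping

# Hypothetical defaults; the shipped 0.32.3 defaults are not reproduced here.
DEFAULT_BATCH_SIZE = 4096
DEFAULT_MEMORY_PERCENT = 50


def _int_from(env: Mapping[str, str], name: str, default: int) -> int:
    """Parse an integer variable, falling back to `default` if unset or malformed."""
    try:
        return int(env.get(name, ""))
    except ValueError:
        return default


def effective_batch_size(gpu_total_bytes: int, bytes_per_nonce: int,
                         env: Mapping[str, str] = os.environ) -> int:
    """Requested batch size, capped by the memory share the seed path may use."""
    requested = _int_from(env, "BTX_MATMUL_NONCE_SEED_BATCH_SIZE", DEFAULT_BATCH_SIZE)
    percent = _int_from(env, "BTX_MATMUL_CUDA_NONCE_SEED_MEMORY_PERCENT",
                        DEFAULT_MEMORY_PERCENT)
    percent = min(max(percent, 1), 100)           # clamp to 1..100
    budget = gpu_total_bytes * percent // 100     # bytes available to the path
    cap = max(budget // bytes_per_nonce, 1)       # largest batch that fits
    return max(1, min(requested, cap))


# A dedicated 8 GiB card, hypothetical 1 MiB per in-flight nonce:
dedicated = {"BTX_MATMUL_NONCE_SEED_BATCH_SIZE": "1000000",
             "BTX_MATMUL_CUDA_NONCE_SEED_MEMORY_PERCENT": "100"}
print(effective_batch_size(8 << 30, 1 << 20, dedicated))  # 8192
```

Capping an oversized request, rather than rejecting it, keeps a misconfigured host mining at a smaller batch instead of refusing to start.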

On Metal, SOLVE_BATCH_SIZE continues to be the primary knob and is now considerably more interesting than it was under the 0.32.2 implementation. The release notes suggest giving both backends a brief retune pass after upgrading, with the chain-guard RPC introduced in 0.31 as the safety check while doing so: at every point, the node reports whether the miner should continue, pause, or catch_up.
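A retune pass of the kind the release notes suggest could look like the sketch below. It is hypothetical throughout: `measure` stands for whatever benchmark an operator uses to get nonces per second at a given batch size, and `chain_guard` stands for a call to the 0.31 chain-guard RPC returning one of the three states named above. Neither the RPC's name nor its response shape is reproduced here.

```python
from typing import Callable, Iterable, Optional


def retune(candidates: Iterable[int],
           measure: Callable[[int], float],
           chain_guard: Callable[[], str]) -> Optional[int]:
    """Return the candidate batch size with the best measured throughput.

    The chain guard is consulted before each measurement; on anything other
    than "continue" (i.e. "pause" or "catch_up") the sweep stops and the best
    result found so far is kept.
    """
    best_size: Optional[int] = None
    best_nps = float("-inf")
    for size in candidates:
        if chain_guard() != "continue":
            break
        nps = measure(size)
        if nps > best_nps:
            best_size, best_nps = size, nps
    return best_size


def fake_measure(batch: int) -> float:
    # Synthetic benchmark that peaks at 8192; replace with a real measurement.
    return 1e6 - (batch - 8192) ** 2 / 100


print(retune([1024, 2048, 4096, 8192, 16384], fake_measure, lambda: "continue"))  # 8192
```

Stopping on `pause` or `catch_up` keeps benchmarking from competing with the node for resources at the moments the chain guard says mining should yield.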

5 · The three backends, side by side

CUDA, Metal, CPU — what each one gets in 0.32.3.

01

CUDA

Post-activation nonce-seed path with batched pre-hash scans on GPU. Returns only consensus-passing candidates to the host. Operator-tunable batch size and memory share via env vars. Older-GPU opt-in for compute-capability-gated hardware.

02

Metal

Matching architecture — pre-hash scan, variable-base digest batching, miner-loop routing through the GPU batch path. Device detection now reports GPU core count, useful for tuning. SOLVE_BATCH_SIZE from 0.32.2 carries forward.

03

CPU

Continues to use the batched solver introduced in 0.32.1 — each thread takes a slice of candidate nonces, each with its own seed. No change in correctness; benefits from the surrounding diagnostics work.
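The CPU solver's work split, with each thread taking a slice of candidate nonces, can be illustrated with a contiguous partition. This is a sketch only: the near-equal contiguous split and the `check` callback are assumptions rather than a description of the 0.32.1 solver, and in CPython the threads illustrate the partitioning rather than deliver a real parallel speed-up, because of the global interpreter lock.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List


def partition(start: int, count: int, threads: int) -> List[range]:
    """Split [start, start + count) into `threads` contiguous, near-equal slices."""
    base, extra = divmod(count, threads)
    slices, lo = [], start
    for i in range(threads):
        size = base + (1 if i < extra else 0)   # first `extra` slices get one more
        slices.append(range(lo, lo + size))
        lo += size
    return slices


def solve(start: int, count: int, threads: int,
          check: Callable[[int], bool]) -> List[int]:
    """Run `check` over every nonce, one slice per worker; return passing nonces."""
    def work(r: range) -> List[int]:
        # Under v2 each nonce has its own seed, so slices share no state.
        return [n for n in r if check(n)]

    with ThreadPoolExecutor(max_workers=threads) as pool:
        chunks = list(pool.map(work, partition(start, count, threads)))
    return sorted(n for chunk in chunks for n in chunk)


print(partition(0, 10, 3))  # [range(0, 4), range(4, 7), range(7, 10)]
```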

6 · Everything else in the release

Six small, load-bearing items.

None of these are headline material on their own. Each one removes friction the activation week made visible.

share_target_override restored

Pool operators get back the early-exit targeting hook for MatMul solvers. Affects digest early-exit only — not what the network accepts.

CUDA fallback diagnostics

Failure reasons now visible in logs with concrete error categories. Troubleshooting guide added at doc/btx-cuda-mining-troubleshooting.md.

Older-GPU opt-in

BTX_CUDA_ALLOW_OLDER_GPUS=1 enables compute-capability-gated GPUs. Supported but slower; the optimised path is tuned for current generations.

getshieldedstateinfo RPC

New RPC for shielded-state repair observability — nullifier accumulator state, persisted-state freshness, restart-time behaviour. Maintenance tool, not a wallet primitive.

RPC + wallet fixes

getblock verbosity-2 fee reporting fix. PSBT handling fix. Small but load-bearing for downstream tooling that depends on them.

ZMQ in static binaries

Static release binaries now bundle libzmq.a. Removes the most common packaging gap for operators wiring ZMQ-driven monitoring against a stock binary.
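The distinction drawn in the share_target_override item, early exit versus network acceptance, comes down to two thresholds. A minimal sketch, assuming the common convention that a digest passes when it is numerically below a target and that a pool's share target is easier (larger) than the network target; `classify` is a hypothetical helper, not part of BTX's solver API.

```python
def classify(digest: int, share_target: int, network_target: int) -> str:
    """Label a solver digest against a pool share target and the network target.

    An override only moves `share_target`, the early-exit threshold at which a
    solver reports work to the pool. Block validity depends on
    `network_target` alone, which no miner-side setting can change.
    """
    if share_target < network_target:
        raise ValueError("share target should be easier than the network target")
    if digest < network_target:
        return "block"     # valid block, and also a valid share
    if digest < share_target:
        return "share"     # counts for the pool, rejected by the network
    return "discard"


print([classify(d, share_target=1000, network_target=10) for d in (5, 500, 5000)])
# ['block', 'share', 'discard']
```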

7 · The cadence is the story

Three releases in a week. Two activations. One follow-up.

Look at the trailing seven days. 0.31 activated at block 123,000 with a hardened shielded proof format, FIPS-205 SLH-DSA enforcement, and the mining chain-guard RPC. 0.32 activated at block 125,000 with v2 MatMul seeds that made every nonce attempt require fresh matrices. 0.32.3 lands two days later, no activation, doing exactly the operational tightening the v2 path needed at production speed. None of the three required ordinary wallet users to do anything. All three landed on slack the launch wire had reserved on purpose.

The thing to take away from a cadence like that is not the individual changes. It is the demonstration that the upgrade path works. A network that can ship a consensus change, an activation, and a production-speed catch-up release inside a week without breaking anyone — and without forcing wallets, exchanges, or custodians to scramble — has the right plumbing for the larger changes that are still to come: wBTX as a wrapped-asset spec, the bridge layer, hybrid post-quantum transport, signed auto-update. The slack-and-activate pattern keeps working at the shape it was designed for.

0.32.3 is a tuning release. It is also, structurally, evidence that the activation rhythm is real.

8 · The release at a glance

Four numbers that frame 0.32.3.

~170× · CUDA throughput jump · ~14.1k → ~2.45M nonces/sec at block 125,000
v0.32.3 · Current release · Mandatory upgrade for 0.32.x operators
0 · New activation heights · Point release, no consensus change
2 env vars · Operator tunables · Batch size + GPU memory share

Bottom line

The activation went live. Two days later the production path caught up.

0.32.3 does not change the rules. It catches the GPU backends up to where the rules need them to be. CUDA jumps from a working-but-unbatched 14.1k nonces per second on the v2 path to a production-grade 2.45M; Metal lands the matching pre-hash + batched digest architecture; the pool-side targeting hook returns; CUDA fallback failures are now logged with concrete error categories; the older-GPU opt-in arrives; and the static binaries finally include ZMQ.

For operators on an earlier 0.32.x build, the upgrade is mandatory in the sense that anyone who skips it is voluntarily mining at a meaningful disadvantage to the rest of the network. For wallet users the release is invisible. For the network it is the third meaningful piece of work to land in a single week, and the second of those three that did not require ordinary users to know it happened. That is the cadence a production settlement chain wants.

Frequently asked

BTX 0.32.3, in brief.

Is 0.32.3 a consensus change?
No. It is a point release with no new activation height. Validation and consensus rules are unchanged from 0.32.2. What changed is the speed and observability of the mining backends, packaging, and a handful of RPC follow-ups — operator work, not protocol work.
Where does the 170× CUDA jump come from?
From doing the pre-hash scan on the GPU instead of off it, plus variable-base digest batching, plus routing the entire miner loop through a single GPU batch path. The 0.32.2 v2 implementation was correct but unbatched on CUDA. 0.32.3 replaces it with a production-grade batched implementation that returns only consensus-passing candidates back to the host.
Do I need to retune anything?
On CUDA, yes — at least to check. The batch size and the share of GPU memory the nonce-seed path is allowed to use are both env-overridable (BTX_MATMUL_NONCE_SEED_BATCH_SIZE and BTX_MATMUL_CUDA_NONCE_SEED_MEMORY_PERCENT). On Metal, the same SOLVE_BATCH_SIZE knob from 0.32.2 carries forward but is worth revisiting now that the Metal path matches the CUDA architecture.
What does the new getshieldedstateinfo RPC actually tell you?
It exposes the state of the shielded-state repair machinery — the nullifier accumulator condition, persisted state freshness, and what the node would do on restart if the persisted state turned out to be invalid. It is a maintenance and forensics tool, not something a wallet user needs to call.
Do older GPUs work now?
Older GPUs that fell outside the default CUDA compute-capability gate can be enabled with BTX_CUDA_ALLOW_OLDER_GPUS=1. Treat it as opt-in: the production batch path is tuned for current-generation hardware, and the older-GPU path is supported but slower and less validated.
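The opt-in described above reduces to a gate with an escape hatch. A minimal sketch: only the variable name and its value of `1` come from the release notes; the minimum compute capability used here, `(7, 0)`, is a hypothetical placeholder since the actual threshold is not stated, and `gpu_allowed` is an illustrative name.

```python
import os
from typing import Mapping, Tuple

# Hypothetical threshold; the real 0.32.3 gate is not given in the release notes.
MIN_COMPUTE_CAPABILITY: Tuple[int, int] = (7, 0)


def gpu_allowed(capability: Tuple[int, int],
                env: Mapping[str, str] = os.environ) -> bool:
    """Allow a CUDA device if it meets the gate, or if the operator opted in."""
    if capability >= MIN_COMPUTE_CAPABILITY:   # tuples compare (major, minor)
        return True
    return env.get("BTX_CUDA_ALLOW_OLDER_GPUS") == "1"


print(gpu_allowed((8, 6), {}))                                  # True
print(gpu_allowed((6, 1), {}))                                  # False
print(gpu_allowed((6, 1), {"BTX_CUDA_ALLOW_OLDER_GPUS": "1"}))  # True
```

Treating only the exact string `1` as the opt-in keeps the slower, less-validated path from being enabled by accident.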
Context & further reading

This article is a plain-language summary of BTX 0.32.3, the post-activation tuning release that follows the block-125,000 v2 MatMul activation. The items below are the primary sources for the chain itself and related pieces from this site.

For informational purposes only. Not financial, investment, or legal advice. Technical claims reflect the BTX v0.32.3 release notes and may be subject to further iteration. Systems, protocols, and tokens referenced are described for context and are not endorsements. Mining outcomes depend on hardware, difficulty, and market conditions, and are not guaranteed.