
feat: wide fixed-width integer load/store accessors on ByteArray #14053

Draft
kim-em wants to merge 1 commit into leanprover:master from kim-em:bytearray-wide-uint-accessors

kim-em (Collaborator) commented Jun 15, 2026

This PR adds little- and big-endian UInt16/UInt32/UInt64 load and store accessors to ByteArray, reading or writing a fixed-width integer at a byte offset in a single native load/store rather than through a boxed Array UInt32 (a tagged load plus an unbox) or hand-written byte assembly.

For each width and endianness there are three readers and three writers, mirroring the existing byte accessors get!/get/uget and set!/set/uset:

  • getUInt32LE! / setUInt32LE!: Nat offset, no proof. All-or-nothing on bounds: a read whose W/8-byte window (W being the width in bits) does not fit returns 0, and such a write leaves the array unchanged, matching the defaulting behaviour of ByteArray.get!/set!.
  • getUInt32LE / setUInt32LE: Nat offset with an in-bounds proof.
  • ugetUInt32LE / usetUInt32LE: USize offset with an in-bounds proof (the fast path).
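
The all-or-nothing rule for the defaulting variants can be illustrated with a small C model. This is only a sketch of the described semantics, not the code from this PR: the function names `get_u32_le_or_zero` and `set_u32_le_or_skip` are hypothetical, and the buffer is a plain pointer plus size rather than a Lean ByteArray.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of a defaulting ("!") 32-bit little-endian read:
   returns 0 unless the whole 4-byte window [off, off + 4) fits in the buffer. */
static inline uint32_t get_u32_le_or_zero(const uint8_t *buf, size_t size, size_t off) {
    if (size < 4 || off > size - 4) return 0;  /* overflow-safe window check */
    return (uint32_t)buf[off]
         | ((uint32_t)buf[off + 1] << 8)
         | ((uint32_t)buf[off + 2] << 16)
         | ((uint32_t)buf[off + 3] << 24);
}

/* Hypothetical model of a defaulting write: if the window does not fit,
   no byte is modified (there is no partial write). */
static inline void set_u32_le_or_skip(uint8_t *buf, size_t size, size_t off, uint32_t v) {
    if (size < 4 || off > size - 4) return;
    buf[off]     = (uint8_t)(v);
    buf[off + 1] = (uint8_t)(v >> 8);
    buf[off + 2] = (uint8_t)(v >> 16);
    buf[off + 3] = (uint8_t)(v >> 24);
}
```

In this model a read at offset 2 of a 5-byte buffer returns 0 even though three of the four bytes exist, and the matching write leaves all five bytes untouched.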

The offset is a byte position, so the same primitive serves both an array-of-UInt32 view (i = 4*k) and the read-a-UInt32-at-an-arbitrary-position case common in codecs. The @[extern] implementations are static inline C in lean.h built from the portable byte-shift idiom, which optimizing compilers typically fold to an efficient (possibly unaligned) load or store; the Lean definitions are the proof-level model and the externs are validated against them by tests.
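
For readers unfamiliar with the portable byte-shift idiom, the following C sketch shows its general shape for a 32-bit width in both byte orders, plus the array-of-UInt32 view at byte offset 4*k. The names `load_u32_le`, `load_u32_be`, `store_u32_be`, and `u32_le_at_index` are hypothetical and are not taken from lean.h; bounds are assumed to have been checked by the caller, as with the proof-carrying variants.

```c
#include <stddef.h>
#include <stdint.h>

/* Little-endian load: byte 0 is the least significant byte. */
static inline uint32_t load_u32_le(const uint8_t *p) {
    return (uint32_t)p[0]
         | ((uint32_t)p[1] << 8)
         | ((uint32_t)p[2] << 16)
         | ((uint32_t)p[3] << 24);
}

/* Big-endian load: byte 0 is the most significant byte. */
static inline uint32_t load_u32_be(const uint8_t *p) {
    return ((uint32_t)p[0] << 24)
         | ((uint32_t)p[1] << 16)
         | ((uint32_t)p[2] << 8)
         |  (uint32_t)p[3];
}

/* Big-endian store, the mirror image of load_u32_be. */
static inline void store_u32_be(uint8_t *p, uint32_t v) {
    p[0] = (uint8_t)(v >> 24);
    p[1] = (uint8_t)(v >> 16);
    p[2] = (uint8_t)(v >> 8);
    p[3] = (uint8_t)(v);
}

/* Dense array-of-u32 view: element k lives at byte offset 4 * k.
   The same load also works at any other byte offset, aligned or not. */
static inline uint32_t u32_le_at_index(const uint8_t *buf, size_t k) {
    return load_u32_le(buf + 4 * k);
}
```

Because the code only ever touches individual bytes, it has no alignment requirement and no dependence on host endianness; whether it compiles down to a single load or store is up to the optimizer.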

In a hot loop the USize-indexed ugetUInt* / usetUInt* forms are the ones to reach for: the Nat-indexed variants (including the ! forms) take a boxed Nat, so the loop's index arithmetic runs boxed and measures noticeably slower. The module docstring says so.

Lemmas accompany the API in Init.Data.ByteArray.Lemmas: the proof-carrying variants are definitionally the ! model, writes preserve size, and reads round-trip writes (get* (set* a off v) off = v under bounds), for every width and endianness. The round-trip proofs reduce the read of the freshly-written bytes to a fixed-width bit-recombination identity discharged by getLsbD extensionality (no bv_decide, so everything stays in Init). Disjointness is stated at the byte level — getElem!_setUIntWE!_of_outside says a wide write changes only the bytes in its own window, so a read of any byte (hence any width or endianness) outside that window is unaffected; same-width/endianness _of_disjoint corollaries are provided for convenience. tests/elab/bytearray_pack.lean additionally checks that the @[extern] C implementations match the Lean model on concrete values, endianness, and the all-or-nothing out-of-bounds behaviour.
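
The round-trip and byte-level disjointness properties can also be spot-checked on a C model, in the spirit of the concrete-value tests mentioned above. The sketch below is an assumption-laden illustration, not the contents of tests/elab/bytearray_pack.lean: `store_u64_be`, `load_u64_be`, and `check_round_trip_and_outside` are hypothetical names, and the check runs over one fixed byte pattern rather than proving anything.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Big-endian 64-bit store: most significant byte first. */
static inline void store_u64_be(uint8_t *p, uint64_t v) {
    for (int i = 0; i < 8; i++) p[i] = (uint8_t)(v >> (56 - 8 * i));
}

/* Big-endian 64-bit load, the inverse of store_u64_be. */
static inline uint64_t load_u64_be(const uint8_t *p) {
    uint64_t v = 0;
    for (int i = 0; i < 8; i++) v = (v << 8) | p[i];
    return v;
}

/* Returns 1 if, for every in-bounds offset of a `size`-byte buffer
   (size <= 64), a write followed by a read at the same offset yields `v`
   and every byte outside the 8-byte window is unchanged; 0 otherwise. */
static int check_round_trip_and_outside(size_t size, uint64_t v) {
    uint8_t before[64], after[64];
    if (size > sizeof before) return 0;
    for (size_t i = 0; i < size; i++) before[i] = (uint8_t)(0xA5u ^ (i * 37u));
    for (size_t off = 0; off + 8 <= size; off++) {
        memcpy(after, before, size);
        store_u64_be(after + off, v);
        if (load_u64_be(after + off) != v) return 0;        /* round trip */
        for (size_t i = 0; i < size; i++) {
            int inside = (i >= off && i < off + 8);
            if (!inside && after[i] != before[i]) return 0;  /* outside window */
        }
    }
    return 1;
}
```

Checking individual bytes outside the window is the stronger statement: any wider read outside the window is assembled from those unchanged bytes, so it is unaffected regardless of its width or endianness.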

This is motivated by performance work on pure-Lean codecs (see #14050 "feat: fast fixed-width integer load/store on ByteArray"): a data structure that is conceptually a dense array of fixed-width integers previously had no representation with a single-instruction element load.

ByteSlice forwarding accessors are a natural follow-up left out of this PR.

🤖 Prepared with Claude Code

kim-em added the changelog-library label Jun 15, 2026
github-actions bot added the toolchain-available label (a toolchain is available for this PR, at leanprover/lean4-pr-releases:pr-release-NNNN) Jun 15, 2026
@mathlib-lean-pr-testing

Mathlib CI status (docs):

  • ❗ Batteries/Mathlib CI will not be attempted unless your PR branches off the nightly-with-mathlib branch. Try git rebase 9f1e8022b71e919870342562a89a6cb71e3e38c7 --onto 659e8bb858995b0a1ada239c5b3819c8f8f2772f. You can force Mathlib CI using the force-mathlib-ci label. (2026-06-15 09:35:25)

@leanprover-bot

Collaborator

Reference manual CI status:

  • ❗ Reference manual CI will not be attempted unless your PR branches off the nightly-with-manual branch. Try git rebase 9f1e8022b71e919870342562a89a6cb71e3e38c7 --onto 803553a556fd82fa1060efb0c43eda542130cb16. You can force reference manual CI using the force-manual-ci label. (2026-06-15 09:35:26)

kim-em force-pushed the bytearray-wide-uint-accessors branch 3 times, most recently from 96f608a to 61fa405, June 15, 2026 23:27
This PR adds little- and big-endian UInt16/UInt32/UInt64 load and store
accessors to ByteArray, reading or writing a fixed-width integer at a byte
offset in a single native load/store rather than through a boxed Array UInt32
or hand-written byte assembly. For each width and endianness there are
defaulting (`!`), Nat-with-proof, and USize-with-proof variants mirroring the
existing byte accessors; the defaulting variants are all-or-nothing on bounds.
Hot loops should use the USize-indexed `uget*`/`uset*` forms, since the
Nat-indexed variants box the index (the module docstring says so).

Lemmas in Init.Data.ByteArray.Lemmas establish the proof-carrying variants as
the `!` model, size preservation, and round-trip (`get* (set* a off v) off = v`,
with the bit-recombination identity discharged by getLsbD extensionality, no
bv_decide). Disjointness is stated at the byte level: a wide write changes only
the bytes in its own window, so a read of any width/endianness outside that
window is unaffected. Tests in tests/elab/bytearray_pack.lean check the @[extern]
C against the Lean model.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
kim-em force-pushed the bytearray-wide-uint-accessors branch from 61fa405 to ce55a45, June 15, 2026 23:35
@Rob23oba

Contributor

I actually added these already in #8165; I'll update that PR to fix the merge conflicts. To be clear, I don't particularly like the approach of adding a ton of independent functions; my PR instead adds a general simp normal form in terms of setBitVecLE / setBitVecBE.
