feat: wide fixed-width integer load/store accessors on ByteArray#14053
Draft
kim-em wants to merge 1 commit into
Draft
feat: wide fixed-width integer load/store accessors on ByteArray#14053kim-em wants to merge 1 commit into
kim-em wants to merge 1 commit into
Conversation
|
Mathlib CI status (docs):
|
Collaborator
|
Reference manual CI status:
|
96f608a to
61fa405
Compare
This PR adds little- and big-endian UInt16/UInt32/UInt64 load and store accessors to ByteArray, reading or writing a fixed-width integer at a byte offset in a single native load/store rather than through a boxed Array UInt32 or hand-written byte assembly. For each width and endianness there are defaulting (`!`), Nat-with-proof, and USize-with-proof variants mirroring the existing byte accessors; the defaulting variants are all-or-nothing on bounds. Hot loops should use the USize-indexed `uget*`/`uset*` forms, since the Nat-indexed variants box the index (the module docstring says so). Lemmas in Init.Data.ByteArray.Lemmas establish the proof-carrying variants as the `!` model, size preservation, and round-trip (`get* (set* a off v) off = v`, with the bit-recombination identity discharged by getLsbD extensionality, no bv_decide). Disjointness is stated at the byte level: a wide write changes only the bytes in its own window, so a read of any width/endianness outside that window is unaffected. Tests in tests/elab/bytearray_pack.lean check the @[extern] C against the Lean model. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
61fa405 to
ce55a45
Compare
Contributor
|
I actually added these already in #8165; I'll update that to fix the merge conflicts (to be clear, I don't particularly like the approach of making a ton of independent functions, my PR instead adds a general simp normal form of |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
This PR adds little- and big-endian
UInt16/UInt32/UInt64load and store accessors toByteArray, reading or writing a fixed-width integer at a byte offset in a single native load/store rather than through a boxedArray UInt32(a tagged load plus an unbox) or hand-written byte assembly.For each width and endianness there are three readers and three writers, mirroring the existing byte accessors
get!/get/ugetandset!/set/uset:getUInt32LE!/setUInt32LE!—Natoffset, no proof. All-or-nothing on bounds: a read whoseW/8-byte window does not fit returns0, and such a write leaves the array unchanged, matching the defaulting behaviour ofByteArray.get!/set!.getUInt32LE/setUInt32LE—Natoffset with an in-bounds proof.ugetUInt32LE/usetUInt32LE—USizeoffset with an in-bounds proof (the fast path).The offset is a byte position, so the same primitive serves both an array-of-
UInt32view (i = 4*k) and the read-a-UInt32-at-an-arbitrary-position case common in codecs. The@[extern]implementations arestatic inlineC inlean.hbuilt from the portable byte-shift idiom, which optimizing compilers typically fold to an efficient (possibly unaligned) load or store; the Lean definitions are the proof-level model and the externs are validated against them by tests.In a hot loop the
USize-indexedugetUInt*/usetUInt*forms are the ones to reach for: theNat-indexed variants (including the!forms) take a boxedNat, so the loop's index arithmetic runs boxed and measures noticeably slower. The module docstring says so.Lemmas accompany the API in
Init.Data.ByteArray.Lemmas: the proof-carrying variants are definitionally the!model, writes preservesize, and reads round-trip writes (get* (set* a off v) off = vunder bounds), for every width and endianness. The round-trip proofs reduce the read of the freshly-written bytes to a fixed-width bit-recombination identity discharged bygetLsbDextensionality (nobv_decide, so everything stays inInit). Disjointness is stated at the byte level —getElem!_setUIntWE!_of_outsidesays a wide write changes only the bytes in its own window, so a read of any byte (hence any width or endianness) outside that window is unaffected; same-width/endianness_of_disjointcorollaries are provided for convenience.tests/elab/bytearray_pack.leanadditionally checks that the@[extern]C implementations match the Lean model on concrete values, endianness, and the all-or-nothing out-of-bounds behaviour.This is motivated by performance work on pure-Lean codecs (see #14050 "feat: fast fixed-width integer load/store on ByteArray"): a data structure that is conceptually a dense array of fixed-width integers previously had no representation with a single-instruction element load.
ByteSliceforwarding accessors are a natural follow-up left out of this PR.🤖 Prepared with Claude Code