
Entity-level code map via sem-core (#315) #321

Merged
ODAncona merged 1 commit into mufeedvh:main from rs545837:feat/sem-entity-map on Jun 18, 2026

@rs545837

Contributor

Implements the entity-level code map from #315. Thanks for the enthusiastic green light @ODAncona, excited to land this.

What it does

With --entity-map, code2prompt extracts each file's structural entities (functions, classes, methods) with line ranges and signatures via sem-core, and renders a compact outline before the file contents. This lets a prompt lead with the shape of the codebase instead of making the model infer it from full files.

Example output:

Code Map:

`math.py`:
  - class Calculator `class Calculator:` (lines 1-7)
  - function add `def add(self, a, b):` (lines 3-4)
  - function main `def main():` (lines 9-11)
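In code2prompt this outline comes from a Handlebars partial, which the PR does not reproduce. As a rough illustration only, here is a plain Rust stand-in that produces the same flat format from structured records. The `Entity` struct, the `entity` helper, and `render_code_map` are hypothetical names chosen for this sketch; their fields follow the ones the PR lists (name, kind, line range, signature), not code2prompt's actual types.

```rust
use std::fmt::Write;

/// Hypothetical entity record; fields mirror the PR's list, not the real type.
#[derive(Debug, Clone)]
struct Entity {
    kind: &'static str, // e.g. "class", "function"
    name: String,
    signature: String,
    start_line: usize,
    end_line: usize,
}

/// Hypothetical constructor to keep the example data compact.
fn entity(kind: &'static str, name: &str, signature: &str, lines: (usize, usize)) -> Entity {
    Entity {
        kind,
        name: name.to_string(),
        signature: signature.to_string(),
        start_line: lines.0,
        end_line: lines.1,
    }
}

/// Render a flat "Code Map:" outline, one bullet per entity, grouped by file.
/// Files with no extracted entities are skipped.
fn render_code_map(files: &[(&str, Vec<Entity>)]) -> String {
    let mut out = String::from("Code Map:\n");
    for (path, entities) in files {
        if entities.is_empty() {
            continue;
        }
        // Writing into a String cannot fail, so unwrap is safe here.
        writeln!(out, "\n`{path}`:").unwrap();
        for e in entities {
            writeln!(
                out,
                "  - {} {} `{}` (lines {}-{})",
                e.kind, e.name, e.signature, e.start_line, e.end_line
            )
            .unwrap();
        }
    }
    out
}

fn main() {
    let files = vec![(
        "math.py",
        vec![
            entity("class", "Calculator", "class Calculator:", (1, 7)),
            entity("function", "add", "def add(self, a, b):", (3, 4)),
            entity("function", "main", "def main():", (9, 11)),
        ],
    )];
    print!("{}", render_code_map(&files));
}
```

Keeping the records as data and formatting them in a separate step is what allows a template to swap in a different view without touching extraction.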

Design decisions (the ones flagged in the issue)

  • Placement: both. Per-file via FileEntry.entities, plus a top-level code_map aggregate, since the data is useful both ways and costs nothing to expose twice.
  • Format: structured data is the source of truth (name, kind, line range, signature, parent); a default Handlebars partial renders the indented tree, so templates that want a signature-only or custom view can build it.
  • Default + flag: off by default, opt in with --entity-map / Code2PromptConfig.entity_map.
  • Build cost: sem-core carries tree-sitter grammars for ~28 languages, so it is an optional dependency behind the entity-map Cargo feature. Standard builds are unaffected.
  • Privacy: sem-core is the offline library and emits no telemetry (telemetry lives only in the sem CLI binary, not the crate), so code2prompt stays fully air-gapped. Quick check: grep -r reqwest over the sem-core source returns nothing.
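To make the Placement and Format decisions concrete, here is a sketch in which one set of structured records feeds both placements (a per-file `entities` list and a top-level `code_map` aggregate) and two template-style views (the indented tree and a signature-only list). Everything here is hypothetical: `Entity`, `FileEntry`, `code_map`, `tree_view`, and `signature_view` are modeled on names in the PR, not taken from code2prompt, and resolving `parent` by entity name is a simplifying assumption. In code2prompt the views are Handlebars templates, not Rust functions.

```rust
/// Hypothetical entity record. `parent` names the enclosing entity, if any.
#[derive(Debug, Clone)]
struct Entity {
    name: String,
    kind: &'static str,
    signature: String,
    lines: (usize, usize),
    parent: Option<String>,
}

/// Hypothetical per-file record: the "per-file" placement.
#[derive(Debug)]
struct FileEntry {
    path: String,
    entities: Vec<Entity>,
}

/// The "top-level" placement: an aggregate over the same per-file data.
fn code_map(files: &[FileEntry]) -> Vec<(&str, &[Entity])> {
    files
        .iter()
        .filter(|f| !f.entities.is_empty())
        .map(|f| (f.path.as_str(), f.entities.as_slice()))
        .collect()
}

/// Nesting depth found by following `parent` links within one file.
/// The step limit guards against malformed data containing a cycle.
fn depth(e: &Entity, all: &[Entity]) -> usize {
    let mut d = 0;
    let mut cur = e.parent.as_deref();
    while let Some(p) = cur {
        d += 1;
        if d > all.len() {
            break;
        }
        cur = all.iter().find(|x| x.name == p).and_then(|x| x.parent.as_deref());
    }
    d
}

/// View 1: indented tree, two spaces per nesting level.
fn tree_view(entities: &[Entity]) -> Vec<String> {
    entities
        .iter()
        .map(|e| {
            let indent = "  ".repeat(depth(e, entities) + 1);
            format!("{indent}- {} {} `{}` (lines {}-{})", e.kind, e.name, e.signature, e.lines.0, e.lines.1)
        })
        .collect()
}

/// View 2: signatures only, built from the same records.
fn signature_view(entities: &[Entity]) -> Vec<&str> {
    entities.iter().map(|e| e.signature.as_str()).collect()
}

fn entity(name: &str, kind: &'static str, sig: &str, lines: (usize, usize), parent: Option<&str>) -> Entity {
    Entity { name: name.into(), kind, signature: sig.into(), lines, parent: parent.map(String::from) }
}

/// Example data mirroring the math.py sample; README.md has no entities.
fn sample_files() -> Vec<FileEntry> {
    vec![
        FileEntry {
            path: "math.py".into(),
            entities: vec![
                entity("Calculator", "class", "class Calculator:", (1, 7), None),
                entity("add", "function", "def add(self, a, b):", (3, 4), Some("Calculator")),
                entity("main", "function", "def main():", (9, 11), None),
            ],
        },
        FileEntry { path: "README.md".into(), entities: vec![] },
    ]
}

fn main() {
    let files = sample_files();
    for (path, entities) in code_map(&files) {
        println!("`{path}`:");
        for line in tree_view(entities) {
            println!("{line}");
        }
        println!("signatures: {:?}", signature_view(entities));
    }
}
```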

Try it

cargo run -p code2prompt --features entity-map -- /path/to/repo --entity-map
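That command uses two separate switches. The `entity-map` Cargo feature decides at compile time whether sem-core is built in at all, and `--entity-map` opts in at run time. Below is a minimal sketch of that two-level gate, assuming a hypothetical `extract_entities` function and `Config` struct. The real sem-core call is deliberately elided (marked with `todo!`), and the PR does not say how code2prompt treats `--entity-map` in a build without the feature; here that case simply yields no entities.

```rust
/// Hypothetical, trimmed-down entity record.
#[derive(Debug, Clone, PartialEq)]
pub struct Entity {
    pub name: String,
    pub kind: String,
    pub start_line: usize,
    pub end_line: usize,
}

/// Hypothetical config mirroring the `Code2PromptConfig.entity_map` switch.
#[derive(Debug, Default)]
pub struct Config {
    pub entity_map: bool, // off by default
}

// Compiled only with `--features entity-map`. The sem-core call is not
// reproduced in this sketch; `todo!` marks where it would go.
#[cfg(feature = "entity-map")]
fn extract_entities(_path: &str, _source: &str) -> Vec<Entity> {
    todo!("delegate to sem-core")
}

// Standard builds: sem-core and its grammars are not compiled at all,
// so extraction is a no-op.
#[cfg(not(feature = "entity-map"))]
fn extract_entities(_path: &str, _source: &str) -> Vec<Entity> {
    Vec::new()
}

/// Runtime opt-in: nothing is extracted unless the flag is set.
pub fn entities_for(cfg: &Config, path: &str, source: &str) -> Vec<Entity> {
    if !cfg.entity_map {
        return Vec::new();
    }
    extract_entities(path, source)
}

fn main() {
    let off = Config::default();
    let on = Config { entity_map: true };
    println!("{:?}", entities_for(&off, "math.py", "def main(): pass"));
    println!("{:?}", entities_for(&on, "math.py", "def main(): pass"));
}
```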

Tests

Unit tests for Rust and Python extraction (feature-gated); full suite passes with and without the feature; verified end to end on a mixed Python/Rust sample.

Scope

This is the code map. The natural follow-up is function-level context forging (pull a function plus its real callers/callees across files, token-budgeted, via sem's dependency graph), which is the bigger token-savings win. Happy to do that as a second PR once this lands.

Adds an optional entity-level code map: for each source file, extract its
structural entities (functions, classes, methods) with line ranges and
signatures, and expose them to templates both per-file (FileEntry.entities)
and as a top-level `code_map`. The default markdown template renders a compact
outline before the file contents, so a prompt can lead with structure instead
of relying on the LLM to infer it from full files.

Design:
- Behind the `entity-map` Cargo feature (off by default). sem-core pulls in
  tree-sitter grammars for ~28 languages, so users who don't want the map pay
  no build cost.
- sem-core (the published crate, not the sem CLI) is offline and carries no
  telemetry, so enabling this does not change code2prompt's privacy posture.
  It stays fully air-gapped.
- New `--entity-map` CLI flag and `Code2PromptConfig.entity_map`.
- One parser registry per worker thread (thread_local) so grammar
  registration is amortized across files in the rayon pipeline.
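The per-thread registry works like this: each worker builds its registry once, on first use, and reuses it for every file it handles, so registration cost scales with the number of threads rather than the number of files. The following dependency-free sketch shows that pattern with hypothetical `ParserRegistry` and `detect_language` names. `std::thread::scope` stands in for the rayon pool. Rayon reuses its pool threads across batches, so there the registries also survive between batches.

```rust
use std::collections::HashMap;
use std::sync::atomic::{AtomicUsize, Ordering};
use std::thread;

/// Counts how many registries were ever built (for demonstration only).
static REGISTRY_INITS: AtomicUsize = AtomicUsize::new(0);

/// Hypothetical stand-in for a parser registry. A real one would register
/// tree-sitter grammars, which is the expensive step worth amortizing.
struct ParserRegistry {
    by_extension: HashMap<&'static str, &'static str>,
}

impl ParserRegistry {
    fn new() -> Self {
        REGISTRY_INITS.fetch_add(1, Ordering::SeqCst);
        let by_extension = HashMap::from([("py", "python"), ("rs", "rust")]);
        ParserRegistry { by_extension }
    }

    fn language_for(&self, path: &str) -> Option<&'static str> {
        let ext = path.rsplit_once('.')?.1;
        self.by_extension.get(ext).copied()
    }
}

thread_local! {
    // Built lazily on each thread's first access, then reused for every
    // file that thread processes.
    static REGISTRY: ParserRegistry = ParserRegistry::new();
}

fn detect_language(path: &str) -> Option<&'static str> {
    REGISTRY.with(|reg| reg.language_for(path))
}

/// Splits `files` across `workers` scoped threads and returns
/// (files with a recognized language, registries built during this run).
fn run_pipeline(files: &[String], workers: usize) -> (usize, usize) {
    let before = REGISTRY_INITS.load(Ordering::SeqCst);
    let detected = AtomicUsize::new(0);
    let w = workers.max(1);
    // `.max(1)` keeps `chunks` from panicking on an empty file list.
    let chunk = ((files.len() + w - 1) / w).max(1);
    thread::scope(|s| {
        for part in files.chunks(chunk) {
            let detected = &detected;
            s.spawn(move || {
                for f in part {
                    if detect_language(f).is_some() {
                        detected.fetch_add(1, Ordering::SeqCst);
                    }
                }
            });
        }
    });
    let built = REGISTRY_INITS.load(Ordering::SeqCst) - before;
    (detected.load(Ordering::SeqCst), built)
}

fn main() {
    let files: Vec<String> = (0..100)
        .map(|i| if i % 2 == 0 { format!("src/f{i}.rs") } else { format!("pkg/f{i}.py") })
        .collect();
    let (detected, built) = run_pipeline(&files, 4);
    println!("{detected} files parsed with {built} registries");
}
```

With 100 files split across 4 workers, the counter reports 4 registry builds rather than 100.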

Tested: unit tests for Rust and Python extraction (feature-gated); full suite
passes with and without the feature; verified end to end on a mixed Python/Rust
sample.

@ODAncona left a comment

Collaborator


LGTM.

I'll take over the chores:

  • Update docs
  • Move unit tests into /test
  • Remove some comments

I'll try it out and come back to you with next steps.

@ODAncona merged commit ab4fa06 into mufeedvh:main on Jun 18, 2026
1 check passed


Development

Successfully merging this pull request may close these issues.

Entity-level code map via sem (Rust-native, for templates + token savings)

2 participants