Skip to content

Optimize eval runtime for precompiled tuples and structured input#399

Merged
bertysentry merged 6 commits intomainfrom
398-optimize-evalinterpret-runtime-for-precompiled-tuples-and-structured-input
Mar 20, 2026
Merged

Optimize eval runtime for precompiled tuples and structured input#399
bertysentry merged 6 commits intomainfrom
398-optimize-evalinterpret-runtime-for-precompiled-tuples-and-structured-input

Conversation

@bertysentry
Copy link
Contributor

Summary

  • split AVM/JRT responsibilities more cleanly and optimize eval preparation for precompiled tuples
  • add lazy structured-input handling so eval can work efficiently with text or pre-split fields
  • precompute and serialize tuple execution metadata, including omitting redundant SET_NUM_GLOBALS for simple eval expressions

Testing

  • mvn formatter:format
  • mvn test
  • mvn verify (still hits the existing Windows compatibility-suite failures in BwkMiscTest, BwkTTest, and GawkTest)

Closes #398

Copy link

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 53ec0cb301

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR refactors Jawk’s eval/interpret runtime to reduce per-invocation overhead for precompiled AwkTuples, adds first-class structured input support (record text and/or pre-split fields), and serializes tuple execution metadata so deserialized tuples are execution-ready.

Changes:

  • Introduces AwkTuples.ExecutionProfile (serialized) to drive optimized eval preparation and to omit redundant SET_NUM_GLOBALS for simple eval expressions.
  • Updates InputSource to support getRecordText() (nullable) alongside getFields() for lazy $0/field synchronization and fields-only sources.
  • Splits/cleans up AVM/JRT responsibilities (runtime preparation, IO/resource lifecycle, output sinks) and expands tests + docs for the new eval and structured-input flows.

Reviewed changes

Copilot reviewed 18 out of 18 changed files in this pull request and generated 3 comments.

Show a summary per file
File Description
src/main/java/org/metricshub/jawk/Awk.java Reworks eval(...) APIs to use cached settings snapshots + fresh AVM per call and supports InputSource directly.
src/main/java/org/metricshub/jawk/util/AwkSettings.java Adds modificationCount to support cache invalidation for derived runtime snapshots.
src/main/java/org/metricshub/jawk/intermediate/AwkTuples.java Adds serialized ExecutionProfile and removes redundant eval SET_NUM_GLOBALS when safe.
src/main/java/org/metricshub/jawk/jrt/InputSource.java Renames getRecord()getRecordText() and clarifies nullable record-text contract.
src/main/java/org/metricshub/jawk/jrt/JRT.java Introduces lazy record-state model + output sink abstraction and refactors IO state handling.
src/main/java/org/metricshub/jawk/backend/AVM.java Aligns eval/interpret preparation around tuple metadata, resets execution state per run, and routes print/printf through JRT sinks.
src/main/java/org/metricshub/jawk/backend/RuntimeStack.java Switches to ArrayDeque, adds reset support, and avoids null elements via sentinel.
src/main/java/org/metricshub/jawk/backend/RegexRuntimeSupport.java Extracts regex replacement helpers from JRT for AVM usage.
src/main/java/org/metricshub/jawk/jrt/StreamInputSource.java Updates implementation to getRecordText().
src/main/java/org/metricshub/jawk/SandboxedAwk.java Updates AVM creation hook to match new Awk#createAvm(AwkSettings) shape.
src/test/java/org/metricshub/jawk/AwkEvalTest.java Adds coverage for eval execution paths, tuple metadata behavior, and reuse/reset semantics.
src/test/java/org/metricshub/jawk/InputSourceTest.java Adds coverage for fields-only sources and eval with InputSource (text-only and fields-only).
src/test/java/org/metricshub/jawk/ListInputSourceTest.java Updates tests to new getRecordText() API.
src/test/java/org/metricshub/jawk/RegexpTupleAndCachingTest.java Adds serialization test for execution profile metadata.
src/site/markdown/java.md Updates docs for eval(...) / InputSource contract and adds AVM advanced reuse section.
src/site/markdown/index.md Adds quick-start docs for precompiled eval and structured sources.
README.md Updates README examples for eval(...), tuple reuse, and structured input.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 18 out of 18 changed files in this pull request and generated 4 comments.

Comments suppressed due to low confidence (1)

src/main/java/org/metricshub/jawk/backend/AVM.java:2402

  • applySpecialVariables treats ARGC as a JRT-managed special variable and calls vm.assignVariable("ARGC", ...) during prepareExecution(), before SET_NUM_GLOBALS has allocated the runtime stack globals. When globalVariableOffsets is already installed, assignVariable will try to write into runtimeStack while globals is still null, which can trigger an assertion/NPE. Consider deferring assignments until globals are allocated (e.g., treat runtimeStack.getNumGlobals()==null the same as missing offsets and store into the baseline initial-variable snapshot), or exclude ARGC from the early special-variable application path.
	public final void assignVariable(String name, Object obj) {
		// When offsets are not available yet, treat the assignment as part of this
		// AVM's baseline initial-variable snapshot.
		if (globalVariableOffsets == null || globalVariableArrays == null) {
			baseInitialVariables.put(name, obj);
			if (JRT.isJrtManagedSpecialVariable(name)) {
				baseSpecialVariables.put(name, obj);
			}
			return;
		}

		// make sure we're not receiving funcname=value assignments
		if (functionNames != null && functionNames.contains(name)) {
			throw new IllegalArgumentException("Cannot assign a scalar to a function name (" + name + ").");
		}

		Integer offsetObj = globalVariableOffsets.get(name);
		Boolean arrayObj = globalVariableArrays.get(name);

		if (offsetObj != null) {
			assert arrayObj != null;
			if (arrayObj.booleanValue()) {
				throw new IllegalArgumentException("Cannot assign a scalar to a non-scalar variable (" + name + ").");
			} else {
				runtimeStack.setFilelistVariable(offsetObj.intValue(), obj);
			}

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 18 out of 18 changed files in this pull request and generated 3 comments.


💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 18 out of 18 changed files in this pull request and generated 3 comments.


💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 20 out of 20 changed files in this pull request and generated 2 comments.


💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

@bertysentry bertysentry merged commit 8ede8bf into main Mar 20, 2026
9 checks passed
@bertysentry bertysentry deleted the 398-optimize-evalinterpret-runtime-for-precompiled-tuples-and-structured-input branch March 20, 2026 10:10
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Optimize eval/interpret runtime for precompiled tuples and structured input

2 participants