Apps: Replace APK manifest parser for lower memory use#354
Merged
Conversation
Drops the com.github.d4rken:apk-parser dependency in favor of a custom streaming binary-AXML parser. Scanner peak memory drops from 10-40 MB per parse (apk-parser loaded both resources.arsc and the full manifest, ByteArrayOutputStream grow gave ~3x transient peak) to ~50-150 KB. Viewer peak drops to ~200 KB. Symbolic resource references in the viewer are now resolved through PackageManager.getResourcesForApplication — zero Java heap cost for resolution. Architecture: ApkManifestReader exposes two entry points. readQueries runs a single streamer pass with QueriesExtractor. readFullManifest runs the streamer with a composite visitor (ManifestTextRenderer + QueriesExtractor) producing rawXml and queries from one pass. ManifestRepo keeps two in-flight maps so a concurrent queries caller can piggyback on an in-flight viewer parse (one-direction share-work; the reverse forces a second parse, documented tradeoff). Parser failures classify as transient Error, not MALFORMED_APK. ManifestHintRepo no longer deletes stale hints on parser failures — a bug in the new parser should not destroy user-visible state. Scanner retries on the next scan. Cache format bumped to v2. formatVersion is nullable with default null so pre-migration JSONs (which lack the field) decode distinctly from written v2 entries — defeats the kotlinx.serialization default-fill pitfall. Strict equality check on read deletes anything that is not exactly v2. Removed the getQueries-backfill-from-full path which would have defeated the memory win on devices with populated pre-migration caches. Added putQueries for scanner-only sibling writes. Tests: 384 total, all passing. New: BinaryXmlStreamerTest, QueriesExtractorTest, ManifestTextRendererTest, ManifestEnumFlagNamesTest, AxmlFixtureBuilder (synthetic AXML test helper). Existing ApkManifestReaderTest, ManifestRepoTest, ManifestCacheTest, ManifestCacheSerializationTest, ManifestHintRepoTest rewritten for the new API and policy.
When a scan emits a non-Success outcome, log the pkgName + specific reason (Unavailable reason enum or Failure exception). Previously the scan completion line reported aggregate counters only, so individual failures were invisible. This made the parity check on Pixel 7a diagnose a failure=2 count as two PKG_NOT_FOUND races (benign), not parser errors.
Temporary validation scaffolding for the streaming binary-AXML parser migration. Re-adds com.github.d4rken:apk-parser:v1.0.0 as debugImplementation only, wires a ParserParityChecker that parses every installed app with both the old and new parsers, diffs the QueriesInfo projection, and logs divergences at the PP:Debug:ParserParity tag. Trigger: adb shell am broadcast -a eu.darken.myperm.DEBUG_PARSER_PARITY -n eu.darken.myperm/.debug.manifest.ParserParityReceiver On-device result (Pixel 7a, 355 installed apps): matches=352 divergences=0 oldOnly=1 newOnly=2 oldFailed=2 newFailed=0. Zero behavioral divergences. New parser handled 2 apps (Chrome, WebView) that the old parser crashed on with XmlPullParserException. Reverted in the next commit.
This reverts commit 23ddf9e.
Cross-validates the streaming parser against output from the real Android build toolchain, complementing the synthetic AxmlFixtureBuilder. Guards against the risk that the builder and the parser share the same mental model of the format — a risk Codex flagged during plan review. Five fixtures cover baseline manifests, framework resource references, the queries projection (package/intent/provider), enum/flag attributes (protectionLevel / launchMode / configChanges / etc.), and TYPE_NULL handling. UTF-16 string pool coverage stays with the synthetic builder — aapt2 always emits UTF-8. The fixture blobs are checked in; regenerate via app/src/test/fixtures-src/generate.sh if the source XMLs change. The script reads the SDK path from local.properties and picks aapt2 from the Gradle transforms cache automatically.
With the streaming parser each parse uses ~150-200 KB transient (down from 10-40 MB), so the PR #352 preflight scaffolding — entry-gate free-heap ratio, estimated-peak vs maxHeap*0.25 budget, post-preflight free-heap re-check — no longer protects anything that couldn't be caught as a real OutOfMemoryError at allocation or parse time. Removed along with the Mode enum, per-mode multipliers, heapInfoProvider hook, and the APK_TOO_LARGE outcome. Semaphore bumped from 1 to 4: the single-permit limit existed to prevent two 40 MB parses overlapping. Four concurrent streaming parses cost ~800 KB — well under any realistic device budget. Lets viewer+scanner+other concurrent paths actually parallelize.
Six correctness fixes and three defensive-hardening tweaks raised by code review of the preceding commits. ManifestEnumFlagNames rewritten: the table now references android.R.attr.* directly (eliminating four wrong hand-typed resource IDs — foregroundServiceType / importantForAutofill / gwpAsanMode / usesPermissionFlags were pointing at unrelated framework attrs). protectionLevel is now an EnumWithFlags decoder — the base value (normal/dangerous/signature/signatureOrSystem/internal) lives in the low nibble with flag bits above; the previous pure-flag decoder would render "signatureOrSystem" (data=3) as "dangerous|signature". windowSoftInputMode uses a new TwoNibbleEnum decoder that correctly decodes the state enum in bits 0..3 and the adjust enum in bits 4..7 — previously "stateAlwaysHidden" (data=0x3) rendered as "stateUnchanged|stateHidden". usesPermissionFlags.neverForLocation fixed to the 0x00010000 bit. Tests cover all 12 table entries and assert every key matches its android.R.attr constant. ApkManifestReader catches IOException / ZipException during the manifest byte read and classifies them as MALFORMED_APK. Corrupt DEFLATE streams in user-installed APKs no longer escape the reader and crash the hint scanner mid-iteration. ManifestHintRepo policy refined: LOW_MEMORY outcomes preserve stale hints (transient — next scan refreshes), and Failure outcomes now delete stale-version hints (the current-version shortcut at loop top means Failure branch only sees stale entries; showing old-version flags as current is worse than "no data"). Matches the stable-vs-transient split already used by the outcome memory-cache. ManifestRepo single-flight atomicity: the piggyback check for an in-flight viewer parse and the queries in-flight insertion now live inside one mutex block via a small InFlight sealed class. A viewer parse registering between the two previous lock acquisitions would have caused a duplicate parse. BinaryXmlStringPool UTF-16 guard: byteLen computed as Long to prevent Int overflow when charLen is near Int.MAX_VALUE (from the two-word varlen encoding). Previously the overflow bypassed the bounds check and allocated a multi-gigabyte CharArray before the caller classified the OOM. QueriesExtractor resets intentDepth when queriesDepth tears down, so a malformed <queries><intent>...</queries> (unclosed intent) can't leak intent state into unrelated elements after the queries block.
Add direct unit tests for components that were only covered transitively: the resource-name resolver, the string-pool decoder, the composite visitor, and several TypedValue branches in the text renderer. Also exercise readManifestBytes edge cases through mocked ZipFile/ZipEntry. New: - ResourceNameResolverTest: owner / framework / other-package resolution paths, NameNotFoundException and Resources.NotFoundException handling, guards for resourceId 0 and -1, containsKey-based null caching. - BinaryXmlStringPoolTest: UTF-8 two-byte char/byte-count varlens, UTF-16 two-word varlen, regression guard for the charLen * 2 Int-overflow in readUtf16 (crafts a pool claiming 0x7FFFFFFF chars in 4 bytes of data), stringsStart / offset / truncation bounds. - CompositeVisitorTest: dispatch-order fan-out to multiple visitors plus a streamer integration pass. Extended: - ApkManifestReaderTest: directory / zero-size / oversized / short-read / ZipException / IOException branches in readManifestBytes. - ManifestTextRendererTest: Bool=false, AttributeRef (resolved / hex fallback / rawValue fallback), DynamicAttributeRef, all four Color widths, Dimension unit table (px / sp / pt / in / mm), fraction suffix mapping, Float.
The four memoryCache touchpoints all sit in suspend functions; switching from synchronized() to a dedicated coroutine Mutex aligns the cache lock with the rest of the repo (which already uses Mutex.withLock for the in-flight maps) and lets cache access suspend rather than block. The cacheMutex is intentionally separate from inFlightMutex so cache reads can't get tangled up in in-flight bookkeeping.
Visitors that defer namespace bookkeeping (the renderer's pendingNamespaces FIFO is the only current example) rely on onStartNamespace firing before the next onStartElement. The streamer guarantees this because of AXML chunk layout, but until now nothing on the visitor surface said so. Adds the contract as a kdoc.
readQueries and readFullManifest shared ~30 lines of zip-open + bytes-read + classification boilerplate. A private inline withManifestBytes helper consolidates the preamble and lets each entry point focus on the parser-time work and the result shape. Adds a SecurityException catch arm at the helper level — closes the read-time race window where an APK can become unreadable between openApk's preflight and the actual byte read (e.g. mid-parse permission revocation). Classifies as APK_NOT_READABLE.
Three sealed types — QueriesReadResult (reader), QueriesResult (in ManifestData), and QueriesOutcome (repo) — were saying the same thing in three shapes, with toOutcome and queriesResultToOutcome walking between them. Collapsed to a single QueriesOutcome. Cacheability now lives on the data: UnavailableReason.isTransient is the single source of truth used by both ManifestRepo (skip the memory cache) and ManifestHintRepo (preserve stale hints rather than delete them). The shouldMemoryCache enum-walk shrinks to a one-liner.
Pulls the pure formatting helpers (qualifiedName, attributeName, formatAttributeValue, formatReference, formatColor, formatComplex, escapeXml) plus the ResourceRefResolver fun interface into a shared internal object. Renderer now consumes them; all behaviour preserved. Sets the stage for ManifestSectionVisitor to share the same formatting machinery without duplicating renderer state.
Depth-aware visitor that produces the per-section pretty XML the manifest viewer displays, eliminating the need to render the manifest to text and re-parse it with XmlPullParser. Reuses BinaryXmlFormatting for attribute/value formatting. Mirrors the legacy ManifestSectionParser's section assignment (USES_PERMISSION / PERMISSION / QUERIES at depth 2; ACTIVITIES / SERVICES / RECEIVERS / PROVIDERS / META_DATA inside <application>; everything else into OTHER). Drains namespace declarations on <manifest> entry so xmlns:android does not bleed into the first section. isFlagged stays unset — heuristic flagging depends on scanner thresholds that change independently of cache contents, so the viewer model recomputes it from the live queries projection at read time. Not wired into ApkManifestReader yet — Step 5c does the full cutover and removes ManifestSectionParser + ManifestTextRenderer.
Removes the binary→string→XmlPullParser double-parse. The viewer no longer needs ManifestTextRenderer or ManifestSectionParser; ApkManifestReader.readFullManifest now attaches [ManifestSectionVisitor, QueriesExtractor] and produces sections directly in one pass. ManifestData drops rawXml in favour of sections: SectionsResult. The disk cache bumps to v3, storing CachedManifestSection (without isFlagged — flagging recomputes from live queries on every read so threshold tuning takes effect immediately). v2 entries fail decode against the new schema and are deleted by the existing corruption catch. Coverage migration: the old ManifestTextRendererTest is gone (the renderer is gone too); enum/flag, reference, namespace, and color/dimension assertions move into AaptFixtureIntegrationTest and ManifestSectionVisitorTest, and ManifestCacheTest gains explicit v2-to-v3 migration assertions for both the full file and the queries sibling.
_currentMatchIdx starts at -1; the combine block clamps to 0 for display, but onNextMatch was stepping the raw flow value, so the first tap landed on 0 again and the counter appeared frozen at "1 of N". Step from the clamped value the UI is actually showing. Pre-existing behaviour, unrelated to the streaming-parser refactor — surfaced during the device smoke test for that PR.
Multi-agent review on the streaming-parser PR surfaced 13 actionable findings; this commit applies 10 fixes across parser hardening, a viewer regression, and the project's localization rule. Two test additions (share-work exception propagation, ManifestViewerViewModelTest) deferred — the first is race-prone without controlled dispatchers, the second has no existing pattern to follow. Parser hardening: cap attributeCount at 4096 and string entry sizes at 1M chars/bytes to bound allocations on adversarial AXML; validate attributeStart >= 20 (symmetric to the existing attributeSize check) so attribute records cannot overlap the attrExt header; add bounds check on the namespace chunk body so readU32 never reads past the chunk into adjacent bytes. Exception classification: BinaryXmlException now classifies as Unavailable(MALFORMED_APK) instead of transient Failure. A structurally-corrupt APK was being re-parsed every scan because the Failure outcome was never memory-cached and the hint scanner never inserted a row for it; the MALFORMED_APK path caches the negative outcome so the scanner skips known-bad APKs. Broaden ResourceNameResolver.forPackage's catch from NameNotFoundException to Exception (DeadObjectException for unmounted secondary storage, etc.) and the per-id tryResolve to RuntimeException so a transient PackageManager hiccup degrades to unresolved IDs instead of failing the whole viewer load. Viewer: track active non-android namespaces in ManifestSectionVisitor and emit xmlns:* declarations on section roots so manifests with prefixed elements (<dist:module>, <tools:*>) render as well-formed XML — previously the bindings were dropped, regressing vs the pre-streaming raw-XML path. Drop the 300ms search debounce in ManifestViewerViewModel; within the debounce window state.value.totalMatchCount was stale, swallowing rapid type-then-Next taps. Localize the 'Unknown error' fallback via apps_manifest_viewer_error_unknown — inject ApplicationContext so the VM resolves the string before it reaches the screen. Repo: document parseFullOrAwait's intentional asymmetry vs parseQueriesOrAwait (the viewer cannot piggyback on a queries-only parse). Fall back to getApplicationInfo when PackageInfo.applicationInfo is null on Android 14+ — the previous null-fallthrough misclassified installed packages as PKG_NOT_FOUND.
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
What changed
App scans and the manifest viewer now use significantly less memory. The underlying manifest parser used to load each app's full resource table (often 10-40 MB per app) and inflate the manifest through a doubling buffer; both are gone. Background hint scans on devices with constrained RAM should see far fewer aborts under memory pressure, and the manifest viewer opens without risking OOM on large apps. No user-visible changes in what the viewer displays — rendered XML and symbolic references (
@string/app_name,@android:drawable/...) look the same.Technical Context
com.github.d4rken:apk-parserwith a custom streaming binary-AXML parser. Two entry points:readQueries(scanner, ~150 KB peak per parse) andreadFullManifest(viewer, ~200 KB peak, emits rendered XML).resources.arsc. Symbolic refs resolve throughPackageManager.getResourcesForApplication(pkg).getResourceName(id)— public API since API 1, backed by mmap'd native resource data, zero Java heap cost for resolution.Error, notMALFORMED_APK. Rationale: a bug in the new parser should not destroy existing user-visible hints. The scanner retries on the next run; failures increment a telemetry counter so parser regressions surface in Play Console traces.formatVersionfield (defaultnull). kotlinx.serialization would otherwise silently backfill a missing field with the current default and accept pre-migration entries. Old entries now get deleted on read; the queries-only sibling cache (unchanged schema) survives.ManifestRepouses two in-flight maps with one-direction share-work: a scanner call piggybacks on an in-flight viewer parse. The reverse (viewer arriving during a scanner parse) incurs a second full parse — documented tradeoff, viewer-first is the common case.Validation
On-device parity check on Pixel 7a: 355 real APKs parsed with both old and new parsers side-by-side, zero
QueriesInfodivergences. The new parser additionally handled Chrome and WebView, which the old parser crashed on withXmlPullParserException. 384 unit tests pass, including 6 integration tests against aapt2-generated binary manifests.Review guidance
BinaryXmlStreamer.kt— defensive chunk bounds per AOSPResourceTypes.h. Worth extra eyes on the attribute record layout (forward-compat for records >20 bytes) and the namespace-stack prefix resolution.ManifestCache.kt— theformatVersion: Int? = nullis load-bearing; a non-null default would silently accept v1 entries (ManifestCacheTest"raw pre-v2 JSON" guards this).ManifestTextRenderer.kt— the android URI always renders asandroid:regardless of what the AXML binds (required becauseManifestSectionParserruns namespace-unaware and looks for textualandroid:prefixes).app/src/test/fixtures-src/(source XMLs +generate.sh) andapp/src/test/resources/manifest/(5 aapt2-compiled binary blobs).