From 01903df4ca54f2c6cdbef518bc049875532b83a0 Mon Sep 17 00:00:00 2001 From: Andrew Skowronski Date: Thu, 11 Dec 2025 21:42:18 -0500 Subject: [PATCH 1/3] Initial AGENTS.md --- AGENTS.md | 236 ++++++++++++++++++++++++++++++++++++++++++++++++++++++ CLAUDE.md | 1 + 2 files changed, 237 insertions(+) create mode 100644 AGENTS.md create mode 100644 CLAUDE.md diff --git a/AGENTS.md b/AGENTS.md new file mode 100644 index 0000000..08b8cd6 --- /dev/null +++ b/AGENTS.md @@ -0,0 +1,236 @@ +# AGENTS.md + +This file provides guidance to AI agents when working with code in this repository. + +## Project Overview + +UnityDataTools is a .NET 9.0 command-line tool for analyzing Unity build output (AssetBundles, Player builds, Addressables). It extracts data from Unity's proprietary binary formats into SQLite databases and human-readable text files. The tool showcases the UnityFileSystemApi native library and serves as both a production tool and reference implementation. + +## Common Commands + +### Building +```bash +# Build entire solution in Release mode +dotnet build -c Release + +# Build from solution file +dotnet build UnityDataTools.sln -c Release + +# Build specific project +dotnet build UnityDataTool/UnityDataTool.csproj -c Release +``` + +Output location (Windows): `UnityDataTool\bin\Release\net9.0\UnityDataTool.exe` + +### Publishing (Mac-specific) +```bash +# Intel Mac +dotnet publish UnityDataTool -c Release -r osx-x64 -p:PublishSingleFile=true -p:UseAppHost=true + +# Apple Silicon Mac +dotnet publish UnityDataTool -c Release -r osx-arm64 -p:PublishSingleFile=true -p:UseAppHost=true +``` + +### Testing +```bash +# Run all tests +dotnet test + +# Run tests for specific project +dotnet test UnityFileSystem.Tests/UnityFileSystem.Tests.csproj +dotnet test Analyzer.Tests/Analyzer.Tests.csproj +dotnet test UnityDataTool.Tests/UnityDataTool.Tests.csproj + +# Run tests with filter +dotnet test --filter "FullyQualifiedName~SerializedFile" +``` + +Test projects: 
UnityFileSystem.Tests, Analyzer.Tests, UnityDataTool.Tests, TestCommon (helper library) + +### Running the Tool +```bash +# Show all commands +UnityDataTool --help + +# Analyze AssetBundles into SQLite database +UnityDataTool analyze /path/to/bundles -o database.db + +# Dump binary file to text format +UnityDataTool dump /path/to/file.bundle -o /output/path + +# Extract archive contents +UnityDataTool archive extract file.bundle -o contents/ + +# Find reference chains to an object +UnityDataTool find-refs database.db -n "ObjectName" -t "Texture2D" +``` + +## Architecture + +### Component Hierarchy +``` +UnityDataTool (CLI executable) +├── Analyzer → SQLite database generation +├── TextDumper → Human-readable text output +├── ReferenceFinder → Object reference chain tracing +└── UnityFileSystem → C# wrapper for native library + └── UnityFileSystemApi (native .dll/.dylib/.so) +``` + +### Key Architectural Patterns + +**Native Interop**: UnityFileSystem wraps UnityFileSystemApi (native library from Unity Editor) via P/Invoke in `DllWrapper.cs`. The native library reads Unity Archive and SerializedFile formats. + +**TypeTree Navigation**: Unity binary files contain TypeTrees that describe object serialization. The `RandomAccessReader` class navigates these trees like property accessors: `reader["m_Name"].GetValue()`. This enables the tool to interpret objects without hardcoded type knowledge. + +**Parser Pattern**: `ISQLiteFileParser` interface allows multiple parsers to handle different file formats: +- `SerializedFileParser` - Unity binary files (AssetBundles, Player data) +- `AddressablesBuildLayoutParser` - JSON build reports + +**Handler Registry**: Type-specific handlers extract specialized properties for Unity object types. 
Handlers implement `ISQLiteHandler` and are registered in `SerializedFileSQLiteWriter.m_Handlers`: +- `MeshHandler` - vertices, indices, bones, blend shapes +- `Texture2DHandler` - width, height, format, mipmaps +- `ShaderHandler` - variants, keywords, subprograms +- `AudioClipHandler` - compression, channels, frequency +- `AnimationClipHandler` - legacy flag, events +- `AssetBundleHandler` - dependencies, preload data +- `PreloadDataHandler` - preloaded assets + +**SQL Schema Resources**: Each handler has an embedded `.sql` resource file defining its tables and views (e.g., `Analyzer/Resources/Mesh.sql`). Views join type-specific tables with the base `objects` table. + +**Command Pattern**: SQL operations are encapsulated in classes derived from `AbstractCommand` with `CreateCommand()`, `SetValue()`, `ExecuteNonQuery()` methods. + +### Data Flow (Analyze Command) + +1. `Program.cs` → `HandleAnalyze()` → `AnalyzerTool.Analyze()` +2. AnalyzerTool finds files matching the search pattern +3. For each file, parsers are tried in order (JSON first, then SerializedFile) +4. `SerializedFileParser.ProcessFile()`: + - Checks for Unity Archive signature → calls `MountArchive()` + - Otherwise treats as SerializedFile → calls `OpenSerializedFile()` +5. `SerializedFileSQLiteWriter.WriteSerializedFile()`: + - Iterates through `sf.Objects` + - Gets TypeTree via `sf.GetTypeTreeRoot(objectId)` + - Creates `RandomAccessReader` to navigate properties + - Looks up type-specific handler in `m_Handlers` dictionary + - Handler extracts specialized properties (e.g., MeshHandler reads vertex count) + - Writes to `objects` table + type-specific table (e.g., `meshes`) + - Optionally processes PPtrs (references) and calculates CRC32 +6. 
SQLiteWriter finalizes the database with indexes and views + +### Important Files + +**Entry Points**: +- `UnityDataTool/Program.cs` - CLI using System.CommandLine +- `UnityDataTool/Commands/` - Command handlers (Analyze.cs, Dump.cs, Archive.cs, FindReferences.cs) + +**Core Libraries**: +- `UnityFileSystem/UnityFileSystem.cs` - Init(), MountArchive(), OpenSerializedFile() +- `UnityFileSystem/DllWrapper.cs` - P/Invoke bindings to native library +- `UnityFileSystem/SerializedFile.cs` - Represents binary data files +- `UnityFileSystem/RandomAccessReader.cs` - TypeTree property navigation + +**Analyzer**: +- `Analyzer/AnalyzerTool.cs` - Main API entry point +- `Analyzer/SQLite/SQLiteWriter.cs` - Base class for database writers +- `Analyzer/SQLite/Writers/SerializedFileSQLiteWriter.cs` - Handler registration +- `Analyzer/SQLite/Writers/AddressablesBuildLayoutSQLWriter.cs` - JSON report processing +- `Analyzer/SQLite/Handlers/` - Type-specific extractors +- `Analyzer/SerializedObjects/` - RandomAccessReader-based property readers +- `Analyzer/Resources/` - SQL DDL schema files + +**TextDumper**: +- `TextDumper/TextDumperTool.cs` - Converts binary to YAML-like text + +**ReferenceFinder**: +- `ReferenceFinder/ReferenceFinderTool.cs` - Traces object dependency chains + +## Extending the Tool + +### Adding New Unity Type Support + +1. Create handler class implementing `ISQLiteHandler`: + ``` + Analyzer/SQLite/Handlers/FooHandler.cs + ``` + +2. Create reader class using RandomAccessReader: + ``` + Analyzer/SerializedObjects/Foo.cs + ``` + +3. Register handler in `SerializedFileSQLiteWriter.cs`: + ```csharp + m_Handlers["Foo"] = new FooHandler(); + ``` + +4. Create SQL schema resource: + ``` + Analyzer/Resources/Foo.sql + ``` + Define tables (e.g., `foos`) and views (e.g., `foo_view` joining `objects` and `foos`) + +5. Reference the schema in the handler's GetResourceName() method + +### Adding New File Format Support + +1. 
Create parser implementing `ISQLiteFileParser` +2. Create writer derived from `SQLiteWriter` +3. Add parser to `AnalyzerTool.parsers` list +4. Create SQL schema and Command classes as needed + +Example: Addressables support uses `AddressablesBuildLayoutParser` + `AddressablesBuildLayoutSQLWriter` to parse JSON build reports. + +## Important Concepts + +### TypeTrees +TypeTrees describe how Unity objects are serialized (property names, types, offsets). They enable: +- Backward compatibility - reading files from different Unity versions +- Generic parsing without hardcoded type definitions +- Support for custom MonoBehaviours/ScriptableObjects + +**Critical**: Player builds exclude TypeTrees by default for performance. To analyze Player data, enable the "ForceAlwaysWriteTypeTrees" diagnostic switch during build. + +### File Formats +- **Unity Archive** - Container format (AssetBundles, .data files). Can be mounted as virtual filesystem. +- **SerializedFile** - Binary format storing Unity objects with TypeTree metadata. +- **Addressables BuildLayout** - JSON build report (buildlogreport.json, AddressablesReport.json) + +### Database Views +The SQLite output uses views extensively to join base `objects` table with type-specific tables: +- `object_view` - All objects with basic properties +- `mesh_view` - Objects + mesh-specific columns +- `texture_view` - Objects + texture-specific columns +- `shader_view` - Objects + shader-specific columns +- `view_breakdown_by_type` - Aggregated size by type +- `view_potential_duplicates` - Assets included multiple times +- `asset_view` - Explicitly assigned assets only +- `shader_keyword_ratios` - Keyword variant analysis + +See `Analyzer/README.md` and `Documentation/addressables-build-reports.md` for complete database schema documentation. + +### Common Issues + +**TypeTree Errors**: "Invalid object id" during analyze means SerializedFile lacks TypeTrees. Enable ForceAlwaysWriteTypeTrees or use files built with TypeTrees. 
+ +**File Loading Warnings**: "Failed to load... File may be corrupted" is normal for non-Unity files in analyzed directories. Use the `-p` search pattern to filter (e.g., `-p "*.bundle"`). + +**SQL UNIQUE Constraint Errors**: Occurs when the same SerializedFile name appears in multiple archives. This happens when analyzing multiple builds in the same directory or using AssetBundle variants. See `Documentation/comparing-builds.md` for solutions. + +**Mac Security**: "UnityFileSystemApi.dylib cannot be opened" - Open System Preferences → Security & Privacy and allow the library. + +## Native Library (UnityFileSystemApi) + +The native library is included for Windows, Mac, and Linux in the `UnityFileSystem/` directory. It's backward compatible and reads data files from most Unity versions. + +To use a specific Unity version's library: +1. Find library in Unity Editor installation: `{UnityEditor}/Data/Tools/` +2. Copy to `UnityDataTool/UnityFileSystem/`: + - Windows: `UnityFileSystemApi.dll` + - Mac: `UnityFileSystemApi.dylib` + - Linux: `UnityFileSystemApi.so` +3. Rebuild the tool + +## Testing Data + +UnityFileSystemTestData is a Unity project that generates test data for the test suites. TestCommon provides shared test utilities. diff --git a/CLAUDE.md b/CLAUDE.md new file mode 100644 index 0000000..47dc3e3 --- /dev/null +++ b/CLAUDE.md @@ -0,0 +1 @@ +AGENTS.md \ No newline at end of file From 5c8b5fcfb64257d5029390db881537f5afa76f67 Mon Sep 17 00:00:00 2001 From: Andrew Skowronski Date: Thu, 11 Dec 2025 22:12:52 -0500 Subject: [PATCH 2/3] Fix bug detected by tests asset_view could be created multiple times when looking at player data. 
--- Analyzer/Resources/AssetBundle.sql | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/Analyzer/Resources/AssetBundle.sql b/Analyzer/Resources/AssetBundle.sql index 26aed13..4e7afdb 100644 --- a/Analyzer/Resources/AssetBundle.sql +++ b/Analyzer/Resources/AssetBundle.sql @@ -8,13 +8,13 @@ CREATE TABLE IF NOT EXISTS asset_dependencies( dependency INTEGER ); -CREATE VIEW asset_view AS +CREATE VIEW IF NOT EXISTS asset_view AS SELECT a.name AS asset_name, o.* FROM assets a INNER JOIN object_view o ON o.id = a.object; -CREATE VIEW asset_dependencies_view AS +CREATE VIEW IF NOT EXISTS asset_dependencies_view AS SELECT a.id, a.asset_name, a.asset_bundle, a.type, od.id dep_id, od.asset_bundle dep_asset_bundle, od.name dep_name, od.type dep_type FROM asset_view a INNER JOIN asset_dependencies d ON a.id = d.object From b536deb94d7f7f08a95eb7546c32975778b8bf18 Mon Sep 17 00:00:00 2001 From: Andrew Skowronski Date: Thu, 11 Dec 2025 22:19:13 -0500 Subject: [PATCH 3/3] Add workflow to run tests on commits --- .github/workflows/test.yml | 58 ++++++++++++++++++++++++++++++++++++++ 1 file changed, 58 insertions(+) create mode 100644 .github/workflows/test.yml diff --git a/.github/workflows/test.yml b/.github/workflows/test.yml new file mode 100644 index 0000000..72cac53 --- /dev/null +++ b/.github/workflows/test.yml @@ -0,0 +1,58 @@ +name: Test UnityDataTools + +on: + workflow_dispatch: + push: + branches: + - main + pull_request: + branches: + - main + +jobs: + test: + strategy: + matrix: + os: [windows, macos] + arch: [x64, arm64] + exclude: + - os: windows + arch: arm64 + - os: macos + arch: x64 + fail-fast: false + + runs-on: ${{ matrix.os }}-latest + + steps: + - name: Checkout repository + uses: actions/checkout@v4 + + - name: Setup .NET + uses: actions/setup-dotnet@v4 + with: + dotnet-version: 9.0.x + + - name: Restore dependencies + run: dotnet restore + + - name: Build solution + run: dotnet build -c Release --no-restore + + - name: Run 
UnityFileSystem.Tests + run: dotnet test UnityFileSystem.Tests/UnityFileSystem.Tests.csproj -c Release --no-build --verbosity normal --logger "trx;LogFileName=UnityFileSystem.Tests.trx" + + - name: Run Analyzer.Tests + run: dotnet test Analyzer.Tests/Analyzer.Tests.csproj -c Release --no-build --verbosity normal --logger "trx;LogFileName=Analyzer.Tests.trx" + + - name: Run UnityDataTool.Tests + run: dotnet test UnityDataTool.Tests/UnityDataTool.Tests.csproj -c Release --no-build --verbosity normal --logger "trx;LogFileName=UnityDataTool.Tests.trx" + + - name: Upload test results + if: always() + uses: actions/upload-artifact@v4 + with: + name: test-results-${{ matrix.os }}-${{ matrix.arch }} + path: | + **/TestResults/*.trx + retention-days: 30