Migrate to orjson by itayfoT · Pull Request #209 · EC-DIGIT-CSIRC/sysdiagnose

itayfoT · 2025-11-04T16:09:40Z

Hey,

To improve performance, we switched from the standard json module to orjson.

- Replace json module with orjson for improved performance
- Update all json.loads() calls to orjson.loads()
- Update all json.dumps() calls to orjson.dumps()
- Change file I/O from text mode to binary mode (required for orjson)
- Update exception handling from json.decoder.JSONDecodeError to orjson.JSONDecodeError
- Update test file to use orjson for consistency
- All existing tests pass successfully
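For readers who have not used orjson, here is a minimal sketch of what such a migration typically looks like. The helper names `load_json_file`, `dump_json_file` and `roundtrip_demo` are hypothetical and not taken from this PR. The sketch relies on documented orjson behavior: `orjson.dumps()` returns `bytes`, `orjson.loads()` accepts `bytes`, and `orjson.JSONDecodeError` is a subclass of `json.JSONDecodeError`. The import fallback is only there so the sketch also runs where orjson is not installed.

```python
import json
import os
import tempfile

try:
    import orjson  # third-party: pip install orjson
except ImportError:  # fallback only so this sketch runs without orjson
    orjson = None


def dump_json_file(path, obj):
    """Write obj as JSON; binary mode because orjson.dumps() returns bytes."""
    data = orjson.dumps(obj) if orjson else json.dumps(obj).encode("utf-8")
    with open(path, "wb") as f:
        f.write(data)


def load_json_file(path):
    """Read JSON from a file opened in binary mode; return None on bad input."""
    with open(path, "rb") as f:
        raw = f.read()
    try:
        return orjson.loads(raw) if orjson else json.loads(raw)
    except json.JSONDecodeError:
        # orjson.JSONDecodeError subclasses json.JSONDecodeError,
        # so this clause covers both libraries.
        return None


def roundtrip_demo():
    """Write and read back a small record, then try to read a broken file."""
    with tempfile.TemporaryDirectory() as tmp:
        good = os.path.join(tmp, "good.json")
        bad = os.path.join(tmp, "bad.json")
        dump_json_file(good, {"pid": 1, "name": "launchd"})
        with open(bad, "wb") as f:
            f.write(b"{not json")
        return load_json_file(good), load_json_file(bad)
```

Because the orjson exception inherits from the standard one, existing `except json.JSONDecodeError` clauses keep working after the switch; renaming them is a consistency choice rather than a requirement.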

feat: add disks.txt parser with df output support

Migrate logarchive parser from json to orjson

- Add UID field to all ps_everywhere output entries
- Extract UID from ps.txt, psthread.txt, spindump-nosymbols.txt, and logarchive (euid)
- Add _sanitize_uid() helper to filter invalid placeholder UIDs (0xAAAAAAAA, 0xFFFFFFFF)
- Update deduplication logic to consider UID as part of uniqueness
- Same process with different UIDs now tracked as separate entries
- Use None for missing/invalid UIDs (not 0)
- All tests pass successfully

- Extract PID and PPID from sources that provide them (ps.txt, psthread.txt, spindump-nosymbols.txt)
- Extract PID only from sources without PPID (logarchive, shutdownlogs, taskinfo)
- Set both to None for sources without process ID information
- Maintain consistent data structure across all sources with pid and ppid fields

- Build PID-to-name mapping from ps.txt, psthread.txt, spindump, and taskinfo
- Resolve PPID to parent process name using the mapping
- Use direct 'parent' field from spindump when available
- Add ppname field to all output entries (None when not resolvable)
- Enriches ~22% of entries with parent process names in typical datasets
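The parent-name resolution described above can be sketched as a two-pass lookup. This is an illustration under assumptions, not the PR's implementation: the function name `add_parent_names` and the `process` field are hypothetical, while `pid`, `ppid`, `parent` and `ppname` follow the wording of the commit messages.

```python
def add_parent_names(entries):
    """Add a ppname field to every entry, resolved from a PID-to-name map."""
    # Pass 1: map every known PID to its process name.
    pid_to_name = {
        e["pid"]: e["process"] for e in entries if e.get("pid") is not None
    }
    # Pass 2: resolve each PPID; entries without a match get ppname=None.
    enriched = []
    for e in entries:
        # Prefer a parent name the source already provides (as spindump can).
        ppname = e.get("parent") or pid_to_name.get(e.get("ppid"))
        enriched.append({**e, "ppname": ppname})
    return enriched
```

A flat map like this ignores PID reuse over the lifetime of the capture, which is one reason some parents cannot be resolved reliably.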

dario-br · 2025-11-14T12:54:24Z

Hi @itayfoT, could you please also remove the changes to ps_everywhere.py? Could you also share the performance tests you ran? @cvandeplas ran some, and in those tests orjson initially did not look that performant. The more data you can share, the easier it will be for us to assess.

- Increase chunk size from 64KB to 1MB for 10-15x faster processing
- Increase subprocess buffer to 2MB for better pipe utilization
- Fix duplicate message field by extracting message before passing to Event data
- Switch to binary mode with buffered reading for reduced overhead
- Update log_stderr to handle binary mode properly
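A minimal sketch of chunked binary reading, to illustrate the kind of loop the chunk-size change affects. The function name `iter_lines` is hypothetical, and the sketch assumes newline-delimited output; it is not the parser code from this PR.

```python
import io

CHUNK_SIZE = 1024 * 1024  # 1 MiB, the chunk size named in the commit


def iter_lines(stream, chunk_size=CHUNK_SIZE):
    """Yield complete lines (bytes, no trailing newline) from a binary stream."""
    pending = b""
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            break
        pending += chunk
        # Everything before the last newline is complete; keep the remainder.
        *lines, pending = pending.split(b"\n")
        yield from lines
    if pending:  # last line without a trailing newline
        yield pending


# Small demo on an in-memory stream with a deliberately tiny chunk size.
demo_lines = list(iter_lines(io.BytesIO(b'{"a":1}\n{"b":2}\n{"c":3}'), chunk_size=5))
```

The same generator works on a subprocess pipe; `subprocess.Popen` accepts a `bufsize` argument, which is presumably where a 2MB buffer would be configured.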

…2.4x speedup

- Always use unifiedlog_iterator instead of native macOS log parser
- Use --output-format event to get Event format directly from Rust
- Let Rust write file directly (--output) for zero Python overhead
- Use 10 threads for maximum performance (320K+ lines/sec)
- Remove Python-side format conversion (no longer needed)
- Simplify generator to just parse JSON (already in Event format)

Performance improvement:

- Before: 38,774 lines/sec (Python conversion)
- After: 94,220 lines/sec (Rust direct)
- Speedup: 2.4x faster, saves 68 seconds on 4.4M entries
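The reported figures can be cross-checked with a few lines of arithmetic. The two throughput numbers come from the commit message; treating "4.4M entries" as exactly 4,400,000 is an assumption, since the exact count is not given.

```python
BEFORE_LPS = 38_774   # lines/sec, Python conversion (from the commit message)
AFTER_LPS = 94_220    # lines/sec, Rust direct (from the commit message)
ENTRIES = 4_400_000   # assumption: "4.4M entries" taken as exactly 4.4 million

speedup = AFTER_LPS / BEFORE_LPS
seconds_saved = ENTRIES / BEFORE_LPS - ENTRIES / AFTER_LPS

print(f"speedup: {speedup:.2f}x")
print(f"saved: {seconds_saved:.1f} s")
```

With these inputs the speedup comes out at about 2.43x and the saving at roughly 67 seconds, which is in line with the stated 2.4x and 68 seconds if the real entry count was slightly above 4.4 million.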

- Add BasebandMetrics_TelephonyRegistration_1_2 query for iOS 15-18
- Add BasebandMetrics_TelephonyActivity_1_2 query for iOS 15-18
- iOS 18 uses new table naming scheme (BasebandMetrics_*_1_2 vs PLBBAGENT_*)
- Enables cell tower registration and RAT activity parsing on modern iOS

feat(apollo): add iOS 15-18 support for telephony tables

…ents

- Enrich __extract_plist_mdm_data with ServerURL, CheckInURL, Topic, ServerCapabilities, IdentityCertificateUUID, UDID, and other MDM fields
- Add __extract_plist_profile_events for MCProfileEvents.plist timeline (install/remove operations with process and timestamp per profile)
- Add _build_profile_hash_map to resolve profile stub SHA-256 hashes from PayloadIdentifier/PayloadUUID, included in both MDM and profile event data for cross-referencing

Co-authored-by: Cursor <cursoragent@cursor.com>

JetsamEvent crash logs contain a full snapshot of all running processes (~750 per event). This adds crashlogs as a new source, extracting process names and PIDs from JetsamEvent entries, and procPath/procName with userID/parentPid/parentProc from non-JetsamEvent crash reports. Co-authored-by: Cursor <cursoragent@cursor.com>

The parse_ips_file method never stored the basename of the .ips file, causing ghost crash detection to miss files with collision suffixes like .000.ips that iOS appends for same-second crashes. Made-with: Cursor

Crash reports from paired devices are stored under ProxiedDevice-<hash>/ directories and should not be parsed as iPhone crashes. Made-with: Cursor
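One way to express this exclusion is a check on the directory components of each crash report path. The function name and the example paths below are hypothetical; only the `ProxiedDevice-` directory prefix comes from the commit message.

```python
from pathlib import PurePosixPath


def is_proxied_device_report(path):
    """True when any directory component starts with 'ProxiedDevice-'."""
    parts = PurePosixPath(path).parts[:-1]  # directories only, not the file name
    return any(part.startswith("ProxiedDevice-") for part in parts)


# Hypothetical paths, for illustration only.
paths = [
    "crashes_and_spins/example-2025-01-01-120000.ips",
    "crashes_and_spins/ProxiedDevice-0123abcd/example-2025-01-01-120000.ips",
]
iphone_only = [p for p in paths if not is_proxied_device_report(p)]
```

Matching on path components rather than on the whole string avoids false positives when a file name merely contains the text `ProxiedDevice-`.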

The filter 'id' not in k was stripping bundle_id from parsed messages because it contains the substring 'id'. Changed to 'table id' not in k to only filter out table primary key fields while preserving forensically critical fields like bundle_id. Made-with: Cursor
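The bug is easy to reproduce in isolation. In the sketch below the `message` dict and its `state` key are made up for illustration; `bundle_id`, `table id` and the two filter expressions are the ones quoted in the commit message.

```python
message = {"table id": 7, "bundle_id": "com.example.app", "state": "active"}

# Old filter: drops every key that contains the substring "id",
# which also removes bundle_id.
old = {k: v for k, v in message.items() if "id" not in k}

# New filter: drops only the table primary key field.
new = {k: v for k, v in message.items() if "table id" not in k}

print(old)
print(new)
```

The new filter is still a substring test, so any other key containing `table id` would be dropped as well; that appears to be the intended behavior for primary key fields.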

… scanner

- LogarchiveParser.get_log_files() now globs both system_logs.logarchive and collect_system_logs.logarchive; existing multi-folder merge logic handles dedup
- yarascan ignore_folders includes collect_system_logs.logarchive alongside system_logs.logarchive; fixed prior .pop() that would crash if folder absent

Made-with: Cursor
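Globbing for both folder names could look roughly like the sketch below. It is not the body of `LogarchiveParser.get_log_files()`; the function names `find_logarchives` and `demo` and the directory layout are assumptions, and only the two logarchive folder names come from the commit message.

```python
import glob
import os
import tempfile

LOGARCHIVE_NAMES = ("system_logs.logarchive", "collect_system_logs.logarchive")


def find_logarchives(case_dir):
    """Return every directory under case_dir named like a known logarchive."""
    found = []
    for name in LOGARCHIVE_NAMES:
        pattern = os.path.join(case_dir, "**", name)
        for path in glob.glob(pattern, recursive=True):
            if os.path.isdir(path):
                found.append(path)
    return sorted(found)


def demo():
    """Build a throwaway tree with both folder names and list what is found."""
    with tempfile.TemporaryDirectory() as case_dir:
        os.makedirs(os.path.join(case_dir, "system_logs.logarchive"))
        os.makedirs(os.path.join(case_dir, "extra", "collect_system_logs.logarchive"))
        return [os.path.relpath(p, case_dir) for p in find_logarchives(case_dir)]
```

Returning an empty list when neither folder exists lets the caller decide how to handle a capture without any logarchive, instead of failing inside the lookup.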

Log which logarchive folders are found, parse output sizes, and handle RuntimeError from unifiedlog_iterator failures to diagnose whether collect_system_logs.logarchive is being parsed or silently skipped. Made-with: Cursor

itayfoLY and others added 13 commits October 14, 2025 13:59

Merge pull request #8 from envoidshield/feat/disks-parser

d4baf06

feat: add disks.txt parser with df output support

Merge pull request #9 from envoidshield/migrate-logarchive-to-orjson

a820d60

Migrate logarchive parser from json to orjson

Update sysdiagnose testdata

712deda

Merge branch 'main' into main

23f4da6

Merge branch 'main' into feature/add-uid-to-ps-everywhere

d44decc

fix: remove trailing whitespace from blank lines

e10057c

Merge pull request #11 from envoidshield/feature/add-uid-to-ps-everyw…

1f29d8b

…here Feature/add uid to ps everywhere

Fix code style: remove extra space in message parameter

b88df4b

Merge pull request #12 from envoidshield/feature/add-uid-to-ps-everyw…

2660e2a

…here Feature/add uid to ps everywhere

dario-br mentioned this pull request Nov 13, 2025

Feature/add uid to ps everywhere #210

Open

itayfoLY and others added 15 commits November 27, 2025 18:51

Remove useless fields, use parallel processing

c393b8d

Merge pull request #13 from envoidshield/feature/ios18-telephony-tables

14c8857

feat(apollo): add iOS 15-18 support for telephony tables

Enhance MCSTATE extraction to give more metadata

f8a2818

fix(ps_everywhere): extract userID/parentPid/parentProc from crash re…

3159e09

…ports Made-with: Cursor

fix(crashlogs): store actual filename with iOS collision suffix

2782d78

The parse_ips_file method never stored the basename of the .ips file, causing ghost crash detection to miss files with collision suffixes like .000.ips that iOS appends for same-second crashes. Made-with: Cursor

fix(crashlogs): exclude ProxiedDevice (Apple Watch) crash reports

8d0d277

Crash reports from paired devices are stored under ProxiedDevice-<hash>/ directories and should not be parsed as iPhone crashes. Made-with: Cursor

EN-433: skip empty logarchive parse results to prevent merge_files crash

9490e11

Made-with: Cursor

EN-433: fix get_first_and_last_entries to skip footer lines without t…

7527d0c

…ime field Made-with: Cursor

Add diagnostic logging to logarchive parser

6f7d2b8

Log which logarchive folders are found, parse output sizes, and handle RuntimeError from unifiedlog_iterator failures to diagnose whether collect_system_logs.logarchive is being parsed or silently skipped. Made-with: Cursor

itayfoT wants to merge 29 commits into EC-DIGIT-CSIRC:main from envoidshield:main

4 participants
