Open
- Replace json module with orjson for improved performance
- Update all json.loads() calls to orjson.loads()
- Update all json.dumps() calls to orjson.dumps()
- Change file I/O from text mode to binary mode (required for orjson)
- Update exception handling from json.decoder.JSONDecodeError to orjson.JSONDecodeError
- Update test file to use orjson for consistency
- All existing tests pass successfully
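The migration pattern this commit describes can be sketched roughly as follows. This is an illustration, not the PR's code: the helper names `dump_record` and `load_record` are hypothetical, and the fallback to the standard `json` module is an assumption added only so the sketch still runs where orjson is not installed.

```python
import json

try:
    import orjson  # third-party package (pip install orjson)
except ImportError:
    orjson = None  # assumption: fall back to stdlib json so the sketch still runs


def dump_record(path, record):
    """Write one record as JSON. The file is opened in binary mode."""
    if orjson is not None:
        payload = orjson.dumps(record)  # orjson.dumps returns bytes, not str
    else:
        payload = json.dumps(record).encode("utf-8")
    with open(path, "wb") as f:
        f.write(payload)


def load_record(path):
    """Read one JSON record; return None when the content is not valid JSON."""
    with open(path, "rb") as f:
        raw = f.read()
    try:
        # Both orjson.loads and json.loads accept bytes input.
        return orjson.loads(raw) if orjson is not None else json.loads(raw)
    except ValueError:
        # orjson.JSONDecodeError and json.JSONDecodeError both subclass ValueError.
        return None
```

Because `orjson.dumps()` returns `bytes`, any code that previously wrote the result to a text-mode file handle has to switch to `"wb"`, which is why the commit changes file I/O together with the calls themselves.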
feat: add disks.txt parser with df output support
Migrate logarchive parser from json to orjson
- Add UID field to all ps_everywhere output entries
- Extract UID from ps.txt, psthread.txt, spindump-nosymbols.txt, and logarchive (euid)
- Add _sanitize_uid() helper to filter invalid placeholder UIDs (0xAAAAAAAA, 0xFFFFFFFF)
- Update deduplication logic to consider UID as part of uniqueness
- Same process with different UIDs now tracked as separate entries
- Use None for missing/invalid UIDs (not 0)
- All tests pass successfully
…here Feature/add uid to ps everywhere
- Extract PID and PPID from sources that provide them (ps.txt, psthread.txt, spindump-nosymbols.txt)
- Extract PID only from sources without PPID (logarchive, shutdownlogs, taskinfo)
- Set both to None for sources without process ID information
- Maintain consistent data structure across all sources with pid and ppid fields
- Build PID-to-name mapping from ps.txt, psthread.txt, spindump, and taskinfo
- Resolve PPID to parent process name using the mapping
- Use direct 'parent' field from spindump when available
- Add ppname field to all output entries (None when not resolvable)
- Enriches ~22% of entries with parent process names in typical datasets
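The parent-name resolution described in this commit could be sketched as below. The field names `pid`, `ppid`, `parent`, and `ppname` follow the commit message; the `process` key and both function names are hypothetical, and the real parser may resolve conflicts differently.

```python
def build_pid_map(entries):
    """Map each known PID to the first process name seen for it."""
    pid_to_name = {}
    for entry in entries:
        pid = entry.get("pid")
        name = entry.get("process")
        if pid is not None and name:
            pid_to_name.setdefault(pid, name)
    return pid_to_name


def add_ppname(entries, pid_to_name):
    """Set 'ppname' on every entry; None when the parent cannot be resolved."""
    for entry in entries:
        # Prefer the explicit 'parent' field, otherwise look the PPID up.
        entry["ppname"] = entry.get("parent") or pid_to_name.get(entry.get("ppid"))
    return entries
```

One caveat of any such mapping is that PIDs can be reused over the lifetime of a device, so a resolved `ppname` is best treated as a hint rather than a guaranteed parent.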
…here Feature/add uid to ps everywhere
Contributor
Hi @itayfoT, could you please also remove the changes to the file ps_everywhere.py? Could you also share the performance tests you ran? @cvandeplas ran some as well, and initially orjson did not look that performant. The more data you can share, the better we can assess the change.
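One way to produce comparable numbers for this request is a small micro-benchmark along these lines. It is only a sketch: the `benchmark` function and the sample line are hypothetical, the orjson import is optional so the script also runs without it, and a meaningful comparison would use real logarchive output rather than a synthetic line.

```python
import json
import timeit

try:
    import orjson  # third-party package (pip install orjson)
except ImportError:
    orjson = None  # the benchmark then reports the json baseline only


def benchmark(sample_line, repeats=20_000):
    """Return parsed lines per second for each available JSON parser."""
    timings = {
        "json": timeit.timeit(lambda: json.loads(sample_line), number=repeats)
    }
    if orjson is not None:
        timings["orjson"] = timeit.timeit(
            lambda: orjson.loads(sample_line), number=repeats
        )
    # Guard against a zero elapsed time on very small runs.
    return {name: repeats / max(secs, 1e-9) for name, secs in timings.items()}


if __name__ == "__main__":
    line = b'{"timestamp": 1700000000, "pid": 42, "message": "example"}'
    for name, rate in benchmark(line).items():
        print(f"{name}: {rate:,.0f} lines/sec")
```

Parsing speed on a single short line may not reflect end-to-end behaviour, since file I/O and any per-entry conversion can dominate; timing the full parser run on the same input with both libraries would give the more relevant figure.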
- Increase chunk size from 64KB to 1MB for 10-15x faster processing
- Increase subprocess buffer to 2MB for better pipe utilization
- Fix duplicate message field by extracting message before passing to Event data
- Switch to binary mode with buffered reading for reduced overhead
- Update log_stderr to handle binary mode properly
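Chunked reading in binary mode, as this commit describes, might look roughly like the following. The 1 MB chunk size comes from the commit message; the `iter_lines` function is a hypothetical illustration and not the parser's actual code.

```python
import io

CHUNK_SIZE = 1024 * 1024  # 1 MB, the chunk size named in the commit message


def iter_lines(stream, chunk_size=CHUNK_SIZE):
    """Yield complete lines (without the newline) from a binary stream."""
    pending = b""
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            break
        pending += chunk
        # Everything before the last newline is complete; keep the remainder.
        *lines, pending = pending.split(b"\n")
        yield from lines
    if pending:
        yield pending  # final line without a trailing newline


if __name__ == "__main__":
    sample = io.BytesIO(b'{"a": 1}\n{"b": 2}\n')
    for line in iter_lines(sample):
        print(line)
```

Larger chunks reduce the number of read calls per megabyte of output, which is the likely source of the speedup, at the cost of holding more data in memory at once.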
…2.4x speedup
- Always use unifiedlog_iterator instead of native macOS log parser
- Use --output-format event to get Event format directly from Rust
- Let Rust write file directly (--output) for zero Python overhead
- Use 10 threads for maximum performance (320K+ lines/sec)
- Remove Python-side format conversion (no longer needed)
- Simplify generator to just parse JSON (already in Event format)

Performance improvement:
- Before: 38,774 lines/sec (Python conversion)
- After: 94,220 lines/sec (Rust direct)
- Speedup: 2.4x faster, saves 68 seconds on 4.4M entries
- Add BasebandMetrics_TelephonyRegistration_1_2 query for iOS 15-18
- Add BasebandMetrics_TelephonyActivity_1_2 query for iOS 15-18
- iOS 18 uses new table naming scheme (BasebandMetrics_*_1_2 vs PLBBAGENT_*)
- Enables cell tower registration and RAT activity parsing on modern iOS
feat(apollo): add iOS 15-18 support for telephony tables
…ents
- Enrich __extract_plist_mdm_data with ServerURL, CheckInURL, Topic, ServerCapabilities, IdentityCertificateUUID, UDID, and other MDM fields
- Add __extract_plist_profile_events for MCProfileEvents.plist timeline (install/remove operations with process and timestamp per profile)
- Add _build_profile_hash_map to resolve profile stub SHA-256 hashes from PayloadIdentifier/PayloadUUID, included in both MDM and profile event data for cross-referencing

Co-authored-by: Cursor <cursoragent@cursor.com>
JetsamEvent crash logs contain a full snapshot of all running processes (~750 per event). This adds crashlogs as a new source, extracting process names and PIDs from JetsamEvent entries, and procPath/procName with userID/parentPid/parentProc from non-JetsamEvent crash reports. Co-authored-by: Cursor <cursoragent@cursor.com>
…ports Made-with: Cursor
The parse_ips_file method never stored the basename of the .ips file, causing ghost crash detection to miss files with collision suffixes like .000.ips that iOS appends for same-second crashes. Made-with: Cursor
Crash reports from paired devices are stored under ProxiedDevice-<hash>/ directories and should not be parsed as iPhone crashes. Made-with: Cursor
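The exclusion described here can be sketched as a simple path check. The `ProxiedDevice-` prefix comes from the commit message; the function name and the example paths are hypothetical.

```python
from pathlib import PurePosixPath


def is_proxied_device_path(path):
    """Return True when any path component is a ProxiedDevice-<hash> directory."""
    return any(
        part.startswith("ProxiedDevice-") for part in PurePosixPath(path).parts
    )


if __name__ == "__main__":
    paths = [
        "crashes_and_spins/app-2024-01-01-120000.ips",
        "crashes_and_spins/ProxiedDevice-0123abcd/app-2024-01-01-120000.ips",
    ]
    # Keep only crash reports that belong to the device itself.
    print([p for p in paths if not is_proxied_device_path(p)])
```

Checking path components rather than the full string avoids false positives on file names that merely contain the text somewhere in the middle.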
The filter 'id' not in k was stripping bundle_id from parsed messages because it contains the substring 'id'. Changed to 'table id' not in k to only filter out table primary key fields while preserving forensically critical fields like bundle_id. Made-with: Cursor
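The substring problem is easy to reproduce. In this sketch the two filter expressions are the ones quoted in the commit message, while the function names and the sample message keys are made up for illustration.

```python
def drop_ids_buggy(message):
    """Old filter: drops every key containing the substring 'id'."""
    return {k: v for k, v in message.items() if "id" not in k}


def drop_ids_fixed(message):
    """New filter: drops only keys containing 'table id'."""
    return {k: v for k, v in message.items() if "table id" not in k}


if __name__ == "__main__":
    message = {"table id": 7, "bundle_id": "com.example.app", "timestamp": 1}
    print(drop_ids_buggy(message))  # bundle_id is lost along with 'table id'
    print(drop_ids_fixed(message))  # only 'table id' is removed
```

Any other key that happens to contain `id` would have been dropped by the old filter in the same way, which is why the narrower match is safer.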
… scanner
- LogarchiveParser.get_log_files() now globs both system_logs.logarchive and collect_system_logs.logarchive; existing multi-folder merge logic handles dedup
- yarascan ignore_folders includes collect_system_logs.logarchive alongside system_logs.logarchive; fixed prior .pop() that would crash if folder absent

Made-with: Cursor
…ime field Made-with: Cursor
Log which logarchive folders are found, parse output sizes, and handle RuntimeError from unifiedlog_iterator failures to diagnose whether collect_system_logs.logarchive is being parsed or silently skipped. Made-with: Cursor
Hey,
To improve performance, we switched the JSON handling to orjson.