|
2 | 2 |
|
3 | 3 | ## Overview |
4 | 4 |
|
5 | | -# sf-org-object-scanner |
| 5 | +The **Salesforce Objects Scanner Tool** is a Robot Framework–based automation solution that analyzes a Salesforce org's data footprint by retrieving record counts across all queryable objects. |
6 | 6 |
|
7 | | -sf-org-object-scanner: A Salesforce CLI utility to scan all queryable sObjects in an org and retrieve record counts per object. Ideal for data volume analysis, storage optimization, LDV risk identification, org health checks, migration planning, and cleanup prioritization. |
| 7 | +The architecture focuses on **safe execution, structured outputs, and scalability**, making it suitable for large Salesforce environments with hundreds to thousands of objects. |
8 | 8 |
|
9 | | -**Author:** Bhimeswara Vamsi Punnam |
| 9 | +The solution combines: |
| 10 | +- Salesforce CLI (`sf`) for metadata discovery and query execution |
| 11 | +- Robot Framework for orchestration and control flow |
| 12 | +- Process-based execution with timeout safeguards |
| 13 | +- Structured JSON outputs and Excel reporting for analysis |
10 | 14 |
|
11 | | -**Role:** Lead SDET / Automation Architect |
| 15 | +--- |
| 16 | + |
| 17 | +## Why This Architecture |
| 18 | + |
| 19 | +Salesforce does not provide a single unified way to retrieve record counts across all objects efficiently. |
| 20 | + |
| 21 | +Key challenges: |
| 22 | +- Large number of objects (standard + custom + tooling) |
| 23 | +- Some objects require filters or are not queryable |
| 24 | +- Long-running queries can block execution |
| 25 | + |
| 26 | +This architecture addresses these challenges by: |
| 27 | + |
| 28 | +- Using Salesforce CLI for consistent and authenticated query execution |
| 29 | +- Applying timeout protection to prevent long-running failures |
| 30 | +- Classifying skipped objects for transparency |
| 31 | +- Generating structured outputs for downstream analysis |
| 32 | + |
| 33 | +--- |
| 34 | + |
| 35 | +## High-Level Architecture |
| 36 | + |
| 37 | +### Architecture Breakdown |
| 38 | + |
| 39 | +The system follows a layered execution model: |
| 40 | + |
| 41 | +<p align="center"> |
| 42 | + <img src="architecture.png" width="700"> |
| 43 | +</p> |
| 44 | + |
| 45 | +--- |
| 46 | + |
| 47 | +### Control Layer (Salesforce CLI) |
| 48 | + |
| 49 | +- Uses `sf sobject list --json` to list the org's objects as scan candidates |
| 50 | +- Executes `SELECT COUNT()` queries via CLI |
| 51 | +- Handles authentication using Salesforce CLI session |
| 52 | + |
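As an illustration of the Control Layer, the sketch below builds and runs one `SELECT COUNT()` query through the CLI from Python. It is not the project's own keyword code. The helper names (`build_count_command`, `parse_count`, `count_records`) and the 60-second default are hypothetical, and it assumes the JSON payload of `sf data query --json` reports the count under `result.totalSize`.

```python
import json
import subprocess


def build_count_command(sobject, target_org=None, tooling=False):
    """Build the argv for one COUNT() query via the Salesforce CLI."""
    cmd = ["sf", "data", "query",
           "--query", f"SELECT COUNT() FROM {sobject}", "--json"]
    if target_org:
        cmd += ["--target-org", target_org]
    if tooling:
        cmd.append("--use-tooling-api")
    return cmd


def parse_count(stdout):
    """Read the count from the CLI output (assumed shape: result.totalSize)."""
    return json.loads(stdout)["result"]["totalSize"]


def count_records(sobject, timeout_s=60, **kwargs):
    """Run the query; subprocess.TimeoutExpired propagates if the CLI hangs."""
    proc = subprocess.run(build_count_command(sobject, **kwargs),
                          capture_output=True, text=True, timeout=timeout_s)
    if proc.returncode != 0:
        # Surface whatever the CLI printed so the caller can classify it.
        raise RuntimeError(proc.stdout or proc.stderr)
    return parse_count(proc.stdout)
```

Passing the command as a list avoids shell quoting problems with the SOQL string, and the `timeout` argument gives the per-query protection described in the Execution Layer.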
| 53 | +--- |
| 54 | + |
| 55 | +### Orchestration Layer (Robot Framework) |
| 56 | + |
| 57 | +- Coordinates full scan workflow |
| 58 | +- Applies filtering logic for unsupported objects |
| 59 | +- Runs each query once, without retries, so execution stays deterministic |
| 60 | +- Manages logging and reporting |
| 61 | + |
| 62 | +--- |
| 63 | + |
| 64 | +### Execution Layer |
| 65 | + |
| 66 | +- Executes queries per object |
| 67 | +- Applies per-query timeout protection |
| 68 | +- Tracks execution duration |
| 69 | +- Ensures controlled and predictable runtime |
| 70 | + |
| 71 | +--- |
| 72 | + |
| 73 | +### Output Layer |
| 74 | + |
| 75 | +- JSON artifacts: |
| 76 | + - `data.json` |
| 77 | + - `tooling.json` |
| 78 | + - `skipped.json` |
| 79 | + - `durations.json` |
| 80 | +- Excel report: |
| 81 | + - `SF_Objects_<timestamp>.xlsx` |
| 82 | + |
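A minimal sketch of how the four JSON artifacts could be written, assuming each one is a plain mapping keyed by object name. The actual schema of these files is not documented here, and `persist_outputs` is a hypothetical helper name.

```python
import json
from pathlib import Path


def persist_outputs(output_dir, data, tooling, skipped, durations):
    """Write the four JSON artifacts into output_dir and return their names."""
    out = Path(output_dir)
    out.mkdir(parents=True, exist_ok=True)
    artifacts = {
        "data.json": data,            # assumed: object name -> record count
        "tooling.json": tooling,      # assumed: Tooling API object -> count
        "skipped.json": skipped,      # assumed: object name -> skip reason
        "durations.json": durations,  # assumed: object name -> seconds
    }
    for filename, payload in artifacts.items():
        text = json.dumps(payload, indent=2, sort_keys=True)
        (out / filename).write_text(text, encoding="utf-8")
    return sorted(artifacts)
```

Sorting keys keeps the files stable between runs, which makes diffs between two scans easier to read.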
| 83 | +--- |
| 84 | + |
| 85 | +## Repository Structure |
| 86 | + |
| 87 | +``` |
| 88 | +salesforce-objects-scanner/ |
| 89 | +├── output/ # Generated JSON + Excel reports |
| 90 | +├── results/ # Robot execution logs |
| 91 | +│ ├── log.html |
| 92 | +│ ├── output.xml |
| 93 | +│ └── report.html |
| 94 | +├── src/ |
| 95 | +│ └── robot/ |
| 96 | +│ ├── libraries/ |
| 97 | +│ │ └── ExcelWriter.py |
| 98 | +│ ├── orchestrator/ |
| 99 | +│ │ └── scan.robot |
| 100 | +│ └── resources/ |
| 101 | +│ └── keywords.robot # Core logic |
| 102 | +├── .gitignore |
| 103 | +├── CODE_OF_CONDUCT.md |
| 104 | +├── CONTRIBUTING.md |
| 105 | +├── README.md |
| 106 | +├── requirements.txt |
| 107 | +└── SECURITY.md |
| 108 | +
|
| 109 | +``` |
| 110 | + |
| 111 | + |
| 112 | +--- |
| 113 | + |
| 114 | +## Folder Responsibilities |
| 115 | + |
| 116 | +- **docs/** – Architecture and design documentation |
| 117 | +- **src/robot/** – Core test suites and libraries |
| 118 | +- **output/** – Generated JSON and Excel reports |
| 119 | +- **results/** – Robot Framework logs and reports |
| 120 | +- **ci/** – CI test suites |
| 121 | + |
| 122 | +--- |
| 123 | + |
| 124 | +## Execution Model |
| 125 | + |
| 126 | +### Authentication |
| 127 | + |
| 128 | +- Managed via Salesforce CLI (`sf org login web`) |
| 129 | +- No credentials stored in code |
| 130 | +- Session-based authentication reused across commands |
| 131 | + |
| 132 | +--- |
| 133 | + |
| 134 | +### Object Discovery |
| 135 | + |
| 136 | +- Retrieve all objects via CLI |
| 137 | +- Filter: |
| 138 | + - Non-queryable objects |
| 139 | + - Unsupported types |
| 140 | + - Known noisy patterns |
| 141 | + |
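The filtering step might look like the following sketch. It assumes `sf sobject list --json` returns the object names as a list under `result`, and the suffixes in `NOISY_SUFFIXES` are hypothetical examples of "noisy patterns", not the tool's actual filter list.

```python
import json

# Hypothetical noisy patterns; the real filter list lives in the tool's keywords.
NOISY_SUFFIXES = ("__Share", "__History", "__Feed", "__ChangeEvent")


def filter_objects(stdout, noisy_suffixes=NOISY_SUFFIXES):
    """Split discovered object names into (kept, dropped) by suffix."""
    names = json.loads(stdout)["result"]  # assumed payload shape
    kept = [n for n in names if not n.endswith(noisy_suffixes)]
    dropped = [n for n in names if n.endswith(noisy_suffixes)]
    return kept, dropped
```

Returning the dropped names alongside the kept ones makes it possible to report what was filtered out instead of discarding it silently.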
| 142 | +--- |
| 143 | + |
| 144 | +### Query Execution Flow |
| 145 | + |
| 146 | +1. Discover objects |
| 147 | +2. Filter unsupported objects |
| 148 | +3. Execute `SELECT COUNT()` per object |
| 149 | +4. Apply timeout control |
| 150 | +5. Capture success or skip reason |
| 151 | +6. Track execution duration |
| 152 | +7. Persist results |
| 153 | + |
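The flow above can be sketched as a single sequential loop. This is an illustration under assumptions rather than the Robot Framework implementation: `scan` and `count_fn` are hypothetical names, and the `TIMEOUT` label is a placeholder that is not one of the documented skip categories.

```python
import subprocess
import time


def scan(objects, count_fn, clock=time.monotonic):
    """Count each object once; record a skip reason instead of stopping."""
    counts, skipped, durations = {}, {}, {}
    for name in objects:
        started = clock()
        try:
            counts[name] = count_fn(name)
        except subprocess.TimeoutExpired:
            skipped[name] = "TIMEOUT"  # placeholder label (assumption)
        except Exception as exc:
            skipped[name] = str(exc)   # raw message, classified later
        finally:
            # Duration is tracked for successes and skips alike.
            durations[name] = round(clock() - started, 3)
    return counts, skipped, durations
```

Because `count_fn` is injected, the loop can be exercised with a stub instead of a live org.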
| 154 | +--- |
| 155 | + |
| 156 | +## Failure and Handling Model |
| 157 | + |
| 158 | +- Objects that fail are classified into: |
| 159 | + - `COUNT_NOT_SUPPORTED` |
| 160 | + - `REQUIRES_WHERE` |
| 161 | + - `INVALID_TYPE` |
| 162 | +- Failed queries are not retried, which keeps runtime predictable |
| 163 | +- Failures are recorded in `skipped.json` |
| 164 | +- Execution continues without interruption |
| 165 | + |
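One way to map a CLI error message onto the three categories is a small ordered rule table, sketched below. The marker substrings are hypothetical placeholders, since real Salesforce error text varies, and `UNCLASSIFIED` is an assumed fallback that does not appear in the list above.

```python
# Ordered rules: the first category with a matching marker wins.
# The marker substrings are illustrative assumptions, not verified error text.
SKIP_RULES = (
    ("REQUIRES_WHERE", ("requires a filter", "must specify a where")),
    ("COUNT_NOT_SUPPORTED", ("count() not supported", "does not support count")),
    ("INVALID_TYPE", ("invalid_type", "is not supported")),
)


def classify(message):
    """Return the skip category for an error message (case-insensitive)."""
    lowered = message.lower()
    for category, markers in SKIP_RULES:
        if any(marker in lowered for marker in markers):
            return category
    return "UNCLASSIFIED"  # assumed fallback, not a documented category
```

Keeping the rules as data means a new category only needs a new tuple, with no change to the matching logic.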
| 166 | +--- |
| 167 | + |
| 168 | +## Security Architecture |
| 169 | + |
| 170 | +- Authentication delegated to Salesforce CLI |
| 171 | +- No credentials stored in repository |
| 172 | +- Uses existing authenticated sessions |
| 173 | +- Sensitive files excluded via `.gitignore` |
| 174 | + |
| 175 | +--- |
| 176 | + |
| 177 | +## Runtime vs Source Separation |
| 178 | + |
| 179 | +| Category | Location | Notes | |
| 180 | +|----------------|-------------|--------------------------------| |
| 181 | +| Source code | `src/robot/`| Version-controlled | |
| 182 | +| Outputs | `output/` | Generated at runtime | |
| 183 | +| Reports | `results/` | Execution logs | |
| 184 | +| CI tests | `ci/` | Smoke test automation | |
| 185 | +| Documentation | `docs/` | Architecture & design | |
| 186 | + |
| 187 | +--- |
| 188 | + |
| 189 | +## Design Principles |
| 190 | + |
| 191 | +- Deterministic execution (no retries) |
| 192 | +- Timeout-controlled processing |
| 193 | +- Clear separation of concerns |
| 194 | +- Structured and traceable outputs |
| 195 | +- Scalable for large orgs |
| 196 | +- CLI-based authentication (no secrets in code) |
| 197 | +- Compatible with CI/CD pipelines and headless execution |
| 198 | + |
| 199 | +--- |
| 200 | + |
| 201 | +## Scalability Considerations |
| 202 | + |
| 203 | +- Handles hundreds to thousands of objects |
| 204 | +- Sequential execution ensures stability |
| 205 | +- Future-ready for parallel execution (Pabot) |
| 206 | +- Performance depends on: |
| 207 | + - Org size |
| 208 | + - Network latency |
| 209 | + - Query response time |
| 210 | + |
| 211 | +--- |
| 212 | + |
| 213 | +## Extensibility |
| 214 | + |
| 215 | +The framework can be extended with: |
| 216 | + |
| 217 | +- Additional filters for object classification |
| 218 | +- Parallel execution support (Pabot) |
| 219 | +- Custom analytics on output data |
| 220 | +- Integration with dashboards or databases |
| 221 | + |
| 222 | +--- |
| 223 | + |
| 224 | +## Observability and Monitoring |
| 225 | + |
| 226 | +- Robot Framework HTML reports (`log.html`, `report.html`) |
| 227 | +- JSON outputs for structured analysis |
| 228 | +- Execution duration tracking per object |
| 229 | +- Clear success vs skip visibility |
| 230 | + |
| 231 | +--- |
| 232 | + |
| 233 | +## Deployment Model |
| 234 | + |
| 235 | +- Local developer environments |
| 236 | +- CI/CD pipelines (GitHub Actions, Jenkins) |
| 237 | +- Headless execution environments |
| 238 | +- Containerized environments (future scope) |
| 239 | + |
| 240 | +--- |
| 241 | + |
| 242 | +**Author:** Bhimeswara Vamsi Punnam |
| 243 | +**Role:** Lead Software Development Engineer in Test (SDET) |