10 changes: 0 additions & 10 deletions .gitignore
@@ -15,24 +15,18 @@ output.xml
log.html
report.html
pabot_results/
*.png
*diagnosis.json
.robotframework/

# Salesforce CLI
.sf/
.sfdx/
org_info.json
deploy-options.json
*.auth.json

# Downloads & temp
downloads/*
!downloads/.gitkeep
output/*
!output/.gitkeep
chromedriver.exe
chromedriver

.pabotsuitenames

@@ -54,10 +48,6 @@ Desktop.ini
*.swp
*.swo

# Generated docs/files from runs (if created at root)
CDL_DOC
CV_DOC
smoke_doc

# UUID-like run folders that may end up at repo root
????????????????????????????????
2 changes: 1 addition & 1 deletion CONTRIBUTING.md
@@ -44,7 +44,7 @@ sf org login web
Before submitting changes, ensure all tests pass:

```bash
robot src/robot/tests/
robot src/robot/orchestrator/
```

---
18 changes: 11 additions & 7 deletions README.md
@@ -79,6 +79,10 @@ Native Salesforce tools have limitations:

This design ensures predictable, scalable, and observable execution.

<p align="center">
<img src="docs/architecture.png" width="700">
</p>

## Technology Stack

- Robot Framework
@@ -121,7 +125,7 @@ pip install -r requirements.txt
```
2. Run:
```bash
robot -d results --variable ORG_ALIAS:<org_name> src/robot/tests/Test.robot
robot -d results --variable ORG_ALIAS:<org_name> src/robot/orchestrator/scan.robot
```
3. Check outputs:
```text
@@ -134,20 +138,20 @@ pip install -r requirements.txt

```
salesforce-objects-scanner/
├── output/ # Generated JSON + Excel reports
├── results/ # Robot execution logs
├── output/ # Generated JSON + Excel reports
├── results/ # Robot execution logs
│ ├── log.html
│ ├── output.xml
│ └── report.html
├── src/
│ └── robot/
│ ├── libraries/
│ │ └── ExcelWriter.py
│ └── tests/
│ └── Test.robot
├── Support.robot # Core logic
│ ├── orchestrator/
│ │ └── scan.robot
│ └── resources/
│ └── keywords.robot # Core logic
├── .gitignore
├── .pabotsuitenames # Pabot suite cache file
├── CODE_OF_CONDUCT.md
├── CONTRIBUTING.md
├── README.md
2 changes: 1 addition & 1 deletion ci/robot/smoke.robot
@@ -4,7 +4,7 @@ Library    OperatingSystem
Library    Collections
Library    BuiltIn
Library    String
Resource    ../../src/robot/tests/Support.robot
Resource    ../../src/robot/resources/keywords.robot

*** Keywords ***
Is Windows
240 changes: 236 additions & 4 deletions docs/architecture.md
@@ -2,10 +2,242 @@

## Overview

# sf-org-object-scanner
The **Salesforce Objects Scanner Tool** is a Robot Framework–based automation solution that analyzes a Salesforce org's data footprint by retrieving record counts across all queryable objects.

sf-org-object-scanner: A Salesforce CLI utility to scan all queryable sObjects in an org and retrieve record counts per object. Ideal for data volume analysis, storage optimization, LDV risk identification, org health checks, migration planning, and cleanup prioritization.
The architecture focuses on **safe execution, structured outputs, and scalability**, making it suitable for large Salesforce environments with hundreds to thousands of objects.

**Author:** Bhimeswara Vamsi Punnam
The solution combines:
- Salesforce CLI (`sf`) for metadata discovery and query execution
- Robot Framework for orchestration and control flow
- Process-based execution with timeout safeguards
- Structured JSON outputs and Excel reporting for analysis

**Role:** Lead SDET / Automation Architect
---

## Why This Architecture

Salesforce does not provide a single unified way to retrieve record counts across all objects efficiently.

Key challenges:
- Large number of objects (standard + custom + tooling)
- Some objects require filters or are not queryable
- Long-running queries can block execution

This architecture addresses these challenges by:

- Using Salesforce CLI for consistent and authenticated query execution
- Applying timeout protection to prevent long-running failures
- Classifying skipped objects for transparency
- Generating structured outputs for downstream analysis

---

## High-Level Architecture

### Architecture Breakdown

The system follows a layered execution model:

<p align="center">
<img src="architecture.png" width="700">
</p>

---

### Control Layer (Salesforce CLI)

- Uses `sf sobject list --json` to discover queryable objects
- Executes `SELECT COUNT()` queries via CLI
- Handles authentication using Salesforce CLI session
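
The two CLI interactions listed above can be sketched in Python for illustration (the project itself drives them from Robot Framework). This is a minimal sketch, not the project's implementation: the helper names `build_count_command` and `parse_total_size` are hypothetical, and the sample payload assumes the CLI's `--json` output reports the count under `result.totalSize`.

```python
import json


def build_count_command(sobject: str, org_alias: str, tooling: bool = False) -> list[str]:
    """Build the argv for a COUNT() query (hypothetical helper)."""
    cmd = [
        "sf", "data", "query",
        "--query", f"SELECT COUNT() FROM {sobject}",
        "--target-org", org_alias,
        "--json",
    ]
    if tooling:
        # Tooling API objects are queried through a separate endpoint.
        cmd.append("--use-tooling-api")
    return cmd


def parse_total_size(stdout: str) -> int:
    """Extract the record count, assuming it is reported as result.totalSize."""
    payload = json.loads(stdout)
    return int(payload["result"]["totalSize"])


# Illustrative payload only; not captured from a real org.
sample = '{"status": 0, "result": {"records": [], "totalSize": 42, "done": true}}'
print(build_count_command("Account", "my-org"))
print(parse_total_size(sample))
```

Keeping command construction separate from output parsing means the parsing logic can be exercised without an authenticated org.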

---

### Orchestration Layer (Robot Framework)

- Coordinates full scan workflow
- Applies filtering logic for unsupported objects
- Keeps execution deterministic (no automatic retries)
- Manages logging and reporting

---

### Execution Layer

- Executes queries per object
- Applies per-query timeout protection
- Tracks execution duration
- Ensures controlled and predictable runtime
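
Per-query timeout protection and duration tracking might look roughly like the following Python sketch. It assumes each query runs as a separate process; `run_with_timeout` is a hypothetical helper, and the stand-in commands replace real `sf` calls so the example runs without a Salesforce org.

```python
import subprocess
import sys
import time


def run_with_timeout(cmd: list[str], timeout_s: float) -> dict:
    """Run one command and report its status and wall-clock duration."""
    start = time.perf_counter()
    try:
        proc = subprocess.run(cmd, capture_output=True, text=True, timeout=timeout_s)
        status = "ok" if proc.returncode == 0 else "failed"
        stdout = proc.stdout
    except subprocess.TimeoutExpired:
        # subprocess.run kills and reaps the child before re-raising.
        status, stdout = "timeout", ""
    return {"status": status, "stdout": stdout, "duration_s": time.perf_counter() - start}


# Stand-in commands (assumption): a fast one and one that exceeds its budget.
fast = run_with_timeout([sys.executable, "-c", "print('done')"], timeout_s=10)
slow = run_with_timeout([sys.executable, "-c", "import time; time.sleep(5)"], timeout_s=0.5)
print(fast["status"], slow["status"])
```

A timed-out query is reported as a status rather than raised, which lets the caller record it and move on to the next object.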

---

### Output Layer

- JSON artifacts:
  - `data.json`
  - `tooling.json`
  - `skipped.json`
  - `durations.json`
- Excel report:
  - `SF_Objects_<timestamp>.xlsx`

---

## Repository Structure

```
salesforce-objects-scanner/
├── output/                      # Generated JSON + Excel reports
├── results/                     # Robot execution logs
│   ├── log.html
│   ├── output.xml
│   └── report.html
├── src/
│   └── robot/
│       ├── libraries/
│       │   └── ExcelWriter.py
│       ├── orchestrator/
│       │   └── scan.robot
│       └── resources/
│           └── keywords.robot   # Core logic
├── .gitignore
├── CODE_OF_CONDUCT.md
├── CONTRIBUTING.md
├── README.md
├── requirements.txt
└── SECURITY.md

```


---

## Folder Responsibilities

- **docs/** – Architecture and design documentation
- **src/robot/** – Core test suites and libraries
- **output/** – Generated JSON and Excel reports
- **results/** – Robot Framework logs and reports
- **ci/** – CI test suites

---

## Execution Model

### Authentication

- Managed via Salesforce CLI (`sf org login web`)
- No credentials stored in code
- Session-based authentication reused across commands

---

### Object Discovery

- Retrieve all objects via CLI
- Filter:
  - Non-queryable objects
  - Unsupported types
  - Known noisy patterns
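
A filtering step of this kind could be sketched as below. Everything specific here is an assumption for illustration: the input shape (dicts with `name` and `queryable` keys), the `NOISY_PATTERNS` list, and the `filter_objects` helper are hypothetical and do not reflect the project's actual exclusion rules.

```python
import fnmatch

# Hypothetical patterns; the project's real exclusion list is not shown here.
NOISY_PATTERNS = ["*__Feed", "*__History", "*ChangeEvent"]


def filter_objects(objects: list[dict]) -> tuple[list[str], list[str]]:
    """Split discovered objects into (to_query, filtered_out) name lists."""
    keep, dropped = [], []
    for obj in objects:
        name = obj["name"]
        noisy = any(fnmatch.fnmatchcase(name, pattern) for pattern in NOISY_PATTERNS)
        if obj.get("queryable", False) and not noisy:
            keep.append(name)
        else:
            dropped.append(name)
    return keep, dropped


# Assumed input shape, e.g. derived from describe metadata.
discovered = [
    {"name": "Account", "queryable": True},
    {"name": "Vote", "queryable": False},
    {"name": "Invoice__History", "queryable": True},
    {"name": "AccountChangeEvent", "queryable": True},
]
to_query, filtered_out = filter_objects(discovered)
print(to_query)
print(filtered_out)
```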

---

### Query Execution Flow

1. Discover objects
2. Filter unsupported objects
3. Execute `SELECT COUNT()` per object
4. Apply timeout control
5. Capture success or skip reason
6. Track execution duration
7. Persist results
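
Steps 3 to 7 of the flow above can be sketched as a single loop. This is a minimal Python sketch under stated assumptions: `scan` and `fake_count` are hypothetical names, timeout handling is assumed to live inside the injected `count_fn`, and the JSON file shapes (object name mapped to a value) are illustrative rather than the project's actual schema.

```python
import json
import tempfile
import time
from pathlib import Path
from typing import Callable, Iterable


def scan(objects: Iterable[str], count_fn: Callable[[str], int], out_dir: Path) -> dict:
    """Count each object, record skips and durations, then persist the results."""
    counts, skipped, durations = {}, {}, {}
    for name in objects:
        start = time.perf_counter()
        try:
            counts[name] = count_fn(name)
        except Exception as exc:
            # Record the reason and continue with the next object.
            skipped[name] = str(exc)
        durations[name] = round(time.perf_counter() - start, 3)
    out_dir.mkdir(parents=True, exist_ok=True)
    artifacts = {"data.json": counts, "skipped.json": skipped, "durations.json": durations}
    for filename, data in artifacts.items():
        (out_dir / filename).write_text(json.dumps(data, indent=2), encoding="utf-8")
    return {"counts": counts, "skipped": skipped, "durations": durations}


def fake_count(name: str) -> int:
    """Stand-in for the CLI call so the sketch runs offline."""
    if name == "DataStatistics":
        raise RuntimeError("REQUIRES_WHERE")
    return {"Account": 120, "Contact": 45}[name]


with tempfile.TemporaryDirectory() as tmp:
    summary = scan(["Account", "Contact", "DataStatistics"], fake_count, Path(tmp))
    persisted = json.loads((Path(tmp) / "skipped.json").read_text(encoding="utf-8"))

print(summary["counts"])
print(persisted)
```

Injecting `count_fn` keeps the loop independent of how a count is obtained, so the same control flow can be tested with a stub.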

---

## Failure and Handling Model

- Objects that fail are classified into:
  - `COUNT_NOT_SUPPORTED`
  - `REQUIRES_WHERE`
  - `INVALID_TYPE`
- No automatic retries, so runtime stays predictable
- Failures are recorded in `skipped.json`
- Execution continues without interruption
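
The classification could be sketched as follows. The two known-object sets mirror the `@{COUNT_NOT_SUPPORTED_OBJECTS}` and `@{REQUIRES_WHERE_OBJECTS}` variables in the keywords file; the `classify_skip` helper, the check on the error text, and the `UNCLASSIFIED` fallback are assumptions made for illustration.

```python
# Known objects, mirroring the variables declared in the keywords file.
COUNT_NOT_SUPPORTED_OBJECTS = {"DataEncryptionKey"}
REQUIRES_WHERE_OBJECTS = {"DataStatistics"}


def classify_skip(sobject: str, error_message: str = "") -> str:
    """Return a skip category for an object that could not be counted."""
    if sobject in COUNT_NOT_SUPPORTED_OBJECTS:
        return "COUNT_NOT_SUPPORTED"
    if sobject in REQUIRES_WHERE_OBJECTS:
        return "REQUIRES_WHERE"
    # Assumption: the CLI error text contains the Salesforce error code.
    if "INVALID_TYPE" in error_message:
        return "INVALID_TYPE"
    # Hypothetical fallback; not one of the documented categories.
    return "UNCLASSIFIED"


print(classify_skip("DataEncryptionKey"))
print(classify_skip("Foo__x", "INVALID_TYPE: sObject type 'Foo__x' is not supported."))
```

Checking the known lists first avoids issuing a query whose outcome is already known.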

---

## Security Architecture

- Authentication delegated to Salesforce CLI
- No credentials stored in repository
- Uses existing authenticated sessions
- Sensitive files excluded via `.gitignore`

---

## Runtime vs Source Separation

| Category | Location | Notes |
|----------------|-------------|--------------------------------|
| Source code | `src/robot/`| Version-controlled |
| Outputs | `output/` | Generated at runtime |
| Reports | `results/` | Execution logs |
| CI tests | `ci/` | Smoke test automation |
| Documentation | `docs/` | Architecture & design |

---

## Design Principles

- Deterministic execution (no retries)
- Timeout-controlled processing
- Clear separation of concerns
- Structured and traceable outputs
- Scalable for large orgs
- CLI-based authentication (no secrets in code)
- CI/CD compatible and headless execution ready

---

## Scalability Considerations

- Handles hundreds to thousands of objects
- Sequential execution ensures stability
- Future-ready for parallel execution (Pabot)
- Performance depends on:
  - Org size
  - Network latency
  - Query response time

---

## Extensibility

The framework can be extended with:

- Additional filters for object classification
- Parallel execution support (Pabot)
- Custom analytics on output data
- Integration with dashboards or databases

---

## Observability and Monitoring

- Robot Framework HTML reports (`log.html`, `report.html`)
- JSON outputs for structured analysis
- Execution duration tracking per object
- Clear success vs skip visibility

---

## Deployment Model

- Local developer environments
- CI/CD pipelines (GitHub Actions, Jenkins)
- Headless execution environments
- Containerized environments (future scope)

---

**Author:** Bhimeswara Vamsi Punnam
**Role:** Lead Software Development Engineer in Test (SDET)
Binary file added docs/architecture.png
Original file line number Diff line number Diff line change
@@ -1,6 +1,7 @@
*** Settings ***
Documentation    Retrieve and log record counts for Salesforce objects from Excel list
Resource    Support.robot
Resource    ../resources/keywords.robot
Suite Teardown    Cleanup Suite

*** Test Cases ***
Object_Scanner
Original file line number Diff line number Diff line change
@@ -45,6 +45,7 @@ ${DISCOVER_TOOLING_OBJECTS}    ${TRUE}
@{COUNT_NOT_SUPPORTED_OBJECTS}    DataEncryptionKey
# Known objects that require additional WHERE clause (avoid wasting time; classify clearly)
@{REQUIRES_WHERE_OBJECTS}    DataStatistics
@{TEMP_FILES}    PIPE    log.html    output.xml    report.html

*** Keywords ***
Check Prerequisites
@@ -523,3 +524,22 @@ Save Results To Excel
    Log To Console    ${result.stdout}
    Log To Console    ${result.stderr}
    Should Be Equal As Integers    ${result.rc}    0    Excel generation failed:\n${result.stdout}\n${result.stderr}

Cleanup Runtime Artifacts
    [Documentation]    Cleans Pabot temp files, Excel handles, and process artifacts from project root.
    ${items}=    List Directory    ${EXECDIR}
    FOR    ${item}    IN    @{items}
        ${full_path}=    Set Variable    ${EXECDIR}${/}${item}
        ${is_uuid}=    Evaluate    len($item) == 32 and all(c in "0123456789abcdef" for c in $item)
        ${is_known_temp}=    Evaluate    $item in $TEMP_FILES
        IF    ${is_uuid} or ${is_known_temp}
            ${is_file}=    Run Keyword And Return Status    File Should Exist    ${full_path}
            IF    ${is_file}
                Log    Removing temp file: ${item}
                Remove File    ${full_path}
            END
        END
    END

Cleanup Suite
    Cleanup Runtime Artifacts
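
For readers checking which names the cleanup keyword matches, the two `Evaluate` expressions translate to plain Python roughly as below. This is an illustrative restatement of the predicate only: the function names are hypothetical, and no files are touched.

```python
# Same names as the @{TEMP_FILES} variable in the keywords file.
TEMP_FILES = {"PIPE", "log.html", "output.xml", "report.html"}
HEX_LOWER = set("0123456789abcdef")


def is_uuid_like(name: str) -> bool:
    """True for exactly 32 lowercase hexadecimal characters."""
    return len(name) == 32 and all(c in HEX_LOWER for c in name)


def is_cleanup_candidate(name: str) -> bool:
    """Mirror of the keyword's condition: UUID-like name or known temp file."""
    return is_uuid_like(name) or name in TEMP_FILES


for name in ["0123456789abcdef0123456789abcdef", "A" * 32, "log.html", "README.md"]:
    print(name, is_cleanup_candidate(name))
```

Uppercase hexadecimal names do not match. Any `log.html`, `output.xml`, or `report.html` sitting directly in the execution directory does match, so the keyword presumably relies on runs writing their results to a separate directory, as the README command does with `-d results`.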