A Ktor-based REST API to validate if accounting datasets follow Benford’s Law statistically.
Accounting Data:
When we send accounts data we have to rearrange the data from accounts according to the string format provided below.
Result:
The result consist of following information,
observedDistribution - observed distribution of the digits and the percentage of occurrence in the dataset
expectedDistribution - expected distribution of the digits if the data is in compliance with the Benford's Law
chiSquareStatistic
criticalValue
isBenfordCompliant - true/false
To build or run the project, use one of the following tasks:
Task | Description |
---|---|
./gradlew test |
Run the tests |
./gradlew build |
Build everything |
./gradlew run |
Run the server |
curl -X POST http://0.0.0.0:8080/api/v1/analysis/benfords
-H "Content-Type: application/json"
-d '{ "data": "<Accounting Data>", "significanceLevel": <Significance Level>}'
The string consists of key value pairs. Key and value separated by a colon. Each key value pair is separated by a semicolon and a space.
<key>:<value>; <key>:<value>;
Example: "inv_004:1123.20; exp_utilities:1560.00; refund_01:115.75; deposit_01:1980.00;
This dataset is compliance with the Benford's law.
Request:
curl -H "Content-Type: application/json" -X POST http://0.0.0.0:8080/api/v1/analysis/benfords -d '{ "data": "inv_001:1230.00; inv_002:1050.00; inv_003:1875.50; inv_004:1123.20; exp_utilities:1560.00; refund_01:115.75; deposit_01:1980.00; inv_005:2267.50; exp_software:2450.00; salary_01:2890.00; tax_vat:235.60; inv_006:3199.99; payment_03:3214.00; consulting:3250.00; exp_marketing:450.00; fee_bank:415.20; exp_office:567.89; adjustment_01:523.45; exp_travel:632.00; misc_01:678.10; payment_02:721.15; refund_02:875.50; insurance:912.00", "significanceLevel": 0.05}'
Response:
{
"observedDistribution":{
"1":30,
"2":17,
"3":13,
"4":8,
"5":8,
"6":8,
"7":4,
"8":4,
"9":4
},
"expectedDistribution":{
"1":30,
"2":17,
"3":12,
"4":9,
"5":7,
"6":6,
"7":5,
"8":5,
"9":4
},
"chiSquareStatistic":0.3,
"criticalValue":15.51,
"isBenfordCompliant":true
}
This dataset is not compliance with the Benford's law.
Request:
curl -H "Content-Type: application/json" -X POST http://0.0.0.0:8080/api/v1/analysis/benfords -d '{ "data": "inv_001:9230.00; inv_002:9050.00; inv_003:9875.50; inv_004:1923.20; exp_utilities:9560.00; refund_01:915.75; deposit_01:9980.00; inv_005:2267.50; exp_software:2450.00; salary_01:2890.00; tax_vat:235.60; inv_006:3199.99; payment_03:3214.00; consulting:3250.00; exp_marketing:450.00; fee_bank:415.20; exp_office:567.89; adjustment_01:523.45; exp_travel:632.00; misc_01:678.10; payment_02:721.15; refund_02:875.50; insurance:912.00", "significanceLevel": 0.05}'
Response:
{
"observedDistribution":{
"1":4,
"2":17,
"3":13,
"4":8,
"5":8,
"6":8,
"7":4,
"8":4,
"9":30
},
"expectedDistribution":{
"1":30,
"2":17,
"3":12,
"4":9,
"5":7,
"6":6,
"7":5,
"8":5,
"9":4
},
"chiSquareStatistic":38.97,
"criticalValue":15.51,
"isBenfordCompliant":false
}
Large datasets available in src/test/resources folder
curl -H "Content-Type: application/json" -X POST http://0.0.0.0:8080/api/v1/analysis/benfords -d@src/test/resources/1000_dataset.json
Input Validation
- Checks for
key:value;
format in the input string. - Validates numeric values and significance level range (0.01 to 0.1).
Statistical Analysis
- Calculates observed vs. expected digit frequencies (digits 1–9).
- Performs a chi-square (χ²) test with 8 degrees of freedom (df = 9 - 1).
- Returns
isBenfordCompliant: Boolean
based on the test result.
Error Handling
- Custom exceptions (
InvalidInputException
,StatisticalCalculationException
). - HTTP status codes (400 Bad Request, 415 Unsupported Media Type).
Separation of Concerns
- Router (HTTP layer) vs. Service (business logic) vs. Model (data).
Dependency Injection
- Koin framework for dependency management.
by inject<T>()
for lazy service initialization.
Immutable Data
- Uses
data class
for responses (e.g.,BenfordResult
).
Input Sanitization
- Rejects malformed data (e.g., missing
:
or non-numeric values).
CI/CD
- GitHub Actions for automated testing on every push.
Category | Tools/Libraries Used |
---|---|
Language | Kotlin (JVM 17+) |
Framework | Ktor (Netty engine) |
Testing | JUnit 5 |
CI/CD | GitHub Actions |
Logging | Logback |
org.apache.commons:commons-math3
Used for statistical calculations (chi-square distribution and inverse CDF) in Benford's Law validation.
- Implement API rate limiting (e.g., 100 requests/minute per IP)
- Configurable tiers for different user types
- Stream processing for large datasets (>1GB)
- Multipart file upload endpoint for large datasets
- Support CSV/Excel file uploads
- Swagger/OpenAPI documentation endpoint
- Prometheus metrics endpoint (/metrics)
- JWT authentication