eszpakowski/trade-processing

Read Me First

Sample service exposing an API that can be used to validate a .csv file containing trading data and enrich it with additional content where necessary. By design, the service must be able to handle very large sets of trades (millions of rows) and a large set of products (10k to 100k).

The list of currently supported products:

productId  productName
1          Treasury Bills Domestic
2          Corporate Bonds Domestic
3          REPO Domestic
4          Interest rate swaps International
5          OTC Index Option
6          Currency Options
7          Reverse Repos International
8          REPO International
9          766A_CORP BD
10         766B_CORP BD
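
At this scale (up to 100k products) the whole list fits comfortably in memory, so it can be loaded once into a lookup map at startup. Below is a minimal sketch, assuming a product .csv with productId,productName columns; the file name, location, and layout are assumptions, not taken from this repository:

import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Map;
import java.util.stream.Collectors;

public final class ProductStore {

    // Builds a productId -> productName lookup; 100k entries fit easily in a HashMap.
    public static Map<String, String> load(Path productCsv) throws IOException {
        try (var lines = Files.lines(productCsv)) {
            return lines.skip(1)                          // skip the header row
                        .map(line -> line.split(",", 2))  // productId,productName
                        .collect(Collectors.toMap(parts -> parts[0], parts -> parts[1]));
        }
    }
}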

Requirements

To build and run the application you need a JDK and Apache Maven installed (see the project's pom.xml for the required Java version).

How to run the service

There are several ways to run a Spring Boot application on your local machine. One way is to execute the main method in the com.interview.AbcBankTradeProcessingApplication class from your IDE.

Alternatively you can use the Spring Boot Maven plugin like so:

mvn spring-boot:run

Similarly, the test suite can be run either from your IDE or from a terminal by executing:

mvn test

How to use the API

The service expects a multipart/form-data request with a .csv file containing a header row and comma-separated values in the following format:

date,productId,currency,price
20250101,1,EUR,10.0
20250101,2,EUR,20.1
20250101,3,EUR,30.34

An example .csv file is located in /src/test/resources.

With the service running locally, and with this repository as your working directory, you can run an example flow like so:

curl --form file="@src/test/resources/trade.csv" --header 'Content-Type: multipart/form-data' http://localhost:8080/api/v1/enrich
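
For orientation, the endpoint could look roughly like the sketch below. Only the URL and the "file" form field come from this README; the class and service names are hypothetical, not the repository's actual code:

import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

import org.springframework.http.MediaType;
import org.springframework.http.ResponseEntity;
import org.springframework.web.bind.annotation.PostMapping;
import org.springframework.web.bind.annotation.RequestMapping;
import org.springframework.web.bind.annotation.RequestParam;
import org.springframework.web.bind.annotation.RestController;
import org.springframework.web.multipart.MultipartFile;
import org.springframework.web.servlet.mvc.method.annotation.StreamingResponseBody;

interface TradeEnrichmentService { // stand-in for the real service
    void enrich(InputStream in, OutputStream out) throws IOException;
}

@RestController
@RequestMapping("/api/v1")
class EnrichmentController {

    private final TradeEnrichmentService service;

    EnrichmentController(TradeEnrichmentService service) {
        this.service = service;
    }

    // Streams the enriched CSV back as it is produced (StreamingResponseBody),
    // so the response never has to be buffered fully in memory.
    @PostMapping(value = "/enrich", consumes = MediaType.MULTIPART_FORM_DATA_VALUE,
                 produces = "text/csv")
    ResponseEntity<StreamingResponseBody> enrich(@RequestParam("file") MultipartFile file) {
        StreamingResponseBody body = out -> service.enrich(file.getInputStream(), out);
        return ResponseEntity.ok(body);
    }
}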

Discussion/comment

  • Because parsing a file is inherently sequential, simply adding more threads won't give an easy performance boost (synchronization costs eat into any gains), while it can easily lead to a much more complex solution.

  • Performance comes from doing all the work in memory, but the file itself has to be streamed line by line so that large files don't cause OutOfMemoryErrors (see the sketch after this list).

  • Off-loading the work to a database engine (for instance Postgres) could bring some benefits if the data operations become more complex, but it adds the cost of communicating with the database; how this affects overall performance could be assessed with a simple POC.

  • An external cache like Redis is another possibility, especially if RAM usage becomes a burden.

  • With this approach we don't track exactly which files arrived and what was returned to the client. A safer approach would be to have files uploaded to a directory, move incoming files aside to preserve them, and then generate an output file that can be downloaded by ordinary means not involving our API.

  • The log after the TODO list below is from a manual performance test with 10 million trades and 100k products on an AMD Ryzen 5 7600X (4.7 GHz, 6 cores).
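
A minimal sketch of this streaming approach follows. The exact validation rules, the output header, and the "Missing Product Name" fallback are assumptions for illustration, not taken from the repository's code:

import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.OutputStream;
import java.io.OutputStreamWriter;
import java.nio.charset.StandardCharsets;
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeParseException;
import java.util.Map;

public class StreamingTradeEnricher {

    private final Map<String, String> productNamesById; // built once, kept in memory

    public StreamingTradeEnricher(Map<String, String> productNamesById) {
        this.productNamesById = productNamesById;
    }

    public void enrich(InputStream in, OutputStream out) throws IOException {
        try (var reader = new BufferedReader(new InputStreamReader(in, StandardCharsets.UTF_8));
             var writer = new BufferedWriter(new OutputStreamWriter(out, StandardCharsets.UTF_8))) {
            writer.write("date,productName,currency,price"); // assumed output header
            writer.newLine();
            reader.readLine(); // skip the input header row
            String line;
            while ((line = reader.readLine()) != null) { // one row in memory at a time
                String[] f = line.split(",", -1);        // date,productId,currency,price
                if (f.length != 4 || !isValidDate(f[0])) {
                    continue; // invalid row; a real implementation would report it
                }
                String productName = productNamesById.getOrDefault(f[1], "Missing Product Name");
                writer.write(String.join(",", f[0], productName, f[2], f[3]));
                writer.newLine();
            }
        }
    }

    private static boolean isValidDate(String yyyymmdd) {
        try {
            LocalDate.parse(yyyymmdd, DateTimeFormatter.BASIC_ISO_DATE); // e.g. 20250101
            return true;
        } catch (DateTimeParseException e) {
            return false;
        }
    }
}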

TODO:

  • Swagger documentation
  • Performance testing with a tool like Gatling
  • Maybe switching to a non-generational GC could improve performance? We could also trigger a GC manually after each batch.
Performance test log:

curl --form file="@src/test/resources/trade_full.csv" --header 'Content-Type: multipart/form-data' http://localhost:8080/api/v1/enrich --output output.csv -v
* Host localhost:8080 was resolved.
* IPv6: ::1
* IPv4: 127.0.0.1
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
  0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0*   Trying [::1]:8080...
* Connected to localhost (::1) port 8080
* using HTTP/1.x
> POST /api/v1/enrich HTTP/1.1
> Host: localhost:8080
> User-Agent: curl/8.13.0
> Accept: */*
> Content-Length: 277781740
> Content-Type: multipart/form-data; boundary=------------------------TFXvmR2A0lG5EvKeqOrpmd
> Expect: 100-continue
>
< HTTP/1.1 100
<
} [65536 bytes data]
 46  264M    0     0   46  122M      0   243M  0:00:01 --:--:--  0:00:01  244M* upload completely sent off: 277781740 bytes
< HTTP/1.1 200
< Transfer-Encoding: chunked
< Date: Fri, 25 Jul 2025 10:04:28 GMT
<
{ [8110 bytes data]
100  644M    0  379M  100  264M  8527k  5955k  0:00:45  0:00:45 --:--:-- 8674k

