eszpakowski/trade-processing

Read Me First

Sample service exposing an API that can be used to validate a .csv file containing trading data and enrich it with additional content where necessary. By design, the service must be able to handle very large sets of trades (millions of rows) and a large set of products (10k to 100k).

The list of currently supported products:

productId  productName
1          Treasury Bills Domestic
2          Corporate Bonds Domestic
3          REPO Domestic
4          Interest rate swaps International
5          OTC Index Option
6          Currency Options
7          Reverse Repos International
8          REPO International
9          766A_CORP BD
10         766B_CORP BD
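
At this scale (up to 100k products) the whole list fits comfortably in memory, so it can be loaded once into a lookup map at startup. Below is a minimal sketch, assuming a product .csv with productId,productName columns; the file name, location, and layout are assumptions, not taken from this repository:

import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Map;
import java.util.stream.Collectors;

public final class ProductStore {

    // Builds a productId -> productName lookup; 100k entries fit easily in a HashMap.
    public static Map<String, String> load(Path productCsv) throws IOException {
        try (var lines = Files.lines(productCsv)) {
            return lines.skip(1)                          // skip the header row
                        .map(line -> line.split(",", 2))  // productId,productName
                        .collect(Collectors.toMap(parts -> parts[0], parts -> parts[1]));
        }
    }
}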

Requirements

To build and run the application you need a JDK and Apache Maven installed (see the project's pom.xml for the required Java version).

How to run the service

There are several ways to run a Spring Boot application on your local machine. One way is to execute the main method in the com.interview.AbcBankTradeProcessingApplication class from your IDE.

Alternatively you can use the Spring Boot Maven plugin like so:

mvn spring-boot:run

Similarly, the test suite can be run either from your IDE or from a terminal by executing:

mvn test

How to use the API

The service expects a multipart/form-data request with a .csv file containing a header row and comma-separated values in the following format:

date,productId,currency,price
20250101,1,EUR,10.0
20250101,2,EUR,20.1
20250101,3,EUR,30.34

An example .csv file is located in /src/test/resources.

With the service running locally, and with this repository as your working directory, you can run an example flow like so:

curl --form file="@src/test/resources/trade.csv" --header 'Content-Type: multipart/form-data' http://localhost:8080/api/v1/enrich
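
For orientation, the endpoint could look roughly like the sketch below. Only the URL and the "file" form field come from this README; the class and service names are hypothetical, not the repository's actual code:

import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

import org.springframework.http.MediaType;
import org.springframework.http.ResponseEntity;
import org.springframework.web.bind.annotation.PostMapping;
import org.springframework.web.bind.annotation.RequestMapping;
import org.springframework.web.bind.annotation.RequestParam;
import org.springframework.web.bind.annotation.RestController;
import org.springframework.web.multipart.MultipartFile;
import org.springframework.web.servlet.mvc.method.annotation.StreamingResponseBody;

interface TradeEnrichmentService { // stand-in for the real service
    void enrich(InputStream in, OutputStream out) throws IOException;
}

@RestController
@RequestMapping("/api/v1")
class EnrichmentController {

    private final TradeEnrichmentService service;

    EnrichmentController(TradeEnrichmentService service) {
        this.service = service;
    }

    // Streams the enriched CSV back as it is produced (StreamingResponseBody),
    // so the response never has to be buffered fully in memory.
    @PostMapping(value = "/enrich", consumes = MediaType.MULTIPART_FORM_DATA_VALUE,
                 produces = "text/csv")
    ResponseEntity<StreamingResponseBody> enrich(@RequestParam("file") MultipartFile file) {
        StreamingResponseBody body = out -> service.enrich(file.getInputStream(), out);
        return ResponseEntity.ok(body);
    }
}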

Discussion/comment

  • Because parsing a file is inherently sequential, simply adding more threads won't give an easy performance boost (synchronization costs eat into any gains), while it can easily lead to a much more complex solution.

  • Performance comes from doing all the work in memory, but the file itself has to be streamed line by line so that large files don't cause OutOfMemoryErrors (see the sketch after this list).

  • Off-loading the work to a database engine (for instance Postgres) could bring some benefits if the data operations become more complex, but it adds the cost of communicating with the database; how this affects overall performance could be assessed with a simple POC.

  • An external cache like Redis is another possibility, especially if RAM usage becomes a burden.

  • With this approach we don't track exactly which files arrived and what was returned to the client. A safer approach would be to have files uploaded to a directory, move incoming files aside to preserve them, and then generate an output file that can be downloaded by ordinary means not involving our API.

  • The log after the TODO list below is from a manual performance test with 10 million trades and 100k products on an AMD Ryzen 5 7600X (4.7 GHz, 6 cores).
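
A minimal sketch of this streaming approach follows. The exact validation rules, the output header, and the "Missing Product Name" fallback are assumptions for illustration, not taken from the repository's code:

import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.OutputStream;
import java.io.OutputStreamWriter;
import java.nio.charset.StandardCharsets;
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeParseException;
import java.util.Map;

public class StreamingTradeEnricher {

    private final Map<String, String> productNamesById; // built once, kept in memory

    public StreamingTradeEnricher(Map<String, String> productNamesById) {
        this.productNamesById = productNamesById;
    }

    public void enrich(InputStream in, OutputStream out) throws IOException {
        try (var reader = new BufferedReader(new InputStreamReader(in, StandardCharsets.UTF_8));
             var writer = new BufferedWriter(new OutputStreamWriter(out, StandardCharsets.UTF_8))) {
            writer.write("date,productName,currency,price"); // assumed output header
            writer.newLine();
            reader.readLine(); // skip the input header row
            String line;
            while ((line = reader.readLine()) != null) { // one row in memory at a time
                String[] f = line.split(",", -1);        // date,productId,currency,price
                if (f.length != 4 || !isValidDate(f[0])) {
                    continue; // invalid row; a real implementation would report it
                }
                String productName = productNamesById.getOrDefault(f[1], "Missing Product Name");
                writer.write(String.join(",", f[0], productName, f[2], f[3]));
                writer.newLine();
            }
        }
    }

    private static boolean isValidDate(String yyyymmdd) {
        try {
            LocalDate.parse(yyyymmdd, DateTimeFormatter.BASIC_ISO_DATE); // e.g. 20250101
            return true;
        } catch (DateTimeParseException e) {
            return false;
        }
    }
}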

TODO:

  • Swagger documentation
  • Performance testing with a tool like Gatling
  • Maybe switching to a non-generational GC could improve performance? We could also trigger a GC manually after each batch.
Performance test log:

curl --form file="@src/test/resources/trade_full.csv" --header 'Content-Type: multipart/form-data' http://localhost:8080/api/v1/enrich --output output.csv -v
* Host localhost:8080 was resolved.
* IPv6: ::1
* IPv4: 127.0.0.1
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
  0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0*   Trying [::1]:8080...
* Connected to localhost (::1) port 8080
* using HTTP/1.x
> POST /api/v1/enrich HTTP/1.1
> Host: localhost:8080
> User-Agent: curl/8.13.0
> Accept: */*
> Content-Length: 277781740
> Content-Type: multipart/form-data; boundary=------------------------TFXvmR2A0lG5EvKeqOrpmd
> Expect: 100-continue
>
< HTTP/1.1 100
<
} [65536 bytes data]
 46  264M    0     0   46  122M      0   243M  0:00:01 --:--:--  0:00:01  244M* upload completely sent off: 277781740 bytes
< HTTP/1.1 200
< Transfer-Encoding: chunked
< Date: Fri, 25 Jul 2025 10:04:28 GMT
<
{ [8110 bytes data]
100  644M    0  379M  100  264M  8527k  5955k  0:00:45  0:00:45 --:--:-- 8674k

