[FSU] Inference FSU with Shared memory #2969
Conversation
Force-pushed from ec044b7 to d328785
Force-pushed from e41af8c to fb045da
Has the formatting style changed?
I notice some differences despite no code modifications.
```cpp
model_graph.LoadTensors(f);
model_graph.checkLoadComplete(f);
node->forwarding(training);
model_graph.UnloadTensors(f);
```
No need to unload now?
Yes, unload is no longer needed. The unload logic does two things: 1. deallocate, 2. write to file.
- We use shared memory that allocates a fixed amount of memory and reuses it, so the allocated memory is deallocated once, at the end of inference.
- For an inference workload there is no need to write back to the file.

Has the formatting style changed? --> No! Sorry for the confusion; there were some conflicts on my local machine.
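For illustration, a minimal sketch of that lifecycle, assuming a POSIX mmap-backed pool; the class name and structure are hypothetical, not the actual nntrainer implementation:

```cpp
#include <sys/mman.h>
#include <cstddef>

// Hypothetical fixed-size pool: mapped once, reused by every layer,
// and released only when inference finishes (no per-layer unload,
// and nothing is written back to the file).
class SharedMemPool {
public:
  explicit SharedMemPool(std::size_t bytes) : size_(bytes) {
    base_ = mmap(nullptr, size_, PROT_READ | PROT_WRITE,
                 MAP_SHARED | MAP_ANONYMOUS, -1, 0);
  }
  void *get() const { return base_; } // same region reused each step
  ~SharedMemPool() {
    if (base_ != MAP_FAILED)
      munmap(base_, size_); // single deallocation at end of inference
  }

private:
  void *base_;
  std::size_t size_;
};
```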
Force-pushed from fb045da to ab147da
LGTM
Update FSU forwarding logic - FSU now handles the look-ahead tensors inside the pool, so we no longer need to call LoadTensors for f + i; a rough sketch follows below.
**Self evaluation:** 1. Build test: [X]Passed [ ]Failed [ ]Skipped 2. Run test: [X]Passed [ ]Failed [ ]Skipped
Signed-off-by: Donghak PARK <[email protected]>
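A minimal sketch of the control-flow change, with stub types standing in for the real classes; the commented-out look-ahead loop is paraphrased from the commit message, not copied from the diff:

```cpp
#include <iostream>

// Stub stand-ins for the real classes, just to show the control flow.
struct ModelGraph {
  void LoadTensors(unsigned f) { std::cout << "load " << f << '\n'; }
  void checkLoadComplete(unsigned f) { std::cout << "check " << f << '\n'; }
};
struct Node {
  void forwarding(bool) { std::cout << "forward\n"; }
};

int main() {
  ModelGraph model_graph;
  Node node;
  bool training = false;
  for (unsigned f = 0; f < 3; ++f) {
    // Before: forwarding also prefetched the look-ahead tensors, e.g.
    //   for (unsigned i = 1; i <= lookahead; ++i)
    //     model_graph.LoadTensors(f + i);
    // Now the pool handles look-ahead internally, so only f is loaded.
    model_graph.LoadTensors(f);
    model_graph.checkLoadComplete(f);
    node.forwarding(training);
    // No UnloadTensors(f): the shared pool is reused and freed once at
    // the end of inference (see the review thread above).
  }
}
```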
Add memory pointer for allocating shared memory - add mem_ptr - add unmap - add an array to manage unmapped pointers.
**Self evaluation:** 1. Build test: [X]Passed [ ]Failed [ ]Skipped 2. Run test: [X]Passed [ ]Failed [ ]Skipped
Signed-off-by: Donghak PARK <[email protected]>
I have changed the method from dynamic memory allocation to static memory allocation. To prevent multiple frees, I added a map that checks whether the mem_address has already been processed (see the sketch below). Previously, memory was allocated through buf, but now it is allocated directly.
**Self evaluation:** 1. Build test: [X]Passed [ ]Failed [ ]Skipped 2. Run test: [X]Passed [ ]Failed [ ]Skipped
Co-authored-by: jijoong.moon <[email protected]>
Signed-off-by: Donghak PARK <[email protected]>
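A minimal sketch of such a guard, assuming mmap-backed allocations; UnmapGuard and its methods are hypothetical names for illustration:

```cpp
#include <sys/mman.h>
#include <cstddef>
#include <unordered_map>

// Hypothetical guard: remember which mapped addresses are still live so
// that a second release of the same mem_address becomes a no-op.
class UnmapGuard {
public:
  void track(void *ptr, std::size_t len) { mapped_[ptr] = len; }
  void release(void *ptr) {
    auto it = mapped_.find(ptr);
    if (it == mapped_.end())
      return; // already unmapped (or never tracked): avoid a double free
    munmap(it->first, it->second);
    mapped_.erase(it);
  }

private:
  std::unordered_map<void *, std::size_t> mapped_;
};
```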
Make neuralnet able to pass the path to the swap_device and the weight offset (file offset), so the weight file's offsets can be calculated; see the sketch below.
**Self evaluation:** 1. Build test: [X]Passed [ ]Failed [ ]Skipped 2. Run test: [X]Passed [ ]Failed [ ]Skipped
Co-authored-by: hyeonseok <[email protected]>
Signed-off-by: Donghak PARK <[email protected]>
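A minimal sketch of how such file offsets could be computed; WeightEntry and assignOffsets are hypothetical names, not the PR's actual API:

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical layout pass: given each weight tensor's size, compute
// its byte offset within the weight .bin file.
struct WeightEntry {
  std::string name;
  std::size_t bytes;
  std::size_t file_offset; // filled in below
};

void assignOffsets(std::vector<WeightEntry> &weights) {
  std::size_t offset = 0; // running offset into the weight file
  for (auto &w : weights) {
    w.file_offset = offset;
    offset += w.bytes;
  }
}
```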
Apply Shared mem & FSU - when inference mode : read from weight bin ( weight offset ) - when train mode : same logic with swap **Self evaluation:** 1. Build test: [X]Passed [ ]Failed [ ]Skipped 2. Run test: [X]Passed [ ]Failed [ ]Skipped Signed-off-by: Donghak PARK <[email protected]>
Fix unit test failure bug in the training-case swap - there was an issue in PutBuffer where it could not free the pointer.
**Self evaluation:** 1. Build test: [X]Passed [ ]Failed [ ]Skipped 2. Run test: [X]Passed [ ]Failed [ ]Skipped
Signed-off-by: Donghak PARK <[email protected]>
Apply clang-format to the changed files.
**Self evaluation:** 1. Build test: [X]Passed [ ]Failed [ ]Skipped 2. Run test: [X]Passed [ ]Failed [ ]Skipped
Signed-off-by: Donghak PARK <[email protected]>
Force-pushed from ab147da to ef465db
Update FSU unit test - For now, we must set our weight & input sizes to pagesize * N - Later I will add a page-align algorithm (a sketch of the alignment follows below).
**Self evaluation:** 1. Build test: [X]Passed [ ]Failed [ ]Skipped 2. Run test: [X]Passed [ ]Failed [ ]Skipped
Signed-off-by: Donghak PARK <[email protected]>
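A minimal sketch of the kind of page-align rounding the commit refers to; alignToPage is a hypothetical helper:

```cpp
#include <unistd.h>
#include <cstddef>

// Round a size up to the next multiple of the system page size, so
// that mmap offsets computed from these sizes stay page-aligned.
std::size_t alignToPage(std::size_t bytes) {
  std::size_t page = static_cast<std::size_t>(sysconf(_SC_PAGESIZE));
  return (bytes + page - 1) / page * page;
}
// e.g. with 4096-byte pages, alignToPage(5000) == 8192
```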
Force-pushed from ef465db to 69b8783
This PR is included in #3004. --> close
[FSU] Inference FSU with Shared memory
To reduce memory usage during inference by utilizing FSU, and to minimize speed degradation by performing loading during forwarding, this PR changes FSU to use shared memory. It also ensures that the existing swap in training mode still works normally.
Commit 1 : [FSU] Update FSU Forwarding (Load) Logic
Commit 2 : [FSU] Update swap device & cache element
Commit 3 : [FSU] Update FSU mem allocate Logic
Commit 4 : [FSU] add FSU file offset info
Commit 5 : [FSU] Apply Shared Mem & FSU
This PR includes #2957, #2927, and #2949, so I will close the previous PRs.