Skip to content

Conversation

@saiarthiraguram
Copy link
Contributor

@saiarthiraguram saiarthiraguram commented Jun 20, 2025

Ticket

Fixes: Link to Github Issue

Problem description

dev-requirements - contains model based requirements
core-requirements - contains compiler based requirements

Changes

  1. Added Compiler only dependencies to core-requirements
  2. Added other requirements to dev-requirements - Will split this to model based requirements in future
  3. Added sanity tests for each frameworks to be tested on slim docker.
  4. Adding a slim docker with only compiler based requirements installed
  5. Having a dev docker with all requirements installed

To-Do

  • Raise model based requirements separately for each model.

@saiarthiraguram saiarthiraguram force-pushed the sai_arthi_raguram/deps_split branch from e9bda84 to e1fdf50 Compare June 20, 2025 11:34
@saiarthiraguram saiarthiraguram force-pushed the sai_arthi_raguram/deps_split branch 2 times, most recently from 41edac3 to 63606a8 Compare June 20, 2025 11:40
@codecov-commenter
Copy link

codecov-commenter commented Jun 20, 2025

Codecov Report

❌ Patch coverage is 73.11828% with 25 lines in your changes missing coverage. Please review.
✅ Project coverage is 62.39%. Comparing base (7d35604) to head (a69f5b2).
⚠️ Report is 3 commits behind head on main.

Files with missing lines Patch % Lines
forge/csrc/test/passes/gtest_main.cpp 73.11% 25 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #2364      +/-   ##
==========================================
+ Coverage   62.33%   62.39%   +0.06%     
==========================================
  Files         157      157              
  Lines       13362    13451      +89     
==========================================
+ Hits         8329     8393      +64     
- Misses       5033     5058      +25     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@saiarthiraguram saiarthiraguram force-pushed the sai_arthi_raguram/deps_split branch 5 times, most recently from 2f5f4eb to 1af0d82 Compare June 23, 2025 09:17
@saiarthiraguram saiarthiraguram force-pushed the sai_arthi_raguram/deps_split branch from 1af0d82 to 7db3ee1 Compare June 24, 2025 04:42
@saiarthiraguram saiarthiraguram force-pushed the sai_arthi_raguram/deps_split branch 2 times, most recently from ca93cb5 to d428984 Compare July 2, 2025 11:11
@saiarthiraguram saiarthiraguram force-pushed the sai_arthi_raguram/deps_split branch from d428984 to b95d954 Compare July 3, 2025 05:15
@saiarthiraguram saiarthiraguram force-pushed the sai_arthi_raguram/deps_split branch from 97083e8 to 30f95cf Compare July 3, 2025 05:41
Co-authored-by: Copilot Autofix powered by AI <62310815+github-advanced-security[bot]@users.noreply.github.com>
@saiarthiraguram saiarthiraguram force-pushed the sai_arthi_raguram/deps_split branch from 30f95cf to 4a62e1b Compare July 3, 2025 05:45
@saiarthiraguram saiarthiraguram force-pushed the sai_arthi_raguram/deps_split branch 3 times, most recently from 9e9cbf3 to 5a73c58 Compare July 3, 2025 09:13
@saiarthiraguram saiarthiraguram force-pushed the sai_arthi_raguram/deps_split branch from 5a73c58 to a994bd5 Compare July 3, 2025 14:36
- Cherry-picked setup.py fix (commit 171c063ab) onto main commit 19ee67f674b2ea05e38531162f49885dcf4ad9d4
- This resolves the SameFileError during tt_tvm wheel build
- Maintains compatibility with main branch expectations
- Cherry-picked setup.py fix (commit 171c063ab) onto main commit 25c5feec112a33f5dfba1916b6e8a50ea9544bf5
- This resolves the SameFileError during tt_tvm wheel build
- Maintains compatibility with main branch expectations
- Only TVM submodule updated
@saiarthiraguram saiarthiraguram force-pushed the sai_arthi_raguram/deps_split branch from 5d41948 to 354a07c Compare August 18, 2025 06:25
Resolves GitHub conflict detection while preserving TVM setup.py fix.
TVM submodule updated to include SameFileError fix on latest main base.
- Use commit 25c5feec112a33f5dfba1916b6e8a50ea9544bf5 from main TVM repository
- This commit is accessible by GitHub Actions CI
- Previous custom commit 533f92d3b was not pushed to remote repository
@saiarthiraguram
Copy link
Contributor Author

Unit tests, Failure update:

**Current Test Results: 15/31 PASSING **

All consistently passing tests:

  1. DecomposeNdReshapeSplitTest.basic_dimension_split_optimization
  2. DecomposeNdReshapeSplitTest.invalid_cases_should_be_skipped
  3. CommuteBroadcastThroughTranspose.commute_broadcast_through_transpose
  4. UpdateReshapeNamedAttrsTest.update_named_attrs
  5. UpdateConcatNamedAttrsTest.update_named_attrs
  6. UpdateReduceSumAttrsTest.ReduceSumDim
  7. UpdateReduceMaxAttrsTest.ReduceMaxDim
  8. EraseInverseOpsSqueezeAndUnsqueeze.erase_inv_ops_sq_unsq
  9. CommuteTransposeThroughReduce.commute_transpose_through_higher_dim_reduce
  10. EraseUnnecessary4DSeqTwoOps.two_operands
  11. EraseUnnecessary4DSeqTwoOps.na1
  12. EraseUnnecessary4DSeqTwoOps.na2
  13. EraseUnnecessary4DSeqThreeOps.three_operands

@saiarthiraguram
Copy link
Contributor Author

saiarthiraguram commented Aug 18, 2025

@nvukobratTT ,
MLIR seg fault issue:

**Problem ** :

  • TensorFlow links its own LLVM with command-line options for StableHLO/quantization
  • TT-MLIR links its own LLVM with command-line options for TT passes
  • Both try to register options in overlapping LLVM registries
  • Iterator corruption occurs when one library's registry is modified while another is iterating

Technical Details:

  • llvm::cl::AddLiteralOption() modifies global LLVM command-line option collections
  • SmallPtrSetIteratorImpl becomes invalid when underlying container is modified during iteration
  • Multiple LLVM static initializations
  • Memory layout corruption leads to invalid pointer dereference (0x100000001)
    planning on finding a way to have split up but with this issue mitigated by installing some problematic requirements in core itself.

Solution added:
Added guard on loads which systematically loads LLVM in a way that seg fault is fixed

@saiarthiraguram saiarthiraguram force-pushed the sai_arthi_raguram/deps_split branch 3 times, most recently from ada131f to 0940461 Compare August 18, 2025 15:28
}
catch (...)
{
log_error(LogMLIRCompiler, "Unknown exception during MLIR pass registration");

Check warning

Code scanning / CodeQL

Expression has no effect Warning

This expression has no effect (because
log_error
has no external side effects).

Copilot Autofix

AI 3 months ago

To fix the problem, explicitly cast the call to log_error on line 89 to void. This documents that the value is intentionally ignored and the function is called for its side effect. This change should be made only to line 89 in forge/csrc/passes/mlir_passes.cpp. No imports or additional definitions are required.


Suggested changeset 1
forge/csrc/passes/mlir_passes.cpp

Autofix patch

Autofix patch
Run the following command in your local git repository to apply this patch
cat << 'EOF' | git apply
diff --git a/forge/csrc/passes/mlir_passes.cpp b/forge/csrc/passes/mlir_passes.cpp
--- a/forge/csrc/passes/mlir_passes.cpp
+++ b/forge/csrc/passes/mlir_passes.cpp
@@ -86,7 +86,7 @@
     }
     catch (...)
     {
-        log_error(LogMLIRCompiler, "Unknown exception during MLIR pass registration");
+        (void)log_error(LogMLIRCompiler, "Unknown exception during MLIR pass registration");
         throw;
     }
 }
EOF
@@ -86,7 +86,7 @@
}
catch (...)
{
log_error(LogMLIRCompiler, "Unknown exception during MLIR pass registration");
(void)log_error(LogMLIRCompiler, "Unknown exception during MLIR pass registration");
throw;
}
}
Copilot is powered by AI and may make mistakes. Always verify output.
}
catch (...)
{
log_error(LogMLIRCompiler, "Unknown error during MLIR pass registration");

Check warning

Code scanning / CodeQL

Expression has no effect Warning

This expression has no effect (because
log_error
has no external side effects).

Copilot Autofix

AI 3 months ago

To address the warning and clarify that the value of the expression is intentionally ignored (and that the function is called for its side effect), explicitly cast the call to log_error to (void). This is a common idiom in C/C++ to indicate that the return value is intentionally discarded, and it will silence static analysis tools like CodeQL. Only line 117 in forge/csrc/passes/mlir_passes.cpp needs to be changed:

  • Replace:
    log_error(LogMLIRCompiler, "Unknown error during MLIR pass registration");
  • With:
    (void)log_error(LogMLIRCompiler, "Unknown error during MLIR pass registration");

No new imports, methods, or definitions are needed.


Suggested changeset 1
forge/csrc/passes/mlir_passes.cpp

Autofix patch

Autofix patch
Run the following command in your local git repository to apply this patch
cat << 'EOF' | git apply
diff --git a/forge/csrc/passes/mlir_passes.cpp b/forge/csrc/passes/mlir_passes.cpp
--- a/forge/csrc/passes/mlir_passes.cpp
+++ b/forge/csrc/passes/mlir_passes.cpp
@@ -114,7 +114,7 @@
             }
             catch (...)
             {
-                log_error(LogMLIRCompiler, "Unknown error during MLIR pass registration");
+                (void)log_error(LogMLIRCompiler, "Unknown error during MLIR pass registration");
                 registration_successful = false;
             }
         });
EOF
@@ -114,7 +114,7 @@
}
catch (...)
{
log_error(LogMLIRCompiler, "Unknown error during MLIR pass registration");
(void)log_error(LogMLIRCompiler, "Unknown error during MLIR pass registration");
registration_successful = false;
}
});
Copilot is powered by AI and may make mistakes. Always verify output.
// Verify registration was successful
if (!registration_successful.load())
{
log_warning(LogMLIRCompiler, "MLIR pass registration was not successful");

Check warning

Code scanning / CodeQL

Expression has no effect Warning

This expression has no effect (because
log_warning
has no external side effects).

Copilot Autofix

AI 3 months ago

To fix the warning, explicitly cast the call to log_warning to (void). This documents that the return value is intentionally ignored and that the expression is used for its side effects (logging). This is a standard idiom in C/C++ to silence such warnings from static analysis tools.
Change:

  • In forge/csrc/passes/mlir_passes.cpp, line 125, change
    log_warning(LogMLIRCompiler, "MLIR pass registration was not successful");
    to
    (void)log_warning(LogMLIRCompiler, "MLIR pass registration was not successful");
    No new imports or definitions are needed.

Suggested changeset 1
forge/csrc/passes/mlir_passes.cpp

Autofix patch

Autofix patch
Run the following command in your local git repository to apply this patch
cat << 'EOF' | git apply
diff --git a/forge/csrc/passes/mlir_passes.cpp b/forge/csrc/passes/mlir_passes.cpp
--- a/forge/csrc/passes/mlir_passes.cpp
+++ b/forge/csrc/passes/mlir_passes.cpp
@@ -122,7 +122,7 @@
     // Verify registration was successful
     if (!registration_successful.load())
     {
-        log_warning(LogMLIRCompiler, "MLIR pass registration was not successful");
+        (void)log_warning(LogMLIRCompiler, "MLIR pass registration was not successful");
     }
 }
 
EOF
@@ -122,7 +122,7 @@
// Verify registration was successful
if (!registration_successful.load())
{
log_warning(LogMLIRCompiler, "MLIR pass registration was not successful");
(void)log_warning(LogMLIRCompiler, "MLIR pass registration was not successful");
}
}

Copilot is powered by AI and may make mistakes. Always verify output.
std::cout << "Setting up test environment..." << std::endl;

// Step 1: Setup environment variables
setup_llvm_environment();

Check warning

Code scanning / CodeQL

Expression has no effect Warning test

This expression has no effect (because
setup_llvm_environment
has no external side effects).

Copilot Autofix

AI 3 months ago

Copilot could not generate an autofix suggestion

Copilot could not generate an autofix suggestion for this alert. Try pushing a new commit or if the problem persists contact support.

@saiarthiraguram saiarthiraguram force-pushed the sai_arthi_raguram/deps_split branch from 0940461 to b0a8017 Compare August 18, 2025 15:46
@saiarthiraguram saiarthiraguram force-pushed the sai_arthi_raguram/deps_split branch from e3ee0cc to c63762c Compare August 19, 2025 11:07
- Update openMeshDevice() call to match new API signature
- meshShape is now passed as first parameter instead of options.meshShape
- Resolves build error caused by TT-MLIR API changes in merge
@saiarthiraguram saiarthiraguram force-pushed the sai_arthi_raguram/deps_split branch from c63762c to ce1334c Compare August 19, 2025 12:19
@saiarthiraguram saiarthiraguram force-pushed the sai_arthi_raguram/deps_split branch from 3311765 to c431091 Compare August 25, 2025 03:52
@saiarthiraguram saiarthiraguram force-pushed the sai_arthi_raguram/deps_split branch 3 times, most recently from bdb6481 to 8f64472 Compare August 25, 2025 11:35
@saiarthiraguram saiarthiraguram force-pushed the sai_arthi_raguram/deps_split branch from 8f64472 to ac88849 Compare August 25, 2025 11:36
@saiarthiraguram saiarthiraguram force-pushed the sai_arthi_raguram/deps_split branch from dfc798b to a69f5b2 Compare August 26, 2025 07:59
@saiarthiraguram
Copy link
Contributor Author

saiarthiraguram commented Sep 9, 2025

9/09/2025
Based on discussion on call, posting current status to get info on it
Context: Segmentation fault (SegFault) occurs in the CI when executing the test-sub workflow.

  • Debugged which files cause the seg fault by creating a simple test which doesn't import many modules and checking each imports in test files.
  • When commenting out the problematic imports in conftest.py we are able to collect the pytests.

Current Fix in docker env is to

  • Uninstall and reinstall torch, tensorflow, paddlepaddle - fixes the segfault
  • trying to find a more sensible fix that reinstallation in test-sub workflow which defeats the purpose of docker installation

@saiarthiraguram saiarthiraguram mentioned this pull request Oct 10, 2025
7 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Restructure Python requirements to be divided into core and dev requirements

6 participants