Integrate internal changes #589
Merged
e3e14ef to abd985e
pranavm-nvidia approved these changes on Mar 26, 2025
702c03d to 2c628d7
This change integrates the following internal changes from oldest to newest:

-- 1e0da2f0c6ee3fee2b3a63f224bbc8a85850b75e by Chris Bate:

[tensorrt] Fix build for TensorRT 8.6

Fixes issues in plugin utilities that had broken the TRT 8.6 build for nearly four weeks. Partially reverts 44ccb0370f3435b3f3366dfc534e7d17a0f2dc4e (only the testing portions, so we don't lose much) to restore everything to working order.

-- 4791ad391a765ce55a3e1a7fc0ef704a9c767028 by Chris Bate:

[tensorrt] Add change missing from 1e0da2f0

Adds test changes missing from the TRT 8 build fix.

-- f63b24fcfd3f47ee1c5081ebf524b6b51df004fe by Chris Bate:

[tensorrt] Fix TRT 10.9 plugin-related errors and warnings

Fixes miscellaneous issues in the TRT 10.9-conditional code for the newly introduced plugin types.

-- 39005ce88949b3aaa08e7497c1f325de78fd5ffa by Chris Bate:

[compiler] Remove deprecated APIs

Removes the Python/C APIs for creating explicit objects that model the "StablehloToExecutableOptions" and the corresponding pipeline. (This was the initial API for "StableHLOToExecutable", before we realized that it doesn't scale to more tasks/pipelines.) Removing these APIs also lets us delete some ancillary C++ code.

-- ff52dce59cd2d026423ce196501df660505af73c by Chris Bate:

[compiler] Refactor implementation of top-level pipeline options

Previously, we used a custom `llvm::cl::SubCommand` subclass called "OptionsContext" to implement classes/structs that encapsulated pipeline options and could be printed/parsed using the LLVM CL infrastructure. OptionsContext allowed creating an "adaptor" that could be bound to classes that didn't depend on the CL classes at all (essentially, using external storage for all options), which is something that MLIR's builtin `mlir::detail::PassOptions`/`mlir::PassPipelineOptions` do not support.
However, use of this specific feature dwindled, and using a class other than one derived from `mlir::detail::PassOptions` has some big downsides. If you wanted to use your class with the MLIR pass infrastructure to manage options, we had to create an adaptor struct that produced a `PassPipelineOptions` on the fly. It was very convoluted and had some major limitations (for example, no support for enum options in the conversion adaptor). I bit the bullet and ported the existing infrastructure to be based on `mlir::detail::PassOptions` (which is a subclass of `llvm::cl::SubCommand`) instead of a custom `llvm::cl::SubCommand` class.

To handle "debug options" (e.g. all the flags corresponding to MLIR's global flags for IR printing, crash reproducers, timing, etc.), I added an option to attach these flags to the pipeline's options, allowing one to avoid using any global CL options if desired. Previously, we always had these options but "hid" them so that their descriptions wouldn't appear in CLI tools. In the future, we could also do something like XLA's `LLVMCommandLineOptionsLock` and let the compilation task entrypoint functions (e.g. `client.get_compilation_task`) provide a different set of options to be parsed using the global LLVM CL subcommand.

-- 9a305370418e284a27edd0401a8ed822fda0d132 by Christopher Bate:

[runtime] Lua TRT module: allow multi-dim shape tensors

TensorRT supports multi-dimensional shape tensors, so this change removes the assertion that shape tensors have rank 1. There's no harm in removing the assertion, because TensorRT will issue an error at build time if a shape tensor violates some type constraint. Predicating the assertion on the loaded TRT version is also unnecessary, since running with mismatched TRT runtime and compile-time versions is not supported beyond whatever guarantees TensorRT makes.
This also future-proofs support for shape tensors of various data types.

-- e77c68c7be74311c542b0e05d449634e0a668dca by Christopher Bate:

[compiler] Dialect/Plan: improve 'plan-eliminate-shape-ops' pass

This change fixes the logic in 'plan-eliminate-shape-ops', which was not correctly checking the callers of a function before removing unused arguments. It also adds support for 'func.call' callers in addition to the TensorRT call operations. We only recently observed bugs associated with this because previous test cases all happened to avoid dead arguments in shape tensor functions.

-- 41b25b50d20e7447f797ee6a9d68350f0bf12abf by Christopher Bate:

[compiler] Improve 'stablehlo-scalar-to-arith' support for 'stablehlo.reduce'

The converter should have allowed converting any 'stablehlo.reduce' operation that reduces the entire tensor. This is fixed, and additional tests are added.

-- b4024a211652fb0c24f570a05441fd83e45b0993 by Christopher Bate:

[compiler] Improve conversion and folding of 'stablehlo.scatter'

This change improves our ability to convert 'stablehlo.scatter' to TensorRT. Previously, we were too restrictive in recognizing scatter ops whose semantics correspond to ONNX/TensorRT ScatterND. Furthermore, upstream Stablehlo contains no routines that simplify 'stablehlo.scatter' or recognize scatters that are trivial no-ops. This change adds a pattern that recognizes when a scatter is a slice insertion that completely overwrites the source tensor. The op's result can then simply be replaced by the (possibly reshaped) update tensor.

-- 1f7187668ef5af3d633927ad0cf648dee25bea60 by Christopher Bate:

[tensorrt] If shape tensor estimation was incorrect, don't set value bounds

Sometimes we may incorrectly predict that a tensor is a shape tensor while TensorRT treats it as an execution tensor. When this occurs, we emit a warning.
Previously, we also tried to set value bounds on the execution tensor. This causes TensorRT to emit a warning as well. Moreover, when the tensor is larger than the maximum shape tensor size, it can cause TensorRT to crash. This change ensures that we don't set the bounds, which was the originally intended behavior.

-- 4a12de1a0b851601467e27571223bd92ababed57 by Christopher Bate:

[compiler] NFC: Fix duplicate instrumentation added

Fixes duplicate pass pipeline instrumentation setup occurring in both the StablehloToExecutable and CompilationTaskBase constructors; it should only occur in the base class.

-- 2de0f76d609b463e1beaab72a72ee39c9ca2e41c by Chris Bate:

[tensorrt] Adds an option to work around TRT limitations by forcing slice parameters in-bounds

Adds the option 'force-default-slice-in-bounds' to the 'tensorrt-apply-bug-wars' pass. It updates the dynamic offset and dynamic size operands of all slices with "default" mode (no OOB access allowed), clamping offsets to zero when negative and clamping sizes to the shape of the input tensor. This works around a limitation of TRT where, under certain conditions, it will fail unless input shape tensors have very specific bounds.

-- a3c52b09859e42c53d784c1c7bd74f018ca77c4c by Christopher Bate:

[compiler] Don't analyze nested modules in alloc-tensors

There are two locations where we perform bufferization analysis in `plan-alloc-tensors`. In the last update, which introduced improved Plan dialect bufferization routines, I forgot to update one call. This has no functional effect other than preventing the analysis from uselessly analyzing potentially nested modules.

-- ff915bd515ca532985eed6e1469d10708b03049c by Chris Bate:

[compiler] Fix incorrect assumption in modification of 'scf.while'

In 'plan-alloc-tensors', we perform empty tensor elimination in the bodies of loops and attempt to establish DPS connectivity to the loop-carried arguments.
In this process, we incorrectly assumed that the values yielded by the 'scf.while' "before" region matched the region arguments in number. This is not true, since the two can differ in number. This change adds a small procedure that performs a best-effort mapping of yielded values to the region arguments that should be used for empty tensor elimination.

-- f62e7df1ca41684c10b8901bf280d33d95ab6de2 by Chris Bate:

[tensorrt] Update TRT "layer name" translations

In the TensorRT NvInfer API, each created layer can be given a name. Previously, we used a combination of the MLIR op name and the printed form of the MLIR location. However, because MLIR locations can recursively refer to other locations, this generated large layer names for code coming from JAX, and TensorRT would emit a warning about exceeding the character limit. This change translates the locations without printing recursive locations and truncates layer names that exceed the character limit. Note that setting the layer name from the MLIR op name and location information is purely a debugging aid and hasn't proven to be very useful.

-- 3185b4b4e4e0692e1772361394e9669df8085dde by Zixin Huang:

[compiler] Fix SerializationBitWidth computation

This MR fixes the SerializationBitWidth computation. A test case is added in `executor/test/Translation/translate-constant-resource.mlir`.

-- 90b9ad5784fe6306b6a625d2c6b101a40b11a837 by Chris Bate:

[executor] Rename 'executor.constant_resource' to 'executor.data_segment'

In preparation for refactoring how we manage global variables and global initialization in the Executor dialect/runtime, this change renames 'executor.constant_resource' to 'executor.data_segment' to better reflect its purpose.
The 'executor.data_segment' operation currently represents only read-only binary data encoded into the program; in a future commit, it will gain the ability to represent mutable, runtime-allocated memory regions.

-- 7df71589bb68738370fbd949c24e7517d1ddb3fb by Chris Bate:

[executor] NFC: Rename Constant -> DataSegment in Executable schema

-- 1712da1da26a3b1f3a3cd1f51a2827f30f476383 by Chris Bate:

[executor] Add new fields to DataSegmentOp

Adds fields to DataSegmentOp that indicate the access type, whether the segment is uninitialized, and the required alignment. Previously, the op was only used to store constant data; its uses are updated accordingly.

-- 15f62a72dd27ca3d41a9557f1c521e3b3f0d023c by Christopher Bate:

[compiler] Dialect/Plan: Don't modify nested modules in plan-alloc-tensors

This fixes an issue where the final pattern set applied in the `plan-alloc-tensors` pass was not restricted to functions in the top-level module scope.

-- 17d04aa2dccf89310fe929f31f86a1e92ed0e952 by Chris Bate:

Integrate LLVM-Project @ 729416e586fba71b4f63d71b1b5c765aefbf200b

This change upgrades our llvm-project and llvm-solid dependencies by bumping roughly 7 days forward from our last rebase point.

-- e6689fd28afac42239aa277501738ba8a21c2866 by Christopher Bate:

[executor] Fix dyn_cast -> dyn_cast_if_present in type conversion

In target materializations, the "originalType" may be null. This was discovered through non-standard use of the ExecutorTypeConverter in a downstream project.

-- 2459e7c9a948578adee9ab131e0318d397c5325c by Chris Bate:

[compiler] Improve joint bufferization of nested modules

This change updates 'plan-module-bufferize' so that it bufferizes inner modules before outer modules. In addition, it saves the final `FuncAnalysisState` of each bufferized module into a cache.
This enables outer modules that call into inner modules to access the function analysis state needed to give more refined answers to bufferization interface queries.

-- 7f7a7338819f0276a7a8574b1091bfe8ff5e59a1 by Chris Bate:

[compiler] Move cluster kind definitions to `compiler/lib/Backends`

This moves the cluster kind definitions to `compiler/lib/Backends` to make the Plan dialect more self-contained. Previously, it was difficult for developers to discover where each "ClusterKind" was defined, since we had the notion of "builtin" cluster kinds located in `compiler/lib/Dialect/Plan/IR/BuiltinClusterKinds.cpp`. We no longer have "builtins": the TensorRT and Host cluster/backend kinds now live under the "Backends" directory as extensions to the Plan dialect.

This also makes some Plan dialect transformations easier to grok. The CreateClosedRegions and OutlineClusters passes no longer have special logic for handling the TensorRT or Host cluster kinds; that logic is now colocated with the backend definitions. Some minor improvements were made to the ClusterKindAttrInterface to support moving the logic.

-- 415792e81123f784948af29b896da23e660e48fd by Chris Bate:

[compiler] Further improve Plan cluster backend organization

Finally, this change adds a `plan::InputKind` attribute to the Plan dialect to indicate the category of input IR entering the Plan segmentation pipeline. This information can be passed to the backends via the `plan::ClusterKindAttr` interface methods, and each backend can indicate whether it supports a given input kind and adjust its behavior based on it. This further allows us to begin removing the tight coupling between Stablehlo and the Plan dialect transformations.
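The input-kind check described above can be pictured with a small Python model. This is only a sketch under stated assumptions: `InputKind.OTHER`, `ClusterBackend`, `supports_input_kind`, and `backends_for` are hypothetical stand-ins. They are not the real `plan::InputKind` attribute or `plan::ClusterKindAttr` C++ interface methods.

```python
from enum import Enum, auto


class InputKind(Enum):
    # Hypothetical stand-ins for categories a plan::InputKind attribute might
    # distinguish; only Stablehlo input is named in the change itself.
    STABLEHLO = auto()
    OTHER = auto()


class ClusterBackend:
    """Hypothetical model of a backend registered as a Plan dialect extension."""

    def __init__(self, name, supported_kinds):
        self.name = name
        self.supported_kinds = frozenset(supported_kinds)

    def supports_input_kind(self, kind):
        # Each backend decides for itself which input IR categories it accepts.
        return kind in self.supported_kinds


def backends_for(kind, backends):
    """Return the names of the backends that accept the given input kind."""
    return [b.name for b in backends if b.supports_input_kind(kind)]


backends = [
    ClusterBackend("tensorrt", {InputKind.STABLEHLO}),
    ClusterBackend("host", {InputKind.STABLEHLO, InputKind.OTHER}),
]
```

In this sketch, the segmentation logic only asks each backend a question and never special-cases a particular backend. That loosely mirrors the stated goal of moving backend-specific logic out of the Plan passes.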

-- 1d99e7b9addbf00e745aca6b0c6b07a6daf912d5 by Chris Bate:

Fix requiring CUDA for some translation tests

Fixes an issue where CUDA is initialized for `executor-runner` even if the user is only using `--dump-function-signatures` to inspect the executable rather than run it.

GitOrigin-RevId: 1d99e7b9addbf00e745aca6b0c6b07a6daf912d5
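The clamping that 2de0f76 describes for the 'force-default-slice-in-bounds' option reduces to simple per-dimension arithmetic. The Python sketch below works on concrete integers only for illustration, because the actual pass rewrites dynamic offset and size operands in the IR. The function name `force_slice_in_bounds` is hypothetical.

```python
def force_slice_in_bounds(offsets, sizes, input_shape):
    """Clamp slice parameters as described for 'force-default-slice-in-bounds'.

    Negative offsets are clamped to zero, and each size is clamped to the
    corresponding dimension of the input tensor.
    """
    if not (len(offsets) == len(sizes) == len(input_shape)):
        raise ValueError("offsets, sizes, and input_shape must have equal rank")
    clamped_offsets = [max(offset, 0) for offset in offsets]
    clamped_sizes = [min(size, dim) for size, dim in zip(sizes, input_shape)]
    return clamped_offsets, clamped_sizes


# Example: a rank-2 slice of a 4x4 input with one negative offset and one
# oversized extent.
offsets, sizes = force_slice_in_bounds([-2, 1], [5, 3], [4, 4])
```

The sketch follows the commit text literally: it bounds each size by the full input dimension, not by the extent remaining after the offset.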
068be9c to 95ec8d0
This change integrates the following internal changes from oldest to newest:
GitOrigin-RevId: ac4be0cd2e515cf3762f579bbcd27a7a023a4932
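In 2459e7c, 'plan-module-bufferize' bufferizes inner modules before outer modules and caches each module's final analysis state. That ordering amounts to a post-order walk of the module nest. The sketch below models the walk in Python under simplifying assumptions: `Module`, `bufferize_nest`, and the dictionary used in place of a real `FuncAnalysisState` are hypothetical and do not reflect the pass's actual data structures.

```python
from dataclasses import dataclass, field


@dataclass
class Module:
    name: str
    inner: list = field(default_factory=list)       # directly nested modules
    calls_into: list = field(default_factory=list)  # names of inner modules it calls


def bufferize_nest(module, cache, order):
    """Bufferize `module`'s nested modules first, then `module` itself.

    `cache` maps a module name to a stand-in for its final analysis state,
    and `order` records the sequence in which modules were bufferized.
    """
    for child in module.inner:
        bufferize_nest(child, cache, order)
    # Inner modules were handled above, so their cached states already exist;
    # a KeyError here would mean the inner-before-outer ordering was violated.
    callee_states = {name: cache[name] for name in module.calls_into}
    order.append(module.name)
    cache[module.name] = {"module": module.name, "callee_states": callee_states}
    return cache[module.name]


inner_a = Module("inner_a")
inner_b = Module("inner_b")
outer = Module("outer", inner=[inner_a, inner_b], calls_into=["inner_a", "inner_b"])
cache, order = {}, []
bufferize_nest(outer, cache, order)
```

Here the outer module's stand-in state can only be built after its callees' states are in the cache, which is the property the commit relies on for more refined bufferization queries.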