Skip to content

Fix for issue 20371 related to deep depencies #20392

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 3 commits into
base: main
Choose a base branch
from

Conversation

fgarciacorona
Copy link

@fgarciacorona fgarciacorona commented Feb 19, 2025

The goal of this PR is to fix #20371 and improve the performance of csharp runtime when dealing with deep nested dependencies of .proto files.

  • The fix consists in caching the recursive calls for the search of extensions.

  • A test dataset has been created with deeply nested .proto files that showcases the performance penalty.

  • A set of benchmark metrics have been created to evaluate the performance impact before and after the fix.

  • Unfortunately due to not being able to unload all the existing objects inside the same testcase, it is only possible to run the test before and after enabling the caching mechanism. I am open to any other idea on how to improve this.

The test results before and after the fix are the following:
Before the proposed fix:

FileDescriptor_CreateMessageWithDeepDependencies_BuildFromByteString
   Source: DescriptorsTest.cs line 40
   Duration: 2,1 min

  Message: 
  Expected: 399
  But was:  76306932


  Stack Trace: 
DescriptorsTest.FileDescriptor_CreateMessageWithDeepDependencies_BuildFromByteString() line 61
1)    at Google.Protobuf.Reflection.DescriptorsTest.FileDescriptor_CreateMessageWithDeepDependencies_BuildFromByteString() in D:\Git\protobuf\csharp\src\Google.Protobuf.Test\Reflection\DescriptorsTest.cs:line 61

  Standard Output: 
Running performance test for extension registry caching: With caching
{ }
GetAllExtensionsCount: 402
GetAllGeneratedExtensionsCount: 573
GetAllDependedExtensionsCount: 76306932
GetAllDependedExtensionsFromMessageCount: 644966683
TotalReturnedExtensionsCount: 118
w/ cache elapsed: 00:02:04.6190917

After the proposed fix:

 FileDescriptor_CreateMessageWithDeepDependencies_BuildFromByteString
   Source: DescriptorsTest.cs line 40
   Duration: 117 ms

  Standard Output: 
Running performance test for extension registry caching: With caching
{ }
GetAllExtensionsCount: 402
GetAllGeneratedExtensionsCount: 573
GetAllDependedExtensionsCount: 399
GetAllDependedExtensionsFromMessageCount: 39
TotalReturnedExtensionsCount: 118
w/ cache elapsed: 00:00:00.0700036

@fgarciacorona fgarciacorona requested a review from a team as a code owner February 19, 2025 10:51
@fgarciacorona fgarciacorona requested review from jskeet and removed request for a team February 19, 2025 10:51
Copy link

google-cla bot commented Feb 19, 2025

Thanks for your pull request! It looks like this may be your first contribution to a Google open source project. Before we can look at your pull request, you'll need to sign a Contributor License Agreement (CLA).

View this failed invocation of the CLA check for more information.

For the most up to date status, view the checks section at the bottom of the pull request.

@jskeet jskeet removed their request for review February 19, 2025 11:05
@tonyliaoss
Copy link
Member

Hello @fgarciacorona, we won't be able to start reviewing your PR until the CLA has been signed.

Also, I suspect that we won't accept the newly added test protos, but we can discuss that after we start our review process.

@fgarciacorona
Copy link
Author

fgarciacorona commented Feb 25, 2025

Hello @fgarciacorona, we won't be able to start reviewing your PR until the CLA has been signed.

Also, I suspect that we won't accept the newly added test protos, but we can discuss that after we start our review process.

@tonyliaoss CLA has been passed! I cannot modify the wait for user action label

@fgarciacorona
Copy link
Author

@tonyliaoss Is there any additional action required from my side to have the PR marked as safe for test and proceed with the testing of the PR? Additionally, I think it is missing the C# label.

@fgarciacorona fgarciacorona requested review from a team as code owners March 5, 2025 14:14
@fgarciacorona fgarciacorona requested review from JasonLunn, haberman, dmaclach, googleberg and Logofile and removed request for a team March 5, 2025 14:14
@JasonLunn
Copy link
Contributor

JasonLunn commented Mar 5, 2025

Could you rebase this PR? Github thinks it is changing almost 800 files which makes review impractical.

Edit, nevermind about the rebase. I see what @tonyliaoss means about the test protos, though. Is there no other way to excite the issue that this PR aims to fix?

@JasonLunn JasonLunn added c# wait for user action 🅰️ safe for tests Mark a commit as safe to run presubmits over labels Mar 5, 2025
@github-actions github-actions bot removed the 🅰️ safe for tests Mark a commit as safe to run presubmits over label Mar 5, 2025
@fgarciacorona
Copy link
Author

Could you rebase this PR? Github thinks it is changing almost 800 files which makes review impractical.

Edit, nevermind about the rebase. I see what @tonyliaoss means about the test protos, though. Is there no other way to excite the issue that this PR aims to fix?

Right, there are 3 files that have actually been changed and the rest are half of them .proto and the other half the .cs generated classes from the protos.

I could only think of dynamically generating .proto files highly nested from within the testcase (if you want to reduce the number of existing files in the PR) and then somehow try to run protoc to generate the corresponding classes.

@fgarciacorona fgarciacorona marked this pull request as draft March 11, 2025 10:20
@fgarciacorona fgarciacorona force-pushed the fix_20371_csharp_deep_dependencies branch 2 times, most recently from 3515086 to a521e55 Compare March 11, 2025 11:22
@fgarciacorona
Copy link
Author

@JasonLunn how could I have the repo marked as "safe for tests" without having to request it for every commit?

@JasonLunn
Copy link
Contributor

@JasonLunn how could I have the repo marked as "safe for tests" without having to request it for every commit?

Humans have to apply the tag by policy for third party contributions to make sure they're safe to run on our CI infrastructure.

@JasonLunn JasonLunn added 🅰️ safe for tests Mark a commit as safe to run presubmits over and removed wait for user action labels Mar 12, 2025
@github-actions github-actions bot removed the 🅰️ safe for tests Mark a commit as safe to run presubmits over label Mar 12, 2025
@JasonLunn
Copy link
Contributor

Failed FileDescriptor_CreateMessageWithDeepDependencies_BuildFromByteString [263 ms]
  Error Message:
     Expected: 402
  But was:  419

  Stack Trace:
     at Google.Protobuf.Reflection.DescriptorsTest.FileDescriptor_CreateMessageWithDeepDependencies_BuildFromByteString() in D:\a\protobuf\protobuf\csharp\src\Google.Protobuf.Test\Reflection\DescriptorsTest.cs:line 59

1)    at Google.Protobuf.Reflection.DescriptorsTest.FileDescriptor_CreateMessageWithDeepDependencies_BuildFromByteString() in D:\a\protobuf\protobuf\csharp\src\Google.Protobuf.Test\Reflection\DescriptorsTest.cs:line 59

Interestingly, this only fails on the Windows test configuration, but it passed on Linux...

Improve performance of Extension search

Profiling/Benchmarking code still included and enabled !

Add assertions to unit test and enable caching by default

Added .proto files based on a deeper hierarchy
@fgarciacorona fgarciacorona force-pushed the fix_20371_csharp_deep_dependencies branch from a521e55 to 5bb0e0a Compare March 14, 2025 15:45
@JasonLunn JasonLunn added 🅰️ safe for tests Mark a commit as safe to run presubmits over and removed wait for user action labels Mar 14, 2025
@github-actions github-actions bot removed the 🅰️ safe for tests Mark a commit as safe to run presubmits over label Mar 14, 2025
@fgarciacorona
Copy link
Author

Failed FileDescriptor_CreateMessageWithDeepDependencies_BuildFromByteString [263 ms]
  Error Message:
     Expected: 402
  But was:  419

  Stack Trace:
     at Google.Protobuf.Reflection.DescriptorsTest.FileDescriptor_CreateMessageWithDeepDependencies_BuildFromByteString() in D:\a\protobuf\protobuf\csharp\src\Google.Protobuf.Test\Reflection\DescriptorsTest.cs:line 59

1)    at Google.Protobuf.Reflection.DescriptorsTest.FileDescriptor_CreateMessageWithDeepDependencies_BuildFromByteString() in D:\a\protobuf\protobuf\csharp\src\Google.Protobuf.Test\Reflection\DescriptorsTest.cs:line 59

Interestingly, this only fails on the Windows test configuration, but it passed on Linux...

@JasonLunn the reason why it was not failing in linux and arm64 is because the metrics for benchmarking are only generated for Debug (windows) but not for Release (linux and arm64).

Additionally we were storing the benchmark metrics in Debug mode in a static variable that was impacted by all unit tests. This needs to be reset before the specific test that looked at them.

@fgarciacorona fgarciacorona marked this pull request as ready for review March 14, 2025 16:07
@fgarciacorona
Copy link
Author

@JasonLunn would it be possible to remove the assigned reviewers and I will do a rebase and hopefully based on that the right reviewers will be assigned? I think because I didn't rebase it properly the first time a lot of commits from main triggered so many people.

@JasonLunn
Copy link
Contributor

It would expedite review if you could refactor the PR so that there are not ~800 files. I assume you have a script that generated the deeply nested test .proto files in the first place - could you use that in a genrule to generate the test input files at test time so that they don't have to be individually reviewed?

@fgarciacorona
Copy link
Author

fgarciacorona commented Mar 28, 2025

It would expedite review if you could refactor the PR so that there are not ~800 files. I assume you have a script that generated the deeply nested test .proto files in the first place - could you use that in a genrule to generate the test input files at test time so that they don't have to be individually reviewed?

The script that we have anonymized our data, so the proto files are based on actual data. But thanks for the feedback, knowing that a generator could also be used, I will focus on that.

Question though: do the generated .cs files need to be also committed? because I see them committed in the repo for other test .proto files, and that would eventually defeat the purpose (we would have ~400 only). Additionally if we don't commit the generated .cs files I am not sure whether it would be an accepted approach to have the test project not able to build if the test classes have not been generated.

If we can assume that every developer will always run generate_protos.sh script before starting development then the test csharp classes will be generated.

@Logofile Logofile removed their request for review April 2, 2025 17:59
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Runtime performance degradation for highly nested depending csharp files
3 participants