Skip to content

Support very long string as benchmark arguments #1248

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 19 commits into from
Jul 16, 2020

Conversation

adamsitnik
Copy link
Member

@adamsitnik adamsitnik commented Sep 17, 2019

Fixes #1431

/cc @eerhardt

if (methodName.Length <= MaxBenchmarkNameLength)
return methodName;

return methodName.Substring(0, MaxBenchmarkNameLength);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems that such an approach may lead to the same "BenchmarkName"s for different benchmarks. Can it be a problem?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Very good catch! In such case, we are going to overwrite the trace file.

This is obviously far from perfect. I could generate some unique file name (using guid for example) but it would make it harder for the end-user to connect the trace file name with the benchmark.

@eerhardt let's say that you want to run following benchmark and use EtwProfiler to profile it:

public IEnumerable<object> Arguments()
{
    yield return new string('a', 200_000);
    yield return new string('a', 200_000 - 1) + "b";
}

[Benchmark]
[ArgumentsSource(nameof(Arguments))]
public void Some(string value)

We can't use full benchmark name with the super long string as a trace file name and we can not take just the first part of the string because the file name would be the same for two benchmarks.

Would using a guid in such case as a trace file name be suprising to the end user?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would suggest keeping the beginning of the name the same, so I could tell that these 2 benchmarks were Some benchmarks. But then appending the Guid to the end would be understandable to me.

So either leaving out the args all together:

MyNamespace.MyClass.MyMethod.NewGuid

or chopping the arguments NewGuid characters before the max, and then appending NewGuid to the end.

MyNamespace.MyClass.MyMethod.aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaNewGuid

I don't think purely using a Guid for the whole name would be understandable.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It will be nice if generate the same name for the same benchmarking. I'm suggesting to use a hashcode of the whole string at the end of the "shortified" name instead of a random GUID.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It will be nice if generate the same name for the same benchmarking. I'm suggesting to use a hashcode of the whole string at the end of the "shortified" name instead of a random GUID

This is a good suggestion, however, afaik in .NET Core, the hash codes are the same for given string value only for current program lifetime. If you close the app and run it again, then the value is different (afaik for some security reasons)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One possibility would be to use a different hash algorithm, ex. SHA or MurmurHash.

We use Murmur hashing in ML.NET if you want to copy the code:

https://github.com/dotnet/machinelearning/blob/master/src/Microsoft.ML.Core/Utilities/Hashing.cs

@adamsitnik
Copy link
Member Author

@AndreyAkinshin @eerhardt it's been a while but the PR is now ready for review.

@adamsitnik adamsitnik added this to the v0.12.2 milestone Apr 27, 2020
@eerhardt
Copy link
Member

// file copied from https://github.com/dotnet/runtime/blob/master/src/libraries/Common/tests/System/IO/PathFeatures.cs

You probably want to retain the copyright on this and Hashing.cs files.


Refers to: src/BenchmarkDotNet/Extensions/PathFeatures.cs:1 in 183927e. [](commit_id = 183927e, deletion_comment = False)

eerhardt
eerhardt previously approved these changes Apr 27, 2020
Copy link
Member

@eerhardt eerhardt left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me (for what its worth).

# Conflicts:
#	tests/BenchmarkDotNet.IntegrationTests/ArgumentsTests.cs
Copy link
Member

@AndreyAkinshin AndreyAkinshin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When you specify the source of a copied file like

// file copied from https://github.com/dotnet/machinelearning/blob/master/src/Microsoft.ML.Core/Utilities/Hashing.c

it's better to reference a specific commit or a specific tag. The source code in the master branch can be changed at any moment; it may complicate tracking the origin of the copied source code.

@adamsitnik
Copy link
Member Author

it's better to reference a specific commit or a specific tag

done

@AndreyAkinshin AndreyAkinshin merged commit 59080cd into master Jul 16, 2020
@AndreyAkinshin AndreyAkinshin deleted the longStringAsArguments branch July 16, 2020 07:11
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

System.IO.FileNotFoundException with EtwProfiler
3 participants