Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Speed up ProhibitSurrogateCharactersValidator #9492

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

SpecLad
Copy link

@SpecLad SpecLad commented Aug 7, 2024

Description

I've noticed that this validator is using a per-character loop. Replacing it with a regex results in a pretty significant speedup. Here are results from my benchmark:

String length Old implementation time (sec) New implementation time (sec)
1 2.833e-07 1.765e-07
10 5.885e-07 2.030e-07
100 3.598e-06 4.144e-07
1000 3.329e-05 2.463e-06
10000 0.0003338 2.449e-05
100000 0.003338 0.0002284
1000000 0.03333 0.002278
10000000 0.3389 0.02377
100000000 3.250 0.2365

For large strings, the speedups are more than an order of magnitude.

I've noticed that this validator is using a per-character loop. Replacing it
with a regex results in a pretty significant speedup. Here are results from
my benchmark:

    String length      Old implementation    New implementation
                           time (sec)            time (sec)

              1          2.833e-07             1.765e-07
             10          5.885e-07             2.030e-07
            100          3.598e-06             4.144e-07
           1000          3.329e-05             2.463e-06
          10000          0.0003338             2.449e-05
         100000          0.003338              0.0002284
        1000000          0.03333               0.002278
       10000000          0.3389                0.02377
      100000000          3.250                 0.2365

For large strings, the speedups are more than an order of magnitude.
@SpecLad
Copy link
Author

SpecLad commented Aug 7, 2024

For the record, here's the benchmark:

import timeit

from rest_framework.validators import ProhibitSurrogateCharactersValidator

validator = ProhibitSurrogateCharactersValidator()

for i in range(0, 9):
    length = 10 ** i
    string = "a" * length

    timer = timeit.Timer("validator(string)", globals=globals())

    loops, duration = timer.autorange()

    print(f"{length}\t{duration / loops}")

@auvipy auvipy self-requested a review August 8, 2024 08:52
@tomchristie
Copy link
Member

Okay, so this is a good exercise case...

It's a neat and properly defined little PR that's clearly an improvement.
We'd need to be wilfully obtuse to say no. Why might we choose to reject even this properly targeted improvement?

Well... because having a steadfast "no" policy is really uncomplicated, and means we don't have continual PR pressure & creep, which on balance just results in unnecessary busy-work and risk.

I'd suggest we need to be more clear in our language in CONTRIBUTING.md, and just issue a really clear no on essentially all code PRs here. If we're unambiguous about this then we don't have to second guess ourselves.

Suggested language might. be...

REST framework is considered complete. We may accept...

  • Pull requests improving the documentation, or adding third party packages to the documentation.
  • Pull requests that resolve a CVE'd security issue.
  • Pull requests that are strictly required if updated Django versions don't pass some part of our existing test suite.

All other pull requests and issues will be closed.

Yes, there's clearly areas where there's potential performance impacts. There's also plenty of areas where there's loosely defined behaviour. However being really clear that this is essentially a finished project (excepting the points above) would probably be a net benefit.

(The alternative is similar to above, but we allow ourselves some leeway with a "We'll only consider pull requests that meet an unambiguously high bar of quality". 🤔)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants