Skip to content

Add a property to disable prefetch read#18669

Open
gp1314 wants to merge 2 commits intoAlluxio:master-2.xfrom
gp1314:dev/gup/add_prefetch_read_config
Open

Add a property to disable prefetch read#18669
gp1314 wants to merge 2 commits intoAlluxio:master-2.xfrom
gp1314:dev/gup/add_prefetch_read_config

Conversation

@gp1314
Copy link
Contributor

@gp1314 gp1314 commented Aug 8, 2024

What changes are proposed in this pull request?

  • We queried alluxio via impala, MaxDirectMemorySize was configured with 4G, but OOM appeared very quickly
  • impala caches a large number of file descriptors and also reads data from these files
  • I found that after reading the alluxio worker only once, the worker may send 64M (block size) of data to the client, causing the client to cache a large amount of data that the client may not use
  • how much data to send is not controlled by the client, the worker controls how many chunks to send at a time through grpc's isReady and tooManyPendingChunks
  • isreaby and tooManyPendingChunks are not || but && . I don't understand why they are set up so that if isreaby doesn't return false for a read, If the tooManyPendingChunks limit is reached, it is possible to continue sending data to the client
  • In fact, in my environment the worker sends 20 chunks (chunk size 1M) to the client
  • I want to add a configuration to turn off the aforementioned preread multiple chunks

Why are the changes needed?

  1. Add a property alluxio.user.prefetch.read.enabled , client set false, the client only receives data of a certain size
  2. Related issue: impala client OutOfDirectMemoryError when querying alluxio #18118

Does this PR introduce any user facing changes?

Please list the user-facing changes introduced by your change, including

  1. Add a property alluxio.user.prefetch.read.enabled, default is true, maintain original logic
  2. If alluxio.user.prefetch.read.enabled set false, the client only receives data of a chunk size

@alluxio-bot
Copy link
Contributor

Automated checks report:

  • PR title follows the conventions: PASS
  • Commits associated with Github account: FAIL
    • It looks like your commits can't be linked to a valid Github account.
      Your commits are made with the email gupeng@gupengdeMacBook-Pro.local, which does not allow your contribution to be tracked by Github.
      See this link for possible reasons this might be happening.
      To change the author email address that your most recent commit was made under, you can run:
      git -c user.name="Name" -c user.email="Email" commit --amend --reset-author
      See this answer for more details about how to change commit email addresses.
      Once the author email address has been updated, update the pull request by running:
      git push --force https://github.com/gp1314/alluxio.git dev/gup/add_prefetch_read_config

Some checks failed. Please fix the reported issues and reply
alluxio-bot, check this please
to re-run checks.

@gp1314 gp1314 force-pushed the dev/gup/add_prefetch_read_config branch from d38f9f3 to 8858012 Compare August 8, 2024 07:27
@gp1314
Copy link
Contributor Author

gp1314 commented Aug 8, 2024

alluxio-bot, check this please

@alluxio-bot
Copy link
Contributor

Automated checks report:

  • PR title follows the conventions: PASS
  • Commits associated with Github account: FAIL
    • It looks like your commits can't be linked to a valid Github account.
      Your commits are made with the email gupeng@gupengdeMacBook-Pro.local, which does not allow your contribution to be tracked by Github.
      See this link for possible reasons this might be happening.
      To change the author email address that your most recent commit was made under, you can run:
      git -c user.name="Name" -c user.email="Email" commit --amend --reset-author
      See this answer for more details about how to change commit email addresses.
      Once the author email address has been updated, update the pull request by running:
      git push --force https://github.com/gp1314/alluxio.git dev/gup/add_prefetch_read_config

Some checks failed. Please fix the reported issues and reply
alluxio-bot, check this please
to re-run checks.

@gp1314 gp1314 force-pushed the dev/gup/add_prefetch_read_config branch from 8858012 to 9449fe1 Compare August 8, 2024 08:08
@gp1314
Copy link
Contributor Author

gp1314 commented Aug 8, 2024

alluxio-bot, check this please

@alluxio-bot
Copy link
Contributor

Automated checks report:

  • PR title follows the conventions: PASS
  • Commits associated with Github account: FAIL
    • It looks like your commits can't be linked to a valid Github account.
      Your commits are made with the email gupeng@gupengdeMacBook-Pro.local, which does not allow your contribution to be tracked by Github.
      See this link for possible reasons this might be happening.
      To change the author email address that your most recent commit was made under, you can run:
      git -c user.name="Name" -c user.email="Email" commit --amend --reset-author
      See this answer for more details about how to change commit email addresses.
      Once the author email address has been updated, update the pull request by running:
      git push --force https://github.com/gp1314/alluxio.git dev/gup/add_prefetch_read_config

Some checks failed. Please fix the reported issues and reply
alluxio-bot, check this please
to re-run checks.

@gp1314 gp1314 force-pushed the dev/gup/add_prefetch_read_config branch from 9449fe1 to 56c59ff Compare August 8, 2024 08:46
@alluxio-bot
Copy link
Contributor

Automated checks report:

  • PR title follows the conventions: PASS
  • Commits associated with Github account: PASS

All checks passed!

@gp1314
Copy link
Contributor Author

gp1314 commented Sep 19, 2024

@lucyge2022 Could you please help core review it

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants