Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Disable short circuit IO for Dora by default #16892

Open
wants to merge 32 commits into
base: main
Choose a base branch
from

Conversation

dbw9580
Copy link
Contributor

@dbw9580 dbw9580 commented Feb 15, 2023

What changes are proposed in this pull request?

Disable short-circuit IO for Dora by default.

Why are the changes needed?

Dora does not support short circuit IO any more.

Does this PR introduce any user facing changes?

Yes, the default value is changed.

JiamingMai and others added 30 commits November 21, 2022 14:09
### What changes are proposed in this pull request?

Remove the restriction of UFS for SDK.

### Why are the changes needed?

To meet the business requirements.

### Does this PR introduce any user facing changes?

No, it doesn't.

pr-link: Alluxio#16570
change-id: cid-cca2be339f42d6035968999f258c04ad76d1b85e
### What changes are proposed in this pull request?
Add property key to enable dora client read location polic

### Why are the changes needed?

enable dora client read location polic

### Does this PR introduce any user facing changes?

addition of property keys

pr-link: Alluxio#16601
change-id: cid-30229fa9aac78359ebdd12b1cb317245016c370c
### What changes are proposed in this pull request?

Implement the skeleton of dora work

### Why are the changes needed?


### Does this PR introduce any user facing changes?

no

pr-link: Alluxio#16615
change-id: cid-e48f5bd1e4611d5e6b300cf9851fce18adb808c1
### What changes are proposed in this pull request?

Add method to get file status.

### Why are the changes needed?

We need to get file status in Dora.

### Does this PR introduce any user facing changes?

No, it doesn't.

pr-link: Alluxio#16627
change-id: cid-d877d3d1240663afc89bc6d6d079bf75a8751711
### What changes are proposed in this pull request?

Please outline the changes and how this PR fixes the issue.

### Why are the changes needed?

Please clarify why the changes are needed. For instance,
1. If you propose a new API, clarify the use case for a new API.
2. If you fix a bug, describe the bug.

### Does this PR introduce any user facing changes?

Please list the user-facing changes introduced by your change, including
1. change in user-facing APIs
2. addition or removal of property keys
3. webui

pr-link: Alluxio#16632
change-id: cid-6affd9d6cb0a01ea8352bfa7af7731dab49333ae
### What changes are proposed in this pull request?

Please outline the changes and how this PR fixes the issue.

### Why are the changes needed?

Please clarify why the changes are needed. For instance,
1. If you propose a new API, clarify the use case for a new API.
2. If you fix a bug, describe the bug.

### Does this PR introduce any user facing changes?

Please list the user-facing changes introduced by your change, including
1. change in user-facing APIs
2. addition or removal of property keys
3. webui

pr-link: Alluxio#16637
change-id: cid-99bc5802f28799347181313efb9e37b2cd46b186
Use mdtest to get file status of 1000 files, by removing the isFile
call, the get status performance can double.

pr-link: Alluxio#16678
change-id: cid-31edf9b34080ddd5595c9ce613ddaa8e29901a66
### What changes are proposed in this pull request?

Remove extra get status call in fuse.open()

### Why are the changes needed?

Reduce one unneeded get metadata call

### Does this PR introduce any user facing changes?

NA

pr-link: Alluxio#16664
change-id: cid-5a9abb7673491737ba376cdf2eeb2a13a1baf63e
### What changes are proposed in this pull request?

Remove some overhead in big file read (cache hit/ cache miss) in fuse
sdk.

### Why are the changes needed?

Improve overall read performance

### Does this PR introduce any user facing changes?

NA

pr-link: Alluxio#16670
change-id: cid-d0d32eeebf9039753881b160443791325ccf5dad
### What changes are proposed in this pull request?

Enable default client-side metadata cache (20,000 entries with 40MB
memory).

### Why are the changes needed?

Each fuse.open() has two extra get file status calls which cannot be
provided via kernel metadata cache but can via user space metadata
cache. Enable user space metadata cache can largely improve the overall
fuse FIO small file read performance. S3FS-FUSE has similar settings of
enable user space metadata cache (100,000 entries with 40MB memory) by
default.

FIO benchmarking FUSE SDK against s3-fuse in terms of small file read:
fio --name=sequentialread --rw=read --bs=100k --numjobs=1
--filesize=100k --direct=1 --group_reporting --nrfiles 1000

<google-sheets-html-origin><style type="text/css"><!--td {border: 1px
solid #cccccc;}br {mso-data-placement:same-cell;}--></style>

| S3 FUSE | ALLUXIO FUSE
-- | -- | --
Original | 9000 KiB/s | 1500 KiB/s
Kernal metadata cache (attr_timeout=7200, entry_timeout=7200) | 9000
KiB/s | 2400 KiB/s
User metadata cache | 9000 KiB/s | 1150 MiB/s


### Does this PR introduce any user facing changes?

Yeah, alluxio-fuse mount <..> metadata cache option with default value
changed
-o metadata_cache_size=<size> (Default=\"20000\" (around 40MB memory))
-o metadata_cache_expire=<timeout> (Default=\"no expire\")

pr-link: Alluxio#16662
change-id: cid-5423f1ab7f37ed89cdaa1ac0e85328478fbd1ce4
Implement basic functionalities for Dora worker.

pr-link: Alluxio#16644
change-id: cid-522ecaf0f7f39915fc8d55a2aa83b5b114355458
### What changes are proposed in this pull request?

Fix an NPE in PermissionCheckTest.

### Why are the changes needed?

The `mMetricsMaster` was initialized to `null` and the setup was adding
a null pointer to the registry. A null pointer should have been rejected
by the registry but it was not the case until refactoring was done in
Alluxio#16644.

### Does this PR introduce any user facing changes?
No.

pr-link: Alluxio#16734
change-id: cid-7c61bd50e97dba6c211d6fea29f1555d0fa91e16
Set default fuse version to 3 because of easy to use and better
performance.
One can sudo yum install fuse3 and then alluxio-fuse to be able to enjoy
the
better performance brought by MAX_IDLE_THREADS=64 (64 threads instead 10
idle threads to execute fuse operations for better performance).

pr-link: Alluxio#16682
change-id: cid-bfc194d628a887b2508e249a2bb583c64540d8f9
### What changes are proposed in this pull request?

Add mission proto lock content.

### Why are the changes needed?

Alluxio#16578 Modified the proto file but not the `proto.lock` file.

### Does this PR introduce any user facing changes?
No user facing changes.

pr-link: Alluxio#16651
change-id: cid-adbcdf461ceb5f8c43379ff510c3c9ffabcd992f
### What changes are proposed in this pull request?

Dora client side affinity

### Why are the changes needed?

skip grpc call of the get block location from master

### Does this PR introduce any user facing changes?
new configruations

pr-link: Alluxio#16726
change-id: cid-bfda07a61fc0a9ed21e9f47652e973cc4e59ca9b
### What changes are proposed in this pull request?

Enable data transmission with Netty.

User is able to enable Netty data transmission by setting the following
configurations:
alluxio.user.netty.data.transmission.enabled=true
alluxio.worker.network.netty.channel=nio

### Why are the changes needed?

Support Netty data transmission that has higher performance.

### Does this PR introduce any user facing changes?

No, it doesn't.

pr-link: Alluxio#16599
change-id: cid-0af83e5128edee32c1512beea6ad79c545ada424
Implement `getChannel`. The implementation does not yet enable zero-copy
transfer with netty.

pr-link: Alluxio#16742
change-id: cid-a2ecf8d06c0c82f56c259f45e2348f6a90d0eeb1
### What changes are proposed in this pull request?

A temporary solution for metadata caching on worker

### Why are the changes needed?

reduce the api call to ufs

### Does this PR introduce any user facing changes?

Please list the user-facing changes introduced by your change, including
add properties

pr-link: Alluxio#16732
change-id: cid-28a01ea4081202b80a31393e097df3f59530e821
In the async page store, when worker threads run out, the caller thread
should perform the action.

The test had a race condition where all the threads could block,
including the calling thread.

pr-link: Alluxio#16731
change-id: cid-81d809c343968d68b57988978575d8b9e4ace63a

Original commit
Alluxio@7a4e0a5

pr-link: Alluxio#16797
change-id: cid-c0113006a76824e1cef41c6f44820e344e047ebe
Fixes Alluxio#16793
Set "FileId" in UfsBaseFileSystem to fulfill the local cache
requirement.

Enable FUSE SDK tests with different cache combinations.

pr-link: Alluxio#16767
change-id: cid-96cfa458ce05ad317c1917cf40e14959f83a128f
Add a property key to enable or disable dora metadata cache
`alluxio.dora.client.metadata.cache.enabled=true`.

Added full fuse tests with dora data cache only under
`alluxio.client.fuse.dor`a.
Added read only from UFS FUSE tests with dora data cache and metadata
cache `alluxio.client.fuse.dora.readonly`.
Currently the fuse related tests are duplicated copies with small
modification, will refactor the test cases to avoid code copying in the
future

pr-link: Alluxio#16809
change-id: cid-95f08387cf221656811a503f3cfeb73fc7dd07c7
Remove Alluxio-fuse script. Rename alluxio-fuse-sdk script to
alluxio-fuse script.
Modify the generate tarball logics to match the changes

pr-link: Alluxio#16814
change-id: cid-4f5274c276bb75f1f353a6f36af7263eb86f3ed8
### What changes are proposed in this pull request?

Implement local cache invalidate method

### Why are the changes needed?

support ttl

### Does this PR introduce any user facing changes?

no

pr-link: Alluxio#16783
change-id: cid-2b1b606be2b4e1b756a813b0757144d6175f7b58
Signed-off-by: Huang Hua <[email protected]>

### What changes are proposed in this pull request?

Basic framework of metadata store for Dora

### Why are the changes needed?

Dora needs to persist some file/dir metadata

### Does this PR introduce any user facing changes?

A new PropertyKey "alluxio.dora.worker.meta.store.rocksdb.dir" is added.

pr-link: Alluxio#16735
change-id: cid-e39150d1ea202ccd926e29aaa94d6f3844aaa47d
### What changes are proposed in this pull request?

Fix caching not working in dora worker.

### Why are the changes needed?

Two bugs are fixed:
1. Page store in Dora worker was not created with the worker specific
property keys. Keys for client local cache was used, so the settings
like cache dir location and sizes did not have effect.
2. `DoraCacheFileSystem.open` did not honor client side read default
options, and was using hard-coded defaults from gRPC definition, which
is `NO_CACHE`.

### Does this PR introduce any user facing changes?
No.

pr-link: Alluxio#16837
change-id: cid-f2614d63e05a2f616bc09ab907e26f2e233934d4
### What changes are proposed in this pull request?
Fix bug that paged block store can neither write nor cache after
reading.

Please outline the changes and how this PR fixes the issue.
Fix the bug by using PagedBlockStore's method to build BlockReader.

### Why are the changes needed?
Paged block store cannot be used if we don't fix this bug.

### Does this PR introduce any user facing changes?
No, it doesn't.

pr-link: Alluxio#16785
change-id: cid-8f51e0eb2acf59ecd49872ed86abd643b7cd72db
### What changes are proposed in this pull request?

Please outline the changes and how this PR fixes the issue.

### Why are the changes needed?

Please clarify why the changes are needed. For instance,
1. If you propose a new API, clarify the use case for a new API.
2. If you fix a bug, describe the bug.

### Does this PR introduce any user facing changes?

Please list the user-facing changes introduced by your change, including
1. change in user-facing APIs
2. addition or removal of property keys
3. webui

pr-link: Alluxio#16859
change-id: cid-084a3d5e779bda5bad932a5af331d78ee50222d2
@alluxio-bot alluxio-bot added the API Change Changes covering public API label Feb 15, 2023
Copy link
Contributor

@huanghua78 huanghua78 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good.

@beinan
Copy link
Contributor

beinan commented Feb 15, 2023

alluxio-bot, merge this please.

@alluxio-bot
Copy link
Contributor

Check evaluation failed:
Github action modules: (alluxio.server.,!alluxio.server.ft.) concluded as failure, which is not success or skipped
Github action modules: (alluxio.client.,!alluxio.client.fs.,!alluxio.client.cli.,!alluxio.client.fuse.) concluded as failure, which is not success or skipped
Github action modules: (alluxio.client.,alluxio.master.) concluded as failure, which is not success or skipped
Github action modules: (alluxio.client.fs.,!alluxio.client.fs.concurrent.,!alluxio.client.fs.io.) concluded as failure, which is not success or skipped
Github action modules: (alluxio.client.cli.fs.
) concluded as failure, which is not success or skipped

@beinan
Copy link
Contributor

beinan commented Feb 15, 2023

rerun the test

Copy link
Contributor

@JiamingMai JiamingMai left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good.

@dbw9580
Copy link
Contributor Author

dbw9580 commented Feb 16, 2023

Too many of our tests implicitly rely on the fact the short circuit IO is enabled by default. These tests are not really relevant to Dora and fixing them would take a lot of time. So for the time being I think we should just specify the setting manually.

@Xenorith Xenorith force-pushed the dora branch 3 times, most recently from 253ed31 to 6ecdda5 Compare April 28, 2023 21:46
@github-actions
Copy link

github-actions bot commented Jun 6, 2023

This pull request has been automatically marked as stale because it has not had recent activity. It will be closed in two weeks if no further activity occurs. Thank you for your contributions.

@github-actions github-actions bot added the stale The PR/Issue does not have recent activities and will be closed automatically label Jun 6, 2023
@apc999 apc999 force-pushed the main branch 2 times, most recently from 2fec0ec to b597c61 Compare October 17, 2023 19:11
@github-actions github-actions bot removed the stale The PR/Issue does not have recent activities and will be closed automatically label Jun 26, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
API Change Changes covering public API POM Change
Projects
None yet
Development

Successfully merging this pull request may close these issues.

9 participants