Support access more ufs in dora worker #17609
…e with jdk17 Add dockerfile with java 17 base and java 8 added for compilation, mirroring the java 11 one pr-link: Alluxio#16885 change-id: cid-aa63908b5ef3ef2ef83a022677ec58706d29730d
Modern object stores ensure strong consistency as follows: Alibaba oss: https://www.alibabacloud.com/help/en/object-storage-service/latest/what-is-oss Amazon S3: https://aws.amazon.com/s3/consistency/ GCP Cloud Storage: https://cloud.google.com/storage/docs/consistency Huawei OBS: https://support.huaweicloud.com/intl/en-us/api-obs/obs_04_0118.html When this is non-zero and the UFS is strongly consistent, and Alluxio is out of sync, some operations may retry over and over even though nothing will change, slowing down the system for long periods of time. This value should be 0 by default with strong consistency as Alluxio will see the most up to date version the first time it accesses the object. pr-link: Alluxio#16887 change-id: cid-164e455d71f4b17d797ddfbaff2ef7595c0bea90
…rt for LocalAlluxioCluster Add embedded journal support for LocalAlluxioCluster in integration tests To test some features that only works with embedded journals N/A pr-link: Alluxio#16875 change-id: cid-218958b272917280916ef8bed0a26e0ce9bdd6d6
### What changes are proposed in this pull request? Add a rejectOnStandbyMasters on version grpc endpoint ### Why are the changes needed? We added a PR to make standby master return unavailable on this version service endpoint Alluxio#16854 However, in addition to the polling master inquire client, the AbstractMasterClient also needs this endpoint. https://github.com/Alluxio/alluxio/blob/master/core/common/src/main/java/alluxio/AbstractClient.java#L172 When a client makes a request to standby master, such check will constantly fail and resulted in failures. So we want the logic done in Alluxio#16854 only applies to PollingMasterInquireClient and hence we added this boolean field to bypass the check. ### Does this PR introduce any user facing changes? N/A pr-link: Alluxio#16890 change-id: cid-379f8af78250d2a230bceff9d7ea75739979e198
add a missing javadoc `@param` pr-link: Alluxio#16891 change-id: cid-fa5c6ba2f4315994e8f5167a482c1047210711a7
### What changes are proposed in this pull request? This change makes `free` the default TTL action, and adjusts the tests so that they pass. ### Why are the changes needed? Fix Alluxio#12316 ### Does this PR introduce any user facing changes? No pr-link: Alluxio#16823 change-id: cid-142490712d94004a0303f18399d4637e12d81523
… get content hash of uploaded file Currently, when complete is called on a file in Alluxio, a fingerprint of the file is created by performing a GetStatus on the file on the UFS. If, due to a concurrent write, the state of the file differs from what was written through Alluxio, the fingerprint will not match the content of the file in Alluxio. If this happens, the state of the file in Alluxio will always be out of sync with the UFS, and the file will never be updated to the most recent version. This is because metadata sync uses the fingerprint to decide whether the file needs synchronization, and if the fingerprint does not match the file in Alluxio there will be inconsistencies. This PR fixes this by computing the contentHash field of the fingerprint while the file is actually written to the UFS. For object stores, this means the hash is taken from the result of the call to PutObject. Unfortunately HDFS does not have a similar interface, so the content hash is taken just after the output stream is closed to complete the write. There is a small chance that someone changes the file in the window between these two operations. pr-link: Alluxio#16597 change-id: cid-64723be309bdb14b05613864af3b6a1bb30cba6d
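The idea of hashing the bytes during the write, rather than re-reading the file's status afterwards, can be illustrated with the JDK's `DigestOutputStream`. This is only a sketch under assumptions: `HashingWriteSketch` and `writeAndHash` are hypothetical names, MD5 is chosen purely for illustration, and none of this is the code from the PR.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.security.DigestOutputStream;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

/** Hypothetical helper, not Alluxio code: hashes bytes as they are written. */
final class HashingWriteSketch {
  /** Writes data to sink and returns the hex MD5 of exactly the bytes written. */
  static String writeAndHash(byte[] data, OutputStream sink) throws IOException {
    MessageDigest md5;
    try {
      md5 = MessageDigest.getInstance("MD5");
    } catch (NoSuchAlgorithmException e) {
      throw new IllegalStateException("MD5 is not available", e);
    }
    // Every byte that reaches the sink also updates the digest, so the hash
    // cannot drift from what this writer actually produced.
    try (DigestOutputStream out = new DigestOutputStream(sink, md5)) {
      out.write(data);
    }
    StringBuilder hex = new StringBuilder();
    for (byte b : md5.digest()) {
      hex.append(String.format("%02x", b));
    }
    return hex.toString();
  }
}
```

Because the digest is fed by the same stream as the sink, a concurrent writer changing the stored object afterwards does not affect the recorded hash.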
What changes are proposed in this pull request? Update cn version of Spark on Kubernetes doc. Why are the changes needed? There is no corresponding Chinese documentation for upgrade. Does this PR introduce any user facing changes? More Chinese users can access Alluxio documentation more easily. pr-link: Alluxio#16855 change-id: cid-59e640d1ad5bd270d71546d226fa9f546607cf93
### What changes are proposed in this pull request? Rename PROXY_S3_OPTIMIZED_VERSION_ENABLED to PROXY_S3_V2_VERSION_ENABLED in comment. ### Why are the changes needed? The comment in `alluxio.proxy.s3.S3RequestServlet` named the wrong property for enabling the S3 v2 API. pr-link: Alluxio#16896 change-id: cid-b10e399e4d94511c1d0c134fe5ab171c95a8f1c8
### What changes are proposed in this pull request? Fix typo. ### Why are the changes needed? NA ### Does this PR introduce any user facing changes? NA pr-link: Alluxio#16917 change-id: cid-4bd1a54d173b081b75c97b7132c03f492d8b748e
Fixes Alluxio/Community#624 Alluxio/Community#624 pr-link: Alluxio#16871 change-id: cid-1c61908eb034cdb6d84b5510798ac8c2c7c029e9
…File Support overwrite option in createFile. Before this change, if a file with the same name already existed in Alluxio, we would call `getStatus`, then `deleteFile`, and then create the new file. Now we only need to call `createFile` with the overwrite option, in both the HDFS API and the S3 API, which saves the extra RPCs. Add a new option in CreateFileOption for overwriting. pr-link: Alluxio#16886 change-id: cid-5b84132d9c4da731b7d1bbf35d71885052e8c5b0
### What changes are proposed in this pull request? Fix typo. pr-link: Alluxio#16751 change-id: cid-ef5fa5eab7cbfa6e424d514b9745c0cf41fb5a98
…tyCommandTest As the title. If Options has a WorkerInfoField that POptions does not have, or if POptions has a WorkerInfoField that Options does not have, this test will fail. The existing version of the test cannot catch these two errors, as shown in the PR below. [Add missing variant in gRPC WorkerInfoField](Alluxio#16457) No. pr-link: Alluxio#16507 change-id: cid-23fa04dbd9b0402bff37ddb8ae42f5ec6e18f719
### What changes are proposed in this pull request? Added benchmarks for `PagedBlockStore` that read from local storage rather than UFS. ### Why are the changes needed? This piece is missing as `PagedBlockStore` didn't support creating local blocks then. ### Does this PR introduce any user facing changes? No. pr-link: Alluxio#16804 change-id: cid-5e0226beb45ae714cae4417c43076f0857cdf7c6
### What changes are proposed in this pull request? Adding metrics sink to job master. ### Why are the changes needed? Fix the issue that the job master is unable to sink metrics. ### Does this PR introduce any user facing changes? Yes, this change will enable users to sink metrics from the job master. pr-link: Alluxio#16899 change-id: cid-7394471270d4617007eeb97e1674b90585337624
Add a read rate limit to the S3 proxy for getObject responses. We use NVMe to speed up reading algorithm models, but Alluxio reads so fast that the k8s container consumes a large share of the host's network card resources and affects other containers on the same host, so we need to limit the read speed. Add two properties: 1. `alluxio.proxy.s3.global.read.rate.limit.mb` to limit the rate across all connections; 2. `alluxio.proxy.s3.single.connection.read.rate.limit.mb` to limit the rate of a single connection. pr-link: Alluxio#16866 change-id: cid-613baec7d469bb68b3c75343c49d6822ee4bd1a6
### What changes are proposed in this pull request? Fix potential bugs in freeWorker command. ### Why are the changes needed? When a worker has been decommissioned, its metadata can not be got by calling `getWorkerInfolist()`. This method accesses `LoadingCache<String, List<WorkerInfo>> mWorkerInfoCache` in `DefaultBlockMaster.java`, which will not refresh instantly. As to method `removeDecommissionedWorker()` in `BlockMasterClientServiceHandler.java`, if we don't add FieldRanges, the list `decommissionedWorkers` would not get enough information to run the loop below successfully, though the worker has been decommissioned. ### Does this PR introduce any user facing changes? No. pr-link: Alluxio#16458 change-id: cid-101865dc9ec4f40e7561f81f38287c6efc2ae23f
…ss than expected Fix a bug where `fs head` and `fs tail` output less data than it is expected to. The code is buggy: it only calls `read` once, and does not check if the returned number of bytes read is equal to the total number of bytes to read as specified by the cli option. Compare with the `cat` command: https://github.com/Alluxio/alluxio/blob/73f3ce83c8a3ef77ac3eebb4579bb7d412784ec9/shell/src/main/java/alluxio/cli/fs/command/CatCommand.java#L57-L63 No. pr-link: Alluxio#16928 change-id: cid-86b76a3444fa9efe2cd63a4b42a42e4f62b8f21b
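The bug described above comes from treating a single `read` call as if it always fills the requested length. A minimal sketch of the looping pattern follows; `ReadLoopSketch` and `readUpTo` are hypothetical names and this is not the code from the PR or from `CatCommand`.

```java
import java.io.IOException;
import java.io.InputStream;

/** Hypothetical helper, not Alluxio code: reads until len bytes or end of stream. */
final class ReadLoopSketch {
  /**
   * Reads into buf until len bytes have been read or the stream ends.
   * Returns the number of bytes actually read, which may be less than len
   * only if the stream ended early.
   */
  static int readUpTo(InputStream in, byte[] buf, int len) throws IOException {
    int total = 0;
    while (total < len) {
      // A single read may legally return fewer bytes than requested.
      int n = in.read(buf, total, len - total);
      if (n == -1) {
        break; // end of stream
      }
      total += n;
    }
    return total;
  }
}
```

The return value lets the caller print exactly the bytes that were read, even when the file is shorter than the requested size.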
…dfsFileInputStream Implement unbuffer interface for HdfsFileInputStream. Fix Alluxio#16016. If the unbuffer method is not implemented, then impala will not be able to use the file handle cache. Implement CanUnbuffer and StreamCapabilities for HdfsFileInputStream. pr-link: Alluxio#16017 change-id: cid-b50163c7b4f199b8a61d5818a0e4739039f2745c
…sticHashPolicy Add a new block location policy `CapacityBaseDeterministicHashPolicy`. We want a `CapacityBaseRandomPolicy` that is deterministic. See also Alluxio#16187. Yes, a new block location policy is available for config item `alluxio.user.ufs.block.read.location.policy` and `alluxio.user.block.write.location.policy.class`. pr-link: Alluxio#16237 change-id: cid-47ba9b1d197b5ad546ac1a993590d49e963c3811
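One way to make a capacity-weighted choice deterministic is to hash the block ID onto the range of total capacity and walk the workers in a fixed order. The sketch below shows that general idea only; it is an assumption about the approach, not the actual `CapacityBaseDeterministicHashPolicy` implementation, and `CapacityHashSketch`, `pick`, and `mix` are hypothetical names.

```java
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical sketch, not Alluxio code: capacity-weighted deterministic pick. */
final class CapacityHashSketch {
  /** Returns a worker name for blockId; identical inputs always give the same result. */
  static String pick(long blockId, Map<String, Long> capacityByWorker) {
    // Sort by worker name so the iteration order does not depend on the map type.
    TreeMap<String, Long> sorted = new TreeMap<>(capacityByWorker);
    long total = 0;
    for (long capacity : sorted.values()) {
      total += capacity;
    }
    if (total <= 0) {
      throw new IllegalArgumentException("total capacity must be positive");
    }
    // Map the hashed block ID to a point in [0, total).
    long point = Math.floorMod(mix(blockId), total);
    for (Map.Entry<String, Long> e : sorted.entrySet()) {
      if (point < e.getValue()) {
        return e.getKey();
      }
      point -= e.getValue();
    }
    throw new AssertionError("unreachable: point is always below total");
  }

  /** Simple 64-bit bit mixer so that consecutive IDs spread across the range. */
  private static long mix(long x) {
    x ^= x >>> 33;
    x *= 0xff51afd7ed558ccdL;
    x ^= x >>> 33;
    x *= 0xc4ceb9fe1a85ec53L;
    x ^= x >>> 33;
    return x;
  }
}
```

A worker's share of the range is proportional to its capacity, so a worker with zero capacity is never chosen, and the result changes only when the worker set or capacities change.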
### What changes are proposed in this pull request? Incorrect usage of flag in atomic rename for the final step of completing the target multipart-upload file. ### Why are the changes needed? if write type is cache_thru or thru, the atomic rename ( delete target and rename src to target ) will incorrectly delete alluxio-only instead of deleting UFS, hence making the renaming op in UFS fail. ### Does this PR introduce any user facing changes? N/A pr-link: Alluxio#16941 change-id: cid-b38904c24dee066adac2f854127f2d877bd21dcd
### What changes are proposed in this pull request? When the metadata sync descendant type is NONE, stop loading the children of the sync root. If a metadata sync is triggered by a GetStatus() call on a directory: Previous behavior: The directory itself, as well as all its subdirectories in the inode store, will be synced. New behavior: ONLY the directory itself will be loaded. ### Why are the changes needed? This PR addresses Alluxio#16922. The incorrect metadata sync behavior on GetStatus for a directory loads more children of the directory than expected and puts a lot of pressure on the UFS side. ### Does this PR introduce any user facing changes? Yes. The metadata sync behavior has been changed. See the comment above. The previous behavior was actually wrong, and we added a hidden feature flag to allow customers to fall back. pr-link: Alluxio#16935 change-id: cid-2a5a2b4959422ecff74149881e400659d07c2163
…ker for pagestore when free/delete file Support removeBlock for pagestore. To remove metadata and data of blocks and pages on worker when free or delete a file. N/A pr-link: Alluxio#16895 change-id: cid-a6cc6c0074907f62b9778a8a1cfc0e9f61e74135
### What changes are proposed in this pull request? Fix client stressbench concurrency problem. ### Why are the changes needed? Reproduce: Cluster: 1 master and 2 workers Command: `bin/alluxio runClass alluxio.stress.cli.client.StressClientIOBench --operation Write --base alluxio:///stress-client-io-base --write-num-workers 2 --file-size 1m --threads 8` Result: user_root.log <img width="1152" alt="image" src="https://user-images.githubusercontent.com/42070967/220555496-c275578d-9eb7-4244-b897-9e5142d977d1.png"> pr-link: Alluxio#16934 change-id: cid-10a6a78a50a8a8bcac455a1bc7f7d1fe43c3642a
### What changes are proposed in this pull request? Fix the wrong option in stress bench doc pr-link: Alluxio#16925 change-id: cid-923b234e2c29b132b5fbafdfdb64b9903ae5a464
Building Alluxio on a node that has no git installed throws an out-of-bounds exception: the `VERSION` string can then be shorter than 8 characters, or even empty, so `substring(8)` fails. This PR first checks that `VERSION` has more than 8 characters; otherwise it leaves the `VERSION` string uncut. pr-link: Alluxio#16888 change-id: cid-4e02cc9214317d86bba9d00a6121c5f013dd3255
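The length check described in this commit can be sketched as follows. The commit message does not say which part of `VERSION` is kept, so this sketch assumes the intent is to keep at most the first 8 characters; `VersionCutSketch`, `shorten`, and `MAX_LENGTH` are hypothetical names, and the actual build code may differ.

```java
/** Hypothetical helper, not Alluxio code: guards a fixed-length cut of a string. */
final class VersionCutSketch {
  /** Assumed maximum length; the value 8 comes from the commit message. */
  static final int MAX_LENGTH = 8;

  /** Returns version unchanged when it is short or empty, else its first 8 characters. */
  static String shorten(String version) {
    if (version == null) {
      return "";
    }
    // Only cut when there are more than MAX_LENGTH characters to cut from.
    return version.length() > MAX_LENGTH ? version.substring(0, MAX_LENGTH) : version;
  }
}
```

With the guard in place, an empty or short `VERSION` (for example when git metadata is unavailable) passes through untouched instead of raising an exception.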
### What changes are proposed in this pull request? Improve listStatus performance by about 5x in some scenarios. ### Why are the changes needed? For instance, in the following scenario: 1. A Hadoop-compatible file system is used to access Alluxio (listStatus). 2. There are many mount points, e.g. more than 500. 3. There are more than 2000 files in a directory. The PathUtils.hasPrefix method (called from MountTable.getMountPoint) will be called at least 1,000,000 (500 * 2000) times, even though the mount point information is not actually needed. On the master branch, the test time drops from about 400ms to 70ms. On the 2.7.1 branch, it drops from about 700ms to about 100ms. ### Does this PR introduce any user facing changes? No pr-link: Alluxio#16893 change-id: cid-71d9a351744033426fb1b0633d7b194f94f322b5
…p golang.org/x/net to 0.7.0 in /integration/docker/csi Bumps [golang.org/x/net](https://github.com/golang/net) from 0.0.0-20210510120150-4163338589ed to 0.7.0. See the full diff in the [compare view](https://github.com/golang/net/commits/v0.7.0). pr-link: Alluxio#16947 change-id: cid-6d9f72601394aed56df5d25eeee43eeed668a502
1. Support more rocksdb options in dora meta store
2. Add a microbench to profile the dora meta store result
3. Removed the memory metadata cache based on the profiling result
4. Introduced a DoraMetaManager to handle data loading/ page cache invalidation
5. Introduced fingerprint to determine if file contents are changed.

Performance testing: Microbench result on a single worker with a metastore storing 10m metadata entries

- Rocks + 32MB rocks block cache: Read op/s 81k (1 thread) / 143k (16 threads); Write op/s 285k (1 thread) / 196k (16 threads)
- Rocks + 1GB rocks block cache: Read op/s 163k (1 thread) / 480k (16 threads); Write op/s 286k (1 thread) / 134k (16 threads)
- Rocks + 4GB on-heap cache: Read ops 560k (1 thread) / OOM (16 threads); Write ops 330k (1 thread) / 363k (16 threads) -> Observed severe GC issue when using an on-heap cache

<img width="1386" alt="Screen Shot 2023-05-29 at 4 16 52 PM" src="https://github.com/Alluxio/alluxio/assets/6771554/b156d4c9-97ba-4216-929c-4c81b4f5ce3c">

Both rocksDB built-in cache approaches meet our performance requirement and given that the GC issue of on-heap cache, we decided to abandon the on-heap memory cache. pr-link: Alluxio#17458 change-id: cid-220612ec553b4d747136abe5363280aa07c376d0
### What changes are proposed in this pull request? Adding the predicates for move jobs in order to be triggered by the policy engine. ### Why are the changes needed? Adding two predicates: 1. unmodifiedFor 2. dateFromFileNameOlderThan ### Does this PR introduce any user facing changes? NA pr-link: Alluxio#17525 change-id: cid-e6b43bc8f0281ba4a3a06c9aa187f7a3331bc4c2
Fix the bug that Alluxio Client is blocked when it requests to write an empty file. pr-link: Alluxio#17562 change-id: cid-5e05464ac01b3316d0c151a992042ac9a1f1eeca
### What changes are proposed in this pull request? 1. refactor move API for ee 2. add get mount Id API so we can use this API to get correct ufs in ee ### Why are the changes needed? na ### Does this PR introduce any user facing changes? na pr-link: Alluxio#17565 change-id: cid-81489ef469d4134526b38b263dd1ef57f926cc85
### What changes are proposed in this pull request? Fix LocalPageStore NPE. ### Why are the changes needed? I encountered the following exception while using local cache ``` 2023-05-31T17:25:13.453+0800 ERROR 20230531_092513_00010_uqx2a.1.0.0-7-153 alluxio.client.file.cache.NoExceptionCacheManager Failed to put page PageId{FileId=76f9c79d5d43c725de31295c263291e0, PageIndex=534}, cacheContext CacheContext{cacheIdentifier=null, cacheQuota=alluxio.client.quota.CacheQuota$1@1f, cacheScope=CacheScope{id=.}, hiveCacheContext=null, isTemporary=false} java.lang.NullPointerException: Cannot invoke "String.contains(java.lang.CharSequence)" because the return value of "java.lang.Exception.getMessage()" is null at alluxio.client.file.cache.store.LocalPageStore.put(LocalPageStore.java:80) at alluxio.client.file.cache.LocalCacheManager.putAttempt(LocalCacheManager.java:345) at alluxio.client.file.cache.LocalCacheManager.putInternal(LocalCacheManager.java:274) at alluxio.client.file.cache.LocalCacheManager.put(LocalCacheManager.java:234) at alluxio.client.file.cache.CacheManagerWithShadowCache.put(CacheManagerWithShadowCache.java:52) at alluxio.client.file.cache.NoExceptionCacheManager.put(NoExceptionCacheManager.java:55) at alluxio.client.file.cache.CacheManager.put(CacheManager.java:196) at alluxio.client.file.cache.LocalCacheFileInStream.localCachedRead(LocalCacheFileInStream.java:218) at alluxio.client.file.cache.LocalCacheFileInStream.bufferedRead(LocalCacheFileInStream.java:144) at alluxio.client.file.cache.LocalCacheFileInStream.readInternal(LocalCacheFileInStream.java:242) at alluxio.client.file.cache.LocalCacheFileInStream.positionedRead(LocalCacheFileInStream.java:287) at alluxio.hadoop.HdfsFileInputStream.read(HdfsFileInputStream.java:153) at alluxio.hadoop.HdfsFileInputStream.readFully(HdfsFileInputStream.java:170) at org.apache.hadoop.fs.FSDataInputStream.readFully(FSDataInputStream.java:111) at io.trino.filesystem.hdfs.HdfsInput.readFully(HdfsInput.java:42) at 
io.trino.plugin.hive.parquet.TrinoParquetDataSource.readInternal(TrinoParquetDataSource.java:64) at io.trino.parquet.AbstractParquetDataSource.readFully(AbstractParquetDataSource.java:120) at io.trino.parquet.AbstractParquetDataSource$ReferenceCountedReader.read(AbstractParquetDataSource.java:330) at io.trino.parquet.ChunkReader.readUnchecked(ChunkReader.java:31) at io.trino.parquet.reader.ChunkedInputStream.readNextChunk(ChunkedInputStream.java:149) at io.trino.parquet.reader.ChunkedInputStream.read(ChunkedInputStream.java:93) ``` ### Does this PR introduce any user facing changes? NO pr-link: Alluxio#17556 change-id: cid-6dd42d35817816cba35fa52afe95d154bf9d9acf
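The stack trace above shows `contains` being called on the result of `getMessage()`, which is null for exceptions constructed without a message. A minimal null-safe check could look like the sketch below; `NullSafeMessageSketch` and `messageContains` are hypothetical names, and the actual fix in `LocalPageStore` may differ.

```java
/** Hypothetical helper, not Alluxio code: null-safe exception message check. */
final class NullSafeMessageSketch {
  /** Returns true only if t has a non-null message that contains needle. */
  static boolean messageContains(Throwable t, String needle) {
    String message = t.getMessage();
    // getMessage() returns null when the exception was created without a message.
    return message != null && message.contains(needle);
  }
}
```

Treating a missing message as "does not match" lets the caller fall through to its generic error handling instead of failing with a second exception.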
### What changes are proposed in this pull request? - Add delete/rename/createDirectory RPC. Invalidate metadata and data if file/dir is deleted or renamed. - Continue operations in DoraFileOutStream.close() when exception happens. - reduce worker block heartbeat interval. - decrease Netty close timeout value. - add PropertyKeys in MultiProcessCluster for Dora. - Add PropertyKeys in AbstractFuseDoraTest for Dora. - use different names to avoid test conflicts in FuseFileSystemMetadataTest ### Why are the changes needed? Client writes data synchronously to worker PagingStore as a cache, and to UFS as write through. So delete/rename/createDirectory RPCs are going to do metadata and data operations and invalidations on worker side. ### Does this PR introduce any user facing changes? N/A pr-link: Alluxio#17545 change-id: cid-b8905052e043bbf9142b40ca3d1e3c3773013e98
Update FUSE SDK docs pr-link: Alluxio#17529 change-id: cid-3a5226a850291041f2ac073371de219330444553
Correct some information in the README.md for docker. pr-link: Alluxio#17387 change-id: cid-79265df0f0843bc324c950cd8d1eed667c71f30c
### What changes are proposed in this pull request? Please outline the changes and how this PR fixes the issue. ### Why are the changes needed? Please clarify why the changes are needed. For instance, 1. If you propose a new API, clarify the use case for a new API. 2. If you fix a bug, describe the bug. ### Does this PR introduce any user facing changes? Please list the user-facing changes introduced by your change, including 1. change in user-facing APIs 2. addition or removal of property keys 3. webui pr-link: Alluxio#17578 change-id: cid-7df58ee0748c1e25e0eae47e4409bf37bc7a8594
Bump Apache Ratis version to 2.5.1 Followup to Alluxio#17394. pr-link: Alluxio#17571 change-id: cid-a247f1042687ead5ac2d3742bc2de4428f42baf3
### What changes are proposed in this pull request? Modify interface for extension ### Why are the changes needed? In some situations, e.g. permission checks, the function needs to be overridden in a subclass to add extra checks. ### Does this PR introduce any user facing changes? NA pr-link: Alluxio#17586 change-id: cid-a507ccf2453149e4803d8fe1ea135fa1389d299e
Thank you for your pull request. |
worker configuration
verify
|
Automated checks report:
Some checks failed. Please fix the reported issues and reply 'alluxio-bot, check this please' to re-run checks. |
2bb57b6
to
a71b5fb
Automated checks report:
All checks passed! |
@huiboliu2020 Different under file systems may need different configurations. I don't see any handling for that in the worker configuration or in the code.
hi @huiboliu2020 thanks for your contribution! We do plan to implement support for Dora workers to mount multiple UFSes. A discussion on the plan and design (with our engineers and some marquee users) will take place on 2023/07/07. You are welcome to join! See the link https://docs.qq.com/doc/DRmxQTFBEUklvd3NC for how to join this discussion. Looking forward to seeing you there! |
2fec0ec
to
b597c61
What changes are proposed in this pull request?
#17575
Does this PR introduce any user facing changes?
Add this PropertyKey:
alluxio.dora.cached.file.system.enabled