-
Notifications
You must be signed in to change notification settings - Fork 2.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix create worker process timeout #13772
base: master-2.x
Are you sure you want to change the base?
Conversation
`./dev/scripts/generate-tarballs` is really convinient for alluxio developers to generate custom alluxio tarballs. But as a custom version, not all underfs and tool jars under ./lib are necessary to be added to the tarballs. Maybe a custom `libjars` flag is helpful to slim the tarball by size when generating custom tarballs. pr-link: Alluxio#13292 change-id: cid-05e32b43f576e9198ac16b1eb893672c21bacf88
pr-link: Alluxio#13301 change-id: cid-2b370bd10ffb70a8fccecf7698bdebc3c6a6ff4b
MigrateConfig has an overwrite flag that didn't do anything. This has the side effect where distributedCp and distributedMv are now going to overwrite and the retries are going to work in more cases (when destination file is partially written). pr-link: Alluxio#13262 change-id: cid-343fd1cb344ba26a5c2e44554df971e4298ea7c6
pr-link: Alluxio#13304 change-id: cid-ab69b10959432889cc70b4dd94d3d38fedd6da6d
Before this PR: I want to run Alluxio in my IDE.But it has such error data:image/s3,"s3://crabby-images/f75b6/f75b68b106be679d267d9ded6cee1ce2d4b6ad88" alt="image" and master.log `org.glassfish.jersey.message.internal.WriterInterceptorExecutor$TerminalWriterInterceptor aroundWriteTo MessageBodyWriter not found for media type=application/json, type=class alluxio.wire.MasterWebUIInit, genericType=class alluxio.wire.MasterWebUIInit` After this PR: (It has no effect on packaging Alluxio cause it is `provided`) master: data:image/s3,"s3://crabby-images/3cde3/3cde322db8f6c1725cc409699944c8be1fabb637" alt="image" worker: data:image/s3,"s3://crabby-images/67b9e/67b9ea9b2f2589fbd9de58e80c41404d80294a58" alt="image" pr-link: Alluxio#13299 change-id: cid-02422891708872733cdac586fc31fe25150fb25f
pr-link: Alluxio#13305 change-id: cid-a29e313701fe62976d2e996b6f4c0793a85eaf39
When enabling SSE, PUT/GET requests to S3 must use HTTPS or the response will be a 403. Instead of having the user specify the need for HTTPS manually, we should enable it by default whenever SSE is specified. pr-link: Alluxio#13306 change-id: cid-cbdb8085d13da4cab83d054f5ff7d917b37aff6e
pr-link: Alluxio#13308 change-id: cid-3e30b8b8066e70b125e742b7cdee2b715f37e33e
`ExpectedException.none()` is deprecated in JUnit 4.13. It's recommended to change it to `Assert.assertThrows`. ``` package org.junit.rules; public class ExpectedException implements TestRule { /** * Returns a {@linkplain TestRule rule} that expects no exception to * be thrown (identical to behavior without this rule). * * @deprecated Since 4.13 * {@link org.junit.Assert#assertThrows(Class, org.junit.function.ThrowingRunnable) * Assert.assertThrows} can be used to verify that your code throws a specific * exception. */ @deprecated public static ExpectedException none() { return new ExpectedException(); } ``` Considering the number of tests to update, I will do this module by module. This PR takes care of the client package. pr-link: Alluxio#12840 change-id: cid-78078c85087ffc82a885443b29634df54092c82f
pr-link: Alluxio#13302 change-id: cid-6ae2788ae89c93beac76514f85ace6324ccb3d86
Remove ceph module from underfs pom so that it will be skipped for maven build. Temporary workaround for Alluxio#13309 pr-link: Alluxio#13310 change-id: cid-1d3a2deb581811ff8643863c8fc8c25e7abdc430
needed to have the publish tarball build pass after Alluxio#13310 pr-link: Alluxio#13312 change-id: cid-597939de25d2c9ca16605d2c35b106582875005c
This PR is being made to address git [issue Alluxio#13203](Alluxio#13203). I tested that the Helm chart rendered using the following in my `config.yaml` ``` tolerations: [ {"key": "test", "operator": "Equal", "value": "test", "effect": "NoSchedule"} ] ``` and that the tolerations worked as intended on k8s nodes I tainted using `kubectl taint nodes <node> test=test:NoSchedule` **Note:** I tested the Helm chart using all default values; so no FUSE nor logging-server deployments. pr-link: Alluxio#13214 change-id: cid-81d850f3b53308a78b5bb207531b294ae09d8cdb
pr-link: Alluxio#13313 change-id: cid-2fb13b3e3bcbd4d52c65b95ddc9d587424793049
pr-link: Alluxio#13307 change-id: cid-8733a2486624645b792c74fbdc56af19258cd1bf
Fix Alluxio#10385 pr-link: Alluxio#13293 change-id: cid-3d846422f0541f8e6eaea1ca5a48063239e2155c
This change will allow for the Helm chart to specify `serviceAccountName` to the Alluxio pods. pr-link: Alluxio#13297 change-id: cid-44b49766eb0c3ed2fa2a44784cfb6ff08e42b8f0
Alluxio#13310 workaround for Alluxio#13309 doesn't work. Remove the whole module validated to be able to solve the issue. pr-link: Alluxio#13320 change-id: cid-40b3f2958b5349a089ab164d822b9c4a8a1b60e1
<img width="1124" alt="Screen Shot 2021-04-29 at 4 56 00 PM" src="https://user-images.githubusercontent.com/14806853/116526083-cb91da80-a90b-11eb-8fc7-26f5343c5dfd.png"> pr-link: Alluxio#13316 change-id: cid-22ada119782755973ecf37aada8eb4eb54e5068f
This property is only used in tests. It also should not be set explicitly by the users. It's confusing to display the property on the doc page. <img width="905" alt="Screen Shot 2021-04-29 at 4 53 54 PM" src="https://user-images.githubusercontent.com/14806853/116525859-8a012f80-a90b-11eb-8cad-74f52a729686.png"> pr-link: Alluxio#13315 change-id: cid-894a1a818bb1ad835ba39d5c0ec4daadc1bc9893
Remove all the cephfs and ceph-hadoop related stuff from master branch. as a workaround for Alluxio#13309 pr-link: Alluxio#13327 change-id: cid-e6dc92b5586412fccc4e8ea930b6ead2cda56e5c
Fix issue problem in Alluxio#13323 Need to change statefulset `podManagementPolicy` from default to `Parallel`, so that multi-master node can start at same time. pr-link: Alluxio#13325 change-id: cid-dc0bb964873214d4b1fdfdb680c1912e496f6d9c
Improve efficiency of `distributedLoad` in two fold: 1. calling `BlockInStream.create` to obtain the InputStream without RPC, where the original `blockStore.getOutStream(blockId,...)` makes one RPC to master to get blockInfo. 2. removing using `BlockOutStream` and only reads the block data to alluxio job worker without writing it back to the block store on worker. This will further save at least two RPCs and the overhead to transfer data back to worker. Note that, the original code is not removed to handle the case Pinned Media, because currently reading a block does not support promoting this block to a medium but only a tier. After discussed with @calvinjia , we will leave this once we consolidate medium and tier on worker pr-link: Alluxio#13146 change-id: cid-e4e50c5a23d8beda7b398f642a4263942780447e
Fix Alluxio#13287 pr-link: Alluxio#13288 change-id: cid-e58d20f50c0c9ebb5ba6b6d0cf3ea9016014cdb3
pr-link: Alluxio#13286 change-id: cid-26158f42c7f924e4c32d57343f8d9f02f389aef9
Add a test to verify that symlinks work when the underlying ufs supports it pr-link: Alluxio#13318 change-id: cid-97d57c402d8cc537fd8fe7c70e01af56fd2c3f7c
Trying to `./bin/alluxio free` an empty (0-bytes) files from Alluxio results in the corresponding Alluxio job workers to repeatedly attempt to free the file until reaching the job timeout. This is because the implementation for `free` in [FreeCommand](https://github.com/Alluxio/alluxio/blob/master/shell/src/main/java/alluxio/cli/fs/command/FreeCommand.java#L81) uses `getInAlluxioPercentage() == 0` as the stopping condition, but empty (length 0) files in Alluxio are [hardcoded as having 100 in Alluxio percentage](https://github.com/Alluxio/alluxio/blob/40af661d15104813f74eeda3790b7a366f7cb28f/core/server/master/src/main/java/alluxio/master/file/DefaultFileSystemMaster.java#L2114). pr-link: Alluxio#13329 change-id: cid-267db50986cfab2156e31aec47a3cd6fee079bd5
The previous version slowed down the sync because the number of prefetch threads were much smaller than the number of rpc threads that were performing a sync in many cases. pr-link: Alluxio#13334 change-id: cid-224a19d976bea2a55c6674d670861dad8d3a4c1f
RocksDB engineer claims that it is possible for the readOptions to get garbage collected during iteration because RocksIterator doesn't hold a reference to the object but instead a native resource owned by readOptions. pr-link: Alluxio#13340 change-id: cid-f2dc7c29f22e0851fe1d0b4b96fe5aaaa22ba806
Add unit test for client side FIFO Evictor. pr-link: Alluxio#13339 change-id: cid-1b039bec331d002a3db73df943e6f741892a4231
### What changes are proposed in this pull request? Update Readme ### Why are the changes needed? Various minor tunings on readme file ### Does this PR introduce any user facing changes? None pr-link: Alluxio#13783 change-id: cid-f52f0d6d88e96507dc8252c70cd6f9d03d3c62f7
05073b4
to
f3e40ba
Compare
Fix Alluxio#13726 ### What changes are proposed in this pull request? Export JVM related metrics ### Why are the changes needed? Add ClassLoadingGaugeSet and CachedThreadStatesGaugeSet to Alluxio MetricRegistry. ### Does this PR introduce any user facing changes? Add more metrics. pr-link: Alluxio#13762 change-id: cid-0bdad6a119292c20dc5378cfa8f2a104878cb46e
### What changes are proposed in this pull request? Switching GRPC and WEB start sequence. ### Why are the changes needed? In my tests I found that grpc starts quickly (about 1s) but the web service takes about 15s. Switching the order can make GRPC accept the request earlier but the impact on the web service is minimal when from follower to leader, ### Does this PR introduce any user facing changes? no user facing changes. pr-link: Alluxio#13781 change-id: cid-2a896d3e9226149219ef8c0ece7ec28731cefe6c
Codecov Report
@@ Coverage Diff @@
## master #13772 +/- ##
============================================
- Coverage 43.59% 43.53% -0.06%
+ Complexity 9071 9062 -9
============================================
Files 1385 1385
Lines 80483 80489 +6
Branches 9801 9800 -1
============================================
- Hits 35083 35043 -40
- Misses 42446 42498 +52
+ Partials 2954 2948 -6
Continue to review full report at Codecov.
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Minor comments left.
@jiacheliu3 it seems having a property key here is necessary.
@ggezer as we discussed before, let's allow more keys and I will have separate issues to propose having internal / hidden property keys from public keys
Co-authored-by: Bin Fan <[email protected]>
Co-authored-by: Bin Fan <[email protected]>
Co-authored-by: Bin Fan <[email protected]>
Please add the reason to the description section so people can know what this is about, after we merge this PR. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the improvements! Added a few comments.
core/server/worker/src/main/java/alluxio/worker/AlluxioWorkerProcess.java
Show resolved
Hide resolved
Co-authored-by: Jiacheng Liu <[email protected]>
@shenghui361 can you please fix your github account email so that our checks are cleared? |
bbfc248
to
a58e410
Compare
This pull request has been automatically marked as stale because it has not had recent activity. It will be closed in two weeks if no further activity occurs. Thank you for your contributions. |
This pull request has been automatically marked as stale because it has not had recent activity. It will be closed in two weeks if no further activity occurs. Thank you for your contributions. |
This pull request has been automatically marked as stale because it has not had recent activity. It will be closed in two weeks if no further activity occurs. Thank you for your contributions. |
What changes are proposed in this pull request?
Please outline the changes and how this PR fixes the issue.
Why are the changes needed?
Please clarify why the changes are needed. For instance,
Does this PR introduce any user facing changes?
Please list the user-facing changes introduced by your change, including