Added an optimisation to skip the unnecessary shuffle of unpaired reads #2282
benraha wants to merge 3 commits into bigdatagenomics:master
…when they are unpaired
Jenkins, test this please
Test FAILed. Build result: FAILURE. Checked out revision 1654f45 (origin/pr/2282/merge). ADAM-prb » 2.7.5,2.12,3.0.1,ubuntu: FAILURE; ADAM-prb » 2.7.5,2.11,2.4.7,ubuntu: FAILURE; ADAM-prb » 2.7.5,2.12,2.4.7,ubuntu: FAILURE.
The Jenkins failure logs are a bit much to go through; this is the relevant part:
@heuermh Thanks! The tests should pass now, so this is ready for review and merge :)
Test FAILed. Build result: FAILURE. Checked out revision f142e4c (origin/pr/2282/merge). ADAM-prb » 2.7.5,2.12,3.0.1,ubuntu: FAILURE; ADAM-prb » 2.7.5,2.11,2.4.7,ubuntu: FAILURE; ADAM-prb » 2.7.5,2.12,2.4.7,ubuntu: FAILURE.
Jenkins, test this please
Test FAILed. Build result: FAILURE. Checked out revision f142e4c (origin/pr/2282/merge). ADAM-prb » 2.7.5,2.12,3.0.1,ubuntu: FAILURE; ADAM-prb » 2.7.5,2.11,2.4.7,ubuntu: FAILURE; ADAM-prb » 2.7.5,2.12,2.4.7,ubuntu: FAILURE.
Jenkins, test this please
Test FAILed. Build result: FAILURE. Checked out revision f142e4c (origin/pr/2282/merge). ADAM-prb » 2.7.5,2.12,3.0.1,ubuntu: FAILURE; ADAM-prb » 2.7.5,2.11,2.4.7,ubuntu: FAILURE; ADAM-prb » 2.7.5,2.12,2.4.7,ubuntu: FAILURE.
Test FAILed. Build result: FAILURE. Checked out revision bb2f078 (origin/pr/2282/merge). ADAM-prb » 2.7.5,2.12,3.0.1,ubuntu: SUCCESS; ADAM-prb » 2.7.5,2.11,2.4.7,ubuntu: SUCCESS; ADAM-prb » 2.7.5,2.12,2.4.7,ubuntu: FAILURE.
There seems to be some kind of issue unpacking the Hadoop tar.gz.
Sorry for all the trouble with Jenkins; I have seen something similar recently on another project. That Apache mirror download link returns a 404, and I'm not sure what the actual issue is. Also odd that
And a note from our admin staff:
|
|
Jenkins, test this please |
Refer to this link for build results (access rights to CI server needed): Build result: FAILURE. [WARNING] Reference path does not exist: /home/jenkins/gitcaches/adam.reference (git version 1.7.1). Checked out revision e6332ef (commit message: "code format fix"). ADAM-prb » 2.7.5,2.12,2.4.7,ubuntu: FAILURE; ADAM-prb » 2.7.5,2.11,2.4.7,ubuntu: SUCCESS; ADAM-prb » 2.7.5,2.12,3.0.1,ubuntu: FAILURE. Status of e6332ef set to FAILURE (https://amplab.cs.berkeley.edu/jenkins/job/ADAM-prb/3136/): 'Build finished.'
Jenkins, test this please
Refer to this link for build results (access rights to CI server needed): Build result: FAILURE. [WARNING] Reference path does not exist: /home/jenkins/gitcaches/adam.reference (git version 2.25.1). Checked out revision e6332ef (commit message: "code format fix"). ADAM-prb » 2.7.5,2.12,2.4.7,ubuntu: FAILURE; ADAM-prb » 2.7.5,2.11,2.4.7,ubuntu: SUCCESS; ADAM-prb » 2.7.5,2.12,3.0.1,ubuntu: SUCCESS. Status of e6332ef set to FAILURE (https://amplab.cs.berkeley.edu/jenkins/job/ADAM-prb/3137/): 'Build finished.'
Re: this, the Jenkins "Build: Execute shell" step is apparently no longer correct.
test this please |
I fixed JAVA_HOME to point to /usr/java/latest (Java 8).
Refer to this link for build results (access rights to CI server needed): Build result: FAILURE. [WARNING] Reference path does not exist: /home/jenkins/gitcaches/adam.reference (git version 2.25.1). Checked out revision e6332ef (commit message: "code format fix"). ADAM-prb » 2.7.5,2.12,2.4.7,ubuntu: FAILURE; ADAM-prb » 2.7.5,2.11,2.4.7,ubuntu: SUCCESS; ADAM-prb » 2.7.5,2.12,3.0.1,ubuntu: SUCCESS. Status of e6332ef set to FAILURE (https://amplab.cs.berkeley.edu/jenkins/job/ADAM-prb/3138/): 'Build finished.'
test this please
(I also updated the build config so that the main prb build runs on the workers and not on the primary node.)
Refer to this link for build results (access rights to CI server needed): Build result: FAILURE. Using reference repository: /home/jenkins/gitcaches/adam.reference (git version 2.25.1). Checked out revision e6332ef (commit message: "code format fix"). ADAM-prb » 2.7.5,2.12,2.4.7,ubuntu: FAILURE; ADAM-prb » 2.7.5,2.11,2.4.7,ubuntu: SUCCESS; ADAM-prb » 2.7.5,2.12,3.0.1,ubuntu: SUCCESS. Status of e6332ef set to FAILURE (https://amplab.cs.berkeley.edu/jenkins/job/ADAM-prb/3139/): 'Build finished.'
Jenkins, test this please
Refer to this link for build results (access rights to CI server needed): Build result: FAILURE. Workspace wiped and repository re-cloned using reference repository /home/jenkins/gitcaches/adam.reference (git version 2.25.1). Checked out revision e6332ef (commit message: "code format fix"). Only ADAM-prb » 3.2.1,2.12,3.0.2,ubuntu was triggered: FAILURE. Status of e6332ef set to FAILURE (https://amplab.cs.berkeley.edu/jenkins/job/ADAM-prb/3157/): 'Build finished.'
This PR adds an option to avoid shuffling the entire alignment dataset when converting it to a FragmentDataset.
It adds a boolean parameter to toFragments(); when the parameter is set, the FragmentDataset is created with one read per Fragment record, so reads do not have to be grouped by name (and shuffled) when they are unpaired.
This resolves issue #2281.
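The two code paths described above can be sketched with plain Scala collections. This is a simplified model under stated assumptions, not ADAM's implementation. `Alignment` and `Fragment` here are hypothetical stand-ins for ADAM's records. The flag name `unpaired` and the helper `allUnpaired` are also assumptions, since the PR only says "a boolean parameter". On a Spark RDD, grouping by key (here, by read name) requires a shuffle, whereas a per-record `map` does not.

```scala
// Simplified sketch, NOT ADAM's implementation. Alignment and Fragment are
// hypothetical stand-ins for ADAM's records; the `unpaired` flag name is assumed.
final case class Alignment(readName: String, readPaired: Boolean, readInFragment: Int, sequence: String)
final case class Fragment(name: String, alignments: Seq[Alignment])

object ToFragmentsSketch {

  // Default path: collect all reads sharing a name (e.g. both mates of a pair)
  // into one Fragment. On a Spark RDD, grouping by key like this needs a shuffle.
  def groupedByName(reads: Seq[Alignment]): Seq[Fragment] =
    reads
      .groupBy(_.readName)
      .toSeq
      .sortBy { case (name, _) => name }
      .map { case (name, rs) => Fragment(name, rs.sortBy(_.readInFragment)) }

  // Shortcut path: wrap each read in its own Fragment. This is a per-record map,
  // so on Spark it would not need a shuffle.
  def onePerRead(reads: Seq[Alignment]): Seq[Fragment] =
    reads.map(r => Fragment(r.readName, Seq(r)))

  // Hypothetical helper: the shortcut is intended for data where no read has a mate.
  def allUnpaired(reads: Seq[Alignment]): Boolean =
    reads.forall(r => !r.readPaired)

  def toFragments(reads: Seq[Alignment], unpaired: Boolean = false): Seq[Fragment] =
    if (unpaired) onePerRead(reads) else groupedByName(reads)

  def main(args: Array[String]): Unit = {
    val single = Seq(Alignment("r1", false, 0, "ACGT"), Alignment("r2", false, 0, "TTGA"))
    val mates  = Seq(Alignment("p1", true, 0, "ACGT"), Alignment("p1", true, 1, "TGCA"))

    // For unpaired input with distinct names, both paths yield the same Fragments.
    println(toFragments(single) == toFragments(single, unpaired = true)) // true
    // For paired input, the shortcut would split the mates into two Fragments.
    println(toFragments(mates).size)                  // 1
    println(toFragments(mates, unpaired = true).size) // 2
  }
}
```

Running `main` prints `true`, `1` and `2`. The last two lines show why the flag only makes sense for unpaired data. When mates are present, the shortcut puts each mate in its own Fragment instead of grouping them together.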