Update usage docu for newcomers #1037
@nf-core-bot fix linting
This reverts commit e37c9f4.
Suggested restructure of more info page for newcomers
… nf-core style guide (GPT-5.3-Codex)
Co-authored-by: Diego Alvarez S. <dialvarezs@gmail.com>
…nto update-docu-for-newcomers
d4straub
left a comment
I think this looks pretty good already.
With high-depth long reads, long-read assembly typically yields more coherent results ([Agustinho et al. 2024](https://doi.org/10.1038/s41592-024-02262-1)).
Short-read-first assembly performs better with high-depth short reads or low-quality long reads and produces more fragmented but higher-accuracy assemblies ([Overholt et al. 2020](https://doi.org/10.1111/1462-2920.15186), [Meyer et al. 2022](https://doi.org/10.1038/s41592-022-01431-4)).

**Recommendation**: run as many assemblers as computationally feasible.
I think it would be helpful to add approximate RAM and time requirements here. Same for binners. Maybe we can add the numbers that we produce for the manuscript review here? Alternatively, a student of mine is working on checking computational requirements, and we could add those numbers when they are available.
I'm always loath to do such things outside a dedicated study (which I really REALLY want to see, but few do), as it can be so highly dependent on the input data and its complexity... but yes, I guess we could add it from the stuff @dialvarezs is adding.
I guess with that we can provide read counts, but I assume the CAMI stuff is static? And then what metric of metagenome complexity can we use?
More context: I had this problem with a taxprofiler paper we just submitted; the only two metrics we could really report on were RAM and hard drive usage...
The number of CPUs is something you can choose: you can set it to 1 or 1000, depending on how much is available. Time also depends on many different factors: how old the CPU is, what type it is, how busy the scheduler is, and what the IO bandwidth is, all of which are driven by the infrastructure, not the pipeline/tools themselves.
I would add this to give an approximation, i.e. does the specific assembler need 1 hour or 1 day, and which tool needs longer? Is the RAM 10 GB, 100 GB, or 1 TB? Number of CPUs kept on default. And that's it. After all, it's for people who are new to this, to give an idea of what to expect. We get those questions occasionally in Slack, afaik.
And also how computational requirements relate to each other. I assume that in many cases the RAM and time ratios between alternative tools hold true, e.g. when one tool needs 10 GB RAM and another needs 100 GB on one dataset, then the latter will usually need about 10x more than the former on other datasets as well. And that would be helpful in itself as well, I think.
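To make the ratio idea concrete, here is a minimal Python sketch. Everything in it is hypothetical: the tool names, the numbers, and the helper `scale_by_ratio` are made up for illustration, and it assumes the ratio between tools stays constant across datasets, which is exactly the part that would need checking.

```python
# Illustrative only: all numbers are invented, not measured values.
def scale_by_ratio(reference_ram_gb: float, ratio: float) -> float:
    """Estimate RAM for a second tool from a reference tool's RAM and an assumed ratio."""
    return reference_ram_gb * ratio


# Assumed observation on one dataset: tool A needs 10 GB, tool B needs 100 GB.
ratio_b_to_a = 100 / 10

# If tool A needs 25 GB on a new dataset, the assumed-constant ratio
# suggests roughly this much RAM for tool B on the same dataset.
estimate_b = scale_by_ratio(25, ratio_b_to_a)
print(estimate_b)  # 250.0
```

This only gives an order-of-magnitude expectation (10 GB vs. 100 GB vs. 1 TB), not a guarantee for any particular dataset or infrastructure.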
But we can also postpone that and attempt to add it in a separate PR? No need to complicate things?
Co-authored-by: James A. Fellows Yates <jfy133@gmail.com>
…-for-newcomers
So how is it going here? Do we wait for more opinions?
jfy133
left a comment
Apparently I dreamt I merged this 🤦
I can merge it bypassing merging rules (because
This is a first proposal to make the usage documentation more friendly to scientists new to the field. It attempts to explain the defaults of the pipeline and when to deviate from them.
Up for discussion.
PR checklist
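- Make sure your code lints (`nf-core pipelines lint`).
- Ensure the test suite passes (`nextflow run . -profile test,docker --outdir <OUTDIR>`).
- Check for unexpected warnings in debug mode (`nextflow run . -profile debug,test,docker --outdir <OUTDIR>`).
- Usage documentation in `docs/usage.md` is updated.
- Output documentation in `docs/output.md` is updated.
- `CHANGELOG.md` is updated.
- `README.md` is updated (including new tool citations and authors/contributors).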