Commit 6e5aff7

Merge pull request #95 from jhudsl/kweav-nameChunks
Update 01-intro.Rmd with code chunk names and fix urls
2 parents 682f37e + 4749b1a commit 6e5aff7

9 files changed, with 31 additions and 26 deletions

01-intro.Rmd

Lines changed: 3 additions & 3 deletions
Original file line number | Diff line number | Diff line change
@@ -21,15 +21,15 @@ One of the key challenges in cancer informatics is dealing with and managing the
2121
This course is intended for researchers, including postdocs and students, with limited to intermediate experience with informatics research. The conceptual material will also be useful for those in management roles who are collecting data and using informatics pipelines.
2222

2323

24-
```{r, fig.align='center', echo = FALSE, fig.alt= "For individuals whom: Have no formal training in informatics. Are relatively new to informatics. Want to learn the basics of computers and shared computing resources. Want guidance for choosing computing options", out.width= "100%"}
24+
```{r for_individuals_who, fig.align='center', echo = FALSE, fig.alt= "For individuals whom: Have no formal training in informatics. Are relatively new to informatics. Want to learn the basics of computers and shared computing resources. Want guidance for choosing computing options", out.width= "100%"}
2525
2626
ottrpal::include_slide("https://docs.google.com/presentation/d/1B4LwuvgA6aUopOHEAbES1Agjy7Ex2IpVAoUIoBFbsq0/edit#slide=id.g11db82d2864_1_65")
2727
2828
```
2929

3030
## Topics covered:
3131

32-
```{r, fig.align='center', echo = FALSE, fig.alt= "Concepts discussed in the Computing for Cancer Informatics course: How computer hardware and software work. Computing resources designed for research Data sizes and computational capacity. Guidance about computing resource decisions. How shared computing resources work. Etiquette for shared computing resources.", out.width= "100%"}
32+
```{r topics_covered, fig.align='center', echo = FALSE, fig.alt= "Concepts discussed in the Computing for Cancer Informatics course: How computer hardware and software work. Computing resources designed for research Data sizes and computational capacity. Guidance about computing resource decisions. How shared computing resources work. Etiquette for shared computing resources.", out.width= "100%"}
3333
ottrpal::include_slide("https://docs.google.com/presentation/d/1B4LwuvgA6aUopOHEAbES1Agjy7Ex2IpVAoUIoBFbsq0/edit#slide=id.g11db82d2864_1_81")
3434
```
3535

@@ -38,6 +38,6 @@ ottrpal::include_slide("https://docs.google.com/presentation/d/1B4LwuvgA6aUopOHE
3838
The course will cover key underlying principles and concepts in computing. We will go over concrete discussions of the differences between cloud and local computing. The course will also highlight a number of computing options and describe etiquette basics for using shared resources.
3939

4040

41-
```{r, fig.align='center', echo = FALSE, fig.alt= "Overall Course Learning Objectives. This course will demonstrate how to: 1.Recognize various data management systems especially for cancer research related data, 2.Compare and make informed decisions about computation platforms (including economic considerations),3.Implement best practices for data security and privacy, 4. Share data safely and securely in a variety of contexts,5.Handle IRB and data access requests,6.Apply ethical consideration in data management workflows", out.width= "100%"}
41+
```{r learning_objectives, fig.align='center', echo = FALSE, fig.alt= "Overall Course Learning Objectives. This course will demonstrate how to: 1.Recognize various data management systems especially for cancer research related data, 2.Compare and make informed decisions about computation platforms (including economic considerations),3.Implement best practices for data security and privacy, 4. Share data safely and securely in a variety of contexts,5.Handle IRB and data access requests,6.Apply ethical consideration in data management workflows", out.width= "100%"}
4242
ottrpal::include_slide("https://docs.google.com/presentation/d/1B4LwuvgA6aUopOHEAbES1Agjy7Ex2IpVAoUIoBFbsq0/edit#slide=id.gf5f8818810_1_5")
4343
```

03-Binary_data_to_computations.Rmd

Lines changed: 2 additions & 2 deletions
@@ -113,7 +113,7 @@ Previously, back when a university might have one single computer, as they were
113113
ottrpal::include_slide("https://docs.google.com/presentation/d/1B4LwuvgA6aUopOHEAbES1Agjy7Ex2IpVAoUIoBFbsq0/edit#slide=id.gf96b1d997a_0_1")
114114
```
115115

116-
There were many [different kinds](https://www.jkmscott.net/data/Punched%20Cards.html) of punch cards over time, see @scott_collection_2016 for a collection.
116+
There were many [different kinds](https://www.jkmscott.net/data/PunchedCards/PunchedCards.html) of punch cards over time, see @scott_collection_2016 for a collection.
117117

118118

119119

@@ -125,7 +125,7 @@ Also check out @hardware_history_2021 for really interesting and more extensive
125125

126126
Also, here is some fascinating additional reading on the role of women as computer operators starting in the 1940s. Initially, computer science was actually thought of as a field for women; however, this changed over time (and now women and gender minorities are hopefully becoming more represented):
127127

128-
* [Article titled: Woman pioneered computer programming. Then men took their industry over](https://timeline.com/women-pioneered-computer-programming-then-men-took-their-industry-over-c2959b822523) [@visions_women_2017]
128+
* [Article titled: Woman pioneered computer programming. Then men took their industry over](https://pages.memoryoftheworld.org/library/Josh%20O%27Connor/Women%20pioneered%20computer%20programming.%20Then%20men%20took%20their%20industry%20over_%20%28321%29/Women%20pioneered%20computer%20programming.%20Then%20-%20Josh%20O%27Connor.pdf) [@visions_women_2017]
129129
* [Article titled: Untold History of AI: Invisible Women Programmed America's First Electronic Computer The “human computers” who operated ENIAC have received little credit](https://spectrum.ieee.org/untold-history-of-ai-invisible-woman-programmed-americas-first-electronic-computer) [@untold_2019]
130130

131131

04-Computing_Systems.Rmd

Lines changed: 15 additions & 12 deletions
@@ -295,7 +295,7 @@ Many of us use cloud storage regularly for Google Docs and backing up photos usi
295295
Furthermore, this also allows for more opportunity to scale your work to a larger extent, as there is generally more computing capacity possible with most cloud resources [@cloudvstrad].
296296

297297

298-
Companies like Amazon, Google, Microsoft Azure, and others provide cloud computing resources. **Somewhere these companies have clusters of computers that paying customers use through the internet.** In addition to these commercial options, there are newer national government funded resource options like [Jetstream](https://portal.xsede.org/jetstream) (described in the next section). We will compare computing options in another chapter coming up.
298+
Companies like Amazon, Google, Microsoft Azure, and others provide cloud computing resources. **Somewhere these companies have clusters of computers that paying customers use through the internet.** In addition to these commercial options, there are occasionally national government funded resource options like Texas Advanced Computing Center (TACC) and others previously funded by the former project called [XSEDE](https://portal.xsede.org/) (described in the next section). We will compare computing options in another chapter coming up.
299299

300300

301301

@@ -308,7 +308,7 @@ ottrpal::include_slide("https://docs.google.com/presentation/d/1B4LwuvgA6aUopOHE
308308

309309
It's important to remember that all of the shared computing options that we previously described involve a [data center](https://en.wikipedia.org/wiki/Data_center) where a large number of computers are physically housed.
310310

311-
```{r, fig.align='center', echo = FALSE, fig.alt= "Examples of servers or shared computers include clusters that may exist at your institution or national computing resources like Xsede.", out.width= "100%"}
311+
```{r, fig.align='center', echo = FALSE, fig.alt= "Examples of servers or shared computers include clusters that may exist at your institution or national computing resources", out.width= "100%"}
312312
ottrpal::include_slide("https://docs.google.com/presentation/d/1B4LwuvgA6aUopOHEAbES1Agjy7Ex2IpVAoUIoBFbsq0/edit#slide=id.gf9c252d058_0_23")
313313
```
314314

@@ -319,26 +319,30 @@ You may have access to a [HPC (which stands for High Performance Computing) clus
319319
If your university or institution has a HPC [cluster](https://en.wikipedia.org/wiki/Computer_cluster), this means that they have a group of computers acting like a server that people can use to store data or assist with intensive computations. Often institutions can support the cost of many computers within an HPC cluster. This means that multiple computers will simultaneously perform different parts of the computing required for a given task, thus significantly speeding up the process compared to you trying to perform the task on just your computer!
320320

321321

322-
If your institute doesn't have a shared computing resource like the HPCs we just described, you could also consider a national resource option like [Xsede](https://www.xsede.org/).
323-
[Xsede](https://www.xsede.org/) is led by the University of Illinois National Center for Supercomputing Applications (NCSA) and includes 18 other partnering institutions (which are mostly other universities). Through this partnership, they currently support 16 supercomputers. Universities and non-profit researchers in the United States can request access to their computational and data storage resources. See [here](https://portal.xsede.org/allocations/resource-info) for descriptions of the available resources.
322+
If your institute doesn't have a shared computing resource like the HPCs we just described, you could also consider a national resource option like the [Texas Advanced Computing Center (TACC)](https://en.wikipedia.org/wiki/Texas_Advanced_Computing_Center) which was funded by the National Science Foundation (NSF) [XSEDE](https://www.xsede.org/) program.
323+
Universities and non-profit researchers in the United States can request access to their computational and data storage resources. Other resource options include:
324324

325+
- [San Diego Supercomputer Center (SDSC)](https://www.sdsc.edu/) at the University of California, San Diego
326+
- [National Institute for Computational Sciences (NICS)](https://www.nics.tennessee.edu/), at the University of Tennessee, Knoxville
327+
- [Pittsburgh Supercomputing Center (PSC)](https://www.psc.edu/) at the Carnegie Mellon University and University of Pittsburgh
325328

326-
Here you can see a photo of Stampede2, one of the supercomputers that members of Xsede can utilize.
327329

328-
```{r, fig.align='center', echo = FALSE, fig.alt= "An image of Stampede2 one of the supercomputers that members of Xsede can use.", out.width= "100%"}
330+
Here you can see a photo of Stampede2, one of the supercomputers that members of TACC could utilize (it has now been replaced with Stampede3).
331+
332+
```{r, fig.align='center', echo = FALSE, fig.alt= "An image of Stampede2 one of the supercomputers that members of TACC could use.", out.width= "100%"}
329333
ottrpal::include_slide("https://docs.google.com/presentation/d/1B4LwuvgA6aUopOHEAbES1Agjy7Ex2IpVAoUIoBFbsq0/edit#slide=id.gf9c252d058_0_63")
330334
```
331335

332336

333-
[[source](https://www.xsede.org/ecosystem/resources)]
337+
[[source](https://www.xsede.org/)]
334338

335339
> Stampede2, generously funded by the National Science Foundation (NSF) through award ACI-1134872, is one of the Texas Advanced Computing Center (TACC), University of Texas at Austin's flagship supercomputers.
336340
337-
See [here](https://portal.xsede.org/tacc-stampede2) for more information about how you could possibly connect to and utilize Stampede2.
341+
See [this article about Stampede2 and the transition to Stampede3](https://tacc.utexas.edu/news/latest-news/2023/07/24/taccs-new-stampede3-advances-nsf-supercomputing-ecosystem/) for more information about their resources and see [their getting started website](https://tacc.utexas.edu/use-tacc/getting-started) on how you could possibly use their resources.
338342

339-
Importantly when you use shared computers like national resources like Stampede2 available through Xsede, as well as institutional HPCs, you will share these resources with many other people and so you need to learn the proper etiquette for using and sharing these resources. We will discuss this more in a coming chapter.
343+
Importantly when you use shared computers like national resources like [Stampede2](https://tacc.utexas.edu/systems/stampede2/) and [Stampede3](https://docs.tacc.utexas.edu/hpc/stampede3/), as well as institutional HPCs, you will share these resources with many other people and so you need to learn the proper etiquette for using and sharing these resources. We will discuss this more in a coming chapter.
340344

341-
However, there is also now an option to access the different XSEDE computing resources through a cloud environment option called [Jetstream2](https://jetstream-cloud.org/).
345+
There is also an option to access national computing resources through a cloud environment option called [Jetstream2](https://jetstream-cloud.org/).
342346

343347
Here is a video about Jetstream2:
344348

@@ -348,7 +352,6 @@ knitr::include_url("https://www.youtube.com/embed/NQ3flxJANTw")
348352

349353

350354

351-
352355
We will also discuss how the use of these various computing options differ in the next chapters. Importantly there are also some computing platforms that have been especially designed for scientists and specific types of researchers, so it is also useful to know about these options.
353356

354357

@@ -367,6 +370,6 @@ In conclusion, here are some of the major take-home messages:
367370
7) A supercomputer is a computer that has much more storage, memory, and computing capacity than a typical personal computer. Supercomputers are generally much more expensive than using a group of more typical computers that together would have the same collective computing and storage capacity.
368371
8) There are two general types of servers: clusters and grids. Cluster approaches work by having several computers working on pieces of the same task simultaneously in a method called parallel computing. Grid approaches work by having different types of computers working on different tasks.
369372
9) Cloud computing is essentially the use of many servers accessed through the internet. This is often more reliable because there are many servers to use, even if other users are performing large tasks or if a server goes down. We will talk more about the pros and cons of this option in the coming chapters.
370-
10) If your institute doesn't provide you access to a shared computing resource and you don't want to use a commercial cloud option, you could consider options like [Xsede](https://www.xsede.org/) and or [Jetstream2](https://jetstream-cloud.org/), which is a national resource that you can request access to.
373+
10) If your institute doesn't provide you access to a shared computing resource and you don't want to use a commercial cloud option, you could consider options like [TACC](https://en.wikipedia.org/wiki/Texas_Advanced_Computing_Center) and or [Jetstream2](https://jetstream-cloud.org/), which is a national resource that you can request access to.
371374

372375

05-Shared_computing_etiquette.Rmd

Lines changed: 2 additions & 2 deletions
@@ -30,7 +30,7 @@ Each cluster or other shared computing resource will have different rules and re
3030

3131
One major aspect to consider is keeping the computers in the cluster safe from harm. You wouldn't want to lose your precious data stored on the cluster and neither would your colleagues!
3232

33-
- Use a good [secure password](https://its.lafayette.edu/policies/strongpasswords/) that is not easy for someone else to guess.
33+
- Use a good [secure password](https://help.lafayette.edu/guidelines-for-strong-passwords/) that is not easy for someone else to guess.
3434

3535
Some people suggest using sentences that are easy for you to remember; you could consider a line of lyrics from a song or poem that you like, or maybe a line from a movie. Modify part of it to include symbols and numbers [@passwords].
3636

@@ -138,7 +138,7 @@ Typically a program is used to schedule jobs. Remember that jobs are the individ
138138

139139
Such job scheduling programs assign jobs to available node resources as they become available and if they have the required resources to meet the job. These programs have their own commands for running jobs, checking resources, and checking jobs. Remember to use the management system to run your jobs using the compute nodes not the login nodes (nodes for users to log in). There are often nodes set up for transferring files as well.
140140

141-
In the case of the JHPCE, a program called Sun Grid Engine (SGE) is used, but there are others job management programs. See [here](https://jhpce.jhu.edu/wp-content/uploads/2021/06/JHPCE-Overview-2021-10.pdf) for more information on how people use SGE for the JHPCE shared resource.
141+
In the case of the JHPCE, a program called Sun Grid Engine (SGE) is used, but there are others job management programs. See [here](https://jhpce.jhu.edu/orient/images/sge-orient.pdf) for more information on how people use SGE for the JHPCE shared resource.
142142

143143
### Specifying memory (RAM) needs
144144
