|
47 | 47 |
|
48 | 48 | ???+ question "Submit a Slurm job" |
49 | 49 |
|
50 | | - - Close the [cifar10 resnet repository](https://github.com/akamaster/pytorch_resnet_cifar10?tab=readme-ov-file) and edit the run.sh by adding appropriate slurm sbatch commands. |
| 50 | + - Submit `sentiment_analysis.py` with appropriate flags for its slurm job. |
51 | 51 |
|
52 | 52 | ??? tip "Answer" |
53 | | - - edit a file using you prefered editor, named `my_bio_worksflow.sh`, for example, with the content |
| 53 | + - edit a file using your preferred editor, named `sentiment_analysis_batch.sh`, for example, with the content
54 | 54 | |
55 | 55 | ```bash |
56 | 56 | #!/bin/bash -l |
|
59 | 59 | #SBATCH -p node |
60 | 60 | #SBATCH -N 1 |
61 | 61 | #SBATCH -t 01:00:00 |
62 | | - #SBATCH -J cifar_demo |
| 62 | + #SBATCH -J sentiment_analysis |
63 | 63 | #SBATCH -M snowy |
64 | 64 | #SBATCH --gres=gpu:1 |
65 | 65 |
|
66 | | - module load python_ML_packages/3.9.5-gpu |
| 66 | + source ..... |
67 | 67 |
|
68 | 68 | python -c "import torch; print(torch.__version__); print(torch.version.cuda); print(torch.cuda.get_device_properties(0)); print(torch.randn(1).cuda())" |
69 | 69 |
|
70 | | - #for model in resnet20 resnet32 resnet44 resnet56 resnet110 resnet1202 |
71 | | - for model in resnet20 resnet110 |
72 | | - do |
73 | | - echo "python -u trainer.py --arch=$model --save-dir=save_$model |& tee -a log_$model" |
74 | | - python -u trainer.py --arch=$model --save-dir=save_$model |& tee -a log_$model |
75 | | - done |
| 70 | + echo "running sentiment_analysis.py" |
76 | 71 |
|
| 72 | + python .....sentiment_analysis.py |
77 | 73 | ``` |
78 | 74 |
|
79 | | - - make the job script executable |
| 75 | + - make the job script executable, if not already. |
80 | 76 | ```bash |
81 | | - $ chmod a+x run.sh |
| 77 | + $ chmod a+x sentiment_analysis_batch.sh |
82 | 78 | ``` |
83 | 79 | |
84 | 80 | - submit the job |
85 | 81 | ```bash |
86 | | - $ sbatch run.sh |
| 82 | + $ sbatch sentiment_analysis_batch.sh |
87 | 83 | ``` |
88 | | - |
| 84 | + |
| 85 | + - Similarly, run the `sentiment_analysis.ipynb` Jupyter notebook. Add matplotlib and seaborn to your environment. Start a Jupyter server on a snowy compute node and then tunnel to that node from your local machine.
| 86 | + |
| 87 | + ??? tip "Answer" |
| 88 | + |
| 89 | + * Install matplotlib and seaborn and start an interactive snowy session: |
| 90 | + |
| 91 | + ```console |
| 92 | + source torch_env/bin/activate
| 93 | + pip install matplotlib seaborn
| 94 | + interactive -A uppmax2025-3-5 -M snowy -p node -N 1 -t 1:01:00 --gres=gpu:1
| 95 | + source torch_env/bin/activate
| 96 | + jupyter notebook --ip 0.0.0.0 --no-browser |
| 97 | + ``` |
| 98 | + * Then tunnel: |
| 99 | + `ssh -L 8888:s123:8888 <user>@<login-host>`
| 100 | + |
| 101 | + * Copy the localhost url and paste it in your browser |
| 102 | + |
| 103 | + |
| 104 | +### Cache management |
| 105 | + |
| 106 | +* By default, HF stores downloaded models and temporary files in your `$HOME` folder, which is limited to 32 GB and 300k files.
| 107 | +* To avoid that, you can set the `HF_HOME` and `HF_HUB_CACHE` environment variables to point to your project folder. Follow the instructions at https://huggingface.co/docs/transformers/en/installation?cpu-only=PyTorch#cache-directory.
| 108 | + |
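As a minimal sketch of the cache redirection described above, the variables can also be set from Python before any Hugging Face library is imported. The helper name `configure_hf_cache` and the `hf_home` subfolder are assumptions made for illustration, not part of the course material; the `hub` subfolder mirrors the default layout, where `HF_HUB_CACHE` is `$HF_HOME/hub`.

```python
import os
from pathlib import Path


def configure_hf_cache(project_dir: str) -> dict:
    """Point the Hugging Face caches at a project folder instead of $HOME.

    `project_dir` is a hypothetical path to your project storage.
    """
    hf_home = Path(project_dir) / "hf_home"   # assumed subfolder name
    hub_cache = hf_home / "hub"               # same layout as the default $HF_HOME/hub
    hub_cache.mkdir(parents=True, exist_ok=True)

    # The libraries read these variables when they are imported,
    # so set them before `import transformers` or `import huggingface_hub`.
    os.environ["HF_HOME"] = str(hf_home)
    os.environ["HF_HUB_CACHE"] = str(hub_cache)
    return {"HF_HOME": str(hf_home), "HF_HUB_CACHE": str(hub_cache)}
```

Call the helper at the very top of `sentiment_analysis.py`, passing your own project path, and only then import `transformers`. Exporting the same two variables in the batch script before the `python` line should have the same effect.
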
89 | 109 | <!-- ## Doing installations |
90 | 110 |
|
91 | 111 |
|
|