Running on TACC
First, ssh onto Stampede. Read the directions in the user guide, including the Accessing the System, Accessing the Compute Nodes, and Good Citizenship sections.
Then run idev -t 02:00:00 and wait until you have a session on a compute node. The session usually starts quickly, because it uses a special development queue, which is usually less busy than the normal queue. However, the development queue has a time limit of 2 hours. If you need more time, use the normal queue. For example, idev -p normal -t 08:00:00 requests 8 hours.
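The choice between the two queues can be sketched as a small shell helper. This is only an illustration: idev_command is a hypothetical name, it takes a whole number of hours without a leading zero, and it only prints the command instead of running it. The 2-hour cutoff and the -p and -t options are the ones described above.

```shell
# Sketch: print the idev command for a requested number of hours.
# idev_command is a hypothetical helper, not part of idev or TACC tooling.
idev_command() {
    local hours
    hours="$1"
    if [ "$hours" -le 2 ]; then
        # Short sessions fit within the development queue limit.
        printf 'idev -t %02d:00:00\n' "$hours"
    else
        # Longer sessions need the normal queue.
        printf 'idev -p normal -t %02d:00:00\n' "$hours"
    fi
}
```

For example, idev_command 8 prints the 8-hour command shown above, which you can then copy and run.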
Once you are running on a compute node and have a command prompt again, activate the environment:
cd /work/03206/mortonne/stampede2/wikipedia/wiki2vec
. venv2/bin/activate
. setup.sh
Check the version of Python (should be 2.7.15):
python --version
Check that everything is set up correctly:
which prep-text.sh
python -c 'import nltk; print(nltk)'
The first command should display the path to the prep-text.sh script, and the second command should display the path to the nltk Python package.
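The checks above can also be bundled into one function that fails loudly if something is off. This is a minimal sketch, not part of wiki2vec: version_matches and check_env are hypothetical names, and the expected version is the one given above. It assumes the environment has already been activated in the current shell.

```shell
# Sketch: verify the activated environment matches what this guide expects.
# version_matches and check_env are hypothetical helpers, not part of wiki2vec.
expected_version="2.7.15"

# Compare a reported version string (e.g. "Python 2.7.15") to the expected one.
version_matches() {
    local reported
    reported="$1"
    [ "${reported#Python }" = "$expected_version" ]
}

check_env() {
    local reported
    # Python 2 prints its version to stderr, so merge stderr into stdout.
    reported="$(python --version 2>&1)"
    if ! version_matches "$reported"; then
        echo "unexpected Python version: $reported" >&2
        return 1
    fi
    # prep-text.sh should be found on PATH once setup.sh has been sourced.
    if ! command -v prep-text.sh >/dev/null; then
        echo "prep-text.sh not found on PATH" >&2
        return 1
    fi
    # nltk should be importable from the virtual environment.
    if ! python -c 'import nltk' 2>/dev/null; then
        echo "nltk is not importable" >&2
        return 1
    fi
    echo "environment looks OK"
}
```

After defining these functions in your session, run check_env. A nonzero exit status means one of the three checks failed, and the message names which one.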
In /work/03206/mortonne/stampede2/wikipedia, there are:
- vectors.txt - all 3 million word2vec vectors
- enwiki-20191020 - directory with text for all Wikipedia pages
- wiki2vec - directory with code for wiki2vec, downloaded from GitHub
You'll need to add a map file for the items you're getting vectors for. If the map file is in your current directory as item_map.txt, run the following from a terminal on your local machine (not from an ssh session on Stampede), replacing [username] with your user name:
scp item_map.txt [username]@stampede2.tacc.utexas.edu:/work/03206/mortonne/stampede2/wikipedia
SCP uses a process similar to starting an ssh session, and will require entering your password and two-factor authentication. Once it finishes, you should see the item_map.txt file in the wikipedia directory on Stampede.
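If you upload map files often, the copy step can be wrapped in a small helper that checks the file exists before starting the transfer. This is a sketch under assumptions: remote_target and upload_map are hypothetical names, and the host and directory are the ones used in the scp command above. You will still be prompted for your password and two-factor code.

```shell
# Sketch: copy a map file to the shared wikipedia directory on Stampede.
# remote_target and upload_map are hypothetical helpers for illustration.
remote_host="stampede2.tacc.utexas.edu"
remote_dir="/work/03206/mortonne/stampede2/wikipedia"

# Build the scp destination for a given user name.
remote_target() {
    printf '%s@%s:%s\n' "$1" "$remote_host" "$remote_dir"
}

upload_map() {
    local username
    local map_file
    username="$1"
    # Default to item_map.txt in the current directory.
    map_file="${2:-item_map.txt}"
    if [ ! -f "$map_file" ]; then
        echo "map file not found: $map_file" >&2
        return 1
    fi
    scp "$map_file" "$(remote_target "$username")"
}
```

Run it from your local machine as upload_map [username], or pass a second argument if the map file has a different name or location.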