
Running on TACC

Neal W Morton edited this page Nov 15, 2019 · 3 revisions

Setup

First, ssh into Stampede. Before running anything, read the directions in the user guide, including the Accessing the System, Accessing the Compute Nodes, and Good Citizenship sections.

Then run idev -t 02:00:00 and wait until you have a session on a compute node. The session usually starts quickly, because idev uses a special development queue that is usually less busy than the normal queue. However, the development queue has a time limit of 2 hours. If you need more time, use the normal queue instead. For example, idev -p normal -t 08:00:00 requests 8 hours.
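
The queue choice described above can be sketched as a small shell helper. The function name idev_cmd is hypothetical, and the 2-hour cutoff is simply the limit stated above. The helper only prints the command, so you can inspect it before running it yourself.

```shell
# Hypothetical helper: print the idev command for a whole number of hours.
# Assumes the 2-hour development-queue limit described above.
# Pass hours without a leading zero (8, not 08).
idev_cmd() {
    hours=$1
    if [ "$hours" -le 2 ]; then
        # Short sessions fit in the default development queue.
        printf 'idev -t %02d:00:00\n' "$hours"
    else
        # Longer sessions need the normal queue.
        printf 'idev -p normal -t %02d:00:00\n' "$hours"
    fi
}

idev_cmd 8
```

Calling idev_cmd 2 prints the development-queue form instead.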

Once you are on a compute node and have a command prompt again, activate the environment:

cd /work/03206/mortonne/stampede2/wikipedia/wiki2vec
. venv2/bin/activate
. setup.sh

Check the version of Python (it should be 2.7.15):

python --version

Check that everything is set up correctly:

which prep-text.sh
python -c 'import nltk; print(nltk)'

The first command should display the path to the prep-text.sh script, and the second command should display the path to the nltk python package.
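
The checks above can be gathered into one short script, as a rough sketch. The function names check_cmd and version_matches are hypothetical and not part of wiki2vec. Python 2 writes its --version output to standard error, which is why the example redirects it before capturing it.

```shell
# Hypothetical helper: print the path if a command is on PATH, else fail.
check_cmd() {
    if command -v "$1" >/dev/null 2>&1; then
        echo "ok: $1 -> $(command -v "$1")"
    else
        echo "MISSING: $1"
        return 1
    fi
}

# Hypothetical helper: compare a "Python X.Y.Z" string to an expected version.
version_matches() {
    [ "$1" = "Python $2" ]
}

# Python 2 prints --version to stderr, so merge it into stdout.
found=$(python --version 2>&1) || found="python not found"
if version_matches "$found" 2.7.15; then
    echo "ok: $found"
else
    echo "unexpected Python version: $found"
fi

# Report whether the wiki2vec script is on PATH without aborting the script.
check_cmd prep-text.sh || true
```

If either check reports a problem, re-run the activation commands in the same shell and try again.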

Data

In /work/03206/mortonne/stampede2/wikipedia, there are:

  • vectors.txt - all 3 million word2vec vectors
  • enwiki-20191020 - directory with text for all wikipedia pages
  • wiki2vec - directory with code for wiki2vec, downloaded from GitHub

You'll need to add a map file for the items you want vectors for. Run the following from a terminal on your local machine (not from an ssh session on Stampede), replacing [username] with your user name. This assumes the map file is in your current directory and is named item_map.txt:

scp item_map.txt [username]@stampede2.tacc.utexas.edu:/work/03206/mortonne/stampede2/wikipedia

scp authenticates the same way as ssh, so you will be asked for your password and two-factor authentication code. Once the copy finishes, you should see the item_map.txt file in the wikipedia directory on Stampede.
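
Because each copy requires a password and a two-factor code, it can be worth confirming that the map file exists and is not empty before starting. The following is a minimal sketch to run on your local machine. The function name map_file_ok is hypothetical.

```shell
# Hypothetical helper: succeed only if the file exists and is not empty.
map_file_ok() {
    if [ -s "$1" ]; then
        echo "ok: $1"
    else
        echo "missing or empty: $1"
        return 1
    fi
}

# Check the map file in the current directory before running scp.
if map_file_ok item_map.txt; then
    echo "ready to copy"
fi
```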
