Running on TACC

Setup

First, ssh into Stampede. Before running anything, read the user guide, in particular the Accessing the System, Accessing the Compute Nodes, and Good Citizenship sections.

Then run idev -t 02:00:00 and wait until you have a session on a compute node. The session usually starts fairly quickly, because idev uses a special development queue that is usually less busy than the normal queue. However, the development queue has a time limit of 2 hours. If you need more time, request the normal queue instead; for example, idev -p normal -t 08:00:00 requests 8 hours.
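As a rough illustration of the queue choice described above, the sketch below prints the idev command for a given number of hours. The helper name idev_cmd is hypothetical and not part of idev or the TACC tools; it only encodes the 2-hour development limit mentioned in this section.

```shell
# Hypothetical helper (not a TACC tool): print the idev command for a
# requested number of whole hours, based on the 2-hour limit stated above.
idev_cmd() {
    hours="$1"
    if [ "$hours" -le 2 ]; then
        # Fits in the development queue, which idev uses by default.
        printf 'idev -t %02d:00:00\n' "$hours"
    else
        # Longer sessions need the normal queue.
        printf 'idev -p normal -t %02d:00:00\n' "$hours"
    fi
}

idev_cmd 2   # prints: idev -t 02:00:00
idev_cmd 8   # prints: idev -p normal -t 08:00:00
```

The helper only prints the command, so you can check it before pasting it into your login-node session.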

Once you have a command prompt on a compute node, activate the environment:

cd /work/03206/mortonne/stampede2/wikipedia/wiki2vec
. venv2/bin/activate
. setup.sh

Check the version of Python (it should be 2.7.15):

python --version

Check that everything is set up correctly:

which prep-text.sh
python -c 'import nltk; print(nltk)'

The first command should display the path to the prep-text.sh script, and the second command should display the nltk module, including the path to the nltk Python package.
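The checks above can also be run in one pass. The following is a minimal sketch, not part of wiki2vec; the function names check_cmd and version_ok are made up for illustration. It assumes you have already activated the environment as described above.

```shell
# Hypothetical helper: report whether a command is on the PATH.
check_cmd() {
    if command -v "$1" >/dev/null 2>&1; then
        echo "ok: $1 -> $(command -v "$1")"
    else
        echo "missing: $1"
        return 1
    fi
}

# Hypothetical helper: compare `python --version` output ($1)
# against the expected version number ($2).
version_ok() {
    [ "$1" = "Python $2" ]
}

# Python 2 prints its version to stderr, so merge stderr into stdout.
found="$(python --version 2>&1 || true)"
if version_ok "$found" "2.7.15"; then
    echo "ok: $found"
else
    echo "unexpected Python version: $found"
fi

check_cmd prep-text.sh || echo "prep-text.sh not found; did you run '. setup.sh'?"
python -c 'import nltk' 2>/dev/null || echo "nltk is not importable; is venv2 activated?"
```

If any line reports a problem, repeat the activation steps before going further.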

Data

In /work/03206/mortonne/stampede2/wikipedia, you will find:

vectors.txt - all 3 million word2vec vectors
enwiki-20191020 - directory with text for all Wikipedia pages
wiki2vec - directory with code for wiki2vec, downloaded from GitHub

You'll need to add a map file for the items you want vectors for. Run the following from a terminal on your local machine (not from an ssh session on Stampede), replacing [username] with your user name. This assumes the map file is in your current directory and is named item_map.txt:

scp item_map.txt [username]@stampede2.tacc.utexas.edu:/work/03206/mortonne/stampede2/wikipedia

scp authenticates the same way as ssh, so you will need to enter your password and complete two-factor authentication. Once it finishes, you should see the item_map.txt file in the wikipedia directory on Stampede.
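If you upload map files often, a small wrapper can catch a missing or empty file before scp prompts for your password. This is only a sketch: the function name upload_map is hypothetical, and the host and destination are the ones used in the scp command above.

```shell
# Hypothetical wrapper around the scp command shown above.
# $1: your TACC user name; $2: map file (defaults to item_map.txt).
upload_map() {
    user="$1"
    map="${2:-item_map.txt}"
    dest="/work/03206/mortonne/stampede2/wikipedia"

    # Refuse to upload a file that is missing or empty.
    if [ ! -s "$map" ]; then
        echo "error: $map is missing or empty" >&2
        return 1
    fi

    scp "$map" "$user@stampede2.tacc.utexas.edu:$dest"
}

# Usage, from your local machine:
#   upload_map [username]
```

Afterwards, running ls /work/03206/mortonne/stampede2/wikipedia in your ssh session should list item_map.txt.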
