In this README, you can find a reference to the most used AWS CLI commands for this course.
The full documentation is available here.
This file will be updated throughout the course.
IMPORTANT - Profile information (access key, secret access key, and token) must be updated every time you restart the Virtual Lab.
To create (or to update) a profile, issue the following command.
# Replace <my_profile_name> with a name of your choice; what I use is aws-2425-class1-egallinucci (we will create another key-pair for class2)
aws configure --profile <my_profile_name>
You will be asked for the following details:
- AWS Access Key ID: you find it on the AWS Academy website (where you click the "Start Lab" button); there, look for AWS details > AWS CLI > aws_access_key_id
- AWS Secret Access Key: as above
- Default region name: us-east-1
- Default output format: leave None (just hit Enter)
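Since these values change at every Virtual Lab restart, the prompts above can also be filled in non-interactively. The following is a minimal sketch, not part of the course material: `configure_lab_profile` is a hypothetical helper name, and it relies only on `aws configure set`, which writes a single setting into a named profile. The session token argument is optional.

```shell
# Hypothetical helper (not an AWS CLI command): stores the lab credentials
# in a named profile without going through the interactive prompts.
# Usage: configure_lab_profile <profile> <access_key_id> <secret_access_key> [session_token]
configure_lab_profile() {
  if [ "$#" -lt 3 ]; then
    echo "usage: configure_lab_profile <profile> <access_key_id> <secret_access_key> [session_token]" >&2
    return 1
  fi
  local profile="$1"
  local key_id="$2"
  local secret="$3"
  local token="${4:-}"

  aws configure set aws_access_key_id "$key_id" --profile "$profile" || return 1
  aws configure set aws_secret_access_key "$secret" --profile "$profile" || return 1
  # Same default region as in the list above
  aws configure set region us-east-1 --profile "$profile" || return 1
  # The token is written only when it is passed as fourth argument
  if [ -n "$token" ]; then
    aws configure set aws_session_token "$token" --profile "$profile" || return 1
  fi
}
```

After defining the function in your shell, call it with the profile name and the values copied from AWS details > AWS CLI.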
Next, set the environment variable pointing to the profile.
# UNIX (or using Git Bash on Windows)
export AWS_PROFILE=<my_profile_name>
# Using the Windows Command Prompt (setx takes effect in newly opened windows, not in the current one)
setx AWS_PROFILE <my_profile_name>
Add the token to enable access.
aws configure set aws_session_token <my_token>
# Replace <my_token> with the token you find on the AWS Academy website, together with the aws_access_key_id (see above)
Show existing key-pairs
aws ec2 describe-key-pairs
Create a key-pair
aws ec2 create-key-pair --key-format ppk --key-name <my_key_pair> --query "KeyMaterial" --output text > keys/<my_key_pair>.ppk
# Replace <my_key_pair> with a name of your choice; this command creates the ppk and downloads it into the keys folder (which must already exist); --query "KeyMaterial" keeps only the private key, so that the file contains nothing else
Documentation is available here (lower-level commands) and here (higher-level commands).
See available buckets
aws s3api list-buckets
Create a bucket
aws s3api create-bucket --bucket <my_bucket_name>
Create a folder
aws s3api put-object --bucket <my_bucket_name> --key <my_folder_name>/
Copy, rename, and delete a folder
# --recursive applies the operation to every object under the folder
aws s3 cp s3://<my_bucket_name>/<my_folder_name>/ s3://<my_bucket_name>/<another_folder_name>/ --recursive
aws s3 mv s3://<my_bucket_name>/<my_folder_name>/ s3://<my_bucket_name>/<another_folder_name>/ --recursive
aws s3 rm s3://<my_bucket_name>/<my_folder_name>/ --recursive
Upload a file
aws s3api put-object --bucket <my_bucket_name> --key <path_to_file_on_s3> --body <local_path_to_file>
List entire content of a bucket
aws s3api list-objects-v2 --bucket <my_bucket_name>
Filter content of a bucket (e.g., to find specific folders and files)
aws s3api list-objects-v2 --bucket <my_bucket_name> --prefix <path_to_filter_on>
Documentation is available here.
Enable inbound SSH connections to the Security Group of the Master node (must be done only once, but one cluster must have been created first - see next command).
aws ec2 authorize-security-group-ingress --group-name ElasticMapReduce-master --ip-permissions IpProtocol=tcp,FromPort=22,ToPort=22,IpRanges="[{CidrIp=0.0.0.0/0}]"
To decide which EMR versions to load, see here.
Create cluster (see examples here); this is the default configuration for laboratories, but you can change it for the project (e.g., to increase the computational power if needed).
aws emr create-cluster \
--name "Big Data Cluster" \
--release-label "emr-7.3.0" \
--applications Name=Hadoop Name=Spark \
--instance-groups InstanceGroupType=MASTER,InstanceCount=1,InstanceType=m4.large InstanceGroupType=CORE,InstanceCount=2,InstanceType=m4.large \
--service-role EMR_DefaultRole \
--ec2-attributes InstanceProfile=EMR_EC2_DefaultRole,KeyName=<my_key_pair> \
--region "us-east-1"
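A cluster takes a while to start after creation. As a sketch of how to block until it is usable, the hypothetical helper below (`wait_for_cluster` is not an AWS CLI command) combines the built-in waiter `aws emr wait cluster-running` with `describe-cluster`; the `--query` path assumes the usual `Cluster.Status.State` field of the response.

```shell
# Hypothetical helper: waits until the cluster is up, then prints its state.
# Usage: wait_for_cluster <cluster_id>
wait_for_cluster() {
  if [ -z "${1:-}" ]; then
    echo "usage: wait_for_cluster <cluster_id>" >&2
    return 1
  fi
  local cluster_id="$1"

  # The waiter polls the cluster and returns non-zero if it fails to start
  aws emr wait cluster-running --cluster-id "$cluster_id" || return 1
  # Print only the state field instead of the full JSON description
  aws emr describe-cluster --cluster-id "$cluster_id" \
    --query 'Cluster.Status.State' --output text
}
```

Call it as `wait_for_cluster <my_cluster_id>`, using the id printed by the create-cluster command above.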
List clusters
# All clusters
aws emr list-clusters
# Only the last one
aws emr list-clusters --max-items 1
See cluster details (needs the cluster_id, which is given at creation time and can be found by listing clusters)
aws emr describe-cluster --cluster-id <my_cluster_id>
Find the MasterPublicDnsName (to be used for SSH connections)
aws emr list-instances --cluster-id <cluster_id> --instance-group-types MASTER
Terminate a cluster
aws emr terminate-clusters --cluster-ids <cluster_id>
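The listing and describing commands above can be narrowed to a single value with `--query` and `--output text`, which is handy when the result feeds another command. This is a minimal sketch with hypothetical helper names (`latest_active_cluster_id`, `master_dns`); the `--query` paths assume the usual `Clusters[].Id` and `Cluster.MasterPublicDnsName` fields of the responses.

```shell
# Hypothetical helpers built on the commands above (not AWS CLI commands).

# Prints the id of the first cluster returned among the active ones.
# If no cluster is active, the CLI prints "None".
latest_active_cluster_id() {
  aws emr list-clusters --active --query 'Clusters[0].Id' --output text
}

# Prints the public DNS name of the master node of the given cluster.
# Usage: master_dns <cluster_id>
master_dns() {
  if [ -z "${1:-}" ]; then
    echo "usage: master_dns <cluster_id>" >&2
    return 1
  fi
  aws emr describe-cluster --cluster-id "$1" \
    --query 'Cluster.MasterPublicDnsName' --output text
}
```

For example, `master_dns "$(latest_active_cluster_id)"` prints the host name to use for the SSH connection.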