Infrastructure

Overview

  • The TTA Hub application(s) are run on the cloud.gov platform
  • cloud.gov uses AWS and leverages Cloud Foundry tools
  • CircleCI is used for automated build/test/deploy jobs; the project can be found here.

Architecture

  • Application consists of a Node.js backend and React frontend.
  • PostgreSQL is used extensively for persistent data storage.
  • Redis handles caching and some realtime presence features
  • AWS S3 is used for DB backups and some external data exchange

Environments

  • There are three environment "levels": prod, staging, and dev
  • Each environment is isolated to its own cloud.gov space
  • Prod and staging follow similar configurations, with multiple instances serving incoming traffic
  • There are multiple deployments within dev (blue, green, red, etc). Each dev deployment consists of a single instance with dedicated database and redis, and has its own DNS entry.
  • Dev and staging databases are restored to an anonymized version of production data every night by a scheduled job that runs in CircleCI

Continuous Integration (CI)

The bulk of CI configurations can be found in this repo's .circleci/config.yml file, the application manifest and the environment specific deployment_config variable files. Linting, unit tests, test coverage analysis, and an accessibility scan are all run automatically on each push to the HHS/Head-Start-TTADP repo. Merges to the main branch are blocked if the CI tests do not pass. For more information on the security audit and scan tools used in the continuous integration pipeline see ADR 0009.

Continuous Deployment (CD)

  • The main branch is automatically deployed to staging on merge, after tests pass
  • The production branch is automatically deployed to production on merge, after tests pass

Deploy changes directly to a test environment

You can deploy changes from any remote branch to a non-production environment by following these steps:

  • Log in to CircleCI, go to pipelines https://app.circleci.com/pipelines
  • Select your branch from the dropdown on the top right side
  • Click "Trigger Pipeline" button in the top right
  • Select deploy_manual, choose the environment you want to deploy to (e.g. dev-blue) from the dropdown, then run the pipeline.

Secret Management

CircleCI's project-based "environment variables" are used for secret management. These secrets include:

  • Cloud.gov deployer account username and password. These keys are specific to each cloud.gov organization/space, i.e. each deployment environment, and can be regenerated by developers who have proper cloud.gov permissions at any time.
  • HSES authentication middleware secrets passed to the application as AUTH_CLIENT_ID and PRIVATE_JWK_64.
  • The application SESSION_SECRET.
  • The application JWT_SECRET.
  • NewRelic license key, passed to the application as NEW_RELIC_LICENSE_KEY

Exception:

  • The environment specific postgres database URI is automatically available in the relevant cloud.gov application environment (because they share a cloud.gov "space"). The URI is accessible to the application as POSTGRES_URL. Consequently, this secret does not need to be managed by developers.

Adding environment variables to an application

There are a few different things you will need to do in order to add a new secret or config variable, depending on whether the value is secret and whether it will change per environment or not.

  • First, add it under the env: section in manifest.yml. This is what populates values when the application is deployed to cloud.gov
  • Next, add the value to each of the files under deployment_config/.
    • If the value is non-secret, simply add it in cleartext to these configs, e.g. NEW_VAR: false
    • If the value is secret, for now add the var name here with a reference to what it will be called in CircleCI, e.g. NEW_VAR: "${CIRCLE_VAR_NAME}"
  • If you created a secret-style var, you will now need to add it as a project-based "environment variable" in CircleCI.
    • Go to CircleCI project settings. You can create a separate value for each environment here, or use the same value across all environments, depending on what you defined in the deployment config yml files.
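
One way to confirm that a new variable was added everywhere it needs to be is to search for it. The following is a minimal sketch, assuming manifest.yml sits at the repo root and the deployment configs are .yml files under deployment_config/. The function name check_env_var is hypothetical and not part of this repo.

```shell
# Hypothetical helper: report files that do not mention a variable name.
# Assumes manifest.yml and deployment_config/*.yml under the given root.
check_env_var() {
  local var_name="$1" root="${2:-.}" missing=0 file
  for file in "$root/manifest.yml" "$root"/deployment_config/*.yml; do
    # -w matches the name as a whole word; -q suppresses normal output
    if ! grep -qw -- "$var_name" "$file" 2>/dev/null; then
      echo "missing in ${file#"$root"/}"
      missing=1
    fi
  done
  return "$missing"
}

# Usage (from the repo root):
#   check_env_var NEW_VAR
```

The check only looks for the name, not the value, so it will not catch a secret that is referenced in the configs but never created in CircleCI.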

Interacting with a deployed application or database

Read TTAHUB-System-Operations for information on how production may be accessed.

Our project includes four deployed Postgres databases, one to interact with each application environment (sandbox, dev, staging, prod).

First, log in to the Cloud Foundry instance

  1. Install the latest version (Version 8 as of this writing) of the Cloud Foundry CLI tool

    • On MacOS: brew install cloudfoundry/tap/cf-cli@8
    • On other platforms: Download and install cf. Be sure to get version 8.x
  2. Log in to your cloud.gov account

    cf login -a api.fr.cloud.gov --sso
    # follow temporary authorization code prompts
  3. Follow prompts to target the desired space

Second, choose an interaction method

Open an SSH session to a running application instance

Open a session to a running application instance:

    cf ssh tta-smarthub-dev-blue

Run /tmp/lifecycle/launcher /home/vcap/app sh '{}' inside the session, or add it to the SSH command to open a shell with .profile settings active:

    cf ssh tta-smarthub-dev-blue -t -c "/tmp/lifecycle/launcher /home/vcap/app sh '{}'"

Run psql commands directly

  1. If you haven't used the Cloud Foundry plugin cf-service-connect before, install it now

    # Mac OSX ARM
    cf install-plugin https://github.com/cloud-gov/cf-service-connect/releases/download/v1.1.4/cf-service-connect_darwin_arm64 
    # Mac OSX non-ARM
    cf install-plugin https://github.com/cloud-gov/cf-service-connect/releases/download/v1.1.4/cf-service-connect_darwin_amd64
    # Windows
    cf install-plugin https://github.com/cloud-gov/cf-service-connect/releases/download/v1.1.4/cf-service-connect_windows_386
    # Linux
    cf install-plugin https://github.com/cloud-gov/cf-service-connect/releases/download/v1.1.4/cf-service-connect_linux_amd64
  2. Connect to your desired database

    # list services (ie postgres, redis, etc)
    cf services
    cf connect-to-service <app_name> <service_instance_name>
    # Example for sandbox pg
    cf connect-to-service tta-smarthub-sandbox ttahub-sandbox
    # Example for sandbox redis
    cf connect-to-service tta-smarthub-sandbox ttahub-redis-sandbox
    # ctrl-d to disconnect

    On success, your terminal prompt will change to match the db_name from the database instance credentials. This indicates you are in an open psql session, the command-line interface to PostgreSQL. You will need to have the pg/redis client installed locally and findable in your $PATH. Production instances are generally inaccessible for direct connection, although this restriction can be lifted when necessary.

    Note: This plugin will not work for connecting to a replica database.
    Instead, use the script ./bin/replica-connect.sh from this repo.

    ./bin/replica-connect.sh tta-smarthub-dev-blue
     Establishing SSH tunnel with PID: 38115
     Connecting to db replica for tta-smarthub-dev-blue on port 5432...
     cgawsbrokerprodbt584djy6n6cnuz=> 
    

Run script as task

  1. Use cf run-task command

    cf run-task <app_name> --command "<yarn command>"
    # Example 1: running data validation script against sandbox
    cf run-task tta-smarthub-sandbox --command "yarn db:validation"
    # Example 2: undo most recent database migration
    cf run-task tta-smarthub-sandbox --command "yarn db:migrate:undo:prod:last"
  2. Check the log output, including output from the task

    cf logs <app_name> --recent
    # Example 1: checking sandbox logs
    cf logs tta-smarthub-sandbox --recent
    # Example 2: checking sandbox logs, grep just for task logs
    cf logs tta-smarthub-sandbox --recent | grep APP/TASK/
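
If several tasks have run recently, naming the task makes its output easier to find. The sketch below assumes that task log lines are tagged APP/TASK/<task-name>/ in the cf logs output and that cf run-task accepts a --name flag. The helper name task_log_lines and the task name validate are made up for illustration.

```shell
# Hypothetical helper: keep only the log lines emitted by one named task.
# Assumes task output is tagged APP/TASK/<task-name>/ in `cf logs` output.
task_log_lines() {
  grep -F "APP/TASK/$1/"
}

# Usage (task name "validate" is an arbitrary example):
#   cf run-task tta-smarthub-sandbox --command "yarn db:validation" --name validate
#   cf logs tta-smarthub-sandbox --recent | task_log_lines validate
```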

Run script in an interactive shell

  1. If on prod, enable ssh in the space first

    cf allow-space-ssh ttahub-prod
  2. SSH into your desired application (to see application names, run cf apps)

    cf ssh <app_name>
    # ssh example for sandbox application
    cf ssh tta-smarthub-sandbox
  3. Open shell

    /tmp/lifecycle/shell
  4. Run your desired command

    # example
    node ./build/server/src/tools/dataValidationCLI.js
  5. If on prod, disable ssh in space

    cf disallow-space-ssh ttahub-prod
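
Because it is easy to forget the final step, the enable/connect/disable sequence can be wrapped in one function. This is a sketch rather than a vetted operational script; the function name with_space_ssh is hypothetical, and it uses only the cf commands shown in the steps above.

```shell
# Hypothetical wrapper: enable SSH for a space, open a session, and disable
# SSH again afterwards, even if the session exits with an error.
with_space_ssh() {
  local space="$1" app="$2" status=0
  cf allow-space-ssh "$space" || return 1
  # Remember the session's exit status instead of stopping here
  cf ssh "$app" || status=$?
  cf disallow-space-ssh "$space"
  return "$status"
}

# Usage:
#   with_space_ssh ttahub-prod tta-smarthub-prod
```

The wrapper does not handle a terminal that is closed mid-session, so it is still worth confirming afterwards that SSH is disabled in the prod space.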

Manual Monitoring Import

These are the steps to manually import monitoring and generate any Goals associated with new monitoring review findings.

Prerequisites:

  • Cloud Foundry CLI (cf CLI) installed.
  • SSH access to the tta-smarthub-prod application in cloud.gov.
  • Log in to the production environment before running the script, using the following command:
cf login -a api.fr.cloud.gov --sso

When prompted, choose the production option (option 2 as of the writing of these instructions).

Basic command sequence:

  1. cf ssh tta-smarthub-prod to connect to the app
  2. /tmp/lifecycle/shell to get a shell with the correct environment
  3. node ./build/server/src/tools/importSystemCLI.js download 1 to fetch the next zip file from ITAMS
  4. node ./build/server/src/tools/importSystemCLI.js process 1 to process the contents of the zip file into the database
  5. yarn cli:create-monitoring-goals to create any monitoring Goals indicated by the new data

The download and process steps will need to be run once each per day that needs catchup, such as on a Monday after the weekend. Thus, the steps would go: 1,2,3,4,3,4,3,4,5.
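
The repeated part of the sequence can be generated rather than typed out. The sketch below only prints the commands for steps 3 to 5 and does not run anything; the function name monitoring_catchup_commands is hypothetical, and the commands themselves are copied from the list above.

```shell
# Hypothetical helper: print the commands for steps 3-5, repeating the
# download/process pair once per day of catch-up. It only prints; run the
# output yourself inside the /tmp/lifecycle/shell session.
monitoring_catchup_commands() {
  local days="${1:-1}" i=0
  while [ "$i" -lt "$days" ]; do
    echo "node ./build/server/src/tools/importSystemCLI.js download 1"
    echo "node ./build/server/src/tools/importSystemCLI.js process 1"
    i=$((i + 1))
  done
  echo "yarn cli:create-monitoring-goals"
}

# Example: three days of catch-up (a Monday after the weekend)
#   monitoring_catchup_commands 3
```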

Adding validation via the prod database:

These are easiest to do in a separate terminal window alongside the one you're using to run the basic command sequence. Otherwise you would have to exit out and rerun steps 1 and 2 above in order to resume.

  1. cf connect-to-service tta-smarthub-prod ttahub-prod to connect to the production database console
  2. SELECT "ftpFileInfo"->>'name' filename ,status, "createdAt" FROM "ImportFiles" ORDER BY 3 DESC LIMIT 4; to inspect the progress of steps 3 and 4 above. This will show the last four files downloaded. If only step 3 has been run, the process is finished if status has reached COLLECTED. If step 4 has been run, the process is finished if it reaches PROCESSED.
  3. SELECT LEFT(r.name,35) recipient, "regionId" region, COUNT(*) cnt FROM "Goals" g JOIN "Grants" gr ON g."grantId" = gr.id JOIN "Recipients" r ON gr."recipientId" = r.id WHERE "createdVia" = 'monitoring' AND g."createdAt" > (NOW() - INTERVAL '1 hour') GROUP BY 1,2 ORDER BY 2,1; creates a small report of monitoring goals created in the last hour. It is useful after step 5. This may be important information OHS will want to know if the import is being run manually for some reason.

Taking a production backup via CircleCI

We can quickly take a production backup via the CircleCI web interface. To do so, go to the production branch there and trigger a pipeline with the variable manual-trigger set to true. You can then retrieve this backup with the script bin/latest_backup.sh.
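
The same trigger can probably be sent through CircleCI's v2 API instead of the web interface. The sketch below builds the request body. It assumes that manual-trigger is a boolean pipeline parameter, that the project slug is gh/HHS/Head-Start-TTADP, and that you have a personal API token in CIRCLE_TOKEN; none of these are confirmed by this document. The function name backup_pipeline_payload is hypothetical.

```shell
# Hypothetical helper: build the JSON body for CircleCI's "trigger a new
# pipeline" API call (POST /api/v2/project/<project-slug>/pipeline).
# Assumes manual-trigger is a boolean pipeline parameter.
backup_pipeline_payload() {
  printf '{"branch":"%s","parameters":{"manual-trigger":true}}\n' "${1:-production}"
}

# Usage (token variable and project slug are assumptions):
#   curl -X POST "https://circleci.com/api/v2/project/gh/HHS/Head-Start-TTADP/pipeline" \
#     -H "Circle-Token: $CIRCLE_TOKEN" \
#     -H "Content-Type: application/json" \
#     -d "$(backup_pipeline_payload)"
```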

Data in non-production environments

To keep the non-production environments as close to production as possible, we developed a way to transform a restored version of the production database; the transformation can also be run locally if you are using a local database. This process is run automatically every night, so all non-prod envs will have a fresh, anonymized database each morning.

Running a db refresh manually

While this process does run every night automatically, it can also be run on-demand via CircleCI or by executing the script directly.

The script can be run using the following:

yarn cli:process-data

The transformed database can then be restored in the non-production environments. For details on how to perform a backup and restore, there is information on the cloud.gov site:

https://cloud.gov/docs/management/database-backup-restore/

Using Maintenance Mode

If you need to put the application into maintenance mode, you can run the maintenance script located at bin/maintenance.

This script requires that you have Cloud Foundry's CLI v7 installed.

The script takes two flags

  • -m | --maintenance-mode controls whether the script takes the app into maintenance mode or out of it.
    • Options are "on" or "off"
  • -e | --environment controls which environment you are targeting.
    • Options are "sandbox", "dev", "staging", and "prod"

Ex.

# Puts the dev environment into maintenance mode
./bin/maintenance -e dev -m on

If you are not logged into the cf cli, it will ask you for an sso temporary password. You can get a temporary password at https://login.fr.cloud.gov/passcode. The application will stay in maintenance mode even through deploys of the application. You need to explicitly run ./bin/maintenance -e ${env} -m off to turn off maintenance mode.

Creating a new environment

cf login -a api.fr.cloud.gov --sso
cf create-service aws-rds [size] [name] # e.g. micro-psql ttahub-dev-green
cf create-service aws-elasticache-redis [size] [name] # e.g. redis-dev ttahub-redis-dev-green
cf create-service s3 [size] [name] # e.g. basic ttahub-document-upload-dev-green
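
To keep service names consistent across deployments, the three commands can be generated from the deployment name. This sketch simply reuses the example plan names and naming pattern from the commands above, which may not match what every environment needs; the function name new_env_commands is hypothetical.

```shell
# Hypothetical helper: print the create-service commands for a new dev
# deployment. Plan names are the example values from this section.
new_env_commands() {
  local name="$1"   # e.g. dev-green
  echo "cf create-service aws-rds micro-psql ttahub-$name"
  echo "cf create-service aws-elasticache-redis redis-dev ttahub-redis-$name"
  echo "cf create-service s3 basic ttahub-document-upload-$name"
}

# Usage: review the printed commands, then run them after cf login
#   new_env_commands dev-green
```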

Update the contentSecurityPolicy in app.js with the new full hostname. Contact cloud.gov to allow the new routing.

If you need to bind an identity provider:

# a provider can be reused within the same space
 cf create-service cloud-gov-identity-provider oauth-client oauth-provider-dev

 # create a service key for each env
 cf create-service-key \
     oauth-provider-dev \
     oauth-key-dev-green \
     -c '{
         "redirect_uri": [
             "https://tta-smarthub-dev-green.app.cloud.gov/authenticated",
             "https://tta-smarthub-dev-green.app.cloud.gov/logout"
         ]
     }'

 # retrieve created id & secret
 cf service-key oauth-provider-dev oauth-key-dev-green

Shared Services

In order to access a service from multiple 'spaces', run the following command: cf share-service SERVICE-INSTANCE -s OTHER-SPACE

Currently, database restore is done by sharing lower-env db access into the prod environment, where the s3 db backups are located. An automated script will run in that environment and run updates and migrations on the lower-env databases.

Creating and Applying a Deploy Key

In order for CircleCI to correctly pull the latest code from GitHub, we need to create an SSH key and apply it to both GitHub and CircleCI. This has already been done for existing environments but is documented here for future reference.

The following links outline the steps to take:

  • https://circleci.com/docs/github-integration/#create-a-github-deploy-key
  • https://docs.github.com/en/authentication/connecting-to-github-with-ssh/generating-a-new-ssh-key-and-adding-it-to-the-ssh-agent

Steps to create and apply deploy token:

  1. Open the Git Bash CMD window
  2. Enter the following command with your github (admin) e-mail: ssh-keygen -t rsa -b 4096 -C "your_email@example.com"
  3. When prompted to enter a file name leave blank and press ENTER
  4. When prompted to enter a PASSPHRASE leave blank and press ENTER (twice)
  5. Search for the file created with the name "id_rsa"
  6. Notice that two files have been created in the .ssh folder: a private key (id_rsa) and a public key (id_rsa.pub)
  7. Open the public file and copy the entire contents of the file
  8. In GitHub, go to the TTAHUB project and click 'Settings' in the top right corner
  9. Under 'Security' click 'Deploy Keys' then 'Add deploy Key'
  10. Give the key a name 'TTAHUB' and paste the public key contents, CHECK 'Allow write access' then click 'Add Key'
  11. Open the private key file that was created and copy the entire contents of the file
  12. Go to CircleCI and open the 'Head-Start-TTADP' project
  13. Click 'Project settings' in the top right corner
  14. Click 'SSH keys' and scroll down to the section 'Additional SSH Keys'
  15. Click 'Add SSH Key', in 'Hostname' enter github.com then paste the contents of the private file in 'Private Key' section
  16. Click 'Add SSH Key'
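
The key generation steps can also be done without prompts. The sketch below writes the key pair to a file you name, as an alternative to accepting the default id_rsa, so that an existing personal key is not overwritten. The function name make_deploy_key and the example path are hypothetical.

```shell
# Hypothetical helper: create a passphrase-less RSA key pair in its own file
# so an existing ~/.ssh/id_rsa is never overwritten.
make_deploy_key() {
  local email="$1" key_path="$2"
  # -f sets the output file, -N "" sets an empty passphrase, -q silences output
  ssh-keygen -q -t rsa -b 4096 -C "$email" -f "$key_path" -N ""
}

# Usage (path is an arbitrary example):
#   make_deploy_key "your_email@example.com" "$HOME/.ssh/ttahub_deploy"
#   cat "$HOME/.ssh/ttahub_deploy.pub"   # public key, goes to GitHub
#   cat "$HOME/.ssh/ttahub_deploy"       # private key, goes to CircleCI
```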

Removing, creating and binding a service from the command line

In the past, we've needed to destroy and recreate particular services (for example, redis). This can be done through the Cloud.gov UI, through the Terraform architecture, and through the cloud foundry command line interface. The following are instructions for using the cloud foundry CLI (cf) for this.

  • Login and target the environment you wish to make changes to. (cf login --sso).
  • You can use cf services to list your services
  • Remember that you can use cf help COMMAND to get the documentation for a particular command

To delete and recreate a service (this should not be done lightly, as it is a destructive action)

  1. Unbind a service: cf us APP_NAME SERVICE

    ex: cf us tta-smarthub-staging ttahub-redis-staging

  2. Delete a service: cf ds SERVICE

    ex: cf ds ttahub-redis-staging

  3. Create a service: cf cs SERVICE_TYPE SERVICE_LABEL SERVICE

    ex: cf cs aws-elasticache-redis redis-dev ttahub-redis-staging

  4. Bind a service: cf bs APP_NAME SERVICE

    ex: cf bs tta-smarthub-staging ttahub-redis-staging

  5. Trigger a redeploy through the CircleCI UI (rather than restaging)

  6. Finally, you may need to reconfigure the network policies to allow the app to connect to the virus scanning API. Check your network policies with:

    cf network-policies

    If you see nothing there, you'll need to add an appropriate policy:

    cf add-network-policy tta-smarthub-APP_NAME clamav-api-ttahub-APP_NAME --protocol tcp --port 9443
    # ex:
    cf add-network-policy tta-smarthub-dev clamav-api-ttahub-dev --protocol tcp --port 9443

    You may need to connect across spaces (for example, our clamav-api-ttahub-dev app is shared by all of our ephemeral environments). If so, use the -s flag:

    cf add-network-policy tta-smarthub-staging -s ttahub-dev clamav-api-ttahub-dev --protocol tcp --port 9443
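
The unbind, delete, create, and bind commands can be chained so that a failure stops the sequence. This is a sketch under the assumption that each cf command returns a non-zero status on failure; the function name recreate_service is hypothetical. It uses the long command names (unbind-service, delete-service, create-service, bind-service) that the us, ds, cs, and bs aliases stand for.

```shell
# Hypothetical wrapper around the unbind/delete/create/bind commands. It
# stops at the first failure so a failed delete is never followed by a create.
recreate_service() {
  local app="$1" service="$2" offering="$3" plan="$4"
  cf unbind-service "$app" "$service" &&
    cf delete-service "$service" &&
    cf create-service "$offering" "$plan" "$service" &&
    cf bind-service "$app" "$service"
}

# Usage:
#   recreate_service tta-smarthub-staging ttahub-redis-staging aws-elasticache-redis redis-dev
```

Deletion may finish asynchronously on the platform side, so if the create step fails because the old instance still exists, check cf services and retry once it is gone.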

Terraform

The persistent AWS infrastructure is deployed as Cloud.gov assets via Terraform. For a description of persistent infrastructure and how it differs from ephemeral infrastructure see the root README.md.

These docs are verbose because this is technology with which developers will rarely interact. I suggest you settle in for a nice long read with your favorite drink of choice.

Set up

Install Terraform

  • On MacOS: brew install terraform
  • On other platforms: Download and install terraform

Use githook for formatting

Terraform has a specific whitespace formatting style that is difficult to maintain. Terraform includes a formatting command, terraform fmt, to help developers maintain the correct style. This repository contains a pre-commit hook that runs terraform fmt on all staged files so you don't have to remember to run this command.

If you are not using your own custom pre-commit hooks:

# start from repo root directory

# make the pre-commit file executable
chmod 755 .githooks/pre-commit

# change your default hooks directory to `.githooks`.
git config core.hooksPath .githooks

If you are already using git hooks, add the .githooks/pre-commit contents to your hooks directory or current pre-commit hook. Remember to make the file executable.

Install Version 7 of the Cloud Foundry CLI tool

  • On MacOS: brew install cloudfoundry/tap/cf-cli@7
  • On other platforms: Download and install cf. Be sure to get version 7.x

Add target environment credentials

We are using Terraform to create Cloud Foundry resources in a Cloud.gov account. Creating infrastructure on the Cloud.gov platform requires a Cloud.gov service account username and password. These keys are specific to each cloud.gov organization/space, i.e. each deployment environment, and can be generated by developers who have proper Cloud.gov permissions at any time.

Follow the steps below to generate a new set of credentials. If you need more information, check out the service account docs.

# login
cf login -a api.fr.cloud.gov --sso
# follow temporary authorization code prompts
# select org "hhs-acf-ohs-tta", and the space (env) within which you want to build infrastructure
# dev = ttahub-dev
# staging = ttahub-staging
# prod = ttahub-prod

# create a service instance that can provision service accounts
# the value for < YOUR-NAME > can be any version of your name, it isn't significant
cf create-service cloud-gov-service-account space-deployer < YOUR-NAME >

# bind a service key to the service instance
cf create-service-key < YOUR-NAME > space-deployer-key

# return a username/password pair for the service instance
cf service-key < YOUR-NAME > space-deployer-key

Add the username and password output from the last command to a secrets.auto.tfvars file in each environment directory. Terraform automatically loads this variable definition file. You can also provide variable values via environment variables. For more on this, check out the Terraform documentation on variable definitions.

For example, your terraform/dev/secrets.auto.tfvars file should look something like this:

cf_user = "some-dev-user"
cf_password = "some-dev-password"
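
If you prefer not to keep credentials in a file, Terraform also reads variables from environment variables named TF_VAR_<name>. A minimal sketch, assuming the variable names cf_user and cf_password from the example above; the function name export_cf_credentials is hypothetical.

```shell
# Hypothetical helper: export the credentials as TF_VAR_* environment
# variables, which Terraform maps onto the cf_user and cf_password variables.
export_cf_credentials() {
  export TF_VAR_cf_user="$1"
  export TF_VAR_cf_password="$2"
}

# Usage (values are placeholders):
#   export_cf_credentials "some-dev-user" "some-dev-password"
```

The values last only for the current shell session, so they need to be set again in each new terminal.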

Additionally, for environments other than dev and production, we need to grant the space deployer credentials access to the dev space to enable setting up network policies to the dev ClamAV server.

# Grant some-dev-user from previous step SpaceDeveloper access to ttahub-dev
cf set-space-role <some-dev-user from previous step> hhs-acf-ohs-tta ttahub-dev SpaceDeveloper

Create S3 bucket credentials

We are using an S3 bucket created by Cloud Foundry in Cloud.gov as our remote backend for Terraform. The backend maintains the "state" of Terraform and makes it possible for multiple developers to implement changes in a linear fashion.

Follow these directions to create a new service account and generate credentials. If you need more information, check out the services docs.

# login
cf login -a api.fr.cloud.gov --sso
# follow temporary authorization code prompts
# select org "hhs-acf-ohs-tta", space "infrastructure-config"

# create a service key for the ohs-ttahub-iac-state service instance
# the value for < YOUR-NAME > can be any version of your name
# it can be the same or different from the name you used for environment credentials in the previous step
cf create-service-key ohs-ttahub-iac-state < YOUR-NAME >

# return the credentials (access_key_id, secret_access_key, region) for the service key
cf service-key ohs-ttahub-iac-state < YOUR-NAME >

These credentials are for an S3 bucket that is used for holding the state of the dev, staging, and prod terraform environments; you only need to create one set of these ohs-ttahub-iac-state S3 bucket credentials. If you're already using AWS CLI you may simply add your newly generated access_key_id and secret_access_key values to ~/.aws/credentials, and your region value to ~/.aws/config. If this is your first time using AWS CLI, install the tool and then follow the Quick Configuration with aws configure guide.

Your ~/.aws/credentials file should look something like this:

[default]
aws_access_key_id = foo
aws_secret_access_key = bar

Your ~/.aws/config file should look something like this:

[default]
region = us-gov-west-1

Workflow

Tip: You run terraform files from the directory in which they are stored. For example, to instantiate a database described in terraform/dev/main.tf you would run all the commands below from terraform/dev.

Initialize your working directory

The first time you are working in a new environment, for example, after cloning this repository, you will need to initialize a working directory containing Terraform configuration files using the init command. It is safe to run this command multiple times.

terraform init

Check that your state is clean

Terraform configuration files committed to the main branch should describe the infrastructure that is currently in use. This is described as a "clean state". Use the plan command to display a list of infrastructure that Terraform will need to update, delete or create to match what is in your local Terraform configuration files.

  • If state is clean, plan will not display any changes.
  • If state is not clean and you are working on a feature branch try merging in the main branch; your local terraform files might be behind the current/applied state. If plan still shows an unclean state, reach out to your fellow developers for clarification and advice on how to proceed. It is very likely that changes were accidentally applied and a developer is currently working on a fix.
terraform plan
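
For a scripted check, terraform plan accepts a -detailed-exitcode flag that distinguishes "no changes" from "changes pending". The sketch below wraps that flag; the function name check_clean_state is hypothetical, and it should be run from the environment directory like the other commands here.

```shell
# Hypothetical helper: report whether the state is clean.
# With -detailed-exitcode, `terraform plan` exits 0 when there are no
# changes, 1 on error, and 2 when changes are pending.
check_clean_state() {
  local rc=0
  terraform plan -detailed-exitcode >/dev/null || rc=$?
  case "$rc" in
    0) echo "clean" ;;
    2) echo "changes pending"; return 2 ;;
    *) echo "plan failed"; return 1 ;;
  esac
}

# Usage (from terraform/dev, for example):
#   check_clean_state
```

When it reports pending changes, rerun plain terraform plan to see what they are.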

Make changes and get feedback

Make any needed changes to your local Terraform configuration files. Open a PR for those changes. In your PR include the output from terraform plan so reviewers can see what resources will be updated, created and destroyed.

Merge and apply changes immediately

Tip: Before you merge your PR ensure you have enough time to sit and work through any unexpected problems that could arise. Database changes can take upwards of ten minutes to apply.

Merge your PR into main and then immediately apply your changes. Always merge and apply in one sitting.

terraform apply

Bind the infrastructure to the application

Cloud Foundry/cloud.gov requires managed services to be bound to the application instance. In this repo, those bindings are declared in the services: block of manifest.yml, which is the source of truth for the current set of services bound during deployment. The ((env)) value comes from the environment-specific files under deployment_config/. If you add another managed service, add it to the manifest services: list so it is bound during deployment. See the cloud.gov documentation for more direction on this.