This is a sample project showing how to interact with the Twitter API using Apache Beam. The pipeline fetches Tweets from the last 7 days for a given Twitter handle and writes the results to a BigQuery table.
For more information, please visit my blog post on Pythian's Official Blog.
To build and use this project, you will need to:
- Create a Google Cloud Platform project
- Install the latest version of the gcloud CLI tools (see the example after this list)
- Set up a Twitter API app in the Twitter Developer Portal
- Install JDK 8 and Maven 3.8.1
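For example, you can authenticate the gcloud CLI and enable the services the pipeline relies on like this (the exact set of APIs to enable is an assumption; adjust for your setup):

gcloud auth login
gcloud auth application-default login
gcloud config set project <YOUR_GCP_PROJECT_ID>
# Enable Dataflow, BigQuery, and Cloud Storage for this project
gcloud services enable dataflow.googleapis.com bigquery.googleapis.com storage.googleapis.com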
You can run this Apache Beam pipeline locally using the Direct Runner:
export api_key=<YOUR_KEY>
export api_secret=<YOUR_SECRET>
export access_token=<YOUR_TOKEN>
export access_token_secret=<YOUR_TOKEN_SECRET>
export twitter_handle=<YOUR_HANDLE>
export temp_bq_location=<YOUR_TEMP_GCS_LOCATION>
export bq_sink_table=<YOUR_BQ_TABLE_NAME>
mvn compile exec:java \
-Dexec.mainClass=ca.evanseabrook.twitter.TwitterRecentTweetFetcher \
-Dexec.args="--runner=direct --apiKey=${api_key} \
--apiSecret=${api_secret} \
--accessToken=${access_token} \
--accessTokenSecret=${access_token_secret} \
--twitterHandle=${twitter_handle} \
--temporaryBQLocation=${temp_bq_location} \
--sinkBQTable=${bq_sink_table}"
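Once the job completes, you can spot-check the output with the bq CLI (this assumes bq_sink_table uses the project:dataset.table form the bq tool accepts):

# Preview the first few rows loaded into the sink table
bq head -n 5 "${bq_sink_table}"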
You can also choose to publish a custom Dataflow template, which can then be used to create a Dataflow job on GCP.
export template_location=<YOUR_TEMPLATE_GCS_LOCATION>
export project_id=<YOUR_GCP_PROJECT_ID>
mvn compile exec:java \
-Dexec.mainClass=ca.evanseabrook.twitter.TwitterRecentTweetFetcher \
-Dexec.args="--runner=DataflowRunner \
--project=${project_id} \
--templateLocation=${template_location} \
--region=us-central1"
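Once the template (and its metadata file, described below) are staged, you can launch a Dataflow job from it with the gcloud CLI. A minimal sketch, assuming the runtime parameter names match the pipeline flags used above:

gcloud dataflow jobs run twitter-fetcher-$(date +%s) \
--gcs-location="${template_location}" \
--region=us-central1 \
--parameters="apiKey=${api_key},apiSecret=${api_secret},accessToken=${access_token},accessTokenSecret=${access_token_secret},twitterHandle=${twitter_handle},temporaryBQLocation=${temp_bq_location},sinkBQTable=${bq_sink_table}"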
A TWITTER_FETCHER_metadata file has also been provided, which tells Dataflow which runtime parameters the pipeline accepts. It should be placed in the same location as your Dataflow template file.
If you've named your template something other than "TWITTER_FETCHER", you will need to rename the metadata file to match, e.g. HELLO_WORLD_metadata.
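For reference, classic Dataflow template metadata files are JSON documents. Below is a minimal sketch of what such a file might contain and how to stage it; the parameter entries shown are illustrative examples based on the flags above, not the repo's actual file:

# Write an example metadata file (only two parameters shown;
# the remaining pipeline options follow the same pattern)
cat > TWITTER_FETCHER_metadata <<'EOF'
{
  "name": "Twitter Recent Tweet Fetcher",
  "description": "Fetches recent Tweets for a handle and writes them to BigQuery.",
  "parameters": [
    {
      "name": "twitterHandle",
      "label": "Twitter handle",
      "helpText": "The Twitter handle whose recent Tweets will be fetched.",
      "isOptional": false
    },
    {
      "name": "sinkBQTable",
      "label": "Output BigQuery table",
      "helpText": "The BigQuery table the Tweets are written to.",
      "isOptional": false
    }
  ]
}
EOF

# Stage it alongside the template file so Dataflow can find it
gsutil cp TWITTER_FETCHER_metadata "$(dirname "${template_location}")/"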