|
| 1 | += Project Axon: Bank Branch Performance Analytics |
| 2 | +:author: Yash Gulati |
| 3 | +:revdate: 2025-06-20 |
| 4 | +:toc: |
| 5 | +:toclevels: 2 |
| 6 | + |
| 7 | +== Introduction |
| 8 | + |
| 9 | +*Project Axon* is an end-to-end demonstration of Cloudera’s capabilities across the full data lifecycle, from data ingestion to dashboarding. |
| 10 | + |
| 11 | +The goal of this project is to help partners: |
| 12 | +- Understand how to use Cloudera Public Cloud in practice for real-time and batch analytics. |
| 13 | +- Identify a relevant and easy-to-explain use case. |
| 14 | +- Showcase a ready-to-deploy demo to customers after initial discovery conversations. |
| 15 | + |
| 16 | +**Use Case Chosen:** *Bank Branch Performance Analytics* |
| 17 | + |
| 18 | +This use case simulates the operational performance of bank branches with dummy data and presents the analysis through dashboards. |
| 19 | + |
| 20 | +== Prerequisites |
| 21 | + |
| 22 | +=== 1. Linux Server for running Dummy Data Generator App |
| 23 | + |
| 24 | +Ensure you have access to any running **Linux server** for hosting the dummy data generator application. |
| 25 | +A minimal cloud instance like **t3.small** (2 vCPUs, 2 GB RAM) is sufficient for running the application. |
| 26 | + |
| 27 | +==== Required Open Ports |
| 28 | +Make sure the following ports are open on the server's firewall or cloud security group: |
| 29 | + |
| 30 | +- 8000 |
| 31 | +- 8085 |
| 32 | +- 5001 |
| 33 | +- 5003 |
| 34 | +- 5400 |
| 35 | +- 5500 |
| 36 | + |
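The port list above can be turned into firewall rules with a short loop. The following is a minimal sketch that assumes the server uses `firewalld` (an assumption, the document does not name a firewall); it only prints the commands so they can be reviewed before running. Cloud security groups still need to be updated separately in the provider's console.

```shell
#!/usr/bin/env bash
# Ports listed in "Required Open Ports" above.
PORTS="8000 8085 5001 5003 5400 5500"

# Dry run: print one firewall-cmd invocation per port, then the reload.
# Assumes firewalld is the active firewall; nothing is changed on the host.
print_firewall_commands() {
  for port in $PORTS; do
    echo "sudo firewall-cmd --permanent --add-port=${port}/tcp"
  done
  echo "sudo firewall-cmd --reload"
}

print_firewall_commands
```

Piping the output to `bash` applies the rules once they look right.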
| 37 | +=== 2. Cloudera Platform Requirements |
| 38 | + |
| 39 | +Ensure you have a running **Cloudera Public Cloud Environment** with the following components: |
| 40 | + |
| 41 | +- Data Lake |
| 42 | +- Cloudera Data Flow |
| 43 | +- Cloudera Data Warehouse |
| 44 | +- Cloudera Data Visualization |
| 45 | + |
| 46 | +This project was developed and tested on the following component versions: |
| 47 | + |
| 48 | +- **Datalake Version**: 7.2.18 |
| 49 | +- **Cloudera Data Flow**: 2.10.0-h3-b3 |
| 50 | +- **Cloudera Data Warehouse**: 1.10.3-b8 |
| 51 | +- **Cloudera Data Visualization**: 7.2.9-b41 |
| 52 | + |
| 53 | +== Technology Stack |
| 54 | + |
| 55 | +- **Data Generator**: Python (Flask + Faker) |
| 56 | +- **Data Ingestion**: Cloudera Data Flow |
| 57 | +- **Storage**: S3 (Parquet format) |
| 58 | +- **Data Query Layer**: Cloudera Data Warehouse via Hue |
| 59 | +- **Visualization**: Cloudera Data Visualization |
| 60 | + |
| 61 | +== Project Workflow |
| 62 | + |
| 63 | +image::../images/project_flow_cloud.png[project_flow] |
| 64 | + |
| 65 | +== Steps to Run |
| 66 | + |
| 67 | +=== 1. Clone the Dummy Data Generator Repository |
| 68 | + |
| 69 | +Clone the repository containing the dummy data generators, check out the `project-axon` branch, and change into the project directory: |
| 70 | + |
| 71 | +[source,shell] |
| 72 | +---- |
| 73 | +git clone https://github.com/cloudera/cloudera-partners.git |
| 74 | +cd cloudera-partners |
| 75 | +git checkout project-axon |
| 76 | +cd Project-Axon/On-cloud |
| 77 | +---- |
| 78 | + |
| 79 | +=== 2. Set Up Python Virtual Environment and Install Dependencies |
| 80 | + |
| 81 | +[source,shell] |
| 82 | +---- |
| 83 | +sudo yum install -y python3 git |
| 84 | +python3 -m ensurepip --upgrade |
| 85 | +python3 -m venv venv |
| 86 | +source venv/bin/activate |
| 87 | +pip3 install -r ../assets/requirements.txt |
| 88 | + |
| 89 | +# Verify Flask version |
| 90 | +python3 -m flask --version |
| 91 | +---- |
| 92 | + |
| 93 | +=== 3. Run the application |
| 94 | + |
| 95 | +[source,shell] |
| 96 | +---- |
| 97 | +bash ../assets/run_all.sh |
| 98 | +---- |
| 99 | + |
| 100 | +- After running the script, verify that the dummy data endpoints are active using a `curl` command. |
| 101 | +- Replace `<your-server-ip>` with the public IP of the node where you ran the script. |
| 102 | + |
| 103 | +Example: |
| 104 | +[source,shell] |
| 105 | +---- |
| 106 | +curl http://<your-server-ip>:5400/footfall/summary |
| 107 | +curl http://<your-server-ip>:8000/campaign-details |
| 108 | +---- |
| 109 | + |
| 110 | +Sample JSON response from the campaign API: |
| 111 | +[source,json] |
| 112 | +---- |
| 113 | +{ |
| 114 | + "Budget": 351527.55, |
| 115 | + "CampaignID": 17, |
| 116 | + "CampaignName": "Mclean-Tran Loan Offer", |
| 117 | + "Channel": "Bank Website", |
| 118 | + "EndDate": "2025-07-21", |
| 119 | + "SeasonID": 3, |
| 120 | + "StartDate": "2025-07-14", |
| 121 | + "Status": "Active" |
| 122 | +} |
| 123 | +---- |
| 124 | + |
| 125 | +You should see a JSON response similar to the above. |
| 126 | + |
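To go one step beyond eyeballing the response, the keys of the sample campaign record can be checked automatically. This is a rough sketch: the helper name `check_campaign_keys` is hypothetical, the key list is copied from the sample response above, and the check is a plain `grep` for each key name rather than a full JSON parse.

```shell
#!/usr/bin/env bash
# Keys taken from the sample campaign response above.
EXPECTED_KEYS="Budget CampaignID CampaignName Channel EndDate SeasonID StartDate Status"

# Hypothetical helper: reads a JSON document on stdin, prints any
# missing key, and returns non-zero if at least one key is absent.
check_campaign_keys() {
  body="$(cat)"
  missing=0
  for key in $EXPECTED_KEYS; do
    if ! printf '%s' "$body" | grep -q "\"$key\"[[:space:]]*:"; then
      echo "missing key: $key"
      missing=1
    fi
  done
  return $missing
}

# Offline example using a one-line copy of the sample response.
sample='{"Budget": 351527.55, "CampaignID": 17, "CampaignName": "Mclean-Tran Loan Offer", "Channel": "Bank Website", "EndDate": "2025-07-21", "SeasonID": 3, "StartDate": "2025-07-14", "Status": "Active"}'
printf '%s' "$sample" | check_campaign_keys && echo "campaign response looks complete"
```

Against a live server, the same helper can be fed from `curl -s http://<your-server-ip>:8000/campaign-details`.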
| 127 | +=== 4. Generate the CDP Workload Password for Your Profile |
| 128 | + |
| 129 | +- Log in to the Cloudera Public Cloud Console using your credentials. |
| 130 | +- Click your login name at the lower-left corner → *Profile*. |
| 131 | ++ |
| 132 | +image::../images/profile_name.png[profile name] |
| 133 | ++ |
| 134 | +- Click *Set Workload Password*. |
| 135 | +- Enter `Changeme123!` (note capital C) or your desired password in both fields and click *Set Workload Password*. |
| 136 | ++ |
| 137 | +image::../images/set_workload.png[set_workload] |
| 138 | ++ |
| 139 | +- A confirmation message appears once your password is set successfully. **Remember this password, as it is used in later steps.** |
| 140 | + |
| 141 | +=== 5. Import the NiFi Flow into the Cloudera Data Flow Catalog |
| 142 | + |
| 143 | +. Navigate to the **Cloudera Data Flow** service and open the **Catalog**. |
| 144 | ++ |
| 145 | +image::../images/cloudera_data_flow.png[cloudera data flow, width=300, height=300] |
| 146 | ++ |
| 147 | +. Click on *Import Flow Definition*. |
| 148 | ++ |
| 149 | +image::../images/import_catalog.png[import catalog] |
| 150 | ++ |
| 151 | +. Enter a descriptive name for your flow (for example, `Project-Axon`) and choose the desired collection. |
| 152 | +. Upload the `Project-Axon` flow file as the *NiFi Flow Configuration File*, then click *Import*. |
| 153 | ++ |
| 154 | +image::../images/import_wizard.png[import wizard, width=400, height=500] |
| 155 | ++ |
| 156 | +. Once the flow appears in the Catalog, click to open it, then select *Deploy* to create a NiFi flow deployment. |
| 157 | ++ |
| 158 | +image::../images/deploy_flow.png[deploy flow, width=500, height=800] |
| 159 | + |
| 160 | +==== Deployment Steps |
| 161 | + |
| 162 | +. In the deployment wizard: |
| 163 | +.. Select the target workspace (your Cloudera Public Cloud environment) and click *Continue*. |
| 164 | ++ |
| 165 | +image::../images/deploy_target_env.png[deploy target env, width=450, height=600] |
| 166 | ++ |
| 167 | +.. Provide a name for your deployment, choose the target project, and click *Next*. |
| 168 | +.. Under *NiFi Configuration*, keep the default settings and click *Next*. |
| 169 | ++ |
| 170 | +image::../images/nifi_configuration.png[NiFi Configuration, width=700, height=900] |
| 171 | ++ |
| 172 | +.. In the *Parameters* section: |
| 173 | + * Enter your **CDP Workload Username** and **CDP Workload User Password** for your tenant. |
| 174 | + * In the `http url` parameter, update only the IP address portion with the *Public IP address* of the server running your dummy data generator app. |
| 175 | + * Click *Next*. |
| 176 | ++ |
| 177 | +image::../images/update_parameters.png[Update Parameters, width=600, height=900] |
| 178 | ++ |
| 179 | +.. Under *Sizing and Scaling*, keep the default settings and click *Next*. |
| 180 | +.. Leave *Key Performance Indicators (KPIs)* empty unless you wish to define them. |
| 181 | +.. Review the configuration and click *Deploy*. |
| 182 | ++ |
| 183 | +image::../images/review_wizard.png[review wizard, width=600, height=700] |
| 184 | ++ |
| 185 | +. To open and view the deployed flow, go to *Actions* and select *View in NiFi*. |
| 186 | ++ |
| 187 | +image::../images/view_in_nifi.png[view in nifi, width=500, height=900] |
| 188 | ++ |
| 189 | +. Start the flow and let it run for no more than **5 minutes**, which is enough to generate about **50–80 flow files**. Then right-click the process group and select *Stop* so that it does not run indefinitely. |
| 190 | ++ |
| 191 | +image::../images/stop_flow.png[stop flow] |
| 192 | + |
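The *Parameters* step above asks you to change only the IP address portion of the `http url` value. The sketch below shows that substitution on the command line. The helper name `replace_host` and both example addresses are hypothetical, and the URL is assembled from the port and path of the campaign endpoint used earlier, since the flow's actual default value is not shown in this document.

```shell
#!/usr/bin/env bash
# Hypothetical helper: swap only the host part of an http(s) URL,
# keeping the scheme, port, and path unchanged.
#   $1 = original URL, $2 = new host or IP address
replace_host() {
  printf '%s\n' "$1" | sed -E "s#^(https?://)[^:/]+#\1$2#"
}

# Example addresses are placeholders, not values from the flow definition.
replace_host "http://10.0.0.1:8000/campaign-details" "203.0.113.25"
# prints: http://203.0.113.25:8000/campaign-details
```

The port and path after the host are left untouched, which matches the instruction to edit the IP address only.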
| 193 | +=== 6. Create Hive Tables via Hue |
| 194 | + |
| 195 | +Go to **Cloudera Data Warehouse** and, under *Virtual Warehouses*, click *Hue* for the Hive Virtual Warehouse in your environment. |
| 196 | + |
| 197 | +To create all the required databases and tables at once, simply: |
| 198 | + |
| 199 | +- Open the https://github.com/cloudera/cloudera-partners/blob/project-axon/Project-Axon/create_queries.txt[create_queries.txt] file from the cloned folder. |
| 200 | +- Copy the entire content. |
| 201 | +- Paste it into the Hue Query Editor. |
| 202 | +- Select all and click the **Run** button. |
| 203 | ++ |
| 204 | +image::../images/hive_queries.png[hive_queries, width=800, height=500] |
| 205 | + |
| 206 | +This will create all the necessary Hive tables and databases for the project in one go. |
| 207 | + |
| 208 | +==== 6.1. Verify Table Creation & Data Load |
| 209 | + |
| 210 | +To verify that all tables were successfully created and contain data: |
| 211 | + |
| 212 | +- Copy the content of the file `verify_tables.txt`, which contains a Hive query that counts the rows in every expected table. |
| 213 | +- Paste it into the *Hue Query Editor*. |
| 214 | +- Click *Run*. |
| 215 | + |
| 216 | +You should see a list of table names with their row counts. |
| 217 | + |
| 218 | +image::../images/table_verify.png[tables verify] |
| 219 | + |
| 220 | +If any table shows a count of `0`, you may need to revisit the data ingestion step for that table. |
| 221 | + |
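If the row counts are exported from Hue as CSV, the empty tables can be picked out with a one-line filter. This is only a sketch: it assumes a two-column export with a header row (table name, row count), and the table names in the example are made up for illustration.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read "table_name,row_count" CSV on stdin (first
# line is a header) and print the name of every table whose count is 0.
# Note: a blank or non-numeric count is also treated as 0.
zero_count_tables() {
  awk -F',' 'NR > 1 && $2 + 0 == 0 { print $1 }'
}

# Example with made-up table names; only the empty table is printed.
printf 'table_name,row_count\ncampaigns,62\nfootfall,0\n' | zero_count_tables
```

Any table name printed here points to the ingestion step that needs to be revisited.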
| 222 | +=== 7. Connect Data Visualization to Impala |
| 223 | + |
| 224 | +To enable Data Visualization to read data from Impala, you need to create a connection in the Data Visualization UI. |
| 225 | + |
| 226 | +While Hive is supported, it is *recommended to use Impala* for creating the connection, as Impala is a high-performance, distributed SQL engine optimized for fast, interactive analytics on large-scale datasets. |
| 227 | + |
| 228 | +- Go to *Cloudera Data Warehouse*, click *Data Visualization*, and then click your environment name. |
| 229 | ++ |
| 230 | +image::../images/cloudera_data_warehouse.png[cloudera data warehouse, width=300, height=300] |
| 231 | +- Click *Open Data Visualization*, then navigate to the *Data* tab. |
| 232 | ++ |
| 233 | +image::../images/cdw_dataviz.png[cdw dataviz] |
| 234 | ++ |
| 235 | +- Click *+ New Connection* → *CDW Impala*. |
| 236 | ++ |
| 237 | +image::../images/connection.png[make connection, width=500, height=300] |
| 238 | ++ |
| 239 | +[width="90%",cols="40%,50%",options="header"] |
| 240 | +|=== |
| 241 | +|**Parameter** |**Value** |
| 242 | +|*Connection Name* |Impala-Axon (or any name you prefer) |
| 243 | +|*Connection type* |CDW Impala |
| 244 | +|*CDW Warehouse* |Select the name of your Impala Virtual Warehouse |
| 245 | +|*Hostname* |Auto-populated when you select the CDW Warehouse |
| 246 | +|*Port* |28000 (for Impala) |
| 247 | +|*Credentials* |Leave empty |
| 248 | +|=== |
| 249 | ++ |
| 250 | +- Click *Test Connection* to verify. |
| 251 | ++ |
| 252 | +image::../images/connection_cdw_impala.png[verify connection, width=450, height=500] |
| 253 | ++ |
| 254 | +- Once successful, click *Save*. |
| 255 | +- You can now use this connection to create/import datasets and build/import dashboards from Impala tables. |
| 256 | + |
| 257 | +=== 8. Import Dashboard into Cloudera Data Visualization |
| 258 | + |
| 259 | +- Go to *Cloudera Data Visualization*. |
| 260 | + |
| 261 | +- Navigate to the *Data* tab, then click on *Import visual artifacts*. |
| 262 | ++ |
| 263 | +image::../images/import_visual.png[Import Visual] |
| 264 | ++ |
| 265 | +- Upload the dashboard JSON file: https://github.com/cloudera/cloudera-partners/blob/project-axon/Project-Axon/project_axon_dashboard.json[project_axon_dashboard.json]. |
| 266 | +- After uploading, click *Accept and Import*. An *Import Successful* message appears, along with the list of datasets that were imported as part of the dashboard. |
| 267 | ++ |
| 268 | +image::../images/dashboard_import_verify.png[import success, width=800, height=450] |
| 269 | ++ |
| 270 | +- Once imported, navigate to the *Visuals* tab and click on the dashboard to open and view it. |
| 271 | ++ |
| 272 | +image::../images/dashboard.png[dashboard] |
| 273 | + |