= Project Axon: Bank Branch Performance Analytics
:author: Yash Gulati
:revdate: 2025-06-20
:toc:
:toclevels: 2

== Introduction

*Project Axon* is a comprehensive end-to-end demonstration of Cloudera’s capabilities across the full data lifecycle, from data ingestion to dashboarding.

The goal of this project is to help partners:

- Understand how to use Cloudera Public Cloud for real-time and batch analytics in practice.
- Identify a relevant and easy-to-explain use case.
- Showcase a ready-to-deploy demo to customers after initial discovery conversations.

**Use Case Chosen:** *Bank Branch Performance Analytics*

This use case simulates and analyzes the operational performance of various bank branches using dummy data, providing visual insights via dashboards.

== Prerequisites

=== 1. Linux Server for Running the Dummy Data Generator App

Ensure you have access to a running **Linux server** for hosting the dummy data generator application.
A minimal cloud instance such as a **t3.small** (2 vCPUs, 2 GB RAM) is sufficient for running the application.

==== Required Open Ports
Make sure the following ports are open on the server's firewall or cloud security group (a sketch for opening them with `firewalld` follows this list):

- 8000
- 8085
- 5001
- 5003
- 5400
- 5500

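If your server uses `firewalld`, the commands below are a minimal sketch for opening these ports locally; adapt them to your distribution's firewall, and remember that on a cloud instance the same ports must also be allowed in the security group.

[source,shell]
----
# Open the data generator ports with firewalld (adjust to your environment)
for port in 8000 8085 5001 5003 5400 5500; do
  sudo firewall-cmd --permanent --add-port=${port}/tcp
done
sudo firewall-cmd --reload
----
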
=== 2. Cloudera Platform Requirements

Ensure you have a running **Cloudera Public Cloud environment** with the following components:

- Data Lake
- Cloudera Data Flow
- Cloudera Data Warehouse
- Cloudera Data Visualization

This project was developed and tested on the following component versions:

- **Data Lake**: 7.2.18
- **Cloudera Data Flow**: 2.10.0-h3-b3
- **Cloudera Data Warehouse**: 1.10.3-b8
- **Cloudera Data Visualization**: 7.2.9-b41

== Technology Stack

- **Data Generator**: Python (Flask + Faker)
- **Data Ingestion**: Cloudera Data Flow
- **Storage**: S3 (Parquet format)
- **Data Query Layer**: Cloudera Data Warehouse via Hue
- **Visualization**: Cloudera Data Visualization

== Project Workflow

image::../images/project_flow_cloud.png[project_flow]

== Steps to Run

=== 1. Clone the Dummy Data Generator Repository

Clone the repository containing the dummy data generators and check out the project branch:

[source,shell]
----
git clone https://github.com/cloudera/cloudera-partners.git
cd cloudera-partners
git checkout project-axon
cd Project-Axon/On-cloud
----

=== 2. Set Up Python Virtual Environment and Install Dependencies

[source,shell]
----
# Install Python 3 and Git (yum-based distributions)
sudo yum install -y python3 git
python3 -m ensurepip --upgrade

# Create and activate an isolated virtual environment
python3 -m venv venv
source venv/bin/activate

# Install the application dependencies
pip3 install -r ../assets/requirements.txt

# Verify Flask version
python3 -m flask --version
----

=== 3. Run the Application

[source,shell]
----
bash ../assets/run_all.sh
----

- After running the script, verify that the dummy data endpoints are active using a `curl` command.
- Replace `<your-server-ip>` with the public IP of the node where you ran the script.

Example:
[source,shell]
----
curl http://<your-server-ip>:5400/footfall/summary
curl http://<your-server-ip>:8000/campaign-details
----

Sample JSON response from the campaign API:
[source,json]
----
{
  "Budget": 351527.55,
  "CampaignID": 17,
  "CampaignName": "Mclean-Tran Loan Offer",
  "Channel": "Bank Website",
  "EndDate": "2025-07-21",
  "SeasonID": 3,
  "StartDate": "2025-07-14",
  "Status": "Active"
}
----

You should see a JSON response similar to the one above.

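As an additional check (a minimal sketch, not part of the original setup), you can confirm from the generator server itself that every service is listening on its expected port:

[source,shell]
----
# Run on the generator server: probe each dummy-data service port
for port in 8000 8085 5001 5003 5400 5500; do
  if curl -s -o /dev/null "http://localhost:${port}"; then
    echo "port ${port}: reachable"
  else
    echo "port ${port}: NOT reachable"
  fi
done
----
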
=== 4. Generate the CDP Workload Password for Your Profile

- Log in to the Cloudera Public Cloud Console using your credentials.
- Click your login name at the lower-left corner → *Profile*.
+
image::../images/profile_name.png[profile name]
+
- Click *Set Workload Password*.
- Enter `Changeme123!` (note the capital C) or your desired password in both fields and click *Set Workload Password*.
+
image::../images/set_workload.png[set_workload]
+
- A confirmation message will appear once your password is set successfully. **Remember this password, as it will be used in later steps.**

=== 5. Import the NiFi Flow into the Cloudera DataFlow Catalog

. Navigate to the **Cloudera DataFlow** service and open the **Catalog**.
+
image::../images/cloudera_data_flow.png[cloudera data flow, width=300, height=300]
+
. Click *Import Flow Definition*.
+
image::../images/import_catalog.png[import catalog]
+
. Enter a descriptive name for your flow (for example, `Project-Axon`) and choose the desired collection.
. Upload the `Project-Axon` flow file as the *NiFi Flow Configuration File*, then click *Import*.
+
image::../images/import_wizard.png[import wizard, width=400, height=500]
+
. Once the flow appears in the Catalog, click to open it, then select *Deploy* to create a NiFi flow deployment.
+
image::../images/deploy_flow.png[deploy flow, width=500, height=800]

==== Deployment Steps

. In the deployment wizard:
.. Select the target workspace (your Cloudera Public Cloud environment) and click *Continue*.
+
image::../images/deploy_target_env.png[deploy target env, width=450, height=600]
+
.. Provide a name for your deployment, choose the target project, and click *Next*.
.. Under *NiFi Configuration*, keep the default settings and click *Next*.
+
image::../images/nifi_configuration.png[NiFi Configuration, width=700, height=900]
+
.. In the *Parameters* section:
* Enter your **CDP Workload Username** and **CDP Workload User Password** for your tenant.
* In the `http url` parameter, update only the IP address portion with the *Public IP address* of the server running your dummy data generator app.
* Click *Next*.
+
image::../images/update_parameters.png[Update Parameters, width=600, height=900]
+
.. Under *Sizing and Scaling*, keep the default settings and click *Next*.
.. Leave *Key Performance Indicators (KPIs)* empty unless you wish to define them.
.. Review the configuration and click *Deploy*.
+
image::../images/review_wizard.png[review wizard, width=600, height=700]
+
. To open and view the deployed flow, go to *Actions* and select *View in NiFi*.
+
image::../images/view_in_nifi.png[view in nifi, width=500, height=900]
+
. After starting the flow, run it for no more than **5 minutes** to generate about **50–80 flow files**, then right-click the process group and select *Stop* to prevent it from running indefinitely.
+
image::../images/stop_flow.png[stop flow]

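Optionally, you can confirm that the flow actually landed data in object storage. The sketch below is not part of the original walkthrough: it assumes you have the AWS CLI configured, and the bucket and prefix are placeholders; the real location depends on how the flow's target path is parameterized in your environment.

[source,shell]
----
# List recently written files in the target S3 location (placeholder path)
aws s3 ls --recursive s3://<your-datalake-bucket>/<project-axon-prefix>/ | tail -n 20
----
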
=== 6. Create Hive Tables via Hue

Go to **Cloudera Data Warehouse** and, under *Virtual Warehouses*, click `Hue` on the Hive Virtual Warehouse for your environment.

To create all the required databases and tables at once:

- Open the https://github.com/cloudera/cloudera-partners/blob/project-axon/Project-Axon/create_queries.txt[create_queries.txt] file from the cloned folder.
- Copy the entire content.
- Paste it into the Hue Query Editor.
- Select all and click the **Run** button.
+
image::../images/hive_queries.png[hive_queries, width=800, height=500]

This will create all the necessary Hive tables and databases for the project in one go.

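If you prefer the command line to Hue, the same file can also be run with `beeline` against the Hive Virtual Warehouse. This is a sketch, not part of the original instructions: the JDBC URL below is a placeholder that must be replaced with the URL copied from the CDW UI, and it authenticates with the workload username and password from step 4. The same approach works for `verify_tables.txt` in the next step.

[source,shell]
----
# Run the DDL file directly against the Hive Virtual Warehouse
# (JDBC URL placeholder; file path is relative to the repository root)
beeline -u "<jdbc-url-copied-from-cdw-ui>" \
        -n <cdp-workload-username> \
        -p '<cdp-workload-password>' \
        -f Project-Axon/create_queries.txt
----
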
==== 6.1. Verify Table Creation & Data Load

To verify that all tables were successfully created and contain data:

- Copy the content of the `verify_tables.txt` file, which includes a Hive query to count rows across all expected tables.
- Paste it into the *Hue Query Editor*.
- Click *Run*.

You should see a list of table names with their row counts.

image::../images/table_verify.png[tables verify]

If any table shows a count of `0`, you may need to revisit the data ingestion step for that table.

=== 7. Connect Data Visualization to Impala

To enable Data Visualization to read data from Impala, you need to create a connection in the Data Visualization UI.

While Hive is supported, it is *recommended to use Impala* for this connection, as Impala is a high-performance, distributed SQL engine optimized for fast, interactive analytics on large-scale datasets.

- Go to *Cloudera Data Warehouse*, click *Data Visualization*, and then click your environment name.
+
image::../images/cloudera_data_warehouse.png[cloudera data warehouse, width=300, height=300]
- Click `Open Data Visualization` and navigate to the *Data* tab.
+
image::../images/cdw_dataviz.png[cdw data visualization]
+
- Click *+ New Connection* → *CDW Impala*.
+
image::../images/connection.png[make connection, width=500, height=300]
+
[width="90%",cols="40%,50%",options="header"]
|===
|**Parameter** |**Value**
|*Connection Name* |Impala-Axon (or any name you prefer)
|*Connection type* |CDW Impala
|*CDW Warehouse* |Select the name of your Impala Virtual Warehouse
|*Hostname* |Auto-populated when you select the CDW Warehouse
|*Port* |28000 (for Impala)
|*Credentials* |Leave empty
|===
+
- Click *Test Connection* to verify.
+
image::../images/connection_cdw_impala.png[verify connection, width=450, height=500]
+
- Once successful, click *Save*.
- You can now use this connection to create/import datasets and build/import dashboards from Impala tables.

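If you would like an independent check that the Impala Virtual Warehouse accepts your workload credentials, a hedged sketch with `impala-shell` is shown below. It is not part of the original walkthrough: it assumes `impala-shell` is installed on your workstation (for example via `pip install impala-shell`) and that you connect to the warehouse endpoint shown in the CDW UI over HTTPS; adjust the host, port, and protocol to match your warehouse's connection details.

[source,shell]
----
# Hypothetical connectivity check against the Impala Virtual Warehouse
# (host, port, and protocol come from your warehouse's connection info in CDW)
impala-shell --ssl --protocol='hs2-http' \
  -i "<impala-vw-hostname>:443" \
  -l -u <cdp-workload-username> \
  -q "SHOW DATABASES;"
----
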
=== 8. Import Dashboard into Cloudera Data Visualization

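The import in this step happens through your browser, so the dashboard JSON file needs to be on the machine you are browsing from. If your clone of the repository lives only on the remote Linux server, the sketch below is one way to fetch the file locally; it assumes GitHub's standard raw-content URL scheme for the repository path referenced in the steps.

[source,shell]
----
# Download the dashboard definition to your local machine (raw GitHub URL assumed)
curl -L -o project_axon_dashboard.json \
  https://raw.githubusercontent.com/cloudera/cloudera-partners/project-axon/Project-Axon/project_axon_dashboard.json
----
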
- Go to *Cloudera Data Visualization*.

- Navigate to the *Data* tab, then click *Import visual artifacts*.
+
image::../images/import_visual.png[Import Visual]
+
- Upload the dashboard JSON file: https://github.com/cloudera/cloudera-partners/blob/project-axon/Project-Axon/project_axon_dashboard.json[project_axon_dashboard.json].
- After uploading, click *Accept and Import*. You will see an *Import Successful* message along with the list of datasets that were imported as part of the dashboard.
+
image::../images/dashboard_import_verify.png[import success, width=800, height=450]
+
- Once imported, navigate to the *Visuals* tab and click the dashboard to open and view it.
+
image::../images/dashboard.png[dashboard]