Graph Algorithms Assignment

Name: Pulkit Garg
UFID: 31125456

Overview

This project implements three fundamental graph algorithms:

Connected Components - Using Depth-First Search (DFS)
Cycle Detection - Using Depth-First Search (DFS)
Shortest Paths - Using Dijkstra's Algorithm

All algorithms work on undirected graphs represented using adjacency lists.

Files Description

1. graph_operations.cpp

Contains the Graph class with:

Adjacency list representation using std::map and std::vector
connected_components() - Returns list of connected components
one_cycle() - Returns a cycle if one exists
shortest_paths(source) - Returns shortest paths from source to all reachable vertices

2. graph_simulator.cpp

Implements six graph generators:

N-Cycle: Circular arrangement of n vertices
Complete Graph: Every vertex connected to every other vertex
Empty Graph: n isolated vertices (no edges)
Heap: Binary heap structure
Truncated Heap: Heap with vertices m through n-1
Equivalence Mod k: Vertices connected if difference divisible by k

3. simulated_test.cpp

Main testing program that:

Generates graphs of varying sizes
Runs all three algorithms
Measures execution time using std::chrono
Tracks peak memory usage using Windows API
Outputs results to console with automatic performance metrics

4. results.txt

Contains:

Data structure descriptions
Time and space complexity analysis
Experimental results for all graph types
Performance observations
Memory usage estimates

5. real_data_test.cpp (Bonus)

Real-world graph analysis program with three different adjacency criteria.

Note: The assignment specifies realgraph_make.xxx and run_realgraph_make.xxx as separate files. For simplicity and maintainability, these have been combined into real_data_test.cpp which includes:

Three Adjacency Criteria Implemented:

Original Edges (loadGraphFromFile())
- Direct connections from the YouTube dataset
- Adjacency: Edge (u,v) exists in original data
High-Activity Filter (loadGraphFiltered())
- Filters for well-connected users
- Adjacency: Edge (u,v) exists AND both have degree ≥ threshold
Degree Similarity (loadGraphDegreeSimilarity())
- Connects users with similar popularity
- Adjacency: Edge (u,v) exists AND degrees are similar within threshold

The program automatically tests all three adjacency criteria with all three algorithms. Handles graphs with 1M+ nodes using optimized implementations.

How to Run

Simulated Data Tests

Step 1: Compile

g++ -std=c++11 -o simulated_test.exe simulated_test.cpp -lpsapi

Step 2: Run

simulated_test.exe

This will test 6 different graph types with all 3 algorithms.

Real-World Data Tests (Bonus - All 3 Criteria)

Step 1: Compile

g++ -std=c++11 -o real_data_test.exe real_data_test.cpp -lpsapi

Step 2: Run

real_data_test.exe

This will automatically test all 3 adjacency criteria on the YouTube graph. Each criteria is tested with all 3 algorithms (Connected Components, Cycle Detection, Shortest Paths). Note: This takes several minutes to complete (~50 seconds total).

Alternative Compilation (Visual Studio)

cl /EHsc /std:c++11 simulated_test.cpp psapi.lib

Note: The program uses Windows API (psapi.lib) for automatic memory tracking.

What the Programs Do

Simulated Tests (simulated_test.exe)

Tests all 6 graph types with multiple sizes
Runs all 3 algorithms on each graph
Reports CPU time in microseconds
Reports peak memory usage in MB
Displays sample outputs for smaller graphs
Limits output for large graphs to avoid clutter

Real-World Tests (real_data_test.exe)

Loads YouTube social network graph (1.13M vertices, 2.99M edges)
Tests 3 different adjacency criteria automatically:
- Original Edges
- High-Activity Filter (degree >= 10)
- Degree Similarity (70% threshold)
Runs all 3 algorithms on each criteria
Reports CPU time and peak memory for each test
Saves complete output to real_world_results.txt

Sample Output Format

------------------------------------------
GRAPH ALGORITHMS TESTING - SIMULATED DATA
Graph Type: UNDIRECTED (due to wide test capability)
------------------------------------------

 <<<<< TEST 1: N-CYCLE GRAPHS >>>>>

------------------------------------------
Testing: N-Cycle (n=10)
------------------------------------------
Vertices: 10, Edges: 10

 <<< Connected Components (DFS) >>>
Number of components: 1
  Component 1 (size 10): [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
Time taken: 0 microseconds

 <<< Cycle Detection (DFS) >>>
Cycle found (length 10): [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 0]
Time taken: 0 microseconds

 <<< Shortest Paths (Dijkstra) >>>
Shortest paths from source 0:
  Number of reachable vertices: 10
  To vertex 0 (length 0): [0]
  To vertex 1 (length 1): [0, 1]
  ...
Time taken: 0 microseconds

Peak Memory Usage: 6.60 MB (+20 KB from baseline)

Algorithm Complexity

Algorithm	Implementation	Time Complexity	Space Complexity
Connected Components	Iterative DFS	O(V + E)	O(V)
Cycle Detection	Iterative DFS	O(V + E)	O(V)
Shortest Paths	Dijkstra with Priority Queue	O((V+E) log V)	O(V²)

Where:

V = number of vertices
E = number of edges

Implementation Notes:

Iterative DFS using explicit stack (avoids stack overflow on large graphs)
Dijkstra's algorithm with priority queue for optimal performance
All algorithms work efficiently on graphs from 10 nodes to 1+ million nodes

Graph Types Tested

1. N-Cycle (n = 10, 100, 1000, 5000)

Properties: 1 component, 1 cycle of length n
Expected: Short cycle, moderate paths

2. Complete Graph (n = 10, 50, 100, 200)

Properties: 1 component, many cycles, all paths length 1
Expected: Dense connectivity, many cycles

3. Empty Graph (n = 10, 100, 1000, 10000)

Properties: n components, no cycles, no paths
Expected: Fast execution, no connections

4. Heap (n = 15, 127, 511, 2047)

Properties: 1 component, no cycles (tree)
Expected: Short paths, acyclic structure

5. Truncated Heap (various m, n)

Properties: Multiple components, no cycles
Expected: Disconnected trees

6. Equivalence Mod k (various n, k)

Properties: k components, each a complete graph
Expected: k separate complete subgraphs

Performance Notes

Sparse Graphs (E ≈ V): All algorithms are very fast, memory ~8 MB
Dense Graphs (E ≈ V²): Priority queue Dijkstra handles efficiently
Large Graphs: Memory scales with V + E for graph storage, up to 160 MB for n=5000
Real-World Graphs: Successfully handles 1M+ nodes with iterative DFS and priority queue Dijkstra

Performance Metrics:

Automatic memory tracking via Windows API (GetProcessMemoryInfo)
CPU time measured in microseconds using std::chrono
Peak memory reported after each test completes
Baseline memory: ~6.60 MB (program overhead)
Largest test (N-Cycle n=5000): 160.46 MB peak, ~6.7 seconds

Testing Verification

Each algorithm was tested against its expected behavior:

Connected Components:

N-cycle, complete, heap - 1 component
Empty graph - n components
Equivalence mod k - k components

Cycle Detection:

N-cycle - finds cycle
Complete graph - finds cycle
Empty graph, heap, truncated heap - no cycles

Shortest Paths:

Correctly finds shortest paths in all graph types
Handles disconnected vertices appropriately
Path reconstruction works correctly

Limitations

Path storage in shortest_paths requires O(V²) space in worst case
Very large complete graphs (V > 1000) may take significant time due to density

Extensions (Bonus)

Real-world graph analysis on large-scale networks:

real_data_test.cpp - Test program for large graphs with 3 adjacency criteria
YouTube social network dataset: 1,134,890 nodes and 2,987,624 edges
All algorithms successfully handle 1M+ node graphs
Tests 3 different adjacency criteria as required by bonus specification

Data Structures Used

Graph Storage: std::map<int, std::vector<int>> (adjacency list)
Visited Tracking: std::set<int>
Components: std::list<std::list<int>>
Paths: std::map<int, std::list<int>>
DFS Stack: std::stack<int> (for iterative DFS)
Dijkstra Priority Queue: std::priority_queue<pair<int, int>> (min-heap)

Memory Estimates

Small graphs (V < 100): < 1 MB
Medium graphs (V ≈ 1000): < 10 MB
Large sparse graphs (V ≈ 10000, E ≈ 10000): ~100 MB
Dense graphs (V ≈ 500, E ≈ 125000): ~500 MB

Bonus: Real-World Graph Analysis

Successfully tested on YouTube Social Network with 1.1 million+ nodes using 3 different adjacency criteria:

Dataset Information:

Source: SNAP (Stanford Network Analysis Project)
File: com-youtube.ungraph.txt
Original Graph: 1,134,890 vertices and 2,987,624 edges

Three Adjacency Criteria Tested:

Original Edges (Direct Connections)
- Function: loadGraphFromFile()
- Definition: u and v are adjacent if edge (u,v) exists in dataset
- Result: 1.13M vertices, 2.99M edges, single component
- Performance: 26.54 sec, 350.86 MB peak memory
High-Activity Filter (degree >= 10)
- Function: loadGraphFiltered(minDegree)
- Definition: Edge exists AND both vertices have degree >= 10
- Result: 90K vertices, 1.24M edges, 66 components
- Performance: 3.85 sec, 186.82 MB peak memory
- Network diameter: 7 hops (tighter than original)
Degree Similarity (70% threshold)
- Function: loadGraphDegreeSimilarity(threshold)
- Definition: Edge exists AND vertices have similar degrees
- Result: 196K vertices, 244K edges, 49,307 components
- Performance: 2.47 sec, 247.69 MB peak memory
- Shows homophily: similar users form isolated communities

Implementation:

Uses iterative DFS to handle large graphs without stack overflow
Implements Dijkstra's algorithm with priority queue for O((V+E) log V) complexity
Test program automatically runs all 3 criteria: real_data_test.cpp

Compilation:

g++ -std=c++11 -o real_data_test.exe real_data_test.cpp -lpsapi

Key Features:

Iterative DFS prevents stack overflow on graphs with millions of nodes
Priority queue Dijkstra enables practical computation on large graphs
All 3 adjacency criteria tested automatically in a single run
Reveals different network properties based on adjacency definition

System Specifications

CPU: AMD Ryzen 5 4600H with Radeon Graphics
Base Speed: 3.00 GHz
Cores: 6
Logical Processors: 12
RAM: 16 GB
OS: Windows 10

Name		Name	Last commit message	Last commit date
Latest commit History 10 Commits
README.md		README.md
com-youtube.ungraph.txt		com-youtube.ungraph.txt
graph_operations.cpp		graph_operations.cpp
graph_simulator.cpp		graph_simulator.cpp
real_data_test.cpp		real_data_test.cpp
real_data_test.exe		real_data_test.exe
real_world_results.txt		real_world_results.txt
results.md		results.md
results.pdf		results.pdf
results.txt		results.txt
simulated_test.cpp		simulated_test.cpp
simulated_test.exe		simulated_test.exe

Folders and files

Latest commit

History

Repository files navigation

Graph Algorithms Assignment

Overview

Files Description

1. graph_operations.cpp

2. graph_simulator.cpp

3. simulated_test.cpp

4. results.txt

5. real_data_test.cpp (Bonus)

How to Run

Simulated Data Tests

Real-World Data Tests (Bonus - All 3 Criteria)

Alternative Compilation (Visual Studio)

What the Programs Do

Simulated Tests (simulated_test.exe)

Real-World Tests (real_data_test.exe)

Sample Output Format

Algorithm Complexity

Graph Types Tested

1. N-Cycle (n = 10, 100, 1000, 5000)

2. Complete Graph (n = 10, 50, 100, 200)

3. Empty Graph (n = 10, 100, 1000, 10000)

4. Heap (n = 15, 127, 511, 2047)

5. Truncated Heap (various m, n)

6. Equivalence Mod k (various n, k)

Performance Notes

Testing Verification

Limitations

Extensions (Bonus)

Data Structures Used

Memory Estimates

Bonus: Real-World Graph Analysis

System Specifications

About

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages