- Name: Pulkit Garg
- UFID: 31125456
This project implements three fundamental graph algorithms:
- Connected Components - Using Depth-First Search (DFS)
- Cycle Detection - Using Depth-First Search (DFS)
- Shortest Paths - Using Dijkstra's Algorithm
All algorithms work on undirected graphs represented using adjacency lists.
Contains the Graph class with:
- Adjacency list representation using
std::mapandstd::vector connected_components()- Returns list of connected componentsone_cycle()- Returns a cycle if one existsshortest_paths(source)- Returns shortest paths from source to all reachable vertices
Implements six graph generators:
- N-Cycle: Circular arrangement of n vertices
- Complete Graph: Every vertex connected to every other vertex
- Empty Graph: n isolated vertices (no edges)
- Heap: Binary heap structure
- Truncated Heap: Heap with vertices m through n-1
- Equivalence Mod k: Vertices connected if difference divisible by k
Main testing program that:
- Generates graphs of varying sizes
- Runs all three algorithms
- Measures execution time using
std::chrono - Tracks peak memory usage using Windows API
- Outputs results to console with automatic performance metrics
Contains:
- Data structure descriptions
- Time and space complexity analysis
- Experimental results for all graph types
- Performance observations
- Memory usage estimates
Real-world graph analysis program with three different adjacency criteria.
Note: The assignment specifies realgraph_make.xxx and run_realgraph_make.xxx as separate files.
For simplicity and maintainability, these have been combined into real_data_test.cpp which includes:
Three Adjacency Criteria Implemented:
-
Original Edges (
loadGraphFromFile())- Direct connections from the YouTube dataset
- Adjacency: Edge (u,v) exists in original data
-
High-Activity Filter (
loadGraphFiltered())- Filters for well-connected users
- Adjacency: Edge (u,v) exists AND both have degree ≥ threshold
-
Degree Similarity (
loadGraphDegreeSimilarity())- Connects users with similar popularity
- Adjacency: Edge (u,v) exists AND degrees are similar within threshold
The program automatically tests all three adjacency criteria with all three algorithms. Handles graphs with 1M+ nodes using optimized implementations.
Step 1: Compile
g++ -std=c++11 -o simulated_test.exe simulated_test.cpp -lpsapiStep 2: Run
simulated_test.exeThis will test 6 different graph types with all 3 algorithms.
Step 1: Compile
g++ -std=c++11 -o real_data_test.exe real_data_test.cpp -lpsapiStep 2: Run
real_data_test.exeThis will automatically test all 3 adjacency criteria on the YouTube graph. Each criteria is tested with all 3 algorithms (Connected Components, Cycle Detection, Shortest Paths). Note: This takes several minutes to complete (~50 seconds total).
cl /EHsc /std:c++11 simulated_test.cpp psapi.libNote: The program uses Windows API (psapi.lib) for automatic memory tracking.
- Tests all 6 graph types with multiple sizes
- Runs all 3 algorithms on each graph
- Reports CPU time in microseconds
- Reports peak memory usage in MB
- Displays sample outputs for smaller graphs
- Limits output for large graphs to avoid clutter
- Loads YouTube social network graph (1.13M vertices, 2.99M edges)
- Tests 3 different adjacency criteria automatically:
- Original Edges
- High-Activity Filter (degree >= 10)
- Degree Similarity (70% threshold)
- Runs all 3 algorithms on each criteria
- Reports CPU time and peak memory for each test
- Saves complete output to real_world_results.txt
------------------------------------------
GRAPH ALGORITHMS TESTING - SIMULATED DATA
Graph Type: UNDIRECTED (due to wide test capability)
------------------------------------------
<<<<< TEST 1: N-CYCLE GRAPHS >>>>>
------------------------------------------
Testing: N-Cycle (n=10)
------------------------------------------
Vertices: 10, Edges: 10
<<< Connected Components (DFS) >>>
Number of components: 1
Component 1 (size 10): [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
Time taken: 0 microseconds
<<< Cycle Detection (DFS) >>>
Cycle found (length 10): [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 0]
Time taken: 0 microseconds
<<< Shortest Paths (Dijkstra) >>>
Shortest paths from source 0:
Number of reachable vertices: 10
To vertex 0 (length 0): [0]
To vertex 1 (length 1): [0, 1]
...
Time taken: 0 microseconds
Peak Memory Usage: 6.60 MB (+20 KB from baseline)
| Algorithm | Implementation | Time Complexity | Space Complexity |
|---|---|---|---|
| Connected Components | Iterative DFS | O(V + E) | O(V) |
| Cycle Detection | Iterative DFS | O(V + E) | O(V) |
| Shortest Paths | Dijkstra with Priority Queue | O((V+E) log V) | O(V²) |
Where:
- V = number of vertices
- E = number of edges
Implementation Notes:
- Iterative DFS using explicit stack (avoids stack overflow on large graphs)
- Dijkstra's algorithm with priority queue for optimal performance
- All algorithms work efficiently on graphs from 10 nodes to 1+ million nodes
- Properties: 1 component, 1 cycle of length n
- Expected: Short cycle, moderate paths
- Properties: 1 component, many cycles, all paths length 1
- Expected: Dense connectivity, many cycles
- Properties: n components, no cycles, no paths
- Expected: Fast execution, no connections
- Properties: 1 component, no cycles (tree)
- Expected: Short paths, acyclic structure
- Properties: Multiple components, no cycles
- Expected: Disconnected trees
- Properties: k components, each a complete graph
- Expected: k separate complete subgraphs
- Sparse Graphs (E ≈ V): All algorithms are very fast, memory ~8 MB
- Dense Graphs (E ≈ V²): Priority queue Dijkstra handles efficiently
- Large Graphs: Memory scales with V + E for graph storage, up to 160 MB for n=5000
- Real-World Graphs: Successfully handles 1M+ nodes with iterative DFS and priority queue Dijkstra
Performance Metrics:
- Automatic memory tracking via Windows API (GetProcessMemoryInfo)
- CPU time measured in microseconds using std::chrono
- Peak memory reported after each test completes
- Baseline memory: ~6.60 MB (program overhead)
- Largest test (N-Cycle n=5000): 160.46 MB peak, ~6.7 seconds
Each algorithm was tested against its expected behavior:
Connected Components:
- N-cycle, complete, heap - 1 component
- Empty graph - n components
- Equivalence mod k - k components
Cycle Detection:
- N-cycle - finds cycle
- Complete graph - finds cycle
- Empty graph, heap, truncated heap - no cycles
Shortest Paths:
- Correctly finds shortest paths in all graph types
- Handles disconnected vertices appropriately
- Path reconstruction works correctly
- Path storage in shortest_paths requires O(V²) space in worst case
- Very large complete graphs (V > 1000) may take significant time due to density
Real-world graph analysis on large-scale networks:
real_data_test.cpp- Test program for large graphs with 3 adjacency criteria- YouTube social network dataset: 1,134,890 nodes and 2,987,624 edges
- All algorithms successfully handle 1M+ node graphs
- Tests 3 different adjacency criteria as required by bonus specification
- Graph Storage:
std::map<int, std::vector<int>>(adjacency list) - Visited Tracking:
std::set<int> - Components:
std::list<std::list<int>> - Paths:
std::map<int, std::list<int>> - DFS Stack:
std::stack<int>(for iterative DFS) - Dijkstra Priority Queue:
std::priority_queue<pair<int, int>>(min-heap)
- Small graphs (V < 100): < 1 MB
- Medium graphs (V ≈ 1000): < 10 MB
- Large sparse graphs (V ≈ 10000, E ≈ 10000): ~100 MB
- Dense graphs (V ≈ 500, E ≈ 125000): ~500 MB
Successfully tested on YouTube Social Network with 1.1 million+ nodes using 3 different adjacency criteria:
Dataset Information:
- Source: SNAP (Stanford Network Analysis Project)
- File: com-youtube.ungraph.txt
- Original Graph: 1,134,890 vertices and 2,987,624 edges
Three Adjacency Criteria Tested:
-
Original Edges (Direct Connections)
- Function: loadGraphFromFile()
- Definition: u and v are adjacent if edge (u,v) exists in dataset
- Result: 1.13M vertices, 2.99M edges, single component
- Performance: 26.54 sec, 350.86 MB peak memory
-
High-Activity Filter (degree >= 10)
- Function: loadGraphFiltered(minDegree)
- Definition: Edge exists AND both vertices have degree >= 10
- Result: 90K vertices, 1.24M edges, 66 components
- Performance: 3.85 sec, 186.82 MB peak memory
- Network diameter: 7 hops (tighter than original)
-
Degree Similarity (70% threshold)
- Function: loadGraphDegreeSimilarity(threshold)
- Definition: Edge exists AND vertices have similar degrees
- Result: 196K vertices, 244K edges, 49,307 components
- Performance: 2.47 sec, 247.69 MB peak memory
- Shows homophily: similar users form isolated communities
Implementation:
- Uses iterative DFS to handle large graphs without stack overflow
- Implements Dijkstra's algorithm with priority queue for O((V+E) log V) complexity
- Test program automatically runs all 3 criteria: real_data_test.cpp
Compilation:
g++ -std=c++11 -o real_data_test.exe real_data_test.cpp -lpsapiKey Features:
- Iterative DFS prevents stack overflow on graphs with millions of nodes
- Priority queue Dijkstra enables practical computation on large graphs
- All 3 adjacency criteria tested automatically in a single run
- Reveals different network properties based on adjacency definition
- CPU: AMD Ryzen 5 4600H with Radeon Graphics
- Base Speed: 3.00 GHz
- Cores: 6
- Logical Processors: 12
- RAM: 16 GB
- OS: Windows 10