Skip to content

PulkitGarg777/AoaAssignment1

Repository files navigation

Graph Algorithms Assignment

  • Name: Pulkit Garg
  • UFID: 31125456

Overview

This project implements three fundamental graph algorithms:

  1. Connected Components - Using Depth-First Search (DFS)
  2. Cycle Detection - Using Depth-First Search (DFS)
  3. Shortest Paths - Using Dijkstra's Algorithm

All algorithms work on undirected graphs represented using adjacency lists.

Files Description

1. graph_operations.cpp

Contains the Graph class with:

  • Adjacency list representation using std::map and std::vector
  • connected_components() - Returns list of connected components
  • one_cycle() - Returns a cycle if one exists
  • shortest_paths(source) - Returns shortest paths from source to all reachable vertices

2. graph_simulator.cpp

Implements six graph generators:

  1. N-Cycle: Circular arrangement of n vertices
  2. Complete Graph: Every vertex connected to every other vertex
  3. Empty Graph: n isolated vertices (no edges)
  4. Heap: Binary heap structure
  5. Truncated Heap: Heap with vertices m through n-1
  6. Equivalence Mod k: Vertices connected if difference divisible by k

3. simulated_test.cpp

Main testing program that:

  • Generates graphs of varying sizes
  • Runs all three algorithms
  • Measures execution time using std::chrono
  • Tracks peak memory usage using Windows API
  • Outputs results to console with automatic performance metrics

4. results.txt

Contains:

  • Data structure descriptions
  • Time and space complexity analysis
  • Experimental results for all graph types
  • Performance observations
  • Memory usage estimates

5. real_data_test.cpp (Bonus)

Real-world graph analysis program with three different adjacency criteria.

Note: The assignment specifies realgraph_make.xxx and run_realgraph_make.xxx as separate files. For simplicity and maintainability, these have been combined into real_data_test.cpp which includes:

Three Adjacency Criteria Implemented:

  1. Original Edges (loadGraphFromFile())

    • Direct connections from the YouTube dataset
    • Adjacency: Edge (u,v) exists in original data
  2. High-Activity Filter (loadGraphFiltered())

    • Filters for well-connected users
    • Adjacency: Edge (u,v) exists AND both have degree ≥ threshold
  3. Degree Similarity (loadGraphDegreeSimilarity())

    • Connects users with similar popularity
    • Adjacency: Edge (u,v) exists AND degrees are similar within threshold

The program automatically tests all three adjacency criteria with all three algorithms. Handles graphs with 1M+ nodes using optimized implementations.

How to Run

Simulated Data Tests

Step 1: Compile

g++ -std=c++11 -o simulated_test.exe simulated_test.cpp -lpsapi

Step 2: Run

simulated_test.exe

This will test 6 different graph types with all 3 algorithms.

Real-World Data Tests (Bonus - All 3 Criteria)

Step 1: Compile

g++ -std=c++11 -o real_data_test.exe real_data_test.cpp -lpsapi

Step 2: Run

real_data_test.exe

This will automatically test all 3 adjacency criteria on the YouTube graph. Each criteria is tested with all 3 algorithms (Connected Components, Cycle Detection, Shortest Paths). Note: This takes several minutes to complete (~50 seconds total).

Alternative Compilation (Visual Studio)

cl /EHsc /std:c++11 simulated_test.cpp psapi.lib

Note: The program uses Windows API (psapi.lib) for automatic memory tracking.

What the Programs Do

Simulated Tests (simulated_test.exe)

  1. Tests all 6 graph types with multiple sizes
  2. Runs all 3 algorithms on each graph
  3. Reports CPU time in microseconds
  4. Reports peak memory usage in MB
  5. Displays sample outputs for smaller graphs
  6. Limits output for large graphs to avoid clutter

Real-World Tests (real_data_test.exe)

  1. Loads YouTube social network graph (1.13M vertices, 2.99M edges)
  2. Tests 3 different adjacency criteria automatically:
    • Original Edges
    • High-Activity Filter (degree >= 10)
    • Degree Similarity (70% threshold)
  3. Runs all 3 algorithms on each criteria
  4. Reports CPU time and peak memory for each test
  5. Saves complete output to real_world_results.txt

Sample Output Format

------------------------------------------
GRAPH ALGORITHMS TESTING - SIMULATED DATA
Graph Type: UNDIRECTED (due to wide test capability)
------------------------------------------

 <<<<< TEST 1: N-CYCLE GRAPHS >>>>>

------------------------------------------
Testing: N-Cycle (n=10)
------------------------------------------
Vertices: 10, Edges: 10

 <<< Connected Components (DFS) >>>
Number of components: 1
  Component 1 (size 10): [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
Time taken: 0 microseconds

 <<< Cycle Detection (DFS) >>>
Cycle found (length 10): [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 0]
Time taken: 0 microseconds

 <<< Shortest Paths (Dijkstra) >>>
Shortest paths from source 0:
  Number of reachable vertices: 10
  To vertex 0 (length 0): [0]
  To vertex 1 (length 1): [0, 1]
  ...
Time taken: 0 microseconds

Peak Memory Usage: 6.60 MB (+20 KB from baseline)

Algorithm Complexity

Algorithm Implementation Time Complexity Space Complexity
Connected Components Iterative DFS O(V + E) O(V)
Cycle Detection Iterative DFS O(V + E) O(V)
Shortest Paths Dijkstra with Priority Queue O((V+E) log V) O(V²)

Where:

  • V = number of vertices
  • E = number of edges

Implementation Notes:

  • Iterative DFS using explicit stack (avoids stack overflow on large graphs)
  • Dijkstra's algorithm with priority queue for optimal performance
  • All algorithms work efficiently on graphs from 10 nodes to 1+ million nodes

Graph Types Tested

1. N-Cycle (n = 10, 100, 1000, 5000)

  • Properties: 1 component, 1 cycle of length n
  • Expected: Short cycle, moderate paths

2. Complete Graph (n = 10, 50, 100, 200)

  • Properties: 1 component, many cycles, all paths length 1
  • Expected: Dense connectivity, many cycles

3. Empty Graph (n = 10, 100, 1000, 10000)

  • Properties: n components, no cycles, no paths
  • Expected: Fast execution, no connections

4. Heap (n = 15, 127, 511, 2047)

  • Properties: 1 component, no cycles (tree)
  • Expected: Short paths, acyclic structure

5. Truncated Heap (various m, n)

  • Properties: Multiple components, no cycles
  • Expected: Disconnected trees

6. Equivalence Mod k (various n, k)

  • Properties: k components, each a complete graph
  • Expected: k separate complete subgraphs

Performance Notes

  1. Sparse Graphs (E ≈ V): All algorithms are very fast, memory ~8 MB
  2. Dense Graphs (E ≈ V²): Priority queue Dijkstra handles efficiently
  3. Large Graphs: Memory scales with V + E for graph storage, up to 160 MB for n=5000
  4. Real-World Graphs: Successfully handles 1M+ nodes with iterative DFS and priority queue Dijkstra

Performance Metrics:

  • Automatic memory tracking via Windows API (GetProcessMemoryInfo)
  • CPU time measured in microseconds using std::chrono
  • Peak memory reported after each test completes
  • Baseline memory: ~6.60 MB (program overhead)
  • Largest test (N-Cycle n=5000): 160.46 MB peak, ~6.7 seconds

Testing Verification

Each algorithm was tested against its expected behavior:

Connected Components:

  • N-cycle, complete, heap - 1 component
  • Empty graph - n components
  • Equivalence mod k - k components

Cycle Detection:

  • N-cycle - finds cycle
  • Complete graph - finds cycle
  • Empty graph, heap, truncated heap - no cycles

Shortest Paths:

  • Correctly finds shortest paths in all graph types
  • Handles disconnected vertices appropriately
  • Path reconstruction works correctly

Limitations

  1. Path storage in shortest_paths requires O(V²) space in worst case
  2. Very large complete graphs (V > 1000) may take significant time due to density

Extensions (Bonus)

Real-world graph analysis on large-scale networks:

  • real_data_test.cpp - Test program for large graphs with 3 adjacency criteria
  • YouTube social network dataset: 1,134,890 nodes and 2,987,624 edges
  • All algorithms successfully handle 1M+ node graphs
  • Tests 3 different adjacency criteria as required by bonus specification

Data Structures Used

  1. Graph Storage: std::map<int, std::vector<int>> (adjacency list)
  2. Visited Tracking: std::set<int>
  3. Components: std::list<std::list<int>>
  4. Paths: std::map<int, std::list<int>>
  5. DFS Stack: std::stack<int> (for iterative DFS)
  6. Dijkstra Priority Queue: std::priority_queue<pair<int, int>> (min-heap)

Memory Estimates

  • Small graphs (V < 100): < 1 MB
  • Medium graphs (V ≈ 1000): < 10 MB
  • Large sparse graphs (V ≈ 10000, E ≈ 10000): ~100 MB
  • Dense graphs (V ≈ 500, E ≈ 125000): ~500 MB

Bonus: Real-World Graph Analysis

Successfully tested on YouTube Social Network with 1.1 million+ nodes using 3 different adjacency criteria:

Dataset Information:

  • Source: SNAP (Stanford Network Analysis Project)
  • File: com-youtube.ungraph.txt
  • Original Graph: 1,134,890 vertices and 2,987,624 edges

Three Adjacency Criteria Tested:

  1. Original Edges (Direct Connections)

    • Function: loadGraphFromFile()
    • Definition: u and v are adjacent if edge (u,v) exists in dataset
    • Result: 1.13M vertices, 2.99M edges, single component
    • Performance: 26.54 sec, 350.86 MB peak memory
  2. High-Activity Filter (degree >= 10)

    • Function: loadGraphFiltered(minDegree)
    • Definition: Edge exists AND both vertices have degree >= 10
    • Result: 90K vertices, 1.24M edges, 66 components
    • Performance: 3.85 sec, 186.82 MB peak memory
    • Network diameter: 7 hops (tighter than original)
  3. Degree Similarity (70% threshold)

    • Function: loadGraphDegreeSimilarity(threshold)
    • Definition: Edge exists AND vertices have similar degrees
    • Result: 196K vertices, 244K edges, 49,307 components
    • Performance: 2.47 sec, 247.69 MB peak memory
    • Shows homophily: similar users form isolated communities

Implementation:

  • Uses iterative DFS to handle large graphs without stack overflow
  • Implements Dijkstra's algorithm with priority queue for O((V+E) log V) complexity
  • Test program automatically runs all 3 criteria: real_data_test.cpp

Compilation:

g++ -std=c++11 -o real_data_test.exe real_data_test.cpp -lpsapi

Key Features:

  1. Iterative DFS prevents stack overflow on graphs with millions of nodes
  2. Priority queue Dijkstra enables practical computation on large graphs
  3. All 3 adjacency criteria tested automatically in a single run
  4. Reveals different network properties based on adjacency definition

System Specifications

  • CPU: AMD Ryzen 5 4600H with Radeon Graphics
  • Base Speed: 3.00 GHz
  • Cores: 6
  • Logical Processors: 12
  • RAM: 16 GB
  • OS: Windows 10

About

No description, website, or topics provided.

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors

Languages