
Henry Ndubuaku

LinkedIn Twitter Email Spotify

Studied Electronics & Computing, with a master's in Maths & AI; worked as an ML Software Engineer, then an AI Research Engineer; now building an open-source framework for edge ML (Cactus), with funding from Y Combinator, the Oxford Seed Fund, and Google for Startups. See my pinned repos for code samples.

Core Expertise

Maths Computing Deep Learning Reinforcement Learning Multimodal AI Distributed ML Hardware-Aware AI Realtime Edge AI

Engineering Expertise

Python C++ Go Rust PyTorch JAX TensorFlow CUDA Metal Docker Kubernetes gRPC Cloud

Career Progression

  • 2025-Present: Cactus Compute (YC S25) - Founder
  • 2024-25: Deep Render - AI Research Engineer (Hardware-Aware models for realtime video codec)
  • 2021-24: Wisdm - ML Software Engineer (Perception AI for Maxar Defence satellite imagery)
  • 2019-21: Open-source activities during MSc (NanoDl, SuperLazyLLM, CUDARepo, etc.)
  • 2018-19: Google Africa Developer Scholarship Programme with Andela (pre-MSc)
  • 2014-18: Uni coursework in computing, electronics, data structures, algorithms, maths, physics.

Fun Highlights

  • My research with previous employers was all proprietary, but you'd like this and this.
  • I wrote this ML handbook, with executable code for maths, ML, and computing, ideal for diving into the depths of ML foundations.
  • Kevin Murphy (DeepMind Principal Scientist), David Holz (Midjourney Founder), and Steve Messina (IBM CTO) followed back on X.
  • After CUDARepo, Nvidia reached out; I did 7 technical rounds, got a verbal offer, went back and forth over YOE/pay, then got into YC.
  • Did my MSc at QMUL just to work with Prof Matt Purver (ex-Stanford researcher on CALO), and did my project/thesis with his team.
  • Did my BEng under Prof Onyema Uzoamaka (rumoured to be the first Nigerian CS grad from MIT); he taught computer architectures off the top of his head!
  • I contribute to the JAX ecosystem, and am a Google Developer Expert in AI and JAX.
  • Received the UK Global Talent visa within 24hrs of application (no priority appeal or anything).
  • I co-host this monthly dinner for AI researchers, engineers and founders in London.
  • I gave this lecture to a small ML group in Nigeria, on optimising large-scale ML in JAX.

Life Principles

  • When the talented fail to work hard, the hardworking beat the talented.
  • Everything should be an adventure, not a race; everyone gets their moment someday.
  • Make the best of your situation, complaining and pointing fingers do nothing.
  • It often takes 120% effort, focus, and passion; failure often results from giving less.

Future PhD Interests

  • Building realtime ML models directly into FPGAs for one of the following:
    • Multimodal EEG-to-Instruction AI for Brain-Machine Interfaces
    • Multimodal World Models for phones, drones, VR headsets, medical devices etc.

Working Proficiency + Gemini Q/A

JS/TS React Dart Flutter Swift Kotlin

Pinned Repositories

  1. cactus-compute/cactus (C++): Framework for AI on mobile devices and wearables; hardware-aware C/C++ backend, with wrappers for Kotlin, Java, Swift, React, and Flutter.

  2. nanodl (Python): A JAX-based library for building transformers; includes implementations of GPT, Gemma, LLaMA, Mixtral, Whisper, Swin, ViT, and more.

  3. cuda-tutorials (CUDA): CUDA tutorials for maths & ML with examples, covering multi-GPU, fused attention, Winograd convolution, and reinforcement learning.

  4. super-lazy-autograd (Python): Hand-derived, memory-efficient, super-lazy PyTorch VJPs for training LLMs on a laptop, all using one op (bundled scaled matmuls).

  5. pete (Python): Parameter-efficient transformer embeddings that replace learned embeddings with hardware-aware polynomial expansions of token IDs.

  6. tango (Go): Decentralised ML engine to which tiny edge devices like smartwatches, phones, VR headsets, and game consoles can contribute.
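The pete idea above — embedding tokens via polynomial expansions of their IDs instead of a learned lookup table — can be sketched in a few lines. This is a minimal illustration only: the normalisation, monomial basis, and random projection here are my assumptions, not the repo's actual implementation.

```python
import numpy as np

def poly_embed(token_ids, vocab_size, degree, d_model, rng):
    """Embed token IDs via polynomial expansion instead of a lookup table.

    Each ID is normalised to [-1, 1], expanded into powers 0..degree,
    then projected to d_model. Only the (degree + 1, d_model) projection
    would be a trainable parameter, versus a (vocab_size, d_model)
    learned embedding table.
    """
    # Normalise IDs to [-1, 1] so higher powers stay bounded.
    x = 2.0 * np.asarray(token_ids, dtype=np.float64) / (vocab_size - 1) - 1.0
    # Polynomial feature matrix: shape (len(token_ids), degree + 1).
    feats = np.stack([x**k for k in range(degree + 1)], axis=-1)
    # Small projection to model width (randomly initialised here).
    proj = rng.standard_normal((degree + 1, d_model)) / np.sqrt(degree + 1)
    return feats @ proj

rng = np.random.default_rng(0)
emb = poly_embed([3, 17, 4095], vocab_size=4096, degree=8, d_model=16, rng=rng)
print(emb.shape)  # (3, 16)
```

The parameter count is independent of vocabulary size, which is the efficiency argument: for vocab 4096, degree 8, width 16, this stores 9 × 16 numbers instead of 4096 × 16.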