🧠 colab-llm

Run local LLM models on Google Colab and access them remotely via API — ideal for lightweight, cost-effective development and testing using Ollama and Cloudflare Tunnel.

✅ Access your Colab-hosted LLM API from anywhere — even inside VS Code using the ROO Code extension!


🧩 Features

  • 🔥 Run advanced LLMs (like Qwen, LLaMA3, Mistral, DeepSeek) in Colab using Ollama
  • 🌐 Expose the model via secure public URL using cloudflared
  • 🧑‍💻 Integrate with ROO Code in VS Code for seamless coding assistance
  • ✅ Automatically waits for Ollama to be ready before starting the tunnel
  • 💡 Simple, professional, and reusable setup

🛠️ Requirements

  • A Google Colab account
  • A GPU runtime (preferably T4 High-RAM or better)
  • No installation or cloud account needed for Cloudflare tunneling

📝 How It Works

  1. Installs and launches Ollama in the background
  2. Pulls the selected model (e.g., maryasov/qwen2.5-coder-cline:7b-instruct-q8_0)
  3. Waits until Ollama is running and responsive
  4. Starts a Cloudflare tunnel to expose http://localhost:11434
  5. Prints a public .trycloudflare.com URL — ready to use (a rough sketch of these cells is shown below)
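
For orientation, here is a minimal Python sketch of what those cells do. It is not the notebook's exact code: the install script, the cloudflared download URL, and the readiness loop are assumptions based on the standard Ollama and Cloudflare quick-tunnel workflows.

```python
# Minimal sketch of the notebook's flow (assumptions, not the exact cells):
# official Ollama install script, cloudflared "quick tunnel" (no account needed).
import subprocess
import time
import requests

MODEL = "maryasov/qwen2.5-coder-cline:7b-instruct-q8_0"

# 1. Install and launch Ollama in the background
subprocess.run("curl -fsSL https://ollama.com/install.sh | sh", shell=True, check=True)
ollama = subprocess.Popen(["ollama", "serve"])

# 2. Wait until the local API answers on port 11434 ("Ollama is running")
for _ in range(60):
    try:
        if requests.get("http://localhost:11434", timeout=2).ok:
            break
    except requests.ConnectionError:
        pass
    time.sleep(2)

# 3. Pull the selected model (the slow step)
subprocess.run(["ollama", "pull", MODEL], check=True)

# 4. Expose the API through a Cloudflare quick tunnel; the public
#    *.trycloudflare.com URL shows up in cloudflared's log output
subprocess.run(
    "curl -L -o cloudflared "
    "https://github.com/cloudflare/cloudflared/releases/latest/download/cloudflared-linux-amd64 "
    "&& chmod +x cloudflared",
    shell=True, check=True)
tunnel = subprocess.Popen(["./cloudflared", "tunnel", "--url", "http://localhost:11434"])
```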

▶️ Usage Instructions

Follow these steps to get your local LLM running in Colab and accessible via a public API:

  1. Import the .ipynb notebook into your Google Colab

  2. Set the runtime to a T4 GPU

    • Go to Runtime > Change runtime type → select:
      • Hardware accelerator: GPU
      • GPU type: T4
    • Note: Colab GPU sessions last up to ~3 hours before disconnecting; after that, simply restart the runtime and rerun the notebook.
  3. Run all cells

    • Click Runtime > Run all
    • Wait for the cells to complete. Model download can take a few minutes.
  4. Verify the API is working (Step 7 of the notebook)

    • You'll see a generated public trycloudflare.com URL
    • The cell will also run a test curl request
  5. Click the public link

    • You should see the message: “Ollama is running”
    • This confirms the API is live and ready to be used from tools like curl or ROO Code in VS Code (see the example request below)
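
If you prefer to test from Python instead of curl, a request against the public URL looks roughly like this. The https://bold-sky-1234.trycloudflare.com URL is a placeholder for whatever your notebook printed, and the model name should match the one you pulled.

```python
# Quick check of the public endpoint from any machine.
import requests

BASE_URL = "https://bold-sky-1234.trycloudflare.com"  # no trailing slash

# The root path should answer with "Ollama is running"
print(requests.get(BASE_URL, timeout=10).text)

# A one-off generation request against the pulled model
resp = requests.post(
    f"{BASE_URL}/api/generate",
    json={
        "model": "maryasov/qwen2.5-coder-cline:7b-instruct-q8_0",
        "prompt": "Write a Python one-liner that reverses a string.",
        "stream": False,
    },
    timeout=300,
)
print(resp.json()["response"])
```

Since the tunnel simply forwards the standard Ollama API, any client that speaks Ollama (curl, ROO Code, the official SDKs) can use the same base URL.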

💡 Use with ROO Code (VS Code Extension)

  1. Install ROO Code extension
  2. Open extension settings
  3. Set the API Provider to Ollama
  4. Paste the public URL from Colab (e.g. https://bold-sky-1234.trycloudflare.com), without a trailing /
  5. Choose your model (the snippet below lists the models available on your Colab instance)
  6. Done! You can now prompt your Colab-hosted model from your local VS Code 💬
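
If you are unsure which model name to select, you can list what your Colab instance is serving via Ollama's /api/tags endpoint (the URL below is again a placeholder):

```python
# List the models available behind the tunnel, so you know the exact
# name to pick in ROO Code.
import requests

BASE_URL = "https://bold-sky-1234.trycloudflare.com"

for model in requests.get(f"{BASE_URL}/api/tags", timeout=10).json().get("models", []):
    print(model["name"])
```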

🤝 Contributions

Feel free to open issues, suggest improvements, or submit pull requests. Let's make local model hosting accessible for everyone!
