Local LLM Integration Example

This repository demonstrates how to integrate a local Language Model into a Python application. While public LLM APIs (e.g., OpenAI, Cohere) are popular, they can be expensive or rate-limited, and those limits can be enough to discourage a programmer from exploring freely.

By experimenting with a locally hosted model, you can freely explore prompt engineering, data extraction, and more—without incurring API costs.

We're using a small (~2 GB) model (phi-3.1-mini-128k-instruct) for simplicity, but you can also experiment with larger models (e.g., ~20 GB, like Cohere's Command R) for more advanced tasks. This local setup is enough to try out basic tasks such as structured data extraction from text.

Why a Local LLM?

  • Cost-Effective: No pay-per-call API fees.
  • Faster Iteration: Experiments run locally, no network latency.
  • Privacy: Your data never leaves your machine.

Sample Prompt

In this application, we will use the following sample prompt to extract email addresses from text:

You can find several other potential prompts in the SAMPLE-PROMPTS.md file.

Extracting Email Addresses

Prompt:

Extract all email addresses from the text below. Provide the emails in a JSON list.

Text:
"Hello David, please reach out to [email protected] and [email protected]. 
Also, don’t forget to CC [email protected]."

Expected Output (Example):
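
The model should reply with a JSON array of strings containing the addresses it found. The values below are generic placeholders for illustration; the exact strings depend on the addresses in your input text:

    [
      "first@example.com",
      "second@example.com",
      "third@example.com"
    ]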

Installation & Setup

1. Install LM Studio

  1. Download LM Studio from lmstudio.ai.
  2. Follow the installation instructions for your operating system.
  3. Start LM Studio, and verify it's running on http://127.0.0.1:1234 (the default).

Tip: If LM Studio doesn't start on that port, check its Preferences > API settings.

2. Model Configuration

  1. Search for a Model: In LM Studio, click Models → Add New Model (or something similar) to browse available models.
  2. Install the phi-3.1-mini-128k-instruct Model: This is a ~2GB model that can handle simple tasks like data extraction. It's enough to demonstrate the flow without using too much GPU/CPU.
  3. Load Your Model: In LM Studio, ensure the newly downloaded model is loaded and “running” (LM Studio usually shows a green check or “ready” status).
  4. Turn on the API: After activating "Developer mode" (using the toggle at the bottom of the window), go to the Developer tab (the second tab on the right toolbar, with a terminal icon) and make sure the "Status: Running" toggle at the top of the window is set to On. This enables LM Studio's local API. You can configure API settings by clicking the nearby "Settings" button.
  5. Check the API: Confirm that LM Studio's local server is running by visiting http://127.0.0.1:1234/v1/models. This is the only endpoint that supports a GET request from the browser. You should see a JSON list of models, including "phi-3.1-mini-128k-instruct". (A short Python sketch for running this check programmatically appears after this list.)
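
If you'd like to run the same check from Python instead of the browser, a minimal sketch is shown below. It uses the requests library (installed in the next section) and assumes LM Studio's default address and port, plus the OpenAI-compatible response format in which available models are listed under a "data" key:

    import requests

    BASE_URL = "http://127.0.0.1:1234/v1"        # LM Studio's default local server
    MODEL_NAME = "phi-3.1-mini-128k-instruct"

    # Ask the local server which models are currently available.
    resp = requests.get(f"{BASE_URL}/models", timeout=5)
    resp.raise_for_status()
    model_ids = [m["id"] for m in resp.json().get("data", [])]

    if MODEL_NAME in model_ids:
        print(f"{MODEL_NAME} is loaded and ready.")
    else:
        print(f"Model not found. Available models: {model_ids}")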

3. Local Environment Setup

This repo uses pipenv for dependency management. Make sure you have Python 3.12 (or similar) installed.

  1. Fork (or import/copy) this repository to your own GitHub account, keeping the name local-llm-integration-example.

  2. Clone your fork to your machine (e.g., with git clone).

  3. Install Dependencies:

    pipenv install

    This will install requests, openai, and loguru.

  4. Run the Script:

    pipenv run python main.py
    • The script will:
      1. Check that LM Studio is up and that your requested model is available.
      2. Send a prompt to the model to extract data (e.g., email addresses).
      3. Print out the raw or parsed JSON response. (A simplified sketch of this flow appears right after this list.)
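
For reference, here is the core of that flow condensed into a simplified sketch. It is not the exact contents of main.py: the prompt text is abbreviated, the api_key value is an arbitrary placeholder (LM Studio does not check it), and error handling is omitted.

    import json
    from openai import OpenAI

    # Point the OpenAI client at LM Studio's local, OpenAI-compatible server.
    client = OpenAI(base_url="http://127.0.0.1:1234/v1", api_key="lm-studio")

    prompt = (
        "Extract all email addresses from the text below. "
        "Provide the emails in a JSON list.\n\n"
        "Text: ..."  # paste the sample text from the prompt above
    )

    response = client.chat.completions.create(
        model="phi-3.1-mini-128k-instruct",
        messages=[{"role": "user", "content": prompt}],
        temperature=0.0,
    )

    raw_text = response.choices[0].message.content
    print(raw_text)

    # The reply is often, but not always, pure JSON.
    try:
        print(json.dumps(json.loads(raw_text), indent=2))
    except json.JSONDecodeError:
        print("Reply contained extra text around the JSON.")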

Suggestions for Experimentation

  1. Change Log Level to DEBUG
    In main.py, the line:

    loguru.logger.add(sys.stderr, level="INFO")

    sets the default log level. Change "INFO" to "DEBUG" if you'd like to see more detailed logs about request payloads and model responses.

  2. Inspect the JSON Outputs
    The script attempts to parse each request's text output as JSON via helper.extract_json(). Sometimes the model includes extra text or formatting around the JSON. If you want to pretty-print the final JSON, you can use tools like jsonformatter.org/json-pretty-print or Python's json.dumps(obj, indent=2) in your code. (An illustrative sketch of this kind of extraction appears after this list.)

  3. Try Different Prompts
    The included example asks for email extraction. You can experiment with other data-extraction tasks (like phone numbers, product details, or structured outputs). Adjust the prompt in main.py and see how the local model handles it.
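
To make item 2 above concrete, here is a small, standalone illustration of pulling a JSON array out of a reply that has prose around it and then pretty-printing it. It is not the actual implementation of helper.extract_json(); the regular-expression approach below only handles simple, flat arrays:

    import json
    import re

    def extract_first_json_array(text):
        """Return the first flat JSON array found in text, or None."""
        match = re.search(r"\[.*?\]", text, re.DOTALL)
        if match is None:
            return None
        try:
            return json.loads(match.group(0))
        except json.JSONDecodeError:
            return None

    reply = 'Sure! Here are the emails: ["a@example.com", "b@example.com"]'
    print(json.dumps(extract_first_json_array(reply), indent=2))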


Happy experimenting with your Local LLM! If you run into issues, make sure that:

  1. LM Studio is running and the correct port (default: 1234) is used.
  2. Your model is actually loaded in LM Studio and shows up in GET /v1/models.
  3. You have enough system resources to run the model (roughly 2 GB of memory or more, depending on your hardware).

Enjoy building your own offline GPT-like applications!
