This repository demonstrates how to integrate a local language model into a Python application. While public LLM APIs (e.g., OpenAI, Cohere) are popular, they can be expensive or rate-limited, and those limits tend to discourage the kind of free-form exploration that helps a programmer learn.
By experimenting with a locally hosted model, you can freely explore prompt engineering, data extraction, and more—without incurring API costs.
We use a ~2GB model (phi-3.1-mini-128k-instruct) for simplicity, but you can also experiment with larger models (e.g., ~20GB models such as Cohere's Command R) for more advanced tasks. This local setup is enough to try out basic tasks such as structured data extraction from text.
- Cost-Effective: No pay-per-call API fees.
- Faster Iteration: Experiments run locally, no network latency.
- Privacy: Your data never leaves your machine.
In this application, we will use the following sample prompt to extract email addresses from text:
You can find several other potential prompts in the SAMPLE-PROMPTS.md file.
Prompt:
Extract all email addresses from the text below. Provide the emails in a JSON list.
Text:
"Hello David, please reach out to [email protected] and [email protected].
Also, don’t forget to CC [email protected]."
Expected Output (Example):
[
"[email protected]",
"[email protected]",
"[email protected]"
]
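To make the example concrete, here is a minimal sketch of sending this prompt to LM Studio's OpenAI-compatible endpoint using the `openai` client that `pipenv install` brings in. This is not the repository's actual `main.py`; the placeholder API key is only there because the client requires a non-empty value, and the base URL and model name assume the default setup described below.

```python
# Minimal sketch: send the sample prompt to a locally running LM Studio
# server via its OpenAI-compatible API. Not the repo's actual main.py.
from openai import OpenAI

# LM Studio ignores the API key, but the client requires a non-empty value.
client = OpenAI(base_url="http://127.0.0.1:1234/v1", api_key="lm-studio")

text = (
    "Hello David, please reach out to [email protected] and [email protected]. "
    "Also, don't forget to CC [email protected]."
)
prompt = (
    "Extract all email addresses from the text below. "
    "Provide the emails in a JSON list.\n\n"
    f'Text:\n"{text}"'
)

response = client.chat.completions.create(
    model="phi-3.1-mini-128k-instruct",
    messages=[{"role": "user", "content": prompt}],
    temperature=0,  # keep extraction output as deterministic as possible
)

print(response.choices[0].message.content)
```

Small models often wrap the JSON in extra prose, which is why the repository parses the reply rather than trusting it verbatim (see the notes on `helper.extract_json()` further down).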
- Download LM Studio from lmstudio.ai.
- Follow the installation instructions for your operating system.
- Start LM Studio, and verify it's running on `http://127.0.0.1:1234` (the default).
Tip: If LM Studio doesn't start on that port, check its Preferences > API settings.
- Search for a Model: In LM Studio, click Models → Add New Model (or something similar) to browse available models.
- Install the `phi-3.1-mini-128k-instruct` Model: This is a ~2GB model that can handle simple tasks like data extraction. It's enough to demonstrate the flow without using too much GPU/CPU.
- Load Your Model: In LM Studio, ensure the newly downloaded model is loaded and “running” (LM Studio usually shows a green check or “ready” status).
- Turn on the API: After activating "Developer mode" (using the toggle at the bottom of the window), go to the Developer tab (the second tab on the right toolbar, with a terminal icon) and make sure the "Status: Running" toggle at the top of the window is switched on. This enables LM Studio's local API. You can configure API settings by clicking the nearby "Settings" button.
- Check the API: Confirm that LM Studio's local server is running by visiting `http://127.0.0.1:1234/v1/models` (this is the only endpoint you can usefully call with a plain GET request from the browser). You should see a JSON list of models, including `phi-3.1-mini-128k-instruct`.
This repo uses pipenv for dependency management. Make sure you have Python 3.12 (or similar) installed.
- Fork (or import/copy) this repository to your own GitHub account, keeping the name `local-llm-integration-example`.
- Clone this repository (e.g., via GitHub).
- Install Dependencies: `pipenv install`. This will install `requests`, `openai`, and `loguru`.
- Run the Script: `pipenv run python main.py`
- The script will:
  - Check that LM Studio is up and that your requested model is available.
  - Send a prompt to the model to extract data (e.g., email addresses).
  - Print out the raw or parsed JSON response.
- Change Log Level to DEBUG: In `main.py`, the line `loguru.logger.add(sys.stderr, level="INFO")` sets the default log level. Change `"INFO"` to `"DEBUG"` if you'd like to see more detailed logs about request payloads and model responses.
- Inspect the JSON Outputs: The script attempts to parse each response as JSON via `helper.extract_json()`, because the model sometimes includes extra text or formatting around the JSON. If you want to pretty-print the final JSON, you can use tools like jsonformatter.org/json-pretty-print or Python's `json.dumps(obj, indent=2)` in your code (see the sketch after this list).
- Try Different Prompts: The included example asks for email extraction. You can experiment with other data-extraction tasks (like phone numbers, product details, or structured outputs). Adjust the prompt in `main.py` and see how the local model handles it.
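The repository's `helper.extract_json()` is not reproduced here, but a minimal sketch of the same idea, assuming the JSON payload is the first bracketed or braced block in the model's reply, might look like this. It also demonstrates the `json.dumps(obj, indent=2)` pretty-printing mentioned above.

```python
# Hypothetical sketch of a JSON-extraction helper (the repo's
# helper.extract_json() may work differently). It pulls the first
# JSON array or object out of a model reply that may contain extra prose.
import json
import re


def extract_json(text: str):
    """Return the first JSON array/object found in `text`, or None."""
    match = re.search(r"(\[.*\]|\{.*\})", text, re.DOTALL)
    if not match:
        return None
    try:
        return json.loads(match.group(1))
    except json.JSONDecodeError:
        return None


reply = 'Sure! Here are the emails:\n["[email protected]", "[email protected]"]'
emails = extract_json(reply)
print(json.dumps(emails, indent=2))  # pretty-printed JSON output
```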
Happy experimenting with your Local LLM! If you run into issues, make sure that:
- LM Studio is running and the correct port (default: `1234`) is used.
- Your model is actually loaded in LM Studio and shows up in `GET /v1/models`.
- You have enough system resources to run the model (2GB of memory or more, depending on your hardware).
Enjoy building your own offline GPT-like applications!