
Commit 9442ec0

remote ollama dev environment (#195)
1 parent a7aca54 commit 9442ec0

14 files changed: +374 -2 lines changed
Lines changed: 7 additions & 0 deletions
@@ -0,0 +1,7 @@
{
  "label": "AI & LLMs",
  "link": {
    "type": "doc",
    "id": "ai"
  }
}
Lines changed: 10 additions & 0 deletions
@@ -0,0 +1,10 @@
---
id: ai
title: AI & LLMs
---

# AI & LLMs

This category is dedicated to exploring how Open Telekom Cloud can be leveraged to build robust artificial intelligence solutions that incorporate large language models. As AI continues to transform industries by enabling smarter applications and automating complex tasks, understanding the cloud architecture that supports such advancements becomes crucial. This section provides insights into optimizing deployments, managing resources efficiently, and scaling applications seamlessly on Open Telekom Cloud's infrastructure. With a focus on real-world use cases, it guides developers and architects through the nuances of integrating AI workloads with LLMs, ensuring high performance and reliability.

Here, you'll find articles that dive into best practices for deploying AI models, including how to handle data processing, storage, and security in an efficient manner. Whether you're looking to implement cutting-edge natural language processing solutions or enhance machine learning pipelines, this category serves as your resource hub in the Open Telekom Cloud world.
Lines changed: 215 additions & 0 deletions
@@ -0,0 +1,215 @@
---
id: securely-expose-remote-ollama-endpoints-to-your-development-machine
title: Securely Expose Remote Ollama Endpoints to your Development Machine
tags: [ollama, vpn, ai]
---

# Securely Expose Remote Ollama Endpoints to your Development Machine
Exposing Ollama endpoints directly from your cloud environment to your local development machine can be highly beneficial, especially when it comes to optimizing the use of expensive resources like GPUs and integrating them with cost-effective local development hardware.

The benefits of integrating remote Ollama endpoints into local development workflows are multi-faceted and include:

- **Enhanced Development Workflow**: By securely exposing Ollama endpoints locally, you can streamline the development process. This setup allows you to test changes in real time without deploying them to a public or external environment or requiring expensive development machines, thereby accelerating iteration and debugging and reducing delivery costs.
- **Cost Efficiency**: Sharing GPU cloud resources across developers for every test or development cycle can lead to significant cost savings on expensive individual local development devices.
- **Customized Environment Testing**: Exposing endpoints locally allows you to create a controlled environment that mirrors your production setup. This capability ensures that any AI model behavior is consistent with expectations before deploying broadly.
- **Compliance and Regulation Adherence**: Certain industries have stringent compliance requirements for data processing and storage. Running AI models in a controlled, isolated, end-to-end encrypted environment helps adhere to these regulations by ensuring that data does not leave your secure environment without proper safeguards, even during the development phase.

By securely exposing Ollama endpoints, you can achieve a balance between operational efficiency, security, and compliance while facilitating a robust development process for AI models on your machine.
:::caution
Be aware that this blueprint may incur additional costs related to ingress and egress data.
:::

## Prerequisites

For this Blueprint, we are going to need:

1. an **ECS server (Ubuntu 22.04)**: a GPU-accelerated `pi2.2xlarge.4` instance (8 vCPUs, 32 GiB) will suffice.
2. a **Point-to-Site VPN connection**: We need to establish a connection between our development machine and the VPC hosting the Ollama VM.
3. an **Ollama** instance: Ollama must be installed on the ECS server above.

:::important
If you don't currently have an Ollama setup, please refer to the official guide for both manual and automated installation options, available at this [link](https://github.com/ollama/ollama/blob/main/docs/linux.md).

Make sure you [**add Ollama as a startup service**](https://github.com/ollama/ollama/blob/main/docs/linux.md#adding-ollama-as-a-startup-service-recommended).
:::
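If you're starting from scratch, here is a minimal sketch of the automated setup on Ubuntu, based on the official convenience script referenced in the guide linked above:

```shell
# Download and run the official Ollama install script
curl -fsSL https://ollama.com/install.sh | sh

# Make sure the systemd service is enabled at boot and currently running
sudo systemctl enable ollama
sudo systemctl status ollama
```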
## Creating a Point-to-Site VPN

Exposing an Ollama endpoint by assigning an Elastic IP (EIP) directly to a virtual machine, or by using Destination Network Address Translation (DNAT) through a NAT Gateway, is considered dangerous due to several key concerns. First, assigning an EIP makes the VM accessible from the internet, significantly increasing its exposure and risk of unauthorized access. This direct accessibility enlarges the attack surface, leaving it vulnerable to potential breaches.

Additionally, such an exposed endpoint can become a target for Distributed Denial of Service (DDoS) attacks aimed at overwhelming your service with excessive traffic, potentially causing downtime or degraded performance. Moreover, using DNAT via a NAT Gateway does not eliminate these risks entirely; it still requires meticulous configuration and management of security groups and firewall rules to ensure only legitimate traffic is allowed. Misconfigurations in such setups can easily result in unintentional exposure.

From a compliance perspective, direct internet exposure might violate certain regulatory requirements that mandate strict data protection and access controls, depending on the industry or region.

On the other hand, using a Point-to-Site (P2S) VPN to connect to the VPC where the Ollama VM resides is often viewed as a secure solution and aligns with best practices for several reasons. A P2S VPN establishes an encrypted tunnel between your local machine(s) and the respective Open Telekom Cloud VPC, ensuring that transmitted data remains secure from eavesdropping.

Moreover, a P2S VPN provides flexibility for multiple users to securely connect from different locations without needing complex infrastructure changes. This approach also facilitates compliance with data protection regulations, since access can be restricted to authorized users only and connections can be monitored and logged for auditing and incident response purposes.

You can find instructions on how to create and establish a P2S VPN connection here: [Establish a Point-to-Site VPN Connection between your Development Machine and a VPC](../networking/establish-a-p2s-vpn-connection-with-a-vpc.md).
## Changing Ollama's Listening Address

Let's inspect the network sockets associated with the Ollama service:

```shell
sudo ss -tupln | grep ollama

tcp LISTEN 0 4096 127.0.0.1:11434 0.0.0.0:* users:(("ollama",pid=2728253,fd=3))
```

We instantly notice that Ollama listens on `127.0.0.1:11434` (the second address column, `0.0.0.0:*`, is the peer address, meaning connections from any remote address and port would be accepted). `127.0.0.1` is the loopback IP address (localhost), meaning **the endpoint is accessible only from the same machine**. The port `11434` is where connections should be made to communicate with Ollama. That naturally prohibits us from accessing the Ollama endpoint from our development machine, or any other machine for that matter.
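You can observe this restriction in practice. A quick sketch, using `/api/version` (a lightweight Ollama endpoint) and `192.168.10.183` standing in for the server's private IP as later in this guide:

```shell
# On the ECS server itself, the loopback endpoint responds:
curl http://127.0.0.1:11434/api/version

# From any other machine, the same port is unreachable (connection refused or timeout):
curl --connect-timeout 5 http://192.168.10.183:11434/api/version
```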
What we need to do is instruct the Ollama service to accept incoming connections on all interfaces (or, even better, on a specific one). For that, we need to:

1. Temporarily stop the Ollama service:

   ```shell
   sudo systemctl stop ollama
   ```

2. Change the systemd service unit for Ollama to accept incoming connections on all interfaces:

   ```shell
   sudo nano /etc/systemd/system/ollama.service
   ```

   and add an additional environment variable, `OLLAMA_HOST=0.0.0.0:11434`:

   ```ini
   [Unit]
   Description=Ollama Service
   After=network-online.target

   [Service]
   ExecStart=/usr/local/bin/ollama serve
   User=ollama
   Group=ollama
   Restart=always
   RestartSec=3
   Environment="PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/snap/bin"
   Environment="OLLAMA_HOST=0.0.0.0:11434"

   [Install]
   WantedBy=default.target
   ```

3. Reload the systemd configuration files:

   ```shell
   sudo systemctl daemon-reload
   ```

4. Start the Ollama service on your system:

   ```shell
   sudo systemctl start ollama
   ```
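To confirm the change took effect, re-run the socket inspection from before; the local address should now be `0.0.0.0:11434` (or `*:11434` for a dual-stack socket) instead of `127.0.0.1:11434`:

```shell
# The listening socket should no longer be bound to the loopback interface
sudo ss -tupln | grep ollama

# Sanity-check the API locally on the server
curl http://localhost:11434/api/version
```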
## Creating a Security Group

1. Go to *Network Console* -> *Security Groups*, and click *Create Security Group*. Add a new **Inbound Rule**:
   - **Protocol**: `TCP`
   - **Port**: `11434`
   - **Source**: `192.168.10.0/24`

   :::caution
   Although we configured the Ollama service to listen on all interfaces, we want to restrict access to the Ollama endpoint **only** to resources that reside inside the same VPC, **and not to anyone who might potentially reach this VM** (`0.0.0.0/0`).

   The VPC CIDR for this lab was `192.168.10.0/24`; this might differ in your environment given your individual configuration of the VPC and its Subnets, so adjust the **Source** of the Inbound Rule accordingly.

   Once the P2S VPN connection is established, your development machine will technically be part of this VPC and will be able to access the Ollama endpoint.
   :::

2. Add the new Security Group to the Security Groups of the ECS Server.
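If you prefer a CLI over the console, the same setup can be sketched with the OpenStack client, assuming `python-openstackclient` is installed and configured against your Open Telekom Cloud project; `ollama-endpoint` and `<your-ecs-server>` are placeholders:

```shell
# Create the security group and the inbound rule for the Ollama port
openstack security group create ollama-endpoint
openstack security group rule create --ingress --protocol tcp \
  --dst-port 11434 --remote-ip 192.168.10.0/24 ollama-endpoint

# Attach the security group to the ECS server running Ollama
openstack server add security group <your-ecs-server> ollama-endpoint
```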
## Validation

Verify connectivity from your development machine using cURL:

:::important

- Ensure that the VPN connection is already established.
- `192.168.10.183` is the private IP address assigned by the DHCP server of our VPC to the ECS server that Ollama is installed on. Remember from the step before that the VPC CIDR for this lab was `192.168.10.0/24`; this might differ in your environment given your individual configuration of the VPC and its Subnets.

:::

```shell
curl http://192.168.10.183:11434/api/tags
```
If you have already pulled some models, the response should look similar to this:

```json
{
  "models": [
    {
      "name": "nomic-embed-text:latest",
      "model": "nomic-embed-text:latest",
      "modified_at": "2025-01-20T23:46:07.861519801Z",
      "size": 274302450,
      "digest": "0a109f422b47e3a30ba2b10eca18548e944e8a23073ee3f3e947efcf3c45e59f",
      "details": {
        "parent_model": "",
        "format": "gguf",
        "family": "nomic-bert",
        "families": [
          "nomic-bert"
        ],
        "parameter_size": "137M",
        "quantization_level": "F16"
      }
    },
    {
      "name": "llama3.3:latest",
      "model": "llama3.3:latest",
      "modified_at": "2025-01-17T09:17:50.765928869Z",
      "size": 42520413916,
      "digest": "a6eb4748fd2990ad2952b2335a95a7f952d1a06119a0aa6a2df6cd052a93a3fa",
      "details": {
        "parent_model": "",
        "format": "gguf",
        "family": "llama",
        "families": [
          "llama"
        ],
        "parameter_size": "70.6B",
        "quantization_level": "Q4_K_M"
      }
    },
    {
      "name": "phi4:latest",
      "model": "phi4:latest",
      "modified_at": "2025-01-14T09:16:25.911933283Z",
      "size": 9053116391,
      "digest": "ac896e5b8b34a1f4efa7b14d7520725140d5512484457fab45d2a4ea14c69dba",
      "details": {
        "parent_model": "",
        "format": "gguf",
        "family": "phi3",
        "families": [
          "phi3"
        ],
        "parameter_size": "14.7B",
        "quantization_level": "Q4_K_M"
      }
    },
    {
      "name": "deepseek-coder-v2:latest",
      "model": "deepseek-coder-v2:latest",
      "modified_at": "2024-11-09T08:07:18.67583696Z",
      "size": 8905126121,
      "digest": "63fb193b3a9b4322a18e8c6b250ca2e70a5ff531e962dbf95ba089b2566f2fa5",
      "details": {
        "parent_model": "",
        "format": "gguf",
        "family": "deepseek2",
        "families": [
          "deepseek2"
        ],
        "parameter_size": "15.7B",
        "quantization_level": "Q4_0"
      }
    }
  ]
}
```
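From here, you can point local tooling at the remote endpoint. As a sketch, assuming the Ollama CLI is also installed on your development machine, the `OLLAMA_HOST` environment variable redirects it to the remote instance (the IP is the lab address from above):

```shell
# Direct the local Ollama CLI at the remote endpoint over the VPN tunnel
export OLLAMA_HOST=192.168.10.183:11434

# List the remote models and run a quick prompt on the remote GPU
ollama list
ollama run phi4 "Hello from my development machine!"
```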

docs/blueprints/by-use-case/computing/index.md

Lines changed: 1 addition & 1 deletion
@@ -5,4 +5,4 @@ title: Computing
# Computing

-In this category, you can find guidance for building and managing computing solutions on Open Telekom Cloud. Topics include virtual machines, containerization, serverless computing, and high-performance computing architectures. Here are also provided recommendations for workload optimization, auto-scaling, and cost management, enabling organizations to efficiently handle diverse computing demands while maintaining reliability, security, and scalability.
+In this category, you can find guidance for building and managing computing solutions on Open Telekom Cloud. Topics include virtual machines, containerization, serverless computing, and high-performance computing architectures. Here are also provided recommendations for workload optimization, auto-scaling, and cost management, enabling organizations to efficiently handle diverse computing demands while maintaining reliability, security, and scalability.
