═══════════════════════════════════════════════════════════════════════════════
🎉 GENUINE TRAINING DATASETS READY! 🎉
═══════════════════════════════════════════════════════════════════════════════
✅ Downloaded and ready in: testtraindata/
📦 WHAT YOU GOT:
1. Code Alpaca (1,000 examples)
• Python code generation from natural language
• Example: "Write a function to reverse a string" → Python code
• Perfect for: Coding assistants, code completion, programming tutors
2. SQL Generation (500 examples)
• Convert English questions to SQL queries
• Example: "Find users over 30" → SELECT * FROM users WHERE age > 30
• Perfect for: Database assistants, BI tools, query helpers
3. Practical Coding (5 curated examples)
• High-quality algorithm implementations
• Perfect for: Teaching specific patterns, validation sets
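Before training, it can help to sanity-check a dataset file. The sketch below assumes each line of a .jsonl file holds one JSON object and that records use Alpaca-style "instruction" and "output" keys. That key layout is an assumption, not something this file confirms, so adjust REQUIRED_KEYS after looking at your data.

```python
import json
from collections import Counter
from pathlib import Path

# Assumed Alpaca-style keys; check your own files and adjust.
REQUIRED_KEYS = ("instruction", "output")


def summarize_jsonl(path, required_keys=REQUIRED_KEYS):
    """Count usable records, malformed lines, and records missing required keys."""
    stats = Counter()
    with Path(path).open(encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if not line:
                continue  # ignore blank lines
            try:
                record = json.loads(line)
            except json.JSONDecodeError:
                stats["malformed"] += 1
                continue
            if not isinstance(record, dict) or any(k not in record for k in required_keys):
                stats["missing_keys"] += 1
            else:
                stats["ok"] += 1
    return dict(stats)


if __name__ == "__main__":
    target = Path("testtraindata/code_alpaca.jsonl")
    if target.exists():
        print(summarize_jsonl(target))
    else:
        print(f"{target} not found; run this from the project root")
```

If "malformed" or "missing_keys" shows up in the summary, it is probably worth fixing those lines before starting a multi-hour run.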
═══════════════════════════════════════════════════════════════════════════════
🚀 START TRAINING NOW:
Easiest way (one command):
./train_code_assistant.sh
Interactive mode:
python llamaforge_interactive.py
# Select codellama:latest
# Choose testtraindata/code_alpaca.jsonl
Command-line mode:
python llamaforge.py \
    --model codellama/CodeLlama-7b-hf \
    --data testtraindata/code_alpaca.jsonl \
    --epochs 3
═══════════════════════════════════════════════════════════════════════════════
💡 WHAT YOU'LL CREATE:
A fine-tuned model that can:
✓ Write Python functions from descriptions
✓ Generate SQL queries from English
✓ Implement algorithms and data structures
✓ Solve coding problems
✓ Create boilerplate code
✓ Explain code snippets
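The training commands shown earlier point at testtraindata/code_alpaca.jsonl only, so a single run that covers both Python and SQL would likely need one combined file. A minimal sketch of merging and shuffling JSONL files follows. The names sql_generation.jsonl and combined.jsonl are hypothetical (this file does not name the SQL dataset file), and the sketch assumes all sources share the same record layout.

```python
import random
from pathlib import Path


def merge_jsonl(inputs, output, seed=0):
    """Concatenate non-empty lines from several JSONL files, shuffle, and write them out."""
    lines = []
    for path in inputs:
        with Path(path).open(encoding="utf-8") as handle:
            lines.extend(line.strip() for line in handle if line.strip())
    random.Random(seed).shuffle(lines)  # fixed seed gives a reproducible order
    Path(output).write_text("\n".join(lines) + "\n", encoding="utf-8")
    return len(lines)


if __name__ == "__main__":
    # sql_generation.jsonl and combined.jsonl are hypothetical names.
    sources = [
        Path("testtraindata/code_alpaca.jsonl"),
        Path("testtraindata/sql_generation.jsonl"),
    ]
    if all(p.exists() for p in sources):
        count = merge_jsonl(sources, "testtraindata/combined.jsonl")
        print(count, "records written")
    else:
        print("one or more source files not found; adjust the paths")
```

Shuffling keeps the two tasks interleaved rather than training on all Python examples first and all SQL examples last.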
═══════════════════════════════════════════════════════════════════════════════
📚 FULL GUIDE:
Read: testtraindata/README.md
🎯 RECOMMENDED MODEL:
codellama:latest or qwen2.5-coder:7b
(You have both in your Ollama collection!)
⏱️ TRAINING TIME:
~2-4 hours on your 20-core CPU (rough estimate; varies with model, dataset size, and epochs)
💾 OUTPUT SIZE:
~4-5 GB GGUF file
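As a rough cross-check on the output size, file size scales with parameter count times bits per weight. The sketch below assumes a model of about 7 billion parameters quantized to somewhere between 4.5 and 5.5 bits per weight. Both numbers are illustrative assumptions, and real GGUF files also carry metadata and may keep some tensors at higher precision.

```python
def estimate_model_size_gb(num_params, bits_per_weight):
    """Rough file size in GB (10**9 bytes): parameters x bits per weight / 8."""
    return num_params * bits_per_weight / 8 / 1e9


if __name__ == "__main__":
    # Assumed values: ~7 billion parameters, 4.5 to 5.5 bits per weight.
    for bits in (4.5, 5.0, 5.5):
        size = estimate_model_size_gb(7e9, bits)
        print(f"{bits} bits/weight -> {size:.2f} GB")
```

Under these assumptions the estimate lands in roughly the same range as the figure quoted above; an unquantized 16-bit export of the same model would be several times larger.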
═══════════════════════════════════════════════════════════════════════════════
Ready to create a genuinely useful AI assistant! 🔥