A toolkit and guide for using LLMs effectively on moderately complex tasks through better prompting.
Throughout this repository we will use emojis in document titles:
- 🤝 written by humans and intended for humans to read
- ✍️🦾 drafted by humans and phrased by LLMs
- 🦾👌 generated by LLMs and checked or edited by humans
- 🦾 generated by LLMs and NOT checked by humans
- `theory/`:
  - "I can't rely on an LLM to do this..." - Try better prompts! Examples of better prompts
  - A Method for Solving Complex Tasks with LLMs
  - Our list of recommended tools
- `practice/`: walkthroughs for solving tasks with LLMs. Jump to an overview of examples or straight to the individual walkthroughs:
  - example 1: explain a concept from Earth Surface Modelling
  - example 2: generate a model skeleton that uses the landlab framework
  - example 3: containerize an existing Earth Surface Modelling application (from example 2)
  - example 4: create a Python CLI tool to convert the contents of a directory into a single text file for LLMs
- `tools/`: tools generated with LLMs as part of our example walkthroughs
We recommend choosing one of the following paths depending on how much time you have.
Dissatisfied with LLM results? Can't rely on LLM information? Create better prompts:
For any task: Either write a better prompt (detailed and structured) yourself, or ask an LLM to improve your simple prompt first. The LLM has seen a HUGE number of questions and answers from the internet, so it (generally) knows how to formulate and structure prompts. Supply additional instructions or context if needed.
For coding tasks:
- Ask the LLM to come up with an implementation plan first.
- Ask the LLM to plan a testing strategy.
- Generate tests and check them manually. Ask it to test the functionality, not the implementation (see the test sketch after this list). These tests are your source of truth.
- Ask it to split the plan into modular TODOs.
- Use TDD (Test-Driven Development) when iterating on TODOs.
- For debugging, ask the LLM: "Add debug logs, so that I can give you the output for bug analysis. YOU ARE ALLOWED TO ADD DEBUG LOGS ONLY. DO NOT CHANGE ANY OTHER CODE."
- Once a TODO is implemented and tests pass, commit.
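To make "test the functionality, not the implementation" concrete, here is a minimal pytest sketch in the spirit of example 4's directory-to-text tool. The function name `combine_directory` is an assumption, not the actual tool's API; a tiny stand-in implementation is included so the sketch runs as-is:

```python
# Behaviour-focused pytest sketch for a directory-to-text tool.
# `combine_directory` is a stand-in for whatever function the LLM generates;
# a minimal reference implementation is included so the test runs as-is.
from pathlib import Path


def combine_directory(root: Path) -> str:
    """Stand-in implementation: concatenate files with name headers."""
    parts = []
    for path in sorted(root.rglob("*")):
        if path.is_file():
            parts.append(f"--- {path.relative_to(root)} ---")
            parts.append(path.read_text(encoding="utf-8"))
    return "\n".join(parts)


def test_combines_all_files_into_one_text(tmp_path: Path) -> None:
    # Arrange: a tiny directory tree
    (tmp_path / "a.txt").write_text("alpha")
    (tmp_path / "sub").mkdir()
    (tmp_path / "sub" / "b.txt").write_text("beta")

    # Act
    combined = combine_directory(tmp_path)

    # Assert on behaviour (what the output contains),
    # not on how the function builds it internally.
    assert "alpha" in combined and "beta" in combined
    assert "a.txt" in combined  # file names should appear as headers
```

A test like this survives refactors of the tool's internals, which is exactly what makes it a reliable source of truth while iterating with TDD.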
For tasks with "unknown unknowns": ask the LLM for best practices on how to accomplish the task. Ask it to provide trustworthy sources and to order them by importance.
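For instance, an illustrative prompt in the spirit of example 3 (adapt it to your own task):

```
I need to containerize an existing Python modelling application, but I am new
to containers. Give me current best practices for containerizing scientific
Python code. Provide trustworthy sources (official documentation, widely used
guides) and order them by importance.
```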
Use https://gitingest.com/ to convert this whole repo into a single text file (`REPO_CONTENT`). Ask an LLM (for example Gemini 2.5 Pro via https://aistudio.google.com/) to summarize it for you:

```
I am a <ROLE> (e.g., product manager, software architect, researcher). Given the following repository content, create a clear, concise briefing document I can read in under 5 minutes. Focus on summarizing the purpose, key components, examples, theoretical reasoning and any critical considerations. The tone should be informative and executive-friendly. Here's the content: <REPO_CONTENT>
```
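If you would rather not upload the repo to a web service, you can flatten it locally with a few lines of Python. This is a minimal sketch, not gitingest's actual implementation; the extension filter and output file name are assumptions to adapt:

```python
# Minimal local alternative to gitingest: concatenate a repo's text files
# into one file, with a header line marking each file's path.
from pathlib import Path

INCLUDE = {".md", ".py", ".txt", ".toml", ".yml"}  # assumed set, adjust freely

def flatten_repo(repo: Path, out: Path) -> None:
    with out.open("w", encoding="utf-8") as f:
        for path in sorted(repo.rglob("*")):
            if path.is_file() and path.suffix in INCLUDE:
                f.write(f"\n--- {path.relative_to(repo)} ---\n")
                f.write(path.read_text(encoding="utf-8", errors="replace"))

flatten_repo(Path("."), Path("REPO_CONTENT.txt"))
```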
- Use the prompt above and read the LLM summary (5 min).
- Choose an example from our list of examples that is relevant to your work and follow its walkthrough (10 min).
- Clone this repo to your computer
- Continue with this README.md
- Choose an example from our list of examples that is relevant to your work and follow its walkthrough
- Check out the `tools` folder
- Ask an LLM to make some small changes in the provided `/tools`. Copy/paste the existing code manually or use `gitingest` to combine all the code into a single text file
- Regenerate tool code by modifying existing requirement documents and generating code changes
- Generate new tools
| Concept | tl;dr |
|---|---|
| Pre-training | Scrape the internet and compress it into a lossy neural network |
| Fine-tuning | Continue training on curated, labelled data so the model follows instructions |
| Tokens | Numeric encoding of text that the LLM uses for I/O; 1 token ≈ 0.7 words |
| Context window | How many tokens fit in one request (history included) |
| Few-shot prompting | Providing specific input/output examples to clarify your task |
| Chain-of-thought prompting | Asking the model to "show its work" in an effort to boost problem-solving ability |
| Retrieval-Augmented Generation (RAG) | A technique to automatically fetch relevant external data to provide as context |
| Tools | External APIs/functions the model can call (e.g. write to a file, search the web) |
| Agents | A loop where the model plans, uses tools, observes results, and iterates toward a goal |
| Multimodal | Refers to models that can work with data types beyond text, such as audio, images, and video |
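As a concrete illustration of few-shot prompting, a prompt might embed a couple of worked examples before the real input (the task and examples here are made up):

```
Convert each place name to "City, Country".

Input: Paris
Output: Paris, France

Input: Kyoto
Output: Kyoto, Japan

Input: Boulder
Output:
```

The examples pin down the exact output format, so the model does not have to guess it from the instruction alone.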
| Tool | What it’s good for | Pricing notes |
|---|---|---|
| ChatGPT (GPT-3.5) | Several models for quick questions or more advanced reasoning | Limited free tier |
| Google Gemini | Gemini 2.5 is the top dog in code-generation and other complex tasks at the time of writing | Generous free tier |
| Claude | Another option similarly advanced to ChatGPT or Gemini | $20 / mo (Pro) |
| Perplexity | Web search and research with fairly reliable citations | Limited free tier |
| Ollama | Run open‑source LLMs locally (Llama, DeepSeek, etc.) | Completely free! |
| Tool | What it does | Pricing notes |
|---|---|---|
| GitHub Copilot | Autocompletion, chat, and agent mode in VS Code | Limited free tier; Pro is free for students/teachers/OSS |
| Cursor* | VS Code fork with autocomplete, chat, and agent mode | Free trial only; free year of Pro for students |
| Open Hands** | A FOSS alternative "coding agent" | Free but requires a local LLM or API access |
*There are many similar paid "AI editors", such as Windsurf, Trae, Devin, etc.
**Roo Code is another popular open source agent
| Remain the 'expert'; think of the LLM/chatbot as an amazingly fast student intern |
| The more detailed the prompt, the better |
| Include examples of how the model should respond (“few-shot prompting”) |
| Include relevant context (like documentation) in a readable format within your prompts |
| Break up jobs into smaller pieces just like you would for yourself or a team; models (especially reasoning models) can often help with this step |
| Keep sessions short and try to focus on particular tasks to avoid 'overwhelming' the model |
| Take advantage of memory management tools like ChatGPT projects or even files in your workspace |
| Break up jobs into smaller pieces just like you would for yourself or a team |
| Define the scope of specific changes to avoid rewrites that could break unrelated things |
| LLMs natively handle Markdown; use it for formatting. Delimiters like horizontal rules (`---`) are particularly useful for breaking up prompts |
| Don't shy away from mathematical expressions; TeX notation is usually understood |
| Gravitate towards languages/tools that both you and the LLM know well (popular) |
| Ask for explanations or documentation to make it easier to review outputs |
| Validate everything, ideally as you receive it ("clean as you go"); automated tests are very helpful here |
| Use git and commit regularly to save "stable" states if you are using AI-generated code heavily |
Let's call the prompt that generates a better prompt for your task the `META_PROMPT`, and the resulting generated prompt the `TASK_PERFORMING_PROMPT`.
- Create the `META_PROMPT`:

  ```
  Generate a high-quality TASK_PERFORMING_PROMPT for TASK based on the CONTEXT below.
  This prompt will be used by a human user or another system to instruct an LLM to perform a specific task.
  TASK: <TASK>
  CONTEXT: <CONTEXT>
  ```

- Fill in the placeholders `<TASK>` and `<CONTEXT>`. Use your best judgement about what context information is best for your task.
- Prompt an LLM with the `META_PROMPT`. The response will be your `TASK_PERFORMING_PROMPT`.
- Prompt an LLM with the `TASK_PERFORMING_PROMPT`. The response will be the answer to your `TASK`.
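For example, a filled-in `META_PROMPT` for example 1 might look like this (the TASK and CONTEXT values are illustrative):

```
Generate a high-quality TASK_PERFORMING_PROMPT for TASK based on the CONTEXT below.
This prompt will be used by a human user or another system to instruct an LLM to perform a specific task.
TASK: Explain a core concept from Earth Surface Modelling to a new group member.
CONTEXT: The reader has a general physics background but no prior exposure to
Earth Surface Modelling; the explanation should take under 10 minutes to read.
```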
For more info on meta prompting: https://www.prompthub.us/blog/a-complete-guide-to-meta-prompting
- LiveBench - a benchmark for LLMs "designed with test set contamination and objective evaluation in mind"
- Curated educational content on LLMs
- promptingguide.ai