Google recently dropped a 68-page paper about prompt engineering. It's a nice combination of being very approachable for beginners, covering basics like the max tokens and temperature parameters, while also going into some depth on more complex topics like few-shot Chain of Thought.

The paper is packed with insights, and we've added all the example prompts to a single group in PromptHub that you can check out here and start experimenting with.

Access all the prompt templates from Google's Prompt Engineering Paper here

Let’s jump in!

What’s up everyone, how’s it going? Google just released a 68-page paper on prompt engineering, and we’re going to dive into it. I read through the whole thing and it’s super good. It strikes a nice balance: not too basic for experts, but still accessible for beginners, and it goes deep into several important areas. The paper is by Lee Boonstra, and we’ll link it below. Tons of good examples, and it covers the full spectrum: model parameters, prompting techniques, best practices, and more. We also created a companion outline on our blog and uploaded all the example prompts from the paper to PromptHub, so you can experiment with them directly. That group is also linked below.

First up: model parameters. The paper gives a solid explanation of temperature, top-p, and top-k. These are often the trickiest to understand:

- **Top-p (nucleus sampling)**: Selects the smallest set of tokens whose cumulative probability exceeds a threshold (e.g., 0.9). Lower values make the model more focused and deterministic. Think of it as constraining the token pool.
- **Top-k**: Not supported by all models (OpenAI doesn’t, Google does). Instead of a percentage, it picks from a fixed number of top tokens (e.g., top 30). As you increase k, you get more creativity.
- **Temperature**: More familiar to most. Lower = more deterministic (e.g., 0 is greedy decoding), higher = more randomness.

The paper includes sample presets. For balanced creativity, they recommend: top-p 0.95, top-k 30, temperature 0.2. You can tweak just one variable at a time; usually temperature is a good starting point.

Then it moves into **prompting techniques**, with examples:

- **Zero-shot prompting**: Just give the task. E.g., “I have two brothers and three sisters. How many sisters do I have?”
- **One-shot**: Add one example to guide format, tone, or logic.
- **Few-shot**: Add multiple examples. Helps show patterns, especially for format-heavy outputs.
- **System prompting**: Used in system messages. Sets tone, structure, safety, etc.
- **Contextual prompting**: Similar to system, but more task-specific or dynamic.
- **Role prompting**: Assigns a persona (e.g., “You are a helpful assistant…”). Useful for behavior shaping.
- **Step-back prompting**: Ask the model to first think abstractly, then solve the task. Can be two steps or one prompt with stages.
- **Chain of Thought (CoT)**: Classic “think step-by-step” method. Especially useful for non-reasoning models, but often unnecessary for reasoning-tuned ones.
- **Self-consistency**: Run a prompt multiple times and choose the most frequent output. Boosts reliability.
- **Tree of Thought (ToT)**: Guide the model to explore multiple reasoning paths. Sometimes overlaps with self-reflection methods in newer reasoning models.
- **ReAct**: Combines reasoning and tool use: models think, act, and reflect in cycles.
- **Automatic Prompt Engineering / Meta-prompting**: Ask the model to write or improve prompts. We’ve got a whole PromptHub category for this.

**Best Practices** from the paper:

- High-quality examples matter. Even reasoning models benefit from showing (not just telling) structure and tone.
- Don’t overfit examples: models can cling too tightly to them. Test across diverse inputs.
- Start simple: Clear and concise is always the best rule, no matter the model.
- Be specific about outputs. People often forget this, but it makes a huge difference.
- Use positive instructions instead of negatives (“Do this” vs. “Don’t do that”).
- Retest regularly. Models and APIs change. Even just eyeballing your prompts can surface issues.
- Collaborate: A second set of eyes often spots confusion you missed.
- Document: Track your prompts, versions, configs, and performance. PromptHub can help here.

**Example Prompt** (also available in PromptHub):

- Task: Summarize a product announcement into three bullet points.
- Instruction: “Generate three bullet points summarizing the following announcement.”
- Output control: Add a one-shot example to show the exact format and length. We skipped “concise” in the instruction and let the example set the tone.
- Use chain of thought to guide reasoning: Tell the model to identify features, distill into bullets, and think step by step.
- Variables: Use placeholders to easily test different product announcements.

Final takeaway: All the techniques and best practices sound great, but the only way to know what works is to **test**. Prompt engineering is iterative. So try, test, refine.

Overall, this is a fantastic guide from Google. Huge thanks to Lee and the team who put it together. Really appreciate it when model providers share this kind of insight, just like OpenAI did last week with their GPT-4.1 prompting guide. That’s it for today; see you in the next one!

Model parameters

The white paper covers all the core model parameters (for a deeper dive into parameters, check out our guide here: Understanding OpenAI Parameters), but we’ll focus on the three used to dial creativity up or down: Top-P, Top-K, and Temperature.

Top-P

Top-P (nucleus sampling) dynamically selects the smallest set of tokens whose combined probability mass exceeds your P threshold. Lower values (e.g., 0.5) force the model into a smaller, more predictable token pool. Higher values (e.g., 0.99) let it draw from essentially all the token options, boosting "randomness" and creative diversity.

Top-K

Top-K sampling truncates to the K highest-probability tokens before sampling. A small K (e.g. 10) means the model has only a handful of choices—great for focused outputs—whereas a larger K (e.g. 40) expands the candidate set and increases variation.

Think of Top-P as probability-mass filtering (you choose a percentage of total probability) and Top-K as fixed-count filtering (you pick the top K tokens by probability).
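
To make that difference concrete, here's a minimal Python sketch. The token distribution is made up for illustration; it just shows how the two filters trim the candidate pool differently:

```python
# Toy next-token distribution (illustrative numbers only, not from the paper).
probs = {"the": 0.40, "a": 0.25, "this": 0.15, "my": 0.10, "that": 0.06, "every": 0.04}

def top_k_filter(probs, k):
    """Keep only the k highest-probability tokens (fixed-count filtering)."""
    ranked = sorted(probs.items(), key=lambda kv: kv[1], reverse=True)
    return dict(ranked[:k])

def top_p_filter(probs, p):
    """Keep the smallest set of tokens whose cumulative probability reaches p."""
    ranked = sorted(probs.items(), key=lambda kv: kv[1], reverse=True)
    kept, cumulative = {}, 0.0
    for token, prob in ranked:
        kept[token] = prob
        cumulative += prob
        if cumulative >= p:
            break
    return kept

print(top_k_filter(probs, k=3))    # {'the': 0.4, 'a': 0.25, 'this': 0.15}
print(top_p_filter(probs, p=0.9))  # {'the': 0.4, 'a': 0.25, 'this': 0.15, 'my': 0.1}
```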

Temperature

Temperature controls how sharply the model favors its top tokens. Lower values (<1.0) make it more deterministic (at 0 it performs greedy decoding, always picking its highest-probability token), while higher values (>1.0) flatten the distribution to boost randomness and diversity in the output.
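
Under the hood, temperature rescales the model's logits before they become probabilities. Here's a small illustrative sketch with made-up logits (note that T=0 is typically handled as greedy decoding rather than plugged into this formula):

```python
import math

def softmax_with_temperature(logits, temperature):
    """Convert logits to probabilities; lower temperature sharpens, higher flattens."""
    scaled = [logit / temperature for logit in logits]
    max_scaled = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - max_scaled) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

logits = [2.0, 1.0, 0.5]
print(softmax_with_temperature(logits, 0.2))  # heavily favors the top token (near-greedy)
print(softmax_with_temperature(logits, 1.5))  # much flatter, so sampling is more varied
```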

Example presets

  • Default: P=0.95, K=30, T=0.2 for balanced coherence and creativity.
  • High creativity: P=0.99, K=40, T=0.9 expands token diversity.
  • Deterministic: P=0.9, K=20, T=0.1 or even T=0 for single-answer tasks.
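
As a rough illustration of how these presets get passed in practice, here's a sketch using Google's `google-generativeai` Python SDK; the model name and API key are placeholders, and other providers' SDKs name these parameters differently (OpenAI's API, for example, doesn't expose top-k):

```python
import google.generativeai as genai

genai.configure(api_key="YOUR_API_KEY")  # placeholder: supply your own key
model = genai.GenerativeModel("gemini-1.5-flash")

# "Default" preset from the paper: balanced coherence and creativity.
response = model.generate_content(
    "Suggest three names for a retro-games blog.",
    generation_config={"temperature": 0.2, "top_p": 0.95, "top_k": 30},
)
print(response.text)
```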

Core prompting techniques

The white paper goes over all of the most popular prompt engineering methods and also includes a variety of examples. You can find all the prompt examples mentioned in the paper here, but let’s dive into the methods themselves.

  1. Zero-Shot Prompting: Provide only a description of the task with no examples. It works best for straightforward tasks where the model already understands the domain (e.g., basic classification or summarization).
  2. One-Shot Prompting: Supply exactly one example alongside your task description. This single demonstration helps the model understand tone, style, and output format.
  3. Few-Shot Prompting: Include multiple (typically 3–5) examples to show the desired format, style, or structure. Diverse, high-quality examples are best!
  4. System Prompting: Give high-level, global instructions about the model’s role or output format (e.g., “Only return valid JSON”). Use it to enforce consistency, structure, or safety guardrails across every response.
  5. Contextual Prompting: Embed task-specific background information or data (e.g., “Context: You are writing for a retro-games blog”) so the model tailors its output to the right domain or scenario.
  6. Role Prompting: Assign a persona or job title (e.g., “Act as a travel guide”) to shape tone, vocabulary, and perspective.
  7. Step-Back Prompting: First ask a broad question to surface relevant background knowledge, then feed its answer into the main task prompt for more robust outputs.
  8. Chain-of-Thought (CoT): Instruct the model to “think step by step,” generating intermediate reasoning steps.
  9. Self-Consistency: Run a prompt multiple times under high-temperature sampling and select the most frequent final answer. This majority-vote approach reduces hallucinations and increases reliability (a minimal sketch follows this list).
  10. Tree of Thoughts (ToT): Explore multiple reasoning branches simultaneously by maintaining a “tree” of intermediate thoughts.
  11. ReAct (Reason & Act): Combine natural-language reasoning with external tools (search, code execution, etc.) in a thought–action loop, enabling the model to fetch information or run code mid-prompt.
  12. Automatic Prompt Engineering: Also known as meta prompting. Prompt the model to generate a set of candidate prompts, evaluate them and select the best one.
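
Here's the promised self-consistency sketch. It reuses the `google-generativeai` SDK from the parameters example and assumes a prompt that ends with an explicit "Answer:" line so the final answers are easy to compare; the sampling loop and the majority vote are the parts that matter:

```python
from collections import Counter
import google.generativeai as genai

genai.configure(api_key="YOUR_API_KEY")  # placeholder: supply your own key
model = genai.GenerativeModel("gemini-1.5-flash")

PROMPT = (
    "I have two brothers and three sisters. How many sisters does my brother have?\n"
    "Think step by step, then give the final answer on the last line as 'Answer: <number>'."
)

def final_answer(text):
    """Pull the 'Answer: ...' line out of a chain-of-thought response."""
    for line in reversed(text.strip().splitlines()):
        if line.lower().startswith("answer:"):
            return line.split(":", 1)[1].strip()
    return text.strip().splitlines()[-1]

# Sample the same prompt several times at a high temperature...
answers = []
for _ in range(5):
    response = model.generate_content(
        PROMPT, generation_config={"temperature": 0.9, "top_p": 0.99}
    )
    answers.append(final_answer(response.text))

# ...then keep the most frequent final answer (majority vote).
print(Counter(answers).most_common(1)[0][0])
```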

Best practices

The paper has a ton of best practices scattered throughout. Below are the top ones that stood out, plus here is a link to all the templates so you can start testing them yourself.

  • Provide high-quality examples: Few-shot prompting is one of the best ways to teach the model the exact format, style, and scope you want. Including edge cases can boost robustness, but you also run the risk of the model overfitting to examples.
  • Start simple: Nothing beats prompts that are concise and clear. If your own instructions are hard to follow, the model will struggle too.
  • Be specific about the output: Explicitly state the desired structure, length, and style (e.g., “Return a three-sentence summary in bullet points”).
  • Use positive instructions over constraints: Framing “what to do” often beats long lists of “don’ts.” Save hard constraints for safety or strict formats.
  • Don’t forget about Max tokens! When you want to constrain output length, use the max tokens parameter.
  • Use variables in prompts: Parameterize dynamic values (dates, names, thresholds) with placeholders. This makes your prompts reusable and easier to maintain when context or requirements change (see the sketch after this list).
  • Experiment with input formats & writing styles: Try tables, JSON schemas, or bullet lists as part of your prompt. Different formats can potentially lead to better outputs.
  • Continually test: Re-test your prompts whenever you switch models or when new model variants are released. As we covered in our GPT-4.1 model guide, prompts that worked well for previous models may need to be tweaked to maintain performance.
  • Experiment with output formats: Beyond plain text, ask for JSON, CSV, or markdown. Structured outputs are easier to consume programmatically and reduce post-processing overhead.
  • Collaborate with your team: Working with your team makes the prompt engineering process easier. With PromptHub you can share and review prompts in a platform designed for prompts.
  • Chain-of-Thought best practices: When using CoT, keep your “Let’s think step by step…” prompts simple. For more reasoning prompts check out this collection in PromptHub here.
  • Document prompt iterations: Track versions, configurations, and performance metrics in a centralized platform like PromptHub that will do all the heavy lifting for you.
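
As a quick sketch of the variables and output-format tips above (the template text, placeholder names, and JSON keys here are our own, not from the paper):

```python
# A reusable prompt template: placeholders for the dynamic parts, positive
# instructions, and an explicit, structured output format.
REVIEW_TRIAGE_TEMPLATE = """You are a support analyst for {product_name}.

Classify the customer review below and return valid JSON only, with the keys
"sentiment" ("positive", "neutral", or "negative") and "summary" (one sentence).

Review:
{review_text}
"""

prompt = REVIEW_TRIAGE_TEMPLATE.format(
    product_name="PromptHub",
    review_text="Versioning my prompts used to be a mess; this made it painless.",
)
print(prompt)
```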

Putting it all together

Let’s run through a quick example where we want to summarize a product announcement into three bullet points, using all the tips and tricks we’ve learned so far.

1. Simply define the Task

“Generate three concise bullet points summarizing the following announcement.”

2. Choose model parameters

  • P = 0.95, K = 30, T = 0.2 for that sweet spot between clarity and creativity.

3. Select prompting technique

  • We’ll use Chain-of-Thought (CoT) to guide the model through its reasoning and include a one-shot example to show the model the output format and tone we want.

4. Best practices to keep in mind

  • Provide a high-quality example: Show the format and tone we want.
  • Be specific about output format: “three bullet points.”
  • Use positive instructions: “Generate three bullet points…” rather than “Don’t write paragraphs.”
  • Use variables: insert the announcement text as a placeholder so you can reuse this prompt.

5. Sample Prompt

Check out the prompt here
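
The actual template lives in PromptHub (linked above), but here's a rough sketch of what a prompt assembled from these steps might look like: a light reasoning nudge, a one-shot example we made up for illustration, and a placeholder variable for the announcement text:

```python
# Illustrative only: the example announcement and bullets below are invented.
SUMMARY_PROMPT = """Generate three concise bullet points summarizing the announcement below.
First identify the key features and who they help, then distill them into the bullets.

Example announcement:
"Acme Notes now syncs across devices and adds offline editing."

Example output:
- Cross-device sync keeps notes up to date everywhere
- Offline editing lets you work without a connection
- Aimed at users who switch between laptop and phone

Announcement:
{product_announcement}
"""

prompt = SUMMARY_PROMPT.format(product_announcement="<paste the announcement here>")
```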

6. Iterate & Test

  • Review the bullets for accuracy and tone; test across different models and with different product announcement data
  • Tweak the model and model parameters as needed
  • Log each version and its results automatically in PromptHub

Conclusion

The prompt engineering white paper from Google is an awesome resource for anyone interested in LLMs and prompt engineering. Whether you’re just getting started or have some experience, you'll learn a lot from it!

Dan Cleary
Founder