Some prompt engineering techniques are simple and easy to implement (like "According to..." prompting). Others are extremely powerful but require a more advanced setup with many more API calls, leading to increased cost and latency (like Tree of Thoughts).

Researchers at Virginia Tech and Microsoft have recently developed a new method that stands up against the Tree of Thoughts method while requiring 100 times fewer queries. Yes, you read that correctly: 100 times fewer queries. (link to the full paper here)

Enter Algorithm of Thoughts (AoT).

Model inputs and outputs for various prompting methods

Why AoT?

Large Language Models (LLMs) are extremely powerful, but they aren’t perfect. Prompt engineering exists because consistently getting high-quality outputs isn’t a walk in the park.

Many of the top-performing, widely adopted methods are computationally heavy. For example, the Tree of Thoughts (ToT) method requires multiple rounds of querying as it traverses dozens of branches and nodes.

Designed to address these challenges, AoT presents a structured path of reasoning for LLMs. It's a solution that delivers on efficiency without compromising on the quality of outputs.

Prompt flow for different methods

How AoT works

AoT is designed to mimic algorithmic thinking. It is broken down into seven steps:

  1. Define the Problem: AoT begins by clearly stating the problem behind the query or task.
  2. Gather Information: Before diving into solutions, AoT prompts the LLM to get the necessary context.
  3. Analyze the Information: Next, the LLM breaks down the gathered information, identifying patterns, relationships, or anomalies.
  4. Formulate a Hypothesis: Based on the analysis, the LLM puts together an initial solution.
  5. Test the Hypothesis: The LLM then thinks of ways to validate or refute its hypothesis, envisioning potential outcomes.
  6. Draw Conclusions: After testing, the LLM summarizes its findings, providing a refined solution to the initial problem.
  7. Reflect: Lastly, the LLM considers the broader implications of its conclusion, thinking of potential next steps or further questions.

AoT’s structured approach ensures the LLM isn’t left to wander aimlessly down many paths.
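
To make this concrete, here's a minimal Python sketch that folds the seven steps into a single prompt. The step wording and the `build_aot_prompt` helper are our own illustration, not code from the paper.

```python
# A minimal sketch: wrap a raw task in the seven AoT steps.
# The step wording and helper name are illustrative, not from the paper.

AOT_STEPS = [
    "Define the problem behind this task in one clear sentence.",
    "Gather the context and facts needed before proposing solutions.",
    "Analyze that information: note patterns, relationships, and anomalies.",
    "Formulate an initial hypothesis (a candidate solution).",
    "Test the hypothesis: how could it be validated or refuted?",
    "Draw conclusions and state a refined solution.",
    "Reflect on broader implications and possible next steps.",
]

def build_aot_prompt(task: str) -> str:
    """Wrap a task in the seven-step AoT structure, all in one prompt."""
    steps = "\n".join(f"{i}. {step}" for i, step in enumerate(AOT_STEPS, 1))
    return f"Task: {task}\n\nWork through the task step by step:\n{steps}"

print(build_aot_prompt("What are the environmental implications of increased data center usage?"))
```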

AoT prompt template

The AoT prompt technique isn't the easiest to implement, but we've done our best to make it plug and play. Generally, you want to follow this structure: problem statement, background information, initial hypothesis, reasoning, and conclusion.

Here's a practical example of using AoT to research the environmental implications of data centers.

  1. Problem Statement: What are the environmental implications of increased data center usage?
  2. Background Information: Data centers account for about 1% of global electricity use.
  3. Initial Hypothesis: Implementing renewable energy sources can mitigate the environmental impact.
  4. Reasoning: Evaluate the feasibility and impact of using renewable energy for data centers.
  5. Conclusion: Based on the analysis, is renewable energy a viable solution for data centers?

You could then restructure that into something a little more LLM-friendly.

"Given that data centers account for about 1% of global electricity use, what are the environmental implications of increased data center usage? I hypothesize that implementing renewable energy sources can mitigate the environmental impact. Can you evaluate the feasibility and impact of using renewable energy for data centers? Based on your analysis, is renewable energy a viable solution for data centers?”


We built a template in PromptHub that converts a normal prompt into a new prompt, following the Algorithm of Thoughts framework. You can access the template here and try it out right away.


If you don't have PromptHub access but want to try it out, reply to the email that gets sent when you join the waitlist and I'll share an access code with you.

Screenshot of the Algorithm of Thoughts Template in the PromptHub platform

Experiments: Game of 24

Setup

The researchers pulled 100 rounds of the Game of 24 from 4nums.com. For a round to count as a success, the model must reach a final answer of 24 using each of the four given numbers exactly once, with only basic arithmetic operations.
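
Scoring a round is mechanical. A small checker like the one below (our own helper, not from the paper) verifies that a candidate expression uses exactly the four given numbers and evaluates to 24:

```python
# Hypothetical scoring helper for the Game of 24, not from the paper.
import re
from collections import Counter

def is_valid_24(expression: str, numbers: list[int]) -> bool:
    # The expression must use exactly the four given numbers...
    used = [int(n) for n in re.findall(r"\d+", expression)]
    if Counter(used) != Counter(numbers):
        return False
    # ...and only basic arithmetic operators.
    if re.search(r"[^\d+\-*/(). ]", expression):
        return False
    try:
        return abs(eval(expression) - 24) < 1e-6
    except (SyntaxError, ZeroDivisionError):
        return False

print(is_valid_24("(10 - 4) * (13 - 9)", [4, 9, 10, 13]))  # True
```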

Baselines

Standard prompting and Chain of Thought (CoT) were evaluated under a 5-shot framework. Tree of Thoughts (ToT) was implemented with a breadth of 5.

For AoT, the same 5-shot framework was used as in standard prompting. These methods were each sampled 100 times, with the average success rates documented.

GPT-4 was the only model used, and the temperature was set to 0.
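
Put together, the evaluation loop might look roughly like the sketch below; `load_puzzles` and `solve_with_aot` are hypothetical stand-ins for the researchers' actual harness:

```python
# A rough sketch of the evaluation loop described above.
# `solve_fn` makes one LLM query per puzzle in the AoT case.
def run_benchmark(puzzles, solve_fn, checker) -> float:
    successes = sum(
        1 for numbers in puzzles if checker(solve_fn(numbers), numbers)
    )
    return successes / len(puzzles)

# Hypothetical usage, with is_valid_24 from the checker above:
# success_rate = run_benchmark(load_puzzles(), solve_with_aot, is_valid_24)
```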

AoT Prompt

Here is the full version of the AoT prompt used in the Game of 24 experiment.

Results

Prompt engineering method results from Game of 24

AoT destroys CoT and standard prompting. More importantly, AoT outperforms ToT with just a single LLM query, compared to ToT’s 109. AoT is working smarter, not harder.

Experiments: Mini crosswords

Setup

The researchers drew a collection of 20 games from goobix.com.

Baselines

The same baselines used in the previous experiment are used here. AoT is compared against standard prompting, ToT, and CoT.

Results

Results table of different prompting methods from the mini crosswords experiment
Word success represents the percentage of words correctly completed out of the total count.

Once again, AoT outperforms standard prompting and CoT.

AoT slightly underperforms ToT, but it uses roughly 100 times fewer queries. That’s a lot of API calls, tokens, and latency saved.

Other takeaways

I found one of the final takeaways to be among the most interesting. The researchers tested whether AoT could outperform the algorithm it is designed after, depth-first search (DFS). If it could, that would signal that LLMs have the potential to not just replicate an algorithm, but surpass its efficiency.

The results are in, and it’s looking pretty good for AoT.

Bar chart comparing the number of visited nodes for depth-first search versus AoT

This graph shows that AoT consistently visits fewer nodes than DFS. In other words, AoT reaches a correct final answer in fewer steps (fewer node visits). AoT integrates the uniform strategy of depth-first search while layering in intelligent recursive reasoning that limits the steps needed.
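
To see what "visited nodes" means in practice, here is a plain depth-first search for the Game of 24 that counts every state it expands. This is the kind of classical baseline AoT is compared against, not the paper's exact implementation:

```python
# Classical DFS for the Game of 24, counting visited states.
# Illustrative only; not the paper's implementation.
from itertools import permutations

def dfs_24(numbers: list[float], visited: list[int]) -> bool:
    visited[0] += 1                      # count this state as visited
    if len(numbers) == 1:
        return abs(numbers[0] - 24) < 1e-6
    # Combine any two remaining numbers with a basic operation,
    # then recurse on the smaller list.
    for a, b in permutations(numbers, 2):
        rest = list(numbers)
        rest.remove(a)
        rest.remove(b)
        candidates = [a + b, a - b, a * b]
        if b != 0:
            candidates.append(a / b)
        for value in candidates:
            if dfs_24(rest + [value], visited):
                return True
    return False

visited = [0]
print(dfs_24([4, 9, 10, 13], visited), "nodes visited:", visited[0])
```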

Wrapping up

We cover a lot of the latest research in prompt engineering and AI. AoT might be our favorite new method because of its ease of use and performance. It is able to compete right alongside ToT, while using significantly fewer resources.

Additionally, since AoT is only a single query, anyone can try it out. Unlike other methods that demand external resources, AoT requires nothing more than a correctly structured prompt.

Dan Cleary
Founder