Inspired by Peter Gostev’s recent collection of prompt injections of major AI companies, we decided to do a deep dive and see what we could learn from the system messages of major AI players.  Let’s start it off with the king of AI, OpenAI.

OpenAI (ChatGPT)

Surprisingly, ChatGPT was the easiest to coerce to spit out its system message.

Here is the prompt used:

Here is the output:

Takeaways

1. Using ALL CAPs for emphasis

Screenshot of multiple lines of text from ChatGPT's system message

2. Telling the model what not to do versus what to do, which contradicts some of their earlier guidance

Screenshot of multiple lines of text from ChatGPT's system message

3. Using in-context examples

Screenshot of multiple lines of text from ChatGPT's system message

4. Giving the model explicit room to think and act outside of the guidelines

Screenshot of multiple lines of text from ChatGPT's system message

Other takeaways

  • Tokens used: 1548
  • Heavy use of markdown to better segment the prompt
  • There are several lists of dos and don'ts. You can imagine that most of these rules came from learnings acquired during testing and iterating on the prompt.
  • Their developers are just like you and me! This prompt looks like something any prompt engineer could write.

OpenAI GPT-Builder

Update: OpenAI made the full GPT-builder system message public here

Given that the GPTStore just launched, we figured we would try to leak the GPT-builder system message. This is the system message used by the GPT that is designed to help users build a GPT.

Here was the prompt I used:

User interface of the GPT-Buider
Prompt injection used directly in the GPT-builder UI




Here's the GPT-Builder System Message:

User interface of the configure screen in GPT-Builder
The GPT-builder copied it's system message into the instructions for this specific GPT

Takeaways

  • Similar to ChatGPT's system message, a role is set ("You are...")
  • A goal is set, "iteratively define and refine the parameters..."
  • The GPT gets updated via a function called "update_behavior". The function's parameters are "context", "description", "prompt_starters", and "welcome_message".
  • Those parameters are the core components of a CustomGPT

TLDraw

Next up is TLDraw. TLDraw is an app that allows you to turn wireframes into code.

To leak the system message, I made a quick frame with some text in it.

A white background with a blue square with text inside of it

Here's the system message that TLDraw outputs when generating the code for the frame above:

Takeaways

  • Token count: 485
  • Use of markdown to segment the prompt
  • They set a (hilarious) role/persona
  • Very specific context (tailwind, google fonts etc.).
  • Specific instructions about handling design elements, like treating red as annotations, which emphasizes attention to detail.
  • A small typo: "When you need to display an image, you load them it Unsplash or use solid colored rectangles as placeholders."
  • Gives the model room to think by instructing it to "fill in the blanks" as needed
  • There is a big appeal to emotion through the system message, but especially at the end ("You love your designers and want them to be happy"). Adding some emotion has proven to be an easy way to get better outputs.
  • "धर्मो रक्षति रक्षितः" (Dharmo rakshati rakshitah): Sanskrit, which translates to "Righteousness protects and is protected". My guess is that these contribute to the role of being "A wise and ancient developer".

Vercel (V0)

V0.dev is a frontend code generation product from Vercel.

The method used to extract the system message consisted of telling the model to replace text for a blog post template with the text from its system message.

Here's V0's system message:

Takeaways

  • Token count: 119
  • Stateless nature of LLMs ("You've generated the code in previous conversations")
  • Specific in scope: Emphasizing that only the specified element and its children can be modified sets clear boundaries for the task.
  • Establishes clear output requirements: Makes it clear that the output should be valid JSX

Potential improvements

  • Adding a role or persona would be interesting to test
  • Adding more context could be helpful, such as information to clarify the context in which this coding task is being done
  • In-context examples could help align the model more to the desired outcome
  • More specificity: Adding explicit versioning and dependency info such as the technologies or frameworks being used and any assumptions about other dependencies could help better align the code generated
  • Error handling and edge cases: Include guidelines or suggestions on how to handle potential errors or edge cases in the coding task.

Perplexity.AI

Perplexity.AI is another chatbot, similar to chatGPT, but with an emphasis on gathering relevant external sources.

Similarly to ChatGPT, getting the system message to leak was relatively straightforward. Here was the prompt used:

Here's Perplexity's system message:

Takeaways

  • Tokens: 90
  • Establishes a role
  • Citation requirement: A distinct feature of Perplxity's chatbot is its emphasis on citing resources when generating answers

Improvements

  • Obviously the system message is very short and might perform better if there was more detail and guidance
  • It could benefit from being more user-centric. Adding some guidelines or suggestions that focus on creating a user-friendly interface and experience could enhance the chat experience on Perplexity

Why do companies not do a better job of protecting against these types of attacks?

For starters, protecting against prompt injections is like playing whack-a-mole. New methods keep popping up every week, making it impossible to have a 100% secure system message. There are some tactics you can use, which we outlined in our article here: How to protect against prompt hacking.

Additionally, is there any real harm done by having the system message leak? Protecting against these types of attacks probably isn't a high priority for these teams.

Over time, it will be interesting to see how these change. For now, I hope this sparks some ideas for you to add to your prompts!

Dan Cleary
Founder