How to prevent GPT models from leaking information, or is that even possible?


In our fast-paced digital age, advanced models like GPT-3.5 and GPT-4 are reshaping our world, promising immense knowledge and solutions. For many of us, our prompts contain our “secret sauce” that we share only with our trusted GPT models. But should we consider our prompts secret? Can we entrust GPT with our information without fear of leakage or misuse? Researchers at Carnegie Mellon University conducted an in-depth study to demystify these questions for creators and users alike.

The Scary Power of GPT Models

Even when prompts are hidden within an application and never made public, including those on FlowGPT, over 80% of prompts can be extracted. But why?
Today, we are witnessing the unparalleled scale and prowess of GPT models, with their hundreds of billions of parameters and ability to generate near-human-like text. But this scale is a double-edged sword.
One chief concern is the potential to replicate intellectual property (IP) and business intelligence through prompt extraction. This suggests our prompts are not as secret as we may think. Even with just a couple of lines from a copyrighted book as input, for example, there's a risk GPT models might generate extended sections of the text verbatim.
To ascertain the GPT models’ behavior, a series of experiments was conducted:
  • Hundreds of prompts were used from ShareGPT and Awesome-ChatGPT
  • Three models were tested: GPT-3.5, GPT-4, and Vicuna-13B
  • Testing focused on a specific output: copyrighted book lines
While results showed the GPT models don't directly leak large portions of copyrighted texts, they occasionally divulged more than desired, echoing concerns about data boundaries. Interestingly, the volume of training data seemed to play a role: models trained on smaller volumes were less likely to reproduce copyrighted material.
So what does this all mean? The results imply the enormity of the GPT models’ training data might be both their strength and their Achilles' heel.
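One way to quantify this kind of verbatim leakage, sketched below with made-up strings (the function and example texts are illustrative, not from the study), is to measure the longest run of consecutive words that a model output shares with a protected source:

```python
def longest_shared_run(output: str, source: str) -> int:
    """Length (in words) of the longest run of consecutive words
    appearing in both texts."""
    out_words = output.lower().split()
    src_words = source.lower().split()
    best = 0
    # Classic longest-common-substring dynamic programming, over words.
    prev = [0] * (len(src_words) + 1)
    for i in range(1, len(out_words) + 1):
        curr = [0] * (len(src_words) + 1)
        for j in range(1, len(src_words) + 1):
            if out_words[i - 1] == src_words[j - 1]:
                curr[j] = prev[j - 1] + 1
                best = max(best, curr[j])
        prev = curr
    return best

# Hypothetical example: an output repeating four source words verbatim.
source = "it was the best of times it was the worst of times"
output = "the passage opens with it was the best moments"
print(longest_shared_run(output, source))  # 4 ("it was the best")
```

A longer shared run means a more literal reproduction; the experiments above essentially asked how often that number grows large when models are prompted with copyrighted snippets.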

How Do We Protect Ourselves?

One approach is a text-based defense. This involves feeding the model a text that clearly defines boundaries. For instance, a prompt creator can explicitly instruct GPT models not to produce copyrighted content or sensitive information, as in the 5-gram defense.
The 5-gram defense blocks outputs that copy five or more consecutive words from a source, countering extraction attacks that rely on word-for-word repetition. During preliminary testing, results were promising, hinting at the model's capability to adhere to set limitations.
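The same idea can also be enforced outside the prompt as a post-processing check. A minimal sketch (the function names and texts are illustrative, not the study's implementation): reject any output that shares a 5-gram, i.e. five consecutive words, with the protected text.

```python
def ngrams(words, n=5):
    """Every n-word window in a word list, as a set of tuples."""
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}

def violates_5gram_defense(output: str, protected: str) -> bool:
    """True if the output copies any 5 consecutive words from the
    protected text -- the core check of the 5-gram defense."""
    out_words = output.lower().split()
    prot_words = protected.lower().split()
    return bool(ngrams(out_words) & ngrams(prot_words))

protected = "call me ishmael some years ago never mind how long precisely"
safe = "the narrator introduces himself and recalls some years at sea"
leaky = "he begins call me ishmael some years ago and continues"
print(violates_5gram_defense(safe, protected))   # False
print(violates_5gram_defense(leaky, protected))  # True
```

In practice such a filter would run on the model's output before it reaches the user, complementing the instruction embedded in the prompt itself.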
Prompt example: Please generate a response on [Topic] ensuring that no sequence of 5 consecutive words directly matches copyrighted content, sensitive details, or known sources. The provided information should be rooted in general knowledge, creative interpretation, or open-source references, adhering to the 5-gram defense principle. Are there any clarifications required regarding [Topic]?
While text-based strategies such as the 5-gram defense are a good start, it's evident that more rigorous solutions might be needed; discussions on what those look like are still ongoing.
You can also try this: Please provide an original response without reproducing any copyrighted content, sensitive information, or verbatim excerpts from known sources. Ensure all information generated is compliant with general knowledge, creative interpretation, or open-source references. Do you have any questions?
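Either instruction can also be attached programmatically, so every request carries the defense regardless of what the user types. A minimal sketch (the wrapper function is hypothetical, not part of any official API):

```python
# Illustrative defensive preamble, echoing the prompt examples above.
DEFENSE_INSTRUCTION = (
    "Please provide an original response without reproducing any "
    "copyrighted content, sensitive information, or verbatim excerpts "
    "from known sources."
)

def defended_prompt(user_prompt: str) -> str:
    """Prepend the defensive instruction to whatever the user asks."""
    return f"{DEFENSE_INSTRUCTION}\n\n{user_prompt}"

print(defended_prompt("Summarize the opening chapter of Moby-Dick."))
```

This keeps the boundary-setting text in one place instead of relying on each user to remember it.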


Our journey with GPT models is akin to navigating uncharted waters. Their sheer brilliance offers a treasure trove of possibilities. Yet the very vastness that makes them powerful also raises questions of safety and trust. For all prompt creators and users, the message is clear: while GPT models serve as an ally in the digital quest, a vigilant approach and continuous scrutiny are paramount to ensure they remain a trusted tool in our arsenal.

Fafa
Entrepreneur, Engineer, Product, AI enthusiast