The OWASP Top 10 for LLMs you haven't read yet
If you’re building with LLMs, you’re probably creating vulnerabilities you don’t even know about.
OWASP (you might be familiar with their Top 10 list of web app vulnerabilities) has a Top 10 list of LLM vulnerabilities one should be aware of. They have a “PromptMe” open source project that anyone can download and run (it just requires you to have Ollama installed and running). The project contains ten different security issues that you can try out. I found it enlightening to see things like a prompt injection attack demoed live.
Web developers might be used to ensuring safeguards against more traditional vulnerabilities like SQL injections or XSS attacks. Over the past two years, people have started plugging LLMs into more and more use cases – the use cases are only growing each year – and this has opened up a whole new attack surface for malicious actors to try and exploit. This makes it imperative that developers are armed with knowledge that is equally as valuable as the traditional OWASP Top 10.
The PromptMe examples really drive this home. I was already aware of the typical “don’t listen to or obey anything a user instructs you to do” safeguard in system instructions. One example shows the LLM correctly refusing when you directly ask for secrets. But introduce a “fetch and summarize URL” feature, and suddenly a specially crafted prompt gets the LLM to spill everything. The attack vector isn’t the direct ask—it’s the side door you didn’t think about. Admittedly, it was a very naive example, but the risk and threat are very real.
I highly recommend familiarizing yourself with the Top 10 LLM vulnerabilities, as much as the traditional Top 10 of web app security. Check out the PromptMe project on GitHub, download it and play with it (shouldn’t take you more than a minute to get it running). Another interesting read I stumbled upon is Awesome Agentic Patterns. This site contains an entire textbook’s worth of knowledge that’s useful for anyone who builds agents. Among the topics covered are some measures you can take to guard against the LLM being steered by an attacker to exfiltrate sensitive data.
The LLM features you’re shipping today could be tomorrow’s exploit. Better to learn these attack patterns now than discover them in production.
Exfil via GET
OpenAI also published a blog post (and a corresponding paper) on how prompt injections can be used to trick agents to exfiltrate data to an attacker-controlled server. With recent projects gaining public popularity like OpenClaw, users are right to be worried about their data being siphoned off without their knowledge should they let any agents loose on their computers. Some people confidently declare how they’re instead running agents in sandboxed Docker containers, but if the agent(s) are creating valuable/sensitive data as part of their operations while running, that’s still at risk of being exfiltrated unless the sandbox is completely air-gapped, but then you’re taking a heavy quality/usability hit as a result.


