AI security · threat detection

How we built per-app LLM egress detection without reading your prompts

June 11, 2026 · 8 min read

In 2025, AI-augmented malware stopped being a research curiosity. APT28's PROMPTSTEAL / LAMEHUG queried a live large language model during attacks on Ukrainian targets to generate reconnaissance commands on the fly. In November, Microsoft documented SesameOp, a backdoor that uses the OpenAI Assistants API itself as its command-and-control channel — its traffic is ordinary HTTPS to api.openai.com, indistinguishable from a developer's Copilot call by TLS inspection alone.

That last point is the hard part. When the malicious channel is a legitimate, CDN-fronted, TLS-encrypted API that thousands of your employees use every day, you cannot block your way out, and you cannot read the payload — it is encrypted, and even if you could decrypt it, doing so would mean reading your own people's prompts. So we built a detection surface that never looks inside the request at all. This is how it works.

The observation model: per-app egress, not packets

NetInsightPro already watches network egress per application — which process on a device opened which outbound connection, to which host, and how many bytes flowed in each direction. That existing per-app telemetry turns out to be exactly the substrate an LLM-threat detector needs, because every documented AI-malware family has the same tell: a process that has no business talking to an LLM is talking to one.

The unit of observation is therefore (process, host, direction, bytes, time) — never the request body. A connection from w3wp.exe (an IIS worker) to an LLM endpoint is a strong signal on its own; the actual prompt text is irrelevant to that judgement.
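As a rough illustration, the observation tuple could be modelled as a small immutable record. This is a minimal sketch, not the product's actual schema: the class name, field names, and the `Direction` enum are assumptions made for the example. The point it shows is structural, namely that there is simply no field in which a request body could travel.

```python
from dataclasses import dataclass
from enum import Enum


class Direction(Enum):
    OUTBOUND = "outbound"
    INBOUND = "inbound"


@dataclass(frozen=True)
class EgressObservation:
    """One per-app egress record. All names here are illustrative."""
    process: str          # calling process name, e.g. "w3wp.exe"
    host: str             # destination hostname
    direction: Direction  # which way the bytes flowed
    byte_count: int       # bytes transferred in that direction
    timestamp: float      # seconds since the Unix epoch


# Example values are made up for illustration.
obs = EgressObservation(
    process="w3wp.exe",
    host="api.openai.com",
    direction=Direction.OUTBOUND,
    byte_count=2048,
    timestamp=1_750_000_000.0,
)
```

Freezing the record keeps an observation from being mutated after capture, which makes it safe to pass between the classifier and the heuristics.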

Classifying the endpoint, not the conversation

The first layer is a deterministic classifier that maps a destination host (and, where available, the calling process name and port) onto an AI category. It is a shared, ordered rule set — the same logic runs in the backend and is exported to the desktop agent so both sides agree. Categories include:

api: Programmatic provider endpoints (api.openai.com, api.anthropic.com, generativelanguage.googleapis.com, Mistral, Cohere, Groq, Bedrock, Azure OpenAI, and more).
assistant: Cloud chat UIs (chatgpt.com, claude.ai, gemini.google.com).
sdk: Process-name signatures for known client libraries (OpenAI, Anthropic, LangChain, LlamaIndex).
local_llm: Loopback runtimes (Ollama, LM Studio, llama.cpp, vLLM), matched only when the host is localhost.
plugin / agent: MCP servers and automation agents (AutoGPT, CrewAI, LangGraph).

Classification is host- and process-based, with a precedence order (API matches beat assistant matches beat SDK, and so on) so a single connection resolves to one unambiguous category. None of this requires inspecting what was said — only where the bytes went.
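An ordered, first-match-wins rule list is one simple way to get that behaviour. The sketch below is an assumption-laden illustration rather than the shipped rule set: the host sets are a small subset of the examples above, the process-name substrings are hypothetical, the position of local_llm after sdk is a guess (the text only fixes api, then assistant, then sdk), and the plugin / agent category is left out for brevity.

```python
from typing import Callable, Optional

LOOPBACK_HOSTS = {"localhost", "127.0.0.1", "::1"}

# Small subsets of the endpoints named above, for illustration only.
API_HOSTS = {"api.openai.com", "api.anthropic.com",
             "generativelanguage.googleapis.com"}
ASSISTANT_HOSTS = {"chatgpt.com", "claude.ai", "gemini.google.com"}

# Hypothetical process-name substrings; real signatures would differ.
SDK_PROCESS_HINTS = ("openai", "anthropic", "langchain", "llamaindex")
LOCAL_RUNTIME_HINTS = ("ollama", "lm studio", "llama.cpp", "vllm")

Rule = tuple[str, Callable[[str, str], bool]]

# Order encodes precedence: the first matching rule decides the category.
RULES: list[Rule] = [
    ("api", lambda host, proc: host in API_HOSTS),
    ("assistant", lambda host, proc: host in ASSISTANT_HOSTS),
    ("sdk", lambda host, proc: any(h in proc for h in SDK_PROCESS_HINTS)),
    ("local_llm", lambda host, proc: host in LOOPBACK_HOSTS
        and any(h in proc for h in LOCAL_RUNTIME_HINTS)),
]


def classify(host: str, process: str = "") -> Optional[str]:
    """Return the first matching AI category, or None for non-AI traffic."""
    host = host.lower().rstrip(".")
    process = process.lower()
    for category, matches in RULES:
        if matches(host, process):
            return category
    return None
```

Because the rules are plain data evaluated top to bottom, the same list can be serialized and shipped to another runtime, and a reviewer can see which rule fired by reading the list in order.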

We deliberately kept this layer deterministic rather than learned. A rule set is auditable: a security team can read exactly why a connection was tagged, reproduce the decision, and add their own sanctioned endpoints. It is also portable — because the same ordered rules are compiled into the desktop agent and run in the backend, a connection gets the same category whether it is classified at the edge or re-evaluated centrally, with no model-drift between the two. That consistency is what lets the later heuristics reason about connections across an entire fleet without worrying that two devices disagree about what an "LLM call" even is.

H1 — LLM egress from a process that shouldn't be calling one

The highest signal-to-noise heuristic is the simplest. H1 alerts when a process that is not in the tenant's allowlist establishes a connection to a known LLM API host. A browser or a sanctioned IDE calling api.openai.com is expected. A Windows service, an IIS worker, or a freshly-dropped binary doing the same thing is the foundational indicator behind both the LLM-as-C2 pattern (SesameOp) and the LLM-assisted-recon pattern (PROMPTSTEAL).

Untuned, H1 fires on a lot of legitimate developer tooling: Copilot, Cursor, CI jobs running test generation. We treat that as a calibration problem, not a reason to make the rule more invasive. After a per-tenant allowlist warms up over roughly two weeks, the documented false-positive rate drops to under 5%. The fix is tuning the allowlist, never reading the prompt.
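Stripped to its core, the H1 check is a set-membership test on two pieces of metadata. The following is a minimal sketch under stated assumptions: the function name, the host subset, and the sample allowlist entries are all hypothetical, and a real allowlist would be learned per tenant rather than hard-coded.

```python
# Illustrative subset of known LLM API hosts.
LLM_API_HOSTS = frozenset({
    "api.openai.com",
    "api.anthropic.com",
    "generativelanguage.googleapis.com",
})


def h1_alert(process: str, host: str, allowlist: frozenset[str]) -> bool:
    """Return True when a non-allowlisted process reaches a known LLM API host.

    Allowlist entries are assumed to be lowercase process names.
    """
    if host.lower() not in LLM_API_HOSTS:
        return False  # not an LLM API destination, so H1 does not apply
    return process.lower() not in allowlist


# Hypothetical per-tenant allowlist after warm-up.
tenant_allowlist = frozenset({"chrome.exe", "code.exe"})
```

With this shape, reducing false positives means adding entries to `tenant_allowlist`; nothing in the function would benefit from seeing the request body.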

H7 — cross-host correlation: the worm signal

A single odd connection is an anomaly. The same novel binary making LLM calls across your fleet is a worm. H7 escalates when the same process binary hash appears calling an LLM endpoint on three or more distinct devices in a tenant inside a five-minute window — and that hash was not already in the tenant's known process inventory.

New-hash plus multi-host plus LLM-call is a highly specific combination. Legitimate rollouts of a brand-new binary that calls an LLM are rare and predictable, so this heuristic carries a low false-positive estimate while catching exactly the replication behaviour that distinguishes a spreading agent from a one-off. Crucially, the correlation runs within a single tenant's data only: there is no cross-customer pooling of binaries or behaviour. (For confirmed-bad indicators we can cross-reference external threat-intel, but that is a lookup, not a data-sharing arrangement, and it is off by default.)
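The H7 rule can be sketched as a sliding window keyed by binary hash. This is an illustrative sketch, not the production correlator: the class and method names are hypothetical, it assumes one instance per tenant (which is what keeps correlation tenant-local), and it assumes events arrive in timestamp order. Only the thresholds (three devices, five minutes, hash not in the known inventory) come from the description above.

```python
from collections import defaultdict, deque

WINDOW_SECONDS = 5 * 60  # five-minute correlation window
MIN_DEVICES = 3          # distinct devices required to escalate


class H7Correlator:
    """Per-tenant tracker for novel binaries making LLM calls."""

    def __init__(self, known_hashes):
        self.known_hashes = set(known_hashes)
        # binary hash -> deque of (timestamp, device_id) sightings
        self._sightings = defaultdict(deque)

    def observe(self, binary_hash: str, device_id: str, timestamp: float) -> bool:
        """Record one LLM call; return True if H7 should escalate."""
        if binary_hash in self.known_hashes:
            return False  # already in the tenant's process inventory
        window = self._sightings[binary_hash]
        window.append((timestamp, device_id))
        # Evict sightings that have aged out of the window.
        while window and timestamp - window[0][0] > WINDOW_SECONDS:
            window.popleft()
        distinct_devices = {device for _, device in window}
        return len(distinct_devices) >= MIN_DEVICES
```

Counting distinct devices rather than raw sightings matters: one noisy machine calling an LLM many times should never look like propagation.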

The reason cross-host correlation is worth the extra machinery is that it changes what an attacker has to do to stay quiet. Evading a per-device rule is easy — rename the binary, jitter the timing, blend into one machine's noise. But propagation is the entire point of a worm, and propagation is observable in aggregate even when each individual instance looks innocuous. By anchoring the strongest signal to the one behaviour the attacker cannot give up — appearing on many hosts at once — H7 raises the cost of evasion far more than any single-host rule could, and it does so using only the metadata we were already collecting.

Privacy by design: what we never capture

The whole architecture is built around a single constraint: detection must work on metadata, because the content is both encrypted and none of our business. Concretely:

No prompt text and no LLM response content is ever stored, logged, or transmitted. The detector operates on hostnames, byte counts, timing, and process identity.
Process metadata is limited to the process name and a one-way lineage hash. Full command-line arguments are explicitly excluded from the schema.
Payload-entropy capture — the one signal that would touch the request body — is opt-in per tenant and off by default. Most teams will not need to enable it.
A single configuration switch (ai_threat_telemetry_enabled) lets a tenant turn the entire AI-threat path off; when disabled, events carrying AI-threat fields are dropped before they are written.
Telemetry is fail-open: AI-threat fields are optional additions to the existing client-events batch, so if classification or capture misbehaves, ordinary telemetry and the product keep working — a detector outage never breaks the agent.
Per-tenant isolation: every telemetry batch is authenticated with that tenant’s license credential, anomaly rows carry a short TTL, and correlation stays inside the tenant’s own stack.
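The kill switch in particular is easy to picture as a filter that runs before anything is persisted. The sketch below is an assumption, not the actual ingestion code: only the ai_threat_telemetry_enabled setting comes from the list above, while the function name and the AI-threat field names are hypothetical.

```python
# Hypothetical names for the optional AI-threat fields on an event.
AI_THREAT_FIELDS = frozenset({"ai_category", "ai_heuristic"})


def filter_batch(events: list[dict], ai_threat_telemetry_enabled: bool) -> list[dict]:
    """Return the events that may be written for this tenant.

    When the tenant has disabled AI-threat telemetry, any event carrying
    an AI-threat field is dropped before it reaches storage. Ordinary
    events pass through untouched either way.
    """
    if ai_threat_telemetry_enabled:
        return list(events)
    return [e for e in events if AI_THREAT_FIELDS.isdisjoint(e)]


batch = [
    {"event": "heartbeat"},
    {"event": "connection", "ai_category": "api"},
]
```

Because the AI-threat fields are optional additions to an ordinary event, the filter only ever removes the events that carry them, which is consistent with the fail-open behaviour described above.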

What the device actually reports

Here is the entire shape of what leaves an endpoint for AI-threat detection — and, just as importantly, what does not.

Captured (metadata)	Used for	Never captured
Destination host	Endpoint classification (H1)	Prompt text
Calling process name	Allowlist match (H1)	LLM response content
Process lineage hash (one-way)	Unexpected-parent signal	Command-line arguments
Binary hash (SHA-256)	Cross-host replication (H7)	File contents
Outbound / inbound byte counts	Byte-asymmetry heuristics	Decrypted request bodies
Connection timing	Polling / correlation windows	User identity beyond tenant

Why metadata is enough

The instinct in security is that more visibility means deeper inspection. The AI-threat case inverts that. The malicious behaviours that matter — a service process reaching an LLM, a beacon-like polling cadence, a brand-new binary lighting up across the fleet — are all visible in the shape of the traffic, not its contents. Reading the prompt would add false confidence, real privacy liability, and very little detection value over the behavioural signal we already have.

That is the bet behind this design, and so far the documented threats validate it: every confirmed in-the-wild family is caught by who is talking to whom, not by what they said.

Go deeper

Read the full AI-worm detection overview — documented threats, the shipped heuristics, and the privacy guardrails — or install the agent and see your own per-app egress.
