Hollow Pentesting
Confidently using AI in your pentests.
Using AI assistance to get faster security assurance.
LLMs are useful during pentesting. They parse configs, spot misconfigurations, generate PoC code, and help with reporting. None of that is in dispute.
What should be in dispute is where client data goes when you paste it into a cloud-hosted AI. Depending on the provider, the tier, and how the account is set up, that data may be retained for 7 days to 5 years, read by provider employees, or fed into a training dataset. No major certification body — CREST, OffSec, CHECK, TIGER — has published guidance on this.
Hollow Pentesting is the practice of stripping all client-identifiable information from data before it touches an AI system, while preserving the technical structure that makes the analysis work. This paper covers the methodology, a provider data risk matrix, and a tiered decision framework.
Provider Data Risk Matrix
Provider policies differ by tier and have changed significantly during 2025–2026. Every cell below is sourced; references follow the table.
| Risk Factor | OpenAI (ChatGPT Free/Plus/Pro) | OpenAI (API / Enterprise) | Anthropic (Claude Free/Pro/Max) | Anthropic (API / Commercial) | Google (Gemini Consumer) | Google (Gemini API Paid / Workspace) |
|---|---|---|---|---|---|---|
| Training on input by default | No (user-configurable) | No | Yes, unless opted out (since Sept 2025) | No | Yes on free tier | No |
| Opt-out available | Yes | N/A — off by default | Yes, via Privacy Settings | N/A — excluded | Yes, but kills chat history | N/A — excluded |
| Default retention | 30 days | 30 days; ZDR on approval | 30 days; 5 years if training opted in | 7 days (reduced Sept 2025) | Up to 18 months; 3 years if human-reviewed | Admin-configurable |
| ZDR available | Enterprise only | Yes, eligible endpoints | No (consumer) | Yes, via addendum | No | Admin-configurable |
| Human review possible | Yes | Excluded under ZDR | Yes (consumer) | Excluded | Yes; reviewed copies kept up to 3 years | Only with org consent |
| Risk of data entering training | Low if configured | Very low with ZDR | HIGH if not opted out | Very low | HIGH on free tier | Very low |
| Suitable for raw client data | No | Conditional (ZDR + DPA) | No | Conditional (ZDR + DPA) | No | Conditional (DPA + admin controls) |
Sources (all verified March 2026):
- OpenAI data controls — developers.openai.com/api/docs/guides/your-data
- OpenAI / NYT litigation retention — openai.com/index/response-to-nyt-data-demands
- Anthropic consumer terms (Aug 2025) — anthropic.com/news/updates-to-our-consumer-terms
- Anthropic API retention — char.com/blog/anthropic-data-retention-policy
- Anthropic privacy centre — privacy.claude.com/en/articles/10023548-how-long-do-you-store-my-data
- Google Gemini API abuse policy — ai.google.dev/gemini-api/docs/usage-policies
- Google Gemini API terms — ai.google.dev/gemini-api/terms
- Google Gemini consumer retention — char.com/blog/google-gemini-data-retention-policy
- Google Cloud Gemini governance — docs.cloud.google.com/gemini/docs/discover/data-governance
Consumer tiers are unsuitable for client data under any configuration. API and enterprise tiers are conditionally acceptable with ZDR and a DPA. Provider terms are not stable — Anthropic flipped from no-training to opt-out in a single update in August 2025, and OpenAI had its policies overridden by a court order for months. Treat any assessment as point-in-time.
The Methodology
Hollow Pentesting strips client identity from prompts while keeping the technical structure intact. The AI gets the skeleton of the problem — the misconfiguration pattern, the topology, the vulnerability condition — but nothing that maps back to the client.
Four principles:
Data minimisation — only include what the AI needs for the specific question. Everything else stays out.
Identity elimination — replace or remove hostnames, domains, IPs, subnets, usernames, service accounts, OU names, group names, policy names, geographic references, ticket numbers, and human-readable comments.
Structural preservation — replacements must maintain the relationships in the original data. If two subnets route to each other and that matters, the synthetic versions must too.
Reversible mapping — keep a local encrypted mapping table that lets you translate AI output back to real assets for the report.
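The mapping idea can be sketched in a few lines of Python. The names and values below are illustrative, and encryption of the table at rest is left to whatever your tooling already provides (an encrypted volume, for instance):

```python
class HollowMap:
    """Bidirectional map between real identifiers and synthetic stand-ins."""

    def __init__(self):
        self.forward = {}   # real -> synthetic
        self.reverse = {}   # synthetic -> real

    def add(self, real, synthetic):
        self.forward[real] = synthetic
        self.reverse[synthetic] = real

    def hollow(self, text):
        # Longest identifiers first, so an FQDN is rewritten before its bare domain
        for real in sorted(self.forward, key=len, reverse=True):
            text = text.replace(real, self.forward[real])
        return text

    def restore(self, text):
        # Translate AI output back to real asset names for the report
        for synth in sorted(self.reverse, key=len, reverse=True):
            text = text.replace(synth, self.reverse[synth])
        return text


m = HollowMap()
m.add("acmefinancial.local", "lab.corp")
m.add("svc_sqlprod", "svc_app01")

prompt = m.hollow("Kerberoastable account svc_sqlprod in acmefinancial.local")
report = m.restore("svc_app01 in lab.corp has an SPN and no pre-auth hardening")
```

`hollow()` produces what you paste into the AI; `restore()` runs over the AI's output before it goes anywhere near the report template.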
Example
Original — not suitable for AI submission:
Source zone: DMZ-CLIENT-PAYMENT-PROCESSING
Destination: 10.45.12.0/24 (internal DB subnet)
Hostname: PA-FW-01-LONDON-DC2
Comments: "Added by J.Smith, REF: INC-2024-4471"
Hollowed — suitable for AI submission:
Source zone: DMZ-ZONE-A
Destination: 10.0.1.0/24
Hostname: FW-PRIMARY
Comments: [removed]
Rule permissiveness, zone policy, ordering — intact. Client identity — gone.
Synthetic Configs
When anonymised data is still structurally distinctive enough to fingerprint the environment, go further: build an entirely synthetic config that reproduces only the vulnerability condition. Feed the synthetic version to the AI. Apply the analysis to the real environment locally.
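IP space is the easiest place to break structure. A sketch using Python's standard `ipaddress` module that renumbers addresses into synthetic subnets while preserving host offsets, so relationships between hosts survive hollowing (the subnet pairs are illustrative):

```python
import ipaddress

# Real -> synthetic subnets; same prefix length, so host offsets carry over
SUBNET_MAP = {
    ipaddress.ip_network("10.45.12.0/24"): ipaddress.ip_network("10.0.1.0/24"),
    ipaddress.ip_network("10.45.13.0/24"): ipaddress.ip_network("10.0.2.0/24"),
}

def hollow_ip(addr_str):
    addr = ipaddress.ip_address(addr_str)
    for real, synth in SUBNET_MAP.items():
        if addr in real:
            # Keep the host's offset within its subnet: .7 stays .7,
            # so "same host" and "adjacent host" remain visible to the AI
            offset = int(addr) - int(real.network_address)
            return str(ipaddress.ip_address(int(synth.network_address) + offset))
    # Refuse rather than leak an unmapped client address
    raise ValueError(f"no synthetic mapping for {addr_str}")
```

Raising on unmapped addresses is deliberate: a silent pass-through is exactly how a real IP ends up in a cloud prompt.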
AD Mapping Table
| Data Element | Real Value | Synthetic Replacement |
|---|---|---|
| Domain | acmefinancial.local | lab.corp |
| Domain Controller | DC01-LON.acmefinancial.local | DC01.lab.corp |
| Service Account | svc_sqlprod | svc_app01 |
| Security Group | UK-Finance-DBA-Admins | GROUP-A-ADMINS |
| GPO | YOURCO-Workstation-BitLocker-Policy | GPO-ENDPOINT-001 |
| OU | OU=London,OU=Finance,DC=acmefinancial,DC=local | OU=Site-A,OU=Dept-1,DC=lab,DC=corp |
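Applying a table like this is mostly about ordering and verification. A sketch: replace longest identifiers first (so the DC FQDN is rewritten before the bare domain it contains), then run a leak check that refuses to emit anything still containing a real value:

```python
# Mapping table from the paper
AD_MAP = {
    "OU=London,OU=Finance,DC=acmefinancial,DC=local": "OU=Site-A,OU=Dept-1,DC=lab,DC=corp",
    "YOURCO-Workstation-BitLocker-Policy": "GPO-ENDPOINT-001",
    "DC01-LON.acmefinancial.local": "DC01.lab.corp",
    "UK-Finance-DBA-Admins": "GROUP-A-ADMINS",
    "acmefinancial.local": "lab.corp",
    "svc_sqlprod": "svc_app01",
}

def hollow_ad(text):
    # Longest-first: replacing "acmefinancial.local" before the DC FQDN
    # would leave the mangled fragment "DC01-LON.lab.corp" pointing at DC2
    for real in sorted(AD_MAP, key=len, reverse=True):
        text = text.replace(real, AD_MAP[real])
    # Final leak check: never submit text that still contains a real value
    leaked = [real for real in AD_MAP if real in text]
    if leaked:
        raise ValueError(f"client identifiers survived hollowing: {leaked}")
    return text
```

In practice you would extend the leak check beyond the table itself (regexes for the client's domain pattern, ticket-number formats, staff initials), since the table only knows about identifiers you remembered to map.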
Local Inference
Running LLMs on your own hardware means nothing leaves your network. Ollama, llama.cpp, and vLLM can serve models such as Qwen 2.5, DeepSeek-R1, Llama 3, and Mistral on GPUs with 24 GB+ of VRAM; dedicated inference hardware handles larger models with longer context.
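A minimal sketch of local-only querying against Ollama's default HTTP API, using only the standard library. The model tag is an assumption; substitute whatever you have pulled locally:

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"  # Ollama's default endpoint

def build_request(prompt, model="qwen2.5:14b"):
    # Only the hollowed prompt goes over the wire (to localhost);
    # the mapping table never enters the request
    return {"model": model, "prompt": prompt, "stream": False}

def ask_local(prompt, model="qwen2.5:14b"):
    body = json.dumps(build_request(prompt, model)).encode()
    req = urllib.request.Request(
        OLLAMA_URL, data=body, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["response"]
```

Call it as `ask_local(hollowed_prompt)`; combined with the hollowing step, the only network hop is loopback.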
Tradeoffs exist. Local models are less capable on complex reasoning. Quesma's October 2025 research showed smaller models comply with prompt injection at rates up to 95% — relevant when your LLM processes content from untrusted sources during an engagement. Validate outputs carefully.
Local inference combined with hollowed data is the highest-assurance option available.
Recommendations
For pentesters: document a data classification policy for AI usage. Maintain per-engagement mapping tables, encrypted and local. Verify provider terms before each engagement. Disclose AI usage to clients at scoping. Put it in the engagement letter. Prefer local inference where you have the hardware.
For organisations commissioning tests: ask about AI tool usage in vendor assessment. Ask which providers, which tiers, which controls. Consider contractual provisions for AI-assisted analysis. Check compatibility with PCI DSS, UK GDPR, NIS2 obligations.
For the industry: no major certification body has guidance on this. That gap is being filled by practitioners making it up as they go. It should not stay that way.