Hollow Pentesting


Confidently using AI assistance in your pentests to get faster security assurance.

LLMs are useful during pentesting. They parse configs, spot misconfigurations, generate PoC code, and help with reporting. None of that is in dispute.

What should be in dispute is where client data goes when you paste it into a cloud-hosted AI. Depending on the provider, the tier, and how the account is set up, that data may be retained for anywhere between seven days and five years, read by provider employees, or fed into a training dataset. No major certification body — CREST, OffSec, CHECK, TIGER — has published guidance on this.

Hollow Pentesting is the practice of stripping all client-identifiable information from data before it touches an AI system, while preserving the technical structure that makes the analysis work. This paper covers the methodology, a provider data risk matrix, and a tiered decision framework.


Provider Data Risk Matrix

Provider policies differ by tier and have changed significantly during 2025–2026. Every cell below is sourced; references follow the table.

OpenAI (ChatGPT Free/Plus/Pro)
  Training on input by default:   No (user-configurable)
  Opt-out available:              Yes
  Default retention:              30 days
  ZDR available:                  Enterprise only
  Human review possible:          Yes
  Risk of data entering training: Low if configured
  Suitable for raw client data:   No

OpenAI (API / Enterprise)
  Training on input by default:   No
  Opt-out available:              N/A (off by default)
  Default retention:              30 days; ZDR on approval
  ZDR available:                  Yes, eligible endpoints
  Human review possible:          Excluded under ZDR
  Risk of data entering training: Very low with ZDR
  Suitable for raw client data:   Conditional (ZDR + DPA)

Anthropic (Claude Free/Pro/Max)
  Training on input by default:   Yes, unless opted out (since Sept 2025)
  Opt-out available:              Yes, via Privacy Settings
  Default retention:              30 days; 5 years if training opted in
  ZDR available:                  No (consumer)
  Human review possible:          Yes (consumer)
  Risk of data entering training: HIGH if not opted out
  Suitable for raw client data:   No

Anthropic (API / Commercial)
  Training on input by default:   No
  Opt-out available:              N/A (excluded)
  Default retention:              7 days (reduced Sept 2025)
  ZDR available:                  Yes, via addendum
  Human review possible:          Excluded
  Risk of data entering training: Very low
  Suitable for raw client data:   Conditional (ZDR + DPA)

Google (Gemini Consumer)
  Training on input by default:   Yes on free tier
  Opt-out available:              Yes, but kills chat history
  Default retention:              Up to 18 months; 3 years if human-reviewed
  ZDR available:                  No
  Human review possible:          Yes; reviewed copies kept up to 3 years
  Risk of data entering training: HIGH on free tier
  Suitable for raw client data:   No

Google (Gemini API Paid / Workspace)
  Training on input by default:   No
  Opt-out available:              N/A (excluded)
  Default retention:              Admin-configurable
  ZDR available:                  Admin-configurable
  Human review possible:          Only with org consent
  Risk of data entering training: Very low
  Suitable for raw client data:   Conditional (DPA + admin controls)

Sources (all verified March 2026)

  1. OpenAI data controls — developers.openai.com/api/docs/guides/your-data
  2. OpenAI / NYT litigation retention — openai.com/index/response-to-nyt-data-demands
  3. Anthropic consumer terms (Aug 2025) — anthropic.com/news/updates-to-our-consumer-terms
  4. Anthropic API retention — char.com/blog/anthropic-data-retention-policy
  5. Anthropic privacy centre — privacy.claude.com/en/articles/10023548-how-long-do-you-store-my-data
  6. Google Gemini API abuse policy — ai.google.dev/gemini-api/docs/usage-policies
  7. Google Gemini API terms — ai.google.dev/gemini-api/terms
  8. Google Gemini consumer retention — char.com/blog/google-gemini-data-retention-policy
  9. Google Cloud Gemini governance — docs.cloud.google.com/gemini/docs/discover/data-governance

Consumer tiers are unsuitable for client data under any configuration. API and enterprise tiers are conditionally acceptable with ZDR and a DPA in place. Provider terms are not stable: Anthropic flipped consumer accounts from no-training to train-unless-opted-out in a single update in August 2025, and OpenAI had its retention policies overridden by a court order for months. Treat any assessment as point-in-time.


The Methodology

Hollow Pentesting strips client identity from prompts while keeping the technical structure intact. The AI gets the skeleton of the problem — the misconfiguration pattern, the topology, the vulnerability condition — but nothing that maps back to the client.

Four principles:

Data minimisation — only include what the AI needs for the specific question. Everything else stays out.

Identity elimination — replace or remove hostnames, domains, IPs, subnets, usernames, service accounts, OU names, group names, policy names, geographic references, ticket numbers, and human-readable comments.

Structural preservation — replacements must maintain the relationships in the original data. If two subnets route to each other and that matters, the synthetic versions must too.

Reversible mapping — keep a local encrypted mapping table that lets you translate AI output back to real assets for the report.
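A minimal sketch of the reversible-mapping principle. The class name and token format are invented for this paper, and the encrypted-at-rest storage the methodology calls for is not shown:

```python
class HollowMapper:
    """Reversible mapping between real identifiers and synthetic tokens.
    Illustrative sketch only; persist forward/reverse encrypted at rest."""

    def __init__(self):
        self.forward = {}   # real value      -> synthetic token
        self.reverse = {}   # synthetic token -> real value

    def token_for(self, real, prefix):
        # Issue one stable token per real value, e.g. "FW-001".
        if real not in self.forward:
            token = f"{prefix}-{len(self.forward) + 1:03d}"
            self.forward[real] = token
            self.reverse[token] = real
        return self.forward[real]

    def hollow(self, text, identifiers, prefix):
        # Replace every known identifier before the text goes near an AI.
        # Longest-first so one identifier cannot mangle another's substring.
        for real in sorted(identifiers, key=len, reverse=True):
            text = text.replace(real, self.token_for(real, prefix))
        return text

    def rehydrate(self, text):
        # Translate AI output back to real asset names for the report.
        # Naive string replacement; a production tool would tokenise.
        for token, real in self.reverse.items():
            text = text.replace(token, real)
        return text
```

Hollow on the way out, rehydrate on the way back in; the mapping table itself never leaves the analyst's machine.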

Example

Original — not suitable for AI submission:

Source zone:  DMZ-CLIENT-PAYMENT-PROCESSING
Destination:  10.45.12.0/24 (internal DB subnet)
Hostname:     PA-FW-01-LONDON-DC2
Comments:     "Added by J.Smith, REF: INC-2024-4471"

Hollowed — suitable for AI submission:

Source zone:  DMZ-ZONE-A
Destination:  10.0.1.0/24
Hostname:     FW-PRIMARY
Comments:     [removed]

Rule permissiveness, zone policy, ordering — intact. Client identity — gone.
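The transformation above can be mechanised. A sketch, assuming IPv4 /24s are the unit of structure worth preserving and `#` marks comments; function names and the synthetic 10.0.x.0/24 scheme are my own:

```python
import ipaddress
import re

def hollow_ip(addr, subnet_map):
    """Map a real IPv4 address into a synthetic /24, preserving the host
    octet so relationships between addresses survive hollowing."""
    ipaddress.ip_address(addr)  # raises ValueError on a malformed address
    real24 = str(ipaddress.ip_network(f"{addr}/24", strict=False).network_address)
    if real24 not in subnet_map:
        # One synthetic /24 per real /24, allocated in order of appearance.
        subnet_map[real24] = f"10.0.{len(subnet_map)}"
    host_octet = addr.rsplit(".", 1)[1]
    return f"{subnet_map[real24]}.{host_octet}"

def hollow_line(line, subnet_map):
    # Drop trailing comments (ticket refs, author names), then rewrite
    # every dotted-quad via the shared subnet map.
    line = re.sub(r"\s*#.*$", "", line)
    return re.sub(r"\b(?:\d{1,3}\.){3}\d{1,3}\b",
                  lambda m: hollow_ip(m.group(0), subnet_map), line)
```

Because the map is shared across lines, two addresses in the same real subnet stay in the same synthetic subnet, which is exactly the structural preservation the methodology requires.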

Synthetic Configs

When anonymised data is still structurally distinctive enough to fingerprint the environment, go further: build an entirely synthetic config that reproduces only the vulnerability condition. Feed the synthetic version to the AI. Apply the analysis to the real environment locally.

AD Mapping Table

Data Element        Real Value                                       Synthetic Replacement
Domain              acmefinancial.local                              lab.corp
Domain Controller   DC01-LON.acmefinancial.local                     DC01.lab.corp
Service Account     svc_sqlprod                                      svc_app01
Security Group      UK-Finance-DBA-Admins                            GROUP-A-ADMINS
GPO                 YOURCO-Workstation-BitLocker-Policy              GPO-ENDPOINT-001
OU                  OU=London,OU=Finance,DC=acmefinancial,DC=local   OU=Site-A,OU=Dept-1,DC=lab,DC=corp
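Distinguished names can be hollowed component-wise, so nesting depth and attribute types survive intact. A sketch using the table's mappings; the function name is my own:

```python
def hollow_dn(dn, component_map):
    """Rewrite each RDN value via component_map, keeping the OU/DC
    structure (attribute types, nesting order) exactly as it was."""
    parts = []
    for rdn in dn.split(","):
        attr, _, value = rdn.partition("=")
        # Unmapped values pass through unchanged.
        parts.append(f"{attr}={component_map.get(value, value)}")
    return ",".join(parts)
```

The AI sees a structurally identical directory path; only the names have been swapped.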

Local Inference

Running LLMs on your own hardware means nothing leaves your network. Ollama, llama.cpp, and vLLM support deployment of Qwen 2.5, DeepSeek-R1, Llama 3, and Mistral on 24GB+ VRAM GPUs. Dedicated inference hardware handles larger models with longer context.
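As one concrete route, Ollama exposes a local HTTP endpoint (`/api/generate` on port 11434 by default), so a client needs nothing beyond the standard library. A sketch; the model name and helper names are placeholders:

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"  # Ollama's default local endpoint

def build_request(model, prompt):
    # Minimal non-streaming payload for Ollama's generate API.
    return {"model": model, "prompt": prompt, "stream": False}

def ask_local(prompt, model="llama3", url=OLLAMA_URL):
    """POST a hollowed prompt to a locally hosted model. With the server
    bound to localhost, nothing leaves the machine."""
    data = json.dumps(build_request(model, prompt)).encode()
    req = urllib.request.Request(
        url, data=data, headers={"Content-Type": "application/json"})
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["response"]
```

Feed it hollowed input only; local inference removes the provider risk, not the need for the methodology.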

Tradeoffs exist. Local models are less capable on complex reasoning. Quesma's October 2025 research showed smaller models comply with prompt injection at rates up to 95% — relevant when your LLM processes content from untrusted sources during an engagement. Validate outputs carefully.

Local inference combined with hollowed data is the highest-assurance option available.


Recommendations

For pentesters: document a data classification policy for AI usage. Maintain per-engagement mapping tables, encrypted and local. Verify provider terms before each engagement. Disclose AI usage to clients at scoping. Put it in the engagement letter. Prefer local inference where you have the hardware.

For organisations commissioning tests: ask about AI tool usage in vendor assessment. Ask which providers, which tiers, which controls. Consider contractual provisions for AI-assisted analysis. Check compatibility with PCI DSS, UK GDPR, NIS2 obligations.

For the industry: no major certification body has guidance on this. That gap is being filled by practitioners making it up as they go. It should not stay that way.