AI Compliance Checklist: GDPR, HIPAA, SOC 2, and Data Retention for LLM Apps

SCR Security Research Team
May 8, 2026

Compliance Problems Usually Start in the Prompt Pipeline

Teams often ask whether their LLM provider is compliant. That is the wrong first question.

The harder question is whether the application itself handles prompts, outputs, logs, attachments, and training settings in a way that matches your legal and contractual obligations.

Most AI compliance failures come from ordinary engineering decisions:

  • logging too much
  • keeping data too long
  • sending regulated content to the wrong processor
  • failing to support deletion and access requests
  • allowing support or analytics tools to copy AI interactions into extra systems

Compliance for AI applications is mostly a data governance and system design problem.


Start With a Simple Data Flow Map

Before looking at frameworks, map the full path of AI data:

  1. user input and uploaded files
  2. prompt assembly service
  3. retrieval layer and knowledge sources
  4. model provider or self-hosted inference service
  5. output storage and audit logs
  6. support, analytics, and observability tooling

If you cannot describe where the data goes, you cannot honestly claim the system is under control.
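One way to keep this map honest is to express it as reviewable code. The sketch below is illustrative: the store names, fields, and retention values are assumptions, not a standard schema.

```typescript
// Hypothetical inventory of the AI data flow stages listed above.
type DataStore = {
  name: string;
  holdsUserContent: boolean;    // raw prompts, files, or outputs pass through
  retentionDays: number | null; // null = retention never decided (a red flag)
};

const aiDataFlow: DataStore[] = [
  { name: "prompt-assembly", holdsUserContent: true, retentionDays: 0 },
  { name: "retrieval-cache", holdsUserContent: true, retentionDays: 7 },
  { name: "model-provider", holdsUserContent: true, retentionDays: 30 },
  { name: "observability", holdsUserContent: true, retentionDays: null },
];

// Flag any store holding user content whose retention was never decided.
function unreviewedStores(stores: DataStore[]): string[] {
  return stores
    .filter((s) => s.holdsUserContent && s.retentionDays === null)
    .map((s) => s.name);
}
```

A check like this can run in CI, so a new store added without a retention decision fails the build instead of silently accumulating data.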


GDPR: The Main Questions for LLM Applications

For GDPR-regulated data, teams should be able to answer:

  • what lawful basis applies to the processing?
  • what categories of personal data enter prompts or retrieval?
  • which vendors act as processors or sub-processors?
  • how long are prompts, outputs, and traces retained?
  • can the system support deletion, access, and rectification requests?
  • is personal data used for model training or service improvement?

GDPR controls that matter in practice

  • data minimization in prompts and retrieval
  • masking or tokenization of direct identifiers
  • retention limits for prompts, outputs, and observability data
  • documented processor agreements
  • regional controls where required
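Masking or tokenization of direct identifiers can be reversible when the business process needs the real value back after the model responds. A minimal sketch, with an in-memory token store purely for illustration (a real system would persist tokens in a controlled service):

```typescript
// Reversible tokenization of direct identifiers before they enter a prompt.
const tokenStore = new Map<string, string>(); // token -> original value
let counter = 0;

function tokenize(value: string): string {
  // Reuse the existing token so the same person maps consistently.
  for (const [token, original] of tokenStore) {
    if (original === value) return token;
  }
  const token = `[PERSON_${++counter}]`;
  tokenStore.set(token, value);
  return token;
}

function detokenize(text: string): string {
  // Restore original values in model output before showing it to the user.
  let result = text;
  for (const [token, original] of tokenStore) {
    result = result.split(token).join(original);
  }
  return result;
}
```

The model only ever sees `[PERSON_1]`, while the application can still reconstruct the identifier after the response comes back.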

HIPAA: What Changes When PHI Is Involved

Once protected health information may enter the workflow, the margin for improvisation disappears.

Teams need to verify:

  • whether PHI can enter prompts, attachments, or retrieved context
  • whether the vendor signs a business associate agreement when required
  • whether audit logging covers access to PHI without oversharing PHI in logs
  • whether role-based access and minimum necessary access are enforced

If the product cannot cleanly control PHI, route those use cases away from the model entirely, or through a stricter, explicitly reviewed workflow.
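That routing decision can be a guard in code rather than a policy paragraph. A sketch, assuming a hypothetical upstream classifier has already flagged whether the request may contain PHI:

```typescript
type Route = "general-model" | "restricted-workflow";

// PHI may only reach a vendor that has signed a BAA; everything else
// is diverted to the stricter reviewed path.
function routeRequest(containsPHI: boolean, vendorHasBAA: boolean): Route {
  if (containsPHI && !vendorHasBAA) return "restricted-workflow";
  return "general-model";
}
```

The hard part in practice is the `containsPHI` signal itself; the guard is only as good as the classification feeding it.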


SOC 2: What Auditors Will Actually Ask

SOC 2 does not give you an AI-specific checklist, but auditors will still look at the controls around confidentiality, access, change management, logging, and vendor risk.

Expect scrutiny on:

  • who can access prompts and transcripts
  • how secrets are handled in AI workflows
  • how vendors are reviewed and approved
  • how production changes to models, prompts, and routing are tested
  • how security incidents involving AI outputs are detected and investigated

For AI systems, the evidence often lives in engineering controls, not in a policy document alone.
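One example of such engineering-level evidence is an append-only record of who viewed a transcript. The event shape below is an assumption, shown only to illustrate the kind of artifact an auditor can sample:

```typescript
// Illustrative audit event for transcript access.
type AuditEvent = {
  actor: string;
  action: "view_transcript";
  resourceId: string;
  at: string; // ISO 8601 timestamp
};

const auditLog: AuditEvent[] = []; // stand-in for an append-only log store

function recordTranscriptAccess(actor: string, resourceId: string): void {
  auditLog.push({
    actor,
    action: "view_transcript",
    resourceId,
    at: new Date().toISOString(),
  });
}
```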


The Retention Problem Most Teams Underestimate

Prompt retention is where many organizations quietly accumulate risk.

Retention decisions should be explicit for:

  • raw user prompts
  • uploaded files
  • retrieved context snippets
  • model outputs
  • tracing and observability payloads
  • support escalations and exports

Example retention matrix

Data type             Default retention      Notes
Raw prompts           30 days or less        Shorter if prompts may contain customer data
Uploaded files        Case-by-case           Prefer temporary processing and deletion
Model outputs         Business need only     Avoid keeping low-value generated content indefinitely
Security logs         Per policy             Redact personal and regulated data first
Fine-tuning datasets  Controlled separately  Stronger approval and provenance needed

The point is not that these exact numbers fit every company. The point is to have a deliberate policy rather than accidental retention.
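A deliberate policy is easiest to enforce when it lives in configuration. The sketch below mirrors the matrix above; `null` means retention is governed by a separate decision rather than a fixed age, and the values are the article's examples, not recommendations:

```typescript
type RetentionRule = { maxAgeDays: number | null; note: string };

const retentionPolicy: Record<string, RetentionRule> = {
  rawPrompts:    { maxAgeDays: 30,   note: "shorter if prompts may contain customer data" },
  uploadedFiles: { maxAgeDays: null, note: "case-by-case; prefer temporary processing" },
  modelOutputs:  { maxAgeDays: null, note: "business need only" },
  securityLogs:  { maxAgeDays: null, note: "per policy; redact first" },
};

// Check whether a stored item has outlived its rule.
function isExpired(storedAt: Date, rule: RetentionRule, now: Date): boolean {
  if (rule.maxAgeDays === null) return false; // governed separately, not auto-expired
  const ageDays = (now.getTime() - storedAt.getTime()) / 86_400_000;
  return ageDays > rule.maxAgeDays;
}
```

A scheduled job can then sweep each store against its rule, which turns the policy document into behavior.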


Technical Controls That Support Compliance

Prompt minimization

Do not send full records when only a few fields are required.
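In practice this means building the prompt payload from an allowlist of fields instead of serializing the whole record. A minimal sketch; the customer fields are illustrative:

```typescript
// Generic field picker: only allowlisted keys reach the prompt.
function pick<T extends object, K extends keyof T>(obj: T, keys: K[]): Pick<T, K> {
  const out = {} as Pick<T, K>;
  for (const k of keys) out[k] = obj[k];
  return out;
}

const customer = { id: "c-17", name: "Ada", email: "ada@example.com", plan: "pro" };

// Only the plan matters for, say, an upgrade-advice prompt.
const promptPayload = pick(customer, ["plan"]);
```

The allowlist also documents, per prompt template, exactly which personal data enters the model, which is useful evidence for the GDPR questions above.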

Redaction before logging

function redactSensitiveFields(text: string): string {
  return text
    .replace(/\d{3}-\d{2}-\d{4}/g, "[REDACTED_SSN]")   // US SSN pattern
    .replace(/\d{16}/g, "[REDACTED_CARD]")             // 16-digit card numbers
    .replace(/[A-Z0-9._%+-]+@[A-Z0-9.-]+\.[A-Z]{2,}/gi, "[REDACTED_EMAIL]");
}

Access controls on transcripts and traces

Support staff, engineers, and security teams should not all see the same level of conversation detail by default.
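Tiered visibility can be enforced at the read path. A sketch with assumed roles and a stand-in redactor (a real one would cover more patterns):

```typescript
type Role = "support" | "engineer" | "security";

// Minimal stand-in redactor for illustration only.
const redact = (t: string): string =>
  t.replace(/\d{3}-\d{2}-\d{4}/g, "[REDACTED_SSN]");

function transcriptView(role: Role, transcript: string): string {
  switch (role) {
    case "support":
      return "[summary withheld: request elevated access]"; // no raw content
    case "engineer":
      return redact(transcript); // redacted detail for debugging
    case "security":
      return transcript; // full detail; this access should itself be logged
  }
}
```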

Vendor configuration review

Verify training, retention, region, and sub-processor settings for every external AI service.

Deletion and export workflows

If a user requests deletion or access to their data, AI interaction history cannot become the forgotten system that breaks compliance.
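A deletion request should fan out to every store in the data flow map and report failures instead of silently marking the request done. The store interface below is an assumption, not a real API:

```typescript
interface DeletableStore {
  name: string;
  deleteUser(userId: string): boolean; // true when deletion succeeded
}

// Returns the stores that failed; the request stays open until this
// list is empty.
function handleDeletionRequest(userId: string, stores: DeletableStore[]): string[] {
  return stores.filter((s) => !s.deleteUser(userId)).map((s) => s.name);
}
```

Driving the fan-out from the same inventory used for the data flow map means a newly added store cannot be forgotten by the deletion workflow.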


An AI Compliance Checklist for Shipping Teams

  • Map every place AI interaction data is stored or copied.
  • Define lawful basis and processor relationships before launch.
  • Minimize what enters prompts, retrieval, and logs.
  • Set retention limits for prompts, outputs, traces, and uploads.
  • Verify vendor training and retention defaults.
  • Restrict who can view AI transcripts and debugging payloads.
  • Support deletion, export, and correction workflows where required.
  • Review PHI handling separately for HIPAA-affected use cases.
  • Collect evidence for SOC 2 around access, change control, and vendor review.
  • Reassess the design whenever the model provider, routing, or data sources change.


The core lesson is straightforward: compliance does not happen because a provider says the word "enterprise". It happens because your system minimizes data, enforces policy, and keeps evidence of control.
