AI Compliance Checklist: GDPR, HIPAA, SOC 2, and Data Retention for LLM Apps
Compliance Problems Usually Start in the Prompt Pipeline
Teams often ask whether their LLM provider is compliant. That is the wrong first question.
The harder question is whether the application itself handles prompts, outputs, logs, attachments, and training settings in a way that matches your legal and contractual obligations.
Most AI compliance failures come from ordinary engineering decisions:
- logging too much
- keeping data too long
- sending regulated content to the wrong processor
- failing to support deletion and access requests
- allowing support or analytics tools to copy AI interactions into extra systems
Compliance for AI applications is mostly a data governance and system design problem.
Start With a Simple Data Flow Map
Before looking at frameworks, map the full path of AI data:
- user input and uploaded files
- prompt assembly service
- retrieval layer and knowledge sources
- model provider or self-hosted inference service
- output storage and audit logs
- support, analytics, and observability tooling
If you cannot describe where the data goes, you cannot honestly claim the system is under control.
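One way to keep this map honest is to record it in a small machine-readable inventory and review it on every architecture change. The sketch below is illustrative TypeScript; the hop names, fields, and values are assumptions, not a standard.

```typescript
// Illustrative types for an AI data-flow inventory. System names and
// fields are hypothetical; adapt them to your own architecture.
type DataCategory =
  | "user_input"
  | "uploaded_file"
  | "retrieved_context"
  | "model_output"
  | "trace";

interface DataFlowHop {
  system: string;              // e.g. "prompt-assembly", "vendor:model-provider"
  categories: DataCategory[];  // what flows through this hop
  storesData: boolean;         // does this hop persist anything?
  retentionDays?: number;      // required when storesData is true
  processorAgreement?: string; // DPA / BAA reference, if external
}

// A partial example. Any hop you cannot fill in is exactly where
// your compliance story has a gap.
const flow: DataFlowHop[] = [
  { system: "web-app", categories: ["user_input", "uploaded_file"], storesData: false },
  { system: "prompt-assembly", categories: ["user_input", "retrieved_context"], storesData: false },
  {
    system: "vendor:model-provider",
    categories: ["user_input", "retrieved_context"],
    storesData: true,
    retentionDays: 30,
    processorAgreement: "DPA-2024-07", // hypothetical reference
  },
  { system: "observability", categories: ["trace"], storesData: true, retentionDays: 14 },
];
```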
GDPR: The Main Questions for LLM Applications
For GDPR-regulated data, teams should be able to answer:
- what lawful basis applies to the processing?
- what categories of personal data enter prompts or retrieval?
- which vendors act as processors or sub-processors?
- how long are prompts, outputs, and traces retained?
- can the system support deletion, access, and rectification requests?
- is personal data used for model training or service improvement?
GDPR controls that matter in practice
- data minimization in prompts and retrieval
- masking or tokenization of direct identifiers
- retention limits for prompts, outputs, and observability data
- documented processor agreements
- regional controls where required
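The masking control is often the easiest to prototype. A minimal sketch, assuming the identifiers are known strings and the token map stays server-side; it is not a complete pseudonymization scheme:

```typescript
// Replace direct identifiers with stable tokens before text reaches the
// model, keeping the reversible map server-side only.
const tokenMap = new Map<string, string>();
let counter = 0;

function pseudonymize(value: string): string {
  const existing = tokenMap.get(value);
  if (existing) return existing;
  const token = `[PERSON_${++counter}]`;
  tokenMap.set(value, token);
  return token;
}

// Re-insert real values only after the model response comes back, and
// only for consumers authorized to see them.
function rehydrate(text: string): string {
  let result = text;
  for (const [value, token] of tokenMap) {
    result = result.split(token).join(value);
  }
  return result;
}
```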
HIPAA: What Changes When PHI Is Involved
Once protected health information may enter the workflow, the margin for improvisation disappears.
Teams need to verify:
- whether PHI can enter prompts, attachments, or retrieved context
- whether the vendor signs a business associate agreement when required
- whether audit logging covers access to PHI without oversharing PHI in logs
- whether role-based access and minimum necessary access are enforced
If the product cannot cleanly control PHI, route those use cases away from the model or through a stricter, separately reviewed workflow.
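A minimal sketch of that routing decision, assuming a placeholder detector and hypothetical handlers; a real deployment needs a reviewed classifier, not a regex:

```typescript
// Hypothetical handlers: a BAA-covered, HIPAA-scoped path and the
// general-purpose model path.
declare function handleViaHipaaWorkflow(prompt: string): Promise<string>;
declare function callGeneralModel(prompt: string): Promise<string>;

// Deliberately naive heuristic, shown only to make the gate concrete.
function looksLikePHI(text: string): boolean {
  return /\bMRN[:\s]*\d+/i.test(text) || /\b(diagnosis|prescription)\b/i.test(text);
}

async function routeRequest(prompt: string): Promise<string> {
  if (looksLikePHI(prompt)) {
    // Strict path: BAA-covered vendor or human-reviewed workflow only.
    return handleViaHipaaWorkflow(prompt);
  }
  return callGeneralModel(prompt);
}
```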
SOC 2: What Auditors Will Actually Ask
SOC 2 does not give you an AI-specific checklist, but auditors will still look at the controls around confidentiality, access, change management, logging, and vendor risk.
Expect scrutiny on:
- who can access prompts and transcripts
- how secrets are handled in AI workflows
- how vendors are reviewed and approved
- how production changes to models, prompts, and routing are tested
- how security incidents involving AI outputs are detected and investigated
For AI systems, the evidence often lives in engineering controls, not in a policy document alone.
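One concrete piece of evidence is a structured audit event recorded whenever a human opens an AI transcript. A sketch with illustrative field names; the point is durable, queryable proof that access controls actually operate:

```typescript
interface TranscriptAccessEvent {
  actor: string;        // who viewed the transcript
  role: string;         // their role at access time
  transcriptId: string;
  reason: string;       // support ticket, incident ID, etc.
  timestamp: string;    // ISO 8601
}

function recordTranscriptAccess(event: TranscriptAccessEvent): void {
  // Send to an append-only sink (SIEM, audit table); console.log
  // stands in for that sink here.
  console.log(JSON.stringify({ type: "transcript_access", ...event }));
}
```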
The Retention Problem Most Teams Underestimate
Prompt retention is where many organizations quietly accumulate risk.
Retention decisions should be explicit for:
- raw user prompts
- uploaded files
- retrieved context snippets
- model outputs
- tracing and observability payloads
- support escalations and exports
Example retention matrix
| Data type | Default retention | Notes |
|---|---|---|
| Raw prompts | 30 days or less | Shorter if prompts may contain customer data |
| Uploaded files | Case-by-case | Prefer temporary processing and deletion |
| Model outputs | Business need only | Avoid keeping low-value generated content indefinitely |
| Security logs | Per policy | Redact personal and regulated data first |
| Fine-tuning datasets | Controlled separately | Stronger approval and provenance needed |
The point is not that these exact numbers fit every company. The point is to have a deliberate policy rather than accidental retention.
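A policy only counts if something enforces it. A minimal sketch of a scheduled purge job driven by the matrix above, reusing the same illustrative numbers and a hypothetical store interface:

```typescript
// Retention limits in days, mirroring the example matrix above.
// These values are illustrative, not a recommendation.
const retentionDays: Record<string, number> = {
  raw_prompts: 30,
  model_outputs: 90,
  traces: 14,
};

async function purgeExpired(
  store: { deleteOlderThan(type: string, cutoff: Date): Promise<number> }
): Promise<void> {
  for (const [type, days] of Object.entries(retentionDays)) {
    const cutoff = new Date(Date.now() - days * 24 * 60 * 60 * 1000);
    const removed = await store.deleteOlderThan(type, cutoff);
    console.log(`purged ${removed} ${type} records older than ${days} days`);
  }
}
```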
Technical Controls That Support Compliance
Prompt minimization
Do not send full records when only a few fields are required.
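A sketch of what that looks like in practice, with hypothetical field names: build the prompt from an allowlist of fields instead of serializing the whole record.

```typescript
interface CustomerRecord {
  id: string;
  name: string;
  email: string;
  ssn: string;
  planTier: string;
  openTicketSummary: string;
}

function buildSupportPrompt(record: CustomerRecord): string {
  // Only the fields the task needs; direct identifiers never leave.
  return `Plan tier: ${record.planTier}\nIssue: ${record.openTicketSummary}`;
}
```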
Redaction before logging
```typescript
// Redact obvious identifiers before anything is written to logs.
// These patterns are illustrative; extend them for your own data types.
function redactSensitiveFields(text: string): string {
  return text
    .replace(/\b\d{3}-\d{2}-\d{4}\b/g, "[REDACTED_SSN]")
    .replace(/\b\d{16}\b/g, "[REDACTED_CARD]")
    .replace(/[A-Z0-9._%+-]+@[A-Z0-9.-]+\.[A-Z]{2,}/gi, "[REDACTED_EMAIL]");
}
```
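Apply the redaction at the logging boundary rather than at individual call sites, so no code path can forget it. A minimal wrapper, assuming a console-based logger:

```typescript
// Uses redactSensitiveFields from above; every prompt-related log line
// passes through it exactly once.
function logPromptEvent(message: string, payload: string): void {
  console.log(message, redactSensitiveFields(payload));
}
```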
Access controls on transcripts and traces
Support staff, engineers, and security teams should not all see the same level of conversation detail by default.
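A simple way to make that default explicit is a per-role visibility tier checked before any transcript is rendered. The roles and tiers below are illustrative:

```typescript
type Visibility = "metadata_only" | "redacted" | "full";

const defaultVisibility: Record<string, Visibility> = {
  engineer: "metadata_only",
  support: "redacted",
  security: "full", // still gated behind the audit logging shown earlier
};
```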
Vendor configuration review
Verify training, retention, region, and sub-processor settings for every external AI service.
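It helps to capture the outcome of that review in a dated record per vendor, so the answers are explicit rather than assumed. An illustrative shape:

```typescript
interface VendorAIConfigReview {
  vendor: string;
  trainingOnCustomerDataDisabled: boolean;
  retentionDays: number;
  dataRegion: string;
  subProcessorsReviewed: boolean;
  reviewedBy: string;
  reviewedAt: string; // ISO 8601
}

// Hypothetical example entry; re-verify on every vendor or plan change.
const review: VendorAIConfigReview = {
  vendor: "example-model-provider",
  trainingOnCustomerDataDisabled: true,
  retentionDays: 30,
  dataRegion: "eu-west",
  subProcessorsReviewed: true,
  reviewedBy: "security-team",
  reviewedAt: "2025-01-15T00:00:00Z",
};
```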
Deletion and export workflows
If a user requests deletion or access to their data, AI interaction history cannot become the forgotten system that breaks compliance.
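A sketch of a deletion handler that fans out across every store from the data flow map, with a hypothetical store interface:

```typescript
interface DeletableStore {
  name: string;
  deleteUserData(userId: string): Promise<void>;
}

// The store list should come straight from the data-flow inventory, so
// a new copy of AI data cannot appear without joining the fan-out.
async function handleDeletionRequest(
  userId: string,
  stores: DeletableStore[]
): Promise<void> {
  for (const store of stores) {
    await store.deleteUserData(userId);
    console.log(`deleted AI data for ${userId} from ${store.name}`);
  }
}
```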
An AI Compliance Checklist for Shipping Teams
- Map every place AI interaction data is stored or copied.
- Define lawful basis and processor relationships before launch.
- Minimize what enters prompts, retrieval, and logs.
- Set retention limits for prompts, outputs, traces, and uploads.
- Verify vendor training and retention defaults.
- Restrict who can view AI transcripts and debugging payloads.
- Support deletion, export, and correction workflows where required.
- Review PHI handling separately for HIPAA-affected use cases.
- Collect evidence for SOC 2 around access, change control, and vendor review.
- Reassess the design whenever the model provider, routing, or data sources change.
The core lesson is straightforward: compliance does not happen because a provider says the word "enterprise." It happens because your system minimizes data, enforces policy, and keeps evidence of control.
Planning an AI feature launch or security review?
We assess prompt injection paths, data leakage, tool use, access control, and unsafe AI workflows before they become production problems.