Creating With Sora Safely — How OpenAI Built Trust Into an AI Video Platform

By AI Guide News · Monday, March 23, 2026

Sora 2's launch raised serious questions about deepfakes, likeness theft, and misinformation. OpenAI's answer is a layered safety architecture built directly into the platform, from C2PA provenance metadata and watermarks to teen protections and consent-based character controls.

A Powerful Tool Demands Serious Guardrails

When Sora 2 launched in September 2025, OpenAI shipped not just a better video model but a social platform that could put a real person's face into almost any scene, generate realistic news-style footage, and produce synchronized audio. The safety implications were obvious and immediate. OpenAI's response was a set of layered safeguards that the company describes as "built in from the very start."

Whether those guardrails are sufficient is a genuinely open question: independent research found that Sora produced misinformation videos in 80% of attempts to prompt it to do so. Still, the framework OpenAI has assembled is more comprehensive than what most AI platforms have attempted, and it is worth understanding in detail.

Provenance: Every Video Is Signed

The foundation of Sora's safety approach is provenance: making sure every AI-generated video can be identified as such. Sora videos carry invisible provenance signals, and many carry visible ones as well. All Sora videos embed C2PA metadata, an industry-standard, cryptographically signed record of where the content came from, and OpenAI maintains internal reverse-image and audio search tools that it says can trace videos back to Sora with high accuracy. Many outputs also carry a visible, moving watermark that includes the creator's name.
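
The C2PA specification is far richer than anything that fits here, but its core idea, a signed claim bound to a hash of the exact content, can be sketched with Python's standard library. Everything in this sketch is a hypothetical simplification: the manifest fields and the `make_manifest` and `verify` helpers are invented for illustration, and the shared HMAC key stands in for the certificate-based signatures real C2PA manifests use.

```python
import hashlib
import hmac
import json
from typing import Optional

# Hypothetical shared key for illustration only. Real C2PA manifests are
# signed with certificate-backed keys, not an HMAC secret.
SIGNING_KEY = b"demo-key-not-for-production"


def _sign(claim: dict) -> str:
    payload = json.dumps(claim, sort_keys=True).encode()
    return hmac.new(SIGNING_KEY, payload, hashlib.sha256).hexdigest()


def make_manifest(video_bytes: bytes, generator: str) -> dict:
    """Bind a provenance claim to a hash of the exact video bytes."""
    claim = {
        "claim_generator": generator,
        "ai_generated": True,
        "content_sha256": hashlib.sha256(video_bytes).hexdigest(),
    }
    return {"claim": claim, "signature": _sign(claim)}


def verify(video_bytes: bytes, manifest: Optional[dict]) -> str:
    """Return 'verified', 'tampered', or 'unknown' (no manifest attached)."""
    if manifest is None:
        # Missing metadata proves nothing either way.
        return "unknown"
    if not hmac.compare_digest(_sign(manifest["claim"]), manifest["signature"]):
        return "tampered"  # the claim itself was edited after signing
    if hashlib.sha256(video_bytes).hexdigest() != manifest["claim"]["content_sha256"]:
        return "tampered"  # the video no longer matches the signed hash
    return "verified"


video = b"example-video-bytes"
manifest = make_manifest(video, generator="sora-sketch")
print(verify(video, manifest))            # verified
print(verify(video + b"edit", manifest))  # tampered
print(verify(video, None))                # unknown
```

The third case mirrors the practical limit of provenance metadata: once it has been removed, a verifier can only report "unknown"; it cannot conclude that the video is authentic.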

This matters because the most dangerous misuse of AI video isn't sophisticated: it is simply passing off synthetic content as real. C2PA metadata doesn't prevent misuse, but it creates a verifiable paper trail. The real-world test came quickly: watermarked Sora videos depicting fabricated police incidents went viral on X, Instagram, and TikTok, and many viewers believed they were real despite the visible watermark. The provenance infrastructure exists; changing media-literacy habits is the harder problem.

Likeness and Consent: You Control Your Character

One of Sora's most powerful — and sensitive — features is the ability to insert real people into AI-generated videos. OpenAI's approach separates this into two distinct layers:

Characters (consent-based): Users can create a Character from their own likeness — appearance and voice — and control exactly who can use it and in what context. Access can be revoked at any time, and all videos containing a character are viewable by the creator.
Image-to-video uploads: Users can upload photos of other people to create videos, but must attest that they have consent from the people featured and the rights to upload the media. These generations are subject to stricter safety guardrails than Characters. Images of minors trigger even stricter moderation, and videos made this way always carry a watermark when shared.
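
The consent model described above can be pictured as a small access-control record. This is a hypothetical illustration, not OpenAI's data model: `Character`, `guardrail_tier`, and the tier names are invented for the example. It shows the properties the article describes: the owner decides who may use the character, revocation blocks further use, the owner can see every video that uses it, and uploads of other people need a consent attestation and face stricter checks.

```python
from dataclasses import dataclass, field


@dataclass
class Character:
    """Hypothetical consent-based Character record (not OpenAI's schema)."""
    owner: str
    allowed_users: set[str] = field(default_factory=set)
    appearances: list[str] = field(default_factory=list)  # IDs of videos featuring this character

    def grant(self, user: str) -> None:
        self.allowed_users.add(user)

    def revoke(self, user: str) -> None:
        # Revocation blocks all future generation requests from this user.
        self.allowed_users.discard(user)

    def can_use(self, user: str) -> bool:
        return user == self.owner or user in self.allowed_users

    def record_use(self, user: str, video_id: str) -> None:
        if not self.can_use(user):
            raise PermissionError(f"{user} may not use {self.owner}'s character")
        # The owner can always list every video that features the character.
        self.appearances.append(video_id)


def guardrail_tier(source: str, attested_consent: bool = False, depicts_minor: bool = False) -> str:
    """Pick an illustrative moderation tier; tier names are hypothetical."""
    if source == "character":
        return "standard"
    if source == "image_upload":
        if not attested_consent:
            return "rejected"  # uploads of other people require a consent attestation
        return "strictest" if depicts_minor else "strict"
    raise ValueError(f"unknown source: {source}")


alice = Character(owner="alice")
alice.grant("bob")
alice.record_use("bob", "video-1")
alice.revoke("bob")
print(alice.can_use("bob"))  # False
print(alice.appearances)     # ['video-1']
print(guardrail_tier("image_upload", attested_consent=True))  # strict
```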

Videos depicting public figures who have not consented to OpenAI's use of their likeness are explicitly blocked. This doesn't prevent all misuse, but it creates clear policy lines and accountability.

Teen Protections: Defaults That Protect Without Blocking

Sora includes a dedicated layer of protections for younger users that deserves specific attention:

Teen accounts have default limits on continuous scrolling in the feed
Stricter permissions apply to character generation for teens
Teen profiles are not recommended to adult users, and adults cannot initiate messages with teens
The feed is filtered to remove content that may be harmful or age-inappropriate
Parental controls in ChatGPT allow parents to manage DM settings, disable algorithm personalization, and override scroll limits

These defaults matter. Many platforms protect teens only when parents actively configure settings; Sora inverts that, so the protections are on from the start and removing them requires an explicit override.
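
"Protected by default" is essentially a configuration question: which values an account starts with, and who is allowed to change them. The sketch below is hypothetical; the field names, the 60-minute scroll limit, and the set of parent-adjustable fields are placeholders loosely modeled on the list above, not Sora's actual settings.

```python
from dataclasses import dataclass, replace
from typing import Optional


@dataclass(frozen=True)
class AccountSettings:
    """Hypothetical per-account feed and messaging settings."""
    scroll_limit_minutes: Optional[int]  # None means no limit
    personalized_feed: bool
    dms_enabled: bool
    adults_can_initiate_dm: bool
    recommend_to_adults: bool


ADULT_DEFAULTS = AccountSettings(None, True, True, True, True)
# Teens start from the protective values; 60 minutes is an arbitrary placeholder.
TEEN_DEFAULTS = AccountSettings(60, True, True, False, False)

# Fields a parent may change; the adult-contact rules stay fixed.
PARENT_ADJUSTABLE = {"scroll_limit_minutes", "personalized_feed", "dms_enabled"}


def settings_for(is_teen: bool, parent_overrides: Optional[dict] = None) -> AccountSettings:
    if not is_teen:
        return ADULT_DEFAULTS
    overrides = parent_overrides or {}
    not_allowed = set(overrides) - PARENT_ADJUSTABLE
    if not_allowed:
        raise ValueError(f"not parent-adjustable: {sorted(not_allowed)}")
    # Anything the parent does not touch keeps its protective default.
    return replace(TEEN_DEFAULTS, **overrides)


print(settings_for(is_teen=True).scroll_limit_minutes)  # 60
print(settings_for(True, {"scroll_limit_minutes": None}).scroll_limit_minutes)  # None
```

The design choice is that the safe state requires no action: a teen account with no parental input gets every protection, and only an explicit override relaxes one.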

Filtering Harmful Content: Layered, Not Single-Point

Sora uses layered defenses for content moderation rather than a single filter. At creation time, guardrails check the prompt before anything is generated and then check the output itself, scanning multiple video frames and the audio transcript, so that unsafe content is stopped at the point of creation. Blocked categories include sexual material, terrorist propaganda, and self-harm promotion.

Beyond generation, automated systems scan all feed content against OpenAI's Global Usage Policies and filter out unsafe or age-inappropriate material. These systems are continuously updated as new risk patterns emerge, and human review focuses on the highest-impact harms. Audio gets its own checks: Sora automatically scans transcripts of generated speech for policy violations and blocks attempts to generate music that imitates living artists or existing works.
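
OpenAI does not publish its moderation code, so the following is only a toy illustration of why layering helps: a request that slips past one check can still be caught by the next. Keyword matching stands in for the ML classifiers a real system would use, and `classify_text`, `moderate`, `Video`, and the frame-sampling stride are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Optional

# Toy keyword lists standing in for real ML classifiers.
BLOCKED_TERMS = {
    "sexual": {"explicit"},
    "terrorism": {"propaganda"},
    "self_harm": {"self-harm"},
}


def classify_text(text: str) -> Optional[str]:
    """Return the first blocked category found in the text, or None."""
    lowered = text.lower()
    for category, terms in BLOCKED_TERMS.items():
        if any(term in lowered for term in terms):
            return category
    return None


@dataclass
class Video:
    frame_captions: list[str]  # hypothetical per-frame descriptions from a vision model
    transcript: str            # transcript of generated speech


def moderate(prompt: str, generate: Callable[[str], Video], frame_stride: int = 2) -> str:
    # Layer 1: screen the prompt before any generation happens.
    if (category := classify_text(prompt)):
        return f"blocked at prompt: {category}"
    video = generate(prompt)
    # Layer 2: a benign-looking prompt can still yield unsafe output,
    # so check every `frame_stride`-th frame of the result.
    for caption in video.frame_captions[::frame_stride]:
        if (category := classify_text(caption)):
            return f"blocked at frames: {category}"
    # Layer 3: check what the generated audio actually says.
    if (category := classify_text(video.transcript)):
        return f"blocked at transcript: {category}"
    return "allowed"


def fake_generate(prompt: str) -> Video:
    return Video(["a beach", "a dog", "a sunset"], "the narrator reads propaganda")


print(moderate("a calm beach video", fake_generate))  # blocked at transcript: terrorism
```

Sampling frames trades cost for coverage: frames that fall between samples go unchecked, which is one reason a real pipeline would not rely on any single layer.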

The Honest Limits

OpenAI's safety documentation is notably candid about what these systems cannot guarantee. Independent research demonstrated that bypassing the guardrails to generate misinformation is possible with moderate effort. Watermark-removal tools appeared within days of Sora 2's launch. And audio deepfakes, while harder to generate, remain an evolving threat that the platform says it continues to invest in countering.

OpenAI's framing, "built in from the very start," is accurate, but "built in" and "solved" are different things. A fairer summary is that Sora represents the most serious attempt yet to deploy AI video generation responsibly at scale. Whether that is sufficient depends heavily on how the platform evolves, who uses it, and how the broader information ecosystem adapts.

Source: openai.com/vi-VN/index/creating-with-sora-safely/
