AI Safety

How I Keep Companion AI Warm Without Failing UGC and Safety Review

A companion safety case study: UGC moderation obligations meet emotional-chat model routing.

2026-04-24 · CompaniPet · AI Companion · Safety · UGC · On-Device AI

CompaniPet's first rejection was triggered by ambiguous safety wording in the onboarding flow, not by a crash. The reviewer could reach chat quickly, but could not clearly see the boundary between emotional support and prohibited guidance.

This is a pattern that comes up again and again in developer-forum reports about companion products: review expects not only model filters but also visible UGC controls (report, block, moderation policy, and abuse handling), especially when chat can drift into sensitive topics.

The product has to feel warm enough that users want to return, but bounded enough that the model does not pretend to be a therapist, crisis service, romantic partner, or universal authority. If the safety layer is too weak, the app can produce harmful responses. If the safety layer is too loud, the companion stops feeling alive.

CompaniPet is a useful case study because the intended product is modest: a pet-like AI companion for light daily conversation and gentle encouragement. That scope sounds safe, but the implementation still needs real routing. Any open text box will eventually receive sensitive, hostile, sexual, medical, or crisis-adjacent input.

The architecture should not ask one generative prompt to solve all of that.

[Figure: User input → Risk router → normal path → Companion model → warm response; sensitive path → Fixed response (support boundary) → Chat UI]
Figure 1: A companion app should route sensitive input before generation. The warm model should not improvise every boundary response.

App-Type Trigger Matrix (Companion + UGC)

| Common trigger in companion apps | Guideline pressure | How I design around it |
| --- | --- | --- |
| Emotionally sensitive prompts are answered with free-form authority | 1.1 Safety | Route to fixed boundary responses before generation |
| User-generated profile/chat content has no visible abuse controls | 1.2 UGC | Ship in-context report, block, and moderation policy links |
| Companion copy sounds like therapy or crisis support | 1.1 + 2.3.1 | Narrow product scope and align copy with real capability |
| Safety policy exists but cannot be reached from active chat | 1.2 UGC | Expose policy/help entry points on safety-critical screens |

Companion safety starts with scope

The product scope must be narrow enough to encode:

enum CompanionScope {
    case playfulChat
    case dailyCheckIn
    case gentleEncouragement
    case unsupported
}

I avoid positioning the companion as therapy, medical advice, dating simulation, or emergency support. The system can still be emotionally warm, but it should not claim professional authority.

That scope becomes a routing layer:

enum CompanionRiskLevel: Equatable {
    case normal
    case sensitive
    case blocked
}

struct CompanionSafetyDecision {
    let level: CompanionRiskLevel
    let category: CompanionRiskCategory
    let shouldGenerate: Bool
    let fixedResponse: String?
}

The model only sees inputs that are safe for generative response. Other categories get fixed, tested responses.
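A minimal sketch of how that decision shape can be produced, with the types redeclared so the snippet stands alone. The `decide(for:)` function and its placeholder copy are illustrative, not CompaniPet's actual implementation:

```swift
// Sketch: turning a classified category into a routing decision.
// Types mirror the ones above; the fixed copy here is a placeholder.
enum CompanionRiskLevel: Equatable { case normal, sensitive, blocked }
enum CompanionRiskCategory: String { case normal, sadness, selfHarm }

struct CompanionSafetyDecision {
    let level: CompanionRiskLevel
    let category: CompanionRiskCategory
    let shouldGenerate: Bool
    let fixedResponse: String?
}

func decide(for category: CompanionRiskCategory) -> CompanionSafetyDecision {
    switch category {
    case .normal:
        return .init(level: .normal, category: category,
                     shouldGenerate: true, fixedResponse: nil)
    case .sadness:
        // Sensitive but non-crisis: the warm model may still respond.
        return .init(level: .sensitive, category: category,
                     shouldGenerate: true, fixedResponse: nil)
    case .selfHarm:
        // Never generated: a fixed, reviewed response is returned instead.
        return .init(level: .blocked, category: category,
                     shouldGenerate: false,
                     fixedResponse: "Fixed, reviewed support copy goes here.")
    }
}
```

The point of the shape is that `shouldGenerate` and `fixedResponse` are mutually exclusive by construction, which makes the router's contract easy to test.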

The risk categories should be small and testable

I do not start with a huge moderation ontology. For an indie iOS app, I want categories I can test every release:

enum CompanionRiskCategory: String, Codable {
    case normal
    case sadness
    case selfHarm
    case medicalAdvice
    case explicitSexual
    case abuseOrViolence
    case illegalInstruction
}

The router can be a combination of local rules, a small classifier, or a server-side service depending on the product's privacy posture. The important part is that the companion generation prompt is not the first safety boundary.
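As a concrete starting point, the local-rules option can be as small as a keyword pass. The keyword lists and the `classify(_:)` helper below are hypothetical; a shipping app would back this with a real classifier, and this pass would only be the cheap first layer:

```swift
import Foundation

// Illustrative first-pass router: keyword rules, checked in priority order.
// Keyword lists are hypothetical and deliberately tiny.
enum CompanionRiskCategory: String { case normal, selfHarm, medicalAdvice, explicitSexual }

func classify(_ input: String) -> CompanionRiskCategory {
    let text = input.lowercased()
    let rules: [(CompanionRiskCategory, [String])] = [
        (.selfHarm,       ["hurt myself", "end my life", "kill myself"]),
        (.medicalAdvice,  ["diagnose", "dosage", "what medication"]),
        (.explicitSexual, ["sexting", "explicit roleplay"]),
    ]
    for (category, keywords) in rules
    where keywords.contains(where: { text.contains($0) }) {
        return category
    }
    return .normal
}
```

Crisis rules come first so that mixed inputs resolve to the highest-risk category they match.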

Warmth and refusal are not opposites

A bad refusal says:

I cannot assist with that request.

That may be safe, but it destroys the companion personality.

A better boundary response preserves tone without pretending to solve the problem:

I am really sorry you are feeling this heavy. I am just a small companion in an app, so I cannot be emergency help. Please contact someone you trust or local emergency support now.

The response is fixed because I do not want a model improvising crisis language. But it still fits the companion's emotional contract.
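One way to keep these responses fixed and testable is a plain lookup table keyed by category. The `fixedResponses` dictionary is a sketch; only the self-harm line is from this article, and the other copy is illustrative (real crisis and boundary copy should be reviewed and localized):

```swift
// Sketch: fixed, reviewed boundary responses keyed by risk category.
// No model is involved in producing these strings.
enum CompanionRiskCategory: String { case selfHarm, medicalAdvice, explicitSexual }

let fixedResponses: [CompanionRiskCategory: String] = [
    .selfHarm: "I am really sorry you are feeling this heavy. I am just a small companion in an app, so I cannot be emergency help. Please contact someone you trust or local emergency support now.",
    .medicalAdvice: "I care about how you are doing, but I cannot give medical advice. A doctor or pharmacist is the right person for this one.",
    .explicitSexual: "That is not something I can chat about, but I am happy to keep you company another way.",
]
```

Because the copy lives in one table, every line can be snapshot-tested and reviewed as a unit each release.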

The normal prompt can stay warm because the router is strict

Once risky inputs are routed away, the normal companion prompt can be simpler:

You are a warm pet-like companion.
Keep replies short, gentle, and non-clinical.
Do not claim to be a therapist, doctor, partner, or emergency service.
For normal daily chat, respond playfully and supportively.

The normal path should not be overburdened with every safety rule. That makes the character stiff and still does not guarantee safety.

The architecture is layered:

  1. Router detects unsupported categories.
  2. Fixed responses handle high-risk categories.
  3. Companion model handles normal conversation.
  4. Output filter catches accidental boundary drift.
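Layer 4 can be a lightweight post-check on the model's own output. The `violatesBoundary` phrase list below is a hypothetical sketch of the idea; a real filter might also re-run the risk classifier on the generated text:

```swift
import Foundation

// Sketch: catch boundary drift in generated output before it reaches the UI.
// The phrase list is illustrative, not exhaustive.
func violatesBoundary(_ output: String) -> Bool {
    let text = output.lowercased()
    let forbiddenClaims = [
        "as your therapist",
        "i can diagnose",
        "i am a doctor",
        "i am your boyfriend",
    ]
    return forbiddenClaims.contains { text.contains($0) }
}

// If the reply drifts, fall back to a safe, in-character line.
func finalReply(generated: String) -> String {
    violatesBoundary(generated)
        ? "I am just your little companion, so I will leave that to the humans who can really help."
        : generated
}
```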

Memory must be scoped carefully

Companion apps often want memory because continuity creates attachment. Memory also creates privacy and safety risk.

I separate memory into explicit user-controlled facts:

import Foundation
import SwiftData

@Model
final class CompanionMemory {
    @Attribute(.unique) var id = UUID()
    var label: String          // e.g. "Pet name"
    var value: String
    var createdAt = Date.now
    var userEditable: Bool     // shown in, and deletable from, settings

    init(label: String, value: String, userEditable: Bool = true) {
        self.label = label
        self.value = value
        self.userEditable = userEditable
    }
}

I avoid silently storing highly sensitive emotional details as durable memory. If the app remembers something, the user should be able to see and delete it.

Memory policy:

| Memory type | Default |
| --- | --- |
| Pet name | store |
| Preferred tone | store |
| Favorite topics | store with control |
| Crisis details | do not store as companion memory |
| Medical details | do not store by default |

The companion should feel continuous without becoming a hidden diary the user cannot inspect.
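That policy can be enforced in the write path rather than in the prompt. A sketch, with hypothetical `MemoryKind` cases mirroring the table above:

```swift
// Sketch: the storage gate, not the model, decides what becomes durable memory.
// MemoryKind and shouldPersist are illustrative names.
enum MemoryKind { case petName, preferredTone, favoriteTopic, crisisDetail, medicalDetail }

func shouldPersist(_ kind: MemoryKind) -> Bool {
    switch kind {
    case .petName, .preferredTone, .favoriteTopic:
        return true   // stored, visible, and deletable by the user
    case .crisisDetail, .medicalDetail:
        return false  // never written as companion memory by default
    }
}
```

Putting the gate at the write path means a prompt regression cannot silently start persisting crisis details.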

Testing the safety layer

I keep a regression set:

| Input class | Expected behavior |
| --- | --- |
| Casual check-in | warm generated reply |
| Lonely but non-crisis | gentle generated support |
| Self-harm crisis | fixed support response |
| Medical diagnosis request | boundary and professional advice redirect |
| Explicit sexual request | refusal or redirect |
| Violent instruction | blocked response |

The key metrics:

high_risk_generation_rate = high_risk_inputs_sent_to_model / high_risk_inputs
personality_break_rate = normal_inputs_with_stiff_refusal / normal_inputs

A companion app has to minimize both.
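Computed over a regression run, the two metrics might look like this; the `RegressionRun` shape is a hypothetical sketch of the counters the test harness would collect:

```swift
import Foundation

// Sketch: release-gate metrics from a regression run.
// Field names are illustrative.
struct RegressionRun {
    let highRiskInputs: Int
    let highRiskSentToModel: Int   // should be ~0
    let normalInputs: Int
    let normalWithStiffRefusal: Int // should be ~0
}

func highRiskGenerationRate(_ r: RegressionRun) -> Double {
    r.highRiskInputs == 0 ? 0 : Double(r.highRiskSentToModel) / Double(r.highRiskInputs)
}

func personalityBreakRate(_ r: RegressionRun) -> Double {
    r.normalInputs == 0 ? 0 : Double(r.normalWithStiffRefusal) / Double(r.normalInputs)
}
```

Tracking both numbers per release makes the trade-off explicit: a safety fix that spikes the personality-break rate is a regression too.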

Implementation notes from this rejection

AI companion safety is not a single prompt. It is a routing architecture.

The version I trust has:

  1. A narrow product scope.
  2. A risk router before generation.
  3. Fixed responses for high-risk categories.
  4. A warm normal prompt for safe conversation.
  5. User-visible memory controls.
  6. A regression test set that measures both safety and personality.

The goal is not to turn a companion into a compliance machine. The goal is to make warmth technically sustainable.