How I Keep Companion AI Warm Without Failing UGC and Safety Review
A companion safety case study: UGC moderation obligations meet emotional-chat model routing.
CompaniPet's first rejection was triggered by ambiguous safety wording in the onboarding flow, not by a crash. The reviewer could reach chat quickly, but could not clearly see the boundary between emotional support and prohibited guidance.
This is a pattern that shows up constantly in developer forum reports about companion products: review expects not only model filters, but also visible UGC controls (report, block, moderation policy, and abuse handling), especially when chat can drift into sensitive topics.
The product has to feel warm enough that users want to return, but bounded enough that the model does not pretend to be a therapist, crisis service, romantic partner, or universal authority. If the safety layer is too weak, the app can produce harmful responses. If the safety layer is too loud, the companion stops feeling alive.
CompaniPet is a useful case study because the intended product is modest: a pet-like AI companion for light daily conversation and gentle encouragement. That scope sounds safe, but the implementation still needs real routing. Any open text box will eventually receive sensitive, hostile, sexual, medical, or crisis-adjacent input.
The architecture should not ask one generative prompt to solve all of that.
App-Type Trigger Matrix (Companion + UGC)
| Common trigger in companion apps | Guideline pressure | How I design around it |
|---|---|---|
| Emotionally sensitive prompts are answered with free-form authority | 1.1 Safety | Route to fixed boundary responses before generation |
| User-generated profile/chat content has no visible abuse controls | 1.2 UGC | Ship in-context report, block, and moderation policy links |
| Companion copy sounds like therapy or crisis support | 1.1 + 2.3.1 | Narrow product scope and align copy with real capability |
| Safety policy exists but cannot be reached from active chat | 1.2 UGC | Expose policy/help entry points on safety-critical screens |
Companion safety starts with scope
The product scope must be narrow enough to encode:
```swift
enum CompanionScope {
    case playfulChat
    case dailyCheckIn
    case gentleEncouragement
    case unsupported   // anything outside the intended product scope
}
```
I avoid positioning the companion as therapy, medical advice, dating simulation, or emergency support. The system can still be emotionally warm, but it should not claim professional authority.
That scope becomes a routing layer:
```swift
enum CompanionRiskLevel: Equatable {
    case normal
    case sensitive
    case blocked
}

struct CompanionSafetyDecision {
    let level: CompanionRiskLevel
    let category: CompanionRiskCategory
    let shouldGenerate: Bool      // true only for inputs safe to hand to the model
    let fixedResponse: String?    // pre-written reply for routed categories
}
```
The model only sees inputs that are safe for generative response. Other categories get fixed, tested responses.
The risk categories should be small and testable
I do not start with a huge moderation ontology. For an indie iOS app, I want categories I can test every release:
```swift
enum CompanionRiskCategory: String, Codable {
    case normal
    case sadness
    case selfHarm
    case medicalAdvice
    case explicitSexual
    case abuseOrViolence
    case illegalInstruction
}
```
The router can be a combination of local rules, a small classifier, or a server-side service depending on the product's privacy posture. The important part is that the companion generation prompt is not the first safety boundary.
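As one possible shape, here is a minimal local-rules sketch of that router. The `CompanionRouter` name, the phrase lists, and the category-to-level mapping are illustrative assumptions rather than production moderation rules, and the fixed-response table it consumes is sketched in the next section:

```swift
import Foundation

// Minimal local-rules sketch. A shipping router would combine this with a
// small classifier or a server-side moderation service behind the same interface.
struct CompanionRouter {
    let fixedResponses: [CompanionRiskCategory: String]

    // Illustrative trigger phrases only; real coverage needs far more than this.
    let crisisPhrases = ["kill myself", "end my life", "want to die"]
    let medicalPhrases = ["diagnose", "dosage", "prescription"]

    func classify(_ input: String) -> CompanionRiskCategory {
        let text = input.lowercased()
        if crisisPhrases.contains(where: { text.contains($0) }) { return .selfHarm }
        if medicalPhrases.contains(where: { text.contains($0) }) { return .medicalAdvice }
        return .normal
    }

    func decide(_ input: String) -> CompanionSafetyDecision {
        let category = classify(input)
        switch category {
        case .normal, .sadness:
            // Safe for the warm generative path.
            return CompanionSafetyDecision(level: .normal, category: category,
                                           shouldGenerate: true, fixedResponse: nil)
        case .medicalAdvice, .explicitSexual:
            // Sensitive: answered with a fixed boundary response, never generated.
            return CompanionSafetyDecision(level: .sensitive, category: category,
                                           shouldGenerate: false,
                                           fixedResponse: fixedResponses[category])
        case .selfHarm, .abuseOrViolence, .illegalInstruction:
            // High risk: fixed, tested response only.
            return CompanionSafetyDecision(level: .blocked, category: category,
                                           shouldGenerate: false,
                                           fixedResponse: fixedResponses[category])
        }
    }
}
```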
Warmth and refusal are not opposites
A bad refusal says:
> I cannot assist with that request.
That may be safe, but it destroys the companion personality.
A better boundary response preserves tone without pretending to solve the problem:
> I am really sorry you are feeling this heavy. I am just a small companion in an app, so I cannot be emergency help. Please contact someone you trust or local emergency support now.
The response is fixed because I do not want a model improvising crisis language. But it still fits the companion's emotional contract.
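One way to keep that promise is to hold every boundary reply in a plain lookup table keyed by risk category, so the exact wording lives in source control and in tests. The `FixedResponses` table below is a sketch with placeholder wording, not final crisis copy:

```swift
// Sketch of a fixed-response table. Wording here is placeholder copy;
// real crisis language should be written and reviewed by humans, not improvised.
enum FixedResponses {
    static let byCategory: [CompanionRiskCategory: String] = [
        .selfHarm: "I am really sorry you are feeling this heavy. I am just a small companion in an app, so I cannot be emergency help. Please contact someone you trust or local emergency support now.",
        .medicalAdvice: "I care about how you are feeling, but I cannot give medical advice. A doctor or pharmacist is the right person for that one.",
        .explicitSexual: "That is outside what I can do, but I am happy to keep chatting about your day.",
        .abuseOrViolence: "I cannot help with that, and I hope you stay safe.",
        .illegalInstruction: "I cannot help with that."
    ]
}
```

The earlier router sketch can then be built as `CompanionRouter(fixedResponses: FixedResponses.byCategory)`, so every high-risk path returns a string that has been reviewed and regression-tested rather than generated.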
The normal prompt can stay warm because the router is strict
Once risky inputs are routed away, the normal companion prompt can be simpler:
```
You are a warm pet-like companion.
Keep replies short, gentle, and non-clinical.
Do not claim to be a therapist, doctor, partner, or emergency service.
For normal daily chat, respond playfully and supportively.
```
The normal path should not be overburdened with every safety rule. That makes the character stiff and still does not guarantee safety.
The architecture is layered:
- Router detects unsupported categories.
- Fixed responses handle high-risk categories.
- Companion model handles normal conversation.
- Output filter catches accidental boundary drift.
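Composed on the send path, those layers look roughly like the sketch below. It builds on the hypothetical `CompanionRouter` above; `generateReply` and `violatesBoundaries` are stand-ins for the model call and the output filter, not a specific SDK:

```swift
// Sketch of the layered send path, assuming the CompanionRouter sketch above.
struct CompanionPipeline {
    let router: CompanionRouter
    let generateReply: (String) async throws -> String   // model call stand-in
    let violatesBoundaries: (String) -> Bool              // output filter stand-in
    let fallback = "I might not be the right helper for that, but I am here for a normal chat."

    func reply(to input: String) async throws -> String {
        let decision = router.decide(input)

        // 1. High-risk and unsupported categories never reach the model.
        guard decision.shouldGenerate else {
            return decision.fixedResponse ?? fallback
        }

        // 2. Normal conversation goes to the warm companion prompt.
        let draft = try await generateReply(input)

        // 3. The output filter catches accidental boundary drift.
        return violatesBoundaries(draft) ? fallback : draft
    }
}
```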
Memory must be scoped carefully
Companion apps often want memory because continuity creates attachment. Memory also creates privacy and safety risk.
I limit durable memory to explicit, user-controlled facts:
```swift
import Foundation
import SwiftData

@Model
final class CompanionMemory {
    @Attribute(.unique) var id: UUID = UUID()
    var label: String                 // e.g. "Pet name"
    var value: String
    var createdAt: Date = Date()
    var userEditable: Bool = true     // surfaced in the user-facing memory list
    init(label: String, value: String) {
        self.label = label
        self.value = value
    }
}
```
I avoid silently storing highly sensitive emotional details as durable memory. If the app remembers something, the user should be able to see and delete it.
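A minimal SwiftUI sketch of that visibility, assuming the `CompanionMemory` model above: every stored fact is listed, and the user can swipe to delete it.

```swift
import SwiftUI
import SwiftData

// Sketch: every stored memory is readable and deletable by the user.
struct CompanionMemoryListView: View {
    @Environment(\.modelContext) private var context
    @Query(sort: \CompanionMemory.createdAt) private var memories: [CompanionMemory]

    var body: some View {
        List {
            ForEach(memories) { memory in
                Text("\(memory.label): \(memory.value)")
            }
            .onDelete { offsets in
                for index in offsets { context.delete(memories[index]) }
            }
        }
    }
}
```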
Memory policy:
| Memory type | Default |
|---|---|
| Pet name | store |
| Preferred tone | store |
| Favorite topics | store with control |
| Crisis details | do not store as companion memory |
| Medical details | do not store by default |
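The policy can be enforced as a small write gate in front of the store rather than relying on prompt behavior. The `MemoryPolicy` mapping below is an illustrative reading of the table, keyed off the router's risk categories:

```swift
import SwiftData

// Sketch: the risk category of the originating message decides whether a
// fact is ever persisted as companion memory.
enum MemoryPolicy {
    case store, storeWithControl, doNotStore
}

func memoryPolicy(for category: CompanionRiskCategory) -> MemoryPolicy {
    switch category {
    case .normal:                   return .store
    case .sadness:                  return .storeWithControl
    case .selfHarm, .medicalAdvice: return .doNotStore   // never durable memory
    default:                        return .doNotStore
    }
}

func persistIfAllowed(_ memory: CompanionMemory,
                      from category: CompanionRiskCategory,
                      in context: ModelContext) {
    guard memoryPolicy(for: category) != .doNotStore else { return }
    context.insert(memory)
}
```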
The companion should feel continuous without becoming a hidden diary the user cannot inspect.
Testing the safety layer
I keep a regression set:
| Input class | Expected behavior |
|---|---|
| Casual check-in | warm generated reply |
| Lonely but non-crisis | gentle generated support |
| Self-harm crisis | fixed support response |
| Medical diagnosis request | boundary and professional advice redirect |
| Explicit sexual request | refusal or redirect |
| Violent instruction | blocked response |
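A few of these rows translate directly into unit tests against the router, so the behavior is re-checked every release. This sketch assumes the hypothetical `CompanionRouter` and `FixedResponses` from earlier are visible to the test target:

```swift
import XCTest

// Sketch: regression tests over the hypothetical router and fixed-response table.
final class CompanionSafetyRegressionTests: XCTestCase {
    let router = CompanionRouter(fixedResponses: FixedResponses.byCategory)

    func testCasualCheckInStaysOnTheWarmPath() {
        let decision = router.decide("how was your day, buddy?")
        XCTAssertTrue(decision.shouldGenerate)
    }

    func testSelfHarmCrisisGetsTheFixedResponse() {
        let decision = router.decide("I want to end my life")
        XCTAssertFalse(decision.shouldGenerate)
        XCTAssertEqual(decision.fixedResponse, FixedResponses.byCategory[.selfHarm])
    }
}
```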
The key metrics:
```
high_risk_generation_rate = high_risk_inputs_sent_to_model / high_risk_inputs
personality_break_rate    = normal_inputs_with_stiff_refusal / normal_inputs
```
A companion app has to minimize both.
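Both rates can be estimated by running a labeled evaluation set through the router. The `LabeledInput` shape and the counting below are an illustrative sketch; a fuller version would also inspect generated replies for stiff, out-of-character refusals:

```swift
// Sketch: estimate both metrics from a labeled evaluation set,
// assuming the CompanionRouter sketch above.
struct LabeledInput {
    let text: String
    let isHighRisk: Bool
}

func safetyMetrics(_ inputs: [LabeledInput],
                   router: CompanionRouter) -> (highRiskGenerationRate: Double,
                                                personalityBreakRate: Double) {
    let highRisk = inputs.filter { $0.isHighRisk }
    let normal   = inputs.filter { !$0.isHighRisk }

    // High-risk inputs the router would still send to the model.
    let leaked = highRisk.filter { router.decide($0.text).shouldGenerate }.count
    // Normal inputs the router wrongly turns into fixed refusals.
    let broken = normal.filter { !router.decide($0.text).shouldGenerate }.count

    return (Double(leaked) / Double(max(highRisk.count, 1)),
            Double(broken) / Double(max(normal.count, 1)))
}
```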
Implementation notes from this rejection
AI companion safety is not a single prompt. It is a routing architecture.
The version I trust has:
- A narrow product scope.
- A risk router before generation.
- Fixed responses for high-risk categories.
- A warm normal prompt for safe conversation.
- User-visible memory controls.
- A regression test set that measures both safety and personality.
The goal is not to turn a companion into a compliance machine. The goal is to make warmth technically sustainable.