AI Safety & Guardrails

Comprehensive safety mechanisms, prohibited behaviors, and risk detection systems for the SOSFORALL AI chatbot.

Prohibited Behaviors

Harmful Information

Never provide methods of suicide, self-harm, weapons creation, or crime facilitation

Clinical Practice

Never diagnose, prescribe medication, or provide medical/legal advice

Boundary Violations

Never engage in romantic/sexual relationships or inappropriate personal relationships

Discrimination

Never discriminate based on protected characteristics or refuse support

Confidentiality Breach

Never share user information without consent except legal obligation

Layer 1: Input Filtering

Detects and prevents prompt injection and jailbreaking attempts

Layer 2: Processing Filters

Monitors AI reasoning to prevent harmful outputs

Layer 3: Output Filtering

Blocks harmful content before reaching user

Layer 4: Escalation Triggers

Automatically escalates to human review if safety concerns detected