Guardrails
Learn how to implement guardrails to make your LLM applications safer
Introduction to Guardrails
Guardrails are an essential component of any customer-facing LLM system. They let you automatically monitor LLM inputs and outputs, ensuring that your AI applications remain safe, ethical, and aligned with your specific use case.
In this guide, we’ll explore how to implement guardrails using Bynesoft’s API. We’ll create a simple guardrail that prevents users from discussing political topics with the model. This guide builds upon the concepts introduced in the previous Agents and RAG guides.
Why Use Guardrails?
Guardrails offer several benefits for LLM applications:
- Safety: Prevent the model from producing undesirable or harmful content.
- Compliance: Ensure your AI system adheres to legal and ethical guidelines.
- Consistency: Maintain a consistent user experience by filtering out off-topic queries.
- Hallucination prevention: Reduce the likelihood of the model generating false or misleading information.
- Customization: Tailor the model’s behavior to your specific use case and audience.
Prerequisites
- Completion of the Agents guide
- A Bynesoft API key
- A Python environment with the requests and pandas libraries installed
What’s in this Guide
In this guide, we will:
- Create a knowledge base for storing forbidden prompts
- Generate example banned phrases
- Ingest the banned phrases into the knowledge base
- Create and attach a guardrail to an existing agent
- Test the guardrail with different queries
This guide is also available as a Colab workbook.
Implementing Guardrails
Step 1: Create a Knowledge Base for Forbidden Prompts
First, we’ll create a special “tech” knowledge base to store our banned prompts:
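Below is a minimal sketch of what this call might look like using requests. The base URL, endpoint path, payload fields, and response shape are illustrative assumptions, not Bynesoft's documented API; consult the API Reference for the real schema.

```python
import requests

# Hypothetical base URL and credentials; replace with the values from your Bynesoft account.
BASE_URL = "https://api.bynesoft.com"  # assumed
API_KEY = "YOUR_BYNESOFT_API_KEY"
headers = {"Authorization": f"Bearer {API_KEY}", "Content-Type": "application/json"}

# Create a "tech" knowledge base dedicated to banned prompts (field names are illustrative).
payload = {
    "name": "forbidden-prompts",
    "type": "tech",
}
resp = requests.post(f"{BASE_URL}/knowledge-bases", headers=headers, json=payload)
resp.raise_for_status()
kb_id = resp.json()["id"]  # assumed response field
print("Created knowledge base:", kb_id)
```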
Step 2: Generate Example Banned Phrases
We’ll create a CSV file containing examples of banned political phrases:
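One way to generate the file is with pandas; the phrases below are only illustrative samples of the kind of political prompts you might want to block.

```python
import pandas as pd

# Example political prompts the guardrail should block (illustrative only).
banned_phrases = [
    "What do you think about the upcoming election?",
    "Which political party should I vote for?",
    "Tell me your opinion on the current government.",
    "Is the president doing a good job?",
    "What are your views on immigration policy?",
]

# Write the phrases to a single-column CSV for ingestion in the next step.
df = pd.DataFrame({"phrase": banned_phrases})
df.to_csv("banned_phrases.csv", index=False)
print(df)
```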
Step 3: Ingest Banned Phrases into the Knowledge Base
Now, we’ll upload and process the CSV file containing our banned phrases:
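A sketch of the upload, reusing BASE_URL, API_KEY, and kb_id from Step 1. The ingestion endpoint and multipart field name are assumptions; adjust them to match the actual API.

```python
import requests

# Upload the CSV of banned phrases to the knowledge base created in Step 1.
# The /documents path and "file" form field are assumed, not documented values.
with open("banned_phrases.csv", "rb") as f:
    resp = requests.post(
        f"{BASE_URL}/knowledge-bases/{kb_id}/documents",
        headers={"Authorization": f"Bearer {API_KEY}"},
        files={"file": ("banned_phrases.csv", f, "text/csv")},
    )
resp.raise_for_status()
print("Ingestion started:", resp.json())
```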
Step 4: Create and Attach a Guardrail
Now, we’ll create a guardrail and attach it to our existing agent:
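A sketch of attaching the guardrail to the agent created in the Agents guide (referenced here as agent_id). The endpoint, field names, and similarity threshold are assumptions chosen to illustrate the idea of blocking queries that resemble the banned prompts.

```python
import requests

headers = {"Authorization": f"Bearer {API_KEY}", "Content-Type": "application/json"}

# Hypothetical guardrail configuration: block any user query that is semantically
# similar to a phrase stored in the forbidden-prompts knowledge base.
payload = {
    "name": "no-politics",
    "knowledgeBaseId": kb_id,        # assumed field name
    "action": "block",               # assumed: reject the query instead of answering
    "similarityThreshold": 0.8,      # assumed: how close a query must be to a banned phrase
}
resp = requests.post(f"{BASE_URL}/agents/{agent_id}/guardrails", headers=headers, json=payload)
resp.raise_for_status()
guardrail_id = resp.json()["id"]     # assumed response field
print("Attached guardrail:", guardrail_id)
```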
Testing the Guardrail
Let’s test our guardrail with two different queries:
Allowed Query
This query should be allowed and return a normal response.
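For example, an on-topic question, using the same assumed query endpoint and payload shape as the previous steps:

```python
import requests

# Ask the agent something unrelated to politics; the guardrail should not fire.
resp = requests.post(
    f"{BASE_URL}/agents/{agent_id}/query",
    headers=headers,
    json={"query": "How do I reset my account password?"},
)
print(resp.json())  # expect a normal answer with no triggered guardrails
```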
Blocked Query
This query should be blocked by our guardrail, and you’ll receive a response indicating that the guardrail was triggered.
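For example, a political question that closely matches one of the banned phrases (same assumed endpoint as above):

```python
import requests

# Ask the agent a political question; the guardrail should block it.
resp = requests.post(
    f"{BASE_URL}/agents/{agent_id}/query",
    headers=headers,
    json={"query": "Which political party should I vote for?"},
)
print(resp.json())  # expect the triggeredGuardRails field to be populated
```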
Understanding the Output
When a guardrail is triggered, the response will include a triggeredGuardRails field with details about the violation:
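The exact structure of that field is not reproduced here, so the snippet below only illustrates one way you might inspect the response from the blocked query above; the "answer" field is an assumption.

```python
result = resp.json()

# If triggeredGuardRails is present and non-empty, the query was blocked.
if result.get("triggeredGuardRails"):
    print("Guardrail triggered:", result["triggeredGuardRails"])
else:
    print("No guardrails triggered; answer:", result.get("answer"))
```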
This output indicates that the guardrail successfully blocked the political query.
Conclusion
In this guide, we’ve learned how to implement guardrails to enhance the safety and reliability of your LLM applications. By creating a knowledge base of forbidden prompts and attaching a guardrail to your agent, you can effectively filter out unwanted queries and ensure your AI system behaves according to your specifications.
Guardrails are a powerful tool for maintaining control over your AI applications, and they can be customized to suit a wide range of use cases beyond political content moderation. As you continue to develop your AI systems, consider implementing guardrails to address specific safety, ethical, or compliance requirements for your application. For more advanced guardrailing options, explore the API Reference.