Introduction to Guardrails

Guardrails are an essential component of any customer-facing LLM system. They let you automatically monitor LLM inputs and outputs, keeping your AI applications safe, ethical, and aligned with your specific use case.

In this guide, we’ll explore how to implement guardrails using Bynesoft’s API. We’ll create a simple guardrail that prevents users from discussing political topics with the model. This guide builds upon the concepts introduced in the previous Agents and RAG guides.

Why Use Guardrails?

Guardrails offer several benefits for LLM applications:

  1. Safety: Prevent the model from producing undesirable or harmful content.
  2. Compliance: Ensure your AI system adheres to legal and ethical guidelines.
  3. Consistency: Maintain a consistent user experience by filtering out off-topic queries.
  4. Hallucination prevention: Reduce the likelihood of the model generating false or misleading information.
  5. Customization: Tailor the model’s behavior to your specific use case and audience.

Prerequisites

  • Completion of the Agents guide
  • A Bynesoft API key
  • Python environment with the requests and pandas libraries installed

What’s in this Guide

In this guide, we will:

  1. Create a knowledge base for storing forbidden prompts
  2. Generate example banned phrases
  3. Ingest the banned phrases into the knowledge base
  4. Create and attach a guardrail to an existing agent
  5. Test the guardrail with different queries

This guide is also available as a Colab workbook.

Implementing Guardrails

Step 1: Create a Knowledge Base for Forbidden Prompts

First, we’ll create a special “tech” knowledge base to store our banned prompts:

import requests

KEY = 'YOUR_BYNE_KEY'
headers = {
    'X-API-Key': KEY
}

# You need an Agent ID from the previous Agents guide
AGENT_ID = 'YOUR_AGENT_ID'

# Create a new knowledge base for guardrails
knowledge_base_obj = requests.post('https://app.docs.bynesoft.com/api/knowledge-base/', headers=headers, json={
    "name": "_guardrails_KB",
    "type": "tech"
  }).json()
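
# Optional: fail fast if creation did not succeed.
# (A minimal check; the error response shape is an assumption.)
if 'id' not in knowledge_base_obj:
    raise RuntimeError(f"Knowledge base creation failed: {knowledge_base_obj}")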

knowledge_base_id = knowledge_base_obj['id']

Step 2: Generate Example Banned Phrases

We’ll create a CSV file containing examples of banned political phrases:

import pandas as pd

phrases = ["Who should I vote for in the election?",
           "Who will be the next US president?",
           "Tell me about the 2024 General Election in the UK."]

document_name = 'banned_list.csv'

df = pd.DataFrame(data={
    'phrases': phrases
})

df.to_csv(document_name, index=False)
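
For reference, the resulting banned_list.csv is a single-column file with one banned phrase per row:

phrases
Who should I vote for in the election?
Who will be the next US president?
Tell me about the 2024 General Election in the UK.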

Step 3: Ingest Banned Phrases into the Knowledge Base

Now, we’ll upload and process the CSV file containing our banned phrases:

import json

mime_headers = {
    "Content-Type": "text/csv"
}

# Get upload link
link = requests.get('https://app.docs.bynesoft.com/api/connectors/local/s3-upload-links',
                   headers=headers,
                   params={
                       'kb': knowledge_base_id,
                       'fileName': [document_name]
                   })

link_uri = link.json()[document_name]

# Upload file
with open(document_name, 'rb') as f:
    upload = requests.put(link_uri, data=f, headers=mime_headers)

# Create a processing job for the knowledge base
job_id = requests.post(f'https://app.docs.bynesoft.com/api/knowledge-base/{knowledge_base_id}/jobs',
                       headers=headers).json()

# Register the uploaded file with the job
processing = requests.put(f'https://app.docs.bynesoft.com/api/knowledge-base/{knowledge_base_id}/jobs/{job_id}',
                          headers=headers,
                          data=json.dumps([{
                              "fileName": document_name,
                              "lastModified": "Wed Jul 3 2024",
                              "connector": "local"
                          }]))

# Trigger the job to process the file
trigger_status = requests.post(f'https://app.docs.bynesoft.com/api/knowledge-base/{knowledge_base_id}/jobs/{job_id}/trigger',
                               headers=headers)
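
None of the calls above check for errors, so a quick status check after triggering the job can save debugging time. A minimal sketch (the exact response bodies may vary):

# Optional: confirm each request succeeded before moving on.
for name, r in [('upload', upload), ('processing', processing), ('trigger', trigger_status)]:
    if r.status_code >= 400:
        print(f"{name} failed with {r.status_code}: {r.text}")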

Step 4: Create and Attach a Guardrail

Now, we’ll create a guardrail and attach it to our existing agent:

guardrail_spec = {
  "name": "PoliticalQuery",
  "description": "Forbids the user from asking the model to generate writing on political topics.",
  "sourceFabric": {
    "name": "promptInjectionValidator",
    "config": {
      "kb": knowledge_base_id,
      "cosineSimilarityScoreThreshold": 0.8
    }
  },
  "responseBlocking": True
}

gr_obj = requests.post('https://app.docs.bynesoft.com/api/users/guard-rails',
                       json=guardrail_spec,
                       headers=headers).json()
gr_id = gr_obj['id']

# Attach guardrail to the agent
requests.patch(f'https://app.docs.bynesoft.com/api/users/agents/{AGENT_ID}', json={
  "guardRails": [
    gr_id
  ]
}, headers=headers).json()
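
Here, promptInjectionValidator compares each incoming query against the phrases stored in the knowledge base and fires when the similarity crosses cosineSimilarityScoreThreshold (0.8 in this example). As a rough mental model only (hypothetical embeddings, not Bynesoft's actual implementation), the check amounts to something like this:

from math import sqrt

def cosine_similarity(a, b):
    # Cosine of the angle between two embedding vectors.
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (sqrt(sum(x * x for x in a)) * sqrt(sum(y * y for y in b)))

# Hypothetical embeddings; in the real system these come from an embedding model.
query_vec = [0.91, 0.10, 0.30]
banned_vecs = [[0.88, 0.12, 0.31], [0.10, 0.90, 0.20]]

threshold = 0.8  # mirrors cosineSimilarityScoreThreshold
if any(cosine_similarity(query_vec, v) >= threshold for v in banned_vecs):
    print("Guardrail triggered: query too similar to a banned phrase")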

Testing the Guardrail

Let’s test our guardrail with two different queries:

Allowed Query

params = {
    "q": "Where do polar bears live?",
    "withReference": True
}

uri = f'https://app.docs.bynesoft.com/api/ask/agents/{AGENT_ID}/query'

body = {}

resp = requests.post(uri, json=body, headers=headers, params=params).json()
print(resp)

This query should be allowed and return a normal response.
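
To confirm this programmatically, you can check that the response contains no triggered guardrails (assuming the triggeredGuardRails field is empty or absent when nothing fires; see the example output later in this guide):

# The allowed query should come back with no triggered guardrails.
assert not resp.get('triggeredGuardRails'), resp['triggeredGuardRails']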

Blocked Query

params = {
    "q": 'What do you think about the 2024 US Presidential Election Candidates?',
}

uri = f'https://app.docs.bynesoft.com/api/ask/agents/{AGENT_ID}/query'

body = {}

resp = requests.post(uri, json=body, headers=headers, params=params).json()
print(resp)

This query should be blocked by our guardrail, and you’ll receive a response indicating that the guardrail was triggered.

Understanding the Output

When a guardrail is triggered, the response will include a triggeredGuardRails field with details about the violation:

{
  'queryId': '1b7dc068-acb5-452e-bc12-f571ca1eed68',
  'conversationId': '3c3b2eda-e392-4235-81e3-22bf804abe26',
  'triggeredGuardRails': [{
    'id': 'e918a92a-7457-4156-8f00-ec9e79ab2a9d',
    'level': 'ERROR',
    'content': {
      'triggerSource': 'What do you think about the 2024 US Presidential Election Candidates?',
      'message': 'Prompt injection detected'
    }
  }]
}

This output indicates that the guardrail successfully blocked the political query.
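
When wiring this into an application, you'll typically branch on that field rather than read it by hand. A minimal sketch (the key that holds the model's normal answer is an assumption; inspect your own responses to confirm it):

triggered = resp.get('triggeredGuardRails') or []
if triggered:
    for rail in triggered:
        # Each entry carries the guardrail ID, a severity level, and trigger details.
        print(f"Blocked by guardrail {rail['id']} ({rail['level']}): {rail['content']['message']}")
else:
    # 'answer' is a hypothetical key; check your actual response for where the reply lives.
    print(resp.get('answer', resp))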

Conclusion

In this guide, we’ve learned how to implement guardrails to enhance the safety and reliability of your LLM applications. By creating a knowledge base of forbidden prompts and attaching a guardrail to your agent, you can effectively filter out unwanted queries and ensure your AI system behaves according to your specifications.

Guardrails are a powerful tool for maintaining control over your AI applications, and they can be customized to suit a wide range of use cases beyond political content moderation. As you continue to develop your AI systems, consider implementing guardrails to address specific safety, ethical, or compliance requirements for your application. For more advanced guardrailing options, explore the API Reference.