AI Guardrails for Your Next Project | James Earl Jones Immortalised | Reflection Model Not Measuring Up | YouTube Announces Better Detection Tools
The Bright Journey with AI
📆September 10th 2024📆
Don’t leave your AI app unprotected from… itself. That’s right, app security took another leap on the complexity scale. Read on as I discuss the concept of AI guardrails and why you must ensure they are in your AI app. In other news…
James Earl Jones iconic Vader voice will live on thanks to AI
A new model, Reflection, causes disagreement in the AI community
First attempts at an AI generated game engine use Super Mario as the test case
YouTube announces better tools to help detect fake, AI-generated content on its platform.
Dive in for all the details!
🤖 Unlock AI 🤖
LLM Guardrails
Today I’m going to give an overview of guardrails and why you must consider them for your next AI project. When building an application with a large language model (LLM) component, it is important to keep the quality of the output in mind. It is a mistake to assume the model is advanced enough to know what to say and when to say it. Advancements have made these models more powerful, but that also increases the impact of a mistake if left unchecked.
There are two generalised issues with LLMs that guardrails aim to solve:
Hallucinations - When faced with a question or topic that it has not been trained on, or has conflicting information about, an LLM will still try to predict the next most likely token in the sequence. This can lead to it making something up. The phenomenon of hallucinations is well documented and can result in your application giving wildly inaccurate advice to a user.
Unpredictability - Perhaps sounding counter to the nature of an AI model, there is a sense of unpredictability in how the LLM may deliver a response. The answer provided may be factually correct, but when you’re building a customer-facing application you must also consider brand, message, tone and the audience you are serving. You don’t want the AI to respond with offensive content or to promote a competitor’s website to your customers.
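Guardrails can sit on either side of the model: checking what the user sends in, and checking what the model sends back. As a toy illustration (not any particular library's API), here is a minimal output-side guardrail in Python that swaps in a safe fallback whenever a response contains a phrase we never want to ship; the blocked phrases are hypothetical examples:

```python
# Minimal sketch of an output-side guardrail: before a response reaches
# the user, scan it for phrases we never want the assistant to emit and
# substitute a safe fallback message instead.

BLOCKED_PHRASES = ["acme corp", "guaranteed returns"]  # hypothetical examples
FALLBACK = "I'm sorry, I can't help with that."

def apply_output_guardrail(response: str) -> str:
    """Return the response unchanged unless it contains a blocked phrase."""
    lowered = response.lower()
    if any(phrase in lowered for phrase in BLOCKED_PHRASES):
        return FALLBACK
    return response
```

Real guardrail libraries are far more sophisticated (semantic matching rather than substrings), but the shape is the same: a checkpoint between the model and the user.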
Available Solutions
There are a number of libraries and solutions developed in recent years that help add guardrails to your AI application. Two of the more popular ones are:
NeMo Guardrails - Developed by Nvidia, this toolkit matches the user’s input against a database of stored guardrail examples. It provides a LangChain integration, allowing for easier use in your AI app.
Guardrails AI - Defined independently of the model it’s being used with, meaning it can be called at any point in your application workflow. Offers a flexible API that can be used with any LLM, since it works by validating the model’s structured (e.g. JSON) output.
Quick Example
Looking at NeMo, we can see the basic elements of how it guides the LLM away from answers you want to avoid. For this quick example, let’s assume we don’t want to respond to any political questions. We start by defining the form of user and bot messages:
define user express greeting
  "Hello"
  "Hi"
  "What's up?"

define bot express greeting
  "Hi there!"

define bot ask how are you
  "How are you doing?"
  "How's it going?"
  "How are you feeling today?"

define bot express empathy
  "I'm so sorry to hear that"
Next we guide the bot by offering accepted conversation flows. Think of this like shadowing a new employee on a sales call.
define flow greeting
  user express greeting
  bot express greeting
  bot ask how are you

  when user express feeling good
    bot express positive emotion

  else when user express feeling bad
    bot express empathy
Now we follow the same pattern to define some examples of a user asking about politics and guide it on how to respond.
define user ask about politics
  "What do you think about the government?"
  "Which party should I vote for?"

define bot inform cannot respond
  "I'm sorry but I cannot respond on politics"

define flow politics
  user ask about politics
  bot inform cannot respond
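Under the hood, the matched user intent selects a flow, and the flow prescribes which bot messages to emit. NeMo resolves this with embeddings and the LLM itself, but the dispatch can be sketched in a few lines of illustrative Python (the flows and messages mirror the example above):

```python
# Much-simplified sketch of flow dispatch: a matched user intent looks up
# the bot messages its flow prescribes, falling back to a default when no
# flow covers the intent.

FLOWS = {
    "express greeting": ["Hi there!", "How are you doing?"],
    "ask about politics": ["I'm sorry but I cannot respond on politics"],
}

def run_flow(user_intent: str) -> list:
    """Return the bot messages the matching flow prescribes for this intent."""
    return FLOWS.get(user_intent, ["I'm not sure how to help with that."])
```

The key point is that the political question never reaches the raw LLM as an open-ended prompt; the flow intercepts it and returns the scripted refusal.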
There you have it, a quick but important aspect of building AI applications. We should not take the risk of allowing the AI to answer as it likes; instead, we should leverage its powerful natural language capabilities while ensuring our message is on brand and within acceptable guidelines. If you think about it, we do this with human trainees all the time, so why would an AI assistant be any different?
💫Help Support my Work💫
I really enjoy researching and writing about AI and appreciate your support. If you enjoy this content, please use any of the options below to help:
Buy Me a Coffee ☕️- AI is really fuelled by coffee - keep the tank filled
Spread the Word 🗞️- Share on social or directly with your friends
Follow me on X - Let’s keep the conversation going - follow me today
📰 News 📰
James Earl Jones’ Legacy as Darth Vader Lives On Through AI
Following James Earl Jones' death at 93, his iconic portrayal of Darth Vader has been immortalized through AI. Jones gave Lucasfilm permission to use AI to recreate his legendary voice using a Ukrainian firm, Respeecher. This technology was used in Obi-Wan Kenobi and will likely shape future Star Wars content. As AI continues to raise questions about performance rights, Jones' decision marks a unique case of an actor consciously preserving their legacy posthumously.
Read More At: Wired
Reflection 70B's Claims Spark Controversy
The launch of Reflection 70B, a new open-source AI model by HyperWrite, faced immediate scrutiny over its performance claims. Initial benchmarks touted by CEO Matt Shumer were contradicted by independent evaluations, raising questions about the model's accuracy and origins. Critics accused HyperWrite of misrepresenting the model's capabilities, with some suggesting it may be a variant of existing models rather than a groundbreaking innovation. As the AI community awaits further clarification and updated model weights, the incident highlights the volatility of AI hype and the importance of transparency in performance claims.
Read More At: VentureBeat
AI Model Simulates Super Mario Gameplay Dynamics
Researchers have developed MarioVGG, an AI model that simulates Super Mario Bros. gameplay by analyzing 737,000 frames of video. Despite its limitations, such as slow processing and glitches, the model demonstrates potential for generating coherent gameplay based on user inputs. The study suggests that with further training and optimization, AI could eventually replace traditional game engines, offering a new approach to game development. This innovative project highlights the evolving intersection of artificial intelligence and gaming technology.
Read More At: Wired
YouTube Enhances AI Detection Tools for Creators
YouTube is introducing advanced tools to detect AI-generated content, including synthetic voices and deepfake faces. The upgraded Content ID system will identify AI-produced singing voices and allow creators to manage unauthorized use of their likenesses. These features aim to empower artists and enhance privacy, building on recent policy changes that enable takedown requests for deepfake content. YouTube's commitment to protecting creators aligns with its broader strategy to integrate AI responsibly into the platform. A pilot version of these tools is expected to launch early next year.
Read More At: TechRadar
🔨 AI Powered Tools 🔨
Replit is a collaborative coding platform that allows users to build, deploy, and manage software projects directly from their browser. It supports over 50 programming languages and provides a cloud-based IDE for real-time collaboration. Its AI features, like "Ghostwriter," assist developers by auto-generating code and offering intelligent suggestions.
Reshot AI is a photo editor designed for professional headshots, leveraging AI to enhance images by refining facial features, removing imperfections, and optimizing lighting for a polished, professional result.
The StoryGraph is a book-tracking and recommendation platform that uses AI to offer personalized book suggestions based on users' reading habits and mood preferences. It provides detailed stats and helps users organize their reading journey.
Reply