- AI agents are systems trained on specialized data sets and capable of independently solving multi-step problems in service of a particular goal.
- The initial training phase is all about giving an AI agent context. This is done through Retrieval-Augmented Generation (RAG), which teaches an AI agent to use business-specific data and constrains it to only the data that’s relevant to an organization or industry. This vastly improves its accuracy over general-purpose generative AI (see the sketch below).
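To make that concrete, here’s a minimal sketch of the retrieval step in Python. The document store, scoring function, and prompt wording are all hypothetical stand-ins (a production RAG system would use embeddings and a vector database), but it shows how the agent gets constrained to business-specific context:

```python
# Minimal, illustrative RAG retrieval sketch (hypothetical data and names).
# It scores a small in-memory document store against a question and builds a
# prompt that constrains the agent to answer only from the retrieved context.

BUSINESS_DOCS = {
    "returns-policy": "Customers may return items within 30 days with a receipt.",
    "shipping": "Standard shipping takes 3-5 business days within the continental US.",
    "warranty": "All appliances carry a one-year limited warranty.",
}

def score(question: str, doc: str) -> int:
    """Crude relevance score: count shared words (a real system would use embeddings)."""
    q_words = set(question.lower().split())
    return len(q_words & set(doc.lower().split()))

def retrieve(question: str, k: int = 2) -> list[str]:
    """Return the k most relevant passages from the business-specific store."""
    ranked = sorted(BUSINESS_DOCS.values(), key=lambda d: score(question, d), reverse=True)
    return ranked[:k]

def build_prompt(question: str) -> str:
    """Constrain the agent to organization-specific context only."""
    context = "\n".join(retrieve(question))
    return (
        "Answer using ONLY the context below. If the answer is not in the context, say so.\n"
        f"Context:\n{context}\n\nQuestion: {question}"
    )

print(build_prompt("How long do customers have to return an item?"))
```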
RLHF: Reinforcement Learning From Human Feedback
RLHF is a fancy way of saying that you look at the work the AI agent has done so far and grade it as “right” or “wrong.” You create a list of questions for the AI agent, review the answers the agent gives to those questions, and then mark whether or not the answers were correct.
Then the agent uses that data to improve its responses. Like an eager student, it craves the reward of getting a correct answer and alters its behavior to try to provide more of them.
One key distinction between training an AI agent and training a colleague is that with a colleague you can be nuanced. An AI agent has a hard time understanding feedback beyond “right” and “wrong” (at least it does now). So your training data should only include falsifiable answers rather than explanations.
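For illustration, here’s roughly what a binary-graded feedback set might look like. The field names and records are hypothetical; the point is that each entry is a falsifiable right/wrong judgment, not an explanation:

```python
# Illustrative shape of a binary-graded feedback set (field names are hypothetical).
# Each record pairs a question, the agent's answer, and a falsifiable right/wrong grade;
# no free-text explanations, since the agent only learns from the binary signal.

graded_examples = [
    {"question": "What was Q3 revenue for the Northeast region?",
     "agent_answer": "$4.2M",
     "correct": True},
    {"question": "Which SKU had the highest return rate last month?",
     "agent_answer": "SKU-1138",
     "correct": False},  # reviewer checked the source data and the answer was wrong
]

def accuracy(examples: list[dict]) -> float:
    """Fraction of graded answers marked correct -- the reward signal the agent optimizes."""
    return sum(e["correct"] for e in examples) / len(examples)

print(f"Current accuracy: {accuracy(graded_examples):.0%}")
```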
A side benefit of this process is that the act of grading forces the humans creating the training set to become very specific. Just like with any human interaction, where both sides learn from each other, you’ll learn from grading the agent’s work how to shape your questions to get the answers you seek, all while the AI agent learns what your expectations are for its outputs.
RLHTF: Everyone Has an Opinion - Use That!
One danger of RLHF is that if you only have one person providing feedback, the model could skew towards that one viewpoint. So to keep your feedback from becoming lopsided, the best way to review it is with a group. We’ve invented our own acronym: RLHTF (Reinforcement Learning from a Human Team’s Feedback).
If you’ve ever worked on a team, you know that there are always a ton of different perspectives in the room. If two people were to evaluate a response from an AI agent, it’s possible that one person might feel like it was perfectly clear, while another might feel it was completely incoherent.
Additionally, different team members will have varying needs — your VP might want a high level overview to head each output, your Marketing colleague might think answers are only useful when accompanied by a graph or chart, and your Analyst might want all answers to include a list of possible exceptions.
One of the best ways to provide human feedback is to get multiple perspectives, by selecting a team made up of people who have different backgrounds. Each person on the team grades the agent’s responses independently, and then afterward, the team discusses their scoring and uses their varied perspectives to build a single unified rubric that the AI can use going forward. This standardization process is a great way to give the AI a clear, consistent standard to work against.
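Here’s a small sketch of how that team grading might be organized, assuming each reviewer records an independent right/wrong vote. Unanimous items can go straight into the rubric, while split votes get flagged for discussion (the reviewer roles and questions are made up for illustration):

```python
# Sketch of team grading (RLHTF): each reviewer grades independently, disagreements
# are surfaced for discussion, and the agreed outcome becomes a rubric entry.
# Reviewer names, questions, and the rubric format are all illustrative.

team_grades = {
    "Was the Q3 summary answer correct?": {"vp": True, "marketing": True, "analyst": False},
    "Did the churn forecast match the source data?": {"vp": True, "marketing": True, "analyst": True},
}

def split_by_agreement(grades: dict) -> tuple[dict, dict]:
    """Separate unanimous items (ready for the rubric) from items the team must discuss."""
    unanimous, disputed = {}, {}
    for question, votes in grades.items():
        (unanimous if len(set(votes.values())) == 1 else disputed)[question] = votes
    return unanimous, disputed

unanimous, disputed = split_by_agreement(team_grades)
rubric = [{"question": q, "correct": list(v.values())[0]} for q, v in unanimous.items()]

print("Add to rubric:", rubric)
print("Discuss as a team:", list(disputed))
```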
Your Data May Be Gold, but Your Feedback Is Diamond
You might read that process and think…wow, that’s a lot of man hours. Which is a valid point — it does require people to spend their time grading an agent’s responses, and figuring out how to teach it what it needs to know. But that time is well worth the investment.
Once your AI agent is trained, and repeatedly retrained, you’ve got an extremely valuable data set. One key benefit is that if you ever need to retrain your model, or even switch the LLM that your agent uses, you can do so much faster!
If you want to, for example, transition from OpenAI to DeepSeek, or to some other LLM that hasn’t been invented yet, you’ll have a validated data set that you can feed into the new LLM, as well as a rubric to teach it how to answer questions. While there’s a good deal of up-front labor to create that data, it means that you’ll have everything you need to save yourself time in the future.
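As a rough sketch of why that data set is portable, the harness below treats the model as a swappable function and re-grades any candidate LLM against the already-validated answers. The model clients here are stand-ins, not any vendor’s actual API:

```python
# Sketch of reusing the validated Q&A set when swapping the underlying LLM.
# The "model" here is just a function from question to answer; the point is
# that the graded data set, not the model, is the durable asset.

from typing import Callable

Model = Callable[[str], str]

validated_set = [
    {"question": "What is the standard shipping window?", "expected": "3-5 business days"},
    {"question": "How long is the appliance warranty?", "expected": "one year"},
]

def evaluate(model: Model, dataset: list[dict]) -> float:
    """Re-grade a candidate model against the already-validated answers."""
    correct = sum(model(row["question"]).strip().lower() == row["expected"].lower()
                  for row in dataset)
    return correct / len(dataset)

# Hypothetical stand-in for a newly adopted LLM; in practice this would wrap the
# provider's real API.
def candidate_model(question: str) -> str:
    return {"What is the standard shipping window?": "3-5 business days",
            "How long is the appliance warranty?": "two years"}.get(question, "unknown")

print(f"Candidate model accuracy on validated set: {evaluate(candidate_model, validated_set):.0%}")
```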
How Often Should You Give Feedback?
It’s good practice to develop a process around continuing to sharpen your AI agent’s skills by doing some reinforcement learning on a regular basis — weekly, monthly, or quarterly. (We recommend doing it weekly in the early stages of your model and then switching to monthly or quarterly once it’s live.)
The questions we want to answer with our data, and the challenges that a business is facing, shift over time, sometimes imperceptibly. While humans will pick up on those shifts through meetings, casual conversation, or even just instinct, AI agents need to actually be told what’s going on with falsifiable data. Checking in with the AI agent and making sure it’s still giving relevant answers is important to ensure that it’s adapting at the same rate as your human team.
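One lightweight way to operationalize that check-in is to track accuracy from each review cycle and flag when it dips below a threshold your team has agreed on. The cadence, threshold, and numbers below are purely illustrative:

```python
# Sketch of a recurring check-in: track accuracy from each review cycle and flag
# when the agent's answers start drifting below an acceptable threshold.

review_history = {"2025-06": 0.94, "2025-07": 0.93, "2025-08": 0.86}
THRESHOLD = 0.90

for cycle, accuracy in review_history.items():
    status = "OK" if accuracy >= THRESHOLD else "needs a reinforcement session"
    print(f"{cycle}: {accuracy:.0%} -- {status}")
```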
Next week, we’ll be talking about steering committees, and how to organize a team to advance AI in the most effective way possible.