How We Trained Your AI Agent, Part 2: Reinforcement Learning

In our previous post, we talked about the initial training phase for AI agents. To give you a quick recap:
  • AI agents use systems that are trained on specialized data sets, and are capable of independently solving multi-step problems in service of a particular goal.
  • The initial training phase is all about giving an AI agent context. This is done through Retrieval-Augmented Generation (RAG), which teaches an AI agent to use business-specific data and constrains it to only the data that’s relevant to your organization or industry. This vastly improves its accuracy over general-purpose generative AI (a toy sketch of the idea follows this recap).
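To make that recap concrete, here’s a minimal, toy sketch of the RAG idea: retrieve the most relevant business documents for a question and constrain the prompt to them. The document list, similarity scoring, and prompt wording below are illustrative stand-ins rather than any vendor’s API; a production system would use vector embeddings and an actual LLM call.

```python
# Toy RAG sketch: ground the agent in business-specific data and tell it
# to answer ONLY from that data. Everything here is illustrative.
from difflib import SequenceMatcher

BUSINESS_DOCS = [
    "Refund policy: customers may return items within 30 days of delivery.",
    "Shipping policy: orders over $50 ship free within the US.",
]

def retrieve(question: str, docs: list[str], k: int = 1) -> list[str]:
    """Rank documents by rough textual similarity to the question."""
    ranked = sorted(
        docs,
        key=lambda d: SequenceMatcher(None, question.lower(), d.lower()).ratio(),
        reverse=True,
    )
    return ranked[:k]

def build_prompt(question: str) -> str:
    """Constrain the model to the retrieved, business-specific context."""
    context = "\n".join(retrieve(question, BUSINESS_DOCS))
    return (
        "Answer using ONLY the context below. If the answer is not there, "
        f"say you don't know.\n\nContext:\n{context}\n\nQuestion: {question}"
    )

print(build_prompt("How long do customers have to return an item?"))
```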
This initial phase of training an AI agent is all about showing it what its job is, and how to do it. But to improve, AI agents need feedback on their performance (just like people do). This second phase is where reinforcement learning begins, and in this post we’ll focus on RLHF.

RLHF: Reinforcement Learning from Human Feedback

RLHF is a fancy way of saying that you look at the work the AI agent has done so far and grade it as “right” or “wrong.” You create a list of questions for the AI agent, review the answers the agent gives to those questions, and then mark whether or not the answers were correct.

Then the agent uses that data to improve its responses. Like an eager student, it craves the reward of getting a correct answer and alters its behavior to try to provide more of them.
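In practice, the grading step can be captured in a very simple structure: the question, the agent’s answer, and a binary human grade. The sketch below is a minimal illustration; agent_answer and human_grade are hypothetical stand-ins for your deployed agent and your human reviewer.

```python
# Minimal sketch of collecting right/wrong feedback for RLHF-style training.
# agent_answer and human_grade are hypothetical callables, not a real API.
from dataclasses import dataclass

@dataclass
class FeedbackRecord:
    question: str
    agent_answer: str
    correct: bool  # the human grade: True = "right", False = "wrong"

def grade_batch(questions, agent_answer, human_grade):
    """Ask the agent each question and attach a binary human grade."""
    records = []
    for q in questions:
        answer = agent_answer(q)
        records.append(FeedbackRecord(q, answer, human_grade(q, answer)))
    return records

def reward(records):
    """The fraction of answers graded 'right': the signal the agent is
    nudged to maximize during reinforcement learning."""
    return sum(r.correct for r in records) / max(len(records), 1)
```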

One key distinction between training an AI agent and coaching a person is that with a colleague you can be nuanced. An AI agent has a hard time understanding feedback beyond “right” and “wrong” (at least for now). So your training data should only include falsifiable answers rather than explanations.
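As an illustration (the field names here are made up), this is the kind of feedback item the agent can learn from versus the kind it can’t:

```python
# A falsifiable grade the agent can use...
good_item = {
    "question": "What was October churn?",
    "agent_answer": "4.2%",
    "correct": False,  # clear, checkable: the answer was wrong
}

# ...versus a nuanced comment it can't act on.
bad_item = {
    "question": "What was October churn?",
    "feedback": "Sort of right, but I'd frame it differently and add context.",
}
```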

A side benefit of this process is that the act of grading forces the humans creating the training set to become very specific. Just like with any human interaction, where both sides learn from each other, you’ll learn from grading the agent’s work how to shape your questions to get the answers you seek, all while the AI agent learns what your expectations are for its outputs.

RLHTF: Everyone Has an Opinion - Use That!

One danger of RLHF is that if you only have one person providing feedback, the model could skew towards that one viewpoint. So to keep your feedback from becoming lopsided, the best way to review it is with a group. We’ve invented our own acronym: RLHTF (Reinforcement Learning from a Human Team’s Feedback).

If you’ve ever worked on a team, you know that there are always a ton of different perspectives in the room. If two people were to evaluate a response from an AI agent, it’s possible that one person might feel like it was perfectly clear, while another might feel it was completely incoherent.

Additionally, different team members will have varying needs. Your VP might want a high-level summary at the top of each output, your Marketing colleague might think answers are only useful when accompanied by a graph or chart, and your Analyst might want all answers to include a list of possible exceptions.

One of the best ways to provide human feedback is to get multiple perspectives by selecting a team made up of people with different backgrounds. Each person on the team grades the agent’s responses independently, and then afterward the team discusses their scoring and uses their varied perspectives to build a single unified rubric that the AI can use going forward. This standardization process gives the agent a clear, shared standard to work toward rather than one person’s preferences.
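Here’s a rough sketch of that review step, assuming each reviewer records an independent right/wrong grade: unanimous answers pass straight through, while disagreements get flagged for the rubric discussion. The reviewer names and grades below are made up.

```python
# Sketch of team-based review (RLHTF): independent grades, then surface
# disagreements so the team can turn them into rubric rules.
from collections import Counter

team_grades = {
    "What was Q3 revenue?": {"vp": True, "marketing": True, "analyst": True},
    "Which region grew fastest?": {"vp": True, "marketing": False, "analyst": False},
}

def consensus(grades: dict) -> bool:
    """Majority vote across the team's independent right/wrong grades."""
    votes = Counter(grades.values())
    return votes[True] > votes[False]

for question, grades in team_grades.items():
    if len(set(grades.values())) == 1:
        print(f"consensus: {question!r} graded {'right' if consensus(grades) else 'wrong'}")
    else:
        print(f"DISCUSS: {question!r} split {dict(grades)} -> add a rubric rule")
```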

Your Data May Be Gold, but Your Feedback Is Diamond

You might read that process and think…wow, that’s a lot of person-hours. Which is a valid point: it does require people to spend their time grading an agent’s responses and figuring out how to teach it what it needs to know. But that time is well worth the investment.

Once your AI agent is trained, and repeatedly retrained, you’ve got an extremely valuable data set. One key benefit: if you ever need to retrain your model, or even switch the LLM your agent uses, that validated data lets you do it much faster.

If you want to, for example, transition from OpenAI to DeepSeek, or to some other LLM that hasn’t been invented yet, you’ll have a validated data set that you can feed into the new model, as well as a rubric to teach it how to answer questions. While there’s a good deal of up-front labor to create that data, it means you’ll have everything you need to save yourself time in the future.
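As a sketch, the switch can be as simple as replaying your graded questions through an adapter for the new model and scoring it against the answers your team already accepted. call_model and is_equivalent below are hypothetical stand-ins for your provider adapter and your comparison rule.

```python
# Sketch: reuse the validated Q&A set to evaluate a replacement LLM.
def evaluate_new_model(records, call_model, is_equivalent) -> float:
    """Replay every question the team graded as 'right' and report what
    fraction the candidate model answers equivalently."""
    accepted = [r for r in records if r["correct"]]
    if not accepted:
        return 0.0
    passed = sum(
        is_equivalent(call_model(r["question"]), r["agent_answer"])
        for r in accepted
    )
    return passed / len(accepted)

# Toy usage with stand-in callables:
records = [{"question": "Return window?", "agent_answer": "30 days", "correct": True}]
print(evaluate_new_model(records, lambda q: "30 days", lambda a, b: a == b))
```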

How Often Should You Give Feedback?

It’s good practice to develop a process around continuing to sharpen your AI agent’s skills by doing some reinforcement learning on a regular basis — weekly, monthly, or quarterly. (We recommend doing it weekly in the early stages of your model and then switching to monthly or quarterly once it’s live.) 

The questions we want to answer with our data, and the challenges that a business is facing, shift over time, sometimes imperceptibly. While humans will pick up on those shifts through meetings, casual conversation, or even just instinct, AI agents need to actually be told what’s going on with falsifiable data. Checking in with the AI agent and making sure it’s still giving relevant answers is important to ensure that it’s adapting at the same rate as your human team. 

Next week, we’ll be talking about steering committees, and how to organize a team to advance AI in the most effective way possible.
