As data nerds, we love quantifying things. Gathering raw information is the first step to understanding the world around us, and the way things work—there’s little we’re more passionate about than a database.
However, examining that raw information in a vacuum, with no additional context, is sometimes worse than having no data at all.
One high-profile example: in February, DOGE claimed that it had found evidence of fraud on a massive scale, with tens of millions of people over 100 years old receiving Social Security benefits. However, the Social Security Administration has a code in place that automatically stops payouts to anyone over the age of 115.
With a little extra digging, it was revealed that only 89,000 people over the age of 99 are receiving payments based on earnings.
The lesson? Looking at one data set, without additional information, is not enough. In this case, an overreliance on numbers without adequate background information led to spurious conclusions.
And it points to a really important fact: business context is inseparable from data.
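To make that concrete, here is a minimal sketch of the difference between counting raw records and counting records with the business rule applied. Everything in it is an assumption for illustration: the `Record` fields, the `payment_active` flag, and the toy rows are made up, and only the 115-year cutoff comes from the example above.

```python
from dataclasses import dataclass

# Cutoff taken from the rule described above: payouts stop past age 115.
PAYOUT_CUTOFF_AGE = 115


@dataclass
class Record:
    person_id: int
    age: int
    payment_active: bool  # hypothetical field: is a benefit actually being paid?


def naive_count(records, min_age=100):
    """Count every record older than min_age, with no business context."""
    return sum(1 for r in records if r.age > min_age)


def contextual_count(records, min_age=100):
    """Count only records older than min_age that are actually being paid
    and that fall at or under the payout cutoff."""
    return sum(
        1
        for r in records
        if r.age > min_age
        and r.payment_active
        and r.age <= PAYOUT_CUTOFF_AGE
    )


# Made-up toy records, for illustration only.
records = [
    Record(1, 101, True),
    Record(2, 130, False),
    Record(3, 150, False),
    Record(4, 72, True),
    Record(5, 119, False),
]

print(naive_count(records))       # rows that merely exist in the table
print(contextual_count(records))  # rows that matter once the rule is applied
```

The two functions read the same table and answer very different questions. The second one only exists because someone knew the rule.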
So How Can You Make Sure You’re Taking Context Into Consideration?
Know Your Market
At Pickaxe, we’re obsessed with data systems and infrastructure—we can’t shut up about it—but that’s only part of the puzzle. You also have to know what you’re talking about.
There is a tendency to think about data as if it’s separate from whatever business it’s part of. In some ways, this is a reasonable way to think—there are general practices and rules that apply to data science across pretty much every industry. Data hygiene practices, for example, are fairly universal.
However, data teams need to understand the business in a holistic way. Which numbers are important, and which are superfluous? What are your KPIs, and how do your numbers translate to real-world outcomes? Which numbers should be going up, which ones should be going down, and is the AI you’re using trained to recognize the difference?
Educating your data team on the details of your business is key to making sure that the right things are being measured and that you’re landing on the correct takeaways.
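One lightweight way to write down "which numbers should be going up, which ones should be going down" is a small lookup that both people and tools can read. The sketch below is only an illustration: the KPI names and the `is_improvement` helper are hypothetical, and your own list will depend on your business.

```python
from enum import Enum


class Direction(Enum):
    UP = "up"      # a higher value is better
    DOWN = "down"  # a lower value is better


# Hypothetical KPIs and the direction each one should move.
KPI_DIRECTIONS = {
    "conversion_rate": Direction.UP,
    "churn_rate": Direction.DOWN,
}


def is_improvement(kpi, previous, current):
    """Return True if the change from previous to current moves the KPI
    in its desired direction."""
    direction = KPI_DIRECTIONS[kpi]
    if direction is Direction.UP:
        return current > previous
    return current < previous


print(is_improvement("conversion_rate", 0.02, 0.03))
print(is_improvement("churn_rate", 0.05, 0.07))
```

A bigger number is good news in the first call and bad news in the second. That distinction lives in the business, not in the data.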
Encourage Communication Between Teams
For data to be useful, it needs to be used in a way that serves everybody’s needs.
On a systems level, that means developing schemas that organize data so that it’s compatible with front-end dashboards, and documenting lineage so you can track where data originated and how it’s moved between databases. Maintaining a level of data hygiene is important for ensuring that everything that’s collected is also usable.
“Serving everybody’s needs” also means, of course, figuring out what those needs are. You can have the most beautifully organized data in the world, and a pipeline that works flawlessly, but if you’re not collecting information that serves your teams’ goals, then it doesn’t really matter.
It’s important for data teams to understand what marketing teams (or any other teams that need to access data) are trying to accomplish.
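As a rough sketch of the lineage documentation mentioned above, the snippet below keeps a record of where a dataset started and every place it has moved since. The class names, dataset name, and database names are all hypothetical, and a real setup would likely lean on a dedicated lineage or catalog tool rather than hand-rolled classes.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class LineageStep:
    source: str
    destination: str
    transformation: str
    moved_at: datetime = field(
        default_factory=lambda: datetime.now(timezone.utc)
    )


@dataclass
class Dataset:
    name: str
    origin: str
    lineage: list = field(default_factory=list)

    def record_move(self, destination, transformation):
        """Append a step from the dataset's current location to destination."""
        source = self.lineage[-1].destination if self.lineage else self.origin
        self.lineage.append(LineageStep(source, destination, transformation))

    def trace(self):
        """Return every location the dataset has lived in, oldest first."""
        return [self.origin] + [step.destination for step in self.lineage]


# Hypothetical dataset and locations, for illustration only.
campaign_spend = Dataset(name="campaign_spend", origin="ad_platform_export")
campaign_spend.record_move("raw_warehouse", "loaded as-is")
campaign_spend.record_move("reporting_db", "aggregated by week")

print(campaign_spend.trace())
```

When a dashboard number looks off, a trace like this tells the marketing team and the data team which hop to look at first.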
Make AI Training a Team Affair
Adopting AI is exciting, but for your AI to be useful, it needs to return the answers you need in the format that is best for everyone who uses it.
Reinforcement Learning from Human Feedback (RLHF) is a process by which you can hone your AI agent’s performance through feedback, to make sure it’s doing what it should be doing.
But to be even more effective, you can have more than one person giving that feedback—enlisting a whole team, or multiple teams, can ensure that AI is a useful tool to every stakeholder. (We wrote a whole blog post about this!)
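Here is a small sketch of why feedback from more than one team matters. To be clear, this is not RLHF itself, only the bookkeeping step of collecting ratings before any training happens. The team names, response IDs, and 1-to-5 rating scale are made up for illustration.

```python
from collections import defaultdict
from statistics import mean

# Made-up feedback rows: (team, response_id, rating on a 1-5 scale).
feedback = [
    ("data", "r1", 5),
    ("data", "r2", 4),
    ("marketing", "r1", 2),
    ("marketing", "r2", 1),
]


def average_by_team(rows):
    """Group ratings by team and return each team's average rating."""
    by_team = defaultdict(list)
    for team, _response_id, rating in rows:
        by_team[team].append(rating)
    return {team: mean(ratings) for team, ratings in by_team.items()}


overall = mean(rating for _, _, rating in feedback)
per_team = average_by_team(feedback)

print(overall)   # one blended score across everyone
print(per_team)  # the same ratings, split by who gave them
```

The blended score looks middling, while the per-team view shows one group is happy and the other is not. If only the first group had been asked, nobody would know.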
Without Context, You Just Have Numbers
Context, and more specifically deep knowledge of your business, is necessary for finding meaning in your data.
For a data scientist, it’s kind of like sandpaper: it enables you to finesse the edges of whatever you make, so it always makes sense.