How Humans and Machine Learning Deliver the Best Result (Podcast)
I was recently honoured to be a guest on The GrayMeta Podcast, Metadata Matters, hosted by the incredibly talented Matt Eaton. In the episode, I had the opportunity to discuss my perspective on metadata and how to structure human teams to work effectively with data generated by machine learning.
In the episode we discuss:
- I talk about how I've worked with metadata and machine learning algorithms throughout my career.
- I describe the concept of Centaur Teams and discuss practical examples where humans and machine learning working together have produced the best data results.
- We discuss some of the factors holding back faster adoption of machine learning, and the opportunities for using AI within the media industry.
I really enjoyed speaking to Matt, and I hope you enjoy listening.
Appendix - transcript
Hi, my name is Matt Eaton and welcome to Metadata Matters, the GrayMeta podcast.
In this podcast series, GrayMeta will be talking to people working with metadata on a daily basis to learn their perspectives, understand best practices, and find out how to get the most out of your metadata.
In particular, I will focus on how technology like machine learning and AI can help generate, curate, and work with that metadata so it can save time and costs, increase operational efficiencies, and generate new ways to monetize content.
Just to briefly introduce GrayMeta, we're a metadata-driven solutions company that helps organizations with content in three ways.
We digitize tape-based content using our Tape-to-File service.
Our QC product, Iris, is used extensively in the mastering and content ingest process to ensure video and audio integrity where technical metadata is vital.
And our Curio platform is used to automatically generate rich descriptive metadata for content using a range of machine learning and AI services.
I'm joined today by Greg Detre, an applied machine learning expert, to talk about his perspectives on metadata and how to structure human teams to effectively work with data generated by machine learning.
Greg was previously the Chief Data Scientist at Channel 4, where he worked on a range of award-winning data products across advertising, personalization, and forecasting.
And GrayMeta's Curio product and the metadata it generated using a range of machine learning services was involved in one of those projects, which was related to contextual advertising.
Greg now works with fast-growing startups and large companies as an advisor and coach, helping them build and develop their data technology teams.
His blog, makingdatamistakes.com, is a very informative and entertaining read too.
Hi, Greg, thanks very much for joining. - Hi, Matt, nice to be here, thank you. - So could you talk a bit about how you've been working with content metadata during your career? - Well, it's taken a few different forms.
At Memrise, we were at one point trying to build the world's first crowdsourced dictionary across hundreds of languages.
And I learned the hard way that the metadata for words is just fabulously complicated.
I remember the first moment I discovered that Japanese had two alphabets, and that broke all of our database schemas.
And then everything from how to pronounce different words to how to remember them.
From there, with Channel 4, to take one example, there's lots of really interesting metadata, especially around their programs, that we can potentially use to categorize and recommend and personalize.
And finally, with my consulting clients, there are some who are trying to figure out metadata about your bank transactions to help decide if you're eligible for a loan, or about the articles that you're reading to try and place them into different semantic categories.
So yeah, content metadata has been a recurring theme in my life for many years. - Yep, that's great.
I mean, I think that speaks to the breadth of how it can be applied and its usefulness as well.
So, at GrayMeta, we're trying to help our clients accelerate their adoption of machine learning-generated metadata to drive business outcomes.
And one thing that I'm very interested to hear more about: you've talked previously about the concept of Centaur teams, a Centaur being the mythical beast that is half man, half horse.
You described this as being the most effective way of human teams working with machine learning.
Could you please explain what you mean by Centaur teams and how this phrase came about? - Well, I would have thought that the application to data science of the half horse, half human children from Greek mythology would be pretty straightforward and obvious.
But if I had to spell it out, I guess the idea comes from chess.
And if we go back to 1997, when Garry Kasparov got beaten by Deep Blue, you might have been forgiven for thinking at that moment, "Well, that's it for us monkeys."
That's the end of human beings having anything to contribute to the world of chess.
Actually, if you fast forward 20 years after Deep Blue won, the best chess player in the world wasn't a human being, but it wasn't a machine either.
It was what they call a Centaur team, a hybrid of a really great human player with a big computer and database that they could use to sort of add their intuition on top of.
And so there's something really profoundly interesting about the idea that even though machines were better in isolation than humans, the two together were much greater than the sum of their parts.
And that this was true even in a domain as black and white as chess, where you have perfect information and everything's deterministic.
And so I suppose the conclusion I've drawn from this, and that I've seen borne out, is that we should expect the world-class performers in more or less every domain of the creative and knowledge economy to be Centaur teams.
They won't be human beings on their own.
They won't be machines on their own.
They'll be Centaur teams, where the ratio of person to machine will probably vary quite a lot, and vary over time and from domain to domain, but the two together will be greater than the sum of their parts. - Yeah.
This is a great sort of way of putting it.
I mean, greater than the sum of the parts.
I mean, it means teams can do more than a human could just on their own.
And it also extends the capabilities, not just in volume, but I guess in what is possible as well.
I mean, could you give some examples of where you've seen Centaur teams working well? - Yeah, so there's, I think quite a few.
If we take as one example, audience segmentation at Channel 4, where we were trying to divide our 20 million viewers into different groups so that we could more easily think about the kinds of content that they might like and their different behaviors.
And obviously this required a fancy machine learning algorithm to operate over billions of rows and millions of users.
And that was doing the heavy lifting.
But you might imagine that it's as straightforward as saying, okay, great, I'll just download the data, run the algorithm, and I'll get my answer.
In practice, it really just doesn't work that way for a number of reasons.
So one is that you need humans to parameterize that algorithm.
Should you have five clusters or 15?
And do they look meaningful?
And is there a problem?
Is there a weird kind of gotcha in the data that's confusing things?
And so it ends up being, when it works well, it ends up being very much a conversation between the algorithm designers and the domain experts.
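To make the "five clusters or 15?" question concrete, here's a rough, hypothetical sketch (not the actual Channel 4 pipeline) of how you might compare candidate cluster counts with scikit-learn before putting the segments in front of domain experts; `viewer_features` is a made-up stand-in for real viewing data.

```python
# Hypothetical sketch: comparing cluster counts for an audience segmentation,
# so algorithm designers and domain experts have something concrete to review.
# Assumes scikit-learn; `viewer_features` stands in for whatever per-viewer
# behavioural features you actually have.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
viewer_features = rng.normal(size=(5000, 12))   # placeholder for real viewing data

X = StandardScaler().fit_transform(viewer_features)

for k in (5, 10, 15):
    labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(X)
    score = silhouette_score(X, labels, sample_size=2000, random_state=0)
    sizes = np.bincount(labels)
    print(f"k={k:2d}  silhouette={score:.3f}  cluster sizes={sizes.tolist()}")

# The numbers alone don't decide it: a domain expert still has to look at each
# cluster's behaviour and say whether the segments are meaningful, or whether a
# gotcha in the data (e.g. one device type dominating) is confusing things.
```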
And then of course, if we take that example through a few more steps.
So even once we had that clustering in place, we used that to create multiple different versions of the Channel 4 homepage for different kinds of users, each one of which was being managed by a different individual human editor.
But then within that page, we used algorithms again, this time a kind of recommendation ranking algorithm to kind of reorder things a little bit.
So what you ended up with was the ball being handed back and forth between human editors and machines to create something that ended up being much more effective than we could have done with machines alone or humans alone.
So I think that's one example.
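Here's a minimal sketch of that back-and-forth, with example programme titles and made-up scores: the editor curates a rail for a segment, and a ranking model only reorders within the editor's selection. It's illustrative, not the actual Channel 4 system.

```python
# Hypothetical sketch of the editor/algorithm handoff: the human editor picks
# which programmes appear on a segment's homepage rail; the algorithm only
# reorders them using a per-user relevance score computed elsewhere.
editor_rail = ["Gogglebox", "The Great British Bake Off", "Taskmaster", "24 Hours in A&E"]

# Made-up per-user scores from a recommendation model (higher = more relevant).
user_scores = {"Taskmaster": 0.91, "Gogglebox": 0.64, "24 Hours in A&E": 0.58,
               "The Great British Bake Off": 0.40}

def personalise_rail(rail: list[str], scores: dict[str, float]) -> list[str]:
    # Reorder the editor's selection only; never add or remove programmes.
    return sorted(rail, key=lambda title: scores.get(title, 0.0), reverse=True)

print(personalise_rail(editor_rail, user_scores))
```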
Another, just to kind of contrast that: we were working on a forecasting project, and Channel 4 had a really expert human forecasting team that had been doing the job of trying to predict which kinds of people would be watching different programs at different times in the future, and doing a really super job for many, many years.
And they were really expert at doing this and operating under uncertainty.
So we had an algorithm that did a pretty good job and was basically at parity with the human team.
But what was really interesting is that if you looked at the kinds of mistakes, or when they were getting things wrong, the humans and the algorithm were getting things wrong in very different circumstances.
So this presents an amazing opportunity.
The human beings were brilliant at predicting for the main channel, especially at peak time; they did an amazing job of using everything they could bring to bear about the past, about the context, about whatever's going on in society, what is going on on other channels, and the program itself to make really accurate predictions.
And they outperformed the machines usually.
Whereas of course, if you're trying to predict what's gonna happen at 4 a.m. on a Tuesday on a secondary channel for a repeat of The Simpsons or something that's been shown many times before, the algorithms end up doing a better job, if only because they're sort of untiring and completely unbiased.
And so you can easily see how together you could create a system that is much greater than the sum of its parts. - Yeah, it's a great example.
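A toy illustration of that "greater than the sum of its parts" point: if you know each forecaster's historical error by slot type, you can route each slot to whichever side tends to win there. Everything here (slot types, error numbers) is made up for the sketch; it's not how the Channel 4 forecasting system actually worked.

```python
# Toy illustration (not the real system): route each forecast to whichever
# forecaster, human or model, has historically had the lower error for slots
# of that kind, exploiting their complementary strengths.
from dataclasses import dataclass

@dataclass(frozen=True)
class Slot:
    channel: str      # e.g. "main" or "secondary"
    daypart: str      # e.g. "peak" or "overnight"
    is_repeat: bool

# Hypothetical historical mean absolute errors, keyed by slot type.
HUMAN_MAE = {("main", "peak", False): 0.04, ("secondary", "overnight", True): 0.30}
MODEL_MAE = {("main", "peak", False): 0.07, ("secondary", "overnight", True): 0.12}

def choose_forecaster(slot: Slot) -> str:
    key = (slot.channel, slot.daypart, slot.is_repeat)
    human_err = HUMAN_MAE.get(key, 0.15)   # fall back to an overall average
    model_err = MODEL_MAE.get(key, 0.15)
    return "human" if human_err <= model_err else "model"

print(choose_forecaster(Slot("main", "peak", False)))          # -> human
print(choose_forecaster(Slot("secondary", "overnight", True))) # -> model
```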
And I've come across examples where, in editorial compliance, machine learning can pick up on images or some background effect that a human would miss purely because they tire, they're human, whereas the machine is analyzing every frame with the same amount of intensity, which is interesting.
But I think the other point that you made, which this illustrates well, is the ability to deal with volumes of data in a better way.
At GrayMeta, there's an example of this human-in-the-loop kind of concept: the Royal Wedding Who's Who app that we helped develop with Sky News.
We provided face detection and recognition of guests as they arrived at Windsor Chapel.
But we knew at the time you could only expect an accuracy level of around 70%.
This was 2018, and that was on a good day, as long as we didn't have big hats and lots of dark glasses. 70% would have been great, but not good enough for broadcast.
So we built a human-in-the-loop checkpoint as part of the process, so that the AI could do the mass checking of guests' faces as they arrived in clusters, groups of six or seven at a time, and then present a key frame from the live feed, together with an image of who the AI thought the person was, to a human.
And the human had the ability to override it if it looked like nonsense.
And that way we were able to get complete accuracy.
It was an example of where there were too many guests arriving at the same time for a human to possibly identify all of them, but using machine learning as a tool helped generate that metadata.
So yeah, there's definitely a few different ways where it can be more than the sum of the parts.
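For readers who want the shape of that checkpoint in code, here's a minimal, hypothetical sketch: a review step where every AI guess is shown to a human who can confirm or correct it before it's used on air. The `Match` structure and the stub human are illustrative only, not the actual Sky News / GrayMeta implementation.

```python
# Minimal human-in-the-loop checkpoint, roughly in the spirit of the Who's Who
# workflow described above; names and structures are hypothetical stand-ins.
from dataclasses import dataclass
from typing import Callable, Optional

@dataclass
class Match:
    frame_id: str
    predicted_name: str
    confidence: float

def review_matches(
    matches: list[Match],
    ask_human: Callable[[Match], Optional[str]],
) -> dict[str, str]:
    """Return a frame_id -> confirmed name mapping.

    Every AI guess is shown to a human alongside the key frame; the human
    either accepts it (returns None) or types a correction.
    """
    confirmed: dict[str, str] = {}
    for m in matches:
        override = ask_human(m)            # human sees key frame + AI guess
        confirmed[m.frame_id] = override or m.predicted_name
    return confirmed

# Example: a stub "human" who corrects one obviously wrong match.
matches = [Match("f001", "Guest A", 0.92), Match("f002", "Guest B", 0.55)]
print(review_matches(matches, ask_human=lambda m: "Guest C" if m.frame_id == "f002" else None))
```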
But I think there was a recent Harvard Business Review article talking about lack of integration with these existing systems and processes as the most common barrier to faster adoption of machine learning services.
So in your experience, what's the biggest factors holding back more widespread adoption of machine learning generated data services?
Well, I think that integration point is a really critical one, and it's integration both with technologies and with processes.
And it can be a massive barrier.
If you've got to get on the roadmap of a different part of the organization that has their own priorities in order for them to make a change that enables your system to actually get into production, then you're gonna see those bottlenecks all the time.
I think there's a lot one could say about that, and that gets down to sort of org charts and the way that we plan and prioritize our technology roadmap.
I suppose I'll just make a different point though, and focus on, I suppose, the way that automation and machine learning can feel threatening and risky.
And so a couple of times, both at the Guardian and at Channel 4, I've worked at organizations where most people tend to think in terms of words or images.
And so being the team that, at least on the face of it, thinks in terms of numbers puts you potentially at odds, especially if you were to sort of walk in with your efficiencies hat on and try and emphasize the ways in which you could do things more cheaply.
I think one of the things about Centaur teams that I find appealing is that it very explicitly makes clear that the humans, the people, are a critical part of the eventual system, that they're ineliminable from a really, really high-functioning system.
And it's worth pointing out that in this analogy, the machines are the arse end of the hybrid.
And I think secondarily, they allow you to get to value much faster and to de-risk things, right?
So in the kind of blended model that you described for face recognition, you could imagine, I mean, that was a one-off event, but if it were a continuing process, you could imagine starting with 100% humans and then gradually, slowly, over time, ramping up the involvement of machines.
So my prediction: if I could buy shares in different kinds of machine learning algorithms, I would buy shares in active learning, which is a branch of machine learning where the sort of clean distinction between training and testing is broken down a little bit, by which I mean the machines are allowed to ask for help occasionally.
And so the machines kind of have a sense of how confident they are.
And when they're stuck, they can kind of say, hang on, can you give me a hand with this?
What's the right answer here?
And that makes very efficient use of people's time because you're focusing, you're kind of increasing your training data, but especially at the kind of boundary points where things are gray and complicated, right?
Having lots of examples that are labeled for training your machine learning algorithms that are really straightforward doesn't help that much.
Whereas having lots of examples where things are kind of tricky to give an algorithm the sense of, ah, okay, so this is where the classification boundary exactly lives, for example, can really help.
So I can see in general there's gonna be a lot more active learning, and that's the kind of least interesting version of a Centaur team.
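Here's a minimal sketch of the uncertainty-sampling flavour of active learning described above, using scikit-learn on synthetic data: the model asks for labels only on the examples it's least sure about, which is where extra labels help most. The loop and numbers are illustrative, not from any production system.

```python
# Minimal active-learning loop (uncertainty sampling) with scikit-learn:
# the model asks for human labels only where it is least confident,
# i.e. near the decision boundary, which is where labels help most.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

X, y_true = make_classification(n_samples=2000, n_features=20, random_state=0)
rng = np.random.default_rng(0)

labelled = list(rng.choice(len(X), size=20, replace=False))   # small seed set
unlabelled = [i for i in range(len(X)) if i not in set(labelled)]

for round_ in range(5):
    model = LogisticRegression(max_iter=1000).fit(X[labelled], y_true[labelled])
    proba = model.predict_proba(X[unlabelled])[:, 1]
    uncertainty = np.abs(proba - 0.5)                 # 0 = totally unsure
    ask = np.argsort(uncertainty)[:20]                # "can you give me a hand with these?"
    newly_labelled = [unlabelled[i] for i in ask]     # a human would supply labels here
    labelled.extend(newly_labelled)
    unlabelled = [i for i in unlabelled if i not in set(newly_labelled)]
    print(f"round {round_}: {len(labelled)} labels, accuracy {model.score(X, y_true):.3f}")
```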
I think the more interesting versions of Centaur teams are where there's like real interesting AI that's providing something extra, on top of what the humans are already doing. - Right, right.
Yeah, that's fascinating.
Yeah, yeah, that's an exciting vision, I think.
Are you seeing that being adopted actively at the moment within the projects you're working on, or how close is this? - Yeah, so the projects I'm working on, I tend to push for this approach 'cause I've found it to be a much more effective way to generate lots of training data fast.
And so it just works really well and it de-risks things.
And you can imagine that over time, you can now start to predict with quite a lot of confidence what the ratio of person to machine will be and where you most need people's help.
And so your blended cost becomes much more predictable.
I think we aren't yet seeing that many interesting examples of where Centaur teams are going really much beyond that.
So I'll give one example of where I can imagine the future going that's deliberately a little bit further into the future.
I gave a talk at BAFTA, sort of in the BAFTA building, on whether a robot could ever write a BAFTA-winning screenplay.
And the argument I made there was that in the sort of medium-term future, most award-winning screenplays will be written by Centaur teams.
And so we can ask, in that case, what is the AI providing?
Well, here's an example.
You could imagine that it would enable you to measure how distinctive different characters' voices are.
And so if you start writing accidentally in one character's voice when you're meant to write in another, or if two characters are insufficiently differentiated, it seems like a relatively straightforward and feasible thing for the algorithms to start saying, hang on, these two characters aren't meaningfully differentiated well enough, or you haven't written this line in that character's voice.
Now that's still a pretty low level example.
I can imagine it getting more interesting yet still, but that's an example of something that is definitely much more than just a spell check.
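As a purely illustrative sketch of how you might measure whether two characters' voices are distinct enough: train a simple classifier to predict the speaker from a line's wording, and treat near-chance accuracy as a warning sign. The characters and lines below are invented for the example.

```python
# Purely illustrative: score how distinguishable two characters' dialogue is by
# asking a simple classifier to predict the speaker from a line's wording.
# Low cross-validated accuracy suggests the voices aren't well differentiated.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline

# Made-up dialogue; in practice you'd pull every line from the screenplay.
lines = [
    ("ALEX", "Right, we move at dawn and we don't look back."),
    ("ALEX", "I don't care what it costs. Get it done."),
    ("ALEX", "Dawn. No excuses, no delays."),
    ("JO",   "Maybe we should, I don't know, think about this first?"),
    ("JO",   "I'm just saying it feels a bit rushed, that's all."),
    ("JO",   "Could we at least sleep on it?"),
] * 5   # repeated so cross-validation has something to work with

speakers = [s for s, _ in lines]
texts = [t for _, t in lines]

clf = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)), LogisticRegression(max_iter=1000))
scores = cross_val_score(clf, texts, speakers, cv=5)
print(f"speaker-prediction accuracy: {scores.mean():.2f}")
# Near 0.5 would mean the two voices read almost interchangeably;
# near 1.0 means they're clearly distinct.
```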
There's something interesting narratively or yeah, directorially, that the algorithm is providing there. - Yeah, yeah, fantastic.
Yeah, no, I'd say the awareness of context, I think it's getting into that area as well, isn't it?
And using multiple data sets to kind of inform that decision-making, whether it's appropriate, I guess, as well.
That would be really interesting.
Well, it's fascinating hearing about that.
And thank you so much for sharing your perspectives with us.
So how can listeners find out more about the topics you've talked about? - Well, I write regularly on makingdatamistakes.com where I try and kind of think through all the things I've done horribly wrong over the last 10, 20 years and hopefully help other people make their own separate and interesting and new mistakes.
And you're always welcome to email me at greg@gregdetre.co.uk. - Fantastic.
Thank you, Greg. - Thanks very much. - If you'd like to find out more about GrayMeta and why metadata matters, visit graymeta.com, G-R-A-Y-M-E-T-A.
Thanks.