The trouble with data is the sheer quantity of it. IDC once said that if you stored all the data in the world on DVDs you’d have a pile of disks large enough to circle the planet 222 times.
So, how can you winnow junk data out of the stack to make better decisions? That’s one of the problems Apple seems to want to solve.
Inductiv for the rest of us
Apple has confirmed the purchase of a small Ontario-based machine learning start-up called Inductiv with its usual boilerplate statement that it purchases companies like this from time to time, and doesn’t want to tell us anything more about it.
The thing is, while Inductiv was small and didn’t have a big footprint, it did have a small cadre of extremely talented employees (most of whom now appear to be working at Apple), and was run by ranking AI professors from Stanford University, the University of Waterloo, and the University of Wisconsin.
We’ve tracked Apple’s AI acquisitions before. This new deal is just the latest in a lengthy trend as the company invests deeply in optimizing machine learning across its systems.
However, the biggest problem with AI is also its most important ingredient: data. Capturing high-quality data is part of the challenge, but figuring out which data is most useful is another part of that hurdle.
One of the biggest truisms in machine learning/artificial intelligence development is the phrase: “Junk in, junk out.” Bad data, or poorly analyzed data, creates poor results.
Hey Siri, tell me what you know
That’s particularly visible in Siri.
We all saw the consternation that followed the revelation that small portions of recordings (including accidental recordings) made when Siri requests are spoken were being shared with human operatives for a process called “grading.”
The idea was that short snippets of conversation were listened to in order to help improve the system. The problem was that personal information slipped out, as Siri is sometimes triggered by mistake. The company still grades information, though it has also made it easier to opt out of the process.
Apple wasn’t (and isn’t) the only company doing this, of course, but it is the company most dedicated to user privacy. The revelation generated user-driven pushback that ended with the company developing new tools you can use to take control of the data Siri keeps about you.
It turned out that people didn’t want elements of their conversations to be listened to by other humans – particularly their private and personal information, or bank details.
Apple and others argued that this kind of grading process was necessary to help improve the AI used by Siri – to understand how and why the machine learning algorithms generated poor results, it’s necessary to analyze what caused them.
The thing is, when you’re attempting to winnow out high-quality data from a collection of information vast enough to circle the planet 222 times, you need an AI to handle the data AI has collected. That’s what Inductiv was developing: technology that identifies and corrects errors in datasets.
This isn’t precisely a replacement for human grading of human conversation, but in terms of identifying data (and make no mistake, any request you make of Siri is actually data) that creates erroneous results, it may help reduce how much information needs to be graded by humans.
Of course, this isn’t just about Siri. Apple’s machine learning systems are deeply woven into its products – just look at those nifty Photos tools that help you find and create better photographs. It’s just that now, while the algorithms attempt to help you with your life, other algorithms will be trying to figure out which data is actually useful for the task, and which data simply gets in the way. Hopefully.
We’ll see how that goes. “Hey Siri, which app do I use the most?” >tumbleweed<.
Copyright © 2020 IDG Communications, Inc.