It is actually a rather inexact criterion. You must be able to provide enough “information” about the problem, in a pre-sorted digital form, for the AI to find a mathematical way to transform that information into the answers you are looking for.
The “information content” of inputs is nearly impossible to quantify. Mostly this is guesswork on the part of AI designers. For example, we know a photograph contains a lot of information in the arrangement and juxtaposition of pixels, which we can (and almost always do these days) put in digital form. We know something about how nature solved this problem in vision, and we devised “convolutional neural nets” based on the highly ordered arrangement of neurons in the visual cortex.
And then, with enough fiddling, we can do image recognition, facial recognition, text recognition from images, and so on. These systems are not nearly as good as humans at this, but computers have the advantage of doing it hundreds of times faster than humans, so there is that.
But still, we don’t have a clear definition of the “information content” of a photograph; we just know it is “enough” to do these visual tasks.
Also, these tasks demonstrate why the inputs must be “ordered”. In the visual cortex, tiny patches of “pixels” are processed together to find patterns. The pixels of an image have no meaning if they are all jumbled up; that is just meaningless static. A given pixel only has meaning in the “context” of its neighbors and its relationships to them.
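To make that concrete, here is a minimal sketch (in Python with NumPy, which the text does not specify) of the patch-based computation a convolutional layer rests on. The 3x3 edge-detecting kernel is just an illustrative choice; the point is that each output value depends on a pixel together with its neighbors, so shuffling the pixels destroys whatever the kernel could have found.

```python
import numpy as np

def convolve2d(image, kernel):
    """Slide a small kernel over the image, combining each pixel
    with its immediate neighbors into one output value."""
    kh, kw = kernel.shape
    h, w = image.shape
    out = np.zeros((h - kh + 1, w - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            patch = image[i:i + kh, j:j + kw]   # a tiny neighborhood of pixels
            out[i, j] = np.sum(patch * kernel)  # weighted sum over that patch
    return out

image = np.random.rand(8, 8)            # stand-in for a small grayscale photo
edge_kernel = np.array([[ 1,  0, -1],
                        [ 2,  0, -2],
                        [ 1,  0, -1]])  # classic vertical-edge detector

features = convolve2d(image, edge_kernel)

# Jumble the pixels: the neighborhood structure is gone, and the same
# kernel no longer finds anything meaningful in the result.
shuffled = np.random.permutation(image.ravel()).reshape(image.shape)
noise = convolve2d(shuffled, edge_kernel)
```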
The same thing goes for all other problems. In essence, at least some of the inputs, in their given positions, must have “meaning” that helps us get closer to the right answer. We can also discover relationships between one input and another, but the ordering matters: input 7 and input 101 can have some statistical relationship, but that relationship is defined over all instances of the problem as specifically the 7:101 pairing, and the statistics of that pairing must be more than just random.
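A small sketch of what that means, again in Python with NumPy and entirely made-up data: across many samples, the value at position 101 consistently tracks the value at position 7, while an arbitrary pairing shows no such pattern.

```python
import numpy as np

rng = np.random.default_rng(0)

# 10,000 hypothetical training samples, each with 200 input positions.
n_samples, n_inputs = 10_000, 200
X = rng.normal(size=(n_samples, n_inputs))

# Fabricate a dependency: across samples, position 101 tends to follow
# position 7 (plus noise).  The positions stay fixed; only samples vary.
X[:, 101] = 0.8 * X[:, 7] + 0.2 * rng.normal(size=n_samples)

# The 7:101 pairing carries a consistent statistical relationship...
print(np.corrcoef(X[:, 7], X[:, 101])[0, 1])   # close to 1.0

# ...while an arbitrary pairing like 7:42 is just random.
print(np.corrcoef(X[:, 7], X[:, 42])[0, 1])    # close to 0.0
```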
Understand that the AI doesn’t “think” like a person. It is a complicated statistical machine that learns, by trial and error over a large number of samples, a huge mathematical formula that can turn ordered numerical inputs into some desired output. Or it can find some defining structure that correctly describes the inputs, or separates them correctly into classes.
But all it is, after this discovery process, is a big math formula, with anywhere from hundreds to billions of computations, that accomplishes the task. At that point (after training), everything we call AI today is just blindly computing. It is no more than a hand calculator.
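Here is a minimal sketch of that point, with placeholder weights standing in for whatever training would have produced: once the numbers are frozen, “running the AI” is just evaluating a fixed chain of arithmetic.

```python
import numpy as np

# Pretend these weights came out of training; from here on they never change.
rng = np.random.default_rng(1)
W1, b1 = rng.normal(size=(4, 3)), np.zeros(3)   # layer 1: 4 inputs -> 3 units
W2, b2 = rng.normal(size=(3, 1)), np.zeros(1)   # layer 2: 3 units  -> 1 output

def predict(x):
    """Inference is nothing but multiplies, adds, and simple
    nonlinearities -- a big formula, blindly evaluated."""
    h = np.maximum(0, x @ W1 + b1)   # ReLU
    return h @ W2 + b2

print(predict(np.array([0.5, -1.2, 3.0, 0.1])))
```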
AI works, essentially, by trial and error. That is what “training” consists of: making a little move, then checking whether we got closer to the answers we already have.
The assumption is that if we can get pretty close on all the training samples we were given (both the pre-sorted inputs and the correct answers), then we’ll probably do about as well on new examples we have never seen before.
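A minimal sketch of that loop, assuming a toy line-fitting problem with fabricated data: each step makes a little move on the parameters, checks the error against answers we already have, and at the end we test whether the result also holds up on samples the training never saw.

```python
import numpy as np

rng = np.random.default_rng(2)

# Made-up training data: inputs plus the correct answers we already have.
x = rng.uniform(-1, 1, size=200)
y = 3.0 * x + 0.5 + 0.1 * rng.normal(size=200)

w, b = 0.0, 0.0          # start with a guess
lr = 0.1                 # how big each "little move" is

for step in range(500):
    pred = w * x + b
    err = pred - y                   # how far off are we?
    # Nudge the parameters in the direction that reduces the error.
    w -= lr * np.mean(err * x)
    b -= lr * np.mean(err)

# Generalization check: do we also do well on samples we never trained on?
x_new = rng.uniform(-1, 1, size=50)
y_new = 3.0 * x_new + 0.5 + 0.1 * rng.normal(size=50)
print(np.mean((w * x_new + b - y_new) ** 2))   # small if we generalized
```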
