I think it's also important to remember that a lot of human models were abandoned because of their inherent bias and black-box nature (feelings, subjectivity, etc.).
This is one of the reasons the scientific method blossomed: objectivity, rigor, transparency, reproducibility, etc. Black boxes can lead to bad decisions because it's difficult to question the process leading to decisions or highlight flaws in conclusions. When a model is developed with more rigor, it can be openly critiqued.
Instead, we have models running across such massive datasets with so many degrees of freedom that we have no feasible way of isolating problems when we see or suspect certain conclusions are amiss. So we throw more data at it or train the model around those edge cases.
To be clear, I'm not saying ANN/DNN models are bad, just that we need to understand what we're getting into when we use them and recognize effects that unknown error bounds may cause.
If, when the model fails to correctly classify a new data point, the result is your photo editing tool can't remove a background person properly... then so be it, no harm no foul. If the result is that the algorithm classified a face incorrectly with high certainty and lands someone in prison as a result (we're not there, yet) then we need to understand our method has potential unknown flaws and we should proceed carefully before jumping to conclusions.
> I think it's also important to remember a lot of human models were abandoned because of the inherent bias
The industry needs to stop misusing the term bias this way. Virtually every attempt to find this supposed human bias has failed. The latest public example was Amazon and hiring [1].
Bias is the tendency to consistently mis-classify towards a certain class, or the tendency to consistently over- or under-estimate.
Somehow the term has been hijacked to mean 'discriminate on factors that are politically incorrect'. You can have a super racist model that's bias free, and most models blinded to protected factors are in fact statistically biased.
It's not constructive to conflate actual bias with political incorrectness.
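A quick toy of that distinction, with synthetic numbers (nothing from any real case): a model that conditions on the attribute can be statistically unbiased, while the "blinded" one picks up a systematic group-wise error:

    import numpy as np

    rng = np.random.default_rng(42)
    n = 200_000
    group = rng.integers(0, 2, n)                      # protected attribute (0 or 1)
    p_true = np.where(group == 1, 0.7, 0.4)            # outcome rate differs by group (made up)
    y = rng.random(n) < p_true                         # realized outcomes

    pred_a = p_true                                    # model A: conditions on the attribute
    pred_b = np.full(n, y.mean())                      # model B: blinded, one rate for everyone

    for name, pred in [("A (uses attribute)", pred_a), ("B (blinded)", pred_b)]:
        for g in (0, 1):
            m = group == g
            print(name, "group", g, "mean error:", round(float((pred[m] - y[m]).mean()), 3))
    # A's mean error is ~0 in both groups (statistically unbiased, even though it "discriminates");
    # B consistently over-estimates group 0 and under-estimates group 1 (statistically biased).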
Operational decision-making, whether by AI, humans, or statistics, faces an inherent trilemma: it's impossible to simultaneously treat everyone the same way, have a useful model, and have uniformly distributed, 'bias'-free outcomes. At best, a model can achieve two of the three.
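And a small worked version of that trilemma, with made-up numbers: apply identical error rates to two groups whose base rates differ, and the outcome "fairness" breaks somewhere else:

    # Same treatment for both groups: identical true/false positive rates.
    TPR, FPR = 0.8, 0.1
    base_rate = {"group_0": 0.2, "group_1": 0.5}       # fraction genuinely positive (made up)

    for g, p in base_rate.items():
        # Precision = P(truly positive | flagged), by Bayes' rule.
        precision = TPR * p / (TPR * p + FPR * (1 - p))
        print(g, "precision:", round(precision, 3))
    # group_0 ~0.667, group_1 ~0.889: equal treatment yields unequal precision.
    # Equalizing precision instead requires group-specific error rates, i.e. unequal treatment;
    # the only way to equalize everything is to stop using the scores, i.e. give up the model.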
Hold up. Bias is being used in two different ways because it has two different meanings. When you are using bias in industry you are talking about a minor mathematical factor added to a learning rate. When we talk about human bias, we aren't. The term was never hijacked; it just has multiple meanings.
I'm not sure he was using any of those known definitions of bias, though. I'm also not sure I would define bias as the consistent act of mis-classification.
Yes, mathematicians mean something different and specific when using the word bias. The average non-expert is not misusing the word. They are using the word to express a different -- and far more popular -- meaning.
Neither is wrong, but insisting that a naming clash carries any substantive significance on an underlying issue is just silly. Similarly, insisting that nonmathematicians should stop using a certain word unless they use it how mathematicians use it is a tad ridiculous.
If anything, it's more reasonable for mathematicians to change their language. After all, their intended meaning is far less commonly understood.
If an algorithm is clearly sorting on irrelevant criteria, especially a black-box algorithm, we normally assume it's a bug. It's not reasonable to reverse that, assume the code is incapable of being mistaken, and say that obviously irrelevant criteria are somehow correct in an unknown way.
Amazon's problem is a bug; they even describe its nature. And given how flawed their recommendation algorithms are, it's especially unreasonable to assume this one is infallible.
So the linked Reuters article does not show a failure to find bias; if anything, it shows a design error.
The data says the criterion is an eigenvalue of the problem: no matter how hard Amazon tried to blind the solution to it, the ML system kept finding ways to infer it, because it was that strongly correlated with the fitness function.
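In miniature, that failure mode looks roughly like this (a hedged sketch on synthetic data, not Amazon's actual pipeline): drop the protected column and a model can still reconstruct it from a correlated proxy:

    import numpy as np
    from sklearn.linear_model import LogisticRegression

    rng = np.random.default_rng(0)
    n = 50_000
    protected = rng.integers(0, 2, n)                  # the attribute we try to blind against
    proxy = protected + rng.normal(scale=0.3, size=n)  # e.g. word choice, correlated with it
    other = rng.normal(size=n)                         # an unrelated feature

    X_blinded = np.column_stack([proxy, other])        # protected column is NOT in the inputs
    clf = LogisticRegression().fit(X_blinded, protected)
    print("protected attribute recovered, accuracy:", round(clf.score(X_blinded, protected), 3))
    # ~0.95: blinding the inputs does not blind the model when strong proxies remain.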
This is the difference between political newspeak '''bias''' and actual bias. Amazon scrapped the model despite it performing just fine and being bias-free, because it kept finding ways to discriminate on a protected attribute, which is a PR nightmare in the age of political outrage cancel culture. It's fine to explicitly decide that some attributes should not be discriminated upon, but this comes with a cost, either in terms of model utility or in terms of discrimination against other demographics. There's no way around this. In designing operational decision-making systems, one must explicitly choose a victim demographic or choose not to implement the system at all. There's no everyone-wins scenario.
The harm of the newspeak version of '''bias''' is that it misleads people into thinking that making system inputs uniform somehow makes the system bias-free, when the opposite is typically true. Worse, it creates the impression that some kind of magical bias-free system can exist where everyone is treated fairly, even though we've formally demonstrated that to be false.
No amount of white-boxing or model transparency will get around this trilemma. The sooner the industry comes to grips with it and learns to explicitly wield it when required, the better.
>No amount of white-boxing or model transparency will get around this trilemma. The sooner the industry comes to grips with it and learns to explicitly wield it when required, the better.
Agreed. The optima of multiple criteria will essentially never intersect.
But for Amazon, there is no evidence the tool was accurately selecting the best candidates. They themselves never said it was. After all, altering word choices in a trivial way dramatically affects ranking. On the points you mention, why should we assume their data was relevant or that their fitness function was even doing what they thought? If they were naive enough, they could just be building something that predicts what the staff of an e-commerce monopoly in a parallel universe will look like.
The most likely story is that they failed at what they were doing. Part of that failure happened to be controversial and so got unwanted attention. I would guess there were quite a few incredible correlations the tool "discovered" that did not get to press.
At any rate, their recommendation engine is more important and has been worked on longer yet it is conspicuously flawed. When their recommendation tool inspires awe then maybe we could take their recruiting engine seriously enough to imagine it has found deep socio-psychological-genetic secrets.
Supervised learning algorithms assume that the training data are i.i.d. samples from the same distribution as future data. That is not valid in most real applications. The observation that we see more men than women in programming does not necessarily generalize to the future. That's why online learning provides an exploration-vs-exploitation mechanism to minimize bias in hindsight.
In many applications, people just forgot about this simple strategy and blame the bias caused by supervised learning on the black-box model.
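For concreteness, the explore/exploit mechanism mentioned above, as a minimal epsilon-greedy sketch (a hypothetical two-action bandit, illustrative only):

    import random

    true_reward = {"A": 0.3, "B": 0.6}      # unknown to the learner
    counts = {"A": 0, "B": 0}
    values = {"A": 0.0, "B": 0.0}
    epsilon = 0.1                           # fraction of the time we explore at random

    for t in range(10_000):
        if random.random() < epsilon:
            action = random.choice(list(true_reward))          # explore
        else:
            action = max(values, key=values.get)               # exploit current estimate
        reward = 1.0 if random.random() < true_reward[action] else 0.0
        counts[action] += 1
        values[action] += (reward - values[action]) / counts[action]  # running mean

    print(values)  # estimates approach 0.3 and 0.6; a one-off fit to early data would not adapt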
Of course, black-box AI itself is not the right solution. As more and more cross-domain, multi-task settings emerge, open-box AI will gradually take off. It is about compositional capability, like functors and monads in functional languages. Explainable or not is just a communication problem, which is parallel to the ultimate intelligence problem. It is very possible that human intelligence is bounded.
That is the fascinating thing about consciousness.
How do we "know" we see red and not green or vice versa? How do we "know" we are feeling heat or coldness? We just do. We feel different things in different, yet recognizable ways.
And these "feelings" are multi-dimensional, feeling heat and feeling color are not on the same axis. Red and Green seems to be on different points of the same axis and hot and cold are points on some different axis, other dimension.
How many different dimensions of "feelings" are there? Note they are not the same as different senses. There can be many different feelings associated with things we see and recognize that way: colors, and shapes and lightness vs. darkness etc.
Physical sensation seems simple tbf. Any robot can sense temperature or color gradient.
It's the decision making that's complicated.
To make a decision on whether it would like chocolate ice cream or not (before trying it), a brain will take all the data gathered over time and cross-reference it with related data (does chocolate taste good alone? did I like it? it looked rather inedible; was it high cocoa content or not? chocolate yoghurt seems similar, but I did not like it; I do like cold, but only when it's hot outside; my friend says chocolate ice cream tastes good; my other friend says it tastes bad; trust or discard opinions? and much more).
All within minutes, seconds even.
An AGI would need an absolutely massive database of knowledge similar to the one acquired by a human over the first 20-30 years of their life, if it's to be truly general.
It would also need bias correction, and maybe 2-5 other AGIs to form a consensus before making any decision properly.
And even that would not be enough to fit in with humans - if you want an AGI to make the objectively right decisions, it will have to ignore many human feelings/biases.
If you want it to fit in with people, it will have to make "mistakes", i.e. ineffective decisions.
> Physical sensation seems simple tbf. Any robot can sense temperature or color gradient.
It can measure them and process them as numbers, but when does sensing come into play? Does a red sheet of paper recorded by a simple camera evoke the same "feeling" of redness inside the camera that it evokes in a human being?
The flippant answer is "of course not, the camera has no consciousness/perception". The question is then how and when this feeling of "redness" gets created and what are the necessary conditions for it to happen.
By sensing, I mean just that, reading input. You mean reasoning/decision, I guess.
That would come from learned data, experience, imo. A child who's never touched a red hot electric stove, for example, would not have any bias towards it. No fear, no love, unless they previously interacted with one.
They would try and get close, watch it, smell it, touch it, to learn more.
The most interesting part of consciousness is how does one decide based on incomplete information? If you need to learn more, where do you find the raw information? It seems to be done subconsciously, some people have a higher affinity for learning/fact-finding than others. But all people are pre-wired to learn from others, distributed computing works best heh.
I guess you're asking the same question as me, where is this "programming" and how is it created? It's not just raw experience, it seems to be genetic/evolutionary. A sort of basic firmware to bootstrap further learning.
I find it fascinating, it would seem the brain never stops processing the data it acquires. During the day, and during the night, it always runs learning jobs.
I agree with your thinking on learning, so I have nothing to add, but it doesn't sound like we were referring to the same thing. I'll try rephrasing, though I find this subject extremely hard to communicate effectively.
When you're looking at a red sheet of paper, reading the input in the form of a signal is not the only thing happening. The signal is definitely read and passed along to other brain circuitry for processing, but somehow another thing happens: the experience of red (which is what the other poster called "feeling"). This experience of red is fundamentally different from the experience of green or blue or from the complete absence of looking. This experience is what I was referring to.
This experience seems to happen without any prior knowledge or exposure to the color red. The first time you stumble upon red light, this experience of red arises. You can tell red from green by the marked difference in experiences, but you cannot explain this difference in words to someone who hasn't experienced red or green themselves.
How does this experience get created? Does any sensor (such as a camera) have such experiences? Why not?
I don’t know if you’re asking rhetorically or not, but the honest answer is we don’t know yet. Those sensory experiences are called “qualia” in philosophy and cognitive science and they are not well understood yet. You might enjoy the paper “What is it like to be a bat?” https://warwick.ac.uk/fac/cross_fac/iatl/study/ugmodules/hum...
Yes, it was rhetorical in an attempt to get the point across. I try to refrain from naming qualia when first explaining it. I've read Nagel's paper, but I haven't thought about it for some time so thanks for reminding me.
I usually see the Turing Test as a thought experiment for showing the irrelevance of qualia for AI, among other philosophical objections (e.g. Chinese Room).
> When a model is developed with more rigor, it can be openly critiqued.
Yes. And the models developed like this haven't solved the problems the current black box models have.
> we have models running across such massive datasets with so many degrees of freedom that we have no feasible way of isolating problems when we see or suspect certain conclusions are amiss. Instead, we throw more data at it or train the model around those edge cases.
There are ways being studied to check for sensitivity to parameters, biases, etc. But in the end, reality is difficult, and there's no way of dealing with that or of "getting the right answer" every time.
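One crude version of such a check, as a hedged sketch (placeholder model and synthetic data): perturb one input at a time and watch how far the prediction moves.

    import numpy as np
    from sklearn.ensemble import RandomForestClassifier

    rng = np.random.default_rng(1)
    X = rng.normal(size=(5_000, 10))
    y = (X[:, 0] + 0.1 * X[:, 1] > 0).astype(int)      # feature 0 dominates by construction
    model = RandomForestClassifier(n_estimators=50, random_state=0).fit(X, y)

    x = X[:1].copy()
    base = model.predict_proba(x)[0, 1]
    for j in range(X.shape[1]):
        x_pert = x.copy()
        x_pert[0, j] += 0.5                            # nudge one feature
        delta = model.predict_proba(x_pert)[0, 1] - base
        print(f"feature {j}: change in predicted probability = {delta:+.3f}")
    # Large deltas flag the features this particular prediction is sensitive to.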