The Necessity of Machine Learning

What is Machine Learning?

It is very hard to write programs that solve problems like recognizing a 3D object from a novel viewpoint in new lighting conditions in a cluttered scene.

  • We don't know what program to write because we don't know how it's done in our brains.

  • Even if we had a good idea about how to do it, the program might be horrendously complicated.

The reason we need machine learning is that there are some problems where it's very hard to write the program. Recognizing a 3D object, for example, when it's from a novel viewpoint in new lighting conditions in a cluttered scene, is very hard to do. We don't know what program to write because we don't know how it's done in our brain, and even if we did know what program to write, it might be a horrendously complicated program.

It is hard to write a program to compute the probability that a credit card transaction is fraudulent.

  • There may not be any rules that are both simple and reliable. We need to combine a very large number of weak rules.

  • Fraud is a moving target. The program needs to keep changing.

Another example is detecting a fraudulent credit card transaction, where there may not be any nice simple rules that will tell you it's fraudulent. You really need to combine a very large number of not very reliable rules, and those rules also change over time because people change the tricks they use for fraud. So we need a complicated program that combines unreliable rules and that we can change easily.
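A minimal sketch of the "many weak rules" idea, assuming we frame fraud detection as learning weights for a set of binary indicators. The features, data, and weights here are entirely made up for illustration; a real system would use actual transaction attributes.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Hypothetical weak indicators for each transaction, none reliable alone:
# e.g. [amount_is_unusual, merchant_is_new, country_changed, night_time, ...]
rng = np.random.default_rng(0)
X = rng.integers(0, 2, size=(1000, 20))                     # 20 binary weak rules
w_true = rng.normal(size=20)                                # hidden "true" influence of each rule
y = (X @ w_true + rng.normal(size=1000) > 0).astype(int)    # synthetic fraud labels

# The learner discovers how to weight the weak rules; retraining on
# fresh data adapts the weights as fraud patterns drift over time.
model = LogisticRegression().fit(X, y)
print(model.predict_proba(X[:3])[:, 1])                     # fraud probabilities
```

The point is that no single column of `X` predicts fraud on its own; it is only the learned combination of many weak rules that gives a usable score, and the combination can be relearned cheaply when the data changes.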

The Machine Learning Approach

Instead of writing a program by hand for each specific task, we collect lots of examples that specify the correct output for a given input.

A machine learning algorithm then takes these examples and produces a program that does the job.

  • The program produced by the learning algorithm may look very different from a typical hand-written program. It may contain millions of numbers.

  • If we do it right, the program works for new cases as well as the ones we trained it on.

  • If the data changes, the program can change too by retraining on the new data.

Massive amounts of computation are now cheaper than paying someone to write a task-specific program.

The machine learning approach is to say: instead of writing each program by hand for each specific task, we collect a lot of examples that specify the correct output for a given input. A machine learning algorithm then takes these examples and produces a program that does the job. The program produced by the learning algorithm may look very different from a typical hand-written program. For example, it might contain millions of numbers about how you weight different kinds of evidence. If we do it right, the program should work for new cases as well as the ones it's trained on, and if the data changes, we should be able to change the program relatively easily by retraining it on the new data. And now massive amounts of computation are cheaper than paying someone to write a program for a specific task, so we can afford big, complicated machine learning programs to produce these task-specific systems for us.
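Here is a hedged sketch of that workflow: collect labelled examples, let a learning algorithm produce the "program", and check that it works on new cases. The synthetic data stands in for a real labelled collection, and the network size is arbitrary.

```python
from sklearn.datasets import make_classification
from sklearn.neural_network import MLPClassifier

# Collect lots of examples that specify the correct output for each input
# (synthetic data standing in for a real labelled collection).
X, y = make_classification(n_samples=2000, n_features=30, random_state=0)

# The learning algorithm produces the "program": it looks nothing like
# hand-written code, it is just arrays of learned numbers (weights).
net = MLPClassifier(hidden_layer_sizes=(50,), max_iter=500, random_state=0)
net.fit(X[:1500], y[:1500])

print("numbers in the program:", sum(w.size for w in net.coefs_))
print("accuracy on new cases:", net.score(X[1500:], y[1500:]))
```

If the data drifts, the same `fit` call on fresh examples produces an updated program, which is exactly the flexibility the hand-written approach lacks.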

Some examples of tasks best solved by learning

Recognizing patterns

  • Objects in real scenes

  • Facial identities or facial expressions

  • Spoken words

Recognizing anomalies

  • Unusual sequences of credit card transactions

  • Unusual patterns of sensor readings in a nuclear power plant

Prediction

  • Future stock prices or currency exchange rates

  • Which movies will a person like?

Some examples of the things that are best done by using a learning algorithm are recognizing patterns: for example, objects in real scenes, or the identities or expressions of people's faces, or spoken words. There's also recognizing anomalies: an unusual sequence of credit card transactions would be an anomaly; another example would be an unusual pattern of sensor readings in a nuclear power plant. You wouldn't really want to deal with those by doing supervised learning, where you look at the ones that blow up and see what caused them to blow up. You'd really like to recognize that something funny is happening without having any supervision signal; it's just not behaving in its normal way. And then there's prediction: typically predicting future stock prices or currency exchange rates, or predicting which movies a person will like from knowing which other movies they like and which movies a lot of other people liked.
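A minimal sketch of that unsupervised idea for the sensor-reading case: model normal behaviour from past readings, with no labels at all, and flag anything far outside it. The sensor data and thresholds here are invented for illustration.

```python
import numpy as np

# Model "normal" behaviour from past sensor readings (no labels needed),
# then flag new readings that are very unlikely under that model.
rng = np.random.default_rng(1)
normal = rng.normal(loc=50.0, scale=2.0, size=(10000, 4))   # history of 4 sensors

mu, sigma = normal.mean(axis=0), normal.std(axis=0)

def anomaly_score(reading):
    # Largest deviation from normal behaviour, in standard deviations.
    return np.max(np.abs((reading - mu) / sigma))

print(anomaly_score(np.array([50.1, 49.5, 51.0, 50.3])))    # small: looks normal
print(anomaly_score(np.array([50.1, 49.5, 72.0, 50.3])))    # large: investigate
```

No example ever had to "blow up" to train this: the score only says the plant is not behaving in its normal way, which is exactly the supervision-free signal the lecture describes.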

A standard example of machine learning

A lot of genetics is done on fruit flies.

  • They are convenient because they breed fast.

  • We already know a lot about them.

The MNIST database of hand-written digits is the machine learning equivalent of fruit flies.

  • They are publicly available and we can learn them quite fast in a moderate-sized neural net.

  • We know a huge amount about how well various machine learning methods do on MNIST.

We will use MNIST as our standard task.

In this course, we are going to use a standard example to explain a lot of the machine learning algorithms. This is done in a lot of science. In genetics, for example, a lot of work is done on fruit flies, and the reason is that they're convenient: they breed fast, and a lot is already known about the genetics of fruit flies. The MNIST database of handwritten digits is the machine-learning equivalent of fruit flies. It is publicly available, and we can get machine learning algorithms to learn to recognize these digits quite quickly, so it's easy to try lots of variations. We also know huge amounts about how well different machine learning methods do on MNIST, and in particular, the different machine learning methods were implemented by people who believed in them, so we can rely on those results. For all those reasons, we're going to use MNIST as our standard task.
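A quick sketch of what "standard task" means in practice, assuming scikit-learn's `fetch_openml` copy of MNIST (one of several public mirrors) and an arbitrary small network; the subset size is only to keep the run fast.

```python
from sklearn.datasets import fetch_openml
from sklearn.neural_network import MLPClassifier

# MNIST is publicly available; fetch_openml is one convenient way to get it.
X, y = fetch_openml("mnist_784", version=1, return_X_y=True, as_frame=False)
X = X / 255.0                                  # scale pixel values to [0, 1]

# A moderate-sized net learns the digits quickly, so variations are cheap to try.
net = MLPClassifier(hidden_layer_sizes=(100,), max_iter=20, random_state=0)
net.fit(X[:10000], y[:10000])                  # small training subset for speed

# The last 10,000 images are the conventional MNIST test set.
print("test accuracy:", net.score(X[60000:], y[60000:]))
```

Because so many methods have been benchmarked this way, a single accuracy number on the standard test split is immediately comparable against decades of published results.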

It is very hard to say what makes a 2

Here's an example of some of the digits in MNIST. These are ones that were correctly recognized by a neural net the first time it saw them, but they are ones where the neural net wasn't very confident, and you can see why.

I've arranged these digits in standard scan-line order: zeros, then ones, then twos, and so on. If you look at a bunch of twos like the ones in the green rectangle, you can see that if you knew they were handwritten digits you'd probably guess they were twos, but it's very hard to say what it is that makes them twos. There's nothing simple that they all have in common. In particular, if you try to overlay one on another you'll see it doesn't fit, and even if you skew it a bit it's very hard to make them overlay on each other. So a template isn't going to do the job. In particular, it will be very hard to find a template that fits those twos in the green box and doesn't also fit the things in the red boxes. That's one thing that makes recognizing handwritten digits a good task for machine learning.

Beyond MNIST: The ImageNet task

1000 different object classes in 1.3M high-resolution training images from the web.

  • The best system in the 2010 competition got 47% error for its first choice and 25% for its top 5 choices.

Jitendra Malik (an eminent neural net skeptic) said that this competition is a good test of whether deep neural networks work well for object recognition.

  • A very deep neural network (Krizhevsky et al., 2012) gets less than 40% error for its first choice and less than 20% for its top 5 choices.

Now, I don't want you to think that MNIST is the only thing we can do; it's a relatively simple thing for a machine learning system to do now. To motivate the rest of the course, I want to show you some examples of much more difficult things. We now have neural nets with approaching 100 million parameters that can recognize a thousand different object classes in 1.3 million high-resolution training images taken from the web. There was a competition in 2010, and the best system got a 47% error rate if you look at its first choice, and a 25% error rate if you say it got it right when the answer was in its top five choices, which isn't bad for a thousand different objects. Jitendra Malik, who's an eminent neural net skeptic and a leading computer vision researcher, has said that this competition is a good test of whether deep neural networks can work well for object recognition. A very deep neural network can now do considerably better than the thing that won the competition: it can get less than 40% error for its first choice and less than 20% error for its top five choices.

Some examples from an earlier version of the net

Here are some examples of the kinds of images you have to recognize. These are images from the test set that the net has never seen before. Below each example I'm showing what the neural net thought the right answer was, where the length of the horizontal bar is how confident it was, and the correct answer is in red.

If you look in the middle, it correctly identified that as a snowplow, but you can see that its other choices were also fairly sensible. It does look a little bit like a drilling platform, and if you look at its third choice, a lifeboat, it actually looks very like a lifeboat: you can see the flag on the front of the boat, the bridge of the boat, the flag at the back, and the high surf in the background. So its errors tell you a lot about how it's doing it, and they're very plausible errors.

If you look on the left, it gets it wrong, possibly because the beak of the bird is missing and because the feathers of the bird look very like the wet fur of an otter. But it gets it in its top five, and it does better than I would: I wouldn't know if that was a quail or a ruffed grouse or a partridge.

If you look on the right, it gets it completely wrong. It's a guillotine. You can possibly see why it says orangutan, because of the jungle-looking background and something orange in the middle, but it fails to get the right answer.

It can, however, deal with a wide range of different objects. If you look on the left, I would have said microwave as my first answer. The labels aren't very systematic, so actually the correct answer is electric range, which it does get in its top five.

In the middle, it's getting a turnstile, which is a distributed object, so it can do more than just recognize compact things. It can also deal with pictures as well as real scenes, like the bulletproof vest.

Also, it makes some very cool errors. If you look at the image on the left, that's an earphone. It doesn't get anything like an earphone, but if you look at its fourth bet, it thinks it's an ant. To begin with you think that's crazy, but then if you look at it carefully, you can see it's a view of an ant from underneath: the eyes are looking down at you and you can see the antennae behind it. It's not the kind of view of an ant you'd like to have if you were a greenfly.

If you look at the one on the right, it doesn't get the right answer, but all of its answers are cylindrical objects.

The Speech Recognition Task

A speech recognition system has several stages

  • Pre-processing: Convert the sound wave into a vector of acoustic coefficients. Extract a new vector about every 10 ms.

Another task that neural nets are very good at is speech recognition, or at least part of a speech recognition system. Speech recognition systems have several stages. First they pre-process the sound wave to get a vector of acoustic coefficients for each 10 milliseconds of sound wave, so they get 100 of those vectors per second. They then take a few adjacent vectors of acoustic coefficients and place bets on which part of which phoneme is being spoken: they look at this little window and say, in the middle of this window, what do I think the phoneme is, and which part of the phoneme is it? A good speech recognition system will have many alternative models for a phoneme, and each model might have three different parts, so there might be many thousands of alternative fragments that this could be, and you have to place bets on all those thousands of alternatives. Once you've placed those bets, you have a decoding stage that does the best job it can of piecing the plausible bets together into a sequence that corresponds to the kinds of things people say. Currently, deep neural networks, pioneered by George Dahl and Abdel-rahman Mohamed at the University of Toronto, are doing better than previous machine learning methods for the acoustic model, and they're now beginning to be used in practical systems.
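A rough sketch of just the pre-processing stage described above: slicing a waveform into overlapping windows, one new vector every 10 ms. The log band energies here are a crude stand-in for the filterbank/MFCC coefficients real systems use, and all parameter values are illustrative assumptions.

```python
import numpy as np

def acoustic_frames(wave, sample_rate=16000, win_ms=25, hop_ms=10, n_coeffs=40):
    """Slice a waveform into overlapping windows (a new one every 10 ms)
    and return one vector of coarse spectral coefficients per window."""
    win = int(sample_rate * win_ms / 1000)      # 400 samples at 16 kHz
    hop = int(sample_rate * hop_ms / 1000)      # 160 samples -> 100 vectors/second
    frames = []
    for start in range(0, len(wave) - win + 1, hop):
        chunk = wave[start:start + win] * np.hanning(win)
        spectrum = np.abs(np.fft.rfft(chunk)) ** 2
        # Pool the spectrum into n_coeffs bands and take logs (a crude
        # stand-in for the filterbank/MFCC features real systems use).
        bands = np.array_split(spectrum, n_coeffs)
        frames.append(np.log(np.array([b.sum() for b in bands]) + 1e-10))
    return np.stack(frames)

one_second = np.random.randn(16000)             # fake 1 second of audio
print(acoustic_frames(one_second).shape)        # ~ (98, 40): ~100 vectors/second
```

The acoustic model then looks at a few adjacent rows of this matrix at a time and places its bets on phoneme fragments; the decoding stage stitches those bets into word sequences.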

Dahl and Mohamed developed a system that uses many layers of binary neurons to take some acoustic frames and make bets about the labels. They were working on a fairly small database and used 183 alternative labels, and to get their system to work well they did some pre-training. After standard post-processing, they got a 20.7% error rate on a very standard benchmark, which is kind of like the MNIST for speech. The best previous result on that benchmark for speaker-independent recognition was 24.4%, and a very experienced speech researcher at Microsoft Research realized that this was a big enough improvement that it would probably change the way speech recognition systems were done, and indeed it has.

If you look at recent results from several different leading speech groups: Microsoft showed that this kind of deep neural network, when used as the acoustic model in a speech system, reduced the error rate from 27.4% to 18.5%; alternatively, you could view it as reducing the amount of training data needed from 2,000 hours down to 309 hours for comparable performance. IBM, which has the best system for one of the standard large-vocabulary speech recognition tasks, showed that even its very highly tuned system that was getting 18.8% can be beaten by one of these deep neural networks. And Google fairly recently trained a deep neural network on a large amount of speech, 5,800 hours; that was still much less than they trained their Gaussian mixture model on, but even with much less data it did a lot better than the technology they had before, reducing the error rate from 16% to 12.3%, and the error rate is still falling. In the latest Android, if you do voice search, it's using one of these deep neural networks to do very good speech recognition.

In this video I'm going to tell you a little bit about real neurons in the real brain, which provide the inspiration for the artificial neural networks that we're going to learn about in this course. For most of the course we won't talk much about real neurons, but I wanted to give you a quick overview at the beginning. There are several different reasons to study how networks of neurons compute.
