Deep Learning: The Top-Down Approach to Learning

TL;DR: Maybe you have had lots of questions about what the first step is to really learn machine learning and deep learning, so here are my thoughts. I think the right approach to learning this subject is to read and run the code first and then research the theory behind it, which is why I took the free course at fast.ai. Below is an actual quote from a user on Hacker Noon about this approach to learning, and it may be applicable to anyone who is starting to learn ML and DL.


“I personally fell into the habit of watching the lectures too much and googling definitions/concepts/etc too much, without running the code. at first I thought that I should read the code quickly and then spend time researching the theory behind it.”

Another actual quote:

“In retrospect, I should have spent the majority of my time on the actual code in the notebooks instead, in terms of running it and seeing that goes into it and what comes out of it.”

One long piece of advice I got from Quora:

“I made similar lists 4 years ago. I quit my job, and applied to a PhD program in ML (still doing that). Looking back, I can offer following advice if you goal is to become a ML engineer.

FOUNDATION:
Start with (re)learning math. Take those boring university level full length courses in calculus, linear algebra, and probability/statistics. No, unless you’re fresh STEM graduate, 10 pages math refreshes won’t do. If you’re self-studying, make sure you do all exercises and take practice exams to test yourself.

TOOLS:
1. (Re)learn C/C++, as well as linear algebra library, such as Numpy or MATLAB. You will also have to learn parallel and distributed programming at some point (CUDA, MPI, OpenMP, etc). Next take a boring university level full length course on algorithms and data structures.
2. Get a book describing ML algorithms, and implement them yourself, first using plain C, then with MPI or CUDA, and finally using plain Numpy/MATLAB, or one of the low-level ML frameworks (Theano or TensorFlow).
APPLICATIONS:
Finally, start doing ML. Not learning about it, doing it. Choose an application that interests you (computer vision, NLP, speech recognition, etc), and start learning what you need to make something work). Focus on specific practical tasks. If you don’t have any particular application in mind, go to kaggle, choose a competition, and read what models/tricks the winner used. Then jump right in and start competing.

The first two requirements might take years to master, but if you skip them, you won’t be able to do any serious work in ML, or even understand latest papers. You will be a script kiddie, not a hacker.”

But the practitioners at fast.ai argue that this last piece of advice is bad and that you need just the opposite, because advice like this can be very discouraging for a lot of people to come across.

In this post, I will explain the fundamentals of deep learning.

Image classifiers can even detect fraud. Image classification can be really useful for lots and lots of things. For example, take AlphaGo, which beat the Go world champion: the way it worked was to use something at its heart that looked almost exactly like a dogs vs. cats image classification algorithm. It looked at thousands and thousands of Go boards, and for each one there was a label saying whether that board ended up belonging to the winning or the losing player. So it basically learned an image classifier that was able to look at a Go board and figure out whether it was a good Go board or a bad Go board. And that’s really the key, most important step in playing Go well: knowing which move is better.

Another example is fraud detection. A provider had lots of data on his customers’ users’ mouse movements, because the company provided user tracking software to help avoid fraud. He took the mouse paths (the clickstream) of the users on his customers’ websites and turned them into pictures of where the mouse moved and how quickly it moved. He then built an image classifier that took those images as input and, as output, said whether that was a fraudulent transaction or not, and it turned out to give really great results.
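To make the mouse-path idea a bit more concrete, here is a minimal sketch of how a list of mouse positions could be turned into a small image. Everything in it is an assumption for illustration: the function name mouse_path_to_image, the grid size, and the choice to store visit counts as pixel values are hypothetical and are not taken from the provider’s actual system. It also assumes all coordinates lie inside the page.

```python
def mouse_path_to_image(points, width, height, size=8):
    """Rasterize (x, y) mouse positions into a size x size grid of visit counts.

    Hypothetical helper: cells the mouse lingered in get higher values,
    which loosely encodes both where the mouse moved and how quickly.
    """
    image = [[0] * size for _ in range(size)]
    for x, y in points:
        # Scale page coordinates down to grid coordinates, clamping the far edge.
        col = min(int(x / width * size), size - 1)
        row = min(int(y / height * size), size - 1)
        image[row][col] += 1
    return image


# A made-up path on an 800x600 page: the mouse pauses in the middle.
path = [(0, 0), (400, 300), (400, 300), (799, 599)]
image = mouse_path_to_image(path, width=800, height=600, size=4)
for row in image:
    print(row)
```

A grid like this could then be fed to any image classifier, with a label such as fraudulent or not fraudulent attached to each one.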

It’s worth understanding that deep learning is not just a word that means the same thing as machine learning. Instead, deep learning is a kind of machine learning. Machine learning was invented by a guy named Arthur Samuel, who was pretty amazing. In the late 1950s he got an IBM mainframe to play checkers better than he could, and the way he did it was by inventing machine learning. He got the mainframe to play against itself lots of times, figure out which kinds of things led to victories and which didn’t, and use that to almost write its own program. Arthur Samuel actually said in 1962 that he thought that one day the vast majority of computer software would be written using this machine learning approach rather than by hand, writing out the loops and so forth. I guess that hasn’t happened yet, but it seems to be in the process of happening. I think one of the reasons it didn’t happen for a long time is that traditional machine learning was actually very difficult and very knowledge- and time-intensive.

So instead of hand-crafting a very specific function with domain-specific feature engineering, we’re going to try to create an infinitely flexible function that can solve any problem. It would solve any problem if only you set the parameters of that function correctly, so we then need an all-purpose way of setting the parameters of that function. And to be useful in practice, all of this needs to be fast and scalable.


The algorithm in question, which has these three properties, is called deep learning; or, if it’s not really an algorithm, maybe we should call it a class of algorithms. Let’s look at each of these three things in turn. The underlying function that deep learning uses is something called the neural network.


We’re going to learn about this neural network and implement it from scratch later on. It consists of a number of simple linear layers interspersed with a number of simple nonlinear layers. When you intersperse layers in this way, something called the universal approximation theorem applies, which says that this kind of function can approximate any given function to arbitrarily close accuracy as long as you add enough parameters. So it’s actually provably an infinitely flexible function.
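As a rough illustration of “linear layers interspersed with nonlinear layers”, here is a minimal NumPy sketch of a network with one hidden layer. The layer sizes, the choice of ReLU as the nonlinearity, and the names init_params and forward are my own assumptions for the example, not something prescribed by the course. The parameters are random, so the network does not solve anything yet.

```python
import numpy as np


def relu(x):
    # Simple nonlinear layer: negative values become zero.
    return np.maximum(x, 0.0)


def init_params(n_in, n_hidden, n_out, seed=0):
    # Small random weights and zero biases (an arbitrary choice for this sketch).
    rng = np.random.default_rng(seed)
    return {
        "W1": rng.normal(0, 0.1, size=(n_in, n_hidden)),
        "b1": np.zeros(n_hidden),
        "W2": rng.normal(0, 0.1, size=(n_hidden, n_out)),
        "b2": np.zeros(n_out),
    }


def forward(params, x):
    # Linear layer, then nonlinearity, then another linear layer.
    hidden = relu(x @ params["W1"] + params["b1"])
    return hidden @ params["W2"] + params["b2"]


params = init_params(n_in=4, n_hidden=16, n_out=2)
batch = np.ones((3, 4))           # 3 made-up examples with 4 features each
output = forward(params, batch)
print(output.shape)               # one row of 2 outputs per example
```

The weights and biases in params are exactly the “parameters” that still need to be set correctly for the function to solve a specific problem.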

Okay, so we need some way to fit the parameters so that this infinitely flexible neural network solves some specific problem. The way we do that is with a technique that most of you will probably have come across before at some stage, called gradient descent.

With gradient descent, we basically ask: for the different parameters we have, how good are they at solving my problem? Then we figure out a slightly better set of parameters, and then a slightly better set again, and basically follow the surface of the loss function downwards. It’s kind of like a marble rolling down to find the minimum. Depending on where you start, you can end up in different places; these are called local minima. Interestingly, it turns out that for neural networks in particular, in practice there aren’t really multiple different local minima to worry about; there’s basically just one. Or, to think of it another way, there are different parts of the space which are all about equally good. So gradient descent turns out to be an excellent way to solve this problem of fitting parameters to neural networks.
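Here is a minimal sketch of gradient descent on a one-parameter toy function, to show the “marble rolling downhill” idea and how the starting point decides which minimum you land in. The function f(x) = x^4 - 2x^2 (which has minima at x = -1 and x = 1), the learning rate, and the number of steps are all choices I made for illustration; a real neural network has a loss over many parameters rather than one.

```python
def f(x):
    # Toy loss function with two minima, at x = -1 and x = 1.
    return x**4 - 2 * x**2


def grad_f(x):
    # Derivative of f with respect to x.
    return 4 * x**3 - 4 * x


def gradient_descent(start, learning_rate=0.05, steps=200):
    x = start
    for _ in range(steps):
        # Take a small step in the downhill direction.
        x -= learning_rate * grad_f(x)
    return x


right = gradient_descent(start=0.5)
left = gradient_descent(start=-0.5)
print(left, right)  # two different end points, one near each minimum
```

In this toy case both end points have the same loss value, which is a small-scale version of the “different parts of the space which are all equally good” observation.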


The problem, though, is that we need to do it in a reasonable amount of time, and it’s really only thanks to GPUs that this has become possible.

The image shows, for the last few years, how many gigaflops you can get out of a GPU (the red and green lines) versus a CPU (the blue line). This is on a log scale, so you can see that, generally speaking, GPUs are about ten times faster than CPUs. What’s really interesting is that nowadays not only is the Titan X about 10 times faster than the E5-2699 CPU, but the GPU is also far cheaper: a better one to look at would be the GTX 1080 Ti, which costs about 700 bucks, whereas the CPU, which is 10 times slower, costs over $4,000. So GPUs turn out to be able to solve these neural network parameter-fitting problems incredibly quickly and also incredibly cheaply, and they’ve been absolutely key in bringing those three pieces together.

Then there’s one more piece. I mentioned that with these neural networks you can intersperse multiple sets of linear and then nonlinear layers. In the particular example drawn here, there’s actually just one of what we call a hidden layer, one layer in the middle. Something we’ve learned in the last few years is that this kind of neural network, although the universal approximation theorem applies to it and it can solve any given problem arbitrarily closely, requires an exponentially increasing number of parameters to do so. So it doesn’t actually meet the fast and scalable requirement for even reasonably sized problems. But we’ve since discovered that if you add multiple hidden layers, you get super-linear scaling: you can add a few more hidden layers to get multiplicatively more accuracy on multiplicatively more complex problems. And that is where it comes to be called deep learning. So deep learning means a neural network with multiple hidden layers.
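To show what “multiple hidden layers” looks like in code, here is a small NumPy sketch that builds a network from a list of layer sizes and counts its parameters. The helper names init_deep_net, deep_forward, and count_params, the layer sizes, and the ReLU nonlinearity are assumptions for this example only. The sketch only shows the structure; it does not demonstrate the accuracy claim.

```python
import numpy as np


def init_deep_net(layer_sizes, seed=0):
    # One (weights, bias) pair per pair of consecutive layer sizes.
    rng = np.random.default_rng(seed)
    return [
        (rng.normal(0, 0.1, size=(n_in, n_out)), np.zeros(n_out))
        for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:])
    ]


def deep_forward(layers, x):
    # Every layer except the last is followed by a ReLU nonlinearity.
    for weights, bias in layers[:-1]:
        x = np.maximum(x @ weights + bias, 0.0)
    weights, bias = layers[-1]
    return x @ weights + bias


def count_params(layers):
    return sum(w.size + b.size for w, b in layers)


shallow = init_deep_net([4, 64, 2])          # one wide hidden layer
deep = init_deep_net([4, 16, 16, 16, 2])     # three narrower hidden layers

x = np.ones((5, 4))
print(deep_forward(shallow, x).shape, count_params(shallow))
print(deep_forward(deep, x).shape, count_params(deep))
```

Both networks map 4 inputs to 2 outputs; the only difference is whether the parameters are spent on one wide hidden layer or on several stacked ones.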

So when you put all this together, what happens is actually really amazing. Google started investing in deep learning around 2012. They hired Geoffrey Hinton, who’s kind of the father of deep learning, and his top student Alex Krizhevsky, and they started trying to build a team; that team became known as Google Brain. Because things with those three properties are so incredibly powerful and so incredibly flexible, you can actually see over time how many projects at Google use deep learning. The graph here only goes up to a bit over a year ago, but it’s been continuing to grow exponentially since then as well. What you see now around Google is that deep learning is used in pretty much every part of the business. So it’s really interesting to see what comes of this simple idea that we can solve machine learning problems using an algorithm that has these properties: when a big company invests in actually making that happen, you see this incredible growth in how much it’s used.

For example, if you’re using Google’s email software, then when you receive an email from somebody it will often suggest some replies that you can send to them. It’s actually using deep learning to read the original email and generate the suggested replies. This is a really great example of the kind of stuff that previously just wasn’t possible.

Another great example is Microsoft, which has also, a little more recently, invested heavily in deep learning. Now you can use Skype, speak to it in English, and ask it to translate what you say in real time into Chinese or Spanish at the other end. Then, when the other person talks back to you in Chinese or Spanish, Skype translates their speech into English speech, also in real time.

Something really interesting to think about is how deep learning can be combined with human expertise. One example, from a couple of years ago, is called neural doodle (here’s the link: https://arxiv.org/abs/1603.01768). It takes a sketch and renders it in the style of an artist, for example as an impressionist painting.

So what we have generally noticed is that the vast majority of things people do in the world currently aren’t using deep learning, and each time somebody says, “Oh, let’s try using deep learning to improve performance at this thing,” they nearly always get fantastic results, and then suddenly everybody in that industry starts using it as well. So there are lots and lots of opportunities at this particular time to use deep learning to help with all kinds of different stuff.


Alright, I’ll pause here and cover the next part of this topic in the next post.