How to Make an Image Classifier – Intro to Deep Learning #6

How do we classify things? We consider people to be experts in a field if they've mastered classification. Doctors can tell a good blood sample from a bad one. Photographers can tell whether their latest shot is beautiful or not. Musicians can tell what sounds good and what doesn't in a piece of music. The ability to classify well takes many hours of training. We get it wrong over and over again, until eventually we get it right. But with a quality data set, deep learning can classify just as well as, if not better than, we can. We'll use it as a tool to improve our craft, whatever it is. And if the job is monotonous, it'll do it for us. When we reach the point where we aren't forced to do something we don't want to just to survive, we'll flourish like never before. And that's the world we're aiming for.

>> Hello, world, it's Siraj. And today we're going to build an image classifier from scratch, to classify cats and dogs. Finally, we get to work with images. I'm feeling hyped enough to do the Macarena. [MUSIC]

So, how does image classification work? Well, there were a bunch of different attempts in the 80s and early 90s, and all of them tried a similar approach: think about the features that make up an image, and hand-code detectors for each of them. But there is so much variety out there, and no two apples look exactly the same, so the results were always terrible. This was considered a task only we humans could do. But in '98, a researcher named Yann LeCun introduced a model called a Convolutional Neural Network, capable of classifying characters with 99% accuracy, which broke every record. And the CNN learned the features by itself. In 2012, it was used by another researcher named Alex Krizhevsky at the yearly ImageNet competition, which is basically the annual Olympics of computer vision. And it was able to classify thousands of images with a new record accuracy, at the time, of 85%. Since then, CNNs have been adopted by Google to identify photos in search, and by Facebook for automatic tagging. Basically, they are very hot right now. But where did the idea for CNNs come from? [MUSIC]
We'll first want to download our image data set from Kaggle, with 1024 pictures of dogs and cats, each class in its own folder. We'll be using the Keras deep learning library for this demo, which is a high-level wrapper that runs on top of TensorFlow. It makes building models really intuitive, since we can define each layer as its own line of code. First things first, we'll initialize variables for our training and validation data.
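A minimal sketch of that setup, assuming the Kaggle images have been unpacked into data/train and data/validation directories with cats/ and dogs/ subfolders (the paths and sample counts here are illustrative, not taken verbatim from the video):

```python
# Illustrative setup; the directory layout and counts are assumptions.
img_width, img_height = 32, 32   # size every image is resized to

train_data_dir = 'data/train'            # data/train/cats, data/train/dogs
validation_data_dir = 'data/validation'  # data/validation/cats, data/validation/dogs
nb_train_samples = 1024
nb_validation_samples = 416
epochs = 30
batch_size = 16
```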
Then we're ready to build our model. We'll initialize the type of model using the Sequential function, which allows us to build a linear stack of layers, so we treat each layer as an object that feeds data to the next one. It's like a conga line, kind of. Now, the alternative would be a graph model, which would allow for multiple separate inputs and outputs, but we're using a simpler example.
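In Keras, that's a single line (a minimal sketch using the Sequential API the video describes):

```python
from keras.models import Sequential

model = Sequential()  # a linear stack of layers, the "conga line"
```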
Next, we'll add our first layer, the convolutional layer. The first layer of a CNN is always the convolutional layer. The input is going to be a 32 by 32 by 3 array of pixel values, where the 3 refers to the RGB channels. Each of the numbers in this array is given a value from 0 to 255, which describes the pixel intensity at that point. The idea is that, given this as an input, our CNN will describe the probability of it being of a certain class.
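Here's how that first layer might look (a sketch in the older Keras 1 style, where Convolution2D(32, 3, 3) means 32 filters of size 3x3; newer Keras versions call it Conv2D):

```python
from keras.layers import Convolution2D, Activation

# 32 filters, each 3x3, sliding over a 32x32 RGB input
model.add(Convolution2D(32, 3, 3, input_shape=(img_width, img_height, 3)))
```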
We can imagine the convolutional layer as a flashlight shining over the top left of the image. The flashlight slides across all the areas of the input image. The flashlight is our filter, and the region it shines over is the receptive field. Our filter is also an array of numbers, and these numbers are the weights at a particular layer. We can think of a filter as a feature identifier. As our filter slides, or convolves, around the input, it multiplies its values with the pixel values in the image. These are called element-wise multiplications. The multiplications from each region are then summed up, and after we've covered all parts of the image, we're left with a feature map. This will help us find not buried treasure, but a prediction, which is even better. Since our weights are randomly initialized, our filter won't start off being able to detect any specific feature. But during training, our CNN will learn values for its filters. So this first one will learn to detect a low-level feature, like curves. If we place this filter on a part of the image with a curve, the resulting value from the multiplication and summation is a big number. But if we place it on a different part of the image, without a curve, the resulting value is zero. This is how filters detect features.
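To make the element-wise multiply-and-sum concrete, here's a toy NumPy sketch (the numbers are made up for illustration):

```python
import numpy as np

# A 3x3 receptive field of the image and a 3x3 filter
patch = np.array([[0,  0,  0],
                  [50, 50, 50],
                  [0,  0,  0]])
line_filter = np.array([[0, 0, 0],
                        [1, 1, 1],
                        [0, 0, 0]])

# Element-wise multiplication, then summation: one value of the feature map
print(np.sum(patch * line_filter))           # 150: big number, feature present
print(np.sum(np.zeros((3, 3)) * line_filter))  # 0.0: feature absent
```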
We'll next pass this feature map through an activation layer called ReLU, or rectified linear unit. ReLU is probably the name of some alien, but it's also a non-linear operation that replaces all the negative pixel values in the feature map with zero. We could use other functions, but ReLU tends to perform better in most situations. This layer increases the non-linear properties of our model, which means our neural net will be able to learn more complex functions than just linear regression.
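In Keras, ReLU is applied as an activation layer right after the convolution (a sketch):

```python
# ReLU: replace every negative value in the feature map with zero
model.add(Activation('relu'))
```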
After that, we'll initialize our max pooling layer. Pooling reduces the dimensionality of each feature map but retains the most important information, which reduces the computational complexity of our network. There are different types, but in our case we'll use max pooling, which takes the largest element from the rectified feature map within a window we define, then slides this window over each region of the feature map, taking the max values.
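A sketch of that layer, using a 2x2 window:

```python
from keras.layers import MaxPooling2D

# Keep only the largest value in each 2x2 window, halving width and height
model.add(MaxPooling2D(pool_size=(2, 2)))
```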
So a classic CNN architecture looks like this: three convolutional blocks, followed by a fully connected layer. We've initialized the first three layers, so we can basically just repeat this process twice more. The output feature map is fed into the next convolutional layer, and the filters in this layer will learn to detect more abstract features, like paws and doge.
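Repeating the block twice more might look like this (a sketch; the 32, 32, 64 filter progression follows what's shown in the video, with the filter count doubled in the last block):

```python
# Second convolutional block
model.add(Convolution2D(32, 3, 3))
model.add(Activation('relu'))
model.add(MaxPooling2D(pool_size=(2, 2)))

# Third convolutional block, with more filters for more abstract features
model.add(Convolution2D(64, 3, 3))
model.add(Activation('relu'))
model.add(MaxPooling2D(pool_size=(2, 2)))
```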
One technique we'll use to prevent overfitting, the point at which our model is no longer able to predict labels for novel data, is called dropout. A dropout layer drops out a random set of activations in that layer by setting them to zero as data flows through it. To prepare our data for the dropout, we'll first flatten the feature map into one dimension. Then we'll initialize a fully connected layer with the Dense function and apply ReLU to it. After dropout, we'll initialize one more fully connected layer. This will output an n-dimensional vector, where n is the number of classes we have, so here it would be two. And by applying a sigmoid to it, we convert the data to probabilities for each class.
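A sketch of that classifier head (the layer sizes are illustrative; note that with binary crossentropy, a single sigmoid unit giving the probability of one class is the usual equivalent of the two-class output described above):

```python
from keras.layers import Flatten, Dense, Dropout

model.add(Flatten())              # 3D feature maps -> 1D feature vector
model.add(Dense(64))
model.add(Activation('relu'))
model.add(Dropout(0.5))           # randomly zero half the activations
model.add(Dense(1))
model.add(Activation('sigmoid'))  # probability of the positive class
```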
So how does our network learn? Well, we'll want to minimize a loss function, which measures the difference between the predicted output and the target output. To do this, we'll take the derivative of the loss with respect to the weights in each layer, starting from the last, to compute the direction we want our network to update in. We'll propagate our loss backwards through each layer, then update our weight values for each filter, so they change in the direction of the gradient that will minimize our loss.
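To see this "stepping against the gradient" in isolation, here's a toy example with a one-parameter loss (nothing to do with the network itself, just the update rule):

```python
# Minimize a toy loss L(w) = (w - 3)^2 by gradient descent
learning_rate = 0.1
w = 0.0
for _ in range(50):
    grad = 2 * (w - 3)          # dL/dw
    w -= learning_rate * grad   # move in the direction that lowers the loss
print(w)  # converges toward 3, the minimum of the loss
```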
We then configure the learning process using the compile method, where we'll define our loss as binary crossentropy, the preferred loss function for binary classification problems; our optimizer, rmsprop, which will perform gradient descent; and a list of metrics, which we'll set to accuracy, since this is a classification problem.
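A sketch of that configuration:

```python
model.compile(loss='binary_crossentropy',
              optimizer='rmsprop',
              metrics=['accuracy'])
```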
Lastly, we'll write out our fit function to train the model, giving it parameters for the training and validation data, as well as the number of epochs to run. And let's save our weights, so we can use our trained model later.
Overall accuracy comes out to about 70%, similar to my attention span. And if we feed our model a new picture of a dog or a cat, it will predict its label relatively accurately.
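Feeding the model a new image might look like this (a sketch; the file name is hypothetical, and the cat/dog index follows flow_from_directory's alphabetical class ordering):

```python
import numpy as np
from keras.preprocessing import image

img = image.load_img('new_pet.jpg', target_size=(img_width, img_height))
x = image.img_to_array(img) / 255.   # same rescaling as during training
x = np.expand_dims(x, axis=0)        # add a batch dimension

prob = model.predict(x)[0][0]        # sigmoid output in [0, 1]
print('dog' if prob > 0.5 else 'cat')
```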
We could definitely improve our predictions, though, either by using more pictures or by augmenting an existing pre-trained network with our own network, a technique known as transfer learning.
So to break it down: convolutional neural networks are inspired by the human visual cortex and offer state-of-the-art performance in image classification. CNNs learn filters at each convolutional layer that act as increasingly abstract feature detectors. And with Keras and TensorFlow, you can build your own pretty easily.

The winner of the coding challenge from the last video is Charles David-Blot. He used TensorFlow to build a deep net capable of predicting whether or not someone would get a match after training on a data set, and he had a pretty sweet data visualization of his results. Wizard of the Week. And the runner-up is Dalai Mingat, with clean, organized, and documented code.

The coding challenge for this video is to create an image classifier for two types of animals; instructions are in the README. Post your GitHub link in the comments, and I'll announce the winner next Friday. Please subscribe if you want to see more videos like this, check out this related video, and for now, I've gotta upload my mind. So, thanks for watching!

Comments (73)

  1. Hello sir,
    i am currently working on project which identifies sugarcane leaf, cotton leaf and rice leaf from given input image leaf. then how i start this project plz explain me step by step. after this my aim is to identify leaf diseases by uploaded photo of crop leaf. plz guide me sir, and reply as soon as possible…..

  2. I wish you would have left a link to the code for each video. Not too sure I am ready for challenges but would like to see the 40 lines (or however many were used in video) of code from the current video.

  3. its just awesome ,, specially the rap song ,,

  4. Hello Siraj,
    I want to get your opinion on a project that i'm starting:
    It is a deep learning and i feel that i need some advice to know in which direction i can go. Let's start: I want to use deep learning to build a model that recognizes many characteristic in an image. To be precise, i want to create a model that allows me to recognize in a photo: .if the person is a MALE/Female. .what type of clothing is he/she wearing .what are the colors of the clothes he/she is wearing.
    What do you think of the complexity of the problem ? I really need some guidance in order to start thinking of the possible techniques that could help me go deeper in this project. Do you advise me to start thinking first of a model that can olny distinguish male/femal (and ofc it needs to detect them first)?Or do i need to think of the subject from another perespective? What are the topics that i need to look for?
    Thanks in advance.

  5. I really enjoy the series, but I think when I initially watched this video upon its release I struggled with the core content at first because its easy to get wrapped up in jupyter or get stuck installing something, which creates a lot of additional friction when I need to learn the core concept.

  6. where is the dataset?

  7. Is it possible to train Deep Learning Models on Gt840M GPU?

  8. I need to recognize sign languages through image classification. Please suggest me the required steps.

  9. You are so funny. I like you

  10. Noob question: I get an error on this line model.add(Convolution2D(32,3,3 input_shape=(img_width,img_height,3))) and I seem to overlook what I did wrong. Any help? :-/

  11. Very good video I've learned so many think ! Thank you and please continue !

  13. I've noticed that there is a noticeable gap between the val_acc and the acc , Isn't your model overfitted ??

  15. Why is it that the first two convolutional layers is 32 by 32 by the last one is 64?

  16. Hey, I trained a model and when I ran predict on an image, I got [[1. 1. 1.]] as output. What does that mean?

  17. Can I use this process for apple defect Multispectral Method for Apple Defect

    Detection using Hyperspectral Imaging System ?Im new in image processing

  18. You full of life….love the teaching method

  19. Can u make a tutorial to classify videos

  20. webapp image classifier https://github.com/pengoox/Image-Classifier

  21. At 4:09 what did you mean by feature mapping.

  22. I just had one doubt. Suppose that we have a filter to detect the curve as shown in the video but the curve is aligned in some direction. How would it still recognize it? Or is it even supposed to?
    In some cases, like identifying an object, the filter should be able to identify the object regardless of it's orientation. While in other cases like identifying numbers, it shouldn't cause it may confuse a 6 with a 9.

  23. Awesome. Just fantastic

  24. the model.predict method gives out an error, can you help me with it please

  25. from parcer import load_data ? where can we find this ?

  26. really, a rap session?

  27. 0:36 The Humanity Party can make that possible www.humanityparty.com

  28. It is really very useful!!!! thank you!

  29. Sir to be sincere, you're too fast

  30. that rap was kinda fire

  31. You say 32 by 32 for the first convolution2D added to the model , yet what is the difference if we used 64 by 64. Would we gain resolution in data and accuracy in exchange for more processing time; or is this difference negligible and the results nearly equivalent?

  32. What is the best way to make a alphabate classifier.?

    What is better approch using tensor flow only or keras only?

  33. Keep rapping! Keep teaching the way you want! +1

  34. Youre a god and I love you – thanks a bunch

  35. How can I make a neural network to extract the features from images of food wrappers and train the network ? As they don't have any specific shape or type..
    Please reply..

  36. y Predict_proba and predict in keras produce same results . I am preparing a htr for english . so i have 26 classes , now i need to get the probabilities for each class . but both predict and predict_proba gives only the predicted class . any solutions . My last activation layer is softmax

  37. literally when im running the first block it says.

    importError Traceback(most recent call last)

    importError: cannot import name load_data

    what is going on? any help??

  38. hey siraj, u're work's good! but u have been replying to all the comments that praises u but havent been able to answer the question that what does import parser do and what is "load_data"?? plz get back to this

  39. At least try to make an effective video: train dataset & detect a certain shape using boundingbox

  40. For some reason the load_data command doesn't work anymore.

  41. can you teach us this stuff in low level, then go high level

  42. Thank you so much for your lecture!

  43. Hello, do you know any intelligence that recognizes hand gestures? I would like to do something like that using an OpenMV cam M7

  44. What version of python did you use?

  45. runn forestt runnnnn

  46. I CAME HERE TO LEARN, I LEARNT NOTHING. CLICKBAIT. STACKOVERFLOW WOULD RIP YOU APART.

  47. so sad…
    This vdo seems informative but presented steps are too advance to follow (many guidance missing).
    it is not helpful to beginner like me. 🙁

  48. How to make a web application for an tensorflow image classifier

  49. hey Im getting an error 'numpy.ndarray' object has no attribute 'load_img'

  50. This is going too fast and is really convoluted!

  51. One side thing I appreciated from this tute (as a beginner) is the sheer amount of space and computation power required to do all this. :'(

  52. Instead of singing and dancing, maybe you could just focus in explaining the code at a beginner level.

  53. Why are people so mad about it not being beginner friendly, machine learning is a very complex subject. In fact even this is easy in comparison to what each one of those functions are doing. He tried his best, I personally don't see how I might simplify this further.

  54. how to change the prediction into a percent?? (eg. daisy: 97%)

  55. He knows when I will get bored and keeps me back with his raps. Weird Nerdy Guy.

  56. does anyone know why the input size doesn't have to be specified after the first layer?

  57. Which database is suitable for deep learning for large amount of data?

  58. dont ever raaaap

  59. man was that intro DEEP !

  60. Siraj could you please make a video on image processing using tensorflow and python

  61. is it functional for others object sir etc car

  62. Your videos do have information for a quick grasp but dude there is too much distraction with all the pictures and animations and quotes and singing all that you put in your videos. It is ok to have a little fun thing but not so much that you prevent the viewer from focussing and interpreting things.

  63. You’re gaining some weight bro. AI that.

  64. ImportError: cannot import name 'load_data' from 'parser' (unknown location) <— i'm getting this error

  65. "we're going to be building a CNN from scratch" – then uses Keras

  66. Can u make a detailed video on semantic segmentation of medical images in matlab including all the steps..Like training,validation,and getting the result?It would be really helpful for me or some others.

  67. Did you actually rap….wtf!!

  68. 3rd line gives error… Import Error cannot import name 'load_data' when on line from parser import load_data

  69. 3rd line gives error… Import Error cannot import name 'load_data' when on line from parser import load_data up up

  70. dude u r so fast….be relax…take a keep breath….see alot of video ..in you tube how to teach………[email protected]
