In this module, you will learn to use pre-trained deep learning models for image classification, feature extraction, and object detection.
Have you used search in Google Photos? You can search for “my photos of cat” and it will retrieve photos containing cats from your library. This can be done using image classification, which is treated as a supervised learning problem: we define a set of target classes (objects to identify in images) and train a model to recognize them using labeled example photos.
Image classification is not an easy problem because of variations in object location, lighting, background, camera angle, camera focus, and so on.
The current big players are PyTorch and TensorFlow. Both are heavily used in industry. If you're interested, see a comparison of deep learning software.
Pre-trained models are available in the `torchvision.models` module. All models come with weights pre-trained on ImageNet (224 × 224 images). Here are the top predictions from one such model on a few sample images:

| Class | Probability score |
|---|---|
| tiger cat | 0.353 |
| tabby, tabby cat | 0.207 |
| lynx, catamount | 0.050 |
| Pembroke, Pembroke Welsh corgi | 0.046 |
| Class | Probability score |
|---|---|
| cheetah, chetah, Acinonyx jubatus | 0.983 |
| leopard, Panthera pardus | 0.012 |
| jaguar, panther, Panthera onca, Felis onca | 0.004 |
| snow leopard, ounce, Panthera uncia | 0.001 |

| Class | Probability score |
|---|---|
| macaque | 0.714 |
| patas, hussar monkey, Erythrocebus patas | 0.122 |
| proboscis monkey, Nasalis larvatus | 0.098 |
| guenon, guenon monkey | 0.017 |

| Class | Probability score |
|---|---|
| Walker hound, Walker foxhound | 0.580 |
| English foxhound | 0.091 |
| EntleBucher | 0.080 |
| beagle | 0.065 |
These predictions were made with the `vgg16` model, which is available in `torchvision`. `torchvision` has many such pre-trained models that have been very successful across a wide range of tasks: AlexNet, VGG, ResNet, Inception, MobileNet, etc.
| Class | Probability score |
|---|---|
| cucumber, cuke | 0.146 |
| plate | 0.117 |
| guacamole | 0.099 |
| Granny Smith | 0.091 |

| Class | Probability score |
|---|---|
| fig | 0.637 |
| pomegranate | 0.193 |
| grocery store, grocery, food market, market | 0.041 |
| crate | 0.023 |

| Class | Probability score |
|---|---|
| toilet seat | 0.171 |
| safety pin | 0.060 |
| bannister, banister, balustrade, balusters, handrail | 0.039 |
| bubble | 0.035 |

| Class | Probability score |
|---|---|
| vase | 0.078 |
| thimble | 0.074 |
| plate rack | 0.049 |
| saltshaker, salt shaker | 0.047 |

| Class | Probability score |
|---|---|
| pizza, pizza pie | 0.998 |
| frying pan, frypan, skillet | 0.001 |
| potpie | 0.000 |
| French loaf | 0.000 |

| Class | Probability score |
|---|---|
| patio, terrace | 0.213 |
| fountain | 0.164 |
| lakeside, lakeshore | 0.097 |
| sundial | 0.088 |
Let’s look at some sample images in the dataset.
Here are the statistics of our toy dataset:
Classes: ['beet_salad', 'chocolate_cake', 'edamame', 'french_fries', 'pizza', 'spring_rolls', 'sushi']
Class count: 40, 38, 40
Samples: 283
First sample: ('data/food/train/beet_salad/104294.jpg', 0)
torch.Size([283, 1024])
| | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | ... | 1014 | 1015 | 1016 | 1017 | 1018 | 1019 | 1020 | 1021 | 1022 | 1023 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0.000290 | 0.003821 | 0.005015 | 0.001307 | 0.052690 | 0.063403 | 0.000626 | 0.001850 | 0.256254 | 0.000223 | ... | 0.229935 | 1.046375 | 2.241259 | 0.229641 | 0.033674 | 0.742792 | 1.338698 | 2.130880 | 0.625475 | 0.463088 |
| 1 | 0.000407 | 0.005973 | 0.003206 | 0.001932 | 0.090702 | 0.438523 | 0.001513 | 0.003906 | 0.166081 | 0.000286 | ... | 0.910680 | 1.580815 | 0.087191 | 0.606904 | 0.436106 | 0.306456 | 0.940102 | 1.159818 | 1.712705 | 1.624753 |
| 2 | 0.000626 | 0.005090 | 0.002887 | 0.001299 | 0.091715 | 0.548537 | 0.000491 | 0.003587 | 0.266537 | 0.000408 | ... | 0.465152 | 0.678276 | 0.946387 | 1.194697 | 2.537747 | 1.642383 | 0.701200 | 0.115620 | 0.186433 | 0.166605 |
| 3 | 0.000169 | 0.006087 | 0.002489 | 0.002167 | 0.087537 | 0.623212 | 0.000427 | 0.000226 | 0.460680 | 0.000388 | ... | 0.394083 | 0.700158 | 0.105200 | 0.856323 | 0.038457 | 0.023948 | 0.131838 | 1.296370 | 0.723323 | 1.915215 |
| 4 | 0.000286 | 0.005520 | 0.001906 | 0.001599 | 0.186034 | 0.850148 | 0.000835 | 0.003025 | 0.036309 | 0.000142 | ... | 3.313760 | 0.565744 | 0.473564 | 0.139446 | 0.029283 | 1.165938 | 0.442319 | 0.227593 | 0.884266 | 1.592698 |
5 rows × 1024 columns
Training score: 1.0
Validation score: 0.835820895522388
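Scores like these come from training an ordinary classifier on the extracted features. A minimal sketch using scikit-learn's `LogisticRegression` with random stand-in features (the real features come from a pre-trained network; the sample and class counts echo the toy dataset above, and the validation size is illustrative):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
# Stand-ins for the extracted feature matrices and labels
Z_train, y_train = rng.random((283, 1024)), rng.integers(0, 7, 283)
Z_valid, y_valid = rng.random((60, 1024)), rng.integers(0, 7, 60)

clf = LogisticRegression(max_iter=1000)
clf.fit(Z_train, y_train)
print("Training score:", clf.score(Z_train, y_train))
print("Validation score:", clf.score(Z_valid, y_valid))
```

With 1024 features and only 283 training examples, a perfect training score (like the 1.0 above) is expected; the validation score is the meaningful number.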
Let’s examine some sample predictions on the validation set.
Let’s try object detection using a pre-trained model.
from ultralytics import YOLO

model = YOLO("yolov8n.pt")  # pre-trained YOLOv8n model

yolo_input = "data/yolo_test/3356700488_183566145b.jpg"
yolo_result = "data/yolo_result.jpg"

# Run inference on the image (a list of images also works, for batched inference)
result = model(yolo_input)  # returns a list of Results objects
result[0].save(filename=yolo_result)
image 1/1 /Users/kvarada/EL/workshops/Intro-to-deep-learning/website/slides/data/yolo_test/3356700488_183566145b.jpg: 512x640 4 persons, 2 cars, 1 stop sign, 81.1ms
Speed: 2.9ms preprocess, 81.1ms inference, 8.1ms postprocess per image at shape (1, 3, 512, 640)
'data/yolo_result.jpg'