Object Detection

Object detection is the task of identifying instances of particular classes (e.g., people, cars, animals) in an image or video and localizing each instance precisely by drawing a bounding box around it.

Dataset

!wget https://miro.medium.com/v2/resize:fit:1400/format:webp/1*S7udLtAX7C08wGrRneeBXQ.jpeg -O ./dataset/cat.jpg
Warning: wildcards not supported in HTTP.
--2024-05-19 18:26:49--  https://miro.medium.com/v2/resize:fit:1400/format:webp/1*S7udLtAX7C08wGrRneeBXQ.jpeg
Resolving miro.medium.com (miro.medium.com)... 162.159.153.4, 162.159.152.4, 2606:4700:7::a29f:9904, ...
Connecting to miro.medium.com (miro.medium.com)|162.159.153.4|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 69074 (67K) [image/webp]
Saving to: ‘./dataset/cat.jpg’

./dataset/cat.jpg   100%[===================>]  67.46K  --.-KB/s    in 0.002s  

2024-05-19 18:26:49 (39.0 MB/s) - ‘./dataset/cat.jpg’ saved [69074/69074]
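If wget is not available, the same image can be downloaded in Python. A minimal sketch with requests, using the same URL and target path as the wget command above:

import os
import requests

# Same image URL and destination as the wget command above
url = "https://miro.medium.com/v2/resize:fit:1400/format:webp/1*S7udLtAX7C08wGrRneeBXQ.jpeg"
os.makedirs("dataset", exist_ok=True)

response = requests.get(url, timeout=30)
response.raise_for_status()  # fail loudly on HTTP errors
with open("dataset/cat.jpg", "wb") as f:
    f.write(response.content)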

Pipeline

from transformers import pipeline

model = pipeline("object-detection")
result = model("dataset/cat.jpg")

result
No model was supplied, defaulted to facebook/detr-resnet-50 and revision 2729413 (https://huggingface.co/facebook/detr-resnet-50).
Using a pipeline without specifying a model name and revision in production is not recommended.
Some weights of the model checkpoint at facebook/detr-resnet-50 were not used when initializing DetrForObjectDetection: ['model.backbone.conv_encoder.model.layer1.0.downsample.1.num_batches_tracked', 'model.backbone.conv_encoder.model.layer2.0.downsample.1.num_batches_tracked', 'model.backbone.conv_encoder.model.layer3.0.downsample.1.num_batches_tracked', 'model.backbone.conv_encoder.model.layer4.0.downsample.1.num_batches_tracked']
- This IS expected if you are initializing DetrForObjectDetection from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing DetrForObjectDetection from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).

[{'score': 0.9800992608070374,
  'label': 'cat',
  'box': {'xmin': 246, 'ymin': 141, 'xmax': 1148, 'ymax': 785}},
 {'score': 0.672563374042511,
  'label': 'cat',
  'box': {'xmin': 577, 'ymin': 141, 'xmax': 1150, 'ymax': 785}}]
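The log above warns that no model was pinned; passing the checkpoint explicitly (here the same facebook/detr-resnet-50 default) silences the warning and keeps results reproducible:

from transformers import pipeline

# Pin the checkpoint instead of relying on the pipeline default
model = pipeline("object-detection", model="facebook/detr-resnet-50")
result = model("dataset/cat.jpg")

Note that the two detections above overlap (both labeled cat, scores 0.98 and 0.67); in practice you would usually keep only detections above a score threshold.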

Bounding Box

In object detection, the coordinates of each bounding box are predicted via bounding-box (BBox) regression. The pipeline returns them as xmin/ymin/xmax/ymax pixel coordinates, which can be drawn directly with PIL:

from PIL import Image, ImageDraw

image_path = "dataset/cat.jpg"
image = Image.open(image_path)

# Take the first (highest-scoring) detection from the pipeline result
box = result[0]["box"]
draw = ImageDraw.Draw(image)
bounding_box = (
    box['xmin'],
    box['ymin'],
    box['xmax'],
    box['ymax']
)

# Draw the box on the original image
draw.rectangle(
    bounding_box,
    outline="red",
    width=10
)

image.show()
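On a headless machine, image.show() may fail because it hands the image to an external viewer (xdg-open on Linux). Saving the annotated image to disk avoids that dependency; the output filename is illustrative:

# Save the annotated image instead of opening an external viewer
image.save("dataset/cat_bbox.jpg")  # illustrative output path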

DETR: COCO Dataset

DETR (DEtection TRansformer) is a Transformer-based object-detection model: a CNN backbone extracts image features, and a Transformer encoder-decoder then predicts a set of bounding boxes and class labels directly, without anchor boxes or non-maximum suppression. The example below runs the facebook/detr-resnet-50 checkpoint on a COCO validation image.

from transformers import DetrImageProcessor, DetrForObjectDetection
import torch
from PIL import Image
import requests

url = "http://images.cocodataset.org/val2017/000000039769.jpg"
image = Image.open(requests.get(
    url, 
    stream=True).raw)

# you can specify the revision tag if you don't want the timm dependency
processor = DetrImageProcessor.from_pretrained("facebook/detr-resnet-50", revision="no_timm")
model = DetrForObjectDetection.from_pretrained("facebook/detr-resnet-50", revision="no_timm")

inputs = processor(
    images=image, 
    return_tensors="pt"
)
outputs = model(**inputs)

# convert outputs (bounding boxes and class logits) to COCO API
# let's only keep detections with score > 0.9
target_sizes = torch.tensor([image.size[::-1]])
results = processor.post_process_object_detection(
    outputs, 
    target_sizes=target_sizes, 
    threshold=0.9)[0]

for score, label, box in zip(results["scores"], results["labels"], results["boxes"]):
    box = [round(i, 2) for i in box.tolist()]
    print(
            f"Detected {model.config.id2label[label.item()]} with confidence "
            f"{round(score.item(), 3)} at location {box}"
    )
Detected remote with confidence 0.998 at location [40.16, 70.81, 175.55, 117.98]
Detected remote with confidence 0.996 at location [333.24, 72.55, 368.33, 187.66]
Detected couch with confidence 0.995 at location [-0.02, 1.15, 639.73, 473.76]
Detected cat with confidence 0.999 at location [13.24, 52.05, 314.02, 470.93]
Detected cat with confidence 0.999 at location [345.4, 23.85, 640.37, 368.72]

Detection Inference

Finally, the DETR detections can be drawn on the COCO image together with their labels and confidence scores:

from PIL import ImageDraw

draw = ImageDraw.Draw(image)

# Detections copied from the DETR printout above
detected_objects = [
    {"label": "remote", "score": 0.998, "box": [40.16, 70.81, 175.55, 117.98]},
    {"label": "remote", "score": 0.996, "box": [333.24, 72.55, 368.33, 187.66]},
    {"label": "couch", "score": 0.995, "box": [-0.02, 1.15, 639.73, 473.76]},
    {"label": "cat", "score": 0.999, "box": [13.24, 52.05, 314.02, 470.93]},
    {"label": "cat", "score": 0.999, "box": [345.4, 23.85, 640.37, 368.72]}
]

for obj in detected_objects:
    box = obj['box']      # [xmin, ymin, xmax, ymax]
    label = obj['label']
    score = obj['score']

    draw.rectangle(box, outline="red", width=2)

    # Annotate each box with its class and confidence
    text = f"{label} {score:.3f}"
    draw.text((box[0], box[1] - 10), text, fill="red")

image.show()
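Hard-coding the detections works for a demo, but they can also be drawn directly from the DETR outputs. A minimal sketch that reuses results and model from the DETR section above and re-opens a clean copy of the image (the output path is illustrative):

from PIL import Image, ImageDraw
import requests

# Re-open a clean copy of the COCO image (same URL as above)
url = "http://images.cocodataset.org/val2017/000000039769.jpg"
image = Image.open(requests.get(url, stream=True).raw)
draw = ImageDraw.Draw(image)

# Draw every detection returned by post_process_object_detection;
# `results` and `model` come from the DETR section above
for score, label, box in zip(results["scores"], results["labels"], results["boxes"]):
    xmin, ymin, xmax, ymax = box.tolist()
    draw.rectangle((xmin, ymin, xmax, ymax), outline="red", width=2)
    label_text = f"{model.config.id2label[label.item()]} {score.item():.3f}"
    draw.text((xmin, ymin - 10), label_text, fill="red")

image.save("dataset/coco_detections.jpg")  # illustrative output path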