Depth Estimation
Depth estimation is the task of predicting the depth of objects in an image. It is critical for applications such as 3D reconstruction, augmented reality, autonomous driving, and robotics. Depth estimation models are trained to determine the relative distance of every pixel in an image from the camera, which is commonly referred to as depth. These models estimate depth from either monocular (single-image) or stereo (multi-image) input.
The quickest way to try the task is with the depth-estimation pipeline:
from transformers import pipeline

# Load a depth-estimation pipeline with the DPT-Large checkpoint
estimator = pipeline(
    task="depth-estimation",
    model="Intel/dpt-large"
)

# Run inference on an image URL; the pipeline handles download and preprocessing
result = estimator(images="http://images.cocodataset.org/val2017/000000039769.jpg")
result
Loading this checkpoint may print a warning that some weights of DPTForDepthEstimation (a few neck.fusion_stage convolution parameters) are newly initialized, along with a suggestion to train the model on a downstream task. For inference with this checkpoint, the warning can be ignored, as the output below shows.
{'predicted_depth': tensor([[[ 6.3199,  6.3629,  6.4148,  ..., 10.4104, 10.5109, 10.3847],
         [ 6.3850,  6.3615,  6.4166,  ..., 10.4540, 10.4384, 10.4554],
         [ 6.3519,  6.3176,  6.3575,  ..., 10.4247, 10.4618, 10.4257],
         ...,
         [22.3772, 22.4624, 22.4227,  ..., 22.5207, 22.5593, 22.5293],
         [22.5073, 22.5148, 22.5115,  ..., 22.6604, 22.6345, 22.5871],
         [22.5177, 22.5275, 22.5218,  ..., 22.6282, 22.6216, 22.6108]]]),
 'depth': <PIL.Image.Image image mode=L size=640x480>}
result["depth"]
Next, load an image to analyze:

from PIL import Image
import requests

url = "https://unsplash.com/photos/HwBAsSbPBDU/download?ixid=MnwxMjA3fDB8MXxzZWFyY2h8MzR8fGNhciUyMGluJTIwdGhlJTIwc3RyZWV0fGVufDB8MHx8fDE2Nzg5MDEwODg&force=true&w=640"
image = Image.open(requests.get(url, stream=True).raw)
image
Then instantiate a pipeline with a different checkpoint, vinvino02/glpn-nyu (a GLPN model trained on the NYU Depth V2 dataset), and pass the image to it:

from transformers import pipeline

checkpoint = "vinvino02/glpn-nyu"
depth_estimator = pipeline(
    "depth-estimation",
    model=checkpoint
)
predictions = depth_estimator(image)
predictions["depth"]
Full code on CPU
The same workflow can be written out explicitly with the image processor and model classes instead of the pipeline. The example below runs entirely on CPU:
from PIL import Image
import numpy as np
import requests
import torch
from transformers import DPTImageProcessor, DPTForDepthEstimation
"""
Here, the code loads a pre-trained image processor and model.
low_cpu_mem_usage=True is an optional argument that reduces memory usage,
useful for systems with limited resources.
"""
image_processor = DPTImageProcessor.from_pretrained("Intel/dpt-hybrid-midas")
model = DPTForDepthEstimation.from_pretrained(
"Intel/dpt-hybrid-midas",
low_cpu_mem_usage=True
)
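# (This example stays on CPU; if a GPU were available, this is where you
# would move the model with model.to("cuda") and later the inputs as well.)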
url = "https://images.unsplash.com/photo-1536048810607-3dc7f86981cb?q=80&w=1000&auto=format&fit=crop&ixlib=rb-4.0.3&ixid=M3wxMjA3fDB8MHxzZWFyY2h8Mnx8dmFsbGV5fGVufDB8fDB8fHww"
image = Image.open(
requests.get(
url,
stream=True
).raw)
"""
The image is processed using the DPTImageProcessor to convert it into
a format suitable for the model. return_tensors="pt" specifies that
the output should be PyTorch tensors.
"""
inputs = image_processor(
images=image,
return_tensors="pt"
)
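# inputs is a dict-like BatchFeature whose "pixel_values" tensor has shape
# (batch_size, num_channels, height, width)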
"""
Inference is performed without calculating gradients (torch.no_grad()),
which is typical for inference to save memory and computation.
The depth map is extracted from the outputs.
"""
with torch.no_grad():
outputs = model(**inputs)
predicted_depth = outputs.predicted_depth
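# predicted_depth is a (batch_size, height, width) tensor at the model's
# internal working resolution, not at the resolution of the input image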
"""
interpolate to original size
The depth map tensor is resized to match the original image dimensions
using bicubic interpolation, which helps in smoothing the resized image.
"""
prediction = torch.nn.functional.interpolate(
predicted_depth.unsqueeze(1),
size=image.size[::-1],
mode="bicubic",
align_corners=False,
)
# Visualize the prediction: scale the depth values to the 0-255 range
# and convert the array to an 8-bit grayscale image
output = prediction.squeeze().cpu().numpy()
formatted = (output * 255 / np.max(output)).astype("uint8")
depth = Image.fromarray(formatted)
depth
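Grayscale depth maps can be hard to read at a glance. As an optional extra, here is a small sketch that maps the depth values through a matplotlib colormap (this assumes matplotlib is installed; the "inferno" colormap and the file name depth_colored.png are arbitrary choices):

from matplotlib import colormaps

# Normalize depth to [0, 1], map it through a colormap, and drop the alpha channel
cmap = colormaps["inferno"]
colored = (cmap(output / np.max(output))[..., :3] * 255).astype("uint8")
Image.fromarray(colored).save("depth_colored.png")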