Depth Estimation

Depth estimation is the task of predicting the depth of objects in an image. It is critical for applications such as 3D reconstruction, augmented reality, autonomous driving, and robotics. Depth estimation models are trained to determine the relative distance of every pixel in an image from the camera, which is commonly referred to as depth. These models estimate depth from either monocular (single-image) or stereo (multi-image) input.

from transformers import pipeline

# Load a depth-estimation pipeline backed by Intel's DPT-Large checkpoint.
estimator = pipeline(
    task="depth-estimation",
    model="Intel/dpt-large"
)
result = estimator(images="http://images.cocodataset.org/val2017/000000039769.jpg")
result
{'predicted_depth': tensor([[[ 6.3199,  6.3629,  6.4148,  ..., 10.4104, 10.5109, 10.3847],
          [ 6.3850,  6.3615,  6.4166,  ..., 10.4540, 10.4384, 10.4554],
          [ 6.3519,  6.3176,  6.3575,  ..., 10.4247, 10.4618, 10.4257],
          ...,
          [22.3772, 22.4624, 22.4227,  ..., 22.5207, 22.5593, 22.5293],
          [22.5073, 22.5148, 22.5115,  ..., 22.6604, 22.6345, 22.5871],
          [22.5177, 22.5275, 22.5218,  ..., 22.6282, 22.6216, 22.6108]]]),
 'depth': <PIL.Image.Image image mode=L size=640x480>}
result["depth"]
from PIL import Image
import requests

# Download a street-scene test image from Unsplash.
url = "https://unsplash.com/photos/HwBAsSbPBDU/download?ixid=MnwxMjA3fDB8MXxzZWFyY2h8MzR8fGNhciUyMGluJTIwdGhlJTIwc3RyZWV0fGVufDB8MHx8fDE2Nzg5MDEwODg&force=true&w=640"
image = Image.open(requests.get(url, stream=True).raw)
image
from transformers import pipeline

# Try a second checkpoint: GLPN trained on the NYU Depth V2 dataset.
checkpoint = "vinvino02/glpn-nyu"
depth_estimator = pipeline(
    "depth-estimation",
    model=checkpoint
)
predictions = depth_estimator(image)
predictions["depth"]

Full Code on CPU

from PIL import Image
import numpy as np
import requests
import torch

from transformers import DPTImageProcessor, DPTForDepthEstimation


"""
Here, the code loads a pre-trained image processor and model. 
low_cpu_mem_usage=True is an optional argument that reduces memory usage, 
useful for systems with limited resources.
"""

image_processor = DPTImageProcessor.from_pretrained("Intel/dpt-hybrid-midas")
model = DPTForDepthEstimation.from_pretrained(
    "Intel/dpt-hybrid-midas", 
    low_cpu_mem_usage=True
)

url = "https://images.unsplash.com/photo-1536048810607-3dc7f86981cb?q=80&w=1000&auto=format&fit=crop&ixlib=rb-4.0.3&ixid=M3wxMjA3fDB8MHxzZWFyY2h8Mnx8dmFsbGV5fGVufDB8fDB8fHww"
image = Image.open(requests.get(url, stream=True).raw)

"""
The image is processed using the DPTImageProcessor to convert it into
 a format suitable for the model. return_tensors="pt" specifies that 
the output should be PyTorch tensors.
"""
inputs = image_processor(
    images=image, 
    return_tensors="pt"
)

"""
Inference is performed without calculating gradients (torch.no_grad()), 
which is typical for inference to save memory and computation. 
The depth map is extracted from the outputs.
"""
with torch.no_grad():
    outputs = model(**inputs)
    predicted_depth = outputs.predicted_depth


"""
interpolate to original size
The depth map tensor is resized to match the original image dimensions 
using bicubic interpolation, which helps in smoothing the resized image.
"""
prediction = torch.nn.functional.interpolate(
    predicted_depth.unsqueeze(1),  # add a channel dim: (1, H, W) -> (1, 1, H, W)
    size=image.size[::-1],         # PIL size is (width, height); interpolate expects (height, width)
    mode="bicubic",
    align_corners=False,
)

# visualize the prediction: scale to [0, 255] and convert to an 8-bit grayscale image
output = prediction.squeeze().cpu().numpy()
formatted = (output * 255 / np.max(output)).astype("uint8")
depth = Image.fromarray(formatted)
depth
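
The grayscale map can be hard to read visually; mapping the normalized values through a colormap often makes depth differences stand out. A minimal sketch, assuming matplotlib is available (the inferno colormap and the output filename are arbitrary choices):

import matplotlib.cm as cm
import numpy as np
from PIL import Image

# cm.inferno maps values in [0, 1] to RGBA floats; keep RGB and convert to uint8.
colored = cm.inferno(output / np.max(output))
colored_img = Image.fromarray((colored[..., :3] * 255).astype("uint8"))
colored_img.save("depth_colored.png")  # arbitrary filename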