Text to Image

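Text-to-image generation produces a visual representation (an image) from a text description. The process starts with a text input that describes the image, which can range from a simple description to a complex, abstract concept. The model processes the text to understand its content, then generates an image that matches the description. This involves interpreting the meaning of the text, visualizing the elements it describes, and assembling them into a coherent image.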

Stable Diffusion

For background on the Diffusers library used here, see the Diffusers section of Huggingface Basic:

https://www.loudai.net/huggingface/huggingface-basic/diffusers

%pip install diffusers

from diffusers import DiffusionPipeline
import torch

pipe = DiffusionPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0", 
    torch_dtype=torch.float32, 
    use_safetensors=True
)
pipe.to("cuda")
prompt = "super dog riding a red horse"

image = pipe(prompt=prompt).images[0]
image


image.save("dataset/dog_horse_output.png")
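The `image.save(...)` call above writes into a `dataset/` folder, and saving raises `FileNotFoundError` if that folder does not exist yet. A minimal sketch of a guard, assuming a hypothetical helper named `save_image` (not part of Diffusers or Pillow) and any object with a PIL-style `save` method:

```python
from pathlib import Path


def save_image(image, path: str) -> Path:
    """Save `image` to `path`, creating missing parent directories first.

    `save_image` is a hypothetical convenience helper for this page;
    `image` is expected to expose a PIL-style `save(path)` method.
    """
    target = Path(path)
    # Create "dataset/" (or any nested parent folder) if it is missing.
    target.parent.mkdir(parents=True, exist_ok=True)
    image.save(target)
    return target
```

With this in place, `save_image(image, "dataset/dog_horse_output.png")` could stand in for the direct `image.save(...)` call.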

Diffusion UNet

import torch
from diffusers import StableDiffusionXLPipeline, UNet2DConditionModel, EulerDiscreteScheduler
from huggingface_hub import hf_hub_download
from safetensors.torch import load_file

base = "stabilityai/stable-diffusion-xl-base-1.0"
repo = "ByteDance/SDXL-Lightning"
ckpt = "sdxl_lightning_4step_unet.safetensors" # Use the correct ckpt for your step setting!

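# Build the UNet from the base model's config and move it to the GPU (float32 by default).
# load_config + from_config avoids the deprecated path-to-from_config call.
unet = UNet2DConditionModel.from_config(
    UNet2DConditionModel.load_config(base, subfolder="unet")
).to("cuda")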
unet.load_state_dict(
    load_file(
        hf_hub_download(
            repo, 
            ckpt
        ), 
        device="cuda"
    )
)

# Initialize pipeline with GPU and single precision
pipe = StableDiffusionXLPipeline.from_pretrained(
    base, 
    unet=unet, 
    torch_dtype=torch.float32).to("cuda")

# Ensure sampler uses "trailing" timesteps
pipe.scheduler = EulerDiscreteScheduler.from_config(
    pipe.scheduler.config, 
    timestep_spacing="trailing"
)

# Generate image with specified inference steps and CFG scale
result = pipe(
    "A cat skating", 
    num_inference_steps=4, 
    guidance_scale=0
)
image = result.images[0]
image.save("dataset/output.png")
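The comment next to `ckpt` warns that the checkpoint must match the step setting. One way to keep the two in sync is to derive both from a single number. The sketch below is an illustration only: `lightning_settings` is a hypothetical helper, and the supported step counts and the file-name pattern are assumptions extrapolated from the 4-step file name used above, so check the actual file list in the `ByteDance/SDXL-Lightning` repository before relying on them.

```python
# Assumption: checkpoints exist for these step counts and follow the
# same naming pattern as "sdxl_lightning_4step_unet.safetensors".
SUPPORTED_STEPS = (2, 4, 8)


def lightning_settings(steps: int) -> dict:
    """Return matching checkpoint name and generation settings (hypothetical helper)."""
    if steps not in SUPPORTED_STEPS:
        raise ValueError(f"steps must be one of {SUPPORTED_STEPS}, got {steps}")
    return {
        # File to download from the SDXL-Lightning repo (assumed pattern).
        "ckpt": f"sdxl_lightning_{steps}step_unet.safetensors",
        # Keyword arguments for the pipeline call, as used in the example above.
        "call_kwargs": {"num_inference_steps": steps, "guidance_scale": 0},
        # Keyword arguments for EulerDiscreteScheduler.from_config.
        "scheduler_kwargs": {"timestep_spacing": "trailing"},
    }
```

For example, `lightning_settings(4)["ckpt"]` gives the file name used in this page, and `pipe("A cat skating", **lightning_settings(4)["call_kwargs"])` would reproduce the call above.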


image
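The scheduler in the example is configured with `timestep_spacing="trailing"`. To give a feel for what that changes, here is a small pure-Python sketch of how "trailing" and "leading" spacing pick timesteps. The formulas reflect my reading of how Diffusers schedulers space timesteps and should be treated as an approximation, not as the library's code; the function names are made up for this illustration.

```python
def trailing_timesteps(num_train_timesteps: int, num_inference_steps: int) -> list[int]:
    """Timesteps counted back from the last training step (sketch of "trailing")."""
    step_ratio = num_train_timesteps / num_inference_steps
    return [
        round(num_train_timesteps - i * step_ratio) - 1
        for i in range(num_inference_steps)
    ]


def leading_timesteps(
    num_train_timesteps: int, num_inference_steps: int, steps_offset: int = 0
) -> list[int]:
    """Timesteps counted up from zero, then reversed (sketch of "leading")."""
    step_ratio = num_train_timesteps // num_inference_steps
    return [i * step_ratio + steps_offset for i in reversed(range(num_inference_steps))]


if __name__ == "__main__":
    print(trailing_timesteps(1000, 4))  # [999, 749, 499, 249]
    print(leading_timesteps(1000, 4))   # [750, 500, 250, 0]
```

With 1000 training timesteps and 4 inference steps, trailing spacing starts at timestep 999, the noisiest step, while leading spacing starts at 750. Few-step models such as the 4-step checkpoint used here are typically run starting from the final timestep, which is presumably why the example sets this option.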

