Diffusers

What is Hugging Face Diffusers?

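Hugging Face's Diffusers library is one of the leading libraries for pretrained diffusion models, a family of generative algorithms in computer vision, and it provides pipelines for generating images, video, and audio.

Stable Diffusion, currently among the most widely used models for image and video generation, is one kind of diffusion model.

The Diffusers library from Hugging Face is easy to use, offers many options, and is customizable. With a few lines of code, you can generate or modify images with a Stable Diffusion model.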

Setup Environments

%pip install diffusers
%pip install transformers
%pip install accelerate

Text-to-Image Pipeline

Diffusers provides several pipelines for different tasks that use Stable Diffusion models. The most common use case is text-to-image generation, for which it provides StableDiffusionPipeline.

import torch
from diffusers import StableDiffusionPipeline

seed=10

# Set computation device.
device = torch.device('cuda:0' if torch.cuda.is_available() else 'cpu')

Pipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "stabilityai/stable-diffusion-2-1",
    torch_dtype=torch.float16,
    variant="fp16"
).to(device)

Text-to-Image Prompt

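safetensors is a newer, safer file format for saving and loading a model's tensors. The format makes it difficult to hide malicious code in saved model weights, which matters more and more as generative AI spreads. Passing use_safetensors=True to from_pretrained loads the weights from safetensors files.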
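
To see why a tensor-only format matters, the following sketch (not part of the original notebook) shows how Python's pickle format, which older PyTorch checkpoint files rely on, can run a callable during loading. The Payload class is hypothetical and the expression it evaluates is harmless.

```python
import pickle


class Payload:
    """Hypothetical object whose pickled form asks the loader to call eval()."""

    def __reduce__(self):
        # pickle stores "call eval('1 + 1')" instead of the object's data.
        return (eval, ("1 + 1",))


blob = pickle.dumps(Payload())

# Loading the bytes runs the embedded call; nothing here is a tensor.
result = pickle.loads(blob)
print(result)  # prints 2
```

A safetensors file stores only tensor data and metadata, so loading it does not involve this kind of call.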

# Prompting to generate image.
prompt = "A photo of a large airplane in the middle of a storm and lightning, highly detailed, unreal engine effect"
image = pipe(
    prompt, 
    num_inference_steps=150, 
    generator=torch.manual_seed(seed)
).images[0]
 
image

Negative Prompt

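When generating images, you will want to minimize distorted, blurry, or unappealing results as far as possible. For this, Diffusers pipelines accept a negative_prompt argument, a prompt string that describes what you do not want in the image.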

prompt = "A photo of a large airplane in the middle of a storm and lightning, highly detailed, unreal engine effect"
image = pipe(
    prompt,
    num_inference_steps=150,
    generator=torch.manual_seed(seed),
    negative_prompt='low resolution, distorted, ugly, deformed, disfigured, poor details'
).images[0]
image
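
A negative prompt is a plain string, conventionally a comma-separated list of terms, so it can be assembled from a list. The helper below is a hypothetical convenience, not part of Diffusers; it drops empty entries and case-insensitive duplicates while keeping the original order.

```python
def build_negative_prompt(terms):
    """Join unwanted-feature terms into one comma-separated prompt string."""
    seen = set()
    unique = []
    for term in terms:
        cleaned = term.strip()
        key = cleaned.lower()
        # Skip blanks and terms already seen (ignoring case).
        if cleaned and key not in seen:
            seen.add(key)
            unique.append(cleaned)
    return ", ".join(unique)


negative_prompt = build_negative_prompt(
    ["low resolution", "distorted", "ugly", "Distorted", "poor details"]
)
print(negative_prompt)  # low resolution, distorted, ugly, poor details
```

The resulting string can be passed as negative_prompt=negative_prompt in the pipe(...) call.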

Swapping Schedulers

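Stable Diffusion models generate an image by applying a scheduling technique at each timestep. More details on schedulers can be found in the Denoising Diffusion Probabilistic Models paper. You can also swap schedulers to generate a different image from the same prompt. In general, some schedulers work better than others, but in most cases it is safe to keep the default, DDIMScheduler.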
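
As a rough illustration of one thing a scheduler decides, the sketch below picks which training timesteps are visited for a given num_inference_steps, using evenly spaced ("leading") steps. It is a simplified pure-Python approximation that assumes 1000 training timesteps. The function name is hypothetical, and real schedulers in Diffusers also define how noise is removed at each step.

```python
def leading_timesteps(num_inference_steps, num_train_timesteps=1000):
    """Return evenly spaced timesteps, ordered from noisiest to cleanest."""
    # Integer stride between consecutive inference timesteps.
    step_ratio = num_train_timesteps // num_inference_steps
    return [i * step_ratio for i in reversed(range(num_inference_steps))]


print(leading_timesteps(5))  # [800, 600, 400, 200, 0]

steps = leading_timesteps(150)
print(len(steps), steps[0], steps[-1])  # 150 894 0
```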

EulerAncestralDiscreteScheduler can produce very high-quality images.

from diffusers import EulerAncestralDiscreteScheduler

pipe.scheduler = EulerAncestralDiscreteScheduler.from_config(pipe.scheduler.config)

image = pipe(
    prompt,
    num_inference_steps=150,
    generator=torch.manual_seed(seed),
    negative_prompt='low resolution, distorted, ugly, deformed, disfigured, poor details'
).images[0]
 
image

Image-to-Image Pipeline

Sometimes you want to transfer the style of one image onto another generated image. This is commonly called style transfer. You can do this with the Hugging Face Diffusers library using StableDiffusionImg2ImgPipeline.

Dataset

!mkdir dataset
!wget https://learnopencv.com/wp-content/uploads/2024/03/abstract_art_1.jpg -O ./dataset/abstract_1.jpg

mkdir: cannot create directory ‘dataset’: File exists
--2024-05-22 23:39:37--  https://learnopencv.com/wp-content/uploads/2024/03/abstract_art_1.jpg
Resolving learnopencv.com (learnopencv.com)... 172.66.42.215, 172.66.41.41, 2606:4700:3108::ac42:2929, ...
Connecting to learnopencv.com (learnopencv.com)|172.66.42.215|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 114129 (111K) [image/jpeg]
Saving to: ‘./dataset/abstract_1.jpg’

./dataset/abstract_ 100%[===================>] 111.45K  --.-KB/s    in 0.002s  

2024-05-22 23:39:37 (45.4 MB/s) - ‘./dataset/abstract_1.jpg’ saved [114129/114129]

from PIL import Image
from diffusers import StableDiffusionImg2ImgPipeline

Pipeline

pipe = StableDiffusionImg2ImgPipeline.from_pretrained(
    'stabilityai/stable-diffusion-2-1',
    torch_dtype=torch.float16
).to(device)
 
image_path = 'dataset/abstract_1.jpg'
 
init_image = Image.open(image_path).convert("RGB")
init_image = init_image.resize((500, 500))
 
init_image
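
Stable Diffusion works on latents that are 8 times smaller than the image in each dimension, so image sizes are normally multiples of 8, which the 500x500 size used above is not. Diffusers is expected to round such sizes down during preprocessing; the hypothetical helper below makes that adjustment explicit so the size actually used is visible. It is a sketch, not part of the original notebook.

```python
def round_down_to_multiple(value, multiple=8):
    """Round value down to the nearest multiple (8 is the assumed VAE scale factor)."""
    return value - value % multiple


width, height = 500, 500
size = (round_down_to_multiple(width), round_down_to_multiple(height))
print(size)  # (496, 496)
```

Calling init_image.resize(size) would then resize the input to 496x496 before it is passed to the pipeline.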

Prompt

prompt = "A real landscape, with starry sky, moonlit river, trending on artstation"
 
image = pipe(
    prompt=prompt,
    image=init_image,
    num_inference_steps=150,
    strength=1,
    generator=torch.manual_seed(seed)
).images[0]
 
image

Diffusers AutoPipeline

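Diffusers provides auto pipelines that detect the task from the model and arguments you supply.

Depending on the task, you can pass any model path from the Hugging Face Hub, regardless of who contributed the model or whether it is a Stable Diffusion variant.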

Text-to-Image

from diffusers import AutoPipelineForText2Image

pipe = AutoPipelineForText2Image.from_pretrained(
    'stabilityai/stable-diffusion-2-1',
    torch_dtype=torch.float16,
    use_safetensors=True
).to(device)

prompt = "A white dog, anime style, realism, detailed line art, fine details, solid lines"
 
image = pipe(
    prompt,
    num_inference_steps=150,
    generator=torch.manual_seed(seed)
).images[0]
 
image

Image-to-Image

from diffusers import AutoPipelineForImage2Image
 
pipe = AutoPipelineForImage2Image.from_pretrained(
    'stabilityai/stable-diffusion-2-1',
    torch_dtype=torch.float16,
    use_safetensors=True,
).to(device)
 
image_path = 'dataset/abstract_1.jpg'
 
init_image = Image.open(image_path).convert("RGB")
init_image

prompt = "a painting of sand dunes in a desert, with sun rising on the horizon, sand particle style, pastel drawing, on canvas"
 
image = pipe(
    prompt,
    init_image,
    num_inference_steps=150,
    strength=1.0,
    generator=torch.manual_seed(seed)
).images[0]
 
image

Image Inpainting

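Inpainting is the task of creating a mask on an image, providing a prompt, and expecting the generative model to produce an object in the masked region according to the prompt. It sounds simple, but the training strategy differs from that of a vanilla image generation model: it requires additional fine-tuning and data.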

from diffusers import AutoPipelineForInpainting
from diffusers.utils import load_image

pipeline = AutoPipelineForInpainting.from_pretrained(
    "stabilityai/stable-diffusion-2-1",
    torch_dtype=torch.float16,
    use_safetensors=True
).to(device)

img_url = "https://www.dropbox.com/scl/fi/oymummr3q4i86fbx79qv4/car.jpg?rlkey=8aogj5fwb0lrzd5u36rb3ifih&dl=1"
mask_url = "https://www.dropbox.com/scl/fi/le25tqopqapligom78myp/car_mask.png?rlkey=8t556dfxi1mr543j5fdwcg5mc&dl=1"
 
init_image = load_image(img_url).convert("RGB")
mask_image = load_image(mask_url).convert("RGB")

init_image

mask_image

When running the pipeline, pass the additional mask_image argument, which accepts a binary mask.

prompt = "A glass house in the middle of a ground"
image = pipeline(
    prompt,
    image=init_image,
    mask_image=mask_image,
    num_inference_steps=150,
    generator=torch.manual_seed(seed)
).images[0]
 
image
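
The mask passed as mask_image is binary: in Diffusers inpainting pipelines, white pixels mark the region to repaint and black pixels are kept. The sketch below thresholds grayscale values into such a mask using plain Python lists; the function name and the threshold of 128 are assumptions for illustration.

```python
def binarize_mask(gray_rows, threshold=128):
    """Map grayscale values (0-255) to a binary mask: 0 (keep) or 255 (repaint)."""
    return [
        [255 if value >= threshold else 0 for value in row]
        for row in gray_rows
    ]


gray = [
    [0, 40, 200],
    [130, 127, 255],
]
mask = binarize_mask(gray)
print(mask)  # [[0, 0, 255], [255, 0, 255]]
```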

SDXL Inpainting

from diffusers import AutoPipelineForInpainting
from diffusers.utils import load_image, make_image_grid
import torch

pipe = AutoPipelineForInpainting.from_pretrained(
    "diffusers/stable-diffusion-xl-1.0-inpainting-0.1", 
    torch_dtype=torch.float16, 
    variant="fp16"
).to("cuda")

The config attributes {'decay': 0.9999, 'inv_gamma': 1.0, 'min_decay': 0.0, 'optimization_step': 37000, 'power': 0.6666666666666666, 'update_after_step': 0, 'use_ema_warmup': False} were passed to UNet2DConditionModel, but are not expected and will be ignored. Please verify your config.json configuration file.

img_url = "https://raw.githubusercontent.com/CompVis/latent-diffusion/main/data/inpainting_examples/overture-creations-5sI6fQgYIuo.png"
mask_url = "https://raw.githubusercontent.com/CompVis/latent-diffusion/main/data/inpainting_examples/overture-creations-5sI6fQgYIuo_mask.png"
 
 
input_image = load_image(img_url).resize((1024, 1024))
mask_image = load_image(mask_url).resize((1024, 1024))

prompt = "a cute baby lion sitting on a park bench"
negative_prompt = "bad anatomy, deformed, ugly, disfigured"

generator = torch.Generator(device="cuda").manual_seed(0)

num_inference_steps:

Controls the number of denoising steps during image generation.
More steps refine the output over more iterations, which improves image quality but lengthens response time.

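strength:

Determines the level of noise added to the base image, which affects how closely the inpainted region resembles the original.
A value of 1 leaves no trace of the original, while 0 reconstructs the original region.
Higher strength means more noise and greater deviation from the base image, which increases creativity but takes more processing time.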
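
In Diffusers image-to-image and inpainting pipelines, strength also scales how many denoising steps actually run. The sketch below mirrors that calculation as it is commonly implemented (the integer part of num_inference_steps * strength, capped at num_inference_steps). Treat it as an approximation; the function name is hypothetical.

```python
def effective_steps(num_inference_steps, strength):
    """Approximate number of denoising steps run for a given strength."""
    if not 0.0 <= strength <= 1.0:
        raise ValueError("strength must be between 0 and 1")
    return min(int(num_inference_steps * strength), num_inference_steps)


print(effective_steps(20, 0.99))   # 19
print(effective_steps(150, 1.0))   # 150
print(effective_steps(150, 0.5))   # 75
```

Under this assumption, num_inference_steps=20 with strength=0.99 runs 19 denoising steps rather than 20.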

guidance_scale:

Sets how closely the generated image aligns with the text prompt.
Higher values follow the prompt more strictly, which reduces creativity.
Lower values allow more varied and creative output.
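
guidance_scale comes from classifier-free guidance, where the model predicts noise once with the prompt and once without it, and the scale extrapolates from the unconditional prediction toward the conditional one. The sketch below applies that formula to plain Python lists standing in for noise tensors; the numbers are made up for illustration.

```python
def apply_guidance(noise_uncond, noise_text, guidance_scale):
    """Classifier-free guidance: uncond + scale * (text - uncond), element-wise."""
    return [
        u + guidance_scale * (t - u)
        for u, t in zip(noise_uncond, noise_text)
    ]


noise_uncond = [0.0, 1.0, -2.0]  # prediction without the prompt
noise_text = [1.0, 1.0, 0.0]     # prediction with the prompt

low = apply_guidance(noise_uncond, noise_text, 1.0)
high = apply_guidance(noise_uncond, noise_text, 8.0)
print(low)   # [1.0, 1.0, 0.0]
print(high)  # [8.0, 1.0, 14.0]
```

With a scale of 1 the result equals the prompt-conditioned prediction; larger scales push further in the direction the prompt suggests.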

negative_prompt:

Tells the model what to avoid when generating the image.
It helps improve image quality with minimal effort by removing unwanted visual elements.

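padding_mask_crop:

Crops the masked region, with the specified padding, from both the image and the mask.
The cropped region is upscaled and overlaid on the original image, which can improve quality without a complicated workflow.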
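
The cropping step can be pictured as expanding the mask's bounding box by the padding and clamping it to the image. The sketch below shows only that part, with a hypothetical function name and made-up coordinates; the actual Diffusers implementation may additionally adjust the box, for example to match the target aspect ratio.

```python
def padded_crop_box(mask_box, image_size, padding):
    """Expand (left, top, right, bottom) by padding, clamped to the image bounds."""
    left, top, right, bottom = mask_box
    width, height = image_size
    return (
        max(left - padding, 0),
        max(top - padding, 0),
        min(right + padding, width),
        min(bottom + padding, height),
    )


# Hypothetical mask bounding box inside a 1024x1024 image.
box = padded_crop_box((300, 20, 700, 500), (1024, 1024), 32)
print(box)  # (268, 0, 732, 532)
```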

image = pipe(
    prompt=prompt,
    negative_prompt=negative_prompt,
    image=input_image,
    mask_image=mask_image,
    guidance_scale=8.0,  # how closely the generated image follows the text prompt
    num_inference_steps=20,  # 15 to 30 steps
    strength=0.99,  # keep `strength` below 1.0
    generator=generator,
    padding_mask_crop=32  # crops the masked area plus padding from image and mask
).images[0]

make_image_grid(
    [input_image,mask_image,image], 
    rows=1, 
    cols=3
)

Last updated 1 year ago