FLUX.1: Generative Image

Last updated 9 months ago

FLUX.1 Features

1. Speed and Efficiency

FLUX.1 is designed for fast image generation and boasts higher speeds than competitors such as Stable Diffusion, Midjourney, Colors, and Aura. The model comes in three versions:

FLUX.1 [schnell]: Lower quality than the Pro model, but generates images about 10 times faster.
FLUX.1 [dev]: Tailored for developers, with support for advanced features such as image-to-image generation.
FLUX.1 [pro]: The most powerful version, with 12 billion parameters. It is closed source and available through an API.
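As a rough sketch of how the three versions differ in practice, the lookup below maps each variant to a Hugging Face repository ID. The schnell ID is the one used later on this page; the dev ID and the helper name `repo_for` are assumptions made for illustration, and [pro] has no entry because the text describes it as API-only.

```python
from typing import Dict, Optional

# Repository IDs for the open-weight variants.
# [pro] is closed source per the description above, so it maps to None.
FLUX_REPOS: Dict[str, Optional[str]] = {
    "schnell": "black-forest-labs/FLUX.1-schnell",
    "dev": "black-forest-labs/FLUX.1-dev",
    "pro": None,
}


def repo_for(variant: str) -> str:
    """Return the repository ID for a variant, or raise if none is available."""
    key = variant.strip().lower()
    if key not in FLUX_REPOS:
        raise KeyError(f"Unknown FLUX.1 variant: {variant!r}")
    repo = FLUX_REPOS[key]
    if repo is None:
        raise ValueError("FLUX.1 [pro] is API-only and has no downloadable weights")
    return repo


schnell_repo = repo_for("Schnell")
```

The returned string can be passed as the first argument to `FluxPipeline.from_pretrained`.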

2. Prompt Adherence and Quality

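One of FLUX.1's standout features is its prompt adherence. Whether the prompt is simple or complex, the model consistently produces high-quality images that closely match the input description. For example, a simple prompt such as "a cat looking at the camera, fisheye lens" produces results comparable to those of Midjourney V6. More complex prompts can direct the placement and detail of objects in a scene with remarkable accuracy.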

How to Use FLUX.1

1. FLUX1 AI Official Site

On the official FLUX1 AI site, schnell can be used for free, although generation is slow. Using Dev or Pro requires a monthly subscription.

2. Replicate

Replicate is a user-friendly platform for running machine learning models in the cloud. FLUX.1 can be accessed and tested on Replicate for free.

3. Poe

Select one of the FLUX models at https://poe.com/ to use it.

4. Seaart.ai

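Seaart.ai is an AI platform that offers a suite of image generation tools powered by various diffusion models, including the recently added FLUX.1 model. It lets users generate high-quality images easily and for free. Seaart.ai provides about 150 credits per day, and each image generation costs about 1 credit, depending on the aspect ratio. Image-to-image tasks may require slightly more credits.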
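The credit figures above translate into a simple daily budget. The sketch below only does that arithmetic; the function name is made up for illustration, and the image-to-image cost of 2 credits is a hypothetical value, since the text only says such tasks cost slightly more.

```python
def images_per_day(daily_credits: int = 150, credits_per_image: float = 1.0) -> int:
    """Number of whole images that fit in one day's credit allowance."""
    if credits_per_image <= 0:
        raise ValueError("credits_per_image must be positive")
    return int(daily_credits // credits_per_image)


text_to_image = images_per_day()                         # about 1 credit each
image_to_image = images_per_day(credits_per_image=2.0)   # hypothetical cost
```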

5. Fal.ai

Fal.ai is another platform that provides access to FLUX.1. The process is similar to that of the other platforms: enter a text prompt to generate an image.

6. Toast AI

Toast AI is a free, open-source platform that hosts the latest AI papers and models, including FLUX.1. The FLUX.1 [dev] model can be used to generate images in a variety of styles, including anime.

7. API Access

For advanced or commercial use, FLUX.1 [pro] can be accessed directly through the API provided by Black Forest Labs. Sign up at https://flux-ai.io, issue an API key, and use the service on a paid basis.

8. Hugging Face

Hugging Face offers a trial of the schnell model based on FLUX.1. The platform provides a convenient way to test the model with adjustable settings.

Hugging Face Pipeline

Dependencies

%pip install transformers diffusers sentencepiece accelerate protobuf

import torch
from diffusers import FluxPipeline
import diffusers
from PIL import Image
import matplotlib.pyplot as plt


Flux Rope

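# Patch the rope function so it works with CUDA tensors:
# compute the rotary embedding on the CPU, then move the result back.
# Assumes a diffusers version that still exposes transformer_flux.rope.
_flux_rope = diffusers.models.transformers.transformer_flux.rope

def new_flux_rope(pos: torch.Tensor, dim: int, theta: int) -> torch.Tensor:
    assert dim % 2 == 0, "The dimension must be even."
    if pos.device.type == "cuda":
        return _flux_rope(pos.to("cpu"), dim, theta).to(device=pos.device)
    else:
        return _flux_rope(pos, dim, theta)

# Install the patched function (this assignment must run at module level).
diffusers.models.transformers.transformer_flux.rope = new_flux_rope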
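The patch above assumes the module exposes a `rope` attribute, which may not hold for every diffusers release. A guarded variant of the same monkey-patching pattern could look like the sketch below. The helper `patch_if_present` and the stand-in module are hypothetical names, not part of diffusers, and the stand-in is there only so the example runs without the library installed.

```python
from types import SimpleNamespace
from typing import Callable


def patch_if_present(module, name: str, make_wrapper: Callable) -> bool:
    """Replace module.<name> with make_wrapper(original) if the attribute exists."""
    original = getattr(module, name, None)
    if original is None:
        return False  # nothing to patch in this version
    setattr(module, name, make_wrapper(original))
    return True


def forwarding_wrapper(original):
    """Build a wrapper that records its calls and forwards to the original."""
    calls = []

    def wrapper(*args):
        calls.append(args)
        return original(*args)

    wrapper.calls = calls
    return wrapper


# Stand-in for a real module (hypothetical, for demonstration only).
fake_module = SimpleNamespace(rope=lambda pos, dim, theta: ("rope", pos, dim, theta))

patched = patch_if_present(fake_module, "rope", forwarding_wrapper)
missing = patch_if_present(fake_module, "no_such_function", forwarding_wrapper)
```

With the real library, the first argument would be `diffusers.models.transformers.transformer_flux` and the wrapper factory would return a CPU-fallback function like `new_flux_rope`.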

Pipeline

pipe = FluxPipeline.from_pretrained(
    "black-forest-labs/FLUX.1-schnell",
    revision='refs/pr/1',
    torch_dtype=torch.bfloat16
).to('cuda')


Prompt

prompt = "A modern, minimalist Korean girl dressed in a punk and goth look."

Generative Image

# Generate the image
out = pipe(
    prompt=prompt,
    guidance_scale=0.,
    height=1024,
    width=1024,
    num_inference_steps=4,
    max_sequence_length=256,
).images[0]
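The call above uses settings suited to the schnell variant (no guidance, 4 steps). As a hedged sketch, the helper below collects those settings next to a set of values for the dev variant; the dev numbers follow the example on the FLUX.1-dev model card and should be treated as a starting point. The names `PRESETS` and `call_kwargs` are made up for illustration. An optional seed is passed through the pipeline's `generator` argument for reproducible output.

```python
from typing import Any, Dict, Optional

# schnell values mirror the call above; dev values are an assumption
# based on the FLUX.1-dev model card example.
PRESETS: Dict[str, Dict[str, Any]] = {
    "schnell": {"guidance_scale": 0.0, "num_inference_steps": 4, "max_sequence_length": 256},
    "dev": {"guidance_scale": 3.5, "num_inference_steps": 50, "max_sequence_length": 512},
}


def call_kwargs(variant: str, prompt: str, height: int = 1024, width: int = 1024,
                seed: Optional[int] = None) -> Dict[str, Any]:
    """Build keyword arguments for a FluxPipeline call."""
    kwargs: Dict[str, Any] = {"prompt": prompt, "height": height, "width": width}
    kwargs.update(PRESETS[variant])
    if seed is not None:
        # Imported lazily so the helper works without torch when no seed is given.
        import torch
        kwargs["generator"] = torch.Generator("cpu").manual_seed(seed)
    return kwargs


kwargs = call_kwargs("schnell", "a cat looking at the camera, fisheye lens")
# image = pipe(**kwargs).images[0]  # `pipe` is the FluxPipeline loaded earlier
```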

Save & Display

# Save the generated image
out.save("gen_girl.png")

# Display the generated image
image = Image.open("gen_girl.png")
plt.imshow(image)
plt.axis('off')
plt.show()

FLUX.1 Prompt Structure and Components: Tips

Reflecting the following elements in a prompt helps generate more detailed and specific images.

Subject: The main focus of the image.
Style: The artistic approach or visual aesthetic.
Composition: How elements are arranged within the frame.
Lighting: The type and quality of light in the scene.
Color Palette: The dominant colors or color scheme.
Mood/Atmosphere: The emotional tone or ambiance of the image.
Technical Details: Camera settings, perspective, or specific visual techniques.
Additional Elements: Supporting details or background information.
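One way to apply the components listed above is to keep them as separate fields and join the non-empty ones into a single prompt string. The sketch below is only an illustration: the `PromptSpec` class, its field order, and the comma-separated format are assumptions, not a format that FLUX.1 requires.

```python
from dataclasses import dataclass, fields


@dataclass
class PromptSpec:
    """One field per prompt component; only the subject is required."""
    subject: str
    style: str = ""
    composition: str = ""
    lighting: str = ""
    color_palette: str = ""
    mood: str = ""
    technical_details: str = ""
    additional_elements: str = ""


def build_prompt(spec: PromptSpec) -> str:
    """Join the non-empty components, in field order, with commas."""
    parts = [getattr(spec, f.name).strip() for f in fields(spec)]
    return ", ".join(p for p in parts if p)


spec = PromptSpec(
    subject="a cat looking at the camera",
    lighting="soft window light",
    technical_details="fisheye lens",
)
prompt = build_prompt(spec)
```

The resulting string can be passed as the `prompt` argument of the pipeline call shown earlier.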