Text or Image-to-Video

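Hugging Face's text- or image-to-video task covers generating video from either a text description or an image.

On the text-to-video side, the task is to turn a text description into video content. This can mean generating individual scenes, animations, or a complete video from the supplied text.

For example, given a story or a script, a model can produce a video that visually represents the narrative.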
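The text-to-video side described above can be sketched as follows. This is a minimal illustration, not part of the original page: the function name `generate_text_to_video`, the default output path, and the checkpoint `damo-vilab/text-to-video-ms-1.7b` are assumptions chosen for the example. It also assumes a CUDA GPU, and that your diffusers version has the video backend `export_to_video` needs (OpenCV or imageio, depending on the release).

```python
from typing import Optional


def generate_text_to_video(
    prompt: str,
    output_path: str = "t2v.mp4",
    model_id: str = "damo-vilab/text-to-video-ms-1.7b",  # assumed checkpoint, not from this page
    num_inference_steps: int = 25,
    seed: Optional[int] = None,
) -> str:
    """Generate a short clip from a text prompt and save it as a video file."""
    # Heavy imports live inside the function so that defining it stays cheap.
    import torch
    from diffusers import DiffusionPipeline
    from diffusers.utils import export_to_video

    # Load the text-to-video pipeline with half-precision weights.
    pipe = DiffusionPipeline.from_pretrained(
        model_id, torch_dtype=torch.float16, variant="fp16"
    )
    pipe.to("cuda")  # assumes a CUDA-capable GPU

    # An optional fixed seed makes the result reproducible.
    generator = torch.manual_seed(seed) if seed is not None else None

    # Only a prompt is needed here; no conditioning image is passed.
    frames = pipe(
        prompt,
        num_inference_steps=num_inference_steps,
        generator=generator,
    ).frames[0]

    # export_to_video writes the frames to disk and returns the file path.
    return export_to_video(frames, output_path)
```

Calling `generate_text_to_video("An astronaut riding a horse", seed=0)` would download the checkpoint on first use and write `t2v.mp4` to the working directory. The remainder of this page works through the image-to-video side, where `I2VGenXLPipeline` takes a starting image in addition to the prompt.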

import torch
from diffusers import I2VGenXLPipeline
from diffusers.utils import export_to_gif, load_image

# Load the I2VGen-XL image-to-video pipeline with half-precision (fp16) weights.
pipeline = I2VGenXLPipeline.from_pretrained(
    "ali-vilab/i2vgen-xl",
    torch_dtype=torch.float16,
    variant="fp16"
)
pipeline.to('cuda')  # requires a CUDA-capable GPU

# The conditioning image that the video is generated from.
image_url = "https://huggingface.co/datasets/diffusers/docs-images/resolve/main/i2vgen_xl_images/img_0009.png"
image = load_image(image_url).convert("RGB")

# The prompt describes the desired motion; the negative prompt lists artifacts to avoid.
prompt = "Papers were floating in the air on a table in the library"
negative_prompt = "Distorted, discontinuous, Ugly, blurry, low resolution, motionless, static, disfigured, disconnected limbs, Ugly faces, incomplete arms"
generator = torch.manual_seed(8888)  # fixed seed for reproducible output

# Run the pipeline; .frames[0] is the frame sequence of the first (and only) video.
frames = pipeline(
    prompt=prompt,
    image=image,
    num_inference_steps=50,
    negative_prompt=negative_prompt,
    guidance_scale=9.0,
    generator=generator
).frames[0]

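import os

# Make sure the output directory exists before writing the GIF.
os.makedirs("dataset", exist_ok=True)
export_to_gif(frames, "dataset/i2v.gif")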

'dataset/i2v.gif'

from IPython.display import Image, display

# Show the generated GIF inline in the notebook.
display(Image(filename="dataset/i2v.gif"))


