Zero-shot Classification

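Zero-shot classification is the task of predicting a class that the model did not see during training. Built on pretrained language models, it can be viewed as a case of transfer learning, which generally means using a model trained for one task in an application different from the one it was originally trained for.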

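This approach is especially useful when little labeled data is available. In zero-shot classification, the model receives a prompt that describes the task in natural language together with a text sequence, and the prompt contains no completed examples of the desired task. This sets it apart from single-shot and few-shot classification, where the prompt includes one or a few examples of the chosen task.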

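Zero-shot, single-shot, and few-shot classification appear to be emergent capabilities of large language models. They seem to appear once a model has roughly 100 million parameters or more. A model's effectiveness at these tasks also appears to scale with its size, meaning that larger models (those with more trainable parameters or layers) generally perform better.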

Import Transformers

# Load model directly
from transformers import AutoTokenizer, AutoModelForSequenceClassification, pipeline

Tokenizer & Model

tokenizer = AutoTokenizer.from_pretrained("MoritzLaurer/DeBERTa-v3-base-mnli-fever-anli")
model = AutoModelForSequenceClassification.from_pretrained("MoritzLaurer/DeBERTa-v3-base-mnli-fever-anli")

Pipeline

classifier = pipeline(
    "zero-shot-classification", 
    model=model, 
    tokenizer=tokenizer
)
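
The checkpoint loaded above is fine-tuned on natural language inference (NLI) data, so the pipeline does not classify labels directly. It turns each candidate label into a hypothesis sentence and asks the model whether the input text entails it. The sketch below illustrates that pairing step only. `build_nli_pairs` and the sample sentence are hypothetical names introduced for illustration, and the template string is the pipeline's documented default, not something stated in this tutorial.

```python
# Default hypothesis template used by the zero-shot pipeline when none is passed.
DEFAULT_TEMPLATE = "This example is {}."


def build_nli_pairs(text, candidate_labels, hypothesis_template=DEFAULT_TEMPLATE):
    """Return one (premise, hypothesis) pair per candidate label.

    Hypothetical helper: it mirrors the pairing idea, not the library's code.
    """
    return [(text, hypothesis_template.format(label)) for label in candidate_labels]


# Made-up sample sentence, used only to show the shape of the pairs.
pairs = build_nli_pairs(
    "The telescope tracked Jupiter all night.",
    ["astronomy", "cooking"],
)
for premise, hypothesis in pairs:
    print(f"premise={premise!r} | hypothesis={hypothesis!r}")
```

With the real pipeline, a different wording can be tried by passing `hypothesis_template`, for example `classifier(text, candidate_labels, hypothesis_template="This text is about {}.")`. Whether a given template improves results depends on the model and the data.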

Inference

Multi-label Classification

text = """
A group of astronomers gathered at the observatory to witness the rare celestial event. As they peered through the telescopes, they observed the alignment of several planets, creating a breathtaking view against the night sky. The event, which hadn't occurred for decades, attracted enthusiasts and experts alike, all eager to record and study this astronomical phenomenon.
"""

candidate_labels = ["astronomy","cooking","science","space","music"]
print(classifier(text, candidate_labels, multi_label=True))

Asking to truncate to max_length but no maximum length is provided and the model has no predefined maximum length. Default to no truncation.


{'sequence': "\nA group of astronomers gathered at the observatory to witness the rare celestial event. As they peered through the telescopes, they observed the alignment of several planets, creating a breathtaking view against the night sky. The event, which hadn't occurred for decades, attracted enthusiasts and experts alike, all eager to record and study this astronomical phenomenon.\n", 'labels': ['science', 'astronomy', 'space', 'music', 'cooking'], 'scores': [0.9984110593795776, 0.9979022145271301, 0.9921674728393555, 0.0003671070153359324, 0.00015695678303018212]}
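
In multi-label mode each label is scored independently, so a common follow-up step is to keep every label whose score clears a cutoff. The sketch below reuses the labels and scores printed above. The `select_labels` helper and the 0.5 threshold are assumptions made for illustration and should be tuned for real data.

```python
# Labels and scores copied from the multi-label output above (sequence omitted).
result = {
    "labels": ["science", "astronomy", "space", "music", "cooking"],
    "scores": [
        0.9984110593795776,
        0.9979022145271301,
        0.9921674728393555,
        0.0003671070153359324,
        0.00015695678303018212,
    ],
}

THRESHOLD = 0.5  # hypothetical cutoff, not a library default


def select_labels(result, threshold=THRESHOLD):
    """Keep the labels whose independent score is at least `threshold`."""
    return [
        label
        for label, score in zip(result["labels"], result["scores"])
        if score >= threshold
    ]


selected = select_labels(result)
print(selected)  # ['science', 'astronomy', 'space']
```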

Single-label Classification

# Run inference once and reuse the result for both prints
result = classifier(text, candidate_labels, multi_label=False)
print(result["labels"])
print(result["scores"])

['astronomy', 'science', 'space', 'music', 'cooking']
[0.4273662865161896, 0.4104445278644562, 0.1616421341896057, 0.00030228393734432757, 0.00024482482695020735]
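
The single-label scores above sum to roughly 1, while the multi-label scores do not. The difference comes from how the entailment logits are normalized: in single-label mode a softmax runs across all labels, and in multi-label mode each label gets its own softmax over entailment versus contradiction. The sketch below reproduces that arithmetic with made-up logits. The numbers and function names are hypothetical, not values produced by the model.

```python
import math


def softmax(values):
    """Numerically stable softmax over a list of floats."""
    peak = max(values)
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical (contradiction, entailment) logits per label, not real model output.
logits = {
    "astronomy": (-3.0, 4.0),
    "science": (-2.5, 3.9),
    "cooking": (3.5, -4.0),
}


def multi_label_scores(logits):
    """Score each label on its own: P(entailment) vs. P(contradiction)."""
    return {label: softmax([c, e])[1] for label, (c, e) in logits.items()}


def single_label_scores(logits):
    """Make labels compete: softmax over the entailment logits of all labels."""
    labels = list(logits)
    probs = softmax([logits[label][1] for label in labels])
    return dict(zip(labels, probs))


multi = multi_label_scores(logits)
single = single_label_scores(logits)
print(multi)
print(single)
```

Because the labels compete in single-label mode, two closely related labels such as "astronomy" and "science" split the probability mass, which matches the pattern in the output above.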

