Sentiment Analysis
What is Sentiment Analysis?
Sentiment analysis is a natural language processing technique that identifies the polarity of a given text. There are many kinds of sentiment analysis, but one of the most widely used approaches is to classify text as positive, negative, or neutral (see the short code sketch after the list below).
Sentiment analysis is used in a wide range of applications, for example:
Analyzing social media mentions to understand how people are talking about your brand and your competitors.
Analyzing feedback from surveys and product reviews to quickly gain insight into what customers like and dislike about your product.
Analyzing incoming support tickets in real time to detect dissatisfied customers and act on them to prevent churn.
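A minimal sketch of the idea with the transformers pipeline API is shown below; with no model specified, the pipeline falls back to a default English sentiment model, so the exact labels and scores differ from the models used later on this page:
from transformers import pipeline

# Minimal sketch: the default "sentiment-analysis" pipeline returns a polarity
# label and a confidence score for each input sentence.
classifier = pipeline("sentiment-analysis")
print(classifier("I love this product."))
# e.g. [{'label': 'POSITIVE', 'score': 0.99...}]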
Hugging Face: Sentiment Analysis
Tokenizer & Model from Hugging Face
Pipeline
Inference
Optional: Categorized Outputs
%pip install transformers
from transformers import AutoTokenizer, AutoModelForSequenceClassification, pipeline
Tokenizer & Model
tokenizer = AutoTokenizer.from_pretrained(
    "mrm8488/distilroberta-finetuned-financial-news-sentiment-analysis"
)
model = AutoModelForSequenceClassification.from_pretrained(
    "mrm8488/distilroberta-finetuned-financial-news-sentiment-analysis"
)
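As a quick sanity check, the label set a sequence-classification model predicts is stored on its config; for the financial-news model loaded above this should be the three polarity classes (the exact id ordering shown here is illustrative):
# Inspect the label mapping of the loaded model
print(model.config.id2label)
# e.g. {0: 'negative', 1: 'neutral', 2: 'positive'}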
Pipeline
nlp = pipeline(
    "sentiment-analysis",
    model=model,
    tokenizer=tokenizer
)
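The pipeline constructor also accepts optional arguments; as a sketch, device=0 places the model on the first GPU (this assumes a CUDA device is available) and truncation=True protects against inputs longer than the model's maximum sequence length:
# Optional variant: same pipeline, but on GPU and with input truncation enabled
nlp_gpu = pipeline(
    "sentiment-analysis",
    model=model,
    tokenizer=tokenizer,
    device=0,          # assumes a CUDA GPU is available
    truncation=True
)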
Inference
text = """XYZ Corporation's stock soared by 20% after reporting
record-breaking annual profits and announcing a significant dividend
increase for shareholders.
"""
sentiment = nlp(text)
print(sentiment)
[{'label': 'positive', 'score': 0.9996968507766724}]
text = """Yesterday, the local botanical garden unveiled a rare
collection of orchids, attracting a large number of nature enthusiasts
and photographers excited to witness the unique floral display.
"""
sentiment = nlp(text)
print(sentiment)
[{'label': 'neutral', 'score': 0.9332515597343445}]
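By default the pipeline returns only the top label. If you want a score for every class, recent transformers versions accept top_k=None (or a specific k) at call time — a small sketch:
# Ask for all class scores instead of just the top one
all_scores = nlp(text, top_k=None)
print(all_scores)
# e.g. [{'label': 'neutral', 'score': 0.93}, {'label': 'positive', ...}, {'label': 'negative', ...}]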
Categorized Outputs
Binary Output
Three Categories
5 Stars
Binary Output
from transformers import AutoTokenizer, AutoModelForSequenceClassification, pipeline
tokenizer = AutoTokenizer.from_pretrained("siebert/sentiment-roberta-large-english")
model = AutoModelForSequenceClassification.from_pretrained("siebert/sentiment-roberta-large-english")
nlp = pipeline(
    "sentiment-analysis",
    model=model,
    tokenizer=tokenizer
)
text_good = "This product is good."
text_bad = "This product is bad."
text_neutral = "This product is not good, neither bad."
for text in [text_good, text_bad, text_neutral]:
    sentiment = nlp(text)
    print(sentiment)
[{'label': 'POSITIVE', 'score': 0.9988435506820679}]
[{'label': 'NEGATIVE', 'score': 0.9995039701461792}]
[{'label': 'NEGATIVE', 'score': 0.999470055103302}]
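Note that siebert/sentiment-roberta-large-english is a binary model, so the mixed third sentence is still forced into POSITIVE or NEGATIVE. You can confirm the available labels from the model config:
# This model exposes only two classes, so there is no neutral option
print(model.config.id2label)
# e.g. {0: 'NEGATIVE', 1: 'POSITIVE'}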
Three Categories
from transformers import AutoTokenizer, AutoModelForSequenceClassification, pipeline
tokenizer = AutoTokenizer.from_pretrained("lxyuan/distilbert-base-multilingual-cased-sentiments-student")
model = AutoModelForSequenceClassification.from_pretrained("lxyuan/distilbert-base-multilingual-cased-sentiments-student")
nlp = pipeline(
    "sentiment-analysis",
    model=model,
    tokenizer=tokenizer
)
text_good = "This product is good."
text_bad = "This product is bad."
text_neutral = "This is just a neutral question"
for text in [text_good, text_bad, text_neutral]:
    sentiment = nlp(text)
    print(sentiment)
[{'label': 'positive', 'score': 0.9819211363792419}]
[{'label': 'negative', 'score': 0.9518771767616272}]
[{'label': 'neutral', 'score': 0.5191274285316467}]
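Since this student model is distilled from a multilingual base (distilbert-base-multilingual-cased), the same pipeline can score non-English text as well; a rough sketch (exact scores will vary):
# Korean input: "This product is really good."
print(nlp("이 제품은 정말 좋아요."))
# e.g. [{'label': 'positive', 'score': ...}]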
5 Stars
from transformers import AutoTokenizer, AutoModelForSequenceClassification, pipeline
tokenizer = AutoTokenizer.from_pretrained("nlptown/bert-base-multilingual-uncased-sentiment")
model = AutoModelForSequenceClassification.from_pretrained("nlptown/bert-base-multilingual-uncased-sentiment")
nlp = pipeline(
    "sentiment-analysis",
    model=model,
    tokenizer=tokenizer
)
text_good = "This product is good."
text_bad = "This product is bad."
text_neutral = "This is just a neutral question"
for text in [text_good, text_bad, text_neutral]:
    sentiment = nlp(text)
    print(sentiment)
[{'label': '4 stars', 'score': 0.4784605801105499}]
[{'label': '1 star', 'score': 0.7462039589881897}]
[{'label': '1 star', 'score': 0.3701825737953186}]
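The star labels are plain strings, so if you need a numeric rating (for example to average scores over many reviews), you can parse the leading digit; the helper below is a small illustrative sketch, not part of the model's API:
def stars_to_int(result):
    # "4 stars" -> 4, "1 star" -> 1 (assumes the label always starts with the digit)
    return int(result[0]["label"].split()[0])

for text in [text_good, text_bad, text_neutral]:
    print(text, "->", stars_to_int(nlp(text)))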