Sentiment Analysis

What is Sentiment Analysis?

Sentiment analysis is a natural language processing technique that identifies the polarity of a given text. There are many variants of sentiment analysis, but one of the most widely used approaches classifies text as positive, negative, or neutral.

Sentiment analysis is used in a wide range of applications, for example:

  • Analyzing social media mentions to understand how people talk about your brand and your competitors.

  • Analyzing feedback from surveys and product reviews to quickly gain insight into what customers like and dislike about a product.

  • Analyzing incoming support tickets in real time to detect unhappy customers and take action to prevent churn.

Hugging Face: Sentiment Analysis

  1. Tokenizer & Model from Huggingface

  2. Pipeline

  3. Inference

  4. Optional: Output Categorized

%pip install transformers
from transformers import AutoTokenizer, AutoModelForSequenceClassification, pipeline

Tokenizer & Model

tokenizer = AutoTokenizer.from_pretrained(
    "mrm8488/distilroberta-finetuned-financial-news-sentiment-analysis"
)

model = AutoModelForSequenceClassification.from_pretrained(
    "mrm8488/distilroberta-finetuned-financial-news-sentiment-analysis"
)
tokenizer_config.json: 100%|██████████| 333/333 [00:00<00:00, 748kB/s]
vocab.json: 100%|██████████| 798k/798k [00:00<00:00, 1.07MB/s]
merges.txt: 100%|██████████| 456k/456k [00:00<00:00, 817kB/s]
tokenizer.json: 100%|██████████| 1.36M/1.36M [00:00<00:00, 1.46MB/s]
special_tokens_map.json: 100%|██████████| 239/239 [00:00<00:00, 682kB/s]
config.json: 100%|██████████| 933/933 [00:00<00:00, 2.42MB/s]
model.safetensors: 100%|██████████| 328M/328M [00:17<00:00, 18.7MB/s] 
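
This checkpoint is a three-way financial-news sentiment classifier. Before running inference you can check which labels it predicts by inspecting the label mapping stored in the model config; the mapping in the comment below is what this checkpoint is expected to contain, so verify it against your own download:

print(model.config.id2label)
# expected for this checkpoint: {0: 'negative', 1: 'neutral', 2: 'positive'}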

Pipeline

nlp = pipeline(
    "sentiment-analysis", 
    model=model, 
    tokenizer=tokenizer
)
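
As a shortcut, pipeline can also resolve the checkpoint name on its own, so the explicit tokenizer/model setup above can be collapsed into a single call; both forms behave the same:

nlp = pipeline(
    "sentiment-analysis",
    model="mrm8488/distilroberta-finetuned-financial-news-sentiment-analysis"
)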

Inference

text = """XYZ Corporation's stock soared by 20% after reporting 
record-breaking annual profits and announcing a significant dividend 
increase for shareholders.
"""
sentiment = nlp(text)
print(sentiment)
[{'label': 'positive', 'score': 0.9996968507766724}]
text = """Yesterday, the local botanical garden unveiled a rare 
collection of orchids, attracting a large number of nature enthusiasts 
and photographers excited to witness the unique floral display.
"""
sentiment = nlp(text)
print(sentiment)
[{'label': 'neutral', 'score': 0.9332515597343445}]
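
The pipeline also accepts a list of texts and returns one result per input, which is convenient for scoring many documents at once. A minimal sketch with made-up example sentences:

texts = [
    "The company missed earnings estimates and cut its full-year guidance.",
    "Quarterly revenue grew 15% year over year, beating expectations.",
]
for result in nlp(texts):
    print(result)  # one {'label': ..., 'score': ...} dict per input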

Output Categorized

  1. Binary Output

  2. Three Category

  3. 5 Stars

Binary Output

from transformers import AutoTokenizer, AutoModelForSequenceClassification, pipeline

tokenizer = AutoTokenizer.from_pretrained("siebert/sentiment-roberta-large-english")
model = AutoModelForSequenceClassification.from_pretrained("siebert/sentiment-roberta-large-english")

nlp = pipeline(
    "sentiment-analysis", 
    model=model, 
    tokenizer=tokenizer
)

text_good = "This product is good."
text_bad = "This product is bad."
text_neutral = "This product is not good, neither bad."

for text in [text_good, text_bad, text_neutral]:
    sentiment = nlp(text)
    print(sentiment)
tokenizer_config.json: 100%|██████████| 256/256 [00:00<00:00, 695kB/s]
config.json: 100%|██████████| 687/687 [00:00<00:00, 2.09MB/s]
vocab.json: 100%|██████████| 798k/798k [00:00<00:00, 1.08MB/s]
merges.txt: 100%|██████████| 456k/456k [00:00<00:00, 32.9MB/s]
special_tokens_map.json: 100%|██████████| 150/150 [00:00<00:00, 425kB/s]
pytorch_model.bin: 100%|██████████| 1.42G/1.42G [01:24<00:00, 16.9MB/s]


[{'label': 'POSITIVE', 'score': 0.9988435506820679}]
[{'label': 'NEGATIVE', 'score': 0.9995039701461792}]
[{'label': 'NEGATIVE', 'score': 0.999470055103302}]

Note that this binary model has no neutral class: the hedged third sentence is forced into NEGATIVE, and with high confidence at that.

Three Category

from transformers import AutoTokenizer, AutoModelForSequenceClassification, pipeline

tokenizer = AutoTokenizer.from_pretrained("lxyuan/distilbert-base-multilingual-cased-sentiments-student")
model = AutoModelForSequenceClassification.from_pretrained("lxyuan/distilbert-base-multilingual-cased-sentiments-student")

nlp = pipeline(
    "sentiment-analysis", 
    model=model, 
    tokenizer=tokenizer
)

text_good = "This product is good."
text_bad = "This product is bad."
text_neutral = "This is just a neutral question"

for text in [text_good, text_bad, text_neutral]:
    sentiment = nlp(text)
    print(sentiment)
tokenizer_config.json: 100%|██████████| 373/373 [00:00<00:00, 967kB/s]
vocab.txt: 100%|██████████| 996k/996k [00:00<00:00, 29.3MB/s]
tokenizer.json: 100%|██████████| 2.92M/2.92M [00:01<00:00, 2.53MB/s]
special_tokens_map.json: 100%|██████████| 125/125 [00:00<00:00, 396kB/s]
config.json: 100%|██████████| 759/759 [00:00<00:00, 2.12MB/s]
model.safetensors: 100%|██████████| 541M/541M [00:04<00:00, 115MB/s] 


[{'label': 'positive', 'score': 0.9819211363792419}]
[{'label': 'negative', 'score': 0.9518771767616272}]
[{'label': 'neutral', 'score': 0.5191274285316467}]
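
The neutral prediction above is noticeably less confident (about 0.52) than the two polar ones. Also, since this checkpoint is a multilingual student model, the same pipeline should handle non-English input as well; a quick sketch (the German sentence is my own example, so check the prediction yourself):

sentiment = nlp("Dieses Produkt ist wirklich gut.")  # "This product is really good."
print(sentiment)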

5 Stars

from transformers import AutoTokenizer, AutoModelForSequenceClassification, pipeline

tokenizer = AutoTokenizer.from_pretrained("nlptown/bert-base-multilingual-uncased-sentiment")
model = AutoModelForSequenceClassification.from_pretrained("nlptown/bert-base-multilingual-uncased-sentiment")

nlp = pipeline(
    "sentiment-analysis", 
    model=model, 
    tokenizer=tokenizer
)

text_good = "This product is good."
text_bad = "This product is bad."
text_neutral = "This is just a neutral question"

for text in [text_good, text_bad, text_neutral]:
    sentiment = nlp(text)
    print(sentiment)
tokenizer_config.json: 100%|██████████| 39.0/39.0 [00:00<00:00, 35.2kB/s]
config.json: 100%|██████████| 953/953 [00:00<00:00, 2.63MB/s]
vocab.txt: 100%|██████████| 872k/872k [00:00<00:00, 4.55MB/s]
special_tokens_map.json: 100%|██████████| 112/112 [00:00<00:00, 354kB/s]
pytorch_model.bin: 100%|██████████| 669M/669M [00:06<00:00, 110MB/s]  


[{'label': '4 stars', 'score': 0.4784605801105499}]
[{'label': '1 star', 'score': 0.7462039589881897}]
[{'label': '1 star', 'score': 0.3701825737953186}]
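
The low top scores on the last two examples suggest the probability mass is spread across neighboring star ratings. To see the full distribution rather than just the top label, you can ask the pipeline for all class scores; top_k=None is the current transformers idiom (older releases used return_all_scores=True):

nlp_all = pipeline(
    "sentiment-analysis",
    model=model,
    tokenizer=tokenizer,
    top_k=None
)
print(nlp_all("This is just a neutral question"))  # scores for all five star labels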