Sentiment Analysis

What is Sentiment Analysis?

Sentiment analysis is a natural language processing technique that identifies the polarity of a given text. There are many variants of sentiment analysis, but one of the most widely used approaches classifies text as positive, negative, or neutral.

Sentiment analysis is used in a wide range of applications, for example:

  • Analyzing social media mentions to understand how people talk about your brand and your competitors.

  • Analyzing feedback from surveys and product reviews to quickly gain insight into what customers like and dislike about a product.

  • Analyzing incoming support tickets in real time to detect unhappy customers and take action to prevent churn.

Hugging Face: Sentiment Analysis

  1. Tokenizer & Model from Huggingface

  2. Pipeline

  3. Inference

  4. Optional: Output Categorized

%pip install transformers
from transformers import AutoTokenizer, AutoModelForSequenceClassification, pipeline

Tokenizer & Model

tokenizer = AutoTokenizer.from_pretrained(
    "mrm8488/distilroberta-finetuned-financial-news-sentiment-analysis"
)

model = AutoModelForSequenceClassification.from_pretrained(
    "mrm8488/distilroberta-finetuned-financial-news-sentiment-analysis"
)
tokenizer_config.json: 100%|██████████| 333/333 [00:00<00:00, 748kB/s]
vocab.json: 100%|██████████| 798k/798k [00:00<00:00, 1.07MB/s]
merges.txt: 100%|██████████| 456k/456k [00:00<00:00, 817kB/s]
tokenizer.json: 100%|██████████| 1.36M/1.36M [00:00<00:00, 1.46MB/s]
special_tokens_map.json: 100%|██████████| 239/239 [00:00<00:00, 682kB/s]
config.json: 100%|██████████| 933/933 [00:00<00:00, 2.42MB/s]
model.safetensors: 100%|██████████| 328M/328M [00:17<00:00, 18.7MB/s] 
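
This checkpoint is a three-way financial-news sentiment classifier. Before running inference you can check which labels it predicts by inspecting the label mapping stored in the model config; the mapping in the comment below is what this checkpoint is expected to contain, so verify it against your own download:

print(model.config.id2label)
# expected for this checkpoint: {0: 'negative', 1: 'neutral', 2: 'positive'}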

Pipeline

nlp = pipeline(
    "sentiment-analysis", 
    model=model, 
    tokenizer=tokenizer
)
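
As a shortcut, pipeline can also resolve the checkpoint name on its own, so the explicit tokenizer/model setup above can be collapsed into a single call; both forms behave the same:

nlp = pipeline(
    "sentiment-analysis",
    model="mrm8488/distilroberta-finetuned-financial-news-sentiment-analysis"
)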

Inference

text = """XYZ Corporation's stock soared by 20% after reporting 
record-breaking annual profits and announcing a significant dividend 
increase for shareholders.
"""
sentiment = nlp(text)
print(sentiment)
[{'label': 'positive', 'score': 0.9996968507766724}]
text = """Yesterday, the local botanical garden unveiled a rare 
collection of orchids, attracting a large number of nature enthusiasts 
and photographers excited to witness the unique floral display.
"""
sentiment = nlp(text)
print(sentiment)
[{'label': 'neutral', 'score': 0.9332515597343445}]
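
The pipeline also accepts a list of texts and returns one result per input, which is convenient for scoring many documents at once. A minimal sketch with made-up example sentences:

texts = [
    "The company missed earnings estimates and cut its full-year guidance.",
    "Quarterly revenue grew 15% year over year, beating expectations.",
]
for result in nlp(texts):
    print(result)  # one {'label': ..., 'score': ...} dict per input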

Output Categorized

  1. Binary Output

  2. Three Category

  3. 5 Stars

Binary Output

from transformers import AutoTokenizer, AutoModelForSequenceClassification, pipeline

tokenizer = AutoTokenizer.from_pretrained("siebert/sentiment-roberta-large-english")
model = AutoModelForSequenceClassification.from_pretrained("siebert/sentiment-roberta-large-english")

nlp = pipeline(
    "sentiment-analysis", 
    model=model, 
    tokenizer=tokenizer
)

text_good = "This product is good."
text_bad = "This product is bad."
text_neutral = "This product is not good, neither bad."

for text in [text_good, text_bad, text_neutral]:
    sentiment = nlp(text)
    print(sentiment)
tokenizer_config.json: 100%|██████████| 256/256 [00:00<00:00, 695kB/s]
config.json: 100%|██████████| 687/687 [00:00<00:00, 2.09MB/s]
vocab.json: 100%|██████████| 798k/798k [00:00<00:00, 1.08MB/s]
merges.txt: 100%|██████████| 456k/456k [00:00<00:00, 32.9MB/s]
special_tokens_map.json: 100%|██████████| 150/150 [00:00<00:00, 425kB/s]
pytorch_model.bin: 100%|██████████| 1.42G/1.42G [01:24<00:00, 16.9MB/s]


[{'label': 'POSITIVE', 'score': 0.9988435506820679}]
[{'label': 'NEGATIVE', 'score': 0.9995039701461792}]
[{'label': 'NEGATIVE', 'score': 0.999470055103302}]

Note that this binary model has no neutral class: the hedged third sentence is forced into NEGATIVE, and with high confidence at that.

Three Category

from transformers import AutoTokenizer, AutoModelForSequenceClassification, pipeline

tokenizer = AutoTokenizer.from_pretrained("lxyuan/distilbert-base-multilingual-cased-sentiments-student")
model = AutoModelForSequenceClassification.from_pretrained("lxyuan/distilbert-base-multilingual-cased-sentiments-student")

nlp = pipeline(
    "sentiment-analysis", 
    model=model, 
    tokenizer=tokenizer
)

text_good = "This product is good."
text_bad = "This product is bad."
text_neutral = "This is just a neutral question"

for text in [text_good, text_bad, text_neutral]:
    sentiment = nlp(text)
    print(sentiment)
tokenizer_config.json: 100%|██████████| 373/373 [00:00<00:00, 967kB/s]
vocab.txt: 100%|██████████| 996k/996k [00:00<00:00, 29.3MB/s]
tokenizer.json: 100%|██████████| 2.92M/2.92M [00:01<00:00, 2.53MB/s]
special_tokens_map.json: 100%|██████████| 125/125 [00:00<00:00, 396kB/s]
config.json: 100%|██████████| 759/759 [00:00<00:00, 2.12MB/s]
model.safetensors: 100%|██████████| 541M/541M [00:04<00:00, 115MB/s] 


[{'label': 'positive', 'score': 0.9819211363792419}]
[{'label': 'negative', 'score': 0.9518771767616272}]
[{'label': 'neutral', 'score': 0.5191274285316467}]
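
The neutral prediction above is noticeably less confident (about 0.52) than the two polar ones. Also, since this checkpoint is a multilingual student model, the same pipeline should handle non-English input as well; a quick sketch (the German sentence is my own example, so check the prediction yourself):

sentiment = nlp("Dieses Produkt ist wirklich gut.")  # "This product is really good."
print(sentiment)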

5 Stars

from transformers import AutoTokenizer, AutoModelForSequenceClassification, pipeline

tokenizer = AutoTokenizer.from_pretrained("nlptown/bert-base-multilingual-uncased-sentiment")
model = AutoModelForSequenceClassification.from_pretrained("nlptown/bert-base-multilingual-uncased-sentiment")

nlp = pipeline(
    "sentiment-analysis", 
    model=model, 
    tokenizer=tokenizer
)

text_good = "This product is good."
text_bad = "This product is bad."
text_neutral = "This is just a neutral question"

for text in [text_good, text_bad, text_neutral]:
    sentiment = nlp(text)
    print(sentiment)
tokenizer_config.json: 100%|██████████| 39.0/39.0 [00:00<00:00, 35.2kB/s]
config.json: 100%|██████████| 953/953 [00:00<00:00, 2.63MB/s]
vocab.txt: 100%|██████████| 872k/872k [00:00<00:00, 4.55MB/s]
special_tokens_map.json: 100%|██████████| 112/112 [00:00<00:00, 354kB/s]
pytorch_model.bin: 100%|██████████| 669M/669M [00:06<00:00, 110MB/s]  


[{'label': '4 stars', 'score': 0.4784605801105499}]
[{'label': '1 star', 'score': 0.7462039589881897}]
[{'label': '1 star', 'score': 0.3701825737953186}]
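
The low top scores on the last two examples suggest the probability mass is spread across neighboring star ratings. To see the full distribution rather than just the top label, you can ask the pipeline for all class scores; top_k=None is the current transformers idiom (older releases used return_all_scores=True):

nlp_all = pipeline(
    "sentiment-analysis",
    model=model,
    tokenizer=tokenizer,
    top_k=None
)
print(nlp_all("This is just a neutral question"))  # scores for all five star labels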