
Advanced RAG

Advanced RAG with LlamaIndex, Weaviate

A walkthrough of an Advanced RAG pipeline using LlamaIndex and Weaviate.

Prerequisites

#%pip install -U "weaviate-client<4"  # pinned: this notebook uses the v3 client API
#%pip install llama-index-vector-stores-weaviate

import llama_index
import weaviate
from importlib.metadata import version
import os
from dotenv import load_dotenv,find_dotenv

!echo "OPENAI_API_KEY=<Your OpenAI Key>" >> .env # run this once to create the .env file

load_dotenv()
api_key = os.getenv("OPENAI_API_KEY")
True
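
If you would rather not write the key into a .env file, a minimal alternative (standard library only) is to prompt for it at runtime:

import getpass
import os

# Prompt for the key instead of persisting it on disk
if not os.getenv("OPENAI_API_KEY"):
    os.environ["OPENAI_API_KEY"] = getpass.getpass("OpenAI API key: ")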

Embedding Model and LLM

First, we define the embedding model and LLM on the global Settings object. This way, we don't have to specify the models explicitly in the code again.

  • Embedding model: used to generate vector embeddings for the document chunks as well as for the queries.

  • LLM: used to generate an answer based on the user query and the relevant context.

  • Weaviate can also host an embedding model (vectorizer module) and an LLM (generative module), but in this case the LLM and embedding model defined in LlamaIndex are used.

from llama_index.embeddings.openai import OpenAIEmbedding
from llama_index.llms.openai import OpenAI
from llama_index.core.settings import Settings

Settings.llm = OpenAI(model="gpt-3.5-turbo", temperature=0.1)
Settings.embed_model = OpenAIEmbedding()
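
As an optional sanity check (assuming the API key is set), embed a short string and inspect the vector size; the default OpenAIEmbedding model, text-embedding-ada-002, returns 1536-dimensional vectors:

# Optional sanity check: the embedding dimension of the configured model
emb = Settings.embed_model.get_text_embedding("hello world")
print(len(emb))  # 1536 for the default text-embedding-ada-002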

Load data

!mkdir -p 'data'
!wget 'https://raw.githubusercontent.com/run-llama/llama_index/main/docs/examples/data/paul_graham/paul_graham_essay.txt' -O 'data/paul_graham_essay.txt'
from llama_index.core import SimpleDirectoryReader

# Load data
documents = SimpleDirectoryReader(
        input_files=["./data/paul_graham_essay.txt"]
).load_data()

#documents
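
A quick look at what was loaded; SimpleDirectoryReader attaches file metadata (path, size, dates) to each Document:

# Inspect the loaded documents
print(f"Loaded {len(documents)} document(s)")
print(documents[0].metadata)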

Chunk to Nodes

  • The whole document is too large to fit into the LLM's context window, so we need to split it into smaller text chunks, which are called nodes in LlamaIndex.

  • With the SentenceWindowNodeParser, each sentence is stored as a chunk together with a larger text window surrounding the original sentence as metadata.

from llama_index.core.node_parser import SentenceWindowNodeParser

# create the sentence window node parser w/ default settings
node_parser = SentenceWindowNodeParser.from_defaults(
    window_size=3,
    window_metadata_key="window",
    original_text_metadata_key="original_text",
)

# Extract nodes from documents
nodes = node_parser.get_nodes_from_documents(documents)
# This block of code is for educational purposes 
# to showcase what the nodes look like
i=10

print(f"Text: \n{nodes[i].text}")
print("------------------")
print(f"Window: \n{nodes[i].metadata['window']}")
Text: 
So this is not about whether it's ok to kill killers. 
------------------
Window: 
Defendants' lawyers are often incompetent.  And prosecutors are often motivated more by publicity than justice.  
  
 In the real world, [about 4%](http://time.com/79572/more-innocent-people-on-death-row-than-estimated-study/) of people sentenced to death are innocent.  So this is not about whether it's ok to kill killers.  This is about whether it's ok to kill innocent people.  
  
 A child could answer that one for you.  
  
 This year, in California, you have a chance to end this, by voting yes on Proposition 62. 

Building index

  • Build an index that stores all the external knowledge in the Weaviate vector database.

  • First, we need to connect to a Weaviate instance. Here, we use Weaviate Embedded.

  • Note that an embedded instance persists only as long as the parent application is running. For a more permanent solution, use a managed Weaviate Cloud Services (WCS) instance, which offers a free 14-day trial (a connection sketch follows the embedded example below).

import weaviate

# Connect to your Weaviate instance
client = weaviate.Client(
    embedded_options=weaviate.embedded.EmbeddedOptions(), 
)

print(f"Client is ready: {client.is_ready()}")

# Print this line to get more information about the client
# client.get_meta()
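
If you prefer the managed WCS instance mentioned above, the connection would instead look like the sketch below (commented out; the cluster URL and API key are placeholders for your own instance, still using the v3 client API):

# Hypothetical WCS connection -- replace the URL and key with your own
# client = weaviate.Client(
#     url="https://your-cluster.weaviate.network",
#     auth_client_secret=weaviate.AuthApiKey(api_key="YOUR-WCS-API-KEY"),
# )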

Build a VectorStoreIndex from the Weaviate client to store and interact with your data.

from llama_index.core import VectorStoreIndex, StorageContext
from llama_index.vector_stores.weaviate import WeaviateVectorStore

index_name = "MyExternalContext"

# Construct vector store
vector_store = WeaviateVectorStore(
    weaviate_client = client, 
    index_name = index_name
)

# Set up the storage for the embeddings
storage_context = StorageContext.from_defaults(vector_store=vector_store)

# If an index with the same index name already exists within Weaviate, delete it
if client.schema.exists(index_name):
    client.schema.delete_class(index_name)

# Setup the index
# build VectorStoreIndex that takes care of chunking documents
# and encoding chunks to embeddings for future retrieval
index = VectorStoreIndex(
    nodes,
    storage_context = storage_context,
)
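
Optionally, you can verify that the nodes landed in Weaviate by counting the objects in the class (v3 aggregate API):

# Optional: count the objects stored in the class
count = client.query.aggregate(index_name).with_meta_count().do()
print(count)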
import json
response = client.schema.get(index_name)

print(json.dumps(response, indent=2))
{
  "class": "MyExternalContext",
  "description": "This property was generated by Weaviate's auto-schema feature on Wed Feb 14 13:46:39 2024",
  "invertedIndexConfig": {
    "bm25": {
      "b": 0.75,
      "k1": 1.2
    },
    "cleanupIntervalSeconds": 60,
    "stopwords": {
      "additions": null,
      "preset": "en",
      "removals": null
    }
  },
  "multiTenancyConfig": {
    "enabled": false
  },
  "properties": [
    {
      "dataType": [
        "text"
      ],
      "description": "This property was generated by Weaviate's auto-schema feature on Wed Feb 14 13:46:39 2024",
      "indexFilterable": true,
      "indexSearchable": true,
      "name": "_node_content",
      "tokenization": "word"
    },
    {
      "dataType": [
        "text"
      ],
      "description": "This property was generated by Weaviate's auto-schema feature on Wed Feb 14 13:46:39 2024",
      "indexFilterable": true,
      "indexSearchable": true,
      "name": "file_path",
      "tokenization": "word"
    },
    {
      "dataType": [
        "text"
      ],
      "description": "This property was generated by Weaviate's auto-schema feature on Wed Feb 14 13:46:39 2024",
      "indexFilterable": true,
      "indexSearchable": true,
      "name": "file_type",
      "tokenization": "word"
    },
    {
      "dataType": [
        "uuid"
      ],
      "description": "This property was generated by Weaviate's auto-schema feature on Wed Feb 14 13:46:39 2024",
      "indexFilterable": true,
      "indexSearchable": false,
      "name": "doc_id"
    },
    {
      "dataType": [
        "text"
      ],
      "description": "This property was generated by Weaviate's auto-schema feature on Wed Feb 14 13:46:39 2024",
      "indexFilterable": true,
      "indexSearchable": true,
      "name": "text",
      "tokenization": "word"
    },
    {
      "dataType": [
        "number"
      ],
      "description": "This property was generated by Weaviate's auto-schema feature on Wed Feb 14 13:46:39 2024",
      "indexFilterable": true,
      "indexSearchable": false,
      "name": "file_size"
    },
    {
      "dataType": [
        "uuid"
      ],
      "description": "This property was generated by Weaviate's auto-schema feature on Wed Feb 14 13:46:39 2024",
      "indexFilterable": true,
      "indexSearchable": false,
      "name": "document_id"
    },
    {
      "dataType": [
        "uuid"
      ],
      "description": "This property was generated by Weaviate's auto-schema feature on Wed Feb 14 13:46:39 2024",
      "indexFilterable": true,
      "indexSearchable": false,
      "name": "ref_doc_id"
    },
    {
      "dataType": [
        "text"
      ],
      "description": "This property was generated by Weaviate's auto-schema feature on Wed Feb 14 13:46:39 2024",
      "indexFilterable": true,
      "indexSearchable": true,
      "name": "file_name",
      "tokenization": "word"
    },
    {
      "dataType": [
        "text"
      ],
      "description": "This property was generated by Weaviate's auto-schema feature on Wed Feb 14 13:46:39 2024",
      "indexFilterable": true,
      "indexSearchable": true,
      "name": "_node_type",
      "tokenization": "word"
    },
    {
      "dataType": [
        "text"
      ],
      "description": "This property was generated by Weaviate's auto-schema feature on Wed Feb 14 13:46:39 2024",
      "indexFilterable": true,
      "indexSearchable": true,
      "name": "last_accessed_date",
      "tokenization": "word"
    },
    {
      "dataType": [
        "text"
      ],
      "description": "This property was generated by Weaviate's auto-schema feature on Wed Feb 14 13:46:39 2024",
      "indexFilterable": true,
      "indexSearchable": true,
      "name": "creation_date",
      "tokenization": "word"
    },
    {
      "dataType": [
        "text"
      ],
      "description": "This property was generated by Weaviate's auto-schema feature on Wed Feb 14 13:46:39 2024",
      "indexFilterable": true,
      "indexSearchable": true,
      "name": "last_modified_date",
      "tokenization": "word"
    },
    {
      "dataType": [
        "text"
      ],
      "description": "This property was generated by Weaviate's auto-schema feature on Wed Feb 14 13:46:48 2024",
      "indexFilterable": true,
      "indexSearchable": true,
      "name": "original_text",
      "tokenization": "word"
    },
    {
      "dataType": [
        "text"
      ],
      "description": "This property was generated by Weaviate's auto-schema feature on Wed Feb 14 13:46:48 2024",
      "indexFilterable": true,
      "indexSearchable": true,
      "name": "window",
      "tokenization": "word"
    }
  ],
  "replicationConfig": {
    "factor": 1
  },
  "shardingConfig": {
    "virtualPerPhysical": 128,
    "desiredCount": 1,
    "actualCount": 1,
    "desiredVirtualCount": 128,
    "actualVirtualCount": 128,
    "key": "_id",
    "strategy": "hash",
    "function": "murmur3"
  },
  "vectorIndexConfig": {
    "skip": false,
    "cleanupIntervalSeconds": 300,
    "maxConnections": 64,
    "efConstruction": 128,
    "ef": -1,
    "dynamicEfMin": 100,
    "dynamicEfMax": 500,
    "dynamicEfFactor": 8,
    "vectorCacheMaxObjects": 1000000000000,
    "flatSearchCutoff": 40000,
    "distance": "cosine",
    "pq": {
      "enabled": false,
      "bitCompression": false,
      "segments": 0,
      "centroids": 256,
      "trainingLimit": 100000,
      "encoder": {
        "type": "kmeans",
        "distribution": "log-normal"
      }
    }
  },
  "vectorIndexType": "hnsw",
  "vectorizer": "none"
}

Query Engine

Finally, we set up the index as the query engine.

Build Metadata Replacement Post Processor

In Advanced RAG, we can use the MetadataReplacementPostProcessor as part of the sentence-window retrieval method to replace the sentence in each node with its surrounding context.

from llama_index.core.postprocessor import MetadataReplacementPostProcessor

# The target key defaults to `window` to match the node_parser's default
postproc = MetadataReplacementPostProcessor(
    target_metadata_key="window"
)
# This block of code is for educational purposes 
# to showcase how the MetadataReplacementPostProcessor works
#from llama_index.core.schema import NodeWithScore
#from copy import deepcopy

#scored_nodes = [NodeWithScore(node=x, score=1.0) for x in nodes]
#nodes_old = [deepcopy(n) for n in nodes]
#replaced_nodes = postproc.postprocess_nodes(scored_nodes)

#print(f"Retrieved sentece: {nodes_old[i].text}")
#print("------------------")
#print(f"Replaced window: {replaced_nodes[i].text}")

Add Re-ranker

For Advanced RAG, we can also add a re-ranker that re-orders the retrieved context by its relevance to the query. Note that you should retrieve a larger number of candidates via similarity_top_k, which the re-ranker then reduces to top_n.

from llama_index.core.postprocessor import SentenceTransformerRerank

# BAAI/bge-reranker-base
# link: https://huggingface.co/BAAI/bge-reranker-base
rerank = SentenceTransformerRerank(
    top_n = 2, 
    model = "BAAI/bge-reranker-base"
)
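
To see the re-ranker in isolation (educational, in the spirit of the commented-out block above; the query string here is just an example), wrap a few nodes with dummy scores and re-rank them:

# Educational: re-rank a handful of nodes against an example query
from llama_index.core.schema import NodeWithScore

sample_nodes = [NodeWithScore(node=n, score=1.0) for n in nodes[:6]]
reranked = rerank.postprocess_nodes(sample_nodes, query_str="What happened at Interleaf?")
for n in reranked:
    print(round(n.score, 3), "|", n.node.metadata["original_text"][:80])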

Finally, we can put all the components together in the query engine!

In addition, we set vector_store_query_mode to "hybrid" to enable hybrid search, with the extra alpha parameter controlling the weighting between semantic and keyword-based search (alpha=1 is pure vector search, alpha=0 is pure keyword search).

# The QueryEngine class is equipped with the generator
# and facilitates the retrieval and generation steps
query_engine = index.as_query_engine(
    similarity_top_k = 6, 
    vector_store_query_mode="hybrid", 
    alpha=0.5,
    node_postprocessors = [postproc, rerank],
)

Run Advanced RAG Query

# Query the Advanced RAG pipeline
response = query_engine.query(
    "What happened at Interleaf?"
)
print(str(response))
At Interleaf, inspired by Emacs, they added a scripting language to their software and made that scripting language a dialect of Lisp.
window = response.source_nodes[0].node.metadata["window"]
sentence = response.source_nodes[0].node.metadata["original_text"]

print(f"Window: {window}")
print("------------------")
print(f"Original Sentence: {sentence}")
Window: Though I didn't realize it at the time, the effort and stress of running Viaweb had been wearing me down.  For a while after getting to California I tried to keep up my usual routine of programming until 3 in the morning, but fatigue, combined with Yahoo's aged culture and the grim cube farm in Santa Clara, gradually wore me out.  After a few months it felt disconcertingly like working at Interleaf.

 Yahoo had given us a lot of options when they bought us.  At the time I thought Yahoo was so overvalued that the options would never be worth anything, but to my surprise the stock went up 5x within a year.
------------------
Original Sentence: After a few months it felt disconcertingly like working at Interleaf.
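
You can also inspect all re-ranked source nodes and their relevance scores; with top_n=2 there should be exactly two:

# Inspect the re-ranked source nodes and their scores
for n in response.source_nodes:
    print(round(n.score, 3), "|", n.node.metadata["original_text"])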