
Advanced RAG

Advanced RAG with LlamaIndex, Weaviate

A walkthrough of an Advanced RAG pipeline using LlamaIndex and Weaviate.

Prerequisites

#%pip install -U "weaviate-client<4"  # pinned: this notebook uses the v3 client API
#%pip install llama-index-vector-stores-weaviate

import llama_index
import weaviate
from importlib.metadata import version
import os
from dotenv import load_dotenv,find_dotenv

!echo "OPENAI_API_KEY=<Your OpenAI Key>" >> .env # run this once to create the .env file

load_dotenv()
api_key = os.getenv("OPENAI_API_KEY")
True
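
If you would rather not write the key into a .env file, a minimal alternative (standard library only) is to prompt for it at runtime:

import getpass
import os

# Prompt for the key instead of persisting it on disk
if not os.getenv("OPENAI_API_KEY"):
    os.environ["OPENAI_API_KEY"] = getpass.getpass("OpenAI API key: ")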

Embedding Model and LLM

First, we define the embedding model and LLM on the global Settings object. This way, we don't have to specify the models explicitly in the code again.

  • Embedding model: used to generate vector embeddings for the document chunks as well as for the queries.

  • LLM: used to generate an answer based on the user query and the relevant context.

  • Weaviate can also host an embedding model (vectorizer module) and an LLM (generative module), but in this case the LLM and embedding model defined in LlamaIndex are used.

from llama_index.embeddings.openai import OpenAIEmbedding
from llama_index.llms.openai import OpenAI
from llama_index.core.settings import Settings

Settings.llm = OpenAI(model="gpt-3.5-turbo", temperature=0.1)
Settings.embed_model = OpenAIEmbedding()
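
As an optional sanity check (assuming the API key is set), embed a short string and inspect the vector size; the default OpenAIEmbedding model, text-embedding-ada-002, returns 1536-dimensional vectors:

# Optional sanity check: the embedding dimension of the configured model
emb = Settings.embed_model.get_text_embedding("hello world")
print(len(emb))  # 1536 for the default text-embedding-ada-002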

Load data

!mkdir -p 'data'
!wget 'https://raw.githubusercontent.com/run-llama/llama_index/main/docs/examples/data/paul_graham/paul_graham_essay.txt' -O 'data/paul_graham_essay.txt'
from llama_index.core import SimpleDirectoryReader

# Load data
documents = SimpleDirectoryReader(
        input_files=["./data/paul_graham_essay.txt"]
).load_data()

#documents
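
A quick look at what was loaded; SimpleDirectoryReader attaches file metadata (path, size, dates) to each Document:

# Inspect the loaded documents
print(f"Loaded {len(documents)} document(s)")
print(documents[0].metadata)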

Chunk to Nodes

  • The whole document is too large to fit into the LLM's context window, so we need to split it into smaller text chunks, which are called nodes in LlamaIndex.

  • With the SentenceWindowNodeParser, each sentence is stored as a chunk together with a larger text window surrounding the original sentence as metadata.

from llama_index.core.node_parser import SentenceWindowNodeParser

# create the sentence window node parser w/ default settings
node_parser = SentenceWindowNodeParser.from_defaults(
    window_size=3,
    window_metadata_key="window",
    original_text_metadata_key="original_text",
)

# Extract nodes from documents
nodes = node_parser.get_nodes_from_documents(documents)
# This block of code is for educational purposes 
# to showcase what the nodes look like
i=10

print(f"Text: \n{nodes[i].text}")
print("------------------")
print(f"Window: \n{nodes[i].metadata['window']}")
Text: 
So this is not about whether it's ok to kill killers. 
------------------
Window: 
Defendants' lawyers are often incompetent.  And prosecutors are often motivated more by publicity than justice.  
  
 In the real world, [about 4%](http://time.com/79572/more-innocent-people-on-death-row-than-estimated-study/) of people sentenced to death are innocent.  So this is not about whether it's ok to kill killers.  This is about whether it's ok to kill innocent people.  
  
 A child could answer that one for you.  
  
 This year, in California, you have a chance to end this, by voting yes on Proposition 62. 

Building index

  • Build an index that stores all the external knowledge in the Weaviate vector database.

  • First, we need to connect to a Weaviate instance. Here, we use Weaviate Embedded.

  • Note that an embedded instance persists only as long as the parent application is running. For a more permanent solution, use a managed Weaviate Cloud Services (WCS) instance, which offers a free 14-day trial (a connection sketch follows the embedded example below).

import weaviate

# Connect to your Weaviate instance
client = weaviate.Client(
    embedded_options=weaviate.embedded.EmbeddedOptions(), 
)

print(f"Client is ready: {client.is_ready()}")

# Print this line to get more information about the client
# client.get_meta()
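
If you prefer the managed WCS instance mentioned above, the connection would instead look like the sketch below (commented out; the cluster URL and API key are placeholders for your own instance, still using the v3 client API):

# Hypothetical WCS connection -- replace the URL and key with your own
# client = weaviate.Client(
#     url="https://your-cluster.weaviate.network",
#     auth_client_secret=weaviate.AuthApiKey(api_key="YOUR-WCS-API-KEY"),
# )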

Build a VectorStoreIndex from the Weaviate client to store and interact with your data.

from llama_index.core import VectorStoreIndex, StorageContext
from llama_index.vector_stores.weaviate import WeaviateVectorStore

index_name = "MyExternalContext"

# Construct vector store
vector_store = WeaviateVectorStore(
    weaviate_client = client, 
    index_name = index_name
)

# Set up the storage for the embeddings
storage_context = StorageContext.from_defaults(vector_store=vector_store)

# If an index with the same index name already exists within Weaviate, delete it
if client.schema.exists(index_name):
    client.schema.delete_class(index_name)

# Setup the index
# build VectorStoreIndex that takes care of chunking documents
# and encoding chunks to embeddings for future retrieval
index = VectorStoreIndex(
    nodes,
    storage_context = storage_context,
)
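
Optionally, you can verify that the nodes landed in Weaviate by counting the objects in the class (v3 aggregate API):

# Optional: count the objects stored in the class
count = client.query.aggregate(index_name).with_meta_count().do()
print(count)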
import json
response = client.schema.get(index_name)

print(json.dumps(response, indent=2))
{
  "class": "MyExternalContext",
  "description": "This property was generated by Weaviate's auto-schema feature on Wed Feb 14 13:46:39 2024",
  "invertedIndexConfig": {
    "bm25": {
      "b": 0.75,
      "k1": 1.2
    },
    "cleanupIntervalSeconds": 60,
    "stopwords": {
      "additions": null,
      "preset": "en",
      "removals": null
    }
  },
  "multiTenancyConfig": {
    "enabled": false
  },
  "properties": [
    {
      "dataType": [
        "text"
      ],
      "description": "This property was generated by Weaviate's auto-schema feature on Wed Feb 14 13:46:39 2024",
      "indexFilterable": true,
      "indexSearchable": true,
      "name": "_node_content",
      "tokenization": "word"
    },
    {
      "dataType": [
        "text"
      ],
      "description": "This property was generated by Weaviate's auto-schema feature on Wed Feb 14 13:46:39 2024",
      "indexFilterable": true,
      "indexSearchable": true,
      "name": "file_path",
      "tokenization": "word"
    },
    {
      "dataType": [
        "text"
      ],
      "description": "This property was generated by Weaviate's auto-schema feature on Wed Feb 14 13:46:39 2024",
      "indexFilterable": true,
      "indexSearchable": true,
      "name": "file_type",
      "tokenization": "word"
    },
    {
      "dataType": [
        "uuid"
      ],
      "description": "This property was generated by Weaviate's auto-schema feature on Wed Feb 14 13:46:39 2024",
      "indexFilterable": true,
      "indexSearchable": false,
      "name": "doc_id"
    },
    {
      "dataType": [
        "text"
      ],
      "description": "This property was generated by Weaviate's auto-schema feature on Wed Feb 14 13:46:39 2024",
      "indexFilterable": true,
      "indexSearchable": true,
      "name": "text",
      "tokenization": "word"
    },
    {
      "dataType": [
        "number"
      ],
      "description": "This property was generated by Weaviate's auto-schema feature on Wed Feb 14 13:46:39 2024",
      "indexFilterable": true,
      "indexSearchable": false,
      "name": "file_size"
    },
    {
      "dataType": [
        "uuid"
      ],
      "description": "This property was generated by Weaviate's auto-schema feature on Wed Feb 14 13:46:39 2024",
      "indexFilterable": true,
      "indexSearchable": false,
      "name": "document_id"
    },
    {
      "dataType": [
        "uuid"
      ],
      "description": "This property was generated by Weaviate's auto-schema feature on Wed Feb 14 13:46:39 2024",
      "indexFilterable": true,
      "indexSearchable": false,
      "name": "ref_doc_id"
    },
    {
      "dataType": [
        "text"
      ],
      "description": "This property was generated by Weaviate's auto-schema feature on Wed Feb 14 13:46:39 2024",
      "indexFilterable": true,
      "indexSearchable": true,
      "name": "file_name",
      "tokenization": "word"
    },
    {
      "dataType": [
        "text"
      ],
      "description": "This property was generated by Weaviate's auto-schema feature on Wed Feb 14 13:46:39 2024",
      "indexFilterable": true,
      "indexSearchable": true,
      "name": "_node_type",
      "tokenization": "word"
    },
    {
      "dataType": [
        "text"
      ],
      "description": "This property was generated by Weaviate's auto-schema feature on Wed Feb 14 13:46:39 2024",
      "indexFilterable": true,
      "indexSearchable": true,
      "name": "last_accessed_date",
      "tokenization": "word"
    },
    {
      "dataType": [
        "text"
      ],
      "description": "This property was generated by Weaviate's auto-schema feature on Wed Feb 14 13:46:39 2024",
      "indexFilterable": true,
      "indexSearchable": true,
      "name": "creation_date",
      "tokenization": "word"
    },
    {
      "dataType": [
        "text"
      ],
      "description": "This property was generated by Weaviate's auto-schema feature on Wed Feb 14 13:46:39 2024",
      "indexFilterable": true,
      "indexSearchable": true,
      "name": "last_modified_date",
      "tokenization": "word"
    },
    {
      "dataType": [
        "text"
      ],
      "description": "This property was generated by Weaviate's auto-schema feature on Wed Feb 14 13:46:48 2024",
      "indexFilterable": true,
      "indexSearchable": true,
      "name": "original_text",
      "tokenization": "word"
    },
    {
      "dataType": [
        "text"
      ],
      "description": "This property was generated by Weaviate's auto-schema feature on Wed Feb 14 13:46:48 2024",
      "indexFilterable": true,
      "indexSearchable": true,
      "name": "window",
      "tokenization": "word"
    }
  ],
  "replicationConfig": {
    "factor": 1
  },
  "shardingConfig": {
    "virtualPerPhysical": 128,
    "desiredCount": 1,
    "actualCount": 1,
    "desiredVirtualCount": 128,
    "actualVirtualCount": 128,
    "key": "_id",
    "strategy": "hash",
    "function": "murmur3"
  },
  "vectorIndexConfig": {
    "skip": false,
    "cleanupIntervalSeconds": 300,
    "maxConnections": 64,
    "efConstruction": 128,
    "ef": -1,
    "dynamicEfMin": 100,
    "dynamicEfMax": 500,
    "dynamicEfFactor": 8,
    "vectorCacheMaxObjects": 1000000000000,
    "flatSearchCutoff": 40000,
    "distance": "cosine",
    "pq": {
      "enabled": false,
      "bitCompression": false,
      "segments": 0,
      "centroids": 256,
      "trainingLimit": 100000,
      "encoder": {
        "type": "kmeans",
        "distribution": "log-normal"
      }
    }
  },
  "vectorIndexType": "hnsw",
  "vectorizer": "none"
}

Query Engine

Finally, we set up the index as the query engine.

Build Metadata Replacement Post Processor

In Advanced RAG, we can use the MetadataReplacementPostProcessor as part of the sentence-window retrieval method to replace the sentence in each node with its surrounding context.

from llama_index.core.postprocessor import MetadataReplacementPostProcessor

# The target key defaults to `window` to match the node_parser's default
postproc = MetadataReplacementPostProcessor(
    target_metadata_key="window"
)
# This block of code is for educational purposes 
# to showcase how the MetadataReplacementPostProcessor works
#from llama_index.core.schema import NodeWithScore
#from copy import deepcopy

#scored_nodes = [NodeWithScore(node=x, score=1.0) for x in nodes]
#nodes_old = [deepcopy(n) for n in nodes]
#replaced_nodes = postproc.postprocess_nodes(scored_nodes)

#print(f"Retrieved sentece: {nodes_old[i].text}")
#print("------------------")
#print(f"Replaced window: {replaced_nodes[i].text}")

Add Re-ranker

For Advanced RAG, we can also add a re-ranker that re-orders the retrieved context by its relevance to the query. Note that you should retrieve a larger number of candidates via similarity_top_k, which the re-ranker then reduces to top_n.

from llama_index.core.postprocessor import SentenceTransformerRerank

# BAAI/bge-reranker-base
# link: https://huggingface.co/BAAI/bge-reranker-base
rerank = SentenceTransformerRerank(
    top_n = 2, 
    model = "BAAI/bge-reranker-base"
)
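
To see the re-ranker in isolation (educational, in the spirit of the commented-out block above; the query string here is just an example), wrap a few nodes with dummy scores and re-rank them:

# Educational: re-rank a handful of nodes against an example query
from llama_index.core.schema import NodeWithScore

sample_nodes = [NodeWithScore(node=n, score=1.0) for n in nodes[:6]]
reranked = rerank.postprocess_nodes(sample_nodes, query_str="What happened at Interleaf?")
for n in reranked:
    print(round(n.score, 3), "|", n.node.metadata["original_text"][:80])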

Finally, we can put all the components together in the query engine!

In addition, we set vector_store_query_mode to "hybrid" to enable hybrid search, with the extra alpha parameter controlling the weighting between semantic and keyword-based search (alpha=1 is pure vector search, alpha=0 is pure keyword search).

# The QueryEngine class is equipped with the generator
# and facilitates the retrieval and generation steps
query_engine = index.as_query_engine(
    similarity_top_k = 6, 
    vector_store_query_mode="hybrid", 
    alpha=0.5,
    node_postprocessors = [postproc, rerank],
)

Run Advanced RAG Query

# Query the Advanced RAG pipeline
response = query_engine.query(
    "What happened at Interleaf?"
)
print(str(response))
At Interleaf, inspired by Emacs, they added a scripting language to their software and made that scripting language a dialect of Lisp.
window = response.source_nodes[0].node.metadata["window"]
sentence = response.source_nodes[0].node.metadata["original_text"]

print(f"Window: {window}")
print("------------------")
print(f"Original Sentence: {sentence}")
Window: Though I didn't realize it at the time, the effort and stress of running Viaweb had been wearing me down.  For a while after getting to California I tried to keep up my usual routine of programming until 3 in the morning, but fatigue, combined with Yahoo's aged culture and the grim cube farm in Santa Clara, gradually wore me out.  After a few months it felt disconcertingly like working at Interleaf.

 Yahoo had given us a lot of options when they bought us.  At the time I thought Yahoo was so overvalued that the options would never be worth anything, but to my surprise the stock went up 5x within a year.
------------------
Original Sentence: After a few months it felt disconcertingly like working at Interleaf.
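
You can also inspect all re-ranked source nodes and their relevance scores; with top_n=2 there should be exactly two:

# Inspect the re-ranked source nodes and their scores
for n in response.source_nodes:
    print(round(n.score, 3), "|", n.node.metadata["original_text"])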