
Documents & Nodes

In LlamaIndex, Document and Node are the core data abstractions.

Document

A Document is a container for any data source:

  • PDF

  • API responses

  • Databases

  • and so on

A Document stores:

  • Text data

  • Attribute data

    • metadata

    • relationships

Example:

from llama_index.core import Document
text_list = ["hello", "world"]
documents = [Document(text=t) for t in text_list]
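
As a rough mental model only, the storage layout listed above can be sketched with a plain dataclass. `MiniDocument` is a hypothetical stand-in written for illustration, not the LlamaIndex class; it mirrors only the fields named on this page (text, metadata, relationships, and an ID that is generated automatically when none is given).

```python
from dataclasses import dataclass, field
from typing import Any, Dict
import uuid


@dataclass
class MiniDocument:
    """Hypothetical, simplified stand-in for a Document: text plus attribute data."""

    text: str
    metadata: Dict[str, Any] = field(default_factory=dict)
    relationships: Dict[str, str] = field(default_factory=dict)
    # A random UUID is generated when no ID is supplied.
    doc_id: str = field(default_factory=lambda: str(uuid.uuid4()))


text_list = ["hello", "world"]
documents = [MiniDocument(text=t) for t in text_list]

for doc in documents:
    print(doc.doc_id, repr(doc.text), doc.metadata, doc.relationships)
```

Each object gets its own ID and its own empty metadata and relationships containers, which is the same shape the real Document objects show when printed in the hands-on section.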

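Custom Document

A Document can be customized by setting its metadata and its document ID.

Metadata can be specified when the document is constructed, modified on the document object afterwards, or set while loading files with SimpleDirectoryReader.

  1. Specify at construction time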

    from llama_index.core import Document
    
    document = Document(
        text='Hello World',
        metadata={
            'filename': 'hello_world.pdf',
            'category': 'science'
        }
    )
  2. Modify on the document object

    document.metadata = {'filename': 'hello_world_v2.pdf'}
  3. Set with SimpleDirectoryReader

    When loading documents with SimpleDirectoryReader, set the metadata through the file_metadata callback.

    from llama_index.core import SimpleDirectoryReader

    filename_hook = lambda filename: {'file_name': filename}
    documents = SimpleDirectoryReader('./data', file_metadata=filename_hook).load_data()
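
To make the callback mechanism concrete, here is a minimal stdlib-only sketch of the pattern: a loader walks a directory and calls a user-supplied function once per file to obtain that file's metadata. `load_text_files` and the richer `filename_hook` below are hypothetical helpers written for this illustration; they are not part of LlamaIndex and only approximate what a directory reader does with a `file_metadata` callback.

```python
import tempfile
from pathlib import Path
from typing import Any, Callable, Dict, List


def load_text_files(
    directory: str,
    file_metadata: Callable[[str], Dict[str, Any]],
) -> List[Dict[str, Any]]:
    """Read every .txt file in `directory` and attach callback-provided metadata."""
    records = []
    for path in sorted(Path(directory).glob("*.txt")):
        records.append(
            {
                "text": path.read_text(encoding="utf-8"),
                # The callback receives the file path as a string.
                "metadata": file_metadata(str(path)),
            }
        )
    return records


def filename_hook(filename: str) -> Dict[str, Any]:
    """Hypothetical hook: record the path and the file size in bytes."""
    return {"file_name": filename, "file_size": Path(filename).stat().st_size}


with tempfile.TemporaryDirectory() as tmp:
    # Write bytes so the file size is the same on every platform.
    Path(tmp, "hello_llama.txt").write_bytes(b"hello llama!\n")
    records = load_text_files(tmp, file_metadata=filename_hook)

print(records)
```

Because the hook is an ordinary function of the file path, it can return any dictionary, for example a category derived from the parent folder name.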

The Document ID can be set after the Document has been created.

from llama_index.core import Document

document = Document(text='Hello World')
document.doc_id = "xxxx-yyyy"

Node

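A Node is a first-class citizen in LlamaIndex: it represents a chunk of a source Document. A Node carries the same kinds of data and attributes as a Document.

A Node is usually constructed in one of two ways:

  1. Directly through the API

  2. From a Document, using a NodeParser

Note that a Node derived from a Document also inherits that Document's attributes.

Example: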

# Build a node directly through the API
from llama_index.core.schema import TextNode

node = TextNode(text="hello world", id_="1234-5678")

# Create nodes from documents with a node parser
from llama_index.core import Document
from llama_index.core.node_parser import SimpleNodeParser

text_list = ["hello", "world"]
documents = [Document(text=t) for t in text_list]

parser = SimpleNodeParser.from_defaults()
nodes = parser.get_nodes_from_documents(documents)
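
The inheritance noted above can be illustrated with a small stdlib-only sketch. `MiniNode` and `split_into_nodes` are hypothetical names introduced here, and the fixed-size character split is a deliberate simplification rather than the strategy a real NodeParser uses. The point is only that every chunk copies the parent's metadata, remembers the parent's ID, and records its character offsets.

```python
from dataclasses import dataclass
from typing import Any, Dict, List


@dataclass
class MiniNode:
    """Hypothetical, simplified stand-in for a node derived from a document."""

    text: str
    metadata: Dict[str, Any]
    source_id: str  # ID of the document this chunk came from
    start_char_idx: int
    end_char_idx: int


def split_into_nodes(
    doc_id: str, text: str, metadata: Dict[str, Any], chunk_size: int = 10
) -> List[MiniNode]:
    """Cut `text` into fixed-size character chunks that inherit `metadata`."""
    nodes = []
    for start in range(0, len(text), chunk_size):
        end = min(start + chunk_size, len(text))
        nodes.append(
            MiniNode(
                text=text[start:end],
                metadata=dict(metadata),  # copy, so nodes do not share one dict
                source_id=doc_id,
                start_char_idx=start,
                end_char_idx=end,
            )
        )
    return nodes


nodes = split_into_nodes(
    doc_id="xxxx-yyyy",
    text="hello llama! hello world!",
    metadata={"file_name": "hello_llama.txt"},
)
for node in nodes:
    print(node)
```

In the hands-on output further down, the parsed TextNode shows the same three ingredients: the document's metadata, a SOURCE relationship pointing at the document ID, and start_char_idx / end_char_idx values.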

Custom Node

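Developers typically customize a Node in the following ways:

  1. Define relationships between Nodes

  2. Customize the Node ID

Defining relationships between Nodes

RelatedNodeInfo is used to set up a relationship. Its constructor accepts a metadata parameter, so metadata can be attached to the relationship when it is created.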

from llama_index.core.schema import TextNode, NodeRelationship, RelatedNodeInfo

hello_node = TextNode(text="Hello", id_="1111-1111")
world_node = TextNode(text="World", id_="2222-2222")

hello_node.relationships[NodeRelationship.NEXT] = RelatedNodeInfo(node_id=world_node.node_id, metadata={"created_by": "VerySmallWoods"})

world_node.relationships[NodeRelationship.PREVIOUS] = RelatedNodeInfo(node_id=hello_node.node_id)
nodes = [hello_node, world_node]
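
One reason to record NEXT and PREVIOUS links is that the original reading order can be recovered from any node. The following stdlib-only sketch shows that idea; the `Rel` enum, the `relationships` and `texts` dictionaries, and `walk_forward` are all hypothetical stand-ins for illustration and do not use the LlamaIndex classes.

```python
from enum import Enum
from typing import Dict, List


class Rel(Enum):
    """Hypothetical stand-in for the relationship types used above."""

    PREVIOUS = "previous"
    NEXT = "next"


# node_id -> {relationship type: related node_id}
relationships: Dict[str, Dict[Rel, str]] = {
    "1111-1111": {Rel.NEXT: "2222-2222"},
    "2222-2222": {Rel.PREVIOUS: "1111-1111"},
}
texts = {"1111-1111": "Hello", "2222-2222": "World"}


def walk_forward(start_id: str) -> List[str]:
    """Follow NEXT links from `start_id` and return the visited node IDs."""
    order: List[str] = []
    seen = set()
    current = start_id
    # Stop at the end of the chain, or if a cycle leads back to a seen node.
    while current is not None and current not in seen:
        order.append(current)
        seen.add(current)
        current = relationships.get(current, {}).get(Rel.NEXT)
    return order


ordered_ids = walk_forward("1111-1111")
print(" ".join(texts[node_id] for node_id in ordered_ids))
```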

Custom Node ID

from llama_index.core.schema import TextNode, NodeRelationship, RelatedNodeInfo

hello_node = TextNode(text="Hello", id_="1111-1111")
hello_node.id_ = "3333-3333"


Documents & Nodes Hands-on

Construct Document

from llama_index.core import Document

text_list = ["hello", "world"]
documents = [Document(text=t) for t in text_list]
documents
[Document(id_='7f11d402-9463-40db-935f-66f19e9ec831', embedding=None, metadata={}, excluded_embed_metadata_keys=[], excluded_llm_metadata_keys=[], relationships={}, text='hello', start_char_idx=None, end_char_idx=None, text_template='{metadata_str}\n\n{content}', metadata_template='{key}: {value}', metadata_seperator='\n'),
 Document(id_='08dfdc9e-bc3c-4760-940c-91fc10c95a59', embedding=None, metadata={}, excluded_embed_metadata_keys=[], excluded_llm_metadata_keys=[], relationships={}, text='world', start_char_idx=None, end_char_idx=None, text_template='{metadata_str}\n\n{content}', metadata_template='{key}: {value}', metadata_seperator='\n')]

Customize Document

Metadata customization

  1. Customize at document construction

from llama_index.core import Document

document = Document(
  text='Hello World',
  metadata={
    'filename': 'hello_world.pdf',
    'category': 'science'
  }
)
document
Document(id_='75265f5c-816f-42e9-9bfb-01b4f63cb7a6', embedding=None, metadata={'filename': 'hello_world.pdf', 'category': 'science'}, excluded_embed_metadata_keys=[], excluded_llm_metadata_keys=[], relationships={}, text='Hello World', start_char_idx=None, end_char_idx=None, text_template='{metadata_str}\n\n{content}', metadata_template='{key}: {value}', metadata_seperator='\n')

2. Customize after the document is created

document.metadata = {'filename': 'hello_world_v2.pdf'}
document
Document(id_='75265f5c-816f-42e9-9bfb-01b4f63cb7a6', embedding=None, metadata={'filename': 'hello_world_v2.pdf'}, excluded_embed_metadata_keys=[], excluded_llm_metadata_keys=[], relationships={}, text='Hello World', start_char_idx=None, end_char_idx=None, text_template='{metadata_str}\n\n{content}', metadata_template='{key}: {value}', metadata_seperator='\n')

3. Customize with SimpleDirectoryReader

!rm -rf data && mkdir data
!echo 'hello llama!' > data/hello_llama.txt

from llama_index.core import SimpleDirectoryReader

filename_hook = lambda filename: {'file_name': filename}
documents = SimpleDirectoryReader('./data', file_metadata=filename_hook).load_data()
documents
[Document(id_='1f0c956f-71c2-400f-9312-0dd30ae0135f', embedding=None, metadata={'file_name': '/home/kubwa/kubwai/13-LlamaIndex/LlamaIndex-Tutorials/05_Documents_Nodes/data/hello_llama.txt'}, excluded_embed_metadata_keys=['file_name', 'file_type', 'file_size', 'creation_date', 'last_modified_date', 'last_accessed_date'], excluded_llm_metadata_keys=['file_name', 'file_type', 'file_size', 'creation_date', 'last_modified_date', 'last_accessed_date'], relationships={}, text='hello llama!\n', start_char_idx=None, end_char_idx=None, text_template='{metadata_str}\n\n{content}', metadata_template='{key}: {value}', metadata_seperator='\n')]
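
The printed objects above expose text_template, metadata_template, a metadata separator, and two lists of excluded metadata keys. As an approximation only, the sketch below shows how such fields could combine into the string that is eventually sent to an LLM or embedding model. `render_content` is a hypothetical helper written for this page; it reuses the template strings visible in the output but is not the library's implementation.

```python
from typing import Any, Dict, Iterable

# Template strings copied from the printed Document fields above.
TEXT_TEMPLATE = "{metadata_str}\n\n{content}"
METADATA_TEMPLATE = "{key}: {value}"
METADATA_SEPARATOR = "\n"


def render_content(
    text: str, metadata: Dict[str, Any], excluded_keys: Iterable[str] = ()
) -> str:
    """Render text together with its visible metadata, skipping excluded keys."""
    excluded = set(excluded_keys)
    metadata_str = METADATA_SEPARATOR.join(
        METADATA_TEMPLATE.format(key=key, value=value)
        for key, value in metadata.items()
        if key not in excluded
    )
    if not metadata_str:
        # Nothing visible: fall back to the bare text.
        return text
    return TEXT_TEMPLATE.format(metadata_str=metadata_str, content=text)


metadata = {"filename": "hello_world.pdf", "category": "science"}
print(render_content("Hello World", metadata))
print("---")
print(render_content("Hello World", metadata, excluded_keys=["filename"]))
```

Under this reading, the keys that SimpleDirectoryReader places in both exclusion lists (file_name, file_type, and so on) stay attached to the document for filtering but are kept out of the rendered text.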

Document ID Customization

from llama_index.core import Document

document = Document(text='Hello World')
document.doc_id = "xxxx-yyyy"
document
Document(id_='xxxx-yyyy', embedding=None, metadata={}, excluded_embed_metadata_keys=[], excluded_llm_metadata_keys=[], relationships={}, text='Hello World', start_char_idx=None, end_char_idx=None, text_template='{metadata_str}\n\n{content}', metadata_template='{key}: {value}', metadata_seperator='\n')

Construct Node

Developers can either define a node and all of its attributes directly, or parse source documents into nodes with a NodeParser class.

  1. Build directly

from llama_index.core.schema import TextNode
node = TextNode(text="hello world", id_="1234-5678")
node
TextNode(id_='1234-5678', embedding=None, metadata={}, excluded_embed_metadata_keys=[], excluded_llm_metadata_keys=[], relationships={}, text='hello world', start_char_idx=None, end_char_idx=None, text_template='{metadata_str}\n\n{content}', metadata_template='{key}: {value}', metadata_seperator='\n')

2. Create with a NodeParser

We reuse the documents variable loaded above with SimpleDirectoryReader.

from llama_index.core.node_parser import SimpleNodeParser

parser = SimpleNodeParser.from_defaults()
nodes = parser.get_nodes_from_documents(documents)
nodes
[TextNode(id_='72c54176-3c28-489a-b460-5fd171e257f1', embedding=None, metadata={'file_name': '/home/kubwa/kubwai/13-LlamaIndex/LlamaIndex-Tutorials/05_Documents_Nodes/data/hello_llama.txt'}, excluded_embed_metadata_keys=['file_name', 'file_type', 'file_size', 'creation_date', 'last_modified_date', 'last_accessed_date'], excluded_llm_metadata_keys=['file_name', 'file_type', 'file_size', 'creation_date', 'last_modified_date', 'last_accessed_date'], relationships={<NodeRelationship.SOURCE: '1'>: RelatedNodeInfo(node_id='1f0c956f-71c2-400f-9312-0dd30ae0135f', node_type=<ObjectType.DOCUMENT: '4'>, metadata={'file_name': '/home/kubwa/kubwai/13-LlamaIndex/LlamaIndex-Tutorials/05_Documents_Nodes/data/hello_llama.txt'}, hash='2c825ce076dfa35e3cb1c1961c51e9477d41ff1238a0c63b6147133a2a14cbe2')}, text='hello llama!', start_char_idx=0, end_char_idx=12, text_template='{metadata_str}\n\n{content}', metadata_template='{key}: {value}', metadata_seperator='\n')]

Customize Node

  1. Define relationships between nodes

from llama_index.core.schema import TextNode, NodeRelationship, RelatedNodeInfo

hello_node = TextNode(text="Hello", id_="1111-1111")
world_node = TextNode(text="World", id_="2222-2222")

hello_node.relationships[NodeRelationship.NEXT] = RelatedNodeInfo(node_id=world_node.node_id, metadata={"created_by": "VerySmallWoods"})
world_node.relationships[NodeRelationship.PREVIOUS] = RelatedNodeInfo(node_id=hello_node.node_id)
nodes = [hello_node, world_node]
nodes
[TextNode(id_='1111-1111', embedding=None, metadata={}, excluded_embed_metadata_keys=[], excluded_llm_metadata_keys=[], relationships={<NodeRelationship.NEXT: '3'>: RelatedNodeInfo(node_id='2222-2222', node_type=None, metadata={'created_by': 'VerySmallWoods'}, hash=None)}, text='Hello', start_char_idx=None, end_char_idx=None, text_template='{metadata_str}\n\n{content}', metadata_template='{key}: {value}', metadata_seperator='\n'),
 TextNode(id_='2222-2222', embedding=None, metadata={}, excluded_embed_metadata_keys=[], excluded_llm_metadata_keys=[], relationships={<NodeRelationship.PREVIOUS: '2'>: RelatedNodeInfo(node_id='1111-1111', node_type=None, metadata={}, hash=None)}, text='World', start_char_idx=None, end_char_idx=None, text_template='{metadata_str}\n\n{content}', metadata_template='{key}: {value}', metadata_seperator='\n')]

2. Customize the node ID

from llama_index.core.schema import TextNode, NodeRelationship, RelatedNodeInfo

hello_node = TextNode(text="Hello", id_="1111-1111")
hello_node.id_ = '3333-3333'
hello_node
TextNode(id_='3333-3333', embedding=None, metadata={}, excluded_embed_metadata_keys=[], excluded_llm_metadata_keys=[], relationships={}, text='Hello', start_char_idx=None, end_char_idx=None, text_template='{metadata_str}\n\n{content}', metadata_template='{key}: {value}', metadata_seperator='\n')

Last updated 1 year ago