
Data Connectors

Data loading is a critical step in any RAG business scenario. LlamaIndex defines a Data Connectors interface and ships a number of implementations that load data from a variety of data sources and formats, including:

  • Simple Directory Reader

  • Psychic Reader

  • DeepLake Reader

  • Qdrant Reader

  • Discord Reader

  • MongoDB Reader

  • Chroma Reader

  • MyScale Reader

  • Faiss Reader

  • Obsidian Reader

  • Slack Reader

  • Web Page Reader

  • Pinecone Reader

  • Mbox Reader

  • MilvusReader

  • Notion Reader

  • Github Repo Reader

  • Google Docs Reader

  • Database Reader

  • Twitter Reader

  • Weaviate Reader

  • Make Reader

LlamaHub

Data Connectors for LlamaIndex are distributed through LlamaHub. LlamaHub is an open-source repository of data connectors that can easily be plugged into any LlamaIndex application.

Usage Examples

Using the built-in Data Connectors of the LlamaIndex framework

The LlamaIndex framework provides a set of built-in Data Connectors that developers can use directly, without loading anything from LlamaHub.

The following code shows how to read data from a web page.

from llama_index.core import SummaryIndex
from llama_index.readers.web import SimpleWebPageReader  # pip install llama-index-readers-web

documents = SimpleWebPageReader(html_to_text=True).load_data(
    ["http://paulgraham.com/worked.html"]
)
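
The SummaryIndex imported above can then be built over the loaded page and queried. A minimal sketch, where the question string is only an illustrative example:

# Build a summary index over the downloaded page and query it
index = SummaryIndex.from_documents(documents)
query_engine = index.as_query_engine()
response = query_engine.query("What did the author work on?")  # example question
print(response)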

Loading Data Connectors from LlamaHub

The following sample code loads the Markdown document data connector from LlamaHub. For more details on this connector, see https://llamahub.ai/l/file-markdown.

from pathlib import Path
from llama_index.core import download_loader

MarkdownReader = download_loader("MarkdownReader")

loader = MarkdownReader()
documents = loader.load_data(file=Path('./README.md'))
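
download_loader fetches the connector from LlamaHub at runtime; in more recent llama-index releases the same readers are also published as installable pip packages. A minimal sketch of the package-based approach, assuming the llama-index-readers-file package is installed:

# pip install llama-index-readers-file
from pathlib import Path
from llama_index.readers.file import MarkdownReader  # packaged LlamaHub reader

loader = MarkdownReader()
documents = loader.load_data(file=Path('./README.md'))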

LlamaIndex: Data Connectors

Setup Environments

import os
from dotenv import load_dotenv

!echo "OPENAI_API_KEY=<Your OpenAI Key>" >> .env  # write your OpenAI API key to .env
load_dotenv()
api_key = os.getenv("OPENAI_API_KEY")

Data connectors

LlamaIndex: Data Connectors

Following the LlamaIndex tutorials repository (https://github.com/Anil-matcha/LlamaIndex-tutorials), we load its files into LlamaIndex Document objects so they can be used with a query engine.

!git clone https://github.com/Anil-matcha/LlamaIndex-tutorials.git
Cloning into 'LlamaIndex-tutorials'...
remote: Enumerating objects: 16, done.
remote: Counting objects: 100% (16/16), done.
remote: Compressing objects: 100% (15/15), done.
remote: Total 16 (delta 3), reused 4 (delta 1), pack-reused 0
Unpacking objects: 100% (16/16), 8.04 KiB | 1.15 MiB/s, done.
from llama_index.core import SimpleDirectoryReader

# Recursively read only the .md files from the cloned repository
reader = SimpleDirectoryReader(
    input_dir="./LlamaIndex-tutorials",
    required_exts=[".md"],
    recursive=True
)
docs = reader.load_data()
docs
[Document(id_='e06e478c-8581-4ff8-b3ba-59be370e8ffc', embedding=None, metadata={'file_path': '/home/kubwa/kubwai/13-LlamaIndex/LlamaIndex-Tutorials/04_Data_Connectors/LlamaIndex-tutorials/README.md', 'file_name': 'README.md', 'file_type': 'text/markdown', 'file_size': 455, 'creation_date': '2024-04-15', 'last_modified_date': '2024-04-15'}, excluded_embed_metadata_keys=['file_name', 'file_type', 'file_size', 'creation_date', 'last_modified_date', 'last_accessed_date'], excluded_llm_metadata_keys=['file_name', 'file_type', 'file_size', 'creation_date', 'last_modified_date', 'last_accessed_date'], relationships={}, text='\n\nLlamaIndex tutorials\n\nOverview and tutorials of the LlamaIndex Library\n\n', start_char_idx=None, end_char_idx=None, text_template='{metadata_str}\n\n{content}', metadata_template='{key}: {value}', metadata_seperator='\n'),
 Document(id_='3eb8990d-ae3e-4940-9f23-809934e30e33', embedding=None, metadata={'file_path': '/home/kubwa/kubwai/13-LlamaIndex/LlamaIndex-Tutorials/04_Data_Connectors/LlamaIndex-tutorials/README.md', 'file_name': 'README.md', 'file_type': 'text/markdown', 'file_size': 455, 'creation_date': '2024-04-15', 'last_modified_date': '2024-04-15'}, excluded_embed_metadata_keys=['file_name', 'file_type', 'file_size', 'creation_date', 'last_modified_date', 'last_accessed_date'], excluded_llm_metadata_keys=['file_name', 'file_type', 'file_size', 'creation_date', 'last_modified_date', 'last_accessed_date'], relationships={}, text='\n\nGetting Started\n\nVideos coming soon https://www.youtube.com/@AnilChandraNaiduMatcha\n.Subscribe to the channel to get latest content\n\nFollow Anil Chandra Naidu Matcha on twitter for updates\n\nJoin our discord server for support https://discord.gg/FBpafqbbYF\n\n', start_char_idx=None, end_char_idx=None, text_template='{metadata_str}\n\n{content}', metadata_template='{key}: {value}', metadata_seperator='\n'),
 Document(id_='195d642a-f821-4d8f-82ee-10e3059633a7', embedding=None, metadata={'file_path': '/home/kubwa/kubwai/13-LlamaIndex/LlamaIndex-Tutorials/04_Data_Connectors/LlamaIndex-tutorials/README.md', 'file_name': 'README.md', 'file_type': 'text/markdown', 'file_size': 455, 'creation_date': '2024-04-15', 'last_modified_date': '2024-04-15'}, excluded_embed_metadata_keys=['file_name', 'file_type', 'file_size', 'creation_date', 'last_modified_date', 'last_accessed_date'], excluded_llm_metadata_keys=['file_name', 'file_type', 'file_size', 'creation_date', 'last_modified_date', 'last_accessed_date'], relationships={}, text='\n\nAlso check\n\nLlamaIndex Course\n\n', start_char_idx=None, end_char_idx=None, text_template='{metadata_str}\n\n{content}', metadata_template='{key}: {value}', metadata_seperator='\n')]
from llama_index.core import VectorStoreIndex

index = VectorStoreIndex.from_documents(docs)
query_engine = index.as_query_engine()
response = query_engine.query("LlamaIndex란 무엇이야?")

print(response)
LlamaIndex is a library that provides an overview and tutorials for users.
response = query_engine.query("LlamaIndex 튜토리얼은 무엇을 제공해?")

print(response)
The LlamaIndex tutorials provide an overview and tutorials of the LlamaIndex Library.

LlamaHub: Data Connectors

This time we load the same LlamaIndex tutorials repository (https://github.com/Anil-matcha/LlamaIndex-tutorials) into LlamaIndex Document objects for the query engine, using a data connector obtained from LlamaHub.

from pathlib import Path
from llama_index.core import download_loader

MarkdownReader = download_loader("MarkdownReader")

loader = MarkdownReader()
documents = loader.load_data(file=Path('./LlamaIndex-tutorials/README.md'))
index = VectorStoreIndex.from_documents(documents)
query_engine = index.as_query_engine()
response = query_engine.query("LlamaIndex 시작하기 튜토리얼에서는 어떤 버전의 프레임워크를 사용하나요?")

print(response)
The LlamaIndex tutorials use a specific version of a framework.