Data loading is a critical part of any RAG business scenario. LlamaIndex defines a Data Connectors interface and provides many implementations that load data from a variety of data sources and formats.
The framework ships with a set of built-in Data Connectors that developers can use out of the box, without downloading anything from LlamaHub.
The following code shows how to read data from a web page:
from llama_index.core import SummaryIndex
from llama_index.readers.web import SimpleWebPageReader  # llama-index-readers-web package

# Fetch the page and convert its HTML to plain text
documents = SimpleWebPageReader(html_to_text=True).load_data(
    ["http://paulgraham.com/worked.html"]
)
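The html_to_text=True option strips the page's markup before indexing. Conceptually it behaves like the following standard-library sketch; this is an illustration, not the reader's actual implementation (a production converter would also skip script and style content):

```python
from html.parser import HTMLParser

class TextExtractor(HTMLParser):
    """Collect the visible text of an HTML document, dropping tags."""
    def __init__(self):
        super().__init__()
        self.parts = []

    def handle_data(self, data):
        text = data.strip()
        if text:
            self.parts.append(text)

def html_to_text(html: str) -> str:
    parser = TextExtractor()
    parser.feed(html)
    return "\n".join(parser.parts)

sample = "<html><body><h1>Title</h1><p>Hello <b>world</b>.</p></body></html>"
print(html_to_text(sample))
```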
A local Markdown file can be read with the MarkdownReader connector, fetched from LlamaHub via download_loader:

from pathlib import Path
from llama_index.core import download_loader

MarkdownReader = download_loader("MarkdownReader")
loader = MarkdownReader()
documents = loader.load_data(file=Path('./README.md'))
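MarkdownReader does not return the file as one blob; it splits it into one Document per header section (the README.md example later in this section yields three Documents for this reason). The splitting idea can be sketched with a hypothetical helper:

```python
def split_markdown_by_headers(text: str) -> list[tuple[str, str]]:
    """Split markdown into (header, body) sections, one per '#' header."""
    sections, header, body = [], None, []
    for line in text.splitlines():
        if line.startswith("#"):
            # flush the previous section before starting a new one
            if header is not None or body:
                sections.append((header, "\n".join(body)))
            header, body = line.lstrip("#").strip(), []
        else:
            body.append(line)
    sections.append((header, "\n".join(body)))
    return sections

md = "# Intro\nHello.\n# Usage\nRun it."
for head, sect in split_markdown_by_headers(md):
    print(head, "->", sect)
```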
import os
from dotenv import load_dotenv
!echo "OPENAI_API_KEY=<Your OpenAI Key>" >> .env  # write your OpenAI API key to .env
load_dotenv()
api_key = os.getenv("OPENAI_API_KEY")
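load_dotenv() simply reads KEY=VALUE pairs from the .env file into the process environment. A minimal stand-in, shown only to make the mechanism concrete (load_env_file is a hypothetical name, not part of python-dotenv):

```python
import os

def load_env_file(path: str) -> None:
    """Minimal stand-in for python-dotenv's load_dotenv():
    read KEY=VALUE lines into os.environ (existing values win)."""
    with open(path) as f:
        for line in f:
            line = line.strip()
            if not line or line.startswith("#") or "=" not in line:
                continue  # skip blanks, comments, malformed lines
            key, _, value = line.partition("=")
            os.environ.setdefault(key.strip(), value.strip())

# usage sketch, assuming a .env file exists next to the notebook:
# load_env_file(".env")
# api_key = os.environ.get("OPENAI_API_KEY")
```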
Referring to the LlamaIndex tutorials repository (https://github.com/Anil-matcha/LlamaIndex-tutorials), clone it and load its documents into LlamaIndex for use with a query engine.
!git clone https://github.com/Anil-matcha/LlamaIndex-tutorials.git
Cloning into 'LlamaIndex-tutorials'...
remote: Enumerating objects: 16, done.
remote: Counting objects: 100% (16/16), done.
remote: Compressing objects: 100% (15/15), done.
remote: Total 16 (delta 3), reused 4 (delta 1), pack-reused 0
Unpacking objects: 100% (16/16), 8.04 KiB | 1.15 MiB/s, done.
from llama_index.core import SimpleDirectoryReader

# Read every Markdown file in the repository, descending into subdirectories
reader = SimpleDirectoryReader(
    input_dir="./LlamaIndex-tutorials",
    required_exts=[".md"],
    recursive=True
)
docs = reader.load_data()
[Document(id_='e06e478c-8581-4ff8-b3ba-59be370e8ffc', embedding=None, metadata={'file_path': '/home/kubwa/kubwai/13-LlamaIndex/LlamaIndex-Tutorials/04_Data_Connectors/LlamaIndex-tutorials/README.md', 'file_name': 'README.md', 'file_type': 'text/markdown', 'file_size': 455, 'creation_date': '2024-04-15', 'last_modified_date': '2024-04-15'}, excluded_embed_metadata_keys=['file_name', 'file_type', 'file_size', 'creation_date', 'last_modified_date', 'last_accessed_date'], excluded_llm_metadata_keys=['file_name', 'file_type', 'file_size', 'creation_date', 'last_modified_date', 'last_accessed_date'], relationships={}, text='\n\nLlamaIndex tutorials\n\nOverview and tutorials of the LlamaIndex Library\n\n', start_char_idx=None, end_char_idx=None, text_template='{metadata_str}\n\n{content}', metadata_template='{key}: {value}', metadata_seperator='\n'),
Document(id_='3eb8990d-ae3e-4940-9f23-809934e30e33', embedding=None, metadata={'file_path': '/home/kubwa/kubwai/13-LlamaIndex/LlamaIndex-Tutorials/04_Data_Connectors/LlamaIndex-tutorials/README.md', 'file_name': 'README.md', 'file_type': 'text/markdown', 'file_size': 455, 'creation_date': '2024-04-15', 'last_modified_date': '2024-04-15'}, excluded_embed_metadata_keys=['file_name', 'file_type', 'file_size', 'creation_date', 'last_modified_date', 'last_accessed_date'], excluded_llm_metadata_keys=['file_name', 'file_type', 'file_size', 'creation_date', 'last_modified_date', 'last_accessed_date'], relationships={}, text='\n\nGetting Started\n\nVideos coming soon https://www.youtube.com/@AnilChandraNaiduMatcha\n.Subscribe to the channel to get latest content\n\nFollow Anil Chandra Naidu Matcha on twitter for updates\n\nJoin our discord server for support https://discord.gg/FBpafqbbYF\n\n', start_char_idx=None, end_char_idx=None, text_template='{metadata_str}\n\n{content}', metadata_template='{key}: {value}', metadata_seperator='\n'),
Document(id_='195d642a-f821-4d8f-82ee-10e3059633a7', embedding=None, metadata={'file_path': '/home/kubwa/kubwai/13-LlamaIndex/LlamaIndex-Tutorials/04_Data_Connectors/LlamaIndex-tutorials/README.md', 'file_name': 'README.md', 'file_type': 'text/markdown', 'file_size': 455, 'creation_date': '2024-04-15', 'last_modified_date': '2024-04-15'}, excluded_embed_metadata_keys=['file_name', 'file_type', 'file_size', 'creation_date', 'last_modified_date', 'last_accessed_date'], excluded_llm_metadata_keys=['file_name', 'file_type', 'file_size', 'creation_date', 'last_modified_date', 'last_accessed_date'], relationships={}, text='\n\nAlso check\n\nLlamaIndex Course\n\n', start_char_idx=None, end_char_idx=None, text_template='{metadata_str}\n\n{content}', metadata_template='{key}: {value}', metadata_seperator='\n')]
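As the output shows, each Markdown section becomes its own Document with file metadata attached. SimpleDirectoryReader's file discovery (input_dir, required_exts, recursive) behaves roughly like the filtered directory walk below; collect_files is a hypothetical helper, not LlamaIndex API:

```python
from pathlib import Path

def collect_files(input_dir: str, required_exts: list[str],
                  recursive: bool = False) -> list[Path]:
    """Mimic SimpleDirectoryReader's file discovery: keep only files
    whose extension is in required_exts, optionally descending into
    subdirectories."""
    pattern = "**/*" if recursive else "*"
    return sorted(p for p in Path(input_dir).glob(pattern)
                  if p.is_file() and p.suffix in required_exts)

# usage sketch against the cloned repository:
# md_files = collect_files("./LlamaIndex-tutorials", [".md"], recursive=True)
```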
from llama_index.core import VectorStoreIndex
index = VectorStoreIndex.from_documents(docs)
query_engine = index.as_query_engine()
response = query_engine.query("What is LlamaIndex?")
print(response)
LlamaIndex is a library that provides an overview and tutorials for users.
response = query_engine.query("What do the LlamaIndex tutorials provide?")
print(response)
The LlamaIndex tutorials provide an overview and tutorials of the LlamaIndex Library.
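Behind as_query_engine(), the index embeds each Document, retrieves the chunks most similar to the query, and hands them to the LLM to synthesize an answer. The retrieval step can be caricatured with bag-of-words cosine similarity; real indexes use dense embeddings, and retrieve and cosine here are hypothetical helpers:

```python
from collections import Counter
from math import sqrt

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[w] * b[w] for w in a)
    na = sqrt(sum(v * v for v in a.values()))
    nb = sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(query: str, docs: list[str], top_k: int = 1) -> list[str]:
    """Rank documents by bag-of-words similarity to the query and
    return the top_k matches: the sketch of a retriever's job."""
    q = Counter(query.lower().split())
    ranked = sorted(docs, key=lambda d: cosine(q, Counter(d.lower().split())),
                    reverse=True)
    return ranked[:top_k]

docs = [
    "LlamaIndex tutorials overview of the library",
    "Join our discord server for support",
]
print(retrieve("what do the llamaindex tutorials provide", docs))
```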
This time, following the same tutorials repository (https://github.com/Anil-matcha/LlamaIndex-tutorials), load the cloned README.md directly with MarkdownReader and query it with a query engine.
from pathlib import Path
from llama_index.core import download_loader
MarkdownReader = download_loader("MarkdownReader")
loader = MarkdownReader()
documents = loader.load_data(file=Path('./LlamaIndex-tutorials/README.md'))
index = VectorStoreIndex.from_documents(documents)
query_engine = index.as_query_engine()
response = query_engine.query("Which version of the framework do the LlamaIndex getting-started tutorials use?")
print(response)
The LlamaIndex tutorials use a specific version of a framework.