New Huggingface-LangChain

Huggingface-LangChain

langchain_huggingface is a partner package, officially announced by Hugging Face on May 14, 2024, that replaces the langchain_community.chat_models.huggingface integration previously used in LangChain with a leaner framework.

The API documentation for the legacy langchain_community package is linked at the bottom of this page; LangChain has not yet applied the changed library there.

Setup Environments

Getting started with langchain-huggingface is simple. Install the package and start using it as follows:

%pip install langchain-huggingface

import os

# API token read by the Hub-backed classes (HuggingFaceEndpoint, serverless API, etc.)
os.environ["HUGGINGFACEHUB_API_TOKEN"] = "<Your_Huggingface_Token>"
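
Optionally, you can sanity-check the token before going further. A minimal sketch using huggingface_hub (a dependency pulled in by the stack above):

from huggingface_hub import whoami

# Raises if the token is missing or invalid; returns account info otherwise
print(whoami(token=os.environ["HUGGINGFACEHUB_API_TOKEN"])["name"])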

1. LLMs

HuggingFacePipeline

Among the Transformers tools, the pipeline is the most versatile tool in the Hugging Face toolbox. Because LangChain is designed mainly to cover RAG and agent use cases, the scope of pipelines here is limited to the following text-centric tasks:

"text-generation", "text2text-generation", "summarization", "translation"

from langchain_huggingface import HuggingFacePipeline

llm = HuggingFacePipeline.from_model_id(
    model_id="microsoft/Phi-3-mini-4k-instruct",
    task="text-generation",
    pipeline_kwargs={
        "max_new_tokens": 100,
        "top_k": 50,
        "temperature": 0.1,
    },
)
llm.invoke("What is Transformer from huggingface?")
'What is Transformer from huggingface?\n\nI am trying to understand the Transformer model from huggingface. I have read the paper and the code, but I am still not clear on some aspects.\n\n1. What is the difference between the Transformer model and the original Transformer model from the paper?\n2. What is the purpose of the `forward` method in the Transformer model?\n3. What is the purpose of the `forward_attention` method in the Transformer model?'

Using the HuggingFacePipeline.from_model_id() method above is equivalent to constructing a Transformers pipeline directly:

from transformers import AutoModelForCausalLM, AutoTokenizer, pipeline

model_id = "microsoft/Phi-3-mini-4k-instruct"
tokenizer = AutoTokenizer.from_pretrained(model_id)

model = AutoModelForCausalLM.from_pretrained(
    model_id,
    load_in_4bit=True,
    #attn_implementation="flash_attention_2",  # enable when using flash attention
)
pipe = pipeline(
    "text-generation", 
    model=model, 
    tokenizer=tokenizer, 
    max_new_tokens=100, 
    top_k=50, 
    temperature=0.1
)
llm = HuggingFacePipeline(pipeline=pipe)
llm.invoke("What is Transformer from huggingface")
'What is Transformer from huggingface?\n\n[response]: The Transformer is a deep learning model architecture that was introduced in the paper "Attention is All You Need" by Vaswani et al. in 2017. It was designed to handle sequential data, such as natural language text, and has since become a foundational model in the field of natural language processing (NLP). The Transformer model is particularly known for its efficiency in parallelization and its ability to capture long-range dependencies in'
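
Either way, the resulting llm behaves like any other LangChain LLM, so it composes with LCEL. A minimal sketch reusing the llm above (the prompt text is illustrative):

from langchain_core.prompts import PromptTemplate
from langchain_core.output_parsers import StrOutputParser

# Prompt -> local pipeline -> plain string, chained with LCEL
prompt = PromptTemplate.from_template("Answer briefly: {question}")
chain = prompt | llm | StrOutputParser()
print(chain.invoke({"question": "What is a Transformer?"}))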

HuggingFaceEndpoint

There are two ways to use this class:

  1. When loading a model from the Hugging Face Hub, you can specify it with the repo_id parameter. These endpoints use the serverless API, which is particularly useful for people with a Pro account or an Enterprise Hub.

  2. In the environment where your code runs, you can specify an <endpoint_url> and connect with your HF token, which already gives you access to a considerable volume of requests.

from langchain_huggingface import HuggingFaceEndpoint

llm = HuggingFaceEndpoint(
    repo_id="meta-llama/Meta-Llama-3-8B-Instruct",
    task="text-generation",
    max_new_tokens=100,
    do_sample=False,
)
llm.invoke("What is Transformer from huggingface?")
' transformers library?\nTransformer from the Hugging Face Transformers library is a pre-trained language model that is based on the original Transformer architecture proposed by Vaswani et al. in the paper "Attention is All You Need" in 2017.\nThe Transformer model is a type of neural network that is specifically designed for natural language processing tasks, such as language translation, text generation, and question answering. It is based on the idea of self-attention, which allows the model to focus on different parts of'

llm = HuggingFaceEndpoint(
    endpoint_url="<endpoint_url>",
    task="text-generation",
    max_new_tokens=1024,
    do_sample=False,
)
llm.invoke("What is Transformer from huggingface")

ChatHuggingFace

Every model has its own special tokens with which it works best. If these tokens are not added to the prompt, the model's performance degrades considerably.

To go from a list of messages to a completion prompt, most LLM tokenizers have an attribute called chat_template.

This class takes a list of messages as input and then builds the correct completion prompt with the tokenizer.apply_chat_template method.

from langchain_huggingface import ChatHuggingFace, HuggingFaceEndpoint

llm = HuggingFaceEndpoint(
    endpoint_url="<endpoint_url>",
    task="text-generation",
    max_new_tokens=1024,
    do_sample=False,
)
llm_engine_hf = ChatHuggingFace(llm=llm)
llm_engine_hf.invoke("What is Transformer from huggingface?")
# with mistralai/Mistral-7B-Instruct-v0.2
llm.invoke("<s>[INST] "What is Transformer from huggingface? [/INST]")

# with meta-llama/Meta-Llama-3-8B-Instruct
llm.invoke("""<|begin_of_text|><|start_header_id|>user<|end_header_id|>Hugging Face is<|eot_id|><|start_header_id|>assistant<|end_header_id|>""")

2. Embeddings

HuggingFaceEmbeddings

This class uses sentence-transformers embeddings. The embeddings are computed locally, so they use your own machine's resources.

from langchain_huggingface.embeddings import HuggingFaceEmbeddings

model_name = "mixedbread-ai/mxbai-embed-large-v1"
hf_embeddings = HuggingFaceEmbeddings(
    model_name=model_name,
)
texts = ["Hello, world!", "How are you?"]
hf_embeddings.embed_documents(texts)
[[0.3059690594673157,
  0.7907294034957886,
  0.009807512164115906,
  -0.15764302015304565,
  -0.8023927807807922,
  0.14975695312023163,
  1.2789497375488281,
  0.7509115934371948,
  ...]]
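
These embeddings plug into any LangChain vector store for retrieval. A minimal sketch, assuming the optional faiss-cpu package is installed (the sample texts are illustrative):

from langchain_community.vectorstores import FAISS

docs = ["LangChain integrates Hugging Face models.", "Embeddings map text to vectors."]
vectorstore = FAISS.from_texts(docs, embedding=hf_embeddings)
retriever = vectorstore.as_retriever(search_kwargs={"k": 1})
print(retriever.invoke("How does LangChain use Hugging Face?"))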

HuggingFaceEndpointEmbeddings

HuggingFaceEndpointEmbeddings is very similar to what HuggingFaceEndpoint does for LLMs, in that it also uses the InferenceClient internally to compute the embeddings.

It can be used with models on the Hub as well as with TEI instances, whether they are deployed locally or online.

from langchain_huggingface.embeddings import HuggingFaceEndpointEmbeddings

hf_embeddings = HuggingFaceEndpointEmbeddings(
    model="mixedbread-ai/mxbai-embed-large-v1",
    task="feature-extraction",
    # huggingfacehub_api_token="<HF_TOKEN>",  # or rely on the env var set earlier
)
texts = ["Hello, world!", "How are you?"]
hf_embeddings.embed_documents(texts)
[[0.3059690594673157,
  0.7907294034957886,
  0.009807512164115906,
  -0.15764302015304565,
  -0.8023927807807922,
  0.14975695312023163,
  1.2789497375488281,
  0.7509115934371948,
  ...]]
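
For a single query string, embed_query returns one vector, e.g. at retrieval time:

# One vector for one query, matching the dimensionality of the model
query_vector = hf_embeddings.embed_query("Hello, world!")
print(len(query_vector))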
References
  • Hugging Face x LangChain: A new partner package (Hugging Face blog)
  • langchain_community.chat_models.huggingface.ChatHuggingFace — 🦜🔗 LangChain 0.2.0 (API reference)