Sentence Embeddings

HuggingFace: Sentence Embeddings

Sentence similarity is the task of comparing two sentences and determining how close they are in meaning, i.e., in semantic content.

To use a sentence similarity model, you typically encode each sentence into an embedding. The similarity between two sentences can then be measured by computing a similarity metric, such as cosine similarity, between the resulting embeddings.

%pip install sentence-transformers

Build a sentence embedding pipeline with 🤗 Sentence Transformers

all-MiniLM-L6-v2

from sentence_transformers import SentenceTransformer

# Load the pre-trained all-MiniLM-L6-v2 embedding model
model = SentenceTransformer("all-MiniLM-L6-v2")

sentences1 = ['고양이는 의자 위에 올라갔다.',          # The cat climbed onto the chair.
              '강아지는 바닥에서 구르고 있다.',        # The puppy is rolling on the floor.
              '나는 고양이와 강아지의 사진을 찍었다.']  # I took a picture of the cat and the puppy.

# Encode the sentences into embedding vectors (returned as a PyTorch tensor)
embeddings1 = model.encode(sentences1,
                           convert_to_tensor=True)
print(embeddings1)

tensor([[-0.0353,  0.0811,  0.0289,  ...,  0.0909, -0.0290, -0.0294],
        [-0.0368,  0.1028,  0.0339,  ...,  0.0204, -0.0981,  0.0058],
        [-0.0591,  0.0815,  0.0702,  ...,  0.0621, -0.0993, -0.0238]],
       device='cuda:0')

sentences2 = ['개는 정원에서 공놀이를 하고 있다.',              # The dog is playing with a ball in the garden.
              '여자는 TV로 드라마를 시청하고 있다.',            # The woman is watching a drama on TV.
              '영화는 액션이 뛰어나서 흥미진진하게 전개되었다.']  # The movie had great action and was exciting.

embeddings2 = model.encode(sentences2,
                           convert_to_tensor=True)
print(embeddings2)

tensor([[-0.0117,  0.0559,  0.0917,  ...,  0.0411, -0.0797, -0.0073],
        [ 0.0048,  0.0472,  0.0360,  ...,  0.0351, -0.0901,  0.0256],
        [ 0.0204,  0.1008,  0.0271,  ...,  0.0095, -0.0577,  0.0161]],
       device='cuda:0')
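Under the hood, SentenceTransformer is a standard 🤗 Transformers encoder followed by a pooling step. For readers who want to see the moving parts, here is a minimal sketch of the same pipeline built directly on transformers, using the attention-mask-aware mean pooling described on the all-MiniLM-L6-v2 model card (the 384-dimensional output is a property of this checkpoint):

import torch
import torch.nn.functional as F
from transformers import AutoTokenizer, AutoModel

# Same checkpoint, loaded through the plain Transformers API
tokenizer = AutoTokenizer.from_pretrained("sentence-transformers/all-MiniLM-L6-v2")
model = AutoModel.from_pretrained("sentence-transformers/all-MiniLM-L6-v2")

def encode(sentences):
    # Tokenize with padding so the batch shares one tensor
    inputs = tokenizer(sentences, padding=True, truncation=True, return_tensors="pt")
    with torch.no_grad():
        token_embeddings = model(**inputs).last_hidden_state
    # Mean-pool over tokens, masking out padding positions
    mask = inputs["attention_mask"].unsqueeze(-1).float()
    pooled = (token_embeddings * mask).sum(dim=1) / mask.sum(dim=1).clamp(min=1e-9)
    # Normalize so dot products equal cosine similarities,
    # matching the Normalize module in the SentenceTransformer pipeline
    return F.normalize(pooled, p=2, dim=1)

print(encode(['고양이는 의자 위에 올라갔다.']).shape)  # torch.Size([1, 384])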

Calculate Cosine similarity

from sentence_transformers import util

# Compute the pairwise cosine similarity matrix between the two embedding sets
cosine_scores = util.cos_sim(embeddings1, embeddings2)

print(cosine_scores)

tensor([[0.5717, 0.5727, 0.5364],
        [0.8632, 0.7304, 0.7405],
        [0.8258, 0.6486, 0.7269]], device='cuda:0')
util.cos_sim returns the full pairwise similarity matrix: entry [i][j] is the cosine similarity between sentences1[i] and sentences2[j], so the loop below reads off the diagonal to pair each sentence with its same-index counterpart.

for i in range(len(sentences1)):
    # Compare sentence i of the first list with sentence i of the second
    print("{} \t\t {} \t\t Score: {:.4f}".format(sentences1[i],
                                                 sentences2[i],
                                                 cosine_scores[i][i]))

고양이는 의자 위에 올라갔다. 		 개는 정원에서 공놀이를 하고 있다. 		 Score: 0.5717
강아지는 바닥에서 구르고 있다. 		 여자는 TV로 드라마를 시청하고 있다. 		 Score: 0.7304
나는 고양이와 강아지의 사진을 찍었다. 		 영화는 액션이 뛰어나서 흥미진진하게 전개되었다. 		 Score: 0.7269
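For reference, cosine similarity is just the dot product of L2-normalized vectors, so util.cos_sim is easy to reproduce in plain PyTorch. A minimal sketch, reusing the embeddings1 and embeddings2 tensors computed above:

import torch.nn.functional as F

def pairwise_cos_sim(a, b):
    # Normalize each row to unit length, then take all pairwise dot products
    a_norm = F.normalize(a, p=2, dim=1)
    b_norm = F.normalize(b, p=2, dim=1)
    return a_norm @ b_norm.T  # shape: (len(a), len(b))

print(pairwise_cos_sim(embeddings1, embeddings2))  # matches util.cos_sim above

One caveat when reading these numbers: all-MiniLM-L6-v2 was trained mainly on English text, so the absolute scores for Korean sentences (e.g., 0.73 between two unrelated sentences) should be interpreted with caution; a multilingual checkpoint such as sentence-transformers/paraphrase-multilingual-MiniLM-L12-v2 is generally a better fit for non-English input.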