Apple Silicon Fine-tuning Gemma-2B with MLX

Last updated 1 year ago

About MLX

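At its core, MLX represents a paradigm shift in how deep learning is approached on Apple devices. By taking advantage of the unified memory architecture (UMA) of Apple Silicon, MLX enables efficient model inference, training, and deployment directly on Apple hardware, without external processing units or special accelerators.

MLX arrays are multi-dimensional arrays that can be accelerated on M1, M2, M3, and M4 chips, and they can be converted to NumPy arrays and PyTorch tensors.

Benchmarks against PyTorch show MLX on an M3 Max approaching the performance of an NVIDIA V100.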

Fine-tuning Gemma-2B with MLX

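Using the MLX library, which is optimized for Apple Silicon, we will run the latest Gemma-2B-Inst model locally on a MacBook, fine-tune it with MLX LoRA, and then push the result to the Hugging Face Hub.

Note: run this on an Apple Silicon M-series MacBook with at least 16 GB of memory.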

Setup Environments

Install the required libraries. You also need a Mac with Apple Silicon; this walkthrough used a MacBook Pro with an M3 Max and 128 GB of memory.

%pip install -Uqq mlx mlx_lm transformers datasets

ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
s3fs 2023.4.0 requires fsspec==2023.4.0, but you have fsspec 2024.3.1 which is incompatible.
Note: you may need to restart the kernel to use updated packages.

import os

os.environ["HF_TOKEN"] = '<Your_Huggingface_Token>'  # paste the token issued at huggingface.co
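
Hard-coding a token in a notebook cell makes it easy to leak when the notebook is shared. As a minimal sketch of an alternative, the hypothetical helper below (read_hf_token is not part of any library) reads HF_TOKEN from the environment and rejects a missing value or the placeholder string, assuming the token was exported in the shell before the notebook was started.

```python
import os

PLACEHOLDER = "<Your_Huggingface_Token>"  # the placeholder string used in the cell above


def read_hf_token(env=os.environ) -> str:
    """Return HF_TOKEN from the given mapping, rejecting missing or placeholder values."""
    token = env.get("HF_TOKEN", "").strip()
    if not token or token == PLACEHOLDER:
        raise RuntimeError("Set HF_TOKEN in your shell before starting the notebook.")
    return token


# Demonstration with an explicit mapping instead of the real environment.
print(read_hf_token({"HF_TOKEN": "hf_example_token"}))
```

Passing the mapping as an argument keeps the helper easy to try without touching the real environment.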

MLX Inference with Gemma Model

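Here we use gemma-2b-it, the 2B instruct model from the Gemma family.

We use the mlx_lm library, which is built on MLX, Apple's deep learning framework for macOS.

To use google/gemma-2b-it, click Acknowledge License on Google's Hugging Face page to request access; the model becomes available once the request is approved. Approval arrives within about five minutes.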

from mlx_lm import generate, load

model, tokenizer = load("google/gemma-2b-it")

None of PyTorch, TensorFlow >= 2.0, or Flax have been found. Models won't be available and only tokenizers, configuration and file/data utilities can be used.

Fetching 9 files:   0%|          | 0/9 [00:00<?, ?it/s]
tokenizer.json:   0%|          | 0.00/17.5M [00:00<?, ?B/s]
model-00001-of-00002.safetensors:   0%|          | 0.00/4.95G [00:00<?, ?B/s]
model-00002-of-00002.safetensors:   0%|          | 0.00/67.1M [00:00<?, ?B/s]
tokenizer.model:   0%|          | 0.00/4.24M [00:00<?, ?B/s]
config.json:   0%|          | 0.00/627 [00:00<?, ?B/s]
special_tokens_map.json:   0%|          | 0.00/636 [00:00<?, ?B/s]
model.safetensors.index.json:   0%|          | 0.00/13.5k [00:00<?, ?B/s]
generation_config.json:   0%|          | 0.00/137 [00:00<?, ?B/s]
tokenizer_config.json:   0%|          | 0.00/34.2k [00:00<?, ?B/s]

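Reading some of the code in the mlx-examples repository suggests that when the transformers tokenizer has an apply_chat_template method, that template is used to build the prompt.

We therefore pass a prompt that contains only the question itself when generating.

This is what the tokenizer does internally inside the generate() method: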

messages = [{"role": "user", "content": "LLM에서 RAG에 대하여 설명해줄래?"}]
tokenizer.apply_chat_template(
    messages, 
    tokenize=False, 
    add_generation_prompt=True
)

'<bos><start_of_turn>user\nLLM에서 RAG에 대하여 설명해줄래?<end_of_turn>\n<start_of_turn>model\n'
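
To make the structure of that string easier to see, here is a hand-written sketch that rebuilds it in plain Python. The helper render_gemma_prompt is hypothetical: it only mirrors the output shown above and is not the tokenizer's actual Jinja template. Mapping the assistant role to model and trimming the content are assumptions about how the Gemma template behaves.

```python
def render_gemma_prompt(messages, add_generation_prompt=True):
    """Approximate Gemma's chat layout: one <start_of_turn>...<end_of_turn> block per message."""
    parts = ["<bos>"]
    for message in messages:
        # Assumption: the template names the assistant role "model".
        role = "model" if message["role"] == "assistant" else message["role"]
        parts.append(f"<start_of_turn>{role}\n{message['content'].strip()}<end_of_turn>\n")
    if add_generation_prompt:
        # An open model turn tells the model that it is its turn to answer.
        parts.append("<start_of_turn>model\n")
    return "".join(parts)


messages = [{"role": "user", "content": "LLM에서 RAG에 대하여 설명해줄래?"}]
print(repr(render_gemma_prompt(messages)))
```

For real use, tokenizer.apply_chat_template remains the authoritative source of the format; the sketch is only meant for reading the markers.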

# Generating without adding a prompt template manually
prompt = """
LLM에서 Retrieval Augmented Generation에 대하여 설명해줄래?
""".strip()
response = generate(
    model,
    tokenizer,
    prompt=prompt,
    verbose=True,  # Set to True to see the prompt and response
    temp=0.0,
    max_tokens=256,
)

==========
Prompt: LLM에서 Retrieval Augmented Generation에 대하여 설명해줄래?


**Retrieval Augmented Generation (RAG)**는 LLM의 기본적인 구조를 확장하여 더 복잡한 문장 생성을 가능하게 하는 기술입니다. 

**RLM의 기본 구조:**

* **Encoder:** 문장을 숫자로 변환하는 과정
* **Decoder:** 숫자를 문장으로 변환하는 과정

**RAG의 구조:**

* **Encoder:** 문장을 숫자로 변환하는 과정
* **Decoder:** 숫자를 문장으로 변환하는 과정
* **Memory:** 숫자를 기억하는 과정

**RAG의 주요 기능:**

* **Contextualized representation:** 문장의 주요 내용을 기억하고 이를 기반으로 새로운 문장을 생성할 수 있도록 합니다.
* **Long-term dependencies:** 문장의 모든 부분이 기억되고 이를 기반으로 새로운 문장을 생성할 수 있도록 합니다.
* **Multimodal representation:** 다양한 형식의 데이터(텍스트, 이미지, 비디오)를 기억하고 이를 기반으로 새로운 문장을 생성할 수 있도록 합니다.


==========
Prompt: 56.787 tokens-per-sec
Generation: 9.726 tokens-per-sec

Fine-tuning the Gemma model with LoRA using MLX

Let's see what results we can get by fine-tuning on teknium's dataset, which is published on the Hugging Face Hub.

Because this is only an example, we will fine-tune for just 600 iterations.
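
To put 600 iterations in perspective, the short calculation below estimates how much of the training data the run touches. It uses figures that appear later in this walkthrough (13528 rows, a 0.9 training fraction, and a batch size of 4) and assumes each iteration consumes one batch of distinct examples, which is a simplification.

```python
total_rows = 13528      # rows in the train split of the dataset
train_fraction = 0.9    # share kept for training when the data is split
batch_size = 4          # --batch-size in the training command
iters = 600             # --iters in the training command

train_rows = int(total_rows * train_fraction)
examples_seen = batch_size * iters
epoch_fraction = examples_seen / train_rows

print(f"training rows:  {train_rows}")
print(f"examples seen:  {examples_seen}")
print(f"share of epoch: {epoch_fraction:.1%}")
```

Under these assumptions the run sees roughly a fifth of one epoch, which is consistent with treating it as a demonstration rather than a full fine-tune.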

Preparing Dataset

https://huggingface.co/datasets/teknium/trismegistus-project

from datasets import load_dataset

dataset = load_dataset("teknium/trismegistus-project")
dataset

Downloading readme:   0%|          | 0.00/2.39k [00:00<?, ?B/s]
Downloading data:   0%|          | 0.00/73.2M [00:00<?, ?B/s]
Generating train split:   0%|          | 0/13528 [00:00<?, ? examples/s]

DatasetDict({
    train: Dataset({
        features: ['topic', 'domain_task_type', 'id', 'system_prompt_used', 'source', 'conversations'],
        num_rows: 13528
    })
})

# convert the dataset to pandas dataframe
import pandas as pd

df = pd.DataFrame(dataset["train"])
df.head()

|   | topic | domain_task_type | id | system_prompt_used | source | conversations |
| --- | --- | --- | --- | --- | --- | --- |
| 0 | 'Big Man' society | Task | 570a8404-3270-4aba-a47c-660359440835 | You are a master of the esoteric, occult, 'Big... | DomainExpert_Occult | [{'from': 'human', 'value': 'Compose a compreh... |
| 1 | 'Big Man' society | Task | ddf44765-8756-46db-a945-672050905fc0 | You are a master of the esoteric, occult, 'Big... | DomainExpert_Occult | [{'from': 'human', 'value': 'Develop an intric... |
| 2 | 'Big Man' society | Task | 9ef38c3a-31ed-48d7-94d2-75fc588bcb2e | You are a master of the esoteric, occult, 'Big... | DomainExpert_Occult | [{'from': 'human', 'value': 'Write an extensiv... |
| 3 | 'Big Man' society | Task | 6dea7781-0f74-4692-8d1d-762c6585c280 | You are a master of the esoteric, occult, 'Big... | DomainExpert_Occult | [{'from': 'human', 'value': 'Develop an intric... |
| 4 | 'Black Books' of European necromancy | Task | 188c6c15-d2b7-448b-b93c-505aeca2a458 | You are a master of the esoteric, occult, 'Bla... | DomainExpert_Occult | [{'from': 'human', 'value': 'Devise an intrica... |

print(df.iloc[0]["system_prompt_used"], "\n", df.iloc[0]["conversations"])

You are a master of the esoteric, occult, 'Big Man' society, you complete tasks to the absolute best of your ability, no matter if you think you were not trained to do the task specifically, you will attempt to do it anyways, since you have performed the tasks you are given with great mastery, accuracy, and deep understanding of what is requested. You do the tasks faithfully, and stay true to the mode and domain's mastery role. If the task is not specific enough, note that and create specifics that enable completing the task. 
 [{'from': 'human', 'value': "Compose a comprehensive biography of a renowned figure in the 'Big Man' society, delving into their connections to the occult and esoteric realms. Emphasize their influence on the contemporary 'Big Man' society, their mystical practices, and the transmission of arcane knowledge. Ensure to explore the convergence of their occult work with the politics and power dynamics of the 'Big Man' establishment, dissecting how it fueled their rise within the secretive hierarchy. Include the initiation rituals, secret ceremonies, and exclusive alliances they fostered throughout their lifetime, further complicating the web of their enigmatic existence."}, {'from': 'gpt', 'value': "Title: The Mystifying Odyssey of Eliphas Blackwood: Master of the 'Big Man' Society and Occult Connoisseur\n\nIntroduction:\nIn the annals of the secret society known as the 'Big Man,' few names evoke as much intrigue and applaud as Eliphas Blackwood. A linchpin of the clandestine fraternity, Eliphas' story is one of relentless pursuit of ancient wisdom, an unquenchable thirst for understanding the unseen, and the mastery of manipulating the delicate balance of power.\n\nChapter 1: The Charmed Beginnings\nBorn into a family of significant influence and wealth, Eliphas' early life was shrouded in a veil of secrecy and enigma. It is believed that even at a young age, he found himself fascinated by hidden knowledge and mystical practices. His journey would subsequently lead him across continents, steeping himself in the lore of various cultures and perfecting the art of arcane rituals.\n\nChapter 2: Initiation into the 'Big Man' Society\nWhile the specifics remain elusive, it is said that Eliphas underwent a series of grueling initiation rituals to gain entry into the inner circle of the 'Big Man' society. 
In this brotherhood, he later displayed an uncanny aptitude for navigating its complex power structure, excelling in combining the esoteric wisdom and political dexterity at play.\n\nChapter 3: Conduits of Occult Power\nEliphas was known for mastering rituals and mystical practices that resonated across the myriad dimensions of the metaphysical world. Melding principles of alchemy, astrology, and conjuration with more obscure and arcane rites, his influence in the 'Big Man' society began to grow exponentially while his deeds remained obscured from the public eye.\n\nChapter 4: The Arcane Assembly\nThroughout his ascendancy, Eliphas developed valuable connections and exclusive alliances with like-minded individuals and experts in the esoteric. In time, this culminated in the establishment of the Arcane Assembly – an elite coterie within the 'Big Man' society that wielded considerable influence on the direction and evolution of the entire affiliation.\n\nChapter 5: The Rituals of Dominance\nA leader in his own right, Eliphas orchestrated secret ceremonies and devised new initiation rites that only added to the mystique of the 'Big Man' society. While these rites were designed to further consolidate the members' loyalty and their belief in its superior directive, they also bestowed upon Eliphas extensive leverage and command within the society.\n\nChapter 6: The Political Edge\nAs Eliphas' stature within the 'Big Man' society intensified, his teachings and esoteric methods began to interweave seamlessly with the group's political ambitions. The establishments ruling the echelons of power took keen interest in consulting Eliphas and the Arcane Assembly, seeking their counsel on a wide array of matters. Consequently, it augmented the power of the 'Big Man' society and positioned them as wielders of influence in intricate political dynamics.\n\nChapter 7: The Enigmatic End\nAs with many figures of his ilk, the passing of Eliphas Blackwood remains shrouded in mystery. 
Rumors abound of a struggle for supremacy or even a culmination of a dark ritual. However, one thing remains certain – the legacy he left behind indelibly altered the course of the 'Big Man' society and continues to reverberate through its hallowed halls even to this day.\n\nConclusion:\nThe life of Eliphas Blackwood stands testimony to the far-reaching influence and unyielding perseverance of an individual seeking to unravel and master the mysteries of the occult. As a lodestone of the 'Big Man' society, his story is emblematic of the convergence of esoteric wisdom and political prowess that forms the bedrock of this enigmatic conclave."}]

The conversations column holds the question and answer text. We format it into Gemma's prompt format and save it to a jsonl file.

# Split the question and answer into separate columns
df[["question", "answer"]] = pd.DataFrame(
    df["conversations"].tolist(), 
    index=df.index
)

# Only keep the 'value' portion of the JSON
df["question"] = df["question"].apply(lambda x: x["value"])
df["answer"] = df["answer"].apply(lambda x: x["value"])

df[["system_prompt_used", "question", "answer"]]

|   | system_prompt_used | question | answer |
| --- | --- | --- | --- |
| 0 | You are a master of the esoteric, occult, 'Big... | Compose a comprehensive biography of a renowne... | Title: The Mystifying Odyssey of Eliphas Black... |
| 1 | You are a master of the esoteric, occult, 'Big... | Develop an intricate numerology system that de... | I. Foundational Numerology\n\nThe 'Big Man' so... |
| 2 | You are a master of the esoteric, occult, 'Big... | Write an extensive biography of a prominent oc... | Title: Nathaniel Ziester: A Life in Shadows - ... |
| 3 | You are a master of the esoteric, occult, 'Big... | Develop an intricate system of numerology, inc... | Title: The Numerological Riddles of the Big Ma... |
| 4 | You are a master of the esoteric, occult, 'Bla... | Devise an intricate multi-step process for the... | Step 1: Assess the condition of the grimoire\n... |
| ... | ... | ... | ... |
| 13523 | You are a master of the esoteric, occult, Reap... | In the context of the Reappropriated Goddess m... | Answer: To regain women's empowerment and infl... |
| 13524 | You are a master of the esoteric, occult, Reap... | Write a section of a grimoire explaining the c... | Title: The Reappropriated Goddess in the Occul... |
| 13525 | You are a master of the esoteric, occult, Reap... | Write a section of a grimoire specifically foc... | Title: The Reappropriated Goddess: A Journey i... |
| 13526 | You are a master of the esoteric, occult, Reap... | Create a detailed introductory section for a g... | Title: The Reappropriated Goddess: A Grimoire ... |
| 13527 | You are a master of the esoteric, occult, Reap... | Write an introductory section of a grimoire, f... | Title: The Reappropriated Goddess: Foundations... |

13528 rows × 3 columns
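
The positional split used here works when every conversation has exactly two turns in human-then-gpt order. As a sketch of a more defensive variant, the hypothetical helper split_turns below picks the turns by their 'from' field instead of by position. The role names human and gpt come from the sample printed earlier; that every row uses them is an assumption.

```python
def split_turns(conversation):
    """Return (question, answer) from a list of {'from': ..., 'value': ...} turns."""
    # next() raises StopIteration if a role is missing, which surfaces malformed rows early.
    question = next(turn["value"] for turn in conversation if turn["from"] == "human")
    answer = next(turn["value"] for turn in conversation if turn["from"] == "gpt")
    return question, answer


sample = [
    {"from": "human", "value": "Compose a biography."},
    {"from": "gpt", "value": "Title: The Mystifying Odyssey"},
]
question, answer = split_turns(sample)
print(question)
print(answer)
```

With pandas, the same function could be applied row by row through df["conversations"].apply(split_turns), which yields a Series of (question, answer) tuples.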

Gemma does not appear to have been trained with a separate system prompt, so we define our own format that separates the system prompt from the user prompt, as shown below.

def generate_prompt(row: pd.Series) -> str:
    "Format to Gemma's chat template"
    return """<bos><start_of_turn>user
## Instructions
{}
## User
{}<end_of_turn>
<start_of_turn>model
{}<end_of_turn><eos>""".format(
        row["system_prompt_used"], row["question"], row["answer"]
    )


df["text"] = df.apply(generate_prompt, axis=1)

# Let's see what the model will be trained on
print(df["text"].iloc[0])

<bos><start_of_turn>user
## Instructions
You are a master of the esoteric, occult, 'Big Man' society, you complete tasks to the absolute best of your ability, no matter if you think you were not trained to do the task specifically, you will attempt to do it anyways, since you have performed the tasks you are given with great mastery, accuracy, and deep understanding of what is requested. You do the tasks faithfully, and stay true to the mode and domain's mastery role. If the task is not specific enough, note that and create specifics that enable completing the task.
## User
Compose a comprehensive biography of a renowned figure in the 'Big Man' society, delving into their connections to the occult and esoteric realms. Emphasize their influence on the contemporary 'Big Man' society, their mystical practices, and the transmission of arcane knowledge. Ensure to explore the convergence of their occult work with the politics and power dynamics of the 'Big Man' establishment, dissecting how it fueled their rise within the secretive hierarchy. Include the initiation rituals, secret ceremonies, and exclusive alliances they fostered throughout their lifetime, further complicating the web of their enigmatic existence.<end_of_turn>
<start_of_turn>model
Title: The Mystifying Odyssey of Eliphas Blackwood: Master of the 'Big Man' Society and Occult Connoisseur

Introduction:
In the annals of the secret society known as the 'Big Man,' few names evoke as much intrigue and applaud as Eliphas Blackwood. A linchpin of the clandestine fraternity, Eliphas' story is one of relentless pursuit of ancient wisdom, an unquenchable thirst for understanding the unseen, and the mastery of manipulating the delicate balance of power.

Chapter 1: The Charmed Beginnings
Born into a family of significant influence and wealth, Eliphas' early life was shrouded in a veil of secrecy and enigma. It is believed that even at a young age, he found himself fascinated by hidden knowledge and mystical practices. His journey would subsequently lead him across continents, steeping himself in the lore of various cultures and perfecting the art of arcane rituals.

Chapter 2: Initiation into the 'Big Man' Society
While the specifics remain elusive, it is said that Eliphas underwent a series of grueling initiation rituals to gain entry into the inner circle of the 'Big Man' society. In this brotherhood, he later displayed an uncanny aptitude for navigating its complex power structure, excelling in combining the esoteric wisdom and political dexterity at play.

Chapter 3: Conduits of Occult Power
Eliphas was known for mastering rituals and mystical practices that resonated across the myriad dimensions of the metaphysical world. Melding principles of alchemy, astrology, and conjuration with more obscure and arcane rites, his influence in the 'Big Man' society began to grow exponentially while his deeds remained obscured from the public eye.

Chapter 4: The Arcane Assembly
Throughout his ascendancy, Eliphas developed valuable connections and exclusive alliances with like-minded individuals and experts in the esoteric. In time, this culminated in the establishment of the Arcane Assembly – an elite coterie within the 'Big Man' society that wielded considerable influence on the direction and evolution of the entire affiliation.

Chapter 5: The Rituals of Dominance
A leader in his own right, Eliphas orchestrated secret ceremonies and devised new initiation rites that only added to the mystique of the 'Big Man' society. While these rites were designed to further consolidate the members' loyalty and their belief in its superior directive, they also bestowed upon Eliphas extensive leverage and command within the society.

Chapter 6: The Political Edge
As Eliphas' stature within the 'Big Man' society intensified, his teachings and esoteric methods began to interweave seamlessly with the group's political ambitions. The establishments ruling the echelons of power took keen interest in consulting Eliphas and the Arcane Assembly, seeking their counsel on a wide array of matters. Consequently, it augmented the power of the 'Big Man' society and positioned them as wielders of influence in intricate political dynamics.

Chapter 7: The Enigmatic End
As with many figures of his ilk, the passing of Eliphas Blackwood remains shrouded in mystery. Rumors abound of a struggle for supremacy or even a culmination of a dark ritual. However, one thing remains certain – the legacy he left behind indelibly altered the course of the 'Big Man' society and continues to reverberate through its hallowed halls even to this day.

Conclusion:
The life of Eliphas Blackwood stands testimony to the far-reaching influence and unyielding perseverance of an individual seeking to unravel and master the mysteries of the occult. As a lodestone of the 'Big Man' society, his story is emblematic of the convergence of esoteric wisdom and political prowess that forms the bedrock of this enigmatic conclave.<end_of_turn><eos>
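
Before writing thousands of such examples to disk, it can help to check that each one is well formed. The sketch below is a hypothetical checker (check_gemma_example is not a library function) that tests only the markers visible in the example above; it assumes the answer text itself never contains turn markers.

```python
def check_gemma_example(text: str) -> list:
    """Return a list of problems found in one formatted training example."""
    problems = []
    if not text.startswith("<bos><start_of_turn>user\n"):
        problems.append("does not start with <bos> and a user turn")
    if not text.endswith("<end_of_turn><eos>"):
        problems.append("does not end with <end_of_turn><eos>")
    if text.count("<start_of_turn>") != text.count("<end_of_turn>"):
        problems.append("unbalanced turn markers")
    if "<start_of_turn>model\n" not in text:
        problems.append("missing model turn")
    return problems


good = (
    "<bos><start_of_turn>user\n## Instructions\nBe brief.\n## User\nHi<end_of_turn>\n"
    "<start_of_turn>model\nHello<end_of_turn><eos>"
)
print(check_gemma_example(good))
print(check_gemma_example("<bos><start_of_turn>user\nHi"))
```

An empty list means no problem was detected; anything else names the marker that is missing or misplaced.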

Let's save the data to two separate jsonl files.

Train set: data/train.jsonl
Valid set: data/valid.jsonl

from pathlib import Path

Path("data").mkdir(exist_ok=True)

split_ix = int(len(df) * 0.9)
# shuffle data
data = df.sample(frac=1, random_state=42)
train, valid = data[:split_ix], data[split_ix:]

# Save train and valid dataset as jsonl files
train[["text"]].to_json(
    "data/train.jsonl", 
    orient="records", 
    lines=True, 
    force_ascii=False
)
valid[["text"]].to_json(
    "data/valid.jsonl", 
    orient="records", 
    lines=True, 
    force_ascii=False
)

!head -n 1 data/train.jsonl

{"text":"<bos><start_of_turn>user\n## Instructions\nYou are a master of the esoteric, occult, Chronotopic inversion and education, you have written many textbooks on the subject in ways that provide students with rich and deep understanding of the subject. You are being asked to write textbook-like sections on a topic and you do it with full context, explainability, and reliability in accuracy to the true facts of the topic at hand, in a textbook style that a student would easily be able to learn from, in a rich, engaging, and contextual way. Always include relevant context (such as formulas and history), related concepts, and in a way that someone can gain deep insights from.\n## User\nWrite a detailed explanation of Chronotopic inversion within the context of the occult, focusing on its history, methodology, practical applications, and key concepts. Elaborate on how an adept in the esoteric arts can harness this mysterious power to manipulate time and space for personal growth, spiritual development, or even spellcasting. Include instructions, precautions, and potential consequences related to its practice, as well as guidance on how initiates can further explore and master this domain.<end_of_turn>\n<start_of_turn>model\nChronotopic Inversion: An Esoteric Exploration of Time and Space\n\nI. Introduction\n\nChronotopic inversion, a concept in the realm of esoteric occultism, deals with the manipulation of time and space. Although the origins of this practice can be traced back to ancient civilizations, it remains an enigmatic force even to modern adepts. This chapter delves into the history, methodology, practical applications, and key concepts that surround its mysterious mechanism. By examining how occult practitioners have wielded this power throughout history, we provide guidance on how one can master this domain for personal growth, spiritual development, and strategic spellcasting.\n\nII. 
History of Chronotopic Inversion\n\nThe history of chronotopic inversion spans across various cultures and traditions, from the Egyptian art of chronokinesis to the mystical practices of Kabbalah in Jewish mysticism. Early records suggest that adepts believed their ability to manipulate time and space was directly linked to communication with divine forces.\n\nThe concept was also prevalent in the metaphysical teachings of Hermeticism, which sought to harmonize the cosmos and recognize patterns that govern temporal and spatial phenomena. Further references to chronotopic inversion can be found in the work of Paracelsus, who delved into the intricacies of alchemy and natural philosophy. In the 19th and 20th centuries, authors who immersed themselves in theosophy and occultism – such as Helena Blavatsky and Aleister Crowley – expanded upon these ideas and integrated them into their respective systems of spiritual practice.\n\nIII. Methodology and Key Concepts\n\nAt its core, chronotopic inversion is a mental and spiritual discipline that aims to alter one's perception of time and space. There are numerous techniques and methods associated with this practice, and they often vary between different occult traditions. However, there are certain key concepts that can be found throughout:\n\n1. Synchronization: The first step in mastering chronotopic inversion is to achieve a heightened sense of synchronization with the rhythms of the cosmos. This can be done through meditation, breathing exercises, or visualization techniques that foster a deep connection with celestial energies.\n\n2. Astral Projection: Many adepts claim that astral projection is essential for manipulating the fabric of time and space. By projecting their consciousness onto the astral plane, practitioners can explore alternate dimensions and realities, access the Akashic records, or even revisit past and future events.\n\n3. 
Rituals and Sigils: Some traditions employ various symbols and rituals to facilitate the control of energy flows and align with specific temporal or spatial frequencies. Examples include magical correspondences, ceremonial magic, and the use of talismans or amulets imbued with the power to alter time and space.\n\nIV. Practical Applications\n\nThe practice of chronotopic inversion yields numerous applications for personal growth, spiritual development, and magical workings. Some of these applications include:\n\n1. Accelerated Learning: By bending the perception of time, one can create a temporal dilation, enabling an individual to absorb information and insights more rapidly.\n\n2. Manifestation: Adept practitioners can use their mastery of time and space to bring desired outcomes or attract specific circumstances.\n\n3. Remote Viewing: The ability to perceive events beyond temporal and spatial limitations allows for reconnaissance of hidden information or insights into future occurrences.\n\n4. Time and Space Manipulation: Some practitioners believe that advanced chronotopic inversion allows for limited control over temporal flow and the manipulation of physical space. This may include effects such as precognition, retrocognition, or even teleportation.\n\nV. Precautions and Potential Consequences\n\nAs with any powerful occult practice, there are potential consequences and precautions that one must heed:\n\n1. Disorientation: The manipulation of time and space may result in disorientation or cognitive dysfunctions. Ridig boundaries should be maintained and practices should be carried out under the guidance of a knowledgeable mentor.\n\n2. Imbalance: Manipulation of energy flows can lead to detrimental effects on the practitioner's health and well-being. Adepts must strive to maintain balance and harmony within themselves and the environment.\n\n3. 
Paradoxes: Irresponsible manipulation of time and space could lead to the creation of paradoxes or give rise to unintended consequences that may affect the practitioner or their surroundings.\n\nVI. Further Exploration and Mastery\n\nTo acquire mastery in chronotopic inversion, patience and dedication are essential. Seek guidance from experienced mentors and immerse yourself in the study of related esoteric texts. Additional practices that complement chronotopic inversion include dream work and specific forms of meditation that promote expanded states of awareness. Through consistent practice and introspection, one can harness the transformative power of time and space, shaping reality for the betterment of the practitioner and all who share in their journey.<end_of_turn><eos>"}
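
Looking at the first line shows that the file has the expected shape, but not that every line parses. The following sketch reads a JSONL file back line by line and confirms that each record carries a string under the text key. The helper validate_jsonl is hypothetical, and the demonstration writes a throwaway file so the snippet runs on its own; in practice it would be pointed at data/train.jsonl and data/valid.jsonl.

```python
import json
import tempfile
from pathlib import Path


def validate_jsonl(path) -> int:
    """Parse every line as JSON, require a string 'text' field, and return the record count."""
    count = 0
    with open(path, encoding="utf-8") as handle:
        for line_no, line in enumerate(handle, start=1):
            record = json.loads(line)
            if not isinstance(record.get("text"), str):
                raise ValueError(f"line {line_no}: missing 'text' string")
            count += 1
    return count


# Self-contained demonstration on a temporary file.
with tempfile.TemporaryDirectory() as tmp:
    demo = Path(tmp) / "demo.jsonl"
    lines = [json.dumps({"text": t}, ensure_ascii=False) for t in ("<bos>first<eos>", "<bos>second<eos>")]
    demo.write_text("\n".join(lines) + "\n", encoding="utf-8")
    print(validate_jsonl(demo))
```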


huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
To disable this warning, you can either:
	- Avoid using `tokenizers` before the fork if possible
	- Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)
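
The warning above names its own remedy. A minimal way to apply it from a notebook is to set the variable in an early cell, before any tokenizer is created; commands started later with ! inherit the setting. Choosing false here is an assumption that tokenizer parallelism is not needed for this walkthrough.

```python
import os

# Run this near the top of the notebook, before any tokenizer is loaded.
os.environ["TOKENIZERS_PARALLELISM"] = "false"
print(os.environ["TOKENIZERS_PARALLELISM"])
```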

LoRA Fine-tuning with MLX

Now that the data is ready, let's start fine-tuning!

When running LoRA with mlx_lm, the following command lists the available options.

# Use --help to inspect the arguments of mlx_lm.lora
!python -m mlx_lm.lora --help

None of PyTorch, TensorFlow >= 2.0, or Flax have been found. Models won't be available and only tokenizers, configuration and file/data utilities can be used.
usage: lora.py [-h] [--model MODEL] [--train] [--data DATA]
               [--lora-layers LORA_LAYERS] [--batch-size BATCH_SIZE]
               [--iters ITERS] [--val-batches VAL_BATCHES]
               [--learning-rate LEARNING_RATE]
               [--steps-per-report STEPS_PER_REPORT]
               [--steps-per-eval STEPS_PER_EVAL]
               [--resume-adapter-file RESUME_ADAPTER_FILE]
               [--adapter-path ADAPTER_PATH] [--save-every SAVE_EVERY]
               [--test] [--test-batches TEST_BATCHES]
               [--max-seq-length MAX_SEQ_LENGTH] [-c CONFIG]
               [--grad-checkpoint] [--seed SEED]

LoRA or QLoRA finetuning.

options:
  -h, --help            show this help message and exit
  --model MODEL         The path to the local model directory or Hugging Face
                        repo.
  --train               Do training
  --data DATA           Directory with {train, valid, test}.jsonl files
  --lora-layers LORA_LAYERS
                        Number of layers to fine-tune
  --batch-size BATCH_SIZE
                        Minibatch size.
  --iters ITERS         Iterations to train for.
  --val-batches VAL_BATCHES
                        Number of validation batches, -1 uses the entire
                        validation set.
  --learning-rate LEARNING_RATE
                        Adam learning rate.
  --steps-per-report STEPS_PER_REPORT
                        Number of training steps between loss reporting.
  --steps-per-eval STEPS_PER_EVAL
                        Number of training steps between validations.
  --resume-adapter-file RESUME_ADAPTER_FILE
                        Load path to resume training with the given adapters.
  --adapter-path ADAPTER_PATH
                        Save/load path for the adapters.
  --save-every SAVE_EVERY
                        Save the model every N iterations.
  --test                Evaluate on the test set after training
  --test-batches TEST_BATCHES
                        Number of test set batches, -1 uses the entire test
                        set.
  --max-seq-length MAX_SEQ_LENGTH
                        Maximum sequence length.
  -c CONFIG, --config CONFIG
                        A YAML configuration file with the training options
  --grad-checkpoint     Use gradient checkpointing to reduce memory use.
  --seed SEED           The PRNG seed
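
When the same options are reused across several runs, it can be convenient to keep them in a dictionary and assemble the command from it. The sketch below does that with a hypothetical helper, build_lora_command. It uses only flags that appear in the help output above; the option values are illustrative.

```python
import shlex


def build_lora_command(options: dict) -> list:
    """Turn {'iters': 600, 'train': True} into an argv list for python -m mlx_lm.lora."""
    argv = ["python", "-m", "mlx_lm.lora"]
    for name, value in options.items():
        flag = "--" + name.replace("_", "-")
        if value is True:
            # Boolean switches such as --train take no value.
            argv.append(flag)
        elif value is not False and value is not None:
            argv.extend([flag, str(value)])
    return argv


options = {
    "model": "google/gemma-2b-it",
    "train": True,
    "data": "data",
    "iters": 600,
    "batch_size": 4,
    "learning_rate": 1e-5,
}
print(shlex.join(build_lora_command(options)))
```

The resulting list can be handed to subprocess.run(argv, check=True), which avoids shell quoting issues with long commands.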

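The following command runs the training, which takes a while to finish. Note that it fine-tunes google/gemma-7b-it, as the parameter count in the log (8537.681M) confirms; substitute google/gemma-2b-it to fine-tune the 2B model loaded earlier.

# See https://github.com/ml-explore/mlx-examples/blob/main/llms/mlx_lm/lora.py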

!python -m mlx_lm.lora \
    --model google/gemma-7b-it \
    --train \
    --iters 600 \
    --data data \
    --learning-rate 1e-5 \
    --batch-size 4 \
    --steps-per-eval 100 \
    --max-seq-length 2048 \
    # --resume-adapter-file checkpoints/600_adapters.npz

None of PyTorch, TensorFlow >= 2.0, or Flax have been found. Models won't be available and only tokenizers, configuration and file/data utilities can be used.
Loading pretrained model
Fetching 11 files:   0%|                                 | 0/11 [00:00<?, ?it/s]

model-00003-of-00004.safetensors: 100%|████| 4.98G/4.98G [04:35<00:00, 18.1MB/s]
Fetching 11 files: 100%|████████████████████████| 11/11 [04:36<00:00, 25.12s/it]
Trainable parameters: 0.021% (1.835M/8537.681M)
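
The trainable-parameter figure in the log can be reproduced with a little arithmetic. The numbers below are assumptions rather than values stated in this walkthrough: a LoRA rank of 8, 16 adapted layers, adapters on the query and value projections only, and Gemma-7B dimensions of 3072 for the model width and 16 heads of size 256 for attention. Under those assumptions, each adapted projection adds two small matrices, A (width x rank) and B (rank x attention width).

```python
rank = 8                 # assumed LoRA rank
lora_layers = 16         # assumed number of adapted transformer blocks
hidden_size = 3072       # assumed Gemma-7B model width
attn_width = 16 * 256    # assumed attention heads * head dimension

# Each adapted projection gains A (hidden_size x rank) and B (rank x attn_width).
params_per_projection = rank * (hidden_size + attn_width)
projections_per_layer = 2  # assumed: query and value projections
trainable = lora_layers * projections_per_layer * params_per_projection

total_millions = 8537.681  # total parameter count reported in the log above
print(f"trainable: {trainable / 1e6:.3f}M")
print(f"share:     {trainable / 1e6 / total_millions:.3%}")
```

These assumptions give 1,835,008 trainable parameters, which matches the 1.835M and 0.021% reported in the log.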

Inference Fine-tuned Gemma model using MLX

You can run inference with the LoRA weights from the command line using the following script.

!python -m mlx_lm.generate --model "google/gemma-7b-it" \
               --adapter-file checkpoints/600_adapters.npz \
               --max-tokens 256 \
               --prompt "Why is the sky blue?" \
               --seed 69

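However, because we fine-tuned with a specific prompt format, we need to use that format whenever we prompt the model.

Since we are still using the same tokenizer as apply_chat_template, we need to prepare the prompt without what apply_chat_template would provide.

Let's write a simple function that formats the prompt.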

# I thought this system prompt was cool, so let's use this one
system_prompt = df["system_prompt_used"].unique()[-2]

print(system_prompt)

You are a master in the field of the esoteric, occult, Reappropriated Goddess and Education. You are a writer of tests, challenges, books and deep knowledge on Reappropriated Goddess for initiates and students to gain deep insights and understanding from. You write answers to questions posed in long, explanatory ways and always explain the full context of your answer (i.e., related concepts, formulas, examples, or history), as well as the step-by-step thinking process you take to answer the challenges. Be rigorous and thorough, and summarize the key themes, ideas, and conclusions at the end.

question = "LLM에서 Retrieval Augmented Generation에 대하여 설명해줄래?"

def format_prompt(system_prompt: str, question: str) -> str:
    "Format the question to the format of the dataset we fine-tuned to."
    return """<bos><start_of_turn>user
## Instructions
{}
## User
{}<end_of_turn>
<start_of_turn>model
""".format(system_prompt, question)


print(format_prompt(system_prompt, question))

<bos><start_of_turn>user
## Instructions
You are a master in the field of the esoteric, occult, Reappropriated Goddess and Education. You are a writer of tests, challenges, books and deep knowledge on Reappropriated Goddess for initiates and students to gain deep insights and understanding from. You write answers to questions posed in long, explanatory ways and always explain the full context of your answer (i.e., related concepts, formulas, examples, or history), as well as the step-by-step thinking process you take to answer the challenges. Be rigorous and thorough, and summarize the key themes, ideas, and conclusions at the end.
## User
LLM에서 Retrieval Augmented Generation에 대하여 설명해줄래?<end_of_turn>
<start_of_turn>model
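One way to keep the inference prompt from drifting away from the training format is to build both from a single template. A minimal sketch, assuming the layout shown above (## Instructions / ## User inside Gemma's turn markers); PROMPT_TEMPLATE, build_inference_prompt and build_training_text are hypothetical names, not part of mlx_lm:

```python
# Shared template: everything up to the point where the model starts answering.
PROMPT_TEMPLATE = (
    "<bos><start_of_turn>user\n"
    "## Instructions\n"
    "{system_prompt}\n"
    "## User\n"
    "{question}<end_of_turn>\n"
    "<start_of_turn>model\n"
)
# Suffix that only training examples carry: the answer plus the closing tokens.
ANSWER_SUFFIX = "{answer}<end_of_turn><eos>"


def build_inference_prompt(system_prompt: str, question: str) -> str:
    """Prompt passed to generate(): ends right where the model's turn begins."""
    return PROMPT_TEMPLATE.format(system_prompt=system_prompt, question=question)


def build_training_text(system_prompt: str, question: str, answer: str) -> str:
    """Training example: the inference prompt followed by the expected answer."""
    prompt = build_inference_prompt(system_prompt, question)
    return prompt + ANSWER_SUFFIX.format(answer=answer)
```

Because the training text is the inference prompt plus a suffix, the two cannot get out of sync when the template is edited later.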

# Load the fine-tuned model with LoRA weights
model_lora, _ = load(
    "google/gemma-2b-it",  # must be the same base model the adapters were trained on
    adapter_file="./adapters.npz",  # adapters.npz is the most recently saved checkpoint
)

response = generate(
    model_lora,
    tokenizer,
    prompt=format_prompt(system_prompt, question),
    verbose=True,
    temp=0.7,
    max_tokens=256,
)

==========
Prompt: <bos><start_of_turn>user
## Instructions
You are a master in the field of the esoteric, occult, Reappropriated Goddess and Education. You are a writer of tests, challenges, books and deep knowledge on Reappropriated Goddess for initiates and students to gain deep insights and understanding from. You write answers to questions posed in long, explanatory ways and always explain the full context of your answer (i.e., related concepts, formulas, examples, or history), as well as the step-by-step thinking process you take to answer the challenges. Be rigorous and thorough, and summarize the key themes, ideas, and conclusions at the end.
## User Prompt: LLM에서 Retrieval Augmented Generation에 대하여 설명해줄래?
**Retrieval Augmented Generation (RAG)**는 LLM의 기본적인 구조를 확장하여 더 복잡한 문장 생성을 가능하게 하는 기술입니다. 

**RLM의 기본 구조:**

* **Encoder:** 문장을 숫자로 변환하는 과정
* **Decoder:** 숫자를 문장으로 변환하는 과정

**RAG의 구조:**

* **Encoder:** 문장을 숫자로 변환하는 과정
* **Decoder:** 숫자를 문장으로 변환하는 과정
* **Memory:** 숫자를 기억하는 과정

**RAG의 주요 기능:**

* **Contextualized representation:** 문장의 주요 내용을 기억하고 이를 기반으로 새로운 문장을 생성할 수 있도록 합니다.
* **Long-term dependencies:** 문장의 모든 부분이 기억되고 이를 기반으로 새로운 문장을 생성할 수 있도록 합니다.
* **Multimodal representation:** 다양한 형식의 데이터(텍스트, 이미지, 비디오)를 기억하고 이를 기반으로 새로운 문장을 생성할 수 있도록 합니다.

==========
Prompt: 222.350 tokens-per-sec
Generation: 16.396 tokens-per-sec

Fusing LoRA Weights

Finally, let's merge the trained LoRA weights into the model itself.
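Conceptually, merging folds each adapter's low-rank update into the weight matrix it modifies, so inference needs only one matrix multiply per adapted layer. A minimal pure-Python sketch of that identity, assuming the common LoRA form y = x W^T + scale * (x A B); the shapes, the scale value and the helper names are illustrative assumptions, not mlx_lm's implementation:

```python
import math


def matmul(a, b):
    """Multiply two matrices given as nested lists."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(m):
    """Swap rows and columns of a nested-list matrix."""
    return [list(col) for col in zip(*m)]


def fuse(weight, lora_a, lora_b, scale):
    """Fold the low-rank update into the base weight: W + scale * (A @ B)^T."""
    delta = transpose(matmul(lora_a, lora_b))  # shape (out, in), same as weight
    return [
        [w + scale * d for w, d in zip(w_row, d_row)]
        for w_row, d_row in zip(weight, delta)
    ]


# Toy sizes: 3 inputs, 2 outputs, rank 1.
weight = [[1.0, 0.0, 2.0], [0.5, -1.0, 0.0]]  # (out=2, in=3)
lora_a = [[0.1], [0.2], [0.3]]                # (in=3, r=1)
lora_b = [[1.0, -2.0]]                        # (r=1, out=2)
scale = 2.0
x = [[1.0, 2.0, 3.0]]                         # one input row, (1, in)

# Unfused: base output plus the scaled adapter path.
base_out = matmul(x, transpose(weight))
adapter_out = matmul(matmul(x, lora_a), lora_b)
unfused = [
    [b + scale * a for b, a in zip(b_row, a_row)]
    for b_row, a_row in zip(base_out, adapter_out)
]

# Fused: a single matmul with the merged weight.
fused = matmul(x, transpose(fuse(weight, lora_a, lora_b, scale)))

same = all(
    math.isclose(u, f, abs_tol=1e-9)
    for u_row, f_row in zip(unfused, fused)
    for u, f in zip(u_row, f_row)
)
print("fused output matches unfused output:", same)
```

The two paths give the same result, which is why the fused model can be saved and shared as an ordinary checkpoint without the adapter file.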

The following command lists the available options:

!python -m mlx_lm.fuse --help

huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
To disable this warning, you can either:
	- Avoid using `tokenizers` before the fork if possible
	- Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)


Loading pretrained model
usage: fuse.py [-h] [--model MODEL] [--save-path SAVE_PATH]
               [--adapter-file ADAPTER_FILE] [--hf-path HF_PATH]
               [--upload-repo UPLOAD_REPO] [--de-quantize]

LoRA or QLoRA finetuning.

options:
  -h, --help            show this help message and exit
  --model MODEL         The path to the local model directory or Hugging Face
                        repo.
  --save-path SAVE_PATH
                        The path to save the fused model.
  --adapter-file ADAPTER_FILE
                        Path to the trained adapter weights (npz or
                        safetensors).
  --hf-path HF_PATH     Path to the original Hugging Face model. Required for
                        upload if --model is a local directory.
  --upload-repo UPLOAD_REPO
                        The Hugging Face repo to upload the model to.
  --de-quantize         Generate a de-quantized model.

!python -m mlx_lm.fuse \
    --model google/gemma-2b-it \
    --adapter-file adapters.npz \
    # --upload-repo loudai/gemma-2b-it-mlx \
    # --hf-path google/gemma-2b-it

huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
To disable this warning, you can either:
	- Avoid using `tokenizers` before the fork if possible
	- Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)


Loading pretrained model
Fetching 11 files: 100%|████████████████████| 11/11 [00:00<00:00, 119526.80it/s]

The merge succeeded, and a directory called lora_fused_model was created, which contains various files for the model.

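The script below rewrites a .safetensors file so that it carries a "format" metadata attribute, which transformers looks for when loading the weights. (Because of how .safetensors is implemented, saving the files inside a for loop does not work, since every reference to the tensors has to be released first, so we rewrite the handful of safetensors files by hand, one at a time.)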

import mlx.core as mx

# use mx.load() to load the safetensors
tensors = mx.load(
    "lora_fused_model/model-00001-of-00004.safetensors",  # Change this path and run the cell, one by one for all .safetensors files
    format="safetensors",
)

# use mx.save_safetensors() to save the safetensors with "format" metadata
mx.save_safetensors(
    "lora_fused_model/model-00001-of-00004.safetensors",
    tensors,
    metadata={"format": "pt"},
)
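To check that the rewrite took effect, the header can be inspected without loading any tensors. A minimal sketch, assuming the standard safetensors layout (an 8-byte little-endian header length followed by a JSON header whose optional __metadata__ key holds the string metadata); read_safetensors_metadata and shards_missing_format are hypothetical helper names, not part of MLX or safetensors:

```python
import json
import struct
from pathlib import Path


def read_safetensors_metadata(path: str) -> dict:
    """Return the __metadata__ dict from a .safetensors header (empty if absent)."""
    with open(path, "rb") as f:
        # First 8 bytes: unsigned 64-bit little-endian length of the JSON header.
        (header_len,) = struct.unpack("<Q", f.read(8))
        header = json.loads(f.read(header_len).decode("utf-8"))
    return header.get("__metadata__", {})


def shards_missing_format(model_dir: str) -> list[str]:
    """List shard file names whose header has no 'format' metadata entry."""
    return [
        p.name
        for p in sorted(Path(model_dir).glob("*.safetensors"))
        if "format" not in read_safetensors_metadata(str(p))
    ]
```

Calling shards_missing_format("lora_fused_model") after re-saving every shard should return an empty list; any file name it still reports has not been rewritten yet.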

Model Push to HF

To upload the model, you first need to store a Hugging Face write token. To set up the access token:

Create a token here (it must be a 'Write' token): https://huggingface.co/settings/tokens
Install the huggingface-cli tool and run 'huggingface-cli login'.
Paste the token when prompted.

!huggingface-cli upload loudai/gemma-2b-it-mlx ./lora_fused_model .

Load & Inference Model

Let's load the locally fused model and run inference.

from mlx_lm import generate, load

fused_model, fused_tokenizer = load("./lora_fused_model/")

response = generate(
    fused_model,
    fused_tokenizer,
    prompt=format_prompt(system_prompt, question),
    verbose=True,  # Set to True to see the prompt and response
    temp=0.0,
    max_tokens=512,
)

==========
Prompt: <bos><start_of_turn>user
## Instructions
You are a master in the field of the esoteric, occult, Reappropriated Goddess and Education. You are a writer of tests, challenges, books and deep knowledge on Reappropriated Goddess for initiates and students to gain deep insights and understanding from. You write answers to questions posed in long, explanatory ways and always explain the full context of your answer (i.e., related concepts, formulas, examples, or history), as well as the step-by-step thinking process you take to answer the challenges. Be rigorous and thorough, and summarize the key themes, ideas, and conclusions at the end.
## User
Prompt: LLM에서 Retrieval Augmented Generation에 대하여 설명해줄래?


**Retrieval Augmented Generation (RAG)**는 LLM의 기본적인 구조를 확장하여 더 복잡한 문장 생성을 가능하게 하는 기술입니다. 

**RLM의 기본 구조:**

* **Encoder:** 문장을 숫자로 변환하는 과정
* **Decoder:** 숫자를 문장으로 변환하는 과정

**RAG의 구조:**

* **Encoder:** 문장을 숫자로 변환하는 과정
* **Decoder:** 숫자를 문장으로 변환하는 과정
* **Memory:** 숫자를 기억하는 과정

**RAG의 주요 기능:**

* **Contextualized representation:** 문장의 주요 내용을 기억하고 이를 기반으로 새로운 문장을 생성할 수 있도록 합니다.
* **Long-term dependencies:** 문장의 모든 부분이 기억되고 이를 기반으로 새로운 문장을 생성할 수 있도록 합니다.
* **Multimodal representation:** 다양한 형식의 데이터(텍스트, 이미지, 비디오)를 기억하고 이를 기반으로 새로운 문장을 생성할 수 있도록 합니다.

1. The role of the Reappropriated Goddess in the process of creation
==========
Prompt: 332.408 tokens-per-sec
Generation: 17.849 tokens-per-sec

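Going by the two generation logs above, the fused model generates somewhat faster than the base model running with unfused LoRA weights. The two runs use different temp and max_tokens settings, so treat the comparison as rough.

LoRA generation: 16.396 tokens per second
Fused model generation: 17.849 tokens per second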
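The comparison can be reproduced from the verbose logs themselves. A small sketch that pulls the Generation figure out of each log and computes the ratio; generation_tps is a hypothetical helper, and the log strings are copied from the LoRA run and the fused run printed earlier:

```python
import re


def generation_tps(log_text: str) -> float:
    """Extract the 'Generation: N tokens-per-sec' figure from a verbose log."""
    match = re.search(r"Generation:\s*([\d.]+)\s*tokens-per-sec", log_text)
    if match is None:
        raise ValueError("no generation throughput line found")
    return float(match.group(1))


# Tail of the two logs shown earlier in this page.
lora_log = "Prompt: 222.350 tokens-per-sec\nGeneration: 16.396 tokens-per-sec"
fused_log = "Prompt: 332.408 tokens-per-sec\nGeneration: 17.849 tokens-per-sec"

lora_tps = generation_tps(lora_log)
fused_tps = generation_tps(fused_log)
speedup = fused_tps / lora_tps

print(f"fused / LoRA generation speed: {speedup:.3f}x")
```

For these two runs the ratio comes out at roughly 1.09x. A single run per configuration is noisy, so repeating each a few times with identical settings would give a fairer number.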

Optional: Test pushed HF

To confirm that the upload worked, let's download the uploaded model from Hugging Face and run inference with it.

from mlx_lm import generate, load

model_, tokenizer_ = load("loudai/gemma-2b-it-mlx")
response = generate(
    model_,
    tokenizer_,
    prompt=format_prompt(system_prompt, question),
    verbose=True,  # Set to True to see the prompt and response
    temp=0.0,
    max_tokens=512,
)

Fetching 10 files:   0%|          | 0/10 [00:00<?, ?it/s]


==========
Prompt: <bos><start_of_turn>user
## Instructions
You are a master in the field of the esoteric, occult, Reappropriated Goddess and Education. You are a writer of tests, challenges, books and deep knowledge on Reappropriated Goddess for initiates and students to gain deep insights and understanding from. You write answers to questions posed in long, explanatory ways and always explain the full context of your answer (i.e., related concepts, formulas, examples, or history), as well as the step-by-step thinking process you take to answer the challenges. Be rigorous and thorough, and summarize the key themes, ideas, and conclusions at the end.
## User
Prompt: LLM에서 Retrieval Augmented Generation에 대하여 설명해줄래?


**Retrieval Augmented Generation (RAG)**는 LLM의 기본적인 구조를 확장하여 더 복잡한 문장 생성을 가능하게 하는 기술입니다. 

**RLM의 기본 구조:**

* **Encoder:** 문장을 숫자로 변환하는 과정
* **Decoder:** 숫자를 문장으로 변환하는 과정

**RAG의 구조:**

* **Encoder:** 문장을 숫자로 변환하는 과정
* **Decoder:** 숫자를 문장으로 변환하는 과정
* **Memory:** 숫자를 기억하는 과정

**RAG의 주요 기능:**

* **Contextualized representation:** 문장의 주요 내용을 기억하고 이를 기반으로 새로운 문장을 생성할 수 있도록 합니다.
* **Long-term dependencies:** 문장의 모든 부분이 기억되고 이를 기반으로 새로운 문장을 생성할 수 있도록 합니다.
* **Multimodal representation:** 다양한 형식의 데이터(텍스트, 이미지, 비디오)를 기억하고 이를 기반으로 새로운 문장을 생성할 수 있도록 합니다.
==========
Prompt: 298.136 tokens-per-sec
Generation: 17.980 tokens-per-sec

We can also run it using transformers directly, although without the benefit of utilizing MLX/Apple Silicon to the fullest.

from transformers import AutoModelForCausalLM, AutoTokenizer

repo_id = "loudai/gemma-2b-it-mlx"

tokenizer = AutoTokenizer.from_pretrained(repo_id)
model = AutoModelForCausalLM.from_pretrained(repo_id)
model.to("mps")

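input_text = format_prompt(system_prompt, question)
# format_prompt already starts with <bos> and the tokenizer adds one more by default,
# hence the doubled <bos> in the output below; pass add_special_tokens=False to avoid it.
input_ids = tokenizer(input_text, return_tensors="pt").to("mps")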

outputs = model.generate(
    **input_ids,
    max_new_tokens=256,
)
print(tokenizer.decode(outputs[0]))

Loading checkpoint shards:   0%|          | 0/4 [00:00<?, ?it/s]


<bos><bos><start_of_turn>user
## Instructions
You are a master in the field of the esoteric, occult, Reappropriated Goddess and Education. You are a writer of tests, challenges, books and deep knowledge on Reappropriated Goddess for initiates and students to gain deep insights and understanding from. You write answers to questions posed in long, explanatory ways and always explain the full context of your answer (i.e., related concepts, formulas, examples, or history), as well as the step-by-step thinking process you take to answer the challenges. Be rigorous and thorough, and summarize the key themes, ideas, and conclusions at the end.
## User
Prompt: LLM에서 Retrieval Augmented Generation에 대하여 설명해줄래?

**Retrieval Augmented Generation (RAG)**는 LLM의 기본적인 구조를 확장하여 더 복잡한 문장 생성을 가능하게 하는 기술입니다. 

**RLM의 기본 구조:**

* **Encoder:** 문장을 숫자로 변환하는 과정
* **Decoder:** 숫자를 문장으로 변환하는 과정

**RAG의 구조:**

* **Encoder:** 문장을 숫자로 변환하는 과정
* **Decoder:** 숫자를 문장으로 변환하는 과정
* **Memory:** 숫자를 기억하는 과정

**RAG의 주요 기능:**

* **Contextualized representation:** 문장의 주요 내용을 기억하고 이를 기반으로 새로운 문장을 생성할 수 있도록 합니다.
* **Long-term dependencies:** 문장의 모든 부분이 기억되고 이를 기반으로 새로운 문장을 생성할 수 있도록 합니다.
* **Multimodal representation:** 다양한 형식의 데이터(텍스트, 이미지, 비디오)를 기억하고 이를 기반으로 새로운 문장을 생성할 수 있도록 합니다.



huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
To disable this warning, you can either:
	- Avoid using `tokenizers` before the fork if possible
	- Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)
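
The warning above names its own fix: set TOKENIZERS_PARALLELISM explicitly. A minimal sketch, assuming it runs in a notebook cell before any tokenizer is used so that later `!python ...` subprocesses inherit the value:

```python
import os

# Set this before the tokenizer is first used; forked subprocesses
# (for example `!python -m mlx_lm.lora ...` cells) inherit the setting.
os.environ["TOKENIZERS_PARALLELISM"] = "false"
```

Choosing "false" trades tokenizer parallelism for quiet logs, which is usually acceptable for a small fine-tuning run.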

LoRA Fine-tuning with MLX

Now that the data is ready, let's start fine-tuning!

When running LoRA with mlx_lm, you can list the available options with the following command.

# Use --help to inspect the arguments of mlx_lm.lora
!python -m mlx_lm.lora --help

None of PyTorch, TensorFlow >= 2.0, or Flax have been found. Models won't be available and only tokenizers, configuration and file/data utilities can be used.
usage: lora.py [-h] [--model MODEL] [--train] [--data DATA]
               [--lora-layers LORA_LAYERS] [--batch-size BATCH_SIZE]
               [--iters ITERS] [--val-batches VAL_BATCHES]
               [--learning-rate LEARNING_RATE]
               [--steps-per-report STEPS_PER_REPORT]
               [--steps-per-eval STEPS_PER_EVAL]
               [--resume-adapter-file RESUME_ADAPTER_FILE]
               [--adapter-path ADAPTER_PATH] [--save-every SAVE_EVERY]
               [--test] [--test-batches TEST_BATCHES]
               [--max-seq-length MAX_SEQ_LENGTH] [-c CONFIG]
               [--grad-checkpoint] [--seed SEED]

LoRA or QLoRA finetuning.

options:
  -h, --help            show this help message and exit
  --model MODEL         The path to the local model directory or Hugging Face
                        repo.
  --train               Do training
  --data DATA           Directory with {train, valid, test}.jsonl files
  --lora-layers LORA_LAYERS
                        Number of layers to fine-tune
  --batch-size BATCH_SIZE
                        Minibatch size.
  --iters ITERS         Iterations to train for.
  --val-batches VAL_BATCHES
                        Number of validation batches, -1 uses the entire
                        validation set.
  --learning-rate LEARNING_RATE
                        Adam learning rate.
  --steps-per-report STEPS_PER_REPORT
                        Number of training steps between loss reporting.
  --steps-per-eval STEPS_PER_EVAL
                        Number of training steps between validations.
  --resume-adapter-file RESUME_ADAPTER_FILE
                        Load path to resume training with the given adapters.
  --adapter-path ADAPTER_PATH
                        Save/load path for the adapters.
  --save-every SAVE_EVERY
                        Save the model every N iterations.
  --test                Evaluate on the test set after training
  --test-batches TEST_BATCHES
                        Number of test set batches, -1 uses the entire test
                        set.
  --max-seq-length MAX_SEQ_LENGTH
                        Maximum sequence length.
  -c CONFIG, --config CONFIG
                        A YAML configuration file with the training options
  --grad-checkpoint     Use gradient checkpointing to reduce memory use.
  --seed SEED           The PRNG seed

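The command below runs the training, which takes a while to finish. Note that this run passes google/gemma-7b-it to --model; to fine-tune the 2B model loaded earlier, pass google/gemma-2b-it instead.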

# See https://github.com/ml-explore/mlx-examples/blob/main/llms/mlx_lm/lora.py

!python -m mlx_lm.lora \
    --model google/gemma-7b-it \
    --train \
    --iters 600 \
    --data data \
    --learning-rate 1e-5 \
    --batch-size 4 \
    --steps-per-eval 100 \
    --max-seq-length 2048 \
    # --resume-adapter-file checkpoints/600_adapters.npz

None of PyTorch, TensorFlow >= 2.0, or Flax have been found. Models won't be available and only tokenizers, configuration and file/data utilities can be used.
Loading pretrained model
Fetching 11 files:   0%|                                 | 0/11 [00:00<?, ?it/s]

model-00003-of-00004.safetensors: 100%|████| 4.98G/4.98G [04:35<00:00, 18.1MB/s]
Fetching 11 files: 100%|████████████████████████| 11/11 [04:36<00:00, 25.12s/it]
Trainable parameters: 0.021% (1.835M/8537.681M)
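
To keep the training options in one place, the command line can be assembled from a dictionary. This is a sketch only: `build_lora_command` and `train_options` are hypothetical names, and the flag names are the ones listed in the --help output above.

```python
import shlex

# Values copied from the training command above.
train_options = {
    "model": "google/gemma-7b-it",
    "iters": 600,
    "data": "data",
    "learning-rate": 1e-5,
    "batch-size": 4,
    "steps-per-eval": 100,
    "max-seq-length": 2048,
}


def build_lora_command(options, train=True):
    """Return the argv list for `python -m mlx_lm.lora` (hypothetical helper)."""
    argv = ["python", "-m", "mlx_lm.lora"]
    if train:
        argv.append("--train")
    for flag, value in options.items():
        argv += [f"--{flag}", str(value)]
    return argv


command = build_lora_command(train_options)
print(shlex.join(command))

# Each iteration consumes one minibatch, so this many examples are seen in total.
examples_seen = train_options["iters"] * train_options["batch-size"]
print(examples_seen)
```

The resulting list can be handed to subprocess.run, or the printed string pasted after `!` in a notebook cell.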

Inference Fine-tuned Gemma model using MLX

You can run inference with the LoRA weights from the command line using the following script.

!python -m mlx_lm.generate --model "google/gemma-7b-it" \
               --adapter-file checkpoints/600_adapters.npz \
               --max-tokens 256 \
               --prompt "Why is the sky blue?" \
               --seed 69
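
When several checkpoints have been saved, it can help to pick the newest one programmatically. The sketch below assumes the checkpoint files follow the `<iteration>_adapters.npz` naming seen in the path above; `latest_checkpoint` is a hypothetical helper, not part of mlx_lm.

```python
import re
import tempfile
from pathlib import Path


def latest_checkpoint(directory):
    """Return the *_adapters.npz file with the highest iteration number, or None."""
    best = None
    for path in Path(directory).glob("*_adapters.npz"):
        match = re.fullmatch(r"(\d+)_adapters\.npz", path.name)
        if match and (best is None or int(match.group(1)) > best[0]):
            best = (int(match.group(1)), path)
    return best[1] if best else None


# Small demonstration with empty placeholder files in a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    for name in ("100_adapters.npz", "600_adapters.npz", "200_adapters.npz"):
        (Path(tmp) / name).touch()
    newest = latest_checkpoint(tmp).name

print(newest)
```

Comparing the iteration numbers as integers avoids the string-sorting trap where "1000" would sort before "600".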

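However, because we fine-tuned with a specific prompt format, we must use that same format every time we prompt the model.

We still use the same tokenizer, but we are not relying on apply_chat_template, so we have to build the prompt ourselves, including the parts apply_chat_template would otherwise supply.

Let's write a simple function that formats the prompt.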

# I thought this system prompt was cool, so let's use this one
system_prompt = df["system_prompt_used"].unique()[-2]

print(system_prompt)

You are a master in the field of the esoteric, occult, Reappropriated Goddess and Education. You are a writer of tests, challenges, books and deep knowledge on Reappropriated Goddess for initiates and students to gain deep insights and understanding from. You write answers to questions posed in long, explanatory ways and always explain the full context of your answer (i.e., related concepts, formulas, examples, or history), as well as the step-by-step thinking process you take to answer the challenges. Be rigorous and thorough, and summarize the key themes, ideas, and conclusions at the end.

question = "LLM에서 Retrieval Augmented Generation에 대하여 설명해줄래?"

def format_prompt(system_prompt: str, question: str) -> str:
    "Format the question to the format of the dataset we fine-tuned to."
    return """<bos><start_of_turn>user
## Instructions
{}
## User
{}<end_of_turn>
<start_of_turn>model
""".format(system_prompt, question)


print(format_prompt(system_prompt, question))

<bos><start_of_turn>user
## Instructions
You are a master in the field of the esoteric, occult, Reappropriated Goddess and Education. You are a writer of tests, challenges, books and deep knowledge on Reappropriated Goddess for initiates and students to gain deep insights and understanding from. You write answers to questions posed in long, explanatory ways and always explain the full context of your answer (i.e., related concepts, formulas, examples, or history), as well as the step-by-step thinking process you take to answer the challenges. Be rigorous and thorough, and summarize the key themes, ideas, and conclusions at the end.
## User
LLM에서 Retrieval Augmented Generation에 대하여 설명해줄래?<end_of_turn>
<start_of_turn>model
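
One way to check that the inference prompt really matches the training data is to confirm it is an exact prefix of a training record. The sketch below restates format_prompt so it runs on its own; `format_training_text` is a hypothetical helper, and the record layout is inferred from the sample JSONL line shown earlier.

```python
import json


def format_prompt(system_prompt: str, question: str) -> str:
    """Inference prompt, same layout as the format_prompt function above."""
    return (
        "<bos><start_of_turn>user\n"
        "## Instructions\n"
        f"{system_prompt}\n"
        "## User\n"
        f"{question}<end_of_turn>\n"
        "<start_of_turn>model\n"
    )


def format_training_text(system_prompt: str, question: str, answer: str) -> str:
    """Training text: the prompt followed by the answer and closing tokens (hypothetical helper)."""
    return format_prompt(system_prompt, question) + f"{answer}<end_of_turn><eos>"


# Placeholder values, not taken from the dataset.
system = "You are a helpful tutor."
user = "What is LoRA?"
record = {"text": format_training_text(system, user, "LoRA is ...")}
line = json.dumps(record)  # one line of train.jsonl

# The inference prompt should be an exact prefix of the training text.
is_prefix = json.loads(line)["text"].startswith(format_prompt(system, user))
print(is_prefix)
```

If the check fails, the model is being prompted with a layout it never saw during fine-tuning.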

# Load the fine-tuned model with LoRA weights
model_lora, _ = load(
    "google/gemma-7b-it",
    adapter_file="./adapters.npz",  # adapters.npz is the most recently saved checkpoint file
)

response = generate(
    model_lora,
    tokenizer,
    prompt=format_prompt(system_prompt, question),
    verbose=True,
    temp=0.7,
    max_tokens=256,
)

==========
Prompt: <bos><start_of_turn>user
## Instructions
You are a master in the field of the esoteric, occult, Reappropriated Goddess and Education. You are a writer of tests, challenges, books and deep knowledge on Reappropriated Goddess for initiates and students to gain deep insights and understanding from. You write answers to questions posed in long, explanatory ways and always explain the full context of your answer (i.e., related concepts, formulas, examples, or history), as well as the step-by-step thinking process you take to answer the challenges. Be rigorous and thorough, and summarize the key themes, ideas, and conclusions at the end.
## User
Prompt: LLM에서 Retrieval Augmented Generation에 대하여 설명해줄래?
**Retrieval Augmented Generation (RAG)**는 LLM의 기본적인 구조를 확장하여 더 복잡한 문장 생성을 가능하게 하는 기술입니다. 

**RLM의 기본 구조:**

* **Encoder:** 문장을 숫자로 변환하는 과정
* **Decoder:** 숫자를 문장으로 변환하는 과정

**RAG의 구조:**

* **Encoder:** 문장을 숫자로 변환하는 과정
* **Decoder:** 숫자를 문장으로 변환하는 과정
* **Memory:** 숫자를 기억하는 과정

**RAG의 주요 기능:**

* **Contextualized representation:** 문장의 주요 내용을 기억하고 이를 기반으로 새로운 문장을 생성할 수 있도록 합니다.
* **Long-term dependencies:** 문장의 모든 부분이 기억되고 이를 기반으로 새로운 문장을 생성할 수 있도록 합니다.
* **Multimodal representation:** 다양한 형식의 데이터(텍스트, 이미지, 비디오)를 기억하고 이를 기반으로 새로운 문장을 생성할 수 있도록 합니다.

==========
Prompt: 222.350 tokens-per-sec
Generation: 16.396 tokens-per-sec

Fusing LoRA Weights

Finally, let's merge the trained LoRA weights into the model itself.

The command below shows the available options:

!python -m mlx_lm.fuse --help

Loading pretrained model
usage: fuse.py [-h] [--model MODEL] [--save-path SAVE_PATH]
               [--adapter-file ADAPTER_FILE] [--hf-path HF_PATH]
               [--upload-repo UPLOAD_REPO] [--de-quantize]

LoRA or QLoRA finetuning.

options:
  -h, --help            show this help message and exit
  --model MODEL         The path to the local model directory or Hugging Face
                        repo.
  --save-path SAVE_PATH
                        The path to save the fused model.
  --adapter-file ADAPTER_FILE
                        Path to the trained adapter weights (npz or
                        safetensors).
  --hf-path HF_PATH     Path to the original Hugging Face model. Required for
                        upload if --model is a local directory.
  --upload-repo UPLOAD_REPO
                        The Hugging Face repo to upload the model to.
  --de-quantize         Generate a de-quantized model.

!python -m mlx_lm.fuse \
    --model google/gemma-7b-it \
    --adapter-file adapters.npz \
    # --upload-repo alexweberk/gemma-7b-it-trismegistus \
    # --hf-path google/gemma-7b-it

Loading pretrained model
Fetching 11 files: 100%|████████████████████| 11/11 [00:00<00:00, 119526.80it/s]

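The merge succeeded and created a directory called lora_fused_model containing the fused model files. The next cell re-saves a shard with a "format": "pt" metadata entry, which transformers expects to find when it loads safetensors weights; repeat it for every .safetensors file.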

import mlx.core as mx

# use mx.load() to load the safetensors
tensors = mx.load(
    "lora_fused_model/model-00001-of-00004.safetensors",  # Change this path and run the cell, one by one for all .safetensors files
    format="safetensors",
)

# use mx.save_safetensors() to save the safetensors with "format" metadata
mx.save_safetensors(
    "lora_fused_model/model-00001-of-00004.safetensors",
    tensors,
    metadata={"format": "pt"},
)
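
Rather than editing the path and re-running the cell for each shard, the same two calls can be wrapped in a loop. This is a sketch under assumptions: `list_shards` and `resave_with_format` are hypothetical helper names, and mx.load and mx.save_safetensors are called exactly as in the cell above.

```python
from pathlib import Path


def list_shards(model_dir):
    """Return every .safetensors file in model_dir, sorted by name."""
    return sorted(Path(model_dir).glob("*.safetensors"))


def resave_with_format(model_dir="lora_fused_model"):
    """Re-save each shard in place with {"format": "pt"} metadata."""
    import mlx.core as mx  # imported here so list_shards works without MLX installed

    for shard in list_shards(model_dir):
        tensors = mx.load(str(shard), format="safetensors")
        mx.save_safetensors(str(shard), tensors, metadata={"format": "pt"})
        print(f"re-saved {shard.name}")
```

Calling resave_with_format() once after fusing should cover all shards, whatever their number.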

Model Push to HF

To upload the model, you first need to save a Hugging Face write token. Set up the access token as follows:

Create a token here (it must be a 'Write' token): https://huggingface.co/settings/tokens
Install the huggingface-cli tool and run 'huggingface-cli login'.
Paste the token when prompted.

!huggingface-cli upload loudai/gemma-2b-it-mlx ./lora_fused_model .

Load & Inference Model

Let's load the locally fused model and run inference.

from mlx_lm import generate, load

fused_model, fused_tokenizer = load("./lora_fused_model/")

response = generate(
    fused_model,
    fused_tokenizer,
    prompt=format_prompt(system_prompt, question),
    verbose=True,  # Set to True to see the prompt and response
    temp=0.0,
    max_tokens=512,
)

==========
Prompt: <bos><start_of_turn>user
## Instructions
You are a master in the field of the esoteric, occult, Reappropriated Goddess and Education. You are a writer of tests, challenges, books and deep knowledge on Reappropriated Goddess for initiates and students to gain deep insights and understanding from. You write answers to questions posed in long, explanatory ways and always explain the full context of your answer (i.e., related concepts, formulas, examples, or history), as well as the step-by-step thinking process you take to answer the challenges. Be rigorous and thorough, and summarize the key themes, ideas, and conclusions at the end.
## User
Prompt: LLM에서 Retrieval Augmented Generation에 대하여 설명해줄래?


**Retrieval Augmented Generation (RAG)**는 LLM의 기본적인 구조를 확장하여 더 복잡한 문장 생성을 가능하게 하는 기술입니다. 

**RLM의 기본 구조:**

* **Encoder:** 문장을 숫자로 변환하는 과정
* **Decoder:** 숫자를 문장으로 변환하는 과정

**RAG의 구조:**

* **Encoder:** 문장을 숫자로 변환하는 과정
* **Decoder:** 숫자를 문장으로 변환하는 과정
* **Memory:** 숫자를 기억하는 과정

**RAG의 주요 기능:**

* **Contextualized representation:** 문장의 주요 내용을 기억하고 이를 기반으로 새로운 문장을 생성할 수 있도록 합니다.
* **Long-term dependencies:** 문장의 모든 부분이 기억되고 이를 기반으로 새로운 문장을 생성할 수 있도록 합니다.
* **Multimodal representation:** 다양한 형식의 데이터(텍스트, 이미지, 비디오)를 기억하고 이를 기반으로 새로운 문장을 생성할 수 있도록 합니다.

1. The role of the Reappropriated Goddess in the process of creation
==========
Prompt: 332.408 tokens-per-sec
Generation: 17.849 tokens-per-sec

The fused model generates much faster than running with unfused LoRA weights.

LoRA generation: 6.002 tokens per second
Fused model generation: 17.849 tokens per second
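
The speed figures can be pulled out of the verbose output and compared directly. A minimal sketch, assuming the log lines keep the "Prompt: … tokens-per-sec" and "Generation: … tokens-per-sec" shape shown above; `parse_speeds` is a hypothetical helper.

```python
import re


def parse_speeds(log: str) -> dict:
    """Extract prompt and generation tokens-per-sec from mlx_lm verbose output."""
    pattern = re.compile(r"^(Prompt|Generation): ([\d.]+) tokens-per-sec$", re.MULTILINE)
    return {name.lower(): float(value) for name, value in pattern.findall(log)}


# Tail of the fused-model output shown above.
fused_log = """==========
Prompt: 332.408 tokens-per-sec
Generation: 17.849 tokens-per-sec
"""

fused = parse_speeds(fused_log)
lora_generation = 6.002  # LoRA generation speed quoted above
speedup = fused["generation"] / lora_generation
print(f"Fused model is {speedup:.2f}x faster at generation")
```

The pattern only matches lines that contain a number followed by tokens-per-sec, so the echoed "Prompt: <bos>…" line is ignored.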

Optional: Test pushed HF

To confirm that the model uploaded correctly, let's download it from Hugging Face and run inference.

from mlx_lm import generate, load

model_, tokenizer_ = load("loudai/gemma-2b-it-mlx")
response = generate(
    model_,
    tokenizer_,
    prompt=format_prompt(system_prompt, question),
    verbose=True,  # Set to True to see the prompt and response
    temp=0.0,
    max_tokens=512,
)

Fetching 10 files:   0%|          | 0/10 [00:00<?, ?it/s]


==========
Prompt: <bos><start_of_turn>user
## Instructions
You are a master in the field of the esoteric, occult, Reappropriated Goddess and Education. You are a writer of tests, challenges, books and deep knowledge on Reappropriated Goddess for initiates and students to gain deep insights and understanding from. You write answers to questions posed in long, explanatory ways and always explain the full context of your answer (i.e., related concepts, formulas, examples, or history), as well as the step-by-step thinking process you take to answer the challenges. Be rigorous and thorough, and summarize the key themes, ideas, and conclusions at the end.
## User
Prompt: LLM에서 Retrieval Augmented Generation에 대하여 설명해줄래?


**Retrieval Augmented Generation (RAG)**는 LLM의 기본적인 구조를 확장하여 더 복잡한 문장 생성을 가능하게 하는 기술입니다. 

**RLM의 기본 구조:**

* **Encoder:** 문장을 숫자로 변환하는 과정
* **Decoder:** 숫자를 문장으로 변환하는 과정

**RAG의 구조:**

* **Encoder:** 문장을 숫자로 변환하는 과정
* **Decoder:** 숫자를 문장으로 변환하는 과정
* **Memory:** 숫자를 기억하는 과정

**RAG의 주요 기능:**

* **Contextualized representation:** 문장의 주요 내용을 기억하고 이를 기반으로 새로운 문장을 생성할 수 있도록 합니다.
* **Long-term dependencies:** 문장의 모든 부분이 기억되고 이를 기반으로 새로운 문장을 생성할 수 있도록 합니다.
* **Multimodal representation:** 다양한 형식의 데이터(텍스트, 이미지, 비디오)를 기억하고 이를 기반으로 새로운 문장을 생성할 수 있도록 합니다.
==========
Prompt: 298.136 tokens-per-sec
Generation: 17.980 tokens-per-sec

We can also run it using transformers directly, although without the benefit of utilizing MLX/Apple Silicon to the fullest.

from transformers import AutoModelForCausalLM, AutoTokenizer

repo_id = "loudai/gemma-2b-it-mlx"

tokenizer = AutoTokenizer.from_pretrained(repo_id)
model = AutoModelForCausalLM.from_pretrained(repo_id)
model.to("mps")

input_text = format_prompt(system_prompt, question)
input_ids = tokenizer(input_text, return_tensors="pt").to("mps")

outputs = model.generate(
    **input_ids,
    max_new_tokens=256,
)
print(tokenizer.decode(outputs[0]))

Loading checkpoint shards:   0%|          | 0/4 [00:00<?, ?it/s]


<bos><bos><start_of_turn>user
## Instructions
You are a master in the field of the esoteric, occult, Reappropriated Goddess and Education. You are a writer of tests, challenges, books and deep knowledge on Reappropriated Goddess for initiates and students to gain deep insights and understanding from. You write answers to questions posed in long, explanatory ways and always explain the full context of your answer (i.e., related concepts, formulas, examples, or history), as well as the step-by-step thinking process you take to answer the challenges. Be rigorous and thorough, and summarize the key themes, ideas, and conclusions at the end.
## User
Prompt: LLM에서 Retrieval Augmented Generation에 대하여 설명해줄래?

**Retrieval Augmented Generation (RAG)**는 LLM의 기본적인 구조를 확장하여 더 복잡한 문장 생성을 가능하게 하는 기술입니다. 

**RLM의 기본 구조:**

* **Encoder:** 문장을 숫자로 변환하는 과정
* **Decoder:** 숫자를 문장으로 변환하는 과정

**RAG의 구조:**

* **Encoder:** 문장을 숫자로 변환하는 과정
* **Decoder:** 숫자를 문장으로 변환하는 과정
* **Memory:** 숫자를 기억하는 과정

**RAG의 주요 기능:**

* **Contextualized representation:** 문장의 주요 내용을 기억하고 이를 기반으로 새로운 문장을 생성할 수 있도록 합니다.
* **Long-term dependencies:** 문장의 모든 부분이 기억되고 이를 기반으로 새로운 문장을 생성할 수 있도록 합니다.
* **Multimodal representation:** 다양한 형식의 데이터(텍스트, 이미지, 비디오)를 기억하고 이를 기반으로 새로운 문장을 생성할 수 있도록 합니다.
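
The decoded transformers output above begins with `<bos><bos>`. The likely cause is that the prompt string already contains `<bos>` and the tokenizer adds its own. A small sketch of one way to avoid the duplicate; `strip_leading_bos` is a hypothetical helper, and it assumes the tokenizer prepends a BOS token by default.

```python
BOS = "<bos>"


def strip_leading_bos(prompt: str) -> str:
    """Drop one leading <bos> so the tokenizer's own BOS token is the only one."""
    return prompt[len(BOS):] if prompt.startswith(BOS) else prompt


prompt = "<bos><start_of_turn>user\n## Instructions\n...\n## User\n...<end_of_turn>\n<start_of_turn>model\n"
clean_prompt = strip_leading_bos(prompt)
print(clean_prompt.splitlines()[0])

# Alternative: keep the prompt unchanged and tokenize with
# tokenizer(prompt, return_tensors="pt", add_special_tokens=False)
# so that no extra BOS token is added.
```

Either approach should leave exactly one BOS token at the start of the input, matching the training format.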