Optimum-Intel
Optimum Intel, provided by Hugging Face, is the interface between the Transformers and Diffusers libraries and the various tools and libraries that Intel provides to accelerate end-to-end pipelines on Intel CPU and GPU architectures. Optimum Intel supports two components:
Intel Neural Compressor (neural-compressor) is an open-source library that provides the most widely used compression techniques for LLMs, such as quantization, pruning, and knowledge distillation. It supports an accuracy-driven automatic tuning strategy so that users can easily produce quantized models: static, dynamic, and quantization-aware-training approaches can all be applied while specifying an expected accuracy criterion (a configuration sketch follows this overview). It also supports a variety of weight-pruning techniques for generating pruned models that meet a predefined sparsity target.
OpenVINO is an open-source toolkit that enables high-performance inference on Intel CPUs, GPUs, and dedicated DL inference accelerators, and it ships with a set of tools for optimizing models through quantization, pruning, and knowledge distillation. Optimum Intel's OpenVINO integration provides a simple interface for optimizing Transformers and Diffusers models, converting them to the OpenVINO IR (Intermediate Representation) format, and running inference with the OpenVINO Runtime.
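Returning to the first component, the accuracy-driven tuning performed by neural-compressor is typically configured as in the sketch below. This is a minimal sketch against the neural-compressor 2.x config API; my_eval_func is a placeholder evaluation function you would supply, not part of the library.

from neural_compressor.config import AccuracyCriterion, PostTrainingQuantConfig, TuningCriterion
from neural_compressor.quantization import fit

# Allow at most a 1% relative accuracy drop and try up to 100 configurations
accuracy_criterion = AccuracyCriterion(criterion="relative", tolerable_loss=0.01)
tuning_criterion = TuningCriterion(max_trials=100, timeout=0)

conf = PostTrainingQuantConfig(
    approach="static",  # or "dynamic" / "weight_only"
    accuracy_criterion=accuracy_criterion,
    tuning_criterion=tuning_criterion,
)

# fit() keeps tuning until a configuration meets the accuracy criterion, e.g.:
# quantized = fit(model=float_model, conf=conf,
#                 calib_dataloader=calib_dataloader, eval_func=my_eval_func)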
The Hugging Face tasks supported by Optimum Intel with OpenVINO are listed below:
text-classification: OVModelForSequenceClassification
token-classification: OVModelForTokenClassification
question-answering: OVModelForQuestionAnswering
audio-classification: OVModelForAudioClassification
image-classification: OVModelForImageClassification
feature-extraction: OVModelForFeatureExtraction
fill-mask: OVModelForMaskedLM
text-generation: OVModelForCausalLM
text2text-generation: OVModelForSeq2SeqLM
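A minimal sketch of this workflow with the optimum.intel API, exporting a Transformers checkpoint to OpenVINO IR and running inference (the SST-2 DistilBERT checkpoint is only an illustrative choice):

from optimum.intel import OVModelForSequenceClassification
from transformers import AutoTokenizer, pipeline

model_id = "distilbert-base-uncased-finetuned-sst-2-english"
# export=True converts the PyTorch checkpoint to OpenVINO IR on the fly
model = OVModelForSequenceClassification.from_pretrained(model_id, export=True)
tokenizer = AutoTokenizer.from_pretrained(model_id)

classifier = pipeline("text-classification", model=model, tokenizer=tokenizer)
print(classifier("Optimum Intel makes OpenVINO inference easy!"))

# Save the converted IR so it can be reloaded without re-exporting
model.save_pretrained("ov_distilbert_sst2")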
Installing Optimum Intel with OpenVINO also requires the OpenVINO Runtime and IR tooling, so building a Docker image is recommended where possible.
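If you install locally instead of via Docker, the OpenVINO extra pulls in the runtime and conversion tooling; check the Optimum Intel documentation for the currently recommended extras.

%pip install optimum[openvino]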
For a detailed Optimum OpenVINO tutorial, refer to the link below.
neural-compressor can apply quantization, pruning, and knowledge distillation effectively. That said, Hugging Face already makes these optimizations easy to plug into a pipeline through bitsandbytes and Accelerate, so in practice there is often little reason to reach for Intel's neural-compressor. The examples below cover:
Weight-only quantization of an LLM
Static quantization of a standard CNN model (torchvision ResNet-18)
%pip install optimum[neural-compressor]
%pip install neural-compressor auto_round
The example below demonstrates weight-only quantization of an LLM; it supports Intel CPUs, Intel Gaudi2 AI accelerators, and NVIDIA GPUs, and the optimal device is selected automatically.
from transformers import AutoModel, AutoTokenizer
from neural_compressor.config import PostTrainingQuantConfig
from neural_compressor.quantization import fit
from neural_compressor.adaptor.torch_utils.auto_round import get_dataloader

# model_name = "EleutherAI/gpt-neo-125m"
model_name = "google/gemma-2b-it"

# Load the FP32 model and its tokenizer
float_model = AutoModel.from_pretrained(model_name)
tokenizer = AutoTokenizer.from_pretrained(
    model_name,
    trust_remote_code=True
)

# Calibration dataloader consumed by the AutoRound algorithm
dataloader = get_dataloader(
    tokenizer,
    seqlen=2048
)

# Weight-only quantization: 4-bit integer weights tuned with AutoRound
woq_conf = PostTrainingQuantConfig(
    approach="weight_only",
    op_type_dict={
        ".*": {  # match all ops
            "weight": {
                "dtype": "int",
                "bits": 4,
                "algorithm": "AUTOROUND",
            },
        }
    },
)

# Run the quantization; the returned object wraps the quantized model
quantized_model = fit(
    model=float_model,
    conf=woq_conf,
    calib_dataloader=dataloader
)
2024-05-31 22:52:35 [INFO] Start auto tuning.
2024-05-31 22:52:35 [INFO] Quantize model without tuning!
2024-05-31 22:52:35 [INFO] Quantize the model with default configuration without evaluating the model. To perform the tuning process, please either provide an eval_func or provide an eval_dataloader an eval_metric.
2024-05-31 22:52:35 [INFO] Adaptor has 5 recipes.
2024-05-31 22:52:35 [INFO] 0 recipes specified by user.
2024-05-31 22:52:35 [INFO] 3 recipes require future tuning.
2024-05-31 22:52:36 [INFO] *** Initialize auto tuning
2024-05-31 22:52:36 [INFO] {
2024-05-31 22:52:36 [INFO] 'PostTrainingQuantConfig': {
2024-05-31 22:52:36 [INFO] 'AccuracyCriterion': {
2024-05-31 22:52:36 [INFO] 'criterion': 'relative',
2024-05-31 22:52:36 [INFO] 'higher_is_better': True,
2024-05-31 22:52:36 [INFO] 'tolerable_loss': 0.01,
2024-05-31 22:52:36 [INFO] 'absolute': None,
2024-05-31 22:52:36 [INFO] 'keys': <bound method AccuracyCriterion.keys of <neural_compressor.config.AccuracyCriterion object at 0x7f313bffde90>>,
2024-05-31 22:52:36 [INFO] 'relative': 0.01
2024-05-31 22:52:36 [INFO] },
2024-05-31 22:52:36 [INFO] 'approach': 'post_training_weight_only',
2024-05-31 22:52:36 [INFO] 'backend': 'default',
2024-05-31 22:52:36 [INFO] 'calibration_sampling_size': [
2024-05-31 22:52:36 [INFO] 100
2024-05-31 22:52:36 [INFO] ],
2024-05-31 22:52:36 [INFO] 'device': 'cpu',
2024-05-31 22:52:36 [INFO] 'diagnosis': False,
2024-05-31 22:52:36 [INFO] 'domain': 'auto',
2024-05-31 22:52:36 [INFO] 'example_inputs': 'Not printed here due to large size tensors...',
2024-05-31 22:52:36 [INFO] 'excluded_precisions': [
2024-05-31 22:52:36 [INFO] ],
2024-05-31 22:52:36 [INFO] 'framework': 'pytorch_fx',
2024-05-31 22:52:36 [INFO] 'inputs': [
2024-05-31 22:52:36 [INFO] ],
2024-05-31 22:52:36 [INFO] 'model_name': '',
2024-05-31 22:52:36 [INFO] 'ni_workload_name': 'quantization',
2024-05-31 22:52:36 [INFO] 'op_name_dict': None,
2024-05-31 22:52:36 [INFO] 'op_type_dict': {
2024-05-31 22:52:36 [INFO] '.*': {
2024-05-31 22:52:36 [INFO] 'weight': {
2024-05-31 22:52:36 [INFO] 'dtype': [
2024-05-31 22:52:36 [INFO] 'int'
2024-05-31 22:52:36 [INFO] ],
2024-05-31 22:52:36 [INFO] 'bits': [
2024-05-31 22:52:36 [INFO] 4
2024-05-31 22:52:36 [INFO] ],
2024-05-31 22:52:36 [INFO] 'algorithm': [
2024-05-31 22:52:36 [INFO] 'AUTOROUND'
2024-05-31 22:52:36 [INFO] ]
2024-05-31 22:52:36 [INFO] }
2024-05-31 22:52:36 [INFO] }
2024-05-31 22:52:36 [INFO] },
2024-05-31 22:52:36 [INFO] 'outputs': [
2024-05-31 22:52:36 [INFO] ],
2024-05-31 22:52:36 [INFO] 'quant_format': 'default',
2024-05-31 22:52:36 [INFO] 'quant_level': 'auto',
2024-05-31 22:52:36 [INFO] 'recipes': {
2024-05-31 22:52:36 [INFO] 'smooth_quant': False,
2024-05-31 22:52:36 [INFO] 'smooth_quant_args': {
2024-05-31 22:52:36 [INFO] },
2024-05-31 22:52:36 [INFO] 'layer_wise_quant': False,
2024-05-31 22:52:36 [INFO] 'layer_wise_quant_args': {
2024-05-31 22:52:36 [INFO] },
2024-05-31 22:52:36 [INFO] 'fast_bias_correction': False,
2024-05-31 22:52:36 [INFO] 'weight_correction': False,
2024-05-31 22:52:36 [INFO] 'gemm_to_matmul': True,
2024-05-31 22:52:36 [INFO] 'graph_optimization_level': None,
2024-05-31 22:52:36 [INFO] 'first_conv_or_matmul_quantization': True,
2024-05-31 22:52:36 [INFO] 'last_conv_or_matmul_quantization': True,
2024-05-31 22:52:36 [INFO] 'pre_post_process_quantization': True,
2024-05-31 22:52:36 [INFO] 'add_qdq_pair_to_weight': False,
2024-05-31 22:52:36 [INFO] 'optypes_to_exclude_output_quant': [
2024-05-31 22:52:36 [INFO] ],
2024-05-31 22:52:36 [INFO] 'dedicated_qdq_pair': False,
2024-05-31 22:52:36 [INFO] 'rtn_args': {
2024-05-31 22:52:36 [INFO] },
2024-05-31 22:52:36 [INFO] 'awq_args': {
2024-05-31 22:52:36 [INFO] },
2024-05-31 22:52:36 [INFO] 'gptq_args': {
2024-05-31 22:52:36 [INFO] },
2024-05-31 22:52:36 [INFO] 'teq_args': {
2024-05-31 22:52:36 [INFO] },
2024-05-31 22:52:36 [INFO] 'autoround_args': {
2024-05-31 22:52:36 [INFO] }
2024-05-31 22:52:36 [INFO] },
2024-05-31 22:52:36 [INFO] 'reduce_range': None,
2024-05-31 22:52:36 [INFO] 'TuningCriterion': {
2024-05-31 22:52:36 [INFO] 'max_trials': 100,
2024-05-31 22:52:36 [INFO] 'objective': [
2024-05-31 22:52:36 [INFO] 'performance'
2024-05-31 22:52:36 [INFO] ],
2024-05-31 22:52:36 [INFO] 'strategy': 'basic',
2024-05-31 22:52:36 [INFO] 'strategy_kwargs': None,
2024-05-31 22:52:36 [INFO] 'timeout': 0
2024-05-31 22:52:36 [INFO] },
2024-05-31 22:52:36 [INFO] 'use_bf16': True
2024-05-31 22:52:36 [INFO] }
2024-05-31 22:52:36 [INFO] }
2024-05-31 22:52:36 [WARNING] [Strategy] Please install `mpi4py` correctly if using distributed tuning; otherwise, ignore this warning.
2024-05-31 22:52:36 [INFO] Pass query framework capability elapsed time: 6.51 ms
2024-05-31 22:52:36 [INFO] Do not evaluate the baseline and quantize the model with default configuration.
2024-05-31 22:52:36 [INFO] Quantize the model with default config.
2024-05-31 22:52:36 [INFO] All algorithms to do: {'AUTOROUND'}
2024-05-31 22:52:36 [INFO] quantizing with the AutoRound algorithm
2024-05-31 22:52:36 INFO utils.py L570: Using GPU device
2024-05-31 22:52:38 INFO autoround.py L465: using torch.float16 for quantization tuning
2024-05-31 22:54:14 INFO autoround.py L981: quantizing 1/18, layers.0
2024-05-31 22:55:27 INFO autoround.py L935: quantized 7/7 layers in the block, loss iter 0: 0.042549 -> iter 195: 0.008586
2024-05-31 22:55:36 INFO autoround.py L981: quantizing 2/18, layers.1
2024-05-31 22:56:48 INFO autoround.py L935: quantized 7/7 layers in the block, loss iter 0: 0.012353 -> iter 188: 0.003782
2024-05-31 22:56:58 INFO autoround.py L981: quantizing 3/18, layers.2
2024-05-31 22:58:08 INFO autoround.py L935: quantized 7/7 layers in the block, loss iter 0: 0.003564 -> iter 183: 0.001639
2024-05-31 22:58:18 INFO autoround.py L981: quantizing 4/18, layers.3
2024-05-31 22:59:29 INFO autoround.py L935: quantized 7/7 layers in the block, loss iter 0: 0.001812 -> iter 168: 0.000810
2024-05-31 22:59:38 INFO autoround.py L981: quantizing 5/18, layers.4
2024-05-31 23:00:49 INFO autoround.py L935: quantized 7/7 layers in the block, loss iter 0: 0.001076 -> iter 168: 0.000581
2024-05-31 23:00:59 INFO autoround.py L981: quantizing 6/18, layers.5
2024-05-31 23:02:10 INFO autoround.py L935: quantized 7/7 layers in the block, loss iter 0: 0.000886 -> iter 186: 0.000537
2024-05-31 23:02:20 INFO autoround.py L981: quantizing 7/18, layers.6
2024-05-31 23:03:30 INFO autoround.py L935: quantized 7/7 layers in the block, loss iter 0: 0.000899 -> iter 191: 0.000483
2024-05-31 23:03:40 INFO autoround.py L981: quantizing 8/18, layers.7
2024-05-31 23:04:51 INFO autoround.py L935: quantized 7/7 layers in the block, loss iter 0: 0.000975 -> iter 139: 0.000531
2024-05-31 23:05:01 INFO autoround.py L981: quantizing 9/18, layers.8
2024-05-31 23:06:12 INFO autoround.py L935: quantized 7/7 layers in the block, loss iter 0: 0.001110 -> iter 177: 0.000699
2024-05-31 23:06:22 INFO autoround.py L981: quantizing 10/18, layers.9
2024-05-31 23:07:32 INFO autoround.py L935: quantized 7/7 layers in the block, loss iter 0: 0.001358 -> iter 197: 0.000854
2024-05-31 23:07:42 INFO autoround.py L981: quantizing 11/18, layers.10
2024-05-31 23:08:52 INFO autoround.py L935: quantized 7/7 layers in the block, loss iter 0: 0.001754 -> iter 178: 0.001185
2024-05-31 23:09:03 INFO autoround.py L981: quantizing 12/18, layers.11
2024-05-31 23:10:13 INFO autoround.py L935: quantized 7/7 layers in the block, loss iter 0: 0.002099 -> iter 168: 0.001497
2024-05-31 23:10:23 INFO autoround.py L981: quantizing 13/18, layers.12
2024-05-31 23:11:35 INFO autoround.py L935: quantized 7/7 layers in the block, loss iter 0: 0.002648 -> iter 194: 0.001862
2024-05-31 23:11:45 INFO autoround.py L981: quantizing 14/18, layers.13
2024-05-31 23:12:55 INFO autoround.py L935: quantized 7/7 layers in the block, loss iter 0: 0.003757 -> iter 106: 0.002598
2024-05-31 23:13:06 INFO autoround.py L981: quantizing 15/18, layers.14
2024-05-31 23:14:17 INFO autoround.py L935: quantized 7/7 layers in the block, loss iter 0: 0.007845 -> iter 105: 0.004465
2024-05-31 23:14:27 INFO autoround.py L981: quantizing 16/18, layers.15
2024-05-31 23:15:37 INFO autoround.py L935: quantized 7/7 layers in the block, loss iter 0: 0.010025 -> iter 126: 0.006707
2024-05-31 23:15:47 INFO autoround.py L981: quantizing 17/18, layers.16
2024-05-31 23:16:58 INFO autoround.py L935: quantized 7/7 layers in the block, loss iter 0: 0.015236 -> iter 160: 0.010168
2024-05-31 23:17:08 INFO autoround.py L981: quantizing 18/18, layers.17
2024-05-31 23:18:18 INFO autoround.py L935: quantized 7/7 layers in the block, loss iter 0: 0.018019 -> iter 193: 0.012850
2024-05-31 23:18:29 INFO autoround.py L1096: quantization tuning time 1551.1706745624542
2024-05-31 23:18:29 INFO autoround.py L1112: Summary: quantized 126/126 in the model
2024-05-31 23:18:30 [INFO] |******Mixed Precision Statistics******|
2024-05-31 23:18:30 [INFO] +------------+---------+---------------+
2024-05-31 23:18:30 [INFO] | Op Type | Total | A32W4G32 |
2024-05-31 23:18:30 [INFO] +------------+---------+---------------+
2024-05-31 23:18:30 [INFO] | Linear | 126 | 126 |
2024-05-31 23:18:30 [INFO] +------------+---------+---------------+
2024-05-31 23:18:30 [INFO] Pass quantize model elapsed time: 1554793.01 ms
2024-05-31 23:18:30 [INFO] Save tuning history to /home/kubwa/kubwai/15-Huggingface/06_Optimzation/nc_workspace/2024-05-31_22-51-42/./history.snapshot.
2024-05-31 23:18:30 [INFO] [Strategy] Found the model meets accuracy requirements, ending the tuning process.
2024-05-31 23:18:30 [INFO] Specified timeout or max trials is reached! Found a quantized model which meet accuracy goal. Exit.
2024-05-31 23:18:30 [INFO] Save deploy yaml to /home/kubwa/kubwai/15-Huggingface/06_Optimzation/nc_workspace/2024-05-31_22-51-42/deploy.yaml
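The tuning summary shows that all 126 Linear layers were quantized to 4-bit weights (A32W4G32). A minimal follow-up sketch, assuming neural-compressor's standard model-wrapper API, saves the result for later use; the output path is arbitrary:

# Persist the AutoRound-quantized model (weights plus quantization config)
quantized_model.save("./gemma-2b-it-int4-autoround")

The second example applies default post-training static (INT8) quantization to a torchvision ResNet-18.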
%pip install torchvision
from torchvision import models
from neural_compressor.config import PostTrainingQuantConfig
from neural_compressor.data import DataLoader, Datasets
from neural_compressor.quantization import fit

# FP32 ResNet-18 as the model to quantize
float_model = models.resnet18()

# Dummy calibration dataset with the expected input shape (N, C, H, W)
dataset = Datasets("pytorch")["dummy"](shape=(1, 3, 224, 224))
calib_dataloader = DataLoader(
    framework="pytorch",
    dataset=dataset
)

# Default post-training static (INT8) quantization config
static_quant_conf = PostTrainingQuantConfig()

# Calibrate and quantize the model
quantized_model = fit(
    model=float_model,
    conf=static_quant_conf,
    calib_dataloader=calib_dataloader
)
2024-05-31 23:31:52 [INFO] Start auto tuning.
2024-05-31 23:31:52 [INFO] Quantize model without tuning!
2024-05-31 23:31:52 [INFO] Quantize the model with default configuration without evaluating the model. To perform the tuning process, please either provide an eval_func or provide an eval_dataloader an eval_metric.
2024-05-31 23:31:52 [INFO] Adaptor has 5 recipes.
2024-05-31 23:31:52 [INFO] 0 recipes specified by user.
2024-05-31 23:31:52 [INFO] 3 recipes require future tuning.
2024-05-31 23:31:52 [INFO] *** Initialize auto tuning
2024-05-31 23:31:52 [INFO] {
2024-05-31 23:31:52 [INFO] 'PostTrainingQuantConfig': {
2024-05-31 23:31:52 [INFO] 'AccuracyCriterion': {
2024-05-31 23:31:52 [INFO] 'criterion': 'relative',
2024-05-31 23:31:52 [INFO] 'higher_is_better': True,
2024-05-31 23:31:52 [INFO] 'tolerable_loss': 0.01,
2024-05-31 23:31:52 [INFO] 'absolute': None,
2024-05-31 23:31:52 [INFO] 'keys': <bound method AccuracyCriterion.keys of <neural_compressor.config.AccuracyCriterion object at 0x7ff3c1b59a50>>,
2024-05-31 23:31:52 [INFO] 'relative': 0.01
2024-05-31 23:31:52 [INFO] },
2024-05-31 23:31:52 [INFO] 'approach': 'post_training_static_quant',
2024-05-31 23:31:52 [INFO] 'backend': 'default',
2024-05-31 23:31:52 [INFO] 'calibration_sampling_size': [
2024-05-31 23:31:52 [INFO] 100
2024-05-31 23:31:52 [INFO] ],
2024-05-31 23:31:52 [INFO] 'device': 'cpu',
2024-05-31 23:31:52 [INFO] 'diagnosis': False,
2024-05-31 23:31:52 [INFO] 'domain': 'auto',
2024-05-31 23:31:52 [INFO] 'example_inputs': 'Not printed here due to large size tensors...',
2024-05-31 23:31:52 [INFO] 'excluded_precisions': [
2024-05-31 23:31:52 [INFO] ],
2024-05-31 23:31:52 [INFO] 'framework': 'pytorch_fx',
2024-05-31 23:31:52 [INFO] 'inputs': [
2024-05-31 23:31:52 [INFO] ],
2024-05-31 23:31:52 [INFO] 'model_name': '',
2024-05-31 23:31:52 [INFO] 'ni_workload_name': 'quantization',
2024-05-31 23:31:52 [INFO] 'op_name_dict': None,
2024-05-31 23:31:52 [INFO] 'op_type_dict': None,
2024-05-31 23:31:52 [INFO] 'outputs': [
2024-05-31 23:31:52 [INFO] ],
2024-05-31 23:31:52 [INFO] 'quant_format': 'default',
2024-05-31 23:31:52 [INFO] 'quant_level': 'auto',
2024-05-31 23:31:52 [INFO] 'recipes': {
2024-05-31 23:31:52 [INFO] 'smooth_quant': False,
2024-05-31 23:31:52 [INFO] 'smooth_quant_args': {
2024-05-31 23:31:52 [INFO] },
2024-05-31 23:31:52 [INFO] 'layer_wise_quant': False,
2024-05-31 23:31:52 [INFO] 'layer_wise_quant_args': {
2024-05-31 23:31:52 [INFO] },
2024-05-31 23:31:52 [INFO] 'fast_bias_correction': False,
2024-05-31 23:31:52 [INFO] 'weight_correction': False,
2024-05-31 23:31:52 [INFO] 'gemm_to_matmul': True,
2024-05-31 23:31:52 [INFO] 'graph_optimization_level': None,
2024-05-31 23:31:52 [INFO] 'first_conv_or_matmul_quantization': True,
2024-05-31 23:31:52 [INFO] 'last_conv_or_matmul_quantization': True,
2024-05-31 23:31:52 [INFO] 'pre_post_process_quantization': True,
2024-05-31 23:31:52 [INFO] 'add_qdq_pair_to_weight': False,
2024-05-31 23:31:52 [INFO] 'optypes_to_exclude_output_quant': [
2024-05-31 23:31:52 [INFO] ],
2024-05-31 23:31:52 [INFO] 'dedicated_qdq_pair': False,
2024-05-31 23:31:52 [INFO] 'rtn_args': {
2024-05-31 23:31:52 [INFO] },
2024-05-31 23:31:52 [INFO] 'awq_args': {
2024-05-31 23:31:52 [INFO] },
2024-05-31 23:31:52 [INFO] 'gptq_args': {
2024-05-31 23:31:52 [INFO] },
2024-05-31 23:31:52 [INFO] 'teq_args': {
2024-05-31 23:31:52 [INFO] },
2024-05-31 23:31:52 [INFO] 'autoround_args': {
2024-05-31 23:31:52 [INFO] }
2024-05-31 23:31:52 [INFO] },
2024-05-31 23:31:52 [INFO] 'reduce_range': None,
2024-05-31 23:31:52 [INFO] 'TuningCriterion': {
2024-05-31 23:31:52 [INFO] 'max_trials': 100,
2024-05-31 23:31:52 [INFO] 'objective': [
2024-05-31 23:31:52 [INFO] 'performance'
2024-05-31 23:31:52 [INFO] ],
2024-05-31 23:31:52 [INFO] 'strategy': 'basic',
2024-05-31 23:31:52 [INFO] 'strategy_kwargs': None,
2024-05-31 23:31:52 [INFO] 'timeout': 0
2024-05-31 23:31:52 [INFO] },
2024-05-31 23:31:52 [INFO] 'use_bf16': True
2024-05-31 23:31:52 [INFO] }
2024-05-31 23:31:52 [INFO] }
2024-05-31 23:31:52 [WARNING] [Strategy] Please install `mpi4py` correctly if using distributed tuning; otherwise, ignore this warning.
/home/kubwa/anaconda3/envs/pytorch/lib/python3.11/site-packages/torch/ao/quantization/fx/fuse.py:56: UserWarning: Passing a fuse_custom_config_dict to fuse is deprecated and will not be supported in a future version. Please pass in a FuseCustomConfig instead.
warnings.warn(
2024-05-31 23:31:52 [INFO] Attention Blocks: 0
2024-05-31 23:31:52 [INFO] FFN Blocks: 0
2024-05-31 23:31:52 [INFO] Pass query framework capability elapsed time: 119.76 ms
2024-05-31 23:31:52 [INFO] Do not evaluate the baseline and quantize the model with default configuration.
2024-05-31 23:31:52 [INFO] Quantize the model with default config.
2024-05-31 23:31:53 [INFO] |******Mixed Precision Statistics******|
2024-05-31 23:31:53 [INFO] +----------------------+-------+-------+
2024-05-31 23:31:53 [INFO] | Op Type | Total | INT8 |
2024-05-31 23:31:53 [INFO] +----------------------+-------+-------+
2024-05-31 23:31:53 [INFO] | quantize_per_tensor | 1 | 1 |
2024-05-31 23:31:53 [INFO] | ConvReLU2d | 9 | 9 |
2024-05-31 23:31:53 [INFO] | MaxPool2d | 1 | 1 |
2024-05-31 23:31:53 [INFO] | Conv2d | 11 | 11 |
2024-05-31 23:31:53 [INFO] | add_relu | 8 | 8 |
2024-05-31 23:31:53 [INFO] | AdaptiveAvgPool2d | 1 | 1 |
2024-05-31 23:31:53 [INFO] | flatten | 1 | 1 |
2024-05-31 23:31:53 [INFO] | Linear | 1 | 1 |
2024-05-31 23:31:53 [INFO] | dequantize | 1 | 1 |
2024-05-31 23:31:53 [INFO] +----------------------+-------+-------+
2024-05-31 23:31:53 [INFO] Pass quantize model elapsed time: 1033.35 ms
2024-05-31 23:31:53 [INFO] Save tuning history to /home/kubwa/kubwai/15-Huggingface/06_Optimzation/nc_workspace/2024-05-31_23-31-49/./history.snapshot.
2024-05-31 23:31:53 [INFO] [Strategy] Found the model meets accuracy requirements, ending the tuning process.
2024-05-31 23:31:53 [INFO] Specified timeout or max trials is reached! Found a quantized model which meet accuracy goal. Exit.
2024-05-31 23:31:53 [INFO] Save deploy yaml to /home/kubwa/kubwai/15-Huggingface/06_Optimzation/nc_workspace/2024-05-31_23-31-49/deploy.yaml
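As with the LLM example, the INT8 model can be smoke-tested and persisted. The snippet below is a minimal sketch that assumes neural-compressor's wrapper exposes the quantized torch module as .model; the timing print is illustrative, not a benchmark from this run:

import time
import torch

# Run the INT8 ResNet-18 on a dummy batch
example = torch.randn(1, 3, 224, 224)
int8_model = quantized_model.model  # underlying quantized torch.nn.Module
int8_model.eval()

with torch.no_grad():
    start = time.perf_counter()
    _ = int8_model(example)
    print(f"INT8 forward pass: {(time.perf_counter() - start) * 1000:.2f} ms")

# Persist the quantized model for deployment
quantized_model.save("./resnet18-int8")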