Fine-tuning Qwen2.5-7B-Instruct on the Chinese DeepSeek-R1-Distill-data-110k distillation dataset

2025-03-04


This post fine-tunes Qwen2.5-7B-Instruct with LoRA on the Chinese DeepSeek-R1-Distill-data-110k distillation dataset, using ms-swift.

Downloading the model and data
Model download:
Hugging Face: Qwen/Qwen2.5-7B-Instruct (via HF Mirror)
ModelScope: the same model is available from the ModelScope community (魔搭社区)
Dataset download:
https://huggingface.co/datasets/Congliu/Chinese-DeepSeek-R1-Distill-data-110k
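
If you prefer to script the downloads, the sketch below shows one possible way to fetch both repositories with `huggingface_hub.snapshot_download`. The repo ids are the ones named in this post; the local directory layout (`models/` and `data/` under a root folder) and the helper names `download_plan` and `download_all` are assumptions made for illustration only.

```python
from pathlib import Path

# Repo ids as named in this post.
MODEL_REPO = "Qwen/Qwen2.5-7B-Instruct"
DATASET_REPO = "Congliu/Chinese-DeepSeek-R1-Distill-data-110k"


def download_plan(root):
    """Return the keyword arguments for each download (hypothetical layout)."""
    root = Path(root)
    return [
        {
            "repo_id": MODEL_REPO,
            "repo_type": "model",
            "local_dir": str(root / "models" / "Qwen2.5-7B-Instruct"),
        },
        {
            "repo_id": DATASET_REPO,
            "repo_type": "dataset",
            "local_dir": str(root / "data" / "Chinese-DeepSeek-R1-Distill-data-110k"),
        },
    ]


def download_all(root):
    """Download every repository in the plan and return the local paths."""
    # Imported lazily so the plan can be inspected without huggingface_hub installed.
    from huggingface_hub import snapshot_download

    return [snapshot_download(**kwargs) for kwargs in download_plan(root)]
```

Calling `download_all("/home")` would start the actual downloads (the model alone is several gigabytes), so it is left out of the block above.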

Installing ms-swift

Install with pip:

pip install ms-swift -U

Install from source:

# pip install git+https://github.com/modelscope/ms-swift.git

git clone https://github.com/modelscope/ms-swift.git
cd ms-swift
pip install -e .
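
After either install route, a quick check that the package is visible to the current Python environment can save a confusing failure later. This is a minimal sketch using only the standard library; `installed_version` is a hypothetical helper name, and `ms-swift` is the distribution name from the pip command above.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(dist_name):
    """Return the installed version of a distribution, or None if it is missing."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


# Prints the installed ms-swift version, or None if the install did not succeed.
print(installed_version("ms-swift"))
```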

Fine-tuning

CUDA_VISIBLE_DEVICES=0,1 \
swift sft \
    --model /home/models/pretrained_models/llm/Qwen2.5-7B-Instruct \
    --train_type lora \
    --dataset  /home/data/Chinese-DeepSeek-R1-Distill-data-110k-SFT/new_distill_r1_110k_sft.json \
    --torch_dtype bfloat16 \
    --num_train_epochs 6 \
    --per_device_train_batch_size 1 \
    --per_device_eval_batch_size 1 \
    --learning_rate 1e-4 \
    --lora_rank 8 \
    --lora_alpha 32 \
    --target_modules all-linear \
    --gradient_accumulation_steps 16 \
    --eval_steps 50 \
    --save_steps 50 \
    --save_total_limit 5 \
    --logging_steps 5 \
    --output_dir output \
    --system 'You are a deep thinking assistant.' \
    --warmup_ratio 0.05 \
    --dataloader_num_workers 4 \
    --model_author Q \
    --model_name Q-AILab-Qwen2.5-7B-Instruct-R1-Distill
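
As a rough guide to what `--lora_rank 8` and `--lora_alpha 32` mean: in standard LoRA, each adapted linear layer gains two low-rank matrices, and their product is scaled by `lora_alpha / lora_rank` before being added to the frozen weight. The sketch below works through that arithmetic. The 3584 x 3584 layer shape is a hypothetical example chosen for illustration, not a value taken from this post, and the function names are made up here.

```python
def lora_scaling(lora_alpha, lora_rank):
    """Scaling factor applied to the low-rank update in standard LoRA."""
    return lora_alpha / lora_rank


def lora_added_params(d_in, d_out, lora_rank):
    """Trainable parameters LoRA adds to one linear layer.

    Matrix A has shape (lora_rank, d_in) and matrix B has shape (d_out, lora_rank).
    """
    return lora_rank * (d_in + d_out)


# Values from the command above: rank 8, alpha 32.
print(lora_scaling(32, 8))  # 4.0

# Hypothetical 3584 x 3584 projection, compared with its full weight matrix.
print(lora_added_params(3584, 3584, 8))  # 57344
print(3584 * 3584)  # 12845056
```

With `--target_modules all-linear`, this cost is paid once per linear layer, which is why the adapter stays small relative to the 7B base model.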

Training process
Training ran on 2 A800 GPUs for 6 epochs and took 5 days.
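
To get a feel for these numbers, the sketch below estimates the optimizer-step count from the flags in the fine-tuning command. It rests on several assumptions that the post does not confirm: roughly 110,000 training samples (read off the dataset name), no samples held out for validation, and a data-parallel size of 1. Treat the result as an order-of-magnitude estimate rather than the actual step count of this run.

```python
import math


def training_schedule(num_samples, epochs, per_device_batch, grad_accum,
                      data_parallel_size=1):
    """Return (effective batch size, steps per epoch, total optimizer steps)."""
    effective_batch = per_device_batch * grad_accum * data_parallel_size
    steps_per_epoch = math.ceil(num_samples / effective_batch)
    return effective_batch, steps_per_epoch, steps_per_epoch * epochs


# Flags from the command: batch size 1, gradient accumulation 16, 6 epochs.
# 110_000 samples and data_parallel_size=1 are assumptions.
effective_batch, steps_per_epoch, total_steps = training_schedule(110_000, 6, 1, 16)
print(effective_batch, steps_per_epoch, total_steps)  # 16 6875 41250

# Average wall-clock time per optimizer step if the run took 5 days.
seconds_per_step = 5 * 24 * 3600 / total_steps
print(f"{seconds_per_step:.1f}")  # 10.5
```

Under these assumptions, `--save_steps 50` and `--eval_steps 50` trigger a checkpoint and an evaluation several hundred times over the run, so a larger interval may be worth considering.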

Inference results

Inference:

CUDA_VISIBLE_DEVICES=0,1 \
swift infer \
    --adapters /home/model/swift/output/v6-20250217-075043/checkpoint-50 \
    --stream true \
    --temperature 0 \
    --max_new_tokens 8192
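
A model distilled from R1-style data is expected to emit its reasoning before the final answer. Assuming the fine-tuned model wraps that reasoning in `<think>...</think>` tags (an assumption about the output format, not something shown in this post), a small helper like the hypothetical `split_reasoning` below can separate the two parts for display or evaluation.

```python
import re

# Non-greedy match so only the first <think> block is captured.
THINK_PATTERN = re.compile(r"<think>(.*?)</think>", re.DOTALL)


def split_reasoning(text):
    """Split a response into (reasoning, answer).

    If no <think> block is present, the reasoning is an empty string
    and the whole text is treated as the answer.
    """
    match = THINK_PATTERN.search(text)
    if match is None:
        return "", text.strip()
    reasoning = match.group(1).strip()
    answer = text[match.end():].strip()
    return reasoning, answer


sample = "<think>\n1 + 1 = 2\n</think>\n\nThe answer is 2."
print(split_reasoning(sample))  # ('1 + 1 = 2', 'The answer is 2.')
```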

Inference test:

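For follow-up steps such as merging the LoRA weights, resuming training from a checkpoint, and pushing the model, see the ms-swift GitHub project:

https://github.com/modelscope/ms-swift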
