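Earlier we forced the model to be saved in 4bit mode, and testing showed that the fine-tuning had not taken effect, as if the LoRA parameters had been stripped out. Oddly, once the LoRA parameters were applied again to the saved model, it produced the fine-tuned output.
There are two options. The first is to save the merged model on top of an fp16 base model, which requires that the model loaded for fine-tuning is not the local 4bit model. The second is to save only the LoRA parameters and load them together with the base model, which also gives the fine-tuned behavior.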
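To make the difference between the two options concrete, here is a minimal sketch in plain Python (no unsloth involved) of what "merged" and "adapter-only" mean. It assumes the standard LoRA formulation, where the adapted weight is W + (alpha / r) * B @ A. The tiny matrices and the helper names `matmul`, `merge_lora`, and `forward_with_adapter` are made up for illustration.

```python
# Toy illustration of LoRA merging; every value below is made up.

def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def add(a, b):
    """Element-wise sum of two matrices of the same shape."""
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]

def scale(a, s):
    """Multiply every entry of a matrix by the scalar s."""
    return [[x * s for x in row] for row in a]

def merge_lora(w, lora_a, lora_b, alpha, r):
    """Option 1: fold the adapter into the base weight, W' = W + (alpha / r) * B @ A."""
    return add(w, scale(matmul(lora_b, lora_a), alpha / r))

def forward_with_adapter(w, lora_a, lora_b, alpha, r, x):
    """Option 2: keep base weight and adapter separate and add their outputs."""
    base_out = matmul(w, x)
    lora_out = scale(matmul(lora_b, matmul(lora_a, x)), alpha / r)
    return add(base_out, lora_out)

W = [[1.0, 2.0], [3.0, 4.0]]  # base weight (2x2)
A = [[0.5, -0.5]]             # LoRA A with rank r = 1 (1x2)
B = [[2.0], [4.0]]            # LoRA B (2x1)
x = [[1.0], [3.0]]            # input column vector

merged_out = matmul(merge_lora(W, A, B, alpha=2, r=1), x)
adapter_out = forward_with_adapter(W, A, B, alpha=2, r=1, x=x)
```

Both paths give the same output as long as the merged weight is stored at full precision, which is the situation the fp16 option aims for.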
1. Save the model
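model.save_pretrained_merged("ffmpeg_log_analyze_model", tokenizer, save_method = "merged_16bit") # Option 1: merged model in 16-bit precision
model.save_pretrained("lora_model") # Option 2: LoRA adapter only; you can of course save both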
tokenizer.save_pretrained("lora_model")
2. Load the model again with unsloth
import os
from unsloth import FastLanguageModel
import torch
max_seq_length = 2048
dtype = None
load_in_4bit = False
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name = "lora_model",
    max_seq_length = max_seq_length,
    dtype = dtype,
    load_in_4bit = load_in_4bit,
)
3. Test the model
FastLanguageModel.for_inference(model)
messages = [
{"role": "user", "content": "Analyze the FFmpeg video transcoding log below and provide the transcoding status, PSNR value, any detected error message, and suggested resolution steps.ffmpeg -i input.mp4 -c:v libx264 -crf 23 -f null -\nffmpeg version 4.4.2-0ubuntu0.22.04.1 Copyright (c) 2000-2021 FFmpeg developers\nInput #0, mov,mp4,m4a,3gp,3g2,mj2, from 'input.mp4':\n Duration: 00:00:10.00, start: 0.000000, bitrate: 1000 kb/s\n Stream #0:0: Video: h264 (High), yuv420p, 1280x720, 25 fps\nOutput #0, null, to 'pipe:1':\n Metadata:\n encoder : Lavf58.76.100\n Stream #0:0: Video: h264 (libx264), yuv420p, 1280x720, q=-1--1, 25 fps\nStream mapping:\n Stream #0:0 -> #0:0 (h264 (native) -> libx264 (libx264))\nPress [q] to stop, [?] for help.\nframe= 250 fps= 25 q=28.0 size=N/A time=00:00:10.00 bitrate=N/A speed=1.00x\nvideo:0kB audio:0kB subtitle:0kB other streams:0kB global headers:0kB muxing overhead: 0.000000%"},
]
input_ids = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt = True,
    return_tensors = "pt",
).to("cuda")
from transformers import TextStreamer
text_streamer = TextStreamer(tokenizer, skip_prompt = True)
_ = model.generate(input_ids, streamer = text_streamer, max_new_tokens = 128, pad_token_id = tokenizer.eos_token_id)
{"successful": true, "psnr": 47.34, "error_message": "", "resolution_steps": "No action required; transcoding completed successfully."}<|eot_id|>
FastLanguageModel.for_inference(model)
messages = [
{"role": "user", "content": "Analyze the FFmpeg video transcoding log below and provide the transcoding status, PSNR value, any detected error message, and suggested resolution steps.ffmpeg -i input.mp4 -c:v copy -b:v 1G -c:a copy output_large_bitrate.mp4\nffmpeg version 4.4.2-0ubuntu0.22.04.1 Copyright (c) 2000-2021 FFmpeg developers\n[mp4 @ 0x...] Value 1000000000 for parameter 'video_bit_rate' is out of range [-2147483648 - 2147483647]\nCould not write header for output file 'output_large_bitrate.mp4': Invalid argument\nConversion failed!"},
]
input_ids = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt = True,
    return_tensors = "pt",
).to("cuda")
from transformers import TextStreamer
text_streamer = TextStreamer(tokenizer, skip_prompt = True)
_ = model.generate(input_ids, streamer = text_streamer, max_new_tokens = 128, pad_token_id = tokenizer.eos_token_id)
{"successful": false, "psnr": 0.00, "error_message": "Invalid video bitrate value (out of range or invalid format).", "resolution_steps": "Check the FFmpeg documentation for valid bitrate value ranges and formats. Try specifying the bitrate in a more explicit format, such as '1G' for 1 GB or '1000000' for 1 MB."}<|eot_id|>
As you can see, the model loaded with unsloth behaves as expected and returns properly formatted data.
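One way to check "properly formatted" programmatically rather than by eye is to strip the trailing `<|eot_id|>` token and parse what remains as JSON. The sketch below does that for the first output above. The helper names `parse_model_output` and `has_expected_keys` are hypothetical, and the expected key set is simply taken from the two outputs shown in this step.

```python
import json

# Keys observed in the two unsloth outputs above.
EXPECTED_KEYS = {"successful", "psnr", "error_message", "resolution_steps"}

def parse_model_output(text, eos_token="<|eot_id|>"):
    """Strip a trailing end-of-turn token and parse the remainder as JSON."""
    cleaned = text.strip()
    if cleaned.endswith(eos_token):
        cleaned = cleaned[: -len(eos_token)].rstrip()
    return json.loads(cleaned)

def has_expected_keys(record):
    """Return True when the record has exactly the expected keys."""
    return set(record) == EXPECTED_KEYS

# First model output from the test above, copied verbatim.
sample = (
    '{"successful": true, "psnr": 47.34, "error_message": "", '
    '"resolution_steps": "No action required; transcoding completed successfully."}'
    "<|eot_id|>"
)
record = parse_model_output(sample)
```

If the model ever emits text that is not valid JSON, `json.loads` raises `json.JSONDecodeError`, which makes a failed fine-tune easy to spot in an automated test.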
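4. Start the service with vllm
To make the model usable by applications, we need to serve it so that it can be accessed through an API. We use vllm to start this service.
Command line (without LoRA):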
vllm serve ./ffmpeg_log_analyze_model \
--host 0.0.0.0 \
--port 11111 \
--max-model-len=32k \
--gpu_memory_utilization=0.90 \
--tensor-parallel-size 1 \
--trust-remote-code
Command line (with LoRA):
vllm serve ./Meta-Llama-3.1-8B-Instruct-bnb-4bit \
--enable-lora \
--lora-modules ffmpeg-analyzer=./lora_model \
--host 0.0.0.0 \
--port 11111 \
--max-model-len=8k \
--gpu_memory_utilization=0.90 \
--tensor-parallel-size 1 \
--trust-remote-code
We test it with curl:
curl http://127.0.0.1:11111/v1/chat/completions \
-H "Content-Type: application/json" \
-d '{
"model": "ffmpeg-analyzer",
"messages": [
{"role": "user", "content": "Analyze the FFmpeg video transcoding log below and provide the transcoding status, PSNR value, any detected error message, and suggested resolution steps via json format. the json col name should be: transcode-status, psnr, error_msg, suggested_resolution_steps. Analyze the FFmpeg video transcoding log below and provide the transcoding status, PSNR value, any detected error message, and suggested resolution steps.ffmpeg -i input.mp4 -c:v copy -b:v 1G -c:a copy output_large_bitrate.mp4\nffmpeg version 4.4.2-0ubuntu0.22.04.1 Copyright (c) 2000-2021 FFmpeg developers\n[mp4 @ 0x...] Value 1000000000 for parameter 'video_bit_rate' is out of range [-2147483648 - 2147483647]\nCould not write header for output file 'output_large_bitrate.mp4': Invalid argument\nConversion failed!"}
],
"temperature": 0.1,
"max_tokens": 500
}'
{"id":"chatcmpl-9d6e0da75ce24de8abeab9e4cf81bc1a","object":"chat.completion","created":1756950115,"model":"ffmpeg-analyzer","choices":[{"index":0,"message":{"role":"assistant","content":"{\"transcode-status\": false, \"psnr\": 0.00, \"error_msg\": \"Invalid argument for video bitrate. Value out of range.\", \"suggested_resolution_steps\": \"Check the specified bitrate value. Ensure it's within the valid range for the chosen codec. If using a custom bitrate, verify the unit (e.g., bits, bytes) and the codec's capabilities.\"}","refusal":null,"annotations":null,"audio":null,"function_call":null,"tool_calls":[],"reasoning_content":null},"logprobs":null,"finish_reason":"stop","stop_reason":null}],"service_tier":null,"system_fingerprint":null,"usage":{"prompt_tokens":238,"total_tokens":320,"completion_tokens":82,"prompt_tokens_details":null},"prompt_logprobs":null,"kv_transfer_params":null}
curl http://127.0.0.1:11111/v1/chat/completions \
-H "Content-Type: application/json" \
-d '{
"model": "ffmpeg-analyzer",
"messages": [
{"role": "user", "content": "Analyze the FFmpeg video transcoding log below and provide the transcoding status, PSNR value, any detected error message, and suggested resolution steps via json format. the json col name should be: transcode-status, psnr, error_msg, suggested_resolution_steps. ffmpeg -i input.mp4 -c:v libx264 -crf 23 -f null -\nffmpeg version 4.4.2-0ubuntu0.22.04.1 Copyright (c) 2000-2021 FFmpeg developers\nInput #0, mov,mp4,m4a,3gp,3g2,mj2, from 'input.mp4':\n Duration: 00:00:10.00, start: 0.000000, bitrate: 1000 kb/s\n Stream #0:0: Video: h264 (High), yuv420p, 1280x720, 25 fps\nOutput #0, null, to 'pipe:1':\n Metadata:\n encoder : Lavf58.76.100\n Stream #0:0: Video: h264 (libx264), yuv420p, 1280x720, q=-1--1, 25 fps\nStream mapping:\n Stream #0:0 -> #0:0 (h264 (native) -> libx264 (libx264))\nPress [q] to stop, [?] for help.\nframe= 250 fps= 25 q=28.0 size=N/A time=00:00:10.00 bitrate=N/A speed=1.00x\nvideo:0kB audio:0kB subtitle:0kB other streams:0kB global headers:0kB muxing overhead: 0.000000%"}
],
"temperature": 0.1,
"max_tokens": 500
}'
{"id":"chatcmpl-b1bf8f8e2e3c4bc4bf146ffe2b95b8d9","object":"chat.completion","created":1756950224,"model":"ffmpeg-analyzer","choices":[{"index":0,"message":{"role":"assistant","content":"{\"transcode-status\": \"Success\", \"psnr\": 48.23, \"error_msg\": \"\", \"suggested_resolution_steps\": \"\"}","refusal":null,"annotations":null,"audio":null,"function_call":null,"tool_calls":[],"reasoning_content":null},"logprobs":null,"finish_reason":"stop","stop_reason":null}],"service_tier":null,"system_fingerprint":null,"usage":{"prompt_tokens":402,"total_tokens":433,"completion_tokens":31,"prompt_tokens_details":null},"prompt_logprobs":null,"kv_transfer_params":null}
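For calling the service from application code instead of curl, the same request can be sketched with Python's standard library. The URL, port, and model name are copied from the commands above. The function names `build_request`, `extract_answer`, and `analyze_log` are hypothetical, and `extract_answer` assumes the assistant message is a single JSON object, as in the two responses shown.

```python
import json
import urllib.request

# Values copied from the vllm and curl commands above.
BASE_URL = "http://127.0.0.1:11111"
MODEL_NAME = "ffmpeg-analyzer"

def build_request(prompt, temperature=0.1, max_tokens=500):
    """Build the POST request for the chat completions endpoint."""
    payload = {
        "model": MODEL_NAME,
        "messages": [{"role": "user", "content": prompt}],
        "temperature": temperature,
        "max_tokens": max_tokens,
    }
    return urllib.request.Request(
        BASE_URL + "/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )

def extract_answer(response_body):
    """Pull the assistant message out of a chat.completion body and decode it as JSON."""
    completion = json.loads(response_body)
    return json.loads(completion["choices"][0]["message"]["content"])

def analyze_log(prompt, timeout=60):
    """Send the prompt to the running server and return the decoded answer."""
    with urllib.request.urlopen(build_request(prompt), timeout=timeout) as resp:
        return extract_answer(resp.read().decode("utf-8"))
```

With the server running, `analyze_log("...prompt and FFmpeg log...")` would return a Python dict. The two-step decoding in `extract_answer` is needed because the model's JSON arrives as an escaped string inside the outer response JSON.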
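The `content` field in the responses above is what the model returned. The model did return structured content (\" is an escaped quote), but it no longer matches what we fine-tuned for: the keys of the structured data are different, although the content itself is still consistent. Loading the model with unsloth gave the correct result, yet the vllm deployment returns different structured data. What we wanted is "{\n \"successful\": true,\n \"psnr_value\": 31.90,\n \"error_message\": \"\",\n \"resolution_steps\": \"Output file generated as expected. No further action needed.\"\n}". My guess is that the LoRA parameters were fine-tuned in fp16 while the served model is a 4bit model, which skews the parameters, so when using LoRA parameters you should pick a base model in fp16 precision. Note, however, that the curl prompts above also explicitly ask for the key names transcode-status, psnr, error_msg, and suggested_resolution_steps, which the unsloth test prompts did not, so the prompt itself may account for the different keys.
There are other solutions as well: change the keys through the prompt, or write a small piece of conversion code that maps the output to the format we need, which keeps the system stable.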
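The conversion approach could look like the sketch below. It maps the keys returned by the vllm service onto the key names seen in the unsloth test output, and turns the status into a boolean, since the two responses above returned `false` in one case and the string "Success" in the other. The names `KEY_MAP`, `TRUTHY_STATUS`, `to_bool`, and `normalize` are hypothetical, and the set of strings counted as success is an assumption.

```python
# Source key (vllm response) -> target key (as in the unsloth test output).
KEY_MAP = {
    "transcode-status": "successful",
    "psnr": "psnr",
    "error_msg": "error_message",
    "suggested_resolution_steps": "resolution_steps",
}

# Assumption: these strings are treated as a successful status.
TRUTHY_STATUS = {"success", "successful", "true"}

def to_bool(value):
    """Coerce a boolean or status string into a boolean."""
    if isinstance(value, bool):
        return value
    return str(value).strip().lower() in TRUTHY_STATUS

def normalize(record):
    """Rename known keys, keep unknown keys unchanged, and coerce the status to bool."""
    out = {KEY_MAP.get(key, key): value for key, value in record.items()}
    if "successful" in out:
        out["successful"] = to_bool(out["successful"])
    return out
```

Records that already use the target key names pass through unchanged, so the same function can sit in front of both deployments.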