Qwen2.5VL 3B calibration using awq takes too long #433

@kritohyh

With a calibration set of about 1k samples (n_samples: 960 in the config below) and a batch size of 16, AWQ calibration takes more than 13 hours, whereas AWQ calibration with llm-compressor takes less than 1 hour. What is the reason for the difference?
One possible cause I can think of is that the Qwen2.5-VL attention backend uses SDPA, which is slower. Are there other factors?

Hardware platform: 1 × H20
Input token count (text + image): ≈ 200
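
To put the numbers above in perspective, here is a rough back-of-the-envelope sketch of the calibration workload. It assumes that `padding: True` pads every sample to `seq_len` (512), which this issue does not confirm, and it treats the ≈ 200 tokens as a per-sample figure. The helper name `calibration_workload` is hypothetical and is not part of llmc or llm-compressor.

```python
import math


def calibration_workload(n_samples, batch_size, seq_len,
                         real_tokens_per_sample, wall_clock_hours):
    """Summarize a calibration run (illustrative arithmetic only)."""
    n_batches = math.ceil(n_samples / batch_size)
    # Assumption: each sample is padded up to seq_len tokens.
    padded_tokens = n_samples * seq_len
    real_tokens = n_samples * real_tokens_per_sample
    return {
        "n_batches": n_batches,
        "padded_tokens": padded_tokens,
        "real_tokens": real_tokens,
        # How many tokens are processed per useful token.
        "padding_ratio": padded_tokens / real_tokens,
        # Total wall-clock time amortized over the batches.
        "seconds_per_batch": wall_clock_hours * 3600 / n_batches,
    }


# Values taken from the issue: 960 samples, bs 16, seq_len 512, ~200 tokens.
slow_run = calibration_workload(960, 16, 512, 200, wall_clock_hours=13)
fast_run = calibration_workload(960, 16, 512, 200, wall_clock_hours=1)

for name, run in (("13 h run", slow_run), ("1 h run", fast_run)):
    print(name, run)
```

Under these assumptions there are 60 batches, so 13 hours amounts to roughly 780 s of wall-clock time per batch, against at most 60 s per batch for a run that finishes within 1 hour. If padding to 512 really applies, about 2.56 tokens are processed for each real token, which would account for only a small part of a 13x gap.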

awq yaml config:

base:
    seed: &seed 42
model:
    type: Qwen2_5VL
    path: xxx
    tokenizer_mode: slow
    torch_dtype: auto
calib:
    name: custom_mm
    download: False
    path: xxx
    apply_chat_template: True
    n_samples: 960
    bs: 16
    seq_len: 512
    padding: True
    seed: *seed

quant:
    method: Awq
    weight:
        bit: 4
        symmetric: False
        granularity: per_group
        group_size: 64
        # Available options: ['gemm_pack']
        pack_version: gemm_pack
    special:
        trans: True
        trans_version: v2
        weight_clip: True
        do_gqa_trans: True
    quant_out: False
save:
    save_mlcllm: True
    save_fake: True
    save_path: xxx
