v0.3.03

Latest

@xming521 released this 04 Jan 14:40
· 5 commits to master since this release
0379096

🎉 What's Changed

The key highlights of this update include an upgrade to Python 3.12 and optimization of the dataset pipeline.

Dependency and Environment Updates:

  • Upgraded the required Python version to 3.12 in pyproject.toml and development settings, and updated the target version for linting and type checking to Python 3.12. [1] [2] [3]
  • Updated dependencies: switched from a git-based install of llamafactory to a fixed version, added torchdata and torchaudio with CUDA 12.6 support, and refined platform-specific dependency markers for PyTorch packages. [1] [2]

Data

  • Added the "<begin_chat>" marker in user messages, allowing for improved context in conversation flows.
  • Updated qa_generator.py with a new mechanism for managing chat member relationships, so that contextual information about the relationship between users can be added to conversations.
  • Refactored the CSV loading function to support loading user relationship data from a users.json file, improving the context provided during QA generation.
  • Added a new configuration option add_relation to the dataset settings, enabling users to toggle this feature.
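The data changes above can be illustrated with a minimal sketch. It is not the project's actual code: the users.json layout (a list of objects with "id" and "relation" keys), the helper names load_relations and build_user_message, the "[relation: ...]" prefix format, and the placement of the marker at the start of the message are all assumptions made for illustration. Only the "<begin_chat>" marker, the users.json file name, and the add_relation toggle come from the release notes.

```python
import json
from pathlib import Path

BEGIN_CHAT = "<begin_chat>"  # marker named in the release notes


def load_relations(path):
    """Load a {user_id: relation} mapping from a users.json-style file.

    Assumed (hypothetical) layout: [{"id": "...", "relation": "..."}, ...].
    Returns an empty mapping if the file does not exist.
    """
    p = Path(path)
    if not p.exists():
        return {}
    with p.open(encoding="utf-8") as f:
        users = json.load(f)
    # Keep only entries that actually declare a relation.
    return {u["id"]: u["relation"] for u in users if u.get("relation")}


def build_user_message(text, sender_id, relations, add_relation=True):
    """Prefix a user message with optional relation context and the marker.

    The prefix format and marker position are illustrative assumptions.
    """
    parts = []
    if add_relation and sender_id in relations:
        parts.append(f"[relation: {relations[sender_id]}]")
    parts.append(BEGIN_CHAT)
    parts.append(text)
    return "".join(parts)
```

With add_relation set to False, or for a sender with no known relation, the helper emits only the marker and the message text, which mirrors the toggle described in the last bullet.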

Other

  • Introduced OnlineLLM with thread‑pooled batch chat and optional JSON‑guided decoding; unified JSON parsing across vLLM and OpenAI results.
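As a rough illustration of thread-pooled batch chat with a shared JSON parsing step, here is a minimal sketch using only the standard library. It is not the OnlineLLM implementation: batch_chat, parse_json_reply, and the chat_fn callable (any function that takes a prompt string and returns the reply text) are hypothetical names, and JSON-guided decoding itself is not shown.

```python
import json
from concurrent.futures import ThreadPoolExecutor

FENCE = "`" * 3  # Markdown code fence that some models wrap JSON in


def parse_json_reply(text):
    """Parse a model reply as JSON, tolerating a surrounding code fence.

    Returns None when the reply is not valid JSON.
    """
    cleaned = text.strip()
    if cleaned.startswith(FENCE):
        # Drop the opening fence line (for example, the fence plus "json").
        cleaned = cleaned.split("\n", 1)[1] if "\n" in cleaned else ""
        # Drop the closing fence and anything after it.
        cleaned = cleaned.rsplit(FENCE, 1)[0]
    try:
        return json.loads(cleaned)
    except json.JSONDecodeError:
        return None


def batch_chat(chat_fn, prompts, max_workers=4):
    """Call chat_fn on every prompt concurrently, preserving input order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        replies = list(pool.map(chat_fn, prompts))
    # One parsing path for every backend's raw reply text.
    return [parse_json_reply(r) for r in replies]
```

Threads suit this workload because each call mostly waits on network I/O; routing every reply through one parser keeps the handling of malformed output identical regardless of which backend produced it.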

Full Changelog: v0.3.02...v0.3.03

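😊 Update Notes

The key highlights of this update are the upgrade to Python 3.12 and the optimization of the dataset pipeline.

Dependency and Environment Updates:

  • Upgraded the Python version to 3.12 in pyproject.toml and the development configuration.
  • Dependency updates: switched llamafactory from a git-based install to a fixed version, added torchdata and torchaudio with CUDA 12.6 support, and refined the platform-specific dependency markers for PyTorch packages.

Data Processing

  • Added the "<begin_chat>" marker to improve contextual coherence in conversation flows.
  • Updated qa_generator.py with a new mechanism for managing chat member relationships, which supports adding context about the relationships between users in a conversation.
  • Refactored the CSV loading function to support loading user relationship data from a users.json file, enriching the context available during QA generation.
  • Added an add_relation option to the dataset configuration, allowing users to enable or disable this feature.

Other

  • Introduced OnlineLLM, which supports thread-pooled batch chat and optional JSON-guided decoding; unified the JSON parsing flow for vLLM and OpenAI results.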