Fix tensor device mismatch in `_eval_polarisable_electrode` when training with CUDA by Copilot · Pull Request #5 · ChiahsinChu/ec-MLP

Copilot · 2026-03-23T07:40:04Z

`dp train --init-frz-model` crashes with a `RuntimeError` when CUDA is available. `setup_from_lammps` creates its tensors on `torch_admp.env.DEVICE` (`cuda:0`), while `t_positions`, `t_box`, `t_charges`, and the neighbor-list tensors were all created on the CPU. The conflict surfaces inside `charge_optimization`, at the `torch.where(electrode_mask == 0, charges, 0.0)` call in `calc_coulomb_potential`.

Changes

`_eval_polarisable_electrode` in `dipole_charge_electrode.py`:
- Call `setup_from_lammps` first and derive `device` from `input_data[0].device`
- Move `self.calculator` and `self.er` to `device` (no-op if already co-located)
- Create `t_positions`, `t_box`, `t_charges` on `device` via `torch.as_tensor(..., device=device)` so `vesin_nblist` returns neighbor-list tensors on the same device automatically
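
The "no-op if already co-located" remark relies on how PyTorch's `.to()` behaves. The following is a minimal sketch, assuming `self.calculator` and `self.er` are `torch.nn.Module` instances (the PR does not show their types). `Module.to` modifies the module in place and returns it, so a bare call needs no reassignment, whereas `Tensor.to` returns a new tensor unless device and dtype already match. The `nn.Linear` module is only a stand-in.

```python
import torch
from torch import nn

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Stand-in for self.calculator / self.er (assumed to be nn.Module instances).
module = nn.Linear(3, 1)
returned = module.to(device)  # moves the parameters in place and returns the module
module_moved_in_place = returned is module and module.weight.device.type == device.type

# Tensors behave differently: .to() never changes the tensor it is called on.
cpu_tensor = torch.zeros(3, dtype=torch.float32)
same_place = cpu_tensor.to(cpu_tensor.device)  # already co-located: same object comes back
tensor_to_is_noop = same_place is cpu_tensor

converted = cpu_tensor.to(torch.float64)  # a real conversion returns a new tensor
original_untouched = cpu_tensor.dtype == torch.float32 and converted is not cpu_tensor
```

If either object were a plain tensor rather than a module, the bare `.to(device)` call would discard its result and the object would stay where it was.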

```python
# Before: tensors defaulted to CPU regardless of env.DEVICE
t_positions = torch.tensor(positions.reshape(-1, 3), requires_grad=True)  # CPU
input_data = setup_from_lammps(...)  # tensors on env.DEVICE (cuda:0)
# → RuntimeError inside charge_optimization

# After: device aligned before any tensor construction
input_data = setup_from_lammps(...)
device = input_data[0].device
self.calculator.to(device)
self.er.to(device)
t_positions = torch.as_tensor(positions.reshape(-1, 3), device=device).requires_grad_(True)
```
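
The same pattern can be tried without LAMMPS or a GPU. In this sketch, `fake_setup` is a hypothetical stand-in for `setup_from_lammps` that merely returns a tuple of tensors on the preferred device, `build_inputs` is an illustrative helper, and the array shapes and values are made up. It falls back to the CPU when CUDA is unavailable.

```python
import numpy as np
import torch


def fake_setup(n_atoms: int) -> tuple[torch.Tensor, ...]:
    """Hypothetical stand-in for setup_from_lammps: tensors on the preferred device."""
    device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
    electrode_mask = torch.zeros(n_atoms, dtype=torch.int64, device=device)
    return (electrode_mask,)


def build_inputs(positions: np.ndarray, box: np.ndarray, charges: np.ndarray):
    # Derive the device from the library-created tensors first ...
    input_data = fake_setup(len(charges))
    device = input_data[0].device
    # ... then create every other tensor on that same device.
    t_positions = torch.as_tensor(positions.reshape(-1, 3), device=device).requires_grad_(True)
    t_box = torch.as_tensor(box.reshape(3, 3), device=device)
    t_charges = torch.as_tensor(charges, device=device)
    return input_data, t_positions, t_box, t_charges


rng = np.random.default_rng(0)
input_data, t_positions, t_box, t_charges = build_inputs(
    positions=rng.random(12),
    box=np.eye(3).reshape(-1) * 10.0,
    charges=np.zeros(4),
)
# The kind of call that failed in the issue now sees operands on a single device.
modified = torch.where(input_data[0] == 0, t_charges, 0.0)
devices = {t.device for t in (*input_data, t_positions, t_box, t_charges, modified)}
```

One side effect worth keeping in mind: on the CPU, `torch.as_tensor` may share memory with the NumPy array it wraps, whereas `torch.tensor` always copies.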

Original prompt

This section details the original issue you should resolve

<issue_title>[Bug] RuntimeError: Expected all tensors to be on the same device (cuda:0 and cpu) during dp train --init-frz-model</issue_title>
<issue_description>Bug summary

A RuntimeError regarding tensor device mismatch occurs when executing dp train --init-frz-model. The error is triggered inside torch_admp/electrode.py during the modify_data phase called by the ec-MLP modifier. It seems electrode_mask and charges are located on different devices (one on cuda:0 and one on cpu) when computing modified_charges.

DeePMD-kit Version
3.1.4.dev2+ga60e90be6

ec-MLP version:
1.1.0.dev5+g2c7383b5b

torch-admp version:
1.1.5

How did you download the software?
Built from source

Input Files, Running Commands, Error Log, etc.
issue.tar.gz

Steps to Reproduce
dp train ec-mlp.json --init-frz-model ec-mlp.pb

</issue_description>

<agent_instructions>Any suggestion about this issue?</agent_instructions>

Comments on the Issue (you are @copilot in this section)

Fixes [Bug] RuntimeError: Expected all tensors to be on the same device (cuda:0 and cpu) during dp train --init-frz-model #4


ChiahsinChu · 2026-03-23T07:40:52Z

Here is the error log:

Traceback (most recent call last):
  File "/public/groups/chenggroup/jpqiu/conda/ec-MLP/bin/dp", line 6, in <module>
    sys.exit(main())
             ^^^^^^
  File "/public/groups/chenggroup/jpqiu/conda/ec-MLP/lib/python3.11/site-packages/deepmd/main.py", line 1052, in main
    deepmd_main(args)
  File "/public/groups/chenggroup/jpqiu/conda/ec-MLP/lib/python3.11/site-packages/deepmd/tf/entrypoints/main.py", line 68, in main
    train_dp(**dict_args)
  File "/public/groups/chenggroup/jpqiu/conda/ec-MLP/lib/python3.11/site-packages/deepmd/tf/entrypoints/train.py", line 189, in train
    _do_work(jdata, run_opt, is_compress)
  File "/public/groups/chenggroup/jpqiu/conda/ec-MLP/lib/python3.11/site-packages/deepmd/tf/entrypoints/train.py", line 297, in _do_work
    model.train(train_data, valid_data)
  File "/public/groups/chenggroup/jpqiu/conda/ec-MLP/lib/python3.11/site-packages/deepmd/tf/train/trainer.py", line 533, in train
    datasetloader = DatasetLoader(train_data)
                    ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/public/groups/chenggroup/jpqiu/conda/ec-MLP/lib/python3.11/site-packages/deepmd/tf/train/trainer.py", line 1003, in __init__
    batch_data = self.train_data.get_batch()
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/public/groups/chenggroup/jpqiu/conda/ec-MLP/lib/python3.11/site-packages/deepmd/utils/data_system.py", line 460, in get_batch
    b_data = self.get_batch_standard(sys_idx)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/public/groups/chenggroup/jpqiu/conda/ec-MLP/lib/python3.11/site-packages/deepmd/utils/data_system.py", line 487, in get_batch_standard
    b_data = self.data_systems[self.pick_idx].get_batch(
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/public/groups/chenggroup/jpqiu/conda/ec-MLP/lib/python3.11/site-packages/deepmd/utils/data.py", line 302, in get_batch
    self._load_batch_set(self.dirs[self.set_count % self.get_numb_set()])
  File "/public/groups/chenggroup/jpqiu/conda/ec-MLP/lib/python3.11/site-packages/deepmd/utils/data.py", line 594, in _load_batch_set
    self.modifier.modify_data(self.batch_set, self)
  File "/public/groups/chenggroup/jpqiu/conda/ec-MLP/lib/python3.11/site-packages/ec_mlp/tf/modifier/dipole_charge_electrode.py", line 284, in modify_data
    tot_e, tot_f, tot_v = self.eval(coord, box, atype)
                          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/public/groups/chenggroup/jpqiu/conda/ec-MLP/lib/python3.11/site-packages/ec_mlp/tf/modifier/dipole_charge_electrode.py", line 185, in eval
    e, f, v = self._eval_polarisable_electrode(
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/public/groups/chenggroup/jpqiu/conda/ec-MLP/lib/python3.11/site-packages/ec_mlp/tf/modifier/dipole_charge_electrode.py", line 96, in _eval_polarisable_electrode
    _q_opt, _efield = charge_optimization(
                      ^^^^^^^^^^^^^^^^^^^^
  File "/public/groups/chenggroup/jpqiu/conda/ec-MLP/lib/python3.11/site-packages/torch_admp/electrode.py", line 489, in charge_optimization
    chi_elec, _energy = calculator.calc_coulomb_potential(
                        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/public/groups/chenggroup/jpqiu/conda/ec-MLP/lib/python3.11/site-packages/torch_admp/electrode.py", line 66, in calc_coulomb_potential
    modified_charges = torch.where(electrode_mask == 0, charges, 0.0)
                       ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cpu!
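
The final frame can be reproduced in isolation, which may help when checking whether a given environment is affected. This is a sketch with made-up tensors, not code from `torch_admp`; `reproduce_mismatch` and `device_report` are hypothetical helpers. On a machine without CUDA the reproduction is skipped and `None` is returned.

```python
import torch


def reproduce_mismatch():
    """Return the RuntimeError message, or None if no mismatch was raised."""
    if not torch.cuda.is_available():
        return None  # a second device is needed to trigger the mismatch
    electrode_mask = torch.tensor([1, 0, 0], device="cuda:0")  # like tensors on env.DEVICE
    charges = torch.tensor([0.0, 0.5, -0.5])  # defaults to CPU
    try:
        torch.where(electrode_mask == 0, charges, 0.0)
    except RuntimeError as err:
        return str(err)
    return None


def device_report(**tensors):
    """Map each named tensor to its device string, for logging before a failing call."""
    return {name: str(t.device) for name, t in tensors.items()}


message = reproduce_mismatch()
report = device_report(charges=torch.zeros(3))
```

Calling `device_report` on the arguments just before `charge_optimization` is one way to see which operand was left on the CPU.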

codecov · 2026-03-23T07:44:17Z

⚠️ JUnit XML file not found

The CLI was unable to find any JUnit XML files to upload.
For more help, visit our troubleshooting guide.


Initial plan

917b647

Copilot AI assigned Copilot and ChiahsinChu Mar 23, 2026

Copilot started work on behalf of ChiahsinChu March 23, 2026 07:40 View session

fix: resolve device mismatch in _eval_polarisable_electrode for CUDA …

7ebbad0

…training Co-authored-by: ChiahsinChu <53895049+ChiahsinChu@users.noreply.github.com> Agent-Logs-Url: https://github.com/ChiahsinChu/ec-MLP/sessions/92e40153-2342-4caf-a27b-626c06c5aea4

Copilot AI changed the title ~~[WIP] Fix RuntimeError on tensor device mismatch during dp train~~ Fix tensor device mismatch in _eval_polarisable_electrode when training with CUDA Mar 23, 2026

Copilot AI requested a review from ChiahsinChu March 23, 2026 07:50

Copilot finished work on behalf of ChiahsinChu March 23, 2026 07:50

ChiahsinChu closed this Apr 9, 2026

Copilot wants to merge 2 commits into master from copilot/fix-runtimeerror-tensor-device-mismatch
