Skip to content

Issue 7310 - On LMDB an export loops indefinitely if the suffix entry…#7311

Open
tbordaz wants to merge 1 commit into389ds:mainfrom
tbordaz:issue_7310
Open

Issue 7310 - On LMDB an export loops indefinitely if the suffix entry…#7311
tbordaz wants to merge 1 commit into389ds:mainfrom
tbordaz:issue_7310

Conversation

@tbordaz
Copy link
Copy Markdown
Contributor

@tbordaz tbordaz commented Mar 5, 2026

… has not the entryid=1

Bug description:
On LMDB there is a strong requirement that the suffix entry has the entry-id=1.
In the case replication is enabled on a backend before the suffix entry exists
it creates the RUV as entry-id=1 and so the suffix becomes entry-id=2.
One of the symptom is an infinite loop when exporting the suffix
#0 slapi_ch_array_add_ext

		#1  0x00007f24e56a36eb in slapi_rdn_add_rdn_to_all_rdns
		#2  0x00007f24e14ea99e in entryrdn_lookup_dn
		#3  0x00007f24e154b3a6 in dbmdb_db2ldif
		#4  0x00007f24e150294c in ldbm_back_ldbm2ldif
		#5  0x00005587335ff56f in slapd_exemode_db2ldif
		#6  main

Fix description:
Hardening of _replica_configure_ruv that checks that the suffix
entry exists before attempting to create the RUV.

fixes: #7310

Reviewed by:

Summary by Sourcery

Validate replica suffix entries before configuring RUV to prevent invalid LMDB layouts and export loops.

Bug Fixes:

  • Prevent RUV configuration when the replica suffix entry is missing, returning an appropriate LDAP error instead of looping indefinitely during export.

Enhancements:

  • Add an internal suffix-entry existence check used by replication configuration and RUV setup paths.
  • Propagate detailed error text from RUV configuration failures when creating replicas from configuration entries.

Copy link
Copy Markdown
Contributor

@sourcery-ai sourcery-ai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey - I've left some high level feedback:

  • In _suffix_entry_check, all non-success cases return -1 but are treated identically by _replica_configure_ruv; consider distinguishing between "suffix missing" and internal errors (e.g., OOM or unexpected rc) so callers and logs can react appropriately.
  • The change of the RUV lookup scope in _replica_configure_ruv from LDAP_SCOPE_BASE to LDAP_SCOPE_ONELEVEL is significant; it would help to either narrow the filter/target DN or add a brief code comment explaining why one-level is required and safe with respect to other entries under the suffix.
  • The new _suffix_entry_check logging uses a mismatched function name in the OOM message ("_suffix_entry_found" vs. "_suffix_entry_check"); aligning these names will make log analysis and grep-based debugging easier.
Prompt for AI Agents
Please address the comments from this code review:

## Overall Comments
- In `_suffix_entry_check`, all non-success cases return `-1` but are treated identically by `_replica_configure_ruv`; consider distinguishing between "suffix missing" and internal errors (e.g., OOM or unexpected rc) so callers and logs can react appropriately.
- The change of the RUV lookup scope in `_replica_configure_ruv` from `LDAP_SCOPE_BASE` to `LDAP_SCOPE_ONELEVEL` is significant; it would help to either narrow the filter/target DN or add a brief code comment explaining why one-level is required and safe with respect to other entries under the suffix.
- The new `_suffix_entry_check` logging uses a mismatched function name in the OOM message ("_suffix_entry_found" vs. "_suffix_entry_check"); aligning these names will make log analysis and grep-based debugging easier.

Sourcery is free for open source - if you like our reviews please consider sharing them ✨
Help me be more useful! Please click 👍 or 👎 on each comment and I'll use the feedback to improve your reviews.

@packit-as-a-service
Copy link
Copy Markdown

Congratulations! One of the builds has completed. 🍾

You can install the built RPMs by following these steps:

  • sudo dnf install -y 'dnf*-command(copr)'
  • dnf copr enable packit/389ds-389-ds-base-7311
  • And now you can install the packages.

Please note that the RPMs should be used only in a testing environment.

}
return_value = 0;
} else if (rc == LDAP_NO_SUCH_OBJECT) {
slapi_log_err(SLAPI_LOG_ERR, repl_plugin_name,
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if that one should not rather be a Warning rather than an error.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Or maybe even REPL logging...

ReplicaId rid = 0;

if (_suffix_entry_check(r)) {
slapi_log_err(SLAPI_LOG_ERR, repl_plugin_name,
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lots of frightening error messages at _suffix_entry_check then at _suffix_entry_check
maybe should not log the _suffix_entry_check message if entry is not found
(if success but no entry you can log a message as this case should not happen)
and log a warning about being unable to write the ruv because suffix entry does not exists

ReplicaId rid = 0;

if (_suffix_entry_check(r)) {
slapi_log_err(SLAPI_LOG_ERR, repl_plugin_name,
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Need to indent the logging function

pb,
slapi_sdn_get_dn(r->repl_root),
LDAP_SCOPE_BASE,
LDAP_SCOPE_ONELEVEL,
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LDAP_SCOPE_BASE should be fine. Better keep it.
Since ns-uniqueid is provided it is used rather the dn and we are returning here a single entry

@progier389
Copy link
Copy Markdown
Contributor

I wonder if the logic is OK.
Should we really prevent the RUV creation or should we not write it in the db (i.e doing the test in ruv_write_ruv instead ?)

@tbordaz tbordaz added the work in progress Work in Progress - can be reviewed, but not ready for merge. label Mar 6, 2026
@tbordaz
Copy link
Copy Markdown
Contributor Author

tbordaz commented Mar 6, 2026

Thank for all the reviews. I will update the patch accordingly. Meanwhile there is a lot of CI failures that I will investigate

@tbordaz tbordaz force-pushed the issue_7310 branch 2 times, most recently from 7bf379d to 5c0798b Compare March 10, 2026 14:27
… has not the entryid=1

Bug description:
	On LMDB there is a strong requirement that the suffix entry has the entry-id=1.
	In the case replication is enabled on a backend before the suffix entry exists
	it creates the RUV as entry-id=1 and so the suffix becomes entry-id=2.
	One of the symptom is an infinite loop when exporting the suffix
		#0  slapi_ch_array_add_ext
		389ds#1  0x00007f24e56a36eb in slapi_rdn_add_rdn_to_all_rdns
		389ds#2  0x00007f24e14ea99e in entryrdn_lookup_dn
		389ds#3  0x00007f24e154b3a6 in dbmdb_db2ldif
		389ds#4  0x00007f24e150294c in ldbm_back_ldbm2ldif
		389ds#5  0x00005587335ff56f in slapd_exemode_db2ldif
		389ds#6  main

Fix description:
	Hardening of _replica_configure_ruv that checks that the suffix
	entry exists before attempting to create the RUV.

fixes: 389ds#7310

Reviewed by: Pierre Rogier, Mark Reynolds (thanks !!)
Copy link
Copy Markdown
Member

@droideck droideck left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Very minor issues. Otherwise, LGTM!

pb = slapi_pblock_new();
if (!pb) {
slapi_log_err(SLAPI_LOG_ERR, repl_plugin_name,
"_suffix_entry_found - Out of memory\n");
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
"_suffix_entry_found - Out of memory\n");
"_suffix_entry_check - Out of memory\n");

if (rc != 0) {
rc = LDAP_OTHER;
if (rc != LDAP_UNWILLING_TO_PERFORM) {
rc = LDAP_OTHER;
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Indentation

@@ -2586,7 +2632,7 @@ _replica_config_changelog(Replica *replica)
* Returns 0 on success, -1 on failure. If 0 is returned, the RUV is present in the replica.
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Returns 0 on success, -1 on failure -- probably, it should be extended as we also return LDAP_UNWILLING_TO_PERFORM now

}


done:
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Double blank line before done: :)

# and restart the instance to take into account the restored backend
rc = S1.tasks.bak2db(backup_dir=f'{backup_dir}', args={TASK_WAIT: True})
assert rc == 0
S1.restart()
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That is not normal that you have to restart the server .
as bak2db already restart the backend.
IMHO if you really need to restart that means that something is broken somewhere

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok. I will investigate deeper

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is an online restore. During startup, it fails to retrieve the suffix and check the RUV because data.mdb was remove. So the backend is offline/broken before the restore that should contribute to the backend not restarted after restore. continuing

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Indeed, the in-memory replica is built when parsing the dse.ldif config (replica_config_add) that is done at startup or when the replica is created. This step failed because data.mb was removed and the replica did not exist. When the backend is restored, it is started but not the replica. This because replica_config_add is not called again.
An option would be to detect that the suffix is missing, log an error, skip the creation of the RUV and return silently a success. Worth a try.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if we have not the same issue after online import
Maybe we should add a new step when bringing up backend on line
to configure the ruv if it is not done and if the suffix entry exists

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes there is the same issue with online import. May be we could use framework plg_mmr_betxn_preop

Copy link
Copy Markdown
Contributor

@progier389 progier389 Mar 12, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

About plg_mmr_betxn_preop: the preop is not called in case of restore/import
and I am not sure we want to do the test for every operation. (But we may want to do it when adding the suffix entry (in bepostop?). ( BTW how replication is bootstrapped when creating empty backend then adding suffix entry ?)
IMHO, for import/restore case, you should add a new pre_enable_be function (a bit like what Mark has done with #7294 for the pre_close function)

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well in fact the callback already exists:
multisupplier_be_state_change

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

work in progress Work in Progress - can be reviewed, but not ready for merge.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

On LMDB an export loops indefinitely if the suffix entry has not the entryid=1

4 participants