Issue 7310 - On LMDB an export loops indefinitely if the suffix entry…#7311
Issue 7310 - On LMDB an export loops indefinitely if the suffix entry…#7311tbordaz wants to merge 1 commit into389ds:mainfrom
Conversation
There was a problem hiding this comment.
Hey - I've left some high level feedback:
- In
_suffix_entry_check, all non-success cases return-1but are treated identically by_replica_configure_ruv; consider distinguishing between "suffix missing" and internal errors (e.g., OOM or unexpected rc) so callers and logs can react appropriately. - The change of the RUV lookup scope in
_replica_configure_ruvfromLDAP_SCOPE_BASEtoLDAP_SCOPE_ONELEVELis significant; it would help to either narrow the filter/target DN or add a brief code comment explaining why one-level is required and safe with respect to other entries under the suffix. - The new
_suffix_entry_checklogging uses a mismatched function name in the OOM message ("_suffix_entry_found" vs. "_suffix_entry_check"); aligning these names will make log analysis and grep-based debugging easier.
Prompt for AI Agents
Please address the comments from this code review:
## Overall Comments
- In `_suffix_entry_check`, all non-success cases return `-1` but are treated identically by `_replica_configure_ruv`; consider distinguishing between "suffix missing" and internal errors (e.g., OOM or unexpected rc) so callers and logs can react appropriately.
- The change of the RUV lookup scope in `_replica_configure_ruv` from `LDAP_SCOPE_BASE` to `LDAP_SCOPE_ONELEVEL` is significant; it would help to either narrow the filter/target DN or add a brief code comment explaining why one-level is required and safe with respect to other entries under the suffix.
- The new `_suffix_entry_check` logging uses a mismatched function name in the OOM message ("_suffix_entry_found" vs. "_suffix_entry_check"); aligning these names will make log analysis and grep-based debugging easier.Help me be more useful! Please click 👍 or 👎 on each comment and I'll use the feedback to improve your reviews.
|
Congratulations! One of the builds has completed. 🍾 You can install the built RPMs by following these steps:
Please note that the RPMs should be used only in a testing environment. |
| } | ||
| return_value = 0; | ||
| } else if (rc == LDAP_NO_SUCH_OBJECT) { | ||
| slapi_log_err(SLAPI_LOG_ERR, repl_plugin_name, |
There was a problem hiding this comment.
I wonder if that one should not rather be a Warning rather than an error.
There was a problem hiding this comment.
Or maybe even REPL logging...
| ReplicaId rid = 0; | ||
|
|
||
| if (_suffix_entry_check(r)) { | ||
| slapi_log_err(SLAPI_LOG_ERR, repl_plugin_name, |
There was a problem hiding this comment.
lots of frightening error messages at _suffix_entry_check then at _suffix_entry_check
maybe should not log the _suffix_entry_check message if entry is not found
(if success but no entry you can log a message as this case should not happen)
and log a warning about being unable to write the ruv because suffix entry does not exists
| ReplicaId rid = 0; | ||
|
|
||
| if (_suffix_entry_check(r)) { | ||
| slapi_log_err(SLAPI_LOG_ERR, repl_plugin_name, |
There was a problem hiding this comment.
Need to indent the logging function
| pb, | ||
| slapi_sdn_get_dn(r->repl_root), | ||
| LDAP_SCOPE_BASE, | ||
| LDAP_SCOPE_ONELEVEL, |
There was a problem hiding this comment.
LDAP_SCOPE_BASE should be fine. Better keep it.
Since ns-uniqueid is provided it is used rather the dn and we are returning here a single entry
|
I wonder if the logic is OK. |
|
Thank for all the reviews. I will update the patch accordingly. Meanwhile there is a lot of CI failures that I will investigate |
7bf379d to
5c0798b
Compare
… has not the entryid=1 Bug description: On LMDB there is a strong requirement that the suffix entry has the entry-id=1. In the case replication is enabled on a backend before the suffix entry exists it creates the RUV as entry-id=1 and so the suffix becomes entry-id=2. One of the symptom is an infinite loop when exporting the suffix #0 slapi_ch_array_add_ext 389ds#1 0x00007f24e56a36eb in slapi_rdn_add_rdn_to_all_rdns 389ds#2 0x00007f24e14ea99e in entryrdn_lookup_dn 389ds#3 0x00007f24e154b3a6 in dbmdb_db2ldif 389ds#4 0x00007f24e150294c in ldbm_back_ldbm2ldif 389ds#5 0x00005587335ff56f in slapd_exemode_db2ldif 389ds#6 main Fix description: Hardening of _replica_configure_ruv that checks that the suffix entry exists before attempting to create the RUV. fixes: 389ds#7310 Reviewed by: Pierre Rogier, Mark Reynolds (thanks !!)
droideck
left a comment
There was a problem hiding this comment.
Very minor issues. Otherwise, LGTM!
| pb = slapi_pblock_new(); | ||
| if (!pb) { | ||
| slapi_log_err(SLAPI_LOG_ERR, repl_plugin_name, | ||
| "_suffix_entry_found - Out of memory\n"); |
There was a problem hiding this comment.
| "_suffix_entry_found - Out of memory\n"); | |
| "_suffix_entry_check - Out of memory\n"); |
| if (rc != 0) { | ||
| rc = LDAP_OTHER; | ||
| if (rc != LDAP_UNWILLING_TO_PERFORM) { | ||
| rc = LDAP_OTHER; |
| @@ -2586,7 +2632,7 @@ _replica_config_changelog(Replica *replica) | |||
| * Returns 0 on success, -1 on failure. If 0 is returned, the RUV is present in the replica. | |||
There was a problem hiding this comment.
Returns 0 on success, -1 on failure -- probably, it should be extended as we also return LDAP_UNWILLING_TO_PERFORM now
| } | ||
|
|
||
|
|
||
| done: |
There was a problem hiding this comment.
Double blank line before done: :)
| # and restart the instance to take into account the restored backend | ||
| rc = S1.tasks.bak2db(backup_dir=f'{backup_dir}', args={TASK_WAIT: True}) | ||
| assert rc == 0 | ||
| S1.restart() |
There was a problem hiding this comment.
That is not normal that you have to restart the server .
as bak2db already restart the backend.
IMHO if you really need to restart that means that something is broken somewhere
There was a problem hiding this comment.
Ok. I will investigate deeper
There was a problem hiding this comment.
It is an online restore. During startup, it fails to retrieve the suffix and check the RUV because data.mdb was remove. So the backend is offline/broken before the restore that should contribute to the backend not restarted after restore. continuing
There was a problem hiding this comment.
Indeed, the in-memory replica is built when parsing the dse.ldif config (replica_config_add) that is done at startup or when the replica is created. This step failed because data.mb was removed and the replica did not exist. When the backend is restored, it is started but not the replica. This because replica_config_add is not called again.
An option would be to detect that the suffix is missing, log an error, skip the creation of the RUV and return silently a success. Worth a try.
There was a problem hiding this comment.
I wonder if we have not the same issue after online import
Maybe we should add a new step when bringing up backend on line
to configure the ruv if it is not done and if the suffix entry exists
There was a problem hiding this comment.
Yes there is the same issue with online import. May be we could use framework plg_mmr_betxn_preop
There was a problem hiding this comment.
About plg_mmr_betxn_preop: the preop is not called in case of restore/import
and I am not sure we want to do the test for every operation. (But we may want to do it when adding the suffix entry (in bepostop?). ( BTW how replication is bootstrapped when creating empty backend then adding suffix entry ?)
IMHO, for import/restore case, you should add a new pre_enable_be function (a bit like what Mark has done with #7294 for the pre_close function)
There was a problem hiding this comment.
Well in fact the callback already exists:
multisupplier_be_state_change
… has not the entryid=1
Bug description:
On LMDB there is a strong requirement that the suffix entry has the entry-id=1.
In the case replication is enabled on a backend before the suffix entry exists
it creates the RUV as entry-id=1 and so the suffix becomes entry-id=2.
One of the symptom is an infinite loop when exporting the suffix
#0 slapi_ch_array_add_ext
Fix description:
Hardening of _replica_configure_ruv that checks that the suffix
entry exists before attempting to create the RUV.
fixes: #7310
Reviewed by:
Summary by Sourcery
Validate replica suffix entries before configuring RUV to prevent invalid LMDB layouts and export loops.
Bug Fixes:
Enhancements: