
Full shutdown document added #10

Draft
bbezak wants to merge 2 commits into master from full-shutdown


bbezak commented Mar 29, 2021

Member

No description provided.

oneswig left a comment

Member

Nice work Bartosz, just a couple of questions / suggestions

Comment thread source/full_shutdown.rst

Stop Ceph
---------
Procedure based on `Red Hat documentation <https://access.redhat.com/documentation/en-us/red_hat_ceph_storage/4/html/administration_guide/understanding-process-management-for-ceph#powering-down-and-rebooting-a-red-hat-ceph-storage-cluster_admin>`__

Member

If there's something equivalent in the community docs it would be better, but the closest I found was https://docs.ceph.com/en/latest/rados/operations/operating/ and it doesn't cover setting all the flags below.
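For reference, the flag handling in the linked Red Hat procedure could be sketched roughly as below. This is a minimal sketch, not the text of the PR: the function names are hypothetical, the flag list (`noout`, `norecover`, `norebalance`, `nobackfill`, `nodown`, `pause`) is the one from the Red Hat guide, and it assumes the `ceph` CLI is run from a host with admin credentials.

```shell
#!/usr/bin/env bash
# Sketch only. Flag list follows the Red Hat shutdown procedure;
# function names are hypothetical, not part of the PR.
CEPH_SHUTDOWN_FLAGS="noout norecover norebalance nobackfill nodown pause"

# Set every shutdown flag, but only if the cluster reports HEALTH_OK first.
ceph_set_shutdown_flags() {
    local health flag
    health="$(ceph health)" || return 1
    case "$health" in
        HEALTH_OK*) ;;
        *) echo "cluster not healthy: $health" >&2; return 1 ;;
    esac
    for flag in $CEPH_SHUTDOWN_FLAGS; do
        ceph osd set "$flag" || return 1
    done
}

# Clear the same flags after power on, in reverse order.
ceph_unset_shutdown_flags() {
    local flag
    for flag in pause nodown nobackfill norebalance norecover noout; do
        ceph osd unset "$flag" || return 1
    done
}

# Usage (commented out so that sourcing this file changes nothing):
# ceph_set_shutdown_flags
```

Refusing to continue unless the cluster is healthy mirrors the "check if cluster is in healthy state" step of the procedure; whether that check should be this strict is a judgment call.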

Comment thread source/full_shutdown.rst

.. code-block:: bash

   systemctl poweroff

Member

There might be a serialised form of the shutdown invocation using Kayobe's tools: https://docs.openstack.org/kayobe/latest/administration/overcloud.html#running-commands. A small delay on the shutdown command might also help, so that it doesn't immediately cut off the Ansible connection.
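A rough sketch of what such an invocation might look like, assuming an activated Kayobe environment. The function name is hypothetical, the group name in the usage line is only an example, and the `--become` and `--limit` arguments should be checked against the Kayobe version in use.

```shell
#!/usr/bin/env bash
# Sketch only: schedule a delayed shutdown on one inventory group.
# shutdown_group is a hypothetical helper, not an existing Kayobe command.
shutdown_group() {
    local group="$1" delay_min="${2:-1}"
    if [ -z "$group" ]; then
        echo "usage: shutdown_group <group> [delay-minutes]" >&2
        return 2
    fi
    # "shutdown -h +N" returns immediately and halts the host N minutes
    # later, which gives the Ansible connection time to close cleanly.
    kayobe overcloud host command run --become \
        --limit "$group" \
        --command "shutdown -h +${delay_min}"
}

# Usage (commented out so that sourcing this file changes nothing):
# shutdown_group compute 1
```

Calling the helper once per group, in the order the procedure settles on, would give the serialised behaviour.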

markgoddard left a comment

Nice addition. Looks like there is some room for automation here, but that can be added iteratively.

Comment thread source/full_shutdown.rst
.. code-block:: bash

   for i in `openstack server list --all-projects -c ID -f value` ; \
   do openstack server stop $i ; done

Is this asynchronous? Should we check for success?
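The stop request returns once the API has accepted it, so one way to check for success is to poll until no instance is still `ACTIVE`. A minimal sketch, assuming admin credentials are loaded; the function name, timeout and polling interval are illustrative, not part of the PR.

```shell
#!/usr/bin/env bash
# Sketch only: request a stop for every ACTIVE instance, then poll until
# none are left ACTIVE or a timeout expires. stop_all_servers is hypothetical.
stop_all_servers() {
    local timeout="${1:-600}" interval="${2:-10}" waited=0 id remaining
    for id in $(openstack server list --all-projects --status ACTIVE -c ID -f value); do
        openstack server stop "$id" || echo "stop request failed: $id" >&2
    done
    while true; do
        remaining="$(openstack server list --all-projects --status ACTIVE -c ID -f value)"
        [ -z "$remaining" ] && return 0
        if [ "$waited" -ge "$timeout" ]; then
            echo "still ACTIVE after ${timeout}s:" >&2
            echo "$remaining" >&2
            return 1
        fi
        sleep "$interval"
        waited=$((waited + interval))
    done
}

# Usage (commented out so that sourcing this file changes nothing):
# stop_all_servers 900 15
```

Instances in other states (for example `ERROR` or `PAUSED`) are not handled here and would need a separate decision.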

Comment thread source/full_shutdown.rst
- Stop the Ceph clients from using any Ceph resources (RBD, RADOS Gateway, CephFS)
- Check if cluster is in healthy state

.. code-block:: bash

Does it need to be indented more to be part of the bullet?

Comment thread source/full_shutdown.rst

- Stop CephFS (if applicable)

Stop CephFS cluster by reducing the number of ranks to 1, setting the cluster_down flag, and then failing the last rank.

Again, indentation?
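For the CephFS step quoted in this thread, the three actions it names (reduce ranks to 1, set `cluster_down`, fail the last rank) could be sketched as follows. This follows the Red Hat guide linked from the document; the function name is hypothetical, and newer Ceph releases may use a different flag than `cluster_down`, so treat it as an assumption to verify against the deployed version.

```shell
#!/usr/bin/env bash
# Sketch only: bring one CephFS file system down before a full shutdown.
# cephfs_stop is a hypothetical helper; commands follow the Red Hat guide.
cephfs_stop() {
    local fs="$1"
    if [ -z "$fs" ]; then
        echo "usage: cephfs_stop <fs-name>" >&2
        return 2
    fi
    # Reduce the number of active MDS ranks to one.
    ceph fs set "$fs" max_mds 1 || return 1
    # If more than one rank was active, wait for the extra ranks to stop
    # (watch "ceph status") before continuing. That wait is not automated here.
    ceph fs set "$fs" cluster_down true || return 1
    # Fail the last remaining rank.
    ceph mds fail "${fs}:0"
}

# Usage (commented out so that sourcing this file changes nothing):
# cephfs_stop cephfs
```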

Comment thread source/full_shutdown.rst
----------------------------

Set maintenance mode in bifrost to prevent nodes from automatically
powering back on

Another option is to power off via Bifrost.
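A sketch of that alternative, assuming the `openstack baremetal` CLI is available where Bifrost's Ironic API can be reached (for example inside the Bifrost container on the seed, with its credentials loaded). The function name is hypothetical.

```shell
#!/usr/bin/env bash
# Sketch only: power nodes off through Ironic so that the recorded power
# state is "off". bifrost_power_off is a hypothetical helper.
bifrost_power_off() {
    local node rc=0
    for node in "$@"; do
        # Hard power off by default; a graceful variant (--soft) exists
        # but depends on driver and API version support.
        openstack baremetal node power off "$node" || rc=1
    done
    return "$rc"
}

# Usage (commented out so that sourcing this file changes nothing):
# bifrost_power_off $(openstack baremetal node list -f value -c UUID)
```

Because this is not a graceful shutdown unless the soft variant is used, it probably only suits hosts whose services have already been stopped.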

Comment thread source/full_shutdown.rst


Full Power on Procedure
-----------------------

This needs to be a different heading style. Alternatively (preferably?) this section could go in another page called cold_start.rst.

Member

Or change the page to be: "Shutdown and power on procedures"

Comment thread source/full_shutdown.rst
* Shut down controllers
* Shut down Ceph nodes (if applicable)
* Shut down seed VM
* Shut down Ansible control host

This one isn't covered

We probably shouldn't make any assumptions about what or where this is. It may not be the seed hypervisor, which should also be called out explicitly.

Comment thread source/full_shutdown.rst
* Perform a graceful shutdown of all virtual machine instances
* Stop Ceph (if applicable)
* Put all nodes into maintenance mode in Bifrost
* Shut down compute nodes

nit: this lists shutting down different types of nodes separately, but the procedure only stops the services separately, then shuts down all nodes at once.

Comment thread source/full_shutdown.rst
* Remove nodes from maintenance mode in bifrost
* Recover MariaDB cluster
* Start Ceph (if applicable)
* Check that all docker containers are running

nit: they haven't been started

Comment thread source/full_shutdown.rst

.. code-block:: bash

   kayobe# kayobe overcloud database recover

Wondering if it would be cleaner to stop the containers before shutdown, to avoid them starting up in a broken state.

@markgoddard


Looks like quite a few comments still to be addressed. It's quite hard to review larger changes when force-pushed. Could you add commits, then squash at the end?

Comment thread source/full_shutdown.rst
following order:

* Perform a graceful shutdown of all virtual machine instances
* Stop Ceph (if applicable)

Member

This might be too early to stop Ceph, in case the OpenStack services are still using Ceph state (e.g. image uploads). Perhaps stop Ceph at the point where the Ceph nodes are shut down.

bbezak commented Apr 9, 2021

Member Author

> Looks like quite a few comments still to be addressed. It's quite hard to review larger changes when force-pushed. Could you add commits, then squash at the end?

sure, makes perfect sense - that was Gerrit habit ;)

priteau commented Jan 9, 2023

Member

This would be nice to complete and merge.

bbezak marked this pull request as draft June 26, 2024 09:29