Commit 8a63b79

Merge pull request #242 from K-Natsugawa/master
Add backup/restore scripts for WKS 4.7.x
2 parents 509858e + 2b12444 commit 8a63b79

7 files changed

Lines changed: 847 additions & 0 deletions

knowledge-studio/4.7.x/README.md

Lines changed: 124 additions & 0 deletions
@@ -0,0 +1,124 @@
# wks-scripts
Utility script files for WKS (on CP4D)


These utility scripts perform a system-level backup/restore of a CP4D instance, NOT a backup/restore of individual WKS artifacts (e.g. workspaces).

Use a `backup` only with the WKS instance it was taken from. Restoring a `backup` taken from a different instance will cause errors.

Any modifications made after the previous `backup` will be replaced with the backup contents by `restore`. For example:
- A WKS workspace that was created after the previous `backup` will be deleted.
- Changes made to the contents of WKS workspaces after the previous `backup` will be replaced with the backup contents.

The backup/restore scripts back up or restore the data in the following databases of WKS, in this order:
1. MongoDB
2. PostgreSQL
3. MinIO

<b>NOTE</b> Users should not access WKS during backup/restore because WKS will be deactivated. (All deactivated pods will be reactivated after backup/restore.)

# Deactivate and Reactivate Knowledge Studio
## Note: You don't need to deactivate/reactivate when you run the all-backup-restore.sh script because the script handles the process.
- Deactivate Knowledge Studio
  - Make sure that no training and evaluation processes are running. You can check job status with the following command:
    - `kubectl -n NAMESPACE get jobs`
    - Training jobs of Knowledge Studio are named in the format wks-train-xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx, and evaluation jobs are named in the format wks-batch-apply-xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx. If the COMPLETIONS column of a job reads 0/1, that job is still running. Wait until all of the training and evaluation jobs finish.
  - Deactivate Knowledge Studio with the following command:
    - `kubectl -n NAMESPACE patch --type=merge wks wks -p '{"spec":{"global":{"quiesceMode":true}}}'`
  - Make sure that no Knowledge Studio pods exist except the datastore pods, using the following command (this may take a few minutes):
    - `kubectl -n NAMESPACE get pod | grep -Ev 'minio|etcd|mongo|postgresql|gw-instance|Completed' | grep wks`

- Reactivate Knowledge Studio
  - Reactivate Knowledge Studio with the following command:
    - `kubectl -n NAMESPACE patch --type=merge wks wks -p '{"spec":{"global":{"quiesceMode":false}}}'`

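The job check in the deactivation steps can be scripted. The following is a minimal sketch and is not part of the shipped scripts: the function name `running_wks_jobs` is hypothetical, and it assumes the default column layout of `kubectl get jobs --no-headers` (NAME first, COMPLETIONS second) together with the `wks-train-` and `wks-batch-apply-` name prefixes described above.

```shell
#!/bin/bash
# Hypothetical helper (not part of the shipped scripts).
# Reads `kubectl get jobs --no-headers` output on stdin and prints the names
# of training/evaluation jobs whose COMPLETIONS column still reads 0/1.
running_wks_jobs() {
  awk '$1 ~ /^wks-(train|batch-apply)-/ && $2 == "0/1" { print $1 }'
}

# Example against a live cluster (NAMESPACE is a placeholder):
#   kubectl -n NAMESPACE get jobs --no-headers | running_wks_jobs
```

If the function prints nothing, no training or evaluation job is pending and Knowledge Studio can be deactivated.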
# About backup/restore scripts
## all-backup-restore.sh [command] [releaseName] [backupDir] [-n namespace]
### Overall
This script delegates the backup/restore operations to the per-database scripts (`mongodb-backup-restore.sh`, `postgresql-backup-restore.sh`, `minio-backup-restore.sh`). Backup/restore runs in the order MongoDB, PostgreSQL, MinIO. In addition, the script scales the number of pods down to zero at the beginning and scales it back up to the initial state at the end. <b>We recommend running `all-backup-restore.sh` rather than running the per-database scripts individually.</b>

### Prerequisite

The MinIO client (`mc` command) is required to run the MinIO backup/restore scripts.
Verify that the `mc` command is runnable with the `type mc` command.
If it is not, the scripts will download it from the MinIO web site (`https://dl.min.io/`) during backup/restore.

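The prerequisite check can also be done from a script before starting a backup. This is a minimal sketch under the assumption that a POSIX `command -v` lookup is an acceptable substitute for `type mc`; the function name `require_cmd` is hypothetical and does not appear in the shipped scripts.

```shell
#!/bin/bash
# Hypothetical helper (not part of the shipped scripts).
# Succeeds and reports "found" if the named command is available on PATH.
require_cmd() {
  if command -v "$1" >/dev/null 2>&1; then
    echo "$1: found"
  else
    echo "$1: not found" >&2
    return 1
  fi
}

# Example:
#   require_cmd mc || echo "mc is missing; the scripts will try to download it"
```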
### Arguments:
- `[command]`: there are two modes:
  - `backup`: the data of each database is saved into the backup directories.
  - `restore`: the data of each database is recovered by loading it from the backup directories.
- `[releaseName]`: the release name. You can find it as the prefix of the pod names, e.g. `{release_name}-ibm-watson-ks-yyy-xxx`
  - In the June 2020 release, `{release_name}` is always `wks`
- `[backupDir]`: the directory where the backup data of MongoDB, PostgreSQL, and MinIO is stored, each in its own subdirectory. Each database is restored by loading the backup data from the same subdirectories:
  - `backup`: a new folder with a timestamp, `wks-backup-yyyymmdd_hhmmss`, will be created under `[backupDir]`:
    - `[backupDir]/wks-backup-yyyymmdd_hhmmss/mongodb`
    - `[backupDir]/wks-backup-yyyymmdd_hhmmss/postgresql`
    - `[backupDir]/wks-backup-yyyymmdd_hhmmss/minio`
  - `restore`: set `[backupDir]` to the `wks-backup-yyyymmdd_hhmmss` folder, which contains:
    - `[backupDir]/mongodb`
    - `[backupDir]/postgresql`
    - `[backupDir]/minio`
- `[-n namespace]`: the namespace where the pods exist
  - the default namespace is `zen` (if you have not changed it)

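To make the `[backupDir]` convention concrete, the sketch below derives the three subdirectories for both modes. It is illustrative only: `backup_layout` is a hypothetical name, the timestamp is passed in explicitly so the result is reproducible, and the subdirectory names follow the list above.

```shell
#!/bin/bash
# Hypothetical helper (not part of the shipped scripts).
# $1 = command (backup|restore), $2 = backupDir, $3 = timestamp (backup only)
# Prints the three per-database directories, one per line.
backup_layout() {
  local cmd=$1 dir=${2%/} stamp=$3 root
  if [ "$cmd" = "backup" ]; then
    # backup creates a new timestamped folder under backupDir
    root="$dir/wks-backup-$stamp"
  else
    # restore expects backupDir to already be the timestamped folder
    root="$dir"
  fi
  printf '%s\n' "$root/mongodb" "$root/postgresql" "$root/minio"
}

# Example:
#   backup_layout backup /data/backups "$(date '+%Y%m%d_%H%M%S')"
#   backup_layout restore /data/backups/wks-backup-20200601_120000
```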
### Status
Verify that the success message is shown in the console log, which indicates that all scripts succeeded.
- `[SUCCESS] MongoDB,PostgreSQL,Minio (backup|restore)`

If the fail message is shown instead, one or more of the scripts failed. <b>In this case, the backup data is corrupted, so please do NOT use the corrupted backup data to restore.</b>
- `[FAIL] MongoDB,PostgreSQL,Minio (backup|restore)`

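When the console output is saved to a file, the status check can be automated. The following is a minimal sketch, assuming the log was captured by the operator (for example with `tee`); the function name `backup_succeeded` and the log file name are hypothetical. It matches only the `[SUCCESS]` and `[FAIL]` prefixes, not the rest of the message.

```shell
#!/bin/bash
# Hypothetical helper (not part of the shipped scripts).
# Returns 0 only if the log contains a [SUCCESS] line and no [FAIL] line.
backup_succeeded() {
  local log=$1
  if grep -q '^\[FAIL\]' "$log"; then
    return 1
  fi
  grep -q '^\[SUCCESS\]' "$log"
}

# Example (arguments are placeholders):
#   ./all-backup-restore.sh backup wks /data/backups -n zen 2>&1 | tee backup.log
#   backup_succeeded backup.log && echo "backup can be used for restore"
```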
## mongodb-backup-restore.sh [command] [releaseName] [backupDir] [-n namespace]

Deactivate Knowledge Studio before backup/restore and reactivate Knowledge Studio after backup/restore.

### Backup
Get a backup of the MongoDB data.
1. Create a remote temp file in the MongoDB pod, and extract the following data: `WKSDATA`, `ENVDATA`, `escloud_sbsep`.
2. Copy `WKSDATA`, `ENVDATA`, `escloud_sbsep` to the local `[backupDir]`.
3. Remove the remote temp file.
### Restore
Restore the backed-up data to MongoDB.
1. Create a remote temp file in the MongoDB pod.
2. Copy `WKSDATA`, `ENVDATA`, `escloud_sbsep` from the local `[backupDir]` to the remote temp file.
3. Restore `WKSDATA`, `ENVDATA`, `escloud_sbsep` from the remote temp file.
4. Remove the remote temp file.

## postgresql-backup-restore.sh [command] [releaseName] [backupDir] [-n namespace]

Before backup/restore:

Deactivate Knowledge Studio before backup/restore and reactivate Knowledge Studio after backup/restore.

### Backup
Get a backup of PostgreSQL by taking a data dump.
1. Create a job for the PostgreSQL backup.
2. Dump the databases such as `jobq_{release_name_underscore}`, `model_management_api` and `model_management_api_v2`. Each dump file is named after its database with the suffix `.custom`.
3. Copy the dump files to the local `[backupDir]`.
4. Delete `.pgpass`.

### Restore
Restore the backup data to PostgreSQL. All existing databases are deleted.
1. Create a job for the PostgreSQL restore.
2. Restore the databases (`jobq_{release_name_underscore}`, `model_management_api` and `model_management_api_v2`) by loading the `.custom` files under `[backupDir]`.
3. Delete `.pgpass`.

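The dump file naming can be illustrated with a short sketch. It is not taken from the shipped script: `pg_dump_files` is a hypothetical name, and it assumes that `{release_name_underscore}` is the release name with every `-` replaced by `_`, which the README does not state explicitly.

```shell
#!/bin/bash
# Hypothetical helper (not part of the shipped scripts).
# Assumption: {release_name_underscore} = release name with '-' replaced by '_'.
# Prints the dump file names expected under [backupDir], one per line.
pg_dump_files() {
  local release_underscore=${1//-/_}
  local db
  for db in "jobq_${release_underscore}" model_management_api model_management_api_v2; do
    echo "${db}.custom"
  done
}

# Example:
#   pg_dump_files wks
```

A listing like this can be compared against the contents of `[backupDir]` before starting a restore.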
## minio-backup-restore.sh [command] [releaseName] [backupDir] [-n namespace]

Before backup/restore:

Deactivate Knowledge Studio before backup/restore and reactivate Knowledge Studio after backup/restore.

### Backup
Get a backup of MinIO by taking a snapshot.

1. Run `kubectl -n {namespace} port-forward` to a pod with the prefix `{release_name}-ibm-minio` in the background.
2. Configure the MinIO alias `wks-minio`.
3. Copy all data from `wks-minio/wks-icp` to `{backupDir}`.
4. Stop `kubectl -n {namespace} port-forward`.

### Restore
Restore the backup data to MinIO. All existing data in MinIO is deleted before the data is restored.
1. Run `kubectl -n {namespace} port-forward` to a pod with the prefix `{release_name}-ibm-minio` in the background.
2. Configure the MinIO alias `wks-minio`.
3. Copy all data from `{backupDir}` to `wks-minio/wks-icp`.
4. Stop `kubectl -n {namespace} port-forward`.
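
The only difference between the MinIO backup and restore steps is the direction of the copy. The sketch below shows that choice in isolation; `minio_copy_pair` is a hypothetical name, and the sketch deliberately does not assume which `mc` subcommand the shipped script uses for the copy itself.

```shell
#!/bin/bash
# Hypothetical helper (not part of the shipped scripts).
# $1 = command (backup|restore), $2 = backupDir
# Prints "<source> <destination>" for the copy step.
minio_copy_pair() {
  local cmd=$1 dir=$2 remote="wks-minio/wks-icp"
  case $cmd in
    backup)  echo "$remote $dir" ;;
    restore) echo "$dir $remote" ;;
    *)       echo "unsupported command: $cmd" >&2; return 1 ;;
  esac
}

# Example:
#   minio_copy_pair backup /data/backups/wks-backup-20200601_120000/minio
```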
Lines changed: 201 additions & 0 deletions
@@ -0,0 +1,201 @@
#!/bin/bash

declare -r START_TIME=$(date +%s)
declare -r TIME_OUT=600

print_help() {
  echo "This script backs up/restores all MongoDB/PostgreSQL/S3 data"
  echo "USAGE: $0 [command] [releaseName] [backupDir] [-n namespace]"
  exit 1
}

# Check the exit status of the previous command; on failure, reactivate
# Knowledge Studio and abort.
cmd_check(){
  if [ $? -ne 0 ] ; then
    reactivate
    echo "[FAIL] MongoDB,PostgreSQL,S3 $COMMAND"
    exit 1
  fi
}

reactivate() {
  echo "Reactivate knowledge studio:"
  # Failures are handled inline rather than through cmd_check, because
  # cmd_check itself calls reactivate and would recurse if a patch kept failing.
  if ! oc $KUBECTL_ARGS patch --type=merge wks ${RELEASE_NAME} -p "{\"spec\":{\"global\":{\"quiesceMode\":false}}}" ; then
    echo "[FAIL] MongoDB,PostgreSQL,S3 $COMMAND"
    exit 1
  fi

  if ! oc $KUBECTL_ARGS patch --type=merge wks ${RELEASE_NAME} -p "{\"spec\":{\"mma\":{\"replicas\":\"${MMA_REPLICAS}\"}}}" ; then
    echo "[FAIL] MongoDB,PostgreSQL,S3 $COMMAND"
    exit 1
  fi

  END_TIME=$(date +%s)
  printf "%s %d\n" "Elapsed time: " $((END_TIME - START_TIME))
}

deactivating() {
  echo "Deactivating knowledge studio:"
  oc $KUBECTL_ARGS patch --type=merge wks ${RELEASE_NAME} -p '{"spec":{"global":{"quiesceMode":true}}}'
  cmd_check

  oc $KUBECTL_ARGS patch --type=merge wks ${RELEASE_NAME} -p '{"spec":{"mma":{"replicas":0}}}'
  cmd_check
}

# command line arguments
if [ $# -lt 3 ] ; then
  print_help
fi

COMMAND=$1
shift
RELEASE_NAME=$1
shift
DATA_DIR=$1
shift
while getopts "n:" opt; do
  case $opt in
    "n" ) NAMESPACE=$OPTARG ;;
  esac
done

echo "release name:'$RELEASE_NAME'"

echo "checking command..."
if [[ ! $COMMAND = "backup" && ! $COMMAND = "restore" ]]; then
  echo "command: '$COMMAND' not supported. backup and restore are supported."
  print_help
else
  echo "command: '$COMMAND'"
fi

echo "checking $COMMAND directory..."
if [[ $COMMAND = "backup" ]]; then
  DATA_DIR=${DATA_DIR%*/}"/wks-${COMMAND}-`date '+%Y%m%d_%H%M%S'`"

  DATA_DIR_MONGODB="${DATA_DIR%*/}/mongodb"
  DATA_DIR_POSTRGESQL="${DATA_DIR%*/}/postgresql"
  DATA_DIR_S3="${DATA_DIR%*/}/S3"
else
  DATA_DIR_MONGODB="${DATA_DIR%*/}/mongodb"
  DATA_DIR_POSTRGESQL="${DATA_DIR%*/}/postgresql"
  # Accept either an S3 or a minio subdirectory in the backup being restored
  DATA_DIR_S3=$([ -d "${DATA_DIR%*/}/S3" ] && echo "${DATA_DIR%*/}/S3" || echo "${DATA_DIR%*/}/minio")

  if [[ ! -d ${DATA_DIR_MONGODB} || ! -d ${DATA_DIR_POSTRGESQL} || ! -d ${DATA_DIR_S3} ]]; then
    echo "no backup directory: $DATA_DIR"
    echo "[FAIL] MongoDB,PostgreSQL,S3 $COMMAND"
    exit 1
  fi
fi

echo "$COMMAND directory:"
echo " mongodb:'$DATA_DIR_MONGODB'"
echo " postgresql:'$DATA_DIR_POSTRGESQL'"
echo " S3:'$DATA_DIR_S3'"

if [ -v NAMESPACE ]; then
  echo "checking namespace..."
  oc get namespace $NAMESPACE
  if [[ ! $? -eq 0 ]]; then
    echo "namespace:'$NAMESPACE' does not exist"
    echo "[FAIL] MongoDB,PostgreSQL,S3 $COMMAND"
    exit 1
  else
    echo "namespace:'$NAMESPACE'"
  fi

  KUBECTL_ARGS="${KUBECTL_ARGS} --namespace=$NAMESPACE"
  NAMESPACE_OPT="-n $NAMESPACE"
else
  echo "default namespace is used for oc"
fi

################################################################
#Deactivate wks deployment
#Make sure no running job
################################################################
jobs=`oc $KUBECTL_ARGS get job --no-headers | grep -e ${RELEASE_NAME}-train -e ${RELEASE_NAME}-batch-apply | awk '{print $2}'`
for job in ${jobs}; do
  if [[ $job == "0/1" ]]; then
    echo "$COMMAND failed because a training/evaluation job is running. Please wait until the job completes."
    echo "[FAIL] MongoDB,PostgreSQL,S3 $COMMAND"
    exit 1
  fi
done

echo "Get 'Postgresql IMAGE' for $COMMAND PostgreSql"
POSTGRESQL_POD_NAME=`oc $KUBECTL_ARGS get pods -o=go-template --template='{{range $pod := .items}}{{range .status.containerStatuses}}{{if .ready}}{{$pod.metadata.name}}{{"\n"}}{{end}}{{end}}{{end}}' | grep ${RELEASE_NAME}-edb-postgresql | head -n 1`
if [[ ! $POSTGRESQL_POD_NAME ]]; then
  echo "get Postgresql pod failed"
  echo "[FAIL] MongoDB,PostgreSQL,S3 $COMMAND"
  exit 1
else
  echo "Postgresql pod name: '$POSTGRESQL_POD_NAME'"
fi
POSTGRESQL_IMAGE_NAME=`oc $KUBECTL_ARGS get pod $POSTGRESQL_POD_NAME -o jsonpath='{.spec.containers[0].image}'`
if [[ ! $POSTGRESQL_IMAGE_NAME ]]; then
  echo "get Postgresql image failed"
  echo "[FAIL] MongoDB,PostgreSQL,S3 $COMMAND"
  exit 1
else
  echo "Postgresql image name: '$POSTGRESQL_IMAGE_NAME'"
fi

echo ""
echo "Get replicas of MMA"
MMA_REPLICAS=`kubectl $KUBECTL_ARGS get wks ${RELEASE_NAME} -o jsonpath='{.spec.mma.replicas}'`
if [ -z "$MMA_REPLICAS" ]; then
  WKS_SIZE=`kubectl $KUBECTL_ARGS get wks ${RELEASE_NAME} -o jsonpath='{.spec.global.size}'`
  if [[ ${WKS_SIZE} == "medium" ]]; then
    MMA_REPLICAS="2"
  else # small
    MMA_REPLICAS="1"
  fi
fi
echo "replicas of MMA: ${MMA_REPLICAS}"

echo ""
deactivating

echo "Wait until all pods stop except datastore pods, this may take a few minutes..."
sleepTime=0
while :
do
  GET_POD_NUMBER=`kubectl $KUBECTL_ARGS get pod | grep -Ev 'minio|etcd|mongo|postgresql|gw-instance|Completed' | grep "${RELEASE_NAME}-" | wc -l`
  echo "number of the present pods which need to stop: $GET_POD_NUMBER, please wait..."
  if [ $GET_POD_NUMBER = 0 ] ; then
    echo "All pods except the datastore pods are scaled down"
    break
  fi

  sleep 10
  sleepTime=$((sleepTime + 10))
  if [[ $sleepTime -ge $TIME_OUT ]]; then
    echo "Timed out while waiting for knowledge studio to be deactivated"
    reactivate
    echo "[FAIL] MongoDB,PostgreSQL,S3 $COMMAND"
    exit 1
  fi
done

################################################################
#backup/restore all MongoDB PostgreSQL S3 data
################################################################

echo ""
echo "============================== $COMMAND MongoDB start:"
bash mongodb-backup-restore.sh $COMMAND $RELEASE_NAME $DATA_DIR_MONGODB $NAMESPACE_OPT
cmd_check
echo ""
echo "============================== $COMMAND PostgreSQL start:"
bash postgresql-backup-restore.sh $COMMAND $RELEASE_NAME $DATA_DIR_POSTRGESQL $POSTGRESQL_IMAGE_NAME $NAMESPACE_OPT
cmd_check
echo ""
echo "============================== $COMMAND s3 start:"
bash s3-backup-restore.sh $COMMAND $RELEASE_NAME $DATA_DIR_S3 $NAMESPACE_OPT
cmd_check
echo ""

################################################################
#Reactivate wks deployment
################################################################

reactivate

echo "[SUCCESS] MongoDB,PostgreSQL,S3 $COMMAND"
Lines changed: 64 additions & 0 deletions
@@ -0,0 +1,64 @@
apiVersion: batch/v1
kind: Job
metadata:
  name: ${RELEASE_NAME_UNDERSCORE}-edb-postgresql-backup-restore-job
  labels:
    app.kubernetes.io/name: ibm-watson-ks
    postgresql: ${RELEASE_NAME_UNDERSCORE}-edb-postgresql
    release: ${RELEASE_NAME_UNDERSCORE}
    function: ${RELEASE_NAME_UNDERSCORE}-edb-postgresql-${COMMAND}
  namespace: ${NAMESPACE}
spec:
  template:
    metadata:
      labels:
        app.kubernetes.io/name: ibm-watson-ks
        postgresql: ${RELEASE_NAME_UNDERSCORE}-edb-postgresql
        release: ${RELEASE_NAME_UNDERSCORE}
        function: ${RELEASE_NAME_UNDERSCORE}-edb-postgresql-${COMMAND}
    spec:
      restartPolicy: Never
      containers:
      - name: ${RELEASE_NAME_UNDERSCORE}-edb-postgresql-backup-restore
        image: ${POSTGRESQL_IMAGE}
        env:
        - name: TZ
          value: "UTC+7"
        - name: PGPASSWORD
          value: "${PGPASSWORD}"
        - name: PGPORT
          value: "${PGPORT}"
        - name: PGHOST
          value: "${RELEASE_NAME_UNDERSCORE}-edb-postgresql-rw.${NAMESPACE}"
        - name: PGUSER
          value: "${PGUSER}"
        resources:
          limits:
            cpu: "800m"
            memory: "2Gi"
          requests:
            cpu: "100m"
            memory: "256Mi"
        command:
        - "/bin/bash"
        - "-ec"
        - |
          psql -l
          echo "${COMMAND} begin:"
          if [ "${COMMAND}" = "backup" ]; then
            pg_dump --clean -Fc jobq_${RELEASE_NAME_UNDERSCORE} > /tmp/jobq_${RELEASE_NAME_UNDERSCORE}.custom
            pg_dump --clean -Fc model_management_api > /tmp/model_management_api.custom
            pg_dump --clean -Fc model_management_api_v2 > /tmp/model_management_api_v2.custom
          else
            echo "restore"
          fi
          touch /tmp/${COMMAND}_job_complete
          while true;
          do
            if [ -e /tmp/${COMMAND}_complete ] ; then
              break
            else
              echo "waiting for ${COMMAND} to be completed"
              sleep 10
            fi
          done
