CASSSIDECAR-377: Implement job coordination for cluster-wide operations #360
andresbeckruiz wants to merge 2 commits into
// New job is submitted for all cases when we do not have a corresponding downstream job
jobTracker.computeIfAbsent(job.jobId(), jobId -> {
OperationalJob tracked = jobTracker.computeIfAbsent(job.jobId(), jobId -> job);
Do we need to remove the job from jobTracker if there's a coordination conflict? Or perhaps we should add to jobTracker only if tryCoordination was successful, so that tryCoordination comes before jobTracker.computeIfAbsent?
This leaves a stale job entry in the CREATED state if coordination fails.
Would it make sense to mark the job as FAILED if coordination fails? And set the failureReason to indicate that the job was not able to start due to coordination failure?
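A minimal sketch of the ordering suggested in this thread, in which coordination is attempted before the job is added to the tracker, so a rejected job never leaves an entry behind. Only jobTracker and trySetActive come from the diff; the Coordinator interface, the TrackingSketch class, the trySubmit method, and the String-based operation type are hypothetical stand-ins for illustration, not the PR's actual types.

```java
import java.util.Map;
import java.util.UUID;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical stand-in for the PR's coordinator; only trySetActive appears in the diff.
interface Coordinator
{
    boolean trySetActive(String operationType, UUID jobId);
}

class TrackingSketch
{
    // Maps job id to operation type; the real tracker stores OperationalJob instances.
    final Map<UUID, String> jobTracker = new ConcurrentHashMap<>();
    private final Coordinator coordinator;

    TrackingSketch(Coordinator coordinator)
    {
        this.coordinator = coordinator;
    }

    // Returns true if the job is tracked and may run, false if coordination rejected it.
    boolean trySubmit(UUID jobId, String operationType)
    {
        if (jobTracker.containsKey(jobId))
        {
            return true; // already tracked, nothing new to coordinate
        }
        if (!coordinator.trySetActive(operationType, jobId))
        {
            return false; // conflict: the job was never tracked, so no stale entry exists
        }
        jobTracker.putIfAbsent(jobId, operationType);
        return true;
    }
}
```

The alternative raised above, keeping the entry but marking the job FAILED with a failure reason, would instead leave the rejected job visible to status queries. This sketch also ignores the race between two concurrent submissions of the same job id.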
{
    if (job.requiresCoordination())
    {
        Preconditions.checkState(coordinator != null,
Preconditions.checkState throws an IllegalStateException, which is not handled in the caller.
};

manager.trySubmitJob(job, onComplete, executorPool.service(), SecondBoundConfiguration.parse("5s"));
assertThat(latch.await(10, TimeUnit.SECONDS)).isTrue();
Also verify that the tracker has no entry for this job after the conflict is detected.
{
    Preconditions.checkState(coordinator != null,
                             "Job requires coordination but no OperationalJobCoordinator is configured");
    boolean activated = coordinator.trySetActive(job.operationType(), job.jobId());
I am wary of merging this PR without a call to clearActive. If we defer that to a subsequent PR and someone enables requiresCoordination() for any job in the meantime, no further operations will be allowed. I recommend implementing the clearActive call in this PR.
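To illustrate the concern, here is a minimal sketch of releasing the active slot in a finally block so that a finished or failed job cannot block later operations. The clearActive signature shown is an assumption (the excerpt names the method but does not show it), and InMemoryCoordinator, CoordinatedRunner, and runCoordinated are hypothetical names used only for this example.

```java
import java.util.UUID;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

// Hypothetical in-memory coordinator: at most one active job per operation type.
class InMemoryCoordinator
{
    private final ConcurrentMap<String, UUID> active = new ConcurrentHashMap<>();

    boolean trySetActive(String operationType, UUID jobId)
    {
        UUID previous = active.putIfAbsent(operationType, jobId);
        return previous == null || previous.equals(jobId);
    }

    // Assumed signature: clears the slot only if it is still held by this job.
    boolean clearActive(String operationType, UUID jobId)
    {
        return active.remove(operationType, jobId);
    }
}

class CoordinatedRunner
{
    // Returns false if another job of the same type is active; otherwise runs the work.
    static boolean runCoordinated(InMemoryCoordinator coordinator, String operationType,
                                  UUID jobId, Runnable work)
    {
        if (!coordinator.trySetActive(operationType, jobId))
        {
            return false;
        }
        try
        {
            work.run();
        }
        finally
        {
            // Release even when the work throws, so later operations are not blocked.
            coordinator.clearActive(operationType, jobId);
        }
        return true;
    }
}
```

Without the finally block, the first coordinated job would hold the slot indefinitely and every later submission of that operation type would be rejected.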
CASSSIDECAR-377
Original PR made against CASSSIDECAR-373 branch with review comments: andresbeckruiz#2.
Changes
- OperationalJobCoordinator interface
- StorageOperationalJobCoordinator: Implementation that delegates to StorageProvider's compare-and-set based methods to acquire a lock for an active operation
- OperationalJob.operationType(): New abstract method returning OperationType enum, implemented by all concrete jobs
- OperationalJobManager integration: If a job requires coordination, the manager acquires the active lock via the coordinator before execution; throws OperationalJobConflictException if rejected
- StorageOperationalJobCoordinator and coordinator integration in OperationalJobManagerTest

This ticket only covers the job activation path when a job is submitted to be executed immediately. Clearing locks after completion and status update activation will be implemented in later tickets (CASSSIDECAR-378, CASSSIDECAR-379).
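As a rough illustration of acquiring the active lock through a compare-and-set write, the sketch below uses an in-memory map in place of real storage. The actual StorageProvider methods are not shown in this description, so CasStore, InMemoryCasStore, CasCoordinatorSketch, and the "active/" key prefix are all hypothetical; only the trySetActive name is taken from the diff.

```java
import java.util.UUID;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

// Hypothetical stand-in for a storage layer with compare-and-set semantics.
interface CasStore
{
    // Writes update only if the current value equals expected (null means "absent").
    boolean compareAndSet(String key, String expected, String update);
}

class InMemoryCasStore implements CasStore
{
    private final ConcurrentMap<String, String> values = new ConcurrentHashMap<>();

    @Override
    public boolean compareAndSet(String key, String expected, String update)
    {
        if (expected == null)
        {
            return values.putIfAbsent(key, update) == null;
        }
        return values.replace(key, expected, update);
    }
}

class CasCoordinatorSketch
{
    private final CasStore store;

    CasCoordinatorSketch(CasStore store)
    {
        this.store = store;
    }

    // Succeeds only if no job currently holds the slot for this operation type.
    boolean trySetActive(String operationType, UUID jobId)
    {
        return store.compareAndSet("active/" + operationType, null, jobId.toString());
    }
}
```

Because the write succeeds only when the slot is absent, two instances racing to activate the same operation type cannot both win, provided the backing store applies the comparison and the write atomically.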
Future Work