
Thread: How to reset a stale running job / restart a completed job?

  1. #1


    Hi everyone,

    Sometimes my application server fails in such a way that the executing job is not marked as FAILED (e.g. the application server is shut down). In this case the DB is not updated, and I can't re-launch the job. So I've implemented a simple method that marks such job executions as STOPPED so that they can be re-run:

    Code:
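    // Needs java.util.Date plus BatchStatus and JobExecution from org.springframework.batch.core;
    // jobExplorer and jobRepository are injected fields of the enclosing class.
    public boolean resetStaleJob(long jobExecutionId) {
    	JobExecution jobExecution = jobExplorer.getJobExecution(jobExecutionId);
    
    	if (jobExecution == null) {
    		return false;
    	}
    
    	final BatchStatus status = jobExecution.getStatus();
    
    	// Only executions still recorded as STARTING or STARTED are treated as stale.
    	// COMPLETED sorts before STARTED in BatchStatus, so an isGreaterThan(STARTED)
    	// check alone would also reset completed executions.
    	if (status != BatchStatus.STARTING && status != BatchStatus.STARTED) {
    		return false;
    	}
    
    	jobExecution.setStatus(BatchStatus.STOPPED);
    	jobExecution.setEndTime(new Date());
    	jobRepository.update(jobExecution);
    
    	return true;
    }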
    I am not sure whether hacking the status like that is a good approach.

    Also I can't restart the completed job, because JobInstanceAlreadyCompleteException is thrown. I have set <batch:tasklet allow-start-if-complete="true">, but that does not help. Is there any legitimate way to restart the job, other than introducing a dummy job parameter (e.g. equal to the execution date)?

    Thanks.

  2. #2
    Join Date
    Dec 2005
    Location
    Lyon, France
    Posts
    311


    Also I can't restart the completed job, because JobInstanceAlreadyCompleteException is thrown. I have set <batch:tasklet allow-start-if-complete="true">, but that does not help. Is there any legitimate way to restart the job, other than introducing a dummy job parameter (e.g. equal to the execution date)?
    There's no way to restart a completed job instance. The allow-start-if-complete flag applies to tasklets: it lets an already completed tasklet be re-executed on a restart.

  3. #3


    Originally posted by arno:
    There's no way to restart a completed job instance. The allow-start-if-complete flag applies to tasklets: it lets an already completed tasklet be re-executed on a restart.
    arno, thanks for the reply. I still do not understand the correlation between <job restartable="xxx"> and <tasklet allow-start-if-complete="...">. Do you agree that the documentation could be clearer about this?

    And in my particular situation: can the framework be extended to support the above (restarting a completed job and fixing stale jobs), or is that against the framework ideology (so that I should basically do it myself)? In the latter case, what is the best way to restart a completed job and to fix stale jobs?

  4. #4
    Join Date
    Dec 2005
    Location
    Lyon, France
    Posts
    311


    I still do not understand the correlation between <job restartable="xxx"> and <tasklet allow-start-if-complete="...">.
    For <job restartable="true/false" />: the default is true, which means you can restart any instance that is not COMPLETED or ABANDONED. Set it to false if a failed job instance shouldn't be restarted (because the job doesn't handle restart and could process the same data twice, which is bad).

    For <tasklet allow-start-if-complete="true/false">, let's take an example: a job has three steps (step1, step2, step3). A first execution runs and fails during step3. On a restart, Spring Batch will go directly to step3 and try to finish the execution. That's because allow-start-if-complete defaults to false. Now take the same scenario, except with allow-start-if-complete=true on the step1 tasklet. On a restart (after a failure on step3), Spring Batch re-executes step1, skips step2, and re-executes step3. Clearer now? :-)
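    The rule in that example can be sketched as a small, JDK-only model. This is not Spring Batch code: the class RestartPlanSketch and its method are made up for illustration, and the model assumes a step is re-run on restart only if it did not complete or if its tasklet allows a start when complete.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Simplified model of the restart rule described above. This is not Spring Batch code;
// the class and method names are hypothetical.
class RestartPlanSketch {

    /**
     * @param completedInPreviousRun step name -> true if the step completed in the failed
     *                               execution; iteration order must be the job's step order
     * @param allowStartIfComplete   steps whose tasklet sets allow-start-if-complete="true"
     * @return the steps that are executed on restart, in order
     */
    static List<String> stepsToRunOnRestart(Map<String, Boolean> completedInPreviousRun,
                                            Set<String> allowStartIfComplete) {
        List<String> toRun = new ArrayList<String>();
        for (Map.Entry<String, Boolean> step : completedInPreviousRun.entrySet()) {
            boolean completed = step.getValue();
            // A completed step is skipped unless it is explicitly allowed to run again.
            if (!completed || allowStartIfComplete.contains(step.getKey())) {
                toRun.add(step.getKey());
            }
        }
        return toRun;
    }
}
```

    Fed a LinkedHashMap with step1 and step2 completed and step3 not completed, the model returns only step3 when no step sets the flag, and step1 followed by step3 when step1 sets it.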

    I think the documentation is pretty clear on this part: http://static.springsource.org/sprin...tartIfComplete

    Can the framework be extended to support the above (the restart of a completed job), or is it against the framework ideology
    I think Spring Batch's behavior makes sense here: why would you want to restart something already finished? There could be some flag to set, but what are the semantics: the instance is done, where should it restart from? Can't you use the STOPPED status? There's plenty of support in Spring Batch to choose the final status of a job instance (and avoid the COMPLETED step if necessary), you should perhaps take a look at that (you'll still need to know exactly where the job should resume).

    Can the framework be extended to support the above (fixing of stale jobs), or is it against the framework ideology
    You could perhaps raise a JIRA issue for that.

  5. #5


    For <tasklet allow-start-if-complete="true/false">, let's take an example: a job has three steps (step1, step2, step3). A first execution runs and fails during step3. On a restart, Spring Batch will go directly to step3 and try to finish the execution. That's because allow-start-if-complete defaults to false. Now take the same scenario, except with allow-start-if-complete=true on the step1 tasklet. On a restart (after a failure on step3), Spring Batch re-executes step1, skips step2, and re-executes step3. Clearer now? :-)
    I think the documentation is pretty clear on this part: http://static.springsource.org/sprin...tartIfComplete
    Thanks, absolutely clear. I was looking in the wrong chapter of the documentation.

    I think Spring Batch's behavior makes sense here: why would you want to restart something already finished? There could be some flag to set, but what are the semantics: the instance is done, where should it restart from?
    In my case new data has been added to the DB, and I want the job to start from the beginning (so it can hardly be called a "restart"; "start from scratch" would be more accurate).

    Can't you use the STOPPED status? There's plenty of support in Spring Batch to choose the final status of a job instance (and avoid the COMPLETED step if necessary), you should perhaps take a look at that (you'll still need to know exactly where the job should resume).
    The STOPPED status is good if some problem is detected (after fixing the problem, the job can continue from the point where it was interrupted). I just want to start the job again on a periodic basis (say, every Monday, run from the beginning).

  6. #6
    Join Date
    Dec 2005
    Location
    Lyon, France
    Posts
    311


    The STOPPED status is good if some problem is detected (after fixing the problem, the job can continue from the point where it was interrupted). I just want to start the job again on a periodic basis (say, every Monday, run from the beginning).
    It looks to me like the concept of a new job instance matches what you want. Each instance would have a date job parameter if it works on the same input data (table, file). I can't see why you need to restart the already completed job, as if you wanted only one "eternal" job instance for a particular job. Can you tell us more about what you want to achieve?
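    To make the "new job instance per date" idea concrete, here is a minimal JDK-only model, assuming an instance is identified by the job name plus its parameters. The class JobInstanceModel, the job name "importJob" and the parameter name "run.date" are hypothetical; in Spring Batch itself the parameters would typically be built with JobParametersBuilder and passed to JobLauncher.run.

```java
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

// Simplified model: a job instance is identified by job name + parameters.
// This is not Spring Batch code; all names here are hypothetical.
class JobInstanceModel {

    private final Set<String> completedInstances = new HashSet<String>();

    // Parameters are sorted by name so the same set of parameters always yields the same key.
    static String instanceKey(String jobName, Map<String, String> parameters) {
        return jobName + new TreeMap<String, String>(parameters);
    }

    /**
     * Records the instance as completed and returns true if it was new, or returns false
     * if an instance with the same identity has already completed (the situation in
     * which Spring Batch throws JobInstanceAlreadyCompleteException).
     */
    boolean launch(String jobName, Map<String, String> parameters) {
        return completedInstances.add(instanceKey(jobName, parameters));
    }
}
```

    In this model, launching "importJob" twice with run.date=2011-03-07 is accepted once and rejected the second time, while run.date=2011-03-14 is accepted again because it identifies a different instance.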

  7. #7


    Originally posted by arno:
    It looks to me like the concept of a new job instance matches what you want. Each instance would have a date job parameter if it works on the same input data (table, file).
    I think that is what I asked in my first post: do I need to introduce a dummy date job parameter (set to the execution date) to start a new job?

    I can't see why you need to restart the already completed job, as if you wanted only one "eternal" job instance for a particular job. Can you tell us more about what you want to achieve?
    My fault here: I now better understand what the term "restart" means in Spring Batch. Indeed, I do not need to restart a completed job: I want to start a new one. Let the completed job be archived, as it should be. On the other hand, I would like the framework to check that a job with this name is not currently being executed (regardless of parameters). So in brief the steps should look like:

    Code:
    if (job is COMPLETED)
    {
      start_new_job()
    }
    else if (job has FAILED)
    {
      restart_old_job()
    }
    else
    {
      exception: "The job is currently running"
    }
    How do I implement start_new_job()?

  8. #8
    Join Date
    Dec 2005
    Location
    Lyon, France
    Posts
    311


    If your job runs daily, you should indeed use a job parameter for the day. If the concepts of job instances, job executions, etc. aren't clear to you, take a look at the documentation: http://static.springsource.org/sprin...html#domainJob.

    You can use the JobExplorer interface to check whether there's a running execution of a job.
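    Putting the pieces of this thread together, the decision flow could look roughly like the JDK-only sketch below. The enums and the class LaunchDecisionSketch are hypothetical, and treating "no previous execution" like COMPLETED and STOPPED like FAILED are assumptions that go beyond the pseudocode in post #7. In a real application, the "is it running" input would come from JobExplorer (for example its findRunningJobExecutions(jobName) method).

```java
// Simplified, JDK-only model of the start / restart / reject decision.
// This is not Spring Batch code; the enum and class names are hypothetical.
class LaunchDecisionSketch {

    // State of the most recent execution of the job, regardless of parameters.
    enum LastStatus { NONE, COMPLETED, FAILED, STOPPED, RUNNING }

    enum Action { START_NEW_INSTANCE, RESTART_LAST_INSTANCE, REJECT_ALREADY_RUNNING }

    static Action decide(LastStatus last) {
        switch (last) {
            case NONE:       // assumption: nothing has run yet, so start fresh
            case COMPLETED:  // a finished instance cannot be restarted, so use new parameters
                return Action.START_NEW_INSTANCE;
            case FAILED:
            case STOPPED:    // assumption: a stopped execution is resumed like a failed one
                return Action.RESTART_LAST_INSTANCE;
            default:         // RUNNING: refuse to launch a second execution
                return Action.REJECT_ALREADY_RUNNING;
        }
    }
}
```

    START_NEW_INSTANCE would correspond to launching with a new date parameter, RESTART_LAST_INSTANCE to launching again with the previous parameters.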

  9. #9
    Join Date
    May 2006
    Posts
    7


    I think there is a part of the original question that was not answered and that is also a problem for me. How do you resume job executions that have started but are not completed, failed, or stopped, because the server went down for some reason? How can these be resumed so that the job picks up where it left off?

  10. #10


    Originally posted by arno:
    If your job runs daily, you should indeed use a job parameter for the day.
    OK, clear. I need to add one parameter that depends on the start day.

    I hope the framework could provide a method like restartOrStartNewJob(Job job), along these lines:

    Code:
    void restartOrStartNewJob(Job job) {
    	JobParameters jobParameters = null;
    
    	// The Job interface exposes getName(); getJobName() belongs to JobInstance.
    	List<JobInstance> lastInstances = jobExplorer.getJobInstances(job.getName(), 0, 1);
    
    	if (!CollectionUtils.isEmpty(lastInstances)) {
    		// Try to restart the last execution:
    		jobParameters = lastInstances.get(0).getJobParameters();
    	}
    
    	if (jobParameters == null) {
    		// Try to start a new instance:
    		jobParameters = job.getJobParametersIncrementer().getNext(createDefaultJobParameters());
    	}
    
    	try {
    		logger.info("Attempting to re-launch job with parameters " + jobParameters);
    
    		Long executionId;
    
    		try {
    			executionId = jobLauncher.run(job, jobParameters).getId();
    		}
    		catch (JobInstanceAlreadyCompleteException e) {
    			jobParameters = job.getJobParametersIncrementer().getNext(jobParameters);
    
    			logger.info("Attempting to start new job with parameters " + jobParameters);
    
    			executionId = jobLauncher.run(job, jobParameters).getId();
    		}
    
    		...
    	}
    	catch (JobInstanceAlreadyCompleteException e) {
    		// This situation should never happen, as we have taken steps to increment the parameters:
    		...
    	}
    	catch (JobExecutionAlreadyRunningException e) {
    		...
    	}
    	catch (JobRestartException e) {
    		...
    	}
    	catch (JobParametersInvalidException e) {
    		...
    	}
    }
    If the concepts of job instances, job executions, etc. aren't clear to you, take a look at the documentation: http://static.springsource.org/sprin...html#domainJob.
    Thanks for the link. After reading the docs again, I understand the basics better.

    To eparchas:

    Originally posted by eparchas:
    How do you resume job executions that have started but are not completed, failed, or stopped, because the server went down for some reason? How can these be resumed so that the job picks up where it left off?
    I have a rather ugly solution, which I put in the initial post. I would like to hear from the Batch core developers whether resetStaleJob() above looks good. eparchas, you can use it as is; it works absolutely fine for me.
