
Thread: resume the job after power failure

  1. #11


    Are the steps to update the database to resolve this issue documented? We had a power failure and I have two jobs that are now failing to start with the message "instance is already running: A job execution for this job is already running: JobInstance: id=124110"

    I found both job instances in the tables, but I'm not sure what to do with them to resolve the issue. Should I just delete all rows related to these two job instances, or can I modify the status columns so that the framework recognizes that they failed and can recover?

    Code:
    SELECT *
    FROM BATCH_JOB_INSTANCE BJI
        LEFT JOIN BATCH_JOB_EXECUTION BJE ON BJI.JOB_INSTANCE_ID = BJE.JOB_INSTANCE_ID
        LEFT JOIN BATCH_STEP_EXECUTION BSE ON BJE.JOB_EXECUTION_ID = BSE.JOB_EXECUTION_ID
    WHERE BJI.JOB_INSTANCE_ID IN (124108, 124110)
    Code:
    JOB_INSTANCE_ID  VERSION  JOB_NAME           JOB_KEY                                    JOB_EXECUTION_ID  VERSION  JOB_INSTANCE_ID  CREATE_TIME              START_TIME               END_TIME  STATUS   CONTINUABLE  EXIT_CODE  EXIT_MESSAGE  STEP_EXECUTION_ID  VERSION  STEP_NAME                                              JOB_EXECUTION_ID  START_TIME               END_TIME  STATUS   COMMIT_COUNT  ITEM_COUNT  READ_SKIP_COUNT  WRITE_SKIP_COUNT  ROLLBACK_COUNT  CONTINUABLE  EXIT_CODE    EXIT_MESSAGE
    124108           0        statsPurgingJob    STORED_DATE=Sun Jan 11 06:00:35 CST 2009;  130388            2        124108           2009-01-11 06:01:30.124  2009-01-11 06:01:30.135  <null>    STARTED  N            UNKNOWN    <null>        151300             0        org.jasig.portal.stats.purge.StatsPurgingStep#1eb2c1b  130388            2009-01-11 06:01:32.022  <null>    STARTED  0             0           0                0                 0               Y            CONTINUABLE  <null>
    124110           0        statsAggregateJob  STORED_DATE=Sun Jan 11 05:51:52 CST 2009;  130390            2        124110           2009-01-11 06:03:00.118  2009-01-11 06:03:00.126  <null>    STARTED  N            UNKNOWN    <null>        151302             0        StatsAggregatingStep                                   130390            2009-01-11 06:03:00.892  <null>    STARTED  0             0           0                0                 0               Y            CONTINUABLE  <null>

  2. #12


    I'm not sure if this is the appropriate solution, but after some trial and error I found that the following SQL fixed my problem. I'd still love to get feedback on whether this is the correct approach.

    Code:
    UPDATE BATCH_JOB_EXECUTION BJE
    SET STATUS='FAILED', EXIT_CODE='FAILED', END_TIME=SYSDATE
    WHERE BJE.JOB_EXECUTION_ID IN (130388, 130390);
    
    UPDATE BATCH_STEP_EXECUTION BSE
    SET STATUS='FAILED', EXIT_CODE='FAILED', END_TIME=SYSDATE
    WHERE BSE.JOB_EXECUTION_ID IN (130388, 130390);

  3. #13
    Join Date
    Jan 2009
    Posts
    1


    Quote Originally Posted by edalquist View Post
    I'm not sure if this is the appropriate solution, but after some trial and error I found that the following SQL fixed my problem. I'd still love to get feedback on whether this is the correct approach.

    Code:
    UPDATE BATCH_JOB_EXECUTION BJE
    SET STATUS='FAILED', EXIT_CODE='FAILED', END_TIME=SYSDATE
    WHERE BJE.JOB_EXECUTION_ID IN (130388, 130390);
    
    UPDATE BATCH_STEP_EXECUTION BSE
    SET STATUS='FAILED', EXIT_CODE='FAILED', END_TIME=SYSDATE
    WHERE BSE.JOB_EXECUTION_ID IN (130388, 130390);

    I haven't had a chance to dive into how Spring Batch decides when a job can be restarted, but it does seem that you need to set both the status and the exit code to FAILED before it will accept a new job with the same parameters. For anyone using the 1.x release, I guess you have to write some sort of bootstrap code that examines all jobs with a running status, sets them to failed, then pulls out each job and its job parameters and passes them back into the JobLauncher to restart them properly?
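The bootstrap idea above could be sketched roughly as follows. This is only an illustration, not Spring Batch code: it uses Python's `sqlite3` with a cut-down copy of the three tables from this thread, `CURRENT_TIMESTAMP` stands in for Oracle's `SYSDATE`, and the names `build_demo_db` and `fail_stuck_executions` are made up for the example. It assumes you already know, outside of Spring Batch, that none of the STARTED executions is really still running.

```python
import sqlite3


def build_demo_db():
    """Create a cut-down, in-memory copy of the tables discussed in this thread."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE BATCH_JOB_INSTANCE (
            JOB_INSTANCE_ID INTEGER PRIMARY KEY, JOB_NAME TEXT, JOB_KEY TEXT);
        CREATE TABLE BATCH_JOB_EXECUTION (
            JOB_EXECUTION_ID INTEGER PRIMARY KEY, JOB_INSTANCE_ID INTEGER,
            STATUS TEXT, EXIT_CODE TEXT, END_TIME TEXT);
        CREATE TABLE BATCH_STEP_EXECUTION (
            STEP_EXECUTION_ID INTEGER PRIMARY KEY, JOB_EXECUTION_ID INTEGER,
            STATUS TEXT, EXIT_CODE TEXT, END_TIME TEXT);

        -- The two stuck instances from post #11.
        INSERT INTO BATCH_JOB_INSTANCE VALUES
            (124108, 'statsPurgingJob', 'STORED_DATE=Sun Jan 11 06:00:35 CST 2009;'),
            (124110, 'statsAggregateJob', 'STORED_DATE=Sun Jan 11 05:51:52 CST 2009;'),
            -- Made-up finished instance, to show that it is left alone.
            (124100, 'statsPurgingJob', 'made-up key');
        INSERT INTO BATCH_JOB_EXECUTION VALUES
            (130388, 124108, 'STARTED', 'UNKNOWN', NULL),
            (130390, 124110, 'STARTED', 'UNKNOWN', NULL),
            (130380, 124100, 'COMPLETED', 'COMPLETED', '2009-01-10 06:05:00');
        INSERT INTO BATCH_STEP_EXECUTION VALUES
            (151300, 130388, 'STARTED', 'CONTINUABLE', NULL),
            (151302, 130390, 'STARTED', 'CONTINUABLE', NULL),
            (151290, 130380, 'COMPLETED', 'COMPLETED', '2009-01-10 06:05:00');
        """
    )
    return conn


def fail_stuck_executions(conn):
    """Mark unfinished executions FAILED and return (job_name, job_key) pairs.

    The returned pairs are what bootstrap code would hand back to the launcher.
    """
    cur = conn.cursor()
    stuck = cur.execute(
        """
        SELECT BJE.JOB_EXECUTION_ID, BJI.JOB_NAME, BJI.JOB_KEY
        FROM BATCH_JOB_EXECUTION BJE
            JOIN BATCH_JOB_INSTANCE BJI ON BJI.JOB_INSTANCE_ID = BJE.JOB_INSTANCE_ID
        WHERE BJE.STATUS = 'STARTED' AND BJE.END_TIME IS NULL
        ORDER BY BJE.JOB_EXECUTION_ID
        """
    ).fetchall()
    for execution_id, _name, _key in stuck:
        # Same update as in post #12, restricted to rows that never finished.
        # Table names come from this fixed tuple, not from user input.
        for table in ("BATCH_JOB_EXECUTION", "BATCH_STEP_EXECUTION"):
            cur.execute(
                f"UPDATE {table} SET STATUS='FAILED', EXIT_CODE='FAILED', "
                "END_TIME=CURRENT_TIMESTAMP "
                "WHERE JOB_EXECUTION_ID = ? AND END_TIME IS NULL",
                (execution_id,),
            )
    conn.commit()
    return [(name, key) for _id, name, key in stuck]


if __name__ == "__main__":
    for job_name, job_key in fail_stuck_executions(build_demo_db()):
        print("would relaunch", job_name, "with", job_key)
```

Unlike the SQL in post #12, this sketch leaves step executions that already have an END_TIME untouched; whether that matters for restart is something I have not verified against the framework.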

  4. #14
    Join Date
    Dec 2006
    Posts
    1,061


    You're basically correct. We have no way of knowing whether a job with a status other than 'FAILED' or 'STOPPED' is still running. At some point (I think in 2.0) we added the LAST_UPDATED column, which gives a little more detail. However, since Spring Batch has no knowledge of the application or of what the commit intervals might be, we leave it to the user to examine such executions and set them to failed. If you want to automate this, the new interfaces in 2.0 make it much easier to query for these before starting your job.
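As a rough illustration of how LAST_UPDATED might narrow down the candidates, here is a sketch that lists STARTED executions whose last update is older than a chosen threshold. It is plain SQL run through Python's `sqlite3`, not the 2.0 interfaces; the table is cut down to the relevant columns, the second row and the names `build_demo_db` and `find_stale_executions` are made up, and I am assuming LAST_UPDATED lives on BATCH_JOB_EXECUTION. A stale timestamp is only a hint: the threshold has to be longer than your longest commit interval, and you still have to know the process is really dead.

```python
import sqlite3
from datetime import datetime, timedelta


def build_demo_db():
    """Cut-down BATCH_JOB_EXECUTION with an assumed LAST_UPDATED column."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE BATCH_JOB_EXECUTION (
            JOB_EXECUTION_ID INTEGER PRIMARY KEY, STATUS TEXT,
            END_TIME TEXT, LAST_UPDATED TEXT);
        INSERT INTO BATCH_JOB_EXECUTION VALUES
            -- Execution from post #11, silent since the power failure.
            (130388, 'STARTED', NULL, '2009-01-11 06:01:32'),
            -- Made-up execution that committed a minute ago.
            (130400, 'STARTED', NULL, '2009-01-11 09:59:00');
        """
    )
    return conn


def find_stale_executions(conn, now, max_silence):
    """Return ids of STARTED executions not updated within max_silence of now."""
    # Timestamps are stored as 'YYYY-MM-DD HH:MM:SS' text, so string
    # comparison orders them chronologically.
    cutoff = (now - max_silence).strftime("%Y-%m-%d %H:%M:%S")
    rows = conn.execute(
        "SELECT JOB_EXECUTION_ID FROM BATCH_JOB_EXECUTION "
        "WHERE STATUS = 'STARTED' AND END_TIME IS NULL AND LAST_UPDATED < ? "
        "ORDER BY JOB_EXECUTION_ID",
        (cutoff,),
    ).fetchall()
    return [row[0] for row in rows]


if __name__ == "__main__":
    now = datetime(2009, 1, 11, 10, 0, 0)
    print(find_stale_executions(build_demo_db(), now, timedelta(hours=1)))
```

The ids this returns are candidates to review and then set to failed, not proof that a job has died.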

  5. #15
    Join Date
    Aug 2008
    Posts
    16

    Property to tell Spring Batch that it shouldn't care about status when restarting

    Quote Originally Posted by Dave Syer View Post
    If you have some concrete suggestions for improvements, features, or use cases that we could implement time is running out for 2.0, so please tell us what is needed.
    We are running our jobs in a mature batch environment that uses a scheduler to run batch jobs in sequences and with dependencies at predefined times. The jobs are handled by a job entry subsystem that has full control over the jobs and knows whether they are running or have been running. The jobs are numbered, and there will never be two or more copies of the same job running at the same time. The output from the jobs is stored in a safe place.

    As I mentioned earlier in this thread, we use different statuses in our home-grown repository. There are only two. One is Not Completed, which implies that the job is running or that it ended prematurely. The other is Completed, which means that the job has completed(!). So, if the status after a power outage is Not Completed, the job can be restarted without any problems, since we know outside of Spring Batch that the job is not running. On the other hand, if the status is Not Completed, we will never restart the job as long as we know that it is still running - and we know that outside of Spring Batch.

    Our proposal is that there should be a way to configure Spring Batch to tell it that we have control over the jobs outside of Spring Batch. In that way we could restart the jobs after a power failure or any other hard stop of the job. I.e., we would like a property to tell Spring Batch to not care about the status so that we can restart jobs without having to fiddle with the status flag in the repository - neither by hand nor by using an application programming interface.

    Any views on this?

    /Len...

  6. #16
    Join Date
    Jun 2005
    Posts
    4,231


    The BatchStatus is at the heart of some quite fundamental domain logic inside Spring Batch, and you are asking if we can ignore it. I think the answer is probably "no". It would be better to take the route proposed earlier and make it easier to reset the RUNNING jobs that you somehow know are dead. If I were you I would create a JIRA on that topic so you can track it.
