Feb 11th, 2008, 08:51 PM
Question on basic usages of Spring Batch
I have tried to read through the Spring Batch samples (1.0m3 and m4), but I am still not sure whether my understanding of some of its usage is correct. As a Spring Batch newbie, I would appreciate some hints on the following issues:
1) For JobParameters (formerly JobIdentifier): is it correct that if the same job is launched (passing a job of the same type to the launcher), it is considered a re-run of the previous job if and only if the JobParameters passed contain the same content?
2) If 1) is correct, what if I want to re-run a job but with an extra parameter passed to it? For example, the username of whoever launches the job, since a different user may re-run a previously failed job.
3) Is there any place where I can put information shared throughout the job execution? For example, I may want to retrieve the invoking user's information so that every step/tasklet or item processor in my job can read it. Are JobParameters a candidate for this purpose?
4) I want to build a service through which users can send requests to invoke jobs. What is the correct way to allow the same job to be invoked concurrently? It seems that if I define the job in the application context, the Job created (and registered with the JobRegistry) is a singleton bean. If the Job holds stateful information (e.g. StepContext or other data), multiple concurrent invocations of the same job (using an async TaskExecutor in the JobLauncher) will cause abnormal behaviour.
Should I create a separate application context for every incoming request, so that each application context (and JobLauncher) serves only one job?
Thanks a lot for your help
Feb 12th, 2008, 01:12 AM
1) You are correct. To expound a bit: if a job is configured to be restartable and it is invoked again with the same parameters, then the same job instance will be referenced and a new execution will be created to restart the job.
2) As of the M4 release, there is no supported way to provide non-identifying job parameters (i.e. job parameters that are not used as part of the job instance identification logic). You can work around this by passing that kind of argument as a system property (i.e. with the -Dkey=value argument on the java command line) and then accessing it via System.getProperty(key).
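A minimal sketch of that workaround (the property name `launch.user` and the class name are invented for this example; nothing here is Spring Batch API):

```java
// Reads a non-identifying value such as the launching user, supplied on the
// command line as: java -Dlaunch.user=adrian ...
public class LaunchUserExample {

    public static String launchUser() {
        // Fall back to a default when the property was not supplied.
        return System.getProperty("launch.user", "unknown");
    }

    public static void main(String[] args) {
        System.out.println("Launched by: " + launchUser());
    }
}
```

Since the value never enters the JobParameters, it plays no part in job instance identification, so a re-run by a different user still matches the original instance.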
3) There is no official "job context" mechanism. The simplest way to do this is probably to create a single Spring-managed singleton object to hold your data (e.g. a Spring bean of class java.util.Properties) and then inject that object into each of your job artifacts that needs to share information.
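As a rough sketch, the wiring could look something like the following (the tasklet class names and the `sharedProperties` setter are invented for illustration; only `java.util.Properties` is a real class here):

```xml
<!-- Singleton holder for data shared across job artifacts -->
<bean id="jobContextProperties" class="java.util.Properties"/>

<!-- Hypothetical tasklets, each injected with the same holder -->
<bean id="loadTasklet" class="com.example.LoadTasklet">
    <property name="sharedProperties" ref="jobContextProperties"/>
</bean>
<bean id="reportTasklet" class="com.example.ReportTasklet">
    <property name="sharedProperties" ref="jobContextProperties"/>
</bean>
```

Note the caveat discussed later in this thread: because the holder is a singleton, concurrent executions of the same job would see the same Properties object.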
4) Even though the job configuration is a singleton, each job instance is created from the job configuration as a new object by the job launcher, so you should have no problems with concurrency in this regard, as long as you are not trying to do anything funky or unsupported (e.g. modifying the job configuration object during execution, which is not thread-safe and not supported).
Originally Posted by adrianshum
Feb 12th, 2008, 01:14 AM
A bit more regarding 3) - if these steps are so tightly coupled, you might consider combining them into a single step.
Feb 12th, 2008, 04:50 AM
Thanks a lot, dkaminsky. It helped to clarify a lot of my understanding.
May I ask a follow-up question on 3)?
You suggested injecting a singleton Properties bean into the job. However, wouldn't that cause all instances of the same job configuration to share the Properties? I think we should have one Properties object per JobInstance, right? How can I achieve that?
Last edited by adrianshum; Feb 12th, 2008 at 05:13 AM.
Feb 12th, 2008, 07:58 AM
I'm not sure what you mean. You can always declare other beans within your job in a different scope (e.g. prototype or step scope) even if your job is a singleton. There are three different job-related artifacts -- Job, JobInstance and JobExecution. The Job should be a singleton. Each JobInstance will be a brand new object created from the Job by giving it a different set of JobParameters. These job instances are matched up with previous instances through the repository - if that instance already exists in the repository, a restart is invoked (if the job is configured to allow that). Each JobExecution is generated from the JobInstance and represents an attempt to run a JobInstance.
I hope this clarifies things for you. If not, please describe in more detail what you are trying to do that you believe the singleton scope won't allow.
Feb 14th, 2008, 01:39 AM
Just to make sure I understand: is the "Job Configuration" you mentioned the same thing as "Job"?
Is my understanding correct? When a JobExecution is created, it refers to a JobInstance (either newly created or retrieved from the DB). The JobInstance refers to the Job (which contains Steps, which contain Tasklets). The launcher asks the Job to execute with the created JobExecution. The Job's execute method delegates its work to each of its Steps' execute methods, passing the corresponding StepExecution, in which the StepExecutor subsequently invokes the Step's Tasklet.
However, since the Job is a singleton (and so are the Steps and Tasklets inside it), why do I find that every execution runs on a separate Tasklet instance? I seem to have lost track of where the Tasklet is 'cloned and executed'.
(I am trying to trace the code to understand this, but I am still confused... probably I missed some magic in the flow.)
Regarding the singleton question, what I am thinking is this: suppose we have a Job A containing a step, which in turn contains a Tasklet (T-A). If I create a singleton Properties bean and inject it into T-A, then when the same job is invoked twice concurrently, two job instances will be created, but both will ultimately delegate execution to T-A, and those two executions of T-A will be referring to the same Properties object instance, which is not what I want. Is my understanding correct?
Feb 14th, 2008, 02:08 AM
Before the M4 release, Job Configuration referred to the object that is now called Job. When I use it in the context of M4, I am referring to the specific way that you configure your Job object or the file containing that Job definition.
Originally Posted by adrianshum
What you wrote is basically correct, except that you are missing the idea that a Step will also have a StepInstance (which is created from the Step and JobInstance) and a StepExecution (which is created from the StepInstance and JobExecution).
The Job and Step objects are singletons, but the runtime artifacts (the *Instance and *Execution instances) are created as necessary by the framework, so they are not singletons.
You are, however, correct that declaring a Tasklet as a singleton has implications for the functionality of your job. Your tasklet is not cloned or copied, so if you declare it as a singleton, the same tasklet object will be used everywhere it is referenced.
The best practice recommended by the SpringSource developers on the project is to keep your Tasklets stateless (i.e. any call to the tasklet's execute method is completely independent of all previous calls). If your tasklet is stateless, the fact that it is a singleton shouldn't matter, as every call to the execute method will be treated the same way. (See ItemOrientedTasklet in the execution package for an example of a stateless tasklet.)
If your tasklet is stateful, however, you have the alternative of defining it in "step" scope so that its lifecycle is managed by Spring at the beginning and end of each step execution. This would make it so that a new instance of the tasklet would be created at the beginning of each step execution.
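To illustrate the stateless-versus-stateful distinction in plain Java (no Spring Batch types; the class and method names are invented for this toy sketch):

```java
public class StatelessnessDemo {

    /** Safe to share as a singleton: the result depends only on the input. */
    static class StatelessWorker {
        int execute(int input) {
            return input * 2;
        }
    }

    /** Unsafe to share: each call depends on state left behind by previous
     *  calls, so concurrent executions would interleave updates to 'total'. */
    static class StatefulWorker {
        private int total = 0;
        int execute(int input) {
            total += input;   // shared mutable state
            return total;
        }
    }

    public static void main(String[] args) {
        StatelessWorker s = new StatelessWorker();
        System.out.println(s.execute(3)); // always 6, no matter who else calls it

        StatefulWorker f = new StatefulWorker();
        System.out.println(f.execute(3)); // 3
        System.out.println(f.execute(3)); // 6 -- same input, different result
    }
}
```

A singleton of the first kind behaves identically no matter how many jobs share it; a singleton of the second kind is exactly the situation where step scope (or some other per-execution lifecycle) is needed.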
In terms of properties, do you mean JobParameters or do you mean a custom Properties object? For JobParameters, it shouldn't matter since they will be retrieved from the corresponding StepContext. For a custom Properties object, if you declare it in "step" scope, a new one would be created for each step. If you declare it as a singleton, it would be shared between steps.
Feb 14th, 2008, 04:02 AM
Thanks a lot for the clear explanation.
Originally Posted by dkaminsky
For Properties, I am referring to the previous discussion about Job-Context. I'd like to have some data shared among tasklets/processors within each job execution. For example, I may put the requesting user ID so that tasklets can get that to perform its logic.
Therefore, what I really want is something job-scoped, which I can inject into the tasklets/processors of the same job, and which is shared within one job instance/job execution.
If my understanding is correct, neither singleton nor step scope really helps in this case. Probably the only way is to extend JobParameters to hold a Properties object, so that it can serve as both the
"Job context" and the
"Non-identifying job parameters" (discussed in your reply to my 2nd point in the first post). Then I would make the relevant Tasklets/Processors StepContextAware, so that they can get the JobParameters via ((CustomJobParam) stepContext.getStepExecution().getJobExecution().getJobInstance().getJobParameters()).getArbitraryDataProperties()
Is this way acceptable?
Thanks a lot for your kind help.
Last edited by adrianshum; Feb 14th, 2008 at 04:08 AM.
Feb 14th, 2008, 04:06 AM
That would be a fair solution, provided you are willing to do the necessary extension of JobParameters and JobParametersFactory. This might also be a good use case for introducing a JobContext and/or a JobScope.
You might consider submitting a JIRA issue for one or the other...
Feb 14th, 2008, 04:17 AM
If I understand your issue properly, you could, alternatively, declare a ThreadLocal holder for your Properties in singleton scope. The init and destroy hooks for your Properties would not be called at the beginning and end of the job execution (they would be called when the app context is created and destroyed), but it would allow concurrent job processing.
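A minimal sketch of that idea, assuming each job execution runs on its own thread (e.g. via an async TaskExecutor); the class name and methods are invented for this example:

```java
import java.util.Properties;

// Singleton holder whose Properties are per-thread: two jobs launched on
// different threads each see their own copy, so concurrent executions do
// not interfere with each other.
public class JobContextHolder {

    private static final ThreadLocal<Properties> CONTEXT =
            new ThreadLocal<Properties>() {
                @Override
                protected Properties initialValue() {
                    return new Properties();
                }
            };

    public static Properties get() {
        return CONTEXT.get();
    }

    public static void clear() {
        // Call at the end of a job to avoid stale data leaking into the
        // next job that reuses a pooled thread.
        CONTEXT.remove();
    }
}
```

The trade-off, as noted above, is that the holder's lifecycle is tied to the thread rather than the job execution, so you must remember to clear it yourself, and it breaks down if a single job execution spans multiple threads.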