Feb 19th, 2008, 10:26 PM
Extending Job Parameters
I would like to extend JobParameters to store some transient, non-identifying parameters that are passed to each job. To achieve this, I need the following assumption to hold:
The JobParameters instance referred to by JobExecution/StepExecution will always be the instance I passed to the JobLauncher, rather than a new instance constructed from persisted data (even in a re-run situation).
I know this is the case in the current milestone (1.0m4). Can I rely on this assumption in coming versions?
Thanks a lot
Feb 20th, 2008, 10:53 AM
I don't see that changing any time soon. But I am curious about your use case: what non-identifying information do you need to store that the Step also needs access to?
Feb 21st, 2008, 05:59 AM
In fact, what I am trying to do is provide "real" parameters to the job, instead of the "identifier" that JobParameters currently acts as.
For example, in a batch job that exports certain data from the DB to a file, I might put only a sequence number in JobParameters to act as the identifier of the job, while putting the requesting user's id, the date range to export, the export format, etc. in as 'non-identifying job parameters'.
There is always some case where I need to pass something to the job execution that should not be considered part of the job's identity. For example, I may want to store the requesting user's ID in my task, while a re-run of the same job may not necessarily be issued by the same user.
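To make the distinction concrete, here is a minimal sketch (hypothetical plain Java, not the Spring Batch API — the class and method names are invented for illustration) of a parameter holder that keeps identifying and non-identifying values in separate maps, with only the identifying map taking part in equality. Two launches that differ only in the requesting user would then still resolve to the same job instance:

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch: identifying vs. non-identifying parameters kept apart.
final class SplitJobParams {
    private final Map<String, Object> identifying;
    private final Map<String, Object> nonIdentifying;

    SplitJobParams(Map<String, Object> identifying, Map<String, Object> nonIdentifying) {
        this.identifying = Collections.unmodifiableMap(new LinkedHashMap<>(identifying));
        this.nonIdentifying = Collections.unmodifiableMap(new LinkedHashMap<>(nonIdentifying));
    }

    // Lookup checks identifying parameters first, then non-identifying ones.
    Object get(String key) {
        Object value = identifying.get(key);
        return value != null ? value : nonIdentifying.get(key);
    }

    // Only identifying parameters decide whether two launches are the same instance.
    @Override
    public boolean equals(Object o) {
        return o instanceof SplitJobParams
                && identifying.equals(((SplitJobParams) o).identifying);
    }

    @Override
    public int hashCode() {
        return identifying.hashCode();
    }
}
```

With this shape, a launch keyed by `seqNo=42, userId=alice` and a re-run keyed by `seqNo=42, userId=bob` compare equal, which is exactly the behavior being asked for.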
Feb 21st, 2008, 06:59 AM
You realize that by doing that you are sacrificing the ability to properly restart in the case of failure, right? If the non-identifying parameters contain data relevant to the work being done (e.g. a date range), then when the job is restarted with a different non-identifying date range, the restart may have undesired results...
Feb 21st, 2008, 08:35 AM
I am a bit confused about what you are trying to do. As Doug points out, you can't really restart if the non-identifying parameters you mention are relevant to the job execution (and passing irrelevant parameters doesn't make sense), so why are you trying to tie the new execution to the same job instance? Why not create a new job instance for the new parameter values?
Concerning your example, I don't see why the batch framework would be interested in the userId. I think you don't really want to pass the userId to the execution - the execution has no use for it. I guess you want to log which user launched the execution, so I would look at tweaking the launcher. Does that make sense?
Feb 21st, 2008, 10:40 AM
Yup, of course I am aware of the restartability concern.
But sometimes I really do need non-identifying attributes passed to the job.
For example, the requester user id I mentioned: in the job, I may have validations to check whether the requester has permission to run the job. I don't really care who re-runs the job, but I need to ensure they are permitted. Or, when my records are updated, I need to set the requester's ID as the 'last updated user', which is not really related to restartability, but I still need that information.
Secondly, I actually want to use it as a work-around for the lack of a job context. In my situation, I only need a job-scoped context so that my steps (for example, a first step that does some data pre-fetching) can put things into that transient, job-scoped context and share them throughout the job.
Feb 21st, 2008, 09:09 PM
This still seems like a 'scheduling concern'. Most good enterprise schedulers provide this functionality (ensuring a user has rights). I really can't recommend launching a job and then getting to a step and bombing out because a user can't run that job. It seems to me it would be much better to make that determination well before you even launch the job.
I also don't understand the JobContext issue.
Feb 21st, 2008, 09:24 PM
Feb 21st, 2008, 09:41 PM
@Non-identifying JobParameters
I'm not completely against them by any means. When I first coded it up, I thought long and hard about whether I should create two distinct sets. In fact, a version I did months and months ago did have that. However, here's the issue: if you're 'updating data' based on this parameter, isn't this a new JobInstance? If you're using this value to, say, update some data (though I still don't completely understand the scenario), then restarting a job that picks up at the same place it left off, but with different 'update data', seems wrong to me. It still seems like, if you're using this parameter *at all* in your step (which I'm assuming you would, if you bothered putting it in JobParameters in the first place), then restarting the job with a different parameter would be bad. Keep in mind, it's not preventing you from running the job; it simply causes a new instance to be created.
That being said, I'm still open to it, but I would need the law of 'job parameters are used either to identify the job or to modify/control processing' to be broken, and I still can't come up with a use case that breaks it.
It still sounds like a caching solution would be a bit better than a JobContext. I've seen a similar solution used in quite a few scenarios. Furthermore, most caches are built to handle concurrent requests. This is another one I'm still open to (and I think the rest of the team is as well), but I'm still not seeing a solid use case. However, it's probably another one of those things we'll look at when creating the feature/improvement list after release 1.
Feb 21st, 2008, 10:05 PM
Thanks a lot Lucasward,
Hope I am not creating too much trouble here :P but I just want to find a way to achieve what I need, as I am building a 'long-running job framework' for our new system.
To be more precise, we are building a kind of settlement system that needs to run a day-end batch job once and only once every day. I need to make use of the restart feature, as I don't want completed steps to run twice. However, a job re-run may not be issued by the same operator. During our batch, we may need to update our data, and our tables have an information field denoting the last user who updated each record. Therefore, if a different user re-runs a failed day-end batch job, although one parameter to the job (the user id) is different, I need the runs to be the same job instance, because it is the day-end job for the same day (for which I would give the date as the identifying job parameter).
External caching may be a valid choice. Assuming I build some kind of "cache manager", make it a singleton, and inject it into my steps, is it reasonable to use the Job Execution ID as the key to put stuff into the cache?
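The "cache manager" idea above could be sketched as follows. This is a hypothetical plain-Java illustration (no Spring Batch types; the class name and methods are invented): a singleton that holds one shared map per job execution id, so that steps given the same execution id see the same data. A `clear(...)` call when the job finishes would be needed to avoid leaking entries.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch of a job-scoped cache keyed by execution id.
final class JobScopedCache {
    private static final JobScopedCache INSTANCE = new JobScopedCache();

    // executionId -> (key -> value); ConcurrentHashMap tolerates concurrent steps.
    private final Map<Long, Map<String, Object>> caches = new ConcurrentHashMap<>();

    private JobScopedCache() {}

    static JobScopedCache getInstance() {
        return INSTANCE;
    }

    // Returns the shared map for this execution, creating it on first use.
    Map<String, Object> forExecution(long executionId) {
        return caches.computeIfAbsent(executionId, id -> new ConcurrentHashMap<>());
    }

    // Drop the cache for a finished execution so memory is reclaimed.
    void clear(long executionId) {
        caches.remove(executionId);
    }
}
```

Usage would look like: a pre-fetching first step calls `forExecution(id).put("prefetched", data)`, and later steps read it back with the same execution id. The execution id works as a key precisely because it is unique per execution, though anything cached this way is transient and will not survive a restart.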