Thread: Using in memory repository or in memory database

  1. #1
    Join Date
    May 2011
    Posts
    18

    Using in memory repository or in memory database

    Hello,

    We are planning a new, large project, and so far we are leaning towards Spring Batch because it is amazingly strong and complete.

    We have a functional requirement to process items one at a time in some cases. After some tests we found that the overhead of a database-backed job repository is far too high when processing single items: all the inserts/updates against the BATCH_ tables increase the processing time considerably, and that is definitely not something we want.

    Our idea right now is to use a master step to partition our data and then process it with N threads. Our partitioner will assign just one record to each partition, so our slave step will process one record at a time. Is this the correct approach, or is there a "best practice" that we should follow here?

    If we do not care whether the job/step state survives a VM restart, is there any other drawback to using the in-memory job repository instead of the database one?

    We were also wondering whether there is any benefit/difference between using the in-memory job repository and setting up an in-memory database with the database job repository pointing to it.

    If there is indeed a benefit to using an in-memory database, is there one that is proven to work best with Spring Batch?

    Thanks in advance.

  2. #2
    Join Date
    Dec 2005
    Location
    Lyon, France
    Posts
    311

    Our idea right now is to use a master step to partition our data and then process it with N threads. Our partitioner will assign just one record to each partition, so our slave step will process one record at a time. Is this the correct approach, or is there a "best practice" that we should follow here?
    Usually the partitioner defines ranges of records, and each range is then processed in a dedicated step execution.
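
A minimal sketch of the range idea, in plain Java so it runs without any dependencies. In Spring Batch this logic would normally live in a `Partitioner`, whose `partition(int gridSize)` method returns a `Map<String, ExecutionContext>`; here a `long[]` pair stands in for the execution context, and the class name, the `minId`/`maxId` parameters, and the `partitionN` naming are assumptions made for illustration.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Plain-Java stand-in for the range-splitting logic a partitioner could hold.
final class RangePartitionSketch {

    // Splits the ids minId..maxId (inclusive) into at most gridSize contiguous ranges.
    // Returns partition name -> {firstId, lastId}.
    static Map<String, long[]> partition(long minId, long maxId, int gridSize) {
        if (gridSize < 1 || maxId < minId) {
            throw new IllegalArgumentException("need gridSize >= 1 and maxId >= minId");
        }
        long total = maxId - minId + 1;
        long rangeSize = (total + gridSize - 1) / gridSize; // ceiling division
        Map<String, long[]> partitions = new LinkedHashMap<>();
        int index = 0;
        for (long start = minId; start <= maxId; start += rangeSize) {
            long end = Math.min(start + rangeSize - 1, maxId);
            partitions.put("partition" + index++, new long[] {start, end});
        }
        return partitions;
    }

    public static void main(String[] args) {
        // Print each partition name with the id range it covers.
        partition(1, 10, 3).forEach((name, r) ->
                System.out.println(name + ": " + r[0] + ".." + r[1]));
    }
}
```

With a grid size equal to the number of records, every range collapses to a single id, which is the worst case described in the first post.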

    We were also wondering whether there is any benefit/difference between using the in-memory job repository and setting up an in-memory database with the database job repository pointing to it.
    The in-memory job repository isn't meant to be used in production; it lacks some concurrency support, for example. You should try the persistent job repository with an in-memory database such as H2. It's safer and shouldn't add too much overhead.

  3. #3
    Join Date
    May 2011
    Posts
    18

    Thanks arno.

    Lack of concurrency support is reason enough to take the in-memory database route. I will test that for a while and see how it goes.

    We will partition our data in bigger chunks, but our worst-case scenario is when we need to partition it in chunks of size 1, so we are basing our analysis on that. And you are right, we will have a dedicated slave step to process each chunk.

    One more question: is there a way to pass parameters between steps, or between the job and the steps? So far we think you can only pass String, Long, Date and Double parameters, but what about a custom object?

    With that in mind, we will probably end up writing the custom object to a file and then passing the line number as an argument to the next step. Is this OK, or is there any Spring Batch support to make this easier/better?

  4. #4
    Join Date
    Dec 2010
    Posts
    175

    Yes. You can put an object into the execution context.

  5. #5
    Join Date
    May 2011
    Posts
    18

    I was looking into JobParameters. Are you referring to ChunkContext.setAttribute(name, value)?

    Is that context passed from one step to the next?
    If I am using a master/slave step approach for parallel processing, can I access the context from the partitioner?

  6. #6
    Join Date
    Dec 2010
    Posts
    175

    No. I was referring to the JobExecution, which should be available to all the steps of that job.

  7. #7
    Join Date
    Dec 2005
    Location
    Lyon, France
    Posts
    311

    We will partition our data in bigger chunks, but our worst-case scenario is when we need to partition it in chunks of size 1, so we are basing our analysis on that. And you are right, we will have a dedicated slave step to process each chunk.
    There's no relation between the partition size and the chunk size. A partition can hold 10K items while the chunk size for the step is 1; it doesn't matter.
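
To make the distinction concrete, here is a small plain-Java sketch, not Spring Batch code. It treats a partition as the list of items handed to one step execution and a chunk as the group of items written in one transaction (the step's commit interval). The class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Shows that the same partition can be processed with any chunk size.
final class ChunkingSketch {

    // Groups the items of one partition into chunks of at most chunkSize items.
    // Each inner list represents the items that would be written in one transaction.
    static <T> List<List<T>> chunk(List<T> partitionItems, int chunkSize) {
        if (chunkSize < 1) {
            throw new IllegalArgumentException("chunkSize must be >= 1");
        }
        List<List<T>> chunks = new ArrayList<>();
        for (int from = 0; from < partitionItems.size(); from += chunkSize) {
            int to = Math.min(from + chunkSize, partitionItems.size());
            chunks.add(new ArrayList<>(partitionItems.subList(from, to)));
        }
        return chunks;
    }

    public static void main(String[] args) {
        // One partition of ten items, processed with two different chunk sizes.
        List<Integer> partition = new ArrayList<>();
        for (int i = 1; i <= 10; i++) {
            partition.add(i);
        }
        System.out.println(chunk(partition, 1).size() + " chunks with chunk size 1");
        System.out.println(chunk(partition, 4).size() + " chunks with chunk size 4");
    }
}
```

The partition stays the same in both calls; only the number of transactions changes with the chunk size.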

    One more question: is there a way to pass parameters between steps, or between the job and the steps? So far we think you can only pass String, Long, Date and Double parameters, but what about a custom object?
    You can indeed use the execution context. You can also use dependency injection: define a holder class, declare it as a bean, and inject it into the batch artifacts (readers, writers, etc.) that need the information. One step would set the value and a downstream step would read it.

  8. #8
    Join Date
    May 2011
    Posts
    18

    Setting parameters in the job ExecutionContext works great for passing data between steps, but when you try to use parallel processing with a partitioner and a PartitionHandler you have a problem, because you can't access the context from the partitioner (or at least I don't know how to access it).

    If I use dependency injection to pass the data to the partitioner, it works great as long as I have only one instance of the job running. With more than one instance, all of them share the same partitioner object (because Spring beans are singletons by default), so the partitioner parameters get overwritten and mixed between job instances.

    I thought about injecting a map of parameters keyed by jobName and jobExecutionId, so that I can control which parameters belong to each instance of my job. But this brings me back to the same dead end: the partitioner cannot access the execution context.
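
The keyed-map idea could look roughly like the following plain-Java sketch. The class name, the `jobName#jobExecutionId` key format, and the method names are assumptions for illustration only. The sketch also leaves open how the partitioner would learn its own execution id, which is exactly the problem described above.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// A singleton-safe holder: values are isolated per job execution instead of shared.
final class PerExecutionHolder<T> {

    // ConcurrentHashMap so that concurrently running jobs can read and write safely.
    private final Map<String, T> values = new ConcurrentHashMap<>();

    private static String key(String jobName, long jobExecutionId) {
        return jobName + "#" + jobExecutionId;
    }

    // Called by the component that prepares the data for one execution.
    void put(String jobName, long jobExecutionId, T value) {
        values.put(key(jobName, jobExecutionId), value);
    }

    // Returns the value stored for that execution, or null if none was stored.
    T get(String jobName, long jobExecutionId) {
        return values.get(key(jobName, jobExecutionId));
    }

    // Removes and returns the value; call this when the job ends to avoid leaking entries.
    T remove(String jobName, long jobExecutionId) {
        return values.remove(key(jobName, jobExecutionId));
    }
}
```

Because the holder never clears entries on its own, something (for example a listener that runs when the job finishes) would need to call `remove`, otherwise the map grows with every execution.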

    Any ideas?

    Right now we are thinking about adding a preceding step that prepares all the data the partitioner needs to split the job and saves it in a database table. The partitioner would then read that data from the database and partition the job.

    Please let me know your comments.
