
Thread: Using 2 different datasources with Spring Batch


  1. #1
    Join Date: Jan 2008
    Location: Antibes (FRANCE)
    Posts: 7

    Using 2 different datasources with Spring Batch

    Hello,

    I am completely new to Spring Batch and am just getting started with it, so I apologize if some of my questions are too obvious!

    In my department we use Microsoft SQL Server databases, and we are restricted to using stored procedures when interacting with the database. That's why I have been trying to adapt the job and step incrementers to that policy, but I haven't succeeded so far.

    So I was wondering: is it possible to use 2 databases simultaneously, the standard one for data processing, while keeping the embedded HSQLDB (or any other database) only for the framework?

    Cheers,
    Choucri

  2. #2
    Join Date: Dec 2006
    Posts: 1,061


    Choucri,

    You can choose any database you like to run Spring Batch with. The framework stores metadata in 4 tables (this will probably increase with milestone 4). You could easily use one database to store the metadata and another one for application concerns. However, I wouldn't recommend using an in-memory database for the metadata. The value of storing information about a batch run is being able to look at it in real time and report on it. For instance, you could look back and see what time a job was run two weeks ago, how many records it processed, etc.
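    To make the split concrete, here is a minimal plain-Java sketch, assuming two in-memory lists as stand-ins for the two databases (in practice each would be a separate, persistent DataSource). `TwoStoresSketch`, `JobRunRecord`, `runJob` and `itemsProcessed` are hypothetical names, not Spring Batch APIs. It only shows the idea: business data and run metadata live in different stores, and a report such as "how many records did the run two weeks ago process?" reads the metadata store alone.

```java
import java.time.LocalDate;
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;

// Hypothetical names throughout; two in-memory lists stand in for two separate databases.
class TwoStoresSketch {

    /** One row per job run, standing in for the framework's metadata tables. */
    static final class JobRunRecord {
        final String jobName;
        final LocalDate runDate;
        final int itemsProcessed;

        JobRunRecord(String jobName, LocalDate runDate, int itemsProcessed) {
            this.jobName = jobName;
            this.runDate = runDate;
            this.itemsProcessed = itemsProcessed;
        }
    }

    final List<JobRunRecord> metadataStore = new ArrayList<>(); // metadata database (should be persistent)
    final List<String> businessStore = new ArrayList<>();       // application database

    /** Writes processed items to the business store and one run record to the metadata store. */
    void runJob(String jobName, LocalDate runDate, List<String> input) {
        for (String item : input) {
            businessStore.add(item.toUpperCase());
        }
        metadataStore.add(new JobRunRecord(jobName, runDate, input.size()));
    }

    /** Reporting query that reads only the metadata store. */
    Optional<Integer> itemsProcessed(String jobName, LocalDate runDate) {
        return metadataStore.stream()
                .filter(r -> r.jobName.equals(jobName) && r.runDate.equals(runDate))
                .map(r -> r.itemsProcessed)
                .findFirst();
    }

    public static void main(String[] args) {
        TwoStoresSketch batch = new TwoStoresSketch();
        LocalDate earlier = LocalDate.of(2008, 1, 14);
        batch.runJob("importCustomers", earlier, List.of("alice", "bob", "carol"));
        batch.runJob("importCustomers", earlier.plusWeeks(2), List.of("dave"));

        // Two weeks later: how many records did the earlier run process?
        System.out.println(batch.itemsProcessed("importCustomers", earlier).orElse(-1)); // 3
    }
}
```

    If the metadata list were an in-memory database, this history would disappear when the JVM exits, which is the reason for keeping the metadata in a persistent store.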

    I'm curious about this 'everything has to be a stored procedure' policy at your company. In a batch scenario, it can cause some serious issues. For example, since the datasets are so large and take a long time to process, we break them up into 'chunks' and commit after each chunk has been processed. This allows us to restart from the last chunk that was processed successfully, and keeps transactions reasonably sized: small enough to stay manageable, but not so small that performance suffers. It would not be possible to do this inside a stored procedure, since the framework would have no control over it, and there would be no advantage in a stored procedure that processes one record. Furthermore, most of our 'stock' database item readers do not use stored procedures, and neither do our metadata storage DAOs.
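    The chunk-and-restart idea can be sketched in plain Java as follows. This is a simplified illustration, not Spring Batch's implementation: `ChunkRestartSketch`, `run` and the `failAtIndex` crash switch are hypothetical, an in-memory list stands in for the business table, and a counter stands in for the restart data the framework would persist.

```java
import java.util.ArrayList;
import java.util.List;

// Simplified illustration with hypothetical names; not Spring Batch's implementation.
class ChunkRestartSketch {

    final List<String> committedOutput = new ArrayList<>(); // stand-in for the business table
    int committedCount = 0;                                  // stand-in for persisted restart data

    /**
     * Processes items in chunks of chunkSize, starting after the last committed item.
     * failAtIndex simulates a crash while processing that item (-1 means no crash).
     */
    void run(List<String> items, int chunkSize, int failAtIndex) {
        for (int chunkStart = committedCount; chunkStart < items.size(); chunkStart += chunkSize) {
            int chunkEnd = Math.min(chunkStart + chunkSize, items.size());
            List<String> buffer = new ArrayList<>(); // uncommitted work for the current chunk
            for (int i = chunkStart; i < chunkEnd; i++) {
                if (i == failAtIndex) {
                    // The buffer is discarded, which plays the role of rolling back this chunk.
                    throw new IllegalStateException("failure at item " + i);
                }
                buffer.add(items.get(i).toUpperCase());
            }
            // "Commit": the chunk's output and the restart position are recorded together.
            committedOutput.addAll(buffer);
            committedCount = chunkEnd;
        }
    }

    public static void main(String[] args) {
        List<String> items = List.of("a", "b", "c", "d", "e", "f", "g");
        ChunkRestartSketch job = new ChunkRestartSketch();
        try {
            job.run(items, 3, 4); // crashes inside the second chunk (d, e, f)
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage() + "; committed so far: " + job.committedCount);
        }
        job.run(items, 3, -1); // restart begins at index 3, so a, b, c are not reprocessed
        System.out.println(job.committedOutput); // [A, B, C, D, E, F, G]
    }
}
```

    The chunk size is the trade-off described above: larger chunks mean fewer commits, but more work to redo after a failure.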

    -Lucas

  3. #3

    Using Multiple Databases

    I currently use two datasources, one for all the batch tables and sequences, and one for my data store. There is nothing in the framework that prevents you from doing this.

    The only real issue I think you should be concerned about is transaction management across multiple datasources, since you can only register one transaction manager with your step executor.

    I do agree that there is a problem with the policy of only using stored procedures. Unfortunately, as Lucas points out, it can cost you a great deal of the benefit of Spring Batch, but I understand that in a corporate environment you can't do much about it. I can't tell you what you should do, as I don't know your position. I would personally push back and say "this is a framework that is outside of our control - create a new username for it in the database that it will use solely and leave it as it is." In any case, my recommendation is to try using fine-grained stored procedures with Spring Batch, but keep a close eye on performance and document any problems to review with your management.

  4. #4
    Join Date: Jan 2008
    Location: Antibes (FRANCE)
    Posts: 7


    Thanks a lot, guys, for your quick replies!

    In fact, the 'stored procedure only' policy is mainly enforced for historical reasons, and was introduced way before ORM frameworks emerged. Unfortunately, at my level I have no power to make an exception.
    Quote (originally posted by dkaminsky):
    "I would personally push back and say 'this is a framework that is outside of our control'"

    I can't do this either, since this is the first project in my division to use Spring (not just Spring Batch). In fact, introducing Spring has been a personal initiative, so I have to prove that it can be tailored to our architecture and adapted to our policies.

    This is why dedicating a database to Spring Batch, and thus having the freedom to choose how to interact with it, is a compelling alternative.

    Quote (originally posted by lucasward):
    "However, I wouldn't recommend using an in-memory database for the metadata."

    In fact, the in-memory database is just a temporary solution to get started with the framework.

    Quote (originally posted by dkaminsky):
    "The only real issue I think you should be concerned about is transaction management across multiple datasources, since you can only register one transaction manager with your step executor."

    Indeed, that's my biggest worry! Since I'm new to Spring, I'm not familiar with its transaction management model. I presume that using the DataSourceTransactionManager is no longer sufficient. Do I have to use JTA? With 2 data sources, what would be the equivalent of the following declaration?

        <bean id="sqlTransactionManager"
              class="org.springframework.jdbc.datasource.DataSourceTransactionManager"
              lazy-init="true">
            <property name="dataSource" ref="dataSource" />
        </bean>

    Again, thanks a lot for your help!
    Choucri

  5. #5
    Join Date: Dec 2006
    Posts: 1,061


    If you wish to use JTA, Spring has a specific JtaTransactionManager implementation that you can use. Please refer to the Spring reference documentation for more information.

    Doug is partly right: the StepExecutor does take a transaction manager, but that one is for the 'business' transaction. The storage of metadata, on the other hand, needs to have transactions applied around it using AOP, which allows a separate transaction manager to be used. You can see examples of this in our samples. Note that the two only intersect when the StepExecution or Step is stored as part of the business transaction, which is the case for restart data; this model will change a bit in milestone 4, however.
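    A rough sketch of "transactions applied around the metadata storage using AOP", assuming JDK dynamic proxies (the interception mechanism Spring AOP uses by default for interface-based proxies). `TxManager`, `MetadataDao` and `transactional` are hypothetical stand-ins, not Spring's PlatformTransactionManager or TransactionInterceptor; the sketch only shows how the metadata DAO can get its own transaction manager while the step keeps a separate one for the business work.

```java
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.Proxy;
import java.util.ArrayList;
import java.util.List;

// Hypothetical names throughout: TxManager, MetadataDao and transactional() are
// illustrations, not Spring or Spring Batch APIs.
class TxProxySketch {

    /** Minimal stand-in for a transaction manager bound to one data source; it only logs events. */
    static final class TxManager {
        final String name;
        final List<String> log;

        TxManager(String name, List<String> log) {
            this.name = name;
            this.log = log;
        }

        void begin() { log.add(name + ":begin"); }
        void commit() { log.add(name + ":commit"); }
        void rollback() { log.add(name + ":rollback"); }
    }

    /** Hypothetical DAO that persists batch metadata. */
    interface MetadataDao {
        void saveStepExecution(String stepName);
    }

    /** Returns a proxy that runs every call on target inside a transaction from tx ("around" advice). */
    static <T> T transactional(T target, Class<T> iface, TxManager tx) {
        InvocationHandler handler = (proxy, method, args) -> {
            tx.begin();
            try {
                Object result = method.invoke(target, args);
                tx.commit();
                return result;
            } catch (InvocationTargetException e) {
                tx.rollback();
                throw e.getCause(); // rethrow the DAO's own exception, not the reflection wrapper
            }
        };
        return iface.cast(Proxy.newProxyInstance(iface.getClassLoader(), new Class<?>[] {iface}, handler));
    }

    public static void main(String[] args) {
        List<String> log = new ArrayList<>();
        TxManager metadataTx = new TxManager("metadata", log); // would manage the metadata DataSource
        TxManager businessTx = new TxManager("business", log); // would be the one given to the step

        MetadataDao rawDao = stepName -> log.add("save " + stepName);
        MetadataDao dao = transactional(rawDao, MetadataDao.class, metadataTx);

        // Business work for one chunk, under the step's own transaction manager.
        businessTx.begin();
        log.add("write business chunk");
        businessTx.commit();

        // Metadata update, wrapped by the proxy in a transaction from the other manager.
        dao.saveStepExecution("step1");

        // [business:begin, write business chunk, business:commit, metadata:begin, save step1, metadata:commit]
        System.out.println(log);
    }
}
```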

  6. #6

    Clarification

    To clarify my previous point:

    1. You have a data source which represents your metadata tables, i.e. the Spring Batch schema. Transactions against this data source are managed using Spring AOP.

    2. You have one or more other data sources which represent your business data. If you only have one, you do not need JTA: you can create a Spring transaction manager for it and inject that into your StepExecutor. This is my own setup. However, if you have several "business" data sources, JTA may be necessary to properly coordinate transactions between them, which is the warning I was trying to convey.

    3. The exception Lucas mentioned (restart data stored as part of the business transaction) still applies.
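    The concern in point 2 can be illustrated with a simplified simulation. It assumes a hypothetical `Resource` class standing in for a business data source and is not the JTA/XA API: committing each resource independently can leave one database updated and the other not, whereas a two-phase commit (the protocol a JTA transaction manager coordinates across XA resources) commits everywhere only after every resource has voted that it can.

```java
import java.util.ArrayList;
import java.util.List;

// Simplified simulation; Resource and its methods are hypothetical, not the JTA/XA API.
class TwoPhaseCommitSketch {

    /** A business data source holding pending (uncommitted) and committed rows. */
    static final class Resource {
        final String name;
        final boolean cannotCommit; // simulates e.g. a constraint violation in this database
        final List<String> pending = new ArrayList<>();
        final List<String> committed = new ArrayList<>();

        Resource(String name, boolean cannotCommit) {
            this.name = name;
            this.cannotCommit = cannotCommit;
        }

        void write(String row) { pending.add(row); }

        /** Phase 1 vote: can this resource guarantee that its pending work will commit? */
        boolean prepare() { return !cannotCommit; }

        void commit() {
            if (cannotCommit) {
                throw new IllegalStateException(name + ": commit failed");
            }
            committed.addAll(pending);
            pending.clear();
        }

        void rollback() { pending.clear(); }
    }

    /** One local transaction per resource: an earlier commit cannot be undone if a later one fails. */
    static void commitIndependently(List<Resource> resources) {
        for (Resource r : resources) {
            try {
                r.commit();
            } catch (IllegalStateException e) {
                r.rollback();
            }
        }
    }

    /** Two-phase commit: commit everywhere only if every resource votes yes, else roll back everywhere. */
    static boolean commitTwoPhase(List<Resource> resources) {
        boolean allPrepared = true;
        for (Resource r : resources) {
            if (!r.prepare()) {
                allPrepared = false;
                break;
            }
        }
        for (Resource r : resources) {
            if (allPrepared) {
                r.commit();
            } else {
                r.rollback();
            }
        }
        return allPrepared;
    }

    public static void main(String[] args) {
        Resource orders = new Resource("orders", false);
        Resource billing = new Resource("billing", true);
        orders.write("order-42");
        billing.write("invoice-42");
        commitIndependently(List.of(orders, billing));
        System.out.println(orders.committed + " / " + billing.committed); // [order-42] / []  (inconsistent)

        Resource orders2 = new Resource("orders", false);
        Resource billing2 = new Resource("billing", true);
        orders2.write("order-43");
        billing2.write("invoice-43");
        boolean ok = commitTwoPhase(List.of(orders2, billing2));
        System.out.println(ok + ": " + orders2.committed + " / " + billing2.committed); // false: [] / []
    }
}
```

    With a single business data source there is nothing to coordinate, which is why one local transaction manager is enough in that case.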
