Jan 3rd, 2008, 10:31 AM
Using 2 different datasources with Spring Batch
I am completely new to Spring Batch and am just getting started with it, so I apologize if some of my questions are too obvious!
In my department we use Microsoft SQL Server databases. Also, we are restricted to the usage of stored procedures when interacting with the database. That's why I am trying to adapt the job and step incrementer to that policy and haven't succeeded so far.
Therefore, I was wondering whether it is possible to use 2 databases simultaneously: the standard one for data processing, and the embedded HSQLDB (or any other database) only for the framework?
Jan 3rd, 2008, 01:02 PM
You can choose any database that you like to run Spring Batch with. The framework does store metadata in 4 tables (will probably increase with m4 though). You could easily use one database to store metadata in, and another one for application concerns. However, I wouldn't recommend using an in-memory database for the metadata. The value of storing information about a batch run is being able to look at it in real-time, and report on it. For instance, you could look back and see what time a job was run two weeks ago, and how many records it processed, etc.
I'm curious about this 'everything has to be a stored procedure' policy at your company. In a batch scenario, this can cause some serious issues. For example, since the datasets are so large and take a long time to process, we break them up into 'chunks' and commit after each chunk has been processed. This allows us to restart from the last chunk successfully processed, and keeps transactions at a reasonable size, but not so small that performance suffers. It would not be possible to do this in a stored procedure, since the framework would have no control, and there would be no advantage in a stored procedure that processes one record. Furthermore, most of our 'stock' database item readers do not use stored procedures. The same is true for our metadata storage DAOs.
Jan 3rd, 2008, 04:34 PM
Using Multiple Databases
I currently use two datasources, one for all the batch tables and sequences, and one for my data store. There is nothing in the framework that prevents you from doing this.
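A minimal sketch of what such a two-datasource setup might look like in Spring XML. The bean ids (batchDataSource, businessDataSource), the hosts, database names and credentials are placeholders assumed for illustration; none of them come from the posts above.

```xml
<beans xmlns="http://www.springframework.org/schema/beans"
       xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
       xsi:schemaLocation="http://www.springframework.org/schema/beans
           http://www.springframework.org/schema/beans/spring-beans.xsd">

  <!-- Assumption: holds only the Spring Batch metadata tables and sequences.
       A file-based HSQLDB is used here so the metadata survives a restart. -->
  <bean id="batchDataSource"
        class="org.springframework.jdbc.datasource.DriverManagerDataSource">
    <property name="driverClassName" value="org.hsqldb.jdbcDriver" />
    <property name="url" value="jdbc:hsqldb:file:batch-metadata" />
    <property name="username" value="sa" />
    <property name="password" value="" />
  </bean>

  <!-- Assumption: the business data store (SQL Server in the original question).
       Host, database name and credentials are placeholders. -->
  <bean id="businessDataSource"
        class="org.springframework.jdbc.datasource.DriverManagerDataSource">
    <property name="driverClassName" value="com.microsoft.sqlserver.jdbc.SQLServerDriver" />
    <property name="url" value="jdbc:sqlserver://dbhost:1433;databaseName=business" />
    <property name="username" value="appuser" />
    <property name="password" value="changeit" />
  </bean>

</beans>
```

DriverManagerDataSource opens a new connection on every request and is not a pool, so a pooled DataSource would probably be a better fit outside of experiments.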
The only real issue I think you should be concerned about is transaction management across multiple datasources, since you can only register one transaction manager with your step executor.
I do agree that there is a problem with the policy of only using stored procedures. Unfortunately, as Lucas points out, this can potentially cause you to lose a great deal of the benefit of Spring Batch, but in a corporate environment I understand that you can't do much about this. I can't tell you what you should do about this, as I do not know your position. I would personally push back and say "this is a framework that is outside of our control - create a new username for it in the database that it will use solely and leave it as it is." In any case, my recommendation to you is to try using granularly-defined stored procedure definitions with Spring Batch but keep a close eye on performance and document any problems to review with your management.
Jan 4th, 2008, 03:42 AM
Thanks a lot guys for your quick reply!
In fact, the 'stored procedure only' policy is mainly enforced for historical reasons, and was introduced way before ORM frameworks emerged. Unfortunately, at my level I have no power to make an exception.
I can't do this either, since this is the first project in my division to use Spring (I mean not only Spring Batch). In fact, introducing Spring has been a personal initiative, therefore I have to prove that it can be tailored to our architecture and adapt to our policies.
Originally Posted by dkaminsky
This is why dedicating a database to Spring Batch and thus having the freedom to choose the way of interacting with it, is a compelling alternative.
In fact, the in-memory database is just a temporary solution to get started with the framework.
Originally Posted by lucasward
Indeed, that's my biggest worry! Since I'm new to Spring, I'm not familiar with its transaction management model. I presume that using the DataSourceTransactionManager is no longer sufficient. Do I have to use JTA? With 2 data sources what would be the equivalent declaration of:
Originally Posted by dkaminsky
<bean id="sqlTransactionManager" class="org.springframework.jdbc.datasource.DataSourceTransactionManager">
<property name="dataSource" ref="dataSource" />
</bean>
Again, thanks a lot for your help!
Jan 4th, 2008, 01:05 PM
If you wish to use JTA, Spring has a specific JtaTransactionManager implementation that you can use. Please refer to the Spring reference documentation for more information.
Doug is partly right: the StepExecutor does take a transaction manager, which is for the 'business' transaction. However, the storage of metadata needs to have transactions applied around it using AOP, which allows a separate transaction manager to be used. You can see examples of this in our samples. It should also be noted that the two only intersect when the StepExecution or Step is stored as part of the business transaction, which is the case for restart data. This model will change a bit in milestone 4, however.
Jan 4th, 2008, 02:23 PM
To clarify my previous point:
1. You have a data source which represents your metadata tables, i.e. the Spring Batch schema. Transactions on this data source are applied using Spring AOP.
2. You have one or more other data sources which represent your business data. If you only have one, you do not need to use JTA, since you can create a Spring transaction manager and inject that into your StepExecutor. This is the case with my personal situation. However, if you have several "business" data sources, JTA may be necessary to properly control transactions between them, which is the warning I was trying to convey.
3. Ditto the exception Lucas mentioned.
Jan 7th, 2008, 06:42 AM
As you said, I created 2 Spring transaction managers, the business one injected in StepExecutor and the second one in a transaction advice applied to batch repository methods, and it works like heaven!
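For readers looking for the shape of that configuration, here is a rough sketch of two transaction managers, one for business data and one applied to the batch repository through a transaction advice. It is not the poster's actual configuration: the bean ids, the two data source beans (assumed to be defined elsewhere in the context) and the pointcut expression are all assumptions, and the repository type name may differ between Spring Batch milestones.

```xml
<beans xmlns="http://www.springframework.org/schema/beans"
       xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
       xmlns:aop="http://www.springframework.org/schema/aop"
       xmlns:tx="http://www.springframework.org/schema/tx"
       xsi:schemaLocation="http://www.springframework.org/schema/beans
           http://www.springframework.org/schema/beans/spring-beans.xsd
           http://www.springframework.org/schema/aop
           http://www.springframework.org/schema/aop/spring-aop.xsd
           http://www.springframework.org/schema/tx
           http://www.springframework.org/schema/tx/spring-tx.xsd">

  <!-- Business transaction manager: this is the one injected into the step.
       "businessDataSource" is a hypothetical bean defined elsewhere. -->
  <bean id="businessTransactionManager"
        class="org.springframework.jdbc.datasource.DataSourceTransactionManager">
    <property name="dataSource" ref="businessDataSource" />
  </bean>

  <!-- Metadata transaction manager: used only around batch repository calls.
       "batchDataSource" is a hypothetical bean defined elsewhere. -->
  <bean id="batchTransactionManager"
        class="org.springframework.jdbc.datasource.DataSourceTransactionManager">
    <property name="dataSource" ref="batchDataSource" />
  </bean>

  <!-- Transaction advice bound to the metadata transaction manager. -->
  <tx:advice id="batchTxAdvice" transaction-manager="batchTransactionManager">
    <tx:attributes>
      <tx:method name="*" />
    </tx:attributes>
  </tx:advice>

  <!-- Assumption: the repository type is named JobRepository; adjust the
       pointcut to whatever repository or DAO types your version uses. -->
  <aop:config>
    <aop:advisor advice-ref="batchTxAdvice"
                 pointcut="execution(* *..JobRepository+.*(..))" />
  </aop:config>

</beans>
```

With this arrangement the two transactions commit and roll back independently, so the caveat about restart data mentioned earlier in the thread still applies.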
Thanks a lot for all your help!
Apr 28th, 2008, 02:25 PM
2 Spring transaction managers
Can someone please post an example of Spring Batch with two transaction managers? We will persist our batch metric data to database_1 (Oracle) and our batch jobs will read and write to database_2 (SQL Server). Our batch environment will not run within an application container and we will be using Spring JDBC for persistence.
Originally Posted by Choucri FAHED
Apr 28th, 2008, 02:39 PM
There's not too much to post, the configuration for a Step with a transaction manager is already well defined in the reference documentation:
There's nothing stopping you from using a different transaction manager in your DAO from the one you wired in your step. However, without JTA, when the framework rolls back, only the transaction manager wired into the step will roll back, not the one in your DAO. Furthermore, you would have to find some way to commit on the DAO transaction manager as well. The only way to safely get around this is to use JTA. Since you're running out of container, the only option I can think of is JOTM.
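As a rough illustration of the JTA route outside a container, the sketch below wires Spring's JtaTransactionManager to JOTM. It assumes a Spring 2.x classpath, where org.springframework.transaction.jta.JotmFactoryBean is available (it was removed in Spring 3.0), plus the JOTM libraries. The bean ids are placeholders.

```xml
<beans xmlns="http://www.springframework.org/schema/beans"
       xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
       xsi:schemaLocation="http://www.springframework.org/schema/beans
           http://www.springframework.org/schema/beans/spring-beans.xsd">

  <!-- Assumption: Spring 2.x with JOTM on the classpath.
       The factory bean exposes a local JOTM instance. -->
  <bean id="jotm" class="org.springframework.transaction.jta.JotmFactoryBean" />

  <!-- A single JTA transaction manager that would be wired into the step
       and used for the DAO as well, so both resources share one transaction. -->
  <bean id="jtaTransactionManager"
        class="org.springframework.transaction.jta.JtaTransactionManager">
    <property name="userTransaction" ref="jotm" />
  </bean>

  <!-- Not shown: both data sources must be XA-capable and enlisted with
       JOTM; plain JDBC data sources will not take part in the JTA transaction. -->

</beans>
```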
Jan 31st, 2012, 01:42 AM
Multiple data sources for chunk based processing
I have 2 different datasources, one to read and another one to write results like below:
<job id="sampleJob" job-repository="jobRepository">
<step id="step1" transaction-manager="myTransactionManager">
<chunk reader="itemReader" processor="itemProcessor" writer="itemWriter" commit-interval="10"/>
</step>
</job>
ItemReader should get data from dataSource_1.
ItemWriter should write data to dataSource_2.
As per the documentation, we can configure only a single transaction manager on the tasklet.
In this scenario, how do I use the transaction manager?