
SFTP synchronization workflow



ilj
Jul 1st, 2012, 09:01 AM
Hi guys,

I'm working on a large project that heavily uses the Spring Framework, especially Spring Integration. Several modules of our project download files via SFTP and process them in different ways. I had implemented some of the workflows our business people wanted with Spring Integration and very little Java code, but then I got this one:

- A user drops a file into a remote location.
- A scheduled job downloads the file and processes it: it parses the content and puts some data into the DB. The file name determines the use case.
- When the data needs to be updated, the user edits the original file in the remote location.
- The scheduled job detects the change, then downloads and processes only the changed files.

This looks very simple from the user's perspective, and it also seemed pretty easy to implement: save the file names and their modification times in the DB, check whether we have already processed a given change, download only the changed files, and voila! However, I haven't found a way to do this with Spring Integration alone.
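The bookkeeping described above (remember each file's name and modification time, and accept a file only when that pair is new) can be sketched in plain Java as a filter. This is a minimal sketch under stated assumptions, not Spring Integration code: the class name `ModificationTrackingFilter` is hypothetical, its `filterFiles(F[])` method only mirrors the shape of Spring Integration's `FileListFilter`, and an in-memory `Map` stands in for the DB table.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;
import java.util.function.ToLongFunction;

/**
 * Hypothetical "accept on change" filter: an entry passes when its name has
 * not been seen yet or its modification time differs from the recorded one.
 * The method shape mirrors FileListFilter (List<F> filterFiles(F[] files)),
 * but the class is plain Java; the Map stands in for the DB table.
 */
public class ModificationTrackingFilter<F> {

    private final Function<F, String> nameOf;
    private final ToLongFunction<F> mtimeOf;
    // file name -> modification time that has already been accepted
    private final Map<String, Long> processed = new HashMap<>();

    public ModificationTrackingFilter(Function<F, String> nameOf, ToLongFunction<F> mtimeOf) {
        this.nameOf = nameOf;
        this.mtimeOf = mtimeOf;
    }

    public synchronized List<F> filterFiles(F[] files) {
        List<F> accepted = new ArrayList<>();
        for (F file : files) {
            String name = nameOf.apply(file);
            long mtime = mtimeOf.applyAsLong(file);
            Long seen = processed.get(name);
            // new name, or same name with a different timestamp: treat as a change
            if (seen == null || seen != mtime) {
                processed.put(name, mtime);
                accepted.add(file);
            }
        }
        return accepted;
    }
}
```

Because the name and timestamp are supplied as functions, the same class could in principle wrap local files (`File::getName`, `File::lastModified`) or JSch directory entries (`LsEntry::getFilename`, `e -> e.getAttrs().getMTime()`, where the remote value is in seconds rather than milliseconds); the JSch variant is untested here. The sketch records a change as soon as it is accepted, so a failed processing run would not be retried; a DB-backed version would more likely record the pair only after successful processing.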

What I have encountered can be split into three separate problems:

- I can't make AbstractInboundFileSynchronizingMessageSource process a file with the same name more than once, because AcceptOnceFileListFilter is hard-coded. As a result, I can't reprocess a file unless its name contains some kind of timestamp.
- I can't make AbstractInboundFileSynchronizingMessageSource process all the files retrieved during the last remote directory synchronization, however many there were, without synchronizing again before the next poll (see the AbstractPollingEndpoint$Poller.run and AbstractInboundFileSynchronizingMessageSource.receive methods). So I either process a limited number of files per poll, or synchronize and process the same files over and over until I'm sure I have processed enough.
- I can't make AbstractInboundFileSynchronizer/SftpInboundFileSynchronizer preserve the modification date of the files being synchronized.


So I had to fall back from the plain and simple


<int-sftp:inbound-channel-adapter>
    <int:poller/>
</int-sftp:inbound-channel-adapter>

to


<int:inbound-channel-adapter>
    <bean class="MySource">
        <constructor-arg name="synchronizer">
            <bean class="MySynchronizer">
                <constructor-arg name="sessionFactory" ref=""/>
            </bean>
        </constructor-arg>
    </bean>
    <int:poller max-messages-per-poll="1000"/>
</int:inbound-channel-adapter>

NOTE: I omitted all the attributes that do not relate to my problems; the snippet is here just to describe the solution.

So I implemented the basic interfaces in almost the same way you already have, just making them work the way I described. It works fine, but I'm not at all satisfied with such a solution.

So my question is: have I missed some nice and easy way to implement this with Spring Integration, or have you not considered any of these cases useful and therefore don't support them?

oleg.zhurakousky
Jul 1st, 2012, 09:36 AM
The SFTP inbound-channel-adapter has a 'filter' attribute, so are you saying you are having problems injecting your custom filter?

Not sure I understand the second bullet.

As for the third, I assume you are talking about the modification date of the remote file. If that's the case, I am not sure it's even possible to preserve this date once the file is transferred locally, since to the local file system it's a brand-new file that can be modified later, and its timestamp will only reflect the local modification time (not the remote one).

ilj
Jul 1st, 2012, 06:04 PM
> The SFTP inbound-channel-adapter has a 'filter' attribute, so are you saying you are having problems injecting your custom filter?


That one is injected into the synchronizer and is used to filter remote files. Once the files have been copied to the local directory, a separate set of filters is applied by the FileReadingMessageSource; see the AbstractInboundFileSynchronizingMessageSource.buildFilter method.



> Not sure I understand the second bullet.


Maybe I should have mentioned that I have to delete files from the local directory, since I don't want to process files that were not downloaded during the current poll. Anyway, here's an example:
I have 5 different files in my remote directory, and I want my adapter to check them for modifications every hour, so I have


<int-sftp:inbound-channel-adapter>
    <int:poller cron="0 0 * * * *"/>
</int-sftp:inbound-channel-adapter>


If I then set max-messages-per-poll to 5 and only 3 files have actually changed, the synchronizer will be called again after those 3 are processed, since no local files are left. The synchronizer will download the same 3 files again, and AbstractInboundFileSynchronizingMessageSource will create two more messages from them. There is no way to say "create messages for all the files you have just downloaded".
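One way to get "create messages for all the files you have just downloaded" is a source that synchronizes once, hands out every file from that batch on successive receive() calls, and then returns null exactly once so the poller stops before synchronizing again. This is a minimal sketch, assuming a plain `Supplier<List<T>>` in place of the real synchronizer; `DrainingSource` is a hypothetical name and the class does not implement Spring's `MessageSource` interface.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.List;
import java.util.function.Supplier;

/**
 * Hypothetical source: one synchronization per batch, every item of the batch
 * is returned by successive receive() calls, then null is returned once so the
 * poller's loop ends. The Supplier stands in for the remote synchronizer.
 */
public class DrainingSource<T> {

    private final Supplier<List<T>> synchronizer;
    private final Deque<T> pending = new ArrayDeque<>();
    private boolean batchFinished = false;

    public DrainingSource(Supplier<List<T>> synchronizer) {
        this.synchronizer = synchronizer;
    }

    public synchronized T receive() {
        if (pending.isEmpty()) {
            if (batchFinished) {
                // the last batch was fully handed out: end this poll without re-syncing
                batchFinished = false;
                return null;
            }
            // start a new batch with exactly one synchronization
            pending.addAll(synchronizer.get());
            if (pending.isEmpty()) {
                return null;
            }
        }
        T next = pending.poll();
        if (pending.isEmpty()) {
            batchFinished = true;
        }
        return next;
    }
}
```

The terminating null relies on the poller ending its loop when receive() returns null. If max-messages-per-poll is not larger than the batch, that null lands in the following poll and costs one empty poll, which is one reason to keep the limit generous.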



> As for the third, I assume you are talking about the modification date of the remote file. If that's the case, I am not sure it's even possible to preserve this date once the file is transferred locally, since to the local file system it's a brand-new file that can be modified later, and its timestamp will only reflect the local modification time (not the remote one).

This is certainly possible, at least with JSch:
you can add another abstract method, AbstractInboundFileSynchronizer.getModificationTime, and implement it in SftpInboundFileSynchronizer as LsEntry.getAttrs().getMTime(). Then you can set the local file's timestamp with File.setLastModified.
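The timestamp handling proposed above is mostly a unit conversion: the SFTP attributes report the modification time in seconds, while `File.setLastModified` takes milliseconds. A minimal sketch, assuming the seconds value has already been read (for example via `LsEntry.getAttrs().getMTime()` as suggested above); `RemoteTimestamps` and `preserveRemoteMtime` are hypothetical names, not part of Spring Integration or JSch.

```java
import java.io.File;

/**
 * Hypothetical helper that copies a remote modification time onto a local file.
 * The remote value is assumed to be seconds since the epoch, as returned by
 * JSch's SftpATTRS.getMTime(); File.setLastModified expects milliseconds.
 */
public final class RemoteTimestamps {

    private RemoteTimestamps() {
    }

    public static boolean preserveRemoteMtime(File localFile, int remoteMtimeSeconds) {
        // SFTP v3 sends mtime as an unsigned 32-bit value, so read the int as unsigned
        long seconds = Integer.toUnsignedLong(remoteMtimeSeconds);
        // convert to milliseconds in long arithmetic; returns false if the OS refuses
        return localFile.setLastModified(seconds * 1000L);
    }
}
```

With the remote timestamp copied onto the local file, a local filter keyed on `File.lastModified()` sees the same value the remote server reported rather than the download time.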

Maybe I should submit my code to clear things up?

ilj
Jul 9th, 2012, 06:45 AM
Can anyone help me with this?

oleg.zhurakousky
Jul 9th, 2012, 12:36 PM
I am still not sure I understand the second bullet (sorry), but it seems that it all comes down to you wanting to filter files based on their modification state.
You can definitely raise a JIRA request and we can look at it.
But for now you can probably use a Service Activator with raw JSch code.

ilj
Jul 11th, 2012, 02:54 AM
https://jira.springsource.org/browse/INT-2662
https://jira.springsource.org/browse/INT-2663
https://jira.springsource.org/browse/INT-2664

Why a Service Activator? It's an inbound adapter; see my opening post.

oleg.zhurakousky
Jul 11th, 2012, 07:29 AM
Sorry, my mistake. I meant a generic component such as a Service Activator, but in your case it would be a generic int:inbound-channel-adapter.