SOA centralization harmful?
vmarcinko
Mar 25th, 2005, 12:02 AM
Hi folks.
I would like to hear architectural suggestions for a problem that occurs frequently. I guess a lot of you have developed different applications that share common data in their databases. Has some SOA evangelist ever come to you one day and said it would be good to centralize such data under a new application that provides it through an HTTP/XML remoting service?
A frequent example is a User Repository. Instead of every application having USERS and ROLES tables in its database for the sake of authorization, let's centralize them under a new User Repository application, so that users log in at one place in the whole system instead of separately in each application. Everything sounds great, since now, during authentication/authorization, each application will connect to this new remoting service and perform authorization through this centralized place.
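To make the proposed setup concrete, here is a minimal Java sketch of what each application's authorization path looks like once USERS and ROLES move behind the central service. `UserRepositoryService`, `authenticate` and `rolesOf` are hypothetical names, and the HTTP/XML remoting layer is left out; the point is only that every check becomes a call to the remote repository.

```java
import java.util.*;

// Hypothetical contract of the central User Repository.
// In the post it sits behind an HTTP/XML remoting service; transport is omitted here.
interface UserRepositoryService {
    boolean authenticate(String userName, String password);
    Set<String> rolesOf(String userName);
}

// Each application no longer owns USERS/ROLES tables; it asks the central service instead.
class ApplicationSecurity {
    private final UserRepositoryService repository;

    ApplicationSecurity(UserRepositoryService repository) {
        this.repository = repository;
    }

    // In a real deployment this is one remote round trip for authentication
    // and another for the role check.
    boolean mayAccess(String userName, String password, String requiredRole) {
        return repository.authenticate(userName, password)
                && repository.rolesOf(userName).contains(requiredRole);
    }
}
```

Every `mayAccess` call here would cost two remote round trips in practice, which is where the performance concerns discussed below come from.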
Except... what if each application has some specific user-related data in its database? Let's say it has a FAVORITES table with a foreign key to the USERS table. Before, this was no problem: databases are so powerful with SQL that one can fetch data in every possible optimized way, and data integrity is enforced by database constraints (foreign keys). But now, since the USERS and ROLES tables are not present in the local database, I start to lose a lot of functionality, because a remoting interface can never be flexible and rich enough, let alone performant, in providing me with the needed data. Imagine you had some web pages that present this user-related FAVORITES data in complex ways, using COUNT(*), OFFSET, LIMIT and many other features, and it all worked easily and blindingly fast with all the data in the local database. Now most of it seems practically impossible... Not to mention that the integrity of user-related data in local databases (FAVORITES records) is lost, since there are no foreign key constraints to the USERS table, and SOA evangelists suddenly start suggesting that you develop periodic synchronization with the remote centralized users, and other such wild ideas....
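Without the foreign key, the check the database used to perform has to move into application code. A minimal sketch, assuming a hypothetical `RemoteUserRepository` with a `userExists` call (not any real product's API): each insert now costs a remote round trip, and nothing stops the user from being removed remotely right after the check.

```java
import java.util.*;

// Hypothetical remote User Repository; in a real system this would be an HTTP/XML call.
interface RemoteUserRepository {
    boolean userExists(long userId);
}

record Favorite(long userId, String description) {}

class FavoritesDao {
    private final RemoteUserRepository users;
    private final List<Favorite> table = new ArrayList<>();
    int remoteCalls = 0; // counts round trips to make the cost visible

    FavoritesDao(RemoteUserRepository users) {
        this.users = users;
    }

    // Replaces "FOREIGN KEY (user_id) REFERENCES users(id)": the check is now done by hand.
    void insert(Favorite f) {
        remoteCalls++;
        if (!users.userExists(f.userId())) {
            throw new IllegalArgumentException("unknown user " + f.userId());
        }
        // Race window: the user may be removed remotely after this check,
        // and no database constraint will catch it.
        table.add(f);
    }

    List<Favorite> all() {
        return List.copyOf(table);
    }
}
```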
Thoughts?
-Vjeran
Martin Kersten
Mar 25th, 2005, 05:39 PM
Thoughts?
What do you want to hear?
vmarcinko
Mar 25th, 2005, 11:18 PM
Well, I would like to hear some suggestions for the whole architectural problem described above. Is there some architecture out there that I'm not aware of that solves this problem elegantly?
Or at least to hear from other people whether this centralization was worth it in their case, or whether it was best to leave each application's data in separate databases without centralizing at all.
Martin Kersten
Mar 26th, 2005, 06:19 AM
Well ok.
A centralized login service is mostly a security matter, and security may also be handled at the database level (think of access and write rights). So for me the big question is not whether to centralize services (whatever we understand by that term) but whether to separate databases. Your key problem is simply the separation of semantically related data (expressed by foreign key constraints, as you already said).
If you separate semantically related data, you have a synchronization problem, and there are several options. To synchronize, you have to carry the CRUD operations (Create, Read, Update, Delete) to every point of responsibility (synchronization point). This can be done by timely synchronization (message-driven, for instance) or by delayed synchronization (using update cycles, for instance), as you already said.
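The two synchronization styles can be sketched side by side. This is a hypothetical in-memory replica of the user IDs a node cares about; `UserEvent` and its type strings are illustrative assumptions, and the transport (JMS, polling, etc.) is left out.

```java
import java.util.*;

// Hypothetical change event carried from the owning system to each synchronization point.
record UserEvent(String type, long userId) {}

class LocalUserReplica {
    final Set<Long> userIds = new HashSet<>();

    // Timely (message-driven) synchronization: apply each event as it arrives.
    void onEvent(UserEvent e) {
        switch (e.type()) {
            case "CREATE" -> userIds.add(e.userId());
            case "DELETE" -> userIds.remove(e.userId());
            default -> { } // updates to other columns do not affect this replica
        }
    }

    // Delayed synchronization: an update cycle makes local state match the owner's snapshot.
    void reconcile(Set<Long> authoritativeIds) {
        userIds.retainAll(authoritativeIds); // drop users that no longer exist
        userIds.addAll(authoritativeIds);    // add users we have not seen yet
    }
}
```

Message-driven updates keep the replica current between cycles, while the periodic reconcile repairs anything a lost message left behind.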
I don't know your architectural requirements, but if your databases are interchangeable, you might first check whether it is possible to use the same distributed database for semantically interconnected data (using the distributed database as the synchronization layer). I would rather buy a new server to act as the master node than implement a synchronization mechanism myself. Databases have also improved a lot lately in distributed transactions (MVCC etc.), so there isn't that much overhead left compared to timely synchronization. And in this case (a distributed database) you are using a well-tested and well-designed solution, which will save you a lot of time (aka money).
But there is another way to tackle your problem, I guess.
We will just throw away everything we know about design patterns (design in terms of designing the architecture) and let the problem drive us.
Let's create a simple use case:
Two tables:
users(userId, userName, userPassword)
favourites(userId with a foreign key constraint to users.userId, favourDescription)
Now the login service (merely an aspect) should be changed to use a new UserManager service, so we extract the responsibility for userName, userPassword, etc.
This leads to:
users(userId) + UserManager.getUserInfo(userId) for fetching existence, name, ...
favourites(userId foreign key to users.userId, favourDescription)
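The refactored model could look roughly like this in Java. `UserInfo`, the `Optional` return type and `FavouritesNode` are assumptions for illustration; only `UserManager.getUserInfo(userId)` comes from the post.

```java
import java.util.*;

// Hypothetical data returned by the central UserManager service.
record UserInfo(long userId, String userName) {}

// The remote service now owns userName, userPassword, etc.
interface UserManager {
    Optional<UserInfo> getUserInfo(long userId);
}

// Local node: the users table has shrunk to a set of primary keys.
class FavouritesNode {
    private final Set<Long> users = new HashSet<>();
    private final Map<Long, List<String>> favourites = new HashMap<>();

    void addFavourite(long userId, String favourDescription) {
        users.add(userId); // local "users(userId)" row, created on first use
        favourites.computeIfAbsent(userId, id -> new ArrayList<>()).add(favourDescription);
    }

    List<String> favouritesOf(long userId) {
        return favourites.getOrDefault(userId, List.of());
    }

    // Display data (name etc.) is no longer joined locally; it comes from the service.
    String describe(long userId, UserManager userManager) {
        String name = userManager.getUserInfo(userId).map(UserInfo::userName).orElse("?");
        return name + ": " + favouritesOf(userId);
    }
}
```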
So our node now needs only two CRUD synchronization events:
1. createUser(userId)
2. removeUser(userId)
Update events for names etc. now belong to the UserManager service's bounded context. The createUser(userId) event can be handled indirectly and asynchronously under a closed-world assumption: if a particular node does not know a userId, that user has no favourites.
So the only event really needed is removeUser. It can be handled with a message-driven architecture plus an asynchronous synchronization mechanism that ensures the atomicity of the message-driven approach within a bounded time (e.g. guarantee that every user is truly removed from the whole system on a daily basis).
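A sketch of the removal path under the closed-world assumption: no createUser event at all, a handler for the removeUser message, and a daily sweep as the safety net for lost messages. All names are hypothetical, and the sweep assumes the UserManager can supply the full set of existing user IDs, which a real service may not offer.

```java
import java.util.*;

class FavouritesStore {
    private final Map<Long, List<String>> byUser = new HashMap<>();

    // Closed-world assumption: no createUser event is needed;
    // an unknown userId simply has no favourites.
    List<String> favouritesOf(long userId) {
        return byUser.getOrDefault(userId, List.of());
    }

    void add(long userId, String description) {
        byUser.computeIfAbsent(userId, id -> new ArrayList<>()).add(description);
    }

    // Handler for the removeUser message (the delivery mechanism, e.g. JMS, is assumed).
    void onRemoveUser(long userId) {
        byUser.remove(userId);
    }

    // Daily safety net: drop rows for users the UserManager no longer knows,
    // covering removeUser messages that were lost. Returns how many users were purged.
    int dailySweep(Set<Long> existingUserIds) {
        int before = byUser.size();
        byUser.keySet().retainAll(existingUserIds); // removing from keySet removes map entries
        return before - byUser.size();
    }
}
```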
Since the users table now stores only primary keys, we can do without it; more precisely, the subsystem is semantically stable against removing the users table.
Now we need to talk about user information that is not part of the session (you can store the current user's username in the session data). This is a caching issue, and since your nodes are long-lived and I guess your network is not dynamic, you can use the usual suspects to solve it. These include distributed caches, or collecting every username the node obtained and letting that information age (e.g. a cached username is considered correct for the next 15 minutes), or mixing this with a message-driven approach (sending name-update events as well).
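The aging-cache idea, combined with name-update events, might be sketched as below. The loader stands in for the remote UserManager call and the clock is injected so the 15-minute TTL can be tested; `AgingUserNameCache` and `onUserNameUpdated` are hypothetical names.

```java
import java.util.*;
import java.util.function.LongFunction;
import java.util.function.LongSupplier;

// Username cache whose entries age out after a fixed TTL, optionally refreshed early
// by name-update events. Loader and clock are injected (assumptions for testability).
class AgingUserNameCache {
    private record Entry(String name, long loadedAtMillis) {}

    private final Map<Long, Entry> entries = new HashMap<>();
    private final LongFunction<String> loader; // stands in for the remote UserManager call
    private final LongSupplier clock;
    private final long ttlMillis;
    int remoteCalls = 0;

    AgingUserNameCache(LongFunction<String> loader, LongSupplier clock, long ttlMillis) {
        this.loader = loader;
        this.clock = clock;
        this.ttlMillis = ttlMillis;
    }

    String userName(long userId) {
        long now = clock.getAsLong();
        Entry e = entries.get(userId);
        if (e == null || now - e.loadedAtMillis() >= ttlMillis) { // missing or aged out
            remoteCalls++;
            e = new Entry(loader.apply(userId), now);
            entries.put(userId, e);
        }
        return e.name();
    }

    // Message-driven mix: an "update name" event replaces the cached value immediately.
    void onUserNameUpdated(long userId, String newName) {
        entries.put(userId, new Entry(newName, clock.getAsLong()));
    }
}
```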
I guess this solution is natural and keeps to a minimum the costs you have to pay for this semantic separation, and those costs are the price of the improvements on the security side. I wouldn't go for separation unless it is absolutely needed (for example because of security requirements, or because a single distributed database is not an option, e.g. if the modules are developed by third parties).
Hope I could help and get the discussion started,
Cheers,
Martin (Kersten)
PS: There is another reason why you might want to go for a centralized user manager: you can buy one from a third party... Maybe the Spring policemen can tell you more about it. We mostly use filters and AOP for security. I like omnipotent (feature-replicated) nodes. (I forgot the correct term and I am too lazy to grab my J2EE without EJB book.)
Ben Alex
Mar 27th, 2005, 05:47 PM
I don't think anyone would disagree that a service-oriented architecture is ideal if your requirements dictate it, but doing it for its own sake is a very bad idea. You'll have far more complexity in your architecture, there are more places where things can break (RDBMS FKs are unavailable to enforce integrity in the case of separated databases), and performance across the wire is at least an order of magnitude slower than within the same JVM.
An excellent book on the subject is Enterprise Integration Patterns. It goes through the four common ways of integrating applications. A good overview is at http://www.eaipatterns.com/IntegrationStylesIntro.html, although the book itself treats everything in far more detail. They end up advocating messaging, which in many situations makes more sense than DB replication. Tools like Mule and JMS will assist in real-world implementations.
Related to the security-specific example, take a look at SAML. If something more practically useful in the present day is of interest, take a look at Yale CAS and Acegi Security. In this situation the CAS "web service" is responsible for returning a username to an interested application. There's no reason that username cannot be a UUID that serves as a PK/object identity for application-specific user identity information. If you investigate larger SSO designs (like Passport, Liberty), they tend to advocate obfuscated tokens representing a user, generally via UUIDs, and the individual applications then manage their own authorization/user profile information. The cancellation of a user account in this situation would generally be performed by a single webapp responsible for user administration inside CAS/SSO, which would publish a CancelEvent that is distributed via messaging to other applications so they know to remove their local user information.
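The cancellation flow described above can be illustrated with an in-memory stand-in for the messaging layer. This is not the CAS or Acegi API: `CancelEvent`, `EventBus` and `LocalProfileStore` are hypothetical, and a real deployment would use a JMS topic or similar so that applications that are temporarily offline still receive the event.

```java
import java.util.*;
import java.util.function.Consumer;

// Hypothetical event published by the user-administration webapp when an account is cancelled.
// The user is identified only by an opaque token (a UUID), as in the SSO designs mentioned above.
record CancelEvent(UUID userToken) {}

// In-memory stand-in for a messaging channel: every subscriber receives every event.
class EventBus {
    private final List<Consumer<CancelEvent>> subscribers = new ArrayList<>();

    void subscribe(Consumer<CancelEvent> subscriber) {
        subscribers.add(subscriber);
    }

    void publish(CancelEvent event) {
        subscribers.forEach(s -> s.accept(event));
    }
}

// An application that keys its own authorization/profile data by the SSO token.
class LocalProfileStore {
    final Map<UUID, Set<String>> rolesByUser = new HashMap<>();

    LocalProfileStore(EventBus bus) {
        // On cancellation, drop this application's local user information.
        bus.subscribe(e -> rolesByUser.remove(e.userToken()));
    }
}
```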
vmarcinko
Apr 1st, 2005, 12:49 AM
Thanks guys... I guess we all agree that this centralization/separation should be approached very carefully, considering its impact on the whole system.
And Alex, speaking of Yale CAS/SSO, and to save me time digging through its documentation: does it have this feature of publishing a "UserRemoved" event?
vmarcinko
Apr 1st, 2005, 01:14 AM
Ugh, just to be sure about some things...
I understand that my USERS table should now contain only user IDs, and that user information such as username, email etc., stored in a separate DB, should be queried through that web service (exposed as a service call, let's say UserManager.getUserInfo(userId)).
But for performance reasons it is now practically impossible to, say, develop a web page with an HTML table showing 100 users with their locally stored FAVORITES... because I want the username shown beside each FAVORITE entry, and that would require 100 separate web service calls to fetch each user's username from the UserRepository, since the username is now remote user data!
How do I solve that, then? Should I also store usernames in my local DB and take care of synchronization with the repository?
Of course, Martin, we assume that a distributed database isn't available in the system.
Martin Kersten
Apr 1st, 2005, 03:49 AM
But for performance reasons it is now practically impossible to, say, develop a web page with an HTML table showing 100 users with their locally stored FAVORITES... because I want the username shown beside each FAVORITE entry, and that would require 100 separate web service calls to fetch each user's username from the UserRepository, since the username is now remote user data!
Sending 100 messages is surely a performance issue, right. So we need to avoid 100 separate calls while still expressing the same information. One idea might be nested messages: sending one message that actually contains more than one command to perform (only one routing, avoiding 99 of the 100 per-message delays at the cost of a longer message). This would reduce network utilization.
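The nested-message idea can be sketched as an envelope carrying several commands that the server unpacks in one go. The command types, `Envelope` and `UserService` are hypothetical; the only point is that three lookups cost one message instead of three.

```java
import java.util.*;

// Hypothetical command types; one envelope can carry any number of them.
interface Command {}
record GetUserName(long userId) implements Command {}
record GetEmail(long userId) implements Command {}

// The "nested message": a single request that wraps many commands.
record Envelope(List<Command> commands) {}

record UserRow(String name, String email) {}

class UserService {
    private final Map<Long, UserRow> db;
    int messagesReceived = 0; // counts network messages, not commands

    UserService(Map<Long, UserRow> db) {
        this.db = db;
    }

    // Server side: unpack the envelope and answer every command, preserving order.
    List<String> handle(Envelope envelope) {
        messagesReceived++;
        List<String> results = new ArrayList<>();
        for (Command c : envelope.commands()) {
            if (c instanceof GetUserName n) {
                results.add(db.get(n.userId()).name());
            } else if (c instanceof GetEmail m) {
                results.add(db.get(m.userId()).email());
            }
        }
        return results;
    }
}
```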
Taking this even further, you might end up with:
IUserManager.getUserInformations(int [] userIds/usernames/namepattern);
You know, just ask for a collection of users and query them all at once through the user-manager service. Then the only cost left is two message delays (request + response) plus processing time.
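A bulk lookup in the spirit of `IUserManager.getUserInformations(...)` might be used like this for the 100-row favourites page. `BulkUserManager.getUserNames` is a hypothetical signature, and it takes a `Set<Long>` of IDs rather than the `int[]` in the post so that duplicate user IDs are sent only once.

```java
import java.util.*;

record FavoriteRow(long userId, String description) {}

// Hypothetical bulk variant of the UserManager call.
interface BulkUserManager {
    Map<Long, String> getUserNames(Set<Long> userIds);
}

class FavoritesPage {
    // Renders "username: favourite" lines with a single remote call for the whole page.
    static List<String> render(List<FavoriteRow> rows, BulkUserManager users) {
        Set<Long> ids = new LinkedHashSet<>();
        for (FavoriteRow r : rows) {
            ids.add(r.userId()); // collect distinct user IDs first
        }
        Map<Long, String> names = users.getUserNames(ids); // one request + one response

        List<String> lines = new ArrayList<>();
        for (FavoriteRow r : rows) {
            lines.add(names.getOrDefault(r.userId(), "?") + ": " + r.description());
        }
        return lines;
    }
}
```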
How do I solve that, then? Should I also store usernames in my local DB and take care of synchronization with the repository?
That would be an idea, but I would go for caching plus a timeliness guarantee (e.g. outdated user info remains for at most one hour). I don't like managing replicated information myself.
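Caching with a staleness bound can be combined with the bulk call: only missing or expired entries are fetched, in a single request, and nothing older than the bound is ever served. The class name and the `Function`-typed bulk lookup are assumptions for illustration; one hour would be `3_600_000` milliseconds.

```java
import java.util.*;
import java.util.function.Function;
import java.util.function.LongSupplier;

// Cache in front of a bulk lookup; no entry older than maxAgeMillis is ever served.
class StaleBoundedNameCache {
    private record Entry(String name, long fetchedAt) {}

    private final Map<Long, Entry> cache = new HashMap<>();
    private final Function<Set<Long>, Map<Long, String>> bulkLookup; // hypothetical remote call
    private final LongSupplier clock;
    private final long maxAgeMillis;

    StaleBoundedNameCache(Function<Set<Long>, Map<Long, String>> bulkLookup,
                          LongSupplier clock, long maxAgeMillis) {
        this.bulkLookup = bulkLookup;
        this.clock = clock;
        this.maxAgeMillis = maxAgeMillis;
    }

    Map<Long, String> namesFor(Collection<Long> userIds) {
        long now = clock.getAsLong();
        Set<Long> missing = new HashSet<>();
        for (Long id : userIds) {
            Entry e = cache.get(id);
            if (e == null || now - e.fetchedAt() >= maxAgeMillis) missing.add(id);
        }
        if (!missing.isEmpty()) {
            // Evict expired entries first, so a user the service no longer returns
            // is not served from a stale entry.
            cache.keySet().removeAll(missing);
            bulkLookup.apply(missing).forEach((id, name) -> cache.put(id, new Entry(name, now)));
        }
        Map<Long, String> result = new HashMap<>();
        for (Long id : userIds) {
            Entry e = cache.get(id);
            if (e != null) result.put(id, e.name());
        }
        return result;
    }
}
```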
Of course, Martin, we assume that a distributed database isn't available in the system.
Don't laugh! :-D Using a distributed database as the network/synchronization layer is a great relief, and database vendors have lately put great effort into distributed database organisation. You can now even include or exclude individual tables or rows from distribution/replication. You get a lot for free here. But of course, in a heterogeneous system this isn't much of an option.
vmarcinko
Apr 1st, 2005, 08:02 AM
The idea of a web service interface that allows grouped UserInfo fetching might work for this simple example, and I expected you might reply with that, but this way of thinking won't work for slightly more complex requirements.
Let's say some other application needs to present HTML tables with FAVORITES (local user data) and usernames, not plainly but as the result of complex search queries involving sorting by username and email (remote data), with paging (offset and limit). Suddenly you realize that even if you start bloating your UserRepository interface (endlessly frustrating the interface developers), you can never achieve the functional richness of the SQL interface you were using before, when all your data was in the local DB.
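The difficulty shows up clearly if you try to emulate `ORDER BY username LIMIT ... OFFSET ...` in application code. In this hypothetical sketch (names are illustrative, and the bulk name lookup is assumed to exist), rendering one page still requires fetching the names of every referenced user, because the sort key lives in the remote repository.

```java
import java.util.*;
import java.util.function.Function;

record Fav(long userId, String description) {}
record Row(String userName, String description) {}

class SortedFavoritesQuery {
    // Roughly: SELECT u.name, f.description FROM favorites f JOIN users u ON ...
    //          ORDER BY u.name, f.description LIMIT :limit OFFSET :offset
    // except that users now live behind a remote bulk lookup.
    static List<Row> page(List<Fav> allFavs,
                          Function<Set<Long>, Map<Long, String>> bulkNames,
                          int offset, int limit) {
        // The sort key is remote, so names for EVERY referenced user must be fetched,
        // even though only `limit` rows end up on the page.
        Set<Long> ids = new HashSet<>();
        for (Fav f : allFavs) ids.add(f.userId());
        Map<Long, String> names = bulkNames.apply(ids);

        return allFavs.stream()
                .map(f -> new Row(names.getOrDefault(f.userId(), "?"), f.description()))
                .sorted(Comparator.comparing(Row::userName).thenComparing(Row::description))
                .skip(offset)
                .limit(limit)
                .toList();
    }
}
```

Adding email sorting, filtering, or a COUNT(*) for the pager would each need further remote data or further service methods, which is exactly the interface bloat described above.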
Hmmm.... Hasn't Martin Fowler once said - "First rule of distribution: don't distribute!"
Martin Kersten
Apr 1st, 2005, 09:32 AM
Hmmm.... Hasn't Martin Fowler once said - "First rule of distribution: don't distribute!"
Another quite interesting quote: "Don't try to separate what can't be separated."
The problem really lies in the set of requirements. Sending one message containing hundreds of commands is referred to as tunneling, for example. It is used, for instance, to boost the performance of network communication (sending more than one packet at once using a different protocol). But then you think: why not use a single command that represents hundreds? Then you remember the object-oriented world: sending a command means sending a message, and sending a message means calling a method. Blub, you end up adding another message to your service interfaces... (where you add the complexity depends on the situation, of course).
if you start bloating your UserRepository interface (endlessly frustrating the interface developers)
There is another saying: "Choose between 'cheap, simple and fast'. You may have two of those but not all three." (I couldn't remember the exact words, sorry.) This is the price you pay for performance optimization, I guess. To increase performance, you often find yourself adding complexity that is unnecessary for solving the task itself and consuming additional resources (man-hours, memory, etc.).
What suits you depends on your situation, of course.
Cheers,
Martin (Kersten)
PS: On whether to distribute or not, this can be taken further, I guess: logically distribute as few layers as you can... (with one distributed database, you don't need to distribute your business and application layers as well, so there are no synchronization requirements for you to solve).