A colleague of mine and I got into a discussion about calculating derived values from data located in a database. In our case, we have an average rating we would like to calculate from a table of ratings. To implement it, I decided to write a service method that would eventually delegate to a DAO method that would be one SQL statement that would pull out the average rating. So in essence, it's one SQL statement to do this. He said that as the number of ratings go up, the cost of calculating the average rating "on the fly" would be higher and higher. Not only that, he said that at some point we are going to need to move the data out of the database and archive it, and at some point the data's not going to be there to make the calculation. He proposed that we add an "average rating field" to the object that is being rated. This in turn would be a database field that would hold the derived value. This value will be updated everytime a user creates a new rating and then it would be saved to the database.
I understand why he would want to do that because it is faster since you're calculating the average as you go and not all at once. But the problem I have with it is that it seems so unnatural. Adding a derived value as a field in the database doesn't quite fit well with my programming model. And it seems more prone to inaccuracy. What if one of the ratings change? Plus, it seems more of an optimization, and I don't want to optimize something that doesn't need optimization. Can anyone comment and help find a medium between the two approaches given the concerns? Are his concerns valid?


Reply With Quote