Let's show some false results!

by Olivier Deschanels

This isn't the latest slogan from a movement of angry accountants, but a very serious proposition that I'd like to cover today.


In our profession as application developers, we have ever-growing masses of data to store and to manipulate. More and more often we have to produce results based on this mass of data, and understandably our users don't want to wait too long for a response while they query the database, looking for a specific result, a list, or a dashboard.


The first idea is to calculate the responses on demand, and do our best to make it happen quickly. However, when the number of connected users (whether 4D clients or via the web) increases, or when the calculations are complex and require multiple queries, it's sometimes difficult to get a satisfactory response time. We can also find ourselves faced with queries that eat up all the server resources or impede otherwise normal usage of the database by emptying the cache to make room for whatever's needed to produce the desired response.


Taking a closer look, we realize that it's rarely necessary for results to be both calculated on demand and totally exact at the same time.


Let's take, for example, a typical company with customers, salespeople managed by a sales director, and some accountants. Each morning, the sales director looks at the current month's revenue in order to know how sales are going. But what's he really looking at? The revenue itself, or revenue as the percentage of his mandated monthly goal? It's a safe bet that the more important number isn't the revenue, but the percentage achieved. From here we can avoid a number of calculations, or more precisely, shift them in time.



Indeed, if it's the percentage that our sales director focuses on every morning, we can offer him a figure calculated before the offices even open, which is to say when database activity is at a lull. By the time the sales director reads the figures, they will have been calculated several hours earlier and won't be in real time, but does it matter? The few invoices that come in that morning probably won't affect the percentage by more than a few tenths of a percent. If by some stroke of luck (for the sales department) or bad luck (for the developer) there's an order that could really inflate the revenue, there's a good chance the director will already have heard the news and will keep it in mind when looking at the pre-calculated figures.



To make this sort of pre-calculation acceptable, I use both transparency and the red herring technique at the same time.



Transparency means never hiding the fact that the number is pre-calculated. For that, I always add the date and time that the calculation was done. When possible, I also offer a function that recalculates the figures immediately, with a warning that this can take some time.
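
As a rough illustration of this transparency rule, here is a minimal Python sketch rather than the 4D code behind the article. The names `Snapshot`, `PrecalculatedFigure`, `refresh`, `read`, and `label` are hypothetical, and `compute` stands in for whatever expensive calculation produces the figure. The stored value always travels with its calculation time, and `refresh` plays the role of the "recalculate now" function.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Callable


@dataclass
class Snapshot:
    value: float            # the pre-calculated figure
    computed_at: datetime   # when it was calculated; always shown to the user


class PrecalculatedFigure:
    """Holds a figure computed ahead of time, together with its timestamp."""

    def __init__(self, compute: Callable[[], float],
                 clock: Callable[[], datetime] = datetime.now):
        self._compute = compute   # hypothetical expensive calculation
        self._clock = clock       # injectable clock, handy for testing
        self._snapshot = None

    def refresh(self) -> Snapshot:
        """Run the expensive calculation now (off-hours job or user request)."""
        self._snapshot = Snapshot(self._compute(), self._clock())
        return self._snapshot

    def read(self) -> Snapshot:
        """Return the stored figure; only computes if nothing is stored yet."""
        if self._snapshot is None:
            return self.refresh()
        return self._snapshot

    def label(self) -> str:
        """Text shown to the user: the figure plus when it was calculated."""
        snap = self.read()
        return (f"{snap.value:.1f}% of goal "
                f"(calculated {snap.computed_at:%Y-%m-%d %H:%M})")
```

In such a design, a scheduled job would call `refresh()` before the offices open, every screen would call `label()`, and a "recalculate" button would call `refresh()` after warning the user that it can take a while.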



In parallel, I plant a red herring. This technique consists of diverting attention (here, from the fact that the figure is pre-calculated and potentially erroneous) by pointing it toward something else. In the case of our sales director, I offer him the figures via a personalized email. The figures are thus available directly in his inbox without his having to connect to the server. This service gives him the figures necessary to direct the company's sales, and subconsciously he knows that emailed information isn't instantaneous information, so I don't have to spend a lot of time justifying my choice.


As nothing is ever simple in our profession, we can find ourselves, with the same data and the same database, in the inverse situation. The accounting department would prefer to have rigorously accurate figures, not a penny off. However, the accountants work with fixed numbers, always from the past. For example, the accounting analysis of the first quarter generally doesn't start before mid-April. For us developers, this is an opportunity: we have over a week to prepare the data to supply to our accountants, so we can leisurely prepare everything we need. In other words, we can anticipate the demands of our accountants and run what's costliest in terms of server resources at a time when the server is lightly used, and not at the last minute.


Pre-calculation is the secret weapon for optimizing numerous cases. Take, for example, the "top 10" offered in the 4D Forums that rewards contributors based on ten parameters. The calculation takes a long time, and luckily it's not performed with each web request. It's actually performed once a day and spread over a long period so as not to take up more resources than necessary. During this calculation, the data of the preceding day remain available, and it's not until the calculation is over that the new daily numbers replace the old.
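
The "keep yesterday's numbers visible until today's are ready" idea can be sketched as follows. This is an illustrative Python sketch, not the Forum's actual implementation; `DailyRanking`, `get`, `rebuild`, and the `compute` callback are hypothetical names. The slow calculation builds a brand-new list on the side, and the published list is replaced in a single step once the calculation has finished.

```python
import threading


class DailyRanking:
    """Serves the previous ranking while the next one is being computed."""

    def __init__(self, initial):
        self._current = list(initial)   # the ranking that readers see
        self._lock = threading.Lock()   # protects only the swap and the read

    def get(self):
        """Cheap read, called by every web request."""
        with self._lock:
            return list(self._current)

    def rebuild(self, compute):
        """Run the slow calculation, then publish its result in one step.

        `compute` is a hypothetical callable returning the new ranking.
        The lock is not held while it runs, so readers are never blocked
        and keep getting the old ranking until the swap.
        """
        new_ranking = list(compute())
        with self._lock:
            self._current = new_ranking
```

A daily scheduled task would call `rebuild()`; everything else only ever calls `get()`.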



In the same manner, the user list on the right side of each page of the Forum isn't calculated in real time. I've put in place a manager, in the form of a stored procedure, that's called to refresh the list only when a new user connects. This manager is also woken every 15 minutes to clean up the list and remove those who are no longer online. The list offered is thus false if you consider the exact moment, but it gives a view of the users connected in the last fifteen minutes.
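
A comparable manager could look like the Python sketch below. It is only an approximation of the stored procedure described above: `OnlineUsers`, `on_connect`, `purge`, and `visible` are hypothetical names, and deciding who is "no longer online" from a last-connection timestamp is an assumption of this sketch. The cached list is rebuilt on two events only, a new connection and the periodic purge, never on a page view.

```python
from datetime import datetime, timedelta

WINDOW = timedelta(minutes=15)   # purge interval mentioned in the article


class OnlineUsers:
    """Keeps a cached list of recently connected users."""

    def __init__(self):
        self._last_seen = {}   # user name -> time of last connection
        self._visible = []     # cached list displayed on every page

    def on_connect(self, user, now):
        """Called when a user connects: record the time, refresh the list."""
        self._last_seen[user] = now
        self._rebuild()

    def purge(self, now):
        """Meant to be run every 15 minutes by a scheduler (not shown)."""
        cutoff = now - WINDOW
        self._last_seen = {u: t for u, t in self._last_seen.items()
                           if t >= cutoff}
        self._rebuild()

    def _rebuild(self):
        self._visible = sorted(self._last_seen)

    def visible(self):
        """Cheap read used by page rendering; no calculation happens here."""
        return list(self._visible)
```

Between two purges the list may still show someone who has already left, which is exactly the small, clearly bounded inaccuracy the article accepts.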


Displaying time-shifted results can be a very worthwhile strategy when the results are clearly identifiable as such!


To conclude, I'll relate a "tense" discussion that I had on this subject with a developer who insisted upon never making false calculations: his honor was at stake! I asked him whether his queries took long to run, and he told me that the calculation lasts only three minutes, which is understandable because it produces a full balance of the sales orders received by telephone. A little later in the conversation, I learned that the system receives about one order per minute. So roughly three new orders arrive while the calculation runs, and the balance that the developer thought was absolutely exact was, in fact, false because the data had changed between the beginning and the end of the calculation.


For the moment, we have no choice but to return false results, unless we want to increase the activity of our application.
