Big O Explained: Why Systems Are Alike?
July 27, 2013
In several of my recent lectures, I pointed out that most end users cannot differentiate among search systems. The comment made about these systems is often, “Why can’t these systems be like Google?” I concluded that the similarity of requests suggests that systems are essentially identical.
One reason is that training in university and the “use what works” approach in the real world produces search, content processing, and analytics systems that are pretty much indistinguishable. There are differences, but these can be appreciated only when a person takes the systems apart. Even then, differences are difficult to explain; for example, why a threshold value in System A is 15 percent lower than in System B. When dealing with sketchy data, the difference is usually irrelevant.
Another reason is that today’s systems are struggling to cope with operations that stretch the capabilities of even the most robust systems. Developers have to balance what the engineering plan wants to do with what can be done in a reasonable amount of time on an existing system.
Enter Big O.
You may want to take a look at “Big O Notation Explained by a Self-Taught Programmer.” I found the write up interesting and clear. The main point in my opinion is:
Consider this function:
def all_combinations(the_list): results = [] for item in the_list: for inner_item in the_list: results.append((item, inner_item)) return resultsThis matches every item in the list with every other item in the list. If we gave it an array
[1,2,3]
, we’d get back[(1,1) (1,2), (1,3), (2, 1), (2, 2), (2, 3), (3, 1), (3, 2), (3, 3)]
. This is part of the field of combinatorics(warning: scary math terms!), which is the mathematical field which studies combinations of things. This function (or algorithm, if you want to sound fancy) is consideredO(n^2)
. This is because for every item in the list (akan
for the input size), we have to don
more operations. Son * n == n^2
.Below is a comparison of each of these graphs, for reference. You can see that an
O(n^2)
function will get slow very quickly where as something that operates in constant time will be much better.
Net net: Developers have to do what works. Search and related content processes are complex. In order to get the work done, search systems have embraced “what works.” Over time, we get undifferentiable systems.
Disagree? Use the comments section to explain.
Stephen E Arnold, July 27, 2013
Sponsored by Xenky
Comments
One Response to “Big O Explained: Why Systems Are Alike?”
Hi Stephen,
The fact that most systems aren’t like Google does not mean that they are all the same: Ric is not Stephen and Hans is not Stephen, and yet Hans is not Ric. 🙂
Also, while people use heuristics (“what works”) when confronted with computationally hard problems (=problems that would take too much time and resources to be properly solved), not all heuristics are the same.
And before we get into the realm of computationally hard problems, there are some algorithms – sorry I should say methods for content analytics – that are simply better than others.
In any case, I can assure you that companies – and end users – differentiate search solutions: I can straight away think of three very large companies that are implementing Sinequa in replacement of competitors (“big names” with serious offerings). The impetus for the replacement came from users who did not get the fine-tuned relevance of search results they needed in their work environment. None of these companies is known for squandering its money on just marginal improvements. They ask themselves “where is the beef”! (You are old enough to know that series of advertisements with Clara Peller. :-))