Big O Explained: Why Systems Are Alike?

July 27, 2013

In several of my recent lectures, I pointed out that most end users cannot differentiate among search systems. The comment made about these systems is often, “Why can’t these systems be like Google?” I concluded that the similarity of requests suggests that systems are essentially identical.

One reason is that training in university and the “use what works” approach in the real world produces search, content processing, and analytics systems that are pretty much indistinguishable. There are differences, but these can be appreciated only when a person takes the systems apart. Even then, differences are difficult to explain; for example, why a threshold value in System A is 15 percent lower than in System B. When dealing with sketchy data, the difference is usually irrelevant.

Another reason is that today’s systems are struggling to cope with operations that stretch the capabilities of even the most robust systems. Developers have to balance what the engineering plan wants to do with what can be done in a reasonable amount of time on an existing system.

Enter Big O.

You may want to take a look at “Big O Notation Explained by a Self-Taught Programmer.” I found the write up interesting and clear. The main point in my opinion is:

Consider this function:

def all_combinations(the_list): results = [] for item in the_list: for inner_item in the_list: results.append((item, inner_item)) return resultsThis matches every item in the list with every other item in the list. If we gave it an array [1,2,3], we’d get back [(1,1) (1,2), (1,3), (2, 1), (2, 2), (2, 3), (3, 1), (3, 2), (3, 3)]. This is part of the field of combinatorics(warning: scary math terms!), which is the mathematical field which studies combinations of things. This function (or algorithm, if you want to sound fancy) is considered O(n^2). This is because for every item in the list (aka n for the input size), we have to do n more operations. So n * n == n^2.

Below is a comparison of each of these graphs, for reference. You can see that an O(n^2) function will get slow very quickly where as something that operates in constant time will be much better.

Net net: Developers have to do what works. Search and related content processes are complex. In order to get the work done, search systems have embraced “what works.” Over time, we get undifferentiable systems.

Disagree? Use the comments section to explain.

Stephen E Arnold, July 27, 2013

Comments

One Response to “Big O Explained: Why Systems Are Alike?”

Hans-Josef on July 29th, 2013 6:16 am

Hi Stephen,
The fact that most systems aren’t like Google does not mean that they are all the same: Ric is not Stephen and Hans is not Stephen, and yet Hans is not Ric. 🙂
Also, while people use heuristics (“what works”) when confronted with computationally hard problems (=problems that would take too much time and resources to be properly solved), not all heuristics are the same.
And before we get into the realm of computationally hard problems, there are some algorithms – sorry I should say methods for content analytics – that are simply better than others.
In any case, I can assure you that companies – and end users – differentiate search solutions: I can straight away think of three very large companies that are implementing Sinequa in replacement of competitors (“big names” with serious offerings). The impetus for the replacement came from users who did not get the fine-tuned relevance of search results they needed in their work environment. None of these companies is known for squandering its money on just marginal improvements. They ask themselves “where is the beef”! (You are old enough to know that series of advertisements with Clara Peller. :-))

Search the site
Subscribe to Beyond Search
Feature archive
News archive

Stephen E. Arnold monitors search, content processing, text mining and related topics from his high-tech nerve center in rural Kentucky. He tries to winnow the goose feathers from the giblets. He works with colleagues worldwide to make this Web log useful to those who want to go "beyond search". Contact him at sa [at] arnoldit.com. His Web site with additional information about search is arnoldit.com.