Google and Objective Search Results

October 20, 2014

I recall that in one conference presentation in Boston about Google I attended, the Googler (Dave Girouard, now a Xoogler) emphasized the objectivity of Google search results. I have heard the objective claim from many quarters over the years.

I noted the PC Magazine story “Google ‘Fixes’ Stephen Colbert’s Height Listing.” Here’s the passage I noted:

While Google hasn’t exactly dropped a packet full of stock options off on Colbert’s doorstep, it has managed to address Google’s concerns about his height listing. First up, Colbert now appears as 5 foot 10.5 inches tall on Google’s search results when you query for “Stephen Colbert height.” If you prefer metric, his height is now listed as 1.79 meters… “-ish.”

From my hollow in Harrod’s Creek, this strikes me as an example of Google’s ability to modify search results quickly. I am not sure that the “objective” reference used by Mr. Girouard years ago applies today. If true, Google can intervene in the vaunted PageRank process and make results changes quickly and at will.

Are those claims of outfits like Foundem founded? Maybe, just maybe?

Stephen E Arnold, October 20, 2014

Google Scholar and Google Silos of Content

October 18, 2014

I read “Making the World’s Problem Solvers 10% More Efficient.” The article explains that the Google engineer who was “the key inventor” of Google Scholar is leaving the GOOG.

The write up discloses a couple of interesting factoids; for example:

  • Google Scholar has been around for 10 years
  • The founder of Google Scholar took charge of Google’s indexing in year 2000
  • The inventor of Google Scholar had to figure out how to keep Google’s index fresh; that is, new and changed content are reflected in search results.

The most interesting point in the write up is this statement (I have added the boldface):

Also, the nature of academic papers presented some opportunities for more powerful ranking, particularly making use of the citations typically included in academic papers. Those same scholarly citations had been the original inspiration for PageRank, the technique that had originally made Google search more powerful than its competitors. Scholar was able to use them to effectively rank articles on a given query, as well as to identify relationships between papers.

What happened to Eugene Garfield? I know, “Who?” So does this passage mean that today’s Google Web search discards functionality originally included in year 2000?
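For readers who want to see what citation-based ranking looks like in practice, here is a minimal sketch. It is a textbook PageRank-style iteration over a tiny, made-up citation graph, not Google Scholar's actual ranking method, which the article does not disclose. The function name, the damping value of 0.85, and the paper labels are all assumptions for illustration.

```python
def citation_rank(citations, damping=0.85, iterations=50):
    """Textbook PageRank-style scores for a citation graph.

    citations maps each paper to the list of papers it cites.
    All names and parameter values here are hypothetical.
    """
    papers = set(citations) | {p for cited in citations.values() for p in cited}
    n = len(papers)
    rank = {p: 1.0 / n for p in papers}
    for _ in range(iterations):
        # Every paper starts each round with the same small baseline score.
        new_rank = {p: (1.0 - damping) / n for p in papers}
        for paper in papers:
            cited = citations.get(paper, [])
            if cited:
                # A paper passes its score, in equal shares, to the papers it cites.
                share = damping * rank[paper] / len(cited)
                for target in cited:
                    new_rank[target] += share
            else:
                # A paper that cites nothing spreads its score over all papers.
                share = damping * rank[paper] / n
                for target in papers:
                    new_rank[target] += share
        rank = new_rank
    return rank


# Made-up graph: papers A and B both cite C, and C cites D.
example_citations = {"A": ["C"], "B": ["C"], "C": ["D"], "D": []}
scores = citation_rank(example_citations)
for paper in sorted(scores, key=scores.get, reverse=True):
    print(paper, round(scores[paper], 3))
```

In this toy graph the twice-cited paper C outranks A and B, which nobody cites. That is the basic idea behind using citations as a ranking signal.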

But the big point for me is that Google is supposed to deliver “universal search.” To make use of Google Scholar, one must navigate to http://scholar.google.com and run separate queries. Is this universal? It seems to be old school siloing.

I like Google Scholar, but I think Google Web search may lack some of the refinements included in Google Scholar. Well, ads are important. Correction: Revenue is important. Perhaps Google will charge for access to Google Scholar and compete directly with commercial database vendors?

In my view, Google Scholar had a negative impact on commercial database vendors who charge libraries, corporations, and individuals for access to curated and indexed professional and scholarly information. Google seems content to allow the Google Scholar service to drift along. Would more purpose be of value? Queries for patent 2012/0251502 A1’s “the isolated nucleic acid molecule includes the nucleotide sequence of SEQ ID NOs: 1 or 10, or a complement thereof. In another, the nucleic acid molecule includes a nucleotide sequence having at least 4, 5, 10, 15, 20, 25, 30, 35, 40, 45, 50, 60, 70, 80, 90, 100, 150, 200, 250, 300, 350, 400, 450, 500, 550, 600, 650, 700, 750, 800, 850, 900, 950, 1000, 1500, 2000, 2500, 3000, 3500, 4000, 4500, 4600, 4700, 4800, or 4900 contiguous nucleotides of the nucleotide sequence of SEQ ID NO: 1” would permit Google to match Ebola ads to Google Scholar content?

Stephen E Arnold, October 18, 2014

Blippex: By the People, For the People

October 17, 2014

Would Blippex be the search engine Alexis de Tocqueville would love? The search engine is, according to Bloomberg, “a new crowd sourced public search engine.” Blippex makes use of technology developed for Archify, a system providing users with access to their online history. According to CrunchBase, the system has received seed funding of $700,000.

A year ago, Blippex was described as “the first interesting search engine since Google?” Like Qwant, Blippex is a search system crafted in Europe. Like Qwant, Blippex has ambitions for nibbling into Google’s market share for Web search.

The idea is that the search system is “built by its own users,” a phrase used in the Quartz article to describe the system. Quartz continued:

One of Blippex’s key selling points is that Kossatz and Baeck [the founders] are fanatical about privacy. Though Blippex constructs its search results on the basis of data gathered from its users, it does it in a way that’s anonymous and untraceable to any individual Blippex user. This obsession with privacy allows Blippex to rank pages—i.e., decide which pages to show people—with an algorithm that Google can’t match, because if Google gathered the data that Blippex does, users would find it unacceptably creepy.

Blippex does not track its users. One of the key technologies for the system is WebRTC. WebRTC is an open project that enables Web browsers with Real-Time Communications (RTC) capabilities via simple JavaScript APIs. If you don’t want to fool around with browser add ins, you can use Blippex like any other Web search system.

I ran a query for “enterprise search.” The results were interesting. I did not know that solid state drives were related to a search by a sheriff’s department or to Lenovo.

The order of the results is determined by the amount of time a user spends on a page. This is the “dwell time.”
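A minimal sketch of dwell-time ordering follows. It assumes the simplest possible scoring: average seconds per page, longest first. Blippex's actual algorithm is not described in the sources I read, so the function name, the URLs, and the visit data are hypothetical.

```python
from collections import defaultdict
from statistics import mean


def rank_by_dwell_time(visits, candidates):
    """Order candidate URLs by average dwell time in seconds, longest first.

    visits is a list of (url, seconds_on_page) pairs. This is an illustrative
    sketch, not Blippex's real scoring.
    """
    dwell = defaultdict(list)
    for url, seconds in visits:
        dwell[url].append(seconds)
    # Pages with no recorded visits score 0.0 and sink to the bottom.
    score = {url: (mean(dwell[url]) if url in dwell else 0.0) for url in candidates}
    # Ties are broken alphabetically so the order is deterministic.
    return sorted(candidates, key=lambda url: (-score[url], url))


# Made-up visit log: (url, seconds a user spent on the page).
example_visits = [
    ("a.example/ssd", 12),
    ("b.example/search", 95),
    ("b.example/search", 65),
    ("c.example/lenovo", 30),
]
example_candidates = [
    "a.example/ssd",
    "b.example/search",
    "c.example/lenovo",
    "d.example/new",
]
print(rank_by_dwell_time(example_visits, example_candidates))
```

The page where users linger (an average of 80 seconds) comes first, and the page nobody has visited comes last. A scheme this simple may also help explain odd results: a page can hold attention without being relevant to the query.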

Worth a look. A privacy centric European search system will have its supporters. The challenge, of course, is that Google dominates Web search in Europe. What is Google’s market share? 80 or 90 percent? Perhaps European regulators can adjust this situation?

Stephen E Arnold, October 17, 2014

Gool.li Service Offline

October 16, 2014

This may be old news. We were updating our list of search engines and received an error from the service called Gool.li, a metasearch system. Our last check for this system was in January 2013. At that time the company’s Web site was online and an Android app was available. The name is a variant of the Arabic phrase for “tell me”. More information about the system is available in a nine-slide presentation at this link.

As you may recall, the service used a panel-style interface or what the company called “cards design”. Each panel corresponded to particular types of content.

The system was described as delivering “knowledge as a service.” One interesting feature of the search results was a grouping of links by domains.

The company was based in Montréal and was a project of Al Akhawayn University. My search file suggests that the system architect may have been Jawad Jari and the service utilized Amazon Web Services.

Web metasearch seems to be a harsh taskmaster.

Stephen E Arnold, October 17, 2014

Open Source Search and Kicking the Bukkit

October 15, 2014

There is a presentation “Kicking the Bukkit: Anatomy of an Open Source Meltdown” by Ryan Michela, a developer with experience in open source. Over several years, an open source game project rose and fell. I am not too interested in open source games. At the end of the Slideshare document, there are five reasons the open source game project failed.

Let me summarize these and encourage you to work through the full 55 slide deck. How many of these issues may have an impact on open source search systems? Keep in mind that commercial enterprises like Attivio and IBM make use of open source technology.

  1. Inclusion of decompiled code in an open source project
  2. License issues
  3. Tie ups within the community before a project gains momentum
  4. No contributor license agreement
  5. Disgruntled developers in the community.

The presentation includes a quote that I noted:

It only takes one unhappy developer to kill an unprotected project.

Is there an open source search company vulnerable to one or more of these issues? I can name a couple. I wonder if these firms’ funding sources are concerned about their investments “kicking the bucket”?

Stephen E Arnold, October 15, 2014

Search and Deceptive Ads

October 15, 2014

Short honk: I read “Study Says Google, Yahoo And Bing Are Running ‘Deceptive’ Ads — And Regulators Are Doing Nothing To Stop It.”

I assume this statement is a surprise to some folks:

Now, disclosure text has become very small, and the shading very subtle, meaning users often don’t realize they are clicking through to ads rather than the most relevant result for their query.

In an increasingly important quest for revenue, these allegedly deceptive ads may be just the beginning of math club maneuvers. Relevance has a new meaning. Perhaps it is a synonym for revenue?

Stephen E Arnold, October 14, 2014

User Interface Design Search Engine Harder Than Google Engineering

October 15, 2014

Web site design used to be reserved for graphic designers with a fancy degree and background in computer science. Times have changed from the daunting trials of coding to simple click and drag selections. The advent of WordPress, Tumblr, Wix, Weebly, and Squarespace Web site design services has simplified the process so anyone can create a decent site in seconds. If, however, you are interested in building a site that is more interactive than standard templates, then start taking advantage of UICloud.

UICloud is a user interface design search engine that plows through results and retrieves information geared specifically to your design needs.

“UICloud is a project created by Double-J Design. It collects the best UI element designs from the Internet all over the world and provides a search engine for you to find the best UI element that you need. We are aiming to create the biggest platform for designers to showcase their top user interface designs and for developer to get the best UI elements for their project easily and quickly.”

UICloud combines elements of Web site browsing and searching in one place. If you search for a specific topic, the results appear in thumbnails so you can preview the art. It takes advantage of the “magazine” format that’s grown popular. Categories are reminiscent of old webrings and link lists that used to collect related Web sites in one place. Categories are a neat feature, because they save the trouble of searching and take you straight to browsing. Remember how half the links used to be defunct? It is easy to see that happening.

Users can submit their user interface designs to UICloud, and the designs will then be added to the search results. Not all the listings may be under a Creative Commons license. The UICloud team notes that you need to check with the artist before you use a design.

Whitney Grace, October 15, 2014
Sponsored by ArnoldIT.com, developer of Augmentext

Google: A Magic Leap

October 14, 2014

Is it true that if one bets on enough race horses, one will win? Seems logical to those who hang out at Churchill Downs I suppose.

Miles away from the race track, I found the audiences I addressed at last week’s intelligence and law enforcement conference skeptical of Google’s search results. In 2013, there was more surprise when I demonstrated how queries for “twerk” did not involve too much “search.”

After the sessions, attendees commented on how much work is required to ferret out relevant results to queries. The notion that LE and intel professionals had to learn command line syntax to get useful information was a situation I did not think would arise. Hey, Google has smart software, artificial intelligence, the world’s fastest search engine, yada yada.

Searching Google is actually difficult if one wants to answer certain types of questions; for example, who in Scotland sells tactical shotgun silencers. Give that query a whirl in your spare time.

I read “Google Set to Lead Huge Investment in Magic Leap and Its ‘Cinematic Reality.’” The write up provides a surprisingly poignant glimpse into the Google business machine. Google is no longer content to borrow notions like head mounted mobile phones. Google wants to lift beyond big balloons which rhymes with “loons.” Google does not want to solve death.

Nope. Google is betting that it can invest in companies and tap into a new, swelling revenue stream. Search, it seems, has become an optimization task for the Googlers. The future lies in “cinematic reality.”

Google wants to be the lead elephant in the investing parade for Magic Leap. You can work through the original document to get a sense of the Fancy Dan augmented reality technology Magic Leap allegedly possesses.

My view is that Google has to find a way to sustain revenue growth. Search is not the prize winning stallion it once was. I assume that Google believes that investments in companies that deliver magic will produce big bucks.

For me, I am concerned that the utility of the Google search system will continue to decrease for the types of research I do. If the feedback I received from LE and intel professionals is representative, there are a number of serious individuals who want a Google search to return relevant results, not ads and promotions for Google products and services.

I am all for magic, but magic involves tricks. Search requires more than wild bets and a faith in magic.

I do not crave a more realistic three dimensional experience. I am okay with a system that:

  1. Includes useful content in an accessible interface. Google’s convoluted blog search is not what I call accessible.
  2. Presents results that are in line with needs of the user, not the needs of the advertiser.
  3. Provides more frequently refreshed indexes for pages with content that is not focused on Dancing with the Stars, vacations, and hotels.

I want some of that old time search magic. Maybe a futuristic, robotic pony clone will make Google billions. I prefer a search donkey that gets the job done. Onward, precision and recall.

Stephen E Arnold, October 14, 2014

Profile Engine: Sort of Finding the Forgotten

October 13, 2014

In the supplemental lecture added after the intel conference ended, I addressed the topic of disappearing content. The “right to be forgotten” is one of the great ideas emerging from government committees. I wonder who wants to be forgotten? I provided some basic information about finding information about these forgotten entities.

One of the attendees at my lecture alerted me to Profile Engine. I navigated to the link and learned:

Profile Engine is a fairly low-budget-looking search engine, started in 2007 in New Zealand and partly owned by the Auckland University of Technology. It allows you to find people on social networks. Google has been getting a lot of requests to reverse this trend—almost 3,300 results from Profile Engine have been taken down by Google since May, when the “right to be forgotten” came into effect.

You can find Profile Engine at http://profileengine.com/. We can’t endorse the system, but we will check it out, and I will have an update for my next lecture. Conference organizers extend invitations via email. If you don’t hear about an event, you need to get yourself unforgotten. That’s a bit of humor for this Monday morning.

Stephen E Arnold, October 13, 2014

SRCH2: Security and Speed

October 12, 2014

Oracle’s Secure Enterprise Search offered advanced security. Perfect Search stressed its speed. SES has been marginalized. That particular security pitch did not work. Perfect Search also has faded from the scene.

Perhaps pitching both security and speed will yield more together than as separate features.

SRCH2 asserts that it is four times faster than open source search engines. None of the open source search engines is a speed demon. Speed boosts require additional work on the specific subsystem introducing the latency for a particular deployment.

SRCH2’s “Real Time Computer Requires Faster Search” makes a case for the optimization built into SRCH2’s system. The article states:

SRCH2 offers the world’s fastest search engine. Why is speed so important? After all, the human eye can’t detect the difference between a 10-millisecond and 50-millisecond response time.

Some data backing this assertion would be helpful. In a direct comparison of Lucid Works’ technology with ElasticSearch’s technology, the ArnoldIT team found that one was faster in indexing and the other was faster in query processing. Both could be improved with focused optimization. Perhaps SRCH2 will share some of the data which back up the “four times faster” claim? (I am not at liberty to release the performance data a client requested my team compile from live tests on my test corpus.)

The security pitch appears in SRCH2’s “SRCH2 Introduces Access Control Lists to Improve Search Security.” The article states:

SRCH2 took the approach of providing native support of access control to set restrictions on search results. With SRCH2’s ACL feature, developers can restrict user permissions to access either certain records in an index, or specific attributes within a record or set of records.

The approach is useful. However, it is less robust than the Oracle approach which implemented a wider range of features provided by specialized Oracle subsystems.
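The record-level and attribute-level restrictions the SRCH2 article describes can be illustrated with a short sketch. This is not SRCH2’s API, which the article does not show. The Record class, the role names, and the search function are hypothetical, and the matching is a naive substring test.

```python
from dataclasses import dataclass, field


@dataclass
class Record:
    record_id: str
    attributes: dict
    allowed_roles: set  # record-level ACL: roles that may see the record at all
    # Attribute-level ACL: attribute name -> roles that may see that attribute.
    # Attributes not listed here fall back to the record-level roles.
    attribute_roles: dict = field(default_factory=dict)


def visible_view(record, role):
    """Return the attributes this role may see, or None if the record is hidden."""
    if role not in record.allowed_roles:
        return None
    return {
        name: value
        for name, value in record.attributes.items()
        if role in record.attribute_roles.get(name, record.allowed_roles)
    }


def search(records, term, role):
    """Naive substring search that matches only on attributes the role may see."""
    hits = []
    for record in records:
        view = visible_view(record, role)
        if view is None:
            continue
        if any(term.lower() in str(value).lower() for value in view.values()):
            hits.append((record.record_id, view))
    return hits


# Made-up records: r1 hides its salary attribute from everyone except "hr".
example_records = [
    Record("r1", {"title": "Budget memo", "salary": "90000"},
           {"staff", "hr"}, {"salary": {"hr"}}),
    Record("r2", {"title": "Budget forecast"}, {"hr"}),
]
print(search(example_records, "budget", "staff"))
print(search(example_records, "budget", "hr"))
```

Matching is done against the filtered view rather than the full record. Otherwise a user could confirm the value of a hidden attribute simply by searching for it.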

Will the combination of security and speed pay off for SRCH2? Good question. I do not have an answer.

Stephen E Arnold, October 11, 2014
