Could Xyggy Be the Future of Search?
May 19, 2010
The internet is so amazing… we can find information and answers to almost any query we can think of. Sure… it may take time, hits and misses and trial and error… and cause enough frustration to raise blood pressure, but we can do it. That’s why we know more about everything today than anyone else in the world has ever known before us. Doesn’t necessarily mean we use that information better… we just know it… or where to find it. That’s because search does its ‘gee-whiz thing.’ And we are hooked.
However, whether we know it or not, we are lacking. “Search is nowhere near a solved problem,” says Amit Singhal, Google Research Fellow. “Although I’ve been at this for almost two decades now, I’d still guess that search isn’t quite out of its infancy yet. The science is probably just about at the point where we’re crawling. Soon we’ll walk. I hope that in my lifetime, I’ll see search enter its adolescence.”
That’s kind of a surprise because for us users, Google, Bing, Yahoo! and the like bring us hundreds of thousands of answers to many of our queries. Who could want for more?
Ah, there’s the key. We couldn’t want for more… but we could be more satisfied with what we get. Who can use 60.3 million results in .18 seconds—the actual Google return for the term ‘search engine’? How about the top 20 results that are really what we want? How about a search tool that better fits smart phone capabilities? (See ATT offers a clue as to where search is going) For 40 years now, because of technology limitations, we have been limited to text-driven search. It has been great… but that won’t cut it for tomorrow.
So who has a better idea? Enter Xyggy (say Ziggy), a comparatively new player in the search field with something very different. Founded in June 2008 by Dinesh Vadhia, its originator and CEO, Xyggy’s tagline is: find anything… and, most simply stated, that’s exactly what it’s all about.
Vadhia explains: “In our everyday lives we search for and find things all the time where the things–or if you prefer, call them objects or items–can literally be anything. Xyggy brings item-search into our digital lives where the items can be documents, web pages, images, social profiles, ads, audio, articles, video, investments, patents, resumes, medical records… in fact, anything ranging from text to non-text.
Social Networks, Testosterone, and Facebook
May 13, 2010
In my Information Today column which will run in the next hard copy issue, I talk about the advantage social networks have in identifying sites members perceive as useful. Examples are Delicious.com (owned by Yahoo) and StumbleUpon.com (once eBay and now back in private hands).
The idea is based in economics. Indexing the entire Web and then keeping up with changes is very expensive. With most queries answered by indexing a subset of the total Web universe, only a handful of organizations can tackle this problem. If I put on my gloom hat, the number of companies indexing as many Web pages as possible is one: Google. If I put on my happy hat, I can name a couple of other outfits. One implication is that Google may find itself spending lots of money to index content while its search traffic starts to go to Facebook. Yikes. Crisis time in Mountain View?
It costs a lot when many identify important sites and the lone person or company has to figure everything out for himself or herself. Image source: http://lensaunders.com/habit/img/peerpressuresmall.jpg
The idea is that when members recommend a Web site as useful, the company getting this Web site URL can index that site’s content. Over time, a body of indexed content becomes useful. I routinely run specialized queries on Delicious.com and StumbleUpon.com, among others. I don’t run these queries on Google because the results list requires too much work to process. One nagging problem is Google’s failure to make it possible to sort results by time. I can get a better “time sense” from other systems.
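A minimal sketch of the recommendation-driven approach described above, assuming a stream of member-recommended URLs is available. The function name `sites_to_index` and the `min_votes` threshold are hypothetical, chosen for illustration; nothing here reflects how Delicious.com or StumbleUpon.com actually work.

```python
from collections import Counter
from urllib.parse import urlparse


def sites_to_index(recommended_urls, min_votes=2):
    """Return hosts recommended at least min_votes times, most recommended first.

    Hypothetical helper: the vote threshold is an assumption, not a
    documented practice of any social bookmarking service.
    """
    # Count one vote per recommended URL, grouped by host name.
    votes = Counter(urlparse(url).netloc.lower() for url in recommended_urls)
    # Sort by descending vote count, then by host name for a stable order.
    ranked = sorted(votes.items(), key=lambda item: (-item[1], item[0]))
    return [host for host, count in ranked if count >= min_votes]


recommendations = [
    "http://example.com/a",
    "http://example.com/b",
    "http://example.org/x",
    "http://EXAMPLE.org/y",
    "http://example.org/z",
    "http://example.net/",
]
crawl_list = sites_to_index(recommendations)
```

The point of the sketch is the economics: the crawl list is limited to the hosts members have already vouched for, so the indexing bill scales with recommendations rather than with the size of the Web.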
When I read “The Big Game, Zuckerberg and Overplaying your Hand”, I interpreted these observations in the context of the information cost advantage. The write up makes the point via some interesting rhetorical touches that Facebook is off the reservation. The idea is that Facebook’s managers are seizing opportunities and creating some real problems for themselves and other companies. The round up of urls in the article is worth reviewing, and I will leave that work to you.
First, it is clear that social networks are traffic magnets because users see benefits. In fact, despite Facebook’s actions and the backlash about privacy, the Facebook system keeps on chugging along. In a sense, Facebook is operating like the captain of an ice breaker in the arctic. Rev the engines and blast forward. Hit a penguin? Well, that’s what happens when a big ship meets a penguin. If – note, the “if” – the Facebook user community continues to grow, the behavior of the firm’s management will be encouraged. This means more ice breaker actions. In a sense, this is how Google, Microsoft, and Yahoo either operate or operated in their youth. The motto is, “It is better to beg for forgiveness than ask for permission.”
Azure Chip Search Consultants and the Goose
May 10, 2010
Remember the seventh grade? Charles Dickens, his Tale of Two Cities, this quotation:
It was the best of times, it was the worst of times, it was the age of wisdom, it was the age of foolishness, it was the epoch of belief, it was the epoch of incredulity, it was the season of Light, it was the season of Darkness, it was the spring of hope, it was the winter of despair, we had everything before us, we had nothing before us, we were all going direct to Heaven, we were all going direct the other way- in short, the period was so far like the present period, that some of its noisiest authorities insisted on its being received, for good or for evil, in the superlative degree of comparison only?
Setting: I am at a fancy reception, standing in a corner, tucking my tail feathers beneath me. I shivered in fear as I watched the attendees mingle. To my surprise, an azure chip consultant approached me. I had two additional azure chip encounters, but I will encapsulate my observations into this single ur-azure trope. I even have a logo I envision when I think about the azure chip search consulting crowd.
Source: http://media.photobucket.com/image/losers/lupe318/LOSER.jpg
Conflict: The topic was, “Why are you picking on me / us?” As readers of this * free * (and “free” is an operative adjective) Web log know, I comment on the “findings” of azure chip consultants.
Search as Oil Slick or Volcanic Ash
May 3, 2010
I had a conversation with a person familiar with enterprise search. In the course of the ebb and flow, a metaphor surfaced, and I wanted to capture it before it slipped away.
The idea is that an environmental event or a human action can trigger big consequences. Anyone trying to fly out of Europe on April 16, 2010, learned quickly about ash plumes. Now the unlucky residents of the US Gulf Coast have an opportunity to understand the diffusion pattern of an oil release.
What’s this have to do with search?
The idea which struck me as interesting is that search is now having a similar impact on activities, processes, and ecosystems far removed from ground zero. I am not able to accomplish much of my “work” unless I can locate the program, file, information, and data I need. I don’t really do anything with physical objects. I live in a world of data and constructs built upon information. Sure, I have a computer and keyboard, and without those hardware gizmos, I would be dead in the water or maybe a sea of red ink?
The search eruption. Source: http://www.liv.ac.uk/science_eng_images/earth/research/VolcanicAsh.jpg
Search is now disappearing in some organizations, absorbed into other applications. One way to describe this shift is to use the phrase “search-enabled application”. Another approach is to talk about search as a utility or an embedded service.
The Courier Journal and Winning Horse Races
May 2, 2010
Post-Derby day. Sunday newspaper day. Depressing, and it is only 9 am.
A near miss in New York City excited the NPR news team this morning (May 2, 2010). Nary a word about Greece, Spain, and Portugal, however. To get the details, I had to fire up my laptop and check out online news sources.
I walked to the end of my driveway to retrieve the Courier-Journal, where I used to work. I also picked up my home delivery copy of the New York Times. The NYT was wet because the blue plastic bag was not closed, so water happily nestled in the newsprint. I could tell at a glance that the NYT closed before the problem in Times Square was news. I tossed the paper aside to dry.
The C-J was the ad section and the soft features. No front page. What was delivered dripped water on the kitchen floor. My wife told me to sort the newspaper in the garage. Fun. The Derby was yesterday and I was curious about the coverage of the event. Despite the nose dive in the original content in the C-J over the last 20 years, reporters do hoof and gallop around the Derby in search of “stories”. Well, mostly it is “who got rich,” “who showed up”, and “who got in trouble”. No joy. A call to the C-J’s hotline triggered a recording that told me there were production problems with the Sunday paper. No big deal. There’s online, Twitter, and Facebook. The story was online here “Production Problems Prevent Delivery of Full Sunday Courier Journal.” I wonder if there were cutbacks and efficiencies applied that made one of the highest circulation editions of the year fail? Like aircraft maintenance, no one knows what shifts have been made until the toilets don’t work, the flights can’t leave the gate, or the pilot reports a “slight issue and some paperwork”.
The one section of the C-J that showed up is called “Forum”, and what do you know? The front page of section H for Sunday, May 2, 2010, ran a story with this headline: “Rethink: Newspapers are better off than you may think.” The author is a fellow named Arnold Garson, whom I don’t know. His picture shows a kindly visage in dark suit with red tie and the slug: “The Courier Journal remains a strong and credible local news provider and a profitable business today.”
Since my Sunday paper was missing the front page, the sports section, and some other bits, I am not on board with the assertion about “a strong and credible local news provider.” I think the “profitable business” part is really the point.
I read the article, which purports to be the text of a speech delivered on April 7, 2010, to the Downtown Kiwanis Club meeting. The article is a long piece, running about 80 column inches. If Mr. Garson read this speech, I am delighted I was not in attendance.
Summarizing the talk is easy: C-J makes money, reaches more than 85 percent of the readers, and makes money. Oh, I repeated myself. Sorry, but that point jumps out a couple of times in the text of the talk.
I noted some other highlights as well:
- The C-J is performing better than other newspapers; that is, “less bad” is “good”
- Delivery of the hard copy to “outlying areas” has been trimmed
- Ad rates and subscription prices are going up
- TV news viewers are older than C-J newspaper readers
- 100 million people read newspapers.
You get the idea.
The C-J’s local Web site attracts 1.3 million unique users per month and generates 16 million page views. The C-J has achieved 380,000 mobile impressions per month. That’s good. The questions I had were:
- What’s a “unique”? What’s a “page view”? What’s a “mobile impression”?
- How does this compare with Facebook’s 400 million users in early 2010, up from 150 million in early 2009?
- What’s the relationship between circulation decline and uptake of the C-J’s Web site?
I could crank out more questions, but I want to jump to the wrap up of the talk. This is the assertion I find most interesting:
Ninety-nine percent of the nation’s newspapers, including The Courier-Journal will survive this recession based on our own core strengths, our determination to transform our business model and through the lift we will get from the recovery itself.
I am not sure how to make the leap from 99 percent survivability to “our own core strengths”. The core strengths seem to be advertising. I am not convinced the C-J does much local news. I understand determination. The assertion about the recovery seems to be a “maybe” argument. But it is tough to get coverage of the European financial crisis based on my reading the C-J every day. I have to turn elsewhere for that information.
Why do I think the talk is baloney? First, I fund the Seed2020 meet ups for women- and minority-owned businesses. I know that none of the more than 20 companies featured in the meet ups since November 2009 have been covered in the C-J. A couple of these businesses are real stories with solid news value. Nope. No coverage. One can argue that the weakening Business First, American Cities Business Journal publication is taking up the slack. Nope. The Seed2020 events show that there are solid news stories that are just not covered. I find the C-J argument on ground as muddy as the race track yesterday.
Second, without the C-J’s front page or the coverage of the NYT event in Times Square, I question the value of the newspaper as a timely source of information. Traditional deadlines and production problems underscore the irrelevance of the “business model” that will keep 99 percent of the newspapers in business. Mr. Garson does not provide any reference points for the number of newspapers in business in 1900, the number in 2000, and the number today. I do touch on this issue in Google: The Digital Gutenberg and won’t repeat the decline, consolidation, and homogeneity referenced in my monograph.
Third, the folks I know who are 55 and younger are not into newspapers. I watched how my son’s friends, now in their 30s, looked at the sports pages and their iPads and Macbooks. They talked to one another, chatted on their mobile devices, and sent text messages. This behavior took place as we sat at the kitchen table. The newspaper was marginalized.
Bottom-line: Timeliness, medium, and business model are intermingling with the DNA of people who don’t find the hard copy newspaper relevant. The C-J’s Arnold Garson is putting a positive spin on a reality that does not exist in our household.
Of course, I live in one of those outlying areas in Kentucky. I can log on to Newsnow.co.uk and learn about Europe. I can check Craigslist.com for ads. I can scan my Twitter stream to learn about the horrific accident that took place at Highway 42 and Highway 841 at 6:45 am.
No C-J needed for that. And I used to work there. Big changes to which the C-J and papers like the NYT are struggling to adapt. Like the long shots in the Derby yesterday, only one horse won. In my opinion, the C-J and the NYT are both entering the media race next year with long odds. Just my opinion and it is as valuable as a tip at the track.
Stephen E Arnold, May 2, 2010
Unsponsored post.
Yahoo and Search Models
April 30, 2010
I received an email this morning pointing to the strong showing of Google at the recent Web conference in Raleigh, North Carolina. I responded that Yahoo continues to push forward with what seem to me academic-type initiatives. In terms of traffic and revenue growth, I am waiting for some real action to take place.
After writing the email to the person who pinged me about Yahoo, I read “Yahoo’s Search Model Developing a New Face.” Suddenly this morning it is a Yahoo renascence, at least for a few minutes. The SFGate story recycles the conference presentations. The idea in the write up struck me as a variant of publish or perish or publicity or perish.
The passage that caught my attention was:
[The Yahoo report] … found that people only spend about one-sixth of their online time performing searches. That compares with half of their time for browsing and one-third for communicating, according to aggregated data pulled from the Yahoo Toolbar, a downloadable browser feature that provides quick links to a user’s favorite content.
The research shows that people are “doing” things to find information. Yep, that’s search. The problem is that the word “search” is pretty much without meaning in my opinion. The reality is that the yammer about social networks is missing the obvious point; that is, some users prefer to rely on what those in their so-called network tell them, not what an ad choked, power besotted, marketing injected public Web search system tells them.
Search vendors and their research papers are in the ivory tower world. Interesting stuff, but it is not as relevant as traffic and money. Source: http://www.ivorytowerframes.com/3765/IVORYTOWERIMAGE11.jpg
How do I know?
First, look at the sudden shift from Web search to services like Facebook. Even Caterina Fake’s Hunch.com service is about finding information. True, it combines smart software with inputs from humans but it indicates the boundary condition for the phase change that is taking place.
American Style Management and Search
April 22, 2010
At lunch today (April 13, 2010), there was a brief discussion about the article “Will France Outlaw American-Style Macho-Management?” The main idea is that a French executive implemented some “American style” management tactics. The result was employee dissatisfaction and alleged suicides as a result of work pressure. The Europeans with whom I spoke were uniformly critical of American MBAs and their management styles. I have worked with managers from different countries. Some of these individuals were American trained executives and others were graduates of the school of hard knocks.
After lunch, I did some thinking about the search companies’ management styles. In general, I find that the most hard charging professionals are in the sales and marketing departments. The staff at these companies is usually lean with much of the work outsourced. My exchanges with senior managers have been pretty much in line with my dealings with senior executives in government agencies in the US and elsewhere and in non profit and charity organizations. Most of these professionals have a deep concern for the customer and staff. Knowledge of products and their underlying technologies may be a bit of a challenge for some senior managers, particularly those who must chase funds and sales. Keeping the lights on takes precedence over the nitty gritty details. When I hear the phrase “lost in the weeds”, my radar registers an intruder.
Most of the potholes that I identify as weaknesses in search come not from top management but from the methods of implementing certain technical functions. I also find that outsourcing causes a fair share of disruption as well. Toss in the excitement needed to make sales, squirt marketing juice into the gears, and upsell services, and I find a volatile mix. There is also quite a bit of confusion generated by consultants who describe many different vendors in glowing terms because these happy words sell reports and consulting work but not necessarily search or content processing systems.
Search management survival. Source: http://www.hhmi.org/images/bulletin/feb2009/survival_image.jpg
Several observations:
- The pressure to generate revenue leads to some of the issues that I encounter. One small company did not get its funding and the pressure on the executives is palpable. There are quite a few vendors competing for search contracts, and I think that the advantage will remain with the companies that have a high profile and benefits that make sense to the client. I don’t think it is possible to advertise, Twitter, and blog oneself into the big time in search. Clients don’t have the time to verify that a newcomer’s system works. Most deals go to companies that have a track record. Companies that don’t need to generate revenue from a search license may have an advantage because “price” drops out of the procurement equation in some cases.
- The PR firms handling search have a great pitch, but most of these outfits crash and burn in their approach to the subject of search. Examples range from copy that literally sounds like other vendors’ promotional material to muddling Intranet search, Web site search, and Web search. I receive email begging me to view a demo and to interview a CEO. I am not a journalist. If I took time to participate in each of these demos, I would have no time to write my Google monographs and support my handful of clients. I think I have made two PR people cry and earned the wrath of dozens of others because I tell them no, leave me alone, or do your homework. Sadly the appeals to me are increasing.
- The potential licensees of a search system are increasingly confused. When I wrote the first edition of the Enterprise Search Report in 2003, I had a tough time explaining the differences between a couple of dozen vendors. If I were to tackle that type of project in 2010, I am not sure I could do the job as effectively as I did six or seven years ago. The reason is that some of the major vendors are increasingly alike. This gravitation to a common set of functions is partly the result of some leading firms buying other companies and partly because traditional search is becoming a commodity. The specialized systems steer clear of enterprise search and sell directly to the executive who needs this function. Examples range from a customer support system to a warranty analysis system to an eDiscovery system. In each case, a specific unit of an organization has a content problem to solve. Search is part of a broader solution.
- The new frontier in my opinion merges finding information, using it in a business process, and making specialized functions available to users. Examples include business intelligence, report generation, email alerts and notifications, and other features that may not look like search at first glance.
The Seven Forms of Mass Media
April 21, 2010
Last evening on a pleasant boat ride on the Adriatic, a number of young computer scientists to be were asking about my Google lecture. A few challenged me, but most seemed to agree with my assertion that Google has a large number of balls in the air. A talented juggler, of course, can deal with five or six balls. The average juggler may struggle to keep two or three in sync.
One of the students shifted the subject to search and “findability.” As you know, I floated the idea that search and content processing are morphing into operational intelligence, preferably real-time operational intelligence, not the somewhat stuffy method of banging two or three words into a search box and taking the most likely hit as the answer.
The question put to me was, “Search has not kept up with printed text, which has been around since the 1500s, maybe earlier. What are we going to do about mobile media?”
The idea is that we still have a difficult time locating the precise segment of text or datum. With mobile devices placing restraints on interface, fostering new types of content like short text messages, and producing an increasing flow of pictures and video, finding is harder not easier.
I remembered reading “Cell Phones: The Seventh Mass Media” and had a copy of this document on my laptop. I did not give much weight to the assertion that mobile devices were a mass medium, but I thought the insight had relevance. Mobile information comes with some interesting characteristics. These include:
- The potential for metadata derived from the user’s mobile number, location, call history, etc.
- The index terms in content, if the system can parse information objects or unwrap text in an image or video such as converting an image to ASCII and then indexing the name of a restaurant or other message in an object.
- Contextual information, if available, related to content, identified entities, recipients of messages, etc.
- Log file processing for any other cues about the user, recipient(s), and information objects.
What this line of thinking indicates is that a shift to mobile devices has the potential for increasing the amount of metadata about information objects. A “tweet”, for instance, may be brief, but one could, given the right processing system, impart considerable richness to the information object in the form of metadata of one sort or another.
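A minimal sketch of the idea, assuming a short message is treated as an information object that picks up the four kinds of metadata listed above. The class name `InformationObject`, the function `enrich`, and every field name are hypothetical, and the hashtag and @mention matching is only a crude stand-in for real entity extraction.

```python
import re
from dataclasses import dataclass, field


@dataclass
class InformationObject:
    """A short mobile message plus whatever metadata has been attached to it."""
    text: str
    metadata: dict = field(default_factory=dict)


def enrich(obj, sender=None, location=None, recipients=(), log_cues=None):
    """Attach four illustrative kinds of metadata to a message (all hypothetical)."""
    # 1. Metadata derived from the user: who sent the message and from where.
    obj.metadata["user"] = {"sender": sender, "location": location}
    # 2. Index terms parsed from the content itself.
    terms = re.findall(r"[a-z0-9]+", obj.text.lower())
    obj.metadata["index_terms"] = sorted(set(terms))
    # 3. Contextual information: identified entities and message recipients.
    #    Hashtags and @mentions stand in for a real entity extractor.
    obj.metadata["context"] = {
        "entities": re.findall(r"[#@]\w+", obj.text),
        "recipients": list(recipients),
    }
    # 4. Any other cues recovered from log files.
    obj.metadata["log_cues"] = dict(log_cues or {})
    return obj


tweet = enrich(
    InformationObject("Lunch at #Rizibizi with @ana"),
    sender="user-123",
    location="Portoroz",
    recipients=["ana"],
    log_cues={"client": "mobile"},
)
```

Even in this toy form, a 28-character message ends up carrying more metadata than text, which is the richness the paragraph above points to.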
The previous six forms of media—[I] print (books, magazines, and newspapers), [II] recordings; [III] cinema; [IV] radio; [V] television; and [VI] Internet—fit neatly under the umbrella of [VII] mobile. The idea is mobile embraces the other six. This type of reasoning is quite useful because it gathers some disparate items and adds some handles and knobs to the otherwise unwieldy assortment in the collection.
In the write up referenced above, I found this passage interesting: “Mobile is as different from the Internet as TV is from the radio.”
The challenge that is kicked to the side of the information highway is, “How does one find needed information in this seventh mass media?” Not very well in my experience. In fact, finding and accessing information is clumsy for textual information. After 500 years, the basic approach of hunting, Easter egg style, has been facilitated by information retrieval systems. But I think most people who look for information can point out some obvious deficiencies. For example, most retrieval systems ignore content in various languages. Real time information is more of a marketing ploy than a useful means of figuring out the pulse count for a particular concept. A comprehensive search remains a job for a specialist who would be recognized by an archivist who worked in Ephesus’ library 2500 years ago.
Are you able to locate this video on Ustream or any other video search system? I could not, but I know the video exists. Here is a screen capture. Finding mobile content can be next to impossible in my opinion.
When I toss in the radio and other rich media content, finding and accessing pose enormous challenges to a researcher and a casual user alike. In my keynote speech on April 15, 2010, I referenced some Google patent documents. The clutch of disclosures provides some evidence that Google wants to apply smart software to the editorial job of creating personalized rich media program guides. The approach strikes me as an extension of other personalization approaches, and I am not convinced that explicit personalization is a method that will crack the problem of finding information in the seventh medium or any other for that matter.
Here’s my reasoning:
- Search and retrieval methods for text don’t solve problems. More information processed means longer results lists and an increase in the work required to figure out where the answer is.
- Smart systems like Google’s or the Cuil Cpedia project are in their infancy. An expert may find fault with smart software that is actually quite stupid from the informed user’s point of view.
- Making use of context is a challenging problem for research scientists but asking one’s “friends” may be the simplest, most economical, and widely used method. Facebook’s utility as a finding system or Twitter’s vibrating mesh may be the killer app for finding content from mobile devices.
- As impressive as Google’s achievements have been in the last 11 years, the approach remains largely a modernization of search systems from the 1970s. A new direction may be needed.
The bright young PhDs have the job of figuring out if mobile is indeed the seventh medium. The group with which I was talking or similar engineers elsewhere have the job of cracking the findability problem for the seventh medium. My hope is that on the road to solving the problem of the new seventh medium’s search challenge, a solution to finding information in the other six is discovered as well.
The interest in my use of the phrase “operational intelligence” tells me one thing. Search is a devalued and somewhat tired bit of jargon. Unfortunately substituting operational intelligence for the word search does not address the problem of delivering the right information when it is needed in a form that the user can easily apprehend and use.
There’s work to be done. A lot of work in my opinion.
Stephen E Arnold, April 20, 2010
No sponsor for this post, gentle reader.
Google and Disruption: Will It Work Tomorrow?
April 15, 2010
Editor’s Note: The text in this article is derived from the notes prepared for Stephen E Arnold’s keynote talk on April 15, 2010. He delivered this speech as part of Slovenian Information Days in Portoroz, Slovenia.
Thank you, Mr. Chairman. I am most grateful for the opportunity to address this group and offer some observations about Google and its disruptive tactics.
I started tracking Google’s technical inventions in 2002. A client, now out of business, asked me to indicate if “Google really had something solid.”
My analysis showed a platform diagram and a list of markets that Google was likely to disrupt. I captured three ideas in my 2005 monograph “The Google Legacy”, which is still timely and available from Infonortics Ltd. in Tetbury, Glos.
The three ideas were:
First, Google had figured out how to add computing capacity, including storage, using mostly commodity hardware. I estimated the cost in 2002 dollars as about one-third of what companies like Excite, Lycos, Microsoft, and Yahoo were paying.
Second, Google had solved the problem of text search for content on Web pages. Google’s engineers were using that infrastructure to deliver other types of services. In 2002, there were rumors that Google was experimenting with services that ranged from email to an online community / messaging system. One person, whose name I have forgotten, pointed out that Google’s internal network MOMA was the test bed for this type of service.
Third, Google was not an invention company. Google was an applied research company. The firm’s engineers, some of whom came from Sun Microsystems and AltaVista.com, were adept at plucking discoveries from university research computing tests and hooking them into systems that were improvements on what most companies used for their applications. The genius was focus and selection and integration.
Google is an information factory, a digital Rouge River construct. Raw materials enter at one end and higher value information products and services come out at the other end of the process.
In my second Google monograph, funded in part by another client, I built upon my research into technology and summarized Google’s patent activities between 2004 and mid 2007. Google Version 2.0: The Calculating Predator, also published by Infonortics Ltd., disclosed several interesting facts about the company.
Mindbreeze Goes Mobile
April 2, 2010
Fabasoft has rolled out a new add-on to allow licensed users to search via a smartphone or other mobile device.
I spoke with Michael Hadrian, the managing director of Fabasoft Distribution in Linz, Austria. Fabasoft is the holding company of the Mindbreeze enterprise search system. In that conversation, I picked up two interesting insights into the Fabasoft Mindbreeze push into the market for enterprise search.
Mindbreeze Enterprise Mobile result list.
First, the Mindbreeze search technology, recently profiled in a consultant’s report, is now available as a cloud-based service. The idea is to shift from an on-premises installation to one that Fabasoft / Mindbreeze can provision and operate from the cloud. Mr. Hadrian told me, “The major benefits are achieving business related results faster and reducing the burden on an organization’s internal information technology resources.”
Second, a Mindbreeze licensee gains access to the company’s mobile interface. The idea is that a worker, regardless of his / her location, can use the Fabasoft Mindbreeze products to locate information in a wide range of sources processed by the Fabasoft Mindbreeze Enterprise system. These range from the standard Microsoft Office file types to more proprietary repositories such as those used by Lotus Domino / Notes customers.
A mobile search metadata display.